Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
On Tue, May 16, 2017 at 10:07:41AM +, Lofstedt, Marta wrote: > > > > -Original Message- > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > Sent: Tuesday, May 16, 2017 12:48 PM > > To: Lofstedt, Marta> > Cc: Daniel Vetter ; Martin Peres > > ; intel-gfx@lists.freedesktop.org > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to > > capture > > the state of the GPU > > > > On Tue, May 16, 2017 at 09:43:52AM +, Lofstedt, Marta wrote: > > > > > > > > > > -Original Message- > > > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > > > Sent: Tuesday, May 16, 2017 12:04 PM > > > > To: Lofstedt, Marta > > > > Cc: Daniel Vetter ; Martin Peres > > > > ; intel-gfx@lists.freedesktop.org > > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a > > > > test to capture the state of the GPU > > > > > > > > On Tue, May 16, 2017 at 08:54:51AM +, Lofstedt, Marta wrote: > > > > > > > > > > > > > > > > -Original Message- > > > > > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > > > > > Sent: Tuesday, May 16, 2017 11:21 AM > > > > > > To: Lofstedt, Marta > > > > > > Cc: Daniel Vetter ; Martin Peres > > > > > > ; intel-gfx@lists.freedesktop.org > > > > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add > > > > > > a test to capture the state of the GPU > > > > > > > > > > > > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote: > > > > > > > I hereby pull-out this patch. > > > > > > > The idea of it was to know if we were already wedged at the > > > > > > > beginning of > > > > > > testing, that would give us information on how to interpret > > > > > > silly results; such that test starting to get skipped and/or we > > > > > > got dmesg-warns/incomplete on tests that usually should be skipped. > > > > > > > Also, we are planning to soon deploy a piglit.conf solution > > > > > > > where testing > > > > > > will be terminated on wedged, so I agree that my test isn't really > > needed. > > > > > > > > > > > > Not everything is broken by wedged; internally we just use that > > > > > > as an indicator that GEM is hosed. KMS should still work, we > > > > > > must still be able to drive the displays to show the error and > > > > > > keep the servers alive until the data is saved (and hopefully > > > > > > gracefully degrade that we don't have to interrupt their immediate > > session). > > > > > > > > > > It doesn't matter if it is broken or not, if we are terminally > > > > > wedged the rest > > > > of the result may be silly. Look for example at CI_DRM_2612, the > > > > fi-elk-e7500 is wedged at igt@gem_busy@basic-hang-default, then all > > > > test are skipped until gem_exec_reloc@basic-cpu-gtt-noreloc where > > > > the machine hangs, but it is a gem test so it should have been > > > > skipped, right. My conclusion from seeing this pattern multiple > > > > times is that after terminally wedged, silly things can happen, i.e. > > > > we can't trust the results, and since we don't want silly bugs, the CI > > testing should be stopped. > > > > > > > > The machine didn't hang, it was remotely killed because the run timed > > out. > > > How do you know that? > > > > The dmesg is a stream of flip timeouts until we run out of total BAT runtime > > (12 minutes + some startup slack). > > -Chris > > Then look at CI_DRM_2602, wedged at igt@gem_busy@basic-hang-default, after a > lot of skipping, we get incomplete result for another test, this time > gem_exec_reloc@basic-gtt-cpu-noreloc > > So, gem_exec_reloc@basic-cpu-gtt-noreloc and > gem_exec_reloc@basic-gtt-cpu-noreloc are falsely getting blamed and my > conclusion is that this is due to the permanent wedging started at > gem_busy@basic-hang-default. So, to avoid bug reports for > gem_exec_reloc@basic-cpu-gtt-noreloc and gem_exec_reloc@basic-gtt-cpu- > noreloc the suggestion is to stop testing after we are terminally wedged. The conclusion is still wrong though, it doesn't derive from the wedged state itself but from another bug that can be triggered by a recovered gpu hang. The issue is that post-processing isn't yet smart enough to determine the real cause and picks on the victim. We are always going to have that problem in some form, the question is what can be done to make the process smart to avoid false positives or rather just say this is a general bug and be careful not to be precise on the blame. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
> -Original Message- > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > Sent: Tuesday, May 16, 2017 12:48 PM > To: Lofstedt, Marta> Cc: Daniel Vetter ; Martin Peres > ; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to > capture > the state of the GPU > > On Tue, May 16, 2017 at 09:43:52AM +, Lofstedt, Marta wrote: > > > > > > > -Original Message- > > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > > Sent: Tuesday, May 16, 2017 12:04 PM > > > To: Lofstedt, Marta > > > Cc: Daniel Vetter ; Martin Peres > > > ; intel-gfx@lists.freedesktop.org > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a > > > test to capture the state of the GPU > > > > > > On Tue, May 16, 2017 at 08:54:51AM +, Lofstedt, Marta wrote: > > > > > > > > > > > > > -Original Message- > > > > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > > > > Sent: Tuesday, May 16, 2017 11:21 AM > > > > > To: Lofstedt, Marta > > > > > Cc: Daniel Vetter ; Martin Peres > > > > > ; intel-gfx@lists.freedesktop.org > > > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add > > > > > a test to capture the state of the GPU > > > > > > > > > > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote: > > > > > > I hereby pull-out this patch. > > > > > > The idea of it was to know if we were already wedged at the > > > > > > beginning of > > > > > testing, that would give us information on how to interpret > > > > > silly results; such that test starting to get skipped and/or we > > > > > got dmesg-warns/incomplete on tests that usually should be skipped. > > > > > > Also, we are planning to soon deploy a piglit.conf solution > > > > > > where testing > > > > > will be terminated on wedged, so I agree that my test isn't really > needed. > > > > > > > > > > Not everything is broken by wedged; internally we just use that > > > > > as an indicator that GEM is hosed. KMS should still work, we > > > > > must still be able to drive the displays to show the error and > > > > > keep the servers alive until the data is saved (and hopefully > > > > > gracefully degrade that we don't have to interrupt their immediate > session). > > > > > > > > It doesn't matter if it is broken or not, if we are terminally > > > > wedged the rest > > > of the result may be silly. Look for example at CI_DRM_2612, the > > > fi-elk-e7500 is wedged at igt@gem_busy@basic-hang-default, then all > > > test are skipped until gem_exec_reloc@basic-cpu-gtt-noreloc where > > > the machine hangs, but it is a gem test so it should have been > > > skipped, right. My conclusion from seeing this pattern multiple > > > times is that after terminally wedged, silly things can happen, i.e. > > > we can't trust the results, and since we don't want silly bugs, the CI > testing should be stopped. > > > > > > The machine didn't hang, it was remotely killed because the run timed > out. > > How do you know that? > > The dmesg is a stream of flip timeouts until we run out of total BAT runtime > (12 minutes + some startup slack). > -Chris Then look at CI_DRM_2602, wedged at igt@gem_busy@basic-hang-default, after a lot of skipping, we get incomplete result for another test, this time gem_exec_reloc@basic-gtt-cpu-noreloc So, gem_exec_reloc@basic-cpu-gtt-noreloc and gem_exec_reloc@basic-gtt-cpu-noreloc are falsely getting blamed and my conclusion is that this is due to the permanent wedging started at gem_busy@basic-hang-default. So, to avoid bug reports for gem_exec_reloc@basic-cpu-gtt-noreloc and gem_exec_reloc@basic-gtt-cpu- noreloc the suggestion is to stop testing after we are terminally wedged. > > -- > Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
On Tue, May 16, 2017 at 09:43:52AM +, Lofstedt, Marta wrote: > > > > -Original Message- > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > Sent: Tuesday, May 16, 2017 12:04 PM > > To: Lofstedt, Marta> > Cc: Daniel Vetter ; Martin Peres > > ; intel-gfx@lists.freedesktop.org > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to > > capture > > the state of the GPU > > > > On Tue, May 16, 2017 at 08:54:51AM +, Lofstedt, Marta wrote: > > > > > > > > > > -Original Message- > > > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > > > Sent: Tuesday, May 16, 2017 11:21 AM > > > > To: Lofstedt, Marta > > > > Cc: Daniel Vetter ; Martin Peres > > > > ; intel-gfx@lists.freedesktop.org > > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a > > > > test to capture the state of the GPU > > > > > > > > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote: > > > > > I hereby pull-out this patch. > > > > > The idea of it was to know if we were already wedged at the > > > > > beginning of > > > > testing, that would give us information on how to interpret silly > > > > results; such that test starting to get skipped and/or we got > > > > dmesg-warns/incomplete on tests that usually should be skipped. > > > > > Also, we are planning to soon deploy a piglit.conf solution where > > > > > testing > > > > will be terminated on wedged, so I agree that my test isn't really > > > > needed. > > > > > > > > Not everything is broken by wedged; internally we just use that as > > > > an indicator that GEM is hosed. KMS should still work, we must still > > > > be able to drive the displays to show the error and keep the servers > > > > alive until the data is saved (and hopefully gracefully degrade that > > > > we don't have to interrupt their immediate session). > > > > > > It doesn't matter if it is broken or not, if we are terminally wedged the > > > rest > > of the result may be silly. Look for example at CI_DRM_2612, the > > fi-elk-e7500 > > is wedged at igt@gem_busy@basic-hang-default, then all test are skipped > > until gem_exec_reloc@basic-cpu-gtt-noreloc where the machine hangs, but > > it is a gem test so it should have been skipped, right. My conclusion from > > seeing this pattern multiple times is that after terminally wedged, silly > > things > > can happen, i.e. we can't trust the results, and since we don't want silly > > bugs, > > the CI testing should be stopped. > > > > The machine didn't hang, it was remotely killed because the run timed out. > How do you know that? The dmesg is a stream of flip timeouts until we run out of total BAT runtime (12 minutes + some startup slack). -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
> -Original Message- > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > Sent: Tuesday, May 16, 2017 12:04 PM > To: Lofstedt, Marta> Cc: Daniel Vetter ; Martin Peres > ; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to > capture > the state of the GPU > > On Tue, May 16, 2017 at 08:54:51AM +, Lofstedt, Marta wrote: > > > > > > > -Original Message- > > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > > Sent: Tuesday, May 16, 2017 11:21 AM > > > To: Lofstedt, Marta > > > Cc: Daniel Vetter ; Martin Peres > > > ; intel-gfx@lists.freedesktop.org > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a > > > test to capture the state of the GPU > > > > > > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote: > > > > I hereby pull-out this patch. > > > > The idea of it was to know if we were already wedged at the > > > > beginning of > > > testing, that would give us information on how to interpret silly > > > results; such that test starting to get skipped and/or we got > > > dmesg-warns/incomplete on tests that usually should be skipped. > > > > Also, we are planning to soon deploy a piglit.conf solution where > > > > testing > > > will be terminated on wedged, so I agree that my test isn't really needed. > > > > > > Not everything is broken by wedged; internally we just use that as > > > an indicator that GEM is hosed. KMS should still work, we must still > > > be able to drive the displays to show the error and keep the servers > > > alive until the data is saved (and hopefully gracefully degrade that > > > we don't have to interrupt their immediate session). > > > > It doesn't matter if it is broken or not, if we are terminally wedged the > > rest > of the result may be silly. Look for example at CI_DRM_2612, the fi-elk-e7500 > is wedged at igt@gem_busy@basic-hang-default, then all test are skipped > until gem_exec_reloc@basic-cpu-gtt-noreloc where the machine hangs, but > it is a gem test so it should have been skipped, right. My conclusion from > seeing this pattern multiple times is that after terminally wedged, silly > things > can happen, i.e. we can't trust the results, and since we don't want silly > bugs, > the CI testing should be stopped. > > The machine didn't hang, it was remotely killed because the run timed out. How do you know that? > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
On Tue, May 16, 2017 at 08:54:51AM +, Lofstedt, Marta wrote: > > > > -Original Message- > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > Sent: Tuesday, May 16, 2017 11:21 AM > > To: Lofstedt, Marta> > Cc: Daniel Vetter ; Martin Peres > > ; intel-gfx@lists.freedesktop.org > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to > > capture > > the state of the GPU > > > > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote: > > > I hereby pull-out this patch. > > > The idea of it was to know if we were already wedged at the beginning of > > testing, that would give us information on how to interpret silly results; > > such > > that test starting to get skipped and/or we got dmesg-warns/incomplete on > > tests that usually should be skipped. > > > Also, we are planning to soon deploy a piglit.conf solution where testing > > will be terminated on wedged, so I agree that my test isn't really needed. > > > > Not everything is broken by wedged; internally we just use that as an > > indicator that GEM is hosed. KMS should still work, we must still be able to > > drive the displays to show the error and keep the servers alive until the > > data > > is saved (and hopefully gracefully degrade that we don't have to interrupt > > their immediate session). > > It doesn't matter if it is broken or not, if we are terminally wedged the > rest of the result may be silly. Look for example at CI_DRM_2612, the > fi-elk-e7500 is wedged at igt@gem_busy@basic-hang-default, then all test are > skipped until gem_exec_reloc@basic-cpu-gtt-noreloc where the machine hangs, > but it is a gem test so it should have been skipped, right. My conclusion > from seeing this pattern multiple times is that after terminally wedged, > silly things can happen, i.e. we can't trust the results, and since we don't > want silly bugs, the CI testing should be stopped. Oh, and note it was running so slowly because of a KMS bug that is quite common. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
On Tue, May 16, 2017 at 08:54:51AM +, Lofstedt, Marta wrote: > > > > -Original Message- > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > Sent: Tuesday, May 16, 2017 11:21 AM > > To: Lofstedt, Marta> > Cc: Daniel Vetter ; Martin Peres > > ; intel-gfx@lists.freedesktop.org > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to > > capture > > the state of the GPU > > > > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote: > > > I hereby pull-out this patch. > > > The idea of it was to know if we were already wedged at the beginning of > > testing, that would give us information on how to interpret silly results; > > such > > that test starting to get skipped and/or we got dmesg-warns/incomplete on > > tests that usually should be skipped. > > > Also, we are planning to soon deploy a piglit.conf solution where testing > > will be terminated on wedged, so I agree that my test isn't really needed. > > > > Not everything is broken by wedged; internally we just use that as an > > indicator that GEM is hosed. KMS should still work, we must still be able to > > drive the displays to show the error and keep the servers alive until the > > data > > is saved (and hopefully gracefully degrade that we don't have to interrupt > > their immediate session). > > It doesn't matter if it is broken or not, if we are terminally wedged the > rest of the result may be silly. Look for example at CI_DRM_2612, the > fi-elk-e7500 is wedged at igt@gem_busy@basic-hang-default, then all test are > skipped until gem_exec_reloc@basic-cpu-gtt-noreloc where the machine hangs, > but it is a gem test so it should have been skipped, right. My conclusion > from seeing this pattern multiple times is that after terminally wedged, > silly things can happen, i.e. we can't trust the results, and since we don't > want silly bugs, the CI testing should be stopped. The machine didn't hang, it was remotely killed because the run timed out. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
> -Original Message- > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > Sent: Tuesday, May 16, 2017 11:21 AM > To: Lofstedt, Marta> Cc: Daniel Vetter ; Martin Peres > ; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to > capture > the state of the GPU > > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote: > > I hereby pull-out this patch. > > The idea of it was to know if we were already wedged at the beginning of > testing, that would give us information on how to interpret silly results; > such > that test starting to get skipped and/or we got dmesg-warns/incomplete on > tests that usually should be skipped. > > Also, we are planning to soon deploy a piglit.conf solution where testing > will be terminated on wedged, so I agree that my test isn't really needed. > > Not everything is broken by wedged; internally we just use that as an > indicator that GEM is hosed. KMS should still work, we must still be able to > drive the displays to show the error and keep the servers alive until the data > is saved (and hopefully gracefully degrade that we don't have to interrupt > their immediate session). It doesn't matter if it is broken or not, if we are terminally wedged the rest of the result may be silly. Look for example at CI_DRM_2612, the fi-elk-e7500 is wedged at igt@gem_busy@basic-hang-default, then all test are skipped until gem_exec_reloc@basic-cpu-gtt-noreloc where the machine hangs, but it is a gem test so it should have been skipped, right. My conclusion from seeing this pattern multiple times is that after terminally wedged, silly things can happen, i.e. we can't trust the results, and since we don't want silly bugs, the CI testing should be stopped. > > Whilst that may be out of scope for BAT, it should not be completely beyond > us for robustness testing. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote: > I hereby pull-out this patch. > The idea of it was to know if we were already wedged at the beginning of > testing, that would give us information on how to interpret silly results; > such that test starting to get skipped and/or we got dmesg-warns/incomplete > on tests that usually should be skipped. > Also, we are planning to soon deploy a piglit.conf solution where testing > will be terminated on wedged, so I agree that my test isn't really needed. Not everything is broken by wedged; internally we just use that as an indicator that GEM is hosed. KMS should still work, we must still be able to drive the displays to show the error and keep the servers alive until the data is saved (and hopefully gracefully degrade that we don't have to interrupt their immediate session). Whilst that may be out of scope for BAT, it should not be completely beyond us for robustness testing. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
I hereby pull-out this patch. The idea of it was to know if we were already wedged at the beginning of testing, that would give us information on how to interpret silly results; such that test starting to get skipped and/or we got dmesg-warns/incomplete on tests that usually should be skipped. Also, we are planning to soon deploy a piglit.conf solution where testing will be terminated on wedged, so I agree that my test isn't really needed. /Marta > -Original Message- > From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel > Vetter > Sent: Monday, May 15, 2017 5:01 PM > To: Chris Wilson; Martin Peres > ; Lofstedt, Marta > ; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to > capture > the state of the GPU > > On Fri, May 12, 2017 at 02:17:29PM +0100, Chris Wilson wrote: > > The ABI for testing whether the device is wedged before use is > > gem_throttle(). This is encapsulated into igt_require_gem() and any > > test that depends upon execution on the GPU should be checking for GEM > first. > > > > This test should just be > > igt_gem_require(drm_open_driver(DRIVER_INTEL)); > > Well, how do you expect people to spot your nifty tools and refactorings if > they don't show up on the m-l, don't get acks or reviews, and don't come > with api-docs? At least neither gmail nor google did find that patch submitted > anywhere. > > Marta, can you pls review Chris' patch: > > commit 9518cb59abe35143f15abac2b7ffdb99820ef53c > Author: Chris Wilson > Date: Wed Feb 22 15:24:54 2017 + > > igt: Start marking up GEM tests that require an alive GPU to function > > Best would be to check that all gem tests really use this, and then also add > abi > docs for this new helper. > > Also, since we have igt_require_gem() already, what's the value of this > additional testcase? If we want to reboot machines if the gpu died, then we > need to have such a test at a higher level, in piglit. What we could do is > add a > special exit code to igt that indicates a runtime problem (and tells piglit to > reboot), essentially PLEASE_REBOOT instead of SKIP (which is what > igt_require_gem() results in). > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
On Mon, May 15, 2017 at 04:00:55PM +0200, Daniel Vetter wrote: > On Fri, May 12, 2017 at 02:17:29PM +0100, Chris Wilson wrote: > > The ABI for testing whether the device is wedged before use is > > gem_throttle(). This is encapsulated into igt_require_gem() and any test > > that depends upon execution on the GPU should be checking for GEM first. > > > > This test should just be igt_gem_require(drm_open_driver(DRIVER_INTEL)); > > Well, how do you expect people to spot your nifty tools and refactorings The ABI was established in commit 537e73f3f935b917f2f5f9b51499cb29d65e3889 Author: Chris WilsonDate: Fri Sep 24 17:37:41 2010 +0100 Disable dri2 after forcing fallbacks -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
On Fri, May 12, 2017 at 02:17:29PM +0100, Chris Wilson wrote: > The ABI for testing whether the device is wedged before use is > gem_throttle(). This is encapsulated into igt_require_gem() and any test > that depends upon execution on the GPU should be checking for GEM first. > > This test should just be igt_gem_require(drm_open_driver(DRIVER_INTEL)); Well, how do you expect people to spot your nifty tools and refactorings if they don't show up on the m-l, don't get acks or reviews, and don't come with api-docs? At least neither gmail nor google did find that patch submitted anywhere. Marta, can you pls review Chris' patch: commit 9518cb59abe35143f15abac2b7ffdb99820ef53c Author: Chris WilsonDate: Wed Feb 22 15:24:54 2017 + igt: Start marking up GEM tests that require an alive GPU to function Best would be to check that all gem tests really use this, and then also add abi docs for this new helper. Also, since we have igt_require_gem() already, what's the value of this additional testcase? If we want to reboot machines if the gpu died, then we need to have such a test at a higher level, in piglit. What we could do is add a special exit code to igt that indicates a runtime problem (and tells piglit to reboot), essentially PLEASE_REBOOT instead of SKIP (which is what igt_require_gem() results in). -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
On Fri, May 12, 2017 at 04:08:48PM +0300, Martin Peres wrote: > The commit name "Add a test to capture the state of the GPU" should > likely be renamed to "Add a test that checks the current state of > the driver". > > On 12/05/17 14:07, Marta Lofstedt wrote: > >When running testlist for CI iot would be good to know if > > iot -> it > > >the state of the GPU is as expected. This adds a subtest to check > >if the GPU is wedged. > > This is not mandatory, but how about adding: > > "When coupled with piglit's abort-on feature, this makes the piglit > results contain the reason for the abort when the driver's initial > state is unacceptable, improving debuggability" > > With the commit title and the typo fixed, the patch is: > Acked-by: Martin Peres> > > > >Signed-off-by: Marta Lofstedt > >--- > > tests/Makefile.sources| 1 + > > tests/initial_state.c | 52 > > +++ > > tests/intel-ci/extended.testlist | 1 + > > tests/intel-ci/fast-feedback.testlist | 1 + > > 4 files changed, 55 insertions(+) > > create mode 100644 tests/initial_state.c > > > >diff --git a/tests/Makefile.sources b/tests/Makefile.sources > >index 9553e4d9..32431a05 100644 > >--- a/tests/Makefile.sources > >+++ b/tests/Makefile.sources > >@@ -152,6 +152,7 @@ TESTS_progs_M = \ > > vgem_basic \ > > vgem_slow \ > > meta_test \ > >+initial_state \ > > $(NULL) > > if HAVE_CHAMELIUM > >diff --git a/tests/initial_state.c b/tests/initial_state.c > >new file mode 100644 > >index ..1c5d9a74 > >--- /dev/null > >+++ b/tests/initial_state.c > >@@ -0,0 +1,52 @@ > >+/* > >+ * Copyright © 2017 Intel Corporation > >+ * > >+ * Permission is hereby granted, free of charge, to any person obtaining a > >+ * copy of this software and associated documentation files (the > >"Software"), > >+ * to deal in the Software without restriction, including without limitation > >+ * the rights to use, copy, modify, merge, publish, distribute, sublicense, > >+ * and/or sell copies of the Software, and to permit persons to whom the > >+ * Software is furnished to do so, subject to the following conditions: > >+ * > >+ * The above copyright notice and this permission notice (including the next > >+ * paragraph) shall be included in all copies or substantial portions of the > >+ * Software. > >+ * > >+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS > >OR > >+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > >+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > >+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR > >OTHER > >+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > >+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > >DEALINGS > >+ * IN THE SOFTWARE. > >+ * > >+ */ > >+ > >+#include > >+#include > >+#include "igt.h" > >+ > >+/* > >+ * The purpose of this test is to capture the > >+ * state of the GPU before running a testlist. > >+ * > >+ * Note, this test currently only checks if > >+ * i915 is wedged. But, it could be extended to report > >+ * on other bad initial states. > >+ */ > >+ > >+static void is_wedged(void) > >+{ > >+char buf[128]; > >+int fd = drm_open_driver(DRIVER_INTEL); > >+ > >+igt_debugfs_read(fd, "i915_wedged", buf); The ABI for testing whether the device is wedged before use is gem_throttle(). This is encapsulated into igt_require_gem() and any test that depends upon execution on the GPU should be checking for GEM first. This test should just be igt_gem_require(drm_open_driver(DRIVER_INTEL)); -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
The commit name "Add a test to capture the state of the GPU" should likely be renamed to "Add a test that checks the current state of the driver". On 12/05/17 14:07, Marta Lofstedt wrote: When running testlist for CI iot would be good to know if iot -> it the state of the GPU is as expected. This adds a subtest to check if the GPU is wedged. This is not mandatory, but how about adding: "When coupled with piglit's abort-on feature, this makes the piglit results contain the reason for the abort when the driver's initial state is unacceptable, improving debuggability" With the commit title and the typo fixed, the patch is: Acked-by: Martin PeresSigned-off-by: Marta Lofstedt --- tests/Makefile.sources| 1 + tests/initial_state.c | 52 +++ tests/intel-ci/extended.testlist | 1 + tests/intel-ci/fast-feedback.testlist | 1 + 4 files changed, 55 insertions(+) create mode 100644 tests/initial_state.c diff --git a/tests/Makefile.sources b/tests/Makefile.sources index 9553e4d9..32431a05 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -152,6 +152,7 @@ TESTS_progs_M = \ vgem_basic \ vgem_slow \ meta_test \ + initial_state \ $(NULL) if HAVE_CHAMELIUM diff --git a/tests/initial_state.c b/tests/initial_state.c new file mode 100644 index ..1c5d9a74 --- /dev/null +++ b/tests/initial_state.c @@ -0,0 +1,52 @@ +/* + * Copyright © 2017 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + */ + +#include +#include +#include "igt.h" + +/* + * The purpose of this test is to capture the + * state of the GPU before running a testlist. + * + * Note, this test currently only checks if + * i915 is wedged. But, it could be extended to report + * on other bad initial states. + */ + +static void is_wedged(void) +{ + char buf[128]; + int fd = drm_open_driver(DRIVER_INTEL); + + igt_debugfs_read(fd, "i915_wedged", buf); + igt_assert(!strstr(buf, "1")); +} + +igt_main +{ + + igt_subtest("is_wedged") + is_wedged(); +} diff --git a/tests/intel-ci/extended.testlist b/tests/intel-ci/extended.testlist index a16c9c84..2ec08b55 100644 --- a/tests/intel-ci/extended.testlist +++ b/tests/intel-ci/extended.testlist @@ -1,3 +1,4 @@ +igt@initial_state@is_wedged igt@core_auth@many-magics igt@core_getclient igt@core_get_client_auth@master-drop diff --git a/tests/intel-ci/fast-feedback.testlist b/tests/intel-ci/fast-feedback.testlist index 5ffa2cef..a6a0a810 100644 --- a/tests/intel-ci/fast-feedback.testlist +++ b/tests/intel-ci/fast-feedback.testlist @@ -1,3 +1,4 @@ +igt@initial_state@is_wedged igt@core_auth@basic-auth igt@core_prop_blob@basic igt@drv_getparams_basic@basic-eu-total ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU
When running testlist for CI iot would be good to know if the state of the GPU is as expected. This adds a subtest to check if the GPU is wedged. Signed-off-by: Marta Lofstedt--- tests/Makefile.sources| 1 + tests/initial_state.c | 52 +++ tests/intel-ci/extended.testlist | 1 + tests/intel-ci/fast-feedback.testlist | 1 + 4 files changed, 55 insertions(+) create mode 100644 tests/initial_state.c diff --git a/tests/Makefile.sources b/tests/Makefile.sources index 9553e4d9..32431a05 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -152,6 +152,7 @@ TESTS_progs_M = \ vgem_basic \ vgem_slow \ meta_test \ + initial_state \ $(NULL) if HAVE_CHAMELIUM diff --git a/tests/initial_state.c b/tests/initial_state.c new file mode 100644 index ..1c5d9a74 --- /dev/null +++ b/tests/initial_state.c @@ -0,0 +1,52 @@ +/* + * Copyright © 2017 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + */ + +#include +#include +#include "igt.h" + +/* + * The purpose of this test is to capture the + * state of the GPU before running a testlist. + * + * Note, this test currently only checks if + * i915 is wedged. But, it could be extended to report + * on other bad initial states. + */ + +static void is_wedged(void) +{ + char buf[128]; + int fd = drm_open_driver(DRIVER_INTEL); + + igt_debugfs_read(fd, "i915_wedged", buf); + igt_assert(!strstr(buf, "1")); +} + +igt_main +{ + + igt_subtest("is_wedged") + is_wedged(); +} diff --git a/tests/intel-ci/extended.testlist b/tests/intel-ci/extended.testlist index a16c9c84..2ec08b55 100644 --- a/tests/intel-ci/extended.testlist +++ b/tests/intel-ci/extended.testlist @@ -1,3 +1,4 @@ +igt@initial_state@is_wedged igt@core_auth@many-magics igt@core_getclient igt@core_get_client_auth@master-drop diff --git a/tests/intel-ci/fast-feedback.testlist b/tests/intel-ci/fast-feedback.testlist index 5ffa2cef..a6a0a810 100644 --- a/tests/intel-ci/fast-feedback.testlist +++ b/tests/intel-ci/fast-feedback.testlist @@ -1,3 +1,4 @@ +igt@initial_state@is_wedged igt@core_auth@basic-auth igt@core_prop_blob@basic igt@drv_getparams_basic@basic-eu-total -- 2.11.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx