Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-16 Thread Chris Wilson
On Tue, May 16, 2017 at 10:07:41AM +, Lofstedt, Marta wrote:
> 
> 
> > -Original Message-
> > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> > Sent: Tuesday, May 16, 2017 12:48 PM
> > To: Lofstedt, Marta 
> > Cc: Daniel Vetter ; Martin Peres
> > ; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to 
> > capture
> > the state of the GPU
> > 
> > On Tue, May 16, 2017 at 09:43:52AM +, Lofstedt, Marta wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> > > > Sent: Tuesday, May 16, 2017 12:04 PM
> > > > To: Lofstedt, Marta 
> > > > Cc: Daniel Vetter ; Martin Peres
> > > > ; intel-gfx@lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a
> > > > test to capture the state of the GPU
> > > >
> > > > On Tue, May 16, 2017 at 08:54:51AM +, Lofstedt, Marta wrote:
> > > > >
> > > > >
> > > > > > -Original Message-
> > > > > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> > > > > > Sent: Tuesday, May 16, 2017 11:21 AM
> > > > > > To: Lofstedt, Marta 
> > > > > > Cc: Daniel Vetter ; Martin Peres
> > > > > > ; intel-gfx@lists.freedesktop.org
> > > > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add
> > > > > > a test to capture the state of the GPU
> > > > > >
> > > > > > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote:
> > > > > > > I hereby pull-out this patch.
> > > > > > > The idea of it was to know if we were already wedged at the
> > > > > > > beginning of
> > > > > > testing, that would give us information on how to interpret
> > > > > > silly results; such that test starting to get skipped and/or we
> > > > > > got dmesg-warns/incomplete on tests that usually should be skipped.
> > > > > > > Also, we are planning to soon deploy a piglit.conf solution
> > > > > > > where testing
> > > > > > will be terminated on wedged, so I agree that my test isn't really
> > needed.
> > > > > >
> > > > > > Not everything is broken by wedged; internally we just use that
> > > > > > as an indicator that GEM is hosed. KMS should still work, we
> > > > > > must still be able to drive the displays to show the error and
> > > > > > keep the servers alive until the data is saved (and hopefully
> > > > > > gracefully degrade that we don't have to interrupt their immediate
> > session).
> > > > >
> > > > > It doesn't matter if it is broken or not, if we are terminally
> > > > > wedged the rest
> > > > of the result may be silly. Look for example at CI_DRM_2612, the
> > > > fi-elk-e7500 is wedged at igt@gem_busy@basic-hang-default, then all
> > > > test are skipped until gem_exec_reloc@basic-cpu-gtt-noreloc where
> > > > the machine hangs, but it is a gem test so it should have been
> > > > skipped, right. My conclusion from seeing this pattern multiple
> > > > times is that after terminally wedged, silly things can happen, i.e.
> > > > we can't trust the results, and since we don't want silly bugs, the CI
> > testing should be stopped.
> > > >
> > > > The machine didn't hang, it was remotely killed because the run timed
> > out.
> > > How do you know that?
> > 
> > The dmesg is a stream of flip timeouts until we run out of total BAT runtime
> > (12 minutes + some startup slack).
> > -Chris
> 
> Then look at CI_DRM_2602, wedged at igt@gem_busy@basic-hang-default, after a 
> lot of skipping, we get incomplete result for another test, this time 
> gem_exec_reloc@basic-gtt-cpu-noreloc
> 
> So, gem_exec_reloc@basic-cpu-gtt-noreloc and 
> gem_exec_reloc@basic-gtt-cpu-noreloc are falsely getting blamed and my 
> conclusion is that this is due to the permanent wedging started at 
> gem_busy@basic-hang-default. So, to avoid bug reports for 
> gem_exec_reloc@basic-cpu-gtt-noreloc and gem_exec_reloc@basic-gtt-cpu- 
> noreloc the suggestion is to stop testing after we are terminally wedged. 

The conclusion is still wrong though, it doesn't derive from the wedged
state itself but from another bug that can be triggered by a recovered
gpu hang. The issue is that post-processing isn't yet smart enough to
determine the real cause and picks on the victim. We are always going to
have that problem in some form, the question is what can be done to make
the process smart to avoid false positives or rather just say this is a
general bug and be careful not to be precise on the blame.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-16 Thread Lofstedt, Marta


> -Original Message-
> From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> Sent: Tuesday, May 16, 2017 12:48 PM
> To: Lofstedt, Marta 
> Cc: Daniel Vetter ; Martin Peres
> ; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to 
> capture
> the state of the GPU
> 
> On Tue, May 16, 2017 at 09:43:52AM +, Lofstedt, Marta wrote:
> >
> >
> > > -Original Message-
> > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> > > Sent: Tuesday, May 16, 2017 12:04 PM
> > > To: Lofstedt, Marta 
> > > Cc: Daniel Vetter ; Martin Peres
> > > ; intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a
> > > test to capture the state of the GPU
> > >
> > > On Tue, May 16, 2017 at 08:54:51AM +, Lofstedt, Marta wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> > > > > Sent: Tuesday, May 16, 2017 11:21 AM
> > > > > To: Lofstedt, Marta 
> > > > > Cc: Daniel Vetter ; Martin Peres
> > > > > ; intel-gfx@lists.freedesktop.org
> > > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add
> > > > > a test to capture the state of the GPU
> > > > >
> > > > > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote:
> > > > > > I hereby pull-out this patch.
> > > > > > The idea of it was to know if we were already wedged at the
> > > > > > beginning of
> > > > > testing, that would give us information on how to interpret
> > > > > silly results; such that test starting to get skipped and/or we
> > > > > got dmesg-warns/incomplete on tests that usually should be skipped.
> > > > > > Also, we are planning to soon deploy a piglit.conf solution
> > > > > > where testing
> > > > > will be terminated on wedged, so I agree that my test isn't really
> needed.
> > > > >
> > > > > Not everything is broken by wedged; internally we just use that
> > > > > as an indicator that GEM is hosed. KMS should still work, we
> > > > > must still be able to drive the displays to show the error and
> > > > > keep the servers alive until the data is saved (and hopefully
> > > > > gracefully degrade that we don't have to interrupt their immediate
> session).
> > > >
> > > > It doesn't matter if it is broken or not, if we are terminally
> > > > wedged the rest
> > > of the result may be silly. Look for example at CI_DRM_2612, the
> > > fi-elk-e7500 is wedged at igt@gem_busy@basic-hang-default, then all
> > > test are skipped until gem_exec_reloc@basic-cpu-gtt-noreloc where
> > > the machine hangs, but it is a gem test so it should have been
> > > skipped, right. My conclusion from seeing this pattern multiple
> > > times is that after terminally wedged, silly things can happen, i.e.
> > > we can't trust the results, and since we don't want silly bugs, the CI
> testing should be stopped.
> > >
> > > The machine didn't hang, it was remotely killed because the run timed
> out.
> > How do you know that?
> 
> The dmesg is a stream of flip timeouts until we run out of total BAT runtime
> (12 minutes + some startup slack).
> -Chris

Then look at CI_DRM_2602, wedged at igt@gem_busy@basic-hang-default, after a 
lot of skipping, we get incomplete result for another test, this time 
gem_exec_reloc@basic-gtt-cpu-noreloc

So, gem_exec_reloc@basic-cpu-gtt-noreloc and 
gem_exec_reloc@basic-gtt-cpu-noreloc are falsely getting blamed and my 
conclusion is that this is due to the permanent wedging started at 
gem_busy@basic-hang-default. So, to avoid bug reports for 
gem_exec_reloc@basic-cpu-gtt-noreloc and gem_exec_reloc@basic-gtt-cpu- noreloc 
the suggestion is to stop testing after we are terminally wedged. 

> 
> --
> Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-16 Thread Chris Wilson
On Tue, May 16, 2017 at 09:43:52AM +, Lofstedt, Marta wrote:
> 
> 
> > -Original Message-
> > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> > Sent: Tuesday, May 16, 2017 12:04 PM
> > To: Lofstedt, Marta 
> > Cc: Daniel Vetter ; Martin Peres
> > ; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to 
> > capture
> > the state of the GPU
> > 
> > On Tue, May 16, 2017 at 08:54:51AM +, Lofstedt, Marta wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> > > > Sent: Tuesday, May 16, 2017 11:21 AM
> > > > To: Lofstedt, Marta 
> > > > Cc: Daniel Vetter ; Martin Peres
> > > > ; intel-gfx@lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a
> > > > test to capture the state of the GPU
> > > >
> > > > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote:
> > > > > I hereby pull-out this patch.
> > > > > The idea of it was to know if we were already wedged at the
> > > > > beginning of
> > > > testing, that would give us information on how to interpret silly
> > > > results; such that test starting to get skipped and/or we got
> > > > dmesg-warns/incomplete on tests that usually should be skipped.
> > > > > Also, we are planning to soon deploy a piglit.conf solution where
> > > > > testing
> > > > will be terminated on wedged, so I agree that my test isn't really 
> > > > needed.
> > > >
> > > > Not everything is broken by wedged; internally we just use that as
> > > > an indicator that GEM is hosed. KMS should still work, we must still
> > > > be able to drive the displays to show the error and keep the servers
> > > > alive until the data is saved (and hopefully gracefully degrade that
> > > > we don't have to interrupt their immediate session).
> > >
> > > It doesn't matter if it is broken or not, if we are terminally wedged the 
> > > rest
> > of the result may be silly. Look for example at CI_DRM_2612, the 
> > fi-elk-e7500
> > is wedged at igt@gem_busy@basic-hang-default, then all test are skipped
> > until gem_exec_reloc@basic-cpu-gtt-noreloc where the machine hangs, but
> > it is a gem test so it should have been skipped, right. My conclusion from
> > seeing this pattern multiple times is that after terminally wedged, silly 
> > things
> > can happen, i.e. we can't trust the results, and since we don't want silly 
> > bugs,
> > the CI testing should be stopped.
> > 
> > The machine didn't hang, it was remotely killed because the run timed out.
> How do you know that?

The dmesg is a stream of flip timeouts until we run out of total BAT
runtime (12 minutes + some startup slack).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-16 Thread Lofstedt, Marta


> -Original Message-
> From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> Sent: Tuesday, May 16, 2017 12:04 PM
> To: Lofstedt, Marta 
> Cc: Daniel Vetter ; Martin Peres
> ; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to 
> capture
> the state of the GPU
> 
> On Tue, May 16, 2017 at 08:54:51AM +, Lofstedt, Marta wrote:
> >
> >
> > > -Original Message-
> > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> > > Sent: Tuesday, May 16, 2017 11:21 AM
> > > To: Lofstedt, Marta 
> > > Cc: Daniel Vetter ; Martin Peres
> > > ; intel-gfx@lists.freedesktop.org
> > > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a
> > > test to capture the state of the GPU
> > >
> > > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote:
> > > > I hereby pull-out this patch.
> > > > The idea of it was to know if we were already wedged at the
> > > > beginning of
> > > testing, that would give us information on how to interpret silly
> > > results; such that test starting to get skipped and/or we got
> > > dmesg-warns/incomplete on tests that usually should be skipped.
> > > > Also, we are planning to soon deploy a piglit.conf solution where
> > > > testing
> > > will be terminated on wedged, so I agree that my test isn't really needed.
> > >
> > > Not everything is broken by wedged; internally we just use that as
> > > an indicator that GEM is hosed. KMS should still work, we must still
> > > be able to drive the displays to show the error and keep the servers
> > > alive until the data is saved (and hopefully gracefully degrade that
> > > we don't have to interrupt their immediate session).
> >
> > It doesn't matter if it is broken or not, if we are terminally wedged the 
> > rest
> of the result may be silly. Look for example at CI_DRM_2612, the fi-elk-e7500
> is wedged at igt@gem_busy@basic-hang-default, then all test are skipped
> until gem_exec_reloc@basic-cpu-gtt-noreloc where the machine hangs, but
> it is a gem test so it should have been skipped, right. My conclusion from
> seeing this pattern multiple times is that after terminally wedged, silly 
> things
> can happen, i.e. we can't trust the results, and since we don't want silly 
> bugs,
> the CI testing should be stopped.
> 
> The machine didn't hang, it was remotely killed because the run timed out.
How do you know that?
> -Chris
> 
> --
> Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-16 Thread Chris Wilson
On Tue, May 16, 2017 at 08:54:51AM +, Lofstedt, Marta wrote:
> 
> 
> > -Original Message-
> > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> > Sent: Tuesday, May 16, 2017 11:21 AM
> > To: Lofstedt, Marta 
> > Cc: Daniel Vetter ; Martin Peres
> > ; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to 
> > capture
> > the state of the GPU
> > 
> > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote:
> > > I hereby pull-out this patch.
> > > The idea of it was to know if we were already wedged at the beginning of
> > testing, that would give us information on how to interpret silly results; 
> > such
> > that test starting to get skipped and/or we got dmesg-warns/incomplete on
> > tests that usually should be skipped.
> > > Also, we are planning to soon deploy a piglit.conf solution where testing
> > will be terminated on wedged, so I agree that my test isn't really needed.
> > 
> > Not everything is broken by wedged; internally we just use that as an
> > indicator that GEM is hosed. KMS should still work, we must still be able to
> > drive the displays to show the error and keep the servers alive until the 
> > data
> > is saved (and hopefully gracefully degrade that we don't have to interrupt
> > their immediate session).
> 
> It doesn't matter if it is broken or not, if we are terminally wedged the 
> rest of the result may be silly. Look for example at CI_DRM_2612, the 
> fi-elk-e7500 is wedged at igt@gem_busy@basic-hang-default, then all test are 
> skipped until gem_exec_reloc@basic-cpu-gtt-noreloc where the machine hangs, 
> but it is a gem test so it should have been skipped, right. My conclusion 
> from seeing this pattern multiple times is that after terminally wedged, 
> silly things can happen, i.e. we can't trust the results, and since we don't 
> want silly bugs, the CI testing should be stopped.

Oh, and note it was running so slowly because of a KMS bug that is quite
common.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-16 Thread Chris Wilson
On Tue, May 16, 2017 at 08:54:51AM +, Lofstedt, Marta wrote:
> 
> 
> > -Original Message-
> > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> > Sent: Tuesday, May 16, 2017 11:21 AM
> > To: Lofstedt, Marta 
> > Cc: Daniel Vetter ; Martin Peres
> > ; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to 
> > capture
> > the state of the GPU
> > 
> > On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote:
> > > I hereby pull-out this patch.
> > > The idea of it was to know if we were already wedged at the beginning of
> > testing, that would give us information on how to interpret silly results; 
> > such
> > that test starting to get skipped and/or we got dmesg-warns/incomplete on
> > tests that usually should be skipped.
> > > Also, we are planning to soon deploy a piglit.conf solution where testing
> > will be terminated on wedged, so I agree that my test isn't really needed.
> > 
> > Not everything is broken by wedged; internally we just use that as an
> > indicator that GEM is hosed. KMS should still work, we must still be able to
> > drive the displays to show the error and keep the servers alive until the 
> > data
> > is saved (and hopefully gracefully degrade that we don't have to interrupt
> > their immediate session).
> 
> It doesn't matter if it is broken or not, if we are terminally wedged the 
> rest of the result may be silly. Look for example at CI_DRM_2612, the 
> fi-elk-e7500 is wedged at igt@gem_busy@basic-hang-default, then all test are 
> skipped until gem_exec_reloc@basic-cpu-gtt-noreloc where the machine hangs, 
> but it is a gem test so it should have been skipped, right. My conclusion 
> from seeing this pattern multiple times is that after terminally wedged, 
> silly things can happen, i.e. we can't trust the results, and since we don't 
> want silly bugs, the CI testing should be stopped.

The machine didn't hang, it was remotely killed because the run timed out.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-16 Thread Lofstedt, Marta


> -Original Message-
> From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> Sent: Tuesday, May 16, 2017 11:21 AM
> To: Lofstedt, Marta 
> Cc: Daniel Vetter ; Martin Peres
> ; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to 
> capture
> the state of the GPU
> 
> On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote:
> > I hereby pull-out this patch.
> > The idea of it was to know if we were already wedged at the beginning of
> testing, that would give us information on how to interpret silly results; 
> such
> that test starting to get skipped and/or we got dmesg-warns/incomplete on
> tests that usually should be skipped.
> > Also, we are planning to soon deploy a piglit.conf solution where testing
> will be terminated on wedged, so I agree that my test isn't really needed.
> 
> Not everything is broken by wedged; internally we just use that as an
> indicator that GEM is hosed. KMS should still work, we must still be able to
> drive the displays to show the error and keep the servers alive until the data
> is saved (and hopefully gracefully degrade that we don't have to interrupt
> their immediate session).

It doesn't matter if it is broken or not, if we are terminally wedged the rest 
of the result may be silly. Look for example at CI_DRM_2612, the fi-elk-e7500 
is wedged at igt@gem_busy@basic-hang-default, then all test are skipped until 
gem_exec_reloc@basic-cpu-gtt-noreloc where the machine hangs, but it is a gem 
test so it should have been skipped, right. My conclusion from seeing this 
pattern multiple times is that after terminally wedged, silly things can 
happen, i.e. we can't trust the results, and since we don't want silly bugs, 
the CI testing should be stopped.

> 
> Whilst that may be out of scope for BAT, it should not be completely beyond
> us for robustness testing.
> -Chris
> 
> --
> Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-16 Thread Chris Wilson
On Tue, May 16, 2017 at 07:42:51AM +, Lofstedt, Marta wrote:
> I hereby pull-out this patch.
> The idea of it was to know if we were already wedged at the beginning of 
> testing, that would give us information on how to interpret silly results; 
> such that test starting to get skipped and/or we got dmesg-warns/incomplete 
> on tests that usually should be skipped.
> Also, we are planning to soon deploy a piglit.conf solution where testing 
> will be terminated on wedged, so I agree that my test isn't really needed.

Not everything is broken by wedged; internally we just use that as an
indicator that GEM is hosed. KMS should still work, we must still be
able to drive the displays to show the error and keep the servers alive
until the data is saved (and hopefully gracefully degrade that we don't
have to interrupt their immediate session).

Whilst that may be out of scope for BAT, it should not be completely
beyond us for robustness testing.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-16 Thread Lofstedt, Marta
I hereby pull-out this patch.
The idea of it was to know if we were already wedged at the beginning of 
testing, that would give us information on how to interpret silly results; such 
that test starting to get skipped and/or we got dmesg-warns/incomplete on tests 
that usually should be skipped.
Also, we are planning to soon deploy a piglit.conf solution where testing will 
be terminated on wedged, so I agree that my test isn't really needed.

/Marta

> -Original Message-
> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Monday, May 15, 2017 5:01 PM
> To: Chris Wilson ; Martin Peres
> ; Lofstedt, Marta
> ; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to 
> capture
> the state of the GPU
> 
> On Fri, May 12, 2017 at 02:17:29PM +0100, Chris Wilson wrote:
> > The ABI for testing whether the device is wedged before use is
> > gem_throttle(). This is encapsulated into igt_require_gem() and any
> > test that depends upon execution on the GPU should be checking for GEM
> first.
> >
> > This test should just be
> > igt_gem_require(drm_open_driver(DRIVER_INTEL));
> 
> Well, how do you expect people to spot your nifty tools and refactorings if
> they don't show up on the m-l, don't get acks or reviews, and don't come
> with api-docs? At least neither gmail nor google did find that patch submitted
> anywhere.
> 
> Marta, can you pls review Chris' patch:
> 
> commit 9518cb59abe35143f15abac2b7ffdb99820ef53c
> Author: Chris Wilson 
> Date:   Wed Feb 22 15:24:54 2017 +
> 
> igt: Start marking up GEM tests that require an alive GPU to function
> 
> Best would be to check that all gem tests really use this, and then also add 
> abi
> docs for this new helper.
> 
> Also, since we have igt_require_gem() already, what's the value of this
> additional testcase? If we want to reboot machines if the gpu died, then we
> need to have such a test at a higher level, in piglit. What we could do is 
> add a
> special exit code to igt that indicates a runtime problem (and tells piglit to
> reboot), essentially PLEASE_REBOOT instead of SKIP (which is what
> igt_require_gem() results in).
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-15 Thread Chris Wilson
On Mon, May 15, 2017 at 04:00:55PM +0200, Daniel Vetter wrote:
> On Fri, May 12, 2017 at 02:17:29PM +0100, Chris Wilson wrote:
> > The ABI for testing whether the device is wedged before use is
> > gem_throttle(). This is encapsulated into igt_require_gem() and any test
> > that depends upon execution on the GPU should be checking for GEM first.
> > 
> > This test should just be igt_gem_require(drm_open_driver(DRIVER_INTEL));
> 
> Well, how do you expect people to spot your nifty tools and refactorings

The ABI was established in

commit 537e73f3f935b917f2f5f9b51499cb29d65e3889
Author: Chris Wilson 
Date:   Fri Sep 24 17:37:41 2010 +0100

Disable dri2 after forcing fallbacks

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-15 Thread Daniel Vetter
On Fri, May 12, 2017 at 02:17:29PM +0100, Chris Wilson wrote:
> The ABI for testing whether the device is wedged before use is
> gem_throttle(). This is encapsulated into igt_require_gem() and any test
> that depends upon execution on the GPU should be checking for GEM first.
> 
> This test should just be igt_gem_require(drm_open_driver(DRIVER_INTEL));

Well, how do you expect people to spot your nifty tools and refactorings
if they don't show up on the m-l, don't get acks or reviews, and don't
come with api-docs? At least neither gmail nor google did find that patch
submitted anywhere.

Marta, can you pls review Chris' patch:

commit 9518cb59abe35143f15abac2b7ffdb99820ef53c 

Author: Chris Wilson  

Date:   Wed Feb 22 15:24:54 2017 +  


igt: Start marking up GEM tests that require an alive GPU to function

Best would be to check that all gem tests really use this, and then also
add abi docs for this new helper.

Also, since we have igt_require_gem() already, what's the value of this
additional testcase? If we want to reboot machines if the gpu died, then
we need to have such a test at a higher level, in piglit. What we could do
is add a special exit code to igt that indicates a runtime problem (and
tells piglit to reboot), essentially PLEASE_REBOOT instead of SKIP (which
is what igt_require_gem() results in).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-12 Thread Chris Wilson
On Fri, May 12, 2017 at 04:08:48PM +0300, Martin Peres wrote:
> The commit name "Add a test to capture the state of the GPU" should
> likely be renamed to "Add a test that checks the current state of
> the driver".
> 
> On 12/05/17 14:07, Marta Lofstedt wrote:
> >When running testlist for CI iot would be good to know if
> 
> iot -> it
> 
> >the state of the GPU is as expected. This adds a subtest to check
> >if the GPU is wedged.
> 
> This is not mandatory, but how about adding:
> 
> "When coupled with piglit's abort-on feature, this makes the piglit
> results contain the reason for the abort when the driver's initial
> state is unacceptable, improving debuggability"
> 
> With the commit title and the typo fixed, the patch is:
> Acked-by: Martin Peres 
> 
> >
> >Signed-off-by: Marta Lofstedt 
> >---
> >  tests/Makefile.sources|  1 +
> >  tests/initial_state.c | 52 
> > +++
> >  tests/intel-ci/extended.testlist  |  1 +
> >  tests/intel-ci/fast-feedback.testlist |  1 +
> >  4 files changed, 55 insertions(+)
> >  create mode 100644 tests/initial_state.c
> >
> >diff --git a/tests/Makefile.sources b/tests/Makefile.sources
> >index 9553e4d9..32431a05 100644
> >--- a/tests/Makefile.sources
> >+++ b/tests/Makefile.sources
> >@@ -152,6 +152,7 @@ TESTS_progs_M = \
> > vgem_basic \
> > vgem_slow \
> > meta_test \
> >+initial_state \
> > $(NULL)
> >  if HAVE_CHAMELIUM
> >diff --git a/tests/initial_state.c b/tests/initial_state.c
> >new file mode 100644
> >index ..1c5d9a74
> >--- /dev/null
> >+++ b/tests/initial_state.c
> >@@ -0,0 +1,52 @@
> >+/*
> >+ * Copyright © 2017 Intel Corporation
> >+ *
> >+ * Permission is hereby granted, free of charge, to any person obtaining a
> >+ * copy of this software and associated documentation files (the 
> >"Software"),
> >+ * to deal in the Software without restriction, including without limitation
> >+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> >+ * and/or sell copies of the Software, and to permit persons to whom the
> >+ * Software is furnished to do so, subject to the following conditions:
> >+ *
> >+ * The above copyright notice and this permission notice (including the next
> >+ * paragraph) shall be included in all copies or substantial portions of the
> >+ * Software.
> >+ *
> >+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
> >OR
> >+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> >+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> >+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
> >OTHER
> >+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> >+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> >DEALINGS
> >+ * IN THE SOFTWARE.
> >+ *
> >+ */
> >+
> >+#include 
> >+#include 
> >+#include "igt.h"
> >+
> >+/*
> >+ * The purpose of this test is to capture the
> >+ * state of the GPU before running a testlist.
> >+ *
> >+ * Note, this test currently only checks if
> >+ * i915 is wedged. But, it could be extended to report
> >+ * on other bad initial states.
> >+ */
> >+
> >+static void is_wedged(void)
> >+{
> >+char buf[128];
> >+int fd = drm_open_driver(DRIVER_INTEL);
> >+
> >+igt_debugfs_read(fd, "i915_wedged", buf);

The ABI for testing whether the device is wedged before use is
gem_throttle(). This is encapsulated into igt_require_gem() and any test
that depends upon execution on the GPU should be checking for GEM first.

This test should just be igt_gem_require(drm_open_driver(DRIVER_INTEL));
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-12 Thread Martin Peres
The commit name "Add a test to capture the state of the GPU" should 
likely be renamed to "Add a test that checks the current state of the 
driver".


On 12/05/17 14:07, Marta Lofstedt wrote:

When running testlist for CI iot would be good to know if


iot -> it


the state of the GPU is as expected. This adds a subtest to check
if the GPU is wedged.


This is not mandatory, but how about adding:

"When coupled with piglit's abort-on feature, this makes the piglit 
results contain the reason for the abort when the driver's initial state 
is unacceptable, improving debuggability"


With the commit title and the typo fixed, the patch is:
Acked-by: Martin Peres 



Signed-off-by: Marta Lofstedt 
---
  tests/Makefile.sources|  1 +
  tests/initial_state.c | 52 +++
  tests/intel-ci/extended.testlist  |  1 +
  tests/intel-ci/fast-feedback.testlist |  1 +
  4 files changed, 55 insertions(+)
  create mode 100644 tests/initial_state.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 9553e4d9..32431a05 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -152,6 +152,7 @@ TESTS_progs_M = \
vgem_basic \
vgem_slow \
meta_test \
+   initial_state \
$(NULL)
  
  if HAVE_CHAMELIUM

diff --git a/tests/initial_state.c b/tests/initial_state.c
new file mode 100644
index ..1c5d9a74
--- /dev/null
+++ b/tests/initial_state.c
@@ -0,0 +1,52 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include 
+#include 
+#include "igt.h"
+
+/*
+ * The purpose of this test is to capture the
+ * state of the GPU before running a testlist.
+ *
+ * Note, this test currently only checks if
+ * i915 is wedged. But, it could be extended to report
+ * on other bad initial states.
+ */
+
+static void is_wedged(void)
+{
+   char buf[128];
+   int fd = drm_open_driver(DRIVER_INTEL);
+
+   igt_debugfs_read(fd, "i915_wedged", buf);
+   igt_assert(!strstr(buf, "1"));
+}
+
+igt_main
+{
+
+   igt_subtest("is_wedged")
+   is_wedged();
+}
diff --git a/tests/intel-ci/extended.testlist b/tests/intel-ci/extended.testlist
index a16c9c84..2ec08b55 100644
--- a/tests/intel-ci/extended.testlist
+++ b/tests/intel-ci/extended.testlist
@@ -1,3 +1,4 @@
+igt@initial_state@is_wedged
  igt@core_auth@many-magics
  igt@core_getclient
  igt@core_get_client_auth@master-drop
diff --git a/tests/intel-ci/fast-feedback.testlist 
b/tests/intel-ci/fast-feedback.testlist
index 5ffa2cef..a6a0a810 100644
--- a/tests/intel-ci/fast-feedback.testlist
+++ b/tests/intel-ci/fast-feedback.testlist
@@ -1,3 +1,4 @@
+igt@initial_state@is_wedged
  igt@core_auth@basic-auth
  igt@core_prop_blob@basic
  igt@drv_getparams_basic@basic-eu-total


___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH i-g-t] tests/initial_state: Add a test to capture the state of the GPU

2017-05-12 Thread Marta Lofstedt
When running testlist for CI iot would be good to know if
the state of the GPU is as expected. This adds a subtest to check
if the GPU is wedged.

Signed-off-by: Marta Lofstedt 
---
 tests/Makefile.sources|  1 +
 tests/initial_state.c | 52 +++
 tests/intel-ci/extended.testlist  |  1 +
 tests/intel-ci/fast-feedback.testlist |  1 +
 4 files changed, 55 insertions(+)
 create mode 100644 tests/initial_state.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 9553e4d9..32431a05 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -152,6 +152,7 @@ TESTS_progs_M = \
vgem_basic \
vgem_slow \
meta_test \
+   initial_state \
$(NULL)
 
 if HAVE_CHAMELIUM
diff --git a/tests/initial_state.c b/tests/initial_state.c
new file mode 100644
index ..1c5d9a74
--- /dev/null
+++ b/tests/initial_state.c
@@ -0,0 +1,52 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include 
+#include 
+#include "igt.h"
+
+/*
+ * The purpose of this test is to capture the
+ * state of the GPU before running a testlist.
+ *
+ * Note, this test currently only checks if
+ * i915 is wedged. But, it could be extended to report
+ * on other bad initial states.
+ */
+
+static void is_wedged(void)
+{
+   char buf[128];
+   int fd = drm_open_driver(DRIVER_INTEL);
+
+   igt_debugfs_read(fd, "i915_wedged", buf);
+   igt_assert(!strstr(buf, "1"));
+}
+
+igt_main
+{
+
+   igt_subtest("is_wedged")
+   is_wedged();
+}
diff --git a/tests/intel-ci/extended.testlist b/tests/intel-ci/extended.testlist
index a16c9c84..2ec08b55 100644
--- a/tests/intel-ci/extended.testlist
+++ b/tests/intel-ci/extended.testlist
@@ -1,3 +1,4 @@
+igt@initial_state@is_wedged
 igt@core_auth@many-magics
 igt@core_getclient
 igt@core_get_client_auth@master-drop
diff --git a/tests/intel-ci/fast-feedback.testlist 
b/tests/intel-ci/fast-feedback.testlist
index 5ffa2cef..a6a0a810 100644
--- a/tests/intel-ci/fast-feedback.testlist
+++ b/tests/intel-ci/fast-feedback.testlist
@@ -1,3 +1,4 @@
+igt@initial_state@is_wedged
 igt@core_auth@basic-auth
 igt@core_prop_blob@basic
 igt@drv_getparams_basic@basic-eu-total
-- 
2.11.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx