Re: [Mesa-dev] [PATCH 0/7] i965: Stop hanging on Haswell

2017-06-15 Thread Jason Ekstrand
On Thu, Jun 15, 2017 at 9:32 AM, Chris Wilson 
wrote:

> Quoting Jason Ekstrand (2017-06-15 16:58:13)
> > On Thu, Jun 15, 2017 at 4:15 AM, Chris Wilson 
> wrote:
> >
> > Quoting Kenneth Graunke (2017-06-14 21:44:45)
> > > If Chris is right, and what we're really seeing is that
> MI_SET_CONTEXT
> > > needs additional flushing, it probably makes sense to fix the
> kernel.
> > > If it's really fast clear related, then we should do it in Mesa.
> >
> > If I'm right, it's more of a userspace problem because you have to
> > insert a pipeline stall before STATE_BASE_ADDRESS when switching
> between
> > blorp/normal and back again, in the same batch. That the
> MI_SET_CONTEXT
> > may be restoring the dirty GPU state from the previous batch just
> means
> > that
> > you have to think of batches as being one long continuous batch.
> > -Chris
> >
> >
> >  Given that, I doubt your explanation is correct.  Right now, we should
> be
> > correct under the "long continuous batch" assumption and we're hanging.
> So I
> > think that either MI_SET_CONTEXT doesn't stall hard enough or we're
> conflicting
> > with another process somehow.
>
> What I said was too simplistic, as the MI_SET_CONTEXT would be
> introducing side-effects (such as the pipeline being active, hmm, unless
> it does flush at the end!). What I mean is that if it is
> MI_SET_CONTEXT causing the pipeline to be active, you would need to
> treat switching operations within the same pipeline equally. That you
> would need a pipeline stall after a blorp/hiz not just to ensure the
> data is written but to ensure that the STATE_BASE_ADDRESS doesn't trip
> up.
>

Right.  It's entirely possible that MI_SET_CONTEXT could trip up
STATE_BASE_ADDRESS, or that simple dirty caches can.  We've seen cache
flushing issues around STATE_BASE_ADDRESS in Vulkan, where we set it
multiple times per batch.


> Of course, now I said that it would be a side-effect of MI_SET_CONTEXT
> causing the state of the GPU pipelines to be different from expectation,
> it becomes the kernel responsibility to add the flush. Argh!
>

Yeah... I don't think it would be a bad thing for the kernel to do an
end-of-pipe sync after MI_SET_CONTEXT (or maybe before?), but just
handling it in userspace is probably reasonable.


> I'm open to putting it into the kernel, though I'd rather userspace
> handled it. We want to keep the kernel out of the loop as much as
> possible.
>

I think I agree with you there.  Which is why Ken and I merged the patches
yesterday :-)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/7] i965: Stop hanging on Haswell

2017-06-15 Thread Chris Wilson
Quoting Jason Ekstrand (2017-06-15 16:58:13)
> On Thu, Jun 15, 2017 at 4:15 AM, Chris Wilson  
> wrote:
> 
> Quoting Kenneth Graunke (2017-06-14 21:44:45)
> > If Chris is right, and what we're really seeing is that MI_SET_CONTEXT
> > needs additional flushing, it probably makes sense to fix the kernel.
> > If it's really fast clear related, then we should do it in Mesa.
> 
> If I'm right, it's more of a userspace problem because you have to
> insert a pipeline stall before STATE_BASE_ADDRESS when switching between
> blorp/normal and back again, in the same batch. That the MI_SET_CONTEXT
> may be restoring the dirty GPU state from the previous batch just means
> that
> you have to think of batches as being one long continuous batch.
> -Chris
> 
> 
>  Given that, I doubt your explanation is correct.  Right now, we should be
> correct under the "long continuous batch" assumption and we're hanging.  So I
> think that either MI_SET_CONTEXT doesn't stall hard enough or we're 
> conflicting
> with another process somehow.

What I said was too simplistic, as the MI_SET_CONTEXT would be
introducing side-effects (such as the pipeline being active, hmm, unless
it does flush at the end!). What I mean is that if it is MI_SET_CONTEXT
causing the pipeline to be active, you would need to treat switching
operations within the same pipeline equally. That is, you would need a
pipeline stall after a blorp/HiZ op not just to ensure the data is
written, but to ensure that the STATE_BASE_ADDRESS doesn't trip up.

Of course, now that I've said it would be a side-effect of MI_SET_CONTEXT
causing the state of the GPU pipelines to be different from expectation,
it becomes the kernel's responsibility to add the flush. Argh!

I'm open to putting it into the kernel, though I'd rather userspace
handled it. We want to keep the kernel out of the loop as much as
possible.
-Chris


Re: [Mesa-dev] [PATCH 0/7] i965: Stop hanging on Haswell

2017-06-15 Thread Jason Ekstrand
On Thu, Jun 15, 2017 at 4:15 AM, Chris Wilson 
wrote:

> Quoting Kenneth Graunke (2017-06-14 21:44:45)
> > On Tuesday, June 13, 2017 2:53:20 PM PDT Jason Ekstrand wrote:
> > > As I've been working on converting more things in the GL driver over to
> > > blorp, I've been highly annoyed by all of the hangs on Haswell.  About
> one
> > > in 3-5 Jenkins runs would hang somewhere.  After looking at about a
> > > half-dozen error states, I noticed that all of the hangs seemed to be
> on
> > > fast-clear operations (clear or resolve) that happen at the start of a
> > > batch, right after STATE_BASE_ADDRESS.
> > >
> > > Haswell seems to be a bit more picky than other hardware about having
> > > fast-clear operations in flight at the same time as regular rendering
> and
> > > hangs if the two ever overlap.  (Other hardware can get rendering
> > > corruption but not usually hangs.)  Also, Haswell doesn't fully stall
> if
> > > you just do a RT flush and a CS stall.  The hardware docs refer to
> > > something they call an "end of pipe sync" which is a CS stall with a
> write
> > > to the workaround BO.  On Haswell, you also need to read from that same
> > > address to create a memory dependency and make sure the system is fully
> > > stalled.
> > >
> > > When you call brw_blorp_resolve_color it calls
> brw_emit_pipe_control_flush
> > > and does the correct flushes and then calls into core blorp to do the
> > > actual resolve operation.  If the batch doesn't have enough space left
> in
> > > it for the fast-clear operation, the batch will get split and the
> > > fast-clear will happen in the next batch.  I believe what is happening
> is
> > > that while we're building the second batch that actually contains the
> > > fast-clear, some other process completes a batch and inserts it
> between our
> > > PIPE_CONTROL to do the stall and the actual fast-clear.  We then end up
> > > with more stuff in flight than we can handle and the GPU explodes.
> > >
> > > I'm not 100% convinced of this explanation because it seems a bit fishy
> > > that a context switch wouldn't be enough to fully flush out the GPU.
> > > However, what I do know is that, without these patches I get a hang in
> one
> > > out of three to five Jenkins runs on my wip/i965-blorp-ds branch.
> With the
> > > patches (or an older variant that did the same thing), I have done
> almost 20
> > > Jenkins runs and have yet to see a hang.  I'd call that success.
> > >
> > > Jason Ekstrand (6):
> > >   i965: Flush around state base address
> > >   i965: Take a uint64_t immediate in emit_pipe_control_write
> > >   i965: Unify the two emit_pipe_control functions
> > >   i965: Do an end-of-pipe sync prior to STATE_BASE_ADDRESS
> > >   i965/blorp: Do an end-of-pipe sync around CCS ops
> > >   i965: Do an end-of-pipe sync after flushes
> > >
> > > Topi Pohjolainen (1):
> > >   i965: Add an end-of-pipe sync helper
> > >
> > >  src/mesa/drivers/dri/i965/brw_blorp.c|  16 +-
> > >  src/mesa/drivers/dri/i965/brw_context.h  |   3 +-
> > >  src/mesa/drivers/dri/i965/brw_misc_state.c   |  38 +
> > >  src/mesa/drivers/dri/i965/brw_pipe_control.c | 243
> ++-
> > >  src/mesa/drivers/dri/i965/brw_queryobj.c |   5 +-
> > >  src/mesa/drivers/dri/i965/gen6_queryobj.c|   2 +-
> > >  src/mesa/drivers/dri/i965/genX_blorp_exec.c  |   2 +-
> > >  7 files changed, 211 insertions(+), 98 deletions(-)
> > >
> > >
> >
> > The series is:
> > Reviewed-by: Kenneth Graunke 
> >
> > If Chris is right, and what we're really seeing is that MI_SET_CONTEXT
> > needs additional flushing, it probably makes sense to fix the kernel.
> > If it's really fast clear related, then we should do it in Mesa.
>
> If I'm right, it's more of a userspace problem because you have to
> insert a pipeline stall before STATE_BASE_ADDRESS when switching between
> blorp/normal and back again, in the same batch. That the MI_SET_CONTEXT
> may be restoring the dirty GPU state from the previous batch just means
> that
> you have to think of batches as being one long continuous batch.
> -Chris
>

 Given that, I doubt your explanation is correct.  Right now, we should be
correct under the "long continuous batch" assumption and we're hanging.  So
I think that either MI_SET_CONTEXT doesn't stall hard enough or we're
conflicting with another process somehow.


Re: [Mesa-dev] [PATCH 0/7] i965: Stop hanging on Haswell

2017-06-15 Thread Chris Wilson
Quoting Kenneth Graunke (2017-06-14 21:44:45)
> On Tuesday, June 13, 2017 2:53:20 PM PDT Jason Ekstrand wrote:
> > As I've been working on converting more things in the GL driver over to
> > blorp, I've been highly annoyed by all of the hangs on Haswell.  About one
> > in 3-5 Jenkins runs would hang somewhere.  After looking at about a
> > half-dozen error states, I noticed that all of the hangs seemed to be on
> > fast-clear operations (clear or resolve) that happen at the start of a
> > batch, right after STATE_BASE_ADDRESS.
> > 
> > Haswell seems to be a bit more picky than other hardware about having
> > fast-clear operations in flight at the same time as regular rendering and
> > hangs if the two ever overlap.  (Other hardware can get rendering
> > corruption but not usually hangs.)  Also, Haswell doesn't fully stall if
> > you just do a RT flush and a CS stall.  The hardware docs refer to
> > something they call an "end of pipe sync" which is a CS stall with a write
> > to the workaround BO.  On Haswell, you also need to read from that same
> > address to create a memory dependency and make sure the system is fully
> > stalled.
> > 
> > When you call brw_blorp_resolve_color it calls brw_emit_pipe_control_flush
> > and does the correct flushes and then calls into core blorp to do the
> > actual resolve operation.  If the batch doesn't have enough space left in
> > it for the fast-clear operation, the batch will get split and the
> > fast-clear will happen in the next batch.  I believe what is happening is
> > that while we're building the second batch that actually contains the
> > fast-clear, some other process completes a batch and inserts it between our
> > PIPE_CONTROL to do the stall and the actual fast-clear.  We then end up
> > with more stuff in flight than we can handle and the GPU explodes.
> > 
> > I'm not 100% convinced of this explanation because it seems a bit fishy
> > that a context switch wouldn't be enough to fully flush out the GPU.
> > However, what I do know is that, without these patches I get a hang in one
> > out of three to five Jenkins runs on my wip/i965-blorp-ds branch.  With the
> > patches (or an older variant that did the same thing), I have done almost 20
> > Jenkins runs and have yet to see a hang.  I'd call that success.
> > 
> > Jason Ekstrand (6):
> >   i965: Flush around state base address
> >   i965: Take a uint64_t immediate in emit_pipe_control_write
> >   i965: Unify the two emit_pipe_control functions
> >   i965: Do an end-of-pipe sync prior to STATE_BASE_ADDRESS
> >   i965/blorp: Do an end-of-pipe sync around CCS ops
> >   i965: Do an end-of-pipe sync after flushes
> > 
> > Topi Pohjolainen (1):
> >   i965: Add an end-of-pipe sync helper
> > 
> >  src/mesa/drivers/dri/i965/brw_blorp.c|  16 +-
> >  src/mesa/drivers/dri/i965/brw_context.h  |   3 +-
> >  src/mesa/drivers/dri/i965/brw_misc_state.c   |  38 +
> >  src/mesa/drivers/dri/i965/brw_pipe_control.c | 243 
> > ++-
> >  src/mesa/drivers/dri/i965/brw_queryobj.c |   5 +-
> >  src/mesa/drivers/dri/i965/gen6_queryobj.c|   2 +-
> >  src/mesa/drivers/dri/i965/genX_blorp_exec.c  |   2 +-
> >  7 files changed, 211 insertions(+), 98 deletions(-)
> > 
> > 
> 
> The series is:
> Reviewed-by: Kenneth Graunke 
> 
> If Chris is right, and what we're really seeing is that MI_SET_CONTEXT
> needs additional flushing, it probably makes sense to fix the kernel.
> If it's really fast clear related, then we should do it in Mesa.

If I'm right, it's more of a userspace problem because you have to
insert a pipeline stall before STATE_BASE_ADDRESS when switching between
blorp/normal and back again, in the same batch. That the MI_SET_CONTEXT
may be restoring the dirty GPU state from the previous batch just means that
you have to think of batches as being one long continuous batch.
-Chris


Re: [Mesa-dev] [PATCH 0/7] i965: Stop hanging on Haswell

2017-06-14 Thread Kenneth Graunke
On Tuesday, June 13, 2017 2:53:20 PM PDT Jason Ekstrand wrote:
> As I've been working on converting more things in the GL driver over to
> blorp, I've been highly annoyed by all of the hangs on Haswell.  About one
> in 3-5 Jenkins runs would hang somewhere.  After looking at about a
> half-dozen error states, I noticed that all of the hangs seemed to be on
> fast-clear operations (clear or resolve) that happen at the start of a
> batch, right after STATE_BASE_ADDRESS.
> 
> Haswell seems to be a bit more picky than other hardware about having
> fast-clear operations in flight at the same time as regular rendering and
> hangs if the two ever overlap.  (Other hardware can get rendering
> corruption but not usually hangs.)  Also, Haswell doesn't fully stall if
> you just do a RT flush and a CS stall.  The hardware docs refer to
> something they call an "end of pipe sync" which is a CS stall with a write
> to the workaround BO.  On Haswell, you also need to read from that same
> address to create a memory dependency and make sure the system is fully
> stalled.
> 
> When you call brw_blorp_resolve_color it calls brw_emit_pipe_control_flush
> and does the correct flushes and then calls into core blorp to do the
> actual resolve operation.  If the batch doesn't have enough space left in
> it for the fast-clear operation, the batch will get split and the
> fast-clear will happen in the next batch.  I believe what is happening is
> that while we're building the second batch that actually contains the
> fast-clear, some other process completes a batch and inserts it between our
> PIPE_CONTROL to do the stall and the actual fast-clear.  We then end up
> with more stuff in flight than we can handle and the GPU explodes.
> 
> I'm not 100% convinced of this explanation because it seems a bit fishy
> that a context switch wouldn't be enough to fully flush out the GPU.
> However, what I do know is that, without these patches I get a hang in one
> out of three to five Jenkins runs on my wip/i965-blorp-ds branch.  With the
> patches (or an older variant that did the same thing), I have done almost 20
> Jenkins runs and have yet to see a hang.  I'd call that success.
> 
> Jason Ekstrand (6):
>   i965: Flush around state base address
>   i965: Take a uint64_t immediate in emit_pipe_control_write
>   i965: Unify the two emit_pipe_control functions
>   i965: Do an end-of-pipe sync prior to STATE_BASE_ADDRESS
>   i965/blorp: Do an end-of-pipe sync around CCS ops
>   i965: Do an end-of-pipe sync after flushes
> 
> Topi Pohjolainen (1):
>   i965: Add an end-of-pipe sync helper
> 
>  src/mesa/drivers/dri/i965/brw_blorp.c|  16 +-
>  src/mesa/drivers/dri/i965/brw_context.h  |   3 +-
>  src/mesa/drivers/dri/i965/brw_misc_state.c   |  38 +
>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 243 
> ++-
>  src/mesa/drivers/dri/i965/brw_queryobj.c |   5 +-
>  src/mesa/drivers/dri/i965/gen6_queryobj.c|   2 +-
>  src/mesa/drivers/dri/i965/genX_blorp_exec.c  |   2 +-
>  7 files changed, 211 insertions(+), 98 deletions(-)
> 
> 

The series is:
Reviewed-by: Kenneth Graunke 

If Chris is right, and what we're really seeing is that MI_SET_CONTEXT
needs additional flushing, it probably makes sense to fix the kernel.
If it's really fast clear related, then we should do it in Mesa.

I'm not sure we'll ever be able to properly determine that.

Even if we go the kernel route, we should land patches 1-3.



Re: [Mesa-dev] [PATCH 0/7] i965: Stop hanging on Haswell

2017-06-14 Thread Jason Ekstrand
On Wed, Jun 14, 2017 at 2:00 AM, Chris Wilson 
wrote:

> Quoting Jason Ekstrand (2017-06-13 22:53:20)
> > As I've been working on converting more things in the GL driver over to
> > blorp, I've been highly annoyed by all of the hangs on Haswell.  About
> one
> > in 3-5 Jenkins runs would hang somewhere.  After looking at about a
> > half-dozen error states, I noticed that all of the hangs seemed to be on
> > fast-clear operations (clear or resolve) that happen at the start of a
> > batch, right after STATE_BASE_ADDRESS.
> >
> > Haswell seems to be a bit more picky than other hardware about having
> > fast-clear operations in flight at the same time as regular rendering and
> > hangs if the two ever overlap.  (Other hardware can get rendering
> > corruption but not usually hangs.)  Also, Haswell doesn't fully stall if
> > you just do a RT flush and a CS stall.  The hardware docs refer to
> > something they call an "end of pipe sync" which is a CS stall with a
> write
> > to the workaround BO.  On Haswell, you also need to read from that same
> > address to create a memory dependency and make sure the system is fully
> > stalled.
> >
> > When you call brw_blorp_resolve_color it calls
> brw_emit_pipe_control_flush
> > and does the correct flushes and then calls into core blorp to do the
> > actual resolve operation.  If the batch doesn't have enough space left in
> > it for the fast-clear operation, the batch will get split and the
> > fast-clear will happen in the next batch.  I believe what is happening is
> > that while we're building the second batch that actually contains the
> > fast-clear, some other process completes a batch and inserts it between
> our
> > PIPE_CONTROL to do the stall and the actual fast-clear.  We then end up
> > with more stuff in flight than we can handle and the GPU explodes.
> >
> > I'm not 100% convinced of this explanation because it seems a bit fishy
> > that a context switch wouldn't be enough to fully flush out the GPU.
> > However, what I do know is that, without these patches I get a hang in
> one
> > out of three to five Jenkins runs on my wip/i965-blorp-ds branch.  With
> the
> > patches (or an older variant that did the same thing), I have done
> almost 20
> > Jenkins runs and have yet to see a hang.  I'd call that success.
>

For the record, I *think* this also improves Sky Lake.  I believe I saw
hangs (less often, maybe 1 in 10) without this and have seen none with it.


> Note that a context switch is itself just a batch that restores the
> registers
> and GPU state.
>
> The kernel does
>
> PIPE_CONTROLs for invalidate-caches
> MI_SET_CONTEXT
> MI_BB_START
> PIPE_CONTROLs for flush-caches
> MI_STORE_DWORD (seqno)
> MI_USER_INTERRUPT
>
> What I believe you are seeing is that MI_SET_CONTEXT is leaving the GPU
> in an active state requiring a pipeline barrier before adjusting. It
> will be the equivalent of switching between GL and blorp in the middle of
> a batch.
>

That's also a reasonable theory (or maybe even better).  However, the
work-around is the same.


> The question I have is whether we apply the fix in the kernel, i.e. do a
> full end of pipe sync after every MI_SET_CONTEXT. Userspace has the
> advantage of knowing if/when such a hammer is required, but equally we
> have to learn where by trial-and-error and if a second context user ever
> manifests, they will have to be taught the same lessons.
>

Right.

Here are the arguments for doing it in the kernel:

 1) It's the "right" place to do it because it appears to be a
cross-context issue.
 2) The kernel knows whether or not you're getting an actual context switch
and can insert the end-of-pipe sync when an actual context switch happens
rather than on every batch.

Here are the arguments for doing it in userspace:

 1) Userspace knows whether or not we're doing an actual fast-clear
operation and can flush only for fast-clears at the beginning of the batch.
 2) The kernel isn't flushing now, so unless we do it in userspace we'll
get hangs until people update their kernels.

My gut says userspace but that's because I tend to have a mild distrust of
the kernel.  There are some things that are the kernel's job (dealing with
context switches, for instance) but I'm a big fan of putting anything in
userspace that can reasonably go there.

Here's some more data.  Knowing this was a big giant hammer, I ran a full
suite of benchmarks overnight on my Haswell GT3 and this is what I found:

Test                 0-master  1-i965-end-of-pipe     diff
bench_manhattan      4442.510            4430.870  -11.640
bench_manhattanoff   4683.300            4663.000  -20.300
bench_OglBatch0       773.523             771.027   -2.496
bench_OglBatch1       775.858             771.802   -4.056
bench_OglBatch4       747.629             745.522   -2.107
bench_OglPSBump2      513.528             514.944    1.416

So the only 

Re: [Mesa-dev] [PATCH 0/7] i965: Stop hanging on Haswell

2017-06-14 Thread Chris Wilson
Quoting Jason Ekstrand (2017-06-13 22:53:20)
> As I've been working on converting more things in the GL driver over to
> blorp, I've been highly annoyed by all of the hangs on Haswell.  About one
> in 3-5 Jenkins runs would hang somewhere.  After looking at about a
> half-dozen error states, I noticed that all of the hangs seemed to be on
> fast-clear operations (clear or resolve) that happen at the start of a
> batch, right after STATE_BASE_ADDRESS.
> 
> Haswell seems to be a bit more picky than other hardware about having
> fast-clear operations in flight at the same time as regular rendering and
> hangs if the two ever overlap.  (Other hardware can get rendering
> corruption but not usually hangs.)  Also, Haswell doesn't fully stall if
> you just do a RT flush and a CS stall.  The hardware docs refer to
> something they call an "end of pipe sync" which is a CS stall with a write
> to the workaround BO.  On Haswell, you also need to read from that same
> address to create a memory dependency and make sure the system is fully
> stalled.
> 
> When you call brw_blorp_resolve_color it calls brw_emit_pipe_control_flush
> and does the correct flushes and then calls into core blorp to do the
> actual resolve operation.  If the batch doesn't have enough space left in
> it for the fast-clear operation, the batch will get split and the
> fast-clear will happen in the next batch.  I believe what is happening is
> that while we're building the second batch that actually contains the
> fast-clear, some other process completes a batch and inserts it between our
> PIPE_CONTROL to do the stall and the actual fast-clear.  We then end up
> with more stuff in flight than we can handle and the GPU explodes.
> 
> I'm not 100% convinced of this explanation because it seems a bit fishy
> that a context switch wouldn't be enough to fully flush out the GPU.
> However, what I do know is that, without these patches I get a hang in one
> out of three to five Jenkins runs on my wip/i965-blorp-ds branch.  With the
> patches (or an older variant that did the same thing), I have done almost 20
> Jenkins runs and have yet to see a hang.  I'd call that success.

Note that a context switch is itself just a batch that restores the registers
and GPU state.

The kernel does

PIPE_CONTROLs for invalidate-caches
MI_SET_CONTEXT
MI_BB_START
PIPE_CONTROLs for flush-caches
MI_STORE_DWORD (seqno)
MI_USER_INTERRUPT

What I believe you are seeing is that MI_SET_CONTEXT is leaving the GPU
in an active state requiring a pipeline barrier before adjusting. It
will be the equivalent of switching between GL and blorp in the middle of
a batch.

The question I have is whether we apply the fix in the kernel, i.e. do a
full end of pipe sync after every MI_SET_CONTEXT. Userspace has the
advantage of knowing if/when such a hammer is required, but equally we
have to learn where by trial-and-error and if a second context user ever
manifests, they will have to be taught the same lessons.
-Chris


[Mesa-dev] [PATCH 0/7] i965: Stop hanging on Haswell

2017-06-13 Thread Jason Ekstrand
As I've been working on converting more things in the GL driver over to
blorp, I've been highly annoyed by all of the hangs on Haswell.  About one
in 3-5 Jenkins runs would hang somewhere.  After looking at about a
half-dozen error states, I noticed that all of the hangs seemed to be on
fast-clear operations (clear or resolve) that happen at the start of a
batch, right after STATE_BASE_ADDRESS.

Haswell seems to be a bit more picky than other hardware about having
fast-clear operations in flight at the same time as regular rendering and
hangs if the two ever overlap.  (Other hardware can get rendering
corruption but not usually hangs.)  Also, Haswell doesn't fully stall if
you just do a RT flush and a CS stall.  The hardware docs refer to
something they call an "end of pipe sync" which is a CS stall with a write
to the workaround BO.  On Haswell, you also need to read from that same
address to create a memory dependency and make sure the system is fully
stalled.

When you call brw_blorp_resolve_color it calls brw_emit_pipe_control_flush
and does the correct flushes and then calls into core blorp to do the
actual resolve operation.  If the batch doesn't have enough space left in
it for the fast-clear operation, the batch will get split and the
fast-clear will happen in the next batch.  I believe what is happening is
that while we're building the second batch that actually contains the
fast-clear, some other process completes a batch and inserts it between our
PIPE_CONTROL to do the stall and the actual fast-clear.  We then end up
with more stuff in flight than we can handle and the GPU explodes.

I'm not 100% convinced of this explanation because it seems a bit fishy
that a context switch wouldn't be enough to fully flush out the GPU.
However, what I do know is that, without these patches I get a hang in one
out of three to five Jenkins runs on my wip/i965-blorp-ds branch.  With the
patches (or an older variant that did the same thing), I have done almost 20
Jenkins runs and have yet to see a hang.  I'd call that success.

Jason Ekstrand (6):
  i965: Flush around state base address
  i965: Take a uint64_t immediate in emit_pipe_control_write
  i965: Unify the two emit_pipe_control functions
  i965: Do an end-of-pipe sync prior to STATE_BASE_ADDRESS
  i965/blorp: Do an end-of-pipe sync around CCS ops
  i965: Do an end-of-pipe sync after flushes

Topi Pohjolainen (1):
  i965: Add an end-of-pipe sync helper

 src/mesa/drivers/dri/i965/brw_blorp.c|  16 +-
 src/mesa/drivers/dri/i965/brw_context.h  |   3 +-
 src/mesa/drivers/dri/i965/brw_misc_state.c   |  38 +
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 243 ++-
 src/mesa/drivers/dri/i965/brw_queryobj.c |   5 +-
 src/mesa/drivers/dri/i965/gen6_queryobj.c|   2 +-
 src/mesa/drivers/dri/i965/genX_blorp_exec.c  |   2 +-
 7 files changed, 211 insertions(+), 98 deletions(-)

-- 
2.5.0.400.gff86faf
