Re: [Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper

2017-06-15 Thread Jason Ekstrand
On Thu, Jun 15, 2017 at 9:14 AM, Chris Wilson 
wrote:

> Quoting Jason Ekstrand (2017-06-15 16:59:19)
> > On Thu, Jun 15, 2017 at 4:11 AM, Chris Wilson wrote:
> > The kernel does have a LRI after a flush before signaling the batch is
> > complete. I don't see a need to add another...
> >
> > The question is whether this posting is required for GPU visibility of
> > results or just CPU? I suspect this is just for CPU in which case it
> > doesn't belong here at all, but before flagging rendering as ready for
> > async (i.e. not involving the kernel) inspection.
> >
> >
> > The docs, if you choose to believe them, seem to indicate that this is
> > needed for GPU visibility as well as CPU.
>
> The kernel has LRI (for semaphore updates) not LRM, is that significant?
> Took me long enough to notice the difference.
>

I don't know.  We're getting so far outside the realm of documentation here
that it's crazy.  What I do know is that the comments in the Windows source
indicate that SDI is insufficient on Haswell.  My gut says it has something
to do with forcing a round-trip through the memory controller.  For
semaphore updates, LRI may be sufficient since they're register-based on
gen7.
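
To make the ordering under discussion concrete, here is a minimal sketch.
Everything in it is hypothetical: the toy_* names and the command recorder are
illustrative and are not the i965 API.  It also assumes the Haswell follow-up
is an LRM read-back of the location the post-sync write targeted, which this
thread implies but does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command kinds that only mirror the thread's terminology. */
enum toy_cmd {
   TOY_PIPE_CONTROL_WRITE_IMM, /* CS stall + flushes + post-sync immediate write */
   TOY_LOAD_REGISTER_MEM,      /* LRM: load a register from memory */
};

#define TOY_MAX_CMDS 8

/* Toy batch that just records the order in which commands were emitted. */
struct toy_batch {
   enum toy_cmd cmds[TOY_MAX_CMDS];
   int count;
};

static void
toy_emit(struct toy_batch *batch, enum toy_cmd cmd)
{
   assert(batch->count < TOY_MAX_CMDS);
   batch->cmds[batch->count++] = cmd;
}

/* Sketch of the sequence: every platform gets the flushing PIPE_CONTROL
 * with a post-sync write; the (assumed) Haswell path then reads memory
 * back into a register so the command streamer must wait on memory.
 *
 * Usage: struct toy_batch b = {0}; toy_end_of_pipe_sync(&b, true);
 */
static void
toy_end_of_pipe_sync(struct toy_batch *batch, bool is_haswell)
{
   toy_emit(batch, TOY_PIPE_CONTROL_WRITE_IMM);

   if (is_haswell)
      toy_emit(batch, TOY_LOAD_REGISTER_MEM);
}
```

The point of the sketch is only the ordering: the read-back comes after the
post-sync write, never before it.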
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper

2017-06-15 Thread Chris Wilson
Quoting Jason Ekstrand (2017-06-15 16:59:19)
> On Thu, Jun 15, 2017 at 4:11 AM, Chris Wilson  
> wrote:
> The kernel does have a LRI after a flush before signaling the batch is
> complete. I don't see a need to add another...
> 
> The question is whether this posting is required for GPU visibility of
> results or just CPU? I suspect this is just for CPU in which case it
> doesn't belong here at all, but before flagging rendering as ready for
> async (i.e. not involving the kernel) inspection.
> 
> 
> The docs, if you choose to believe them, seem to indicate that this is needed
> for GPU visibility as well as CPU.

The kernel has LRI (for semaphore updates) not LRM, is that significant?
Took me long enough to notice the difference.
-Chris
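
For readers skimming the LRI/LRM distinction above, a toy model may help.
The struct and function names are hypothetical and nothing here is kernel or
Mesa code.  The only facts it relies on are that MI_LOAD_REGISTER_IMM takes its
value from the command itself, while MI_LOAD_REGISTER_MEM fetches it from
memory.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM_REGS 4
#define TOY_MEM_DWORDS 4

/* Hypothetical GPU state: a few registers, a little memory, and a counter
 * for how often the command streamer had to read memory. */
struct toy_gpu {
   uint32_t regs[TOY_NUM_REGS];
   uint32_t mem[TOY_MEM_DWORDS];
   unsigned mem_reads;
};

/* LRI: the value is an immediate carried in the command, so no memory
 * access is involved. */
static void
toy_lri(struct toy_gpu *gpu, unsigned reg, uint32_t imm)
{
   assert(reg < TOY_NUM_REGS);
   gpu->regs[reg] = imm;
}

/* LRM: the value has to be fetched from memory before the register can be
 * written, so the load depends on a memory read completing. */
static void
toy_lrm(struct toy_gpu *gpu, unsigned reg, unsigned dword)
{
   assert(reg < TOY_NUM_REGS);
   assert(dword < TOY_MEM_DWORDS);
   gpu->mem_reads++;
   gpu->regs[reg] = gpu->mem[dword];
}
```

Whether that extra memory read is what makes the difference on real hardware
is exactly the open question; the model only shows that the two commands are
not interchangeable in that respect.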


Re: [Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper

2017-06-15 Thread Jason Ekstrand
On Thu, Jun 15, 2017 at 4:11 AM, Chris Wilson 
wrote:

> Quoting Kenneth Graunke (2017-06-14 21:41:56)
> > On Tuesday, June 13, 2017 2:53:24 PM PDT Jason Ekstrand wrote:
> > > From: Topi Pohjolainen 
> > >
> > > v2 (Jason Ekstrand):
> > >  - Take a flags parameter to control the flushes
> > >  - Refactoring
> > >
> > > Signed-off-by: Topi Pohjolainen 
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_context.h  |  1 +
> > >  src/mesa/drivers/dri/i965/brw_pipe_control.c | 96
> +++-
> > >  2 files changed, 96 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h
> b/src/mesa/drivers/dri/i965/brw_context.h
> > > index 7b9be8a..b137409 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > > @@ -1641,6 +1641,7 @@ void brw_emit_pipe_control_flush(struct
> brw_context *brw, uint32_t flags);
> > >  void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t
> flags,
> > >   struct brw_bo *bo, uint32_t offset,
> > >   uint64_t imm);
> > > +void brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t
> flags);
> > >  void brw_emit_mi_flush(struct brw_context *brw);
> > >  void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
> > >  void brw_emit_depth_stall_flushes(struct brw_context *brw);
> > > diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > > index 39bb9c7..338e4fc 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > > @@ -271,7 +271,6 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
> > > brw->workaround_bo, 0, 0);
> > >  }
> > >
> > > -
> > >  /**
> > >   * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
> > >   * implementing two workarounds on gen6.  From section 1.4.7.1
> > > @@ -320,6 +319,101 @@ brw_emit_post_sync_nonzero_flush(struct
> brw_context *brw)
> > > brw->workaround_bo, 0, 0);
> > >  }
> > >
> > > +/*
> > > + * From Sandybridge PRM, volume 2, "1.7.2 End-of-Pipe
> Synchronization":
> > > + *
> > > + *  Write synchronization is a special case of end-of-pipe
> > > + *  synchronization that requires that the render cache and/or depth
> > > + *  related caches are flushed to memory, where the data will become
> > > + *  globally visible. This type of synchronization is required prior
> to
> > > + *  SW (CPU) actually reading the result data from memory, or
> initiating
> > > + *  an operation that will use as a read surface (such as a texture
> > > + *  surface) a previous render target and/or depth/stencil buffer
> > > + *
> > > + *
> > > + * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> > > + *
> > > + *  Exercising the write cache flush bits (Render Target Cache Flush
> > > + *  Enable, Depth Cache Flush Enable, DC Flush) in PIPE_CONTROL only
> > > + *  ensures the write caches are flushed and doesn't guarantee the
> data
> > > + *  is globally visible.
> > > + *
> > > + *  SW can track the completion of the end-of-pipe-synchronization by
> > > + *  using "Notify Enable" and "PostSync Operation - Write Immediate
> > > + *  Data" in the PIPE_CONTROL command.
> > > + */
> > > +void
> > > +brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
> > > +{
> > > +   if (brw->gen >= 6) {
> > > +  /* From Sandybridge PRM, volume 2, "1.7.3.1 Writing a Value to
> Memory":
> > > +   *
> > > +   *"The most common action to perform upon reaching a
> synchronization
> > > +   *point is to write a value out to memory. An immediate
> value
> > > +   *(included with the synchronization command) may be
> written."
> > > +   *
> > > +   *
> > > +   * From Broadwell PRM, volume 7, "End-of-Pipe Synchronization":
> > > +   *
> > > +   *"In case the data flushed out by the render engine is to
> be read
> > > +   *back in to the render engine in coherent manner, then the
> render
> > > +   *engine has to wait for the fence completion before
> accessing the
> > > +   *flushed data. This can be achieved by following means on
> various
> > > +   *products: PIPE_CONTROL command with CS Stall and the
> required
> > > +   *write caches flushed with Post-Sync-Operation as Write
> Immediate
> > > +   *Data.
> > > +   *
> > > +   *Example:
> > > +   *   - Workload-1 (3D/GPGPU/MEDIA)
> > > +   *   - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write
> Immediate
> > > +   * Data, Required Write Cache Flush bits set)
> > > +   *   - Workload-2 (Can use the data produce or output by
> Workload-1)
> > > +   */
> > > +  brw_emit_pipe_control_write(brw,
> > > +  flags | PIPE_CONTROL_CS_STALL |
> >

Re: [Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper

2017-06-15 Thread Chris Wilson
Quoting Kenneth Graunke (2017-06-14 21:41:56)
> On Tuesday, June 13, 2017 2:53:24 PM PDT Jason Ekstrand wrote:
> > From: Topi Pohjolainen 
> > 
> > v2 (Jason Ekstrand):
> >  - Take a flags parameter to control the flushes
> >  - Refactoring
> > 
> > Signed-off-by: Topi Pohjolainen 
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.h  |  1 +
> >  src/mesa/drivers/dri/i965/brw_pipe_control.c | 96 
> > +++-
> >  2 files changed, 96 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> > b/src/mesa/drivers/dri/i965/brw_context.h
> > index 7b9be8a..b137409 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -1641,6 +1641,7 @@ void brw_emit_pipe_control_flush(struct brw_context 
> > *brw, uint32_t flags);
> >  void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
> >   struct brw_bo *bo, uint32_t offset,
> >   uint64_t imm);
> > +void brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags);
> >  void brw_emit_mi_flush(struct brw_context *brw);
> >  void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
> >  void brw_emit_depth_stall_flushes(struct brw_context *brw);
> > diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
> > b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > index 39bb9c7..338e4fc 100644
> > --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > @@ -271,7 +271,6 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
> > brw->workaround_bo, 0, 0);
> >  }
> >  
> > -
> >  /**
> >   * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
> >   * implementing two workarounds on gen6.  From section 1.4.7.1
> > @@ -320,6 +319,101 @@ brw_emit_post_sync_nonzero_flush(struct brw_context 
> > *brw)
> > brw->workaround_bo, 0, 0);
> >  }
> >  
> > +/*
> > + * From Sandybridge PRM, volume 2, "1.7.2 End-of-Pipe Synchronization":
> > + *
> > + *  Write synchronization is a special case of end-of-pipe
> > + *  synchronization that requires that the render cache and/or depth
> > + *  related caches are flushed to memory, where the data will become
> > + *  globally visible. This type of synchronization is required prior to
> > + *  SW (CPU) actually reading the result data from memory, or initiating
> > + *  an operation that will use as a read surface (such as a texture
> > + *  surface) a previous render target and/or depth/stencil buffer
> > + *
> > + *
> > + * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> > + *
> > + *  Exercising the write cache flush bits (Render Target Cache Flush
> > + *  Enable, Depth Cache Flush Enable, DC Flush) in PIPE_CONTROL only
> > + *  ensures the write caches are flushed and doesn't guarantee the data
> > + *  is globally visible.
> > + *
> > + *  SW can track the completion of the end-of-pipe-synchronization by
> > + *  using "Notify Enable" and "PostSync Operation - Write Immediate
> > + *  Data" in the PIPE_CONTROL command. 
> > + */
> > +void
> > +brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
> > +{
> > +   if (brw->gen >= 6) {
> > +  /* From Sandybridge PRM, volume 2, "1.7.3.1 Writing a Value to 
> > Memory":
> > +   *
> > +   *"The most common action to perform upon reaching a 
> > synchronization
> > +   *point is to write a value out to memory. An immediate value
> > +   *(included with the synchronization command) may be written."
> > +   *
> > +   *
> > +   * From Broadwell PRM, volume 7, "End-of-Pipe Synchronization":
> > +   *
> > +   *"In case the data flushed out by the render engine is to be 
> > read
> > +   *back in to the render engine in coherent manner, then the 
> > render
> > +   *engine has to wait for the fence completion before accessing 
> > the
> > +   *flushed data. This can be achieved by following means on 
> > various
> > +   *products: PIPE_CONTROL command with CS Stall and the required
> > +   *write caches flushed with Post-Sync-Operation as Write 
> > Immediate
> > +   *Data.
> > +   *
> > +   *Example:
> > +   *   - Workload-1 (3D/GPGPU/MEDIA)
> > +   *   - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write 
> > Immediate
> > +   * Data, Required Write Cache Flush bits set)
> > +   *   - Workload-2 (Can use the data produce or output by 
> > Workload-1)
> > +   */
> > +  brw_emit_pipe_control_write(brw,
> > +  flags | PIPE_CONTROL_CS_STALL |
> > +  PIPE_CONTROL_WRITE_IMMEDIATE,
> > +  brw->workaround_bo, 0, 0);
> > +
> > +  if (brw->is_haswell) {
> > + /* Haswell needs addition

Re: [Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper

2017-06-14 Thread Kenneth Graunke
On Tuesday, June 13, 2017 2:53:24 PM PDT Jason Ekstrand wrote:
> From: Topi Pohjolainen 
> 
> v2 (Jason Ekstrand):
>  - Take a flags parameter to control the flushes
>  - Refactoring
> 
> Signed-off-by: Topi Pohjolainen 
> ---
>  src/mesa/drivers/dri/i965/brw_context.h  |  1 +
>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 96 
> +++-
>  2 files changed, 96 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index 7b9be8a..b137409 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1641,6 +1641,7 @@ void brw_emit_pipe_control_flush(struct brw_context 
> *brw, uint32_t flags);
>  void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
>   struct brw_bo *bo, uint32_t offset,
>   uint64_t imm);
> +void brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags);
>  void brw_emit_mi_flush(struct brw_context *brw);
>  void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
>  void brw_emit_depth_stall_flushes(struct brw_context *brw);
> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c 
> b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> index 39bb9c7..338e4fc 100644
> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> @@ -271,7 +271,6 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
> brw->workaround_bo, 0, 0);
>  }
>  
> -
>  /**
>   * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
>   * implementing two workarounds on gen6.  From section 1.4.7.1
> @@ -320,6 +319,101 @@ brw_emit_post_sync_nonzero_flush(struct brw_context 
> *brw)
> brw->workaround_bo, 0, 0);
>  }
>  
> +/*
> + * From Sandybridge PRM, volume 2, "1.7.2 End-of-Pipe Synchronization":
> + *
> + *  Write synchronization is a special case of end-of-pipe
> + *  synchronization that requires that the render cache and/or depth
> + *  related caches are flushed to memory, where the data will become
> + *  globally visible. This type of synchronization is required prior to
> + *  SW (CPU) actually reading the result data from memory, or initiating
> + *  an operation that will use as a read surface (such as a texture
> + *  surface) a previous render target and/or depth/stencil buffer
> + *
> + *
> + * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> + *
> + *  Exercising the write cache flush bits (Render Target Cache Flush
> + *  Enable, Depth Cache Flush Enable, DC Flush) in PIPE_CONTROL only
> + *  ensures the write caches are flushed and doesn't guarantee the data
> + *  is globally visible.
> + *
> + *  SW can track the completion of the end-of-pipe-synchronization by
> + *  using "Notify Enable" and "PostSync Operation - Write Immediate
> + *  Data" in the PIPE_CONTROL command. 
> + */
> +void
> +brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
> +{
> +   if (brw->gen >= 6) {
> +  /* From Sandybridge PRM, volume 2, "1.7.3.1 Writing a Value to Memory":
> +   *
> +   *"The most common action to perform upon reaching a 
> synchronization
> +   *point is to write a value out to memory. An immediate value
> +   *(included with the synchronization command) may be written."
> +   *
> +   *
> +   * From Broadwell PRM, volume 7, "End-of-Pipe Synchronization":
> +   *
> +   *"In case the data flushed out by the render engine is to be read
> +   *back in to the render engine in coherent manner, then the render
> +   *engine has to wait for the fence completion before accessing the
> +   *flushed data. This can be achieved by following means on various
> +   *products: PIPE_CONTROL command with CS Stall and the required
> +   *write caches flushed with Post-Sync-Operation as Write Immediate
> +   *Data.
> +   *
> +   *Example:
> +   *   - Workload-1 (3D/GPGPU/MEDIA)
> +   *   - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write Immediate
> +   * Data, Required Write Cache Flush bits set)
> +   *   - Workload-2 (Can use the data produce or output by 
> Workload-1)
> +   */
> +  brw_emit_pipe_control_write(brw,
> +  flags | PIPE_CONTROL_CS_STALL |
> +  PIPE_CONTROL_WRITE_IMMEDIATE,
> +  brw->workaround_bo, 0, 0);
> +
> +  if (brw->is_haswell) {
> + /* Haswell needs additional work-arounds:
> +  *
> +  * From Haswell PRM, volume 2, part 1, "End-of-Pipe 
> Synchronization":
> +  *
> +  *Option 1:
> +  *PIPE_CONTROL command with the CS Stall and the required write
> +  *caches flushed with Post-SyncOperat

[Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper

2017-06-13 Thread Jason Ekstrand
From: Topi Pohjolainen 

v2 (Jason Ekstrand):
 - Take a flags parameter to control the flushes
 - Refactoring

Signed-off-by: Topi Pohjolainen 
---
 src/mesa/drivers/dri/i965/brw_context.h  |  1 +
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 96 +++-
 2 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 7b9be8a..b137409 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1641,6 +1641,7 @@ void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags);
 void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
  struct brw_bo *bo, uint32_t offset,
  uint64_t imm);
+void brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags);
 void brw_emit_mi_flush(struct brw_context *brw);
 void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
 void brw_emit_depth_stall_flushes(struct brw_context *brw);
diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index 39bb9c7..338e4fc 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -271,7 +271,6 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
brw->workaround_bo, 0, 0);
 }
 
-
 /**
  * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
  * implementing two workarounds on gen6.  From section 1.4.7.1
@@ -320,6 +319,101 @@ brw_emit_post_sync_nonzero_flush(struct brw_context *brw)
brw->workaround_bo, 0, 0);
 }
 
+/*
+ * From Sandybridge PRM, volume 2, "1.7.2 End-of-Pipe Synchronization":
+ *
+ *  Write synchronization is a special case of end-of-pipe
+ *  synchronization that requires that the render cache and/or depth
+ *  related caches are flushed to memory, where the data will become
+ *  globally visible. This type of synchronization is required prior to
+ *  SW (CPU) actually reading the result data from memory, or initiating
+ *  an operation that will use as a read surface (such as a texture
+ *  surface) a previous render target and/or depth/stencil buffer
+ *
+ *
+ * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
+ *
+ *  Exercising the write cache flush bits (Render Target Cache Flush
+ *  Enable, Depth Cache Flush Enable, DC Flush) in PIPE_CONTROL only
+ *  ensures the write caches are flushed and doesn't guarantee the data
+ *  is globally visible.
+ *
+ *  SW can track the completion of the end-of-pipe-synchronization by
+ *  using "Notify Enable" and "PostSync Operation - Write Immediate
+ *  Data" in the PIPE_CONTROL command. 
+ */
+void
+brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
+{
+   if (brw->gen >= 6) {
+  /* From Sandybridge PRM, volume 2, "1.7.3.1 Writing a Value to Memory":
+   *
+   *"The most common action to perform upon reaching a synchronization
+   *point is to write a value out to memory. An immediate value
+   *(included with the synchronization command) may be written."
+   *
+   *
+   * From Broadwell PRM, volume 7, "End-of-Pipe Synchronization":
+   *
+   *"In case the data flushed out by the render engine is to be read
+   *back in to the render engine in coherent manner, then the render
+   *engine has to wait for the fence completion before accessing the
+   *flushed data. This can be achieved by following means on various
+   *products: PIPE_CONTROL command with CS Stall and the required
+   *write caches flushed with Post-Sync-Operation as Write Immediate
+   *Data.
+   *
+   *Example:
+   *   - Workload-1 (3D/GPGPU/MEDIA)
+   *   - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write Immediate
+   * Data, Required Write Cache Flush bits set)
+   *   - Workload-2 (Can use the data produce or output by Workload-1)
+   */
+  brw_emit_pipe_control_write(brw,
+  flags | PIPE_CONTROL_CS_STALL |
+  PIPE_CONTROL_WRITE_IMMEDIATE,
+  brw->workaround_bo, 0, 0);
+
+  if (brw->is_haswell) {
+ /* Haswell needs additional work-arounds:
+  *
+  * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
+  *
+  *Option 1:
+  *PIPE_CONTROL command with the CS Stall and the required write
+  *caches flushed with Post-SyncOperation as Write Immediate Data
+  *followed by eight dummy MI_STORE_DATA_IMM (write to scratch
+  *space) commands.
+  *
+  *Example:
+  *   - Workload-1
+  *   - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write
+  *