Re: [Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper
On Thu, Jun 15, 2017 at 9:14 AM, Chris Wilson wrote:

> Quoting Jason Ekstrand (2017-06-15 16:59:19)
> > On Thu, Jun 15, 2017 at 4:11 AM, Chris Wilson wrote:
> >     The kernel does have a LRI after a flush before signaling the batch is
> >     complete. I don't see a need to add another...
> >
> >     The question is whether this posting is required for GPU visibility of
> >     results or just CPU? I suspect this is just for CPU in which case it
> >     doesn't belong here at all, but before flagging rendering as ready for
> >     async (i.e. not involving the kernel) inspection.
> >
> >
> > The docs, if you choose to believe them, seem to indicate that this is needed
> > for GPU visibility as well as CPU.
>
> The kernel has LRI (for semaphore updates) not LRM, is that significant?
> Took me long enough to notice the difference.
>

I don't know. We're getting so far outside the realm of documentation here
that it's crazy. What I do know is that the comments in the windows source
indicate that SDI is insufficient on Haswell. My gut says it has something
to do with forcing a round-trip through the memory controller. For
semaphore updates, LRI may be sufficient since they're register-based on
gen7.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
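To make the shape of the helper under discussion concrete, here is a minimal, self-contained C sketch that only records which commands an end-of-pipe sync would emit. Every `sketch_` name and every flag value is hypothetical and exists only for this illustration; none of it is the i965 API. It assumes, going by the LRI-versus-LRM question in this thread, that the Haswell follow-up is a register load from memory; the quoted patch is cut off before that code appears, and the pre-gen6 path is not visible in the quoted text at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; NOT the real PIPE_CONTROL_* values. */
enum {
   SKETCH_RENDER_TARGET_FLUSH = 1u << 0,
   SKETCH_CS_STALL            = 1u << 1,
   SKETCH_WRITE_IMMEDIATE     = 1u << 2,
};

enum sketch_cmd_kind {
   SKETCH_CMD_PIPE_CONTROL_WRITE,
   SKETCH_CMD_LOAD_REGISTER_MEM,
};

struct sketch_cmd {
   enum sketch_cmd_kind kind;
   uint32_t flags;
};

#define SKETCH_MAX_CMDS 8

/* A toy stand-in for a batch buffer: it just remembers what was emitted. */
struct sketch_batch {
   int gen;
   bool is_haswell;
   struct sketch_cmd cmds[SKETCH_MAX_CMDS];
   size_t count;
};

static void
sketch_emit(struct sketch_batch *batch, enum sketch_cmd_kind kind,
            uint32_t flags)
{
   assert(batch->count < SKETCH_MAX_CMDS);
   batch->cmds[batch->count].kind = kind;
   batch->cmds[batch->count].flags = flags;
   batch->count++;
}

static void
sketch_end_of_pipe_sync(struct sketch_batch *batch, uint32_t flags)
{
   if (batch->gen < 6)
      return; /* Older generations are outside the scope of this sketch. */

   /* Caller's flush bits plus CS stall and a post-sync immediate write. */
   sketch_emit(batch, SKETCH_CMD_PIPE_CONTROL_WRITE,
               flags | SKETCH_CS_STALL | SKETCH_WRITE_IMMEDIATE);

   /* Assumption: Haswell reads the written value back into a register,
    * which would force a round-trip through memory.
    */
   if (batch->is_haswell)
      sketch_emit(batch, SKETCH_CMD_LOAD_REGISTER_MEM, 0);
}
```

With `gen = 7` and `is_haswell = true` the sketch records two commands; with `is_haswell = false` it records only the PIPE_CONTROL write.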
Re: [Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper
Quoting Jason Ekstrand (2017-06-15 16:59:19)
> On Thu, Jun 15, 2017 at 4:11 AM, Chris Wilson wrote:
>     The kernel does have a LRI after a flush before signaling the batch is
>     complete. I don't see a need to add another...
>
>     The question is whether this posting is required for GPU visibility of
>     results or just CPU? I suspect this is just for CPU in which case it
>     doesn't belong here at all, but before flagging rendering as ready for
>     async (i.e. not involving the kernel) inspection.
>
>
> The docs, if you choose to believe them, seem to indicate that this is needed
> for GPU visibility as well as CPU.

The kernel has LRI (for semaphore updates) not LRM, is that significant?
Took me long enough to notice the difference.
-Chris
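The LRI/LRM distinction raised here can be illustrated with a toy model: MI_LOAD_REGISTER_IMM takes its value from the command itself, while MI_LOAD_REGISTER_MEM has to fetch the value from memory. The sketch below is only a CPU-side analogy; the `toy_` names, register count, and memory size are made up for illustration, and it makes no claim about how the hardware orders the read against earlier writes.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM_REGS   4
#define TOY_MEM_DWORDS 16

/* Toy model of a command streamer: a few registers, a little memory,
 * and a counter of how many memory reads the commands required.
 */
struct toy_gpu {
   uint32_t regs[TOY_NUM_REGS];
   uint32_t mem[TOY_MEM_DWORDS];
   unsigned mem_reads;
};

/* LRI analogue: the value comes from the command, so memory is untouched. */
static void
toy_lri(struct toy_gpu *gpu, unsigned reg, uint32_t imm)
{
   assert(reg < TOY_NUM_REGS);
   gpu->regs[reg] = imm;
}

/* LRM analogue: the value comes from memory, so the command cannot
 * complete without a memory read.
 */
static void
toy_lrm(struct toy_gpu *gpu, unsigned reg, unsigned dword)
{
   assert(reg < TOY_NUM_REGS);
   assert(dword < TOY_MEM_DWORDS);
   gpu->regs[reg] = gpu->mem[dword];
   gpu->mem_reads++;
}
```

In this model only `toy_lrm` increments `mem_reads`, which is the property Chris's question turns on: a register write from an immediate never has to touch memory.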
Re: [Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper
On Thu, Jun 15, 2017 at 4:11 AM, Chris Wilson wrote:

> Quoting Kenneth Graunke (2017-06-14 21:41:56)
> > On Tuesday, June 13, 2017 2:53:24 PM PDT Jason Ekstrand wrote:
> > > From: Topi Pohjolainen
> > >
> > > v2 (Jason Ekstrand):
> > >  - Take a flags parameter to control the flushes
> > >  - Refactoring
> > >
> > > Signed-off-by: Topi Pohjolainen
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_context.h      |  1 +
> > >  src/mesa/drivers/dri/i965/brw_pipe_control.c | 96 +++-
> > >  2 files changed, 96 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> > > index 7b9be8a..b137409 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > > @@ -1641,6 +1641,7 @@ void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags);
> > >  void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
> > >                                   struct brw_bo *bo, uint32_t offset,
> > >                                   uint64_t imm);
> > > +void brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags);
> > >  void brw_emit_mi_flush(struct brw_context *brw);
> > >  void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
> > >  void brw_emit_depth_stall_flushes(struct brw_context *brw);
> > > diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > > index 39bb9c7..338e4fc 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > > @@ -271,7 +271,6 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
> > >                                 brw->workaround_bo, 0, 0);
> > >  }
> > >
> > > -
> > >  /**
> > >   * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
> > >   * implementing two workarounds on gen6.  From section 1.4.7.1
> > > @@ -320,6 +319,101 @@ brw_emit_post_sync_nonzero_flush(struct brw_context *brw)
> > >                                 brw->workaround_bo, 0, 0);
> > >  }
> > >
> > > +/*
> > > + * From Sandybridge PRM, volume 2, "1.7.2 End-of-Pipe Synchronization":
> > > + *
> > > + *  Write synchronization is a special case of end-of-pipe
> > > + *  synchronization that requires that the render cache and/or depth
> > > + *  related caches are flushed to memory, where the data will become
> > > + *  globally visible. This type of synchronization is required prior to
> > > + *  SW (CPU) actually reading the result data from memory, or initiating
> > > + *  an operation that will use as a read surface (such as a texture
> > > + *  surface) a previous render target and/or depth/stencil buffer
> > > + *
> > > + *
> > > + * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> > > + *
> > > + *  Exercising the write cache flush bits (Render Target Cache Flush
> > > + *  Enable, Depth Cache Flush Enable, DC Flush) in PIPE_CONTROL only
> > > + *  ensures the write caches are flushed and doesn't guarantee the data
> > > + *  is globally visible.
> > > + *
> > > + *  SW can track the completion of the end-of-pipe-synchronization by
> > > + *  using "Notify Enable" and "PostSync Operation - Write Immediate
> > > + *  Data" in the PIPE_CONTROL command.
> > > + */
> > > +void
> > > +brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
> > > +{
> > > +   if (brw->gen >= 6) {
> > > +      /* From Sandybridge PRM, volume 2, "1.7.3.1 Writing a Value to Memory":
> > > +       *
> > > +       *    "The most common action to perform upon reaching a synchronization
> > > +       *    point is to write a value out to memory. An immediate value
> > > +       *    (included with the synchronization command) may be written."
> > > +       *
> > > +       *
> > > +       * From Broadwell PRM, volume 7, "End-of-Pipe Synchronization":
> > > +       *
> > > +       *    "In case the data flushed out by the render engine is to be read
> > > +       *    back in to the render engine in coherent manner, then the render
> > > +       *    engine has to wait for the fence completion before accessing the
> > > +       *    flushed data. This can be achieved by following means on various
> > > +       *    products: PIPE_CONTROL command with CS Stall and the required
> > > +       *    write caches flushed with Post-Sync-Operation as Write Immediate
> > > +       *    Data.
> > > +       *
> > > +       *    Example:
> > > +       *       - Workload-1 (3D/GPGPU/MEDIA)
> > > +       *       - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write Immediate
> > > +       *         Data, Required Write Cache Flush bits set)
> > > +       *       - Workload-2 (Can use the data produce or output by Workload-1)
> > > +       */
> > > +      brw_emit_pipe_control_write(brw,
> > > +                                  flags | PIPE_CONTROL_CS_STALL |
Re: [Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper
Quoting Kenneth Graunke (2017-06-14 21:41:56)
> On Tuesday, June 13, 2017 2:53:24 PM PDT Jason Ekstrand wrote:
> > From: Topi Pohjolainen
> >
> > v2 (Jason Ekstrand):
> >  - Take a flags parameter to control the flushes
> >  - Refactoring
> >
> > Signed-off-by: Topi Pohjolainen
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.h      |  1 +
> >  src/mesa/drivers/dri/i965/brw_pipe_control.c | 96 +++-
> >  2 files changed, 96 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> > index 7b9be8a..b137409 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -1641,6 +1641,7 @@ void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags);
> >  void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
> >                                   struct brw_bo *bo, uint32_t offset,
> >                                   uint64_t imm);
> > +void brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags);
> >  void brw_emit_mi_flush(struct brw_context *brw);
> >  void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
> >  void brw_emit_depth_stall_flushes(struct brw_context *brw);
> > diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > index 39bb9c7..338e4fc 100644
> > --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> > @@ -271,7 +271,6 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
> >                                 brw->workaround_bo, 0, 0);
> >  }
> >
> > -
> >  /**
> >   * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
> >   * implementing two workarounds on gen6.  From section 1.4.7.1
> > @@ -320,6 +319,101 @@ brw_emit_post_sync_nonzero_flush(struct brw_context *brw)
> >                                 brw->workaround_bo, 0, 0);
> >  }
> >
> > +/*
> > + * From Sandybridge PRM, volume 2, "1.7.2 End-of-Pipe Synchronization":
> > + *
> > + *  Write synchronization is a special case of end-of-pipe
> > + *  synchronization that requires that the render cache and/or depth
> > + *  related caches are flushed to memory, where the data will become
> > + *  globally visible. This type of synchronization is required prior to
> > + *  SW (CPU) actually reading the result data from memory, or initiating
> > + *  an operation that will use as a read surface (such as a texture
> > + *  surface) a previous render target and/or depth/stencil buffer
> > + *
> > + *
> > + * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> > + *
> > + *  Exercising the write cache flush bits (Render Target Cache Flush
> > + *  Enable, Depth Cache Flush Enable, DC Flush) in PIPE_CONTROL only
> > + *  ensures the write caches are flushed and doesn't guarantee the data
> > + *  is globally visible.
> > + *
> > + *  SW can track the completion of the end-of-pipe-synchronization by
> > + *  using "Notify Enable" and "PostSync Operation - Write Immediate
> > + *  Data" in the PIPE_CONTROL command.
> > + */
> > +void
> > +brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
> > +{
> > +   if (brw->gen >= 6) {
> > +      /* From Sandybridge PRM, volume 2, "1.7.3.1 Writing a Value to Memory":
> > +       *
> > +       *    "The most common action to perform upon reaching a synchronization
> > +       *    point is to write a value out to memory. An immediate value
> > +       *    (included with the synchronization command) may be written."
> > +       *
> > +       *
> > +       * From Broadwell PRM, volume 7, "End-of-Pipe Synchronization":
> > +       *
> > +       *    "In case the data flushed out by the render engine is to be read
> > +       *    back in to the render engine in coherent manner, then the render
> > +       *    engine has to wait for the fence completion before accessing the
> > +       *    flushed data. This can be achieved by following means on various
> > +       *    products: PIPE_CONTROL command with CS Stall and the required
> > +       *    write caches flushed with Post-Sync-Operation as Write Immediate
> > +       *    Data.
> > +       *
> > +       *    Example:
> > +       *       - Workload-1 (3D/GPGPU/MEDIA)
> > +       *       - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write Immediate
> > +       *         Data, Required Write Cache Flush bits set)
> > +       *       - Workload-2 (Can use the data produce or output by Workload-1)
> > +       */
> > +      brw_emit_pipe_control_write(brw,
> > +                                  flags | PIPE_CONTROL_CS_STALL |
> > +                                  PIPE_CONTROL_WRITE_IMMEDIATE,
> > +                                  brw->workaround_bo, 0, 0);
> > +
> > +      if (brw->is_haswell) {
> > +         /* Haswell needs addition
Re: [Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper
On Tuesday, June 13, 2017 2:53:24 PM PDT Jason Ekstrand wrote:
> From: Topi Pohjolainen
>
> v2 (Jason Ekstrand):
>  - Take a flags parameter to control the flushes
>  - Refactoring
>
> Signed-off-by: Topi Pohjolainen
> ---
>  src/mesa/drivers/dri/i965/brw_context.h      |  1 +
>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 96 +++-
>  2 files changed, 96 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 7b9be8a..b137409 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1641,6 +1641,7 @@ void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags);
>  void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
>                                   struct brw_bo *bo, uint32_t offset,
>                                   uint64_t imm);
> +void brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags);
>  void brw_emit_mi_flush(struct brw_context *brw);
>  void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
>  void brw_emit_depth_stall_flushes(struct brw_context *brw);
> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> index 39bb9c7..338e4fc 100644
> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> @@ -271,7 +271,6 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
>                                 brw->workaround_bo, 0, 0);
>  }
>
> -
>  /**
>   * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
>   * implementing two workarounds on gen6.  From section 1.4.7.1
> @@ -320,6 +319,101 @@ brw_emit_post_sync_nonzero_flush(struct brw_context *brw)
>                                 brw->workaround_bo, 0, 0);
>  }
>
> +/*
> + * From Sandybridge PRM, volume 2, "1.7.2 End-of-Pipe Synchronization":
> + *
> + *  Write synchronization is a special case of end-of-pipe
> + *  synchronization that requires that the render cache and/or depth
> + *  related caches are flushed to memory, where the data will become
> + *  globally visible. This type of synchronization is required prior to
> + *  SW (CPU) actually reading the result data from memory, or initiating
> + *  an operation that will use as a read surface (such as a texture
> + *  surface) a previous render target and/or depth/stencil buffer
> + *
> + *
> + * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> + *
> + *  Exercising the write cache flush bits (Render Target Cache Flush
> + *  Enable, Depth Cache Flush Enable, DC Flush) in PIPE_CONTROL only
> + *  ensures the write caches are flushed and doesn't guarantee the data
> + *  is globally visible.
> + *
> + *  SW can track the completion of the end-of-pipe-synchronization by
> + *  using "Notify Enable" and "PostSync Operation - Write Immediate
> + *  Data" in the PIPE_CONTROL command.
> + */
> +void
> +brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
> +{
> +   if (brw->gen >= 6) {
> +      /* From Sandybridge PRM, volume 2, "1.7.3.1 Writing a Value to Memory":
> +       *
> +       *    "The most common action to perform upon reaching a synchronization
> +       *    point is to write a value out to memory. An immediate value
> +       *    (included with the synchronization command) may be written."
> +       *
> +       *
> +       * From Broadwell PRM, volume 7, "End-of-Pipe Synchronization":
> +       *
> +       *    "In case the data flushed out by the render engine is to be read
> +       *    back in to the render engine in coherent manner, then the render
> +       *    engine has to wait for the fence completion before accessing the
> +       *    flushed data. This can be achieved by following means on various
> +       *    products: PIPE_CONTROL command with CS Stall and the required
> +       *    write caches flushed with Post-Sync-Operation as Write Immediate
> +       *    Data.
> +       *
> +       *    Example:
> +       *       - Workload-1 (3D/GPGPU/MEDIA)
> +       *       - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write Immediate
> +       *         Data, Required Write Cache Flush bits set)
> +       *       - Workload-2 (Can use the data produce or output by Workload-1)
> +       */
> +      brw_emit_pipe_control_write(brw,
> +                                  flags | PIPE_CONTROL_CS_STALL |
> +                                  PIPE_CONTROL_WRITE_IMMEDIATE,
> +                                  brw->workaround_bo, 0, 0);
> +
> +      if (brw->is_haswell) {
> +         /* Haswell needs addition work-arounds:
> +          *
> +          * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
> +          *
> +          *    Option 1:
> +          *    PIPE_CONTROL command with the CS Stall and the required write
> +          *    caches flushed with Post-SyncOperat
[Mesa-dev] [PATCH 4/7] i965: Add an end-of-pipe sync helper
From: Topi Pohjolainen

v2 (Jason Ekstrand):
 - Take a flags parameter to control the flushes
 - Refactoring

Signed-off-by: Topi Pohjolainen
---
 src/mesa/drivers/dri/i965/brw_context.h      |  1 +
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 96 +++-
 2 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 7b9be8a..b137409 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1641,6 +1641,7 @@ void brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags);
 void brw_emit_pipe_control_write(struct brw_context *brw, uint32_t flags,
                                  struct brw_bo *bo, uint32_t offset,
                                  uint64_t imm);
+void brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags);
 void brw_emit_mi_flush(struct brw_context *brw);
 void brw_emit_post_sync_nonzero_flush(struct brw_context *brw);
 void brw_emit_depth_stall_flushes(struct brw_context *brw);
diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index 39bb9c7..338e4fc 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -271,7 +271,6 @@ gen7_emit_cs_stall_flush(struct brw_context *brw)
                                brw->workaround_bo, 0, 0);
 }
 
-
 /**
  * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
  * implementing two workarounds on gen6.  From section 1.4.7.1
@@ -320,6 +319,101 @@ brw_emit_post_sync_nonzero_flush(struct brw_context *brw)
                                brw->workaround_bo, 0, 0);
 }
 
+/*
+ * From Sandybridge PRM, volume 2, "1.7.2 End-of-Pipe Synchronization":
+ *
+ *  Write synchronization is a special case of end-of-pipe
+ *  synchronization that requires that the render cache and/or depth
+ *  related caches are flushed to memory, where the data will become
+ *  globally visible. This type of synchronization is required prior to
+ *  SW (CPU) actually reading the result data from memory, or initiating
+ *  an operation that will use as a read surface (such as a texture
+ *  surface) a previous render target and/or depth/stencil buffer
+ *
+ *
+ * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
+ *
+ *  Exercising the write cache flush bits (Render Target Cache Flush
+ *  Enable, Depth Cache Flush Enable, DC Flush) in PIPE_CONTROL only
+ *  ensures the write caches are flushed and doesn't guarantee the data
+ *  is globally visible.
+ *
+ *  SW can track the completion of the end-of-pipe-synchronization by
+ *  using "Notify Enable" and "PostSync Operation - Write Immediate
+ *  Data" in the PIPE_CONTROL command.
+ */
+void
+brw_emit_end_of_pipe_sync(struct brw_context *brw, uint32_t flags)
+{
+   if (brw->gen >= 6) {
+      /* From Sandybridge PRM, volume 2, "1.7.3.1 Writing a Value to Memory":
+       *
+       *    "The most common action to perform upon reaching a synchronization
+       *    point is to write a value out to memory. An immediate value
+       *    (included with the synchronization command) may be written."
+       *
+       *
+       * From Broadwell PRM, volume 7, "End-of-Pipe Synchronization":
+       *
+       *    "In case the data flushed out by the render engine is to be read
+       *    back in to the render engine in coherent manner, then the render
+       *    engine has to wait for the fence completion before accessing the
+       *    flushed data. This can be achieved by following means on various
+       *    products: PIPE_CONTROL command with CS Stall and the required
+       *    write caches flushed with Post-Sync-Operation as Write Immediate
+       *    Data.
+       *
+       *    Example:
+       *       - Workload-1 (3D/GPGPU/MEDIA)
+       *       - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write Immediate
+       *         Data, Required Write Cache Flush bits set)
+       *       - Workload-2 (Can use the data produce or output by Workload-1)
+       */
+      brw_emit_pipe_control_write(brw,
+                                  flags | PIPE_CONTROL_CS_STALL |
+                                  PIPE_CONTROL_WRITE_IMMEDIATE,
+                                  brw->workaround_bo, 0, 0);
+
+      if (brw->is_haswell) {
+         /* Haswell needs addition work-arounds:
+          *
+          * From Haswell PRM, volume 2, part 1, "End-of-Pipe Synchronization":
+          *
+          *    Option 1:
+          *    PIPE_CONTROL command with the CS Stall and the required write
+          *    caches flushed with Post-SyncOperation as Write Immediate Data
+          *    followed by eight dummy MI_STORE_DATA_IMM (write to scratch
+          *    spce) commands.
+          *
+          *    Example:
+          *       - Workload-1
+          *       - PIPE_CONTROL (CS Stall, Post-Sync-Operation Write
+          *