On Thu, Mar 24, 2016 at 7:45 PM, Francisco Jerez <curroje...@riseup.net> wrote:
> Francisco Jerez <curroje...@riseup.net> writes: > > > Jason Ekstrand <ja...@jlekstrand.net> writes: > > > >> On Thu, Mar 24, 2016 at 5:50 PM, Kenneth Graunke <kenn...@whitecape.org > > > >> wrote: > >> > >>> dEQP-GLES31.functional.fbo.no_attachments.* draws a quad with no > >>> framebuffer attachments, using a shader that discards based on > >>> gl_FragCoord. It uses occlusion queries to inspect whether pixels > >>> are rendered or not. > >>> > >>> Unfortunately, the hardware is not dispatching any pixel shaders, > >>> so discards never happen, and the full quad of pixels increments > >>> PS_DEPTH_COUNT, making the occlusion query results bogus. > >>> > >>> To understand why, we have to delve into the WM_INT internal > >>> signalling mechanism's formulas. > >>> > >>> The "WM_INT::Pixel Shader Kill Pixel" signal is defined as: > >>> > >>> 3DSTATE_WM::ForceKillPixel == ON || > >>> (3DSTATE_WM::ForceKillPixel != Off && > >>> !WM_INT::WM_HZ_OP && > >>> 3DSTATE_WM::EDSC_Mode != PREPS && > >>> (WM_INT::Depth Write Enable || WM_INT::Stencil Write Enable) && > >>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > >>> (3DSTATE_PS_EXTRA::PixelShaderKillsPixels || > >>> 3DSTATE_PS_EXTRA:: oMask Present to RenderTarget || > >>> 3DSTATE_PS_BLEND::AlphaToCoverageEnable || > >>> 3DSTATE_PS_BLEND::AlphaTestEnable || > >>> 3DSTATE_WM_CHROMAKEY::ChromaKeyKillEnable)) > >>> > >>> Because there is no depth or stencil buffer, writes to those buffers > >>> are disabled. So the highlighted condition is false, making the whole > >>> "Kill Pixel" condition false. This then feeds into the following > >>> "WM_INT::ThreadDispatchEnable" condition: > >>> > >>> 3DSTATE_WM::ForceThreadDispatch != OFF && > >>> !WM_INT::WM_HZ_OP && > >>> 3DSTATE_PS_EXTRA::PixelShaderValid && > >>> (3DSTATE_PS_EXTRA::PixelShaderHasUAV || > >>> WM_INT::Pixel Shader Kill Pixel || > >>> WM_INT::RTIndependentRasterizationEnable || > >>> (!3DSTATE_PS_EXTRA::PixelShaderDoesNotWriteRT && > >>> 3DSTATE_PS_BLEND::HasWriteableRT) || > >>> (WM_INT::Pixel Shader Computed Depth Mode != PSCDEPTH_OFF && > >>> (WM_INT::Depth Test Enable || WM_INT::Depth Write Enable)) || > >>> (3DSTATE_PS_EXTRA::Computed Stencil && WM_INT::Stencil Test > Enable) || > >>> (3DSTATE_WM::EDSC_Mode == 1 && (WM_INT::Depth Test Enable || > >>> WM_INT::Depth Write Enable || > >>> WM_INT::Stencil Test Enable))) > >>> > >>> Given that there's no depth/stencil testing, no writeable render > target, > >>> and the hardware thinks kill pixel doesn't happen, all of these > >>> conditions are false. We have to whack some bit to make PS invocations > >>> happen. There are many options. > >>> > >>> Curro suggested using the UAV bit. There's some precedence in doing > >>> that - we set it for fragment shaders that do SSBO/image/atomic writes > >>> when no color buffer writes are enabled. We can simply include discard > >>> here too. > >>> > >>> Fixes 64 dEQP-GLES31.functional.fbo.no_attachments.* tests. > >>> > >>> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org> > >>> --- > >>> src/mesa/drivers/dri/i965/gen8_ps_state.c | 4 ++-- > >>> 1 file changed, 2 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c > >>> b/src/mesa/drivers/dri/i965/gen8_ps_state.c > >>> index b9a06e7..cda8159 100644 > >>> --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c > >>> +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c > >>> @@ -93,8 +93,8 @@ gen8_upload_ps_extra(struct brw_context *brw, > >>> * > >>> * BRW_NEW_FS_PROG_DATA | BRW_NEW_FRAGMENT_PROGRAM | _NEW_BUFFERS | > >>> _NEW_COLOR > >>> */ > >>> - if (_mesa_active_fragment_shader_has_side_effects(&brw->ctx) && > >>> - !brw_color_buffer_write_enabled(brw)) > >>> + if ((_mesa_active_fragment_shader_has_side_effects(ctx) || > >>> + prog_data->uses_kill) && !brw_color_buffer_write_enabled(brw)) > >>> > >> > >> Whould you mind adding a short comment as to why uses_kill is in here? > You > >> don't need the full explination above, bot something along the lines of > >> > >> "Gen8 hardware tries to compute ThreadDispatchEnable for us but doesn't > >> take into account KillPixels when no depth or stencil writes are > enabled. > >> In order for occlusion queries to work correctly with no attachments, we > >> need to force-enable here." > >> > > That sounds good to me. > > > >> You can come up with your own text if you'd like. > >> > >> One other thought: Do we know if HAS_UAV has any perf impact other than > >> actually running the shader? I would guess not, but if it does we may > want > >> to make the condition a bit more specifi. > >> > > > > It does, it causes an implicit DC flush at the bottom of the pipeline, > > see the comment right on top of this code for a more detailed > > explanation. It would definitely be possible to make the check more > > specific by e.g. setting the bit only if both stencil and depth writes > > are disabled (in addition to color buffer writes), but Ken had some > > concerns about the additional state dependencies that would introduce > > when I brought up the same question earlier today in the office. Anyway > > this only gets turned on when no color buffer writes are enabled so it > > doesn't seem like a huge deal, but it would of course be nice to do the > > additional checks. > > > Let me elaborate a bit more why I don't think setting the has-UAV bit in > this case is a big deal: On the one hand, unless the application is > using shader images, SSBOs or atomics the implicit DC flush shouldn't be > too expensive because the DC is likely to be already close to being > coherent with the L3, so the flush shouldn't do much work -- However if > the shader *is* using images, SSBOs or atomics the has_side_effects > check in the same conditional will evaluate to true and the has-UAV bit > will be set (or not) regardless of the additional uses_kill condition. > On the other hand, because we don't use the UAV coherency mechanism (and > are unlikely to use it anytime soon), the DC flush is asynchronous and > doesn't lead to a pipeline stall on subsequent geometry processing, so > it shouldn't hurt much anyway. > Right, but what about the dept-stencil-only-with-a-discard case? In that case, we don't need to force pixel shader dispatch because KillsPixels will enable it and we would really rather not have a DC flush after every 3DPRIMITIVE. > BTW, if you're curious about the workings of the UAV coherency > mechanism, I wrote most of what I could find out in the commit message > of 5346c1167064d6429c6338974c6342f8346fd34b. I haven't found any decent > documentation about it anywhere else... > > >> Other than that, looks good to me. > >> --Jason > >> > >> > >>> dw1 |= GEN8_PSX_SHADER_HAS_UAV; > >>> > >>> if (prog_data->computed_stencil) { > >>> -- > >>> 2.7.4 > >>> > >>> _______________________________________________ > >>> mesa-dev mailing list > >>> mesa-dev@lists.freedesktop.org > >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > >>> > >> _______________________________________________ > >> mesa-dev mailing list > >> mesa-dev@lists.freedesktop.org > >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev