Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-06 Thread Rogovin, Kevin


>I'd have to think about it harder but even those are not likely to actually 
>get ASTC decode.  Maybe for some sort of decompression thing but if you're 
>doing
> GetCompressedTexImage, it'll probably turn into a blorp_copy which will use 
> R32G32B32A32_UINT.

I am thinking for the case where an application calls glGetTexImage() or 
glGetTextureImage(), which I think is legal in GL and the GL implementation is 
expected to do the decompress.

-Kevin
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Jason Ekstrand
On Tue, Dec 5, 2017 at 10:34 AM, Rogovin, Kevin 
wrote:

> Hi,
>
> >This isn't true.  100% of the intel_mipmap_tree -> blorp_surf
> translations are handled by that function.
> >  It's a perfectly reasonable place to handle these things.  It could
> also be handled in genX(blorp_exec) if that makes someone more comfortable.
>
> This is where I placed the ASTC enumeration setting, in genX(blorp_exec).
> I added a boolean signaling if the caller to blorp believed it would sample
> from an ASTC, if it did it sets the enum as ASTC otherwise as AUX.
>
> I confess I am not super familiar with blorp, but as far as I can tell,
> the only way for blorp to read an ASTC is for GetTexImage/GetTexSubImage
> since an ASTC surface cannot be used for an FBO.
>

I'd have to think about it harder but even those are not likely to actually
get ASTC decode.  Maybe for some sort of decompression thing but if you're
doing GetCompressedTexImage, it'll probably turn into a blorp_copy which
will use R32G32B32A32_UINT.


> > The problem is that a single invalidate doesn't actually cause a
> synchronization point in the rendering operation.  All it does is torch the
> texture cache and it
> > does so immediately and in parallel with currently active rendering.  If
> you can't have ASTC5x5 in the texture cache with a CCS_E surface, then we
> need to
> > do a CS stall to ensure that the previous draw is complete before we
> invalidate.  Otherwise, if the draw with ASTC5x5 is still in-flight, the
> texture cache will
> > immediately start to get populated with ASTC5x5 data again.
>
> Ahh futz, I forget to add that stall.. by luck the guilty application
> worked anyways (when I first wrote the work I did intel_batchbuffer_flush()
> and relaxed it down to texture invalidate).
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Rogovin, Kevin
Hi,

>This isn't true.  100% of the intel_mipmap_tree -> blorp_surf translations are 
>handled by that function. 
>  It's a perfectly reasonable place to handle these things.  It could also be 
> handled in genX(blorp_exec) if that makes someone more comfortable.

This is where I placed the ASTC enumeration setting, in genX(blorp_exec). I 
added a boolean signaling if the caller to blorp believed it would sample from 
an ASTC, if it did it sets the enum as ASTC otherwise as AUX.

I confess I am not super familiar with blorp, but as far as I can tell, the 
only way for blorp to read an ASTC is for GetTexImage/GetTexSubImage since an 
ASTC surface cannot be used for an FBO.

> The problem is that a single invalidate doesn't actually cause a 
> synchronization point in the rendering operation.  All it does is torch the 
> texture cache and it 
> does so immediately and in parallel with currently active rendering.  If you 
> can't have ASTC5x5 in the texture cache with a CCS_E surface, then we need to
> do a CS stall to ensure that the previous draw is complete before we 
> invalidate.  Otherwise, if the draw with ASTC5x5 is still in-flight, the 
> texture cache will 
> immediately start to get populated with ASTC5x5 data again.

Ahh futz, I forget to add that stall.. by luck the guilty application worked 
anyways (when I first wrote the work I did intel_batchbuffer_flush() and 
relaxed it down to texture invalidate).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Rogovin, Kevin

> Are you saying that this bug extends over hardware context?

Different HW contexts imply different execbuffer2 ioctl's. The kernel inserts a 
full blown flush of everything after (or before, I cannot remember) each 
execbuffer2 call. This way there is context isolation in HW buggineness.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Jason Ekstrand
On Tue, Dec 5, 2017 at 10:17 AM, Pohjolainen, Topi <
topi.pohjolai...@gmail.com> wrote:

> On Tue, Dec 05, 2017 at 10:26:33AM +, Rogovin, Kevin wrote:
> > Hi,
> >
> >
> > >> Here are my comments of the patch posted:
> > >>
> > >>  1.  it is essentially replication and moving around of the code of
> the patch series posted earlier but missing various
> > >>   important bits: preventing the sampler from using the auxiliary
> buffer (this requires to modify surface state
> > >>   sent in brw_wm_surfaces.c). It also does not cover blorp
> sufficiently (blorp might read from an ASTC5x5
> > >>   and there are more paths in blorp than blorp_surf_for_miptree()
> that sample from surfaces.
> > >>
> >
> > >Can you explain both more in detail? Resolves done in
> > >brw_predraw_resolve_inputs() mean that there is nothing interesting in
> the aux buffers and surface setup won't therefore enable auxiliary for
> texture surfaces.
> >
> > That there is nothing interesting is irrelevant to the sampler if the
> SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the
> SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the
> GPU command; The sampler will always try to read the auxiliary buffer if it
> is given the opportunity to do so. Indeed, it is quite feasible that less
> bandwidth is consumed if the sampler is given the chance to read an
> auxiliary buffer in place of the buffer; as such even if the surface is
> resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for
> HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer
> is fully resolved (see inte_mipmap_tree_sample_with_hiz() in
> intel_mipmap_tree.c).
>

We do have code which checks for the resolve state and disables the
auxiliary buffer.  However, I don't like relying on it for correctness as
that's bitten us before.  I think it'd be better to check the W/A state for
ASTC5x5, assert that everything is resolved, and manually disable aux in
that case.


> > > In case of blorp, as far as I know all operations sampling something
> should go thru blorp_surf_for_miptree(). Can you point out cases that don't?
> >
> > Blorp is used in more than blorp_surf_for_miptree(), for example
> implementing GetTexImage(). Indeed, it is possible for blorp to sample from
> an ASTC5x5 (you can see this handled in the patch series I posted). I chose
> the hammer that the default is to just assume blorp is going to access
> auxiliary buffers unless a flag is set when the caller knows that blorp is
> going to sample from an astc5x5 (against see my patch series).
>

This isn't true.  100% of the intel_mipmap_tree -> blorp_surf translations
are handled by that function.  It's a perfectly reasonable place to handle
these things.  It could also be handled in genX(blorp_exec) if that makes
someone more comfortable.

> >Right. In the case of sampling both aux and astc5x5 in the same draw
> cycle the only thing to do is to disable aux. With my question of direction
> I meant the texture
> > > cache flush between two cycles. Do we need to flush in both cases
> > > 1) ASTC5x5 in first cycle and AUX in the following
> > > 2) AUX in first cycle and ASTC5x5 in the following
> >
> > YES we need to flush in both cases. What is happening is that the
> sampler hardware is bugged. Let us suppose it was bugged in only 1
> direction, take 1. Then if the sampler first samples from an ASTC5x5 then
> an AUX it would not hang, but the other way it would. However, if there are
> multiple draws in flight where one samples from an ASTC5x5 and the other
> does not, the command buffer order gives ZERO guarantee that the sampler
> will sample in that order because fragments get executed out-of-order even
> across draw calls even within a subslice (this is why sendc is needed at
> end of shader in GEN).
> >
>
> I need to clarify this bit a little more. In the current setup we have only
> one draw cycle in flight per context that uses sample engine. There may be
> blitter calls in parallel (although I'm not sure) but they don't use
> sampling
> engine anyway.
>
> The draw cycle itself may have multiple draws programmed into it but as
> they
> all are based on the same surface setup there is naturally no flushing
> problem.
> Surfaces with auxiliary would be resolved before the draw and programmed
> without auxiliary buffer.
>

The problem is that a single invalidate doesn't actually cause a
synchronization point in the rendering operation.  All it does is torch the
texture cache and it does so immediately and in parallel with currently
active rendering.  If you can't have ASTC5x5 in the texture cache with a
CCS_E surface, then we need to do a CS stall to ensure that the previous
draw is complete before we invalidate.  Otherwise, if the draw with ASTC5x5
is still in-flight, the texture cache will immediately start to get
populated with ASTC5x5 data again.
___
mesa-dev mailing list

Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Pohjolainen, Topi
On Tue, Dec 05, 2017 at 10:26:33AM +, Rogovin, Kevin wrote:
> Hi,
> 
> 
> >> Here are my comments of the patch posted:
> >> 
> >>  1.  it is essentially replication and moving around of the code of the 
> >> patch series posted earlier but missing various
> >>   important bits: preventing the sampler from using the auxiliary 
> >> buffer (this requires to modify surface state
> >>   sent in brw_wm_surfaces.c). It also does not cover blorp 
> >> sufficiently (blorp might read from an ASTC5x5
> >>   and there are more paths in blorp than blorp_surf_for_miptree() that 
> >> sample from surfaces.
> >> 
> 
> >Can you explain both more in detail? Resolves done in
> >brw_predraw_resolve_inputs() mean that there is nothing interesting in the 
> >aux buffers and surface setup won't therefore enable auxiliary for texture 
> >surfaces.
> 
> That there is nothing interesting is irrelevant to the sampler if the 
> SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the 
> SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the 
> GPU command; The sampler will always try to read the auxiliary buffer if it 
> is given the opportunity to do so. Indeed, it is quite feasible that less 
> bandwidth is consumed if the sampler is given the chance to read an auxiliary 
> buffer in place of the buffer; as such even if the surface is resolved one 
> may wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 
> programs to use the HiZ auxiliary buffer even if the depth buffer is fully 
> resolved (see inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c).
> 
> > In case of blorp, as far as I know all operations sampling something should 
> > go thru blorp_surf_for_miptree(). Can you point out cases that don't?
> 
> Blorp is used in more than blorp_surf_for_miptree(), for example implementing 
> GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 
> (you can see this handled in the patch series I posted). I chose the hammer 
> that the default is to just assume blorp is going to access auxiliary buffers 
> unless a flag is set when the caller knows that blorp is going to sample from 
> an astc5x5 (against see my patch series).
> 
> >Right. In the case of sampling both aux and astc5x5 in the same draw cycle 
> >the only thing to do is to disable aux. With my question of direction I 
> >meant the texture 
> > cache flush between two cycles. Do we need to flush in both cases
> > 1) ASTC5x5 in first cycle and AUX in the following
> > 2) AUX in first cycle and ASTC5x5 in the following
> 
> YES we need to flush in both cases. What is happening is that the sampler 
> hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. 
> Then if the sampler first samples from an ASTC5x5 then an AUX it would not 
> hang, but the other way it would. However, if there are multiple draws in 
> flight where one samples from an ASTC5x5 and the other does not, the command 
> buffer order gives ZERO guarantee that the sampler will sample in that order 
> because fragments get executed out-of-order even across draw calls even 
> within a subslice (this is why sendc is needed at end of shader in GEN).
> 

I need to clarify this bit a little more. In the current setup we have only
one draw cycle in flight per context that uses sample engine. There may be
blitter calls in parallel (although I'm not sure) but they don't use sampling
engine anyway.

The draw cycle itself may have multiple draws programmed into it but as they
all are based on the same surface setup there is naturally no flushing problem.
Surfaces with auxiliary would be resolved before the draw and programmed
without auxiliary buffer.

Are you saying that this bug extends over hardware context?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Rogovin, Kevin

> If you take a look at brw_update_texture_surface(), just in the end before
> brw_emit_surface_state() the logic explictly consults for 
> intel_miptree_texture_aux_usage(). 
> This in turn tells if the auxiliary buffer is resolved and it doesn't need to 
> be programmed.

The full stack trace is this:

 brw_update_texture_surface() calls intel_miptree_texture_usage() which for HiZ 
calls the function intel_miptree_sample_with_hiz() which, as the name suggest, 
returns true if and only if it is ok to use the HiZ to speed up depth texture 
reads. Indeed, the function calls intel_miptree_texture_usage() switches on 
intel_mipmap_tree::aux_usage which the documentation states is about the 
intended usage of the auxiliary buffer. Indeed, if the value is 
ISL_AUX_USAGE_NONE that means, quoting the comment tag above the field, "that 
auxiliary compression is permanently disabled". The conclusion then is that 
even if a buffer is fully resolved, the value of aux_usage is not 
ISL_AUX_USAGE_NONE and that the return value of calls 
intel_miptree_texture_usage() gives the return value assuming that the 
auxiliary buffer can be used with respect to that it holds good values enough 
for the sampler.

>This path goes thru brw_blorp_download_miptree() which in turn takes either
> brw_blorp_copy_miptrees() or brw_blorp_blit_miptrees(). Both in turn consult 
> blorp_surf_for_miptree().

If you are 100% sure (because I am not) that ALL blorp paths walk through that, 
then I have no real objection except it needs to do the magic of checking if it 
is reading from an ASTC5x5 or a surface with an auxiliary and update the 
enumeration appropriately.

> This would be a nice a piece of documentation to add into the code. Thanks 
> for explaining.

I can add this, though I do freely admit I take this for granted and an axiom 
of HW accelerated graphics.

>I don't see how the bit mask prevents one from forcing anything.

By not being able to encode such a state (both are present) such a state is 
impossible to store and cannot be reached. From the point of view of code, an 
enum is slightly more pleasant to read than bitmaks IMO.

> I now do see a flaw in the RFC related to this. In 
> brw_predraw_resolve_inputs() one would 
> actually need to smash down the recorded AUX mask bit when it reacts to 
> ASTC5x5 being present.

Indeed, really two passes are needed over the textures. The first pass to 
detect if ASTC5x5 is present and a second to resolve auxiliary using textures 
if they are present.

>Well, if nothing else, I would really like to see you used 
>brw_predraw_resolve_inputs() for the resolves.

I am happy with that as that walks through the textures anyways BUT it is 
called before brw_predraw_resolve_framebuffer() which might do some resolving 
too. The easy way out is to permute their calling order unless there is some 
other dangling assumption preventing permuting the call order of those two.

> 
> -Kevin
>  
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Pohjolainen, Topi
On Tue, Dec 05, 2017 at 01:00:28PM +0200, Pohjolainen, Topi wrote:
> On Tue, Dec 05, 2017 at 10:26:33AM +, Rogovin, Kevin wrote:
> > Hi,
> > 
> > 
> > >> Here are my comments of the patch posted:
> > >> 
> > >>  1.  it is essentially replication and moving around of the code of the 
> > >> patch series posted earlier but missing various
> > >>   important bits: preventing the sampler from using the auxiliary 
> > >> buffer (this requires to modify surface state
> > >>   sent in brw_wm_surfaces.c). It also does not cover blorp 
> > >> sufficiently (blorp might read from an ASTC5x5
> > >>   and there are more paths in blorp than blorp_surf_for_miptree() 
> > >> that sample from surfaces.
> > >> 
> > 
> > >Can you explain both more in detail? Resolves done in
> > >brw_predraw_resolve_inputs() mean that there is nothing interesting in the 
> > >aux buffers and surface setup won't therefore enable auxiliary for texture 
> > >surfaces.
> > 
> > That there is nothing interesting is irrelevant to the sampler if the 
> > SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the 
> > SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the 
> > GPU command; The sampler will always try to read the auxiliary buffer if it 
> > is given the opportunity to do so. Indeed, it is quite feasible that less 
> > bandwidth is consumed if the sampler is given the chance to read an 
> > auxiliary buffer in place of the buffer; as such even if the surface is 
> > resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for 
> > HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer 
> > is fully resolved (see inte_mipmap_tree_sample_with_hiz() in 
> > intel_mipmap_tree.c).
> 
> If you take a look at brw_update_texture_surface(), just in the end before
> brw_emit_surface_state() the logic explictly consults for
> intel_miptree_texture_aux_usage(). This in turn tells if the auxiliary buffer
> is resolved and it doesn't need to be programmed.
> 
> > 
> > > In case of blorp, as far as I know all operations sampling something 
> > > should go thru blorp_surf_for_miptree(). Can you point out cases that 
> > > don't?
> > 
> > Blorp is used in more than blorp_surf_for_miptree(), for example 
> > implementing GetTexImage(). Indeed, it is possible for blorp to sample from 
> > an ASTC5x5 (you can see this handled in the patch series I posted). I chose 
> > the hammer that the default is to just assume blorp is going to access 
> > auxiliary buffers unless a flag is set when the caller knows that blorp is 
> > going to sample from an astc5x5 (against see my patch series).
> 
> This path goes thru brw_blorp_download_miptree() which in turn takes either
> brw_blorp_copy_miptrees() or brw_blorp_blit_miptrees(). Both in turn consult
> blorp_surf_for_miptree().
> 
> > 
> > >Right. In the case of sampling both aux and astc5x5 in the same draw cycle 
> > >the only thing to do is to disable aux. With my question of direction I 
> > >meant the texture 
> > > cache flush between two cycles. Do we need to flush in both cases
> > > 1) ASTC5x5 in first cycle and AUX in the following
> > > 2) AUX in first cycle and ASTC5x5 in the following
> > 
> > YES we need to flush in both cases. What is happening is that the sampler 
> > hardware is bugged. Let us suppose it was bugged in only 1 direction, take 
> > 1. Then if the sampler first samples from an ASTC5x5 then an AUX it would 
> > not hang, but the other way it would. However, if there are multiple draws 
> > in flight where one samples from an ASTC5x5 and the other does not, the 
> > command buffer order gives ZERO guarantee that the sampler will sample in 
> > that order because fragments get executed out-of-order even across draw 
> > calls even within a subslice (this is why sendc is needed at end of shader 
> > in GEN).
> 
> This would be a nice a piece of documentation to add into the code. Thanks
> for explaining.
> 
> > 
> > >>  4. With 3 in mind, using the bit-masks is not a good idea as we want to 
> > >> then enforce at the code level
> > >>   that only one of the two is possible without texture invalidates.
> > > Can you elaborate this a little more? It tells if aux is/was used and it 
> > > tells if astc5x5 is/was used. That is all we need, right?
> > 
> > WRONG. We must enforce that a given draw call can have neither or only one. 
> > By having bitmasks it is possible to support a state having both.
> 
> I don't see how the bit mask prevents one from forcing anything. I now do see
> a flaw in the RFC related to this. In brw_predraw_resolve_inputs() one would
> actually need to smash down the recorded AUX mask bit when it reacts to 
> ASTC5x5
> being present.
> 
> > 
> > At any rate, please review the patch series I have posted and I am happy to 
> > take suggestions to improve that patch series that I have tested.
> 
> Well, if nothing else, I would really like to see you used
> brw_predraw_resolve_in

Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Pohjolainen, Topi
On Tue, Dec 05, 2017 at 10:26:33AM +, Rogovin, Kevin wrote:
> Hi,
> 
> 
> >> Here are my comments of the patch posted:
> >> 
> >>  1.  it is essentially replication and moving around of the code of the 
> >> patch series posted earlier but missing various
> >>   important bits: preventing the sampler from using the auxiliary 
> >> buffer (this requires to modify surface state
> >>   sent in brw_wm_surfaces.c). It also does not cover blorp 
> >> sufficiently (blorp might read from an ASTC5x5
> >>   and there are more paths in blorp than blorp_surf_for_miptree() that 
> >> sample from surfaces.
> >> 
> 
> >Can you explain both more in detail? Resolves done in
> >brw_predraw_resolve_inputs() mean that there is nothing interesting in the 
> >aux buffers and surface setup won't therefore enable auxiliary for texture 
> >surfaces.
> 
> That there is nothing interesting is irrelevant to the sampler if the 
> SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the 
> SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the 
> GPU command; The sampler will always try to read the auxiliary buffer if it 
> is given the opportunity to do so. Indeed, it is quite feasible that less 
> bandwidth is consumed if the sampler is given the chance to read an auxiliary 
> buffer in place of the buffer; as such even if the surface is resolved one 
> may wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 
> programs to use the HiZ auxiliary buffer even if the depth buffer is fully 
> resolved (see inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c).

If you take a look at brw_update_texture_surface(), just in the end before
brw_emit_surface_state() the logic explictly consults for
intel_miptree_texture_aux_usage(). This in turn tells if the auxiliary buffer
is resolved and it doesn't need to be programmed.

> 
> > In case of blorp, as far as I know all operations sampling something should 
> > go thru blorp_surf_for_miptree(). Can you point out cases that don't?
> 
> Blorp is used in more than blorp_surf_for_miptree(), for example implementing 
> GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 
> (you can see this handled in the patch series I posted). I chose the hammer 
> that the default is to just assume blorp is going to access auxiliary buffers 
> unless a flag is set when the caller knows that blorp is going to sample from 
> an astc5x5 (against see my patch series).

This path goes thru brw_blorp_download_miptree() which in turn takes either
brw_blorp_copy_miptrees() or brw_blorp_blit_miptrees(). Both in turn consult
blorp_surf_for_miptree().

> 
> >Right. In the case of sampling both aux and astc5x5 in the same draw cycle 
> >the only thing to do is to disable aux. With my question of direction I 
> >meant the texture 
> > cache flush between two cycles. Do we need to flush in both cases
> > 1) ASTC5x5 in first cycle and AUX in the following
> > 2) AUX in first cycle and ASTC5x5 in the following
> 
> YES we need to flush in both cases. What is happening is that the sampler 
> hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. 
> Then if the sampler first samples from an ASTC5x5 then an AUX it would not 
> hang, but the other way it would. However, if there are multiple draws in 
> flight where one samples from an ASTC5x5 and the other does not, the command 
> buffer order gives ZERO guarantee that the sampler will sample in that order 
> because fragments get executed out-of-order even across draw calls even 
> within a subslice (this is why sendc is needed at end of shader in GEN).

This would be a nice a piece of documentation to add into the code. Thanks
for explaining.

> 
> >>  4. With 3 in mind, using the bit-masks is not a good idea as we want to 
> >> then enforce at the code level
> >>   that only one of the two is possible without texture invalidates.
> > Can you elaborate this a little more? It tells if aux is/was used and it 
> > tells if astc5x5 is/was used. That is all we need, right?
> 
> WRONG. We must enforce that a given draw call can have neither or only one. 
> By having bitmasks it is possible to support a state having both.

I don't see how the bit mask prevents one from forcing anything. I now do see
a flaw in the RFC related to this. In brw_predraw_resolve_inputs() one would
actually need to smash down the recorded AUX mask bit when it reacts to ASTC5x5
being present.

> 
> At any rate, please review the patch series I have posted and I am happy to 
> take suggestions to improve that patch series that I have tested.

Well, if nothing else, I would really like to see you used
brw_predraw_resolve_inputs() for the resolves.

> 
> -Kevin
>  
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Rogovin, Kevin
Hi,


>> Here are my comments of the patch posted:
>> 
>>  1.  it is essentially replication and moving around of the code of the 
>> patch series posted earlier but missing various
>>   important bits: preventing the sampler from using the auxiliary buffer 
>> (this requires to modify surface state
>>   sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently 
>> (blorp might read from an ASTC5x5
>>   and there are more paths in blorp than blorp_surf_for_miptree() that 
>> sample from surfaces.
>> 

>Can you explain both more in detail? Resolves done in
>brw_predraw_resolve_inputs() mean that there is nothing interesting in the aux 
>buffers and surface setup won't therefore enable auxiliary for texture 
>surfaces.

That there is nothing interesting is irrelevant to the sampler if the 
SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the 
SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the GPU 
command; The sampler will always try to read the auxiliary buffer if it is 
given the opportunity to do so. Indeed, it is quite feasible that less 
bandwidth is consumed if the sampler is given the chance to read an auxiliary 
buffer in place of the buffer; as such even if the surface is resolved one may 
wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 programs 
to use the HiZ auxiliary buffer even if the depth buffer is fully resolved (see 
inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c).

> In case of blorp, as far as I know all operations sampling something should 
> go thru blorp_surf_for_miptree(). Can you point out cases that don't?

Blorp is used in more than blorp_surf_for_miptree(), for example implementing 
GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 (you 
can see this handled in the patch series I posted). I chose the hammer that the 
default is to just assume blorp is going to access auxiliary buffers unless a 
flag is set when the caller knows that blorp is going to sample from an astc5x5 
(against see my patch series).

>Right. In the case of sampling both aux and astc5x5 in the same draw cycle the 
>only thing to do is to disable aux. With my question of direction I meant the 
>texture 
> cache flush between two cycles. Do we need to flush in both cases
> 1) ASTC5x5 in first cycle and AUX in the following
> 2) AUX in first cycle and ASTC5x5 in the following

YES we need to flush in both cases. What is happening is that the sampler 
hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. 
Then if the sampler first samples from an ASTC5x5 then an AUX it would not 
hang, but the other way it would. However, if there are multiple draws in 
flight where one samples from an ASTC5x5 and the other does not, the command 
buffer order gives ZERO guarantee that the sampler will sample in that order 
because fragments get executed out-of-order even across draw calls even within 
a subslice (this is why sendc is needed at end of shader in GEN).

>>  4. With 3 in mind, using the bit-masks is not a good idea as we want to 
>> then enforce at the code level
>>   that only one of the two is possible without texture invalidates.
> Can you elaborate this a little more? It tells if aux is/was used and it 
> tells if astc5x5 is/was used. That is all we need, right?

WRONG. We must enforce that a given draw call can have neither or only one. By 
having bitmasks it is possible to support a state having both.

At any rate, please review the patch series I have posted and I am happy to 
take suggestions to improve that patch series that I have tested.

-Kevin
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Pohjolainen, Topi
On Tue, Dec 05, 2017 at 05:41:56AM +, Rogovin, Kevin wrote:
> Hi,
> 
>  The patch series I have submitted handles the case of needing to resolve 
> texture surfaces when a draw (or compute) accesses a texture which is 
> astc5x5. As it is quite clear you understand the issue and know the code of 
> i965 the patch centers on, you are an excellent person to review the code.
> 

We discussed earlier if we could handle the needed resolves in
brw_predraw_resolve_inputs(). I understood you had reservations on how it
would turn out and therefore I thought I try writing it out in code. I think
we could fit it there nicely.

I have to admit that I went quite a bit further. One thing I was looking for
was getting the code recording the needed bits into context closer to the
point consuming them. That looks doable as well, now reacting to the bits and
recording are both in gen9_astc5x5_sampler_wa(). It also becomes clear that
brw_predraw_resolve_inputs() does not depend on the bits in the context but
solely on the current texture settings.
This is important to me as I'm quite sure I'll be debugging things later on
where I need to pay attention to what happens with this workaround (among
other things).

> Here are my comments of the patch posted:
> 
>  1.  it is essentially replication and moving around of the code of the patch 
> series posted earlier but missing various
>   important bits: preventing the sampler from using the auxiliary buffer 
> (this requires to modify surface state
>   sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently 
> (blorp might read from an ASTC5x5
>   and there are more paths in blorp than blorp_surf_for_miptree() that 
> sample from surfaces.
> 

Can you explain both more in detail? Resolves done in
brw_predraw_resolve_inputs() mean that there is nothing interesting in the
aux buffers and surface setup won't therefore enable auxiliary for texture
surfaces.

In case of blorp, as far as I know all operations sampling something should go
thru blorp_surf_for_miptree(). Can you point out cases that don't?

>  2.  using the check that the GPU is gen9 (and for that matter, using the 
> name gen9_ astc5x5_sampler_wa())
>   is not ideal; I would not be surprised that the bug might not be 
> present on various re-spins of Gen9 and/or
>   it might linger on to future Gens. I do NOT know; using a Boolean 
> assigned earlier makes the code more
>   future-ish proof.

This is quite common style of ours, workaround has the name of the platform
it was first introduced in.

> 
>  3.  the nature of GPU fragment dispatch is going to require a texture 
> invalidate even if the sampler only
>   have the bug in one direction; this is because a subslice is not 
> guaranteed to process fragments in any
>   order. The crux is that a single sampler serves an entire sub-slice 
> which has more than 1 EU. It is quite
>   possible that one EU has threads of a draw call ahead of the other and 
> depending on the timing, portions
>   of those fragments' coming after might be processed by the sampler of 
> those before of those fragments
>   coming before in batchbuffer order. Indeed a single EU might have 
> threads from separate draws as well.
>   A texture invalidate places a barrier in the pipeline preventing the 
> mixing (and means that efficiency of 
>  GPU drops quite a bit with every texture invalidate sadly). 
> 

Right. In the case of sampling both aux and astc5x5 in the same draw cycle the
only thing to do is to disable aux. With my question of direction I meant the
texture cache flush between two cycles. Do we need to flush in both cases

1) ASTC5x5 in first cycle and AUX in the following
2) AUX in first cycle and ASTC5x5 in the following

>  4. With 3 in mind, using the bit-masks is not a good idea as we want to then 
> enforce at the code level
>   that only one of the two is possible without texture invalidates.

Can you elaborate this a little more? It tells if aux is/was used and it tells
if astc5x5 is/was used. That is all we need, right?

> 
> -Kevin
> 
> 
> -Original Message-
> From: Topi Pohjolainen [mailto:topi.pohjolai...@gmail.com] 
> Sent: Monday, December 4, 2017 12:49 PM
> To: mesa-dev@lists.freedesktop.org
> Cc: Rogovin, Kevin 
> Subject: [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
> 
> This is just drafting some thoughts and only compile tested.
> 
> CC: "Rogovin, Kevin" 
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.c   |  8 +
>  src/mesa/drivers/dri/i965/brw_context.h | 10 ++
>  src/mesa/drivers/dri/i965/brw_draw.c| 54 
> -
>  3 files changed, 71 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
> b/src/mesa/drivers/dri/i965/brw_blorp.c
> index 680121b6ab..b3f84ab8ca 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> @@ -186,11 +186,19 @@ blorp_surf_for_miptree(st

Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-04 Thread Rogovin, Kevin
Hi,

 The patch series I have submitted handles the case of needing to resolve 
texture surfaces when a draw (or compute) accesses a texture which is astc5x5. 
As it is quite clear you understand the issue and know the code of i965 the 
patch centers on, you are an excellent person to review the code.

Here are my comments of the patch posted:

 1.  it is essentially replication and moving around of the code of the patch 
series posted earlier but missing various
  important bits: preventing the sampler from using the auxiliary buffer 
(this requires to modify surface state
  sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently 
(blorp might read from an ASTC5x5
  and there are more paths in blorp than blorp_surf_for_miptree() that 
sample from surfaces.

 2.  using the check that the GPU is gen9 (and for that matter, using the name 
gen9_ astc5x5_sampler_wa())
  is not ideal; I would not be surprised that the bug might not be present 
on various re-spins of Gen9 and/or
  it might linger on to future Gens. I do NOT know; using a Boolean 
assigned earlier makes the code more
  future-ish proof.

 3.  the nature of GPU fragment dispatch is going to require a texture 
invalidate even if the sampler only
  have the bug in one direction; this is because a subslice is not 
guaranteed to process fragments in any
  order. The crux is that a single sampler serves an entire sub-slice which 
has more than 1 EU. It is quite
  possible that one EU has threads of a draw call ahead of the other and 
depending on the timing, portions
  of those fragments' coming after might be processed by the sampler of 
those before of those fragments
  coming before in batchbuffer order. Indeed a single EU might have threads 
from separate draws as well.
  A texture invalidate places a barrier in the pipeline preventing the 
mixing (and means that efficiency of 
 GPU drops quite a bit with every texture invalidate sadly). 

 4. With 3 in mind, using the bit-masks is not a good idea as we want to then 
enforce at the code level
  that only one of the two is possible without texture invalidates.

-Kevin


-Original Message-
From: Topi Pohjolainen [mailto:topi.pohjolai...@gmail.com] 
Sent: Monday, December 4, 2017 12:49 PM
To: mesa-dev@lists.freedesktop.org
Cc: Rogovin, Kevin 
Subject: [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

This is just drafting some thoughts and only compile tested.

CC: "Rogovin, Kevin" 
---
 src/mesa/drivers/dri/i965/brw_blorp.c   |  8 +
 src/mesa/drivers/dri/i965/brw_context.h | 10 ++
 src/mesa/drivers/dri/i965/brw_draw.c| 54 -
 3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
b/src/mesa/drivers/dri/i965/brw_blorp.c
index 680121b6ab..b3f84ab8ca 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -186,11 +186,19 @@ blorp_surf_for_miptree(struct brw_context *brw,
  surf->aux_addr.buffer = mt->hiz_buf->bo;
  surf->aux_addr.offset = mt->hiz_buf->offset;
   }
+
+  if (!is_render_target && brw->screen->devinfo.gen == 9)
+ gen9_astc5x5_sampler_wa(brw, GEN9_ASTC5X5_WA_TEX_TYPE_AUX);
} else {
   surf->aux_addr = (struct blorp_address) {
  .buffer = NULL,
   };
   memset(&surf->clear_color, 0, sizeof(surf->clear_color));
+
+  if (!is_render_target && brw->screen->devinfo.gen == 9 &&
+  (mt->format == MESA_FORMAT_RGBA_ASTC_5x5 ||
+   mt->format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5))
+ gen9_astc5x5_sampler_wa(brw, 
+ GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5);
}
assert((surf->aux_usage == ISL_AUX_USAGE_NONE) ==
   (surf->aux_addr.buffer == NULL)); diff --git 
a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 0670483806..44602c23c0 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -165,6 +165,11 @@ enum brw_cache_id {
BRW_MAX_CACHE
 };
 
+enum gen9_astc5x5_wa_tex_type {
+   GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5 = 1 << 0,
+   GEN9_ASTC5X5_WA_TEX_TYPE_AUX = 1 << 1,
+};
+
 enum brw_state_id {
/* brw_cache_ids must come first - see brw_program_cache.c */
BRW_STATE_URB_FENCE = BRW_MAX_CACHE, @@ -1262,6 +1267,8 @@ struct 
brw_context
 */
bool draw_aux_buffer_disabled[MAX_DRAW_BUFFERS];
 
+   enum gen9_astc5x5_wa_tex_type gen9_sampler_wa_tex_mask;
+
__DRIcontext *driContext;
struct intel_screen *screen;
 };
@@ -1286,6 +1293,9 @@ void intel_update_renderbuffers(__DRIcontext *context,
 __DRIdrawable *drawable);  void 
intel_prepare_render(struct brw_context *brw);
 
+void gen9_astc5x5_sampler_wa(struct brw_context *brw,
+ enum gen9_astc5x5_wa_tex_type curr_mask);
+
 void brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering