Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
>I'd have to think about it harder but even those are not likely to actually >get ASTC decode. Maybe for some sort of decompression thing but if you're >doing > GetCompressedTexImage, it'll probably turn into a blorp_copy which will use > R32G32B32A32_UINT. I am thinking for the case where an application calls glGetTexImage() or glGetTextureImage(), which I think is legal in GL and the GL implementation is expected to do the decompress. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
On Tue, Dec 5, 2017 at 10:34 AM, Rogovin, Kevin wrote: > Hi, > > >This isn't true. 100% of the intel_mipmap_tree -> blorp_surf > translations are handled by that function. > > It's a perfectly reasonable place to handle these things. It could > also be handled in genX(blorp_exec) if that makes someone more comfortable. > > This is where I placed the ASTC enumeration setting, in genX(blorp_exec). > I added a boolean signaling if the caller to blorp believed it would sample > from an ASTC, if it did it sets the enum as ASTC otherwise as AUX. > > I confess I am not super familiar with blorp, but as far as I can tell, > the only way for blorp to read an ASTC is for GetTexImage/GetTexSubImage > since an ASTC surface cannot be used for an FBO. > I'd have to think about it harder but even those are not likely to actually get ASTC decode. Maybe for some sort of decompression thing but if you're doing GetCompressedTexImage, it'll probably turn into a blorp_copy which will use R32G32B32A32_UINT. > > The problem is that a single invalidate doesn't actually cause a > synchronization point in the rendering operation. All it does is torch the > texture cache and it > > does so immediately and in parallel with currently active rendering. If > you can't have ASTC5x5 in the texture cache with a CCS_E surface, then we > need to > > do a CS stall to ensure that the previous draw is complete before we > invalidate. Otherwise, if the draw with ASTC5x5 is still in-flight, the > texture cache will > > immediately start to get populated with ASTC5x5 data again. > > Ahh futz, I forget to add that stall.. by luck the guilty application > worked anyways (when I first wrote the work I did intel_batchbuffer_flush() > and relaxed it down to texture invalidate). > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
Hi, >This isn't true. 100% of the intel_mipmap_tree -> blorp_surf translations are >handled by that function. > It's a perfectly reasonable place to handle these things. It could also be > handled in genX(blorp_exec) if that makes someone more comfortable. This is where I placed the ASTC enumeration setting, in genX(blorp_exec). I added a boolean signaling if the caller to blorp believed it would sample from an ASTC, if it did it sets the enum as ASTC otherwise as AUX. I confess I am not super familiar with blorp, but as far as I can tell, the only way for blorp to read an ASTC is for GetTexImage/GetTexSubImage since an ASTC surface cannot be used for an FBO. > The problem is that a single invalidate doesn't actually cause a > synchronization point in the rendering operation. All it does is torch the > texture cache and it > does so immediately and in parallel with currently active rendering. If you > can't have ASTC5x5 in the texture cache with a CCS_E surface, then we need to > do a CS stall to ensure that the previous draw is complete before we > invalidate. Otherwise, if the draw with ASTC5x5 is still in-flight, the > texture cache will > immediately start to get populated with ASTC5x5 data again. Ahh futz, I forget to add that stall.. by luck the guilty application worked anyways (when I first wrote the work I did intel_batchbuffer_flush() and relaxed it down to texture invalidate). ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
> Are you saying that this bug extends over hardware context? Different HW contexts imply different execbuffer2 ioctl's. The kernel inserts a full blown flush of everything after (or before, I cannot remember) each execbuffer2 call. This way there is context isolation in HW buggineness. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
On Tue, Dec 5, 2017 at 10:17 AM, Pohjolainen, Topi < topi.pohjolai...@gmail.com> wrote: > On Tue, Dec 05, 2017 at 10:26:33AM +, Rogovin, Kevin wrote: > > Hi, > > > > > > >> Here are my comments of the patch posted: > > >> > > >> 1. it is essentially replication and moving around of the code of > the patch series posted earlier but missing various > > >> important bits: preventing the sampler from using the auxiliary > buffer (this requires to modify surface state > > >> sent in brw_wm_surfaces.c). It also does not cover blorp > sufficiently (blorp might read from an ASTC5x5 > > >> and there are more paths in blorp than blorp_surf_for_miptree() > that sample from surfaces. > > >> > > > > >Can you explain both more in detail? Resolves done in > > >brw_predraw_resolve_inputs() mean that there is nothing interesting in > the aux buffers and surface setup won't therefore enable auxiliary for > texture surfaces. > > > > That there is nothing interesting is irrelevant to the sampler if the > SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the > SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the > GPU command; The sampler will always try to read the auxiliary buffer if it > is given the opportunity to do so. Indeed, it is quite feasible that less > bandwidth is consumed if the sampler is given the chance to read an > auxiliary buffer in place of the buffer; as such even if the surface is > resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for > HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer > is fully resolved (see inte_mipmap_tree_sample_with_hiz() in > intel_mipmap_tree.c). > We do have code which checks for the resolve state and disables the auxiliary buffer. However, I don't like relying on it for correctness as that's bitten us before. I think it'd be better to check the W/A state for ASTC5x5, assert that everything is resolved, and manually disable aux in that case. > > > In case of blorp, as far as I know all operations sampling something > should go thru blorp_surf_for_miptree(). Can you point out cases that don't? > > > > Blorp is used in more than blorp_surf_for_miptree(), for example > implementing GetTexImage(). Indeed, it is possible for blorp to sample from > an ASTC5x5 (you can see this handled in the patch series I posted). I chose > the hammer that the default is to just assume blorp is going to access > auxiliary buffers unless a flag is set when the caller knows that blorp is > going to sample from an astc5x5 (against see my patch series). > This isn't true. 100% of the intel_mipmap_tree -> blorp_surf translations are handled by that function. It's a perfectly reasonable place to handle these things. It could also be handled in genX(blorp_exec) if that makes someone more comfortable. > >Right. In the case of sampling both aux and astc5x5 in the same draw > cycle the only thing to do is to disable aux. With my question of direction > I meant the texture > > > cache flush between two cycles. Do we need to flush in both cases > > > 1) ASTC5x5 in first cycle and AUX in the following > > > 2) AUX in first cycle and ASTC5x5 in the following > > > > YES we need to flush in both cases. What is happening is that the > sampler hardware is bugged. Let us suppose it was bugged in only 1 > direction, take 1. Then if the sampler first samples from an ASTC5x5 then > an AUX it would not hang, but the other way it would. However, if there are > multiple draws in flight where one samples from an ASTC5x5 and the other > does not, the command buffer order gives ZERO guarantee that the sampler > will sample in that order because fragments get executed out-of-order even > across draw calls even within a subslice (this is why sendc is needed at > end of shader in GEN). > > > > I need to clarify this bit a little more. In the current setup we have only > one draw cycle in flight per context that uses sample engine. There may be > blitter calls in parallel (although I'm not sure) but they don't use > sampling > engine anyway. > > The draw cycle itself may have multiple draws programmed into it but as > they > all are based on the same surface setup there is naturally no flushing > problem. > Surfaces with auxiliary would be resolved before the draw and programmed > without auxiliary buffer. > The problem is that a single invalidate doesn't actually cause a synchronization point in the rendering operation. All it does is torch the texture cache and it does so immediately and in parallel with currently active rendering. If you can't have ASTC5x5 in the texture cache with a CCS_E surface, then we need to do a CS stall to ensure that the previous draw is complete before we invalidate. Otherwise, if the draw with ASTC5x5 is still in-flight, the texture cache will immediately start to get populated with ASTC5x5 data again. ___ mesa-dev mailing list
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
On Tue, Dec 05, 2017 at 10:26:33AM +, Rogovin, Kevin wrote: > Hi, > > > >> Here are my comments of the patch posted: > >> > >> 1. it is essentially replication and moving around of the code of the > >> patch series posted earlier but missing various > >> important bits: preventing the sampler from using the auxiliary > >> buffer (this requires to modify surface state > >> sent in brw_wm_surfaces.c). It also does not cover blorp > >> sufficiently (blorp might read from an ASTC5x5 > >> and there are more paths in blorp than blorp_surf_for_miptree() that > >> sample from surfaces. > >> > > >Can you explain both more in detail? Resolves done in > >brw_predraw_resolve_inputs() mean that there is nothing interesting in the > >aux buffers and surface setup won't therefore enable auxiliary for texture > >surfaces. > > That there is nothing interesting is irrelevant to the sampler if the > SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the > SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the > GPU command; The sampler will always try to read the auxiliary buffer if it > is given the opportunity to do so. Indeed, it is quite feasible that less > bandwidth is consumed if the sampler is given the chance to read an auxiliary > buffer in place of the buffer; as such even if the surface is resolved one > may wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 > programs to use the HiZ auxiliary buffer even if the depth buffer is fully > resolved (see inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c). > > > In case of blorp, as far as I know all operations sampling something should > > go thru blorp_surf_for_miptree(). Can you point out cases that don't? > > Blorp is used in more than blorp_surf_for_miptree(), for example implementing > GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 > (you can see this handled in the patch series I posted). I chose the hammer > that the default is to just assume blorp is going to access auxiliary buffers > unless a flag is set when the caller knows that blorp is going to sample from > an astc5x5 (against see my patch series). > > >Right. In the case of sampling both aux and astc5x5 in the same draw cycle > >the only thing to do is to disable aux. With my question of direction I > >meant the texture > > cache flush between two cycles. Do we need to flush in both cases > > 1) ASTC5x5 in first cycle and AUX in the following > > 2) AUX in first cycle and ASTC5x5 in the following > > YES we need to flush in both cases. What is happening is that the sampler > hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. > Then if the sampler first samples from an ASTC5x5 then an AUX it would not > hang, but the other way it would. However, if there are multiple draws in > flight where one samples from an ASTC5x5 and the other does not, the command > buffer order gives ZERO guarantee that the sampler will sample in that order > because fragments get executed out-of-order even across draw calls even > within a subslice (this is why sendc is needed at end of shader in GEN). > I need to clarify this bit a little more. In the current setup we have only one draw cycle in flight per context that uses sample engine. There may be blitter calls in parallel (although I'm not sure) but they don't use sampling engine anyway. The draw cycle itself may have multiple draws programmed into it but as they all are based on the same surface setup there is naturally no flushing problem. Surfaces with auxiliary would be resolved before the draw and programmed without auxiliary buffer. Are you saying that this bug extends over hardware context? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
> If you take a look at brw_update_texture_surface(), just in the end before > brw_emit_surface_state() the logic explictly consults for > intel_miptree_texture_aux_usage(). > This in turn tells if the auxiliary buffer is resolved and it doesn't need to > be programmed. The full stack trace is this: brw_update_texture_surface() calls intel_miptree_texture_usage() which for HiZ calls the function intel_miptree_sample_with_hiz() which, as the name suggest, returns true if and only if it is ok to use the HiZ to speed up depth texture reads. Indeed, the function calls intel_miptree_texture_usage() switches on intel_mipmap_tree::aux_usage which the documentation states is about the intended usage of the auxiliary buffer. Indeed, if the value is ISL_AUX_USAGE_NONE that means, quoting the comment tag above the field, "that auxiliary compression is permanently disabled". The conclusion then is that even if a buffer is fully resolved, the value of aux_usage is not ISL_AUX_USAGE_NONE and that the return value of calls intel_miptree_texture_usage() gives the return value assuming that the auxiliary buffer can be used with respect to that it holds good values enough for the sampler. >This path goes thru brw_blorp_download_miptree() which in turn takes either > brw_blorp_copy_miptrees() or brw_blorp_blit_miptrees(). Both in turn consult > blorp_surf_for_miptree(). If you are 100% sure (because I am not) that ALL blorp paths walk through that, then I have no real objection except it needs to do the magic of checking if it is reading from an ASTC5x5 or a surface with an auxiliary and update the enumeration appropriately. > This would be a nice a piece of documentation to add into the code. Thanks > for explaining. I can add this, though I do freely admit I take this for granted and an axiom of HW accelerated graphics. >I don't see how the bit mask prevents one from forcing anything. By not being able to encode such a state (both are present) such a state is impossible to store and cannot be reached. From the point of view of code, an enum is slightly more pleasant to read than bitmaks IMO. > I now do see a flaw in the RFC related to this. In > brw_predraw_resolve_inputs() one would > actually need to smash down the recorded AUX mask bit when it reacts to > ASTC5x5 being present. Indeed, really two passes are needed over the textures. The first pass to detect if ASTC5x5 is present and a second to resolve auxiliary using textures if they are present. >Well, if nothing else, I would really like to see you used >brw_predraw_resolve_inputs() for the resolves. I am happy with that as that walks through the textures anyways BUT it is called before brw_predraw_resolve_framebuffer() which might do some resolving too. The easy way out is to permute their calling order unless there is some other dangling assumption preventing permuting the call order of those two. > > -Kevin > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
On Tue, Dec 05, 2017 at 01:00:28PM +0200, Pohjolainen, Topi wrote: > On Tue, Dec 05, 2017 at 10:26:33AM +, Rogovin, Kevin wrote: > > Hi, > > > > > > >> Here are my comments of the patch posted: > > >> > > >> 1. it is essentially replication and moving around of the code of the > > >> patch series posted earlier but missing various > > >> important bits: preventing the sampler from using the auxiliary > > >> buffer (this requires to modify surface state > > >> sent in brw_wm_surfaces.c). It also does not cover blorp > > >> sufficiently (blorp might read from an ASTC5x5 > > >> and there are more paths in blorp than blorp_surf_for_miptree() > > >> that sample from surfaces. > > >> > > > > >Can you explain both more in detail? Resolves done in > > >brw_predraw_resolve_inputs() mean that there is nothing interesting in the > > >aux buffers and surface setup won't therefore enable auxiliary for texture > > >surfaces. > > > > That there is nothing interesting is irrelevant to the sampler if the > > SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the > > SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the > > GPU command; The sampler will always try to read the auxiliary buffer if it > > is given the opportunity to do so. Indeed, it is quite feasible that less > > bandwidth is consumed if the sampler is given the chance to read an > > auxiliary buffer in place of the buffer; as such even if the surface is > > resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for > > HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer > > is fully resolved (see inte_mipmap_tree_sample_with_hiz() in > > intel_mipmap_tree.c). > > If you take a look at brw_update_texture_surface(), just in the end before > brw_emit_surface_state() the logic explictly consults for > intel_miptree_texture_aux_usage(). This in turn tells if the auxiliary buffer > is resolved and it doesn't need to be programmed. > > > > > > In case of blorp, as far as I know all operations sampling something > > > should go thru blorp_surf_for_miptree(). Can you point out cases that > > > don't? > > > > Blorp is used in more than blorp_surf_for_miptree(), for example > > implementing GetTexImage(). Indeed, it is possible for blorp to sample from > > an ASTC5x5 (you can see this handled in the patch series I posted). I chose > > the hammer that the default is to just assume blorp is going to access > > auxiliary buffers unless a flag is set when the caller knows that blorp is > > going to sample from an astc5x5 (against see my patch series). > > This path goes thru brw_blorp_download_miptree() which in turn takes either > brw_blorp_copy_miptrees() or brw_blorp_blit_miptrees(). Both in turn consult > blorp_surf_for_miptree(). > > > > > >Right. In the case of sampling both aux and astc5x5 in the same draw cycle > > >the only thing to do is to disable aux. With my question of direction I > > >meant the texture > > > cache flush between two cycles. Do we need to flush in both cases > > > 1) ASTC5x5 in first cycle and AUX in the following > > > 2) AUX in first cycle and ASTC5x5 in the following > > > > YES we need to flush in both cases. What is happening is that the sampler > > hardware is bugged. Let us suppose it was bugged in only 1 direction, take > > 1. Then if the sampler first samples from an ASTC5x5 then an AUX it would > > not hang, but the other way it would. However, if there are multiple draws > > in flight where one samples from an ASTC5x5 and the other does not, the > > command buffer order gives ZERO guarantee that the sampler will sample in > > that order because fragments get executed out-of-order even across draw > > calls even within a subslice (this is why sendc is needed at end of shader > > in GEN). > > This would be a nice a piece of documentation to add into the code. Thanks > for explaining. > > > > > >> 4. With 3 in mind, using the bit-masks is not a good idea as we want to > > >> then enforce at the code level > > >> that only one of the two is possible without texture invalidates. > > > Can you elaborate this a little more? It tells if aux is/was used and it > > > tells if astc5x5 is/was used. That is all we need, right? > > > > WRONG. We must enforce that a given draw call can have neither or only one. > > By having bitmasks it is possible to support a state having both. > > I don't see how the bit mask prevents one from forcing anything. I now do see > a flaw in the RFC related to this. In brw_predraw_resolve_inputs() one would > actually need to smash down the recorded AUX mask bit when it reacts to > ASTC5x5 > being present. > > > > > At any rate, please review the patch series I have posted and I am happy to > > take suggestions to improve that patch series that I have tested. > > Well, if nothing else, I would really like to see you used > brw_predraw_resolve_in
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
On Tue, Dec 05, 2017 at 10:26:33AM +, Rogovin, Kevin wrote: > Hi, > > > >> Here are my comments of the patch posted: > >> > >> 1. it is essentially replication and moving around of the code of the > >> patch series posted earlier but missing various > >> important bits: preventing the sampler from using the auxiliary > >> buffer (this requires to modify surface state > >> sent in brw_wm_surfaces.c). It also does not cover blorp > >> sufficiently (blorp might read from an ASTC5x5 > >> and there are more paths in blorp than blorp_surf_for_miptree() that > >> sample from surfaces. > >> > > >Can you explain both more in detail? Resolves done in > >brw_predraw_resolve_inputs() mean that there is nothing interesting in the > >aux buffers and surface setup won't therefore enable auxiliary for texture > >surfaces. > > That there is nothing interesting is irrelevant to the sampler if the > SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the > SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the > GPU command; The sampler will always try to read the auxiliary buffer if it > is given the opportunity to do so. Indeed, it is quite feasible that less > bandwidth is consumed if the sampler is given the chance to read an auxiliary > buffer in place of the buffer; as such even if the surface is resolved one > may wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 > programs to use the HiZ auxiliary buffer even if the depth buffer is fully > resolved (see inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c). If you take a look at brw_update_texture_surface(), just in the end before brw_emit_surface_state() the logic explictly consults for intel_miptree_texture_aux_usage(). This in turn tells if the auxiliary buffer is resolved and it doesn't need to be programmed. > > > In case of blorp, as far as I know all operations sampling something should > > go thru blorp_surf_for_miptree(). Can you point out cases that don't? > > Blorp is used in more than blorp_surf_for_miptree(), for example implementing > GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 > (you can see this handled in the patch series I posted). I chose the hammer > that the default is to just assume blorp is going to access auxiliary buffers > unless a flag is set when the caller knows that blorp is going to sample from > an astc5x5 (against see my patch series). This path goes thru brw_blorp_download_miptree() which in turn takes either brw_blorp_copy_miptrees() or brw_blorp_blit_miptrees(). Both in turn consult blorp_surf_for_miptree(). > > >Right. In the case of sampling both aux and astc5x5 in the same draw cycle > >the only thing to do is to disable aux. With my question of direction I > >meant the texture > > cache flush between two cycles. Do we need to flush in both cases > > 1) ASTC5x5 in first cycle and AUX in the following > > 2) AUX in first cycle and ASTC5x5 in the following > > YES we need to flush in both cases. What is happening is that the sampler > hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. > Then if the sampler first samples from an ASTC5x5 then an AUX it would not > hang, but the other way it would. However, if there are multiple draws in > flight where one samples from an ASTC5x5 and the other does not, the command > buffer order gives ZERO guarantee that the sampler will sample in that order > because fragments get executed out-of-order even across draw calls even > within a subslice (this is why sendc is needed at end of shader in GEN). This would be a nice a piece of documentation to add into the code. Thanks for explaining. > > >> 4. With 3 in mind, using the bit-masks is not a good idea as we want to > >> then enforce at the code level > >> that only one of the two is possible without texture invalidates. > > Can you elaborate this a little more? It tells if aux is/was used and it > > tells if astc5x5 is/was used. That is all we need, right? > > WRONG. We must enforce that a given draw call can have neither or only one. > By having bitmasks it is possible to support a state having both. I don't see how the bit mask prevents one from forcing anything. I now do see a flaw in the RFC related to this. In brw_predraw_resolve_inputs() one would actually need to smash down the recorded AUX mask bit when it reacts to ASTC5x5 being present. > > At any rate, please review the patch series I have posted and I am happy to > take suggestions to improve that patch series that I have tested. Well, if nothing else, I would really like to see you used brw_predraw_resolve_inputs() for the resolves. > > -Kevin > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
Hi, >> Here are my comments of the patch posted: >> >> 1. it is essentially replication and moving around of the code of the >> patch series posted earlier but missing various >> important bits: preventing the sampler from using the auxiliary buffer >> (this requires to modify surface state >> sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently >> (blorp might read from an ASTC5x5 >> and there are more paths in blorp than blorp_surf_for_miptree() that >> sample from surfaces. >> >Can you explain both more in detail? Resolves done in >brw_predraw_resolve_inputs() mean that there is nothing interesting in the aux >buffers and surface setup won't therefore enable auxiliary for texture >surfaces. That there is nothing interesting is irrelevant to the sampler if the SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the GPU command; The sampler will always try to read the auxiliary buffer if it is given the opportunity to do so. Indeed, it is quite feasible that less bandwidth is consumed if the sampler is given the chance to read an auxiliary buffer in place of the buffer; as such even if the surface is resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer is fully resolved (see inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c). > In case of blorp, as far as I know all operations sampling something should > go thru blorp_surf_for_miptree(). Can you point out cases that don't? Blorp is used in more than blorp_surf_for_miptree(), for example implementing GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 (you can see this handled in the patch series I posted). I chose the hammer that the default is to just assume blorp is going to access auxiliary buffers unless a flag is set when the caller knows that blorp is going to sample from an astc5x5 (against see my patch series). >Right. In the case of sampling both aux and astc5x5 in the same draw cycle the >only thing to do is to disable aux. With my question of direction I meant the >texture > cache flush between two cycles. Do we need to flush in both cases > 1) ASTC5x5 in first cycle and AUX in the following > 2) AUX in first cycle and ASTC5x5 in the following YES we need to flush in both cases. What is happening is that the sampler hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. Then if the sampler first samples from an ASTC5x5 then an AUX it would not hang, but the other way it would. However, if there are multiple draws in flight where one samples from an ASTC5x5 and the other does not, the command buffer order gives ZERO guarantee that the sampler will sample in that order because fragments get executed out-of-order even across draw calls even within a subslice (this is why sendc is needed at end of shader in GEN). >> 4. With 3 in mind, using the bit-masks is not a good idea as we want to >> then enforce at the code level >> that only one of the two is possible without texture invalidates. > Can you elaborate this a little more? It tells if aux is/was used and it > tells if astc5x5 is/was used. That is all we need, right? WRONG. We must enforce that a given draw call can have neither or only one. By having bitmasks it is possible to support a state having both. At any rate, please review the patch series I have posted and I am happy to take suggestions to improve that patch series that I have tested. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
On Tue, Dec 05, 2017 at 05:41:56AM +, Rogovin, Kevin wrote: > Hi, > > The patch series I have submitted handles the case of needing to resolve > texture surfaces when a draw (or compute) accesses a texture which is > astc5x5. As it is quite clear you understand the issue and know the code of > i965 the patch centers on, you are an excellent person to review the code. > We discussed earlier if we could handle the needed resolves in brw_predraw_resolve_inputs(). I understood you had reservations on how it would turn out and therefore I thought I try writing it out in code. I think we could fit it there nicely. I have to admit that I went quite a bit further. One thing I was looking for was getting the code recording the needed bits into context closer to the point consuming them. That looks doable as well, now reacting to the bits and recording are both in gen9_astc5x5_sampler_wa(). It also becomes clear that brw_predraw_resolve_inputs() does not depend on the bits in the context but solely on the current texture settings. This is important to me as I'm quite sure I'll be debugging things later on where I need to pay attention to what happens with this workaround (among other things). > Here are my comments of the patch posted: > > 1. it is essentially replication and moving around of the code of the patch > series posted earlier but missing various > important bits: preventing the sampler from using the auxiliary buffer > (this requires to modify surface state > sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently > (blorp might read from an ASTC5x5 > and there are more paths in blorp than blorp_surf_for_miptree() that > sample from surfaces. > Can you explain both more in detail? Resolves done in brw_predraw_resolve_inputs() mean that there is nothing interesting in the aux buffers and surface setup won't therefore enable auxiliary for texture surfaces. In case of blorp, as far as I know all operations sampling something should go thru blorp_surf_for_miptree(). Can you point out cases that don't? > 2. using the check that the GPU is gen9 (and for that matter, using the > name gen9_ astc5x5_sampler_wa()) > is not ideal; I would not be surprised that the bug might not be > present on various re-spins of Gen9 and/or > it might linger on to future Gens. I do NOT know; using a Boolean > assigned earlier makes the code more > future-ish proof. This is quite common style of ours, workaround has the name of the platform it was first introduced in. > > 3. the nature of GPU fragment dispatch is going to require a texture > invalidate even if the sampler only > have the bug in one direction; this is because a subslice is not > guaranteed to process fragments in any > order. The crux is that a single sampler serves an entire sub-slice > which has more than 1 EU. It is quite > possible that one EU has threads of a draw call ahead of the other and > depending on the timing, portions > of those fragments' coming after might be processed by the sampler of > those before of those fragments > coming before in batchbuffer order. Indeed a single EU might have > threads from separate draws as well. > A texture invalidate places a barrier in the pipeline preventing the > mixing (and means that efficiency of > GPU drops quite a bit with every texture invalidate sadly). > Right. In the case of sampling both aux and astc5x5 in the same draw cycle the only thing to do is to disable aux. With my question of direction I meant the texture cache flush between two cycles. Do we need to flush in both cases 1) ASTC5x5 in first cycle and AUX in the following 2) AUX in first cycle and ASTC5x5 in the following > 4. With 3 in mind, using the bit-masks is not a good idea as we want to then > enforce at the code level > that only one of the two is possible without texture invalidates. Can you elaborate this a little more? It tells if aux is/was used and it tells if astc5x5 is/was used. That is all we need, right? > > -Kevin > > > -Original Message- > From: Topi Pohjolainen [mailto:topi.pohjolai...@gmail.com] > Sent: Monday, December 4, 2017 12:49 PM > To: mesa-dev@lists.freedesktop.org > Cc: Rogovin, Kevin > Subject: [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug > > This is just drafting some thoughts and only compile tested. > > CC: "Rogovin, Kevin" > --- > src/mesa/drivers/dri/i965/brw_blorp.c | 8 + > src/mesa/drivers/dri/i965/brw_context.h | 10 ++ > src/mesa/drivers/dri/i965/brw_draw.c| 54 > - > 3 files changed, 71 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c > b/src/mesa/drivers/dri/i965/brw_blorp.c > index 680121b6ab..b3f84ab8ca 100644 > --- a/src/mesa/drivers/dri/i965/brw_blorp.c > +++ b/src/mesa/drivers/dri/i965/brw_blorp.c > @@ -186,11 +186,19 @@ blorp_surf_for_miptree(st
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
Hi, The patch series I have submitted handles the case of needing to resolve texture surfaces when a draw (or compute) accesses a texture which is astc5x5. As it is quite clear you understand the issue and know the code of i965 the patch centers on, you are an excellent person to review the code. Here are my comments of the patch posted: 1. it is essentially replication and moving around of the code of the patch series posted earlier but missing various important bits: preventing the sampler from using the auxiliary buffer (this requires to modify surface state sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently (blorp might read from an ASTC5x5 and there are more paths in blorp than blorp_surf_for_miptree() that sample from surfaces. 2. using the check that the GPU is gen9 (and for that matter, using the name gen9_ astc5x5_sampler_wa()) is not ideal; I would not be surprised that the bug might not be present on various re-spins of Gen9 and/or it might linger on to future Gens. I do NOT know; using a Boolean assigned earlier makes the code more future-ish proof. 3. the nature of GPU fragment dispatch is going to require a texture invalidate even if the sampler only have the bug in one direction; this is because a subslice is not guaranteed to process fragments in any order. The crux is that a single sampler serves an entire sub-slice which has more than 1 EU. It is quite possible that one EU has threads of a draw call ahead of the other and depending on the timing, portions of those fragments' coming after might be processed by the sampler of those before of those fragments coming before in batchbuffer order. Indeed a single EU might have threads from separate draws as well. A texture invalidate places a barrier in the pipeline preventing the mixing (and means that efficiency of GPU drops quite a bit with every texture invalidate sadly). 4. With 3 in mind, using the bit-masks is not a good idea as we want to then enforce at the code level that only one of the two is possible without texture invalidates. -Kevin -Original Message- From: Topi Pohjolainen [mailto:topi.pohjolai...@gmail.com] Sent: Monday, December 4, 2017 12:49 PM To: mesa-dev@lists.freedesktop.org Cc: Rogovin, Kevin Subject: [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug This is just drafting some thoughts and only compile tested. CC: "Rogovin, Kevin" --- src/mesa/drivers/dri/i965/brw_blorp.c | 8 + src/mesa/drivers/dri/i965/brw_context.h | 10 ++ src/mesa/drivers/dri/i965/brw_draw.c| 54 - 3 files changed, 71 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c index 680121b6ab..b3f84ab8ca 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.c +++ b/src/mesa/drivers/dri/i965/brw_blorp.c @@ -186,11 +186,19 @@ blorp_surf_for_miptree(struct brw_context *brw, surf->aux_addr.buffer = mt->hiz_buf->bo; surf->aux_addr.offset = mt->hiz_buf->offset; } + + if (!is_render_target && brw->screen->devinfo.gen == 9) + gen9_astc5x5_sampler_wa(brw, GEN9_ASTC5X5_WA_TEX_TYPE_AUX); } else { surf->aux_addr = (struct blorp_address) { .buffer = NULL, }; memset(&surf->clear_color, 0, sizeof(surf->clear_color)); + + if (!is_render_target && brw->screen->devinfo.gen == 9 && + (mt->format == MESA_FORMAT_RGBA_ASTC_5x5 || + mt->format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5)) + gen9_astc5x5_sampler_wa(brw, + GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5); } assert((surf->aux_usage == ISL_AUX_USAGE_NONE) == (surf->aux_addr.buffer == NULL)); diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 0670483806..44602c23c0 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -165,6 +165,11 @@ enum brw_cache_id { BRW_MAX_CACHE }; +enum gen9_astc5x5_wa_tex_type { + GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5 = 1 << 0, + GEN9_ASTC5X5_WA_TEX_TYPE_AUX = 1 << 1, +}; + enum brw_state_id { /* brw_cache_ids must come first - see brw_program_cache.c */ BRW_STATE_URB_FENCE = BRW_MAX_CACHE, @@ -1262,6 +1267,8 @@ struct brw_context */ bool draw_aux_buffer_disabled[MAX_DRAW_BUFFERS]; + enum gen9_astc5x5_wa_tex_type gen9_sampler_wa_tex_mask; + __DRIcontext *driContext; struct intel_screen *screen; }; @@ -1286,6 +1293,9 @@ void intel_update_renderbuffers(__DRIcontext *context, __DRIdrawable *drawable); void intel_prepare_render(struct brw_context *brw); +void gen9_astc5x5_sampler_wa(struct brw_context *brw, + enum gen9_astc5x5_wa_tex_type curr_mask); + void brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering