Re: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering
Hi, > You are completely missing the point. Any test which tests the > GL_ARB_fragment_shader_interlock extension with a > beginInvocationInterlockARB() call inside of control-flow would be an > invalid test for GL_ARB_fragment_shader_interlock since that behavior is > undefined. Therefore, any such test would be a bad test. Unless we have > tests which test beginFragmentShaderOrderingINTEL() inside non-uniform > control-flow, it is insufficiently tested. Therefore, any tests we may > have for GL_ARB_fragment_shader_interlock are either invalid use of the ARB > extension or they are insufficient to test the INTEL extension. My apologies, I misunderstood your question "is that tested?"; I thought the question was about if the assentation that the current implementation of beginFragmentShaderOrderingINTEL() and beginInvocationInterlockARB() were functionally interchangeable has been verified/tested. FWIW, a test where beginInvocationInterlockARB() is called within any control flow or for that matter not called from main(), the shader is supposed to be rejected according the spec. In a negative way, it is another test. Though, while acknowledging Ian's point that "works for vendor X" has caused a great deal of GLSL fragmentation, I still personally prefer to implementation to try to take what applications throw at them as much as reasonable possible. However, that is not my decision to make and whatever the community chooses is what it shall be. I agree with you that taking the current ARB test and then adding some non-uniform flow control (along with removing the layout() qualifiers) to it is the best way to go and I *think* Plamena has offered to do so. At this point, I can just let for it all to sort out and bow out of the discussion at this point as the series seems to have brought additional issues up that are out-of-scope for what I had in mind with the series (namely something small-simple to expose the extension and not enforcing the limitation of the ARB extension). -Kevin -Original Message- From: Rogovin, Kevin Sent: Tuesday, August 28, 2018 10:10 PM To: 'mesa-dev@lists.freedesktop.org' Cc: ja...@jlekstrand.net Subject: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering Hi, > Is that tested? I have tested it in a simple shader test I made (i.e. not in a dedicated test suite such as dEQP, piglit or something else). The created assembly is identical. The specific example is a shader where I replace calls of beginFragmentShaderOrderingINTEL() with beginInvocationInterlockARB() and the created assembly is the same. Running with INTEL_DEBUG=fs produced identical output except the SSA NIR had the different opcode. AFAIK, the current Mesa implementation of the ARB extension does not in any way shape or form enforce any of "no control flow", "only once and only in main" requirements. Indeed, Mesa happily accepts code even without the endInvocationInterlockARB() call as well. Personally, I am happy that it is more accepting rather than rejecting the code. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering
Hi, > Is that tested? I have tested it in a simple shader test I made (i.e. not in a dedicated test suite such as dEQP, piglit or something else). The created assembly is identical. The specific example is a shader where I replace calls of beginFragmentShaderOrderingINTEL() with beginInvocationInterlockARB() and the created assembly is the same. Running with INTEL_DEBUG=fs produced identical output except the SSA NIR had the different opcode. AFAIK, the current Mesa implementation of the ARB extension does not in any way shape or form enforce any of "no control flow", "only once and only in main" requirements. Indeed, Mesa happily accepts code even without the endInvocationInterlockARB() call as well. Personally, I am happy that it is more accepting rather than rejecting the code. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering
Hi, > Having the same underlying compiler intrinsic and having the same behavior > are not the same thing. The INTEL extension allows strictly more > functionality than the ARB extension so it needs even more testing. In > particular, it allows you to do the barrier in non-uniform control-flow > which is a very different thing than at the top of a shader like is allowed > by ARB_fragment_shader_interlock. I expect the INTEL extension to need > substantially more testing than the ARB one. Just an FYI: the Mesa implementation for the ARB (and thus NV) extension accept shaders where the begin function takes place inside of control flow actually. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [1/2] mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering.
Hi all, Patch addressing the missing enum warning is here: https://lists.freedesktop.org/archives/mesa-dev/2018-August/203796.html . Also, see my reply to Francisco why I think it is better to have a new intrinsic function for that. Best Regards, -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [1/2] mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering.
Hi, Sighs; I missed that warning on the build since there is so much build output. I can issue a small patch to handle the missing enum. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering
Hi, Sorry for the doubly reply, but I wanted to add one more important bit in my thinking in addition to doing minimal code changes and following existing convention. The ARB/NV extensions defined a critical section where as the INTEL extension is just a barrier function. I suspect (but I do not know) that the NVIDIA hardware that supports the extension is implemented by some weird interfacing logic with the pixel backend and that implementation requires that there is just one critical section under no control flow whereas the INTEL is just a memory barrier. In that light, I think they should be 3 different intrinsics with the thinking that if the Nouveau driver adds support for the ARB/NV extension it will do the IR analysis to do what is needed for the critical-section style extension. -Kevin From: Rogovin, Kevin Sent: Tuesday, August 28, 2018 7:05 PM To: mesa-dev@lists.freedesktop.org Cc: Jerez Plata, Francisco Subject: Re: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering Hi, I followed the convention that was already present. The code from ARB/NV_fragment_shader_interlock had an intrinsic for begin critical section and an intrinsic for end critical section. I figured that since this extension has a completely different thinking (i.e. a memory barrier instead of a section) it warranted a new intrinsic. That and doing this way incurred the least amount of changes which I figured is always a good thing. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering
Hi, On the questions of tests: If people want, I can adapt the test in piglit for ARB_fragment_shader_interlock to this INTEL one. In general, I have an app/library that uses the extension and testing of that does definitely work on i965 (which should be utterly unsurprising since the produced assembly is identical). Indeed the current implementation in Mesa of ARB_fragment_shader_interlock is as flexible as the INTEL one since the compiler accepts code with the begin/end interlock anywhere since the only backend that supports it, i965, can easily accept it. I view the INTEL_fragment_shader_ordering implementation in a similar light as the NV_fragment_shader_interlock where it actually does not really do anything new, but signals to an application that the Mesa/i965 implementation allows more (and does it correctly) than the ARB/NV extensions restrict to. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering
Hi, I followed the convention that was already present. The code from ARB/NV_fragment_shader_interlock had an intrinsic for begin critical section and an intrinsic for end critical section. I figured that since this extension has a completely different thinking (i.e. a memory barrier instead of a section) it warranted a new intrinsic. That and doing this way incurred the least amount of changes which I figured is always a good thing. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] i965: prevent using auxilary buffers when an astc5x5 texture is present
Hi, I will just pipe in on the scenario that needs to be checked and ways around the scenario as well. Firstly, What Tapani has below is what I first wrote as well when developing a v4, it is cleaner and from that point of view I prefer it as well. However, the following scenario made me have reservations: 1. The final patch in the series has a simple test if the src in blorp_params HAS an auxiliary buffer, rather than a test if the sampler will read the surface using the auxiliary buffer. Note that on a simple blit from one texture that has an auxiliary (for example lossless color compression) to another surface that blorp needs to use the auxiliary buffer, this case is likely to happen in glBlitFramebuffer). 2. If brw_resolve_framebuffer() triggers a blorp call (for example render to and read from a same texture ala GL_ARB_texture_barrier), then blorp MIGHT set the mode to BRW_ASTC5x5_WA_MODE_HAS_AUX (even though the resolve does not have the sampler read from the surface in the way that helps trigger the hang). However, then if the mode was set in brw_resolve_inputs() to HAS_ASTC5x5, then that information is lost. The state upload for surfaces, because it uses the mode value, will then let auxiliary buffers be used by the sampler. I do not know the internals of blorp well; if it is the case that the check made in patch 3/3 for having an auxiliary buffer is false in the case of the blorp calls triggered by brw_resolve_framebuffer(), then what Tapani posted is better; if that is not the case, then a nasty, hard to track bug will then lurk. -Kevin -Original Message- From: Palli, Tapani Sent: Friday, March 16, 2018 11:32 AM To: mesa-dev@lists.freedesktop.org Cc: ja...@jlekstrand.net; Rogovin, Kevin ; Palli, Tapani Subject: [PATCH v2] i965: prevent using auxilary buffers when an astc5x5 texture is present From: Kevin Rogovin If ASTC5x5 textures are present, resolve all textures that the sampler accesses so that auxilary buffer is unneeded when the astc5x5 workaround is needed and also program the sampler state to not use the auxilary buffer as well. v2: call gen9_set_astc5x5_wa_mode directly in brw_predraw_resolve_inputs (Tapani) Signed-off-by: Kevin Rogovin Signed-off-by: Tapani Pälli --- I've gone through this patch and wanted to express my thoughts as v2 rather than review. I've also discussed with Kevin offline about a plan to build a proper Piglit test to hit this and attempt to hit blorp as well (currently nothing really hitting that one). src/mesa/drivers/dri/i965/brw_draw.c | 64 +--- src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 12 - 2 files changed, 68 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c index ef7c6a7df9..0b4e993576 100644 --- a/src/mesa/drivers/dri/i965/brw_draw.c +++ b/src/mesa/drivers/dri/i965/brw_draw.c @@ -392,6 +392,8 @@ mark_textures_used_for_txf(BITSET_WORD *used_for_txf, * * Resolve the depth buffer's HiZ buffer, resolve the depth buffer of each * enabled depth texture, and flush the render cache for any dirty textures. + * In addition, if the ASTC5x5 workaround is needed and if ASTC5x5 + textures + * are present, resolve textures so that auxilary buffers are not needed. */ void brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering, @@ -412,8 +414,33 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering, mark_textures_used_for_txf(used_for_txf, ctx->ComputeProgram._Current); } - /* Resolve depth buffer and render cache of each enabled texture. */ int maxEnabledUnit = ctx->Texture._MaxEnabledTexImageUnit; + bool disable_aux = false; + bool texture_astc5x5_present = false; + bool texture_with_auxilary_present = false; + + if (gen9_astc5x5_wa_required(brw)) { + /* walk through all the texture units to see if an ASTC5x5 and/or + * a texture with an auxilary buffer is to be accessed. + */ + for (int i = 0; i <= maxEnabledUnit; i++) { + if (!ctx->Texture.Unit[i]._Current) +continue; + tex_obj = intel_texture_object(ctx->Texture.Unit[i]._Current); + if (!tex_obj) +continue; + if (tex_obj->mt && tex_obj->mt->aux_usage != ISL_AUX_USAGE_NONE) { +texture_with_auxilary_present = true; + } + if (tex_obj->_Format == MESA_FORMAT_RGBA_ASTC_5x5 || + tex_obj->_Format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5) { +texture_astc5x5_present = true; + } + } + disable_aux = texture_astc5x5_present; + } + + /* Resolve depth buffer and render cache of each enabled texture. */ for (int i = 0; i <= maxEnabledUnit; i++) { if (!ctx->Texture.Unit[i]._Current) continue; @@ -445,9 +472,16 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool ren
Re: [Mesa-dev] [PATCH v3 2/7] i965: restore diable_aux argument to intel_miptree_prepare_texture()
Just to make sure, you want it so that a new field is added to brw_context that tells intel_miptree_prepare_texture() to disable auxilary buffer using? That variable would then need to be carefully set and tracked carefully if anything besids brw_draw uses the function. This appears to be opposite request than that what was given in patch 1 (that request I thought was good). FWIW, there is now a work in progress taking these review requests into use https://github.com/krogueintel/asem/tree/astc5x5-wa-v4; at this point in time this issue and the blorp issues remain in that branch. -Kevin From: Jason Ekstrand [mailto:ja...@jlekstrand.net] Sent: Tuesday, March 6, 2018 6:00 PM To: Rogovin, Kevin Cc: ML mesa-dev Subject: Re: [Mesa-dev] [PATCH v3 2/7] i965: restore diable_aux argument to intel_miptree_prepare_texture() We took it out with good reason... I'd rather we do something similar to what we did for render targets and just pass aux usage directly from brw_predraw_resolve_inputs to brw_wm_surface_state through the context. On Tue, Feb 27, 2018 at 1:30 AM, mailto:kevin.rogo...@intel.com>> wrote: From: Kevin Rogovin mailto:kevin.rogo...@intel.com>> Signed-off-by: Kevin Rogovin mailto:kevin.rogo...@intel.com>> --- src/mesa/drivers/dri/i965/brw_draw.c | 9 ++--- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 5 +++-- src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 3 ++- 3 files changed, 11 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c index 299e7f9..0241035 100644 --- a/src/mesa/drivers/dri/i965/brw_draw.c +++ b/src/mesa/drivers/dri/i965/brw_draw.c @@ -447,7 +447,8 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering, intel_miptree_prepare_texture(brw, tex_obj->mt, view_format, min_level, num_levels, -min_layer, num_layers); +min_layer, num_layers, +false); /* If any programs are using it with texelFetch, we may need to also do * a prepare with an sRGB format to ensure texelFetch works "properly". @@ -458,7 +459,8 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering, if (txf_format != view_format) { intel_miptree_prepare_texture(brw, tex_obj->mt, txf_format, min_level, num_levels, - min_layer, num_layers); + min_layer, num_layers, + false); } } @@ -530,7 +532,8 @@ brw_predraw_resolve_framebuffer(struct brw_context *brw, if (irb) { intel_miptree_prepare_texture(brw, irb->mt, irb->mt->surf.format, irb->mt_level, 1, - irb->mt_layer, irb->layer_count); + irb->mt_layer, irb->layer_count, + false); } } } diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index c6213b2..dbd9f7a 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -2648,9 +2648,10 @@ intel_miptree_prepare_texture(struct brw_context *brw, struct intel_mipmap_tree *mt, enum isl_format view_format, uint32_t start_level, uint32_t num_levels, - uint32_t start_layer, uint32_t num_layers) + uint32_t start_layer, uint32_t num_layers, + bool disable_aux) { - enum isl_aux_usage aux_usage = + enum isl_aux_usage aux_usage = (disable_aux) ? ISL_AUX_USAGE_NONE : intel_miptree_texture_aux_usage(brw, mt, view_format); bool clear_supported = aux_usage != ISL_AUX_USAGE_NONE; diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h index 07c8580..ee72309 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h @@ -642,7 +642,8 @@ intel_miptree_prepare_texture(struct brw_context *brw, struct intel_mipmap_tree *mt, enum isl_format view_format, uint32_t start_level, uint32_t num_levels, - uint32_t start_layer, uint32_t num_layers); + uint32_t start_layer, uint32_t num_layers, + bool disable_aux); void intel_miptree_prepare_image(struct brw_context *brw,
Re: [Mesa-dev] [PATCH v3 7/7] i965: ASTC5x5 workaround logic for blorp
What would be a good place for this in the blorp code? I am guessing the only time that blorp would trigger an astc read is in blitting… what would you suggest for keeping it inside of blorp? The blorp_context struct is part of blorp not i965… to keep it in blorp would mean it needs to be passed down along the function calls somehow … -Kevin From: Jason Ekstrand [mailto:ja...@jlekstrand.net] Sent: Tuesday, March 6, 2018 6:07 PM To: Rogovin, Kevin Cc: ML mesa-dev Subject: Re: [Mesa-dev] [PATCH v3 7/7] i965: ASTC5x5 workaround logic for blorp On Tue, Feb 27, 2018 at 1:30 AM, mailto:kevin.rogo...@intel.com>> wrote: From: Kevin Rogovin mailto:kevin.rogo...@intel.com>> Signed-off-by: Kevin Rogovin mailto:kevin.rogo...@intel.com>> --- src/mesa/drivers/dri/i965/genX_blorp_exec.c | 5 + src/mesa/drivers/dri/i965/intel_tex_image.c | 16 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/genX_blorp_exec.c b/src/mesa/drivers/dri/i965/genX_blorp_exec.c index 062171a..b7c35e9 100644 --- a/src/mesa/drivers/dri/i965/genX_blorp_exec.c +++ b/src/mesa/drivers/dri/i965/genX_blorp_exec.c @@ -230,6 +230,11 @@ genX(blorp_exec)(struct blorp_batch *batch, struct gl_context *ctx = &brw->ctx; bool check_aperture_failed_once = false; + if (brw->astc5x5_wa.blorp_sampling_from_astc5x5) { + gen9_set_astc5x5_wa_mode(brw, BRW_ASTC5x5_WA_MODE_HAS_ASTC5x5); + } else { + gen9_set_astc5x5_wa_mode(brw, BRW_ASTC5x5_WA_MODE_HAS_AUX); + } /* Flush the sampler and render caches. We definitely need to flush the * sampler cache so that we get updated contents from the render cache for * the glBlitFramebuffer() source. Also, we are sometimes warned in the diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c b/src/mesa/drivers/dri/i965/intel_tex_image.c index e25bc9a..38967fb 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_image.c +++ b/src/mesa/drivers/dri/i965/intel_tex_image.c @@ -824,10 +824,18 @@ intel_get_tex_sub_image(struct gl_context *ctx, DBG("%s\n", __func__); if (_mesa_is_bufferobj(ctx->Pack.BufferObj)) { - if (intel_gettexsubimage_blorp(brw, texImage, - xoffset, yoffset, zoffset, - width, height, depth, format, type, - pixels, &ctx->Pack)) + bool blorp_success; + + brw->astc5x5_wa.blorp_sampling_from_astc5x5 = + (texImage->TexFormat == MESA_FORMAT_RGBA_ASTC_5x5 || + texImage->TexFormat == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5); + blorp_success = intel_gettexsubimage_blorp(brw, texImage, + xoffset, yoffset, zoffset, + width, height, depth, + format, type, pixels, + &ctx->Pack); + brw->astc5x5_wa.blorp_sampling_from_astc5x5 = false; + if (blorp_success) Let's keep all of the code for detecting whether or not BLORP needs and implementing it together in BLORP. We can just as easily detect ASTC5x5 there as here. return; perf_debug("%s: fallback to CPU mapping in PBO case\n", __func__); -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org<mailto:mesa-dev@lists.freedesktop.org> https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 3/7] i965: set ASTC5x5 workaround texture type tracking on texture validate
Originally, I had the entire astc and aux checking in a dedicated function; would that be more preferable? -Kevin From: Jason Ekstrand [mailto:ja...@jlekstrand.net] Sent: Tuesday, March 6, 2018 6:02 PM To: Rogovin, Kevin Cc: ML mesa-dev Subject: Re: [Mesa-dev] [PATCH v3 3/7] i965: set ASTC5x5 workaround texture type tracking on texture validate I don't really like this being in texture validation. It has nothing to do with texture validation. Instead, we should put it in brw_predraw_resolve_inputs and do it as a pre-pass. This would let us store astc5x5_present as a local variable. Also, I don't think we need texture_with_auxiliary_present for anything since our solution is to resolve those. On Tue, Feb 27, 2018 at 1:30 AM, mailto:kevin.rogo...@intel.com>> wrote: From: Kevin Rogovin mailto:kevin.rogo...@intel.com>> Signed-off-by: Kevin Rogovin mailto:kevin.rogo...@intel.com>> --- src/mesa/drivers/dri/i965/intel_tex_validate.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/src/mesa/drivers/dri/i965/intel_tex_validate.c b/src/mesa/drivers/dri/i965/intel_tex_validate.c index eaa60ba..2bf6c65 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_validate.c +++ b/src/mesa/drivers/dri/i965/intel_tex_validate.c @@ -179,6 +179,8 @@ brw_validate_textures(struct brw_context *brw) struct gl_context *ctx = &brw->ctx; const int max_enabled_unit = ctx->Texture._MaxEnabledTexImageUnit; + brw->astc5x5_wa.texture_astc5x5_present = false; + brw->astc5x5_wa.texture_with_auxilary_present = false; for (int unit = 0; unit <= max_enabled_unit; unit++) { struct gl_texture_object *tex_obj = ctx->Texture.Unit[unit]._Current; @@ -194,5 +196,18 @@ brw_validate_textures(struct brw_context *brw) intel_update_max_level(tex_obj, sampler); intel_finalize_mipmap_tree(brw, tex_obj); + + /* ASTC5x5 workaround needs to know if textures in use have + * auxilary in buffers and/or a texture in use is ASTC5x5 + */ + struct intel_texture_object *tex = intel_texture_object(tex_obj); + struct intel_mipmap_tree *mt = tex->mt; + if (mt && mt->aux_usage != ISL_AUX_USAGE_NONE) { + brw->astc5x5_wa.texture_with_auxilary_present = true; + } + if (tex->_Format == MESA_FORMAT_RGBA_ASTC_5x5 || + tex->_Format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5) { + brw->astc5x5_wa.texture_astc5x5_present = true; + } } } -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org<mailto:mesa-dev@lists.freedesktop.org> https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround
Hi, For those interested, I made a version of the series that does not change intel_miptree_texture_aux_usage(), but instead restores the disable_aux argument to intel_miptree_prepare_texture() and uses that. What I do not like is that it forced one to have several little things tweaked here and there to make it well whereas in contrast hitting intel_miptree_texture_aux_usage() put it in one place exactly. The branch is: https://github.com/krogueintel/asem/tree/astc5x5-wa-v3 . I took a look at blorp's using of intel_miptree_prepare_texture(). It uses it in only one place: when blitting (no surprise). Having the approach of changing intel_miptree_prepare_texture() will have blorp potentially not use auxiliary buffers on blitting; functionality will NOT regress because blorp angelically calls intel_miptree_prepare_access(), but it might degrade performance. Specifically: blorp when reading form the source surface will not use the auxiliary surface if the last draw/compute had astc5x5 texture. At this point, the choice is essentially: - cleaner and simpler code by hitting of intel_miptree_prepare_texture() at cost of potential performance hit of where a blit follows a draw (or compute) that uses an ASTC5x5 texture OR - ickier code without that one cost in blorp. I honestly suspect that one will never find an application that has a performance difference between the two unless one deliberately makes such an application. -Kevin -Original Message- From: Palli, Tapani Sent: Wednesday, February 14, 2018 9:58 AM To: Rogovin, Kevin ; Jason Ekstrand Cc: ML mesa-dev Subject: Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround On 14.02.2018 09:54, Tapani Pälli wrote: > > > On 14.02.2018 09:38, Rogovin, Kevin wrote: >> Hi, >> >> The 3rd patch, "i965: use ASTC5x5 workaround in >> intel_miptree_texture_aux_usage has issues: >> 1. Definitely: brw_draw lacks the call to >> gen9_astc5x5_perform_wa() which generates the needed flush between >> batchbuffers > > Now it happens via intel_miptree_prepare_texture (in > intel_miptree_texture_aux_usage). No it does not, sorry for that :) Yeah I believe I need to restore that. > >> 2. Uneasy: I am nervous about hitting >> intel_miptree_texture_aux_usage() as it is used in lots of places >> directly and indirectly. The interaction with blorp resolve makes me >> very uneasy I would rather restore the bool argument disable_aux >> to intel_miptree_prepare_render() to keep the ASTC evil a little more >> localized (the function is used only in brw_draw.c for resolving inputs). > > IMO it feels more centralized .. but no strong opinions here. I > haven't spotted any regressions because of this. > > >> -Kevin >> >> >> -Original Message- >> From: Palli, Tapani >> Sent: Monday, February 12, 2018 10:14 AM >> To: Rogovin, Kevin ; Jason Ekstrand >> >> Cc: ML mesa-dev >> Subject: Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround >> >> >> >> On 02/12/2018 09:44 AM, Tapani Pälli wrote: >>> Hi; >>> >>> On 02/08/2018 09:50 AM, Rogovin, Kevin wrote: >>>> Hi, >>>> >>>> I gave it a whirl of setting the .mocs field set to 0 passed to >>>> isl_surf_fill_state() ALWAYS. Sadly CarChase GLES continued to hang >>>> (where as the GL did not because it does not use ASTC). This makes >>>> sense since MOCS (atleast last time I looked at it) only really >>>> controls cache usage for L3 and eLLC (please anyone correct me if I >>>> am wrong in this) whereas the issue is that the samplers mess up >>>> how they deal with its own (private) cache. >>>> >>>> It really is nasty that it appears (as of now) that this >>>> complicated work around is needed and needs to somehow be >>>> re-implemented in anv as well. >>> >>> It seems surrounding code has changed so that these patches need >>> some changes. Kevin, are you planning to rebase/refactor these changes? >> >> FYI I've rebased the patches and did additional porting (because of >> commit df13588d21) here: >> >> https://cgit.freedesktop.org/~tpalli/mesa/log/?h=astc5x5 >> >> Let me know if this looks OK for you. >> >>> >>>> -Kevin >>>> >>>> *From:*Jason Ekstrand [mailto:ja...@jlekstrand.net] >>>> *Sent:* Thursday, February 8, 2018 2:47 AM >>>> *To:* Rogovin, Kevin >>>> *Cc:* ML mesa-dev >>>> *Subject:* Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround >>>> >>>> Random thought:
Re: [Mesa-dev] [PATCH v3] intel/tools: new intel_sanitize_gpu tool
Reviewed by: kevin.rogovin [at] intel.com -Original Message- From: Phillips, Scott D Sent: Friday, February 9, 2018 3:11 AM To: mesa-dev@lists.freedesktop.org; Rogovin, Kevin Subject: [PATCH v3] intel/tools: new intel_sanitize_gpu tool From: Kevin Rogovin Adds a new debug tool to pad each GEM BO allocated with (weak) pseudo-random noise values which are then checked after each batchbuffer dispatch to the kernel. This can be quite valuable to find diffucult to track down heisenberg style bugs. [scott.d.phill...@intel.com: split to separate tool] v2: (by Scott D Phillips) - track gem handles per fd (Kevin) - remove handles on GEM_CLOSE (Kevin) - ignore prime handles - meson & shell script v3: (by Scott D Phillips) - don't track prime bos at all (Kevin) - protect the hash table with a mutex (Kevin) - hook fds by drm_version.name, not path (Chris Wilson) --- src/intel/tools/intel_sanitize_gpu.c | 430 ++ src/intel/tools/intel_sanitize_gpu.in | 4 + src/intel/tools/meson.build | 25 ++ 3 files changed, 459 insertions(+) create mode 100644 src/intel/tools/intel_sanitize_gpu.c create mode 100755 src/intel/tools/intel_sanitize_gpu.in diff --git a/src/intel/tools/intel_sanitize_gpu.c b/src/intel/tools/intel_sanitize_gpu.c new file mode 100644 index 000..9b49b0bbf22 --- /dev/null +++ b/src/intel/tools/intel_sanitize_gpu.c @@ -0,0 +1,430 @@ +/* + * Copyright © 2015-2018 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person +obtaining a + * copy of this software and associated documentation files (the +"Software"), + * to deal in the Software without restriction, including without +limitation + * the rights to use, copy, modify, merge, publish, distribute, +sublicense, + * and/or sell copies of the Software, and to permit persons to whom +the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the +next + * paragraph) shall be included in all copies or substantial portions +of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT +SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR +OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, +ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS + * IN THE SOFTWARE. + */ + +#undef _FILE_OFFSET_BITS /* prevent #define open open64 */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "util/hash_table.h" + +#define INTEL_LOG_TAG "INTEL-SANITIZE-GPU" +#include "common/intel_log.h" +#include "common/gen_clflush.h" + +static int (*libc_open)(const char *pathname, int flags, mode_t mode); +static int (*libc_close)(int fd); static int (*libc_ioctl)(int fd, +unsigned long request, void *argp); static int (*libc_fcntl)(int fd, +int cmd, int param); + +#define DRM_MAJOR 226 + +/* TODO: we want to make sure that the padding forces + * the BO to take another page on the (PP)GTT; 4KB + * may or may not be the page size for the BO. Indeed, + * depending on GPU, kernel version and GEM size, the + * page size can be one of 4KB, 64KB or 2M. + */ +#define PADDING_SIZE 4096 + +struct refcnt_hash_table { + struct hash_table *t; + int refcnt; +}; + +pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; +#define MUTEX_LOCK() do {\ + if (unlikely(pthread_mutex_lock(&mutex))) { \ + intel_loge("mutex_lock failed"); \ + abort(); \ + } \ +} while (0) +#define MUTEX_UNLOCK() do { \ + if (unlikely(pthread_mutex_unlock(&mutex))) { \ + intel_loge("mutex_unlock failed"); \ + abort(); \ + } \ +} while (0) + +static struct hash_table *fds_to_bo_sizes = NULL; + +static inline struct hash_table* +bo_size_table(int fd) +{ + struct hash_entry *e = _mesa_hash_table_search(fds_to_bo_sizes, + (void*)(uintptr_t)fd); + return e ? ((struct refcnt_hash_table*)e->data)->t : NULL; } + +static inline uint64_t +bo_size(int fd, uint32_t handle) +{ + struct hash_table *t = bo_size_table(fd); + if (!t) + return UINT64_MAX; + struct hash_entry *e = _mesa_hash_table_search(t, (void*)(uintptr_t)handle); + return e ? (uint64_t)e->data : UINT64_MAX; } + +static inline bool +is_dr
Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround
Hi, The 3rd patch, "i965: use ASTC5x5 workaround in intel_miptree_texture_aux_usage has issues: 1. Definitely: brw_draw lacks the call to gen9_astc5x5_perform_wa() which generates the needed flush between batchbuffers 2. Uneasy: I am nervous about hitting intel_miptree_texture_aux_usage() as it is used in lots of places directly and indirectly. The interaction with blorp resolve makes me very uneasy I would rather restore the bool argument disable_aux to intel_miptree_prepare_render() to keep the ASTC evil a little more localized (the function is used only in brw_draw.c for resolving inputs). -Kevin -Original Message- From: Palli, Tapani Sent: Monday, February 12, 2018 10:14 AM To: Rogovin, Kevin ; Jason Ekstrand Cc: ML mesa-dev Subject: Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround On 02/12/2018 09:44 AM, Tapani Pälli wrote: > Hi; > > On 02/08/2018 09:50 AM, Rogovin, Kevin wrote: >> Hi, >> >> I gave it a whirl of setting the .mocs field set to 0 passed to >> isl_surf_fill_state() ALWAYS. Sadly CarChase GLES continued to hang >> (where as the GL did not because it does not use ASTC). This makes >> sense since MOCS (atleast last time I looked at it) only really >> controls cache usage for L3 and eLLC (please anyone correct me if I >> am wrong in this) whereas the issue is that the samplers mess up how >> they deal with its own (private) cache. >> >> It really is nasty that it appears (as of now) that this complicated >> work around is needed and needs to somehow be re-implemented in anv >> as well. > > It seems surrounding code has changed so that these patches need some > changes. Kevin, are you planning to rebase/refactor these changes? FYI I've rebased the patches and did additional porting (because of commit df13588d21) here: https://cgit.freedesktop.org/~tpalli/mesa/log/?h=astc5x5 Let me know if this looks OK for you. > >> -Kevin >> >> *From:*Jason Ekstrand [mailto:ja...@jlekstrand.net] >> *Sent:* Thursday, February 8, 2018 2:47 AM >> *To:* Rogovin, Kevin >> *Cc:* ML mesa-dev >> *Subject:* Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround >> >> Random thought: >> >> Nanley and I were talking about this just now and I was complaining >> about how much I hate the fact that this workaround exists because we >> can't implement it in Vulkan. Then I got an idea. What would happen >> if we just set MOCS to zero (uncached) for ASTC 5x5 textures? Does >> that make the hang go away? How bad is the car chase performance >> with that compared to this series? It's a bit of a big hammer but >> has the advantage of simplicity. If it causes performance to tank on >> anything then then the more complex solution is probably worth it but >> I thought it was worth a try. >> >> --Jason >> >> On Thu, Dec 14, 2017 at 9:39 AM, > <mailto:kevin.rogo...@intel.com>> wrote: >> >> From: Kevin Rogovin > <mailto:kevin.rogo...@intel.com>> >> >> This patch series implements a needed workaround for Gen9 for >> ASTC5x5 >> sampler reads. The crux of the work around is to make sure that >> the >> sampler does not read an ASTC5x5 texture and a surface with an >> auxilary >> buffer without having a texture cache invalidate and command >> streamer >> stall between such accesses. >> >> With this patch series applied to the (current) master branch of >> mesa, >> carchase works on my SKL GT4. >> >> v2: >> Rename workaround functions from brw_ to gen9_ >> (suggested/requested by Topi Pohjolainen). >> >> Place texture resolve to avoid using auxilary surface >> when ASTC5x5 is detected in brw_predraw_resolve_inputs() >> instead of another detected function; doing so allows >> one to avoid walking the textures again. >> (suggested/requested by Topi Pohjolainen). >> >> Emit command streamer stall in addition to texture >> invalidate. >> (original short-coming caught by Jason Ekstrand) >> >> Place workaround function in (new) dedicated file. >> >> Minor path re-ordering to accomodate changes. >> >> Kevin Rogovin (5): >> i965: define astx5x5 workaround infrastructure >> i965: set ASTC5x5 workaround texture type tracking on texture >> validate >> i965: use ASTC5x5 workaround in brw_draw >> i965: use ASTC5x5 workaround in brw_compute >> i965
Re: [Mesa-dev] [PATCH v2] intel/tools: new intel_sanitize_gpu tool
HI, Review comments below. -Original Message- From: Phillips, Scott D Sent: Thursday, February 8, 2018 2:19 AM To: mesa-dev@lists.freedesktop.org; Rogovin, Kevin Subject: [PATCH v2] intel/tools: new intel_sanitize_gpu tool From: Kevin Rogovin Adds a new debug tool to pad each GEM BO allocated with (weak) pseudo-random noise values which are then checked after each batchbuffer dispatch to the kernel. This can be quite valuable to find diffucult to track down heisenberg style bugs. [scott.d.phill...@intel.com: split to separate tool] v2: (by Scott D Phillips) - track gem handles per fd (Kevin) - remove handles on GEM_CLOSE (Kevin) - ignore prime handles - meson & shell script --- src/intel/tools/intel_sanitize_gpu.c | 399 ++ src/intel/tools/intel_sanitize_gpu.in | 4 + src/intel/tools/meson.build | 25 +++ 3 files changed, 428 insertions(+) create mode 100644 src/intel/tools/intel_sanitize_gpu.c create mode 100755 src/intel/tools/intel_sanitize_gpu.in diff --git a/src/intel/tools/intel_sanitize_gpu.c b/src/intel/tools/intel_sanitize_gpu.c new file mode 100644 index 000..84e9196da31 --- /dev/null +++ b/src/intel/tools/intel_sanitize_gpu.c @@ -0,0 +1,399 @@ +/* + * Copyright © 2015-2018 Intel Corporation + * +static int +prime_fd(int fd, struct drm_prime_handle *handle) { + int ret = libc_ioctl(fd, DRM_IOCTL_PRIME_FD_TO_HANDLE, handle); + if (ret != 0) + return ret; + + /* +* Can't pad prime buffers, mark it as size=UINT64_MAX and we'll +* skip it in verification. +*/ + _mesa_hash_table_insert(bo_size_table(fd), (void*)(uintptr_t)handle->handle, + (void*)(uintptr_t)UINT64_MAX); + + return 0; +} Rather than tracking such GEM handles, just ignore them entirely since one cannot pad them at all. This is the same story as for USER_PTR GEM BO's as well. + +__attribute__ ((visibility ("default"))) int open(const char *path, int +flags, ...) { + va_list args; + mode_t mode; + + va_start(args, flags); + mode = va_arg(args, int); + va_end(args); + + int fd = libc_open(path, flags, mode); + + if (fd >= 0 && strcmp(path, "/dev/dri/renderD128") == 0) + add_drm_fd(fd); + + return fd; +} + +__attribute__ ((visibility ("default"), alias ("open"))) int +open64(const char *path, int flags, ...); + +__attribute__ ((visibility ("default"))) int close(int fd) { + if (is_drm_fd(fd)) + del_drm_fd(fd); + + return libc_close(fd); +} + +__attribute__ ((visibility ("default"))) int fcntl(int fd, int cmd, +...) { + va_list args; + int param; + + va_start(args, cmd); + param = va_arg(args, int); + va_end(args); + + int res = libc_fcntl(fd, cmd, param); + + if (is_drm_fd(fd) && cmd == F_DUPFD_CLOEXEC) + dup_drm_fd(fd, res); + + return res; +} + +__attribute__ ((visibility ("default"))) int ioctl(int fd, unsigned +long request, ...) { + va_list args; + void *argp; + struct stat buf; + + va_start(args, request); + argp = va_arg(args, void *); + va_end(args); + + if (_IOC_TYPE(request) == DRM_IOCTL_BASE && + !is_drm_fd(fd) && fstat(fd, &buf) == 0 && + (buf.st_mode & S_IFMT) == S_IFCHR && major(buf.st_rdev) == DRM_MAJOR) { + intel_loge("missed drm fd %d", fd); + add_drm_fd(fd); + } + + if (is_drm_fd(fd)) { + switch (request) { + case DRM_IOCTL_GEM_CLOSE: + return gem_close(fd, (struct drm_gem_close*)argp); + + case DRM_IOCTL_PRIME_FD_TO_HANDLE: + return prime_fd(fd, (struct drm_prime_handle*)argp); + + case DRM_IOCTL_I915_GEM_CREATE: + return create_with_padding(fd, (struct + drm_i915_gem_create*)argp); + + case DRM_IOCTL_I915_GEM_EXECBUFFER2: + case DRM_IOCTL_I915_GEM_EXECBUFFER2_WR: + return exec_and_check_padding(fd, request, + (struct + drm_i915_gem_execbuffer2*)argp); + + default: + break; + } + } + return libc_ioctl(fd, request, argp); } + One needs to mutex lock the internal structures of the created .so to correctly handle simultaneous ioctl's coming from multi-threaded applications. I suggest putting that common mutex lock in the functions open(), close() and ioctl(). ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround
Hi, I gave it a whirl of setting the .mocs field set to 0 passed to isl_surf_fill_state() ALWAYS. Sadly CarChase GLES continued to hang (where as the GL did not because it does not use ASTC). This makes sense since MOCS (atleast last time I looked at it) only really controls cache usage for L3 and eLLC (please anyone correct me if I am wrong in this) whereas the issue is that the samplers mess up how they deal with its own (private) cache. It really is nasty that it appears (as of now) that this complicated work around is needed and needs to somehow be re-implemented in anv as well. -Kevin From: Jason Ekstrand [mailto:ja...@jlekstrand.net] Sent: Thursday, February 8, 2018 2:47 AM To: Rogovin, Kevin Cc: ML mesa-dev Subject: Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround Random thought: Nanley and I were talking about this just now and I was complaining about how much I hate the fact that this workaround exists because we can't implement it in Vulkan. Then I got an idea. What would happen if we just set MOCS to zero (uncached) for ASTC 5x5 textures? Does that make the hang go away? How bad is the car chase performance with that compared to this series? It's a bit of a big hammer but has the advantage of simplicity. If it causes performance to tank on anything then then the more complex solution is probably worth it but I thought it was worth a try. --Jason On Thu, Dec 14, 2017 at 9:39 AM, mailto:kevin.rogo...@intel.com>> wrote: From: Kevin Rogovin mailto:kevin.rogo...@intel.com>> This patch series implements a needed workaround for Gen9 for ASTC5x5 sampler reads. The crux of the work around is to make sure that the sampler does not read an ASTC5x5 texture and a surface with an auxilary buffer without having a texture cache invalidate and command streamer stall between such accesses. With this patch series applied to the (current) master branch of mesa, carchase works on my SKL GT4. v2: Rename workaround functions from brw_ to gen9_ (suggested/requested by Topi Pohjolainen). Place texture resolve to avoid using auxilary surface when ASTC5x5 is detected in brw_predraw_resolve_inputs() instead of another detected function; doing so allows one to avoid walking the textures again. (suggested/requested by Topi Pohjolainen). Emit command streamer stall in addition to texture invalidate. (original short-coming caught by Jason Ekstrand) Place workaround function in (new) dedicated file. Minor path re-ordering to accomodate changes. Kevin Rogovin (5): i965: define astx5x5 workaround infrastructure i965: set ASTC5x5 workaround texture type tracking on texture validate i965: use ASTC5x5 workaround in brw_draw i965: use ASTC5x5 workaround in brw_compute i965: ASTC5x5 workaround logic for blorp src/mesa/drivers/dri/i965/Makefile.sources | 1 + src/mesa/drivers/dri/i965/brw_compute.c | 6 src/mesa/drivers/dri/i965/brw_context.c | 6 src/mesa/drivers/dri/i965/brw_context.h | 24 src/mesa/drivers/dri/i965/brw_draw.c | 16 +-- src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 5 src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c | 36 src/mesa/drivers/dri/i965/genX_blorp_exec.c | 5 src/mesa/drivers/dri/i965/intel_batchbuffer.c| 1 + src/mesa/drivers/dri/i965/intel_tex_image.c | 16 --- src/mesa/drivers/dri/i965/intel_tex_validate.c | 13 + src/mesa/drivers/dri/i965/meson.build| 1 + 12 files changed, 124 insertions(+), 6 deletions(-) create mode 100644 src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org<mailto:mesa-dev@lists.freedesktop.org> https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH RFC] intel/tools: new intel_sanitize_gpu tool
Hi, Two comments on the code you sent: 1. the map to find the size of a given GEM needs to be keyed by (fd, gem_handle) and not just the gem_handle; if an application (though very unlikely) has multiple fd's to the GPU (for example if an application made multiple connections to X, or if the application is using OpenGL and OpenCL (or libva) ). 2. I would also handle the destroy GEM ioctl to remove its entry from the map. -Kevin -Original Message- From: Phillips, Scott D Sent: Wednesday, February 7, 2018 2:35 AM To: mesa-dev@lists.freedesktop.org; Rogovin, Kevin Subject: [PATCH RFC] intel/tools: new intel_sanitize_gpu tool From: Kevin Rogovin Adds a new debug tool to pad each GEM BO allocated with (weak) pseudo-random noise values which are then checked after each batchbuffer dispatch to the kernel. This can be quite valuable to find diffucult to track down heisenberg style bugs. [scott.d.phill...@intel.com: split to separate tool] --- Hi Kevin, here's another take on your out-of-bounds write detection code, packaged as a separate tool instead of in the driver code. Doing it this way could make it easier to use across i965 and anv (although USERPTR is needed for anv support). The build bits of this are less than half baked at the moment. What do you think? src/intel/tools/intel_sanitize_gpu.c | 306 + src/intel/tools/meson.build| 10 ++ src/intel/tools/run_intel_sanitize_gpu | 5 + 3 files changed, 321 insertions(+) create mode 100644 src/intel/tools/intel_sanitize_gpu.c create mode 100755 src/intel/tools/run_intel_sanitize_gpu diff --git a/src/intel/tools/intel_sanitize_gpu.c b/src/intel/tools/intel_sanitize_gpu.c new file mode 100644 index 000..d74492bb02e --- /dev/null +++ b/src/intel/tools/intel_sanitize_gpu.c @@ -0,0 +1,306 @@ +/* + * Copyright © 2015-2018 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person +obtaining a + * copy of this software and associated documentation files (the +"Software"), + * to deal in the Software without restriction, including without +limitation + * the rights to use, copy, modify, merge, publish, distribute, +sublicense, + * and/or sell copies of the Software, and to permit persons to whom +the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the +next + * paragraph) shall be included in all copies or substantial portions +of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT +SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR +OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, +ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS + * IN THE SOFTWARE. + */ + +#undef _FILE_OFFSET_BITS /* prevent #define open open64 */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "util/hash_table.h" +#include "util/set.h" + +#define INTEL_LOG_TAG "INTEL-SANITIZE-GPU" +#include "common/intel_log.h" +#include "common/gen_clflush.h" + +static int (*libc_open)(const char *pathname, int flags, mode_t mode); +static int (*libc_close)(int fd); static int (*libc_ioctl)(int fd, +unsigned long request, void *argp); static int (*libc_fcntl)(int fd, +int cmd, int param); + +#define DRM_MAJOR 226 + +/* TODO: we want to make sure that the padding forces + * the BO to take another page on the (PP)GTT; 4KB + * may or may not be the page size for the BO. Indeed, + * depending on GPU, kernel version and GEM size, the + * page size can be one of 4KB, 64KB or 2M. + */ +#define PADDING_SIZE 4096 + +static struct set *drm_fds = NULL; +static struct hash_table *bo_sizes = NULL; + +#define IS_DRM_FD(fd) (_mesa_set_search(drm_fds, \ +(void*)(uintptr_t)(fd)) != NULL) +#define BO_SIZE(handle) ({ \ + struct hash_entry *e = _mesa_hash_table_search(bo_sizes, \ + (void*)(uintptr_t)(handle)); \ + e ? (uintptr_t)e->data : 0; \ +}) + +/* Our goal is not to have noise good enough for cryto, + * but instead values that are unique-ish enough that + * it is incredibly unlikely that a buffer overwrite + * will produce the exact same values. + */ +static uint8_t +next_noise_value(uint8_t prev_noise) +{ + uint32_t v = prev_noise; + return (v * 103u + 227u) &
Re: [Mesa-dev] [PATCH RFC] intel/tools: new intel_sanitize_gpu tool
HI Scott, I like the idea of making this a pre-loadable .so as it lets it get reused for anything using the kernel interface using GEM's. However, there are a few bits, that if addressed, would make this perfect IMO: 1. It needs a way of being able to label the GEMs (brw_bufmgr.c provides this). The current code below will just tell us which ioctl had a bad thing happen to a GEM BO, but it won't tell us which GEM easily. Admittedly a debug session will tell us the handle of the GEM, but then it is quite icky to figure what the GEM BO was for (OpenGL BO, scratch BO, etc.) 2. In an ideal world, the .so will work perfectly with other pre-loadable .so's that intercept ioctl as well (for example aub or the batchbuffer logger I made); I freely admit that I am a touch fuzzy on the LD_PRELOAD rules, but it looks like one cannot use this tool at the same time as any other tool that tries to intercept ioctl's (though to be fair, those tools as well don't work with others either I think). However, if one moves away from requiring at as an .so, but instead use it as something to allocate (and string-label) GEM BO's, then it could work fine; the interface would be something like: uint32_t //allocate with padding intel_allocate_gem(int fd, uint64_t size, const char *gem_label); uint32_t //delete gem created with intel_allocate_gem() intel_delete_gem_bo(int fd, uint32_t gem_handle); void //needed when GEM BO's are reused for different purposes intel_change_gem_label(int fd, uint32_t gem_handle, const char *new_gem_label); bool //returns true if gem padding is good OR if gem was not created with intel_allocate_gem() intel_gem_padding_good(int fd, uint32_t gem_handle); void intel_execbuffer2(int fd, struct drm_i915_gem_execbuffer2 *p); void intel_execbuffer2_wr(int fd, struct drm_i915_gem_execbuffer2 *p); The implementation would have a map keyed by the pair (fd, gem_handle) with value of size of GEM and padding size. If the debug flag for checking is down, then all the lookups and necessary locking is skipped so the thing would have (almost) no overhead when the option was not active. One question that pops into my mind: is it possible to use brw_bufmgr of i965 in anv? If so, then we get reuse with anv for free. -Kevin -Original Message- From: Phillips, Scott D Sent: Wednesday, February 7, 2018 2:35 AM To: mesa-dev@lists.freedesktop.org; Rogovin, Kevin Subject: [PATCH RFC] intel/tools: new intel_sanitize_gpu tool From: Kevin Rogovin Adds a new debug tool to pad each GEM BO allocated with (weak) pseudo-random noise values which are then checked after each batchbuffer dispatch to the kernel. This can be quite valuable to find diffucult to track down heisenberg style bugs. [scott.d.phill...@intel.com: split to separate tool] --- Hi Kevin, here's another take on your out-of-bounds write detection code, packaged as a separate tool instead of in the driver code. Doing it this way could make it easier to use across i965 and anv (although USERPTR is needed for anv support). The build bits of this are less than half baked at the moment. What do you think? src/intel/tools/intel_sanitize_gpu.c | 306 + src/intel/tools/meson.build| 10 ++ src/intel/tools/run_intel_sanitize_gpu | 5 + 3 files changed, 321 insertions(+) create mode 100644 src/intel/tools/intel_sanitize_gpu.c create mode 100755 src/intel/tools/run_intel_sanitize_gpu diff --git a/src/intel/tools/intel_sanitize_gpu.c b/src/intel/tools/intel_sanitize_gpu.c new file mode 100644 index 000..d74492bb02e --- /dev/null +++ b/src/intel/tools/intel_sanitize_gpu.c @@ -0,0 +1,306 @@ +/* + * Copyright © 2015-2018 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person +obtaining a + * copy of this software and associated documentation files (the +"Software"), + * to deal in the Software without restriction, including without +limitation + * the rights to use, copy, modify, merge, publish, distribute, +sublicense, + * and/or sell copies of the Software, and to permit persons to whom +the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the +next + * paragraph) shall be included in all copies or substantial portions +of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT +SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR +OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, +ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +DEALINGS + * IN THE SOFTWARE. + */ + +#undef _FILE_OFFSET_BITS /* prev
Re: [Mesa-dev] [PATCH v3 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct
Hi, Thanks, so the items that need to be fixed are: Patch 1: use the better name for the macro value to better match the string Patch 2: either use pread/pwrite for both set and check noise or use map for both (I will use map) Patch 3: fine as is. Apparently, the mesa-dev archive is acting like /dev/null again as the patch series, your review and discussion have disappeared. I will post a v4 shortly unless there are any additional shortcomings that need to be addressed. -Kevin -Original Message- From: Jason Ekstrand [mailto:ja...@jlekstrand.net] Sent: Monday, January 29, 2018 6:41 PM To: Rogovin, Kevin Subject: RE: [PATCH v3 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct Nope. That one looked fine as-is. On January 28, 2018 23:13:40 "Rogovin, Kevin" wrote: > Any comments/review for Patch 3? > > -Original Message----- > From: Rogovin, Kevin > Sent: Friday, January 26, 2018 10:56 AM > To: mesa-dev@lists.freedesktop.org > Cc: Rogovin, Kevin > Subject: [PATCH v3 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check > that noise padding for each bo used in batchbuffer is correct > > From: Kevin Rogovin > > Signed-off-by: Kevin Rogovin > --- > src/mesa/drivers/dri/i965/intel_batchbuffer.c | 22 > +- > 1 file changed, 21 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c > b/src/mesa/drivers/dri/i965/intel_batchbuffer.c > index 02bfd3f333..fc6998a7ca 100644 > --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c > +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c > @@ -1019,11 +1019,31 @@ _intel_batchbuffer_flush_fence(struct > brw_context *brw, > > ret = submit_batch(brw, in_fence_fd, out_fence_fd); > > - if (unlikely(INTEL_DEBUG & DEBUG_SYNC)) { > + if (unlikely(INTEL_DEBUG & (DEBUG_SYNC | DEBUG_OUT_OF_BOUND_CHK))) > + { >fprintf(stderr, "waiting for idle\n"); >brw_bo_wait_rendering(brw->batch.batch.bo); > } > > + if (unlikely(INTEL_DEBUG & DEBUG_OUT_OF_BOUND_CHK)) { > + bool detected_out_of_bounds_write = false; > + > + for (int i = 0; i < brw->batch.exec_count; i++) { > + struct brw_bo *bo = brw->batch.exec_bos[i]; > + > + if (!brw_bo_padding_is_good(bo)) { > +detected_out_of_bounds_write = true; > +fprintf(stderr, > +"Detected buffer out-of-bounds write from brw_bo %p " > +"(GEM %u, label = \"%s\")\n", > +bo, bo->gem_handle, bo->name); > + } > + } > + > + if (unlikely(detected_out_of_bounds_write)) { > + abort(); > + } > + } > + > /* Start a new batch buffer. */ > brw_new_batch(brw); > > -- > 2.15.1 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 1/3] intel/common:add debug flag for adding and checking padding on BO's
Sure, I can change the bit flag name to DEBUG_CHECK_OOB; From: Jason Ekstrand [mailto:ja...@jlekstrand.net] Sent: Friday, January 26, 2018 12:11 PM To: Rogovin, Kevin Cc: ML mesa-dev Subject: Re: [Mesa-dev] [PATCH v3 1/3] intel/common:add debug flag for adding and checking padding on BO's On Fri, Jan 26, 2018 at 12:56 AM, mailto:kevin.rogo...@intel.com>> wrote: From: Kevin Rogovin mailto:kevin.rogo...@intel.com>> Signed-off-by: Kevin Rogovin mailto:kevin.rogo...@intel.com>> --- src/intel/common/gen_debug.c | 1 + src/intel/common/gen_debug.h | 1 + 2 files changed, 2 insertions(+) diff --git a/src/intel/common/gen_debug.c b/src/intel/common/gen_debug.c index a978f2f581..2154b23756 100644 --- a/src/intel/common/gen_debug.c +++ b/src/intel/common/gen_debug.c @@ -85,6 +85,7 @@ static const struct debug_control debug_control[] = { { "nohiz", DEBUG_NO_HIZ }, { "color", DEBUG_COLOR }, { "reemit", DEBUG_REEMIT }, + { "check_oob", DEBUG_OUT_OF_BOUND_CHK }, Maybe call it DEBUG_CHECK_OOB to match the string? { NULL,0 } }; diff --git a/src/intel/common/gen_debug.h b/src/intel/common/gen_debug.h index da5b5a569d..92fc68b494 100644 --- a/src/intel/common/gen_debug.h +++ b/src/intel/common/gen_debug.h @@ -83,6 +83,7 @@ extern uint64_t INTEL_DEBUG; #define DEBUG_NO_HIZ (1ull << 39) #define DEBUG_COLOR (1ull << 40) #define DEBUG_REEMIT (1ull << 41) +#define DEBUG_OUT_OF_BOUND_CHK(1ull << 42) #ifdef HAVE_ANDROID_PLATFORM #define LOG_TAG "INTEL-MESA" -- 2.15.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org<mailto:mesa-dev@lists.freedesktop.org> https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 2/3] i965: add noise padding to buffer object and function to check if noise is correct
Hi, The main reason I went with map and invalidate is because the kernel on pread will only wait on execbuffer2 calls that declare they are going to write to the given GEM; there is the off-chance that a wild write might hit the padding of a GEM and the function contract is to check the padding. This is following the review comments from Chris Wilson. -Kevin From: Jason Ekstrand [mailto:ja...@jlekstrand.net] Sent: Friday, January 26, 2018 12:13 PM To: Rogovin, Kevin Cc: ML mesa-dev Subject: Re: [Mesa-dev] [PATCH v3 2/3] i965: add noise padding to buffer object and function to check if noise is correct On Fri, Jan 26, 2018 at 12:56 AM, mailto:kevin.rogo...@intel.com>> wrote: From: Kevin Rogovin mailto:kevin.rogo...@intel.com>> Signed-off-by: Kevin Rogovin mailto:kevin.rogo...@intel.com>> Reviewed-by: Chris Wilson mailto:ch...@chris-wilson.co.uk>> --- src/mesa/drivers/dri/i965/brw_bufmgr.c | 115 - src/mesa/drivers/dri/i965/brw_bufmgr.h | 13 2 files changed, 127 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index fb180289a0..7e50ba512e 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -220,6 +220,39 @@ bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size) &bufmgr->cache_bucket[index] : NULL; } +/* Our goal is not to have noise good enough for cryto, + * but instead values that are unique-ish enough that + * it is incredibly unlikely that a buffer overwrite + * will produce the exact same values. + */ +static uint8_t +next_noise_value(uint8_t prev_noise) +{ + uint32_t v = prev_noise; + return (v * 103u + 227u) & 0xFF; +} + +static void +fill_noise_buffer(uint8_t *dst, uint8_t start, uint32_t length) +{ + for(uint32_t i = 0; i < length; ++i) { + dst[i] = start; + start = next_noise_value(start); + } +} + +static uint8_t* +generate_noise_buffer(uint8_t start, uint32_t length) +{ + uint8_t *return_value; + return_value = malloc(length); + if (return_value) { + fill_noise_buffer(return_value, start, length); + } + + return return_value; +} + int brw_bo_busy(struct brw_bo *bo) { @@ -367,7 +400,18 @@ retry: bo->size = bo_size; bo->idle = true; - struct drm_i915_gem_create create = { .size = bo_size }; + bo->padding_size = 0; + if (unlikely(INTEL_DEBUG & DEBUG_OUT_OF_BOUND_CHK)) { + /* TODO: we want to make sure that the padding forces + * the BO to take another page on the (PP)GTT; 4KB + * may or may not be the page size for the GEM. Indeed, + * depending on generation, kernel version and GEM size, + * the page size can be one of 4KB, 64KB or 2M. + */ + bo->padding_size = 4096; + } + + struct drm_i915_gem_create create = { .size = bo_size + bo->padding_size }; /* All new BOs we get from the kernel are zeroed, so we don't need to * worry about that here. @@ -378,6 +422,27 @@ retry: goto err; } + if (unlikely(bo->padding_size > 0)) { + uint8_t *noise_values; + struct drm_i915_gem_pwrite pwrite; + + noise_values = generate_noise_buffer(create.handle, bo->padding_size); + if (!noise_values) +goto err_free; + + pwrite.handle = create.handle; + pwrite.pad = 0; + pwrite.offset = bo_size; + pwrite.size = bo->padding_size; + pwrite.data_ptr = (__u64) (uintptr_t) noise_values; + + ret = drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_PWRITE, &pwrite); + free(noise_values); + + if (ret != 0) +goto err_free; + } + bo->gem_handle = create.handle; bo->bufmgr = bufmgr; @@ -424,6 +489,53 @@ err: return NULL; } +bool +brw_bo_padding_is_good(struct brw_bo *bo) +{ + if (bo->padding_size > 0) { + struct drm_i915_gem_mmap mmap_arg = { + .handle = bo->gem_handle, + .offset = bo->size, + .size = bo->padding_size, + .flags = 0, + }; + uint8_t *mapped; + int ret; + uint8_t expected_value; + + /* We cannot use brw_bo_map() because it maps precisely the range + * [0, bo->size) and we wish to map the range of the padding which + * is [bo->size, bo->size + bo->pading_size) + */ + ret = drmIoctl(bo->bufmgr->fd, DRM_IOCTL_I915_GEM_MMAP, &mmap_arg); + if (ret != 0) { + DBG("Unable to map mapping buffer %d (%s) buffer for pad checking.\n", + bo->gem_handle, bo->name); + return false; + } + + mapped = (uint8_t*) (uintptr_t) mmap_arg.addr_ptr; + /* bah-humbug, we need to see the latest contents and + * if the bo is not cache coherent we likely need to + * inva
Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround
Hi, I gave it a try by modifying isl_genX(surf_fill_state_s) in src/intel/isl/isl_surface_state.c where I set SamplerL2BypassModeDisable ALWAYS as true for GEN9; sadly car chase continued to hang. -Kevin -Original Message- From: Nanley Chery [mailto:nanleych...@gmail.com] Sent: Friday, December 15, 2017 8:34 PM To: Rogovin, Kevin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround On Thu, Dec 14, 2017 at 07:39:46PM +0200, kevin.rogo...@intel.com wrote: > From: Kevin Rogovin > > This patch series implements a needed workaround for Gen9 for ASTC5x5 > sampler reads. The crux of the work around is to make sure that the > sampler does not read an ASTC5x5 texture and a surface with an > auxilary buffer without having a texture cache invalidate and command > streamer stall between such accesses. > This workaround sounds like it deals with the same types of surfaces dealt with in the RENDER_SURFACE_STATE field, Sampler L2 Out of Order Mode Disable (or SamplerL2BypassModeDisable in our driver). Here's the programming note from the SKL PRM on this field: * This bit must be set for the following surface types: BC2_UNORM BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM * This bit must be set for surfaces which contain a HiZ auxilliary surface if other surfaces using AUX_CCS_E or AUX_CCS_D auxiliary surface state (lossless color compression) are being sampled at the same time. Have we tried setting this bit for ASTC_5x5 textures? -Nanley > With this patch series applied to the (current) master branch of mesa, > carchase works on my SKL GT4. > > v2: > Rename workaround functions from brw_ to gen9_ > (suggested/requested by Topi Pohjolainen). > > Place texture resolve to avoid using auxilary surface > when ASTC5x5 is detected in brw_predraw_resolve_inputs() > instead of another detected function; doing so allows > one to avoid walking the textures again. > (suggested/requested by Topi Pohjolainen). > > Emit command streamer stall in addition to texture > invalidate. > (original short-coming caught by Jason Ekstrand) > > Place workaround function in (new) dedicated file. > > Minor path re-ordering to accomodate changes. > > Kevin Rogovin (5): > i965: define astx5x5 workaround infrastructure > i965: set ASTC5x5 workaround texture type tracking on texture validate > i965: use ASTC5x5 workaround in brw_draw > i965: use ASTC5x5 workaround in brw_compute > i965: ASTC5x5 workaround logic for blorp > > src/mesa/drivers/dri/i965/Makefile.sources | 1 + > src/mesa/drivers/dri/i965/brw_compute.c | 6 > src/mesa/drivers/dri/i965/brw_context.c | 6 > src/mesa/drivers/dri/i965/brw_context.h | 24 > src/mesa/drivers/dri/i965/brw_draw.c | 16 +-- > src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 5 > src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c | 36 > > src/mesa/drivers/dri/i965/genX_blorp_exec.c | 5 > src/mesa/drivers/dri/i965/intel_batchbuffer.c| 1 + > src/mesa/drivers/dri/i965/intel_tex_image.c | 16 --- > src/mesa/drivers/dri/i965/intel_tex_validate.c | 13 + > src/mesa/drivers/dri/i965/meson.build| 1 + > 12 files changed, 124 insertions(+), 6 deletions(-) create mode > 100644 src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c > > -- > 2.7.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct
Hi, > Yes. It's just the accidental writes into the read-only bo that you may miss. > Your call, and I have no > objections if such limitations are described in the comments. I'll add the comment; since there then less code to read (indeed using brw_bo_map will not work because it does not map the range including the padding). I've made a version at: https://github.com/krogueintel/asem/tree/i965-pad-gem-bos-v2-dont-use-pread which does not use pread, but it looks so much ickier to me. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct
Just to make sure of the reason of the original objection to trusting pread blindly: pread will miss writes IF the GEM BO was not listed in the execbuffer2 ioctl. If that is the case, I am ok with missing those writes because those writes (should) only occur if the GPU writes to padding of the BO by bad-luck; this series is looking really to just protect against that buffer B is written to at offset O where offset O is larger than the size of buffer B (the series only checks for writes where O is also less than the size plus the padding). Detecting the bad-luck writes requires much more pain and suffering: 1. Scratch contents read/write 2. For BO's that driver knows are not writeable by GPU to then mirror those BO contents. (1) is not so bad, but (2) is terribly hideous. With the goal that the patch series is only looking to detect out-of-bound writes (up to padding size), I think pread would be fine then. Thoughts? -Kevin P.S. I am already writing it to do mapping and such, but it is more code sadly. -Original Message- From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] Sent: Wednesday, December 13, 2017 1:23 PM To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org Cc: joonas.lahti...@linux.intel.com Subject: RE: [Mesa-dev] [PATCH v2 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct Quoting Rogovin, Kevin (2017-12-13 11:19:10) > Hi, > > > Actually that's not strictly true. Since you only do a pread here, it will > > only synchronize against the last declared write to the bo. > > There's no guaranteed sync with the last batch for a set of read-only bo. > > Similarly, because of no domain tracking, it won't also ensure that the bo > > is cache coherent before the read back. > > Futz. Would then doing brw_bo_map() with flags having value MAP_READ > be sufficient then? This is what function > brw_get_buffer_subdata() of intel_buffer_objects.c does to copy BO contents > to application supplied buffer. Yes. I would also pencil in a note to do a map_range, or at least MAP_INCOHERENT flag to avoid the gen_invalidate_range and do the cache invalidation manually (for !llc). -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct
Hi, > Actually that's not strictly true. Since you only do a pread here, it will > only synchronize against the last declared write to the bo. > There's no guaranteed sync with the last batch for a set of read-only bo. > Similarly, because of no domain tracking, it won't also ensure that the bo is > cache coherent before the read back. Futz. Would then doing brw_bo_map() with flags having value MAP_READ be sufficient then? This is what function brw_get_buffer_subdata() of intel_buffer_objects.c does to copy BO contents to application supplied buffer. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct
Hi, Just got confirmation that kernel does the syncing required to make sure that pread values are realiable. -Kevin -Original Message- From: Rogovin, Kevin Sent: Wednesday, December 13, 2017 8:19 AM To: 'Jason Ekstrand' Cc: mesa-dev@lists.freedesktop.org; Lahtinen, Joonas Subject: RE: [Mesa-dev] [PATCH 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct Hi, > I think you want to do this at the end of submit_batch instead and add > a brw_bo_wait_rendering on the batch. Otherwise, your bounds checking is > racing with the GPU. I remember being told that pread has the kernel do the required waiting, however I am not 100% sure of this (which is why I cc'd Joonas to either confirm or deny the assertion). Joonas? -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct
Hi, > I think you want to do this at the end of submit_batch instead and add a > brw_bo_wait_rendering on the batch. > Otherwise, your bounds checking is racing with the GPU. I remember being told that pread has the kernel do the required waiting, however I am not 100% sure of this (which is why I cc'd Joonas to either confirm or deny the assertion). Joonas? -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] i965: add noise padding to buffer object and function to check if noise is correct
Hi, Thankyou for reading the code and giving advice to improve upon it. Below are some thoughts: > I can't help but think that this could be a bit simpler and involve throwing > fewer pointers around. I was thinking this too; the easiest way to do this is to just have the same noise for all the paddings; that would mean there is just one pointer of data and that would be a private member of brw_bufmgr. > This is 4096. I think we could just have a single uint32_t padding field > which is either 0 or 4096 (More on that later). If the kernel supports huge pages, though now there might be "two" different things as far as page size goes then: the page size for CPU things and the page size for the PPGTT. I don't know if they must be the same or if they can be different. I also do not know how to actually get the page size used by the PPGTT, as getpagesize() is the page size for the CPU page tabling magic. Any suggestions on how to get the page size used for the PPGTT? I think it is worthwhile to make sure that atleast some of the noise is in the next page, but I admit that is just a hunchy thing. > Does using rand() really help us? Why not just come up with some hash-like > thing which generates consistent pseudo-random data? > How about something like "value.[i] = i * 853 + 193" (some random primes)? > That would mean that we can generate the data and check > it without having to store it in a per-bo temporary. If you want it to be > magic per-bo, you could also seed it somehow with the bo handl > (just add handle * 607). I figured that rand() was the most reliable way to generate noise in addition to the least amount of code. However, if the padding values are generated with an internal routine (maybe something like value[i] = 223 * value[i - 1] + 123, with value[0] = handle; all truncated to 8-bits), that would drop the need completely for the per-buffer storage. > If we always allocate 4096B of padding, then you don't need to heap allocate > it and can just put it on the stack for the purpose of interacting with > pread/pwrite. It's a bit big but still perfectly reasonable. If a day came > when we wanted to make the padding size adjustable, it would still > probably be reasonable to make the heap allocations temporary so we have less > random stuff in the BO we have to cleanup. I was tempted to make it stack allocated, but the 4096 size scared me off... and when I thought of huge page support of 2M I ran screaming. > There's a part of me that wants to kill pread/write. However, I think you > may have come up with the only good use of it I've ever seen. :-) I was told that one of the advantages of pread is the kernel will then do the syncing magic for you, i.e. waiting for the GPU to be done with the buffer; I freely admit that now I am not 100% sure of this. > If we still keep these heap allocations, deleting them should be keyed off of > bo->padding.size or nothing at all. That is how I originally wrote it, but to handle the case where creating a brw_bo fails midway (i.e. after GEM create ioctl, but during the tmp buffer allocation), checking the existence by .size > 0 was not going to work. However, for everywhere else it is fine. At this point I am tempted to do the following for the noise padding: 1. take your suggestion and make the noise per brw_bo, but the noise is generated with an incremental chaotic function that uses GEM-handle value as a start 2. have a single integer in the brw_bo struct indicating the amount of noise padding it has; 3. for checking use that single integer to heap-allocate a temporary buffer to store the pread contents OR have the necessary syncing operations and use the mapping pointer to read the values. The latter is more tempting I admit though. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: compute scratch space size correctly for Gen9
Glad that this helped. The main lead for fixing the bug I got from using the patch series posted earlier this week " GEM BO padding to find OOB buffer writes" (URL: https://lists.freedesktop.org/archives/mesa-dev/2017-December/179658.html). I am hoping that that patch series can get reviewed and land in Mesa so that hunting for a certain classes of Heisenberg bugs can be less Heisenberg like. Best Regards, -Kevin -Original Message- From: Kenneth Graunke [mailto:kenn...@whitecape.org] Sent: Tuesday, December 12, 2017 9:09 PM To: mesa-dev@lists.freedesktop.org Cc: Rogovin, Kevin Subject: Re: [Mesa-dev] [PATCH 2/2] i965: compute scratch space size correctly for Gen9 On Tuesday, December 12, 2017 4:17:27 AM PST kevin.rogo...@intel.com wrote: > From: Kevin Rogovin > > Signed-off-by: Kevin Rogovin > --- > src/mesa/drivers/dri/i965/brw_program.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_program.c > b/src/mesa/drivers/dri/i965/brw_program.c > index 6aa4100..1ae0aa0 100644 > --- a/src/mesa/drivers/dri/i965/brw_program.c > +++ b/src/mesa/drivers/dri/i965/brw_program.c > @@ -368,9 +368,13 @@ brw_alloc_stage_scratch(struct brw_context *brw, > * > * According to the other driver team, this applies to compute shaders > * as well. This is not currently documented at all. > + * > + * brw->screen->subslice_total is the TOTAL number of subslices > + * and we wish to view that there are 4 subslices per slice > + * instead of the actual number of subslices per slice. > */ >if (devinfo->gen >= 9) > - subslices = 4; > + subslices = 4 * brw->screen->devinfo.num_slices; > >/* WaCSScratchSize:hsw > * > Thank you! I'd meant to clean up the nonsense in patch 1 a while ago, but I guess I got distracted. Good catch on the bug, too... First is R-b, and this one gets: Fixes: 8ecdbb61360 "i965: Pretend there are 4 subslices for compute shader threads on Gen9+." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104005 Reviewed-by: Kenneth Graunke Both are now pushed: To ssh://git.freedesktop.org/git/mesa/mesa 7469966ed2a..b1ce812c514 master -> master ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: compute scratch space size correctly for Gen9
Ahh futz this was the wrong one!! it should just be subslices += 4 * brw->screen->devinfo.num_slices; Sighs (too many terminals open with too many branches :P ) I need to post a v2; I will post it shortly. -Kevin -Original Message- From: Rogovin, Kevin Sent: Tuesday, December 12, 2017 12:05 PM To: mesa-dev@lists.freedesktop.org Cc: Rogovin, Kevin Subject: [PATCH 2/2] i965: compute scratch space size correctly for Gen9 From: Kevin Rogovin Signed-off-by: Kevin Rogovin --- src/mesa/drivers/dri/i965/brw_program.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c index 6aa41009e7..7bfcad9a65 100644 --- a/src/mesa/drivers/dri/i965/brw_program.c +++ b/src/mesa/drivers/dri/i965/brw_program.c @@ -368,9 +368,17 @@ brw_alloc_stage_scratch(struct brw_context *brw, * * According to the other driver team, this applies to compute shaders * as well. This is not currently documented at all. + * + * brw->screen->subslice_total is the TOTAL number of subslices + * and we wish to view that there are 4 subslices per slice + * instead of the actaul number of subslices per slice. */ - if (devinfo->gen >= 9) - subslices = 4; + if (devinfo->gen >= 9) { + subslices = 0; + for (int i = 0; i < brw->screen->devinfo.num_slices; ++i) { +subslices += 4 * brw->screen->devinfo.num_subslices[i]; + } + } /* WaCSScratchSize:hsw * -- 2.15.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/2] i965: scratch space fixes
> These patches fix GPU hangs I'm seeing also with the *GL* version of > CarChase on KBL GT3e, when using Ubuntu 16.04 kernel. > NOTE: those hangs don't happen when doing tests with latest drm-tip > kernel. Besides the older Ubuntu kernel, I'm seeing hangs also with the > 4.13 drm-tip kernel, but these have happened only since early November > version of Mesa. Chances are it is caused by that the out-of-bounds overwrites in older kernels landed in the scratch page; it is essentially just luck of the out-of-bounds write lands in scratch or someplace important where corrupted data hangs the GPU (for example the memory holding the GEN shaders). -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] i965: correctly program MEDIA_VFE_STATE for compute shading
Just a comment: in truth the MEDIA_VFE_STATE -was- programmed correctly without this patch; it turns out that the PerThreadScratchSpace are the first bits in the bytes holding the scratch base pointer; those first bits are used by the HW (and the GENX pack knows this and accounts for it) to stash state. In all honesty this patch is not necessary to fix car-chase, the patch is just a readability patch. My apologies for jumping the gun and not checking if the bits for PerThreadScratchSpace were of the first bits of the BO for scratch space. Sighs. In spite of that it is just a readability patch, I think it should land to aid in readability of the code. -Kevin -Original Message- From: Rogovin, Kevin Sent: Tuesday, December 12, 2017 12:05 PM To: mesa-dev@lists.freedesktop.org Cc: Rogovin, Kevin Subject: [PATCH 1/2] i965: correctly program MEDIA_VFE_STATE for compute shading From: Kevin Rogovin Signed-off-by: Kevin Rogovin --- src/mesa/drivers/dri/i965/genX_state_upload.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c index 04a492539a..50ac5bc59f 100644 --- a/src/mesa/drivers/dri/i965/genX_state_upload.c +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c @@ -4183,28 +4183,35 @@ genX(upload_cs_state)(struct brw_context *brw) brw_batch_emit(brw, GENX(MEDIA_VFE_STATE), vfe) { if (prog_data->total_scratch) { - uint32_t bo_offset; + uint32_t per_thread_scratch_value; if (GEN_GEN >= 8) { /* Broadwell's Per Thread Scratch Space is in the range [0, 11] * where 0 = 1k, 1 = 2k, 2 = 4k, ..., 11 = 2M. */ -bo_offset = ffs(stage_state->per_thread_scratch) - 11; +per_thread_scratch_value = ffs(stage_state->per_thread_scratch) - 11; } else if (GEN_IS_HASWELL) { /* Haswell's Per Thread Scratch Space is in the range [0, 10] * where 0 = 2k, 1 = 4k, 2 = 8k, ..., 10 = 2M. */ -bo_offset = ffs(stage_state->per_thread_scratch) - 12; +per_thread_scratch_value = ffs(stage_state->per_thread_scratch) - 12; } else { /* Earlier platforms use the range [0, 11] to mean [1kB, 12kB] * where 0 = 1kB, 1 = 2kB, 2 = 3kB, ..., 11 = 12kB. */ -bo_offset = stage_state->per_thread_scratch / 1024 - 1; +per_thread_scratch_value = stage_state->per_thread_scratch / 1024 - 1; } - vfe.ScratchSpaceBasePointer = -rw_bo(stage_state->scratch_bo, bo_offset); + vfe.ScratchSpaceBasePointer = rw_bo(stage_state->scratch_bo, 0); + vfe.PerThreadScratchSpace = per_thread_scratch_value; } + /* If brw->screen->subslice_total is greater than one, then + * devinfo->max_cs_threads stores number of threads per sub-slice; + * thus we need to multiply by that number by subslices to get + * the actual maximum number of threads; the -1 is because the HW + * has a bias of 1 (would not make sense to say the maximum number + * of threads is 0). + */ const uint32_t subslices = MAX2(brw->screen->subslice_total, 1); vfe.MaximumNumberofThreads = devinfo->max_cs_threads * subslices - 1; vfe.NumberofURBEntries = GEN_GEN >= 8 ? 2 : 0; -- 2.15.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
>I'd have to think about it harder but even those are not likely to actually >get ASTC decode. Maybe for some sort of decompression thing but if you're >doing > GetCompressedTexImage, it'll probably turn into a blorp_copy which will use > R32G32B32A32_UINT. I am thinking for the case where an application calls glGetTexImage() or glGetTextureImage(), which I think is legal in GL and the GL implementation is expected to do the decompress. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
Hi, >This isn't true. 100% of the intel_mipmap_tree -> blorp_surf translations are >handled by that function. > It's a perfectly reasonable place to handle these things. It could also be > handled in genX(blorp_exec) if that makes someone more comfortable. This is where I placed the ASTC enumeration setting, in genX(blorp_exec). I added a boolean signaling if the caller to blorp believed it would sample from an ASTC, if it did it sets the enum as ASTC otherwise as AUX. I confess I am not super familiar with blorp, but as far as I can tell, the only way for blorp to read an ASTC is for GetTexImage/GetTexSubImage since an ASTC surface cannot be used for an FBO. > The problem is that a single invalidate doesn't actually cause a > synchronization point in the rendering operation. All it does is torch the > texture cache and it > does so immediately and in parallel with currently active rendering. If you > can't have ASTC5x5 in the texture cache with a CCS_E surface, then we need to > do a CS stall to ensure that the previous draw is complete before we > invalidate. Otherwise, if the draw with ASTC5x5 is still in-flight, the > texture cache will > immediately start to get populated with ASTC5x5 data again. Ahh futz, I forget to add that stall.. by luck the guilty application worked anyways (when I first wrote the work I did intel_batchbuffer_flush() and relaxed it down to texture invalidate). ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
> Are you saying that this bug extends over hardware context? Different HW contexts imply different execbuffer2 ioctl's. The kernel inserts a full blown flush of everything after (or before, I cannot remember) each execbuffer2 call. This way there is context isolation in HW buggineness. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
> If you take a look at brw_update_texture_surface(), just in the end before > brw_emit_surface_state() the logic explictly consults for > intel_miptree_texture_aux_usage(). > This in turn tells if the auxiliary buffer is resolved and it doesn't need to > be programmed. The full stack trace is this: brw_update_texture_surface() calls intel_miptree_texture_usage() which for HiZ calls the function intel_miptree_sample_with_hiz() which, as the name suggest, returns true if and only if it is ok to use the HiZ to speed up depth texture reads. Indeed, the function calls intel_miptree_texture_usage() switches on intel_mipmap_tree::aux_usage which the documentation states is about the intended usage of the auxiliary buffer. Indeed, if the value is ISL_AUX_USAGE_NONE that means, quoting the comment tag above the field, "that auxiliary compression is permanently disabled". The conclusion then is that even if a buffer is fully resolved, the value of aux_usage is not ISL_AUX_USAGE_NONE and that the return value of calls intel_miptree_texture_usage() gives the return value assuming that the auxiliary buffer can be used with respect to that it holds good values enough for the sampler. >This path goes thru brw_blorp_download_miptree() which in turn takes either > brw_blorp_copy_miptrees() or brw_blorp_blit_miptrees(). Both in turn consult > blorp_surf_for_miptree(). If you are 100% sure (because I am not) that ALL blorp paths walk through that, then I have no real objection except it needs to do the magic of checking if it is reading from an ASTC5x5 or a surface with an auxiliary and update the enumeration appropriately. > This would be a nice a piece of documentation to add into the code. Thanks > for explaining. I can add this, though I do freely admit I take this for granted and an axiom of HW accelerated graphics. >I don't see how the bit mask prevents one from forcing anything. By not being able to encode such a state (both are present) such a state is impossible to store and cannot be reached. From the point of view of code, an enum is slightly more pleasant to read than bitmaks IMO. > I now do see a flaw in the RFC related to this. In > brw_predraw_resolve_inputs() one would > actually need to smash down the recorded AUX mask bit when it reacts to > ASTC5x5 being present. Indeed, really two passes are needed over the textures. The first pass to detect if ASTC5x5 is present and a second to resolve auxiliary using textures if they are present. >Well, if nothing else, I would really like to see you used >brw_predraw_resolve_inputs() for the resolves. I am happy with that as that walks through the textures anyways BUT it is called before brw_predraw_resolve_framebuffer() which might do some resolving too. The easy way out is to permute their calling order unless there is some other dangling assumption preventing permuting the call order of those two. > > -Kevin > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
Hi, >> Here are my comments of the patch posted: >> >> 1. it is essentially replication and moving around of the code of the >> patch series posted earlier but missing various >> important bits: preventing the sampler from using the auxiliary buffer >> (this requires to modify surface state >> sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently >> (blorp might read from an ASTC5x5 >> and there are more paths in blorp than blorp_surf_for_miptree() that >> sample from surfaces. >> >Can you explain both more in detail? Resolves done in >brw_predraw_resolve_inputs() mean that there is nothing interesting in the aux >buffers and surface setup won't therefore enable auxiliary for texture >surfaces. That there is nothing interesting is irrelevant to the sampler if the SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the GPU command; The sampler will always try to read the auxiliary buffer if it is given the opportunity to do so. Indeed, it is quite feasible that less bandwidth is consumed if the sampler is given the chance to read an auxiliary buffer in place of the buffer; as such even if the surface is resolved one may wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 programs to use the HiZ auxiliary buffer even if the depth buffer is fully resolved (see inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c). > In case of blorp, as far as I know all operations sampling something should > go thru blorp_surf_for_miptree(). Can you point out cases that don't? Blorp is used in more than blorp_surf_for_miptree(), for example implementing GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 (you can see this handled in the patch series I posted). I chose the hammer that the default is to just assume blorp is going to access auxiliary buffers unless a flag is set when the caller knows that blorp is going to sample from an astc5x5 (against see my patch series). >Right. In the case of sampling both aux and astc5x5 in the same draw cycle the >only thing to do is to disable aux. With my question of direction I meant the >texture > cache flush between two cycles. Do we need to flush in both cases > 1) ASTC5x5 in first cycle and AUX in the following > 2) AUX in first cycle and ASTC5x5 in the following YES we need to flush in both cases. What is happening is that the sampler hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. Then if the sampler first samples from an ASTC5x5 then an AUX it would not hang, but the other way it would. However, if there are multiple draws in flight where one samples from an ASTC5x5 and the other does not, the command buffer order gives ZERO guarantee that the sampler will sample in that order because fragments get executed out-of-order even across draw calls even within a subslice (this is why sendc is needed at end of shader in GEN). >> 4. With 3 in mind, using the bit-masks is not a good idea as we want to >> then enforce at the code level >> that only one of the two is possible without texture invalidates. > Can you elaborate this a little more? It tells if aux is/was used and it > tells if astc5x5 is/was used. That is all we need, right? WRONG. We must enforce that a given draw call can have neither or only one. By having bitmasks it is possible to support a state having both. At any rate, please review the patch series I have posted and I am happy to take suggestions to improve that patch series that I have tested. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug
Hi, The patch series I have submitted handles the case of needing to resolve texture surfaces when a draw (or compute) accesses a texture which is astc5x5. As it is quite clear you understand the issue and know the code of i965 the patch centers on, you are an excellent person to review the code. Here are my comments of the patch posted: 1. it is essentially replication and moving around of the code of the patch series posted earlier but missing various important bits: preventing the sampler from using the auxiliary buffer (this requires to modify surface state sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently (blorp might read from an ASTC5x5 and there are more paths in blorp than blorp_surf_for_miptree() that sample from surfaces. 2. using the check that the GPU is gen9 (and for that matter, using the name gen9_ astc5x5_sampler_wa()) is not ideal; I would not be surprised that the bug might not be present on various re-spins of Gen9 and/or it might linger on to future Gens. I do NOT know; using a Boolean assigned earlier makes the code more future-ish proof. 3. the nature of GPU fragment dispatch is going to require a texture invalidate even if the sampler only have the bug in one direction; this is because a subslice is not guaranteed to process fragments in any order. The crux is that a single sampler serves an entire sub-slice which has more than 1 EU. It is quite possible that one EU has threads of a draw call ahead of the other and depending on the timing, portions of those fragments' coming after might be processed by the sampler of those before of those fragments coming before in batchbuffer order. Indeed a single EU might have threads from separate draws as well. A texture invalidate places a barrier in the pipeline preventing the mixing (and means that efficiency of GPU drops quite a bit with every texture invalidate sadly). 4. With 3 in mind, using the bit-masks is not a good idea as we want to then enforce at the code level that only one of the two is possible without texture invalidates. -Kevin -Original Message- From: Topi Pohjolainen [mailto:topi.pohjolai...@gmail.com] Sent: Monday, December 4, 2017 12:49 PM To: mesa-dev@lists.freedesktop.org Cc: Rogovin, Kevin Subject: [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug This is just drafting some thoughts and only compile tested. CC: "Rogovin, Kevin" --- src/mesa/drivers/dri/i965/brw_blorp.c | 8 + src/mesa/drivers/dri/i965/brw_context.h | 10 ++ src/mesa/drivers/dri/i965/brw_draw.c| 54 - 3 files changed, 71 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c index 680121b6ab..b3f84ab8ca 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.c +++ b/src/mesa/drivers/dri/i965/brw_blorp.c @@ -186,11 +186,19 @@ blorp_surf_for_miptree(struct brw_context *brw, surf->aux_addr.buffer = mt->hiz_buf->bo; surf->aux_addr.offset = mt->hiz_buf->offset; } + + if (!is_render_target && brw->screen->devinfo.gen == 9) + gen9_astc5x5_sampler_wa(brw, GEN9_ASTC5X5_WA_TEX_TYPE_AUX); } else { surf->aux_addr = (struct blorp_address) { .buffer = NULL, }; memset(&surf->clear_color, 0, sizeof(surf->clear_color)); + + if (!is_render_target && brw->screen->devinfo.gen == 9 && + (mt->format == MESA_FORMAT_RGBA_ASTC_5x5 || + mt->format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5)) + gen9_astc5x5_sampler_wa(brw, + GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5); } assert((surf->aux_usage == ISL_AUX_USAGE_NONE) == (surf->aux_addr.buffer == NULL)); diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 0670483806..44602c23c0 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -165,6 +165,11 @@ enum brw_cache_id { BRW_MAX_CACHE }; +enum gen9_astc5x5_wa_tex_type { + GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5 = 1 << 0, + GEN9_ASTC5X5_WA_TEX_TYPE_AUX = 1 << 1, +}; + enum brw_state_id { /* brw_cache_ids must come first - see brw_program_cache.c */ BRW_STATE_URB_FENCE = BRW_MAX_CACHE, @@ -1262,6 +1267,8 @@ struct brw_context */ bool draw_aux_buffer_disabled[MAX_DRAW_BUFFERS]; + enum gen9_astc5x5_wa_tex_type gen9_sampler_wa_tex_mask; + __DRIcontext *driContext; struct intel_screen *screen; }; @@ -1286,6 +1293,9 @@ void intel_update_renderbuffers(__DRIcontext *context, __DRIdrawable *drawable); void intel_prepare_render(struct brw_context *brw); +void gen9_astc5x5_sampler_wa(struct brw_context *brw, +
Re: [Mesa-dev] [PATCH 0/5] i965: ASTC5x5 workaround
Hi, > 1) do extra tex cache flush when needed and specifically only when needed > 2) resolve surfaces when undesired combination is about to be sampled >Latter case could be addressed also with on-the-fly check in >brw_predraw_resolve_inputs(). There one goes thru all the active textures for >the > next draw call and resolves when necessary. One could check for the undesired > combination of textures there. I understand we would need to > walk the textures twice, first to check for any occurrences of one type and > then for the other. This, however, would fit to the way we resolve > other texture types and prevent from adding more things to check into the > context. This patch series does handle the second case, as follows: a) Checking if there are astc5x5 textures and/or textures with auxiliary is done in brw_validate_textures(); this was a really nice place because the function loops over all textures anyways b) the resolve operation is handled by the added function brw_atsc_perform_wa(); this also sets the mode correctly too after the resolve. I chose to make a dedicated function that does the mode setting and resolve. Putting the resolve logic into brw_predraw_resolve_inputs() is not a bad idea and I am open to it; the interior of the loop would then look ickier as it would had the check on each texture unit of if the workaround is necessary and if there is an astc5x5 sampler handing around. It would also kill off a member from the astc5x5_wa struct that tracks if there are auxiliary textures. At any rate, I'd appreciate some review for the code so it can land in some form shortly. -Kevin > > > Kevin Rogovin (5): > i965: define astc5x5 workaround infrastructure > i965: ASTC5x5 workaround logic for blorp > i965: set ASTC5x5 workaround texture type tracking on texture validate > i965: use ASTC5x5 workaround in brw_draw > i965: use ASTC5x5 workaround in brw_compute > > src/mesa/drivers/dri/i965/brw_compute.c | 6 +++ > src/mesa/drivers/dri/i965/brw_context.c | 63 > > src/mesa/drivers/dri/i965/brw_context.h | 23 + > src/mesa/drivers/dri/i965/brw_draw.c | 6 +++ > src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 5 ++ > src/mesa/drivers/dri/i965/genX_blorp_exec.c | 5 ++ > src/mesa/drivers/dri/i965/intel_batchbuffer.c| 1 + > src/mesa/drivers/dri/i965/intel_tex_image.c | 16 -- > src/mesa/drivers/dri/i965/intel_tex_validate.c | 13 + > 9 files changed, 134 insertions(+), 4 deletions(-) > > -- > 2.14.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/5] i965: ASTC5x5 workaround
Hi, For ANV I do not know as I have not really poked into its code. For i965, this patch series handles the situation as to what to do if a draw of dispatch compute accesses both an ASTC5x5 texture and a texture with an auxiliary buffer. It does this by checking if there are both such textures and ASTC5x5 textures in the list of currently bound textures. If the answer is yes, then it resolves all such auxiliary requiring textures that use an auxiliary buffer so that the sampler does not need them when it reads from the surfaces. The resolve stuff is handled in the function brw_astc5x5_perform_wa(() in brw_context.c of the first patch, the checking is handled in the 3rd patch by modifying brw_tex_validate() in intel_tex_validate.c. The 4'th and 5'th patches are deceptively small since all they do is add a call to brw_astc5x5_perform_wa(() in brw_draw.c and brw_compute.c respectively. The 4th patch also has a small addition to prevent surface state for sampler state to have the auxiliary surface given in the call. As to how to do similar auto-resolve and tweak of state on ANV, I need to dive quite deep into the code to see how to do it. -Kevin -Original Message- From: Matt Turner [mailto:matts...@gmail.com] Sent: Friday, December 1, 2017 8:25 PM To: Rogovin, Kevin Cc: Ilia Mirkin ; mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 0/5] i965: ASTC5x5 workaround On Fri, Dec 1, 2017 at 10:06 AM, Rogovin, Kevin wrote: > Hi, > > Yes ANV will need something like this as well. If the GPU samples from both > an ASTC5x5 texture and one with an aux buffer without a texture cache > invalidate between such accesses, then the GPU hangs, which in turn makes the > system unresponsive for a few seconds until the kernel resets the GPU; then > an ioctl will fail in i965 which means things are very bad usually and (for > me atleast on my system with how I build mesa) the application crashes. I think his question is -- is there anything we can do about the case where a single shader program samples ASTC5x5 and a texture with an aux buffer? Presumably there's no way to invalidate the texture cache during a shader program, so what can you do? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/5] i965: ASTC5x5 workaround
Hi, Yes ANV will need something like this as well. If the GPU samples from both an ASTC5x5 texture and one with an aux buffer without a texture cache invalidate between such accesses, then the GPU hangs, which in turn makes the system unresponsive for a few seconds until the kernel resets the GPU; then an ioctl will fail in i965 which means things are very bad usually and (for me atleast on my system with how I build mesa) the application crashes. -Kevin -Original Message- From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia Mirkin Sent: Friday, December 1, 2017 7:38 PM To: Rogovin, Kevin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 0/5] i965: ASTC5x5 workaround On Fri, Dec 1, 2017 at 12:19 PM, wrote: > From: Kevin Rogovin > > This patch series implements a needed workaround for Gen9 for ASTC5x5 > sampler reads. The crux of the work around is to make sure that the > sampler does not read an ASTC5x5 texture and a surface with an > auxilary buffer without having a texture cache invalidate between such > accesses. Presumably anv needs something like this too? What happens if you have a single draw which samples from both an ASTC5x5 texture and one with an aux buffer? [Just curious.] ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 07/18] intel/compiler: fix for memmove argument on annotating error
Thankyou! very much for the patch to the command line disassembler. -Kevin -Original Message- From: Matt Turner [mailto:matts...@gmail.com] Sent: Friday, November 17, 2017 6:52 AM To: Rogovin, Kevin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 07/18] intel/compiler: fix for memmove argument on annotating error On Wed, Nov 15, 2017 at 12:13 AM, Rogovin, Kevin wrote: > I have just seen that I have had an epic brain lapse on this. > > The code is pretty clear, the correct value of count should be ann_count - i. > This is because: > a. The value of ann_count is the value of the array size BEFORE the insert; > this is clear from the code within the if (offset + ..) where it increments > ann_count. > b. Since ann_count is the size before the insert, it should move the content > in the open range [i, ann_count) to [i + 1, ann_count + 1); thus the number > of elements is given by ann_count - i. > > Changing the count to ann_count - i does the right thing, and also makes it > quite clear that Matt is correct on patch 11: that it was just papering over > a bug from using the wrong count value. I think I used the wrong data structure :) I reproduced the problem you were having and then got tired of thinking about how to memmove elements. I rewrote the annotation code yesterday to use a linked list, which is much more natural. I just sent the patches. Attached is a patch you can squash into your series in order to make it work on top of my series. > However, once this is done, the build does assert() on some shaders that I > have; this is because it fails to understand some registers. I think that's a result of the disassembler not knowing how to disassemble sends/sendsc. Both Toni and Neil have written patches for that. I'll see if I can rebase them. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Difference between Meson and autotools release builds
Thanks, I missed that; sorry for the mailing list chatter caused by missing the original e-mail on the material. -Kevin -Original Message- From: Eric Engestrom [mailto:eric.engest...@imgtec.com] Sent: Wednesday, November 15, 2017 1:39 PM To: Rogovin, Kevin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] Difference between Meson and autotools release builds On Wednesday, 2017-11-15 09:43:59 +, Rogovin, Kevin wrote: > Hi, > > I do not know if anyone has noticed or if it is deliberate, but the meson > builds have that assert()'s are active but the autotools release builds have > that assert() is inactive. Hi, Yep, this is by design, meson decouples the build type (eg. "release") from whether asserts are enabled; you can read more about it in this thread: https://lists.freedesktop.org/archives/mesa-dev/2017-November/175739.html You can already read the not-yet-landed documentation in this patch: https://lists.freedesktop.org/archives/mesa-dev/2017-November/175854.html (or if it lands before you read this, the up-to-date meson-for-mesa documentation will live at: https://mesa3d.org/meson.html) Cheers, Eric ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Difference between Meson and autotools release builds
Just to clarify, I do not mean just the defaults, but doing -buildtype release also leaves assert()'s active (atleast on my system setup). -Kevin From: Rogovin, Kevin Sent: Wednesday, November 15, 2017 11:44 AM To: mesa-dev@lists.freedesktop.org Subject: Difference between Meson and autotools release builds Hi, I do not know if anyone has noticed or if it is deliberate, but the meson builds have that assert()'s are active but the autotools release builds have that assert() is inactive. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Difference between Meson and autotools release builds
Hi, I do not know if anyone has noticed or if it is deliberate, but the meson builds have that assert()'s are active but the autotools release builds have that assert() is inactive. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 07/18] intel/compiler: fix for memmove argument on annotating error
I have just seen that I have had an epic brain lapse on this. The code is pretty clear, the correct value of count should be ann_count - i. This is because: a. The value of ann_count is the value of the array size BEFORE the insert; this is clear from the code within the if (offset + ..) where it increments ann_count. b. Since ann_count is the size before the insert, it should move the content in the open range [i, ann_count) to [i + 1, ann_count + 1); thus the number of elements is given by ann_count - i. Changing the count to ann_count - i does the right thing, and also makes it quite clear that Matt is correct on patch 11: that it was just papering over a bug from using the wrong count value. However, once this is done, the build does assert() on some shaders that I have; this is because it fails to understand some registers. -Kevin -Original Message- From: Matt Turner [mailto:matts...@gmail.com] Sent: Monday, November 13, 2017 11:21 PM To: Rogovin, Kevin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 07/18] intel/compiler: fix for memmove argument on annotating error On Mon, Nov 13, 2017 at 1:12 PM, Rogovin, Kevin wrote: > Hi, > > > I confess I am not 100% on this code and I did educated guessing what it is > trying to do; I figured it was trying to insert contents at the current index > i; and that ann_count is the size -after- the insert. thus I figured the > memmove is to move the half open interval [i, ann_count-1) to the half open > interval [i + 1, ann_count). The number of elements in the half open range > [i, ann_count - 1) is given by ann_count - i - 1. > > I tried changing the count from ann_count - i - 1 to ann_count - i + 1 and > then the disassembler crashed in annotation on the same shaders that I have > had it crash on before. Interesting. Could you send me the shader binary? I'll take a look. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 18/18] intel/tools: add command line GEN shader disassembler tool
Hi, I've made a preliminary meson.build in src/intel/tools for the stuff; One can see it in the commit " src/intel/tools: add BatchbufferLogger to meson build system " at https://github.com/krogueintel/asem/tree/batchbuffer-logger . Comments welcome. Best Regards, -Kevin P.S. This was my first experience with Meson; It was so much nicer to add a target compared to autotool and meson is so much faster to build from a git clean than autotools. So much faster. -Original Message- From: Eric Engestrom [mailto:eric.engest...@imgtec.com] Sent: Monday, November 13, 2017 8:09 PM To: Rogovin, Kevin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 18/18] intel/tools: add command line GEN shader disassembler tool On Monday, 2017-11-13 15:18:06 +0200, kevin.rogo...@intel.com wrote: > From: Kevin Rogovin > > Signed-off-by: Kevin Rogovin > --- > src/intel/Makefile.tools.am | 21 ++- > src/intel/tools/.gitignore| 1 + > src/intel/tools/gen_shader_disassembler.c | 221 > ++ > 3 files changed, 242 insertions(+), 1 deletion(-) create mode 100644 > src/intel/tools/gen_shader_disassembler.c > > diff --git a/src/intel/Makefile.tools.am b/src/intel/Makefile.tools.am > index dd68d8f173..71eb3253c3 100644 > --- a/src/intel/Makefile.tools.am > +++ b/src/intel/Makefile.tools.am > @@ -32,7 +32,8 @@ intellib_LTLIBRARIES = \ > > intelbin_PROGRAMS = tools/i965_batchbuffer_dump_show \ > tools/i965_batchbuffer_dump_show_xml \ > - tools/i965_batchbuffer_dump_show_json > + tools/i965_batchbuffer_dump_show_json \ > + tools/gen_shader_disassembler Could you add these to src/intel/tools/meson.build as well? Thanks :) > > intelbin_SCRIPTS = tools/i965_batchbuffer_logger_sh CLEANFILES += > $(intelbin_SCRIPTS) @@ -111,3 +112,21 @@ > tools_i965_batchbuffer_dump_show_xml_SOURCES = \ > > tools_i965_batchbuffer_dump_show_json_SOURCES = \ > tools/i965_batchbuffer_dump_show_json.cpp > + > +tools_gen_shader_disassembler_SOURCES = \ > + tools/gen_shader_disassembler.c \ > + tools/disasm.c \ > + tools/gen_disasm.h > + > +tools_gen_shader_disassembler_LDADD = \ > + common/libintel_common.la \ > + compiler/libintel_compiler.la \ > + $(top_builddir)/src/util/libmesautil.la \ > + $(PTHREAD_LIBS) \ > + $(EXPAT_LIBS) \ > + $(ZLIB_LIBS) > + > +tools_gen_shader_disassembler_CFLAGS = \ > + $(AM_CFLAGS) \ > + $(EXPAT_CFLAGS) \ > + $(ZLIB_CFLAGS) > diff --git a/src/intel/tools/.gitignore b/src/intel/tools/.gitignore > index ea4dc23c20..e9b22c89aa 100644 > --- a/src/intel/tools/.gitignore > +++ b/src/intel/tools/.gitignore > @@ -4,3 +4,4 @@ > /i965_batchbuffer_dump_show > /i965_batchbuffer_dump_show_xml > /i965_batchbuffer_dump_show_json > +/gen_shader_disassembler > diff --git a/src/intel/tools/gen_shader_disassembler.c > b/src/intel/tools/gen_shader_disassembler.c > new file mode 100644 > index 00..bd6c400fcc > --- /dev/null > +++ b/src/intel/tools/gen_shader_disassembler.c > @@ -0,0 +1,221 @@ > +/* > + * Copyright © 2017 Intel Corporation > + * > + * Permission is hereby granted, free of charge, to any person > +obtaining a > + * copy of this software and associated documentation files (the > +"Software"), > + * to deal in the Software without restriction, including without > +limitation > + * the rights to use, copy, modify, merge, publish, distribute, > +sublicense, > + * and/or sell copies of the Software, and to permit persons to whom > +the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including > +the next > + * paragraph) shall be included in all copies or substantial portions > +of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > +EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > +MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT > +SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES > +OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, > +ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR > +OTHER DEALINGS > + * IN THE SOFTWARE. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "compiler/brw_inst.h" > +#include "compiler/brw_eu.h" > + > +static > +void > +print_opcodes(const void *data, int da
Re: [Mesa-dev] [PATCH 11/18] intel/tools/disasm: make sure that entire range is disassembled
Your theory makes sense to me too; I suspect that something in the annotation code is the ultimate cause. -Original Message- From: Matt Turner [mailto:matts...@gmail.com] Sent: Monday, November 13, 2017 9:15 PM To: Rogovin, Kevin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 11/18] intel/tools/disasm: make sure that entire range is disassembled On Mon, Nov 13, 2017 at 5:17 AM, wrote: > From: Kevin Rogovin > > Without this patch, if a shader has errors, the disassembly of the > shader often stops after the first opcode that has errors. I can't see anything wrong with the current code. If I'm right that 07/18 is wrong, could this be papering over the bug introduced in that patch which would have caused annotations to get lost? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/18] intel/tools/disasm: add gen_disasm_assembly_length function
I need this function in order for the logger to save shader binary to disk. -Kevin -Original Message- From: Matt Turner [mailto:matts...@gmail.com] Sent: Monday, November 13, 2017 9:43 PM To: Rogovin, Kevin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 10/18] intel/tools/disasm: add gen_disasm_assembly_length function Pending the questions about 07/18, I don't think this should be necessary. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/18] intel/compiler:add function to give option to print offsets into assembly
I am fine with that suggestion of changing the function signature and to not add a new one; I wanted the changes I made to be as uninvasive as possible which is why I added the function instead of changing an existing one. -Kevin -Original Message- From: Matt Turner [mailto:matts...@gmail.com] Sent: Monday, November 13, 2017 9:42 PM To: Rogovin, Kevin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 08/18] intel/compiler:add function to give option to print offsets into assembly On Mon, Nov 13, 2017 at 5:17 AM, wrote: > From: Kevin Rogovin > > Signed-off-by: Kevin Rogovin > --- > src/intel/compiler/brw_eu.c | 11 ++- > src/intel/compiler/brw_eu.h | 3 +++ > 2 files changed, 13 insertions(+), 1 deletion(-) > > diff --git a/src/intel/compiler/brw_eu.c b/src/intel/compiler/brw_eu.c > index bc297a21b3..8969ae5bda 100644 > --- a/src/intel/compiler/brw_eu.c > +++ b/src/intel/compiler/brw_eu.c > @@ -339,6 +339,15 @@ const unsigned *brw_get_program( struct > brw_codegen *p, void brw_disassemble(const struct gen_device_info > *devinfo, > const void *assembly, int start, int end, FILE *out) > +{ > + brw_disassemble_print_offset_option(devinfo, assembly, start, end, out, > + false); } > + > +void > +brw_disassemble_print_offset_option(const struct gen_device_info *devinfo, > +const void *assembly, int start, int end, > +FILE *out, bool print_offsets) > { > bool dump_hex = (INTEL_DEBUG & DEBUG_HEX) != 0; > > @@ -346,7 +355,7 @@ brw_disassemble(const struct gen_device_info *devinfo, >const brw_inst *insn = assembly + offset; >brw_inst uncompacted; >bool compacted = brw_inst_cmpt_control(devinfo, insn); > - if (0) > + if (print_offsets) > fprintf(out, "0x%08x: ", offset); > >if (compacted) { > diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h > index 95503d5513..497cf9e575 100644 > --- a/src/intel/compiler/brw_eu.h > +++ b/src/intel/compiler/brw_eu.h > @@ -128,6 +128,9 @@ int brw_disassemble_inst(FILE *file, const struct > gen_device_info *devinfo, > const struct brw_inst *inst, bool > is_compacted); void brw_disassemble(const struct gen_device_info *devinfo, > const void *assembly, int start, int end, FILE > *out); > +void brw_disassemble_print_offset_option(const struct gen_device_info > *devinfo, > + const void *assembly, int start, > int end, > + FILE *out, bool > +print_offsets); Instead of adding a new function, I'd be in favor of just adding the bool print_offsets parameter to brw_disassemble (before the FILE parameter). Then we can add an INTEL_DEBUG=print_offsets and pass INTEL_DEBUG & DEBUG_PRINT_OFFSETS as the argument in the existing calls. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 07/18] intel/compiler: fix for memmove argument on annotating error
Hi, I confess I am not 100% on this code and I did educated guessing what it is trying to do; I figured it was trying to insert contents at the current index i; and that ann_count is the size -after- the insert. thus I figured the memmove is to move the half open interval [i, ann_count-1) to the half open interval [i + 1, ann_count). The number of elements in the half open range [i, ann_count - 1) is given by ann_count - i - 1. I tried changing the count from ann_count - i - 1 to ann_count - i + 1 and then the disassembler crashed in annotation on the same shaders that I have had it crash on before. -Kevin -Original Message- From: Matt Turner [mailto:matts...@gmail.com] Sent: Monday, November 13, 2017 9:02 PM To: Rogovin, Kevin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 07/18] intel/compiler: fix for memmove argument on annotating error On Mon, Nov 13, 2017 at 5:17 AM, wrote: > From: Kevin Rogovin > > Without this fix, disassembling of GEN shaders with GPU commands that > the disassembler does not know would result in errors being added to > the annotator which would crash when more than one error was added. > > Signed-off-by: Kevin Rogovin > --- > src/intel/compiler/intel_asm_annotation.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/src/intel/compiler/intel_asm_annotation.c > b/src/intel/compiler/intel_asm_annotation.c > index b07a545a12..7aa222f04e 100644 > --- a/src/intel/compiler/intel_asm_annotation.c > +++ b/src/intel/compiler/intel_asm_annotation.c > @@ -181,8 +181,9 @@ annotation_insert_error(struct annotation_info > *annotation, unsigned offset, > continue; > >if (offset + sizeof(brw_inst) != next->offset) { > - memmove(next, cur, > - (annotation->ann_count - i + 2) * sizeof(struct > annotation)); > + int count; > + count = annotation->ann_count - i - 1; > + memmove(next, cur, count * sizeof(struct annotation)); I don't see how this can be right. Take for example a case where we have ann_count == 4. We have annotation->ann[0..4] where ann[4] is the end marker... a little surprising. We want to insert an error annotation on an instruction in ann[2] but not at the end, so we need to split ann[2]. cur = 2, next = 3. We need to memmove elements 2, 3, 4 one spot later, and then update ann[2] and ann[3] (which after the memmove is a copy of ann[2]). Count should be ann_count (4) - i (2) + 1 -> 3. The code currently says +2 instead of +1 and that seems like a bug. What do you think? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 06/18] intel/common/gen_decoder: make useable from C++ source
Hi, Oh MY! I missed that it was added to the driver (because miraculously the patch applied cleanly anyway). Thanks, I will drop this patch if I make a v3. -Kevin -Original Message- From: Landwerlin, Lionel G Sent: Monday, November 13, 2017 3:45 PM To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 06/18] intel/common/gen_decoder: make useable from C++ source On 13/11/17 13:17, kevin.rogo...@intel.com wrote: > From: Kevin Rogovin > > Signed-off-by: Kevin Rogovin > --- > src/intel/common/gen_decoder.h | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/src/intel/common/gen_decoder.h > b/src/intel/common/gen_decoder.h index 8b00b6edc2..e3b2457dfd 100644 > --- a/src/intel/common/gen_decoder.h > +++ b/src/intel/common/gen_decoder.h > @@ -34,6 +34,10 @@ > extern "C" { > #endif The line above shows you can drop this patch. > > +#ifdef __cplusplus > +extern "C" { > +#endif > + > struct gen_spec; > struct gen_group; > struct gen_field; > @@ -185,6 +189,9 @@ void gen_print_group(FILE *out, >struct gen_group *group, >uint64_t offset, const uint32_t *p, >bool color); > +#ifdef __cplusplus > +} > +#endif > > #ifdef __cplusplus > } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU
Hi all, Just a gentle poke. Even though a serious issue was already found by Chris Wilson on batchbuffer migration, I would like for folks to look at the series (in particular the monster patch 16) and give comments. With those comments I will then create a v2 (indeed I've already implemented fixes for the issue that Chris pointed out on batchbuffer migration and a pair of issues I realized on the script at patch 17). Best Regards, -Kevin -Original Message- From: Rogovin, Kevin Sent: Wednesday, September 27, 2017 2:38 PM To: Chris Wilson ; mesa-dev@lists.freedesktop.org Subject: RE: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU Hi, If we just want to send to the kernel the data from the trace, I can do that very easily; just make such a GEM BO, comprising of dword-pairs of (TraceCallID, BatchbufferOffset). That will be a small buffer and together with the apitrace file, will give complete data. I could probably make such a dedicated tool quite quickly, or add that functionality to the logger. -Kevin -Original Message- From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] Sent: Wednesday, September 27, 2017 1:21 PM To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org Subject: RE: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU Quoting Rogovin, Kevin (2017-09-27 07:53:29) > Hi, > > Right now the way the thing works is that it walks the batchbuffer just > after the kernel returns from the ioctl and updates its internal view of the > GPU state as it walks and emits to the log file the data. The log on a single > batchbuffer is (essentially) just a list of call ID's from the apitrace > together of "where in the batchbuffer" that call started. > > I confess that I had not realized the potential application for using > something like this to help diagnose GPU hangs! I think it is a really good > idea. What I could do is the following (and it is not terribly hard to do): > >1. -BEFORE- issuing the ioctl, the logger walks just the api markers in > the log of the batchbuffer, and makes a new GEM BO filled with apitrace data > (call ID, and maybe GL function data) and modify the ioctl to have an extra > buffer. Yes. With the current intel_batchbuffer.c this should be relatively easy (I suggest you limit yourself to recent kernels for that simplification); see EXEC_BATCH_FIRST and remember to mark the trace bo as EXEC_OBJECT_CAPTURE. > 2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM > BO; In order to read the GPU state more accurately I need to walk the log and > update the GPU state after the ioctl (mostly out of paranoia for values > copied from BO's to pipeline registers). Up to, but my paranoia goes the other way. Once the ioctl returns the hw is indeed using that memory, so I have less trust of it. If you need to tie the relocated pointers to the trace, I would also emit relocations into the trace. For the reasons of port-mortem GPU hang debugging, I would want the execbuf be complete before the ioctl, rather than post processing. > What would happen, is that if a batchbuffer made the GPU hang, you would then > know all the GL commands (trace ID's from the API trace) that made stuff on > that batchbuffer. Then one could go back to the apitrace of the troublesome > application and have a much better starting place to debug. Yup. As times go on, I hope this becomes a more complete flight-recorder that we don't have to rely on referencing back to a separate trace to work out the interesting calls. My goal is that you can give one instruction (that doesn't require any additional dependencies, so can just be LD_PRELOAD=i965-fdr.so, or better a script installed in mesa-utils?) to a bug reporter and that will then capture enough information. > We could also do something evil looking and put another modification on > apitrace where it can have a list of call trace ranges where it inserts > glFinish after each call. Those glFinish()'s will then force the ioctl of the > exact troublesome draw call without needing to tell i965 to flush after each > draw call. > > Just to make sure, you want the "apitrace" data (call ID list, maybe function > name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug > code know which one to use to dump? I would guess if the batchbuffer is the > first buffer, then it would be the last buffer, otherwise if the batch buffer > is the last one, I guess it would be one just before, but that might screw up > reloc-data if any of the relocs in the batchbuffer refer to itself. I can > also emit the data to a file and close the file before the ioctl and if the > ioctl returns, delete said file (assuming a GPU hang always stops the > process, then a hang
Re: [Mesa-dev] [PATCH 06/22] i965: Enable BatchbufferLogger in i965 driver
>My worry is that batch->bo is not constant for the construction of a single >execbuf2. > If intel_batchbuffer.c runs out of room inside the batch->bo, it will > allocate a new one and continue building it. That will be ok, but if the (fd, GEM BO handle) changes, there is a serious issue. > I'm just mentioning that may not be the same handle as some of the earlier > state calls. This is a serious issue; sighs. The Logger will not miss any GPU state (since that is handled at ioctl interception time), but it will get the GL/GLES call ID's royally hosed because the api call markers will be on the log associated to that (old) (fd, GEM handle) pair. Futz. The solution is to add another function to the driver interface that the driver calls when it "migrates" stuff from one BO to another (i.e. when it grows a batchbuffer to fit in that last draw call). Good catch on that, I had utterly forgotten that the batchbuffer split lets i965 "grow" the batchbuffer to fit in that one last call. All the stuff I have run on it so far have not hit that, but there will be loads where it gets hit. I will fix this ASAP. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU
Hi, If we just want to send to the kernel the data from the trace, I can do that very easily; just make such a GEM BO, comprising of dword-pairs of (TraceCallID, BatchbufferOffset). That will be a small buffer and together with the apitrace file, will give complete data. I could probably make such a dedicated tool quite quickly, or add that functionality to the logger. -Kevin -Original Message- From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] Sent: Wednesday, September 27, 2017 1:21 PM To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org Subject: RE: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU Quoting Rogovin, Kevin (2017-09-27 07:53:29) > Hi, > > Right now the way the thing works is that it walks the batchbuffer just > after the kernel returns from the ioctl and updates its internal view of the > GPU state as it walks and emits to the log file the data. The log on a single > batchbuffer is (essentially) just a list of call ID's from the apitrace > together of "where in the batchbuffer" that call started. > > I confess that I had not realized the potential application for using > something like this to help diagnose GPU hangs! I think it is a really good > idea. What I could do is the following (and it is not terribly hard to do): > >1. -BEFORE- issuing the ioctl, the logger walks just the api markers in > the log of the batchbuffer, and makes a new GEM BO filled with apitrace data > (call ID, and maybe GL function data) and modify the ioctl to have an extra > buffer. Yes. With the current intel_batchbuffer.c this should be relatively easy (I suggest you limit yourself to recent kernels for that simplification); see EXEC_BATCH_FIRST and remember to mark the trace bo as EXEC_OBJECT_CAPTURE. > 2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM > BO; In order to read the GPU state more accurately I need to walk the log and > update the GPU state after the ioctl (mostly out of paranoia for values > copied from BO's to pipeline registers). Up to, but my paranoia goes the other way. Once the ioctl returns the hw is indeed using that memory, so I have less trust of it. If you need to tie the relocated pointers to the trace, I would also emit relocations into the trace. For the reasons of port-mortem GPU hang debugging, I would want the execbuf be complete before the ioctl, rather than post processing. > What would happen, is that if a batchbuffer made the GPU hang, you would then > know all the GL commands (trace ID's from the API trace) that made stuff on > that batchbuffer. Then one could go back to the apitrace of the troublesome > application and have a much better starting place to debug. Yup. As times go on, I hope this becomes a more complete flight-recorder that we don't have to rely on referencing back to a separate trace to work out the interesting calls. My goal is that you can give one instruction (that doesn't require any additional dependencies, so can just be LD_PRELOAD=i965-fdr.so, or better a script installed in mesa-utils?) to a bug reporter and that will then capture enough information. > We could also do something evil looking and put another modification on > apitrace where it can have a list of call trace ranges where it inserts > glFinish after each call. Those glFinish()'s will then force the ioctl of the > exact troublesome draw call without needing to tell i965 to flush after each > draw call. > > Just to make sure, you want the "apitrace" data (call ID list, maybe function > name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug > code know which one to use to dump? I would guess if the batchbuffer is the > first buffer, then it would be the last buffer, otherwise if the batch buffer > is the last one, I guess it would be one just before, but that might screw up > reloc-data if any of the relocs in the batchbuffer refer to itself. I can > also emit the data to a file and close the file before the ioctl and if the > ioctl returns, delete said file (assuming a GPU hang always stops the > process, then a hang would leave behind a file). My vision is that you would attach all "files" to the execbuf, but then again I'm focusing on fdr and not debugging of new features. So long as we are talking about a few megabytes of trace data that isn't too bad. Then we don't have to fiddle around with extra files to find the ones corresponding to the hang, as they will be recorded in the error state. The contents I leave up to you :) (I figure it is a snowball, once a tracing mechanism exists for capturing GPU hangs, there'll be lots of suggestions! One is probably just to capture the aub annotations alongside the batch. Hmm, th
Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU
HI, In spirit, stuffing data into MI_NOOP is nicer since then one can just rely on aubinator to read that data and go to town. The main issues I see are the following. 1. One needs to now insert MI_NOOP's into the command buffer in order to insert strings. This changes what is sent to the GPU (though one can argue that MI_NOOP should not really matter). The big nasty potential change is in situations where the command buffer approaches full, with the MI_NOOP it fills faster and thus that dramatically changes what a driver sends to the GPU since a new batchbuffer triggers more state and -FLUSHES-. 2. It means more modifications to the driver in order to insert the messages. 3. The driver needs to somehow get a call-id from the application in order to know what value to place in the MI_NOOP. The worst issue (for me) is #1; #3 is solveable-ish by making some function pointer available to set the value to stuff in the MI_NOOP unused bits. Issue #2 is quite icky because I have more in mind for the logger than Mesa/i965 and I want to keep the work to add it to a driver to a bare minimum. FWIW, when I started this, I wanted to do it via aub-dumper and aubinator where they would produce auxiliary files that had the necessary data to know what in it came from where. But the more I looked at the issues I wanted to solve, the more trickier it seemed to me to use aubdumper and aubinator to accomplish that. -Original Message- From: Landwerlin, Lionel G Sent: Wednesday, September 27, 2017 12:35 PM To: Rogovin, Kevin ; Chris Wilson ; mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU A few months ago I implemented debug messages in the command stream by stuffing the unused bits of MI_NOOP : https://patchwork.freedesktop.org/series/26079/ Aubinator would then read the bits and print the messages. We might be able to reuse similar idea to get away with any external interface. Instead of having strings of characters we could put a marker with the BO handle that could be used to store all of the metadata about a particular draw call. What do you think? - Lionel On 27/09/17 07:53, Rogovin, Kevin wrote: > Hi, > > Right now the way the thing works is that it walks the batchbuffer just > after the kernel returns from the ioctl and updates its internal view of the > GPU state as it walks and emits to the log file the data. The log on a single > batchbuffer is (essentially) just a list of call ID's from the apitrace > together of "where in the batchbuffer" that call started. > > I confess that I had not realized the potential application for using > something like this to help diagnose GPU hangs! I think it is a really good > idea. What I could do is the following (and it is not terribly hard to do): > > 1. -BEFORE- issuing the ioctl, the logger walks just the api markers in > the log of the batchbuffer, and makes a new GEM BO filled with apitrace data > (call ID, and maybe GL function data) and modify the ioctl to have an extra > buffer. > >2. -AFTER- the ioctl returns, emit the log data (as now) and delete the > GEM BO; In order to read the GPU state more accurately I need to walk the log > and update the GPU state after the ioctl (mostly out of paranoia for values > copied from BO's to pipeline registers). > > What would happen, is that if a batchbuffer made the GPU hang, you would then > know all the GL commands (trace ID's from the API trace) that made stuff on > that batchbuffer. Then one could go back to the apitrace of the troublesome > application and have a much better starting place to debug. > > We could also do something evil looking and put another modification on > apitrace where it can have a list of call trace ranges where it inserts > glFinish after each call. Those glFinish()'s will then force the ioctl of the > exact troublesome draw call without needing to tell i965 to flush after each > draw call. > > Just to make sure, you want the "apitrace" data (call ID list, maybe function > name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug > code know which one to use to dump? I would guess if the batchbuffer is the > first buffer, then it would be the last buffer, otherwise if the batch buffer > is the last one, I guess it would be one just before, but that might screw up > reloc-data if any of the relocs in the batchbuffer refer to itself. I can > also emit the data to a file and close the file before the ioctl and if the > ioctl returns, delete said file (assuming a GPU hang always stops the > process, then a hang would leave behind a file). > > Let me know, what is best, and I will do it. > > -Kevin > > > -Original Message- > From: Chris Wilson [mai
Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU
Hi, Sighs, I forget one -critical- issue on decoding the batchbuffer: it needs to reloc data from the kernel to have any chance of correctly decoding things referred to by the batch buffer (which is oodles of stuff), thus decode can only happen after kernel succeeds. However, I can make it emit a separate file before the call to the kernel giving atleast the apitrace data (and perhaps the filename to give an indication too) and delete the file after the ioctl returns; from there it is straight forward to have n files alive to see the last n execbuffer2 ioctls. Though, if the logger is to send the api-trace call id's to the kernel on execbuffer2, then this does not matter. -Kevin -Original Message- From: Rogovin, Kevin Sent: Wednesday, September 27, 2017 9:53 AM To: 'Chris Wilson' ; mesa-dev@lists.freedesktop.org Subject: RE: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU Hi, Right now the way the thing works is that it walks the batchbuffer just after the kernel returns from the ioctl and updates its internal view of the GPU state as it walks and emits to the log file the data. The log on a single batchbuffer is (essentially) just a list of call ID's from the apitrace together of "where in the batchbuffer" that call started. I confess that I had not realized the potential application for using something like this to help diagnose GPU hangs! I think it is a really good idea. What I could do is the following (and it is not terribly hard to do): 1. -BEFORE- issuing the ioctl, the logger walks just the api markers in the log of the batchbuffer, and makes a new GEM BO filled with apitrace data (call ID, and maybe GL function data) and modify the ioctl to have an extra buffer. 2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM BO; In order to read the GPU state more accurately I need to walk the log and update the GPU state after the ioctl (mostly out of paranoia for values copied from BO's to pipeline registers). What would happen, is that if a batchbuffer made the GPU hang, you would then know all the GL commands (trace ID's from the API trace) that made stuff on that batchbuffer. Then one could go back to the apitrace of the troublesome application and have a much better starting place to debug. We could also do something evil looking and put another modification on apitrace where it can have a list of call trace ranges where it inserts glFinish after each call. Those glFinish()'s will then force the ioctl of the exact troublesome draw call without needing to tell i965 to flush after each draw call. Just to make sure, you want the "apitrace" data (call ID list, maybe function name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug code know which one to use to dump? I would guess if the batchbuffer is the first buffer, then it would be the last buffer, otherwise if the batch buffer is the last one, I guess it would be one just before, but that might screw up reloc-data if any of the relocs in the batchbuffer refer to itself. I can also emit the data to a file and close the file before the ioctl and if the ioctl returns, delete said file (assuming a GPU hang always stops the process, then a hang would leave behind a file). Let me know, what is best, and I will do it. -Kevin -Original Message- From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] Sent: Tuesday, September 26, 2017 11:20 PM To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU Quoting Rogovin, Kevin (2017-09-26 10:35:44) > Hi, > > Attached to this message are the following: > 1. a file giving example usage of the tool with a modified > apitrace to produce json output > > 2. the patches to apitrace to make it BatchbufferLogger aware > > 3. the JSON files (gzipped) made from the example. > > > I encourage (and hope) people will take a look at the JSON to see the > potential of the tool. The automatic apitrace-esque logging seems very useful. How easy would it be to write that trace into a bo and associate with the execbuffer (from my pov, it should be that hard)? That way you could get the most recent actions before a GPU hang, attach them to a bug and decode them at leisure. (An extension may be to keep a ring of the last N traces so that you can see some setup a few batches ago that triggered a hang in this one.) I presume you already have such a plan, and I'm just preaching to the choir. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU
Hi, Right now the way the thing works is that it walks the batchbuffer just after the kernel returns from the ioctl and updates its internal view of the GPU state as it walks and emits to the log file the data. The log on a single batchbuffer is (essentially) just a list of call ID's from the apitrace together of "where in the batchbuffer" that call started. I confess that I had not realized the potential application for using something like this to help diagnose GPU hangs! I think it is a really good idea. What I could do is the following (and it is not terribly hard to do): 1. -BEFORE- issuing the ioctl, the logger walks just the api markers in the log of the batchbuffer, and makes a new GEM BO filled with apitrace data (call ID, and maybe GL function data) and modify the ioctl to have an extra buffer. 2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM BO; In order to read the GPU state more accurately I need to walk the log and update the GPU state after the ioctl (mostly out of paranoia for values copied from BO's to pipeline registers). What would happen, is that if a batchbuffer made the GPU hang, you would then know all the GL commands (trace ID's from the API trace) that made stuff on that batchbuffer. Then one could go back to the apitrace of the troublesome application and have a much better starting place to debug. We could also do something evil looking and put another modification on apitrace where it can have a list of call trace ranges where it inserts glFinish after each call. Those glFinish()'s will then force the ioctl of the exact troublesome draw call without needing to tell i965 to flush after each draw call. Just to make sure, you want the "apitrace" data (call ID list, maybe function name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug code know which one to use to dump? I would guess if the batchbuffer is the first buffer, then it would be the last buffer, otherwise if the batch buffer is the last one, I guess it would be one just before, but that might screw up reloc-data if any of the relocs in the batchbuffer refer to itself. I can also emit the data to a file and close the file before the ioctl and if the ioctl returns, delete said file (assuming a GPU hang always stops the process, then a hang would leave behind a file). Let me know, what is best, and I will do it. -Kevin -Original Message- From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] Sent: Tuesday, September 26, 2017 11:20 PM To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU Quoting Rogovin, Kevin (2017-09-26 10:35:44) > Hi, > > Attached to this message are the following: > 1. a file giving example usage of the tool with a modified > apitrace to produce json output > > 2. the patches to apitrace to make it BatchbufferLogger aware > > 3. the JSON files (gzipped) made from the example. > > > I encourage (and hope) people will take a look at the JSON to see the > potential of the tool. The automatic apitrace-esque logging seems very useful. How easy would it be to write that trace into a bo and associate with the execbuffer (from my pov, it should be that hard)? That way you could get the most recent actions before a GPU hang, attach them to a bug and decode them at leisure. (An extension may be to keep a ring of the last N traces so that you can see some setup a few batches ago that triggered a hang in this one.) I presume you already have such a plan, and I'm just preaching to the choir. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 05/22] i965: correctly assign SamplerCount of INTERFACE_DESCRIPTOR_DATA
Hi, AFAIK, for all shader types both binding table entry count and number of samplers can be zero; the hardware uses those values to pre-fetch data. The docs say one may wish to leave it at zero if there is a risk of state thrashing if the number (for either) is large. FWIW, my main reason for fixing these two values was so that the logger could decode the binding tables and samplers correctly. If people want to push these 2 patches without having the entire series RB'd, that is fine with me, I am going to need to rebase anyways as this is just an RFC. Moreover all the patches that are fixes/enhancements to existing code, i.e. patches 0004-0005 and patches 0007-0015, I would be happy to see them pushed even if the main thrush of this series, the BatchbufferLogger, has comments (which I think it will). -Kevin -Original Message- From: Justen, Jordan L Sent: Tuesday, September 26, 2017 2:53 AM To: Landwerlin, Lionel G ; Rogovin, Kevin ; mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 05/22] i965: correctly assign SamplerCount of INTERFACE_DESCRIPTOR_DATA On 2017-09-25 05:46:32, Lionel Landwerlin wrote: > I'm genuinely surprised we didn't noticed this problem before :| Indeed. Did jenkins show any 'fixes' from this patch? I think this patch should be handled separately from this RFC series. Reviewed-by: Jordan Justen > Fixes: 71bfb44005bf ("i965: Port brw_cs_state tracked state to genxml.") > Reviewed-by: Lionel Landwerlin > Cc: "17.2" > > On 25/09/17 11:34, kevin.rogo...@intel.com wrote: > > From: Kevin Rogovin > > > > Signed-off-by: Kevin Rogovin > > --- > > src/mesa/drivers/dri/i965/genX_state_upload.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c > > b/src/mesa/drivers/dri/i965/genX_state_upload.c > > index bbb47ea..32c7d22 100644 > > --- a/src/mesa/drivers/dri/i965/genX_state_upload.c > > +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c > > @@ -4197,7 +4197,7 @@ genX(upload_cs_state)(struct brw_context *brw) > > const struct GENX(INTERFACE_DESCRIPTOR_DATA) idd = { > > .KernelStartPointer = brw->cs.base.prog_offset, > > .SamplerStatePointer = stage_state->sampler_offset, > > - .SamplerCount = DIV_ROUND_UP(stage_state->sampler_count, 4) >> 2, > > + .SamplerCount = DIV_ROUND_UP(stage_state->sampler_count, 4), > > .BindingTablePointer = stage_state->bind_bo_offset, > > .BindingTableEntryCount = prog_data->binding_table.size_bytes / 4, > > .ConstantURBEntryReadLength = cs_prog_data->push.per_thread.regs, > > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/22] i965: assign BindingTableEntryCount of INTERFACE_DESCRIPTOR_DATA
Hi, The value is the number of entries, not the size in bytes; BindingTableEntryCount for compute/media is programmed the same way as for all the other shader stages (where they too are divided by 4). FWIW, the hardware will work without this patch because genX initializes the value as 0. When the value is zero, it means the GPU will not prefetch the binding table entries. -Kevin -Original Message- From: Landwerlin, Lionel G Sent: Monday, September 25, 2017 3:10 PM To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 04/22] i965: assign BindingTableEntryCount of INTERFACE_DESCRIPTOR_DATA On 25/09/17 11:34, kevin.rogo...@intel.com wrote: > From: Kevin Rogovin > > Signed-off-by: Kevin Rogovin > --- > src/mesa/drivers/dri/i965/genX_state_upload.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c > b/src/mesa/drivers/dri/i965/genX_state_upload.c > index 6127616..bbb47ea 100644 > --- a/src/mesa/drivers/dri/i965/genX_state_upload.c > +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c > @@ -4199,6 +4199,7 @@ genX(upload_cs_state)(struct brw_context *brw) > .SamplerStatePointer = stage_state->sampler_offset, > .SamplerCount = DIV_ROUND_UP(stage_state->sampler_count, 4) >> 2, > .BindingTablePointer = stage_state->bind_bo_offset, > + .BindingTableEntryCount = prog_data->binding_table.size_bytes / > + 4, I'm not sure this should be divided by 4. The documentation doesn't seemto say it behaves like SamplerCount. > .ConstantURBEntryReadLength = cs_prog_data->push.per_thread.regs, > .NumberofThreadsinGPGPUThreadGroup = cs_prog_data->threads, > .SharedLocalMemorySize = encode_slm_size(GEN_GEN, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: add 2xMSAA and 16xMSAA to DRI configs for Gen9.
Hi, Please do not push yet. I will post a V2 that makes the pointer thing not so offensively ugly. Also, I missed the opportunity to also fix the DRI conf for Gen8, as it does have 2xMSAA. -Kevin -Original Message- From: Ben Widawsky [mailto:b...@bwidawsk.net] Sent: Thursday, August 24, 2017 8:10 PM To: Rogovin, Kevin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH] i965: add 2xMSAA and 16xMSAA to DRI configs for Gen9. On 17-08-24 14:16:39, kevin.rogo...@intel.com wrote: >From: Kevin Rogovin > >Special thanks to Eero Tamminen for reporting rasterizer numbers being >twice what it should be for 2xMSAA under a benchmark. > >Signed-off-by: Kevin Rogovin >--- > src/mesa/drivers/dri/i965/intel_screen.c | 14 +++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > >diff --git a/src/mesa/drivers/dri/i965/intel_screen.c >b/src/mesa/drivers/dri/i965/intel_screen.c >index 579554f..67eb776 100644 >--- a/src/mesa/drivers/dri/i965/intel_screen.c >+++ b/src/mesa/drivers/dri/i965/intel_screen.c >@@ -1882,7 +1882,7 @@ intel_screen_make_configs(__DRIscreen *dri_screen) >}; > >static const uint8_t singlesample_samples[1] = {0}; >- static const uint8_t multisample_samples[2] = {4, 8}; >+ static const uint8_t multisample_samples_2_4_8_16[] = {2, 4, 8, >+ 16}; > >struct intel_screen *screen = dri_screen->driverPrivate; >const struct gen_device_info *devinfo = &screen->devinfo; @@ >-1959,6 +1959,7 @@ intel_screen_make_configs(__DRIscreen *dri_screen) > * supported. Singlebuffer configs are not supported because no one wants > * them. > */ >+ No unnecessary whitespace changes, please. >for (unsigned i = 0; i < ARRAY_SIZE(formats); i++) { > if (devinfo->gen < 6) > break; >@@ -1966,6 +1967,7 @@ intel_screen_make_configs(__DRIscreen *dri_screen) > __DRIconfig **new_configs; > const int num_depth_stencil_bits = 2; > int num_msaa_modes = 0; >+ const uint8_t *multisample_samples = NULL; > > depth_bits[0] = 0; > stencil_bits[0] = 0; >@@ -1978,10 +1980,16 @@ intel_screen_make_configs(__DRIscreen *dri_screen) > stencil_bits[1] = 8; > } > >- if (devinfo->gen >= 7) >+ if (devinfo->gen >= 9) { >+ multisample_samples = multisample_samples_2_4_8_16; >+ num_msaa_modes = 4; >+ } else if (devinfo->gen >= 7) { >+ multisample_samples = multisample_samples_2_4_8_16 + 1; > num_msaa_modes = 2; >- else if (devinfo->gen == 6) >+ } else if (devinfo->gen == 6) { >+ multisample_samples = multisample_samples_2_4_8_16 + 1; > num_msaa_modes = 1; >+ } I think it'd be a little cleaner to just make GEN specific arrays. Easier to read, and you can just USE ARRAY_SIZE but I honestly don't care much. if (devinfo->gen >= 9) { multisample_samples = multisample_samples_gen9; num_msaa_modes = ARRAY_SIZE(multisample_samples_gen9); } > > new_configs = driCreateConfigs(formats[i], > depth_bits, Kind of shocking to me that we missed this previously for both when we added 2x MSAA and later 16x. Indeed looking at glxinfo, I see no 2x or 16x visuals. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] Differentiate between DeleteQueries and DeleteQueriesARB
Hello, > typo -> ...query (same goes for patch 1/2) I hate typos. Thanks for pointing it out. > Please correct me if I'm wrong, but I think we cannot unalias functions once > they're in. > It will break the backwards compatibility we're trying to manage with glapi. > If we want to > retain it we should opt for the least likely to hit solution ? I confess I do not understand. What exactly is being compatible with what? > Either I'm misreading the comment above the function or this doesn't belong > here. If I understood the code correctly, that is setting up a function table for when GL state is in a glBegin/glEnd pair. I added the ARB function because the non-ARB one was there already. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 2/2] mesa/es3.1: Limit Framebuffer Parameter OpenGL ES 3.1 usage
Reviewed-by: Kevin Rogovin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 1/2] mesa/es3.1: Expose GL_ARB_framebuffer_no_attachments to GLES 3.1
Reviewed-by: Kevin Rogovin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box of gl_framebuffer
-Original Message- From: Rogovin, Kevin Sent: Wednesday, June 10, 2015 8:56 AM To: 'Ian Romanick'; mesa-dev@lists.freedesktop.org Subject: RE: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box of gl_framebuffer Hi, -Original Message- From: Ian Romanick [mailto:i...@freedesktop.org] Sent: Wednesday, June 10, 2015 1:06 AM To: Rogovin, Kevin; mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box of gl_framebuffer >All of the changes in this function below this point seem unnecessary. >If they are necessary, please explain why. If they're not, mixing cosmetic >changes and functional >changes is bad... don't do it. :) That also avoids >the issue of where to put the "+". In truth, the raw diff hides the fact that this is actually the body of a new function taken from an old function. In v1 (and I think v2) I did not use xmax and ymax, but the code was severely over the 80 column limit and did a re-compute (which a reasonable compiler should likely realize and not do in compiled code). I am totally happy to make the body of the new function to be essentially identical to the body of the old function and thus leave it going past 80-columns and doing the re-compute. Would that be better? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box of gl_framebuffer
Hi, -Original Message- From: Ian Romanick [mailto:i...@freedesktop.org] Sent: Wednesday, June 10, 2015 1:06 AM To: Rogovin, Kevin; mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box of gl_framebuffer >All of the changes in this function below this point seem unnecessary. >If they are necessary, please explain why. If they're not, mixing cosmetic >changes and functional >changes is bad... don't do it. :) That also avoids >the issue of where to put the "+". In truth, the raw diff hides the fact that this is actually the body of a new function taken from an old function. In v1 (and I think v2) I did not use xmax and ymax, but the code was severely over the 80 column limit and did a re-compute (which a reasonable compiler should likely realize and not do in compiled code). I am totally happy to make the body of the new function to be essentially identical to the body of the old function and thus leave it going past 80-columns and doing the re-compute. Would that be better? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box of gl_framebuffer
> Out of curiosity, what editor are you using? Usually setting the > indentation rules in your editor takes care of most of these problems. I am using emacs23 (under Ubuntu) and now for something amusing... > We have a .dir-locals.el file that should provide the correct settings > for Emacs. emacs23 rejected that .el file, switching to emacs24 solved the fiasco. Shudders. IF I had had a new emacs all the tabbing nits would have never appeared. Oh well. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v4 PATCH 04/10] mesa: add helper functions for geometry of gl_framebuffer
> You should wait until you get a "real" review from someone before reposting. I think that is good advice. Indeed, I am not going to post a v5 of this series until for each patch there is: Two review bys, possibly qualified with "fix the mentioned formatting issues" OR Specific change requests. The patch series has not really changed in any meaningful way (in terms of compiled output) since v2. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v4 PATCH 04/10] mesa: add helper functions for geometry of gl_framebuffer
Hi, >> +static inline GLuint >Here and below, why 2 spaces between "inline" and "GLuint"? I have no clue. I suspect it is a scar from some search/replace fiasco over 3 weeks ago. You are the first person who spotted that nit. >> --- a/src/mesa/main/mtypes.h >> +++ b/src/mesa/main/mtypes.h >> @@ -3187,7 +3187,13 @@ struct gl_framebuffer >> * GL_ARB_framebuffer_no_attachments must check for the flag >> _HasAttachments >> * and if GL_FALSE, must then use the values in DefaultGeometry to >> initialize > > * its viewport, scissor and so on (in particular _Xmin, _Xmax, _Ymin and > >-* _Ymax do NOT take into account _HasAttachments being false) > >+* _Ymax do NOT take into account _HasAttachments being false). To get > >the > >+* geometry of the framebuffer, the helper functions > Why 2 spaces between "the" and "helper"? No clue. You are again the first person to spot that space. It would be nice to get a "fix these nits I found and get a reviewed by" badge. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box of gl_framebuffer
Hi, > 44 instances of a leading + in mesa/main compared to 78 trailing ones. > Huh, I was going to say that it's really uncommon to do this in mesa, but > that may not be supported by fact. Considering there is a formatting issue below, I can change it to trailing. I don’t care. >> + if (ymax < bbox[3]) { >> +bbox[3] = ymax; >indent looks messed up here. Sighs, because it is. I missed that line, and I have missed that one since v2. I really wish I had a script to check the indenting. Sighs. > I don't think the "extern" isn't really necessary here. A lot of older code > still has it, but you don't have to stick it in on newer additions. I can kill the extern, I don't care; however, all the previous existing functions in the header had extern on them. Is it better for consistency within the file to keep the extern? >> >> + > What's with the extra newline? Double sighs. Apparently, my git skills are lacking and I let that added line sneak in when I had killed it from Patch 4. > static inline GLuint > _mesa_geometric_height(const struct gl_framebuffer *buffer) { > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v3 PATCH 01/10] mesa:Define extension ARB_framebuffer_no_attachments
> FWIW, the kernel standards for commit messages are at: > https://www.kernel.org/doc/Documentation/SubmittingPatches > Most of those rules apply to Mesa too. It says the body should be wrapped to > 75 chars (although I've been using 72 like Matt said). This is my point: "use most rules, but not all".. and "I've been more conservative than X but I did not need to be". What I am seeing is that there is, in some collective form, a set of consistent rules (in the form of ranges) that are strongly enforced and yet not written down. Let's write them all down here and now, put them in some file for others to read and to refer to. Maybe even con someone to write a lint-like script for those rules. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v3 PATCH 01/10] mesa:Define extension ARB_framebuffer_no_attachments
> I suppose it could be useful, but I think we've been mostly successful at > just expecting people to recognize when what they're writing doesn't look > like the code around it. This is my point. Older code had different style/expectations than newer code. For this patch series, I have hit a number of nits and quasi-nits that are ambiguous: 0. In v1 to v2, avoid GL types atleast in i965 although much of the code uses that (the explanation given was to move away from GL types). Does this apply to Mesa core as well? 1. the casting bit in the i965 patch to use the _mesa_geomety_ functions. These are not quite nits, but in some ways not a big deal. I hit these because there were ints there before. In this regard doing what was there before was ungood (atleast for this review). 2. the line between function thing. In truth I just missed that extra line for the added to framebuffer.h (and I should not have), but there are places in the code that there are multiple empty lines between function definitions. I do not mind saying "no extra empty lines", but not knowing the rules and attempting to infer them from the code, I seem to hit too many nits. 3. Even on the subject of git commit, I am seeing different answers: 75, but try 50 usually, but understandable if cannot do it. Utterly ambiguous. 4. on the subject of line length I have this so far: usually 78, but 80 sometimes is ok. Does ok, for example, include making a large-ish comment block more justified? 5. Consecutive empty lines is not ok, except in function definitions, but then only sometimes. I am guessing "sometimes" is for grouping function definitions, but plenty of files follow different conventions (hence what Brian Paul said). Given that nits just add traffic (and I want to minimize that silliness) I think it would be wise to set down some precise rules so there is no judgement calls required for something as silly as formatting nits. Ideally, we would have a lint script that would do it for us :) I see that reviews usually first hit nits, then content of patches. That is fine, but I'd rather know all the rules rather than hitting the nits at all. Again, I really have no preference since someone is paying me to do this, but knowing the exact rules would be more efficient. Inferring rules is quite error prone IMO (indeed, at one point, the GLSL pre-processor appeared to be written in a different style). -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v3 PATCH 01/10] mesa:Define extension ARB_framebuffer_no_attachments
HI, >Or 78 columns, to be safe, but there's exceptions, like if you're > defining a big static table/array of info. Uggg I don't mind exceptions, but knowing them is key. >> 4. successive parenthesis must have spaces between parenthesis > Example? if (some_func(some_argument)) is bad, but if (some_func(some_argument) ) is good. I am also guessing that a = foo(bar(x)); is bad and must be changed to a = foo(bar(x) ); >> 6. Use "whether condition" when describing a bool instead of "true if >> condition is true" > not sure we care about that. I hit a nit for it in the series, so someone cared. > 7. derived values of structs -should- be prefixed with an underscore (this > rule is loaded with exceptions in the code base from its evolution) > 8. "indenting" is 3 spaces, except after switch where it is 0 (but after > case it is 3). > 9. open bracket on same line The 'indent' command in the docs should cover that. > > 11. functions for an extension must check if extension is supported and if > > not emit an INVALID_OPERATION message instead of relying on function table > > dispatch to guarantee they are not called. > Not sure about that, but that's not a coding style standard. Perhaps coding standard is not the right word, but it is a requirement (atleast it seems that way) and is a trivial requirement to satisfy. >> 12. (Guessing here) consecutive empty lines are not allowed > Generally true, except between functions. Ugg... I hit a nit from an extra space between functions. > 13. If changing a line that violates the nit rules, fix the line too rather > than just adding the change Yeah, probably. > Feel free to submit a patch against docs/devinfo.html with this info. :) I do not mind submitting the patch, but I want to know what the rules are; preferably explicitly written rather than inferred (by me). Especially since I seem to hit nits like no tomorrow even when trying not to :) -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v3 PATCH 06/10] i965: Use _mesa_geometry_ functions appropriately
> Strange because of the types -- presumably fb_samples is an int because its > uses are in a comparison with another int (that probably doesn't need to be > an int :) I am really hesitant to start changing types all over the place; I have a sinking suspicion that changing the types of fb_width, _height and so on to unsigned int might require some careful checking (the minus ones here and there need some checking. If it really must be done then so be it, but the current series preserves more of the old behavior and I want to try to not add even more changes. So just to ask explicitly: do you want that those values fb_width = _mesa_geometry_width() and related buggers to be unsigned int and go through the process of making sure unsigned int is ok? There are some uglies related to a few "minus one" 's that make me twitch. Lastly, when I posted v2 of this series, no one commented on the use of int's. For the v1, Topi requested to have some values start as floats to avoid later casts (and this was done). -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v3 PATCH 01/10] mesa:Define extension ARB_framebuffer_no_attachments
> This line is too long. (It will not fit in 80 columns in git log since git > log adds some spaces before each commit message line.) What is the accepted maximum length for a line in a commit message? >> - extension table >> - additions to gl_framebuffer >> >> v1 -> v2 >> Spacing and trailing spaces fixes. >This looks odd to me. I think you only need 'v2:' here. So, either I have seen a number of patches with the notation v1 -> v2 to list changes between versions. Those patches that I saw using that notation did not have comments about using the format v1->v2. If people want v2: instead of v1->v2, I am fine with it, but was just following what I saw in some patch series. Given the number of nits around (that I seem to hit regularly), it might be beneficial for Mesa to have a short document listing the formatting requirements, of which I have so far collected: 1. 80 column limit in source files 2. Justify comments to 80 columns as well 3. comparison expressions require spaces around both sides of comparison operator 4. successive parenthesis must have spaces between parenthesis 5. git commit messages have limit of N characters per line (the value of N I am asking above). 6. Use "whether condition" when describing a bool instead of "true if condition is true" 7. derived values of structs -should- be prefixed with an underscore (this rule is loaded with exceptions in the code base from its evolution) 8. "indenting" is 3 spaces, except after switch where it is 0 (but after case it is 3). 9. open bracket on same line 10. no white spaces at end of line 11. functions for an extension must check if extension is supported and if not emit an INVALID_OPERATION message instead of relying on function table dispatch to guarantee they are not called. 12. (Guessing here) consecutive empty lines are not allowed 13. If changing a line that violates the nit rules, fix the line too rather than just adding the change I suspect there are more as I listen to the nits, I think it might be beneficial to collect all the formatting nits and write them down to the coding standard thing in Mesa so that others can refer to it. Especially useful for those that work on multiple projects with different coding standards. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v3 PATCH 06/10] i965: Use _mesa_geometry_ functions appropriately
>> src/mesa/drivers/dri/i965/brw_misc_state.c | 10 +++--- > > src/mesa/drivers/dri/i965/brw_sf_state.c | 8 > As is this? brw_misc_state.c is active for all Gens. The change to brw_sf_state.c is to add a comment warning that using gl_framebuffer::Width and so on is only ok if _HasAttachments is false. > BEGIN_BATCH(4); > OUT_BATCH(_3DSTATE_DRAWING_RECTANGLE << 16 | (4 - 2)); > OUT_BATCH(0); /* xmin, ymin */ > - OUT_BATCH(((ctx->DrawBuffer->Width - 1) & 0x) | > - ((ctx->DrawBuffer->Height - 1) << 16)); > + OUT_BATCH(((fb_width - 1) & 0x) | > + ((fb_height - 1) << 16)); > Remove the tab on this line and indent it properly while we're changing it. Actually it fits on one line, so that will be the change :) > + /* Accessing the fields Width and Height of > +* gl_framebuffer to produce the values to > +* program the viewport and scissor is fine > +* as long as the gl_framebuffer has atleast > +* one attachment. > Line wrapping... OK. > struct brw_state_flags state = brw->state.pipelines[pipeline]; > + int fb_samples = (int)_mesa_geometric_samples(ctx->DrawBuffer); > The cast looks strange Is there a spacing missing in the cast? or is it strange because of the types? > These casts look weird. (Happens elsewhere in this patch too). > Assuming brw_context::num_samples doesn't need to be signed, I'd be > inclined to change its type to unsigned and remove the casts. Grepping > for 'num_samples = ' shows some other instances of us implicitly > converting unsigned <-> int. I was hesitant to be changing types in the patch series. I wanted to minimize the changes and try to keep it consistent with what is left unchanged. If the types should be unsigned int, I can do that, but I would like to hear the opinions of others too. > Extra new line. OK. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v3 PATCH 03/10] mesa: Complete implementation for ARB_framebuffer_no_attachments in Mesa core
> Extra space between . and " Fix I will > Extra space before . Fix I will > Also, if anyone is ever grepping for MAX_FRAMEBUFFER_SAMPLES, they won't find > this. I'd move the whole word to the next line. Ok. > + * > + * The same requirements are also in place for GL 4.5, > + * Section 9.4.1 "Framebuffer Attachment Completeness", pg 310-311 > + * > + * However, this is a tighter restriction than previous version of GL. > + * In interest of better compatibility, we will not enforce these > + * restrictions. > + */ > Comment at the end of a block like this looks strange. I'm not sure what it's > commenting on. The purpose of the comment block is to list what the extension and GL4.5 add to the requirements for FBO happiness. The last sentence is writing out *now* that these added requirements, if enforced, risk breaking older applications and stating that the extra requirements will not be enforced. Would it be helpful at the top of the comment block to say "GL4.5/GL_ARB_framebuffer_no_attachments" added these requirements? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [v3 PATCH 02/10] mesa:Define constants and functions for ARB_framebuffer_no_attachment extension
> I'm happy to see new documentation, so thanks for writing it up! > But let's separate this from the functional changes related to implementing > the extension. (Didn't I give this comment last time?) If you did, I missed it. Unless there are objections, I'll remove this from the series and make a tiny patch later that is just the documentation. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/9] mesa: add helper convenience functions for fetching geometry of gl_framebuffer
Sorry, one more question, which I should have asked before, but neglected to: > If we go that route, we should probably just add a gl_framebuffer::Samples > field for uniformity. If this is done, then there are two ways to get the value: from the Visual field and from the new field; Naturally, all drivers now use the field from Visual. If gl_framebuffer::Samples or potentially a better named gl_framebuffer::_Samples is added, then should the field be removed from Visual and all references in drivers to use the new field? -Kevin -Original Message- From: Rogovin, Kevin Sent: Wednesday, May 06, 2015 1:00 PM To: 'Ian Romanick'; 'mesa-...@freedesktop.org' Subject: RE: [Mesa-dev] [PATCH 4/9] mesa: add helper convenience functions for fetching geometry of gl_framebuffer One more question: for patch 2 of the series, there was the request to change the type of _HasAttachments from GLboolean to bool. If the helper functions "survive", should the helper functions return unsigned int instead of GLuint? -Kevin -Original Message- From: Rogovin, Kevin Sent: Wednesday, May 06, 2015 10:28 AM To: 'Ian Romanick'; mesa-...@freedesktop.org Subject: RE: [Mesa-dev] [PATCH 4/9] mesa: add helper convenience functions for fetching geometry of gl_framebuffer > I'm waffling about this a bit. I'm concerned that people will use > buffer->Width when they should use the accessor. I don't see any > changes to core Mesa code to use the accessor, so I'm a little concerned that > some subtle, incorrect behavior is introduced... but there may well not be. > A common idiom in Mesa is to have an _Value field that is the derived value. > In many structures in mtypes.h you can see things like Enabled and > _ReallyEnabled. One is what the API sets, and the other is what the driver > uses that is based on the API setting and "other factors." In many ways, the > existing gl_framebuffer Width and Height fields are already derived values > based on the sizes of the attachments. >For at least Width, Height, and MaxNumLayers, it seems better to set the >existing fields differently when _HasAttachments is true. That reduces the >number of places where a person has to think about _HasAttachments >considerably... though perhaps there is something else that I'm not thinking >of? I was thinking of doing this when I was making the patch. The problem is that Width, Height and MaxNumLayers were used for two different purposes: 1. Specifying the intersection of the attached buffers (this affects blitting for example) 2. Specifying the "geometry" to send to a rasterizer Changing Width and Height to be the geometry made me itch because Width and Height are zero when there are no buffer attachments. I added a mess of comments about the meaning of Width and Height. My concern is that other parts of code use Width and Height being 0 to do other things. With this patch way, the meaning of Width and Height has not changed; instead a diver has to "know" what it is doing when it enables the extension. I am not particular about this though; I made it this way in an attempt to reduce the scope of changes from this patch series. If people want that the meaning of Width, Height and MaxSamplers changes I can go with that too, I just think it can increase the ickiness of making sure no unintended side effects are added is ickier. > If we go that route, we should probably just add a gl_framebuffer::Samples > field for uniformity. > + > + > +static inline GLuint > +_mesa_geometric_height(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? > + buffer->Height : buffer->DefaultGeometry.Height; } > + > +static inline GLuint > +_mesa_geometric_samples(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? > + buffer->Visual.samples : buffer->DefaultGeometry.NumSamples; > +} > + > +static inline GLuint > +_mesa_geometric_layers(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? > + buffer->MaxNumLayers : buffer->DefaultGeometry.Layers; } > + > extern void > _mesa_update_draw_buffer_bounds(struct gl_context *ctx); > > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index > ef97538..f0e8fbc 100644 > --- a/src/mesa/main/mtypes.h > +++ b/src/mesa/main/mtypes.h > @@ -3178,7 +3178,13 @@ struct gl_framebuffer > * the values in DefaultGeometry to initialize its > * viewport, scissor and so on (in particular _Xmin, > * _Xmax, _Ymin and _Ymax do NOT take into account > -* _HasAttachments being false) > +* _HasAttachments being false). To get the geometry > +* of
Re: [Mesa-dev] [PATCH 7/9] i965: ensure execution of fragment shader when fragment shader has atomic buffer access
Hi, > I think this check should be put in a utility function up in core Mesa > somewhere. It's open-coded twice in this patch, and the check will change > when GL_ARB_image_load_store lands. Would a check of the form: inline bool _mesa_has_atomic_ops(struct gl_context *ctx) { return ctx-> Shader._CurrentFragmentProgram != NULL && ctx->Shader._CurrentFragmentProgram->NumAtomicBuffers > 0; } be good? I am hesitant to make a check that just tests for side effects since for some hardware (like Gen8 for example) there is not just a simple single flag to say "must execute frag shader", instead there are several such flags that "force to run fragment shader", but the flags by themselves take on other meanings; there was some discussion on flag things in the first posting of this patch with Curro and Kenneth. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/9] mesa: add helper convenience functions for fetching geometry of gl_framebuffer
One more question: for patch 2 of the series, there was the request to change the type of _HasAttachments from GLboolean to bool. If the helper functions "survive", should the helper functions return unsigned int instead of GLuint? -Kevin -Original Message----- From: Rogovin, Kevin Sent: Wednesday, May 06, 2015 10:28 AM To: 'Ian Romanick'; mesa-...@freedesktop.org Subject: RE: [Mesa-dev] [PATCH 4/9] mesa: add helper convenience functions for fetching geometry of gl_framebuffer > I'm waffling about this a bit. I'm concerned that people will use > buffer->Width when they should use the accessor. I don't see any > changes to core Mesa code to use the accessor, so I'm a little concerned that > some subtle, incorrect behavior is introduced... but there may well not be. > A common idiom in Mesa is to have an _Value field that is the derived value. > In many structures in mtypes.h you can see things like Enabled and > _ReallyEnabled. One is what the API sets, and the other is what the driver > uses that is based on the API setting and "other factors." In many ways, the > existing gl_framebuffer Width and Height fields are already derived values > based on the sizes of the attachments. >For at least Width, Height, and MaxNumLayers, it seems better to set the >existing fields differently when _HasAttachments is true. That reduces the >number of places where a person has to think about _HasAttachments >considerably... though perhaps there is something else that I'm not thinking >of? I was thinking of doing this when I was making the patch. The problem is that Width, Height and MaxNumLayers were used for two different purposes: 1. Specifying the intersection of the attached buffers (this affects blitting for example) 2. Specifying the "geometry" to send to a rasterizer Changing Width and Height to be the geometry made me itch because Width and Height are zero when there are no buffer attachments. I added a mess of comments about the meaning of Width and Height. My concern is that other parts of code use Width and Height being 0 to do other things. With this patch way, the meaning of Width and Height has not changed; instead a diver has to "know" what it is doing when it enables the extension. I am not particular about this though; I made it this way in an attempt to reduce the scope of changes from this patch series. If people want that the meaning of Width, Height and MaxSamplers changes I can go with that too, I just think it can increase the ickiness of making sure no unintended side effects are added is ickier. > If we go that route, we should probably just add a gl_framebuffer::Samples > field for uniformity. > + > + > +static inline GLuint > +_mesa_geometric_height(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? > + buffer->Height : buffer->DefaultGeometry.Height; } > + > +static inline GLuint > +_mesa_geometric_samples(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? > + buffer->Visual.samples : buffer->DefaultGeometry.NumSamples; > +} > + > +static inline GLuint > +_mesa_geometric_layers(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? > + buffer->MaxNumLayers : buffer->DefaultGeometry.Layers; } > + > extern void > _mesa_update_draw_buffer_bounds(struct gl_context *ctx); > > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index > ef97538..f0e8fbc 100644 > --- a/src/mesa/main/mtypes.h > +++ b/src/mesa/main/mtypes.h > @@ -3178,7 +3178,13 @@ struct gl_framebuffer > * the values in DefaultGeometry to initialize its > * viewport, scissor and so on (in particular _Xmin, > * _Xmax, _Ymin and _Ymax do NOT take into account > -* _HasAttachments being false) > +* _HasAttachments being false). To get the geometry > +* of the framebuffer, the helper functions > +* _mesa_geometric_width(), > +* _mesa_geometric_height(), > +* _mesa_geometric_samples(), > +* _mesa_geometric_layers() > +* are available that check _HasAttachments. > */ > GLboolean _HasAttachments; > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/9] mesa:Define constants and functions for GL_ARB_framebuffer_no_attachment extension
> You haven't been running 'make check'. :) You also need to update > src/mesa/tests/dispatch_sanity.cpp. There is something wrong with my box or something... I did run make check and there were no failures. Out of paranoia, I also ran src/mesa/main/tests/main-test explicitly and there were no failures there either. When I look at src/mesa/tests/dispatch_sanity.cpp, the functions for ARB_framebuffer_no_attachments in the table are commented out. When I remove the comments from them, then the test fails (DispatchSanity_test.GL31_CORE). What am I missing? I am utterly confused now about src/mesa/tests/dispatch_sanity.cpp. -Kevin On 04/29/2015 01:56 AM, kevin.rogo...@intel.com wrote: > From: Kevin Rogovin > > Define the enumeration constants, function entry points and glGet for > the GL_ARB_framebuffer_no_attachments > > --- > .../glapi/gen/ARB_framebuffer_no_attachments.xml | 33 +++ > src/mapi/glapi/gen/Makefile.am | 1 + > src/mapi/glapi/gen/gl_API.xml | 4 ++- > src/mesa/main/fbobject.c | 12 +++ > src/mesa/main/fbobject.h | 7 > src/mesa/main/get.c| 3 ++ > src/mesa/main/get_hash_params.py | 38 > ++ > 7 files changed, 97 insertions(+), 1 deletion(-) create mode 100644 > src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml > > diff --git a/src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml > b/src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml > new file mode 100644 > index 000..5bde14d > --- /dev/null > +++ b/src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml > @@ -0,0 +1,33 @@ > + > + > + > + > + > + > + > + +name="FRAMEBUFFER_DEFAULT_HEIGHT" value="0x9311" /> +name="FRAMEBUFFER_DEFAULT_LAYERS" value="0x9312" /> +name="FRAMEBUFFER_DEFAULT_SAMPLES" value="0x9313" /> +name="FRAMEBUFFER_DEFAULT_FIXED_SAMPLE_LOCATIONS" value="0x9314" /> > + +name="MAX_FRAMEBUFFER_HEIGHT" value="0x9316" /> +name="MAX_FRAMEBUFFER_LAYERS" value="0x9317" /> +name="MAX_FRAMEBUFFER_SAMPLES" value="0x9318" /> > + > + > + Since it looks like your patches will land before my ES3.1 patches, I'll have to update these with es2="3.1". This is mostly to help me remember. > + > + > + > + > + > + > + > + > + > + > + > + > diff --git a/src/mapi/glapi/gen/Makefile.am > b/src/mapi/glapi/gen/Makefile.am index 1c4b86a..dc61739 100644 > --- a/src/mapi/glapi/gen/Makefile.am > +++ b/src/mapi/glapi/gen/Makefile.am > @@ -129,6 +129,7 @@ API_XML = \ > ARB_draw_instanced.xml \ > ARB_ES2_compatibility.xml \ > ARB_ES3_compatibility.xml \ > + ARB_framebuffer_no_attachments.xml \ > ARB_framebuffer_object.xml \ > ARB_geometry_shader4.xml \ > ARB_get_program_binary.xml \ > diff --git a/src/mapi/glapi/gen/gl_API.xml > b/src/mapi/glapi/gen/gl_API.xml index a8a6db6..91882ef 100644 > --- a/src/mapi/glapi/gen/gl_API.xml > +++ b/src/mapi/glapi/gen/gl_API.xml > @@ -8324,7 +8324,9 @@ > > > - > + +xmlns:xi="http://www.w3.org/2001/XInclude"/> > + > + > > > > diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c index > eabbb96..5c78c40 100644 > --- a/src/mesa/main/fbobject.c > +++ b/src/mesa/main/fbobject.c > @@ -1292,6 +1292,18 @@ _mesa_BindRenderbufferEXT(GLenum target, GLuint > renderbuffer) > bind_renderbuffer(target, renderbuffer, true); } > > +extern void GLAPIENTRY > +_mesa_FramebufferParameteri(GLenum target, GLenum pname, GLint param) > +{ > + /* to be implemented */ > +} > + > +extern void GLAPIENTRY > +_mesa_GetFramebufferParameteriv(GLenum target, GLenum pname, GLint > +*params) { > + /* to be implemented */ > +} > + > > /** > * Remove the specified renderbuffer or texture from any attachment > point in diff --git a/src/mesa/main/fbobject.h > b/src/mesa/main/fbobject.h index 61aa1f5..76adb92 100644 > --- a/src/mesa/main/fbobject.h > +++ b/src/mesa/main/fbobject.h > @@ -211,4 +211,11 @@ extern void GLAPIENTRY > _mesa_DiscardFramebufferEXT(GLenum target, GLsizei numAttachments, > const GLenum *attachments); > > + > +extern void GLAPIENTRY > +_mesa_FramebufferParameteri(GLenum target, GLenum pname, GLint > +param); > + > +extern void GLAPIENTRY > +_mesa_GetFramebufferParameteriv(GLenum target, GLenum pname, GLint > +*params); > + > #endif /* FBOBJECT_H */ > diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index > a881bc5..ca9d13c 100644 > --- a/src/mesa/main/get.c > +++ b/src/mesa/main/get.c > @@ -393,6 +393,7 @@ EXTRA_EXT(INTEL_performance_query); > EXTRA_EXT(ARB_explicit_uniform_location); > EXTRA_EXT(ARB_clip_control); > EXTRA_EXT(EXT_polygon_offset_clamp); > +EXTRA_EXT(ARB_framebuffer_no_attachments); > > static const int > extra_ARB_color_buffer_float_or_glcore[] = { @@ -466,6 +467,8 @@ > static const int extra_core_ARB_color_buffer_float_and_new
Re: [Mesa-dev] [PATCH 4/9] mesa: add helper convenience functions for fetching geometry of gl_framebuffer
> I'm waffling about this a bit. I'm concerned that people will use > buffer->Width when they should use the accessor. I don't see any > changes to core Mesa code to use the accessor, so I'm a little concerned that > some subtle, incorrect behavior is introduced... but there may well not be. > A common idiom in Mesa is to have an _Value field that is the derived value. > In many structures in mtypes.h you can see things like Enabled and > _ReallyEnabled. One is what the API sets, and the other is what the driver > uses that is based on the API setting and "other factors." In many ways, the > existing gl_framebuffer Width and Height fields are already derived values > based on the sizes of the attachments. >For at least Width, Height, and MaxNumLayers, it seems better to set the >existing fields differently when _HasAttachments is true. That reduces the >number of places where a person has to think about _HasAttachments >considerably... though perhaps there is something else that I'm not thinking >of? I was thinking of doing this when I was making the patch. The problem is that Width, Height and MaxNumLayers were used for two different purposes: 1. Specifying the intersection of the attached buffers (this affects blitting for example) 2. Specifying the "geometry" to send to a rasterizer Changing Width and Height to be the geometry made me itch because Width and Height are zero when there are no buffer attachments. I added a mess of comments about the meaning of Width and Height. My concern is that other parts of code use Width and Height being 0 to do other things. With this patch way, the meaning of Width and Height has not changed; instead a diver has to "know" what it is doing when it enables the extension. I am not particular about this though; I made it this way in an attempt to reduce the scope of changes from this patch series. If people want that the meaning of Width, Height and MaxSamplers changes I can go with that too, I just think it can increase the ickiness of making sure no unintended side effects are added is ickier. > If we go that route, we should probably just add a gl_framebuffer::Samples > field for uniformity. > + > + > +static inline GLuint > +_mesa_geometric_height(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? > + buffer->Height : buffer->DefaultGeometry.Height; } > + > +static inline GLuint > +_mesa_geometric_samples(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? > + buffer->Visual.samples : buffer->DefaultGeometry.NumSamples; > +} > + > +static inline GLuint > +_mesa_geometric_layers(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? > + buffer->MaxNumLayers : buffer->DefaultGeometry.Layers; } > + > extern void > _mesa_update_draw_buffer_bounds(struct gl_context *ctx); > > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index > ef97538..f0e8fbc 100644 > --- a/src/mesa/main/mtypes.h > +++ b/src/mesa/main/mtypes.h > @@ -3178,7 +3178,13 @@ struct gl_framebuffer > * the values in DefaultGeometry to initialize its > * viewport, scissor and so on (in particular _Xmin, > * _Xmax, _Ymin and _Ymax do NOT take into account > -* _HasAttachments being false) > +* _HasAttachments being false). To get the geometry > +* of the framebuffer, the helper functions > +* _mesa_geometric_width(), > +* _mesa_geometric_height(), > +* _mesa_geometric_samples(), > +* _mesa_geometric_layers() > +* are available that check _HasAttachments. > */ > GLboolean _HasAttachments; > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/9] mesa: Complete implementation for GL_ARB_framebuffer_no_attachments in Mesa core
HI, > For both this and get_framebuffer_parameteriv below, I don't see the value in > splitting the implementations. Also, these functions need to check that the > extension is enabled and generate GL_INVALID_OPERATION if it is not. No worries, I can add the GL_INVALID_OPERATION bit in. I have a question: these functions (with the patch) are only called from the _mesa_ functions, which are what gl functions are. If the extension is not on, does glxGetProcAddress/eglGetProcAccress able to return these functions? I ask because I thought that if the extension was missing, then the functions cannot be called by the application. By split do you mean that the GL API function calls the static function? I put this split in so that when the DSA functions are made, they can call the static functions. Would you like me to kill that split still? > +{ > + switch (pname) { > + case GL_FRAMEBUFFER_DEFAULT_WIDTH: > + if (param < 0 || param > ctx->Const.MaxFramebufferWidth) > +_mesa_error(ctx, GL_INVALID_VALUE, "%s", func); > It looks like your indention is off by one here and elsewhere in the switch. Icks. Sorry about that. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 6/9] i965: Use _mesa_geometry_ functions appropriately
Hi, One comment on the code, or rather a request to the reviewer of the code. The icky part of checking this patch is correct is checking that the remaining instances of gl_framebuffer::Width, Height, MaxNumLayers and Visaul.samples are "correct". I believe I "got 'em all", of those that should be changed and those changed should be changed, but the part of checking if the remaining unchanged bits is not seen from the raw diff. Also, the exchange I had with Topi previously, what I was really after was just this, to make sure this gets checked and I got terribly, embarrassingly sidetracked by to split and include or not include work for a particular Gen4/5 atom only atom. My sincere apologies for all the traffic and me making, by my actions, the community and development process worse. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/9] mesa: Complete implementation for GL_ARB_framebuffer_no_attachments in Mesa core
I just want to make a begging on the review for this patch: I am a touch paranoid about how the thing will act under the GL ES situation; I believe it should follow the spec, but if whoever reviews does the extra leg work of checking that I got this right, I'd really appreciate it. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 9/9] mark GL_ARB_framebuffer_no_attachments as done for i965
Hi, > I'd just go read the ES 3.1 spec and see if there are any differences in this > area. I checked the spec, and it appears to me to have the same behavior as GL_ARB_framebuffer_no_attachments. > Also, please fix your mail client to stop its weird line wrapping (and the > other half of the time sending HTML mail). Fingers crossed that the settings are correct. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 9/9] mark GL_ARB_framebuffer_no_attachments as done for i965
> When you rebase this, I'd appreciate it if you could insert it into > the list in alphabetical order. (You based this on a commit where a > bunch of the later additions were already not inserted alphabetically, > but I've recently fixed that up.) Sure, no worries. Given that I am just changing the word "not started" to "DONE (i965)", it is a no-op really. However, I'd really appreciate any advice on the GL ES situation, basically if more say mark as done in ES I will, if less say I won't. I also would really like to hear opinions on GL ES interaction on Patch 3. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 9/9] mark GL_ARB_framebuffer_no_attachments as done for i965
> At the bottom is another block with GLES 3.1 requirements, which also > contains GL_ARB_f_n_a. At first, I said "Oh futz, I did not mark that one". Then I did some thinking. Before expressing my thoughts I want to emphasize that I really do not know what is the best answer, or potentially even a good one. With the caveat in mind, maybe it is best that it is not marked as ready for GL ES. Maybe. The potentially flawed reasoning is as follows: The patch series enables the extension for OpenGL and the extension stuff, I think, only applies to GL, and not GL ES, contexts. In particular, the "wiring" for the new functions is not implemented for GL ES. Secondly, the code in Mesa core from patch 3 of this series just checks for the driver reporting it supports the extension; it does not do the check "if driver supports extension AND it is GL". However, the default geometry values are zero, so the correct error code gets emitted anyways, but the text-error is not correct. This part makes me twitch, though it follows the letter of the specs jut tine. Lastly, and this is just paranoia and quite likely not an issue. I had not checked, much less quote, the relevant sections of the GL ES 3.1 spec. I think they are the same, but I did not check it at all. As one can see from the above, when implementing this, GLES did not even cross my mind. My tendency is to mark the "enabling" of this feature for ES3.1 a separate patch, but on the other hand it is not code, it is just saying the feature is ready for eventual use by ES3.1. Now in mid e-mail, I am thinking it should be marked. Beats me what is best. In truth what is making me twitch is Patch 3 under GL ES 3.1. I think that the function might need to be revisited a little when GL ES 3.1 support goes live. Thoughts on this are welcome, particular in reference to Patch 3 of the series. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/7] i965: use _mesa_geometry_width/height/layers/samples for programming geometry of framebuffer to GEN
Hello, > No, because the non-shared code is (by your own admission) untested and/or > dead code. Untested code is broken code. I would personally be ok with a > lot > of the changes that just replace fb->Width with > _mesa_geometric_width(fb) since it's effectively just replacing a direct > access with a getter. However, almost half of the patch is updating the > upload_sf_vp > function which is only used for gen <= 5. A comment or assert > there would be sufficient rather than reworking it. Fair enough. Would the following be good: - keep all those that replace fb->whatever with _mesa_geomety_whatever, - instead of the ick I have done to upload_sf_vp, place a big comment warning I would be happy with the above as it addresses my main concern and the dead-is-broken code concern as well. If I had physical access to a Gen4 and 5 box I would test it and if it worked, enable the extension on those platforms as well. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/7] i965: use _mesa_geometry_width/height/layers/samples for programming geometry of framebuffer to GEN
> I read the patch again and I'm still in the opinion that the changes to the > pure pre-gen7 logic (i.e., logic that is not re-used for later gens) are not > needed. As I have tried and apparently failed to communicate, it is -better- and more consistent. Need is a far stronger word. Without a doubt, if the extension is never enabled for those older Gens, then it does not matter in terms of produced output. However, I stated that it leaves a trap and an inconsistency which I find quite bothering. > The shared logic between pre-gen7 and later, namely setup for renderbuffers, > drawing rectangle and > fragment shader compilation key are safe to do as they only introduce new > logic that is conditional to > no-attachments being used. And that is exactly for the case for that code that is not shared. Indeed, if the shared code is safe for pre-Gen7, then so is the non-shared code. > Your concern about the readers getting confused could be also addressed with > assert(brw->gen >= 7) > and a comment saying that the no-attachment specific path is not applicable > for older gens. There is only one occurrence of "no-attachment specific code paths" in these i965 patches and that is associated to scissoring. The rest is existing code is changed from accessing Width, Height of gl_framebuffer to "getting" those values from a function. There is no proper place to insert an assert(brw->gen >=7 ), since, with the exception of the scissoring (and it is just one if block) there is no such "no attachment code path". I had thought the diffs of the series made that quite clear. > And when it comes to the pure pre-gen7 logic, I, in fact, have just the > opposite opinion on making it to go through the no-attachment-aware path. > As the extension is not possible for older gens, I find it clearer that logic > explicitly by-passes such paths that even consider it. Um, I am pretty sure than pre Gen7 hardware can do the extension. The crux is that the extension is pointless for such hardware because pre Gen7 hardware does not (AFAIK) have a feature that allows for a fragment shader to have a side effect. Even that statement is not totally true. Indeed, one can argue performance queries and occlusion queries with framebuffer_no_attachments make some form of sense (it would give an application a count of sorts). -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/7] Define constants and functions for ARB_framebuffer_no_attachment extension
Hello, > H that's surprising. > > src/mesa/main/tests/dispatch_sanity.cpp:// { >"glFramebufferParameteri", 43, -1 }, // XXX: Add to xml > >I thought it should detect that there's a new API and complain loudly. > At least that's how I remembered it working, but that doesn't seem to be the > case? > Are you sure you had a clean build? Either way, those should probably get > uncommented, The reason why there was no issue is because the entries for both glFramebufferParameteri and glGetFramebufferParameteriv are already commented out. Looking at the commit log, they have been commented out for quite some time. The commit from Jordan Justen (dated Oct 24,2012) titled "dispatch sanity test: Add GL CORE 3.1 test" has the line added but commented out. In fact, looking at the commit log, with git log -p the only reference to those functions has both commented out. > and there are probably interactions with ARB_dsa as well, should probably > figure out if you or Laura should add support for that (or perhaps you had it > in your patches already). The extension ARB_framebuffer_no_attachments does NOT define the DSA style functions for ARB_direct_state_access. Instead, it only defines for the EXT_direct_state_access. I think that the implementers of GL_ARB_direct_state_access are the ones that need to define and implement gl[Get]NamedFraembufferParamteri. I wrote the patch 3 so that it is trivial to implement the DSA function though. -Kevin On Fri, Apr 24, 2015 at 11:06 AM, Rogovin, Kevin wrote: > Hi, > > I agree with the comments about the code (and when the last element of the > series is reviewed I will submit the series with review comments taken into > use), but when I applied just Patch 1 and Patch 2, and ran > src/mesa/main/tests/main-test (after a git clean -dfx and all that cleaning) > all test pass, in particular the 4 DispatchSanity_test's: > DispatchSanity_test.GL31_CORE , DispatchSanity_test.GLES11, > DispatchSanity_test.GLES2 and DispatchSanity_test.GLES3. In addition, make > check passes all test as well. If you are referring to another test, what > test is that? > > -Kevin > > -Original Message- > From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia > Mirkin > Sent: Friday, April 24, 2015 4:36 PM > To: Matt Turner > Cc: Rogovin, Kevin; mesa-...@freedesktop.org > Subject: Re: [Mesa-dev] [PATCH 2/7] Define constants and functions for > ARB_framebuffer_no_attachment extension > > This change will make the dispatch_sanity test fail. > > On Fri, Apr 24, 2015 at 3:05 AM, Matt Turner wrote: >> The subject should be prefixed with "mesa:" >> >> On Thu, Apr 23, 2015 at 11:59 PM, wrote: >>> From: Kevin Rogovin >>> >>> Define enumerations, functions and associated glGet's for extension >>> ARB_framebuffer_no_attachment. >>> >>> --- >>> .../glapi/gen/ARB_framebuffer_no_attachments.xml | 33 ++ >>> src/mapi/glapi/gen/Makefile.am | 1 + >>> src/mapi/glapi/gen/gl_API.xml | 1 + >>> src/mesa/main/fbobject.c | 12 +++ >>> src/mesa/main/fbobject.h | 7 >>> src/mesa/main/get.c| 3 ++ >>> src/mesa/main/get_hash_params.py | 40 >>> ++ >>> 7 files changed, 97 insertions(+) >>> create mode 100644 >>> src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml >>> >>> diff --git a/src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml >>> b/src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml >>> new file mode 100644 >>> index 000..60e40d0 >>> --- /dev/null >>> +++ b/src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml >>> @@ -0,0 +1,33 @@ >>> + >>> + >>> + >>> + >>> + >>> + >>> + >>> + >> +name="FRAMEBUFFER_DEFAULT_HEIGHT" value="0x9311" /> >> +name="FRAMEBUFFER_DEFAULT_LAYERS" value="0x9312" /> >> +name="FRAMEBUFFER_DEFAULT_SAMPLES" value="0x9313" /> >> +name="FRAMEBUFFER_DEFAULT_FIXED_SAMPLE_LOCATIONS" value="0x9314" /> >>> + >> +name="MAX_FRAMEBUFFER_HEIGHT" value="0x9316" /> >> +name="MAX_FRAMEBUFFER_LAYERS" value="0x9317" /> >> +name="MAX_FRAMEBUFFER_SAMPLES" value="0x9318" /> >>> + >>> + >>> + >>> +
Re: [Mesa-dev] [PATCH 4/7] helper-conveniance functions for drivers to implement ARB_framebuffer_no_attachment
Hi, > This is in fact two changes: introduction of the helpers and refactoring of > the intersection code to be used with caller provided bounding box. Is this a request to change the commit message or to split this as well? I think splitting it is silly, but if it make you happy so be it; the reason being that there are no lurking surprises if it split or inconsistencies introduced by such a split. > > --- > src/mesa/main/framebuffer.c | 49 > ++--- > src/mesa/main/framebuffer.h | 29 +++ > src/mesa/main/mtypes.h | 21 ++- > 3 files changed, 74 insertions(+), 25 deletions(-) > > diff --git a/src/mesa/main/framebuffer.c b/src/mesa/main/framebuffer.c > index 4e4d896..7d8921b 100644 > --- a/src/mesa/main/framebuffer.c > +++ b/src/mesa/main/framebuffer.c > @@ -357,30 +357,20 @@ update_framebuffer_size(struct gl_context *ctx, > struct gl_framebuffer *fb) } > > > + > /** > - * Calculate the inclusive bounding box for the scissor of a specific > viewport > + * Given a bounding box, intersect the bounding box with the scirros > + of > + * a specified vieport. > * > * \param ctx GL context. > - * \param buffer Framebuffer to be checked against > * \param idx Index of the desired viewport > * \param bboxBounding box for the scissored viewport. Stored as xmin, > *xmax, ymin, ymax. > - * > - * \warning This function assumes that the framebuffer dimensions are > up to > - * date (e.g., update_framebuffer_size has been recently called on \c > buffer). > - * > - * \sa _mesa_clip_to_region > */ > -void > -_mesa_scissor_bounding_box(const struct gl_context *ctx, > - const struct gl_framebuffer *buffer, > - unsigned idx, int *bbox) > +extern void > +_mesa_intersect_scissor_bounding_box(const struct gl_context *ctx, > + unsigned idx, int *bbox) > { > - bbox[0] = 0; > - bbox[2] = 0; > - bbox[1] = buffer->Width; > - bbox[3] = buffer->Height; > - > if (ctx->Scissor.EnableFlags & (1u << idx)) { >if (ctx->Scissor.ScissorArray[idx].X > bbox[0]) { > bbox[0] = ctx->Scissor.ScissorArray[idx].X; @@ -402,6 > +392,33 @@ _mesa_scissor_bounding_box(const struct gl_context *ctx, > bbox[2] = bbox[3]; >} > } > +} > + > +/** > + * Calculate the inclusive bounding box for the scissor of a specific > +viewport > + * > + * \param ctx GL context. > + * \param buffer Framebuffer to be checked against > + * \param idx Index of the desired viewport > + * \param bboxBounding box for the scissored viewport. Stored as xmin, > + *xmax, ymin, ymax. > + * > + * \warning This function assumes that the framebuffer dimensions are > +up to > + * date (e.g., update_framebuffer_size has been recently called on \c > buffer). > + * > + * \sa _mesa_clip_to_region > + */ > +void > +_mesa_scissor_bounding_box(const struct gl_context *ctx, > + const struct gl_framebuffer *buffer, > + unsigned idx, int *bbox) { > + bbox[0] = 0; > + bbox[2] = 0; > + bbox[1] = buffer->Width; > + bbox[3] = buffer->Height; > + > + _mesa_intersect_scissor_bounding_box(ctx, idx, bbox); > > assert(bbox[0] <= bbox[1]); > assert(bbox[2] <= bbox[3]); > diff --git a/src/mesa/main/framebuffer.h b/src/mesa/main/framebuffer.h > index a427421..8b84d26 100644 > --- a/src/mesa/main/framebuffer.h > +++ b/src/mesa/main/framebuffer.h > @@ -76,6 +76,35 @@ _mesa_scissor_bounding_box(const struct gl_context *ctx, > const struct gl_framebuffer *buffer, > unsigned idx, int *bbox); > > +extern void > +_mesa_intersect_scissor_bounding_box(const struct gl_context *ctx, > + unsigned idx, int *bbox); > + > +static inline GLuint > +_mesa_geometric_width(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? buffer->Width : > +buffer->DefaultGeometry.Width; } > + > + > +static inline GLuint > +_mesa_geometric_height(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? buffer->Height : > +buffer->DefaultGeometry.Height; } > + > +static inline GLuint > +_mesa_geometric_samples(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? buffer->Visual.samples : > +buffer->DefaultGeometry.NumSamples; > +} > + > +static inline GLuint > +_mesa_geometric_layers(const struct gl_framebuffer *buffer) { > + return buffer->_HasAttachments ? buffer->MaxNumLayers : > +buffer->DefaultGeometry.Layers; } > + > extern void > _mesa_update_draw_buffer_bounds(struct gl_context *ctx); > > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index > 38a3817..ac7cdb6 100644 > --- a/src/mesa/main/mtypes.h > +++ b/src/mesa/main/mtypes.h > @@ -3134,13 +3134,13 @@ struct g
Re: [Mesa-dev] [PATCH 5/7] i965: use _mesa_geometry_width/height/layers/samples for programming geometry of framebuffer to GEN
Hi, >A couple of us chatted about this, and we all agreed that it's probably not >useful to >enable ARB_framebuffer_no_attachments on pre-Gen7. We don't expose atomics, >SSBOs, > or image load/store on those platforms (and probably won't), so the > only way fragment shaders can output any data is via framebuffer writes. > So I'd be inclined to just not touch the pre-Gen7 code at all. There are a number of issues lurking. Firstly, there are atoms from Gen4 (for example brw_drawing_rect) which are used all the way to Gen8. Therefore, Gen4 code must be changed. At this point then one can advocate to only change those atoms that are active in Gen7 and Gen8. If that is done, then there is an style inconsistency where some atoms , for Gen4,5 and 6 use the new functions and some do not. That is just icky in my opinion, as it leads to the awkward question "why? is there something delicate here?" Compounding the style issue is that the code should -convey- what it being done to the reader what is going on. Using the functions makes the convey more trivial for the reader to know. Secondly, not using those functions for Gen4,5 and 6 leaves the code with a trap for later. Namely that trap if someone says "you know although the extension is silly for Gens 4,5 and 6, it still is trivial to enable". I hate leaving traps behind for others. Thirdly, there is the real meat of reviewing patch 6: a good review of that patch will make sure that any remaining references to gl_framebuffer::Width, Height, and so on are correct (i.e. the code requires the dimension of the backing store and not the geometry). That checking is made easier, more mechanical if -all- of i965 is made consistent in this fashion, for otherwise the checker has more to check (i.e. instead of is this for setting rasterizer it is setting rasterizer and active for Gen7 and 8). When I was making these changes to i965 I was also tempted to add a functions of the sort "_mesa_buffer_width", that returned gl_fraembufffer::Width regardless of the value of gl_framebuffer::_HasAttachments. The reason why I was tempted was to help (via lovely grep) to make patch 6 and the reviewing of it more mechanical and easier. but I chose not to since the meaning of the fields from gl_framebuffer became the exact meaning of those fields. That third reason is the one reason that patch 6 should make people itch (in my opinion): "where all references hit?" Checking that require more than just checking the diff, that requires knowing all the references to gl_framebuffer::Width and friends, knowing the use there and then checking if the patch does or DOES NOT change that code block appropriately. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/7] i965: use _mesa_geometry_width/height/layers/samples for programming geometry of framebuffer to GEN
Why make it two separate patches even? Seems silly (not to mention more work really). I guess the issue is the commit message portion saying "for framebuffer_no_attachements".. perhaps I just change the commit to "use new geometry function to indicate want geometry not buffer thingies". -Kevin -Original Message- From: Pohjolainen, Topi Sent: Monday, April 27, 2015 10:43 AM To: Rogovin, Kevin Cc: mesa-...@freedesktop.org Subject: Re: [Mesa-dev] [PATCH 5/7] i965: use _mesa_geometry_width/height/layers/samples for programming geometry of framebuffer to GEN On Fri, Apr 24, 2015 at 11:36:27PM +0300, Rogovin, Kevin wrote: > I want to add one more comment on why to replace with the _mesa_geometry_ > functions, which I had thought was so obvious I neglected to mention it: > > With this series the meaning of gl_framebuffer::Width, Height, and so on have > a different meaning. They give the intersection of the backing stores of the > render targets. In contrast, the _mesa_geometry_* functions give the geometry > to feed a rasterizer/windower. By using _mesa_geometry_* functions the code > communicates clearly it wants the geometry to feed windower/rasterizer and > not the geometry of the intersection of the (potentially empty) set of > backing stores. Moreover, it is better to be consistent as well, as later > someone will likely wonder: "why in Gen7 and higher are those _mesa_geometry > functions used but not before?" That question has no good answer because it > does not make sense to not use those functions when programming the > rasterizer/windower thingies. > Could we split the patch in two? One part dealing with the necessary needed for gen7 and higher to support no_attachments and then another just making older gens consistent. > -Kevin > > -Original Message- > From: Rogovin, Kevin > Sent: Friday, April 24, 2015 7:43 PM > To: Pohjolainen, Topi > Cc: mesa-...@freedesktop.org > Subject: RE: [Mesa-dev] [PATCH 5/7] i965: use > _mesa_geometry_width/height/layers/samples for programming geometry of > framebuffer to GEN > > > > My point specifically was that you are also updating atoms that _are not_ > > re-used. > > And as those changes are not really needed, I wouldn't take the risk > > of changing something in vain. I would introduce them only when you have > > patches to really enable older generations. > > My take is the following: > > 1. Tracking (and guaranteeing) that those function left unchanged as is are > exactly just those for before Gen7 is a pain. Much easier, and more reliable > to hit them all instead. A significant number of functions in i965 are not > emit functions of any atom but emit functions of atoms map to them. Again, > more reliable and -safer- to change them all, then just the bare minimum. > > 2. The change is benign. If _HasAttachments is true, then the function > substitution gives the same value. For Gens not supporting the extension > there is no effect. > > 3. Lastly, as stated: for later it leaves the option to enable it for Gen6 > and below, it is just trivial change, but it needs testing on hardware. > > When I writing this work, I originally had it for all Gens, but changed to > support only Gen7and higher because that is all on which I can test it. > > -Kevin > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 6/7] i965: ensure execution of fragment shader when fragment shader has atomic buffer access
One more comment, that I neglected to add: there are other checks for _CurrentFragmentProgram to be non-NULL, indeed function brw_upload_wm_abo_surface() [file brw_wm_surface_state.c], also has a check for it being non-NULL. That function is the emit for the atom brw_wm_abo_surfaces which is present in both gen7_atoms and gen8_atoms. I would argue that _CurrentFragmentProgram can be NULL, given that other places check for it and that without the check piglit gets about 30 more crashes. Sorry for not posting this in the first reply. -Kevin -Original Message- From: Rogovin, Kevin Sent: Friday, April 24, 2015 11:22 PM To: 'Kenneth Graunke'; mesa-dev@lists.freedesktop.org Cc: mesa-...@freedesktop.org; curroje...@riseup.net Subject: RE: [Mesa-dev] [PATCH 6/7] i965: ensure execution of fragment shader when fragment shader has atomic buffer access > Checking brw->ctx.Shader._CurrentFragmentProgram != NULL is unnecessary. > There is always a valid pixel shader. (If the application is using > fixed-function, we supply a fragment shader for them.) Please drop that > check. Without this check(in the Gen7 function/code), about 30 crashes are induced on piglit tests for Gen7; the tests are all using GL fixed function pipeline. I have not run piglit without this check on Gen8 though. > I thought that UAVs were essentially for Images...I'm not clear why this is > needed. Perhaps Curro can confirm one way or another. The essential reason is to guarantee that the pixel shader gets invoked by Gen even when all render target surfaces are NULL surfaces. There are other flags one can use, but the UAV seems (to me) the most natural. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/7] i965: use _mesa_geometry_width/height/layers/samples for programming geometry of framebuffer to GEN
I want to add one more comment on why to replace with the _mesa_geometry_ functions, which I had thought was so obvious I neglected to mention it: With this series the meaning of gl_framebuffer::Width, Height, and so on have a different meaning. They give the intersection of the backing stores of the render targets. In contrast, the _mesa_geometry_* functions give the geometry to feed a rasterizer/windower. By using _mesa_geometry_* functions the code communicates clearly it wants the geometry to feed windower/rasterizer and not the geometry of the intersection of the (potentially empty) set of backing stores. Moreover, it is better to be consistent as well, as later someone will likely wonder: "why in Gen7 and higher are those _mesa_geometry functions used but not before?" That question has no good answer because it does not make sense to not use those functions when programming the rasterizer/windower thingies. -Kevin -Original Message----- From: Rogovin, Kevin Sent: Friday, April 24, 2015 7:43 PM To: Pohjolainen, Topi Cc: mesa-...@freedesktop.org Subject: RE: [Mesa-dev] [PATCH 5/7] i965: use _mesa_geometry_width/height/layers/samples for programming geometry of framebuffer to GEN > My point specifically was that you are also updating atoms that _are not_ > re-used. > And as those changes are not really needed, I wouldn't take the risk > of changing something in vain. I would introduce them only when you have > patches to really enable older generations. My take is the following: 1. Tracking (and guaranteeing) that those function left unchanged as is are exactly just those for before Gen7 is a pain. Much easier, and more reliable to hit them all instead. A significant number of functions in i965 are not emit functions of any atom but emit functions of atoms map to them. Again, more reliable and -safer- to change them all, then just the bare minimum. 2. The change is benign. If _HasAttachments is true, then the function substitution gives the same value. For Gens not supporting the extension there is no effect. 3. Lastly, as stated: for later it leaves the option to enable it for Gen6 and below, it is just trivial change, but it needs testing on hardware. When I writing this work, I originally had it for all Gens, but changed to support only Gen7and higher because that is all on which I can test it. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 6/7] i965: ensure execution of fragment shader when fragment shader has atomic buffer access
> Checking brw->ctx.Shader._CurrentFragmentProgram != NULL is unnecessary. > There is always a valid pixel shader. (If the application is using > fixed-function, we supply a fragment shader for them.) Please drop that > check. Without this check(in the Gen7 function/code), about 30 crashes are induced on piglit tests for Gen7; the tests are all using GL fixed function pipeline. I have not run piglit without this check on Gen8 though. > I thought that UAVs were essentially for Images...I'm not clear why this is > needed. Perhaps Curro can confirm one way or another. The essential reason is to guarantee that the pixel shader gets invoked by Gen even when all render target surfaces are NULL surfaces. There are other flags one can use, but the UAV seems (to me) the most natural. -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 6/7] i965: ensure execution of fragment shader when fragment shader has atomic buffer access
> I'll check with Jordan and others. I have a faint recollection that compute > shaders have similar > needs. I think your change is fine though, I just want to understand the > bigger picture first. I do not think compute shaders have similar needs. These flags are for making sure the rasterizer-wm thingy in Gen spawns the fragment shader threads. Compute kernels are not (I believe) spawned by the raster-wm thing, as they do not actually use the pipeline (rather they use L3, samplers and EU's only essentially). -Kevin ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev