Re: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering

2018-08-28 Thread Rogovin, Kevin
Hi,

> You are completely missing the point.  Any test which tests the
> GL_ARB_fragment_shader_interlock extension with a
> beginInvocationInterlockARB() call inside of control-flow would be an
> invalid test for GL_ARB_fragment_shader_interlock since that behavior is
> undefined.  Therefore, any such test would be a bad test.  Unless we have
> tests which test beginFragmentShaderOrderingINTEL() inside non-uniform
> control-flow, it is insufficiently tested.  Therefore, any tests we may
> have for GL_ARB_fragment_shader_interlock are either invalid use of the ARB
> extension or they are insufficient to test the INTEL extension.

My apologies, I misunderstood your question "is that tested?"; I thought the 
question was about if the assentation that the current implementation of 
beginFragmentShaderOrderingINTEL() and beginInvocationInterlockARB() were 
functionally interchangeable has been verified/tested.

FWIW, a test where beginInvocationInterlockARB() is called within any control 
flow or for that matter not called from main(), the shader is supposed to be 
rejected according the spec. In a negative way, it is another test. Though, 
while acknowledging Ian's point that "works for vendor X" has caused a great 
deal of GLSL fragmentation, I still personally prefer to implementation to try 
to take what applications throw at them as much as reasonable possible. 
However, that is not my decision to make and whatever the community chooses is 
what it shall be.

I agree with you that taking the current ARB test and then adding some 
non-uniform flow control (along with removing the layout() qualifiers) to it is 
the best way to go and I *think* Plamena has offered to do so.

At this point, I can just let for it all to sort out and bow out of the 
discussion at this point as the series seems to have brought additional issues 
up that are out-of-scope for what I had in mind with the series (namely 
something small-simple to expose the extension and not enforcing the limitation 
of the ARB extension).

-Kevin

-Original Message-
From: Rogovin, Kevin 
Sent: Tuesday, August 28, 2018 10:10 PM
To: 'mesa-dev@lists.freedesktop.org' 
Cc: ja...@jlekstrand.net
Subject: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering

Hi,

> Is that tested?

I have tested it in a simple shader test I made (i.e. not in a dedicated test 
suite such as dEQP, piglit or something else). The created assembly is 
identical. The specific example is a shader where I replace calls of 
beginFragmentShaderOrderingINTEL() with beginInvocationInterlockARB() and the 
created assembly is the same. Running with INTEL_DEBUG=fs produced identical 
output except the SSA NIR had the different opcode. AFAIK, the current Mesa 
implementation of the ARB extension does not in any way shape or form enforce 
any of "no control flow", "only once and only in main" requirements. Indeed, 
Mesa happily accepts code even without the endInvocationInterlockARB() call as 
well.  Personally, I am happy that it is more accepting rather than rejecting 
the code.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering

2018-08-28 Thread Rogovin, Kevin
Hi,

> Is that tested?

I have tested it in a simple shader test I made (i.e. not in a dedicated test 
suite such as dEQP, piglit or something else). The created assembly is 
identical. The specific example is a shader where I replace calls of 
beginFragmentShaderOrderingINTEL() with beginInvocationInterlockARB() and the 
created assembly is the same. Running with INTEL_DEBUG=fs produced identical 
output except the SSA NIR had the different opcode. AFAIK, the current Mesa 
implementation of the ARB extension does not in any way shape or form enforce 
any of "no control flow", "only once and only in main" requirements. Indeed, 
Mesa happily accepts code even without the endInvocationInterlockARB() call as 
well.  Personally, I am happy that it is more accepting rather than rejecting 
the code.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering

2018-08-28 Thread Rogovin, Kevin
Hi,

> Having the same underlying compiler intrinsic and having the same behavior
> are not the same thing.  The INTEL extension allows strictly more
> functionality than the ARB extension so it needs even more testing.  In
> particular, it allows you to do the barrier in non-uniform control-flow
> which is a very different thing than at the top of a shader like is allowed
> by ARB_fragment_shader_interlock.  I expect the INTEL extension to need
> substantially more testing than the ARB one.

Just an FYI: the Mesa implementation for the ARB (and thus NV) extension accept 
shaders where the begin function takes place inside of control flow actually.

 -Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [1/2] mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering.

2018-08-28 Thread Rogovin, Kevin
Hi all,

Patch addressing the missing enum warning is here: 
https://lists.freedesktop.org/archives/mesa-dev/2018-August/203796.html . Also, 
see my reply to Francisco why I think it is better to have a new intrinsic 
function for that.

Best Regards,
-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [1/2] mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering.

2018-08-28 Thread Rogovin, Kevin
Hi,

Sighs; I missed that warning on the build since there is so much build output. 
I can issue a small patch to handle the missing enum.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering

2018-08-28 Thread Rogovin, Kevin
Hi,

Sorry for the doubly reply, but I wanted to add one more important bit in my 
thinking in addition to doing minimal code changes and following existing 
convention. The ARB/NV extensions defined a critical section where as the INTEL 
extension is just a barrier function. I suspect (but I do not know) that the 
NVIDIA hardware that supports the extension is implemented by some weird 
interfacing logic with the pixel backend and that implementation requires that 
there is just one critical section under no control flow whereas the INTEL is 
just a memory barrier. In that light, I think they should be 3 different 
intrinsics with the thinking that if the Nouveau driver adds support for the 
ARB/NV extension it will do the IR analysis to do what is needed for the 
critical-section style extension.

-Kevin

From: Rogovin, Kevin
Sent: Tuesday, August 28, 2018 7:05 PM
To: mesa-dev@lists.freedesktop.org
Cc: Jerez Plata, Francisco 
Subject: Re: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering

Hi,

I followed the convention that was already present. The code from 
ARB/NV_fragment_shader_interlock had an intrinsic for begin critical section 
and an intrinsic for end critical section. I figured that since this extension 
has a completely different thinking (i.e. a memory barrier instead of a 
section) it warranted a new intrinsic. That and doing this way incurred the 
least amount of changes which I figured is always a good thing.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering

2018-08-28 Thread Rogovin, Kevin
Hi,

 On the questions of tests: If people want, I can adapt the test in piglit for 
ARB_fragment_shader_interlock to this INTEL one. In general, I have an 
app/library that uses the extension and testing of that does definitely work on 
i965 (which should be utterly unsurprising since the produced assembly is 
identical). Indeed the current implementation in Mesa of 
ARB_fragment_shader_interlock is as flexible as the INTEL one since the 
compiler accepts code with the begin/end interlock anywhere since the only 
backend that supports it, i965, can easily accept it. I view the 
INTEL_fragment_shader_ordering implementation in a similar light as the 
NV_fragment_shader_interlock where it actually does not really do anything new, 
but signals to an application that the Mesa/i965 implementation allows more 
(and does it correctly) than the ARB/NV extensions restrict to.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/2] Implement INTEL_fragment_shader_ordering

2018-08-28 Thread Rogovin, Kevin
Hi,

I followed the convention that was already present. The code from 
ARB/NV_fragment_shader_interlock had an intrinsic for begin critical section 
and an intrinsic for end critical section. I figured that since this extension 
has a completely different thinking (i.e. a memory barrier instead of a 
section) it warranted a new intrinsic. That and doing this way incurred the 
least amount of changes which I figured is always a good thing.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] i965: prevent using auxilary buffers when an astc5x5 texture is present

2018-03-16 Thread Rogovin, Kevin
Hi,

 I will just pipe in on the scenario that needs to be checked and ways around 
the scenario as well. Firstly, What Tapani has below is what I first wrote as 
well when developing a v4, it is cleaner and from that point of view I prefer 
it as well. However, the following scenario made me have reservations:

1. The final patch in the series has a simple test if the src in blorp_params 
HAS an auxiliary buffer, rather than a test if the sampler will read the 
surface using the auxiliary buffer. Note that on a simple blit from one texture 
that has an auxiliary (for example lossless color compression) to another 
surface that blorp needs to use the auxiliary buffer, this case is likely to 
happen in glBlitFramebuffer).

2. If brw_resolve_framebuffer() triggers a blorp call (for example render to 
and read from a same texture ala GL_ARB_texture_barrier), then blorp MIGHT set 
the mode to BRW_ASTC5x5_WA_MODE_HAS_AUX (even though the resolve does not have 
the sampler read from the surface in the way that helps trigger the hang). 
However, then if the mode was set in brw_resolve_inputs() to HAS_ASTC5x5, then 
that information is lost. The state upload for surfaces, because it uses the 
mode value, will then let auxiliary buffers be used by the sampler.

I do not know the internals of blorp well; if it is the case that the check 
made in patch 3/3 for having an auxiliary buffer is false in the case of the 
blorp calls triggered by brw_resolve_framebuffer(), then what Tapani posted is 
better; if that is not the case, then a nasty, hard to track bug will then lurk.

-Kevin

-Original Message-
From: Palli, Tapani 
Sent: Friday, March 16, 2018 11:32 AM
To: mesa-dev@lists.freedesktop.org
Cc: ja...@jlekstrand.net; Rogovin, Kevin ; Palli, 
Tapani 
Subject: [PATCH v2] i965: prevent using auxilary buffers when an astc5x5 
texture is present

From: Kevin Rogovin 

If ASTC5x5 textures are present, resolve all textures that the sampler accesses 
so that auxilary buffer is unneeded when the astc5x5 workaround is needed and 
also program the sampler state to not use the auxilary buffer as well.

v2: call gen9_set_astc5x5_wa_mode directly in brw_predraw_resolve_inputs
(Tapani)

Signed-off-by: Kevin Rogovin 
Signed-off-by: Tapani Pälli 
---

I've gone through this patch and wanted to express my thoughts as v2 rather 
than review. I've also discussed with Kevin offline about a plan to build a 
proper Piglit test to hit this and attempt to hit blorp as well (currently 
nothing really hitting that one).

 src/mesa/drivers/dri/i965/brw_draw.c | 64 +---
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 12 -
 2 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index ef7c6a7df9..0b4e993576 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -392,6 +392,8 @@ mark_textures_used_for_txf(BITSET_WORD *used_for_txf,
  *
  * Resolve the depth buffer's HiZ buffer, resolve the depth buffer of each
  * enabled depth texture, and flush the render cache for any dirty textures.
+ * In addition, if the ASTC5x5 workaround is needed and if ASTC5x5 
+ textures
+ * are present, resolve textures so that auxilary buffers are not needed.
  */
 void
 brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering, @@ -412,8 
+414,33 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool rendering,
   mark_textures_used_for_txf(used_for_txf, ctx->ComputeProgram._Current);
}
 
-   /* Resolve depth buffer and render cache of each enabled texture. */
int maxEnabledUnit = ctx->Texture._MaxEnabledTexImageUnit;
+   bool disable_aux = false;
+   bool texture_astc5x5_present = false;
+   bool texture_with_auxilary_present = false;
+
+   if (gen9_astc5x5_wa_required(brw)) {
+  /* walk through all the texture units to see if an ASTC5x5 and/or
+   * a texture with an auxilary buffer is to be accessed.
+   */
+  for (int i = 0; i <= maxEnabledUnit; i++) {
+ if (!ctx->Texture.Unit[i]._Current)
+continue;
+ tex_obj = intel_texture_object(ctx->Texture.Unit[i]._Current);
+ if (!tex_obj)
+continue;
+ if (tex_obj->mt && tex_obj->mt->aux_usage != ISL_AUX_USAGE_NONE) {
+texture_with_auxilary_present = true;
+ }
+ if (tex_obj->_Format == MESA_FORMAT_RGBA_ASTC_5x5 ||
+ tex_obj->_Format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5) {
+texture_astc5x5_present = true;
+ }
+  }
+  disable_aux = texture_astc5x5_present;
+   }
+
+   /* Resolve depth buffer and render cache of each enabled texture. */
for (int i = 0; i <= maxEnabledUnit; i++) {
   if (!ctx->Texture.Unit[i]._Current)
 continue;
@@ -445,9 +472,16 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool 
ren

Re: [Mesa-dev] [PATCH v3 2/7] i965: restore diable_aux argument to intel_miptree_prepare_texture()

2018-03-07 Thread Rogovin, Kevin
Just to make sure, you want it so that a new field is added to brw_context that 
tells intel_miptree_prepare_texture() to disable auxilary buffer using? That 
variable would then need to be carefully set and tracked carefully if anything 
besids brw_draw uses the function. This appears to be opposite request than 
that what was given in patch 1 (that request I thought was good).

FWIW, there is now a work in progress taking these review requests into use 
https://github.com/krogueintel/asem/tree/astc5x5-wa-v4; at this point in time 
this issue and the blorp issues remain in that branch.

-Kevin

From: Jason Ekstrand [mailto:ja...@jlekstrand.net]
Sent: Tuesday, March 6, 2018 6:00 PM
To: Rogovin, Kevin 
Cc: ML mesa-dev 
Subject: Re: [Mesa-dev] [PATCH v3 2/7] i965: restore diable_aux argument to 
intel_miptree_prepare_texture()

We took it out with good reason... I'd rather we do something similar to what 
we did for render targets and just pass aux usage directly from 
brw_predraw_resolve_inputs to brw_wm_surface_state through the context.

On Tue, Feb 27, 2018 at 1:30 AM, 
mailto:kevin.rogo...@intel.com>> wrote:
From: Kevin Rogovin mailto:kevin.rogo...@intel.com>>

Signed-off-by: Kevin Rogovin 
mailto:kevin.rogo...@intel.com>>
---
 src/mesa/drivers/dri/i965/brw_draw.c  | 9 ++---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 5 +++--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 3 ++-
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index 299e7f9..0241035 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -447,7 +447,8 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool 
rendering,

   intel_miptree_prepare_texture(brw, tex_obj->mt, view_format,
 min_level, num_levels,
-min_layer, num_layers);
+min_layer, num_layers,
+false);

   /* If any programs are using it with texelFetch, we may need to also do
* a prepare with an sRGB format to ensure texelFetch works "properly".
@@ -458,7 +459,8 @@ brw_predraw_resolve_inputs(struct brw_context *brw, bool 
rendering,
  if (txf_format != view_format) {
 intel_miptree_prepare_texture(brw, tex_obj->mt, txf_format,
   min_level, num_levels,
-  min_layer, num_layers);
+  min_layer, num_layers,
+  false);
  }
   }

@@ -530,7 +532,8 @@ brw_predraw_resolve_framebuffer(struct brw_context *brw,
  if (irb) {
 intel_miptree_prepare_texture(brw, irb->mt, irb->mt->surf.format,
   irb->mt_level, 1,
-  irb->mt_layer, irb->layer_count);
+  irb->mt_layer, irb->layer_count,
+  false);
  }
   }
}
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index c6213b2..dbd9f7a 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -2648,9 +2648,10 @@ intel_miptree_prepare_texture(struct brw_context *brw,
   struct intel_mipmap_tree *mt,
   enum isl_format view_format,
   uint32_t start_level, uint32_t num_levels,
-  uint32_t start_layer, uint32_t num_layers)
+  uint32_t start_layer, uint32_t num_layers,
+  bool disable_aux)
 {
-   enum isl_aux_usage aux_usage =
+   enum isl_aux_usage aux_usage = (disable_aux) ? ISL_AUX_USAGE_NONE :
   intel_miptree_texture_aux_usage(brw, mt, view_format);
bool clear_supported = aux_usage != ISL_AUX_USAGE_NONE;

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 07c8580..ee72309 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -642,7 +642,8 @@ intel_miptree_prepare_texture(struct brw_context *brw,
   struct intel_mipmap_tree *mt,
   enum isl_format view_format,
   uint32_t start_level, uint32_t num_levels,
-  uint32_t start_layer, uint32_t num_layers);
+  uint32_t start_layer, uint32_t num_layers,
+  bool disable_aux);
 void
 intel_miptree_prepare_image(struct brw_context *brw,
 

Re: [Mesa-dev] [PATCH v3 7/7] i965: ASTC5x5 workaround logic for blorp

2018-03-06 Thread Rogovin, Kevin
What would be a good place for this in the blorp code? I am guessing the only 
time that blorp would trigger an astc read is in blitting… what would you 
suggest for keeping it inside of blorp? The blorp_context struct is part of 
blorp not i965… to keep it in blorp would mean it needs to be passed down along 
the function calls somehow …

-Kevin

From: Jason Ekstrand [mailto:ja...@jlekstrand.net]
Sent: Tuesday, March 6, 2018 6:07 PM
To: Rogovin, Kevin 
Cc: ML mesa-dev 
Subject: Re: [Mesa-dev] [PATCH v3 7/7] i965: ASTC5x5 workaround logic for blorp

On Tue, Feb 27, 2018 at 1:30 AM, 
mailto:kevin.rogo...@intel.com>> wrote:
From: Kevin Rogovin mailto:kevin.rogo...@intel.com>>

Signed-off-by: Kevin Rogovin 
mailto:kevin.rogo...@intel.com>>
---
 src/mesa/drivers/dri/i965/genX_blorp_exec.c |  5 +
 src/mesa/drivers/dri/i965/intel_tex_image.c | 16 
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/genX_blorp_exec.c 
b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
index 062171a..b7c35e9 100644
--- a/src/mesa/drivers/dri/i965/genX_blorp_exec.c
+++ b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
@@ -230,6 +230,11 @@ genX(blorp_exec)(struct blorp_batch *batch,
struct gl_context *ctx = &brw->ctx;
bool check_aperture_failed_once = false;

+   if (brw->astc5x5_wa.blorp_sampling_from_astc5x5) {
+  gen9_set_astc5x5_wa_mode(brw, BRW_ASTC5x5_WA_MODE_HAS_ASTC5x5);
+   } else {
+  gen9_set_astc5x5_wa_mode(brw, BRW_ASTC5x5_WA_MODE_HAS_AUX);
+   }
/* Flush the sampler and render caches.  We definitely need to flush the
 * sampler cache so that we get updated contents from the render cache for
 * the glBlitFramebuffer() source.  Also, we are sometimes warned in the
diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index e25bc9a..38967fb 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -824,10 +824,18 @@ intel_get_tex_sub_image(struct gl_context *ctx,
DBG("%s\n", __func__);

if (_mesa_is_bufferobj(ctx->Pack.BufferObj)) {
-  if (intel_gettexsubimage_blorp(brw, texImage,
- xoffset, yoffset, zoffset,
- width, height, depth, format, type,
- pixels, &ctx->Pack))
+  bool blorp_success;
+
+  brw->astc5x5_wa.blorp_sampling_from_astc5x5 =
+ (texImage->TexFormat == MESA_FORMAT_RGBA_ASTC_5x5 ||
+  texImage->TexFormat == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5);
+  blorp_success = intel_gettexsubimage_blorp(brw, texImage,
+ xoffset, yoffset, zoffset,
+ width, height, depth,
+ format, type, pixels,
+ &ctx->Pack);
+  brw->astc5x5_wa.blorp_sampling_from_astc5x5 = false;
+  if (blorp_success)

Let's keep all of the code for detecting whether or not BLORP needs and 
implementing it together in BLORP.  We can just as easily detect ASTC5x5 there 
as here.

  return;

   perf_debug("%s: fallback to CPU mapping in PBO case\n", __func__);
--
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org<mailto:mesa-dev@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 3/7] i965: set ASTC5x5 workaround texture type tracking on texture validate

2018-03-06 Thread Rogovin, Kevin
Originally, I had the entire astc and aux checking in a dedicated function; 
would that be more preferable?

-Kevin

From: Jason Ekstrand [mailto:ja...@jlekstrand.net]
Sent: Tuesday, March 6, 2018 6:02 PM
To: Rogovin, Kevin 
Cc: ML mesa-dev 
Subject: Re: [Mesa-dev] [PATCH v3 3/7] i965: set ASTC5x5 workaround texture 
type tracking on texture validate

I don't really like this being in texture validation.  It has nothing to do 
with texture validation.  Instead, we should put it in 
brw_predraw_resolve_inputs and do it as a pre-pass.  This would let us store 
astc5x5_present as a local variable.  Also, I don't think we need 
texture_with_auxiliary_present for anything since our solution is to resolve 
those.

On Tue, Feb 27, 2018 at 1:30 AM, 
mailto:kevin.rogo...@intel.com>> wrote:
From: Kevin Rogovin mailto:kevin.rogo...@intel.com>>

Signed-off-by: Kevin Rogovin 
mailto:kevin.rogo...@intel.com>>
---
 src/mesa/drivers/dri/i965/intel_tex_validate.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_validate.c 
b/src/mesa/drivers/dri/i965/intel_tex_validate.c
index eaa60ba..2bf6c65 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_validate.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_validate.c
@@ -179,6 +179,8 @@ brw_validate_textures(struct brw_context *brw)
struct gl_context *ctx = &brw->ctx;
const int max_enabled_unit = ctx->Texture._MaxEnabledTexImageUnit;

+   brw->astc5x5_wa.texture_astc5x5_present = false;
+   brw->astc5x5_wa.texture_with_auxilary_present = false;
for (int unit = 0; unit <= max_enabled_unit; unit++) {
   struct gl_texture_object *tex_obj = ctx->Texture.Unit[unit]._Current;

@@ -194,5 +196,18 @@ brw_validate_textures(struct brw_context *brw)

   intel_update_max_level(tex_obj, sampler);
   intel_finalize_mipmap_tree(brw, tex_obj);
+
+  /* ASTC5x5 workaround needs to know if textures in use have
+   * auxilary in buffers and/or a texture in use is ASTC5x5
+   */
+  struct intel_texture_object *tex = intel_texture_object(tex_obj);
+  struct intel_mipmap_tree *mt = tex->mt;
+  if (mt && mt->aux_usage != ISL_AUX_USAGE_NONE) {
+ brw->astc5x5_wa.texture_with_auxilary_present = true;
+  }
+  if (tex->_Format == MESA_FORMAT_RGBA_ASTC_5x5 ||
+  tex->_Format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5) {
+ brw->astc5x5_wa.texture_astc5x5_present = true;
+  }
}
 }
--
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org<mailto:mesa-dev@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround

2018-02-14 Thread Rogovin, Kevin
Hi,

 For those interested, I made a version of the series that does not change 
intel_miptree_texture_aux_usage(), but instead restores the disable_aux 
argument to intel_miptree_prepare_texture() and uses that. What I do not like 
is that it forced one to have several little things tweaked here and there to 
make it well whereas in contrast hitting intel_miptree_texture_aux_usage() put 
it in one place exactly. The branch is: 
https://github.com/krogueintel/asem/tree/astc5x5-wa-v3 .

I took a look at blorp's using of intel_miptree_prepare_texture(). It uses it 
in only one place: when blitting (no surprise). Having the approach of changing 
intel_miptree_prepare_texture() will have blorp potentially not use auxiliary 
buffers on blitting; functionality will NOT regress because blorp angelically 
calls intel_miptree_prepare_access(), but it might degrade performance. 
Specifically: blorp when reading form the source surface will not use the 
auxiliary surface if the last draw/compute had astc5x5 texture.

At this point, the choice is essentially:

  - cleaner and simpler code by hitting of intel_miptree_prepare_texture() at 
cost of potential performance hit of where a blit follows a draw (or compute) 
that uses an ASTC5x5 texture

OR

  - ickier code without that one cost in blorp.

I honestly suspect that one will never find an application that has a 
performance difference between the two unless one deliberately makes such an 
application.

-Kevin

-Original Message-
From: Palli, Tapani 
Sent: Wednesday, February 14, 2018 9:58 AM
To: Rogovin, Kevin ; Jason Ekstrand 

Cc: ML mesa-dev 
Subject: Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround



On 14.02.2018 09:54, Tapani Pälli wrote:
> 
> 
> On 14.02.2018 09:38, Rogovin, Kevin wrote:
>> Hi,
>>
>>   The 3rd patch, "i965: use ASTC5x5 workaround in 
>> intel_miptree_texture_aux_usage has issues:
>>     1. Definitely: brw_draw lacks the call to
>> gen9_astc5x5_perform_wa() which generates the needed flush between 
>> batchbuffers
> 
> Now it happens via intel_miptree_prepare_texture (in 
> intel_miptree_texture_aux_usage).

No it does not, sorry for that :) Yeah I believe I need to restore that.


> 
>>     2.  Uneasy: I am nervous about hitting
>> intel_miptree_texture_aux_usage() as it is used in lots of places 
>> directly and indirectly. The interaction with blorp resolve makes me 
>> very uneasy  I would rather restore the bool argument disable_aux 
>> to intel_miptree_prepare_render() to keep the ASTC evil a little more 
>> localized (the function is used only in brw_draw.c for resolving inputs).
> 
> IMO it feels more centralized .. but no strong opinions here. I 
> haven't spotted any regressions because of this.
> 
> 
>> -Kevin
>>
>>
>> -Original Message-
>> From: Palli, Tapani
>> Sent: Monday, February 12, 2018 10:14 AM
>> To: Rogovin, Kevin ; Jason Ekstrand 
>> 
>> Cc: ML mesa-dev 
>> Subject: Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround
>>
>>
>>
>> On 02/12/2018 09:44 AM, Tapani Pälli wrote:
>>> Hi;
>>>
>>> On 02/08/2018 09:50 AM, Rogovin, Kevin wrote:
>>>> Hi,
>>>>
>>>> I gave it a whirl of setting the .mocs field set to 0 passed to
>>>> isl_surf_fill_state() ALWAYS. Sadly CarChase GLES continued to hang 
>>>> (where as the GL did not because it does not use ASTC). This makes 
>>>> sense since MOCS (atleast last time I looked at it) only really 
>>>> controls cache usage for L3 and eLLC (please anyone correct me if I 
>>>> am wrong in this) whereas the issue is that the samplers mess up 
>>>> how they deal with its own (private) cache.
>>>>
>>>> It really is nasty that it appears (as of now) that this 
>>>> complicated work around is needed and needs to somehow be 
>>>> re-implemented in anv as well.
>>>
>>> It seems surrounding code has changed so that these patches need 
>>> some changes. Kevin, are you planning to rebase/refactor these changes?
>>
>> FYI I've rebased the patches and did additional porting (because of 
>> commit df13588d21) here:
>>
>> https://cgit.freedesktop.org/~tpalli/mesa/log/?h=astc5x5
>>
>> Let me know if this looks OK for you.
>>
>>>
>>>> -Kevin
>>>>
>>>> *From:*Jason Ekstrand [mailto:ja...@jlekstrand.net]
>>>> *Sent:* Thursday, February 8, 2018 2:47 AM
>>>> *To:* Rogovin, Kevin 
>>>> *Cc:* ML mesa-dev 
>>>> *Subject:* Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround
>>>>
>>>> Random thought:

Re: [Mesa-dev] [PATCH v3] intel/tools: new intel_sanitize_gpu tool

2018-02-13 Thread Rogovin, Kevin
Reviewed by: kevin.rogovin [at] intel.com

-Original Message-
From: Phillips, Scott D 
Sent: Friday, February 9, 2018 3:11 AM
To: mesa-dev@lists.freedesktop.org; Rogovin, Kevin 
Subject: [PATCH v3] intel/tools: new intel_sanitize_gpu tool

From: Kevin Rogovin 

Adds a new debug tool to pad each GEM BO allocated with (weak) pseudo-random 
noise values which are then checked after each batchbuffer dispatch to the 
kernel. This can be quite valuable to find diffucult to track down heisenberg 
style bugs.

[scott.d.phill...@intel.com: split to separate tool]

v2: (by Scott D Phillips)
- track gem handles per fd (Kevin)
- remove handles on GEM_CLOSE (Kevin)
- ignore prime handles
- meson & shell script

v3: (by Scott D Phillips)
- don't track prime bos at all (Kevin)
- protect the hash table with a mutex (Kevin)
- hook fds by drm_version.name, not path (Chris Wilson)
---
 src/intel/tools/intel_sanitize_gpu.c  | 430 ++
 src/intel/tools/intel_sanitize_gpu.in |   4 +
 src/intel/tools/meson.build   |  25 ++
 3 files changed, 459 insertions(+)
 create mode 100644 src/intel/tools/intel_sanitize_gpu.c
 create mode 100755 src/intel/tools/intel_sanitize_gpu.in

diff --git a/src/intel/tools/intel_sanitize_gpu.c 
b/src/intel/tools/intel_sanitize_gpu.c
new file mode 100644
index 000..9b49b0bbf22
--- /dev/null
+++ b/src/intel/tools/intel_sanitize_gpu.c
@@ -0,0 +1,430 @@
+/*
+ * Copyright © 2015-2018 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person 
+obtaining a
+ * copy of this software and associated documentation files (the 
+"Software"),
+ * to deal in the Software without restriction, including without 
+limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
+sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom 
+the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the 
+next
+ * paragraph) shall be included in all copies or substantial portions 
+of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
+EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
+MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT 
+SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
+OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
+ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
+DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#undef _FILE_OFFSET_BITS /* prevent #define open open64 */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "util/hash_table.h"
+
+#define INTEL_LOG_TAG "INTEL-SANITIZE-GPU"
+#include "common/intel_log.h"
+#include "common/gen_clflush.h"
+
+static int (*libc_open)(const char *pathname, int flags, mode_t mode); 
+static int (*libc_close)(int fd); static int (*libc_ioctl)(int fd, 
+unsigned long request, void *argp); static int (*libc_fcntl)(int fd, 
+int cmd, int param);
+
+#define DRM_MAJOR 226
+
+/* TODO: we want to make sure that the padding forces
+ * the BO to take another page on the (PP)GTT; 4KB
+ * may or may not be the page size for the BO. Indeed,
+ * depending on GPU, kernel version and GEM size, the
+ * page size can be one of 4KB, 64KB or 2M.
+ */
+#define PADDING_SIZE 4096
+
+struct refcnt_hash_table {
+   struct hash_table *t;
+   int refcnt;
+};
+
+pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+#define MUTEX_LOCK() do {\
+   if (unlikely(pthread_mutex_lock(&mutex))) {   \
+  intel_loge("mutex_lock failed");   \
+  abort();   \
+   } \
+} while (0)
+#define MUTEX_UNLOCK() do {  \
+   if (unlikely(pthread_mutex_unlock(&mutex))) { \
+  intel_loge("mutex_unlock failed"); \
+  abort();   \
+   } \
+} while (0)
+
+static struct hash_table *fds_to_bo_sizes = NULL;
+
+static inline struct hash_table*
+bo_size_table(int fd)
+{
+   struct hash_entry *e = _mesa_hash_table_search(fds_to_bo_sizes,
+  (void*)(uintptr_t)fd);
+   return e ? ((struct refcnt_hash_table*)e->data)->t : NULL; }
+
+static inline uint64_t
+bo_size(int fd, uint32_t handle)
+{
+   struct hash_table *t = bo_size_table(fd);
+   if (!t)
+  return UINT64_MAX;
+   struct hash_entry *e = _mesa_hash_table_search(t, (void*)(uintptr_t)handle);
+   return e ? (uint64_t)e->data : UINT64_MAX; }
+
+static inline bool
+is_dr

Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround

2018-02-13 Thread Rogovin, Kevin
Hi,

 The 3rd patch, "i965: use ASTC5x5 workaround in 
intel_miptree_texture_aux_usage has issues:
  
   1. Definitely: brw_draw lacks the call to gen9_astc5x5_perform_wa() which 
generates the needed flush between batchbuffers

   2.  Uneasy: I am nervous about hitting intel_miptree_texture_aux_usage() as 
it is used in lots of places directly and indirectly. The interaction with 
blorp resolve makes me very uneasy  I would rather restore the bool 
argument disable_aux  to intel_miptree_prepare_render() to keep the ASTC evil a 
little more localized (the function is used only in brw_draw.c for resolving 
inputs).

-Kevin


-Original Message-
From: Palli, Tapani 
Sent: Monday, February 12, 2018 10:14 AM
To: Rogovin, Kevin ; Jason Ekstrand 

Cc: ML mesa-dev 
Subject: Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround



On 02/12/2018 09:44 AM, Tapani Pälli wrote:
> Hi;
> 
> On 02/08/2018 09:50 AM, Rogovin, Kevin wrote:
>> Hi,
>>
>> I gave it a whirl of setting the .mocs field set to 0 passed to
>> isl_surf_fill_state() ALWAYS. Sadly CarChase GLES continued to hang 
>> (where as the GL did not because it does not use ASTC). This makes 
>> sense since MOCS (atleast last time I looked at it) only really 
>> controls cache usage for L3 and eLLC (please anyone correct me if I 
>> am wrong in this) whereas the issue is that the samplers mess up how 
>> they deal with its own (private) cache.
>>
>> It really is nasty that it appears (as of now) that this complicated 
>> work around is needed and needs to somehow be re-implemented in anv 
>> as well.
> 
> It seems surrounding code has changed so that these patches need some 
> changes. Kevin, are you planning to rebase/refactor these changes?

FYI I've rebased the patches and did additional porting (because of commit 
df13588d21) here:

https://cgit.freedesktop.org/~tpalli/mesa/log/?h=astc5x5

Let me know if this looks OK for you.

> 
>> -Kevin
>>
>> *From:*Jason Ekstrand [mailto:ja...@jlekstrand.net]
>> *Sent:* Thursday, February 8, 2018 2:47 AM
>> *To:* Rogovin, Kevin 
>> *Cc:* ML mesa-dev 
>> *Subject:* Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround
>>
>> Random thought:
>>
>> Nanley and I were talking about this just now and I was complaining 
>> about how much I hate the fact that this workaround exists because we 
>> can't implement it in Vulkan.  Then I got an idea.  What would happen 
>> if we just set MOCS to zero (uncached) for ASTC 5x5 textures?  Does 
>> that make the hang go away?  How bad is the car chase performance 
>> with that compared to this series?  It's a bit of a big hammer but 
>> has the advantage of simplicity.  If it causes performance to tank on 
>> anything then then the more complex solution is probably worth it but 
>> I thought it was worth a try.
>>
>> --Jason
>>
>> On Thu, Dec 14, 2017 at 9:39 AM, > <mailto:kevin.rogo...@intel.com>> wrote:
>>
>>     From: Kevin Rogovin >     <mailto:kevin.rogo...@intel.com>>
>>
>>     This patch series implements a needed workaround for Gen9 for 
>> ASTC5x5
>>     sampler reads. The crux of the work around is to make sure that 
>> the
>>     sampler does not read an ASTC5x5 texture and a surface with an 
>> auxilary
>>     buffer without having a texture cache invalidate and command 
>> streamer
>>     stall between such accesses.
>>
>>     With this patch series applied to the (current) master branch of 
>> mesa,
>>     carchase works on my SKL GT4.
>>
>>     v2:
>>    Rename workaround functions from brw_ to gen9_
>>    (suggested/requested by Topi Pohjolainen).
>>
>>    Place texture resolve to avoid using auxilary surface
>>    when ASTC5x5 is detected in brw_predraw_resolve_inputs()
>>    instead of another detected function; doing so allows
>>    one to avoid walking the textures again.
>>    (suggested/requested by Topi Pohjolainen).
>>
>>    Emit command streamer stall in addition to texture
>>    invalidate.
>>    (original short-coming caught by Jason Ekstrand)
>>
>>    Place workaround function in (new) dedicated file.
>>
>>    Minor path re-ordering to accomodate changes.
>>
>>     Kevin Rogovin (5):
>>    i965: define astx5x5 workaround infrastructure
>>    i965: set ASTC5x5 workaround texture type tracking on texture
>>     validate
>>    i965: use ASTC5x5 workaround in brw_draw
>>    i965: use ASTC5x5 workaround in brw_compute
>>    i965

Re: [Mesa-dev] [PATCH v2] intel/tools: new intel_sanitize_gpu tool

2018-02-08 Thread Rogovin, Kevin
HI,

  Review comments below.

-Original Message-
From: Phillips, Scott D 
Sent: Thursday, February 8, 2018 2:19 AM
To: mesa-dev@lists.freedesktop.org; Rogovin, Kevin 
Subject: [PATCH v2] intel/tools: new intel_sanitize_gpu tool

From: Kevin Rogovin 

Adds a new debug tool to pad each GEM BO allocated with (weak) pseudo-random 
noise values which are then checked after each batchbuffer dispatch to the 
kernel. This can be quite valuable to find diffucult to track down heisenberg 
style bugs.

[scott.d.phill...@intel.com: split to separate tool]

v2: (by Scott D Phillips)
- track gem handles per fd (Kevin)
- remove handles on GEM_CLOSE (Kevin)
- ignore prime handles
- meson & shell script
---
 src/intel/tools/intel_sanitize_gpu.c  | 399 ++
 src/intel/tools/intel_sanitize_gpu.in |   4 +
 src/intel/tools/meson.build   |  25 +++
 3 files changed, 428 insertions(+)
 create mode 100644 src/intel/tools/intel_sanitize_gpu.c
 create mode 100755 src/intel/tools/intel_sanitize_gpu.in

diff --git a/src/intel/tools/intel_sanitize_gpu.c 
b/src/intel/tools/intel_sanitize_gpu.c
new file mode 100644
index 000..84e9196da31
--- /dev/null
+++ b/src/intel/tools/intel_sanitize_gpu.c
@@ -0,0 +1,399 @@
+/*
+ * Copyright © 2015-2018 Intel Corporation
+ *


+static int
+prime_fd(int fd, struct drm_prime_handle *handle) {
+   int ret = libc_ioctl(fd, DRM_IOCTL_PRIME_FD_TO_HANDLE, handle);
+   if (ret != 0)
+  return ret;
+
+   /*
+* Can't pad prime buffers, mark it as size=UINT64_MAX and we'll
+* skip it in verification.
+*/
+   _mesa_hash_table_insert(bo_size_table(fd), (void*)(uintptr_t)handle->handle,
+   (void*)(uintptr_t)UINT64_MAX);
+
+   return 0;
+}

Rather than tracking such GEM handles, just ignore them entirely since one 
cannot pad
them at all. This is the same story as for USER_PTR GEM BO's as well.

+
+__attribute__ ((visibility ("default"))) int open(const char *path, int 
+flags, ...) {
+   va_list args;
+   mode_t mode;
+
+   va_start(args, flags);
+   mode = va_arg(args, int);
+   va_end(args);
+
+   int fd = libc_open(path, flags, mode);
+
+   if (fd >= 0 && strcmp(path, "/dev/dri/renderD128") == 0)
+  add_drm_fd(fd);
+
+   return fd;
+}
+
+__attribute__ ((visibility ("default"), alias ("open"))) int 
+open64(const char *path, int flags, ...);
+
+__attribute__ ((visibility ("default"))) int close(int fd) {
+   if (is_drm_fd(fd))
+  del_drm_fd(fd);
+
+   return libc_close(fd);
+}
+
+__attribute__ ((visibility ("default"))) int fcntl(int fd, int cmd, 
+...) {
+   va_list args;
+   int param;
+
+   va_start(args, cmd);
+   param = va_arg(args, int);
+   va_end(args);
+
+   int res = libc_fcntl(fd, cmd, param);
+
+   if (is_drm_fd(fd) && cmd == F_DUPFD_CLOEXEC)
+  dup_drm_fd(fd, res);
+
+   return res;
+}
+
+__attribute__ ((visibility ("default"))) int ioctl(int fd, unsigned 
+long request, ...) {
+   va_list args;
+   void *argp;
+   struct stat buf;
+
+   va_start(args, request);
+   argp = va_arg(args, void *);
+   va_end(args);
+
+   if (_IOC_TYPE(request) == DRM_IOCTL_BASE &&
+   !is_drm_fd(fd) && fstat(fd, &buf) == 0 &&
+   (buf.st_mode & S_IFMT) == S_IFCHR && major(buf.st_rdev) == DRM_MAJOR) {
+  intel_loge("missed drm fd %d", fd);
+  add_drm_fd(fd);
+   }
+
+   if (is_drm_fd(fd)) {
+  switch (request) {
+  case DRM_IOCTL_GEM_CLOSE:
+ return gem_close(fd, (struct drm_gem_close*)argp);
+
+  case DRM_IOCTL_PRIME_FD_TO_HANDLE:
+ return prime_fd(fd, (struct drm_prime_handle*)argp);
+
+  case DRM_IOCTL_I915_GEM_CREATE:
+ return create_with_padding(fd, (struct 
+ drm_i915_gem_create*)argp);
+
+  case DRM_IOCTL_I915_GEM_EXECBUFFER2:
+  case DRM_IOCTL_I915_GEM_EXECBUFFER2_WR:
+ return exec_and_check_padding(fd, request,
+   (struct 
+ drm_i915_gem_execbuffer2*)argp);
+
+  default:
+ break;
+  }
+   }
+   return libc_ioctl(fd, request, argp); }
+

One needs to mutex lock the internal structures of the created .so to correctly 
handle simultaneous ioctl's coming from multi-threaded applications. I suggest 
putting that common mutex lock in the functions open(), close() and ioctl().


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround

2018-02-07 Thread Rogovin, Kevin
Hi,

I gave it a whirl of setting the .mocs field set to 0 passed to 
isl_surf_fill_state() ALWAYS. Sadly CarChase GLES continued to hang (where as 
the GL did not because it does not use ASTC). This makes sense since MOCS 
(atleast last time I looked at it) only really controls cache usage for L3 and 
eLLC (please anyone correct me if I am wrong in this) whereas the issue is that 
the samplers mess up how they deal with its own (private) cache.

It really is nasty that it appears (as of now) that this complicated work 
around is needed and needs to somehow be re-implemented in anv as well.

-Kevin

From: Jason Ekstrand [mailto:ja...@jlekstrand.net]
Sent: Thursday, February 8, 2018 2:47 AM
To: Rogovin, Kevin 
Cc: ML mesa-dev 
Subject: Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround

Random thought:
Nanley and I were talking about this just now and I was complaining about how 
much I hate the fact that this workaround exists because we can't implement it 
in Vulkan.  Then I got an idea.  What would happen if we just set MOCS to zero 
(uncached) for ASTC 5x5 textures?  Does that make the hang go away?  How bad is 
the car chase performance with that compared to this series?  It's a bit of a 
big hammer but has the advantage of simplicity.  If it causes performance to 
tank on anything then then the more complex solution is probably worth it but I 
thought it was worth a try.
--Jason

On Thu, Dec 14, 2017 at 9:39 AM, 
mailto:kevin.rogo...@intel.com>> wrote:
From: Kevin Rogovin mailto:kevin.rogo...@intel.com>>

This patch series implements a needed workaround for Gen9 for ASTC5x5
sampler reads. The crux of the work around is to make sure that the
sampler does not read an ASTC5x5 texture and a surface with an auxilary
buffer without having a texture cache invalidate and command streamer
stall between such accesses.

With this patch series applied to the (current) master branch of mesa,
carchase works on my SKL GT4.

v2:
  Rename workaround functions from brw_ to gen9_
  (suggested/requested by Topi Pohjolainen).

  Place texture resolve to avoid using auxilary surface
  when ASTC5x5 is detected in brw_predraw_resolve_inputs()
  instead of another detected function; doing so allows
  one to avoid walking the textures again.
  (suggested/requested by Topi Pohjolainen).

  Emit command streamer stall in addition to texture
  invalidate.
  (original short-coming caught by Jason Ekstrand)

  Place workaround function in (new) dedicated file.

  Minor path re-ordering to accomodate changes.

Kevin Rogovin (5):
  i965: define astx5x5 workaround infrastructure
  i965: set ASTC5x5 workaround texture type tracking on texture validate
  i965: use ASTC5x5 workaround in brw_draw
  i965: use ASTC5x5 workaround in brw_compute
  i965: ASTC5x5 workaround logic for blorp

 src/mesa/drivers/dri/i965/Makefile.sources   |  1 +
 src/mesa/drivers/dri/i965/brw_compute.c  |  6 
 src/mesa/drivers/dri/i965/brw_context.c  |  6 
 src/mesa/drivers/dri/i965/brw_context.h  | 24 
 src/mesa/drivers/dri/i965/brw_draw.c | 16 +--
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c |  5 
 src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c  | 36 
 src/mesa/drivers/dri/i965/genX_blorp_exec.c  |  5 
 src/mesa/drivers/dri/i965/intel_batchbuffer.c|  1 +
 src/mesa/drivers/dri/i965/intel_tex_image.c  | 16 ---
 src/mesa/drivers/dri/i965/intel_tex_validate.c   | 13 +
 src/mesa/drivers/dri/i965/meson.build|  1 +
 12 files changed, 124 insertions(+), 6 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c

--
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org<mailto:mesa-dev@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH RFC] intel/tools: new intel_sanitize_gpu tool

2018-02-07 Thread Rogovin, Kevin
Hi,

 Two comments on the code you sent:
  
  1. the map to find the size of a given GEM needs to be keyed by (fd, 
gem_handle) and not just the gem_handle; if an application (though very 
unlikely) has multiple fd's to the GPU (for example if an application made 
multiple connections to X, or if the application is using OpenGL and OpenCL (or 
libva) ).

 2. I would also handle the destroy GEM ioctl to remove its entry from the map.

-Kevin

-Original Message-
From: Phillips, Scott D 
Sent: Wednesday, February 7, 2018 2:35 AM
To: mesa-dev@lists.freedesktop.org; Rogovin, Kevin 
Subject: [PATCH RFC] intel/tools: new intel_sanitize_gpu tool

From: Kevin Rogovin 

Adds a new debug tool to pad each GEM BO allocated with (weak) pseudo-random 
noise values which are then checked after each batchbuffer dispatch to the 
kernel. This can be quite valuable to find diffucult to track down heisenberg 
style bugs.

[scott.d.phill...@intel.com: split to separate tool]
---
Hi Kevin, here's another take on your out-of-bounds write detection code, 
packaged as a separate tool instead of in the driver code. Doing it this way 
could make it easier to use across i965 and anv (although USERPTR is needed for 
anv support). The build bits of this are less than half baked at the moment. 
What do you think?

 src/intel/tools/intel_sanitize_gpu.c   | 306 +
 src/intel/tools/meson.build|  10 ++
 src/intel/tools/run_intel_sanitize_gpu |   5 +
 3 files changed, 321 insertions(+)
 create mode 100644 src/intel/tools/intel_sanitize_gpu.c
 create mode 100755 src/intel/tools/run_intel_sanitize_gpu

diff --git a/src/intel/tools/intel_sanitize_gpu.c 
b/src/intel/tools/intel_sanitize_gpu.c
new file mode 100644
index 000..d74492bb02e
--- /dev/null
+++ b/src/intel/tools/intel_sanitize_gpu.c
@@ -0,0 +1,306 @@
+/*
+ * Copyright © 2015-2018 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person 
+obtaining a
+ * copy of this software and associated documentation files (the 
+"Software"),
+ * to deal in the Software without restriction, including without 
+limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
+sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom 
+the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the 
+next
+ * paragraph) shall be included in all copies or substantial portions 
+of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
+EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
+MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT 
+SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
+OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
+ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
+DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#undef _FILE_OFFSET_BITS /* prevent #define open open64 */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "util/hash_table.h"
+#include "util/set.h"
+
+#define INTEL_LOG_TAG "INTEL-SANITIZE-GPU"
+#include "common/intel_log.h"
+#include "common/gen_clflush.h"
+
+static int (*libc_open)(const char *pathname, int flags, mode_t mode); 
+static int (*libc_close)(int fd); static int (*libc_ioctl)(int fd, 
+unsigned long request, void *argp); static int (*libc_fcntl)(int fd, 
+int cmd, int param);
+
+#define DRM_MAJOR 226
+
+/* TODO: we want to make sure that the padding forces
+ * the BO to take another page on the (PP)GTT; 4KB
+ * may or may not be the page size for the BO. Indeed,
+ * depending on GPU, kernel version and GEM size, the
+ * page size can be one of 4KB, 64KB or 2M.
+ */
+#define PADDING_SIZE 4096
+
+static struct set *drm_fds = NULL;
+static struct hash_table *bo_sizes = NULL;
+
+#define IS_DRM_FD(fd) (_mesa_set_search(drm_fds,   
\
+(void*)(uintptr_t)(fd)) != NULL)
+#define BO_SIZE(handle) ({ 
\
+   struct hash_entry *e = _mesa_hash_table_search(bo_sizes,
\
+  (void*)(uintptr_t)(handle)); 
\
+   e ? (uintptr_t)e->data : 0; 
\
+})
+
+/* Our goal is not to have noise good enough for cryto,
+ * but instead values that are unique-ish enough that
+ * it is incredibly unlikely that a buffer overwrite
+ * will produce the exact same values.
+ */
+static uint8_t
+next_noise_value(uint8_t prev_noise)
+{
+   uint32_t v = prev_noise;
+   return (v * 103u + 227u) & 

Re: [Mesa-dev] [PATCH RFC] intel/tools: new intel_sanitize_gpu tool

2018-02-07 Thread Rogovin, Kevin
HI Scott,

 I like the idea of making this a pre-loadable .so as it lets it get reused for 
anything using the kernel interface using GEM's. However, there are a few bits, 
that if addressed, would make this perfect IMO:

  1. It needs a way of being able to label the GEMs (brw_bufmgr.c provides 
this). The current code below will just tell us which ioctl had a bad thing 
happen to a GEM BO, but it won't tell us which GEM easily. Admittedly a debug 
session will tell us the handle of the GEM, but then it is quite icky to figure 
what the GEM BO was for (OpenGL BO, scratch BO, etc.)

 2. In an ideal world, the .so will work perfectly with other pre-loadable 
.so's that intercept ioctl as well (for example aub or the batchbuffer logger I 
made); I freely admit that I am a touch fuzzy on the LD_PRELOAD rules, but it 
looks like one cannot use this tool at the same time as any other tool that 
tries to intercept ioctl's (though to be fair, those tools as well don't work 
with others either I think).

However, if one moves away from requiring at as an .so, but instead use it as 
something to allocate (and string-label) GEM BO's, then it could work fine; the 
interface would be something like:

uint32_t //allocate with padding
intel_allocate_gem(int fd, uint64_t size, const char *gem_label);

uint32_t //delete gem created with intel_allocate_gem()
intel_delete_gem_bo(int fd, uint32_t gem_handle);

void //needed when GEM BO's are reused for different purposes
intel_change_gem_label(int fd, uint32_t gem_handle, const char *new_gem_label);

bool //returns true if gem padding is good OR if gem was not created with 
intel_allocate_gem()
intel_gem_padding_good(int fd, uint32_t gem_handle);

void
intel_execbuffer2(int fd, struct drm_i915_gem_execbuffer2 *p); 

void
intel_execbuffer2_wr(int fd, struct drm_i915_gem_execbuffer2 *p);

The implementation would have a map keyed by the pair (fd, gem_handle) with 
value of size of GEM and padding size. If the debug flag for checking is down, 
then all the lookups and necessary locking is skipped so the thing would have 
(almost) no overhead when the option was not active.

One question that pops into my mind: is it possible to use brw_bufmgr of i965 
in anv? If so, then we get reuse with anv for free.

-Kevin

-Original Message-
From: Phillips, Scott D 
Sent: Wednesday, February 7, 2018 2:35 AM
To: mesa-dev@lists.freedesktop.org; Rogovin, Kevin 
Subject: [PATCH RFC] intel/tools: new intel_sanitize_gpu tool

From: Kevin Rogovin 

Adds a new debug tool to pad each GEM BO allocated with (weak) pseudo-random 
noise values which are then checked after each batchbuffer dispatch to the 
kernel. This can be quite valuable to find diffucult to track down heisenberg 
style bugs.

[scott.d.phill...@intel.com: split to separate tool]
---
Hi Kevin, here's another take on your out-of-bounds write detection code, 
packaged as a separate tool instead of in the driver code. Doing it this way 
could make it easier to use across i965 and anv (although USERPTR is needed for 
anv support). The build bits of this are less than half baked at the moment. 
What do you think?

 src/intel/tools/intel_sanitize_gpu.c   | 306 +
 src/intel/tools/meson.build|  10 ++
 src/intel/tools/run_intel_sanitize_gpu |   5 +
 3 files changed, 321 insertions(+)
 create mode 100644 src/intel/tools/intel_sanitize_gpu.c
 create mode 100755 src/intel/tools/run_intel_sanitize_gpu

diff --git a/src/intel/tools/intel_sanitize_gpu.c 
b/src/intel/tools/intel_sanitize_gpu.c
new file mode 100644
index 000..d74492bb02e
--- /dev/null
+++ b/src/intel/tools/intel_sanitize_gpu.c
@@ -0,0 +1,306 @@
+/*
+ * Copyright © 2015-2018 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person 
+obtaining a
+ * copy of this software and associated documentation files (the 
+"Software"),
+ * to deal in the Software without restriction, including without 
+limitation
+ * the rights to use, copy, modify, merge, publish, distribute, 
+sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom 
+the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the 
+next
+ * paragraph) shall be included in all copies or substantial portions 
+of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
+EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
+MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT 
+SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
+OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
+ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
+DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#undef _FILE_OFFSET_BITS /* prev

Re: [Mesa-dev] [PATCH v3 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct

2018-01-29 Thread Rogovin, Kevin
Hi,

Thanks, so the items that need to be fixed are:

 Patch 1: use the better name for the macro value to better match the string
 Patch 2: either use pread/pwrite for both set and check noise or use map for 
both (I will use map)
 Patch 3: fine as is.

Apparently, the mesa-dev archive is acting like /dev/null again as the patch 
series, your review and discussion have disappeared.

I will post a v4 shortly unless there are any additional shortcomings that need 
to be addressed.

-Kevin

-Original Message-
From: Jason Ekstrand [mailto:ja...@jlekstrand.net] 
Sent: Monday, January 29, 2018 6:41 PM
To: Rogovin, Kevin 
Subject: RE: [PATCH v3 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that 
noise padding for each bo used in batchbuffer is correct

Nope. That one looked fine as-is.


On January 28, 2018 23:13:40 "Rogovin, Kevin"  wrote:

> Any comments/review for Patch 3?
>
> -Original Message-----
> From: Rogovin, Kevin
> Sent: Friday, January 26, 2018 10:56 AM
> To: mesa-dev@lists.freedesktop.org
> Cc: Rogovin, Kevin 
> Subject: [PATCH v3 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check 
> that noise padding for each bo used in batchbuffer is correct
>
> From: Kevin Rogovin 
>
> Signed-off-by: Kevin Rogovin 
> ---
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c | 22 
> +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> index 02bfd3f333..fc6998a7ca 100644
> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> @@ -1019,11 +1019,31 @@ _intel_batchbuffer_flush_fence(struct 
> brw_context *brw,
>
> ret = submit_batch(brw, in_fence_fd, out_fence_fd);
>
> -   if (unlikely(INTEL_DEBUG & DEBUG_SYNC)) {
> +   if (unlikely(INTEL_DEBUG & (DEBUG_SYNC | DEBUG_OUT_OF_BOUND_CHK))) 
> + {
>fprintf(stderr, "waiting for idle\n");
>brw_bo_wait_rendering(brw->batch.batch.bo);
> }
>
> +   if (unlikely(INTEL_DEBUG & DEBUG_OUT_OF_BOUND_CHK)) {
> +  bool detected_out_of_bounds_write = false;
> +
> +  for (int i = 0; i < brw->batch.exec_count; i++) {
> + struct brw_bo *bo = brw->batch.exec_bos[i];
> +
> + if (!brw_bo_padding_is_good(bo)) {
> +detected_out_of_bounds_write = true;
> +fprintf(stderr,
> +"Detected buffer out-of-bounds write from brw_bo %p "
> +"(GEM %u, label = \"%s\")\n",
> +bo, bo->gem_handle, bo->name);
> + }
> +  }
> +
> +  if (unlikely(detected_out_of_bounds_write)) {
> + abort();
> +  }
> +   }
> +
> /* Start a new batch buffer. */
> brw_new_batch(brw);
>
> --
> 2.15.1
>


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 1/3] intel/common:add debug flag for adding and checking padding on BO's

2018-01-26 Thread Rogovin, Kevin
Sure, I can change the bit flag name to DEBUG_CHECK_OOB;


From: Jason Ekstrand [mailto:ja...@jlekstrand.net]
Sent: Friday, January 26, 2018 12:11 PM
To: Rogovin, Kevin 
Cc: ML mesa-dev 
Subject: Re: [Mesa-dev] [PATCH v3 1/3] intel/common:add debug flag for adding 
and checking padding on BO's

On Fri, Jan 26, 2018 at 12:56 AM, 
mailto:kevin.rogo...@intel.com>> wrote:
From: Kevin Rogovin mailto:kevin.rogo...@intel.com>>

Signed-off-by: Kevin Rogovin 
mailto:kevin.rogo...@intel.com>>
---
 src/intel/common/gen_debug.c | 1 +
 src/intel/common/gen_debug.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/intel/common/gen_debug.c b/src/intel/common/gen_debug.c
index a978f2f581..2154b23756 100644
--- a/src/intel/common/gen_debug.c
+++ b/src/intel/common/gen_debug.c
@@ -85,6 +85,7 @@ static const struct debug_control debug_control[] = {
{ "nohiz",   DEBUG_NO_HIZ },
{ "color",   DEBUG_COLOR },
{ "reemit",  DEBUG_REEMIT },
+   { "check_oob",   DEBUG_OUT_OF_BOUND_CHK },

Maybe call it DEBUG_CHECK_OOB to match the string?

{ NULL,0 }
 };

diff --git a/src/intel/common/gen_debug.h b/src/intel/common/gen_debug.h
index da5b5a569d..92fc68b494 100644
--- a/src/intel/common/gen_debug.h
+++ b/src/intel/common/gen_debug.h
@@ -83,6 +83,7 @@ extern uint64_t INTEL_DEBUG;
 #define DEBUG_NO_HIZ  (1ull << 39)
 #define DEBUG_COLOR   (1ull << 40)
 #define DEBUG_REEMIT  (1ull << 41)
+#define DEBUG_OUT_OF_BOUND_CHK(1ull << 42)

 #ifdef HAVE_ANDROID_PLATFORM
 #define LOG_TAG "INTEL-MESA"
--
2.15.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org<mailto:mesa-dev@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 2/3] i965: add noise padding to buffer object and function to check if noise is correct

2018-01-26 Thread Rogovin, Kevin
Hi,

The main reason I went with map and invalidate is because the kernel on pread 
will only wait on execbuffer2 calls that declare they are going to write to the 
given GEM;  there is the off-chance that a wild write might hit the padding of 
a GEM and the function contract is to check the padding. This is following the 
review comments from Chris Wilson.

-Kevin


From: Jason Ekstrand [mailto:ja...@jlekstrand.net]
Sent: Friday, January 26, 2018 12:13 PM
To: Rogovin, Kevin 
Cc: ML mesa-dev 
Subject: Re: [Mesa-dev] [PATCH v3 2/3] i965: add noise padding to buffer object 
and function to check if noise is correct

On Fri, Jan 26, 2018 at 12:56 AM, 
mailto:kevin.rogo...@intel.com>> wrote:
From: Kevin Rogovin mailto:kevin.rogo...@intel.com>>

Signed-off-by: Kevin Rogovin 
mailto:kevin.rogo...@intel.com>>
Reviewed-by: Chris Wilson 
mailto:ch...@chris-wilson.co.uk>>
---
 src/mesa/drivers/dri/i965/brw_bufmgr.c | 115 -
 src/mesa/drivers/dri/i965/brw_bufmgr.h |  13 
 2 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c 
b/src/mesa/drivers/dri/i965/brw_bufmgr.c
index fb180289a0..7e50ba512e 100644
--- a/src/mesa/drivers/dri/i965/brw_bufmgr.c
+++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c
@@ -220,6 +220,39 @@ bucket_for_size(struct brw_bufmgr *bufmgr, uint64_t size)
   &bufmgr->cache_bucket[index] : NULL;
 }

+/* Our goal is not to have noise good enough for cryto,
+ * but instead values that are unique-ish enough that
+ * it is incredibly unlikely that a buffer overwrite
+ * will produce the exact same values.
+ */
+static uint8_t
+next_noise_value(uint8_t prev_noise)
+{
+   uint32_t v = prev_noise;
+   return (v * 103u + 227u) & 0xFF;
+}
+
+static void
+fill_noise_buffer(uint8_t *dst, uint8_t start, uint32_t length)
+{
+   for(uint32_t i = 0; i < length; ++i) {
+  dst[i] = start;
+  start = next_noise_value(start);
+   }
+}
+
+static uint8_t*
+generate_noise_buffer(uint8_t start, uint32_t length)
+{
+   uint8_t *return_value;
+   return_value = malloc(length);
+   if (return_value) {
+  fill_noise_buffer(return_value, start, length);
+   }
+
+   return return_value;
+}
+
 int
 brw_bo_busy(struct brw_bo *bo)
 {
@@ -367,7 +400,18 @@ retry:
   bo->size = bo_size;
   bo->idle = true;

-  struct drm_i915_gem_create create = { .size = bo_size };
+  bo->padding_size = 0;
+  if (unlikely(INTEL_DEBUG & DEBUG_OUT_OF_BOUND_CHK)) {
+ /* TODO: we want to make sure that the padding forces
+  * the BO to take another page on the (PP)GTT; 4KB
+  * may or may not be the page size for the GEM. Indeed,
+  * depending on generation, kernel version and GEM size,
+  * the page size can be one of 4KB, 64KB or 2M.
+  */
+ bo->padding_size = 4096;
+  }
+
+  struct drm_i915_gem_create create = { .size = bo_size + bo->padding_size 
};

   /* All new BOs we get from the kernel are zeroed, so we don't need to
* worry about that here.
@@ -378,6 +422,27 @@ retry:
  goto err;
   }

+  if (unlikely(bo->padding_size > 0)) {
+ uint8_t *noise_values;
+ struct drm_i915_gem_pwrite pwrite;
+
+ noise_values = generate_noise_buffer(create.handle, bo->padding_size);
+ if (!noise_values)
+goto err_free;
+
+ pwrite.handle = create.handle;
+ pwrite.pad = 0;
+ pwrite.offset = bo_size;
+ pwrite.size = bo->padding_size;
+ pwrite.data_ptr = (__u64) (uintptr_t) noise_values;
+
+ ret = drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_PWRITE, &pwrite);
+ free(noise_values);
+
+ if (ret != 0)
+goto err_free;
+  }
+
   bo->gem_handle = create.handle;

   bo->bufmgr = bufmgr;
@@ -424,6 +489,53 @@ err:
return NULL;
 }

+bool
+brw_bo_padding_is_good(struct brw_bo *bo)
+{
+   if (bo->padding_size > 0) {
+  struct drm_i915_gem_mmap mmap_arg = {
+ .handle = bo->gem_handle,
+ .offset = bo->size,
+ .size = bo->padding_size,
+ .flags = 0,
+  };
+  uint8_t *mapped;
+  int ret;
+  uint8_t expected_value;
+
+  /* We cannot use brw_bo_map() because it maps precisely the range
+   * [0, bo->size) and we wish to map the range of the padding which
+   * is [bo->size, bo->size + bo->pading_size)
+   */
+  ret = drmIoctl(bo->bufmgr->fd, DRM_IOCTL_I915_GEM_MMAP, &mmap_arg);
+  if (ret != 0) {
+ DBG("Unable to map mapping buffer %d (%s) buffer for pad checking.\n",
+ bo->gem_handle, bo->name);
+ return false;
+  }
+
+  mapped = (uint8_t*) (uintptr_t) mmap_arg.addr_ptr;
+  /* bah-humbug, we need to see the latest contents and
+   * if the bo is not cache coherent we likely need to
+   * inva

Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround

2017-12-18 Thread Rogovin, Kevin
Hi,

 I gave it a try by modifying isl_genX(surf_fill_state_s) in 
src/intel/isl/isl_surface_state.c where I set SamplerL2BypassModeDisable ALWAYS 
as true for GEN9; sadly car chase continued to hang.

-Kevin

-Original Message-
From: Nanley Chery [mailto:nanleych...@gmail.com] 
Sent: Friday, December 15, 2017 8:34 PM
To: Rogovin, Kevin 
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH v2 0/5] i965: ASTC5x5 workaround

On Thu, Dec 14, 2017 at 07:39:46PM +0200, kevin.rogo...@intel.com wrote:
> From: Kevin Rogovin 
> 
> This patch series implements a needed workaround for Gen9 for ASTC5x5 
> sampler reads. The crux of the work around is to make sure that the 
> sampler does not read an ASTC5x5 texture and a surface with an 
> auxilary buffer without having a texture cache invalidate and command 
> streamer stall between such accesses.
> 

This workaround sounds like it deals with the same types of surfaces dealt with 
in the RENDER_SURFACE_STATE field, Sampler L2 Out of Order Mode Disable (or 
SamplerL2BypassModeDisable in our driver).

Here's the programming note from the SKL PRM on this field:
* This bit must be set for the following surface types:
  BC2_UNORM BC3_UNORM BC5_UNORM BC5_SNORM BC7_UNORM
* This bit must be set for surfaces which contain a HiZ auxilliary surface
  if other surfaces using AUX_CCS_E or AUX_CCS_D auxiliary surface state
  (lossless color compression) are being sampled at the same time.

Have we tried setting this bit for ASTC_5x5 textures?

-Nanley

> With this patch series applied to the (current) master branch of mesa, 
> carchase works on my SKL GT4.
> 
> v2:
>   Rename workaround functions from brw_ to gen9_
>   (suggested/requested by Topi Pohjolainen).
> 
>   Place texture resolve to avoid using auxilary surface
>   when ASTC5x5 is detected in brw_predraw_resolve_inputs()
>   instead of another detected function; doing so allows
>   one to avoid walking the textures again.
>   (suggested/requested by Topi Pohjolainen).
> 
>   Emit command streamer stall in addition to texture
>   invalidate.
>   (original short-coming caught by Jason Ekstrand)
> 
>   Place workaround function in (new) dedicated file.
> 
>   Minor path re-ordering to accomodate changes.
> 
> Kevin Rogovin (5):
>   i965: define astx5x5 workaround infrastructure
>   i965: set ASTC5x5 workaround texture type tracking on texture validate
>   i965: use ASTC5x5 workaround in brw_draw
>   i965: use ASTC5x5 workaround in brw_compute
>   i965: ASTC5x5 workaround logic for blorp
> 
>  src/mesa/drivers/dri/i965/Makefile.sources   |  1 +
>  src/mesa/drivers/dri/i965/brw_compute.c  |  6 
>  src/mesa/drivers/dri/i965/brw_context.c  |  6 
>  src/mesa/drivers/dri/i965/brw_context.h  | 24 
>  src/mesa/drivers/dri/i965/brw_draw.c | 16 +--
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c |  5 
>  src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c  | 36 
> 
>  src/mesa/drivers/dri/i965/genX_blorp_exec.c  |  5 
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c|  1 +
>  src/mesa/drivers/dri/i965/intel_tex_image.c  | 16 ---
>  src/mesa/drivers/dri/i965/intel_tex_validate.c   | 13 +
>  src/mesa/drivers/dri/i965/meson.build|  1 +
>  12 files changed, 124 insertions(+), 6 deletions(-)  create mode 
> 100644 src/mesa/drivers/dri/i965/gen9_astc5x5_wa.c
> 
> --
> 2.7.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct

2017-12-13 Thread Rogovin, Kevin
Hi,

> Yes. It's just the accidental writes into the read-only bo that you may miss. 
> Your call, and I have no
> objections if such limitations are described in the comments.

I'll add the comment; since there then less code to read (indeed using 
brw_bo_map will not work because
it does not map the range including the padding). I've made a version at: 

https://github.com/krogueintel/asem/tree/i965-pad-gem-bos-v2-dont-use-pread

which does not use pread, but it looks so much ickier to me.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct

2017-12-13 Thread Rogovin, Kevin
Just to make sure of the reason of the original objection to trusting pread 
blindly: 
pread will miss writes IF the GEM BO was not listed in the execbuffer2 ioctl. 

If that is the case, I am ok with missing those writes because those writes 
(should) only
occur if the GPU writes to padding of the BO by bad-luck; this series is 
looking really to
just protect against that buffer B is written to at offset O where offset O is 
larger than
the size of buffer B (the series only checks for writes where O is also less 
than the size plus
the padding).

Detecting the bad-luck writes requires much more pain and suffering:
  1. Scratch contents read/write
  2. For BO's that driver knows are not writeable by GPU to then mirror 
   those BO contents.

(1) is not so bad, but (2) is terribly hideous.

With the goal that the patch series is only looking to detect out-of-bound
writes (up to padding size), I think pread would be fine then.

Thoughts?

-Kevin

P.S. I am already writing it to do mapping and such, but it is more code sadly.

-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] 
Sent: Wednesday, December 13, 2017 1:23 PM
To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org
Cc: joonas.lahti...@linux.intel.com
Subject: RE: [Mesa-dev] [PATCH v2 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, 
check that noise padding for each bo used in batchbuffer is correct

Quoting Rogovin, Kevin (2017-12-13 11:19:10)
> Hi,
> 
> > Actually that's not strictly true. Since you only do a pread here, it will 
> > only synchronize against the last declared write to the bo.
> > There's no guaranteed sync with the last batch for a set of read-only bo.
> > Similarly, because of no domain tracking, it won't also ensure that the bo 
> > is cache coherent before the read back.
> 
> Futz. Would then doing brw_bo_map() with flags having value MAP_READ 
> be sufficient then? This is what function
> brw_get_buffer_subdata() of intel_buffer_objects.c does to copy BO contents 
> to application supplied buffer.

Yes. I would also pencil in a note to do a map_range, or at least 
MAP_INCOHERENT flag to avoid the gen_invalidate_range and do the cache 
invalidation manually (for !llc).
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct

2017-12-13 Thread Rogovin, Kevin
Hi,

> Actually that's not strictly true. Since you only do a pread here, it will 
> only synchronize against the last declared write to the bo.
> There's no guaranteed sync with the last batch for a set of read-only bo.
> Similarly, because of no domain tracking, it won't also ensure that the bo is 
> cache coherent before the read back.

Futz. Would then doing brw_bo_map() with flags having value MAP_READ be 
sufficient then? This is what function
brw_get_buffer_subdata() of intel_buffer_objects.c does to copy BO contents to 
application supplied buffer.

-Kevin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct

2017-12-12 Thread Rogovin, Kevin
Hi,

 Just got confirmation that kernel does the syncing required to make sure that 
pread values are realiable.

-Kevin

-Original Message-
From: Rogovin, Kevin 
Sent: Wednesday, December 13, 2017 8:19 AM
To: 'Jason Ekstrand' 
Cc: mesa-dev@lists.freedesktop.org; Lahtinen, Joonas 
Subject: RE: [Mesa-dev] [PATCH 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, 
check that noise padding for each bo used in batchbuffer is correct

Hi,

> I think you want to do this at the end of submit_batch instead and add 
> a brw_bo_wait_rendering on the batch. Otherwise, your bounds checking is 
> racing with the GPU.

I remember being told that pread has the kernel do the required waiting, 
however I am not 100% sure of this (which is why I cc'd Joonas to either 
confirm or deny the assertion). Joonas?
 
-Kevin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] i965: if DEBUG_OUT_OF_BOUND_CHK is up, check that noise padding for each bo used in batchbuffer is correct

2017-12-12 Thread Rogovin, Kevin
Hi,

> I think you want to do this at the end of submit_batch instead and add a 
> brw_bo_wait_rendering on the batch.  
> Otherwise, your bounds checking is racing with the GPU.

I remember being told that pread has the kernel do the required waiting, 
however I am not 100% sure of this (which is why I cc'd Joonas to either 
confirm or deny the assertion). Joonas?
 
-Kevin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] i965: add noise padding to buffer object and function to check if noise is correct

2017-12-12 Thread Rogovin, Kevin
Hi,

 Thankyou for reading the code and giving advice to improve upon it. Below are 
some thoughts:

> I can't help but think that this could be a bit simpler and involve throwing 
> fewer pointers around.

I was thinking this too; the easiest way to do this is to just have the same 
noise for all the paddings;
that would mean there is just one pointer of data and that would be a private 
member of brw_bufmgr.

> This is 4096.  I think we could just have a single uint32_t padding field 
> which is either 0 or 4096 (More on that later).

If the kernel supports huge pages, though now there might be "two" different 
things as far as page size goes then:
the page size for CPU things and the page size for the PPGTT. I don't know if 
they must be the same or if they can
be different. I also do not know how to actually get the page size used by the 
PPGTT, as getpagesize() is the page
size for the CPU page tabling magic. Any suggestions on how to get the page 
size used for the PPGTT? I think it is
worthwhile to make sure that atleast some of the noise is in the next page, but 
I admit that is just a hunchy thing.

> Does using rand() really help us?  Why not just come up with some hash-like 
> thing which generates consistent pseudo-random data? 
> How about something like "value.[i] = i * 853 + 193"  (some random primes)?  
> That would mean that we can generate the data and check
> it without having to store it in a per-bo temporary.  If you want it to be 
> magic per-bo, you could also seed it somehow with the bo handl
> (just add handle * 607).

I figured that rand() was the most reliable way to generate noise in addition 
to the least amount of code. However, if the padding values
are generated with an internal routine (maybe something like value[i] = 223 * 
value[i - 1] + 123, with value[0] =  handle; all truncated to 8-bits),
that would drop the need completely for the per-buffer storage.


> If we always allocate 4096B of padding, then you don't need to heap allocate 
> it and can just put it on the stack for the purpose of interacting with
 > pread/pwrite.  It's a bit big but still perfectly reasonable.  If a day came 
 > when we wanted to make the padding size adjustable, it would still 
> probably be reasonable to make the heap allocations temporary so we have less 
> random stuff in the BO we have to cleanup.

I was tempted to make it stack allocated, but the 4096 size scared me off... 
and when I thought of huge page support of 2M I ran screaming.
 
> There's a part of me that wants to kill pread/write.  However, I think you 
> may have come up with the only good use of it I've ever seen. :-)

I was told that one of the advantages of pread is the kernel will then do the 
syncing magic for you, i.e. waiting for the GPU to be done with the
buffer;  I freely admit that now I am not 100% sure of this.
 
> If we still keep these heap allocations, deleting them should be keyed off of 
> bo->padding.size or nothing at all.

That is how I originally wrote it, but to handle the case where creating a 
brw_bo fails midway (i.e. after GEM create ioctl, but during the tmp buffer 
allocation), checking the existence by .size > 0 was not going to work. 
However, for everywhere else it is fine.
 
At this point I am tempted to do the following for the noise padding:
  1. take your suggestion and make the noise per brw_bo, but the noise is 
generated with an incremental chaotic function that uses GEM-handle value as a 
start
  2. have a single integer in the brw_bo struct indicating the amount of noise 
padding it has;
  3. for checking use that single integer to heap-allocate a temporary buffer 
to store the pread contents OR have the necessary syncing operations and use 
the mapping pointer to read the values. The latter is more tempting I admit 
though.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965: compute scratch space size correctly for Gen9

2017-12-12 Thread Rogovin, Kevin
Glad that this helped. The main lead for fixing the bug I got from using the 
patch series posted earlier this week " GEM BO padding to find OOB buffer 
writes" (URL: 
https://lists.freedesktop.org/archives/mesa-dev/2017-December/179658.html). I 
am hoping that that patch series can get reviewed and land in Mesa so that 
hunting for a certain classes of Heisenberg bugs can be less Heisenberg like.

Best Regards,
-Kevin

-Original Message-
From: Kenneth Graunke [mailto:kenn...@whitecape.org] 
Sent: Tuesday, December 12, 2017 9:09 PM
To: mesa-dev@lists.freedesktop.org
Cc: Rogovin, Kevin 
Subject: Re: [Mesa-dev] [PATCH 2/2] i965: compute scratch space size correctly 
for Gen9

On Tuesday, December 12, 2017 4:17:27 AM PST kevin.rogo...@intel.com wrote:
> From: Kevin Rogovin 
> 
> Signed-off-by: Kevin Rogovin 
> ---
>  src/mesa/drivers/dri/i965/brw_program.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
> b/src/mesa/drivers/dri/i965/brw_program.c
> index 6aa4100..1ae0aa0 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.c
> +++ b/src/mesa/drivers/dri/i965/brw_program.c
> @@ -368,9 +368,13 @@ brw_alloc_stage_scratch(struct brw_context *brw,
> *
> * According to the other driver team, this applies to compute shaders
> * as well.  This is not currently documented at all.
> +   *
> +   * brw->screen->subslice_total is the TOTAL number of subslices
> +   * and we wish to view that there are 4 subslices per slice
> +   * instead of the actual number of subslices per slice.
> */
>if (devinfo->gen >= 9)
> - subslices = 4;
> + subslices = 4 * brw->screen->devinfo.num_slices;
>  
>/* WaCSScratchSize:hsw
> *
> 

Thank you!  I'd meant to clean up the nonsense in patch 1 a while ago, but I 
guess I got distracted.  Good catch on the bug, too...

First is R-b, and this one gets:

Fixes: 8ecdbb61360 "i965: Pretend there are 4 subslices for compute shader 
threads on Gen9+."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104005
Reviewed-by: Kenneth Graunke 

Both are now pushed:

To ssh://git.freedesktop.org/git/mesa/mesa
   7469966ed2a..b1ce812c514  master -> master

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965: compute scratch space size correctly for Gen9

2017-12-12 Thread Rogovin, Kevin
Ahh futz this was the wrong one!! it should just be

 subslices += 4 * brw->screen->devinfo.num_slices;

Sighs (too many terminals open with too many branches :P )

I need to post a v2; I will post it shortly. 

 -Kevin

-Original Message-
From: Rogovin, Kevin 
Sent: Tuesday, December 12, 2017 12:05 PM
To: mesa-dev@lists.freedesktop.org
Cc: Rogovin, Kevin 
Subject: [PATCH 2/2] i965: compute scratch space size correctly for Gen9

From: Kevin Rogovin 

Signed-off-by: Kevin Rogovin 
---
 src/mesa/drivers/dri/i965/brw_program.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
b/src/mesa/drivers/dri/i965/brw_program.c
index 6aa41009e7..7bfcad9a65 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -368,9 +368,17 @@ brw_alloc_stage_scratch(struct brw_context *brw,
*
* According to the other driver team, this applies to compute shaders
* as well.  This is not currently documented at all.
+   *
+   * brw->screen->subslice_total is the TOTAL number of subslices
+   * and we wish to view that there are 4 subslices per slice
+   * instead of the actaul number of subslices per slice.
*/
-  if (devinfo->gen >= 9)
- subslices = 4;
+  if (devinfo->gen >= 9) {
+ subslices = 0;
+ for (int i = 0; i < brw->screen->devinfo.num_slices; ++i) {
+subslices += 4 * brw->screen->devinfo.num_subslices[i];
+ }
+  }
 
   /* WaCSScratchSize:hsw
*
-- 
2.15.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/2] i965: scratch space fixes

2017-12-12 Thread Rogovin, Kevin

> These patches fix GPU hangs I'm seeing also with the *GL* version of 
> CarChase on KBL GT3e, when using Ubuntu 16.04 kernel.

> NOTE: those hangs don't happen when doing tests with latest drm-tip 
> kernel.  Besides the older Ubuntu kernel, I'm seeing hangs also with the 
> 4.13 drm-tip kernel, but these have happened only since early November 
> version of Mesa.

Chances are it is caused by that the out-of-bounds overwrites in older kernels 
landed in the scratch page; it is essentially just luck of the out-of-bounds 
write lands in scratch or someplace important where corrupted data hangs the 
GPU (for example the memory holding the GEN shaders).

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] i965: correctly program MEDIA_VFE_STATE for compute shading

2017-12-12 Thread Rogovin, Kevin
Just a comment: in truth the MEDIA_VFE_STATE -was- programmed correctly without 
this patch; it turns out that the PerThreadScratchSpace are the first bits in 
the bytes holding the scratch base pointer; those first bits are used by the HW 
(and the GENX pack knows this and accounts for it) to stash state.

In all honesty this patch is not necessary to fix car-chase, the patch is just 
a readability patch.

My apologies for jumping the gun and not checking if the bits for 
PerThreadScratchSpace were of the first bits of the BO for scratch space.

Sighs.

In spite of that it is just a readability patch, I think it should land to aid 
in readability of the code.

 -Kevin

-Original Message-
From: Rogovin, Kevin 
Sent: Tuesday, December 12, 2017 12:05 PM
To: mesa-dev@lists.freedesktop.org
Cc: Rogovin, Kevin 
Subject: [PATCH 1/2] i965: correctly program MEDIA_VFE_STATE for compute shading

From: Kevin Rogovin 

Signed-off-by: Kevin Rogovin 
---
 src/mesa/drivers/dri/i965/genX_state_upload.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
b/src/mesa/drivers/dri/i965/genX_state_upload.c
index 04a492539a..50ac5bc59f 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -4183,28 +4183,35 @@ genX(upload_cs_state)(struct brw_context *brw)
 
brw_batch_emit(brw, GENX(MEDIA_VFE_STATE), vfe) {
   if (prog_data->total_scratch) {
- uint32_t bo_offset;
+ uint32_t per_thread_scratch_value;
 
  if (GEN_GEN >= 8) {
 /* Broadwell's Per Thread Scratch Space is in the range [0, 11]
  * where 0 = 1k, 1 = 2k, 2 = 4k, ..., 11 = 2M.
  */
-bo_offset = ffs(stage_state->per_thread_scratch) - 11;
+per_thread_scratch_value = ffs(stage_state->per_thread_scratch) - 
11;
  } else if (GEN_IS_HASWELL) {
 /* Haswell's Per Thread Scratch Space is in the range [0, 10]
  * where 0 = 2k, 1 = 4k, 2 = 8k, ..., 10 = 2M.
  */
-bo_offset = ffs(stage_state->per_thread_scratch) - 12;
+per_thread_scratch_value = ffs(stage_state->per_thread_scratch) - 
12;
  } else {
 /* Earlier platforms use the range [0, 11] to mean [1kB, 12kB]
  * where 0 = 1kB, 1 = 2kB, 2 = 3kB, ..., 11 = 12kB.
  */
-bo_offset = stage_state->per_thread_scratch / 1024 - 1;
+per_thread_scratch_value = stage_state->per_thread_scratch / 1024 
- 1;
  }
- vfe.ScratchSpaceBasePointer =
-rw_bo(stage_state->scratch_bo, bo_offset);
+ vfe.ScratchSpaceBasePointer = rw_bo(stage_state->scratch_bo, 0);
+ vfe.PerThreadScratchSpace = per_thread_scratch_value;
   }
 
+  /* If brw->screen->subslice_total is greater than one, then
+   * devinfo->max_cs_threads stores number of threads per sub-slice;
+   * thus we need to multiply by that number by subslices to get
+   * the actual maximum number of threads; the -1 is because the HW
+   * has a bias of 1 (would not make sense to say the maximum number
+   * of threads is 0).
+   */
   const uint32_t subslices = MAX2(brw->screen->subslice_total, 1);
   vfe.MaximumNumberofThreads = devinfo->max_cs_threads * subslices - 1;
   vfe.NumberofURBEntries = GEN_GEN >= 8 ? 2 : 0;
-- 
2.15.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-06 Thread Rogovin, Kevin


>I'd have to think about it harder but even those are not likely to actually 
>get ASTC decode.  Maybe for some sort of decompression thing but if you're 
>doing
> GetCompressedTexImage, it'll probably turn into a blorp_copy which will use 
> R32G32B32A32_UINT.

I am thinking for the case where an application calls glGetTexImage() or 
glGetTextureImage(), which I think is legal in GL and the GL implementation is 
expected to do the decompress.

-Kevin
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Rogovin, Kevin
Hi,

>This isn't true.  100% of the intel_mipmap_tree -> blorp_surf translations are 
>handled by that function. 
>  It's a perfectly reasonable place to handle these things.  It could also be 
> handled in genX(blorp_exec) if that makes someone more comfortable.

This is where I placed the ASTC enumeration setting, in genX(blorp_exec). I 
added a boolean signaling if the caller to blorp believed it would sample from 
an ASTC, if it did it sets the enum as ASTC otherwise as AUX.

I confess I am not super familiar with blorp, but as far as I can tell, the 
only way for blorp to read an ASTC is for GetTexImage/GetTexSubImage since an 
ASTC surface cannot be used for an FBO.

> The problem is that a single invalidate doesn't actually cause a 
> synchronization point in the rendering operation.  All it does is torch the 
> texture cache and it 
> does so immediately and in parallel with currently active rendering.  If you 
> can't have ASTC5x5 in the texture cache with a CCS_E surface, then we need to
> do a CS stall to ensure that the previous draw is complete before we 
> invalidate.  Otherwise, if the draw with ASTC5x5 is still in-flight, the 
> texture cache will 
> immediately start to get populated with ASTC5x5 data again.

Ahh futz, I forget to add that stall.. by luck the guilty application worked 
anyways (when I first wrote the work I did intel_batchbuffer_flush() and 
relaxed it down to texture invalidate).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Rogovin, Kevin

> Are you saying that this bug extends over hardware context?

Different HW contexts imply different execbuffer2 ioctl's. The kernel inserts a 
full blown flush of everything after (or before, I cannot remember) each 
execbuffer2 call. This way there is context isolation in HW buggineness.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Rogovin, Kevin

> If you take a look at brw_update_texture_surface(), just in the end before
> brw_emit_surface_state() the logic explictly consults for 
> intel_miptree_texture_aux_usage(). 
> This in turn tells if the auxiliary buffer is resolved and it doesn't need to 
> be programmed.

The full stack trace is this:

 brw_update_texture_surface() calls intel_miptree_texture_usage() which for HiZ 
calls the function intel_miptree_sample_with_hiz() which, as the name suggest, 
returns true if and only if it is ok to use the HiZ to speed up depth texture 
reads. Indeed, the function calls intel_miptree_texture_usage() switches on 
intel_mipmap_tree::aux_usage which the documentation states is about the 
intended usage of the auxiliary buffer. Indeed, if the value is 
ISL_AUX_USAGE_NONE that means, quoting the comment tag above the field, "that 
auxiliary compression is permanently disabled". The conclusion then is that 
even if a buffer is fully resolved, the value of aux_usage is not 
ISL_AUX_USAGE_NONE and that the return value of calls 
intel_miptree_texture_usage() gives the return value assuming that the 
auxiliary buffer can be used with respect to that it holds good values enough 
for the sampler.

>This path goes thru brw_blorp_download_miptree() which in turn takes either
> brw_blorp_copy_miptrees() or brw_blorp_blit_miptrees(). Both in turn consult 
> blorp_surf_for_miptree().

If you are 100% sure (because I am not) that ALL blorp paths walk through that, 
then I have no real objection except it needs to do the magic of checking if it 
is reading from an ASTC5x5 or a surface with an auxiliary and update the 
enumeration appropriately.

> This would be a nice a piece of documentation to add into the code. Thanks 
> for explaining.

I can add this, though I do freely admit I take this for granted and an axiom 
of HW accelerated graphics.

>I don't see how the bit mask prevents one from forcing anything.

By not being able to encode such a state (both are present) such a state is 
impossible to store and cannot be reached. From the point of view of code, an 
enum is slightly more pleasant to read than bitmaks IMO.

> I now do see a flaw in the RFC related to this. In 
> brw_predraw_resolve_inputs() one would 
> actually need to smash down the recorded AUX mask bit when it reacts to 
> ASTC5x5 being present.

Indeed, really two passes are needed over the textures. The first pass to 
detect if ASTC5x5 is present and a second to resolve auxiliary using textures 
if they are present.

>Well, if nothing else, I would really like to see you used 
>brw_predraw_resolve_inputs() for the resolves.

I am happy with that as that walks through the textures anyways BUT it is 
called before brw_predraw_resolve_framebuffer() which might do some resolving 
too. The easy way out is to permute their calling order unless there is some 
other dangling assumption preventing permuting the call order of those two.

> 
> -Kevin
>  
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-05 Thread Rogovin, Kevin
Hi,


>> Here are my comments of the patch posted:
>> 
>>  1.  it is essentially replication and moving around of the code of the 
>> patch series posted earlier but missing various
>>   important bits: preventing the sampler from using the auxiliary buffer 
>> (this requires to modify surface state
>>   sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently 
>> (blorp might read from an ASTC5x5
>>   and there are more paths in blorp than blorp_surf_for_miptree() that 
>> sample from surfaces.
>> 

>Can you explain both more in detail? Resolves done in
>brw_predraw_resolve_inputs() mean that there is nothing interesting in the aux 
>buffers and surface setup won't therefore enable auxiliary for texture 
>surfaces.

That there is nothing interesting is irrelevant to the sampler if the 
SURFACE_STATE fed includes the auxiliary buffer, thus when one sets up the 
SURFACE_STATE for sampler, the auxiliary buffer cannot be mentioned in the GPU 
command; The sampler will always try to read the auxiliary buffer if it is 
given the opportunity to do so. Indeed, it is quite feasible that less 
bandwidth is consumed if the sampler is given the chance to read an auxiliary 
buffer in place of the buffer; as such even if the surface is resolved one may 
wish to feed the sampler the auxiliary buffer. Indeed, for HiZ, i965 programs 
to use the HiZ auxiliary buffer even if the depth buffer is fully resolved (see 
inte_mipmap_tree_sample_with_hiz() in intel_mipmap_tree.c).

> In case of blorp, as far as I know all operations sampling something should 
> go thru blorp_surf_for_miptree(). Can you point out cases that don't?

Blorp is used in more than blorp_surf_for_miptree(), for example implementing 
GetTexImage(). Indeed, it is possible for blorp to sample from an ASTC5x5 (you 
can see this handled in the patch series I posted). I chose the hammer that the 
default is to just assume blorp is going to access auxiliary buffers unless a 
flag is set when the caller knows that blorp is going to sample from an astc5x5 
(against see my patch series).

>Right. In the case of sampling both aux and astc5x5 in the same draw cycle the 
>only thing to do is to disable aux. With my question of direction I meant the 
>texture 
> cache flush between two cycles. Do we need to flush in both cases
> 1) ASTC5x5 in first cycle and AUX in the following
> 2) AUX in first cycle and ASTC5x5 in the following

YES we need to flush in both cases. What is happening is that the sampler 
hardware is bugged. Let us suppose it was bugged in only 1 direction, take 1. 
Then if the sampler first samples from an ASTC5x5 then an AUX it would not 
hang, but the other way it would. However, if there are multiple draws in 
flight where one samples from an ASTC5x5 and the other does not, the command 
buffer order gives ZERO guarantee that the sampler will sample in that order 
because fragments get executed out-of-order even across draw calls even within 
a subslice (this is why sendc is needed at end of shader in GEN).

>>  4. With 3 in mind, using the bit-masks is not a good idea as we want to 
>> then enforce at the code level
>>   that only one of the two is possible without texture invalidates.
> Can you elaborate this a little more? It tells if aux is/was used and it 
> tells if astc5x5 is/was used. That is all we need, right?

WRONG. We must enforce that a given draw call can have neither or only one. By 
having bitmasks it is possible to support a state having both.

At any rate, please review the patch series I have posted and I am happy to 
take suggestions to improve that patch series that I have tested.

-Kevin
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

2017-12-04 Thread Rogovin, Kevin
Hi,

 The patch series I have submitted handles the case of needing to resolve 
texture surfaces when a draw (or compute) accesses a texture which is astc5x5. 
As it is quite clear you understand the issue and know the code of i965 the 
patch centers on, you are an excellent person to review the code.

Here are my comments of the patch posted:

 1.  it is essentially replication and moving around of the code of the patch 
series posted earlier but missing various
  important bits: preventing the sampler from using the auxiliary buffer 
(this requires to modify surface state
  sent in brw_wm_surfaces.c). It also does not cover blorp sufficiently 
(blorp might read from an ASTC5x5
  and there are more paths in blorp than blorp_surf_for_miptree() that 
sample from surfaces.

 2.  using the check that the GPU is gen9 (and for that matter, using the name 
gen9_ astc5x5_sampler_wa())
  is not ideal; I would not be surprised that the bug might not be present 
on various re-spins of Gen9 and/or
  it might linger on to future Gens. I do NOT know; using a Boolean 
assigned earlier makes the code more
  future-ish proof.

 3.  the nature of GPU fragment dispatch is going to require a texture 
invalidate even if the sampler only
  have the bug in one direction; this is because a subslice is not 
guaranteed to process fragments in any
  order. The crux is that a single sampler serves an entire sub-slice which 
has more than 1 EU. It is quite
  possible that one EU has threads of a draw call ahead of the other and 
depending on the timing, portions
  of those fragments' coming after might be processed by the sampler of 
those before of those fragments
  coming before in batchbuffer order. Indeed a single EU might have threads 
from separate draws as well.
  A texture invalidate places a barrier in the pipeline preventing the 
mixing (and means that efficiency of 
 GPU drops quite a bit with every texture invalidate sadly). 

 4. With 3 in mind, using the bit-masks is not a good idea as we want to then 
enforce at the code level
  that only one of the two is possible without texture invalidates.

-Kevin


-Original Message-
From: Topi Pohjolainen [mailto:topi.pohjolai...@gmail.com] 
Sent: Monday, December 4, 2017 12:49 PM
To: mesa-dev@lists.freedesktop.org
Cc: Rogovin, Kevin 
Subject: [PATCH] RFC: Workaround for gen9 hw astc5x5 sampler bug

This is just drafting some thoughts and only compile tested.

CC: "Rogovin, Kevin" 
---
 src/mesa/drivers/dri/i965/brw_blorp.c   |  8 +
 src/mesa/drivers/dri/i965/brw_context.h | 10 ++
 src/mesa/drivers/dri/i965/brw_draw.c| 54 -
 3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c 
b/src/mesa/drivers/dri/i965/brw_blorp.c
index 680121b6ab..b3f84ab8ca 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -186,11 +186,19 @@ blorp_surf_for_miptree(struct brw_context *brw,
  surf->aux_addr.buffer = mt->hiz_buf->bo;
  surf->aux_addr.offset = mt->hiz_buf->offset;
   }
+
+  if (!is_render_target && brw->screen->devinfo.gen == 9)
+ gen9_astc5x5_sampler_wa(brw, GEN9_ASTC5X5_WA_TEX_TYPE_AUX);
} else {
   surf->aux_addr = (struct blorp_address) {
  .buffer = NULL,
   };
   memset(&surf->clear_color, 0, sizeof(surf->clear_color));
+
+  if (!is_render_target && brw->screen->devinfo.gen == 9 &&
+  (mt->format == MESA_FORMAT_RGBA_ASTC_5x5 ||
+   mt->format == MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5))
+ gen9_astc5x5_sampler_wa(brw, 
+ GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5);
}
assert((surf->aux_usage == ISL_AUX_USAGE_NONE) ==
   (surf->aux_addr.buffer == NULL)); diff --git 
a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 0670483806..44602c23c0 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -165,6 +165,11 @@ enum brw_cache_id {
BRW_MAX_CACHE
 };
 
+enum gen9_astc5x5_wa_tex_type {
+   GEN9_ASTC5X5_WA_TEX_TYPE_ASTC5x5 = 1 << 0,
+   GEN9_ASTC5X5_WA_TEX_TYPE_AUX = 1 << 1,
+};
+
 enum brw_state_id {
/* brw_cache_ids must come first - see brw_program_cache.c */
BRW_STATE_URB_FENCE = BRW_MAX_CACHE, @@ -1262,6 +1267,8 @@ struct 
brw_context
 */
bool draw_aux_buffer_disabled[MAX_DRAW_BUFFERS];
 
+   enum gen9_astc5x5_wa_tex_type gen9_sampler_wa_tex_mask;
+
__DRIcontext *driContext;
struct intel_screen *screen;
 };
@@ -1286,6 +1293,9 @@ void intel_update_renderbuffers(__DRIcontext *context,
 __DRIdrawable *drawable);  void 
intel_prepare_render(struct brw_context *brw);
 
+void gen9_astc5x5_sampler_wa(struct brw_context *brw,
+ 

Re: [Mesa-dev] [PATCH 0/5] i965: ASTC5x5 workaround

2017-12-04 Thread Rogovin, Kevin
Hi,

> 1) do extra tex cache flush when needed and specifically only when needed
> 2) resolve surfaces when undesired combination is about to be sampled

>Latter case could be addressed also with on-the-fly check in 
>brw_predraw_resolve_inputs(). There one goes thru all the active textures for 
>the
> next draw call and resolves when necessary. One could check for the undesired 
> combination of textures there. I understand we would need to
> walk the textures twice, first to check for any occurrences of one type and 
> then for the other. This, however, would fit to the way we resolve
> other texture types and prevent from adding more things to check into the 
> context.

This patch series does handle the second case, as follows:
 a) Checking if there are astc5x5 textures and/or textures with auxiliary is 
done in brw_validate_textures(); this was a really nice place because
  the function loops over all textures anyways
 b) the resolve operation is handled by the added function 
brw_atsc_perform_wa(); this also sets the mode correctly too after the resolve.

I chose to make a dedicated function that does the mode setting and resolve.  
Putting the resolve logic into brw_predraw_resolve_inputs()
is not a bad idea and I am open to it; the interior of the loop would then look 
ickier as it would had the check on each texture unit of if
the workaround is necessary and if there is an astc5x5 sampler handing around. 
It would also kill off a member from the astc5x5_wa
struct that tracks if there are auxiliary textures.

At any rate, I'd appreciate some review for the code so it can land in some 
form shortly.

-Kevin

> 
> 
> Kevin Rogovin (5):
>   i965: define astc5x5 workaround infrastructure
>   i965: ASTC5x5 workaround logic for blorp
>   i965: set ASTC5x5 workaround texture type tracking on texture validate
>   i965: use ASTC5x5 workaround in brw_draw
>   i965: use ASTC5x5 workaround in brw_compute
> 
>  src/mesa/drivers/dri/i965/brw_compute.c  |  6 +++
>  src/mesa/drivers/dri/i965/brw_context.c  | 63 
> 
>  src/mesa/drivers/dri/i965/brw_context.h  | 23 +
>  src/mesa/drivers/dri/i965/brw_draw.c |  6 +++
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c |  5 ++
>  src/mesa/drivers/dri/i965/genX_blorp_exec.c  |  5 ++
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c|  1 +
>  src/mesa/drivers/dri/i965/intel_tex_image.c  | 16 --
>  src/mesa/drivers/dri/i965/intel_tex_validate.c   | 13 +
>  9 files changed, 134 insertions(+), 4 deletions(-)
> 
> --
> 2.14.2
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/5] i965: ASTC5x5 workaround

2017-12-01 Thread Rogovin, Kevin
Hi,

 For ANV I do not know as I have not really poked into its code. For i965, this 
patch series handles the situation as to what to do if a draw of dispatch 
compute accesses both an ASTC5x5 texture and a texture with an auxiliary 
buffer. It does this by checking if there are both such textures and ASTC5x5 
textures in the list of currently bound textures. If the answer is yes, then it 
resolves all such auxiliary requiring textures that use an auxiliary buffer so 
that the sampler does not need them when it reads from the surfaces. The 
resolve stuff is handled in the function brw_astc5x5_perform_wa(() in 
brw_context.c of the first patch, the checking is handled in the 3rd patch by 
modifying brw_tex_validate() in intel_tex_validate.c. The 4'th and 5'th patches 
are deceptively small since all they do is add a call to 
brw_astc5x5_perform_wa(() in brw_draw.c and brw_compute.c respectively. The 4th 
patch also has a small addition to prevent surface state for sampler state to 
have the auxiliary surface given in the call.

As to how to do similar auto-resolve and tweak of state on ANV, I need to dive 
quite deep into the code to see how to do it.

-Kevin

-Original Message-
From: Matt Turner [mailto:matts...@gmail.com] 
Sent: Friday, December 1, 2017 8:25 PM
To: Rogovin, Kevin 
Cc: Ilia Mirkin ; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 0/5] i965: ASTC5x5 workaround

On Fri, Dec 1, 2017 at 10:06 AM, Rogovin, Kevin  wrote:
> Hi,
>
>  Yes ANV will need something like this as well. If the GPU samples from both 
> an ASTC5x5 texture and one with an aux buffer without a texture cache 
> invalidate between such accesses, then the GPU hangs, which in turn makes the 
> system unresponsive for a few seconds until the kernel resets the GPU; then 
> an ioctl will fail in i965 which means things are very bad usually and (for 
> me atleast on my system with how I build mesa) the application crashes.

I think his question is -- is there anything we can do about the case where a 
single shader program samples ASTC5x5 and a texture with an aux buffer? 
Presumably there's no way to invalidate the texture cache during a shader 
program, so what can you do?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/5] i965: ASTC5x5 workaround

2017-12-01 Thread Rogovin, Kevin
Hi,

 Yes ANV will need something like this as well. If the GPU samples from both an 
ASTC5x5 texture and one with an aux buffer without a texture cache invalidate 
between such accesses, then the GPU hangs, which in turn makes the system 
unresponsive for a few seconds until the kernel resets the GPU; then an ioctl 
will fail in i965 which means things are very bad usually and (for me atleast 
on my system with how I build mesa) the application crashes.

 -Kevin

-Original Message-
From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia Mirkin
Sent: Friday, December 1, 2017 7:38 PM
To: Rogovin, Kevin 
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 0/5] i965: ASTC5x5 workaround

On Fri, Dec 1, 2017 at 12:19 PM,   wrote:
> From: Kevin Rogovin 
>
> This patch series implements a needed workaround for Gen9 for ASTC5x5 
> sampler reads. The crux of the work around is to make sure that the 
> sampler does not read an ASTC5x5 texture and a surface with an 
> auxilary buffer without having a texture cache invalidate between such 
> accesses.

Presumably anv needs something like this too? What happens if you have a single 
draw which samples from both an ASTC5x5 texture and one with an aux buffer? 
[Just curious.]
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 07/18] intel/compiler: fix for memmove argument on annotating error

2017-11-17 Thread Rogovin, Kevin
Thankyou! very much for the patch to the command line disassembler.

 -Kevin

-Original Message-
From: Matt Turner [mailto:matts...@gmail.com] 
Sent: Friday, November 17, 2017 6:52 AM
To: Rogovin, Kevin 
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 07/18] intel/compiler: fix for memmove argument 
on annotating error

On Wed, Nov 15, 2017 at 12:13 AM, Rogovin, Kevin  
wrote:
> I have just seen that I have had an epic brain lapse on this.
>
> The code is pretty clear, the correct value of count should be ann_count - i. 
> This is because:
>   a. The value of ann_count is the value of the array size BEFORE the insert; 
> this is clear from the code within the if (offset + ..) where it increments 
> ann_count.
>  b. Since ann_count is the size before the insert, it should move the content 
> in the open range [i, ann_count) to [i + 1, ann_count + 1); thus the number 
> of elements is given by ann_count - i.
>
> Changing the count to ann_count - i does the right thing, and also makes it 
> quite clear that Matt is correct on patch 11: that it was just papering over 
> a bug from using the wrong count value.

I think I used the wrong data structure :)

I reproduced the problem you were having and then got tired of thinking about 
how to memmove elements. I rewrote the annotation code yesterday to use a 
linked list, which is much more natural. I just sent the patches. Attached is a 
patch you can squash into your series in order to make it work on top of my 
series.

> However, once this is done, the build does assert() on some shaders that I 
> have; this is because it fails to understand some registers.

I think that's a result of the disassembler not knowing how to disassemble 
sends/sendsc. Both Toni and Neil have written patches for that. I'll see if I 
can rebase them.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Difference between Meson and autotools release builds

2017-11-15 Thread Rogovin, Kevin
Thanks, I missed that; sorry for the mailing list chatter caused by missing the 
original e-mail on the material.
 -Kevin

-Original Message-
From: Eric Engestrom [mailto:eric.engest...@imgtec.com] 
Sent: Wednesday, November 15, 2017 1:39 PM
To: Rogovin, Kevin 
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] Difference between Meson and autotools release builds

On Wednesday, 2017-11-15 09:43:59 +, Rogovin, Kevin wrote:
> Hi,
> 
> I do not know if anyone has noticed or if it is deliberate, but the meson 
> builds have that assert()'s are active but the autotools release builds have 
> that assert() is inactive.

Hi,

Yep, this is by design, meson decouples the build type (eg. "release") from 
whether asserts are enabled; you can read more about it in this
thread:
https://lists.freedesktop.org/archives/mesa-dev/2017-November/175739.html

You can already read the not-yet-landed documentation in this patch:
https://lists.freedesktop.org/archives/mesa-dev/2017-November/175854.html

(or if it lands before you read this, the up-to-date meson-for-mesa 
documentation will live at: https://mesa3d.org/meson.html)

Cheers,
  Eric
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Difference between Meson and autotools release builds

2017-11-15 Thread Rogovin, Kevin
Just to clarify, I do not mean just the defaults, but doing -buildtype release 
also leaves assert()'s active (atleast on my system setup).

 -Kevin

From: Rogovin, Kevin 
Sent: Wednesday, November 15, 2017 11:44 AM
To: mesa-dev@lists.freedesktop.org
Subject: Difference between Meson and autotools release builds

Hi, 

I do not know if anyone has noticed or if it is deliberate, but the meson 
builds have that assert()'s are active but the autotools release builds have 
that assert() is inactive. 

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Difference between Meson and autotools release builds

2017-11-15 Thread Rogovin, Kevin
Hi,

I do not know if anyone has noticed or if it is deliberate, but the meson 
builds have that assert()'s are active but the autotools release builds have 
that assert() is inactive.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 07/18] intel/compiler: fix for memmove argument on annotating error

2017-11-15 Thread Rogovin, Kevin
I have just seen that I have had an epic brain lapse on this.

The code is pretty clear, the correct value of count should be ann_count - i. 
This is because:
  a. The value of ann_count is the value of the array size BEFORE the insert; 
this is clear from the code within the if (offset + ..) where it increments 
ann_count.
 b. Since ann_count is the size before the insert, it should move the content 
in the open range [i, ann_count) to [i + 1, ann_count + 1); thus the number of 
elements is given by ann_count - i.

Changing the count to ann_count - i does the right thing, and also makes it 
quite clear that Matt is correct on patch 11: that it was just papering over a 
bug from using the wrong count value.

However, once this is done, the build does assert() on some shaders that I 
have; this is because it fails to understand some registers.

-Kevin

-Original Message-
From: Matt Turner [mailto:matts...@gmail.com] 
Sent: Monday, November 13, 2017 11:21 PM
To: Rogovin, Kevin 
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 07/18] intel/compiler: fix for memmove argument 
on annotating error

On Mon, Nov 13, 2017 at 1:12 PM, Rogovin, Kevin  wrote:
> Hi,
>
>
>  I confess I am not 100% on this code and I did educated guessing what it is  
> trying to do; I figured it was trying to insert contents at the current index 
> i; and that ann_count is the size -after- the insert. thus I figured the 
> memmove is to move the half open interval [i, ann_count-1) to the half open 
> interval [i + 1, ann_count). The number of elements in the half open range 
> [i, ann_count - 1) is given by ann_count - i - 1.
>
> I tried changing the count from ann_count - i - 1 to ann_count - i + 1 and 
> then the disassembler crashed in annotation on the same shaders that I have 
> had it crash on before.

Interesting. Could you send me the shader binary? I'll take a look.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 18/18] intel/tools: add command line GEN shader disassembler tool

2017-11-14 Thread Rogovin, Kevin
Hi,

 I've made a preliminary meson.build in src/intel/tools for the stuff; One can 
see it in the commit "   
src/intel/tools: add BatchbufferLogger to meson build system " at 
https://github.com/krogueintel/asem/tree/batchbuffer-logger . Comments welcome.

Best Regards,
 -Kevin

P.S. This was my first experience with Meson; It was so much nicer to add a 
target compared to autotool and meson is so much faster to build from a git 
clean than autotools. So much faster.

-Original Message-
From: Eric Engestrom [mailto:eric.engest...@imgtec.com] 
Sent: Monday, November 13, 2017 8:09 PM
To: Rogovin, Kevin 
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 18/18] intel/tools: add command line GEN shader 
disassembler tool

On Monday, 2017-11-13 15:18:06 +0200, kevin.rogo...@intel.com wrote:
> From: Kevin Rogovin 
> 
> Signed-off-by: Kevin Rogovin 
> ---
>  src/intel/Makefile.tools.am   |  21 ++-
>  src/intel/tools/.gitignore|   1 +
>  src/intel/tools/gen_shader_disassembler.c | 221 
> ++
>  3 files changed, 242 insertions(+), 1 deletion(-)  create mode 100644 
> src/intel/tools/gen_shader_disassembler.c
> 
> diff --git a/src/intel/Makefile.tools.am b/src/intel/Makefile.tools.am 
> index dd68d8f173..71eb3253c3 100644
> --- a/src/intel/Makefile.tools.am
> +++ b/src/intel/Makefile.tools.am
> @@ -32,7 +32,8 @@ intellib_LTLIBRARIES = \
>  
>  intelbin_PROGRAMS = tools/i965_batchbuffer_dump_show \
>   tools/i965_batchbuffer_dump_show_xml \
> - tools/i965_batchbuffer_dump_show_json
> + tools/i965_batchbuffer_dump_show_json \
> + tools/gen_shader_disassembler

Could you add these to src/intel/tools/meson.build as well?
Thanks :)

>  
>  intelbin_SCRIPTS = tools/i965_batchbuffer_logger_sh  CLEANFILES += 
> $(intelbin_SCRIPTS) @@ -111,3 +112,21 @@ 
> tools_i965_batchbuffer_dump_show_xml_SOURCES = \
>  
>  tools_i965_batchbuffer_dump_show_json_SOURCES = \
>   tools/i965_batchbuffer_dump_show_json.cpp
> +
> +tools_gen_shader_disassembler_SOURCES = \
> + tools/gen_shader_disassembler.c \
> + tools/disasm.c \
> + tools/gen_disasm.h
> +
> +tools_gen_shader_disassembler_LDADD = \
> + common/libintel_common.la \
> + compiler/libintel_compiler.la \
> + $(top_builddir)/src/util/libmesautil.la \
> + $(PTHREAD_LIBS) \
> + $(EXPAT_LIBS) \
> + $(ZLIB_LIBS)
> +
> +tools_gen_shader_disassembler_CFLAGS = \
> + $(AM_CFLAGS) \
> + $(EXPAT_CFLAGS) \
> + $(ZLIB_CFLAGS)
> diff --git a/src/intel/tools/.gitignore b/src/intel/tools/.gitignore 
> index ea4dc23c20..e9b22c89aa 100644
> --- a/src/intel/tools/.gitignore
> +++ b/src/intel/tools/.gitignore
> @@ -4,3 +4,4 @@
>  /i965_batchbuffer_dump_show
>  /i965_batchbuffer_dump_show_xml
>  /i965_batchbuffer_dump_show_json
> +/gen_shader_disassembler
> diff --git a/src/intel/tools/gen_shader_disassembler.c 
> b/src/intel/tools/gen_shader_disassembler.c
> new file mode 100644
> index 00..bd6c400fcc
> --- /dev/null
> +++ b/src/intel/tools/gen_shader_disassembler.c
> @@ -0,0 +1,221 @@
> +/*
> + * Copyright © 2017 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person 
> +obtaining a
> + * copy of this software and associated documentation files (the 
> +"Software"),
> + * to deal in the Software without restriction, including without 
> +limitation
> + * the rights to use, copy, modify, merge, publish, distribute, 
> +sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom 
> +the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including 
> +the next
> + * paragraph) shall be included in all copies or substantial portions 
> +of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
> +EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
> +MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT 
> +SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES 
> +OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
> +ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 
> +OTHER DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "compiler/brw_inst.h"
> +#include "compiler/brw_eu.h"
> +
> +static
> +void
> +print_opcodes(const void *data, int da

Re: [Mesa-dev] [PATCH 11/18] intel/tools/disasm: make sure that entire range is disassembled

2017-11-13 Thread Rogovin, Kevin
Your theory makes sense to me too; I suspect that something in the annotation 
code is the ultimate cause.


-Original Message-
From: Matt Turner [mailto:matts...@gmail.com] 
Sent: Monday, November 13, 2017 9:15 PM
To: Rogovin, Kevin 
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 11/18] intel/tools/disasm: make sure that entire 
range is disassembled

On Mon, Nov 13, 2017 at 5:17 AM,   wrote:
> From: Kevin Rogovin 
>
> Without this patch, if a shader has errors, the disassembly of the 
> shader often stops after the first opcode that has errors.

I can't see anything wrong with the current code. If I'm right that
07/18 is wrong, could this be papering over the bug introduced in that patch 
which would have caused annotations to get lost?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 10/18] intel/tools/disasm: add gen_disasm_assembly_length function

2017-11-13 Thread Rogovin, Kevin
I need this function in order for the logger to save shader binary to disk.

-Kevin

-Original Message-
From: Matt Turner [mailto:matts...@gmail.com] 
Sent: Monday, November 13, 2017 9:43 PM
To: Rogovin, Kevin 
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 10/18] intel/tools/disasm: add 
gen_disasm_assembly_length function

Pending the questions about 07/18, I don't think this should be necessary.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/18] intel/compiler:add function to give option to print offsets into assembly

2017-11-13 Thread Rogovin, Kevin
I am fine with that suggestion of changing the function signature and to not 
add a new one; I wanted the changes I made to be as uninvasive as possible 
which is why I added the function instead of changing an existing one.

-Kevin

-Original Message-
From: Matt Turner [mailto:matts...@gmail.com] 
Sent: Monday, November 13, 2017 9:42 PM
To: Rogovin, Kevin 
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 08/18] intel/compiler:add function to give 
option to print offsets into assembly

On Mon, Nov 13, 2017 at 5:17 AM,   wrote:
> From: Kevin Rogovin 
>
> Signed-off-by: Kevin Rogovin 
> ---
>  src/intel/compiler/brw_eu.c | 11 ++-  
> src/intel/compiler/brw_eu.h |  3 +++
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_eu.c b/src/intel/compiler/brw_eu.c 
> index bc297a21b3..8969ae5bda 100644
> --- a/src/intel/compiler/brw_eu.c
> +++ b/src/intel/compiler/brw_eu.c
> @@ -339,6 +339,15 @@ const unsigned *brw_get_program( struct 
> brw_codegen *p,  void  brw_disassemble(const struct gen_device_info 
> *devinfo,
>  const void *assembly, int start, int end, FILE *out)
> +{
> +   brw_disassemble_print_offset_option(devinfo, assembly, start, end, out,
> +   false); }
> +
> +void
> +brw_disassemble_print_offset_option(const struct gen_device_info *devinfo,
> +const void *assembly, int start, int end,
> +FILE *out, bool print_offsets)
>  {
> bool dump_hex = (INTEL_DEBUG & DEBUG_HEX) != 0;
>
> @@ -346,7 +355,7 @@ brw_disassemble(const struct gen_device_info *devinfo,
>const brw_inst *insn = assembly + offset;
>brw_inst uncompacted;
>bool compacted = brw_inst_cmpt_control(devinfo, insn);
> -  if (0)
> +  if (print_offsets)
>   fprintf(out, "0x%08x: ", offset);
>
>if (compacted) {
> diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h 
> index 95503d5513..497cf9e575 100644
> --- a/src/intel/compiler/brw_eu.h
> +++ b/src/intel/compiler/brw_eu.h
> @@ -128,6 +128,9 @@ int brw_disassemble_inst(FILE *file, const struct 
> gen_device_info *devinfo,
>   const struct brw_inst *inst, bool 
> is_compacted);  void brw_disassemble(const struct gen_device_info *devinfo,
>   const void *assembly, int start, int end, FILE 
> *out);
> +void brw_disassemble_print_offset_option(const struct gen_device_info 
> *devinfo,
> + const void *assembly, int start, 
> int end,
> + FILE *out, bool 
> +print_offsets);

Instead of adding a new function, I'd be in favor of just adding the bool 
print_offsets parameter to brw_disassemble (before the FILE parameter). Then we 
can add an INTEL_DEBUG=print_offsets and pass INTEL_DEBUG & DEBUG_PRINT_OFFSETS 
as the argument in the existing calls.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 07/18] intel/compiler: fix for memmove argument on annotating error

2017-11-13 Thread Rogovin, Kevin
Hi,


 I confess I am not 100% on this code and I did educated guessing what it is  
trying to do; I figured it was trying to insert contents at the current index 
i; and that ann_count is the size -after- the insert. thus I figured the 
memmove is to move the half open interval [i, ann_count-1) to the half open 
interval [i + 1, ann_count). The number of elements in the half open range [i, 
ann_count - 1) is given by ann_count - i - 1.

I tried changing the count from ann_count - i - 1 to ann_count - i + 1 and then 
the disassembler crashed in annotation on the same shaders that I have had it 
crash on before.

 -Kevin

-Original Message-
From: Matt Turner [mailto:matts...@gmail.com] 
Sent: Monday, November 13, 2017 9:02 PM
To: Rogovin, Kevin 
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 07/18] intel/compiler: fix for memmove argument 
on annotating error

On Mon, Nov 13, 2017 at 5:17 AM,   wrote:
> From: Kevin Rogovin 
>
> Without this fix, disassembling of GEN shaders with GPU commands that 
> the disassembler does not know would result in errors being added to 
> the annotator which would crash when more than one error was added.
>
> Signed-off-by: Kevin Rogovin 
> ---
>  src/intel/compiler/intel_asm_annotation.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/intel_asm_annotation.c 
> b/src/intel/compiler/intel_asm_annotation.c
> index b07a545a12..7aa222f04e 100644
> --- a/src/intel/compiler/intel_asm_annotation.c
> +++ b/src/intel/compiler/intel_asm_annotation.c
> @@ -181,8 +181,9 @@ annotation_insert_error(struct annotation_info 
> *annotation, unsigned offset,
>   continue;
>
>if (offset + sizeof(brw_inst) != next->offset) {
> - memmove(next, cur,
> - (annotation->ann_count - i + 2) * sizeof(struct 
> annotation));
> + int count;
> + count = annotation->ann_count - i - 1;
> + memmove(next, cur, count * sizeof(struct annotation));

I don't see how this can be right.

Take for example a case where we have ann_count == 4. We have
annotation->ann[0..4] where ann[4] is the end marker... a little
surprising. We want to insert an error annotation on an instruction in ann[2] 
but not at the end, so we need to split ann[2].

cur = 2, next = 3. We need to memmove elements 2, 3, 4 one spot later, and then 
update ann[2] and ann[3] (which after the memmove is a copy of ann[2]).

Count should be ann_count (4) - i (2) + 1 -> 3. The code currently says +2 
instead of +1 and that seems like a bug.

What do you think?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 06/18] intel/common/gen_decoder: make useable from C++ source

2017-11-13 Thread Rogovin, Kevin
Hi,

Oh MY! I missed that it was added to the driver (because miraculously the patch 
applied cleanly anyway).

Thanks, I will drop this patch if I make a v3.

-Kevin

-Original Message-
From: Landwerlin, Lionel G 
Sent: Monday, November 13, 2017 3:45 PM
To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 06/18] intel/common/gen_decoder: make useable 
from C++ source

On 13/11/17 13:17, kevin.rogo...@intel.com wrote:
> From: Kevin Rogovin 
>
> Signed-off-by: Kevin Rogovin 
> ---
>   src/intel/common/gen_decoder.h | 7 +++
>   1 file changed, 7 insertions(+)
>
> diff --git a/src/intel/common/gen_decoder.h 
> b/src/intel/common/gen_decoder.h index 8b00b6edc2..e3b2457dfd 100644
> --- a/src/intel/common/gen_decoder.h
> +++ b/src/intel/common/gen_decoder.h
> @@ -34,6 +34,10 @@
>   extern "C" {
>   #endif

The line above shows you can drop this patch.

>   
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
>   struct gen_spec;
>   struct gen_group;
>   struct gen_field;
> @@ -185,6 +189,9 @@ void gen_print_group(FILE *out,
>struct gen_group *group,
>uint64_t offset, const uint32_t *p,
>bool color);
> +#ifdef __cplusplus
> +}
> +#endif
>   
>   #ifdef __cplusplus
>   }


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-28 Thread Rogovin, Kevin
Hi all,

Just a gentle poke.

Even though a serious issue was already found by Chris Wilson on batchbuffer 
migration, I would like for folks to look at the series (in particular the 
monster patch 16) and give comments. With those comments I will then create a 
v2 (indeed I've already implemented fixes for the issue that Chris pointed out 
on batchbuffer migration and a pair of issues I realized on the script at patch 
17).

Best Regards,
 -Kevin



-Original Message-
From: Rogovin, Kevin 
Sent: Wednesday, September 27, 2017 2:38 PM
To: Chris Wilson ; mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Hi,

 If we just want to send to the kernel the data from the trace, I can do that 
very easily; just make such a GEM BO, comprising of dword-pairs of 
(TraceCallID, BatchbufferOffset). That will be a small buffer and together with 
the apitrace file, will give complete data. 

 I could probably make such a dedicated tool quite quickly, or add that 
functionality to the logger.

-Kevin

-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] 
Sent: Wednesday, September 27, 2017 1:21 PM
To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Quoting Rogovin, Kevin (2017-09-27 07:53:29)
> Hi,
> 
>  Right now the way the thing works is that it walks the batchbuffer just 
> after the kernel returns from the ioctl and updates its internal view of the 
> GPU state as it walks and emits to the log file the data. The log on a single 
> batchbuffer is (essentially) just a list of call ID's from the apitrace 
> together of "where in the batchbuffer" that call started. 
> 
>  I confess that I had not realized the potential application for using 
> something like this to help diagnose GPU hangs! I think it is a really good 
> idea. What I could do is the following (and it is not terribly hard to do):
> 
>1. -BEFORE- issuing the ioctl, the logger walks just the api markers in 
> the log of the batchbuffer, and makes a new GEM BO filled with apitrace data 
> (call ID, and maybe GL function data) and modify the ioctl to have an extra 
> buffer.

Yes. With the current intel_batchbuffer.c this should be relatively easy (I 
suggest you limit yourself to recent kernels for that simplification); see 
EXEC_BATCH_FIRST and remember to mark the trace bo as EXEC_OBJECT_CAPTURE.
 
>   2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM 
> BO; In order to read the GPU state more accurately I need to walk the log and 
> update the GPU state after the ioctl (mostly out of paranoia for values 
> copied from BO's to pipeline registers).

Up to, but my paranoia goes the other way. Once the ioctl returns the hw is 
indeed using that memory, so I have less trust of it. If you need to tie the 
relocated pointers to the trace, I would also emit relocations into the trace. 
For the reasons of port-mortem GPU hang debugging, I would want the execbuf be 
complete before the ioctl, rather than post processing.
 
> What would happen, is that if a batchbuffer made the GPU hang, you would then 
> know all the GL commands (trace ID's from the API trace) that made stuff on 
> that batchbuffer. Then one could go back to the apitrace of the troublesome 
> application  and have a much better starting place to debug.

Yup. As times go on, I hope this becomes a more complete flight-recorder that 
we don't have to rely on referencing back to a separate trace to work out the 
interesting calls. My goal is that you can give one instruction (that doesn't 
require any additional dependencies, so can just be LD_PRELOAD=i965-fdr.so, or 
better a script installed in mesa-utils?) to a bug reporter and that will then 
capture enough information.
 
> We could also do something evil looking and put another modification on 
> apitrace where it can have a list of call trace ranges where it inserts 
> glFinish after each call. Those glFinish()'s will then force the ioctl of the 
> exact troublesome draw call without needing to tell i965 to flush after each 
> draw call.
> 
> Just to make sure, you want the "apitrace" data (call ID list, maybe function 
> name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug 
> code know which one to use to dump? I would guess if the batchbuffer is the 
> first buffer, then it would be the last buffer, otherwise if the batch buffer 
> is the last one, I guess it would be one just before, but that might screw up 
> reloc-data if any of the relocs in the batchbuffer refer to itself. I can 
> also emit the data to a file and close the file before the ioctl and if the 
> ioctl returns, delete said file (assuming a GPU hang always stops the 
> process, then a hang

Re: [Mesa-dev] [PATCH 06/22] i965: Enable BatchbufferLogger in i965 driver

2017-09-27 Thread Rogovin, Kevin

>My worry is that batch->bo is not constant for the construction of a single 
>execbuf2.
> If intel_batchbuffer.c runs out of room inside the batch->bo, it will 
> allocate a new one and continue building it.

That will be ok, but if the (fd, GEM BO handle) changes, there is a serious 
issue.

> I'm just mentioning that may not be the same handle as some of the earlier 
> state calls.

This is a serious issue; sighs. The Logger will not miss any GPU state (since 
that is handled at ioctl interception time), but it will get the GL/GLES call 
ID's royally hosed because the api call markers will be on the log associated 
to that (old) (fd, GEM handle) pair.

Futz.

The solution is to add another function to the driver interface that the driver 
calls when it "migrates" stuff from one BO to another (i.e. when it grows a 
batchbuffer to fit in that last draw call).

Good catch on that, I had utterly forgotten that the batchbuffer split lets 
i965 "grow" the batchbuffer to fit in that one last call. All the stuff I have 
run on it so far have not hit that, but there will be loads where it gets hit.

I will fix this ASAP.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-27 Thread Rogovin, Kevin
Hi,

 If we just want to send to the kernel the data from the trace, I can do that 
very easily; just make such a GEM BO, comprising of dword-pairs of 
(TraceCallID, BatchbufferOffset). That will be a small buffer and together with 
the apitrace file, will give complete data. 

 I could probably make such a dedicated tool quite quickly, or add that 
functionality to the logger.

-Kevin

-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] 
Sent: Wednesday, September 27, 2017 1:21 PM
To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Quoting Rogovin, Kevin (2017-09-27 07:53:29)
> Hi,
> 
>  Right now the way the thing works is that it walks the batchbuffer just 
> after the kernel returns from the ioctl and updates its internal view of the 
> GPU state as it walks and emits to the log file the data. The log on a single 
> batchbuffer is (essentially) just a list of call ID's from the apitrace 
> together of "where in the batchbuffer" that call started. 
> 
>  I confess that I had not realized the potential application for using 
> something like this to help diagnose GPU hangs! I think it is a really good 
> idea. What I could do is the following (and it is not terribly hard to do):
> 
>1. -BEFORE- issuing the ioctl, the logger walks just the api markers in 
> the log of the batchbuffer, and makes a new GEM BO filled with apitrace data 
> (call ID, and maybe GL function data) and modify the ioctl to have an extra 
> buffer.

Yes. With the current intel_batchbuffer.c this should be relatively easy (I 
suggest you limit yourself to recent kernels for that simplification); see 
EXEC_BATCH_FIRST and remember to mark the trace bo as EXEC_OBJECT_CAPTURE.
 
>   2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM 
> BO; In order to read the GPU state more accurately I need to walk the log and 
> update the GPU state after the ioctl (mostly out of paranoia for values 
> copied from BO's to pipeline registers).

Up to, but my paranoia goes the other way. Once the ioctl returns the hw is 
indeed using that memory, so I have less trust of it. If you need to tie the 
relocated pointers to the trace, I would also emit relocations into the trace. 
For the reasons of port-mortem GPU hang debugging, I would want the execbuf be 
complete before the ioctl, rather than post processing.
 
> What would happen, is that if a batchbuffer made the GPU hang, you would then 
> know all the GL commands (trace ID's from the API trace) that made stuff on 
> that batchbuffer. Then one could go back to the apitrace of the troublesome 
> application  and have a much better starting place to debug.

Yup. As times go on, I hope this becomes a more complete flight-recorder that 
we don't have to rely on referencing back to a separate trace to work out the 
interesting calls. My goal is that you can give one instruction (that doesn't 
require any additional dependencies, so can just be LD_PRELOAD=i965-fdr.so, or 
better a script installed in mesa-utils?) to a bug reporter and that will then 
capture enough information.
 
> We could also do something evil looking and put another modification on 
> apitrace where it can have a list of call trace ranges where it inserts 
> glFinish after each call. Those glFinish()'s will then force the ioctl of the 
> exact troublesome draw call without needing to tell i965 to flush after each 
> draw call.
> 
> Just to make sure, you want the "apitrace" data (call ID list, maybe function 
> name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug 
> code know which one to use to dump? I would guess if the batchbuffer is the 
> first buffer, then it would be the last buffer, otherwise if the batch buffer 
> is the last one, I guess it would be one just before, but that might screw up 
> reloc-data if any of the relocs in the batchbuffer refer to itself. I can 
> also emit the data to a file and close the file before the ioctl and if the 
> ioctl returns, delete said file (assuming a GPU hang always stops the 
> process, then a hang would leave behind a file). 

My vision is that you would attach all "files" to the execbuf, but then again 
I'm focusing on fdr and not debugging of new features. So long as we are 
talking about a few megabytes of trace data that isn't too bad. Then we don't 
have to fiddle around with extra files to find the ones corresponding to the 
hang, as they will be recorded in the error state. The contents I leave up to 
you :) (I figure it is a snowball, once a tracing mechanism exists for 
capturing GPU hangs, there'll be lots of suggestions! One is probably just to 
capture the aub annotations alongside the batch. Hmm, th

Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-27 Thread Rogovin, Kevin
HI,

 In spirit, stuffing data into MI_NOOP is nicer since then one can just rely on 
aubinator to read that data and go to town. The main issues I see are the 
following.

 1. One needs to now insert MI_NOOP's into the command buffer in order to 
insert strings. This changes what is sent to the GPU (though one can argue that 
MI_NOOP should not really matter). The big nasty potential change is in 
situations where the command buffer approaches full, with the MI_NOOP it fills 
faster and thus that dramatically changes what a driver sends to the GPU since 
a new batchbuffer triggers more state and -FLUSHES-.

2. It means more modifications to the driver in order to insert the messages.

3. The driver needs to somehow get a call-id from the application in order to 
know what value to place in the MI_NOOP. 

The worst issue (for me) is #1; #3 is solveable-ish by making some function 
pointer available to set the value to stuff in the MI_NOOP unused bits. Issue 
#2 is quite icky because I have more in mind for the logger than Mesa/i965 and 
I want to keep the work to add it to a driver to a bare minimum.

FWIW, when I started this, I wanted to do it via aub-dumper and aubinator where 
they would produce auxiliary files that had the necessary data to know what in 
it came from where. But the more I looked at the issues I wanted to solve, the 
more trickier it seemed to me to use aubdumper and aubinator to accomplish that.

-Original Message-
From: Landwerlin, Lionel G 
Sent: Wednesday, September 27, 2017 12:35 PM
To: Rogovin, Kevin ; Chris Wilson 
; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

A few months ago I implemented debug messages in the command stream by stuffing 
the unused bits of MI_NOOP :

https://patchwork.freedesktop.org/series/26079/

Aubinator would then read the bits and print the messages.

We might be able to reuse similar idea to get away with any external interface.
Instead of having strings of characters we could put a marker with the BO 
handle that could be used to store all of the metadata about a particular draw 
call.

What do you think?

-
Lionel

On 27/09/17 07:53, Rogovin, Kevin wrote:
> Hi,
>
>   Right now the way the thing works is that it walks the batchbuffer just 
> after the kernel returns from the ioctl and updates its internal view of the 
> GPU state as it walks and emits to the log file the data. The log on a single 
> batchbuffer is (essentially) just a list of call ID's from the apitrace 
> together of "where in the batchbuffer" that call started.
>
>   I confess that I had not realized the potential application for using 
> something like this to help diagnose GPU hangs! I think it is a really good 
> idea. What I could do is the following (and it is not terribly hard to do):
>
> 1. -BEFORE- issuing the ioctl, the logger walks just the api markers in 
> the log of the batchbuffer, and makes a new GEM BO filled with apitrace data 
> (call ID, and maybe GL function data) and modify the ioctl to have an extra 
> buffer.
>
>2. -AFTER- the ioctl returns, emit the log data (as now) and delete the 
> GEM BO; In order to read the GPU state more accurately I need to walk the log 
> and update the GPU state after the ioctl (mostly out of paranoia for values 
> copied from BO's to pipeline registers).
>
> What would happen, is that if a batchbuffer made the GPU hang, you would then 
> know all the GL commands (trace ID's from the API trace) that made stuff on 
> that batchbuffer. Then one could go back to the apitrace of the troublesome 
> application  and have a much better starting place to debug.
>
> We could also do something evil looking and put another modification on 
> apitrace where it can have a list of call trace ranges where it inserts 
> glFinish after each call. Those glFinish()'s will then force the ioctl of the 
> exact troublesome draw call without needing to tell i965 to flush after each 
> draw call.
>
> Just to make sure, you want the "apitrace" data (call ID list, maybe function 
> name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug 
> code know which one to use to dump? I would guess if the batchbuffer is the 
> first buffer, then it would be the last buffer, otherwise if the batch buffer 
> is the last one, I guess it would be one just before, but that might screw up 
> reloc-data if any of the relocs in the batchbuffer refer to itself. I can 
> also emit the data to a file and close the file before the ioctl and if the 
> ioctl returns, delete said file (assuming a GPU hang always stops the 
> process, then a hang would leave behind a file).
>
> Let me know, what is best, and I will do it.
>
> -Kevin
>
>
> -Original Message-
> From: Chris Wilson [mai

Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-27 Thread Rogovin, Kevin
Hi,

Sighs, I forget one -critical- issue on decoding the batchbuffer: it needs to 
reloc data from the kernel to have any chance of correctly decoding things 
referred to by the batch buffer (which is oodles of stuff), thus decode can 
only happen after kernel succeeds.

However, I can make it emit a separate file before the call to the kernel 
giving atleast the apitrace data (and perhaps the filename to give an 
indication too) and delete the file after the ioctl returns; from there it is 
straight forward to have n files alive to see the last n execbuffer2 ioctls.

Though, if the logger is to send the api-trace call id's to the kernel on 
execbuffer2, then this does not matter.

-Kevin

-Original Message-
From: Rogovin, Kevin 
Sent: Wednesday, September 27, 2017 9:53 AM
To: 'Chris Wilson' ; mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Hi,

 Right now the way the thing works is that it walks the batchbuffer just after 
the kernel returns from the ioctl and updates its internal view of the GPU 
state as it walks and emits to the log file the data. The log on a single 
batchbuffer is (essentially) just a list of call ID's from the apitrace 
together of "where in the batchbuffer" that call started. 

 I confess that I had not realized the potential application for using 
something like this to help diagnose GPU hangs! I think it is a really good 
idea. What I could do is the following (and it is not terribly hard to do):

   1. -BEFORE- issuing the ioctl, the logger walks just the api markers in the 
log of the batchbuffer, and makes a new GEM BO filled with apitrace data (call 
ID, and maybe GL function data) and modify the ioctl to have an extra buffer.

  2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM 
BO; In order to read the GPU state more accurately I need to walk the log and 
update the GPU state after the ioctl (mostly out of paranoia for values copied 
from BO's to pipeline registers).

What would happen, is that if a batchbuffer made the GPU hang, you would then 
know all the GL commands (trace ID's from the API trace) that made stuff on 
that batchbuffer. Then one could go back to the apitrace of the troublesome 
application  and have a much better starting place to debug.

We could also do something evil looking and put another modification on 
apitrace where it can have a list of call trace ranges where it inserts 
glFinish after each call. Those glFinish()'s will then force the ioctl of the 
exact troublesome draw call without needing to tell i965 to flush after each 
draw call.

Just to make sure, you want the "apitrace" data (call ID list, maybe function 
name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug 
code know which one to use to dump? I would guess if the batchbuffer is the 
first buffer, then it would be the last buffer, otherwise if the batch buffer 
is the last one, I guess it would be one just before, but that might screw up 
reloc-data if any of the relocs in the batchbuffer refer to itself. I can also 
emit the data to a file and close the file before the ioctl and if the ioctl 
returns, delete said file (assuming a GPU hang always stops the process, then a 
hang would leave behind a file). 

Let me know, what is best, and I will do it.

-Kevin


-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
Sent: Tuesday, September 26, 2017 11:20 PM
To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Quoting Rogovin, Kevin (2017-09-26 10:35:44)
> Hi,
> 
>   Attached to this message are the following:
>  1. a file giving example usage of the tool with a modified 
> apitrace to produce json output
> 
>  2. the patches to apitrace to make it BatchbufferLogger aware
> 
>  3. the JSON files (gzipped) made from the example.
> 
> 
> I encourage (and hope) people will take a look at the JSON to see the 
> potential of the tool.

The automatic apitrace-esque logging seems very useful. How easy would it be to 
write that trace into a bo and associate with the execbuffer (from my pov, it 
should be that hard)? That way you could get the most recent actions before a 
GPU hang, attach them to a bug and decode them at leisure. (An extension may be 
to keep a ring of the last N traces so that you can see some setup a few 
batches ago that triggered a hang in this one.)

I presume you already have such a plan, and I'm just preaching to the choir.
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

2017-09-27 Thread Rogovin, Kevin
Hi,

 Right now the way the thing works is that it walks the batchbuffer just after 
the kernel returns from the ioctl and updates its internal view of the GPU 
state as it walks and emits to the log file the data. The log on a single 
batchbuffer is (essentially) just a list of call ID's from the apitrace 
together of "where in the batchbuffer" that call started. 

 I confess that I had not realized the potential application for using 
something like this to help diagnose GPU hangs! I think it is a really good 
idea. What I could do is the following (and it is not terribly hard to do):

   1. -BEFORE- issuing the ioctl, the logger walks just the api markers in the 
log of the batchbuffer, and makes a new GEM BO filled with apitrace data (call 
ID, and maybe GL function data) and modify the ioctl to have an extra buffer.

  2. -AFTER- the ioctl returns, emit the log data (as now) and delete the GEM 
BO; In order to read the GPU state more accurately I need to walk the log and 
update the GPU state after the ioctl (mostly out of paranoia for values copied 
from BO's to pipeline registers).

What would happen, is that if a batchbuffer made the GPU hang, you would then 
know all the GL commands (trace ID's from the API trace) that made stuff on 
that batchbuffer. Then one could go back to the apitrace of the troublesome 
application  and have a much better starting place to debug.

We could also do something evil looking and put another modification on 
apitrace where it can have a list of call trace ranges where it inserts 
glFinish after each call. Those glFinish()'s will then force the ioctl of the 
exact troublesome draw call without needing to tell i965 to flush after each 
draw call.

Just to make sure, you want the "apitrace" data (call ID list, maybe function 
name) in a GEM BO? Which GEM BO should it be in the list so that kernel debug 
code know which one to use to dump? I would guess if the batchbuffer is the 
first buffer, then it would be the last buffer, otherwise if the batch buffer 
is the last one, I guess it would be one just before, but that might screw up 
reloc-data if any of the relocs in the batchbuffer refer to itself. I can also 
emit the data to a file and close the file before the ioctl and if the ioctl 
returns, delete said file (assuming a GPU hang always stops the process, then a 
hang would leave behind a file). 

Let me know, what is best, and I will do it.

-Kevin


-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] 
Sent: Tuesday, September 26, 2017 11:20 PM
To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 00/22] RFC: Batchbuffer Logger for Intel GPU

Quoting Rogovin, Kevin (2017-09-26 10:35:44)
> Hi,
> 
>   Attached to this message are the following:
>  1. a file giving example usage of the tool with a modified 
> apitrace to produce json output
> 
>  2. the patches to apitrace to make it BatchbufferLogger aware
> 
>  3. the JSON files (gzipped) made from the example.
> 
> 
> I encourage (and hope) people will take a look at the JSON to see the 
> potential of the tool.

The automatic apitrace-esque logging seems very useful. How easy would it be to 
write that trace into a bo and associate with the execbuffer (from my pov, it 
should be that hard)? That way you could get the most recent actions before a 
GPU hang, attach them to a bug and decode them at leisure. (An extension may be 
to keep a ring of the last N traces so that you can see some setup a few 
batches ago that triggered a hang in this one.)

I presume you already have such a plan, and I'm just preaching to the choir.
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 05/22] i965: correctly assign SamplerCount of INTERFACE_DESCRIPTOR_DATA

2017-09-25 Thread Rogovin, Kevin
Hi,

 AFAIK, for all shader types both binding table entry count and number of 
samplers can be zero; the hardware uses those values to pre-fetch data. The 
docs say one may wish to leave it at zero if there is a risk of state thrashing 
if the number (for either) is large. FWIW, my main reason for fixing these two 
values was so that the logger could decode the binding tables and samplers 
correctly. 

If people want to push these 2 patches without having the entire series RB'd, 
that is fine with me, I am going to need to rebase anyways as this is just an 
RFC. Moreover all the patches that are fixes/enhancements to existing code, 
i.e. patches 0004-0005 and patches 0007-0015, I would be happy to see them 
pushed even if the main thrush of this series, the BatchbufferLogger, has 
comments (which I think it will). 

-Kevin

-Original Message-
From: Justen, Jordan L 
Sent: Tuesday, September 26, 2017 2:53 AM
To: Landwerlin, Lionel G ; Rogovin, Kevin 
; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 05/22] i965: correctly assign SamplerCount of 
INTERFACE_DESCRIPTOR_DATA

On 2017-09-25 05:46:32, Lionel Landwerlin wrote:
> I'm genuinely surprised we didn't noticed this problem before :|

Indeed. Did jenkins show any 'fixes' from this patch?

I think this patch should be handled separately from this RFC series.

Reviewed-by: Jordan Justen 

> Fixes: 71bfb44005bf ("i965: Port brw_cs_state tracked state to genxml.")
> Reviewed-by: Lionel Landwerlin 
> Cc: "17.2" 
> 
> On 25/09/17 11:34, kevin.rogo...@intel.com wrote:
> > From: Kevin Rogovin 
> >
> > Signed-off-by: Kevin Rogovin 
> > ---
> >   src/mesa/drivers/dri/i965/genX_state_upload.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
> > b/src/mesa/drivers/dri/i965/genX_state_upload.c
> > index bbb47ea..32c7d22 100644
> > --- a/src/mesa/drivers/dri/i965/genX_state_upload.c
> > +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
> > @@ -4197,7 +4197,7 @@ genX(upload_cs_state)(struct brw_context *brw)
> >  const struct GENX(INTERFACE_DESCRIPTOR_DATA) idd = {
> > .KernelStartPointer = brw->cs.base.prog_offset,
> > .SamplerStatePointer = stage_state->sampler_offset,
> > -  .SamplerCount = DIV_ROUND_UP(stage_state->sampler_count, 4) >> 2,
> > +  .SamplerCount = DIV_ROUND_UP(stage_state->sampler_count, 4),
> > .BindingTablePointer = stage_state->bind_bo_offset,
> > .BindingTableEntryCount = prog_data->binding_table.size_bytes / 4,
> > .ConstantURBEntryReadLength = cs_prog_data->push.per_thread.regs,
> 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 04/22] i965: assign BindingTableEntryCount of INTERFACE_DESCRIPTOR_DATA

2017-09-25 Thread Rogovin, Kevin
Hi,

 The value is the number of entries, not the size in bytes; 
BindingTableEntryCount for compute/media is programmed the same way as for all 
the other shader stages (where they too are divided by 4).

 FWIW, the hardware will work without this patch because genX initializes the 
value as 0. When the value is zero, it means the GPU will not prefetch the 
binding table entries.

-Kevin


-Original Message-
From: Landwerlin, Lionel G 
Sent: Monday, September 25, 2017 3:10 PM
To: Rogovin, Kevin ; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 04/22] i965: assign BindingTableEntryCount of 
INTERFACE_DESCRIPTOR_DATA

On 25/09/17 11:34, kevin.rogo...@intel.com wrote:
> From: Kevin Rogovin 
>
> Signed-off-by: Kevin Rogovin 
> ---
>   src/mesa/drivers/dri/i965/genX_state_upload.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
> b/src/mesa/drivers/dri/i965/genX_state_upload.c
> index 6127616..bbb47ea 100644
> --- a/src/mesa/drivers/dri/i965/genX_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
> @@ -4199,6 +4199,7 @@ genX(upload_cs_state)(struct brw_context *brw)
> .SamplerStatePointer = stage_state->sampler_offset,
> .SamplerCount = DIV_ROUND_UP(stage_state->sampler_count, 4) >> 2,
> .BindingTablePointer = stage_state->bind_bo_offset,
> +  .BindingTableEntryCount = prog_data->binding_table.size_bytes / 
> + 4,

I'm not sure this should be divided by 4. The documentation doesn't seemto say 
it behaves like SamplerCount.

> .ConstantURBEntryReadLength = cs_prog_data->push.per_thread.regs,
> .NumberofThreadsinGPGPUThreadGroup = cs_prog_data->threads,
> .SharedLocalMemorySize = encode_slm_size(GEN_GEN,


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: add 2xMSAA and 16xMSAA to DRI configs for Gen9.

2017-08-30 Thread Rogovin, Kevin
Hi,

 Please do not push yet. I will post a V2 that makes the pointer thing not so 
offensively ugly. Also, I missed the opportunity to also fix the DRI conf for 
Gen8, as it does have 2xMSAA.

 -Kevin

-Original Message-
From: Ben Widawsky [mailto:b...@bwidawsk.net] 
Sent: Thursday, August 24, 2017 8:10 PM
To: Rogovin, Kevin 
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH] i965: add 2xMSAA and 16xMSAA to DRI configs for 
Gen9.

On 17-08-24 14:16:39, kevin.rogo...@intel.com wrote:
>From: Kevin Rogovin 
>
>Special thanks to Eero Tamminen for reporting rasterizer numbers being 
>twice what it should be for 2xMSAA under a benchmark.
>
>Signed-off-by: Kevin Rogovin 
>---
> src/mesa/drivers/dri/i965/intel_screen.c | 14 +++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
>diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
>b/src/mesa/drivers/dri/i965/intel_screen.c
>index 579554f..67eb776 100644
>--- a/src/mesa/drivers/dri/i965/intel_screen.c
>+++ b/src/mesa/drivers/dri/i965/intel_screen.c
>@@ -1882,7 +1882,7 @@ intel_screen_make_configs(__DRIscreen *dri_screen)
>};
>
>static const uint8_t singlesample_samples[1] = {0};
>-   static const uint8_t multisample_samples[2]  = {4, 8};
>+   static const uint8_t multisample_samples_2_4_8_16[]  = {2, 4, 8, 
>+ 16};
>
>struct intel_screen *screen = dri_screen->driverPrivate;
>const struct gen_device_info *devinfo = &screen->devinfo; @@ 
>-1959,6 +1959,7 @@ intel_screen_make_configs(__DRIscreen *dri_screen)
> * supported.  Singlebuffer configs are not supported because no one wants
> * them.
> */
>+

No unnecessary whitespace changes, please.

>for (unsigned i = 0; i < ARRAY_SIZE(formats); i++) {
>   if (devinfo->gen < 6)
>  break;
>@@ -1966,6 +1967,7 @@ intel_screen_make_configs(__DRIscreen *dri_screen)
>   __DRIconfig **new_configs;
>   const int num_depth_stencil_bits = 2;
>   int num_msaa_modes = 0;
>+  const uint8_t *multisample_samples = NULL;
>
>   depth_bits[0] = 0;
>   stencil_bits[0] = 0;
>@@ -1978,10 +1980,16 @@ intel_screen_make_configs(__DRIscreen *dri_screen)
>  stencil_bits[1] = 8;
>   }
>
>-  if (devinfo->gen >= 7)
>+  if (devinfo->gen >= 9) {
>+ multisample_samples = multisample_samples_2_4_8_16;
>+ num_msaa_modes = 4;
>+  } else if (devinfo->gen >= 7) {
>+ multisample_samples = multisample_samples_2_4_8_16 + 1;
>  num_msaa_modes = 2;
>-  else if (devinfo->gen == 6)
>+  } else if (devinfo->gen == 6) {
>+ multisample_samples = multisample_samples_2_4_8_16 + 1;
>  num_msaa_modes = 1;
>+  }

I think it'd be a little cleaner to just make GEN specific arrays. Easier to 
read, and you can just USE ARRAY_SIZE but I honestly don't care much.

if (devinfo->gen >= 9) {
   multisample_samples = multisample_samples_gen9;
   num_msaa_modes = ARRAY_SIZE(multisample_samples_gen9);
}

>
>   new_configs = driCreateConfigs(formats[i],
>  depth_bits,

Kind of shocking to me that we missed this previously for both when we added 2x 
MSAA and later 16x. Indeed looking at glxinfo, I see no 2x or 16x visuals.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] Differentiate between DeleteQueries and DeleteQueriesARB

2016-04-07 Thread Rogovin, Kevin
Hello,

> typo -> ...query (same goes for patch 1/2)

I hate typos.  Thanks for pointing it out.

> Please correct me if I'm wrong, but I think we cannot unalias functions once 
> they're in.
> It will break the backwards compatibility we're trying to manage with glapi. 
> If we want to 
> retain it we should opt for the least likely to hit solution ?

I confess I do not understand. What exactly is being compatible with what?

> Either I'm misreading the comment above the function or this doesn't belong 
> here.

If I understood the code correctly, that is setting up a function table for 
when GL state is in a glBegin/glEnd pair. I added the ARB function because the 
non-ARB one was there already. 

-Kevin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 2/2] mesa/es3.1: Limit Framebuffer Parameter OpenGL ES 3.1 usage

2015-08-24 Thread Rogovin, Kevin
Reviewed-by: Kevin Rogovin 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/2] mesa/es3.1: Expose GL_ARB_framebuffer_no_attachments to GLES 3.1

2015-08-24 Thread Rogovin, Kevin
Reviewed-by: Kevin Rogovin 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box of gl_framebuffer

2015-06-09 Thread Rogovin, Kevin


-Original Message-
From: Rogovin, Kevin 
Sent: Wednesday, June 10, 2015 8:56 AM
To: 'Ian Romanick'; mesa-dev@lists.freedesktop.org
Subject: RE: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box 
of gl_framebuffer

Hi,

-Original Message-
From: Ian Romanick [mailto:i...@freedesktop.org] 
Sent: Wednesday, June 10, 2015 1:06 AM
To: Rogovin, Kevin; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box 
of gl_framebuffer


>All of the changes in this function below this point seem unnecessary.
>If they are necessary, please explain why.  If they're not, mixing cosmetic 
>changes and functional >changes is bad... don't do it. :)  That also avoids 
>the issue of where to put the "+".

In truth, the raw diff hides the fact that this is actually the body of a new 
function taken from an old function. In v1 (and I think v2) I did not use xmax 
and ymax, but the code was severely over the 80 column limit and did a 
re-compute (which a reasonable compiler should likely realize and not do in 
compiled code). 

I am totally happy to make the body of the new function to be essentially 
identical to the body of the old function and thus leave it going past 
80-columns and doing the re-compute. Would that be better?



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box of gl_framebuffer

2015-06-09 Thread Rogovin, Kevin
Hi,

-Original Message-
From: Ian Romanick [mailto:i...@freedesktop.org] 
Sent: Wednesday, June 10, 2015 1:06 AM
To: Rogovin, Kevin; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box 
of gl_framebuffer


>All of the changes in this function below this point seem unnecessary.
>If they are necessary, please explain why.  If they're not, mixing cosmetic 
>changes and functional >changes is bad... don't do it. :)  That also avoids 
>the issue of where to put the "+".

In truth, the raw diff hides the fact that this is actually the body of a new 
function taken from an old function. In v1 (and I think v2) I did not use xmax 
and ymax, but the code was severely over the 80 column limit and did a 
re-compute (which a reasonable compiler should likely realize and not do in 
compiled code). 

I am totally happy to make the body of the new function to be essentially 
identical to the body of the old function and thus leave it going past 
80-columns and doing the re-compute. Would that be better?



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box of gl_framebuffer

2015-05-27 Thread Rogovin, Kevin

> Out of curiosity, what editor are you using?  Usually setting the
> indentation rules in your editor takes care of most of these problems.

I am using emacs23 (under Ubuntu) and now for something amusing...

> We have a .dir-locals.el file that should provide the correct settings
> for Emacs.

emacs23 rejected that .el file, switching to emacs24 solved the fiasco. 
Shudders. IF I had had a new emacs all the tabbing nits would have never 
appeared. Oh well.

-Kevin


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v4 PATCH 04/10] mesa: add helper functions for geometry of gl_framebuffer

2015-05-27 Thread Rogovin, Kevin

> You should wait until you get a "real" review from someone before reposting.

I think that is good advice. Indeed, I am not going to post a v5 of this series 
until for each patch there is:

 Two review bys, possibly qualified with "fix the mentioned formatting issues"
OR
 Specific change requests. 

The patch series has not really changed in any meaningful way (in terms of 
compiled output) since v2.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v4 PATCH 04/10] mesa: add helper functions for geometry of gl_framebuffer

2015-05-27 Thread Rogovin, Kevin
Hi, 
>> +static inline  GLuint
>Here and below, why 2 spaces between "inline" and "GLuint"?

I have no clue.  I suspect it is a scar from  some search/replace fiasco over 3 
weeks ago. You are the first person who spotted that nit.

>> --- a/src/mesa/main/mtypes.h
>> +++ b/src/mesa/main/mtypes.h
>> @@ -3187,7 +3187,13 @@ struct gl_framebuffer
>>  * GL_ARB_framebuffer_no_attachments must check for the flag 
>> _HasAttachments
>>  * and if GL_FALSE, must then use the values in DefaultGeometry to 
>> initialize
> > * its viewport, scissor and so on (in particular _Xmin, _Xmax, _Ymin and
> >-* _Ymax do NOT take into account _HasAttachments being false)
> >+* _Ymax do NOT take into account _HasAttachments being false). To get 
> >the
> >+* geometry of the framebuffer, the  helper functions

> Why 2 spaces between "the" and "helper"?

No clue. You are again the first person to spot that space. 

It would be nice to get a "fix these nits I found and get a reviewed by" badge.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v4 PATCH 05/10] mesa: helper function for scissor box of gl_framebuffer

2015-05-27 Thread Rogovin, Kevin
Hi,

> 44 instances of a leading + in mesa/main compared to 78 trailing ones.
> Huh, I was going to say that it's really uncommon to do this in mesa, but 
> that may not be supported by fact.

Considering there is a formatting issue below, I can change it to trailing. I 
don’t care.

>> +  if (ymax < bbox[3]) {
>> +bbox[3] = ymax;
>indent looks messed up here.

Sighs, because it is. I missed that line, and I have missed that one since v2. 
I really wish I had a script to check the indenting. Sighs.

> I don't think the "extern" isn't really necessary here. A lot of older code 
> still has it, but you don't have to stick it in on newer additions.

I can kill the extern, I don't care; however, all the previous existing 
functions in the header had extern on them. Is it better for consistency within 
the file to keep the extern?

>>
>> +
> What's with the extra newline?

Double sighs. Apparently, my git skills are lacking and I let that added line 
sneak in when I had killed it from Patch 4. 


>  static inline  GLuint
>  _mesa_geometric_height(const struct gl_framebuffer *buffer)  {
> --
> 1.9.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v3 PATCH 01/10] mesa:Define extension ARB_framebuffer_no_attachments

2015-05-21 Thread Rogovin, Kevin

> FWIW, the kernel standards for commit messages are at:
> https://www.kernel.org/doc/Documentation/SubmittingPatches
> Most of those rules apply to Mesa too. It says the body should be wrapped to 
> 75 chars (although I've been using 72 like Matt said). 

This is my point: "use most rules, but not all".. and "I've been more 
conservative than X but I did not need to be". What I am seeing is that there 
is, in some collective form, a set of consistent rules (in the form of ranges) 
that are strongly enforced and yet not written down.

Let's write them all  down here and now, put them in some file for others to 
read and to refer to. Maybe even con someone to write a lint-like script for 
those rules.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v3 PATCH 01/10] mesa:Define extension ARB_framebuffer_no_attachments

2015-05-21 Thread Rogovin, Kevin


> I suppose it could be useful, but I think we've been mostly successful at 
> just expecting people to recognize when what they're writing doesn't look 
> like the code around it.

This is my point. Older code had different style/expectations than newer code. 
For this patch series, I have hit a number of nits and quasi-nits that are 
ambiguous:

0. In v1 to v2, avoid GL types atleast in i965 although much of the code uses 
that (the explanation given was to move away from GL types). Does this apply to 
Mesa core as well?

 1. the casting bit in the i965 patch to use the _mesa_geomety_ functions. 
These are not quite nits, but in some ways not a big deal. I hit these because 
there were ints there before. In this regard doing what was there before was 
ungood (atleast for this review).

2. the line between function thing. In truth I just missed that extra line for 
the added to framebuffer.h (and I should not have), but there are places in the 
code that there are multiple empty lines between function definitions. I do not 
mind saying "no extra empty lines", but not knowing the rules and attempting to 
infer them from the code, I seem to hit too many nits.

3. Even on the subject of git commit, I am seeing different answers: 75, but 
try 50 usually, but understandable if cannot do it. Utterly ambiguous.

4. on the subject of line length I have this so far: usually 78, but 80 
sometimes is ok. Does ok, for example, include making a large-ish comment block 
more justified? 

5. Consecutive empty lines is not ok, except in function definitions, but then 
only sometimes. I am guessing "sometimes" is for grouping function definitions, 
but plenty of files follow different conventions (hence what Brian Paul said).

Given that nits just add traffic (and I want to minimize that silliness) I 
think it would be wise to set down some precise rules so there is no judgement 
calls required for something as silly as formatting nits. Ideally, we would 
have a lint script that would do it for us :) I see that reviews usually first 
hit nits, then content of patches. That is fine, but I'd rather know all the 
rules rather than hitting the nits at all.

Again, I really have no preference since someone is paying me to do this, but 
knowing the exact rules would be more efficient. Inferring rules is quite error 
prone IMO (indeed, at one point, the GLSL pre-processor appeared to be written 
in a different style).

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v3 PATCH 01/10] mesa:Define extension ARB_framebuffer_no_attachments

2015-05-21 Thread Rogovin, Kevin
HI,



>Or 78 columns, to be safe, but there's exceptions, like if you're 
> defining a big static table/array of info.

Uggg I don't mind exceptions, but knowing them is key. 



>>   4. successive parenthesis must have spaces between parenthesis

> Example?

if (some_func(some_argument)) 

is bad, but

if (some_func(some_argument) ) 

is good. I am also guessing that

a = foo(bar(x));

is bad and must be changed to

a = foo(bar(x) );



>>   6. Use "whether condition" when describing a bool instead of "true if 
>> condition is true"
> not sure we care about that.

I hit a nit for it in the series, so someone cared. 

>   7. derived values of structs -should- be prefixed with an underscore (this 
> rule is loaded with exceptions in the code base from its evolution)
>   8.  "indenting" is 3 spaces, except after switch where it is 0 (but after 
> case it is 3).
>   9.  open bracket on same line

The 'indent' command in the docs should cover that.


> >  11. functions for an extension must check if extension is supported and if 
> > not emit an INVALID_OPERATION message instead of relying on function table 
> > dispatch to guarantee they are not called.

> Not sure about that, but that's not a coding style standard.

Perhaps coding standard is not the right word, but it is a requirement (atleast 
it seems that way) and is a trivial requirement to satisfy. 

>>   12. (Guessing here) consecutive empty lines are not allowed

> Generally true, except between functions.

Ugg... I hit a nit from an extra space between functions.

>   13. If changing a line that violates the nit rules, fix the line too rather 
> than just adding the change

Yeah, probably.

> Feel free to submit a patch against docs/devinfo.html with this info. :)

I do not mind submitting the patch, but I want to know what the rules are; 
preferably explicitly written rather than inferred (by me). Especially since I 
seem to hit nits like no tomorrow even when trying not to :)

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v3 PATCH 06/10] i965: Use _mesa_geometry_ functions appropriately

2015-05-21 Thread Rogovin, Kevin

> Strange because of the types -- presumably fb_samples is an int because its 
> uses are in a comparison with another int (that probably doesn't need to be 
> an int :)
I am really hesitant to start changing types all over the place; I have a 
sinking suspicion that changing the types of fb_width, _height and so on to 
unsigned int might require some careful checking (the minus ones here and there 
need some checking. If it really must be done then so be it, but the current 
series preserves more of the old behavior and I want to try to not add even 
more changes.

So just to ask explicitly: do you want that those values fb_width = 
_mesa_geometry_width() and related buggers to be unsigned int and go through 
the process of making sure unsigned int is ok? There are some uglies related to 
a few "minus one" 's that make me twitch.

Lastly, when I posted v2 of this series, no one commented on the use of int's. 
For the v1, Topi requested to have some values start as floats to avoid later 
casts (and this was done).

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v3 PATCH 01/10] mesa:Define extension ARB_framebuffer_no_attachments

2015-05-21 Thread Rogovin, Kevin

> This line is too long. (It will not fit in 80 columns in git log since git 
> log adds some spaces before each commit message line.)

What is the accepted maximum length for a line in a commit message?

>>  - extension table
>>  - additions to gl_framebuffer
>> 
>> v1 -> v2
>>  Spacing and trailing spaces fixes.

>This looks odd to me. I think you only need 'v2:' here. So, either

I have seen a number of patches with the notation v1 -> v2 to list changes 
between versions. Those patches that I saw using that notation did not have 
comments about using the format v1->v2.  If people want v2: instead of v1->v2, 
I am fine with it, but was just following what I saw in some patch series.

Given the number of nits around (that I seem to hit regularly), it might be 
beneficial for Mesa to have a short document listing the formatting 
requirements, of which I have so far collected:
 1. 80 column limit in source files
 2. Justify comments to 80 columns as well
 3. comparison expressions require spaces around both sides of comparison 
operator
 4. successive parenthesis must have spaces between parenthesis
 5. git commit messages have limit of N characters per line (the value of N I 
am asking above).
 6. Use "whether condition" when describing a bool instead of "true if 
condition is true"
 7. derived values of structs -should- be prefixed with an underscore (this 
rule is loaded with exceptions in the code base from its evolution)
 8.  "indenting" is 3 spaces, except after switch where it is 0 (but after case 
it is 3).
 9.  open bracket on same line
 10. no white spaces at end of line
 11. functions for an extension must check if extension is supported and if not 
emit an INVALID_OPERATION message instead of relying on function table dispatch 
to guarantee they are not called.
 12. (Guessing here) consecutive empty lines are not allowed
 13. If changing a line that violates the nit rules, fix the line too rather 
than just adding the change

I suspect there are more as I listen to the nits, I think it might be 
beneficial to collect all the formatting nits and write them down to the coding 
standard thing in Mesa so that others can refer to it. Especially useful for 
those that work on multiple projects with different coding standards.

-Kevin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v3 PATCH 06/10] i965: Use _mesa_geometry_ functions appropriately

2015-05-21 Thread Rogovin, Kevin


>>  src/mesa/drivers/dri/i965/brw_misc_state.c | 10 +++---
> > src/mesa/drivers/dri/i965/brw_sf_state.c   |  8 

> As is this?

brw_misc_state.c is active for all Gens. The change to brw_sf_state.c  is to 
add a comment warning that using gl_framebuffer::Width and so on is only ok if 
_HasAttachments is false.

> BEGIN_BATCH(4);
> OUT_BATCH(_3DSTATE_DRAWING_RECTANGLE << 16 | (4 - 2));
> OUT_BATCH(0); /* xmin, ymin */
> -   OUT_BATCH(((ctx->DrawBuffer->Width - 1) & 0x) |
> -   ((ctx->DrawBuffer->Height - 1) << 16));
> +   OUT_BATCH(((fb_width - 1) & 0x) |
> +   ((fb_height - 1) << 16));

> Remove the tab on this line and indent it properly while we're changing it.

Actually it fits on one line, so that will be the change :)

> +   /* Accessing the fields Width and Height of
> +* gl_framebuffer to produce the values to
> +* program the viewport and scissor is fine
> +* as long as the gl_framebuffer has atleast
> +* one attachment.

> Line wrapping...

OK.

> struct brw_state_flags state = brw->state.pipelines[pipeline];
> +   int fb_samples = (int)_mesa_geometric_samples(ctx->DrawBuffer);

> The cast looks strange

Is there a spacing missing in the cast? or is it strange because of the types?


> These casts look weird. (Happens elsewhere in this patch too).
> Assuming brw_context::num_samples doesn't need to be signed, I'd be
> inclined to change its type to unsigned and remove the casts. Grepping
> for 'num_samples = ' shows some other instances of us implicitly
> converting unsigned <-> int.

I was hesitant to be changing types in the patch series. I wanted to minimize 
the changes and try to keep it consistent with what is left unchanged. If the 
types should be unsigned int, I can do that, but I would like to hear the 
opinions of others too.


> Extra new line.
OK.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v3 PATCH 03/10] mesa: Complete implementation for ARB_framebuffer_no_attachments in Mesa core

2015-05-21 Thread Rogovin, Kevin



> Extra space between . and "
Fix I will

> Extra space before . 
Fix I will

> Also, if anyone is ever grepping for MAX_FRAMEBUFFER_SAMPLES, they won't find 
> this. I'd move the whole word to the next line.
Ok.

> +   *
> +   * The same requirements are also in place for GL 4.5,
> +   * Section 9.4.1 "Framebuffer Attachment Completeness", pg 310-311
> +   *
> +   * However, this is a tighter restriction than previous version of GL.
> +   * In interest of better compatibility, we will not enforce these
> +   * restrictions.
> +   */

> Comment at the end of a block like this looks strange. I'm not sure what it's 
> commenting on.

The purpose of the comment block is to list what the extension and GL4.5 add to 
the requirements for FBO happiness.  The last sentence is writing out *now* 
that these added requirements, if enforced, risk breaking older applications 
and stating that the extra requirements will not be enforced. Would it be 
helpful at the top of the comment block to say 
"GL4.5/GL_ARB_framebuffer_no_attachments" added these requirements?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v3 PATCH 02/10] mesa:Define constants and functions for ARB_framebuffer_no_attachment extension

2015-05-21 Thread Rogovin, Kevin


> I'm happy to see new documentation, so thanks for writing it up!
> But let's separate this from the functional changes related to implementing 
> the extension. (Didn't I give this comment last time?)

If you did, I missed it. Unless there are objections, I'll remove this from the 
series and make a tiny patch later that is just the documentation.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/9] mesa: add helper convenience functions for fetching geometry of gl_framebuffer

2015-05-06 Thread Rogovin, Kevin
Sorry, one more question, which I should have asked before, but neglected to:

> If we go that route, we should probably just add a gl_framebuffer::Samples 
> field for uniformity.

If this is done, then there are two ways to get the value: from the Visual 
field and from the new field; Naturally, all drivers now use the field from 
Visual. If gl_framebuffer::Samples or potentially a better named 
gl_framebuffer::_Samples is added, then should the field be removed from Visual 
and all references in drivers to use the new field?

-Kevin


-Original Message-
From: Rogovin, Kevin 
Sent: Wednesday, May 06, 2015 1:00 PM
To: 'Ian Romanick'; 'mesa-...@freedesktop.org'
Subject: RE: [Mesa-dev] [PATCH 4/9] mesa: add helper convenience functions for 
fetching geometry of gl_framebuffer

One more question:
 for patch 2 of the series, there was the request to change the type of 
_HasAttachments from GLboolean to bool. If the helper functions "survive", 
should the helper functions return unsigned int instead of GLuint?

-Kevin

-Original Message-
From: Rogovin, Kevin 
Sent: Wednesday, May 06, 2015 10:28 AM
To: 'Ian Romanick'; mesa-...@freedesktop.org
Subject: RE: [Mesa-dev] [PATCH 4/9] mesa: add helper convenience functions for 
fetching geometry of gl_framebuffer


> I'm waffling about this a bit.  I'm concerned that people will use
> buffer->Width when they should use the accessor.  I don't see any
> changes to core Mesa code to use the accessor, so I'm a little concerned that 
> some subtle, incorrect behavior is introduced... but there may well not be.

> A common idiom in Mesa is to have an _Value field that is the derived value.  
> In many structures in mtypes.h you can see things like Enabled and 
> _ReallyEnabled.  One is what the API sets, and the other is what the driver 
> uses that is based on the API setting and "other factors."  In many ways, the 
> existing gl_framebuffer Width and Height fields are already derived values 
> based on the sizes of the attachments.

>For at least Width, Height, and MaxNumLayers, it seems better to set the 
>existing fields differently when _HasAttachments is true.  That reduces the 
>number of places where a person has to think about _HasAttachments 
>considerably... though perhaps there is something else that I'm not thinking 
>of?

I was thinking of doing this when I was making the patch. The problem is that 
Width, Height and MaxNumLayers were used for two different purposes:
 1. Specifying the intersection of the attached buffers (this affects blitting 
for example)  2. Specifying the "geometry" to send to a rasterizer

Changing Width and Height to be the geometry made me itch because Width and 
Height are zero when there are no buffer attachments. I added a mess of 
comments about the meaning of Width and Height. My concern is that other parts 
of code use Width and Height being 0 to do other things.  With this patch way, 
the meaning of Width and Height has not changed; instead a diver has to "know" 
what it is doing when it enables the extension.

I am not particular about this though; I made it this way in an attempt to 
reduce the scope of changes from this patch series. If people want that the 
meaning of Width, Height and MaxSamplers changes I can go with that too, I just 
think it can increase the ickiness of making sure no unintended side effects 
are added is ickier.


> If we go that route, we should probably just add a gl_framebuffer::Samples 
> field for uniformity.

> +
> +
> +static inline  GLuint
> +_mesa_geometric_height(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ?
> +  buffer->Height : buffer->DefaultGeometry.Height; }
> +
> +static inline  GLuint
> +_mesa_geometric_samples(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ?
> +  buffer->Visual.samples : buffer->DefaultGeometry.NumSamples;
> +}
> +
> +static inline  GLuint
> +_mesa_geometric_layers(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ?
> +  buffer->MaxNumLayers : buffer->DefaultGeometry.Layers; }
> +
>  extern void
>  _mesa_update_draw_buffer_bounds(struct gl_context *ctx);
>  
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 
> ef97538..f0e8fbc 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -3178,7 +3178,13 @@ struct gl_framebuffer
>  * the values in DefaultGeometry to initialize its
>  * viewport, scissor and so on (in particular _Xmin,
>  * _Xmax, _Ymin and _Ymax do NOT take into account
> -* _HasAttachments being false)
> +* _HasAttachments being false). To get the geometry
> +* of 

Re: [Mesa-dev] [PATCH 7/9] i965: ensure execution of fragment shader when fragment shader has atomic buffer access

2015-05-06 Thread Rogovin, Kevin
Hi,

> I think this check should be put in a utility function up in core Mesa 
> somewhere.  It's open-coded twice in this patch, and the check will change 
> when GL_ARB_image_load_store lands.


Would a check of the form:

inline bool
_mesa_has_atomic_ops(struct gl_context *ctx) 
{
   return ctx-> Shader._CurrentFragmentProgram != NULL &&
   ctx->Shader._CurrentFragmentProgram->NumAtomicBuffers > 0;
}

be good? 

I am hesitant to make a check that just tests for side effects since for some 
hardware (like Gen8 for example) there is not just a simple single flag to say 
"must execute frag shader", instead there are several such flags that "force to 
run fragment shader", but the flags by themselves take on other meanings; there 
was some discussion on flag things in the first posting of this patch with 
Curro and Kenneth.

-Kevin
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/9] mesa: add helper convenience functions for fetching geometry of gl_framebuffer

2015-05-06 Thread Rogovin, Kevin
One more question:
 for patch 2 of the series, there was the request to change the type of 
_HasAttachments from GLboolean to bool. If the helper functions "survive", 
should the helper functions return unsigned int instead of GLuint?

-Kevin

-Original Message-----
From: Rogovin, Kevin 
Sent: Wednesday, May 06, 2015 10:28 AM
To: 'Ian Romanick'; mesa-...@freedesktop.org
Subject: RE: [Mesa-dev] [PATCH 4/9] mesa: add helper convenience functions for 
fetching geometry of gl_framebuffer


> I'm waffling about this a bit.  I'm concerned that people will use
> buffer->Width when they should use the accessor.  I don't see any
> changes to core Mesa code to use the accessor, so I'm a little concerned that 
> some subtle, incorrect behavior is introduced... but there may well not be.

> A common idiom in Mesa is to have an _Value field that is the derived value.  
> In many structures in mtypes.h you can see things like Enabled and 
> _ReallyEnabled.  One is what the API sets, and the other is what the driver 
> uses that is based on the API setting and "other factors."  In many ways, the 
> existing gl_framebuffer Width and Height fields are already derived values 
> based on the sizes of the attachments.

>For at least Width, Height, and MaxNumLayers, it seems better to set the 
>existing fields differently when _HasAttachments is true.  That reduces the 
>number of places where a person has to think about _HasAttachments 
>considerably... though perhaps there is something else that I'm not thinking 
>of?

I was thinking of doing this when I was making the patch. The problem is that 
Width, Height and MaxNumLayers were used for two different purposes:
 1. Specifying the intersection of the attached buffers (this affects blitting 
for example)  2. Specifying the "geometry" to send to a rasterizer

Changing Width and Height to be the geometry made me itch because Width and 
Height are zero when there are no buffer attachments. I added a mess of 
comments about the meaning of Width and Height. My concern is that other parts 
of code use Width and Height being 0 to do other things.  With this patch way, 
the meaning of Width and Height has not changed; instead a diver has to "know" 
what it is doing when it enables the extension.

I am not particular about this though; I made it this way in an attempt to 
reduce the scope of changes from this patch series. If people want that the 
meaning of Width, Height and MaxSamplers changes I can go with that too, I just 
think it can increase the ickiness of making sure no unintended side effects 
are added is ickier.


> If we go that route, we should probably just add a gl_framebuffer::Samples 
> field for uniformity.

> +
> +
> +static inline  GLuint
> +_mesa_geometric_height(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ?
> +  buffer->Height : buffer->DefaultGeometry.Height; }
> +
> +static inline  GLuint
> +_mesa_geometric_samples(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ?
> +  buffer->Visual.samples : buffer->DefaultGeometry.NumSamples;
> +}
> +
> +static inline  GLuint
> +_mesa_geometric_layers(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ?
> +  buffer->MaxNumLayers : buffer->DefaultGeometry.Layers; }
> +
>  extern void
>  _mesa_update_draw_buffer_bounds(struct gl_context *ctx);
>  
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 
> ef97538..f0e8fbc 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -3178,7 +3178,13 @@ struct gl_framebuffer
>  * the values in DefaultGeometry to initialize its
>  * viewport, scissor and so on (in particular _Xmin,
>  * _Xmax, _Ymin and _Ymax do NOT take into account
> -* _HasAttachments being false)
> +* _HasAttachments being false). To get the geometry
> +* of the framebuffer, the  helper functions
> +*   _mesa_geometric_width(),
> +*   _mesa_geometric_height(),
> +*   _mesa_geometric_samples(),
> +*   _mesa_geometric_layers()
> +* are available that check _HasAttachments.
>  */
> GLboolean _HasAttachments;
>  
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/9] mesa:Define constants and functions for GL_ARB_framebuffer_no_attachment extension

2015-05-06 Thread Rogovin, Kevin

> You haven't been running 'make check'. :) You also need to update 
> src/mesa/tests/dispatch_sanity.cpp.

There is something wrong with my box or something... I did run make check and 
there were no failures. Out of paranoia, I also ran 
src/mesa/main/tests/main-test explicitly and there were no failures there 
either. When I look at src/mesa/tests/dispatch_sanity.cpp, the functions for 
ARB_framebuffer_no_attachments in the table are commented out. When I remove 
the comments from them, then the test fails (DispatchSanity_test.GL31_CORE).

What am I missing? I am utterly confused now about 
src/mesa/tests/dispatch_sanity.cpp.

-Kevin


On 04/29/2015 01:56 AM, kevin.rogo...@intel.com wrote:
> From: Kevin Rogovin 
> 
> Define the enumeration constants, function entry points and glGet for 
> the GL_ARB_framebuffer_no_attachments
> 
> ---
>  .../glapi/gen/ARB_framebuffer_no_attachments.xml   | 33 +++
>  src/mapi/glapi/gen/Makefile.am |  1 +
>  src/mapi/glapi/gen/gl_API.xml  |  4 ++-
>  src/mesa/main/fbobject.c   | 12 +++
>  src/mesa/main/fbobject.h   |  7 
>  src/mesa/main/get.c|  3 ++
>  src/mesa/main/get_hash_params.py   | 38 
> ++
>  7 files changed, 97 insertions(+), 1 deletion(-)  create mode 100644 
> src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml
> 
> diff --git a/src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml 
> b/src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml
> new file mode 100644
> index 000..5bde14d
> --- /dev/null
> +++ b/src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml
> @@ -0,0 +1,33 @@
> +
> +
> +
> +
> +
> +
> +
> +  +name="FRAMEBUFFER_DEFAULT_HEIGHT" value="0x9311" />  +name="FRAMEBUFFER_DEFAULT_LAYERS" value="0x9312" />  +name="FRAMEBUFFER_DEFAULT_SAMPLES" value="0x9313" />  +name="FRAMEBUFFER_DEFAULT_FIXED_SAMPLE_LOCATIONS" value="0x9314" /> 
> +  +name="MAX_FRAMEBUFFER_HEIGHT" value="0x9316" />  +name="MAX_FRAMEBUFFER_LAYERS" value="0x9317" />  +name="MAX_FRAMEBUFFER_SAMPLES" value="0x9318" />
> +
> +
> +

Since it looks like your patches will land before my ES3.1 patches, I'll have 
to update these with es2="3.1".  This is mostly to help me remember.

> +
> +
> + 
> +
> +
> +
> +
> + 
> +
> +
> +
> +
> diff --git a/src/mapi/glapi/gen/Makefile.am 
> b/src/mapi/glapi/gen/Makefile.am index 1c4b86a..dc61739 100644
> --- a/src/mapi/glapi/gen/Makefile.am
> +++ b/src/mapi/glapi/gen/Makefile.am
> @@ -129,6 +129,7 @@ API_XML = \
>   ARB_draw_instanced.xml \
>   ARB_ES2_compatibility.xml \
>   ARB_ES3_compatibility.xml \
> + ARB_framebuffer_no_attachments.xml \
>   ARB_framebuffer_object.xml \
>   ARB_geometry_shader4.xml \
>   ARB_get_program_binary.xml \
> diff --git a/src/mapi/glapi/gen/gl_API.xml 
> b/src/mapi/glapi/gen/gl_API.xml index a8a6db6..91882ef 100644
> --- a/src/mapi/glapi/gen/gl_API.xml
> +++ b/src/mapi/glapi/gen/gl_API.xml
> @@ -8324,7 +8324,9 @@
>
>  
> -
> + +xmlns:xi="http://www.w3.org/2001/XInclude"/>
> +
> +
>  
>  
>   
> diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c index 
> eabbb96..5c78c40 100644
> --- a/src/mesa/main/fbobject.c
> +++ b/src/mesa/main/fbobject.c
> @@ -1292,6 +1292,18 @@ _mesa_BindRenderbufferEXT(GLenum target, GLuint 
> renderbuffer)
> bind_renderbuffer(target, renderbuffer, true);  }
>  
> +extern void GLAPIENTRY
> +_mesa_FramebufferParameteri(GLenum target, GLenum pname, GLint param) 
> +{
> +  /* to be implemented */
> +}
> +
> +extern void GLAPIENTRY
> +_mesa_GetFramebufferParameteriv(GLenum target, GLenum pname, GLint 
> +*params) {
> +  /* to be implemented */
> +}
> +
>  
>  /**
>   * Remove the specified renderbuffer or texture from any attachment 
> point in diff --git a/src/mesa/main/fbobject.h 
> b/src/mesa/main/fbobject.h index 61aa1f5..76adb92 100644
> --- a/src/mesa/main/fbobject.h
> +++ b/src/mesa/main/fbobject.h
> @@ -211,4 +211,11 @@ extern void GLAPIENTRY  
> _mesa_DiscardFramebufferEXT(GLenum target, GLsizei numAttachments,
>  const GLenum *attachments);
>  
> +
> +extern void GLAPIENTRY
> +_mesa_FramebufferParameteri(GLenum target, GLenum pname, GLint 
> +param);
> +
> +extern void GLAPIENTRY
> +_mesa_GetFramebufferParameteriv(GLenum target, GLenum pname, GLint 
> +*params);
> +
>  #endif /* FBOBJECT_H */
> diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index 
> a881bc5..ca9d13c 100644
> --- a/src/mesa/main/get.c
> +++ b/src/mesa/main/get.c
> @@ -393,6 +393,7 @@ EXTRA_EXT(INTEL_performance_query);
>  EXTRA_EXT(ARB_explicit_uniform_location);
>  EXTRA_EXT(ARB_clip_control);
>  EXTRA_EXT(EXT_polygon_offset_clamp);
> +EXTRA_EXT(ARB_framebuffer_no_attachments);
>  
>  static const int
>  extra_ARB_color_buffer_float_or_glcore[] = { @@ -466,6 +467,8 @@ 
> static const int extra_core_ARB_color_buffer_float_and_new

Re: [Mesa-dev] [PATCH 4/9] mesa: add helper convenience functions for fetching geometry of gl_framebuffer

2015-05-06 Thread Rogovin, Kevin

> I'm waffling about this a bit.  I'm concerned that people will use
> buffer->Width when they should use the accessor.  I don't see any
> changes to core Mesa code to use the accessor, so I'm a little concerned that 
> some subtle, incorrect behavior is introduced... but there may well not be.

> A common idiom in Mesa is to have an _Value field that is the derived value.  
> In many structures in mtypes.h you can see things like Enabled and 
> _ReallyEnabled.  One is what the API sets, and the other is what the driver 
> uses that is based on the API setting and "other factors."  In many ways, the 
> existing gl_framebuffer Width and Height fields are already derived values 
> based on the sizes of the attachments.

>For at least Width, Height, and MaxNumLayers, it seems better to set the 
>existing fields differently when _HasAttachments is true.  That reduces the 
>number of places where a person has to think about _HasAttachments 
>considerably... though perhaps there is something else that I'm not thinking 
>of?

I was thinking of doing this when I was making the patch. The problem is that 
Width, Height and MaxNumLayers were used for two different purposes:
 1. Specifying the intersection of the attached buffers (this affects blitting 
for example)
 2. Specifying the "geometry" to send to a rasterizer

Changing Width and Height to be the geometry made me itch because Width and 
Height are zero when there are no buffer attachments. I added a mess of 
comments about the meaning of Width and Height. My concern is that other parts 
of code use Width and Height being 0 to do other things.  With this patch way, 
the meaning of Width and Height has not changed; instead a diver has to "know" 
what it is doing when it enables the extension.

I am not particular about this though; I made it this way in an attempt to 
reduce the scope of changes from this patch series. If people want that the 
meaning of Width, Height and MaxSamplers changes I can go with that too, I just 
think it can increase the ickiness of making sure no unintended side effects 
are added is ickier.


> If we go that route, we should probably just add a gl_framebuffer::Samples 
> field for uniformity.

> +
> +
> +static inline  GLuint
> +_mesa_geometric_height(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ?
> +  buffer->Height : buffer->DefaultGeometry.Height; }
> +
> +static inline  GLuint
> +_mesa_geometric_samples(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ?
> +  buffer->Visual.samples : buffer->DefaultGeometry.NumSamples;
> +}
> +
> +static inline  GLuint
> +_mesa_geometric_layers(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ?
> +  buffer->MaxNumLayers : buffer->DefaultGeometry.Layers; }
> +
>  extern void
>  _mesa_update_draw_buffer_bounds(struct gl_context *ctx);
>  
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 
> ef97538..f0e8fbc 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -3178,7 +3178,13 @@ struct gl_framebuffer
>  * the values in DefaultGeometry to initialize its
>  * viewport, scissor and so on (in particular _Xmin,
>  * _Xmax, _Ymin and _Ymax do NOT take into account
> -* _HasAttachments being false)
> +* _HasAttachments being false). To get the geometry
> +* of the framebuffer, the  helper functions
> +*   _mesa_geometric_width(),
> +*   _mesa_geometric_height(),
> +*   _mesa_geometric_samples(),
> +*   _mesa_geometric_layers()
> +* are available that check _HasAttachments.
>  */
> GLboolean _HasAttachments;
>  
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] mesa: Complete implementation for GL_ARB_framebuffer_no_attachments in Mesa core

2015-05-06 Thread Rogovin, Kevin
HI,


> For both this and get_framebuffer_parameteriv below, I don't see the value in 
> splitting the implementations.  Also, these functions need to check that the 
> extension is enabled and generate  GL_INVALID_OPERATION if it is not.

No worries, I can add the GL_INVALID_OPERATION bit in. I have a question:  
these functions (with the patch) are only called from the _mesa_ functions, 
which are what gl functions are. If the extension is not on, does 
glxGetProcAddress/eglGetProcAccress able to return these functions? I ask 
because I thought that if the extension was missing, then the functions cannot 
be called by the application. 

By split do you mean that the GL API function calls the static function? I put 
this split in so that when the DSA functions are made, they can call the static 
functions. Would you like me to kill that split still?

> +{
> +   switch (pname) {
> +   case GL_FRAMEBUFFER_DEFAULT_WIDTH:
> +  if (param < 0 || param > ctx->Const.MaxFramebufferWidth)
> +_mesa_error(ctx, GL_INVALID_VALUE, "%s", func);

> It looks like your indention is off by one here and elsewhere in the switch.

Icks. Sorry about that.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/9] i965: Use _mesa_geometry_ functions appropriately

2015-04-29 Thread Rogovin, Kevin
Hi,

 One comment on the code, or rather a request to the reviewer of the code. The 
icky part of checking this patch is correct is checking that the remaining 
instances of gl_framebuffer::Width, Height, MaxNumLayers and Visaul.samples are 
"correct". I believe I "got 'em all", of those that should be changed and those 
changed should be changed, but the part of checking if the remaining unchanged 
bits is not seen from the raw diff. Also, the exchange I had with Topi 
previously, what I was really after was just this, to make sure this gets 
checked and I got terribly, embarrassingly sidetracked by to split and include 
or not include work for a particular Gen4/5 atom only atom. My sincere 
apologies for all the traffic and me making, by my actions, the community and 
development process worse.

 -Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] mesa: Complete implementation for GL_ARB_framebuffer_no_attachments in Mesa core

2015-04-29 Thread Rogovin, Kevin
I just want to make a begging on the review for this patch: I am a touch 
paranoid
about how the thing will act under the GL ES situation; I believe it should 
follow
the spec, but if whoever reviews does the extra leg work of checking that I got 
this
right, I'd really appreciate it.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 9/9] mark GL_ARB_framebuffer_no_attachments as done for i965

2015-04-29 Thread Rogovin, Kevin
Hi,

> I'd just go read the ES 3.1 spec and see if there are any differences in this 
> area.

I checked the spec, and it appears to me to have the same behavior as 
GL_ARB_framebuffer_no_attachments.

> Also, please fix your mail client to stop its weird line wrapping (and the 
> other half of the time sending HTML mail).

Fingers crossed that the settings are correct.

 -Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 9/9] mark GL_ARB_framebuffer_no_attachments as done for i965

2015-04-29 Thread Rogovin, Kevin
> When you rebase this, I'd appreciate it if you could insert it into
> the list in alphabetical order. (You based this on a commit where a
> bunch of the later additions were already not inserted alphabetically,
> but I've recently fixed that up.)

Sure, no worries. Given that I am just changing the word "not started" to "DONE 
(i965)", it is a no-op really.

However, I'd really appreciate any advice on the GL ES situation, basically if 
more say mark as done in ES I will, if less say I won't.

I also would really like to hear opinions on GL ES interaction on Patch 3.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 9/9] mark GL_ARB_framebuffer_no_attachments as done for i965

2015-04-29 Thread Rogovin, Kevin
> At the bottom is another block with GLES 3.1 requirements, which also  
> contains GL_ARB_f_n_a.

At first, I said "Oh futz, I did not mark that one". Then I did some thinking. 
Before expressing
my thoughts I want to emphasize that I really do not know what is the best 
answer, or
potentially even a good one. With the caveat in mind, maybe it is best that it 
is not marked
as ready for GL ES. Maybe. The potentially flawed reasoning is as follows:

The patch series enables the extension for OpenGL and the extension stuff, I 
think, only
applies to GL, and not GL ES, contexts. In particular, the "wiring" for the new 
functions is
not implemented for GL ES.

Secondly, the code in Mesa core from patch 3 of this series just checks for the 
driver
reporting it supports the extension; it does not do the check "if driver 
supports extension
AND  it is GL". However, the default geometry values are zero, so the correct 
error
code gets emitted anyways, but the text-error is not correct.  This part makes 
me twitch,
though it follows the letter of the specs jut tine.

Lastly, and this is just paranoia and quite likely not an issue. I had not 
checked, much less
quote, the relevant sections of the GL ES 3.1 spec. I think they are the same, 
but I did
not check it at all.

As one can see from the above, when implementing this, GLES did not even cross 
my mind.

My tendency is to mark the "enabling" of this feature for ES3.1 a separate 
patch, but on
the other hand it is not code, it is just saying the feature is ready for 
eventual use by ES3.1.
Now in mid e-mail, I am thinking it should be marked. Beats me what is best.

In truth what is making me twitch is Patch 3 under GL ES 3.1. I think that the 
function
might need to be revisited a little when GL ES 3.1 support goes live.

Thoughts on this are welcome, particular in reference to Patch 3 of the series.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/7] i965: use _mesa_geometry_width/height/layers/samples for programming geometry of framebuffer to GEN

2015-04-28 Thread Rogovin, Kevin
Hello,


> No, because the non-shared code is (by your own admission) untested and/or 
> dead code.  Untested code is broken code.  I would personally be ok with a 
> lot > of the changes that just replace fb->Width with
> _mesa_geometric_width(fb) since it's effectively just replacing a direct 
> access with a getter.  However, almost half of the patch is updating the 
> upload_sf_vp > function which is only used for gen <= 5.  A comment or assert 
> there would be sufficient rather than reworking it.

Fair enough. Would the following be good:
 - keep all those that replace fb->whatever with _mesa_geomety_whatever,
 - instead of the ick I have done to upload_sf_vp, place a big comment warning

I would be happy with the above  as it addresses my main concern and the 
dead-is-broken code concern as well. If I had physical access to a Gen4 and 5 
box I would test it and if it worked, enable the extension on those platforms 
as well.

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/7] i965: use _mesa_geometry_width/height/layers/samples for programming geometry of framebuffer to GEN

2015-04-28 Thread Rogovin, Kevin

> I read the patch again and I'm still in the opinion that the changes to the 
> pure pre-gen7 logic (i.e., logic that is not re-used for later gens) are not 
> needed.

As I have tried and apparently failed to communicate, it is -better- and more 
consistent. Need 
is a far stronger word. Without a doubt, if the extension is never enabled for 
those older
Gens, then it does not matter in terms of produced output. However, I stated 
that it leaves 
a trap and an inconsistency which I find quite bothering.

> The shared logic between pre-gen7 and later, namely setup for renderbuffers, 
> drawing rectangle and 
> fragment shader compilation key are safe to do as they only introduce new 
> logic that is conditional to 
> no-attachments being used.

And that is exactly for the case for that code that is not shared. Indeed, if 
the shared code is safe 
for pre-Gen7, then so is the non-shared code. 

> Your concern about the readers getting confused could be also addressed with 
> assert(brw->gen >= 7) 
>  and a comment saying that the no-attachment specific path is not applicable 
> for older gens.

There is only one occurrence of "no-attachment specific code paths" in these 
i965 patches
and that is associated to scissoring.  The rest is existing code is changed 
from accessing Width, 
Height of gl_framebuffer to "getting" those values from a function. There is no 
proper place
to insert an assert(brw->gen >=7 ), since, with the exception of the scissoring 
(and it is just
one if block) there is no such "no attachment code path". I had thought the 
diffs of the series
made that quite clear.

> And when it comes to the pure pre-gen7 logic, I, in fact, have just the 
> opposite opinion on making it to go through the no-attachment-aware path.
> As the extension is not possible for older gens, I find it clearer that logic 
> explicitly by-passes such paths that even consider it.

Um, I am pretty sure than pre Gen7 hardware can do the extension. The crux is 
that the extension
is pointless for such hardware because pre Gen7 hardware does not (AFAIK) have 
a feature that
allows for a fragment shader to have a side effect. Even that statement is not 
totally true. Indeed,
one can argue performance queries and occlusion queries with 
framebuffer_no_attachments 
make some form of sense (it would give an application a count of sorts). 

-Kevin




___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/7] Define constants and functions for ARB_framebuffer_no_attachment extension

2015-04-28 Thread Rogovin, Kevin
Hello,

> H that's surprising.
>
> src/mesa/main/tests/dispatch_sanity.cpp:// {
>"glFramebufferParameteri", 43, -1 },   // XXX: Add to xml
>
>I thought it should detect that there's a new API and complain loudly.
> At least that's how I remembered it working, but that doesn't seem to be the 
> case?
> Are you sure you had a clean build? Either way, those should probably get 
> uncommented, 

The reason why there was no issue is because the entries for both 
glFramebufferParameteri 
and glGetFramebufferParameteriv are already commented out. Looking at the 
commit log, 
they have been commented out for quite some time. The commit from Jordan Justen
(dated Oct 24,2012) titled "dispatch sanity test: Add GL CORE 3.1 test" has the 
line added 
but commented out. In fact, looking at the commit log, with git log -p the only 
reference 
to those functions has both commented out.

> and there are probably interactions with ARB_dsa as well, should probably 
> figure out if you or Laura should add support for that (or perhaps you had it 
> in your patches already).

The extension ARB_framebuffer_no_attachments does NOT define the DSA style 
functions 
for ARB_direct_state_access. Instead, it only defines for the 
EXT_direct_state_access. I think 
that the implementers of GL_ARB_direct_state_access are the ones that need to  
define and 
implement gl[Get]NamedFraembufferParamteri. I wrote the patch 3 so that it is 
trivial to
implement the DSA function though.

-Kevin

On Fri, Apr 24, 2015 at 11:06 AM, Rogovin, Kevin  
wrote:
> Hi,
>
>  I agree with the comments about the code (and when the last element of the 
> series is reviewed I will submit the series with review comments taken into 
> use), but when I applied just Patch 1 and Patch 2, and ran 
> src/mesa/main/tests/main-test (after a git clean -dfx and all that cleaning) 
> all test pass, in particular the 4 DispatchSanity_test's:  
> DispatchSanity_test.GL31_CORE ,  DispatchSanity_test.GLES11,  
> DispatchSanity_test.GLES2 and  DispatchSanity_test.GLES3. In addition, make 
> check passes all test as well. If you are referring to another test, what 
> test is that?
>
>  -Kevin
>
> -Original Message-
> From: ibmir...@gmail.com [mailto:ibmir...@gmail.com] On Behalf Of Ilia 
> Mirkin
> Sent: Friday, April 24, 2015 4:36 PM
> To: Matt Turner
> Cc: Rogovin, Kevin; mesa-...@freedesktop.org
> Subject: Re: [Mesa-dev] [PATCH 2/7] Define constants and functions for 
> ARB_framebuffer_no_attachment extension
>
> This change will make the dispatch_sanity test fail.
>
> On Fri, Apr 24, 2015 at 3:05 AM, Matt Turner  wrote:
>> The subject should be prefixed with "mesa:"
>>
>> On Thu, Apr 23, 2015 at 11:59 PM,   wrote:
>>> From: Kevin Rogovin 
>>>
>>> Define enumerations, functions and associated glGet's for extension 
>>> ARB_framebuffer_no_attachment.
>>>
>>> ---
>>>  .../glapi/gen/ARB_framebuffer_no_attachments.xml   | 33 ++
>>>  src/mapi/glapi/gen/Makefile.am |  1 +
>>>  src/mapi/glapi/gen/gl_API.xml  |  1 +
>>>  src/mesa/main/fbobject.c   | 12 +++
>>>  src/mesa/main/fbobject.h   |  7 
>>>  src/mesa/main/get.c|  3 ++
>>>  src/mesa/main/get_hash_params.py   | 40 
>>> ++
>>>  7 files changed, 97 insertions(+)
>>>  create mode 100644
>>> src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml
>>>
>>> diff --git a/src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml
>>> b/src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml
>>> new file mode 100644
>>> index 000..60e40d0
>>> --- /dev/null
>>> +++ b/src/mapi/glapi/gen/ARB_framebuffer_no_attachments.xml
>>> @@ -0,0 +1,33 @@
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> +
>>> + >> +name="FRAMEBUFFER_DEFAULT_HEIGHT" value="0x9311" /> >> +name="FRAMEBUFFER_DEFAULT_LAYERS" value="0x9312" /> >> +name="FRAMEBUFFER_DEFAULT_SAMPLES" value="0x9313" /> >> +name="FRAMEBUFFER_DEFAULT_FIXED_SAMPLE_LOCATIONS" value="0x9314" /> 
>>> + >> +name="MAX_FRAMEBUFFER_HEIGHT" value="0x9316" /> >> +name="MAX_FRAMEBUFFER_LAYERS" value="0x9317" /> >> +name="MAX_FRAMEBUFFER_SAMPLES" value="0x9318" />
>>> +
>>> +
>>> +
>>> +

Re: [Mesa-dev] [PATCH 4/7] helper-conveniance functions for drivers to implement ARB_framebuffer_no_attachment

2015-04-28 Thread Rogovin, Kevin
Hi, 

> This is in fact two changes: introduction of the helpers and refactoring of 
> the intersection code to be used with caller provided bounding box.

Is this a request to change the commit message or to split this as well? 
I think splitting it is silly, but if it make you happy so be it; the reason 
being
that there are no lurking surprises if it split or inconsistencies introduced
by such a split.

> 
> ---
>  src/mesa/main/framebuffer.c | 49 
> ++---
>  src/mesa/main/framebuffer.h | 29 +++
>  src/mesa/main/mtypes.h  | 21 ++-
>  3 files changed, 74 insertions(+), 25 deletions(-)
> 
> diff --git a/src/mesa/main/framebuffer.c b/src/mesa/main/framebuffer.c 
> index 4e4d896..7d8921b 100644
> --- a/src/mesa/main/framebuffer.c
> +++ b/src/mesa/main/framebuffer.c
> @@ -357,30 +357,20 @@ update_framebuffer_size(struct gl_context *ctx, 
> struct gl_framebuffer *fb)  }
>  
>  
> +
>  /**
> - * Calculate the inclusive bounding box for the scissor of a specific 
> viewport
> + * Given a bounding box, intersect the bounding box with the scirros 
> + of
> + * a specified vieport.
>   *
>   * \param ctx GL context.
> - * \param buffer  Framebuffer to be checked against
>   * \param idx Index of the desired viewport
>   * \param bboxBounding box for the scissored viewport.  Stored as xmin,
>   *xmax, ymin, ymax.
> - *
> - * \warning This function assumes that the framebuffer dimensions are 
> up to
> - * date (e.g., update_framebuffer_size has been recently called on \c 
> buffer).
> - *
> - * \sa _mesa_clip_to_region
>   */
> -void
> -_mesa_scissor_bounding_box(const struct gl_context *ctx,
> -   const struct gl_framebuffer *buffer,
> -   unsigned idx, int *bbox)
> +extern void
> +_mesa_intersect_scissor_bounding_box(const struct gl_context *ctx,
> + unsigned idx, int *bbox)
>  {
> -   bbox[0] = 0;
> -   bbox[2] = 0;
> -   bbox[1] = buffer->Width;
> -   bbox[3] = buffer->Height;
> -
> if (ctx->Scissor.EnableFlags & (1u << idx)) {
>if (ctx->Scissor.ScissorArray[idx].X > bbox[0]) {
>   bbox[0] = ctx->Scissor.ScissorArray[idx].X; @@ -402,6 
> +392,33 @@ _mesa_scissor_bounding_box(const struct gl_context *ctx,
>   bbox[2] = bbox[3];
>}
> }
> +}
> +
> +/**
> + * Calculate the inclusive bounding box for the scissor of a specific 
> +viewport
> + *
> + * \param ctx GL context.
> + * \param buffer  Framebuffer to be checked against
> + * \param idx Index of the desired viewport
> + * \param bboxBounding box for the scissored viewport.  Stored as xmin,
> + *xmax, ymin, ymax.
> + *
> + * \warning This function assumes that the framebuffer dimensions are 
> +up to
> + * date (e.g., update_framebuffer_size has been recently called on \c 
> buffer).
> + *
> + * \sa _mesa_clip_to_region
> + */
> +void
> +_mesa_scissor_bounding_box(const struct gl_context *ctx,
> +   const struct gl_framebuffer *buffer,
> +   unsigned idx, int *bbox) {
> +   bbox[0] = 0;
> +   bbox[2] = 0;
> +   bbox[1] = buffer->Width;
> +   bbox[3] = buffer->Height;
> +
> +   _mesa_intersect_scissor_bounding_box(ctx, idx, bbox);
>  
> assert(bbox[0] <= bbox[1]);
> assert(bbox[2] <= bbox[3]);
> diff --git a/src/mesa/main/framebuffer.h b/src/mesa/main/framebuffer.h 
> index a427421..8b84d26 100644
> --- a/src/mesa/main/framebuffer.h
> +++ b/src/mesa/main/framebuffer.h
> @@ -76,6 +76,35 @@ _mesa_scissor_bounding_box(const struct gl_context *ctx,
> const struct gl_framebuffer *buffer,
> unsigned idx, int *bbox);
>  
> +extern void
> +_mesa_intersect_scissor_bounding_box(const struct gl_context *ctx,
> + unsigned idx, int *bbox);
> +
> +static inline  GLuint
> +_mesa_geometric_width(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ? buffer->Width : 
> +buffer->DefaultGeometry.Width; }
> +
> +
> +static inline  GLuint
> +_mesa_geometric_height(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ? buffer->Height : 
> +buffer->DefaultGeometry.Height; }
> +
> +static inline  GLuint
> +_mesa_geometric_samples(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ? buffer->Visual.samples : 
> +buffer->DefaultGeometry.NumSamples;
> +}
> +
> +static inline  GLuint
> +_mesa_geometric_layers(const struct gl_framebuffer *buffer) {
> +   return buffer->_HasAttachments ? buffer->MaxNumLayers : 
> +buffer->DefaultGeometry.Layers; }
> +
>  extern void
>  _mesa_update_draw_buffer_bounds(struct gl_context *ctx);
>  
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 
> 38a3817..ac7cdb6 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -3134,13 +3134,13 @@ struct g

Re: [Mesa-dev] [PATCH 5/7] i965: use _mesa_geometry_width/height/layers/samples for programming geometry of framebuffer to GEN

2015-04-28 Thread Rogovin, Kevin
Hi, 

>A couple of us chatted about this, and we all agreed that it's probably not 
>useful to 
>enable ARB_framebuffer_no_attachments on pre-Gen7.  We don't expose atomics, 
>SSBOs, 
> or image load/store on those platforms (and probably won't), so the 
> only way fragment shaders can output any data is via framebuffer writes.

> So I'd be inclined to just not touch the pre-Gen7 code at all.

There are a number of issues lurking. 

Firstly, there are atoms from Gen4 (for example brw_drawing_rect) 
which are used all the way to Gen8.  Therefore, Gen4 code must be 
changed. At this point then one can advocate to only change those 
atoms that are active in Gen7 and Gen8. If that is done, then there 
is an style inconsistency where some atoms , for Gen4,5 and 6 use 
the new functions and some do not. That is just icky in my opinion,
as it  leads to the awkward question "why? is there something 
delicate here?" Compounding the style issue is that the code
should -convey- what it being done to the reader what is going on.
Using the functions makes the convey more trivial for the reader
to know.

Secondly, not using those functions for Gen4,5  and 6 leaves the code 
with a trap for  later. Namely that trap if someone says "you know 
although the extension is silly for Gens 4,5 and 6, it still is trivial to 
enable". I hate leaving traps behind for others.

Thirdly, there is the real meat of reviewing patch 6: a good review 
of that patch will make sure that any remaining references to 
gl_framebuffer::Width, Height, and so on are correct (i.e. the code 
requires the dimension of the backing store and not the geometry).
That checking is made easier, more mechanical if -all- of i965 is made 
consistent in this fashion, for otherwise the checker has more to check 
(i.e. instead of is this for setting rasterizer it is setting rasterizer and 
active for Gen7 and 8). 

When I was making these changes to i965 I was also tempted
to add a functions of the sort "_mesa_buffer_width", that
returned gl_fraembufffer::Width regardless of the value
of gl_framebuffer::_HasAttachments. The reason why I was
tempted was to help (via lovely grep) to make patch 6 and 
the reviewing of it more mechanical and easier. but I chose 
not to since the meaning of the fields from gl_framebuffer
became the exact meaning of those fields.  That third 
reason is the one reason that patch 6 should make people
itch (in my opinion): "where all references hit?" Checking
that require more than just checking the diff, that requires
knowing all the references to gl_framebuffer::Width and
friends, knowing the use there and then checking if the
patch does or DOES NOT change that code block 
appropriately. 

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/7] i965: use _mesa_geometry_width/height/layers/samples for programming geometry of framebuffer to GEN

2015-04-27 Thread Rogovin, Kevin
Why make it two separate patches even? Seems silly (not to mention more work 
really). I guess the issue is the commit message portion saying "for 
framebuffer_no_attachements".. perhaps I just change the commit to "use new 
geometry function to indicate want geometry not buffer thingies".
 -Kevin

-Original Message-
From: Pohjolainen, Topi 
Sent: Monday, April 27, 2015 10:43 AM
To: Rogovin, Kevin
Cc: mesa-...@freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 5/7] i965: use 
_mesa_geometry_width/height/layers/samples for programming geometry of 
framebuffer to GEN

On Fri, Apr 24, 2015 at 11:36:27PM +0300, Rogovin, Kevin wrote:
> I want to add one more comment on why to replace with the _mesa_geometry_ 
> functions, which I had thought was so obvious I neglected to mention it:
> 
> With this series the meaning of gl_framebuffer::Width, Height, and so on have 
> a different meaning. They give the intersection of the backing stores of the 
> render targets. In contrast, the _mesa_geometry_* functions give the geometry 
> to feed a rasterizer/windower. By using _mesa_geometry_* functions the code 
> communicates clearly it wants the geometry to feed windower/rasterizer and 
> not the geometry of the intersection of the (potentially empty) set of 
> backing stores. Moreover, it is better to be consistent as well, as later 
> someone will likely wonder: "why in Gen7 and higher are those _mesa_geometry 
> functions used but not before?" That question has no good answer because it 
> does not make sense to not use those functions when programming the 
> rasterizer/windower thingies.
> 

Could we split the patch in two? One part dealing with the necessary needed for 
gen7 and higher to support no_attachments and then another just making older 
gens consistent.

> -Kevin
> 
> -Original Message-
> From: Rogovin, Kevin
> Sent: Friday, April 24, 2015 7:43 PM
> To: Pohjolainen, Topi
> Cc: mesa-...@freedesktop.org
> Subject: RE: [Mesa-dev] [PATCH 5/7] i965: use 
> _mesa_geometry_width/height/layers/samples for programming geometry of 
> framebuffer to GEN
> 
> 
> > My point specifically was that you are also updating atoms that _are not_ 
> > re-used. 
> > And as those changes are not really needed, I wouldn't take the risk 
> > of changing something in vain. I would introduce them only when you have 
> > patches to really enable older generations.
> 
> My take is the following:
> 
>  1. Tracking (and guaranteeing) that those function left unchanged as is are 
> exactly just those for before Gen7 is a pain. Much easier, and more reliable 
> to hit them all instead. A significant number of functions in i965 are not 
> emit functions of any atom but emit functions of atoms map to them. Again, 
> more reliable and -safer- to change them all, then just the bare minimum. 
> 
> 2. The change is benign. If _HasAttachments is true, then the function 
> substitution gives the same value. For Gens not supporting the extension 
> there is no effect.
> 
> 3. Lastly, as stated: for later it leaves the option to enable it for Gen6 
> and below, it is just trivial change, but it needs testing on hardware.
> 
> When I writing this work, I originally had it for all Gens, but changed to 
> support only Gen7and higher because that is all on which I can test it. 
> 
> -Kevin
>  
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/7] i965: ensure execution of fragment shader when fragment shader has atomic buffer access

2015-04-24 Thread Rogovin, Kevin
One more comment, that I neglected to add: there are other checks for 
_CurrentFragmentProgram to be non-NULL, indeed function 
brw_upload_wm_abo_surface() [file brw_wm_surface_state.c], also 
has a check for it being non-NULL. That function is the emit for
the atom brw_wm_abo_surfaces which is present in both gen7_atoms
and gen8_atoms.

I would argue that _CurrentFragmentProgram can be NULL, given
that other places check for it and that without the check piglit gets
about 30 more crashes.

Sorry for not posting this in the first reply.

 -Kevin

-Original Message-
From: Rogovin, Kevin 
Sent: Friday, April 24, 2015 11:22 PM
To: 'Kenneth Graunke'; mesa-dev@lists.freedesktop.org
Cc: mesa-...@freedesktop.org; curroje...@riseup.net
Subject: RE: [Mesa-dev] [PATCH 6/7] i965: ensure execution of fragment shader 
when fragment shader has atomic buffer access



> Checking brw->ctx.Shader._CurrentFragmentProgram != NULL is unnecessary.
> There is always a valid pixel shader.  (If the application is using 
> fixed-function, we supply a fragment shader for them.)  Please drop that 
> check.

Without this check(in the Gen7 function/code), about 30 crashes are induced on 
piglit tests for Gen7; the tests are all using GL fixed function pipeline. I 
have not run piglit without this check on Gen8 though.

> I thought that UAVs were essentially for Images...I'm not clear why this is 
> needed.  Perhaps Curro can confirm one way or another.

The essential reason is to guarantee that the pixel shader gets invoked by Gen 
even when all render target surfaces are NULL surfaces. There are other flags 
one can use, but the UAV seems (to me) the most natural.
 
-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/7] i965: use _mesa_geometry_width/height/layers/samples for programming geometry of framebuffer to GEN

2015-04-24 Thread Rogovin, Kevin
I want to add one more comment on why to replace with the _mesa_geometry_ 
functions, which I had thought was so obvious I neglected to mention it:

With this series the meaning of gl_framebuffer::Width, Height, and so on have a 
different meaning. They give the intersection of the backing stores of the 
render targets. In contrast, the _mesa_geometry_* functions give the geometry 
to feed a rasterizer/windower. By using _mesa_geometry_* functions the code 
communicates clearly it wants the geometry to feed windower/rasterizer and not 
the geometry of the intersection of the (potentially empty) set of backing 
stores. Moreover, it is better to be consistent as well, as later someone will 
likely wonder: "why in Gen7 and higher are those _mesa_geometry functions used 
but not before?" That question has no good answer because it does not make 
sense to not use those functions when programming the rasterizer/windower 
thingies.

-Kevin

-Original Message-----
From: Rogovin, Kevin 
Sent: Friday, April 24, 2015 7:43 PM
To: Pohjolainen, Topi
Cc: mesa-...@freedesktop.org
Subject: RE: [Mesa-dev] [PATCH 5/7] i965: use 
_mesa_geometry_width/height/layers/samples for programming geometry of 
framebuffer to GEN


> My point specifically was that you are also updating atoms that _are not_ 
> re-used. 
> And as those changes are not really needed, I wouldn't take the risk 
> of changing something in vain. I would introduce them only when you have 
> patches to really enable older generations.

My take is the following:

 1. Tracking (and guaranteeing) that those function left unchanged as is are 
exactly just those for before Gen7 is a pain. Much easier, and more reliable to 
hit them all instead. A significant number of functions in i965 are not emit 
functions of any atom but emit functions of atoms map to them. Again, more 
reliable and -safer- to change them all, then just the bare minimum. 

2. The change is benign. If _HasAttachments is true, then the function 
substitution gives the same value. For Gens not supporting the extension there 
is no effect.

3. Lastly, as stated: for later it leaves the option to enable it for Gen6 and 
below, it is just trivial change, but it needs testing on hardware.

When I writing this work, I originally had it for all Gens, but changed to 
support only Gen7and higher because that is all on which I can test it. 

-Kevin
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/7] i965: ensure execution of fragment shader when fragment shader has atomic buffer access

2015-04-24 Thread Rogovin, Kevin


> Checking brw->ctx.Shader._CurrentFragmentProgram != NULL is unnecessary.
> There is always a valid pixel shader.  (If the application is using 
> fixed-function, we supply a fragment shader for them.)  Please drop that 
> check.

Without this check(in the Gen7 function/code), about 30 crashes are induced on 
piglit tests for Gen7; the tests are all using GL fixed function pipeline. I 
have not run piglit without this check on Gen8 though.

> I thought that UAVs were essentially for Images...I'm not clear why this is 
> needed.  Perhaps Curro can confirm one way or another.

The essential reason is to guarantee that the pixel shader gets invoked by Gen 
even when all render target surfaces are NULL surfaces. There are other flags 
one can use, but the UAV seems (to me) the most natural.
 
-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/7] i965: ensure execution of fragment shader when fragment shader has atomic buffer access

2015-04-24 Thread Rogovin, Kevin


> I'll check with Jordan and others. I have a faint recollection that compute 
> shaders have similar 
>  needs. I think your change is fine though, I just want to understand the 
> bigger picture first.

I do not think compute shaders have similar needs. These flags are for making 
sure the
rasterizer-wm thingy in Gen spawns the fragment shader threads. Compute kernels
are not (I believe) spawned by the raster-wm thing, as they do not actually use 
the pipeline
(rather they use L3, samplers and EU's only essentially).

-Kevin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


  1   2   >