Re: [Mesa-dev] [PATCH 2/2] mesa: fix GLSL program objects with more than 16 samplers combined

2013-05-21 Thread Eric Anholt
Marek Olšák  writes:

> 1) The tabs are used in the entire file. They might have got there
> while I was moving some of the bits of code around. I don't really use
> tabs for core Mesa work. I can change the tabs to spaces unless you
> mean I should change everything to tabs (you didn't make that clear).

Official mesa style is all spaces -- Ian and I were violating that for
years, and so to get things consistent eventually, I mention when +
lines have tabs.

> 2) _mesa_get_sampler_uniform_value is really called at link time,
> specifically at the conversion to TGSI. The only case where it might
> be called after link time is when a new shader variant is being
> generated, though i'm not really sure about this. Anyway, the function
> already calls linker_error above my code to report an error. I decided
> to do the same. _mesa_problem sounds better though.

OK, either way.


pgpoN_d1iTQkL.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v4 03/10] intel: replace single region with a vector of regions

2013-05-21 Thread Pohjolainen, Topi
On Tue, May 21, 2013 at 10:11:17PM -0700, Chad Versace wrote:
> On 05/02/2013 12:08 AM, Topi Pohjolainen wrote:
> >No functional change in preparation for supporting multiple planes
> >per image each having its own region.
> >
> >Signed-off-by: Topi Pohjolainen 
> >---
> >  src/mesa/drivers/dri/intel/intel_fbo.c   |  6 +--
> >  src/mesa/drivers/dri/intel/intel_regions.h   |  7 ++-
> >  src/mesa/drivers/dri/intel/intel_screen.c| 69 
> > ++--
> >  src/mesa/drivers/dri/intel/intel_tex_image.c |  2 +-
> >  4 files changed, 45 insertions(+), 39 deletions(-)
> 
> [snip]
> 
> >diff --git a/src/mesa/drivers/dri/intel/intel_regions.h 
> >b/src/mesa/drivers/dri/intel/intel_regions.h
> >index 1fb6b27..e610f6b 100644
> >--- a/src/mesa/drivers/dri/intel/intel_regions.h
> >+++ b/src/mesa/drivers/dri/intel/intel_regions.h
> >@@ -129,8 +129,13 @@ struct intel_image_format {
> > } planes[3];
> >  };
> >
> >+/**
> >+ * An image representing multiple planes may come in two flavours:
> >+ *  - all planes in single region but in different offsets or
> >+ *  - each plane in its own region.
> >+ */
> 
> 
> In case (1), does image->regions contain a single image, or does
> it contain 3 pointers to the same image?
> 
> More importantly, I don't understand a key point here. By examining a given 
> instance
> of __DRIimageRec, how can I determine if the image falls under case (1)
> or case (2)?
> 
> Here is a concrete instance of the problem I don't know how to solve.
> Suppose that I have an instance of __DRIimageRec and that I've determined
> so far that its contents are as below, where image->planar_format was obtained
> by a lookup in the intel_screen.c:intel_image_formats table.
> 
>   image = {
> // ...
> .planar_format = {
>.fourcc = __DRI_IMAGE_FOURCC_YUV422,
>.components = __DRI_IMAGE_COMPONENTS_Y_U_V,
>.nplanes = 3,
>. planes = {
>  [0] = {
>.buffer_index = 0,
> .dri_format = __DRI_IMAGE_FORMAT_R8,
> // ...
>   },
>  [1] = {
>.buffer_index = 1,
>.dri_format = __DRI_IMAGE_FORMAT_R8,
>// ...
>  },
>  [2] = {
>.buffer_index = 2,
>.dri_format = __DRI_IMAGE_FORMAT_R8,
>// ...
>   },
> };
> 
> With this information, how can I know if dereferencing image->regions[1]
> will segfault or not? That is, how can I know if this image have one or
> three images?

All the entities that produce instances of 'struct __DRIimageRec' make certain
that 'regions' contain as many valid pointers as there are planes. Hence the
controlling counter is 'planar_format.nplanes'. I originally thought adding 
a counter along with 'regions', say 'nregions', but then decided against as it
would be duplicating 'planar_format.nplanes' - they would always agree anyway.
If 'planar_format' itself is missing (zero pointer) the image is implicitly
always packed having only one region.

In this patch I have simply extended the number of pointers, but no logic is
yet trying to access elements of 'regions' beyond the first. The following
patches introduce planar cases where other elements are accessed the
'planar_format' controlling how many.

In addition, patch number six takes advantage of the fact that all the unused
region pointers in the array are fixed to zero (for now all instances of
'struct __DRIimageRec' are allocated from the heap and all members are
initialised to zero). It is also the patch that introduces the creation of
images having more than one region guaranteeing that the planar format is also
set accordingly.


I think I understand where you are getting at - these rules are not obvious and
should probably be enforced and/or made clearer. Would you have any preference,
introducing the 'nregions' perhaps?

> 
> >  struct __DRIimageRec {
> >-   struct intel_region *region;
> >+   struct intel_region *regions[3];
> > GLenum internal_format;
> > uint32_t dri_format;
> > GLuint format;
> 
> [snip]
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 03/13] gallium: Introduce 32-bit bytewise format names

2013-05-21 Thread Jose Fonseca


- Original Message -
> Hi Jose,
> 
> Thanks for the review.
> 
> Jose Fonseca  writes:
> > - Original Message -
> >> From: Richard Sandiford 
> >> 
> >> RGBA has R at byte 0 and A at byte 3, regardless of platform
> >> endianness.
> >
> > Maybe I'm missing something, but this naming convention seems to me
> > the exact opposite of what was decided [1], which is:
> >
> >  - R at byte 0, ..., and A at byte 3, regardless of platform endianness
> >  would be called "R8G8B8A8"
> >
> >  - R at bit 0, ..., A at bit 24, encoded as integers that match the
> >  platform endianness would be called "RGBA"
> >
> > which would be consistent with (as in a superset of) D3D10 format
> > naming.
> 
> Yeah, it's supposed to be that way round in the patches.  RGBA is a
> 32-bit int with R in the high 8 bits and A in the low 8 bits. 

This is still slightly different than what I expected: gallium was using the 
convention the components started with least significant bit appear first in 
name (same as D3D10). That is, RGBA is a 32-bit int with R in the _low_ 8 
bits and A in the _high_ 8 bits.

I understand that Mesa formats follow the opposite convention, but between 
consistency with Mesa vs consistency within gallium I believe that it is better 
for gallium formats to be consistent among themselves: it is much easier to 
remember that _all_ gallium formats go from least->most significant 
bit/byte/word, than to remember which formats are supposed to go from the 
low->high vs high->low, which would end up forcing developers to either guess 
wrongly or waste time look up the format implementation.

> R8G8B8A8 is
> an array of 4 bytes in the order { R, G, B, A }.  I think it was just
> the comment in the covering note that had it the wrong way around --
> sorry about that.

Otherwise looks good.  I went through the implementation in more detail and it 
looks AFAICT.

As Adam mentioned on the cover email, the series must be squashed when 
commited, such way that each commit builds and works, to avoid interfering with 
git bisection.

I also agree with Marek regarding docs. Please create a new 
src/gallium/docs/source/format.rst and put a couple of paragraphs there on the 
differences between the two kinds.

Jose
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] softpipe: change TEX_TILE_SIZE and NUM_TEX_TILE_ENTRIES

2013-05-21 Thread Jose Fonseca
Sounds definitely an improvement.

Long term I think the cache is probably not worth to keep any texture cache at 
all.

Jose

- Original Message -
> From: Roland Scheidegger 
> 
> Initially we had NUM_TEX_TILE_ENTRIES of 50, however this was using too much
> memory (mostly because the tile cache is operating on fixed max current
> sampler views which could be fixed but that's another topic). So it was
> decreased to 4. However this is a ridiculously low number which can't
> actually really work (the number of tiles needed for as little as
> a single quad with linear_mipmap_linear is 2 to 8 for a 2d texture, and
> 4 to 16 for a 3d texture), as it just about guarantees there will be
> cache thrashing sometimes (just about always for 3d textures in fact, since
> while there are 4 entries the cache is direct mapped).
> So increase that number to 16 (which is still on the low side for direct
> mapped cache though I guess using something like 4-way associativity would
> be more effective than increasing this further) which has at least some good
> chance to avoid thrashing. Since we don't want to increase memory
> requirements
> however in turn decrease the tile size accordingly from 64 to 32 (as a bonus
> point this also decreases the cost of texture thrashing which might still
> happen sometimes).
> I've seen performance improvement in the order of factor ~200 (specifically,
> drawing the first frame from the replay from bug 41787 needs "only" ~10s
> instead of ~30min, meaning I can actually compare the output with other
> drivers...) with this.
> ---
>  src/gallium/drivers/softpipe/sp_tex_tile_cache.h |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/drivers/softpipe/sp_tex_tile_cache.h
> b/src/gallium/drivers/softpipe/sp_tex_tile_cache.h
> index 0ea82b3..2fd6f12 100644
> --- a/src/gallium/drivers/softpipe/sp_tex_tile_cache.h
> +++ b/src/gallium/drivers/softpipe/sp_tex_tile_cache.h
> @@ -40,7 +40,7 @@ struct softpipe_tex_tile_cache;
>  /**
>   * Cache tile size (width and height). This needs to be a power of two.
>   */
> -#define TEX_TILE_SIZE_LOG2 6
> +#define TEX_TILE_SIZE_LOG2 5
>  #define TEX_TILE_SIZE (1 << TEX_TILE_SIZE_LOG2)
>  
>  
> @@ -73,7 +73,7 @@ struct softpipe_tex_cached_tile
> } data;
>  };
>  
> -#define NUM_TEX_TILE_ENTRIES 4
> +#define NUM_TEX_TILE_ENTRIES 16
>  
>  struct softpipe_tex_tile_cache
>  {
> --
> 1.7.9.5
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [v4 03/10] intel: replace single region with a vector of regions

2013-05-21 Thread Chad Versace

On 05/02/2013 12:08 AM, Topi Pohjolainen wrote:

No functional change in preparation for supporting multiple planes
per image each having its own region.

Signed-off-by: Topi Pohjolainen 
---
  src/mesa/drivers/dri/intel/intel_fbo.c   |  6 +--
  src/mesa/drivers/dri/intel/intel_regions.h   |  7 ++-
  src/mesa/drivers/dri/intel/intel_screen.c| 69 ++--
  src/mesa/drivers/dri/intel/intel_tex_image.c |  2 +-
  4 files changed, 45 insertions(+), 39 deletions(-)


[snip]


diff --git a/src/mesa/drivers/dri/intel/intel_regions.h 
b/src/mesa/drivers/dri/intel/intel_regions.h
index 1fb6b27..e610f6b 100644
--- a/src/mesa/drivers/dri/intel/intel_regions.h
+++ b/src/mesa/drivers/dri/intel/intel_regions.h
@@ -129,8 +129,13 @@ struct intel_image_format {
 } planes[3];
  };

+/**
+ * An image representing multiple planes may come in two flavours:
+ *  - all planes in single region but in different offsets or
+ *  - each plane in its own region.
+ */



In case (1), does image->regions contain a single image, or does
it contain 3 pointers to the same image?

More importantly, I don't understand a key point here. By examining a given 
instance
of __DRIimageRec, how can I determine if the image falls under case (1)
or case (2)?

Here is a concrete instance of the problem I don't know how to solve.
Suppose that I have an instance of __DRIimageRec and that I've determined
so far that its contents are as below, where image->planar_format was obtained
by a lookup in the intel_screen.c:intel_image_formats table.

  image = {
// ...
.planar_format = {
   .fourcc = __DRI_IMAGE_FOURCC_YUV422,
   .components = __DRI_IMAGE_COMPONENTS_Y_U_V,
   .nplanes = 3,
   . planes = {
 [0] = {
   .buffer_index = 0,
.dri_format = __DRI_IMAGE_FORMAT_R8,
// ...
  },
 [1] = {
   .buffer_index = 1,
   .dri_format = __DRI_IMAGE_FORMAT_R8,
   // ...
 },
 [2] = {
   .buffer_index = 2,
   .dri_format = __DRI_IMAGE_FORMAT_R8,
   // ...
  },
};

With this information, how can I know if dereferencing image->regions[1]
will segfault or not? That is, how can I know if this image have one or
three images?


  struct __DRIimageRec {
-   struct intel_region *region;
+   struct intel_region *regions[3];
 GLenum internal_format;
 uint32_t dri_format;
 GLuint format;


[snip]

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/5] i965 gen7: use SURFACE_STATE fields to select render level/layer

2013-05-21 Thread Jordan Justen
Rather than pointing the surface_state directly at a single
sub-image of the texture for rendering, we now point the
surface_state at the top level of the texture, and configure
the surface_state as needed based on this.

Signed-off-by: Jordan Justen 
---
 src/mesa/drivers/dri/i965/brw_defines.h   |2 +
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |   62 +++--
 2 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index fedd78c..d61151f 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -539,6 +539,8 @@
 #define GEN7_SURFACE_MULTISAMPLECOUNT_8 (3 << 3)
 #define GEN7_SURFACE_MSFMT_MSS  (0 << 6)
 #define GEN7_SURFACE_MSFMT_DEPTH_STENCIL(1 << 6)
+#define GEN7_SURFACE_MIN_ARRAY_ELEMENT_SHIFT   18
+#define GEN7_SURFACE_RENDER_TARGET_VIEW_EXTENT_SHIFT   7
 
 /* Surface state DW5 */
 #define BRW_SURFACE_X_OFFSET_SHIFT 25
diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
index 6c01545..6ba66b0 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
@@ -23,6 +23,7 @@
 #include "main/mtypes.h"
 #include "main/blend.h"
 #include "main/samplerobj.h"
+#include "main/texformat.h"
 #include "program/prog_parameter.h"
 
 #include "intel_mipmap_tree.h"
@@ -529,12 +530,15 @@ gen7_update_renderbuffer_surface(struct brw_context *brw,
struct gl_context *ctx = &intel->ctx;
struct intel_renderbuffer *irb = intel_renderbuffer(rb);
struct intel_region *region = irb->mt->region;
-   uint32_t tile_x, tile_y;
uint32_t format;
/* _NEW_BUFFERS */
gl_format rb_format = _mesa_get_render_format(ctx, intel_rb_format(irb));
-
-   assert(!layered);
+   uint32_t surftype;
+   bool is_array = false;
+   int depth = MAX2(rb->Depth, 1);
+   int min_array_element;
+   GLenum gl_target = rb->TexImage ?
+ rb->TexImage->TexObject->Target : GL_TEXTURE_2D;
 
uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
 8 * 4, 32, &brw->wm.surf_offset[unit]);
@@ -550,7 +554,28 @@ gen7_update_renderbuffer_surface(struct brw_context *brw,
 __FUNCTION__, _mesa_get_format_name(rb_format));
}
 
-   surf[0] = BRW_SURFACE_2D << BRW_SURFACE_TYPE_SHIFT |
+   switch (gl_target) {
+   case GL_TEXTURE_CUBE_MAP_ARRAY:
+   case GL_TEXTURE_CUBE_MAP:
+  surftype = BRW_SURFACE_2D;
+  is_array = true;
+  depth *= 6;
+  break;
+   default:
+  surftype = translate_tex_target(gl_target);
+  is_array = _mesa_tex_target_is_array(gl_target);
+  break;
+   }
+
+   if (layered) {
+  min_array_element = 0;
+   } else if (irb->mt->num_samples > 1) {
+  min_array_element = irb->mt_layer / irb->mt->num_samples;
+   } else {
+  min_array_element = irb->mt_layer;
+   }
+
+   surf[0] = surftype << BRW_SURFACE_TYPE_SHIFT |
  format << BRW_SURFACE_FORMAT_SHIFT |
  (irb->mt->array_spacing_lod0 ? GEN7_SURFACE_ARYSPC_LOD0
   : GEN7_SURFACE_ARYSPC_FULL) |
@@ -561,24 +586,25 @@ gen7_update_renderbuffer_surface(struct brw_context *brw,
if (irb->mt->align_w == 8)
   surf[0] |= GEN7_SURFACE_HALIGN_8;
 
-   /* reloc */
-   surf[1] = intel_renderbuffer_tile_offsets(irb, &tile_x, &tile_y) +
- region->bo->offset; /* reloc */
+   if (is_array) {
+  surf[0] |= GEN7_SURFACE_IS_ARRAY;
+   }
+
+   surf[1] = region->bo->offset;
 
assert(brw->has_surface_tile_offset);
-   /* Note that the low bits of these fields are missing, so
-* there's the possibility of getting in trouble.
-*/
-   assert(tile_x % 4 == 0);
-   assert(tile_y % 2 == 0);
-   surf[5] = SET_FIELD(tile_x / 4, BRW_SURFACE_X_OFFSET) |
- SET_FIELD(tile_y / 2, BRW_SURFACE_Y_OFFSET);
 
-   surf[2] = SET_FIELD(rb->Width - 1, GEN7_SURFACE_WIDTH) |
- SET_FIELD(rb->Height - 1, GEN7_SURFACE_HEIGHT);
-   surf[3] = region->pitch - 1;
+   surf[5] = irb->mt_level;
+
+   surf[2] = SET_FIELD(irb->mt->logical_width0 - 1, GEN7_SURFACE_WIDTH) |
+ SET_FIELD(irb->mt->logical_height0 - 1, GEN7_SURFACE_HEIGHT);
+
+   surf[3] = ((depth - 1) << BRW_SURFACE_DEPTH_SHIFT) |
+ (region->pitch - 1);
 
-   surf[4] = gen7_surface_msaa_bits(irb->mt->num_samples, 
irb->mt->msaa_layout);
+   surf[4] = gen7_surface_msaa_bits(irb->mt->num_samples, 
irb->mt->msaa_layout) |
+ min_array_element << GEN7_SURFACE_MIN_ARRAY_ELEMENT_SHIFT |
+ (depth - 1) << GEN7_SURFACE_RENDER_TARGET_VIEW_EXTENT_SHIFT;
 
if (irb->mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS) {
   gen7_set_surface_mcs_info(brw, surf, brw->wm.surf_offset[unit],
-- 
1.7.10.4

___
mesa-dev mailing list
mesa-dev@

Re: [Mesa-dev] [RFC PATCH] ARB_vp/ARB_fp: accept duplicate precision options

2013-05-21 Thread Chris Forbes
OK, I'll add the spec quote and fixup the two piglits which are
asserting the old behavior, and push it all tonight.

-- Chris

On Wed, May 22, 2013 at 9:33 AM, Ian Romanick  wrote:
> On 05/18/2013 11:18 PM, Chris Forbes wrote:
>>
>> Relaxes the validation of
>>
>> OPTION ARB_precision_hint_{nicest,fastest};
>>
>> to allow duplicate options. The spec says that both /nicest/ and
>> /fastest/ cannot be specified together, but could be interpreted
>> either way for respecification of the same option.
>>
>> Other drivers (NVIDIA etc) accept this, and at least one Unity3D game
>> expects it to succeed (Kerbal Space Program).
>
>
> NOTE: This is a candidate for stable release branches.
>
>
>> Signed-off-by: Chris Forbes 
>> ---
>>   src/mesa/program/program_parse_extra.c | 16 +++-
>>   1 file changed, 7 insertions(+), 9 deletions(-)
>>
>> diff --git a/src/mesa/program/program_parse_extra.c
>> b/src/mesa/program/program_parse_extra.c
>> index 4d92848..eb1fccc 100644
>> --- a/src/mesa/program/program_parse_extra.c
>> +++ b/src/mesa/program/program_parse_extra.c
>> @@ -194,15 +194,13 @@ _mesa_ARBfp_parse_option(struct asm_parser_state
>> *state, const char *option)
>> } else if (strncmp(option, "precision_hint_", 15) == 0) {
>>  option += 15;
>>
>> -if (state->option.PrecisionHint == OPTION_NONE) {
>> -   if (strcmp(option, "nicest") == 0) {
>> -  state->option.PrecisionHint = OPTION_NICEST;
>> -  return 1;
>> -   } else if (strcmp(option, "fastest") == 0) {
>> -  state->option.PrecisionHint = OPTION_FASTEST;
>> -  return 1;
>> -   }
>> -}
>
>
> I think it's worth quoting the bit of spec that Eric mentioned in a comment
> here.  With or without that change, this patch is
>
> Reviewed-by: Ian Romanick 
>
>
>> + if (strcmp(option, "nicest") == 0 && state->option.PrecisionHint
>> != OPTION_FASTEST) {
>> +state->option.PrecisionHint = OPTION_NICEST;
>> +return 1;
>> + } else if (strcmp(option, "fastest") == 0 &&
>> state->option.PrecisionHint != OPTION_NICEST) {
>> +state->option.PrecisionHint = OPTION_FASTEST;
>> +return 1;
>> + }
>>
>>  return 0;
>> } else if (strcmp(option, "draw_buffers") == 0) {
>>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] softpipe: change TEX_TILE_SIZE and NUM_TEX_TILE_ENTRIES

2013-05-21 Thread sroland
From: Roland Scheidegger 

Initially we had NUM_TEX_TILE_ENTRIES of 50, however this was using too much
memory (mostly because the tile cache is operating on fixed max current
sampler views which could be fixed but that's another topic). So it was
decreased to 4. However this is a ridiculously low number which can't
actually really work (the number of tiles needed for as little as
a single quad with linear_mipmap_linear is 2 to 8 for a 2d texture, and
4 to 16 for a 3d texture), as it just about guarantees there will be
cache thrashing sometimes (just about always for 3d textures in fact, since
while there are 4 entries the cache is direct mapped).
So increase that number to 16 (which is still on the low side for direct
mapped cache though I guess using something like 4-way associativity would
be more effective than increasing this further) which has at least some good
chance to avoid thrashing. Since we don't want to increase memory requirements
however in turn decrease the tile size accordingly from 64 to 32 (as a bonus
point this also decreases the cost of texture thrashing which might still
happen sometimes).
I've seen performance improvement in the order of factor ~200 (specifically,
drawing the first frame from the replay from bug 41787 needs "only" ~10s
instead of ~30min, meaning I can actually compare the output with other
drivers...) with this.
---
 src/gallium/drivers/softpipe/sp_tex_tile_cache.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_tile_cache.h 
b/src/gallium/drivers/softpipe/sp_tex_tile_cache.h
index 0ea82b3..2fd6f12 100644
--- a/src/gallium/drivers/softpipe/sp_tex_tile_cache.h
+++ b/src/gallium/drivers/softpipe/sp_tex_tile_cache.h
@@ -40,7 +40,7 @@ struct softpipe_tex_tile_cache;
 /**
  * Cache tile size (width and height). This needs to be a power of two.
  */
-#define TEX_TILE_SIZE_LOG2 6
+#define TEX_TILE_SIZE_LOG2 5
 #define TEX_TILE_SIZE (1 << TEX_TILE_SIZE_LOG2)
 
 
@@ -73,7 +73,7 @@ struct softpipe_tex_cached_tile
} data;
 };
 
-#define NUM_TEX_TILE_ENTRIES 4
+#define NUM_TEX_TILE_ENTRIES 16
 
 struct softpipe_tex_tile_cache
 {
-- 
1.7.9.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] softpipe: disambiguate TILE_SIZE / TEX_TILE_SIZE

2013-05-21 Thread sroland
From: Roland Scheidegger 

These can be different (just like NUM_TEX_TILE_ENTRIES / NUM_ENTRIES),
though currently they aren't.
---
 src/gallium/drivers/softpipe/sp_tex_sample.c |   28 +++---
 src/gallium/drivers/softpipe/sp_tex_tile_cache.c |   28 +++---
 src/gallium/drivers/softpipe/sp_tex_tile_cache.h |   20 
 3 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/src/gallium/drivers/softpipe/sp_tex_sample.c 
b/src/gallium/drivers/softpipe/sp_tex_sample.c
index 1550199..2c7f17f 100644
--- a/src/gallium/drivers/softpipe/sp_tex_sample.c
+++ b/src/gallium/drivers/softpipe/sp_tex_sample.c
@@ -580,10 +580,10 @@ get_texel_2d_no_border(const struct sp_sampler_view 
*sp_sview,
union tex_tile_address addr, int x, int y)
 {
const struct softpipe_tex_cached_tile *tile;
-   addr.bits.x = x / TILE_SIZE;
-   addr.bits.y = y / TILE_SIZE;
-   y %= TILE_SIZE;
-   x %= TILE_SIZE;
+   addr.bits.x = x / TEX_TILE_SIZE;
+   addr.bits.y = y / TEX_TILE_SIZE;
+   y %= TEX_TILE_SIZE;
+   x %= TEX_TILE_SIZE;
 
tile = sp_get_cached_tile_tex(sp_sview->cache, addr);
 
@@ -722,10 +722,10 @@ get_texel_quad_2d_no_border_single_tile(const struct 
sp_sampler_view *sp_sview,
 {
 const struct softpipe_tex_cached_tile *tile;
 
-   addr.bits.x = x / TILE_SIZE;
-   addr.bits.y = y / TILE_SIZE;
-   y %= TILE_SIZE;
-   x %= TILE_SIZE;
+   addr.bits.x = x / TEX_TILE_SIZE;
+   addr.bits.y = y / TEX_TILE_SIZE;
+   y %= TEX_TILE_SIZE;
+   x %= TEX_TILE_SIZE;
 
tile = sp_get_cached_tile_tex(sp_sview->cache, addr);
   
@@ -777,11 +777,11 @@ get_texel_3d_no_border(const struct sp_sampler_view 
*sp_sview,
 {
const struct softpipe_tex_cached_tile *tile;
 
-   addr.bits.x = x / TILE_SIZE;
-   addr.bits.y = y / TILE_SIZE;
+   addr.bits.x = x / TEX_TILE_SIZE;
+   addr.bits.y = y / TEX_TILE_SIZE;
addr.bits.z = z;
-   y %= TILE_SIZE;
-   x %= TILE_SIZE;
+   y %= TEX_TILE_SIZE;
+   x %= TEX_TILE_SIZE;
 
tile = sp_get_cached_tile_tex(sp_sview->cache, addr);
 
@@ -917,8 +917,8 @@ img_filter_2d_linear_repeat_POT(struct sp_sampler_view 
*sp_sview,
 {
unsigned xpot = pot_level_size(sp_sview->xpot, level);
unsigned ypot = pot_level_size(sp_sview->ypot, level);
-   unsigned xmax = (xpot - 1) & (TILE_SIZE - 1); /* MIN2(TILE_SIZE, xpot) - 1; 
*/
-   unsigned ymax = (ypot - 1) & (TILE_SIZE - 1); /* MIN2(TILE_SIZE, ypot) - 1; 
*/
+   unsigned xmax = (xpot - 1) & (TEX_TILE_SIZE - 1); /* MIN2(TEX_TILE_SIZE, 
xpot) - 1; */
+   unsigned ymax = (ypot - 1) & (TEX_TILE_SIZE - 1); /* MIN2(TEX_TILE_SIZE, 
ypot) - 1; */
union tex_tile_address addr;
int c;
 
diff --git a/src/gallium/drivers/softpipe/sp_tex_tile_cache.c 
b/src/gallium/drivers/softpipe/sp_tex_tile_cache.c
index af1024d..b0d8a18 100644
--- a/src/gallium/drivers/softpipe/sp_tex_tile_cache.c
+++ b/src/gallium/drivers/softpipe/sp_tex_tile_cache.c
@@ -50,7 +50,7 @@ sp_create_tex_tile_cache( struct pipe_context *pipe )
uint pos;
 
/* make sure max texture size works */
-   assert((TILE_SIZE << TEX_ADDR_BITS) >= (1 << (SP_MAX_TEXTURE_2D_LEVELS-1)));
+   assert((TEX_TILE_SIZE << TEX_ADDR_BITS) >= (1 << 
(SP_MAX_TEXTURE_2D_LEVELS-1)));
 
tc = CALLOC_STRUCT( softpipe_tex_tile_cache );
if (tc) {
@@ -212,7 +212,7 @@ sp_find_cached_tile_tex(struct softpipe_tex_tile_cache *tc,
 
if (addr.value != tile->addr.value) {
 
-  /* cache miss.  Most misses are because we've invaldiated the
+  /* cache miss.  Most misses are because we've invalidated the
* texture cache previously -- most commonly on binding a new
* texture.  Currently we effectively flush the cache on texture
* bind.
@@ -265,26 +265,26 @@ sp_find_cached_tile_tex(struct softpipe_tex_tile_cache 
*tc,
*/
   if (!zs && util_format_is_pure_uint(tc->format)) {
  pipe_get_tile_ui_format(tc->tex_trans, tc->tex_trans_map,
- addr.bits.x * TILE_SIZE,
- addr.bits.y * TILE_SIZE,
- TILE_SIZE,
- TILE_SIZE,
+ addr.bits.x * TEX_TILE_SIZE,
+ addr.bits.y * TEX_TILE_SIZE,
+ TEX_TILE_SIZE,
+ TEX_TILE_SIZE,
  tc->format,
  (unsigned *) tile->data.colorui);
   } else if (!zs && util_format_is_pure_sint(tc->format)) {
  pipe_get_tile_i_format(tc->tex_trans, tc->tex_trans_map,
-addr.bits.x * TILE_SIZE,
-addr.bits.y * TILE_SIZE,
-TILE_SIZE,
- TILE_SIZE,
+addr.bits.x * TEX_TILE_SIZE,
+addr.bits.y * TEX_TILE_SIZE,
+TEX_TILE_SIZE,
+

[Mesa-dev] [PATCH 5/5] i965/fs: Fix test for smearing enabled on an instruction.

2013-05-21 Thread Eric Anholt
We were expanding the live range too far, breaking register_coalesce_2()
and compute_to_mrf() on 16-wide shaders.  Turning it back on improves
GLB2.7 performance by 0.239355% +/- 0.0850649% (n=398), though some
16-wide shaders are lost.  shader-db stats are:

total instructions in shared programs: 1627211 -> 1609262 (-1.10%)
instructions in affected programs: 450351 -> 432402 (-3.99%)

While 33 new 16-wide shaders are gained, 70 are lost.
---
 src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
index 3daf8fa..f5daab2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
@@ -216,7 +216,7 @@ fs_visitor::calculate_live_intervals()
  * pixel_x/pixel_y, which are registers of 16-bit values and thus
  * would get stomped by the first decode as well.
  */
-if (dispatch_width == 16 && (inst->src[i].smear ||
+if (dispatch_width == 16 && (inst->src[i].smear >= 0 ||
  (this->pixel_x.reg == reg ||
   this->pixel_y.reg == reg))) {
end_ip++;
-- 
1.8.3.rc0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/5] i965/fs: Fix segfault in instruction scheduling with LINTERP using last GRF.

2013-05-21 Thread Eric Anholt
The scheduler didn't know about uniform-type accesses, and if a uniform
access was last in a 16-wide, we'd walk off the end of the array.  This
never happened, because we'd never coalesce out all the GRFs, due to a bug
to be fixed in the next commit.
---
 src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp 
b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
index 6a52754..ccedee3 100644
--- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
@@ -579,7 +579,10 @@ fs_instruction_scheduler::calculate_deps()
(inst->src[i].fixed_hw_reg.file ==
 BRW_GENERAL_REGISTER_FILE)) {
if (post_reg_alloc) {
-   for (int r = 0; r < reg_width; r++)
+   int size = reg_width;
+   if (inst->src[i].fixed_hw_reg.vstride == BRW_VERTICAL_STRIDE_0)
+  size = 1;
+   for (int r = 0; r < size; r++)
   add_dep(last_grf_write[inst->src[i].fixed_hw_reg.nr + r], n);
 } else {
add_dep(last_fixed_grf_write, n);
@@ -684,7 +687,10 @@ fs_instruction_scheduler::calculate_deps()
(inst->src[i].fixed_hw_reg.file ==
 BRW_GENERAL_REGISTER_FILE)) {
if (post_reg_alloc) {
-   for (int r = 0; r < reg_width; r++)
+   int size = reg_width;
+   if (inst->src[i].fixed_hw_reg.vstride == BRW_VERTICAL_STRIDE_0)
+  size = 1;
+   for (int r = 0; r < size; r++)
   add_dep(n, last_grf_write[inst->src[i].fixed_hw_reg.nr + r]);
 } else {
add_dep(n, last_fixed_grf_write);
-- 
1.8.3.rc0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/5] i965: Shut up more compiler warnings from vector insert/extract changes.

2013-05-21 Thread Eric Anholt
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index b7bbaab..36d9cf0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -493,6 +493,14 @@ fs_visitor::visit(ir_expression *ir)
   assert(!"not reached: should be handled by lower_quadop_vector");
   break;
 
+   case ir_binop_vector_extract:
+  assert(!"not reached: should be handled by 
lower_vec_index_to_cond_assign()");
+  break;
+
+   case ir_triop_vector_insert:
+  assert(!"not reached: should be handled by lower_vector_insert()");
+  break;
+
case ir_unop_sqrt:
   emit_math(SHADER_OPCODE_SQRT, this->result, op[0]);
   break;
-- 
1.8.3.rc0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/5] mesa: Fix test for optimistic coloring being necessary.

2013-05-21 Thread Eric Anholt
i965 and radeon use ra_set_node_reg() to force payload registers to
specific registers while exposing those registers to the allocator still.
We were treating those register nodes as unsuccessfully allocated in the
ra_simplify() step, leading to walking the registers again to do
optimistic coloring even if there was nothing left ot do.
---
 src/mesa/program/register_allocate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/program/register_allocate.c 
b/src/mesa/program/register_allocate.c
index b8472a2..16739fd 100644
--- a/src/mesa/program/register_allocate.c
+++ b/src/mesa/program/register_allocate.c
@@ -437,7 +437,7 @@ ra_simplify(struct ra_graph *g)
}
 
for (i = 0; i < g->count; i++) {
-  if (!g->nodes[i].in_stack)
+  if (!g->nodes[i].in_stack && g->nodes[i].reg == -1)
 return GL_FALSE;
}
 
-- 
1.8.3.rc0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/5] intel: Count fragments in our blitter-based glBitmap() path.

2013-05-21 Thread Eric Anholt
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59440
---
 src/mesa/drivers/dri/intel/intel_pixel_bitmap.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/intel/intel_pixel_bitmap.c 
b/src/mesa/drivers/dri/intel/intel_pixel_bitmap.c
index 954dfc5..c538a29 100644
--- a/src/mesa/drivers/dri/intel/intel_pixel_bitmap.c
+++ b/src/mesa/drivers/dri/intel/intel_pixel_bitmap.c
@@ -259,14 +259,15 @@ do_blit_bitmap( struct gl_context *ctx,
  * Have to translate destination coordinates back into source
  * coordinates.
  */
-if (get_bitmap_rect(bitmap_width, bitmap_height, unpack,
-bitmap,
--orig_dstx + (dstx + px),
--orig_dsty + y_flip(fb, dsty + py, h),
-w, h,
-(GLubyte *)stipple,
-8,
-_mesa_is_winsys_fbo(fb)) == 0)
+ int count = get_bitmap_rect(bitmap_width, bitmap_height, unpack,
+ bitmap,
+ -orig_dstx + (dstx + px),
+ -orig_dsty + y_flip(fb, dsty + py, h),
+ w, h,
+ (GLubyte *)stipple,
+ 8,
+ _mesa_is_winsys_fbo(fb));
+ if (count == 0)
continue;
 
 if (!intelEmitImmediateColorExpandBlit(intel,
@@ -284,6 +285,9 @@ do_blit_bitmap( struct gl_context *ctx,
logic_op)) {
return false;
 }
+
+ if (ctx->Query.CurrentOcclusionObject)
+ctx->Query.CurrentOcclusionObject->Result += count;
   }
}
 out:
-- 
1.8.3.rc0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/12] i965/gen7+: Implement fast color clears.

2013-05-21 Thread Paul Berry
On 21 May 2013 16:52, Paul Berry  wrote:

> This series implements fast color clears, a Gen7+ feature which
> reduces memory bandwidth by deferring the memory writes involved in a
> glClear() until the same memory is later touched during rendering.
>
> From a broad overview point of view, fast color clears work in a
> similar way to HiZ: an auxiliary "MCS" buffer keeps track of which
> parts of the buffer have been cleared but haven't yet had the
> necessary memory writes performed.  Whenever a color buffer needs to
> be accessed by the CPU, or by a part of the GPU that is not
> fast-color-aware, we have to perform a "resolve operation" to force
> any pending memory writes to occur.
>
> This patch series adopts a slightly different strategy (compared to
> HiZ) for making sure the resolves happen when needed.  Instead of
> modifying each code path that might need to do a resolve so that it
> does one if needed, we create an accessor function that does the
> resolve if needed and then provides the caller with access to the
> miptree's underlying memory region.  This lets us have a lot more
> confidence that we didn't miss any code paths, which is important
> since color buffers are accessed by a large number of code paths.  To
> discourage future maintainers from trying to bypass the accessor
> function, it is inline (so that overhead is negligible), and the field
> it provides access to has been renamed to region_private.
>
> Patch 01 ifdefs out some code so that it does not appear in the i915
> (pre-Gen4) driver--this makes it easier to be confident that these
> changes won't regress i915.  Patch 02 introduces the aforementioned
> accessor function.  Patches 03-11 are the guts of the implementation,
> and patch 12 enables the new feature.
>
> No piglit regressions.  I have additional piglit tests which validate
> specific important corner cases--I hope to get those out to the list
> later this week.
>
> [PATCH 01/12] intel: Conditionally compile mcs-related code for i965 only.
> [PATCH 02/12] intel: Create intel_miptree_get_region() to prepare for fast
> color clear.
> [PATCH 03/12] i965/gen7+: Create an enum for keeping track of fast color
> clear state.
> [PATCH 04/12] i965/gen7+: Set up MCS in SURFACE_STATE whenever MCS is
> present.
> [PATCH 05/12] i965/gen7+: Create helper functions for single-sample MCS
> buffers.
> [PATCH 06/12] i965/gen7+: Implement fast color clear operation in BLORP.
> [PATCH 07/12] i965/blorp: Expand clear class hierarchy to prepare for RT
> resolves.
> [PATCH 08/12] i965/blorp: Write blorp code to do render target resolves.
> [PATCH 09/12] i965/gen7+: Ensure that front/back buffers are fast-clear
> resolved.
> [PATCH 10/12] i965/gen7+: Resolve color buffers when necessary.
> [PATCH 11/12] i965/gen7+: Disable fast color clears on shared regions.
> [PATCH 12/12] i965/gen7: Enable support for fast color clears.
>

Forgot to mention: this series is avaiable on branch "fast-color-clear-2"
of git://github.com/stereotype441/mesa.git.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 11/12] i965/gen7+: Disable fast color clears on shared regions.

2013-05-21 Thread Paul Berry
In certain circumstances the memory region underlying a miptree is
shared with other miptrees, or with other code outside Mesa's control.
This happens, for instance, when an extension like GL_OES_EGL_image or
GLX_EXT_texture_from_pixmap extension is used to associate a miptree
with an image existing outside of Mesa.

When this happens, we need to disable fast color clears on the miptree
in question, since there's no good synchronization mechanism to ensure
that deferred clear writes get performed by the time the buffer is
examined from the other miptree, or from outside of Mesa.

Fortunately, this should not be a performance hit for most
applications, since most applications that use these extensions use
them for importing textures into Mesa, rather than for exporting
rendered images out of Mesa.  So most of the time the miptrees
involved will never experience a clear.
---
 src/mesa/drivers/dri/intel/intel_mipmap_tree.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
index 56e5c88..9829462 100644
--- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
@@ -805,9 +805,14 @@ intel_miptree_get_region(struct intel_context *intel,
   intel_miptree_resolve_color(intel, mt);
   break;
case INTEL_MIPTREE_ACCESS_SHARED:
-  /* TODO: resolve and then discard MCS buffer since fast color clears are
-   * unsafe with shared buffers.
+  /* Fast color clears are unsafe with shared buffers, so resolve and then
+   * discard the MCS buffer, if present.  Also set the mcs_state to
+   * INTEL_MCS_STATE_NONE to ensure that no MCS buffer gets allocated in
+   * the future.
*/
+  intel_miptree_resolve_color(intel, mt);
+  intel_miptree_release(&mt->mcs_mt);
+  mt->mcs_state = INTEL_MCS_STATE_NONE;
   break;
case INTEL_MIPTREE_ACCESS_RENDER:
   /* If the buffer was previously in fast clear state, change it to
-- 
1.8.2.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 12/12] i965/gen7: Enable support for fast color clears.

2013-05-21 Thread Paul Berry
This patch adds code to place mcs_state into INTEL_MCS_STATE_RESOLVED
for miptrees that are capable of supporting fast color clears.  This
will have no effect on buffers that don't undergo a fast color clear;
however, for buffers that do undergo a fast color clear, an MCS
miptree will be allocated (at the time of the first fast clear), and
will be used thereafter.
---
 src/mesa/drivers/dri/intel/intel_mipmap_tree.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
index 7a3c135..5b9771f 100644
--- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
@@ -602,6 +602,16 @@ intel_miptree_create(struct intel_context *intel,
return NULL;
}
 
+#ifndef I915
+   /* If this miptree is capable of supporting fast color clears, set
+* mcs_state appropriately to ensure that fast clears will occur.
+* Allocation of the MCS miptree will be deferred until the first fast
+* clear actually occurs.
+*/
+   if (intel_is_non_msrt_mcs_buffer_supported(intel, mt))
+  mt->mcs_state = INTEL_MCS_STATE_RESOLVED;
+#endif
+
return mt;
 }
 
@@ -657,6 +667,16 @@ intel_miptree_create_for_dri2_buffer(struct intel_context 
*intel,
if (!singlesample_mt)
   return NULL;
 
+#ifndef I915
+   /* If this miptree is capable of supporting fast color clears, set
+* mcs_state appropriately to ensure that fast clears will occur.
+* Allocation of the MCS miptree will be deferred until the first fast
+* clear actually occurs.
+*/
+   if (intel_is_non_msrt_mcs_buffer_supported(intel, singlesample_mt))
+  singlesample_mt->mcs_state = INTEL_MCS_STATE_RESOLVED;
+#endif
+
if (num_samples == 0)
   return singlesample_mt;
 
-- 
1.8.2.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 10/12] i965/gen7+: Resolve color buffers when necessary.

2013-05-21 Thread Paul Berry
Resolve color buffers that have been fast-color cleared:
1. before texturing from the buffer
2. before using the buffer as the source in a blorp blit
3. before mapping the buffer's miptree
4. before accessing the buffer using the hardware blitter

Cases 1 and 2 happen in the functions brw_predraw_resolve_buffers()
and brw_blorp_blit_miptrees(), respectively.  Cases 3 and 4 are
handled by the intel_miptree_get_region() function.

In order to make sure that intel_miptree_get_region() doesn't try to
resolve a buffer during emission of state commands, this patch also
adds a new boolean, brw->state_emission_in_progress, which is set to
true during brw_upload_state() and brw_blorp_exec().
---
 src/mesa/drivers/dri/i965/brw_blorp.cpp| 3 +++
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp   | 7 +++
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp  | 8 
 src/mesa/drivers/dri/i965/brw_context.h| 2 ++
 src/mesa/drivers/dri/i965/brw_draw.c   | 6 +-
 src/mesa/drivers/dri/i965/brw_state_upload.c   | 4 
 src/mesa/drivers/dri/intel/intel_mipmap_tree.h | 2 +-
 7 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp.cpp
index c6019d1..984b2a1 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
@@ -172,6 +172,7 @@ void
 brw_blorp_exec(struct intel_context *intel, const brw_blorp_params *params)
 {
struct brw_context *brw = brw_context(&intel->ctx);
+   brw->state_emission_in_progress = true;
 
switch (intel->gen) {
case 6:
@@ -186,6 +187,8 @@ brw_blorp_exec(struct intel_context *intel, const 
brw_blorp_params *params)
   break;
}
 
+   brw->state_emission_in_progress = false;
+
if (unlikely(intel->always_flush_batch))
   intel_batchbuffer_flush(intel);
 
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index c3ef054..7a47632 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -132,6 +132,13 @@ brw_blorp_blit_miptrees(struct intel_context *intel,
 int dst_x1, int dst_y1,
 bool mirror_x, bool mirror_y)
 {
+   /* Get ready to blit.  This includes depth resolving the src and dst
+* buffers if necessary.  Note: it's not necessary to do a color resolve on
+* the destination buffer because we use the standard render path to render
+* to destination color buffers, and the standard render path is
+* fast-color-aware.
+*/
+   intel_miptree_resolve_color(intel, src_mt);
intel_miptree_slice_resolve_depth(intel, src_mt, src_level, src_layer);
intel_miptree_slice_resolve_depth(intel, dst_mt, dst_level, dst_layer);
 
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index a598dff..d397917 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -503,6 +503,14 @@ void
 brw_blorp_resolve_color(struct intel_context *intel, struct intel_mipmap_tree 
*mt)
 {
struct brw_context *brw = brw_context(&intel->ctx);
+
+   /* We can't safely resolve the render target while emitting 3D state.  If
+* the following assertion fails, that indicates that the render target
+* resolve should have been performed earlier, e.g. by
+* brw_predraw_resolve_buffers().
+*/
+   assert(!brw->state_emission_in_progress);
+
brw_blorp_rt_resolve_params params(brw, mt);
brw_blorp_exec(intel, ¶ms);
mt->mcs_state = INTEL_MCS_STATE_RESOLVED;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index ee7bf33..1961c01 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1131,6 +1131,8 @@ struct brw_context
   int max_entries;
   double report_time;
} shader_time;
+
+   bool state_emission_in_progress;
 };
 
 /*==
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index 8c37e0b..d8c1ecf 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -41,6 +41,7 @@
 #include "swrast_setup/swrast_setup.h"
 #include "drivers/common/meta.h"
 
+#include "brw_blorp.h"
 #include "brw_draw.h"
 #include "brw_defines.h"
 #include "brw_context.h"
@@ -310,7 +311,9 @@ brw_predraw_resolve_buffers(struct brw_context *brw)
if (depth_irb)
   intel_renderbuffer_resolve_hiz(intel, depth_irb);
 
-   /* Resolve depth buffer of each enabled depth texture. */
+   /* Resolve depth buffer of each enabled depth texture, and color buffer of
+* each fast-clear-enabled color texture.
+*/
for (int i = 0; i < BRW_MAX_TEX_UNIT; i++) {
   if (!ctx->Texture.Unit[i]._ReallyEnabled)
 continue;
@@ 

[Mesa-dev] [PATCH 09/12] i965/gen7+: Ensure that front/back buffers are fast-clear resolved.

2013-05-21 Thread Paul Berry
We already had code in intel_downsample_for_dri2_flush() for
downsampling front and back buffers when multisampling was in use.
This patch extends that function to perform fast color clear resolves
when necessary.

To account for the additional functionality, the function is renamed
to simply intel_resolve_for_dri2_flush().
---
 src/mesa/drivers/dri/intel/intel_context.c | 21 -
 src/mesa/drivers/dri/intel/intel_context.h |  4 ++--
 src/mesa/drivers/dri/intel/intel_screen.c  |  2 +-
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/dri/intel/intel_context.c 
b/src/mesa/drivers/dri/intel/intel_context.c
index fab99c1..eaaf4ec 100644
--- a/src/mesa/drivers/dri/intel/intel_context.c
+++ b/src/mesa/drivers/dri/intel/intel_context.c
@@ -249,12 +249,12 @@ intelGetString(struct gl_context * ctx, GLenum name)
 }
 
 void
-intel_downsample_for_dri2_flush(struct intel_context *intel,
-__DRIdrawable *drawable)
+intel_resolve_for_dri2_flush(struct intel_context *intel,
+ __DRIdrawable *drawable)
 {
if (intel->gen < 6) {
-  /* MSAA is not supported, so don't waste time checking for
-   * a multisample buffer.
+  /* MSAA and fast color clear are not supported, so don't waste time
+   * checking whether a resolve is needed.
*/
   return;
}
@@ -274,7 +274,10 @@ intel_downsample_for_dri2_flush(struct intel_context 
*intel,
   rb = intel_get_renderbuffer(fb, buffers[i]);
   if (rb == NULL || rb->mt == NULL)
  continue;
-  intel_miptree_downsample(intel, rb->mt);
+  if (rb->mt->num_samples <= 1)
+ intel_miptree_resolve_color(intel, rb->mt);
+  else
+ intel_miptree_downsample(intel, rb->mt);
}
 }
 
@@ -291,14 +294,14 @@ intel_flush_front(struct gl_context *ctx)
   driDrawable &&
   driDrawable->loaderPrivate) {
 
- /* Downsample before flushing FAKE_FRONT_LEFT to FRONT_LEFT.
+ /* Resolve before flushing FAKE_FRONT_LEFT to FRONT_LEFT.
   *
-  * This potentially downsamples both front and back buffer. It
-  * is unnecessary to downsample the back, but harms nothing except
+  * This potentially resolves both front and back buffer. It
+  * is unnecessary to resolve the back, but harms nothing except
   * performance. And no one cares about front-buffer render
   * performance.
   */
- intel_downsample_for_dri2_flush(intel, driDrawable);
+ intel_resolve_for_dri2_flush(intel, driDrawable);
 
  screen->dri2.loader->flushFrontBuffer(driDrawable,
driDrawable->loaderPrivate);
diff --git a/src/mesa/drivers/dri/intel/intel_context.h 
b/src/mesa/drivers/dri/intel/intel_context.h
index c0f07ff..6b8951c 100644
--- a/src/mesa/drivers/dri/intel/intel_context.h
+++ b/src/mesa/drivers/dri/intel/intel_context.h
@@ -614,8 +614,8 @@ void intel_update_renderbuffers(__DRIcontext *context,
 void intel_prepare_render(struct intel_context *intel);
 
 void
-intel_downsample_for_dri2_flush(struct intel_context *intel,
-__DRIdrawable *drawable);
+intel_resolve_for_dri2_flush(struct intel_context *intel,
+ __DRIdrawable *drawable);
 
 void i915_set_buf_info_for_region(uint32_t *state, struct intel_region *region,
  uint32_t buffer_id);
diff --git a/src/mesa/drivers/dri/intel/intel_screen.c 
b/src/mesa/drivers/dri/intel/intel_screen.c
index cf1044c..54ef7d0 100644
--- a/src/mesa/drivers/dri/intel/intel_screen.c
+++ b/src/mesa/drivers/dri/intel/intel_screen.c
@@ -173,7 +173,7 @@ intelDRI2Flush(__DRIdrawable *drawable)
if (intel->gen < 4)
   INTEL_FIREVERTICES(intel);
 
-   intel_downsample_for_dri2_flush(intel, drawable);
+   intel_resolve_for_dri2_flush(intel, drawable);
intel->need_throttle = true;
 
if (intel->batch.used)
-- 
1.8.2.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 08/12] i965/blorp: Write blorp code to do render target resolves.

2013-05-21 Thread Paul Berry
This patch implements the "render target resolve" blorp operation.
This will be needed when a buffer that has experienced a fast color
clear is later used for a purpose other than as a render target
(texturing, glReadPixels, or swapped to the screen).  It resolves any
remaining deferred clear operation that was not taken care of during
normal rendering.

Fortunately not much work is necessary; all we need to do is scale
down the size of the rectangle primitive being emitted, run the
fragment shader with the "Render Target Resolve Enable" bit set, and
ensure that the fragment shader writes to the render target using the
"replicated color" message.  We already have a fragment shader that
does that (the shader that we use for fast color clears), so for
simplicity we re-use it.
---
 src/mesa/drivers/dri/i965/brw_blorp.h  |  5 +++
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp  | 60 ++
 src/mesa/drivers/dri/i965/brw_defines.h|  1 +
 src/mesa/drivers/dri/i965/gen7_blorp.cpp   |  3 ++
 src/mesa/drivers/dri/intel/intel_mipmap_tree.c | 23 ++
 src/mesa/drivers/dri/intel/intel_mipmap_tree.h |  4 ++
 6 files changed, 96 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h 
b/src/mesa/drivers/dri/i965/brw_blorp.h
index 687d7eb..2750aba 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.h
+++ b/src/mesa/drivers/dri/i965/brw_blorp.h
@@ -49,6 +49,10 @@ bool
 brw_blorp_clear_color(struct intel_context *intel, struct gl_framebuffer *fb,
   bool partial_clear);
 
+void
+brw_blorp_resolve_color(struct intel_context *intel,
+struct intel_mipmap_tree *mt);
+
 #ifdef __cplusplus
 } /* end extern "C" */
 
@@ -200,6 +204,7 @@ struct brw_blorp_prog_data
 enum gen7_fast_clear_op {
GEN7_FAST_CLEAR_OP_NONE,
GEN7_FAST_CLEAR_OP_FAST_CLEAR,
+   GEN7_FAST_CLEAR_OP_RESOLVE,
 };
 
 
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index 4ced318..a598dff 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -68,6 +68,20 @@ public:
   bool partial_clear);
 };
 
+
+/**
+ * Parameters for a blorp operation that performs a "render target resolve".
+ * This is used to resolve pending fast clear pixels before a color buffer is
+ * used for texturing, ReadPixels, or scanout.
+ */
+class brw_blorp_rt_resolve_params : public brw_blorp_const_color_params
+{
+public:
+   brw_blorp_rt_resolve_params(struct brw_context *brw,
+   struct intel_mipmap_tree *mt);
+};
+
+
 class brw_blorp_const_color_program
 {
 public:
@@ -264,6 +278,43 @@ brw_blorp_clear_params::brw_blorp_clear_params(struct 
brw_context *brw,
}
 }
 
+
+brw_blorp_rt_resolve_params::brw_blorp_rt_resolve_params(
+  struct brw_context *brw,
+  struct intel_mipmap_tree *mt)
+{
+   dst.set(brw, mt, 0 /* level */, 0 /* layer */);
+
+   /* From the Ivy Bridge PRM, Vol2 Part1 11.9 "Render Target Resolve":
+*
+* A rectangle primitive must be scaled down by the following factors
+* with respect to render target being resolved.
+*
+* The scaledown factors in the table that follows are related to the
+* alignment size returned by intel_get_non_msrt_mcs_alignment(), but with
+* X and Y alignment each divided by 2.
+*/
+   unsigned x_align, y_align;
+   intel_get_non_msrt_mcs_alignment(&brw->intel, mt, &x_align, &y_align);
+   unsigned x_scaledown = x_align / 2;
+   unsigned y_scaledown = y_align / 2;
+   x0 = y0 = 0;
+   x1 = ALIGN(mt->logical_width0, x_scaledown) / x_scaledown;
+   y1 = ALIGN(mt->logical_height0, y_scaledown) / y_scaledown;
+
+   fast_clear_op = GEN7_FAST_CLEAR_OP_RESOLVE;
+
+   /* Note: there is no need to initialize push constants because it doesn't
+* matter what data gets dispatched to the render target.  However, we must
+* ensure that the fragment shader delivers the data using the "replicated
+* color" message.
+*/
+   use_wm_prog = true;
+   memset(&wm_prog_key, 0, sizeof(wm_prog_key));
+   wm_prog_key.use_simd16_replicated_data = true;
+}
+
+
 uint32_t
 brw_blorp_const_color_params::get_wm_prog(struct brw_context *brw,
   brw_blorp_prog_data **prog_data)
@@ -448,4 +499,13 @@ brw_blorp_clear_color(struct intel_context *intel, struct 
gl_framebuffer *fb,
return true;
 }
 
+void
+brw_blorp_resolve_color(struct intel_context *intel, struct intel_mipmap_tree 
*mt)
+{
+   struct brw_context *brw = brw_context(&intel->ctx);
+   brw_blorp_rt_resolve_params params(brw, mt);
+   brw_blorp_exec(intel, ¶ms);
+   mt->mcs_state = INTEL_MCS_STATE_RESOLVED;
+}
+
 } /* extern "C" */
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 90b16ab..b81517d 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -16

[Mesa-dev] [PATCH 07/12] i965/blorp: Expand clear class hierarchy to prepare for RT resolves.

2013-05-21 Thread Paul Berry
The fragment shaders that to do color clears will be re-used to
perform so-called "render target resolves" (the resolves associated
with fast color clears).  To prepare for that, this patch expands the
class hierarchy for blorp params by adding
brw_blorp_const_color_params (which will be used for all blorp
operations where the fragment shader outputs a constant color).

Some other data structures and functions were also renamed to use
"const_color" nomenclature where appropriate.
---
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp | 58 ---
 src/mesa/drivers/dri/i965/brw_context.h   |  2 +-
 2 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index 675289b..4ced318 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -37,13 +37,28 @@ extern "C" {
 #include "brw_eu.h"
 #include "brw_state.h"
 
-struct brw_blorp_clear_prog_key
+struct brw_blorp_const_color_prog_key
 {
bool use_simd16_replicated_data;
bool pad[3];
 };
 
-class brw_blorp_clear_params : public brw_blorp_params
+/**
+ * Parameters for a blorp operation where the fragment shader outputs a
+ * constant color.  This is used for both fast color clears and color
+ * resolves.
+ */
+class brw_blorp_const_color_params : public brw_blorp_params
+{
+public:
+   virtual uint32_t get_wm_prog(struct brw_context *brw,
+brw_blorp_prog_data **prog_data) const;
+
+protected:
+   brw_blorp_const_color_prog_key wm_prog_key;
+};
+
+class brw_blorp_clear_params : public brw_blorp_const_color_params
 {
 public:
brw_blorp_clear_params(struct brw_context *brw,
@@ -51,20 +66,14 @@ public:
   struct gl_renderbuffer *rb,
   GLubyte *color_mask,
   bool partial_clear);
-
-   virtual uint32_t get_wm_prog(struct brw_context *brw,
-brw_blorp_prog_data **prog_data) const;
-
-private:
-   brw_blorp_clear_prog_key wm_prog_key;
 };
 
-class brw_blorp_clear_program
+class brw_blorp_const_color_program
 {
 public:
-   brw_blorp_clear_program(struct brw_context *brw,
-  const brw_blorp_clear_prog_key *key);
-   ~brw_blorp_clear_program();
+   brw_blorp_const_color_program(struct brw_context *brw,
+ const brw_blorp_const_color_prog_key *key);
+   ~brw_blorp_const_color_program();
 
const GLuint *compile(struct brw_context *brw, GLuint *program_size);
 
@@ -75,7 +84,7 @@ private:
 
void *mem_ctx;
struct brw_context *brw;
-   const brw_blorp_clear_prog_key *key;
+   const brw_blorp_const_color_prog_key *key;
struct brw_compile func;
 
/* Thread dispatch header */
@@ -91,9 +100,9 @@ private:
GLuint base_mrf;
 };
 
-brw_blorp_clear_program::brw_blorp_clear_program(
+brw_blorp_const_color_program::brw_blorp_const_color_program(
   struct brw_context *brw,
-  const brw_blorp_clear_prog_key *key)
+  const brw_blorp_const_color_prog_key *key)
: mem_ctx(ralloc_context(NULL)),
  brw(brw),
  key(key)
@@ -101,7 +110,7 @@ brw_blorp_clear_program::brw_blorp_clear_program(
brw_init_compile(brw, &func, mem_ctx);
 }
 
-brw_blorp_clear_program::~brw_blorp_clear_program()
+brw_blorp_const_color_program::~brw_blorp_const_color_program()
 {
ralloc_free(mem_ctx);
 }
@@ -256,17 +265,18 @@ brw_blorp_clear_params::brw_blorp_clear_params(struct 
brw_context *brw,
 }
 
 uint32_t
-brw_blorp_clear_params::get_wm_prog(struct brw_context *brw,
-   brw_blorp_prog_data **prog_data) const
+brw_blorp_const_color_params::get_wm_prog(struct brw_context *brw,
+  brw_blorp_prog_data **prog_data)
+   const
 {
uint32_t prog_offset;
-   if (!brw_search_cache(&brw->cache, BRW_BLORP_CLEAR_PROG,
+   if (!brw_search_cache(&brw->cache, BRW_BLORP_CONST_COLOR_PROG,
  &this->wm_prog_key, sizeof(this->wm_prog_key),
  &prog_offset, prog_data)) {
-  brw_blorp_clear_program prog(brw, &this->wm_prog_key);
+  brw_blorp_const_color_program prog(brw, &this->wm_prog_key);
   GLuint program_size;
   const GLuint *program = prog.compile(brw, &program_size);
-  brw_upload_cache(&brw->cache, BRW_BLORP_CLEAR_PROG,
+  brw_upload_cache(&brw->cache, BRW_BLORP_CONST_COLOR_PROG,
&this->wm_prog_key, sizeof(this->wm_prog_key),
program, program_size,
&prog.prog_data, sizeof(prog.prog_data),
@@ -276,7 +286,7 @@ brw_blorp_clear_params::get_wm_prog(struct brw_context *brw,
 }
 
 void
-brw_blorp_clear_program::alloc_regs()
+brw_blorp_const_color_program::alloc_regs()
 {
int reg = 0;
this->R0 = retype(brw_vec8_grf(reg++, 0), BRW_REGISTER_TYPE_UW);
@@ -293,8 +303,8 @@ brw_blorp_clear_progr

[Mesa-dev] [PATCH 06/12] i965/gen7+: Implement fast color clear operation in BLORP.

2013-05-21 Thread Paul Berry
Since we defer allocation of the MCS miptree until the time of the
fast clear operation, this patch also implements creation of the MCS
miptree.

In addition, this patch adds the field
intel_mipmap_tree::fast_clear_color_value, which holds the most recent
fast color clear value, if any. We use it to set the SURFACE_STATE's
clear color for render targets.
---
 src/mesa/drivers/dri/i965/brw_blorp.cpp   |   1 +
 src/mesa/drivers/dri/i965/brw_blorp.h |  11 +-
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp | 143 +-
 src/mesa/drivers/dri/i965/brw_clear.c |   2 +-
 src/mesa/drivers/dri/i965/brw_defines.h   |   2 +
 src/mesa/drivers/dri/i965/gen7_blorp.cpp  |  18 ++-
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |  10 +-
 src/mesa/drivers/dri/intel/intel_mipmap_tree.c|  47 +++
 src/mesa/drivers/dri/intel/intel_mipmap_tree.h|  13 ++
 9 files changed, 233 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp.cpp
index 20f7153..c6019d1 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp.cpp
@@ -147,6 +147,7 @@ brw_blorp_params::brw_blorp_params()
  y1(0),
  depth_format(0),
  hiz_op(GEN6_HIZ_OP_NONE),
+ fast_clear_op(GEN7_FAST_CLEAR_OP_NONE),
  num_samples(0),
  use_wm_prog(false)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h 
b/src/mesa/drivers/dri/i965/brw_blorp.h
index 6360a62..687d7eb 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.h
+++ b/src/mesa/drivers/dri/i965/brw_blorp.h
@@ -46,7 +46,8 @@ brw_blorp_blit_miptrees(struct intel_context *intel,
 bool mirror_x, bool mirror_y);
 
 bool
-brw_blorp_clear_color(struct intel_context *intel, struct gl_framebuffer *fb);
+brw_blorp_clear_color(struct intel_context *intel, struct gl_framebuffer *fb,
+  bool partial_clear);
 
 #ifdef __cplusplus
 } /* end extern "C" */
@@ -195,6 +196,13 @@ struct brw_blorp_prog_data
bool persample_msaa_dispatch;
 };
 
+
+enum gen7_fast_clear_op {
+   GEN7_FAST_CLEAR_OP_NONE,
+   GEN7_FAST_CLEAR_OP_FAST_CLEAR,
+};
+
+
 class brw_blorp_params
 {
 public:
@@ -212,6 +220,7 @@ public:
brw_blorp_surface_info src;
brw_blorp_surface_info dst;
enum gen6_hiz_op hiz_op;
+   enum gen7_fast_clear_op fast_clear_op;
unsigned num_samples;
bool use_wm_prog;
brw_blorp_wm_push_constants wm_push_consts;
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index 28d7ad0..675289b 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -49,7 +49,8 @@ public:
brw_blorp_clear_params(struct brw_context *brw,
   struct gl_framebuffer *fb,
   struct gl_renderbuffer *rb,
-  GLubyte *color_mask);
+  GLubyte *color_mask,
+  bool partial_clear);
 
virtual uint32_t get_wm_prog(struct brw_context *brw,
 brw_blorp_prog_data **prog_data) const;
@@ -105,10 +106,49 @@ brw_blorp_clear_program::~brw_blorp_clear_program()
ralloc_free(mem_ctx);
 }
 
+
+/**
+ * Determine if fast color clear supports the given clear color.
+ *
+ * Fast color clear can only clear to color values of 1.0 or 0.0.  At the
+ * moment we only support floating point buffers.
+ */
+static bool
+is_color_fast_clear_compatible(gl_format format,
+   const union gl_color_union *color)
+{
+   if (_mesa_is_format_integer_color(format))
+  return false;
+
+   for (int i = 0; i < 4; i++) {
+  if (color->f[i] != 0.0 && color->f[i] != 1.0)
+ return false;
+   }
+   return true;
+}
+
+
+/**
+ * Convert the given color to a bitfield suitable for ORing into DWORD 7 of
+ * SURFACE_STATE.
+ */
+static uint32_t
+compute_fast_clear_color_bits(const union gl_color_union *color)
+{
+   uint32_t bits = 0;
+   for (int i = 0; i < 4; i++) {
+  if (color->f[i] != 0.0)
+ bits |= 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i));
+   }
+   return bits;
+}
+
+
 brw_blorp_clear_params::brw_blorp_clear_params(struct brw_context *brw,
struct gl_framebuffer *fb,
struct gl_renderbuffer *rb,
-   GLubyte *color_mask)
+   GLubyte *color_mask,
+   bool partial_clear)
 {
struct intel_context *intel = &brw->intel;
struct gl_context *ctx = &intel->ctx;
@@ -163,6 +203,56 @@ brw_blorp_clear_params::brw_blorp_clear_params(struct 
brw_context *brw,
  wm_prog_key.use_simd16_replicated_data = false;
   }
}
+
+   /* If we can do this as a fast color clear, do so. */
+   if (irb-

[Mesa-dev] [PATCH 05/12] i965/gen7+: Create helper functions for single-sample MCS buffers.

2013-05-21 Thread Paul Berry
---
 src/mesa/drivers/dri/intel/intel_mipmap_tree.c | 123 +
 src/mesa/drivers/dri/intel/intel_mipmap_tree.h |   7 ++
 2 files changed, 130 insertions(+)

diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
index a366892..9d1b91a 100644
--- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
@@ -124,6 +124,129 @@ compute_msaa_layout(struct intel_context *intel, 
gl_format format, GLenum target
 
 
 /**
+ * For single-sampled render targets ("non-MSRT"), the MCS buffer is a
+ * scaled-down bitfield representation of the color buffer which is capable of
+ * recording when blocks of the color buffer are equal to the clear value.
+ * This function returns the block size that will be used by the MCS buffer
+ * corresponding to a certain color miptree.
+ *
+ * From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render Target(s)",
+ * beneath the "Fast Color Clear" bullet (p327):
+ *
+ * The following table describes the RT alignment
+ *
+ *   Pixels  Lines
+ * TiledY RT CL
+ * bpp
+ *  32  8  4
+ *  64  4  4
+ * 128  2  4
+ * TiledX RT CL
+ * bpp
+ *  32 16  2
+ *  64  8  2
+ * 128  4  2
+ *
+ * This alignment has the following uses:
+ *
+ * - For figuring out the size of the MCS buffer.  Each 4k tile in the MCS
+ *   buffer contains 128 blocks horizontally and 256 blocks vertically.
+ *
+ * - For figuring out alignment restrictions for a fast clear operation.  Fast
+ *   clear operations must always clear aligned multiples of 16 blocks
+ *   horizontally and 32 blocks vertically.
+ *
+ * - For scaling down the coordinates sent through the render pipeline during
+ *   a fast clear.  X coordinates must be scaled down by 8 times the block
+ *   width, and Y coordinates by 16 times the block height.
+ *
+ * - For scaling down the coordinates sent through the render pipeline during
+ *   a "Render Target Resolve" operation.  X coordinates must be scaled down
+ *   by half the block width, and Y coordinates by half the block height.
+ */
+void
+intel_get_non_msrt_mcs_alignment(struct intel_context *intel,
+ struct intel_mipmap_tree *mt,
+ unsigned *width_px, unsigned *height)
+{
+   struct intel_region *region =
+  intel_miptree_get_region(intel, mt, INTEL_MIPTREE_ACCESS_NONE);
+   switch (region->tiling) {
+   default:
+  assert(!"Non-MSRT MCS requires X or Y tiling");
+  /* In release builds, fall through */
+   case I915_TILING_Y:
+  *width_px = 32 / mt->cpp;
+  *height = 4;
+  break;
+   case I915_TILING_X:
+  *width_px = 64 / mt->cpp;
+  *height = 2;
+   }
+}
+
+
+/**
+ * For a single-sampled render target ("non-MSRT"), determine if an MCS buffer
+ * can be used.
+ *
+ * From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render Target(s)",
+ * beneath the "Fast Color Clear" bullet (p326):
+ *
+ * - Support is limited to tiled render targets.
+ * - Support is for non-mip-mapped and non-array surface types only.
+ *
+ * And then later, on p327:
+ *
+ * - MCS buffer for non-MSRT is supported only for RT formats 32bpp,
+ *   64bpp, and 128bpp.
+ */
+bool
+intel_is_non_msrt_mcs_buffer_supported(struct intel_context *intel,
+   struct intel_mipmap_tree *mt)
+{
+#ifdef I915
+   /* MCS is not supported on the i915 (pre-Gen4) driver */
+   return false;
+#else
+   struct brw_context *brw = brw_context(&intel->ctx);
+   struct intel_region *region =
+  intel_miptree_get_region(intel, mt, INTEL_MIPTREE_ACCESS_NONE);
+
+   /* MCS support does not exist prior to Gen7 */
+   if (intel->gen < 7)
+  return false;
+
+   /* MCS is only supported for color buffers */
+   switch (_mesa_get_format_base_format(mt->format)) {
+   case GL_DEPTH_COMPONENT:
+   case GL_DEPTH_STENCIL:
+   case GL_STENCIL_INDEX:
+  return false;
+   }
+
+   if (region->tiling != I915_TILING_X &&
+   region->tiling != I915_TILING_Y)
+  return false;
+   if (mt->cpp != 4 && mt->cpp != 8 && mt->cpp != 16)
+  return false;
+   if (mt->first_level != 0 || mt->last_level != 0)
+  return false;
+   if (mt->physical_depth0 != 1)
+  return false;
+
+   /* There's no point in using an MCS buffer if the surface isn't in a
+* renderable format.
+*/
+   if (!brw->format_supported_as_render_target[mt->format])
+  return false;
+
+   return true;
+#endif
+}
+
+
+/**
  * @param for_region Indicates that the caller is
  *intel_miptree_create_for_region(). If true, then do not create
  *\c stencil_mt.
diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
index ea4cd17..5c

[Mesa-dev] [PATCH 04/12] i965/gen7+: Set up MCS in SURFACE_STATE whenever MCS is present.

2013-05-21 Thread Paul Berry
On Gen7+, MCS buffers are used both for compressed multisampled color
buffers and for "fast clear" of single-sampled color buffers.

Previous to this patch series, we didn't support fast clear, so we
only used MCS with multisampled bolor buffers.

As a first step to implementing fast clears, this patch modifies the
code that sets up SURFACE_STATE so that it configures the MCS buffer
whenever it is present, regardless of whether we are multisampling or
not.
---
 src/mesa/drivers/dri/i965/gen7_blorp.cpp  | 2 +-
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c | 2 +-
 src/mesa/drivers/dri/intel/intel_mipmap_tree.h| 8 +---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp 
b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
index f9eda63..2d09c7f 100644
--- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
+++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
@@ -197,7 +197,7 @@ gen7_blorp_emit_surface_state(struct brw_context *brw,
surf[3] = pitch_bytes - 1;
 
surf[4] = gen7_surface_msaa_bits(surface->num_samples, 
surface->msaa_layout);
-   if (surface->msaa_layout == INTEL_MSAA_LAYOUT_CMS) {
+   if (surface->mt->mcs_mt) {
   gen7_set_surface_mcs_info(brw, surf, wm_surf_offset, surface->mt->mcs_mt,
 is_render_target);
}
diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
index ea818f4..f5d2e43 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
@@ -584,7 +584,7 @@ gen7_update_renderbuffer_surface(struct brw_context *brw,
 
surf[4] = gen7_surface_msaa_bits(irb->mt->num_samples, 
irb->mt->msaa_layout);
 
-   if (irb->mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS) {
+   if (irb->mt->mcs_mt) {
   gen7_set_surface_mcs_info(brw, surf, brw->wm.surf_offset[unit],
 irb->mt->mcs_mt, true /* is RT */);
}
diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
index ce774c6..ea4cd17 100644
--- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
@@ -447,11 +447,13 @@ struct intel_mipmap_tree
 
 #ifndef I915
/**
-* \brief MCS miptree for multisampled textures.
+* \brief MCS miptree.
 *
 * This miptree contains the "multisample control surface", which stores
-* the necessary information to implement compressed MSAA on Gen7+
-* (INTEL_MSAA_FORMAT_CMS).
+* the necessary information to implement compressed MSAA
+* (INTEL_MSAA_FORMAT_CMS) and "fast color clear" behaviour on Gen7+.
+*
+* NULL if no MCS miptree is in use for this surface.
 */
struct intel_mipmap_tree *mcs_mt;
 
-- 
1.8.2.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 03/12] i965/gen7+: Create an enum for keeping track of fast color clear state.

2013-05-21 Thread Paul Berry
This patch includes code to update the fast color clear state
appropriately when rendering occurs.  The state will also need to be
updated when a fast clear or a resolve operation is performed; those
state updates will be added when the fast clear and resolve operations
are added.
---
 src/mesa/drivers/dri/intel/intel_mipmap_tree.c |   4 +
 src/mesa/drivers/dri/intel/intel_mipmap_tree.h | 103 +
 2 files changed, 107 insertions(+)

diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
index 4440885..a366892 100644
--- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
@@ -154,6 +154,9 @@ intel_miptree_create_layout(struct intel_context *intel,
mt->logical_width0 = width0;
mt->logical_height0 = height0;
mt->logical_depth0 = depth0;
+#ifndef I915
+   mt->mcs_state = INTEL_MCS_STATE_NONE;
+#endif
 
/* The cpp is bytes per (1, blockheight)-sized block for compressed
 * textures.  This is why you'll see divides by blockheight all over
@@ -1004,6 +1007,7 @@ intel_miptree_alloc_mcs(struct intel_context *intel,
 *
 * "The MCS surface must be stored as Tile Y."
 */
+   mt->mcs_state = INTEL_MCS_STATE_MSAA;
mt->mcs_mt = intel_miptree_create(intel,
  mt->target,
  format,
diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
index 81a3b69..ce774c6 100644
--- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
@@ -200,6 +200,74 @@ enum intel_msaa_layout
INTEL_MSAA_LAYOUT_CMS,
 };
 
+
+#ifndef I915
+/**
+ * Enum for keeping track of the state of an MCS buffer associated with a
+ * miptree.  This determines when fast clear related operations are needed.
+ *
+ * Fast clear works by deferring the memory writes that would be used to clear
+ * the buffer, so that instead of performing them at the time of the clear
+ * operation, the hardware automatically performs them at the time that the
+ * buffer is later accessed for rendering.  The MCS buffer keeps track of
+ * which regions of the buffer still have pending clear writes.
+ *
+ * This enum keeps track of the driver's knowledge of the state of the MCS
+ * buffer.
+ *
+ * MCS buffers only exist on Gen7+.
+ */
+enum intel_mcs_state
+{
+   /**
+* There is no MCS buffer for this miptree, and one should never be
+* allocated.
+*/
+   INTEL_MCS_STATE_NONE,
+
+   /**
+* An MCS buffer exists for this miptree, and it is used for MSAA purposes.
+*/
+   INTEL_MCS_STATE_MSAA,
+
+   /**
+* No deferred clears are pending for this miptree, and the contents of the
+* color buffer are entirely correct.  An MCS buffer may or may not exist
+* for this miptree.  If it does exist, it is entirely in the "no deferred
+* clears pending" state.  If it does not exist, it will be created the
+* first time a fast color clear is executed.
+*
+* In this state, the color buffer can be used for purposes other than
+* rendering without needing a render target resolve.
+*/
+   INTEL_MCS_STATE_RESOLVED,
+
+   /**
+* An MCS buffer exists for this miptree, and deferred clears are pending
+* for some regions of the color buffer, as indicated by the MCS buffer.
+* The contents of the color buffer are only correct for the regions where
+* the MCS buffer doesn't indicate a deferred clear.
+*
+* In this state, a render target resolve must be performed before the
+* color buffer can be used for purposes other than rendering.
+*/
+   INTEL_MCS_STATE_UNRESOLVED,
+
+   /**
+* An MCS buffer exists for this miptree, and deferred clears are pending
+* for the entire color buffer, and the contents of the MCS buffer reflect
+* this.  The contents of the color buffer are undefined.
+*
+* In this state, a render target resolve must be performed before the
+* color buffer can be used for purposes other than rendering.
+*
+* If the client attempts to clear a buffer which is already in this state,
+* the clear can be safely skipped, since the buffer is already clear.
+*/
+   INTEL_MCS_STATE_CLEAR,
+};
+#endif
+
 struct intel_mipmap_tree
 {
/* Effectively the key:
@@ -386,6 +454,11 @@ struct intel_mipmap_tree
 * (INTEL_MSAA_FORMAT_CMS).
 */
struct intel_mipmap_tree *mcs_mt;
+
+   /**
+* MCS state for this buffer.
+*/
+   enum intel_mcs_state mcs_state;
 #endif
 
/* These are also refcounted:
@@ -696,6 +769,36 @@ intel_miptree_get_region(struct intel_context *intel,
  struct intel_mipmap_tree *mt,
  enum intel_miptree_access_type access_type)
 {
+#ifndef I915
+   switch (access_type) {
+   case INTEL_MIPTREE_ACCESS_NONE:
+  break;
+   case INTEL_MIPTREE_ACCESS_BLIT:
+   case IN

[Mesa-dev] [PATCH 01/12] intel: Conditionally compile mcs-related code for i965 only.

2013-05-21 Thread Paul Berry
This patch ifdefs out intel_mipmap_tree::mcs_mt when building the i915
(pre-Gen4) driver (MCS buffers aren't supported until Gen7, so there
is no need for this field in the i915 driver).  This should make it a
bit easier to implement fast color clears without undue risk to i915.
---
 src/mesa/drivers/dri/intel/intel_mipmap_tree.c | 8 +++-
 src/mesa/drivers/dri/intel/intel_mipmap_tree.h | 2 ++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
index 2dfa787..ad9e2b3 100644
--- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
@@ -624,7 +624,9 @@ intel_miptree_release(struct intel_mipmap_tree **mt)
   intel_region_release(&((*mt)->region));
   intel_miptree_release(&(*mt)->stencil_mt);
   intel_miptree_release(&(*mt)->hiz_mt);
+#ifndef I915
   intel_miptree_release(&(*mt)->mcs_mt);
+#endif
   intel_miptree_release(&(*mt)->singlesample_mt);
   intel_resolve_map_clear(&(*mt)->hiz_map);
 
@@ -963,8 +965,11 @@ intel_miptree_alloc_mcs(struct intel_context *intel,
 struct intel_mipmap_tree *mt,
 GLuint num_samples)
 {
-   assert(mt->mcs_mt == NULL);
assert(intel->gen >= 7); /* MCS only used on Gen7+ */
+#ifdef I915
+   return false;
+#else
+   assert(mt->mcs_mt == NULL);
 
/* Choose the correct format for the MCS buffer.  All that really matters
 * is that we allocate the right buffer size, since we'll always be
@@ -1021,6 +1026,7 @@ intel_miptree_alloc_mcs(struct intel_context *intel,
intel_miptree_unmap_raw(intel, mt->mcs_mt);
 
return mt->mcs_mt;
+#endif
 }
 
 /**
diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
index b7376e0..0ec3c5e 100644
--- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.h
@@ -373,6 +373,7 @@ struct intel_mipmap_tree
 */
struct intel_mipmap_tree *stencil_mt;
 
+#ifndef I915
/**
 * \brief MCS miptree for multisampled textures.
 *
@@ -381,6 +382,7 @@ struct intel_mipmap_tree
 * (INTEL_MSAA_FORMAT_CMS).
 */
struct intel_mipmap_tree *mcs_mt;
+#endif
 
/* These are also refcounted:
 */
-- 
1.8.2.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 00/12] i965/gen7+: Implement fast color clears.

2013-05-21 Thread Paul Berry
This series implements fast color clears, a Gen7+ feature which
reduces memory bandwidth by deferring the memory writes involved in a
glClear() until the same memory is later touched during rendering.

>From a broad overview point of view, fast color clears work in a
similar way to HiZ: an auxiliary "MCS" buffer keeps track of which
parts of the buffer have been cleared but haven't yet had the
necessary memory writes performed.  Whenever a color buffer needs to
be accessed by the CPU, or by a part of the GPU that is not
fast-color-aware, we have to perform a "resolve operation" to force
any pending memory writes to occur.

This patch series adopts a slightly different strategy (compared to
HiZ) for making sure the resolves happen when needed.  Instead of
modifying each code path that might need to do a resolve so that it
does one if needed, we create an accessor function that does the
resolve if needed and then provides the caller with access to the
miptree's underlying memory region.  This lets us have a lot more
confidence that we didn't miss any code paths, which is important
since color buffers are accessed by a large number of code paths.  To
discourage future maintainers from trying to bypass the accessor
function, it is inline (so that overhead is negligible), and the field
it provides access to has been renamed to region_private.

Patch 01 ifdefs out some code so that it does not appear in the i915
(pre-Gen4) driver--this makes it easier to be confident that these
changes won't regress i915.  Patch 02 introduces the aforementioned
accessor function.  Patches 03-11 are the guts of the implementation,
and patch 12 enables the new feature.

No piglit regressions.  I have additional piglit tests which validate
specific important corner cases--I hope to get those out to the list
later this week.

[PATCH 01/12] intel: Conditionally compile mcs-related code for i965 only.
[PATCH 02/12] intel: Create intel_miptree_get_region() to prepare for fast 
color clear.
[PATCH 03/12] i965/gen7+: Create an enum for keeping track of fast color clear 
state.
[PATCH 04/12] i965/gen7+: Set up MCS in SURFACE_STATE whenever MCS is present.
[PATCH 05/12] i965/gen7+: Create helper functions for single-sample MCS buffers.
[PATCH 06/12] i965/gen7+: Implement fast color clear operation in BLORP.
[PATCH 07/12] i965/blorp: Expand clear class hierarchy to prepare for RT 
resolves.
[PATCH 08/12] i965/blorp: Write blorp code to do render target resolves.
[PATCH 09/12] i965/gen7+: Ensure that front/back buffers are fast-clear 
resolved.
[PATCH 10/12] i965/gen7+: Resolve color buffers when necessary.
[PATCH 11/12] i965/gen7+: Disable fast color clears on shared regions.
[PATCH 12/12] i965/gen7: Enable support for fast color clears.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] mesa: fix GLSL program objects with more than 16 samplers combined

2013-05-21 Thread Marek Olšák
1) The tabs are used in the entire file. They might have got there
while I was moving some of the bits of code around. I don't really use
tabs for core Mesa work. I can change the tabs to spaces unless you
mean I should change everything to tabs (you didn't make that clear).

2) _mesa_get_sampler_uniform_value is really called at link time,
specifically at the conversion to TGSI. The only case where it might
be called after link time is when a new shader variant is being
generated, though i'm not really sure about this. Anyway, the function
already calls linker_error above my code to report an error. I decided
to do the same. _mesa_problem sounds better though.

Marek

On Tue, May 21, 2013 at 11:54 PM, Eric Anholt  wrote:
> Marek Olšák  writes:
>
>> The problem is the sampler units are allocated from the same pool for all
>> shader stages, so if a vertex shader uses 12 samplers (0..11), the fragment
>> shader samplers start at index 12, leaving only 4 sampler units
>> for the fragment shader. The main cause is probably the fact that samplers
>> (texture unit -> sampler unit mapping, etc.) are tracked globally
>> for an entire program object.
>>
>> This commit adapts the GLSL linker and core Mesa such that the sampler units
>> are assigned to sampler uniforms for each shader stage separately
>> (if a sampler uniform is used in all shader stages, it may occupy a different
>> sampler unit in each, and vice versa, an i-th sampler unit may refer to
>> a different sampler uniform in each shader stage), and the sampler-specific
>> variables are moved from gl_shader_program to gl_shader.
>>
>> This doesn't require any driver changes, and it fixes piglit/max-samplers
>> for gallium and classic swrast. It also works with any number of shader
>> stages.
>
> I was initially uncomfortable with gl_uniform_storage getting bigger,
> but it's *huge* already, and the increased overhead I thought I saw in
> uniform updates isn't really there.  So, just nitpicks:
>
>> @@ -335,8 +339,38 @@ public:
>> int ubo_block_index;
>> int ubo_byte_offset;
>> bool ubo_row_major;
>> +   gl_shader_type shader_type;
>>
>>  private:
>> +   void handle_samplers(const glsl_type *base_type,
>> +struct gl_uniform_storage *uniform)
>> +   {
>> +
>> +  if (base_type->is_sampler()) {
>> + uniform->sampler[shader_type].index = this->next_sampler;
>> + uniform->sampler[shader_type].active = true;
>> +
>> +  /* Increment the sampler by 1 for non-arrays and by the number of
>> +   * array elements for arrays.
>> +   */
>
> bad tab indentation
>
>> + this->next_sampler +=
>> +   MAX2(1, uniform->array_elements);
>> +
>> +  const gl_texture_index target = base_type->sampler_index();
>> +  const unsigned shadow = base_type->sampler_shadow;
>
> tabs
>
>> + for (unsigned i = uniform->sampler[shader_type].index;
>> +  i < MIN2(this->next_sampler, MAX_SAMPLERS);
>> +  i++) {
>> + this->targets[i] = target;
>> + this->shader_samplers_used |= 1U << i;
>> + this->shader_shadow_samplers |= shadow << i;
>> +  }
>
> tabs
>
>> @@ -388,26 +397,16 @@ private:
>>base_type = type;
>>}
>>
>> -  if (base_type->is_sampler()) {
>> -  this->uniforms[id].sampler = this->next_sampler;
>> +  /* This assigns sampler uniforms to sampler units. */
>> +  handle_samplers(base_type, &this->uniforms[id]);
>>
>> -  /* Increment the sampler by 1 for non-arrays and by the number of
>> -   * array elements for arrays.
>> -   */
>> -  this->next_sampler += MAX2(1, this->uniforms[id].array_elements);
>> -
>> -  const gl_texture_index target = base_type->sampler_index();
>> -  const unsigned shadow = base_type->sampler_shadow;
>> -  for (unsigned i = this->uniforms[id].sampler
>> -  ; i < MIN2(this->next_sampler, MAX_SAMPLERS)
>> -  ; i++) {
>> - this->targets[i] = target;
>> - this->shader_samplers_used |= 1U << i;
>> - this->shader_shadow_samplers |= shadow << i;
>> -  }
>> -
>> -  } else {
>> -  this->uniforms[id].sampler = ~0;
>> +  /* If there is already storage associated with this uniform, it means
>> +   * that it was set while processing an earlier shader stage.  For
>> +   * example, we may be processing the uniform in the fragment shader, 
>> but
>> +   * the uniform was already processed in the vertex shader.
>> +   */
>> +  if (this->uniforms[id].storage != NULL) {
>> +  return;
>
> tabs
>
>> @@ -119,6 +122,14 @@ _mesa_get_sampler_uniform_value(class ir_dereference 
>> *sampler,
>>return 0;
>> }
>>
>> -   return shader_program->UniformStorage[location].sampler + getname.offset;
>> -}
>> +   if (!shader_program->UniformStorage[location].sampler[shader].active) {
>> +  linker_error(shader_program,
>> +"cannot return a sampler named %s, because it is no

Re: [Mesa-dev] [PATCH] i965: Skip etc-to-rgb transcode on BayTrail.

2013-05-21 Thread Kenneth Graunke

On 05/21/2013 02:55 PM, Eric Anholt wrote:

The hardware does it, so no need for this workaround.
---
  src/mesa/drivers/dri/intel/intel_mipmap_tree.c | 64 +-
  1 file changed, 33 insertions(+), 31 deletions(-)


Looks like roughly a 6% speedup, and no regressions.  Pushed...thanks!

--Ken
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g: Correctly initialize the shader key, v2

2013-05-21 Thread Lauri Kasanen
Hi,

seems this was missed for 9.1.3. Could someone cherrypick it to the 9.1
branch?

- Lauri
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/5] llvmpipe: disable simple_shader optimization

2013-05-21 Thread Alex Deucher
On Tue, May 21, 2013 at 6:12 PM,   wrote:
> From: Roland Scheidegger 
>
> This optimization disabled mask checks if the shader is simple enough.
> While this should work correctly, the problem is that it can hide real issues
> because shaders in practice are usually complex enough (8 instructions or 1
> texture is already enough) so this doesn't get used, whereas dumbed-down
> tests which should hit all the same code paths suddenly do something quite
> different. This was the reason that bug 41787 could not be easily tracked as
> stencil test not working correctly (piglit would in fact have failed some
> tests without that optimization).
> So disable it for now, it's unclear if it's much of a win in any case.
> ---
>  src/gallium/drivers/llvmpipe/lp_state_fs.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c 
> b/src/gallium/drivers/llvmpipe/lp_state_fs.c
> index 9661273..b06f915 100644
> --- a/src/gallium/drivers/llvmpipe/lp_state_fs.c
> +++ b/src/gallium/drivers/llvmpipe/lp_state_fs.c
> @@ -247,7 +247,7 @@ generate_fs_loop(struct gallivm_state *gallivm,
> struct lp_build_mask_context mask;
> boolean simple_shader = (shader->info.base.file_count[TGSI_FILE_SAMPLER] 
> == 0 &&
>  shader->info.base.num_inputs < 3 &&
> -shader->info.base.num_instructions < 8);
> +shader->info.base.num_instructions < 8) && 0;

Maybe add a comment here as per your commit message as to why it's disabled.

Alex

> const boolean dual_source_blend = key->blend.rt[0].blend_enable &&
>   util_blend_state_is_dual(&key->blend, 
> 0);
> unsigned attrib;
> --
> 1.7.9.5
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/5] llvmpipe: disable simple_shader optimization

2013-05-21 Thread sroland
From: Roland Scheidegger 

This optimization disabled mask checks if the shader is simple enough.
While this should work correctly, the problem is that it can hide real issues
because shaders in practice are usually complex enough (8 instructions or 1
texture is already enough) so this doesn't get used, whereas dumbed-down
tests which should hit all the same code paths suddenly do something quite
different. This was the reason that bug 41787 could not be easily tracked as
stencil test not working correctly (piglit would in fact have failed some
tests without that optimization).
So disable it for now, it's unclear if it's much of a win in any case.
---
 src/gallium/drivers/llvmpipe/lp_state_fs.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c 
b/src/gallium/drivers/llvmpipe/lp_state_fs.c
index 9661273..b06f915 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_fs.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_fs.c
@@ -247,7 +247,7 @@ generate_fs_loop(struct gallivm_state *gallivm,
struct lp_build_mask_context mask;
boolean simple_shader = (shader->info.base.file_count[TGSI_FILE_SAMPLER] == 
0 &&
 shader->info.base.num_inputs < 3 &&
-shader->info.base.num_instructions < 8);
+shader->info.base.num_instructions < 8) && 0;
const boolean dual_source_blend = key->blend.rt[0].blend_enable &&
  util_blend_state_is_dual(&key->blend, 0);
unsigned attrib;
-- 
1.7.9.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/5] llvmpipe: fix early depth test / late depth write stencil issues

2013-05-21 Thread sroland
From: Roland Scheidegger 

We actually did early depth/stencil test and late depth/stencil write even
when the shader could kill the fragment (alpha test or discard). Since it
matters for the new stencil value if the fragment is killed by depth/stencil
test or by the shader (in which case it will not reach the depth/stencil
test) this simply cannot work (we also would possibly skip writing the new
stencil value due to mask checks but this is a secondary issue).
So use late depth test / late depth write instead in this case.
(No piglit changes as it doesn't seem to hit such bogus early depth test
/ late depth write path.)
---
 src/gallium/drivers/llvmpipe/lp_state_fs.c |   17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c 
b/src/gallium/drivers/llvmpipe/lp_state_fs.c
index b1696ee..9661273 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_fs.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_fs.c
@@ -266,13 +266,20 @@ generate_fs_loop(struct gallivm_state *gallivm,
   assert(zs_format_desc);
 
   if (!shader->info.base.writes_z) {
- if (key->alpha.enabled || shader->info.base.uses_kill)
+ if (key->alpha.enabled || shader->info.base.uses_kill) {
 /* With alpha test and kill, can do the depth test early
  * and hopefully eliminate some quads.  But need to do a
  * special deferred depth write once the final mask value
- * is known.
+ * is known. This only works though if there's either no
+ * stencil test or the stencil value isn't written.
  */
-depth_mode = EARLY_DEPTH_TEST | LATE_DEPTH_WRITE;
+if (key->stencil[0].enabled && (key->stencil[0].writemask ||
+(key->stencil[1].enabled &&
+ key->stencil[1].writemask)))
+   depth_mode = LATE_DEPTH_TEST | LATE_DEPTH_WRITE;
+else
+   depth_mode = EARLY_DEPTH_TEST | LATE_DEPTH_WRITE;
+ }
  else
 depth_mode = EARLY_DEPTH_TEST | EARLY_DEPTH_WRITE;
   }
@@ -281,9 +288,9 @@ generate_fs_loop(struct gallivm_state *gallivm,
   }
 
   if (!(key->depth.enabled && key->depth.writemask) &&
-  !((key->stencil[0].enabled && (key->stencil[0].writemask ||
+  !(key->stencil[0].enabled && (key->stencil[0].writemask ||
 (key->stencil[1].enabled &&
- key->stencil[1].writemask)
+ key->stencil[1].writemask
  depth_mode &= ~(LATE_DEPTH_WRITE | EARLY_DEPTH_WRITE);
}
else {
-- 
1.7.9.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/5] llvmpipe: fix issue with not writing new stencil values

2013-05-21 Thread sroland
From: Roland Scheidegger 

We did mask checks between depth/stencil testing and depth/stencil write.
This meant that if the depth/stencil test killed off all fragments we never
actually wrote the new stencil value. This issue affected all early/late
test/write combinations.
So move the mask check after depth/stencil write (for early depth test,
could do the same for late depth test but might not be worth it at that
point so just skip it there).
This addresses https://bugs.freedesktop.org/show_bug.cgi?id=41787.
Piglit does not hit this issue because of the simple_shader optimization
in generate_fs_loop() which means we're skipping the mask checks.
---
 src/gallium/drivers/llvmpipe/lp_bld_depth.c |4 
 src/gallium/drivers/llvmpipe/lp_state_fs.c  |3 +++
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_bld_depth.c 
b/src/gallium/drivers/llvmpipe/lp_bld_depth.c
index 5ef9947..df6a6c4 100644
--- a/src/gallium/drivers/llvmpipe/lp_bld_depth.c
+++ b/src/gallium/drivers/llvmpipe/lp_bld_depth.c
@@ -1116,9 +1116,5 @@ lp_build_depth_stencil_test(struct gallivm_state *gallivm,
 
if (depth->enabled && stencil[0].enabled)
   lp_build_mask_update(mask, z_pass);
-
-   if (do_branch)
-  lp_build_mask_check(mask);
-
 }
 
diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c 
b/src/gallium/drivers/llvmpipe/lp_state_fs.c
index 1dfc75a..b1696ee 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_fs.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_fs.c
@@ -361,6 +361,9 @@ generate_fs_loop(struct gallivm_state *gallivm,
depth_ptr, depth_stride,
z_value, s_value);
   }
+
+  if (!simple_shader && key->stencil[0].enabled)
+ lp_build_mask_check(&mask);
}
 
lp_build_interp_soa_update_inputs_dyn(interp, gallivm, loop_state.counter);
-- 
1.7.9.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/5] llvmpipe: (trivial) remove confusing code in stencil test

2013-05-21 Thread sroland
From: Roland Scheidegger 

This was meant to disable some code which isn't needed when depth/stencil
isn't written. However, there's more code which wouldn't be needed in that
case so having the condition there was just odd (llvm will drop all the code
anyway).
---
 src/gallium/drivers/llvmpipe/lp_bld_depth.c |   27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_bld_depth.c 
b/src/gallium/drivers/llvmpipe/lp_bld_depth.c
index 08138f0..5ef9947 100644
--- a/src/gallium/drivers/llvmpipe/lp_bld_depth.c
+++ b/src/gallium/drivers/llvmpipe/lp_bld_depth.c
@@ -1097,23 +1097,18 @@ lp_build_depth_stencil_test(struct gallivm_state 
*gallivm,
   stencil_shift, "");
 
/* Finally, merge the z/stencil values */
-   if ((depth->enabled && depth->writemask) ||
-   (stencil[0].enabled && (stencil[0].writemask ||
-   (stencil[1].enabled && stencil[1].writemask 
{
-
-  if (format_desc->block.bits <= 32) {
- if (have_z && have_s)
-*z_value = LLVMBuildOr(builder, z_dst, stencil_vals, "");
- else if (have_z)
-*z_value = z_dst;
- else
-*z_value = stencil_vals;
- *s_value = *z_value;
-  }
-  else {
+   if (format_desc->block.bits <= 32) {
+  if (have_z && have_s)
+ *z_value = LLVMBuildOr(builder, z_dst, stencil_vals, "");
+  else if (have_z)
  *z_value = z_dst;
- *s_value = stencil_vals;
-  }
+  else
+ *z_value = stencil_vals;
+  *s_value = *z_value;
+   }
+   else {
+  *z_value = z_dst;
+  *s_value = stencil_vals;
}
 
if (s_pass_mask)
-- 
1.7.9.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/5] llvmpipe: fix bug in early depth test / late depth write handling

2013-05-21 Thread sroland
From: Roland Scheidegger 

Using wrong type if the format was less than 32bits.
No piglit changes as it doesn't hit that path.
---
 src/gallium/drivers/llvmpipe/lp_bld_depth.c |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_bld_depth.c 
b/src/gallium/drivers/llvmpipe/lp_bld_depth.c
index 2376ca7..08138f0 100644
--- a/src/gallium/drivers/llvmpipe/lp_bld_depth.c
+++ b/src/gallium/drivers/llvmpipe/lp_bld_depth.c
@@ -680,15 +680,15 @@ lp_build_depth_stencil_write_swizzled(struct 
gallivm_state *gallivm,
LLVMTypeRef load_ptr_type;
unsigned depth_bytes = format_desc->block.bits / 8;
struct lp_type zs_type = lp_depth_type(format_desc, z_src_type.length);
+   struct lp_type z_type = zs_type;
struct lp_type zs_load_type = zs_type;
 
zs_load_type.length = zs_load_type.length / 2;
load_ptr_type = LLVMPointerType(lp_build_vec_type(gallivm, zs_load_type), 
0);
 
-   if (zs_type.width > 32)
-  zs_type.width = 32;
+   z_type.width = z_src_type.width;
 
-   lp_build_context_init(&z_bld, gallivm, zs_type);
+   lp_build_context_init(&z_bld, gallivm, z_type);
 
/*
 * This is far from ideal, at least for late depth write we should do this
@@ -742,7 +742,8 @@ lp_build_depth_stencil_write_swizzled(struct gallivm_state 
*gallivm,
 
if (zs_type.width < z_src_type.width) {
   /* Truncate ZS values (e.g., when writing to Z16_UNORM) */
-  z_value = LLVMBuildTrunc(builder, z_value, z_bld.vec_type, "");
+  z_value = LLVMBuildTrunc(builder, z_value,
+   lp_build_int_vec_type(gallivm, zs_type), "");
}
 
if (format_desc->block.bits <= 32) {
@@ -762,9 +763,9 @@ lp_build_depth_stencil_write_swizzled(struct gallivm_state 
*gallivm,
}
else {
   if (z_src_type.length == 4) {
- zs_dst1 = lp_build_interleave2(gallivm, zs_type,
+ zs_dst1 = lp_build_interleave2(gallivm, z_type,
 z_value, s_value, 0);
- zs_dst2 = lp_build_interleave2(gallivm, zs_type,
+ zs_dst2 = lp_build_interleave2(gallivm, z_type,
 z_value, s_value, 1);
   }
   else {
-- 
1.7.9.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Skip etc-to-rgb transcode on BayTrail.

2013-05-21 Thread Eric Anholt
The hardware does it, so no need for this workaround.
---
 src/mesa/drivers/dri/intel/intel_mipmap_tree.c | 64 +-
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
index 2dfa787..d967b19 100644
--- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
@@ -382,37 +382,39 @@ intel_miptree_create(struct intel_context *intel,
gl_format etc_format = MESA_FORMAT_NONE;
GLuint total_width, total_height;
 
-   switch (format) {
-   case MESA_FORMAT_ETC1_RGB8:
-  format = MESA_FORMAT_RGBX_REV;
-  break;
-   case MESA_FORMAT_ETC2_RGB8:
-  format = MESA_FORMAT_RGBX_REV;
-  break;
-   case MESA_FORMAT_ETC2_SRGB8:
-   case MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC:
-   case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1:
-  format = MESA_FORMAT_SARGB8;
-  break;
-   case MESA_FORMAT_ETC2_RGBA8_EAC:
-   case MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1:
-  format = MESA_FORMAT_RGBA_REV;
-  break;
-   case MESA_FORMAT_ETC2_R11_EAC:
-  format = MESA_FORMAT_R16;
-  break;
-   case MESA_FORMAT_ETC2_SIGNED_R11_EAC:
-  format = MESA_FORMAT_SIGNED_R16;
-  break;
-   case MESA_FORMAT_ETC2_RG11_EAC:
-  format = MESA_FORMAT_GR1616;
-  break;
-   case MESA_FORMAT_ETC2_SIGNED_RG11_EAC:
-  format = MESA_FORMAT_SIGNED_GR1616;
-  break;
-   default:
-  /* Non ETC1 / ETC2 format */
-  break;
+   if (!intel->is_baytrail) {
+  switch (format) {
+  case MESA_FORMAT_ETC1_RGB8:
+ format = MESA_FORMAT_RGBX_REV;
+ break;
+  case MESA_FORMAT_ETC2_RGB8:
+ format = MESA_FORMAT_RGBX_REV;
+ break;
+  case MESA_FORMAT_ETC2_SRGB8:
+  case MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC:
+  case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1:
+ format = MESA_FORMAT_SARGB8;
+ break;
+  case MESA_FORMAT_ETC2_RGBA8_EAC:
+  case MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1:
+ format = MESA_FORMAT_RGBA_REV;
+ break;
+  case MESA_FORMAT_ETC2_R11_EAC:
+ format = MESA_FORMAT_R16;
+ break;
+  case MESA_FORMAT_ETC2_SIGNED_R11_EAC:
+ format = MESA_FORMAT_SIGNED_R16;
+ break;
+  case MESA_FORMAT_ETC2_RG11_EAC:
+ format = MESA_FORMAT_GR1616;
+ break;
+  case MESA_FORMAT_ETC2_SIGNED_RG11_EAC:
+ format = MESA_FORMAT_SIGNED_GR1616;
+ break;
+  default:
+ /* Non ETC1 / ETC2 format */
+ break;
+  }
}
 
etc_format = (format != tex_format) ? tex_format : MESA_FORMAT_NONE;
-- 
1.8.3.rc0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] mesa: fix GLSL program objects with more than 16 samplers combined

2013-05-21 Thread Eric Anholt
Marek Olšák  writes:

> The problem is the sampler units are allocated from the same pool for all
> shader stages, so if a vertex shader uses 12 samplers (0..11), the fragment
> shader samplers start at index 12, leaving only 4 sampler units
> for the fragment shader. The main cause is probably the fact that samplers
> (texture unit -> sampler unit mapping, etc.) are tracked globally
> for an entire program object.
>
> This commit adapts the GLSL linker and core Mesa such that the sampler units
> are assigned to sampler uniforms for each shader stage separately
> (if a sampler uniform is used in all shader stages, it may occupy a different
> sampler unit in each, and vice versa, an i-th sampler unit may refer to
> a different sampler uniform in each shader stage), and the sampler-specific
> variables are moved from gl_shader_program to gl_shader.
>
> This doesn't require any driver changes, and it fixes piglit/max-samplers
> for gallium and classic swrast. It also works with any number of shader
> stages.

I was initially uncomfortable with gl_uniform_storage getting bigger,
but it's *huge* already, and the increased overhead I thought I saw in
uniform updates isn't really there.  So, just nitpicks:

> @@ -335,8 +339,38 @@ public:
> int ubo_block_index;
> int ubo_byte_offset;
> bool ubo_row_major;
> +   gl_shader_type shader_type;
>  
>  private:
> +   void handle_samplers(const glsl_type *base_type,
> +struct gl_uniform_storage *uniform)
> +   {
> +
> +  if (base_type->is_sampler()) {
> + uniform->sampler[shader_type].index = this->next_sampler;
> + uniform->sampler[shader_type].active = true;
> +
> +  /* Increment the sampler by 1 for non-arrays and by the number of
> +   * array elements for arrays.
> +   */

bad tab indentation

> + this->next_sampler +=
> +   MAX2(1, uniform->array_elements);
> +
> +  const gl_texture_index target = base_type->sampler_index();
> +  const unsigned shadow = base_type->sampler_shadow;

tabs

> + for (unsigned i = uniform->sampler[shader_type].index;
> +  i < MIN2(this->next_sampler, MAX_SAMPLERS);
> +  i++) {
> + this->targets[i] = target;
> + this->shader_samplers_used |= 1U << i;
> + this->shader_shadow_samplers |= shadow << i;
> +  }

tabs

> @@ -388,26 +397,16 @@ private:
>base_type = type;
>}
>  
> -  if (base_type->is_sampler()) {
> -  this->uniforms[id].sampler = this->next_sampler;
> +  /* This assigns sampler uniforms to sampler units. */
> +  handle_samplers(base_type, &this->uniforms[id]);
>  
> -  /* Increment the sampler by 1 for non-arrays and by the number of
> -   * array elements for arrays.
> -   */
> -  this->next_sampler += MAX2(1, this->uniforms[id].array_elements);
> -
> -  const gl_texture_index target = base_type->sampler_index();
> -  const unsigned shadow = base_type->sampler_shadow;
> -  for (unsigned i = this->uniforms[id].sampler
> -  ; i < MIN2(this->next_sampler, MAX_SAMPLERS)
> -  ; i++) {
> - this->targets[i] = target;
> - this->shader_samplers_used |= 1U << i;
> - this->shader_shadow_samplers |= shadow << i;
> -  }
> -
> -  } else {
> -  this->uniforms[id].sampler = ~0;
> +  /* If there is already storage associated with this uniform, it means
> +   * that it was set while processing an earlier shader stage.  For
> +   * example, we may be processing the uniform in the fragment shader, 
> but
> +   * the uniform was already processed in the vertex shader.
> +   */
> +  if (this->uniforms[id].storage != NULL) {
> +  return;

tabs

> @@ -119,6 +122,14 @@ _mesa_get_sampler_uniform_value(class ir_dereference 
> *sampler,
>return 0;
> }
>  
> -   return shader_program->UniformStorage[location].sampler + getname.offset;
> -}
> +   if (!shader_program->UniformStorage[location].sampler[shader].active) {
> +  linker_error(shader_program,
> +"cannot return a sampler named %s, because it is not "
> +   "used in this shader stage. This is a driver bug.\n",
> +   getname.name);
> +  return 0;
> +   }

Maybe _mesa_problem()?  This isn't really called at link time, and you
want something really visible like an assert.

Other than the style bits,

Reviewed-by: Eric Anholt 


pgpyl48fdG8kX.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] ARB_vp/ARB_fp: accept duplicate precision options

2013-05-21 Thread Ian Romanick

On 05/18/2013 11:18 PM, Chris Forbes wrote:

Relaxes the validation of

OPTION ARB_precision_hint_{nicest,fastest};

to allow duplicate options. The spec says that both /nicest/ and
/fastest/ cannot be specified together, but could be interpreted
either way for respecification of the same option.

Other drivers (NVIDIA etc) accept this, and at least one Unity3D game
expects it to succeed (Kerbal Space Program).


NOTE: This is a candidate for stable release branches.


Signed-off-by: Chris Forbes 
---
  src/mesa/program/program_parse_extra.c | 16 +++-
  1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/src/mesa/program/program_parse_extra.c 
b/src/mesa/program/program_parse_extra.c
index 4d92848..eb1fccc 100644
--- a/src/mesa/program/program_parse_extra.c
+++ b/src/mesa/program/program_parse_extra.c
@@ -194,15 +194,13 @@ _mesa_ARBfp_parse_option(struct asm_parser_state *state, 
const char *option)
} else if (strncmp(option, "precision_hint_", 15) == 0) {
 option += 15;

-if (state->option.PrecisionHint == OPTION_NONE) {
-   if (strcmp(option, "nicest") == 0) {
-  state->option.PrecisionHint = OPTION_NICEST;
-  return 1;
-   } else if (strcmp(option, "fastest") == 0) {
-  state->option.PrecisionHint = OPTION_FASTEST;
-  return 1;
-   }
-}


I think it's worth quoting the bit of spec that Eric mentioned in a 
comment here.  With or without that change, this patch is


Reviewed-by: Ian Romanick 


+ if (strcmp(option, "nicest") == 0 && state->option.PrecisionHint != 
OPTION_FASTEST) {
+state->option.PrecisionHint = OPTION_NICEST;
+return 1;
+ } else if (strcmp(option, "fastest") == 0 && 
state->option.PrecisionHint != OPTION_NICEST) {
+state->option.PrecisionHint = OPTION_FASTEST;
+return 1;
+ }

 return 0;
} else if (strcmp(option, "draw_buffers") == 0) {



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] mesa: fix GLSL program objects with more than 16 samplers combined

2013-05-21 Thread Ian Romanick

On 05/18/2013 11:20 AM, Marek Olšák wrote:

Ping. This patch allows more than 16 samplers in all shaders combined
(MAX_COMBINED_TEXTURE_IMAGE_UNITS), which has never been possible with
current Mesa git. GL 3.1 requires 32 combined texture image units and
GL 3.2 requires 48, so I think this patch deserves a review.


I was in the process of reviewing this last week but I got distracted. 
I'll have some review in the next 24 hours.



Marek

On Mon, May 13, 2013 at 6:42 PM, Marek Olšák  wrote:

The problem is the sampler units are allocated from the same pool for all
shader stages, so if a vertex shader uses 12 samplers (0..11), the fragment
shader samplers start at index 12, leaving only 4 sampler units
for the fragment shader. The main cause is probably the fact that samplers
(texture unit -> sampler unit mapping, etc.) are tracked globally
for an entire program object.

This commit adapts the GLSL linker and core Mesa such that the sampler units
are assigned to sampler uniforms for each shader stage separately
(if a sampler uniform is used in all shader stages, it may occupy a different
sampler unit in each, and vice versa, an i-th sampler unit may refer to
a different sampler uniform in each shader stage), and the sampler-specific
variables are moved from gl_shader_program to gl_shader.

This doesn't require any driver changes, and it fixes piglit/max-samplers
for gallium and classic swrast. It also works with any number of shader
stages.
---
  src/glsl/ir_uniform.h|   27 +++--
  src/glsl/link_uniform_initializers.cpp   |   25 -
  src/glsl/link_uniforms.cpp   |  127 +++---
  src/glsl/tests/set_uniform_initializer_tests.cpp |   10 +-
  src/mesa/main/mtypes.h   |   22 ++--
  src/mesa/main/uniform_query.cpp  |   24 ++--
  src/mesa/main/uniforms.c |   11 +-
  src/mesa/program/ir_to_mesa.cpp  |   15 ++-
  src/mesa/program/sampler.cpp |   19 +++-
  9 files changed, 167 insertions(+), 113 deletions(-)

diff --git a/src/glsl/ir_uniform.h b/src/glsl/ir_uniform.h
index 30e6f26..8198c48 100644
--- a/src/glsl/ir_uniform.h
+++ b/src/glsl/ir_uniform.h
@@ -99,15 +99,24 @@ struct gl_uniform_storage {
  */
 bool initialized;

-   /**
-* Base sampler index
-*
-* If \c ::base_type is \c GLSL_TYPE_SAMPLER, this represents the index of
-* this sampler.  If \c ::array_elements is not zero, the array will use
-* sampler indexes \c ::sampler through \c ::sampler + \c ::array_elements
-* - 1, inclusive.
-*/
-   uint8_t sampler;
+   struct {
+  /**
+   * Base sampler index
+   *
+   * If \c ::base_type is \c GLSL_TYPE_SAMPLER, this represents the index
+   * of this sampler.  If \c ::array_elements is not zero, the array will
+   * use sampler indices \c ::sampler through \c ::sampler +
+   * \c ::array_elements - 1, inclusive.
+   *
+   * Note that the index may be different in each shader stage.
+   */
+  uint8_t index;
+
+  /**
+   * Whether this sampler is used in this shader stage.
+   */
+  bool active;
+   } sampler[MESA_SHADER_TYPES];

 /**
  * Storage used by the driver for the uniform
diff --git a/src/glsl/link_uniform_initializers.cpp 
b/src/glsl/link_uniform_initializers.cpp
index 836a360..54d9bf1 100644
--- a/src/glsl/link_uniform_initializers.cpp
+++ b/src/glsl/link_uniform_initializers.cpp
@@ -138,8 +138,16 @@ set_uniform_initializer(void *mem_ctx, gl_shader_program 
*prog,
}

if (base_type == GLSL_TYPE_SAMPLER) {
-for (unsigned int i = 0; i < storage->array_elements; i++) {
-   prog->SamplerUnits[storage->sampler + i] = storage->storage[i].i;
+ for (int sh = 0; sh < MESA_SHADER_TYPES; sh++) {
+gl_shader *shader = prog->_LinkedShaders[sh];
+
+if (shader && storage->sampler[sh].active) {
+   for (unsigned i = 0; i < storage->array_elements; i++) {
+  unsigned index = storage->sampler[sh].index + i;
+
+  shader->SamplerUnits[index] = storage->storage[i].i;
+   }
+}
  }
}
 } else {
@@ -148,8 +156,17 @@ set_uniform_initializer(void *mem_ctx, gl_shader_program 
*prog,
val->type->base_type,
val->type->components());

-  if (storage->type->is_sampler())
-prog->SamplerUnits[storage->sampler] = storage->storage[0].i;
+  if (storage->type->is_sampler()) {
+ for (int sh = 0; sh < MESA_SHADER_TYPES; sh++) {
+gl_shader *shader = prog->_LinkedShaders[sh];
+
+if (shader && storage->sampler[sh].active) {
+   unsigned index = storage->sampler[sh].index;
+
+   shader->SamplerUnits[index] = storage->storage[0].i;
+}
+ }
+  }
 }

 storage->initi

[Mesa-dev] [PATCH] Haiku: Add Gallium winsys and target code

2013-05-21 Thread Alexander von Gluck IV
* We generate a static library for Haiku
  Gallium targets as our port system combines
  the compiled rendering code into a modular
  ar for each module (for example, our port
  system combines llvm libsoftpipe.a libllvmpipe.a
  into a single ar for the Haiku build system.
  I'd like the Gallium hgl target scons build
  system to do this some day, however how is
  beyond me at the moment. This is a first step.
---
 src/gallium/SConscript |  10 ++
 src/gallium/targets/haiku-softpipe/SConscript  |  21 +++
 .../targets/haiku-softpipe/haiku-softpipe.c|  65 +++
 .../targets/haiku-softpipe/haiku-softpipe.h|  36 
 src/gallium/winsys/sw/hgl/SConscript   |  25 +++
 src/gallium/winsys/sw/hgl/bitmap_wrapper.cpp   | 134 +++
 src/gallium/winsys/sw/hgl/bitmap_wrapper.h |  59 +++
 src/gallium/winsys/sw/hgl/hgl_sw_winsys.c  | 187 +
 src/gallium/winsys/sw/hgl/hgl_sw_winsys.h  |  52 ++
 9 files changed, 589 insertions(+)
 create mode 100644 src/gallium/targets/haiku-softpipe/SConscript
 create mode 100644 src/gallium/targets/haiku-softpipe/haiku-softpipe.c
 create mode 100644 src/gallium/targets/haiku-softpipe/haiku-softpipe.h
 create mode 100644 src/gallium/winsys/sw/hgl/SConscript
 create mode 100644 src/gallium/winsys/sw/hgl/bitmap_wrapper.cpp
 create mode 100644 src/gallium/winsys/sw/hgl/bitmap_wrapper.h
 create mode 100644 src/gallium/winsys/sw/hgl/hgl_sw_winsys.c
 create mode 100644 src/gallium/winsys/sw/hgl/hgl_sw_winsys.h

diff --git a/src/gallium/SConscript b/src/gallium/SConscript
index a3edc65..ca75f37 100644
--- a/src/gallium/SConscript
+++ b/src/gallium/SConscript
@@ -74,6 +74,11 @@ if not env['msvc']:
 'winsys/i915/sw/SConscript',
 ])
 
+if env['platform'] == 'haiku':
+SConscript([
+'winsys/sw/hgl/SConscript',
+])
+
 if env['dri']:
 SConscript([
 'winsys/sw/dri/SConscript',
@@ -114,6 +119,11 @@ if not env['embedded']:
 'targets/libgl-gdi/SConscript',
 ])
 
+if env['platform'] == 'haiku':
+SConscript([
+'targets/haiku-softpipe/SConscript',
+])
+
 if env['dri']:
 SConscript([
 'targets/SConscript.dri',
diff --git a/src/gallium/targets/haiku-softpipe/SConscript 
b/src/gallium/targets/haiku-softpipe/SConscript
new file mode 100644
index 000..72a5ba7
--- /dev/null
+++ b/src/gallium/targets/haiku-softpipe/SConscript
@@ -0,0 +1,21 @@
+Import('*')
+
+if True:
+env.Append(CPPDEFINES = [
+'GALLIUM_SOFTPIPE',
+'GALLIUM_RBUG',
+'GALLIUM_TRACE',
+])
+
+if env['llvm']:
+env.Append(CPPDEFINES = 'HAVE_LLVMPIPE')
+
+softpipe_sources = [
+'haiku-softpipe.c'
+]
+
+module = env.StaticLibrary(
+target ='swpipe_haiku.a',
+source = softpipe_sources,
+SHLIBPREFIX = '',
+)
diff --git a/src/gallium/targets/haiku-softpipe/haiku-softpipe.c 
b/src/gallium/targets/haiku-softpipe/haiku-softpipe.c
new file mode 100644
index 000..d0a427b
--- /dev/null
+++ b/src/gallium/targets/haiku-softpipe/haiku-softpipe.c
@@ -0,0 +1,65 @@
+/**
+ *
+ * Copyright 2013 Alexander von Gluck IV 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ **/
+
+
+#include "haiku-softpipe.h"
+
+#include "util/u_debug.h"
+#include "sw/hgl/hgl_sw_winsys.h"
+
+#include "softpipe/sp_texture.h"
+#include "softpipe/sp_screen.h"
+#include "softpipe/sp_public.h"
+
+#ifdef HAVE_LLVMPIPE
+#include "llvmpipe/lp_texture.h"
+#include "llvmpipe/lp_screen.h"
+#include "llvmpipe/lp_public.h"
+#endif
+
+
+struct pipe_screen*
+hgl_sw_screen_create(void)
+{
+   struct sw_winsys* winsys = hgl_create_sw_winsys();
+   struct pipe_screen* screen = NULL;

Re: [Mesa-dev] [PATCH 5/5] i965 gen7: add support for layered color renderbuffers

2013-05-21 Thread Jordan Justen
On Tue, May 21, 2013 at 1:33 PM, Eric Anholt  wrote:
> Paul Berry  writes:
>
>> On 17 May 2013 21:44, Chia-I Wu  wrote:
>>
>>> On Sat, May 18, 2013 at 10:11 AM, Jordan Justen
>>>  wrote:
>>> > Rather than pointing the surface_state directly at a single
>>> > sub-image of the texture for rendering, we now point the
>>> > surface_state at the top level of the texture, and configure
>>> > the surface_state as needed based on this.
>>> >
>>> > We now also need to stop setting the FORCE_ZERO_RTAINDEX bit
>>> > in the clip date so render target array values other than zero
>>> > will be used.
>>> >
>>> > Signed-off-by: Jordan Justen 
>>> > ---
>>> >  src/mesa/drivers/dri/i965/brw_defines.h   |2 +
>>> >  src/mesa/drivers/dri/i965/gen7_clip_state.c   |3 +-
>>> >  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |   63
>>> +++--
>>> >  3 files changed, 48 insertions(+), 20 deletions(-)
>>> >
>>> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
>>> b/src/mesa/drivers/dri/i965/brw_defines.h
>>> > index fedd78c..d61151f 100644
>>> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
>>> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
>>> > @@ -539,6 +539,8 @@
>>> >  #define GEN7_SURFACE_MULTISAMPLECOUNT_8 (3 << 3)
>>> >  #define GEN7_SURFACE_MSFMT_MSS  (0 << 6)
>>> >  #define GEN7_SURFACE_MSFMT_DEPTH_STENCIL(1 << 6)
>>> > +#define GEN7_SURFACE_MIN_ARRAY_ELEMENT_SHIFT   18
>>> > +#define GEN7_SURFACE_RENDER_TARGET_VIEW_EXTENT_SHIFT   7
>>> >
>>> >  /* Surface state DW5 */
>>> >  #define BRW_SURFACE_X_OFFSET_SHIFT 25
>>> > diff --git a/src/mesa/drivers/dri/i965/gen7_clip_state.c
>>> b/src/mesa/drivers/dri/i965/gen7_clip_state.c
>>> > index 29a5ed5..1256f32 100644
>>> > --- a/src/mesa/drivers/dri/i965/gen7_clip_state.c
>>> > +++ b/src/mesa/drivers/dri/i965/gen7_clip_state.c
>>> > @@ -107,8 +107,7 @@ upload_clip_state(struct brw_context *brw)
>>> >  GEN6_CLIP_XY_TEST |
>>> >   dw2);
>>> > OUT_BATCH(U_FIXED(0.125, 3) << GEN6_CLIP_MIN_POINT_WIDTH_SHIFT |
>>> > - U_FIXED(255.875, 3) << GEN6_CLIP_MAX_POINT_WIDTH_SHIFT |
>>> > - GEN6_CLIP_FORCE_ZERO_RTAINDEX);
>>> > + U_FIXED(255.875, 3) << GEN6_CLIP_MAX_POINT_WIDTH_SHIFT);
>>> > ADVANCE_BATCH();
>>> >  }
>>> >
>>> > diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>>> b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>>> > index 6c01545..5f15eff 100644
>>> > --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>>> > +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>>> > @@ -23,6 +23,7 @@
>>> >  #include "main/mtypes.h"
>>> >  #include "main/blend.h"
>>> >  #include "main/samplerobj.h"
>>> > +#include "main/texformat.h"
>>> >  #include "program/prog_parameter.h"
>>> >
>>> >  #include "intel_mipmap_tree.h"
>>> > @@ -529,12 +530,13 @@ gen7_update_renderbuffer_surface(struct
>>> brw_context *brw,
>>> > struct gl_context *ctx = &intel->ctx;
>>> > struct intel_renderbuffer *irb = intel_renderbuffer(rb);
>>> > struct intel_region *region = irb->mt->region;
>>> > -   uint32_t tile_x, tile_y;
>>> > uint32_t format;
>>> > /* _NEW_BUFFERS */
>>> > gl_format rb_format = _mesa_get_render_format(ctx,
>>> intel_rb_format(irb));
>>> > -
>>> > -   assert(!layered);
>>> > +   uint32_t surftype;
>>> > +   bool is_array = false;
>>> > +   int depth = rb->Depth > 0 ? rb->Depth - 1 : 0;
>>> > +   int min_array_element = 0;
>>> >
>>> > uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
>>> >  8 * 4, 32,
>>> &brw->wm.surf_offset[unit]);
>>> > @@ -550,7 +552,23 @@ gen7_update_renderbuffer_surface(struct brw_context
>>> *brw,
>>> >  __FUNCTION__, _mesa_get_format_name(rb_format));
>>> > }
>>> >
>>> > -   surf[0] = BRW_SURFACE_2D << BRW_SURFACE_TYPE_SHIFT |
>>> > +   if (rb->TexImage) {
>>> > +  surftype = translate_tex_target(rb->TexImage->TexObject->Target);
>>> > +  is_array =
>>> _mesa_tex_target_is_array(rb->TexImage->TexObject->Target);
>>> > +  if (rb->TexImage->TexObject->Target == GL_TEXTURE_CUBE_MAP_ARRAY)
>>> {
>>> > + assert(rb->Depth > 0);
>>> > + surftype = BRW_SURFACE_2D;
>>> > + depth = (6 * (depth + 1)) - 1;
>>> > +  } else if (rb->TexImage->TexObject->Target ==
>>> GL_TEXTURE_CUBE_MAP) {
>>> > + surftype = BRW_SURFACE_2D;
>>> > + depth = 5;
>>> > + is_array = true;
>>> > +  }
>>> > +   } else {
>>> > +  surftype = BRW_SURFACE_2D;
>>> > +   }
>>> > +
>>> > +   surf[0] = surftype << BRW_SURFACE_TYPE_SHIFT |
>>> >   format << BRW_SURFACE_FORMAT_SHIFT |
>>> >   (irb->mt->array_spacing_lod0 ? GEN7_SURFACE_ARYSPC_LOD0
>>> >: GEN7_SURFACE_ARYSPC_FULL) |
>>> > @@ -561,24 +579,33 @@ gen7_update_renderbuffer_surface(struct
>>> brw_context *brw,
>>> > if (irb->mt->align_w == 8)
>>> >

Re: [Mesa-dev] [RFC PATCH] ARB_vp/ARB_fp: accept duplicate precision options

2013-05-21 Thread Eric Anholt
Chris Forbes  writes:

> Relaxes the validation of
>
>OPTION ARB_precision_hint_{nicest,fastest};
>
> to allow duplicate options. The spec says that both /nicest/ and
> /fastest/ cannot be specified together, but could be interpreted
> either way for respecification of the same option.
>
> Other drivers (NVIDIA etc) accept this, and at least one Unity3D game
> expects it to succeed (Kerbal Space Program).

I think this change is well supported by the spec:

Only one precision control option may be specified by any given 
fragment program.  A fragment program that specifies both the
"ARB_precision_hint_fastest" and "ARB_precision_hint_nicest" program
options will fail to load.

with the second sentence clarifying what exactly is meant in the first.

Reviewed-by: Eric Anholt 


pgpSqdDcmFYkj.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/5] i965 gen7: add support for layered color renderbuffers

2013-05-21 Thread Eric Anholt
Paul Berry  writes:

> On 17 May 2013 21:44, Chia-I Wu  wrote:
>
>> On Sat, May 18, 2013 at 10:11 AM, Jordan Justen
>>  wrote:
>> > Rather than pointing the surface_state directly at a single
>> > sub-image of the texture for rendering, we now point the
>> > surface_state at the top level of the texture, and configure
>> > the surface_state as needed based on this.
>> >
>> > We now also need to stop setting the FORCE_ZERO_RTAINDEX bit
>> > in the clip date so render target array values other than zero
>> > will be used.
>> >
>> > Signed-off-by: Jordan Justen 
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_defines.h   |2 +
>> >  src/mesa/drivers/dri/i965/gen7_clip_state.c   |3 +-
>> >  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |   63
>> +++--
>> >  3 files changed, 48 insertions(+), 20 deletions(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
>> b/src/mesa/drivers/dri/i965/brw_defines.h
>> > index fedd78c..d61151f 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
>> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
>> > @@ -539,6 +539,8 @@
>> >  #define GEN7_SURFACE_MULTISAMPLECOUNT_8 (3 << 3)
>> >  #define GEN7_SURFACE_MSFMT_MSS  (0 << 6)
>> >  #define GEN7_SURFACE_MSFMT_DEPTH_STENCIL(1 << 6)
>> > +#define GEN7_SURFACE_MIN_ARRAY_ELEMENT_SHIFT   18
>> > +#define GEN7_SURFACE_RENDER_TARGET_VIEW_EXTENT_SHIFT   7
>> >
>> >  /* Surface state DW5 */
>> >  #define BRW_SURFACE_X_OFFSET_SHIFT 25
>> > diff --git a/src/mesa/drivers/dri/i965/gen7_clip_state.c
>> b/src/mesa/drivers/dri/i965/gen7_clip_state.c
>> > index 29a5ed5..1256f32 100644
>> > --- a/src/mesa/drivers/dri/i965/gen7_clip_state.c
>> > +++ b/src/mesa/drivers/dri/i965/gen7_clip_state.c
>> > @@ -107,8 +107,7 @@ upload_clip_state(struct brw_context *brw)
>> >  GEN6_CLIP_XY_TEST |
>> >   dw2);
>> > OUT_BATCH(U_FIXED(0.125, 3) << GEN6_CLIP_MIN_POINT_WIDTH_SHIFT |
>> > - U_FIXED(255.875, 3) << GEN6_CLIP_MAX_POINT_WIDTH_SHIFT |
>> > - GEN6_CLIP_FORCE_ZERO_RTAINDEX);
>> > + U_FIXED(255.875, 3) << GEN6_CLIP_MAX_POINT_WIDTH_SHIFT);
>> > ADVANCE_BATCH();
>> >  }
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>> b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>> > index 6c01545..5f15eff 100644
>> > --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>> > +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>> > @@ -23,6 +23,7 @@
>> >  #include "main/mtypes.h"
>> >  #include "main/blend.h"
>> >  #include "main/samplerobj.h"
>> > +#include "main/texformat.h"
>> >  #include "program/prog_parameter.h"
>> >
>> >  #include "intel_mipmap_tree.h"
>> > @@ -529,12 +530,13 @@ gen7_update_renderbuffer_surface(struct
>> brw_context *brw,
>> > struct gl_context *ctx = &intel->ctx;
>> > struct intel_renderbuffer *irb = intel_renderbuffer(rb);
>> > struct intel_region *region = irb->mt->region;
>> > -   uint32_t tile_x, tile_y;
>> > uint32_t format;
>> > /* _NEW_BUFFERS */
>> > gl_format rb_format = _mesa_get_render_format(ctx,
>> intel_rb_format(irb));
>> > -
>> > -   assert(!layered);
>> > +   uint32_t surftype;
>> > +   bool is_array = false;
>> > +   int depth = rb->Depth > 0 ? rb->Depth - 1 : 0;
>> > +   int min_array_element = 0;
>> >
>> > uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
>> >  8 * 4, 32,
>> &brw->wm.surf_offset[unit]);
>> > @@ -550,7 +552,23 @@ gen7_update_renderbuffer_surface(struct brw_context
>> *brw,
>> >  __FUNCTION__, _mesa_get_format_name(rb_format));
>> > }
>> >
>> > -   surf[0] = BRW_SURFACE_2D << BRW_SURFACE_TYPE_SHIFT |
>> > +   if (rb->TexImage) {
>> > +  surftype = translate_tex_target(rb->TexImage->TexObject->Target);
>> > +  is_array =
>> _mesa_tex_target_is_array(rb->TexImage->TexObject->Target);
>> > +  if (rb->TexImage->TexObject->Target == GL_TEXTURE_CUBE_MAP_ARRAY)
>> {
>> > + assert(rb->Depth > 0);
>> > + surftype = BRW_SURFACE_2D;
>> > + depth = (6 * (depth + 1)) - 1;
>> > +  } else if (rb->TexImage->TexObject->Target ==
>> GL_TEXTURE_CUBE_MAP) {
>> > + surftype = BRW_SURFACE_2D;
>> > + depth = 5;
>> > + is_array = true;
>> > +  }
>> > +   } else {
>> > +  surftype = BRW_SURFACE_2D;
>> > +   }
>> > +
>> > +   surf[0] = surftype << BRW_SURFACE_TYPE_SHIFT |
>> >   format << BRW_SURFACE_FORMAT_SHIFT |
>> >   (irb->mt->array_spacing_lod0 ? GEN7_SURFACE_ARYSPC_LOD0
>> >: GEN7_SURFACE_ARYSPC_FULL) |
>> > @@ -561,24 +579,33 @@ gen7_update_renderbuffer_surface(struct
>> brw_context *brw,
>> > if (irb->mt->align_w == 8)
>> >surf[0] |= GEN7_SURFACE_HALIGN_8;
>> >
>> > -   /* reloc */
>> > -   surf[1] = intel_renderbuffer_tile_offsets(irb, &tile_x, &tile_y) +
>> > - region->b

Re: [Mesa-dev] Gen6+ transform feedback fully in hardware

2013-05-21 Thread Eric Anholt
Kenneth Graunke  writes:

> This series deletes all the software primitive counting used to compute
> transform feedback buffer offsets, and generally cleans up all of that
> code.  Now that we require hardware contexts on Gen6+, we can just
> initialize the counters to 0 and let the hardware do the counting for us.
>
> This means that we get hardware primitive restart in the majority of cases,
> which should be a lot more efficient.  It also means we're prepared for
> geometry shaders.

Reviewed-by: Eric Anholt 


pgpjhj46iwjie.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Mesa 9.1.3

2013-05-21 Thread Ian Romanick

Mesa 9.1.3 has been released.  Mesa 9.1.3 is a bug fix release which
fixes bugs fixed since the 9.1.2 release.

The tag in the GIT repository for Mesa 9.1.3 is 'mesa-9.1.3'.

Mesa 9.1.3 is available for download at
ftp://freedesktop.org/pub/mesa/9.1.3/

md5sums:

952ccd03547ed72333b64e1746cf8ada  MesaLib-9.1.3.tar.bz2
26d2f1aa8e9db388d51fcbd163c61fb7  MesaLib-9.1.3.tar.gz
7017b7bdf0ebfd39a5c46cee7cf6b567  MesaLib-9.1.3.zip

I have verified building from the .tar.bz2 file by doing:

tar -xjf MesaLib-9.1.3.tar.bz2
cd Mesa-9.1.3
./configure --enable-gallium-llvm --with-llvm-shared-libs
make -j6
make install

I have also verified that I pushed the tag.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 64733] unexpected format GL_DEPTH_COMPONENT16 with I865 driver

2013-05-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=64733

Eric Anholt  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #4 from Eric Anholt  ---
Should be the same bug -- any RB creation with depth.  I never got review on my
patches, so I'm just re-pigliting and pushing them now.

*** This bug has been marked as a duplicate of bug 53562 ***

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 17/17] glsl linker: compare interface blocks during interstage linking

2013-05-21 Thread Ian Romanick

On 05/21/2013 11:44 AM, Jordan Justen wrote:

On Tue, May 21, 2013 at 12:11 AM, Kenneth Graunke  wrote:

On 05/16/2013 05:44 PM, Jordan Justen wrote:


Verify that interface blocks match when linking separate shader
stages into a program.

Fixes piglit glsl-1.50 tests:
* linker/interface-blocks-vs-fs-member-count-mismatch.shader_test
* linker/interface-blocks-vs-fs-member-order-mismatch.shader_test

Signed-off-by: Jordan Justen 
---
   src/glsl/interface_blocks.cpp |   47
+++--
   src/glsl/interface_blocks.h   |4 
   src/glsl/linker.cpp   |6 ++
   3 files changed, 55 insertions(+), 2 deletions(-)



I have a few annoying spec questions:

1. What happens if the producer has output interfaces not used by the next
stage?  Is that legal (like unused varyings)?

For example,
[vertex shader]
out foo { vec4 a };
out bar { int b };

[fragment shader]
in foo { vec4 a };
// FS does not reference bar


I'd been assuming this was allowed, similar to unused varyings.


2. What happens if the consumer has input interfaces not declared in the
previous stage?  Is that legal?  If so, does it get populated with undefined
values?

For example,
[vertex shader]
out foo { vec4 a };
// VS does not declare/write bar

[fragment shader]
in foo { vec4 a };
in bar { int b };

The text in the spec is really obnoxious...it's so hand-wavy and full of
examples and prose...


I'd been assuming that this would be a link error.

Do you think these assumptions are reasonable, or should I investigate
the behavior of other drivers?


Yes and yes.  My assumption is that the existing language in the spec 
about inputs and outputs is supposed to apply to interfaces.  We should 
verify that other drivers have made the same assumptions.  Whatever the 
result, we should also suggest some tighter spec language to Khronos.



-Jordan
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 64733] unexpected format GL_DEPTH_COMPONENT16 with I865 driver

2013-05-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=64733

--- Comment #3 from Ian Romanick  ---
I'm 99% sure that this bug is already fixed in Mesa.  That message sounds
really familiar... I could only find bug #53562, but I would swear there was
something else.

I second Brian's suggestion of using newer Mesa.  I would specifically suggest
the 9.1.2 stable release.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 17/17] glsl linker: compare interface blocks during interstage linking

2013-05-21 Thread Jordan Justen
On Tue, May 21, 2013 at 12:11 AM, Kenneth Graunke  wrote:
> On 05/16/2013 05:44 PM, Jordan Justen wrote:
>>
>> Verify that interface blocks match when linking separate shader
>> stages into a program.
>>
>> Fixes piglit glsl-1.50 tests:
>> * linker/interface-blocks-vs-fs-member-count-mismatch.shader_test
>> * linker/interface-blocks-vs-fs-member-order-mismatch.shader_test
>>
>> Signed-off-by: Jordan Justen 
>> ---
>>   src/glsl/interface_blocks.cpp |   47
>> +++--
>>   src/glsl/interface_blocks.h   |4 
>>   src/glsl/linker.cpp   |6 ++
>>   3 files changed, 55 insertions(+), 2 deletions(-)
>
>
> I have a few annoying spec questions:
>
> 1. What happens if the producer has output interfaces not used by the next
> stage?  Is that legal (like unused varyings)?
>
> For example,
>[vertex shader]
>out foo { vec4 a };
>out bar { int b };
>
>[fragment shader]
>in foo { vec4 a };
>// FS does not reference bar

I'd been assuming this was allowed, similar to unused varyings.

> 2. What happens if the consumer has input interfaces not declared in the
> previous stage?  Is that legal?  If so, does it get populated with undefined
> values?
>
> For example,
>[vertex shader]
>out foo { vec4 a };
>// VS does not declare/write bar
>
>[fragment shader]
>in foo { vec4 a };
>in bar { int b };
>
> The text in the spec is really obnoxious...it's so hand-wavy and full of
> examples and prose...

I'd been assuming that this would be a link error.

Do you think these assumptions are reasonable, or should I investigate
the behavior of other drivers?

-Jordan
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/5] i965 gen7: add support for layered color renderbuffers

2013-05-21 Thread Paul Berry
On 21 May 2013 11:09, Jordan Justen  wrote:

> On Tue, May 21, 2013 at 10:27 AM, Paul Berry 
> wrote:
> > On 17 May 2013 19:11, Jordan Justen  wrote:
> >>
> >> Rather than pointing the surface_state directly at a single
> >> sub-image of the texture for rendering, we now point the
> >> surface_state at the top level of the texture, and configure
> >> the surface_state as needed based on this.
> >>
> >> We now also need to stop setting the FORCE_ZERO_RTAINDEX bit
> >> in the clip date so render target array values other than zero
> >> will be used.
> >
> >
> > Clarifying question to make sure I understand the purpose of this patch:
> it
> > looks to me like you are *not* trying to enable support for gl_Layer (and
> > layered framebuffer attachments) in this patch; you are merely changing
> the
> > way SURFACE_STATE gets configured as a preparatory step, and you'll be
> > adding gl_Layer support later.  Is that correct?
>
> It was my intent to essentially support gl_Layer with this patch. Of
> course, there are still missing pieces before it would be user
> visibible:
> * support layered depth/stencil
> * enable AMD_vertex_shader_layer (or GL 3.2)
>
> > If so, I think the subject line is a little misleading--"add support for"
> > frequently means "add a user-visible feature" and I think someone could
> > easily misunderstand this patch to be adding gl_Layer support.  To
> clarify
> > that there's no user-visible change involved here, perhaps we should say
> > something like "use SURFACE_STATE fields to select level/layer to render
> > to", or "stop using X/Y offsets to select level/layer to render to".
>
> Would you prefer I delay the layered support portions of this patch,
> or just move it to a separate patch in this same series?
>
> I'm guessing you'd prefer to delay the layered support until a future
> series where everything can be enabled and made user-visible.
>

I'm ok with the gl_Layer-related stuff going in now; I was mostly just
confused by the commit subject.  How about if we defer the
gen7_clip_state.c part of the patch until later (since there's a question
about whether it's going to be correct in the long run, and if I'm not
mistaken it's not necessary now), and leave the rest of the patch as is?
If we did that, and changed the subject line to something that doesn't
imply that gl_Layer is finished yet (maybe "prepare for supporting layered
color renderbuffers"?), that would be good enough for me.


>
> >> Signed-off-by: Jordan Justen 
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_defines.h   |2 +
> >>  src/mesa/drivers/dri/i965/gen7_clip_state.c   |3 +-
> >>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |   63
> >> +++--
> >>  3 files changed, 48 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
> >> b/src/mesa/drivers/dri/i965/brw_defines.h
> >> index fedd78c..d61151f 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> >> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> >> @@ -539,6 +539,8 @@
> >>  #define GEN7_SURFACE_MULTISAMPLECOUNT_8 (3 << 3)
> >>  #define GEN7_SURFACE_MSFMT_MSS  (0 << 6)
> >>  #define GEN7_SURFACE_MSFMT_DEPTH_STENCIL(1 << 6)
> >> +#define GEN7_SURFACE_MIN_ARRAY_ELEMENT_SHIFT   18
> >> +#define GEN7_SURFACE_RENDER_TARGET_VIEW_EXTENT_SHIFT   7
> >>
> >>  /* Surface state DW5 */
> >>  #define BRW_SURFACE_X_OFFSET_SHIFT 25
> >> diff --git a/src/mesa/drivers/dri/i965/gen7_clip_state.c
> >> b/src/mesa/drivers/dri/i965/gen7_clip_state.c
> >> index 29a5ed5..1256f32 100644
> >> --- a/src/mesa/drivers/dri/i965/gen7_clip_state.c
> >> +++ b/src/mesa/drivers/dri/i965/gen7_clip_state.c
> >> @@ -107,8 +107,7 @@ upload_clip_state(struct brw_context *brw)
> >>  GEN6_CLIP_XY_TEST |
> >>   dw2);
> >> OUT_BATCH(U_FIXED(0.125, 3) << GEN6_CLIP_MIN_POINT_WIDTH_SHIFT |
> >> - U_FIXED(255.875, 3) << GEN6_CLIP_MAX_POINT_WIDTH_SHIFT |
> >> - GEN6_CLIP_FORCE_ZERO_RTAINDEX);
> >> + U_FIXED(255.875, 3) << GEN6_CLIP_MAX_POINT_WIDTH_SHIFT);
> >
> >
> > This doesn't seem right to me--are you sure this is needed?  My reading
> of
> > the bspec is that min_array_element gets applied after the clipper forces
> > RTA index to 0, so it should still be ok for the clipper to force RTA
> index
> > to 0 when we're rendering to a layer other than 0.  It seems to me that
> the
> > only time we should clear the GEN6_CLIP_FORCE_ZERO_RTAINDEX bit is when
> we
> > want gl_Layer to take effect (which, if my earlier assumption is
> correct, we
> > don't want to do yet).
>
> This goes along with the comment above.
>
> > And even when we do get around to enabling gl_Layer,
> > we won't want to do it unconditionally--we'll only want to enable it when
> > the current draw framebuffer is layered (because gl_Layer is supposed to
> be
> > ignored for non-layered framebuffers).
>
> Hmm. Yeah, I'm not sure that scenario is handled pr

Re: [Mesa-dev] [PATCH 5/5] i965 gen7: add support for layered color renderbuffers

2013-05-21 Thread Jordan Justen
On Tue, May 21, 2013 at 10:27 AM, Paul Berry  wrote:
> On 17 May 2013 19:11, Jordan Justen  wrote:
>>
>> Rather than pointing the surface_state directly at a single
>> sub-image of the texture for rendering, we now point the
>> surface_state at the top level of the texture, and configure
>> the surface_state as needed based on this.
>>
>> We now also need to stop setting the FORCE_ZERO_RTAINDEX bit
>> in the clip date so render target array values other than zero
>> will be used.
>
>
> Clarifying question to make sure I understand the purpose of this patch: it
> looks to me like you are *not* trying to enable support for gl_Layer (and
> layered framebuffer attachments) in this patch; you are merely changing the
> way SURFACE_STATE gets configured as a preparatory step, and you'll be
> adding gl_Layer support later.  Is that correct?

It was my intent to essentially support gl_Layer with this patch. Of
course, there are still missing pieces before it would be user
visibible:
* support layered depth/stencil
* enable AMD_vertex_shader_layer (or GL 3.2)

> If so, I think the subject line is a little misleading--"add support for"
> frequently means "add a user-visible feature" and I think someone could
> easily misunderstand this patch to be adding gl_Layer support.  To clarify
> that there's no user-visible change involved here, perhaps we should say
> something like "use SURFACE_STATE fields to select level/layer to render
> to", or "stop using X/Y offsets to select level/layer to render to".

Would you prefer I delay the layered support portions of this patch,
or just move it to a separate patch in this same series?

I'm guessing you'd prefer to delay the layered support until a future
series where everything can be enabled and made user-visible.

>> Signed-off-by: Jordan Justen 
>> ---
>>  src/mesa/drivers/dri/i965/brw_defines.h   |2 +
>>  src/mesa/drivers/dri/i965/gen7_clip_state.c   |3 +-
>>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |   63
>> +++--
>>  3 files changed, 48 insertions(+), 20 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
>> b/src/mesa/drivers/dri/i965/brw_defines.h
>> index fedd78c..d61151f 100644
>> --- a/src/mesa/drivers/dri/i965/brw_defines.h
>> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
>> @@ -539,6 +539,8 @@
>>  #define GEN7_SURFACE_MULTISAMPLECOUNT_8 (3 << 3)
>>  #define GEN7_SURFACE_MSFMT_MSS  (0 << 6)
>>  #define GEN7_SURFACE_MSFMT_DEPTH_STENCIL(1 << 6)
>> +#define GEN7_SURFACE_MIN_ARRAY_ELEMENT_SHIFT   18
>> +#define GEN7_SURFACE_RENDER_TARGET_VIEW_EXTENT_SHIFT   7
>>
>>  /* Surface state DW5 */
>>  #define BRW_SURFACE_X_OFFSET_SHIFT 25
>> diff --git a/src/mesa/drivers/dri/i965/gen7_clip_state.c
>> b/src/mesa/drivers/dri/i965/gen7_clip_state.c
>> index 29a5ed5..1256f32 100644
>> --- a/src/mesa/drivers/dri/i965/gen7_clip_state.c
>> +++ b/src/mesa/drivers/dri/i965/gen7_clip_state.c
>> @@ -107,8 +107,7 @@ upload_clip_state(struct brw_context *brw)
>>  GEN6_CLIP_XY_TEST |
>>   dw2);
>> OUT_BATCH(U_FIXED(0.125, 3) << GEN6_CLIP_MIN_POINT_WIDTH_SHIFT |
>> - U_FIXED(255.875, 3) << GEN6_CLIP_MAX_POINT_WIDTH_SHIFT |
>> - GEN6_CLIP_FORCE_ZERO_RTAINDEX);
>> + U_FIXED(255.875, 3) << GEN6_CLIP_MAX_POINT_WIDTH_SHIFT);
>
>
> This doesn't seem right to me--are you sure this is needed?  My reading of
> the bspec is that min_array_element gets applied after the clipper forces
> RTA index to 0, so it should still be ok for the clipper to force RTA index
> to 0 when we're rendering to a layer other than 0.  It seems to me that the
> only time we should clear the GEN6_CLIP_FORCE_ZERO_RTAINDEX bit is when we
> want gl_Layer to take effect (which, if my earlier assumption is correct, we
> don't want to do yet).

This goes along with the comment above.

> And even when we do get around to enabling gl_Layer,
> we won't want to do it unconditionally--we'll only want to enable it when
> the current draw framebuffer is layered (because gl_Layer is supposed to be
> ignored for non-layered framebuffers).

Hmm. Yeah, I'm not sure that scenario is handled properly by this.

>>
>> ADVANCE_BATCH();
>>  }
>>
>> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>> b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>> index 6c01545..5f15eff 100644
>> --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>> +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
>> @@ -23,6 +23,7 @@
>>  #include "main/mtypes.h"
>>  #include "main/blend.h"
>>  #include "main/samplerobj.h"
>> +#include "main/texformat.h"
>>  #include "program/prog_parameter.h"
>>
>>  #include "intel_mipmap_tree.h"
>> @@ -529,12 +530,13 @@ gen7_update_renderbuffer_surface(struct brw_context
>> *brw,
>> struct gl_context *ctx = &intel->ctx;
>> struct intel_renderbuffer *irb = intel_renderbuffer(rb);
>> struct intel_region *region = irb->mt->region;

Re: [Mesa-dev] [PATCH 5/5] i965 gen7: add support for layered color renderbuffers

2013-05-21 Thread Paul Berry
On 17 May 2013 19:11, Jordan Justen  wrote:

> Rather than pointing the surface_state directly at a single
> sub-image of the texture for rendering, we now point the
> surface_state at the top level of the texture, and configure
> the surface_state as needed based on this.
>
> We now also need to stop setting the FORCE_ZERO_RTAINDEX bit
> in the clip date so render target array values other than zero
> will be used.
>

Clarifying question to make sure I understand the purpose of this patch: it
looks to me like you are *not* trying to enable support for gl_Layer (and
layered framebuffer attachments) in this patch; you are merely changing the
way SURFACE_STATE gets configured as a preparatory step, and you'll be
adding gl_Layer support later.  Is that correct?

If so, I think the subject line is a little misleading--"add support for"
frequently means "add a user-visible feature" and I think someone could
easily misunderstand this patch to be adding gl_Layer support.  To clarify
that there's no user-visible change involved here, perhaps we should say
something like "use SURFACE_STATE fields to select level/layer to render
to", or "stop using X/Y offsets to select level/layer to render to".


>
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h   |2 +
>  src/mesa/drivers/dri/i965/gen7_clip_state.c   |3 +-
>  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |   63
> +++--
>  3 files changed, 48 insertions(+), 20 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index fedd78c..d61151f 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -539,6 +539,8 @@
>  #define GEN7_SURFACE_MULTISAMPLECOUNT_8 (3 << 3)
>  #define GEN7_SURFACE_MSFMT_MSS  (0 << 6)
>  #define GEN7_SURFACE_MSFMT_DEPTH_STENCIL(1 << 6)
> +#define GEN7_SURFACE_MIN_ARRAY_ELEMENT_SHIFT   18
> +#define GEN7_SURFACE_RENDER_TARGET_VIEW_EXTENT_SHIFT   7
>
>  /* Surface state DW5 */
>  #define BRW_SURFACE_X_OFFSET_SHIFT 25
> diff --git a/src/mesa/drivers/dri/i965/gen7_clip_state.c
> b/src/mesa/drivers/dri/i965/gen7_clip_state.c
> index 29a5ed5..1256f32 100644
> --- a/src/mesa/drivers/dri/i965/gen7_clip_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_clip_state.c
> @@ -107,8 +107,7 @@ upload_clip_state(struct brw_context *brw)
>  GEN6_CLIP_XY_TEST |
>   dw2);
> OUT_BATCH(U_FIXED(0.125, 3) << GEN6_CLIP_MIN_POINT_WIDTH_SHIFT |
> - U_FIXED(255.875, 3) << GEN6_CLIP_MAX_POINT_WIDTH_SHIFT |
> - GEN6_CLIP_FORCE_ZERO_RTAINDEX);
> + U_FIXED(255.875, 3) << GEN6_CLIP_MAX_POINT_WIDTH_SHIFT);
>

This doesn't seem right to me--are you sure this is needed?  My reading of
the bspec is that min_array_element gets applied after the clipper forces
RTA index to 0, so it should still be ok for the clipper to force RTA index
to 0 when we're rendering to a layer other than 0.  It seems to me that the
only time we should clear the GEN6_CLIP_FORCE_ZERO_RTAINDEX bit is when we
want gl_Layer to take effect (which, if my earlier assumption is correct,
we don't want to do yet).  And even when we do get around to enabling
gl_Layer, we won't want to do it unconditionally--we'll only want to enable
it when the current draw framebuffer is layered (because gl_Layer is
supposed to be ignored for non-layered framebuffers).


> ADVANCE_BATCH();
>  }
>
> diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> index 6c01545..5f15eff 100644
> --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> @@ -23,6 +23,7 @@
>  #include "main/mtypes.h"
>  #include "main/blend.h"
>  #include "main/samplerobj.h"
> +#include "main/texformat.h"
>  #include "program/prog_parameter.h"
>
>  #include "intel_mipmap_tree.h"
> @@ -529,12 +530,13 @@ gen7_update_renderbuffer_surface(struct brw_context
> *brw,
> struct gl_context *ctx = &intel->ctx;
> struct intel_renderbuffer *irb = intel_renderbuffer(rb);
> struct intel_region *region = irb->mt->region;
> -   uint32_t tile_x, tile_y;
> uint32_t format;
> /* _NEW_BUFFERS */
> gl_format rb_format = _mesa_get_render_format(ctx,
> intel_rb_format(irb));
> -
> -   assert(!layered);
> +   uint32_t surftype;
> +   bool is_array = false;
> +   int depth = rb->Depth > 0 ? rb->Depth - 1 : 0;
>

As written, the "depth" variable holds the true depth minus one (since
that's what the hardware expects in the "depth" field of SURFACE_STATE).
That makes some of the math below confusing to follow (especially the
formula "depth = (6 * (depth + 1)) - 1;").  I'd recommend storing the true
depth in the "depth" variable, and don't subtract one until you store it in
surf[3].  So this line would become "int depth = MAX(rb->Depth, 1);", and
"depth = (6 * (d

Re: [Mesa-dev] [PATCH 4/5] mesa/texformat: add _mesa_tex_target_is_array function

2013-05-21 Thread Paul Berry
On 17 May 2013 19:11, Jordan Justen  wrote:

> Signed-off-by: Jordan Justen 
>

Patches 1-4 are

Reviewed-by: Paul Berry 


> ---
>  src/mesa/main/texformat.c |   13 +
>  src/mesa/main/texformat.h |2 ++
>  2 files changed, 15 insertions(+)
>
> diff --git a/src/mesa/main/texformat.c b/src/mesa/main/texformat.c
> index ed40b7e..a7df868 100644
> --- a/src/mesa/main/texformat.c
> +++ b/src/mesa/main/texformat.c
> @@ -929,3 +929,16 @@ _mesa_choose_tex_format(struct gl_context *ctx,
> GLenum target,
> return MESA_FORMAT_NONE;
>  }
>
> +GLboolean
> +_mesa_tex_target_is_array(GLenum target)
> +{
> +   switch (target) {
> +   case GL_TEXTURE_1D_ARRAY_EXT:
> +   case GL_TEXTURE_2D_ARRAY_EXT:
> +   case GL_TEXTURE_2D_MULTISAMPLE_ARRAY:
> +   case GL_TEXTURE_CUBE_MAP_ARRAY:
> +  return GL_TRUE;
> +   default:
> +  return GL_FALSE;
> +   }
> +}
> diff --git a/src/mesa/main/texformat.h b/src/mesa/main/texformat.h
> index efe2699..d6ff541 100644
> --- a/src/mesa/main/texformat.h
> +++ b/src/mesa/main/texformat.h
> @@ -36,5 +36,7 @@ extern gl_format
>  _mesa_choose_tex_format(struct gl_context *ctx, GLenum target,
>  GLint internalFormat, GLenum format, GLenum type);
>
> +extern GLboolean
> +_mesa_tex_target_is_array(GLenum target);
>
>  #endif
> --
> 1.7.10.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 64733] unexpected format GL_DEPTH_COMPONENT16 with I865 driver

2013-05-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=64733

Brian Paul  changed:

   What|Removed |Added

Summary|Asked To Report Here|unexpected format
   ||GL_DEPTH_COMPONENT16 with
   ||I865 driver

--- Comment #2 from Brian Paul  ---
Renaming this bug.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 64733] Asked To Report Here

2013-05-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=64733

--- Comment #1 from Brian Paul  ---
Mesa 8.0.4 is getting a bit old. Can you possibly test a newer version?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/5] i965 gen7: add support for layered color renderbuffers

2013-05-21 Thread Paul Berry
On 20 May 2013 20:13, Chia-I Wu  wrote:

> On Tue, May 21, 2013 at 5:08 AM, Paul Berry 
> wrote:
> > On 17 May 2013 21:44, Chia-I Wu  wrote:
> >>
> >> On Sat, May 18, 2013 at 10:11 AM, Jordan Justen
> >>  wrote:
> >> > Rather than pointing the surface_state directly at a single
> >> > sub-image of the texture for rendering, we now point the
> >> > surface_state at the top level of the texture, and configure
> >> > the surface_state as needed based on this.
> >> >
> >> > We now also need to stop setting the FORCE_ZERO_RTAINDEX bit
> >> > in the clip date so render target array values other than zero
> >> > will be used.
> >> >
> >> > Signed-off-by: Jordan Justen 
> >> > ---
> >> >  src/mesa/drivers/dri/i965/brw_defines.h   |2 +
> >> >  src/mesa/drivers/dri/i965/gen7_clip_state.c   |3 +-
> >> >  src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |   63
> >> > +++--
> >> >  3 files changed, 48 insertions(+), 20 deletions(-)
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
> >> > b/src/mesa/drivers/dri/i965/brw_defines.h
> >> > index fedd78c..d61151f 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> >> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> >> > @@ -539,6 +539,8 @@
> >> >  #define GEN7_SURFACE_MULTISAMPLECOUNT_8 (3 << 3)
> >> >  #define GEN7_SURFACE_MSFMT_MSS  (0 << 6)
> >> >  #define GEN7_SURFACE_MSFMT_DEPTH_STENCIL(1 << 6)
> >> > +#define GEN7_SURFACE_MIN_ARRAY_ELEMENT_SHIFT   18
> >> > +#define GEN7_SURFACE_RENDER_TARGET_VIEW_EXTENT_SHIFT   7
> >> >
> >> >  /* Surface state DW5 */
> >> >  #define BRW_SURFACE_X_OFFSET_SHIFT 25
> >> > diff --git a/src/mesa/drivers/dri/i965/gen7_clip_state.c
> >> > b/src/mesa/drivers/dri/i965/gen7_clip_state.c
> >> > index 29a5ed5..1256f32 100644
> >> > --- a/src/mesa/drivers/dri/i965/gen7_clip_state.c
> >> > +++ b/src/mesa/drivers/dri/i965/gen7_clip_state.c
> >> > @@ -107,8 +107,7 @@ upload_clip_state(struct brw_context *brw)
> >> >  GEN6_CLIP_XY_TEST |
> >> >   dw2);
> >> > OUT_BATCH(U_FIXED(0.125, 3) << GEN6_CLIP_MIN_POINT_WIDTH_SHIFT |
> >> > - U_FIXED(255.875, 3) << GEN6_CLIP_MAX_POINT_WIDTH_SHIFT |
> >> > - GEN6_CLIP_FORCE_ZERO_RTAINDEX);
> >> > + U_FIXED(255.875, 3) << GEN6_CLIP_MAX_POINT_WIDTH_SHIFT);
> >> > ADVANCE_BATCH();
> >> >  }
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> >> > b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> >> > index 6c01545..5f15eff 100644
> >> > --- a/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> >> > +++ b/src/mesa/drivers/dri/i965/gen7_wm_surface_state.c
> >> > @@ -23,6 +23,7 @@
> >> >  #include "main/mtypes.h"
> >> >  #include "main/blend.h"
> >> >  #include "main/samplerobj.h"
> >> > +#include "main/texformat.h"
> >> >  #include "program/prog_parameter.h"
> >> >
> >> >  #include "intel_mipmap_tree.h"
> >> > @@ -529,12 +530,13 @@ gen7_update_renderbuffer_surface(struct
> >> > brw_context *brw,
> >> > struct gl_context *ctx = &intel->ctx;
> >> > struct intel_renderbuffer *irb = intel_renderbuffer(rb);
> >> > struct intel_region *region = irb->mt->region;
> >> > -   uint32_t tile_x, tile_y;
> >> > uint32_t format;
> >> > /* _NEW_BUFFERS */
> >> > gl_format rb_format = _mesa_get_render_format(ctx,
> >> > intel_rb_format(irb));
> >> > -
> >> > -   assert(!layered);
> >> > +   uint32_t surftype;
> >> > +   bool is_array = false;
> >> > +   int depth = rb->Depth > 0 ? rb->Depth - 1 : 0;
> >> > +   int min_array_element = 0;
> >> >
> >> > uint32_t *surf = brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
> >> >  8 * 4, 32,
> >> > &brw->wm.surf_offset[unit]);
> >> > @@ -550,7 +552,23 @@ gen7_update_renderbuffer_surface(struct
> brw_context
> >> > *brw,
> >> >  __FUNCTION__, _mesa_get_format_name(rb_format));
> >> > }
> >> >
> >> > -   surf[0] = BRW_SURFACE_2D << BRW_SURFACE_TYPE_SHIFT |
> >> > +   if (rb->TexImage) {
> >> > +  surftype =
> translate_tex_target(rb->TexImage->TexObject->Target);
> >> > +  is_array =
> >> > _mesa_tex_target_is_array(rb->TexImage->TexObject->Target);
> >> > +  if (rb->TexImage->TexObject->Target ==
> GL_TEXTURE_CUBE_MAP_ARRAY)
> >> > {
> >> > + assert(rb->Depth > 0);
> >> > + surftype = BRW_SURFACE_2D;
> >> > + depth = (6 * (depth + 1)) - 1;
> >> > +  } else if (rb->TexImage->TexObject->Target ==
> >> > GL_TEXTURE_CUBE_MAP) {
> >> > + surftype = BRW_SURFACE_2D;
> >> > + depth = 5;
> >> > + is_array = true;
> >> > +  }
> >> > +   } else {
> >> > +  surftype = BRW_SURFACE_2D;
> >> > +   }
> >> > +
> >> > +   surf[0] = surftype << BRW_SURFACE_TYPE_SHIFT |
> >> >   format << BRW_SURFACE_FORMAT_SHIFT |
> >> >   (irb->mt->array_spacing_lod0 ? GEN7_SURFACE_ARYSPC_LOD0
> >> >:

Re: [Mesa-dev] [PATCH 7/7] i965: Split BeginTransformFeedback hook into Gen6 and Gen7+ variants.

2013-05-21 Thread Paul Berry
On 20 May 2013 15:54, Kenneth Graunke  wrote:

> Most of the work in BeginTransformFeedback is only necessary on Gen6.
> We may as well just skip it on Gen7+.
>
> Signed-off-by: Kenneth Graunke 
> Cc: Eric Anholt 
> Cc: Paul Berry 
> ---
>  src/mesa/drivers/dri/i965/brw_context.c|  8 +++---
>  src/mesa/drivers/dri/i965/brw_context.h|  3 +++
>  src/mesa/drivers/dri/i965/gen6_sol.c   | 41
> +++---
>  src/mesa/drivers/dri/i965/gen7_sol_state.c | 17 +
>  4 files changed, 40 insertions(+), 29 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c
> b/src/mesa/drivers/dri/i965/brw_context.c
> index 405580f..56c42ba 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -94,12 +94,14 @@ static void brwInitDriverFunctions(struct intel_screen
> *screen,
>gen4_init_queryobj_functions(functions);
>
> functions->QuerySamplesForFormat = brw_query_samples_for_format;
> -   functions->BeginTransformFeedback = brw_begin_transform_feedback;
>
> -   if (screen->gen >= 7)
> +   if (screen->gen >= 7) {
> +  functions->BeginTransformFeedback = gen7_begin_transform_feedback;
>functions->EndTransformFeedback = gen7_end_transform_feedback;
> -   else
> +   } else {
> +  functions->BeginTransformFeedback = brw_begin_transform_feedback;
>functions->EndTransformFeedback = brw_end_transform_feedback;
> +   }
>
> if (screen->gen >= 6)
>functions->GetSamplePosition = gen6_get_sample_position;
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h
> b/src/mesa/drivers/dri/i965/brw_context.h
> index be49078..60b713d 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1228,6 +1228,9 @@ brw_end_transform_feedback(struct gl_context *ctx,
>
>  /* gen7_sol_state.c */
>  void
> +gen7_begin_transform_feedback(struct gl_context *ctx, GLenum mode,
> +  struct gl_transform_feedback_object *obj);
> +void
>  gen7_end_transform_feedback(struct gl_context *ctx,
> struct gl_transform_feedback_object *obj);
>
> diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c
> b/src/mesa/drivers/dri/i965/gen6_sol.c
> index cdd6e74..8e197a1 100644
> --- a/src/mesa/drivers/dri/i965/gen6_sol.c
> +++ b/src/mesa/drivers/dri/i965/gen6_sol.c
> @@ -152,36 +152,25 @@ brw_begin_transform_feedback(struct gl_context *ctx,
> GLenum mode,
>= _mesa_compute_max_transform_feedback_vertices(xfb_obj,
>linked_xfb_info);
>
> -   if (intel->gen == 6) {
>

Can we replace this with "assert(intel->gen == 6)" just to document that
this is a Gen6-only function now?

With that change, the series is:

Reviewed-by: Paul Berry 


> -  /* Initialize the SVBI 0 register to zero and set the maximum
> index. */
> +   /* Initialize the SVBI 0 register to zero and set the maximum index. */
> +   BEGIN_BATCH(4);
> +   OUT_BATCH(_3DSTATE_GS_SVB_INDEX << 16 | (4 - 2));
> +   OUT_BATCH(0); /* SVBI 0 */
> +   OUT_BATCH(0); /* starting index */
> +   OUT_BATCH(max_index);
> +   ADVANCE_BATCH();
> +
> +   /* Initialize the rest of the unused streams to sane values.
>  Otherwise,
> +* they may indicate that there is no room to write data and prevent
> +* anything from happening at all.
> +*/
> +   for (int i = 1; i < 4; i++) {
>BEGIN_BATCH(4);
>OUT_BATCH(_3DSTATE_GS_SVB_INDEX << 16 | (4 - 2));
> -  OUT_BATCH(0); /* SVBI 0 */
> +  OUT_BATCH(i << SVB_INDEX_SHIFT);
>OUT_BATCH(0); /* starting index */
> -  OUT_BATCH(max_index);
> +  OUT_BATCH(0x);
>ADVANCE_BATCH();
> -
> -  /* Initialize the rest of the unused streams to sane values.
>  Otherwise,
> -   * they may indicate that there is no room to write data and prevent
> -   * anything from happening at all.
> -   */
> -  for (int i = 1; i < 4; i++) {
> - BEGIN_BATCH(4);
> - OUT_BATCH(_3DSTATE_GS_SVB_INDEX << 16 | (4 - 2));
> - OUT_BATCH(i << SVB_INDEX_SHIFT);
> - OUT_BATCH(0); /* starting index */
> - OUT_BATCH(0x);
> - ADVANCE_BATCH();
> -  }
> -   } else if (intel->gen >= 7) {
> -  /* Reset the SOL buffer offset register. */
> -  for (int i = 0; i < 4; i++) {
> - BEGIN_BATCH(3);
> - OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
> - OUT_BATCH(GEN7_SO_WRITE_OFFSET(i));
> - OUT_BATCH(0);
> - ADVANCE_BATCH();
> -  }
> }
>  }
>
> diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c
> b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> index 2c4b7f9..8dfac01 100644
> --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
> @@ -254,6 +254,23 @@ const struct brw_tracked_state gen7_sol_state = {
>  };
>
>  void
> +gen7_begin_transform_feedback(struct gl_context *ctx, GLenum mode,
> +  

Re: [Mesa-dev] [PATCH 2/8] mesa: Use accessor for stencil reference values in glGet

2013-05-21 Thread Brian Paul

On 05/20/2013 08:59 PM, Kenneth Graunke wrote:

On 05/20/2013 04:19 PM, Brian Paul wrote:

On 05/13/2013 05:10 AM, Chris Forbes wrote:

Signed-off-by: Chris Forbes
---
src/mesa/main/get.c | 6 +-
src/mesa/main/get_hash_params.py | 2 +-
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index f498118..593c75b 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -36,6 +36,7 @@
#include "texcompress.h"
#include "framebuffer.h"
#include "samplerobj.h"
+#include "stencil.h"

/* This is a table driven implemetation of the glGet*v() functions.
* The basic idea is that most getters just look up an int somewhere
@@ -675,7 +676,10 @@ find_custom_value(struct gl_context *ctx, const
struct value_desc *d, union valu
v->value_enum = ctx->Stencil.ZPassFunc[ctx->Stencil.ActiveFace];
break;
case GL_STENCIL_REF:
- v->value_int = ctx->Stencil.Ref[ctx->Stencil.ActiveFace];
+ v->value_int = _mesa_get_stencil_ref(ctx,
ctx->Stencil.ActiveFace);


I think this is incorrect. I tested with NVIDIA's driver and it returns
the user's originally-specified value without any sort of masking.

Can you revert this chunk?

Sorry I missed this earlier, I was on vacation.

-Brian


Chris did notice that, but I think nVidia's implementation is
incorrect. The specification is pretty clear that it should be clamped.

It might be worth trying AMD. If both return the unclamped value, we
might want to file a spec bug.


Yeah, it would be good to know what another implementation does.  It 
doesn't seem right that clamping/masking should be done for glGet.


-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] r600g/llvm: fix sample cube shadow

2013-05-21 Thread Vincent Lejeune
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 3f7e79f..f49170d 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -654,7 +654,8 @@ void radeon_llvm_emit_prepare_cube_coords(
opcode == TGSI_OPCODE_TXB2 ||
opcode == TGSI_OPCODE_TXL2) {
coords[3] = coords_arg[4];
-   } else if (opcode == TGSI_OPCODE_TXB ||
+   } else if (opcode == TGSI_OPCODE_TEX ||
+   opcode == TGSI_OPCODE_TXB ||
opcode == TGSI_OPCODE_TXL) {
coords[3] = coords_arg[3];
}
-- 
1.8.2.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] r600g/llvm: fix txq for texture buffer

2013-05-21 Thread Vincent Lejeune
---
 src/gallium/drivers/r600/r600_llvm.c | 7 +--
 src/gallium/drivers/r600/r600_shader.c   | 1 +
 src/gallium/drivers/radeon/radeon_llvm.h | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_llvm.c 
b/src/gallium/drivers/r600/r600_llvm.c
index c1809b3..77c6abb 100644
--- a/src/gallium/drivers/r600/r600_llvm.c
+++ b/src/gallium/drivers/r600/r600_llvm.c
@@ -23,6 +23,7 @@
 #define CONSTANT_BUFFER_0_ADDR_SPACE 8
 #define CONSTANT_BUFFER_1_ADDR_SPACE (CONSTANT_BUFFER_0_ADDR_SPACE + 
R600_UCP_CONST_BUFFER)
 #define CONSTANT_TXQ_BUFFER (CONSTANT_BUFFER_0_ADDR_SPACE + 
R600_TXQ_CONST_BUFFER)
+#define LLVM_R600_BUFFER_INFO_CONST_BUFFER (CONSTANT_BUFFER_0_ADDR_SPACE + 
R600_BUFFER_INFO_CONST_BUFFER)
 
 static LLVMValueRef llvm_load_const_buffer(
struct lp_build_tgsi_context * bld_base,
@@ -410,8 +411,10 @@ static void llvm_emit_tex(
if (emit_data->inst->Texture.Texture == TGSI_TEXTURE_BUFFER) {
switch (emit_data->inst->Instruction.Opcode) {
case TGSI_OPCODE_TXQ: {
-   LLVMValueRef offset = 
lp_build_const_int32(bld_base->base.gallivm, 1);
-   LLVMValueRef cvecval = llvm_load_const_buffer(bld_base, 
offset, R600_BUFFER_INFO_CONST_BUFFER);
+   struct radeon_llvm_context * ctx = 
radeon_llvm_context(bld_base);
+   ctx->uses_tex_buffers = true;
+   LLVMValueRef offset = 
lp_build_const_int32(bld_base->base.gallivm, 0);
+   LLVMValueRef cvecval = llvm_load_const_buffer(bld_base, 
offset, LLVM_R600_BUFFER_INFO_CONST_BUFFER);
emit_data->output[0] = cvecval;
return;
}
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 81ed3ce..2f126c6 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1170,6 +1170,7 @@ static int r600_shader_from_tgsi(struct r600_screen 
*rscreen,
radeon_llvm_ctx.alpha_to_one = key.alpha_to_one;
mod = r600_tgsi_llvm(&radeon_llvm_ctx, tokens);
ctx.shader->has_txq_cube_array_z_comp = 
radeon_llvm_ctx.has_txq_cube_array_z_comp;
+   ctx.shader->uses_tex_buffers = radeon_llvm_ctx.uses_tex_buffers;
 
if (r600_llvm_compile(mod, rscreen->family, ctx.bc, &use_kill, 
dump)) {
radeon_llvm_dispose(&radeon_llvm_ctx);
diff --git a/src/gallium/drivers/radeon/radeon_llvm.h 
b/src/gallium/drivers/radeon/radeon_llvm.h
index 14a8c34..345ae70 100644
--- a/src/gallium/drivers/radeon/radeon_llvm.h
+++ b/src/gallium/drivers/radeon/radeon_llvm.h
@@ -67,6 +67,7 @@ struct radeon_llvm_context {
unsigned fs_color_all;
unsigned alpha_to_one;
unsigned has_txq_cube_array_z_comp;
+   unsigned uses_tex_buffers;
 
/*=== Front end configuration ===*/
 
-- 
1.8.2.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 64745] [llvmpipe] SIGSEGV src/gallium/state_trackers/glx/xlib/glx_api.c:1374

2013-05-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=64745

Brian Paul  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Brian Paul  ---
OK, patches to glxinfo and Mesa have been pushed.  I think we can close this
bug, but as to why context creation is failing - that may be another issue.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] xlib: check for null ctx pointer in glXIsDirect()

2013-05-21 Thread Jose Fonseca


- Original Message -
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64745
> Note: This is a candidate for the stable branches.
> ---
>  src/mesa/drivers/x11/fakeglx.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/src/mesa/drivers/x11/fakeglx.c b/src/mesa/drivers/x11/fakeglx.c
> index daed4f4..c7fb327 100644
> --- a/src/mesa/drivers/x11/fakeglx.c
> +++ b/src/mesa/drivers/x11/fakeglx.c
> @@ -1551,7 +1551,7 @@ Fake_glXIsDirect( Display *dpy, GLXContext ctx )
>  {
> struct fake_glx_context *glxCtx = (struct fake_glx_context *) ctx;
> (void) dpy;
> -   return glxCtx->xmesaContext->direct;
> +   return glxCtx ? glxCtx->xmesaContext->direct : False;
>  }
>  
>  
> --
> 1.7.3.4
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 

For series
Reviewed-by: Jose Fonseca 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeon: Initialize variables in radeon_llvm_context_init.

2013-05-21 Thread Michel Dänzer
On Sam, 2013-05-18 at 00:28 -0700, Vinson Lee wrote:
> 'type' was not fully initialized when calling lp_build_context_init.
> 
> Fixes "Uninitialized scalar variable" defect reported by Coverity.
> 
> Signed-off-by: Vinson Lee 
> ---
>  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index 0629b89..972655e 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -1172,7 +1172,9 @@ void radeon_llvm_context_init(struct 
> radeon_llvm_context * ctx)
> /* XXX: We need to revisit this.I think the correct way to do this is
>  * to use length = 4 here and use the elem_bld for everything. */
> type.floating = TRUE;
> +   type.fixed = FALSE;
> type.sign = TRUE;
> +   type.norm = FALSE;
> type.width = 32;
> type.length = 1;

Might be safer to use memset? Either way:

NOTE: This is a candidate for the stable branches.

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 17/17] glsl linker: compare interface blocks during interstage linking

2013-05-21 Thread Kenneth Graunke

On 05/16/2013 05:44 PM, Jordan Justen wrote:

Verify that interface blocks match when linking separate shader
stages into a program.

Fixes piglit glsl-1.50 tests:
* linker/interface-blocks-vs-fs-member-count-mismatch.shader_test
* linker/interface-blocks-vs-fs-member-order-mismatch.shader_test

Signed-off-by: Jordan Justen 
---
  src/glsl/interface_blocks.cpp |   47 +++--
  src/glsl/interface_blocks.h   |4 
  src/glsl/linker.cpp   |6 ++
  3 files changed, 55 insertions(+), 2 deletions(-)


I have a few annoying spec questions:

1. What happens if the producer has output interfaces not used by the 
next stage?  Is that legal (like unused varyings)?


For example,
   [vertex shader]
   out foo { vec4 a };
   out bar { int b };

   [fragment shader]
   in foo { vec4 a };
   // FS does not reference bar

2. What happens if the consumer has input interfaces not declared in the 
previous stage?  Is that legal?  If so, does it get populated with 
undefined values?


For example,
   [vertex shader]
   out foo { vec4 a };
   // VS does not declare/write bar

   [fragment shader]
   in foo { vec4 a };
   in bar { int b };

The text in the spec is really obnoxious...it's so hand-wavy and full of 
examples and prose...

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] intel_fbo: set gl_renderbuffer Depth field

2013-05-21 Thread Jordan Justen
On Mon, May 20, 2013 at 11:30 PM, Pohjolainen, Topi
 wrote:
> On Fri, May 17, 2013 at 07:11:36PM -0700, Jordan Justen wrote:
>> Set the renderbuffer's Depth field to match the texture's
>> Depth when rendering to a texture.
>>
>> Signed-off-by: Jordan Justen 
>> ---
>>  src/mesa/drivers/dri/intel/intel_fbo.c |9 ++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c 
>> b/src/mesa/drivers/dri/intel/intel_fbo.c
>> index a8a7ab3..243c00a 100644
>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>> @@ -482,14 +482,17 @@ intel_framebuffer_renderbuffer(struct gl_context * ctx,
>>  static bool
>>  intel_renderbuffer_update_wrapper(struct intel_context *intel,
>>struct intel_renderbuffer *irb,
>> -   struct gl_texture_image *image,
>> -  uint32_t layer)
>> +  struct gl_texture_image *image,
>> +  uint32_t layer,
>> +  bool layered)
>
> This belongs to the next patch in the series, right?

Nope, that is a different interface.

But, actually I need to remove the layered parameter here since I'm
not using it anymore. Originally, I was only setting rb->Depth (below)
if it was a layered renderbuffer.

-Jordan

>
>>  {
>> struct gl_renderbuffer *rb = &irb->Base.Base;
>> struct intel_texture_image *intel_image = intel_texture_image(image);
>> struct intel_mipmap_tree *mt = intel_image->mt;
>> int level = image->Level;
>>
>> +   rb->Depth = image->Depth;
>> +
>> rb->AllocStorage = intel_nop_alloc_storage;
>>
>> intel_miptree_check_level_layer(mt, level, layer);
>> @@ -598,7 +601,7 @@ intel_render_texture(struct gl_context * ctx,
>>
>> intel_miptree_check_level_layer(mt, att->TextureLevel, layer);
>>
>> -   if (!intel_renderbuffer_update_wrapper(intel, irb, image, layer)) {
>> +   if (!intel_renderbuffer_update_wrapper(intel, irb, image, layer, 
>> att->Layered)) {
>> _swrast_render_texture(ctx, fb, att);
>> return;
>> }
>> --
>> 1.7.10.4
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev