[Mesa-dev] [PATCH] i965: Tidy shader time printing code by using printf's field widths.

2013-03-28 Thread Kenneth Graunke
We can use %-6s%-6s rather than manually counting characters, resulting
in much more readable code.

This necessitates a small secondary change: using "total fs16" and ""
now causes the "" string to be padded out to 6 characters, resulting in
too much whitespace.  Splitting it into "total" and "fs16" produces the
same output as before.
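Here is a small standalone check that is not part of the patch. It renders the prefix both ways with `snprintf` so the outputs can be compared. `old_prefix` mirrors the removed counting loops and `new_prefix` mirrors the field-width version; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to buf and return the number of characters
 * written, like printf() does for stdout. Assumes buf is big enough. */
static int append(char *buf, size_t size, const char *fmt, ...)
{
   size_t used = strlen(buf);
   va_list ap;

   va_start(ap, fmt);
   int n = vsnprintf(buf + used, size - used, fmt, ap);
   va_end(ap);
   return n;
}

/* The removed manual-padding logic, writing into buf instead of stdout. */
static void old_prefix(char *buf, size_t size, const char *stage, const char *name)
{
   buf[0] = '\0';
   int namelen = append(buf, size, "%s ", stage);
   while (namelen++ < 6)
      append(buf, size, " ");
   namelen += append(buf, size, "%s ", name);
   while (namelen++ < 13)
      append(buf, size, " ");
}

/* The replacement: left-justify each string in a 6-character field. */
static void new_prefix(char *buf, size_t size, const char *stage, const char *name)
{
   snprintf(buf, size, "%-6s%-6s", stage, name);
}
```

With these helpers, `old_prefix(..., "total fs16", "")` and `new_prefix(..., "total", "fs16")` both produce `"total fs16  "`. By contrast, `new_prefix(..., "total fs16", "")` pads the empty name out to six spaces, which is the extra whitespace the commit message describes.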

Cc: Eric Anholt 
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_program.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

What in the world?  Loops? :)

Want to squash this in with your patch?

diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
index d7606e2..87986a9 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -280,15 +280,7 @@ static void
 print_shader_time_line(const char *stage, const char *name,
int shader_num, uint64_t time, uint64_t total)
 {
-   int namelen = printf("%s ", stage);
-
-   while (namelen++ < 6)
-  printf(" ");
-
-   namelen += printf("%s ", name);
-
-   while (namelen++ < 13)
-  printf(" ");
+   printf("%-6s%-6s", stage, name);
 
if (shader_num != -1)
   printf("%4d: ", shader_num);
@@ -429,9 +421,9 @@ brw_report_shader_time(struct brw_context *brw)
}
 
printf("\n");
-   print_shader_time_line("total vs", "", -1, total_by_type[ST_VS], total);
-   print_shader_time_line("total fs8", "", -1, total_by_type[ST_FS8], total);
-   print_shader_time_line("total fs16", "", -1, total_by_type[ST_FS16], total);
+   print_shader_time_line("total", "vs", -1, total_by_type[ST_VS], total);
+   print_shader_time_line("total", "fs8", -1, total_by_type[ST_FS8], total);
+   print_shader_time_line("total", "fs16", -1, total_by_type[ST_FS16], total);
 }
 
 static void
-- 
1.8.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Haiku using the Wayland state_tracker?

2013-03-28 Thread Pekka Paalanen
On Wed, 27 Mar 2013 15:45:23 -0500
Alexander von Gluck IV  wrote:

> On 03/27/2013 2:37 am, Pekka Paalanen wrote:
> > On Tue, 26 Mar 2013 16:47:15 -0500
> > Alexander von Gluck IV  wrote:
> > 
> >> I've been hitting several brick walls working on the Haiku GL stuff
> >> (mostly due to things being too complex for the time I have available).
> >> 
> >> Given all the recent publicity for Wayland, I decided to look into the
> >> wl_shm stuff.  From my point of view wl_shm looks *extremely* simple.
> >> 
> >> Does anyone see any reason *not* to use the wayland state_tracker for
> >> Haiku's GL rendering? The only possible downside I see is using EGL vs.
> >> OpenGL. (I admit I don't fully understand the pros and cons of EGL.)
> > 
> > I'm not familiar with the "wayland state_tracker", so I can only
> > comment from the Wayland protocol perspective. I'm not sure what you
> > intend to do with wl_shm, either. Are you implementing Wayland
> > platform support?
> 
> To be honest, we're not related to Wayland in any way.  I'm looking for
> a simple interface between OS and GL rendering. The Xorg interface to
> Mesa and Gallium always seemed like overkill for our needs,

Ah, well, I think this is not exactly the interface you are looking for.

The Wayland protocol interfaces, wl_shm and wl_drm, exist only to
facilitate buffer passing between processes with the minimum amount of
information sent over the Wayland wire. In other words, they push not
pixels through the wire but handles and metadata.

To make that actually work, especially wl_drm, they rely on the OS
magic to let it happen. In our case, it is the kernel DRM, DRI2, and
probably some other acronyms I'm not completely familiar with. So what
you see in the Wayland protocol is just the skin of buffer passing, all
the crucial details are hidden below in the OS graphics stack, like how
to turn a GEM flink name into a buffer you can actually do something
with (or whatever).
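For intuition about the wl_shm side of this, here is a minimal POSIX sketch of the shared-memory idea underneath it. It makes no Wayland calls, and the helper names are hypothetical. One side creates a file, both sides map the same fd with `MAP_SHARED`, and pixels written through one mapping are visible through the other. Only the fd and metadata ever need to cross the wire; in a real client the fd would be handed to the compositor via `wl_shm_create_pool()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an unlinked temporary file of `size` bytes and return its fd,
 * or -1 on failure. This fd is what a client would share. */
static int create_shm_file(size_t size)
{
   char path[] = "/tmp/shm-sketch-XXXXXX";
   int fd = mkstemp(path);

   if (fd < 0)
      return -1;
   unlink(path);  /* the open fd keeps the storage alive */
   if (ftruncate(fd, (off_t) size) < 0) {
      close(fd);
      return -1;
   }
   return fd;
}

/* Map the whole file shared and read/write, as both processes would. */
static uint32_t *map_pixels(int fd, size_t size)
{
   void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
   return p == MAP_FAILED ? NULL : (uint32_t *) p;
}
```

This only works because the memory is directly CPU-accessible. As noted elsewhere in this thread, GPU buffers generally need a different mechanism (wl_drm in Mesa's case).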

I'm guessing that there is no magic way to make things work, other than
actually inventing your own magic to begin with, if it doesn't exist
yet.


Thanks,
pq

> > If you are only ever going to have software rendered GL, then I guess
> > you might use wl_shm. If you have any reason to believe you might ever
> > want hardware accelerated GL, then wl_shm won't work. (Actually, you
> > probably want to choose between wl_shm and something else according to
> > your renderer. Maybe.)
> 
> I noticed that the native Wayland code chooses dri or shm based on the
> rendering needs. That's fine for us (although we don't have any of the
> dri stuff ported or wrapped yet).
> 
> > wl_shm basically deals with mmappable files, i.e. directly
> > CPU-accessible memory. Buffers suitable for hardware rendering or
> > texturing are often not CPU-accessible, or extremely slow for that.
> > Conversely, CPU-accessible memory is often not usable for GPU, or is
> > slow. And you really don't want to have extra copies between CPU and
> > GPU memory, especially just for buffer passing.
> > 
> > Mesa contains another Wayland protocol interface used for hardware
> > accelerated graphics buffers: wl_drm.
> > 
> > Also, EGL vs. OpenGL is like comparing a bucket to paint. EGL is just
> > one form of a bucket, that can give you OpenGL as the paint. There are
> > other buckets, and other paints, and you cannot use a bucket as paint,
> > nor paint as a bucket. Probably I just didn't understand what you are
> > actually comparing here. (And sorry for the bad analogy :-p)
> 
> This is actually a good example.  Our (Haiku's) GL rendering with Mesa
> swrast works fine at the moment. We also have an in-development Gallium
> driver (using llvmpipe or swpipe) *almost* working, minus some on-screen
> stride issues. The problem I'm trying to tackle is that because we call
> private Mesa and Gallium functions externally, the size of the paint can
> lid keeps changing, and keeping up between Mesa versions is consuming a
> lot of resources. (We're a small project trying to keep up with a large
> project; we have the same issue with WebKit.)
> 
> > I have a blog post about Wayland that is maybe not directly related to
> > your question, but might give some insight, I hope:
> > http://ppaalanen.blogspot.fi/2012/11/on-supporting-wayland-gl-clients-and.html
> 
> I'll take a look.  Thanks!
> 
>   -- Alex
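On-screen stride issues like the ones Alex mentions usually come from assuming that destination rows are tightly packed, i.e. that stride equals width times bytes per pixel. The following is a generic, hypothetical sketch of a stride-aware copy. It is not Haiku or Gallium code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `height` rows of `row_bytes` each. Source and destination may
 * have different strides (bytes from the start of one row to the next),
 * e.g. a packed render buffer vs. a padded on-screen framebuffer. */
static void copy_rows(uint8_t *dst, size_t dst_stride,
                      const uint8_t *src, size_t src_stride,
                      size_t row_bytes, size_t height)
{
   for (size_t y = 0; y < height; y++)
      memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
}
```

The padding bytes at the end of each destination row are left untouched, which is usually what the windowing side expects.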



Re: [Mesa-dev] [PATCH 5/8] i965/vs: Prepare split_virtual_grfs() for the presence of SENDs from GRFs.

2013-03-28 Thread Kenneth Graunke

On 03/19/2013 05:06 PM, Eric Anholt wrote:

---
  src/mesa/drivers/dri/i965/brw_vec4.cpp |   49 +---
  src/mesa/drivers/dri/i965/brw_vec4.h   |1 +
  2 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 3e628c8..36b5408 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -218,6 +218,13 @@ vec4_instruction::is_math()
   opcode == SHADER_OPCODE_INT_REMAINDER ||
   opcode == SHADER_OPCODE_POW);
  }
+
+bool
+vec4_instruction::is_send_from_grf()
+{
+   return false;
+}
+
  /**
   * Returns how many MRFs an opcode will write over.
   *
@@ -878,27 +885,49 @@ vec4_visitor::opt_register_coalesce()
   *
   * We initially create large virtual GRFs for temporary structures, arrays,
   * and matrices, so that the dereference visitor functions can add reg_offsets
- * to work their way down to the actual member being accessed.
+ * to work their way down to the actual member being accessed.  But when it
+ * comes to optimization, we'd like to treat each register as individual
+ * storage if possible.
   *
- * Unlike in the FS visitor, though, we have no SEND messages that return more
- * than 1 register.  We also don't do any array access in register space,
- * which would have required contiguous physical registers.  Thus, all those
- * large virtual GRFs can be split up into independent single-register virtual
- * GRFs, making allocation and optimization easier.
+ * So far, the only thing that might prevent splitting is a send message from
+ * a GRF on IVB.
   */
  void
  vec4_visitor::split_virtual_grfs()
  {
 int num_vars = this->virtual_grf_count;
 int new_virtual_grf[num_vars];
+   bool split_grf[num_vars];

 memset(new_virtual_grf, 0, sizeof(new_virtual_grf));

+   /* Try to split anything > 0 sized. */
+   for (int i = 0; i < num_vars; i++) {
+  if (this->virtual_grf_sizes[i] != 1)
+split_grf[i] = true;
+  else
+split_grf[i] = false;


This "if x then true else false" pattern is silly.  Please just do:

   split_grf[i] = this->virtual_grf_sizes[i] != 1;


+   }
+
+   /* Check that the instructions are compatible with the registers we're trying
+* to split.
+*/
+   foreach_list(node, &this->instructions) {
+  vec4_instruction *inst = (vec4_instruction *)node;
+
+  /* If there's a SEND message loading from a GRF on gen7+, it needs to be
+   * contiguous.  Assume that the GRF for the SEND is always in src[0].
+   */
+  if (inst->is_send_from_grf()) {
+split_grf[inst->src[0].reg] = false;


Bad whitespace here (tabs).


+  }
+   }
+
 /* Allocate new space for split regs.  Note that the virtual
  * numbers will be contiguous.
  */
 for (int i = 0; i < num_vars; i++) {
-  if (this->virtual_grf_sizes[i] == 1)
+  if (!split_grf[i])
   continue;

new_virtual_grf[i] = virtual_grf_alloc(1);
@@ -913,16 +942,14 @@ vec4_visitor::split_virtual_grfs()
 foreach_list(node, &this->instructions) {
vec4_instruction *inst = (vec4_instruction *)node;

-  if (inst->dst.file == GRF &&
- new_virtual_grf[inst->dst.reg] &&
+  if (inst->dst.file == GRF && split_grf[inst->dst.reg] &&
  inst->dst.reg_offset != 0) {
 inst->dst.reg = (new_virtual_grf[inst->dst.reg] +
  inst->dst.reg_offset - 1);
 inst->dst.reg_offset = 0;
}
for (int i = 0; i < 3; i++) {
-if (inst->src[i].file == GRF &&
-new_virtual_grf[inst->src[i].reg] &&
+if (inst->src[i].file == GRF && split_grf[inst->src[i].reg] &&
 inst->src[i].reg_offset != 0) {
inst->src[i].reg = (new_virtual_grf[inst->src[i].reg] +
inst->src[i].reg_offset - 1);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 9e700a9..e792ec0 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -194,6 +194,7 @@ public:

 bool is_tex();
 bool is_math();
+   bool is_send_from_grf();
 bool can_reswizzle_dst(int dst_writemask, int swizzle, int swizzle_mask);
 void reswizzle_dst(int dst_writemask, int swizzle);
  };
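Below is a standalone sketch of the pass's logic as reviewed here, with Ken's simplified `split_grf[i] = ... != 1` assignment folded in. The types (`struct inst`, `struct grf_state`) and `grf_alloc` are hypothetical stand-ins, not Mesa's classes, and every register reference is assumed to be a GRF. Like the patch, it assumes that the SEND payload is always in src[0].

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_GRFS 64  /* arbitrary limit for this sketch */

struct reg_ref {
   int reg;         /* virtual GRF number */
   int reg_offset;  /* register within that virtual GRF */
};

struct inst {
   bool send_from_grf;  /* payload read from src0; must stay contiguous */
   struct reg_ref dst;
   struct reg_ref src0;
};

struct grf_state {
   int count;
   int sizes[MAX_GRFS];
};

static int grf_alloc(struct grf_state *s, int size)
{
   assert(s->count < MAX_GRFS);
   s->sizes[s->count] = size;
   return s->count++;
}

/* Offset 0 keeps the old number; offset k moves to new_base + k - 1. */
static void remap(struct reg_ref *r, const bool *split, const int *new_base)
{
   if (split[r->reg] && r->reg_offset != 0) {
      r->reg = new_base[r->reg] + r->reg_offset - 1;
      r->reg_offset = 0;
   }
}

static void split_virtual_grfs(struct grf_state *s, struct inst *insts, int n)
{
   int num_vars = s->count;
   bool split[MAX_GRFS];
   int new_base[MAX_GRFS] = {0};

   /* Every multi-register GRF is a candidate... */
   for (int i = 0; i < num_vars; i++)
      split[i] = s->sizes[i] != 1;

   /* ...unless a send-from-GRF reads it as its payload. */
   for (int k = 0; k < n; k++)
      if (insts[k].send_from_grf)
         split[insts[k].src0.reg] = false;

   /* Give offsets 1..size-1 fresh, contiguous single-register GRFs. */
   for (int i = 0; i < num_vars; i++) {
      if (!split[i])
         continue;
      new_base[i] = grf_alloc(s, 1);
      for (int j = 2; j < s->sizes[i]; j++)
         grf_alloc(s, 1);
      s->sizes[i] = 1;
   }

   for (int k = 0; k < n; k++) {
      remap(&insts[k].dst, split, new_base);
      remap(&insts[k].src0, split, new_base);
   }
}
```

For example, a 3-register temporary gets split into three single registers, while a 2-register GRF that feeds a send-from-GRF keeps its size and numbering.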





[Mesa-dev] [PATCH] intel: provide a default QuerySamplesForFormat func

2013-03-28 Thread Chris Forbes
We claim to support ARB_internalformat_query on everything, but only
i965+ bothered to provide the hook -- so QueryInternalformativ() would
just explode (fail the assert in debug; segfault otherwise) on earlier
platforms.

Signed-off-by: Chris Forbes 
---
 src/mesa/drivers/dri/intel/intel_context.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
index bf4045e..c0574c0 100644
--- a/src/mesa/drivers/dri/intel/intel_context.c
+++ b/src/mesa/drivers/dri/intel/intel_context.c
@@ -557,6 +557,18 @@ intelFinish(struct gl_context * ctx)
   drm_intel_bo_wait_rendering(intel->batch.last_bo);
 }
 
+static size_t
+intel_query_samples_for_format(struct gl_context *ctx, GLenum target,
+   GLenum internalFormat, int samples[16])
+{
+   (void) target;
+   (void) internalFormat;
+   (void) ctx;
+
+   samples[0] = 1;
+   return 1;
+}
+
 void
 intelInitDriverFunctions(struct dd_function_table *functions)
 {
@@ -576,6 +588,8 @@ intelInitDriverFunctions(struct dd_function_table *functions)
intelInitPixelFuncs(functions);
intelInitBufferObjectFuncs(functions);
intel_init_syncobj_functions(functions);
+
+   functions->QuerySamplesForFormat = intel_query_samples_for_format;
 }
 
 static bool
-- 
1.8.2



Re: [Mesa-dev] i965: Improved shader_time accuracy, using send-from-GRF

2013-03-28 Thread Kenneth Graunke

On 03/19/2013 05:05 PM, Eric Anholt wrote:

I've been experimenting with the simd16 replicated data message
(non-gen7-specific fast clears), and was bothered that shader_time wasn't
noting any time spent in our clear FS currently, so I wasn't totally sure
which apps I should be looking at.

Many apps won't spend any time in the clear FS since they don't clear
color, in which case the FS isn't dispatched.  But for those that do
(like minecraft), it would be nice to see how much is spent clearing.  I
also knew that we were significantly under-counting VS time by not
counting URB payload setup, which I'd worked on a patchset for
previously.

This series fixes those two problems.  Now we know that 7% of non-EOT
instruction time in minecraft is in the clear color setup (wow), and that
VSes are a bit of a bigger deal than we previously knew (36% of minecraft,
23% of ETQW, 7.5% of GLB2.7)

Branch is at shadertime-from-grfs of my tree.


This series is great!  Thanks so much for fixing this; it definitely 
gives more accurate times now.


Other than my proposed tidying of patch 2 and trivial comments on patch 
5, this series is:


Reviewed-by: Kenneth Graunke 


Re: [Mesa-dev] [PATCH] intel: provide a default QuerySamplesForFormat func

2013-03-28 Thread Kenneth Graunke

On 03/28/2013 12:26 AM, Chris Forbes wrote:

We claim to support ARB_internalformat_query on everything, but only
i965+ bothered to provide the hook -- so QueryInternalformativ() would
just explode (fail the assert in debug; segfault otherwise) on earlier
platforms.

Signed-off-by: Chris Forbes 
---
  src/mesa/drivers/dri/intel/intel_context.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
index bf4045e..c0574c0 100644
--- a/src/mesa/drivers/dri/intel/intel_context.c
+++ b/src/mesa/drivers/dri/intel/intel_context.c
@@ -557,6 +557,18 @@ intelFinish(struct gl_context * ctx)
drm_intel_bo_wait_rendering(intel->batch.last_bo);
  }

+static size_t
+intel_query_samples_for_format(struct gl_context *ctx, GLenum target,
+   GLenum internalFormat, int samples[16])
+{
+   (void) target;
+   (void) internalFormat;
+   (void) ctx;
+
+   samples[0] = 1;
+   return 1;
+}
+
  void
  intelInitDriverFunctions(struct dd_function_table *functions)
  {
@@ -576,6 +588,8 @@ intelInitDriverFunctions(struct dd_function_table *functions)
 intelInitPixelFuncs(functions);
 intelInitBufferObjectFuncs(functions);
 intel_init_syncobj_functions(functions);
+
+   functions->QuerySamplesForFormat = intel_query_samples_for_format;
  }

  static bool


Shouldn't this live in Core Mesa?  Presumably the other class drivers 
(radeon, r200, nouveau_vieux, and maybe swrast?) are broken in the same 
way...
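Here is a rough sketch of one way Ken's suggestion could look. It assumes that core Mesa installs a default after the driver's init function has run, so that any driver leaving the hook NULL gets a sane answer instead of a crash. `struct driver_funcs` and `install_default_hooks` are hypothetical, and GL types are replaced by `unsigned`/`void *` to keep the sketch standalone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant slot of dd_function_table. */
struct driver_funcs {
   size_t (*QuerySamplesForFormat)(void *ctx, unsigned target,
                                   unsigned internal_format, int samples[16]);
};

/* Default: report a single supported sample count of 1, matching the
 * intel patch above. */
static size_t default_query_samples_for_format(void *ctx, unsigned target,
                                               unsigned internal_format,
                                               int samples[16])
{
   (void) ctx;
   (void) target;
   (void) internal_format;
   samples[0] = 1;
   return 1;
}

/* Run once after the driver has filled in its hooks. */
static void install_default_hooks(struct driver_funcs *funcs)
{
   if (funcs->QuerySamplesForFormat == NULL)
      funcs->QuerySamplesForFormat = default_query_samples_for_format;
}
```

An equivalent design is to set the default before the driver's init runs and let i965 override it. Either way, the fix lives in one place rather than in each classic driver.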


--Ken


[Mesa-dev] [PATCH] i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards.

2013-03-28 Thread Kenneth Graunke
"discard" instructions generate HALT instructions which jump to a final
HALT near the end of the shader.  Previously, fs_generator created this
final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it
to jump right before the FB write epilogue.  This is normally good.

However, INTEL_DEBUG=shader_time also has an epilogue section which
records the final timestamp.  The frontend emits IR for this just before
FS_OPCODE_FB_WRITE.  Unfortunately, this led to the following ordering:

1. Shader Time Epilogue
2. Final HALT (where discards jump)
3. Framebuffer Write Epilogue

This meant that discarded pixels completely skipped the shader time
epilogue, causing no ending timestamp to be written.  This obviously
led to inaccurate results.

This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just
before any epilogue sections.  This is where the final HALT should be
generated, and makes it easy to ensure the correct ordering:

1. Final HALT
2. Shader Time Epilogue
3. Framebuffer Write Epilogue

For shaders that don't discard, this opcode compiles away to nothing.
The scheduler adds barrier dependencies to make sure that it doesn't
get moved above any FS_OPCODE_DISCARD_JUMP instructions.
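To make the ordering argument concrete, here is a toy model that walks one pixel through each layout and reports whether the shader-time epilogue ran. The opcodes are hypothetical and do not correspond to the i965 IR or generator. Following the description above, a discarded pixel is modelled as jumping straight to the final HALT target.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, highly simplified opcodes for one pixel's control flow. */
enum op {
   OP_DISCARD_JUMP,     /* if discarded, jump to the HALT target */
   OP_HALT_TARGET,      /* the final HALT where discard jumps land */
   OP_SHADER_TIME_END,  /* record the ending timestamp */
   OP_FB_WRITE,         /* framebuffer write epilogue */
};

/* Returns true if the ending timestamp is recorded for this pixel. */
static bool records_end_time(const enum op *prog, int n, bool discarded)
{
   int halt = -1;
   bool recorded = false;

   for (int i = 0; i < n; i++)
      if (prog[i] == OP_HALT_TARGET)
         halt = i;

   for (int pc = 0; pc < n; pc++) {
      switch (prog[pc]) {
      case OP_DISCARD_JUMP:
         if (discarded && halt >= 0)
            pc = halt;  /* the loop increment continues after the target */
         break;
      case OP_SHADER_TIME_END:
         recorded = true;
         break;
      default:
         break;
      }
   }
   return recorded;
}
```

With the old layout (time epilogue, HALT, FB write), a discarded pixel skips the timestamp. With the new layout (HALT, time epilogue, FB write), every pixel records it.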

One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to
a mere 5.13 Gcycles.

Cc: Eric Anholt 
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_defines.h|  1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp   |  2 ++
 src/mesa/drivers/dri/i965/brw_fs_emit.cpp  | 13 +++--
 src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp |  3 +++
 src/mesa/drivers/dri/i965/brw_shader.cpp   |  3 +++
 5 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index b45e4a4..47dceac 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -732,6 +732,7 @@ enum opcode {
FS_OPCODE_PACK_HALF_2x16_SPLIT,
FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X,
FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y,
+   FS_OPCODE_PLACEHOLDER_HALT,
 
VS_OPCODE_URB_WRITE,
VS_OPCODE_SCRATCH_READ,
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index a34d5e8..e6e4302 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2766,6 +2766,8 @@ fs_visitor::run()
   if (failed)
 return false;
 
+  emit(FS_OPCODE_PLACEHOLDER_HALT);
+
   emit_fb_writes();
 
   split_virtual_grfs();
diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
index 5db481c..28217d7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
@@ -102,12 +102,6 @@ fs_generator::generate_fb_write(fs_inst *inst)
struct brw_reg implied_header;
uint32_t msg_control;
 
-   /* Note that the jumps emitted to this point mean that the g0 ->
-* base_mrf setup must be inside of this function, so that we jump
-* to a point containing it.
-*/
-   patch_discard_jumps_to_fb_writes();
-
/* Header is 2 regs, g0 and g1 are the contents. g0 will be implied
 * move, here's g1.
 */
@@ -1342,6 +1336,13 @@ fs_generator::generate_code(exec_list *instructions)
  generate_unpack_half_2x16_split(inst, dst, src[0]);
  break;
 
+  case FS_OPCODE_PLACEHOLDER_HALT:
+ /* This is the place where the final HALT needs to be inserted if
+  * we've emitted any discards.  If not, this will emit no code.
+  */
+ patch_discard_jumps_to_fb_writes();
+ break;
+
   default:
 if (inst->opcode < (int) ARRAY_SIZE(opcode_descs)) {
_mesa_problem(ctx, "Unsupported opcode `%s' in FS",
diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
index c125928..74f7c67 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
@@ -469,6 +469,9 @@ instruction_scheduler::calculate_deps()
   schedule_node *n = (schedule_node *)node;
   fs_inst *inst = n->inst;
 
+  if (inst->opcode == FS_OPCODE_PLACEHOLDER_HALT)
+ add_barrier_deps(n);
+
   /* read-after-write deps. */
   for (int i = 0; i < 3; i++) {
 if (inst->src[i].file == GRF) {
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index e4392bd..066cf4e 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -471,6 +471,9 @@ brw_instruction_name(enum opcode op)
case FS_OPCODE_UNPACK_HALF_2x16_SPLIT_Y:
   return "unpack_half_2x16_split_y";
 
+   case FS_OPCODE_PLACEHOLDER_HALT:
+  return "placeholder_halt";
+
case VS_OPCODE_URB_WRITE:
   return "urb_wri

Re: [Mesa-dev] [PATCH] r600g: fix range handling for tgsi input/output declarations

2013-03-28 Thread Christian König

Am 27.03.2013 20:37, schrieb Vadim Girlin:

Signed-off-by: Vadim Girlin 
---
  src/gallium/drivers/r600/r600_shader.c | 19 +++
  1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 29facf7..d4c9c03 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -874,12 +874,12 @@ static int select_twoside_color(struct r600_shader_ctx *ctx, int front, int back
  static int tgsi_declaration(struct r600_shader_ctx *ctx)
  {
struct tgsi_full_declaration *d = &ctx->parse.FullToken.FullDeclaration;
-   unsigned i;
-   int r;
+   int r, i, j, count = d->Range.Last - d->Range.First + 1;
  
  	switch (d->Declaration.File) {

case TGSI_FILE_INPUT:
-   i = ctx->shader->ninput++;
+   i = ctx->shader->ninput;
+   ctx->shader->ninput += count;
ctx->shader->input[i].name = d->Semantic.Name;
ctx->shader->input[i].sid = d->Semantic.Index;
ctx->shader->input[i].interpolate = d->Interp.Interpolate;
@@ -903,9 +903,15 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
return r;
}
}
+   for (j = 1; j < count; ++j) {
+   memcpy(&ctx->shader->input[i + j], &ctx->shader->input[i],
+  sizeof(struct r600_shader_io));


Instead of memcpy, shouldn't an assignment do the trick here as well?


+   ctx->shader->input[i + j].gpr += j;
+   }
break;
case TGSI_FILE_OUTPUT:
-   i = ctx->shader->noutput++;
+   i = ctx->shader->noutput;
+   ctx->shader->noutput += count;
ctx->shader->output[i].name = d->Semantic.Name;
ctx->shader->output[i].sid = d->Semantic.Index;
ctx->shader->output[i].gpr = ctx->file_offset[TGSI_FILE_OUTPUT] + d->Range.First;
@@ -933,6 +939,11 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
break;
}
}
+   for (j = 1; j < count; ++j) {
+   memcpy(&ctx->shader->output[i + j], &ctx->shader->output[i],
+  sizeof(struct r600_shader_io));


Same here.


+   ctx->shader->output[i + j].gpr += j;
+   }
break;
case TGSI_FILE_CONSTANT:
case TGSI_FILE_TEMPORARY:


Christian.
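Here is a reduced sketch of the declaration-range handling with Christian's suggestion applied. Plain struct assignment copies the whole entry just as the `memcpy` does, but it is type-checked. `struct shader_io` and `declare_range` are hypothetical stand-ins; the real code also fills in interpolation and other fields.

```c
#include <assert.h>

/* Hypothetical stand-in for struct r600_shader_io. */
struct shader_io {
   unsigned name;
   unsigned sid;
   unsigned gpr;
};

/* Declare the range [first, last] starting at entry *n of io[]: fill the
 * first entry, then replicate it with consecutive GPRs. Returns the index
 * of the first entry and advances *n by the range size. */
static int declare_range(struct shader_io *io, int *n, unsigned name,
                         unsigned sid, unsigned file_offset,
                         int first, int last)
{
   int count = last - first + 1;
   int i = *n;

   *n += count;
   io[i].name = name;
   io[i].sid = sid;
   io[i].gpr = file_offset + first;

   for (int j = 1; j < count; ++j) {
      io[i + j] = io[i];  /* struct assignment instead of memcpy */
      io[i + j].gpr += j;
   }
   return i;
}
```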


Re: [Mesa-dev] [PATCH] gallivm: cleanup the gs interface

2013-03-28 Thread Brian Paul

On 03/27/2013 05:31 AM, Zack Rusin wrote:

Instead of void pointers use a base interface.

Signed-off-by: Zack Rusin
---
  src/gallium/auxiliary/draw/draw_llvm.c  |   77 ---
  src/gallium/auxiliary/gallivm/lp_bld_tgsi.h |   25 
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |   31 -
  3 files changed, 83 insertions(+), 50 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 3ce48d8..efbcb04 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -64,6 +64,13 @@ draw_llvm_generate(struct draw_llvm *llvm, struct draw_llvm_variant *var,
 boolean elts);


+struct draw_gs_llvm_iface {
+   struct lp_build_tgsi_gs_iface base;
+
+   struct draw_gs_llvm_variant *variant;
+   LLVMValueRef input;
+};
+
  /**
   * Create LLVM type for struct draw_jit_texture
   */
@@ -1237,14 +1244,39 @@ clipmask_booli32(struct gallivm_state *gallivm,
 return ret;
  }

+static LLVMValueRef
+draw_gs_llvm_fetch_input(const struct lp_build_tgsi_gs_iface *gs_iface,
+ struct lp_build_tgsi_context * bld_base,
+ LLVMValueRef vertex_index,
+ LLVMValueRef attrib_index,
+ LLVMValueRef swizzle_index)
+{
+   const struct draw_gs_llvm_iface *gs =
+  (const struct draw_gs_llvm_iface *)gs_iface;


When I find myself making a cast like this several times, I usually 
make an inline cast-wrapper function, just to add a bit more type safety.
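Here is a minimal sketch of the cast-wrapper idea. It uses hypothetical names (`struct gs_iface_base`, `struct draw_gs_iface_sketch`, `to_draw_gs_iface`) rather than the real gallivm types. The compile-time check documents the assumption the downcast relies on: the base struct is the first member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base interface, in the spirit of lp_build_tgsi_gs_iface. */
struct gs_iface_base {
   int (*emit_vertex)(const struct gs_iface_base *iface, int index);
};

/* Hypothetical derived interface; the base must come first. */
struct draw_gs_iface_sketch {
   struct gs_iface_base base;
   int variant_id;
};

_Static_assert(offsetof(struct draw_gs_iface_sketch, base) == 0,
               "base must be the first member for the downcast to be valid");

/* The single place where the downcast happens. */
static inline const struct draw_gs_iface_sketch *
to_draw_gs_iface(const struct gs_iface_base *iface)
{
   return (const struct draw_gs_iface_sketch *) iface;
}

static int sketch_emit_vertex(const struct gs_iface_base *iface, int index)
{
   const struct draw_gs_iface_sketch *gs = to_draw_gs_iface(iface);
   return gs->variant_id * 100 + index;
}
```

Each callback then calls the helper instead of repeating the raw cast, so a mismatched pointer type fails to compile at the call site.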




+   struct gallivm_state *gallivm = bld_base->base.gallivm;
+   LLVMBuilderRef builder = gallivm->builder;
+   LLVMValueRef indices[3];
+   LLVMValueRef res;
+
+   indices[0] = vertex_index;
+   indices[1] = attrib_index;
+   indices[2] = swizzle_index;
+
+   res = LLVMBuildGEP(builder, gs->input, indices, 3, "");
+   res = LLVMBuildLoad(builder, res, "");
+
+   return res;
+}
+
  static void
-draw_gs_llvm_emit_vertex(struct lp_build_tgsi_context * bld_base,
+draw_gs_llvm_emit_vertex(const struct lp_build_tgsi_gs_iface *gs_base,
+ struct lp_build_tgsi_context * bld_base,
   LLVMValueRef (*outputs)[4],
- LLVMValueRef emitted_vertices_vec,
- void *user_data)
+ LLVMValueRef emitted_vertices_vec)
  {
-   struct draw_gs_llvm_variant *variant =
-  (struct draw_gs_llvm_variant *)user_data;
+   const struct draw_gs_llvm_iface *gs_iface =
+  (const struct draw_gs_llvm_iface *)gs_base;
+   struct draw_gs_llvm_variant *variant = gs_iface->variant;
 struct gallivm_state *gallivm = variant->gallivm;
 LLVMBuilderRef builder = gallivm->builder;
 struct lp_type gs_type = bld_base->base.type;
@@ -1272,13 +1304,14 @@ draw_gs_llvm_emit_vertex(struct lp_build_tgsi_context * bld_base,
  }

  static void
-draw_gs_llvm_end_primitive(struct lp_build_tgsi_context * bld_base,
+draw_gs_llvm_end_primitive(const struct lp_build_tgsi_gs_iface *gs_base,
+   struct lp_build_tgsi_context * bld_base,
 LLVMValueRef verts_per_prim_vec,
-   LLVMValueRef emitted_prims_vec,
-   void *user_data)
+   LLVMValueRef emitted_prims_vec)
  {
-   struct draw_gs_llvm_variant *variant =
-  (struct draw_gs_llvm_variant *)user_data;
+   const struct draw_gs_llvm_iface *gs_iface =
+  (const struct draw_gs_llvm_iface *)gs_base;
+   struct draw_gs_llvm_variant *variant = gs_iface->variant;
 struct gallivm_state *gallivm = variant->gallivm;
 LLVMBuilderRef builder = gallivm->builder;
 LLVMValueRef prim_lengts_ptr =
@@ -1301,13 +1334,14 @@ draw_gs_llvm_end_primitive(struct lp_build_tgsi_context * bld_base,
  }

  static void
-draw_gs_llvm_epilogue(struct lp_build_tgsi_context * bld_base,
+draw_gs_llvm_epilogue(const struct lp_build_tgsi_gs_iface *gs_base,
+  struct lp_build_tgsi_context * bld_base,
LLVMValueRef total_emitted_vertices_vec,
-  LLVMValueRef emitted_prims_vec,
-  void *user_data)
+  LLVMValueRef emitted_prims_vec)
  {
-   struct draw_gs_llvm_variant *variant =
-  (struct draw_gs_llvm_variant *)user_data;
+   const struct draw_gs_llvm_iface *gs_iface =
+  (const struct draw_gs_llvm_iface *)gs_base;
+   struct draw_gs_llvm_variant *variant = gs_iface->variant;
 struct gallivm_state *gallivm = variant->gallivm;
 LLVMBuilderRef builder = gallivm->builder;
 LLVMValueRef emitted_verts_ptr =
@@ -1867,7 +1901,7 @@ draw_gs_llvm_generate(struct draw_llvm *llvm,
 struct lp_bld_tgsi_system_values system_values;
 struct lp_type gs_type;
 unsigned i;
-   struct lp_build_tgsi_gs_iface gs_iface;
+   struct draw_gs_llvm_iface gs_iface;
 const struct tgsi_token *tokens = variant

[Mesa-dev] [Bug 44618] Cross-compilation broken by glsl builtin_compiler

2013-03-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=44618

Pierre Ossman  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #18 from Pierre Ossman  ---
Still broken I'm afraid:

  CXXLD  builtin_compiler
/usr/lib/gcc/sparc-sun-solaris2.10/4.5.3/libstdc++.so: could not read symbols:
File in wrong format

This is being snuck in via liblibglslcore.la:

$ grep ^depend libglslcore.la 
dependency_libs=' /usr/lib/gcc/sparc-sun-solaris2.10/4.5.3/libstdc++.la'

(host system is i686 linux)

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [Bug 44618] Cross-compilation broken by glsl builtin_compiler

2013-03-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=44618

--- Comment #19 from Pierre Ossman  ---
This seems to be coming from deep inside libtool. I guess a separate libtool
needs to be generated for the host CC/CXX. Unfortunately I don't see any easy
way of doing that.

However, there is a simpler approach. Neither of the libraries in
builtin_compiler has any real dependencies, so changing them from libtool
libraries to normal ones should solve things neatly.



[Mesa-dev] [Bug 44618] Cross-compilation broken by glsl builtin_compiler

2013-03-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=44618

--- Comment #20 from Pierre Ossman  ---
Created attachment 77156
  --> https://bugs.freedesktop.org/attachment.cgi?id=77156&action=edit
MesaLib-9.1.1-cross.patch

Suggested fix.



[Mesa-dev] [Bug 62868] New: solaris build broken with missing ffsll

2013-03-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=62868

  Priority: medium
Bug ID: 62868
  Assignee: mesa-dev@lists.freedesktop.org
   Summary: solaris build broken with missing ffsll
  Severity: normal
Classification: Unclassified
OS: Solaris
  Reporter: pierre-bugzi...@ossman.eu
  Hardware: Other
Status: NEW
   Version: 9.1
 Component: Mesa core
   Product: Mesa

Created attachment 77157
  --> https://bugs.freedesktop.org/attachment.cgi?id=77157&action=edit
MesaLib-9.1.1-ffs.patch

Mesa uses ffsll, which isn't present on Solaris. It is however present as a
builtin in gcc, which is sufficient. But for unclear reasons the Mesa headers
will only properly use that on a specific list of platforms, instead of for all
gcc compilations.

Attached patch removes this restrictive list and uses the builtins whenever gcc
is used and the native versions are missing.
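The attached patch is not reproduced here, so the following is only an illustrative sketch of the approach it describes. It uses GCC's `__builtin_ffsll` whenever compiling with GCC and otherwise falls back to a portable loop. `portable_ffsll` and `ffsll_fallback` are hypothetical names, and the "native version missing" configure check is omitted.

```c
#include <assert.h>

/* Portable fallback: 1-based index of the least significant set bit,
 * or 0 if no bits are set (the same contract as ffsll()). */
static int ffsll_fallback(long long x)
{
   unsigned long long v = (unsigned long long) x;
   int i = 1;

   if (v == 0)
      return 0;
   while ((v & 1) == 0) {
      v >>= 1;
      i++;
   }
   return i;
}

/* Prefer the builtin whenever the compiler is GCC (or defines __GNUC__),
 * regardless of which platform we are building for. */
static int portable_ffsll(long long x)
{
#if defined(__GNUC__)
   return __builtin_ffsll(x);
#else
   return ffsll_fallback(x);
#endif
}
```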



[Mesa-dev] [Bug 62868] solaris build broken with missing ffsll

2013-03-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=62868

--- Comment #1 from Alan Coopersmith  ---
For what it's worth Solaris 11 and later have ffsll(), but not older releases.
The patch to use the gcc builtins when possible makes sense to me.



[Mesa-dev] [Bug 47607] [advocacy] Make Anomaly Warzone Earth work with Mesa

2013-03-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=47607

--- Comment #7 from Alexander Monakov  ---
Created attachment 77166
  --> https://bugs.freedesktop.org/attachment.cgi?id=77166&action=edit
LD_PRELOAD hack to renumber uniform locations in a dense manner

To follow up to my previous comment, attaching a quick'n'dirty LD_PRELOAD
interposer that avoids exposing uniform locations such as 655360 to the game. 
With this, the game doesn't segfault, but I think I'm getting rendering errors
in map view (IVB, Mesa-9.0).

(if anyone wants to try: compile command is at the first line of the .cpp file,
to use: export LD_PRELOAD=/path/to/compiled/anomalyhack.so
./AnomalyWarzoneEarth)
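The attachment itself isn't shown here. As a rough illustration of the idea, the following hypothetical sketch covers only the bookkeeping: sparse driver locations such as 655360 get mapped to small dense numbers, and are translated back before reaching the driver. A real interposer would also hook `glGetUniformLocation` and the `glUniform*` entry points (for example via `dlsym(RTLD_NEXT, ...)`) and keep one table per program object; both of those parts are left out.

```c
#include <assert.h>

#define MAX_UNIFORMS 256  /* arbitrary cap for the sketch */

/* Dense app-visible location i maps to real[i]. */
struct location_map {
   int count;
   int real[MAX_UNIFORMS];
};

/* Return a small dense location for a real one, adding it on first use.
 * -1 ("uniform not found") is passed through unchanged. */
static int dense_location(struct location_map *m, int real)
{
   if (real < 0)
      return real;
   for (int i = 0; i < m->count; i++)
      if (m->real[i] == real)
         return i;
   if (m->count == MAX_UNIFORMS)
      return -1;
   m->real[m->count] = real;
   return m->count++;
}

/* Translate an app-visible location back before calling the driver. */
static int real_location(const struct location_map *m, int dense)
{
   if (dense < 0 || dense >= m->count)
      return -1;
   return m->real[dense];
}
```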



[Mesa-dev] Geometry shader update

2013-03-28 Thread Paul Berry
It's been about two months since I talked on the mailing list about my
progress in implementing geometry shaders (
http://lists.freedesktop.org/archives/mesa-dev/2013-January/033725.html),
so it seems like a good time to give an update.  Also I have some happy
progress to report :)

In mid-February I added about 1800 geometry shader test cases to Piglit.
Those tests exercise basic geometry shader functionality (input a single
triangle, output a single triangle), and then exhaustively test all of the
built-in GLSL functions and operators inside of the geometry shader.

Then I rebased Bryan Cain's front-end patches onto Mesa master, made a
minor bug fix or two, and spent the next month and a half beginning to
implement a geometry shader back-end for i965 Gen7 (Ivy Bridge).  I'm happy
to report that as of this morning, my work-in-progress back-end, combined
with Bryan's front-end, passes all the piglit tests I introduced in
mid-February.  You can find my patch series in branch "gs" of
git://github.com/stereotype441/mesa.git.  Warning: that branch is *highly*
volatile and in need of some tender loving rebasing--it's not ready for
review yet.  If anyone needs a less volatile branch to base some work on,
please let me know so we can coordinate.

This is substantial progress and definitely a reason to celebrate, but
there is still a lot of work to do before we can expose
ARB/EXT_geometry_shader4 support on i965 Gen7, mostly in the back end.
Here's an incomplete list that hopefully touches all the major points (I
have a much longer list in my notes):

- gl_PrimitiveIDIn support (requires special back-end code because it is
not part of the vertex URB entry on i965)
- gl_PointSize and gl_ClipVertex support (i965 emits special VS code to
handle these values--when a GS is present, that code has to be disabled in
the VS)
- Sampler support
- Interactions with transform feedback, particularly the
GL_PRIMITIVES_GENERATED and GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN
queries (I believe Kenneth Graunke may be planning to work on this)
- gl_Layer support (includes both back-end and front-end work.  I believe
Jordan Justen is planning to work on this)
- Support for EndPrimitive()
- Precompilation
- SINGLE dispatch mode (an alternate SIMD mode for i965 Gen7 that needs to
be used for complex geometry shaders to avoid running out of registers)

I don't want to land any patches until there are adequate piglit tests for
them, so my next order of business is to write additional tests to exercise
corner cases of the ARB/EXT_geometry_shader4 spec.  Once those are
complete, I will start sending out my rebase of Bryan Cain's front-end
patch series for review.  After Bryan's series has landed, I'll start
sending out the i965 Gen7 back-end for review.

If all goes well, my hope is that Kenneth's work on transform feedback and
Jordan's work on gl_Layer will finish up at around the same time as I'm
landing my i965 Gen7 back-end work.  Once all that happens, we'll be able
to turn on the ARB/EXT_geometry_shader4 extension on i965 Gen7.

Once that extension is on, I plan to move on to Gen6 (Sandy Bridge)
support, and support for GLSL 1.50-style geometry shaders.

Cheers,

Paul


[Mesa-dev] [PATCH] mesa: handle STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED for parameter printing

2013-03-28 Thread Marek Olšák
---
 src/mesa/program/prog_statevars.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/program/prog_statevars.c b/src/mesa/program/prog_statevars.c
index 5a35079..ccc590d 100644
--- a/src/mesa/program/prog_statevars.c
+++ b/src/mesa/program/prog_statevars.c
@@ -871,6 +871,9 @@ append_token(char *dst, gl_state_index k)
case STATE_CURRENT_ATTRIB:
   append(dst, "current");
   break;
+   case STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED:
+  append(dst, "currentAttribMaybeVPClamped");
+  break;
case STATE_NORMAL_SCALE:
   append(dst, "normalScale");
   break;
-- 
1.7.10.4



Re: [Mesa-dev] [Piglit] [RFC] ARB_separate_shader_objects test v2

2013-03-28 Thread gregory hainaut
On Tue, 26 Mar 2013 16:58:17 -0700
Jordan Justen  wrote:

> Tested with NVidia binary:
> fail :: spec/ARB_separate_shader_objects/sso-GetProgramPipelineiv
> fail :: spec/ARB_separate_shader_objects/sso-mix_pipeline_useprogram
> crash :: spec/ARB_separate_shader_objects/sso-IsProgramPipeline
> (results piglit output attached)
> 
> On Tue, Mar 26, 2013 at 12:56 PM, gregory hainaut
>  wrote:
> > Please find below my (really too big) patch.
> 
> It's not really big at all, but I think the piglit-shader changes need
> to be a separate commit/patch.
> 
> -Jordan

Thanks very much for your test.

V2:
* split the patch properly with git :)
* sso-GetProgramPipelineiv: fix a VS linker error. Print the current test
  step.
* sso-mix_pipeline_useprogram: fix a wrong expected value. Now the test should
  work on NVIDIA :)
* sso-IsProgramPipeline: print more of the current test steps
=> my guess is that NVIDIA chokes on glDeleteProgramPipelines(-1, id);
=> like AMD, NVIDIA doesn't follow the spec here (not very serious)

Gregory


[Mesa-dev] [PATCH 1/3] piglit util: new functions piglit_program_pipeline_check_status/quiet

2013-03-28 Thread gregory hainaut
Equivalent to piglit_link_check_status/quiet, but for program pipeline objects
---
 tests/util/piglit-shader.c |   50 
 tests/util/piglit-shader.h |2 ++
 2 files changed, 52 insertions(+)

diff --git a/tests/util/piglit-shader.c b/tests/util/piglit-shader.c
index c67e58a..d7a0266 100644
--- a/tests/util/piglit-shader.c
+++ b/tests/util/piglit-shader.c
@@ -213,6 +213,45 @@ link_check_status(GLint prog, FILE *output)
return ok;
 }
 
+/* Same function as above but for program pipeline */
+static GLboolean
+program_pipeline_check_status(GLuint pipeline, FILE *output)
+{
+	GLchar *info = NULL;
+	GLint size;
+	GLint ok;
+
+	piglit_require_extension("GL_ARB_separate_shader_objects");
+
+	glValidateProgramPipeline(pipeline);
+	glGetProgramPipelineiv(pipeline, GL_VALIDATE_STATUS, &ok);
+
+	/* Some drivers return a size of 1 for an empty log.  This is the size
+	 * of a log that contains only a terminating NUL character.
+	 */
+	glGetProgramPipelineiv(pipeline, GL_INFO_LOG_LENGTH, &size);
+	if (size > 1) {
+		info = malloc(size);
+		glGetProgramPipelineInfoLog(pipeline, size, NULL, info);
+	}
+
+	if (!ok) {
+		fprintf(output, "Failed to validate the pipeline: %s\n",
+			(info != NULL) ? info : "");
+	}
+	else if (0 && info != NULL) {
+		/* Enable this to get extra validation info.
+		 * Even if there are no validation errors, the info log may
+		 * have some remarks.
+		 */
+		printf("Pipeline validation warning: %s\n", info);
+	}
+
+	free(info);
+
+	return ok;
+}
+
 GLboolean
 piglit_link_check_status(GLint prog)
 {
@@ -234,6 +273,17 @@ piglit_link_check_status_quiet(GLint prog)
return link_check_status(prog, stdout);
 }
 
+GLboolean
+piglit_program_pipeline_check_status(GLuint pipeline)
+{
+   return program_pipeline_check_status(pipeline, stderr);
+}
+
+GLboolean
+piglit_program_pipeline_check_status_quiet(GLuint pipeline)
+{
+   return program_pipeline_check_status(pipeline, stdout);
+}
 
 GLint piglit_link_simple_program(GLint vs, GLint fs)
 {
diff --git a/tests/util/piglit-shader.h b/tests/util/piglit-shader.h
index 12cf731..0bb7792 100644
--- a/tests/util/piglit-shader.h
+++ b/tests/util/piglit-shader.h
@@ -34,6 +34,8 @@ GLuint piglit_compile_shader(GLenum target, const char *filename);
 GLuint piglit_compile_shader_text(GLenum target, const char *text);
 GLboolean piglit_link_check_status(GLint prog);
 GLboolean piglit_link_check_status_quiet(GLint prog);
+GLboolean piglit_program_pipeline_check_status(GLuint pipeline);
+GLboolean piglit_program_pipeline_check_status_quiet(GLuint pipeline);
 GLint piglit_link_simple_program(GLint vs, GLint fs);
 
 #if defined(PIGLIT_USE_OPENGL_ES1)
-- 
1.7.10.4


[Mesa-dev] [PATCH 2/3] add 3 new tests for arb_separate_shader_objects

2013-03-28 Thread gregory hainaut
* new test IsProgramPipeline: check the state of newly created pipeline
objects. glGenProgramPipelines only reserves names. I also take the
opportunity to test negative values on glGenProgramPipelines and
glDeleteProgramPipelines.
Note: FGLRX doesn't follow the spec, and it also crashes on
glGenProgramPipelines because of -1...
Note: NVIDIA seems to crash on glDeleteProgramPipelines because of -1...

* new test mix_pipeline_useprogram: the goal is to test the state mix of
glUseProgram/glBindProgramPipeline/glActiveProgram. Ian quote:
 "In this case, either the UseProgram state or the BindProgramPipeline state.
  If UseProgram sets a non-zero program, that state is used.  Otherwise the
  BindProgramPipeline state is used.  In this case, I think AMD's
  behavior is incorrect."

Note: Nvidia seems to be fine.
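The precedence rule in Ian's quote can be modeled in a few lines. This is a hypothetical sketch for illustration only (the types and `effective_program()` are invented here, not Mesa's internals or GL API): a non-zero glUseProgram() binding wins; otherwise each stage's program comes from the bound pipeline.

```c
#include <stddef.h>

/* Hypothetical model of the UseProgram / BindProgramPipeline precedence
 * described in Ian's quote; the types are illustrative, not Mesa's. */
enum stage { STAGE_VERTEX, STAGE_FRAGMENT, STAGE_COUNT };

struct pipeline {
   unsigned stage_program[STAGE_COUNT];   /* as set by glUseProgramStages() */
};

struct gl_state {
   unsigned use_program;                  /* glUseProgram(), 0 = none */
   const struct pipeline *bound_pipeline; /* glBindProgramPipeline(), NULL = none */
};

/* Returns the program object that supplies stage s, or 0 if none does. */
static unsigned effective_program(const struct gl_state *st, enum stage s)
{
   if (st->use_program != 0)
      return st->use_program;             /* non-zero UseProgram state wins */
   if (st->bound_pipeline != NULL)
      return st->bound_pipeline->stage_program[s];
   return 0;
}
```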

* new test GetProgramPipelineiv: bind various stages (including geometry and
tessellation) to the pipeline and check the result with GetProgramPipelineiv.
I also take the opportunity to use glCreateShaderProgramv.  Note: I get
GL_INVALID_VALUE when glActiveShaderProgram is called with a program that
isn't bound to any stage of the pipeline. As far as I can tell, the spec has
no such limitation, but I could be wrong.
Spec quote:
An INVALID_VALUE error is generated if program is not the name of either
a program or shader object.
An INVALID_OPERATION error is generated if program is the name of a
shader object."
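Read on its own, the quoted rule distinguishes only three cases, and whether the program is attached to any pipeline stage does not enter into it. A hypothetical model of just this check (the enums and `active_program_error()` are illustrative names, not a GL implementation; other parts of the spec may add further conditions):

```c
/* Hypothetical model of the error rule quoted above for the `program`
 * argument; names are illustrative, not a GL implementation. */
enum object_kind { OBJ_NONE, OBJ_PROGRAM, OBJ_SHADER };
enum gl_error { ERR_NONE, ERR_INVALID_VALUE, ERR_INVALID_OPERATION };

static enum gl_error active_program_error(enum object_kind kind)
{
   switch (kind) {
   case OBJ_PROGRAM:
      return ERR_NONE;              /* any program object passes this check */
   case OBJ_SHADER:
      return ERR_INVALID_OPERATION; /* name belongs to a shader object */
   default:
      return ERR_INVALID_VALUE;     /* neither a program nor a shader */
   }
}
```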

General note: I duplicate the vertex shader for different GL versions. In
GLSL 1.50+, built-ins must be redeclared. However, previous versions don't
support in/out blocks, so it is up to the GLSL compiler to declare them
implicitly. I tried to rely on that with older GLSL versions, but it doesn't
work on FGLRX, which still needs the redeclaration...
---
 tests/all.tests|7 +
 tests/spec/CMakeLists.txt  |1 +
 .../arb_separate_shader_objects/CMakeLists.gl.txt  |   14 +
 .../arb_separate_shader_objects/CMakeLists.txt |1 +
 .../GetProgramPipelineiv.c |  279 +++
 .../IsProgramPipeline.c|  113 ++
 .../mix_pipeline_useprogram.c  |  361 
 7 files changed, 776 insertions(+)
 create mode 100644 tests/spec/arb_separate_shader_objects/CMakeLists.gl.txt
 create mode 100644 tests/spec/arb_separate_shader_objects/CMakeLists.txt
 create mode 100644 tests/spec/arb_separate_shader_objects/GetProgramPipelineiv.c
 create mode 100644 tests/spec/arb_separate_shader_objects/IsProgramPipeline.c
 create mode 100644 tests/spec/arb_separate_shader_objects/mix_pipeline_useprogram.c

diff --git a/tests/all.tests b/tests/all.tests
index 20120d1..2cbf3c4 100644
--- a/tests/all.tests
+++ b/tests/all.tests
@@ -1079,6 +1079,13 @@ add_concurrent_test(arb_occlusion_query, 'occlusion_query_meta_fragments')
 add_concurrent_test(arb_occlusion_query, 'occlusion_query_meta_no_fragments')
 add_concurrent_test(arb_occlusion_query, 'occlusion_query_order')
 
+# Group ARB_separate_shader_objects
+arb_separate_shader_objects = Group()
+spec['ARB_separate_shader_objects'] = arb_separate_shader_objects
+arb_separate_shader_objects['sso-GetProgramPipelineiv'] = concurrent_test('arb_separate_shader_object-GetProgramPipelineiv')
+arb_separate_shader_objects['sso-IsProgramPipeline'] = concurrent_test('arb_separate_shader_object-IsProgramPipeline')
+arb_separate_shader_objects['sso-mix_pipeline_useprogram'] = concurrent_test('arb_separate_shader_object-mix_pipeline_useprogram')
+
 # Group ARB_sampler_objects
 arb_sampler_objects = Group()
 spec['ARB_sampler_objects'] = arb_sampler_objects
diff --git a/tests/spec/CMakeLists.txt b/tests/spec/CMakeLists.txt
index 18b1d37..97efd36 100644
--- a/tests/spec/CMakeLists.txt
+++ b/tests/spec/CMakeLists.txt
@@ -15,6 +15,7 @@ add_subdirectory (arb_robustness)
 add_subdirectory (arb_sampler_objects)
 add_subdirectory (arb_seamless_cube_map)
 add_subdirectory (amd_seamless_cubemap_per_texture)
+add_subdirectory (arb_separate_shader_objects)
 add_subdirectory (arb_shader_texture_lod/execution)
 add_subdirectory (arb_shader_objects)
 add_subdirectory (arb_sync)
diff --git a/tests/spec/arb_separate_shader_objects/CMakeLists.gl.txt b/tests/spec/arb_separate_shader_objects/CMakeLists.gl.txt
new file mode 100644
index 000..e651956
--- /dev/null
+++ b/tests/spec/arb_separate_shader_objects/CMakeLists.gl.txt
@@ -0,0 +1,14 @@
+include_directories(
+   ${GLEXT_INCLUDE_DIR}
+   ${OPENGL_INCLUDE_PATH}
+)
+
+link_libraries (
+   piglitutil_${piglit_target_api}
+   ${OPENGL_gl_LIBRARY}
+   ${OPENGL_glu_LIBRARY}
+)
+
+piglit_add_executable (arb_separate_shader_object-IsProgramPipeline IsProgramPipeline.c)
+piglit_add_executable (arb_separate_shader_object-GetProgramPipelineiv GetProgramPipelineiv.c)
+piglit_add_executable (arb_separate_shader_object-mix_pipeline_useprogram mix_pipeline_useprogram.c)
diff --git a/tests/spec/arb_separate_shader_objects/CMak

[Mesa-dev] [PATCH 3/3] update EXT_transform_feedback error detection

2013-03-28 Thread gregory hainaut
Program pipelines add new INVALID_OPERATION errors (spec chapter 13.2.2).

Note: FGLRX doesn't report any of the expected errors...
---
 tests/all.tests|4 +-
 tests/spec/ext_transform_feedback/api-errors.c |   84 +++-
 2 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/tests/all.tests b/tests/all.tests
index 2cbf3c4..cd759b5 100644
--- a/tests/all.tests
+++ b/tests/all.tests
@@ -1906,7 +1906,9 @@ for mode in ['interleaved_ok_base', 'interleaved_ok_range',
  'bind_range_offset_2', 'bind_range_offset_3',
  'bind_range_offset_5', 'bind_offset_offset_1',
  'bind_offset_offset_2', 'bind_offset_offset_3',
- 'bind_offset_offset_5', 'not_a_program']:
+ 'bind_offset_offset_5', 'not_a_program',
+ 'useprogstage_noactive', 'useprogstage_active',
+ 'bind_pipeline']:
 test_name = 'api-errors {0}'.format(mode)
 ext_transform_feedback[test_name] = concurrent_test(
 'ext_transform_feedback-{0}'.format(test_name))
diff --git a/tests/spec/ext_transform_feedback/api-errors.c b/tests/spec/ext_transform_feedback/api-errors.c
index 04470b2..ba4fff8 100644
--- a/tests/spec/ext_transform_feedback/api-errors.c
+++ b/tests/spec/ext_transform_feedback/api-errors.c
@@ -78,6 +78,9 @@ enum test_mode {
BIND_BAD_SIZE,
BIND_BAD_OFFSET,
NOT_A_PROGRAM,
+   USEPROGSTAGE_ACTIVE,
+   USEPROGSTAGE_NOACTIVE,
+   BIND_PIPELINE
 };
 
 enum bind_mode {
@@ -97,6 +100,32 @@ static const char *vstext =
"  gl_Position = vec4(1.0);\n"
"}\n";
 
+static const char *vstext_sep_150 =
+   "#extension GL_ARB_separate_shader_objects : enable\n"
+   "out gl_PerVertex {\n"
+   "vec4 gl_Position;\n"
+   "};\n"
+   "varying vec4 foo;\n"
+   "varying vec4 bar;\n"
+   "\n"
+   "void main()\n"
+   "{\n"
+   "  foo = vec4(1.0);\n"
+   "  bar = vec4(1.0);\n"
+   "  gl_Position = vec4(1.0);\n"
+   "}\n";
+static const char *vstext_sep_140 =
+   "#extension GL_ARB_separate_shader_objects : enable\n"
+   "varying vec4 foo;\n"
+   "varying vec4 bar;\n"
+   "\n"
+   "void main()\n"
+   "{\n"
+   "  foo = vec4(1.0);\n"
+   "  bar = vec4(1.0);\n"
+   "  gl_Position = vec4(1.0);\n"
+   "}\n";
+
 static const char *varyings[] = { "foo", "bar" };
 
 static struct test_desc
@@ -151,6 +180,10 @@ static struct test_desc
{ "bind_offset_offset_3",BIND_BAD_OFFSET,  3, OFFSET, GL_INTERLEAVED_ATTRIBS, 1 },
{ "bind_offset_offset_5",BIND_BAD_OFFSET,  5, OFFSET, GL_INTERLEAVED_ATTRIBS, 1 },
{ "not_a_program",   NOT_A_PROGRAM,0, BASE,   GL_INTERLEAVED_ATTRIBS, 1 },
+   { "useprogstage_noactive",   USEPROGSTAGE_NOACTIVE,0, BASE,   GL_INTERLEAVED_ATTRIBS, 1 },
+   { "useprogstage_active", USEPROGSTAGE_ACTIVE,  0, BASE,   GL_INTERLEAVED_ATTRIBS, 1 },
+   { "bind_pipeline",   BIND_PIPELINE,0, BASE,   GL_INTERLEAVED_ATTRIBS, 1 },
+
 };
 
 static void
@@ -186,6 +219,7 @@ do_test(const struct test_desc *test)
 {
GLuint vs;
GLuint progs[2];
+   GLuint pipes[2];
GLuint bufs[NUM_BUFFERS];
float initial_xfb_buffer_contents[XFB_BUFFER_SIZE];
GLboolean pass = GL_TRUE;
@@ -193,6 +227,10 @@ do_test(const struct test_desc *test)
int num_varyings = test->mode == NO_VARYINGS ? 0 : test->num_buffers;
GLint max_separate_attribs;
 
+   if (test->mode == USEPROGSTAGE_ACTIVE || test->mode == USEPROGSTAGE_NOACTIVE || test->mode == BIND_PIPELINE) {
+   piglit_require_extension("GL_ARB_separate_shader_objects");
+   }
+
glGetIntegerv(GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS,
  &max_separate_attribs);
printf("MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTIBS=%i\n",
@@ -200,7 +238,19 @@ do_test(const struct test_desc *test)
 
printf("Compile vertex shader\n");
vs = piglit_compile_shader_text(GL_VERTEX_SHADER, vstext);
-   if (test->mode == NOT_A_PROGRAM) {
+   if (test->mode == USEPROGSTAGE_ACTIVE || test->mode == USEPROGSTAGE_NOACTIVE || test->mode == BIND_PIPELINE) {
+   /* Note, we can't use glCreateShaderProgramv because the setup of transform feedback
+* must be done before linking
+*/
+   if (piglit_get_gl_version() >= 32)
+   vs = piglit_compile_shader_text(GL_VERTEX_SHADER, vstext_sep_150);
+   else
+   vs = piglit_compile_shader_text(GL_VERTEX_SHADER, vstext_sep_140);
+   progs[0] = glCreateProgram();
+   glProgramParameteri(progs[0], GL_PROGRAM_SEPARABLE, GL_TRUE);
+   glAttachShader(progs[0], vs);
+
+   } else if (test->mode == NOT_A_PROGRAM) {
printf("Create a program and then delete it\n");
   

Re: [Mesa-dev] [PATCH 3/3] update EXT_transform_feedback error detection

2013-03-28 Thread Brian Paul
These piglit patches should be posted to the piglit mailing list, not 
mesa-dev.  Thanks.


-Brian


[Mesa-dev] [PATCH] R600: Emit CF_ALU and use true kcache register.

2013-03-28 Thread Vincent Lejeune
---
 lib/Target/R600/AMDGPU.h   |   1 +
 lib/Target/R600/AMDGPUTargetMachine.cpp|   1 +
 lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp |  10 +-
 lib/Target/R600/R600EmitClauseMarkers.cpp  | 250 +
 lib/Target/R600/R600Instructions.td|  79 ++-
 lib/Target/R600/R600RegisterInfo.td|  63 ++
 test/CodeGen/R600/kcache-fold.ll   |   2 +-
 7 files changed, 395 insertions(+), 11 deletions(-)
 create mode 100644 lib/Target/R600/R600EmitClauseMarkers.cpp

diff --git a/lib/Target/R600/AMDGPU.h b/lib/Target/R600/AMDGPU.h
index e099a9f..3cd792a 100644
--- a/lib/Target/R600/AMDGPU.h
+++ b/lib/Target/R600/AMDGPU.h
@@ -23,6 +23,7 @@ class AMDGPUTargetMachine;
 // R600 Passes
 FunctionPass* createR600KernelParametersPass(const DataLayout *TD);
 FunctionPass *createR600ExpandSpecialInstrsPass(TargetMachine &tm);
+FunctionPass *createR600EmitClauseMarkers(TargetMachine &tm);
 
 // SI Passes
 FunctionPass *createSIAnnotateControlFlowPass();
diff --git a/lib/Target/R600/AMDGPUTargetMachine.cpp b/lib/Target/R600/AMDGPUTargetMachine.cpp
index 0185747..45b1be0 100644
--- a/lib/Target/R600/AMDGPUTargetMachine.cpp
+++ b/lib/Target/R600/AMDGPUTargetMachine.cpp
@@ -151,6 +151,7 @@ bool AMDGPUPassConfig::addPreEmitPass() {
   if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) {
 addPass(createAMDGPUCFGPreparationPass(*TM));
 addPass(createAMDGPUCFGStructurizerPass(*TM));
+addPass(createR600EmitClauseMarkers(*TM));
 addPass(createR600ExpandSpecialInstrsPass(*TM));
 addPass(&FinalizeMachineBundlesID);
   } else {
diff --git a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
index a945fe9..1bf87fc 100644
--- a/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
+++ b/lib/Target/R600/MCTargetDesc/R600MCCodeEmitter.cpp
@@ -101,7 +101,8 @@ enum InstrTypes {
   INSTR_FC,
   INSTR_NATIVE,
   INSTR_VTX,
-  INSTR_EXPORT
+  INSTR_EXPORT,
+  INSTR_CFALU
 };
 
 enum FCInstr {
@@ -251,6 +252,13 @@ void R600MCCodeEmitter::EncodeInstruction(const MCInst &MI, raw_ostream &OS,
   Emit(Inst, OS);
   break;
 }
+case AMDGPU::CF_ALU:
+case AMDGPU::CF_ALU_PUSH_BEFORE: {
+  uint64_t Inst = getBinaryCodeForInstr(MI, Fixups);
+  EmitByte(INSTR_CFALU, OS);
+  Emit(Inst, OS);
+  break;
+}
 
 default:
   EmitALUInstr(MI, Fixups, OS);
diff --git a/lib/Target/R600/R600EmitClauseMarkers.cpp b/lib/Target/R600/R600EmitClauseMarkers.cpp
new file mode 100644
index 000..3c0e86e
--- /dev/null
+++ b/lib/Target/R600/R600EmitClauseMarkers.cpp
@@ -0,0 +1,250 @@
+//===-- R600EmitClauseMarkers.cpp - Emit CF_ALU ---===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===--===//
+//
+/// \file
+/// Add CF_ALU. R600 ALU instructions are grouped into clauses, each of which
+/// can hold up to 128 ALU instructions; these instructions can access up to 4
+/// prefetched lines of 16 registers from constant buffers. Such ALU clauses
+/// are initiated by CF_ALU instructions.
+//===--===//
+
+#include "AMDGPU.h"
+#include "R600Defines.h"
+#include "R600InstrInfo.h"
+#include "R600MachineFunctionInfo.h"
+#include "R600RegisterInfo.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+
+namespace llvm {
+
+class R600EmitClauseMarkersPass : public MachineFunctionPass {
+
+private:
+  static char ID;
+  const R600InstrInfo *TII;
+
+  unsigned OccupiedDwords(MachineInstr *MI) const {
+switch (MI->getOpcode()) {
+case AMDGPU::INTERP_PAIR_XY:
+case AMDGPU::INTERP_PAIR_ZW:
+case AMDGPU::INTERP_VEC_LOAD:
+case AMDGPU::DOT4_eg_pseudo:
+case AMDGPU::DOT4_r600_pseudo:
+  return 4;
+case AMDGPU::KILL:
+  return 0;
+default:
+  break;
+}
+
+if(TII->isVector(*MI) ||
+TII->isCubeOp(MI->getOpcode()) ||
+TII->isReductionOp(MI->getOpcode()))
+  return 4;
+
+unsigned NumLiteral = 0;
+for (MachineInstr::mop_iterator It = MI->operands_begin(),
+E = MI->operands_end(); It != E; ++It) {
+  MachineOperand &MO = *It;
+  if (MO.isReg() && MO.getReg() == AMDGPU::ALU_LITERAL_X)
+++NumLiteral;
+}
+return 1 + NumLiteral;
+  }
+
+  bool isALU(const MachineInstr *MI) const {
+if (TII->isALUInstr(MI->getOpcode()))
+  return true;
+if (TII->isVector(*MI) || TII->isCubeOp(MI->getOpcode()))
+  return true;
+switch (MI->getOpcode()) {
+case AMDGPU::INTERP_PAIR_XY:
+case AMDGPU::INTERP_PAIR_ZW:
+case AMDGPU::INTERP_VEC_LOAD:
+case AMDGPU::COPY:
+c
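
As a rough illustration of what such a pass does, the sketch below greedily starts a new ALU clause (the point where a CF_ALU marker would be emitted) whenever the next instruction would overflow the current one. It is a simplification under stated assumptions, not the patch's algorithm: the 128-slot limit is taken from the file comment, counting that limit in occupied dwords (as `OccupiedDwords()` suggests) is an assumption, and the real pass must also respect the constant-cache (KCACHE) line limits, which are ignored here. `split_into_clauses()` is a hypothetical name.

```c
/* Hypothetical sketch: greedy splitting of ALU instructions into clauses.
 * MAX_CLAUSE_SLOTS follows the "128 ALU instructions" comment; measuring it
 * in occupied dwords is an assumption.  KCACHE limits are ignored. */
#define MAX_CLAUSE_SLOTS 128

/* slots[i] = dwords occupied by instruction i.  Writes the index of the
 * first instruction of each clause into clause_start and returns the
 * number of clauses. */
static int split_into_clauses(const unsigned *slots, int n, int *clause_start)
{
   int clauses = 0;
   unsigned used = 0;

   for (int i = 0; i < n; i++) {
      if (clauses == 0 || used + slots[i] > MAX_CLAUSE_SLOTS) {
         clause_start[clauses++] = i; /* a CF_ALU marker would go here */
         used = 0;
      }
      used += slots[i];
   }
   return clauses;
}
```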

Re: [Mesa-dev] [PATCH 0/4] Begin some ir_dereference_array-of-a-vector rework

2013-03-28 Thread Jordan Justen
Reviewed-by: Jordan Justen 

On Wed, Mar 27, 2013 at 9:30 AM, Ian Romanick  wrote:
> This is the first of three patch series that I'm going to send related
> to reworking ir_dereference_array of vectors.  The final series ends with
> ir_dereference_array of vectors being completely removed from the IR.
>
> The first series is a set of fixes for bugs that I discovered along the
> way.  The first two are definitely candidates for the stable branches.
> The second two might be, but I could be convinced either way.


Re: [Mesa-dev] [PATCH] mesa: handle STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED for parameter printing

2013-03-28 Thread Brian Paul
On Thu, Mar 28, 2013 at 11:18 AM, Marek Olšák  wrote:
> ---
>  src/mesa/program/prog_statevars.c |3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/mesa/program/prog_statevars.c b/src/mesa/program/prog_statevars.c
> index 5a35079..ccc590d 100644
> --- a/src/mesa/program/prog_statevars.c
> +++ b/src/mesa/program/prog_statevars.c
> @@ -871,6 +871,9 @@ append_token(char *dst, gl_state_index k)
> case STATE_CURRENT_ATTRIB:
>append(dst, "current");
>break;
> +   case STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED:
> +  append(dst, "currentAttribMaybeVPClamped");
> +  break;
> case STATE_NORMAL_SCALE:
>append(dst, "normalScale");
>break;


Reviewed-by: Brian Paul 


Re: [Mesa-dev] [PATCH] r600g: fix range handling for tgsi input/output declarations

2013-03-28 Thread Vadim Girlin

On 03/28/2013 01:01 PM, Christian König wrote:

On 27.03.2013 20:37, Vadim Girlin wrote:

Signed-off-by: Vadim Girlin 
---
  src/gallium/drivers/r600/r600_shader.c | 19 +++
  1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 29facf7..d4c9c03 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -874,12 +874,12 @@ static int select_twoside_color(struct r600_shader_ctx *ctx, int front, int back
  static int tgsi_declaration(struct r600_shader_ctx *ctx)
  {
  struct tgsi_full_declaration *d = &ctx->parse.FullToken.FullDeclaration;
-unsigned i;
-int r;
+int r, i, j, count = d->Range.Last - d->Range.First + 1;
  switch (d->Declaration.File) {
  case TGSI_FILE_INPUT:
-i = ctx->shader->ninput++;
+i = ctx->shader->ninput;
+ctx->shader->ninput += count;
  ctx->shader->input[i].name = d->Semantic.Name;
  ctx->shader->input[i].sid = d->Semantic.Index;
  ctx->shader->input[i].interpolate = d->Interp.Interpolate;
@@ -903,9 +903,15 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
  return r;
  }
  }
+for (j = 1; j < count; ++j) {
+memcpy(&ctx->shader->input[i + j], &ctx->shader->input[i],
+   sizeof(struct r600_shader_io));


Instead of memcpy, shouldn't an assignment do the trick here as well?


Yes, assignment should work fine; I've just gotten used to using memcpy in such
cases for some reason. I'll replace memcpy with assignment.
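
Christian's suggestion amounts to replacing the memcpy() calls with plain struct assignment, which copies every member and is type-checked by the compiler. A minimal sketch of the patch's range-expansion loop written that way; `struct shader_io` and `expand_range()` are hypothetical stand-ins, modeling only a few fields of `struct r600_shader_io`:

```c
/* Hypothetical stand-in for struct r600_shader_io; only a few of the
 * fields touched by the patch are modeled. */
struct shader_io {
   unsigned name;
   unsigned sid;
   int gpr;
};

/* Expands a declaration covering `count` consecutive registers starting at
 * io[first]: each extra entry copies the first one by struct assignment
 * and then gets its own GPR index. */
static void expand_range(struct shader_io *io, int first, int count)
{
   for (int j = 1; j < count; ++j) {
      io[first + j] = io[first];   /* plain assignment instead of memcpy */
      io[first + j].gpr += j;
   }
}
```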


Also, I think the second part (output handling) can be dropped for now:
currently we only need to handle inputs (for HUD shaders), and later, when
array declarations for inputs/outputs are implemented in TGSI, we'll
probably need to update the parser in r600g anyway. I'm just not sure yet
how the semantic indices should be handled for input/output arrays.


Vadim




+ctx->shader->input[i + j].gpr += j;
+}
  break;
  case TGSI_FILE_OUTPUT:
-i = ctx->shader->noutput++;
+i = ctx->shader->noutput;
+ctx->shader->noutput += count;
  ctx->shader->output[i].name = d->Semantic.Name;
  ctx->shader->output[i].sid = d->Semantic.Index;
  ctx->shader->output[i].gpr = ctx->file_offset[TGSI_FILE_OUTPUT] + d->Range.First;
@@ -933,6 +939,11 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
  break;
  }
  }
+for (j = 1; j < count; ++j) {
+memcpy(&ctx->shader->output[i + j], &ctx->shader->output[i],
+   sizeof(struct r600_shader_io));


Same here.


+ctx->shader->output[i + j].gpr += j;
+}
  break;
  case TGSI_FILE_CONSTANT:
  case TGSI_FILE_TEMPORARY:


Christian.




Re: [Mesa-dev] [PATCH] R600: Emit CF_ALU and use true kcache register.

2013-03-28 Thread Vadim Girlin

On 03/28/2013 09:47 PM, Vincent Lejeune wrote:

> [snip]


diff --git a/lib/Target/R600/R600RegisterInfo.td b/lib/Target/R600/R600RegisterInfo.td
index ce5994c..3ee6623 100644
--- a/lib/Target/R600/R600RegisterInfo.td
+++ b/lib/Target/R600/R600RegisterInfo.td
@@ -43,6 +43,37 @@ foreach Index = 0-127 in {
 Index>;
  }

+// KCACHE_BANK0
+foreach Index = 159-128 in {
+  foreach Chan = [ "X", "Y", "Z", "W" ] in {
+// 32-bit Temporary Registers
+def KC0_#Index#_#Chan : R600RegWithChan <"KC0["#Index#"-128]."#Chan, Index, Chan>;
+  }
+  // 128-bit Temporary Registers
+  def KC0_#Index#_XYZW : R600Reg_128 <"KC0["#Index#"-128].XYZW",
+ [!cast("KC0_"#Index#"_X"),
+  !cast("KC0_"#Index#"_Y"),
+  !cast("KC0_"#Index#"_Z"),
+  !cast("KC0_"#Index#"_W")],
+ Index>;
+}
+
+// KCACHE_BANK1
+foreach Index = 191-159 in {


Probably 160 should be used instead of 159 here (and in the two 
occurrences below)?
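
Vadim's point can be checked with inclusive-range arithmetic: bank 0 is declared over indices 128..159 (32 entries), so starting bank 1 at 159 gives it 33 entries and makes index 159 belong to both banks, while 160..191 gives 32 entries with no overlap and matches the `sequence "KC1_%u_X", 160, 191` register classes later in the patch. (With 159 as the base, the names `KC1[Index-159]` would also run from 1 to 32 instead of 0 to 31.) A small C check; the struct and helpers are illustrative only:

```c
/* Illustrative inclusive index ranges (lo <= hi), as in TableGen's "a-b". */
struct index_range { int lo, hi; };

/* Number of indices covered by an inclusive range. */
static int range_size(struct index_range r)
{
   return r.hi - r.lo + 1;
}

/* Nonzero if the two inclusive ranges share at least one index. */
static int ranges_overlap(struct index_range a, struct index_range b)
{
   return a.lo <= b.hi && b.lo <= a.hi;
}
```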


Vadim


+  foreach Chan = [ "X", "Y", "Z", "W" ] in {
+// 32-bit Temporary Registers
+def KC1_#Index#_#Chan : R600RegWithChan <"KC1["#Index#"-159]."#Chan, Index, Chan>;
+  }
+  // 128-bit Temporary Registers
+  def KC1_#Index#_XYZW : R600Reg_128 <"KC1["#Index#"-159].XYZW",
+ [!cast("KC1_"#Index#"_X"),
+  !cast("KC1_"#Index#"_Y"),
+  !cast("KC1_"#Index#"_Z"),
+  !cast("KC1_"#Index#"_W")],
+ Index>;
+}
+
+
  // Array Base Register holding input in FS
  foreach Index = 448-480 in {
def ArrayBase#Index :  R600Reg<"ARRAY_BASE", Index>;
@@ -80,6 +111,38 @@ def R600_Addr : RegisterClass <"AMDGPU", [i32], 127, (add (sequence "Addr%u_X",

  } // End isAllocatable = 0

+def R600_KC0_X : RegisterClass <"AMDGPU", [f32, i32], 32,
+  (add (sequence "KC0_%u_X", 128, 159))>;
+
+def R600_KC0_Y : RegisterClass <"AMDGPU", [f32, i32], 32,
+  (add (sequence "KC0_%u_Y", 128, 159))>;
+
+def R600_KC0_Z : RegisterClass <"AMDGPU", [f32, i32], 32,
+  (add (sequence "KC0_%u_Z", 128, 159))>;
+
+def R600_KC0_W : RegisterClass <"AMDGPU", [f32, i32], 32,
+  (add (sequence "KC0_%u_W", 128, 159))>;
+
+def R600_KC0 : RegisterClass <"AMDGPU", [f32, i32], 32,
+   (interleave R600_KC0_X, R600_KC0_Y,
+   R600_KC0_Z, R600_KC0_W)>;
+
+def R600_KC1_X : RegisterClass <"AMDGPU", [f32, i32], 32,
+  (add (sequence "KC1_%u_X", 160, 191))>;
+
+def R600_KC1_Y : RegisterClass <"AMDGPU", [f32, i32], 32,
+  (add (sequence "KC1_%u_Y", 160, 191))>;
+
+def R600_KC1_Z : RegisterClass <"AMDGPU", [f32, i32], 32,
+  (add (sequence "KC1_%u_Z", 160, 191))>;
+
+def R600_KC1_W : RegisterClass <"AMDGPU", [f32, i32], 32,
+  (add (sequence "KC1_%u_W", 160, 191))>;
+
+def R600_KC1 : RegisterClass <"AMDGPU", [f32, i32], 32,
+   (interleave R600_KC1_X, R600_KC1_Y,
+   R600_KC1_Z, R600_KC1_W)>;
+
  def R600_TReg32_X : RegisterClass <"AMDGPU", [f32, i32], 32,
 (add (sequence "T%u_X", 0, 127), AR_X)>;

diff --git a/test/CodeGen/R600/kcache-fold.ll b/test/CodeGen/R600/kcache-fold.ll
index e8e2bf5..3d70e4b 100644
--- a/test/CodeGen/R600/kcache-fold.ll
+++ b/test/CodeGen/R600/kcache-fold.ll
@@ -1,7 +1,7 @@
  ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s

  ; CHECK: @main1
-; CHECK: MOV T{{[0-9]+\.[XYZW], CBuf0\[[0-9]+\]\.[XYZW]}}
+; CHECK: MOV T{{[0-9]+\.[XYZW], KC0}}
  define void @main1() {
  main_body:
%0 = load <4 x float> addrspace(8)* null





[Mesa-dev] [PATCH 1/6] i965: Reword comment about the shared mathbox.

2013-03-28 Thread Matt Turner
---
 .../dri/i965/brw_fs_schedule_instructions.cpp  |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
index c125928..aa9c0a6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
@@ -772,10 +772,10 @@ instruction_scheduler::schedule_instructions(fs_inst *next_block_header)
 }
   }
 
-  /* Shared resource: the mathbox.  There's one per EU (on later
-   * generations, it's even more limited pre-gen6), so if we send
-   * something off to it then the next math isn't going to make
-   * progress until the first is done.
+  /* Shared resource: the mathbox.  There's one mathbox per EU on Gen6+
+   * but it's more limited pre-gen6, so if we send something off to it then
+   * the next math instruction isn't going to make progress until the first
+   * is done.
*/
   if (chosen->inst->is_math()) {
 foreach_list(node, &instructions) {
-- 
1.7.8.6



[Mesa-dev] [PATCH 2/6] i965: Note that write-after-write dependencies are blocking.

2013-03-28 Thread Matt Turner
---
 .../dri/i965/brw_fs_schedule_instructions.cpp  |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
index aa9c0a6..72b3729 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
@@ -36,7 +36,7 @@
  *
  * The basic model of the list scheduler is to take a basic block,
  * compute a DAG of the dependencies (RAW ordering with latency, WAW
- * ordering, WAR ordering), and make a list of the DAG heads.
+ * ordering with latency, WAR ordering), and make a list of the DAG heads.
  * Heuristically pick a DAG head, then put all the children that are
  * now DAG heads into the list of things to schedule.
  *
-- 
1.7.8.6


[Mesa-dev] [PATCH 3/6] i965/fs: Add Haswell cycle timings

2013-03-28 Thread Matt Turner
---
 .../dri/i965/brw_fs_schedule_instructions.cpp  |   18 +-
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
index 72b3729..d2420cd 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
@@ -57,7 +57,7 @@ static bool debug = false;
 class schedule_node : public exec_node
 {
 public:
-   schedule_node(fs_inst *inst, int gen)
+   schedule_node(fs_inst *inst, const struct intel_context *intel)
{
   this->inst = inst;
   this->child_array_size = 0;
@@ -67,14 +67,14 @@ public:
   this->parent_count = 0;
   this->unblocked_time = 0;
 
-  if (gen >= 7)
- set_latency_gen7();
+  if (intel->gen >= 7)
+ set_latency_gen7(intel->is_haswell);
   else
  set_latency_gen4();
}
 
void set_latency_gen4();
-   void set_latency_gen7();
+   void set_latency_gen7(bool is_haswell);
 
fs_inst *inst;
schedule_node **children;
@@ -125,7 +125,7 @@ schedule_node::set_latency_gen4()
 }
 
 void
-schedule_node::set_latency_gen7()
+schedule_node::set_latency_gen7(bool is_haswell)
 {
switch (inst->opcode) {
case BRW_OPCODE_MAD:
@@ -137,7 +137,7 @@ schedule_node::set_latency_gen7()
* mad(8) g4<1>F g2.2<4,1,1>F.x  g2<4,1,1>F.x g2.1<4,1,1>F.x { align16 WE_normal 1Q };
* mov(8) null   g4<4,4,1>F { align16 WE_normal 1Q };
*/
-  latency = 17;
+  latency = is_haswell ? 16 : 17;
   break;
 
case SHADER_OPCODE_RCP:
@@ -156,7 +156,7 @@ schedule_node::set_latency_gen7()
*
* Same for exp2, log2, rsq, sqrt, sin, cos.
*/
-  latency = 16;
+  latency = is_haswell ? 14 : 16;
   break;
 
case SHADER_OPCODE_POW:
@@ -167,7 +167,7 @@ schedule_node::set_latency_gen7()
* math pow(8) g4<1>F g2<0,1,0>F   g2.1<0,1,0>F  { align1 WE_normal 1Q };
* mov(8)  null   g4<8,8,1>F { align1 WE_normal 1Q };
*/
-  latency = 24;
+  latency = is_haswell ? 22 : 24;
   break;
 
case SHADER_OPCODE_TEX:
@@ -335,7 +335,7 @@ public:
 void
 instruction_scheduler::add_inst(fs_inst *inst)
 {
-   schedule_node *n = new(mem_ctx) schedule_node(inst, v->intel->gen);
+   schedule_node *n = new(mem_ctx) schedule_node(inst, v->intel);
 
assert(!inst->is_head_sentinel());
assert(!inst->is_tail_sentinel());
-- 
1.7.8.6


[Mesa-dev] [PATCH 4/6] i965/fs: Add LRP instruction latency.

2013-03-28 Thread Matt Turner
Set its latency to what happens to be the default floating-point
instruction latency. One day we may want to handle latency based on
register bank information.
---
 .../dri/i965/brw_fs_schedule_instructions.cpp  |   26 
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
index d2420cd..ec558e3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
@@ -140,6 +140,32 @@ schedule_node::set_latency_gen7(bool is_haswell)
   latency = is_haswell ? 16 : 17;
   break;
 
+   case BRW_OPCODE_LRP:
+  /* 2 cycles
+   *  (since the last two src operands are in different register banks):
+   * lrp(8) g4<1>F g2.2<4,1,1>F.x  g2<4,1,1>F.x g3.1<4,1,1>F.x { align16 WE_normal 1Q };
+   *
+   * 3 cycles on IVB, 4 on HSW
+   *  (since the last two src operands are in the same register bank):
+   * lrp(8) g4<1>F g2.2<4,1,1>F.x  g2<4,1,1>F.x g2.1<4,1,1>F.x { align16 WE_normal 1Q };
+   *
+   * 16 cycles on IVB, 14 on HSW
+   *  (since the last two src operands are in different register banks):
+   * lrp(8) g4<1>F g2.2<4,1,1>F.x  g2<4,1,1>F.x g3.1<4,1,1>F.x { align16 WE_normal 1Q };
+   * mov(8) null   g4<4,4,1>F { align16 WE_normal 1Q };
+   *
+   * 16 cycles
+   *  (since the last two src operands are in the same register bank):
+   * lrp(8) g4<1>F g2.2<4,1,1>F.x  g2<4,1,1>F.x g2.1<4,1,1>F.x { align16 WE_normal 1Q };
+   * mov(8) null   g4<4,4,1>F { align16 WE_normal 1Q };
+   */
+
+  /* Our register allocator doesn't know about register banks, so use the
+   * higher latency.
+   */
+  latency = 14;
+  break;
+
case SHADER_OPCODE_RCP:
case SHADER_OPCODE_RSQ:
case SHADER_OPCODE_SQRT:
-- 
1.7.8.6


[Mesa-dev] [PATCH 5/6] i965/fs: Increase and document MAD latency on Gen7.

2013-03-28 Thread Matt Turner
58% of the mad(8) instructions generated in shader-db read registers from the
same bank.
---
 .../dri/i965/brw_fs_schedule_instructions.cpp  |   22 ---
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
index ec558e3..f0ef470 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
@@ -129,15 +129,29 @@ schedule_node::set_latency_gen7(bool is_haswell)
 {
switch (inst->opcode) {
case BRW_OPCODE_MAD:
-  /* 3 cycles (this is said to be 4 cycles sometimes depending on the
-   * register numbers in the sources):
+  /* 2 cycles
+   *  (since the last two src operands are in different register banks):
+   * mad(8) g4<1>F g2.2<4,1,1>F.x  g2<4,1,1>F.x g3.1<4,1,1>F.x { align16 WE_normal 1Q };
+   *
+   * 3 cycles on IVB, 4 on HSW
+   *  (since the last two src operands are in the same register bank):
* mad(8) g4<1>F g2.2<4,1,1>F.x  g2<4,1,1>F.x g2.1<4,1,1>F.x { align16 WE_normal 1Q };
*
-   * 20 cycles:
+   * 18 cycles on IVB, 16 on HSW
+   *  (since the last two src operands are in different register banks):
+   * mad(8) g4<1>F g2.2<4,1,1>F.x  g2<4,1,1>F.x g3.1<4,1,1>F.x { align16 WE_normal 1Q };
+   * mov(8) null   g4<4,5,1>F { align16 WE_normal 1Q };
+   *
+   * 20 cycles on IVB, 18 on HSW
+   *  (since the last two src operands are in the same register bank):
* mad(8) g4<1>F g2.2<4,1,1>F.x  g2<4,1,1>F.x g2.1<4,1,1>F.x { align16 WE_normal 1Q };
* mov(8) null   g4<4,4,1>F { align16 WE_normal 1Q };
*/
-  latency = is_haswell ? 16 : 17;
+
+  /* Our register allocator doesn't know about register banks, so use the
+   * higher latency.
+   */
+  latency = is_haswell ? 16 : 18;
   break;
 
case BRW_OPCODE_LRP:
-- 
1.7.8.6


[Mesa-dev] [PATCH 6/6] i965/fs: Use measured Gen7 instruction timings on Gen6.

2013-03-28 Thread Matt Turner
From GLBenchmark 2.5:

x before
+ after
    N           Min           Max        Median           Avg        Stddev
x  23       8083.78       8287.83       8205.55     8162.7461     68.307951
+  23       8107.56       8358.74       8224.33     8186.1765     71.506301
No difference proven at 95.0% confidence
---
No strong preference on whether this goes in or not.

 .../dri/i965/brw_fs_schedule_instructions.cpp  |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
index f0ef470..90f1a16 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
@@ -67,7 +67,10 @@ public:
   this->parent_count = 0;
   this->unblocked_time = 0;
 
-  if (intel->gen >= 7)
+  /* We can't measure Gen6 timings directly but expect them to be much
+   * closer to Gen7 than Gen4.
+   */
+  if (intel->gen >= 6)
  set_latency_gen7(intel->is_haswell);
   else
  set_latency_gen4();
-- 
1.7.8.6


Re: [Mesa-dev] [PATCH] i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards.

2013-03-28 Thread Eric Anholt
Kenneth Graunke  writes:

> "discard" instructions generate HALT instructions which jump to a final
> HALT near the end of the shader.  Previously, fs_generator created this
> final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it
> to jump right before the FB write epilogue.  This is normally good.
>
> However, INTEL_DEBUG=shader_time also has an epilogue section which
> records the final timestamp.  The frontend emits IR for this just before
> FS_OPCODE_FB_WRITE.  Unfortunately, this led to the following ordering:

Wow, that changes timings...

Reviewed-by: Eric Anholt 


Re: [Mesa-dev] [PATCH 1/6] i965: Reword comment about the shared mathbox.

2013-03-28 Thread Eric Anholt
This series is:

Reviewed-by: Eric Anholt 


Re: [Mesa-dev] [PATCH] i965: Reduce code duplication in handling of depth, stencil, and HiZ.

2013-03-28 Thread Paul Berry
On 27 March 2013 20:43, Chad Versace  wrote:

> I like this patch. The final result feels cleaner than the present code.
> Comments below.
>
>
> On Tue, Mar 26, 2013 at 09:54:18PM -0700, Paul Berry wrote:
> > This patch consolidates duplicate code in the brw_depthbuffer and
> > gen7_depthbuffer state atoms.  Previously, these state atoms contained
> > 5 chunks of code for emitting the _3DSTATE_DEPTH_BUFFER packet (3 for
> > Gen4-6 and 2 for Gen7).  Also a lot of logic for determining the
> > appropriate buffer setup was duplicated between the Gen4-6 and Gen7
> > functions.
> >
> > This refactor splits the code into three separate functions:
> > brw_emit_depthbuffer(), which determines the appropriate buffer setup
> > in a mostly generation-independent way, brw_emit_depth_stencil_hiz(),
> > which emits the appropriate state packets for Gen4-6, and
> > gen7_emit_depth_stencil_hiz(), which emits the appropriate state
> > packets for Gen7.
> >
> > Tested using Piglit on Gen5-7 (no regressions).
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.h |  23 
> >  src/mesa/drivers/dri/i965/brw_misc_state.c  | 189 +++-
> >  src/mesa/drivers/dri/i965/brw_vtbl.c|   2 +
> >  src/mesa/drivers/dri/i965/gen7_misc_state.c |  93 +-
> >  src/mesa/drivers/dri/intel/intel_context.h  |  16 +++
> >  5 files changed, 171 insertions(+), 152 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> > index 8ff70c9..1ea038f 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -1347,6 +1347,29 @@ struct opcode_desc {
> >
> >  extern const struct opcode_desc opcode_descs[128];
> >
> > +void
> > +brw_emit_depthbuffer(struct brw_context *brw);
> > +
> > +void
> > +brw_emit_depth_stencil_hiz(struct brw_context *brw,
> > +   struct intel_mipmap_tree *depth_mt,
> > +   uint32_t depth_offset, uint32_t depthbuffer_format,
> > +   uint32_t depth_surface_type,
> > +   struct intel_mipmap_tree *stencil_mt,
> > +   struct intel_mipmap_tree *hiz_mt,
> > +   bool separate_stencil, uint32_t width,
> > +   uint32_t height, uint32_t tile_x, uint32_t tile_y);
> > +
> > +void
> > +gen7_emit_depth_stencil_hiz(struct brw_context *brw,
> > +struct intel_mipmap_tree *depth_mt,
> > +uint32_t depth_offset, uint32_t depthbuffer_format,
> > +uint32_t depth_surface_type,
> > +struct intel_mipmap_tree *stencil_mt,
> > +struct intel_mipmap_tree *hiz_mt,
> > +bool separate_stencil, uint32_t width,
> > +uint32_t height, uint32_t tile_x, uint32_t tile_y);
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
> > index d6bd86c..3821eda 100644
> > --- a/src/mesa/drivers/dri/i965/brw_misc_state.c
> > +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
> > @@ -561,7 +561,8 @@ brw_workaround_depthstencil_alignment(struct brw_context *brw,
> > }
> >  }
> >
> > -static void emit_depthbuffer(struct brw_context *brw)
> > +void
> > +brw_emit_depthbuffer(struct brw_context *brw)
> >  {
> > struct intel_context *intel = &brw->intel;
> > struct gl_context *ctx = &intel->ctx;
> > @@ -574,20 +575,23 @@ static void emit_depthbuffer(struct brw_context *brw)
> > struct intel_mipmap_tree *hiz_mt = brw->depthstencil.hiz_mt;
> > uint32_t tile_x = brw->depthstencil.tile_x;
> > uint32_t tile_y = brw->depthstencil.tile_y;
> > -   unsigned int len;
> > bool separate_stencil = false;
> > +   uint32_t depth_surface_type = BRW_SURFACE_NULL;
> > +   uint32_t depthbuffer_format = BRW_DEPTHFORMAT_D32_FLOAT;
> > +   uint32_t depth_offset = 0;
> > +   uint32_t width = 1, height = 1;
> >
> > -   if (stencil_mt && stencil_mt->format == MESA_FORMAT_S8)
> > -  separate_stencil = true;
> > +   if (stencil_mt) {
> > +  separate_stencil = stencil_mt->format == MESA_FORMAT_S8;
> >
> > -   /* 3DSTATE_DEPTH_BUFFER, 3DSTATE_STENCIL_BUFFER are both
> > -* non-pipelined state that will need the PIPE_CONTROL workaround.
> > -*/
> > -   if (intel->gen == 6) {
> > -  intel_emit_post_sync_nonzero_flush(intel);
> > -  intel_emit_depth_stall_flushes(intel);
> > +  /* Gen7 only supports separate stencil */
>
> The order of words here confuses me. The "only" is adjacent to "Gen7", not
> "separate stencil", so I read it as "Only Gen7 supports separate stencil."
> Please move "only" to be adjacent to "separate stencil".
>

Wow, that interpretation really surprises me.  I wonder if this is a
difference between our dialects.  "Gen

[Mesa-dev] [RFC PATCH] i965/fs: Don't immediately schedule instructions that were just made available.

2013-03-28 Thread Matt Turner
This is not how the final patch would look. Rather, we'd flatten the
if (post_reg_alloc) block and remove the else clause. This patch just aims to
prove that we're choosing instructions in a bad order.

On Sandybridge GLB2.5 C24Z16_DXT1 1600x900 non-composited:

x before
+ after
    N           Min           Max        Median           Avg        Stddev
x  23       8025.58        8203.4       8048.86     8105.5061      72.50085
+  23       8156.34       8323.38       8185.55     8236.8326     74.079214
Difference at 95.0% confidence
131.327 +/- 43.5508
1.62021% +/- 0.537299%
(Student's t, pooled s = 73.2943)
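For reference, the "Difference at 95.0% confidence" lines are a pooled two-sample Student's t interval and can be reproduced from the two summary rows (n = 23 on each side). A worked check, assuming a two-sided critical value of roughly 2.015 for 44 degrees of freedom (an approximation here, not a value taken from ministat's own table):

```latex
s_p = \sqrt{\frac{(n_x-1)\,s_x^2 + (n_+-1)\,s_+^2}{n_x+n_+-2}}
    = \sqrt{\frac{72.50085^2 + 74.079214^2}{2}} \approx 73.2943

\Delta = \bar{x}_+ - \bar{x}_x = 8236.8326 - 8105.5061 = 131.3265

h = t_{0.975,\,44}\; s_p \sqrt{\tfrac{1}{23} + \tfrac{1}{23}}
  \approx 2.015 \times 73.2943 \times 0.29488 \approx 43.55

\frac{\Delta}{\bar{x}_x} \approx 1.620\,\%, \qquad \frac{h}{\bar{x}_x} \approx 0.537\,\%
```

The same arithmetic on the PATCH 6/6 numbers gives a difference of about 23.43 against a half-width of about 41.5, which is why ministat reported no difference there.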

The original goal of pre-register allocation scheduling was to reduce live
ranges so we'd use fewer registers and hopefully fit into 16-wide. In shader-db,
this change causes us to lose 30 16-wide programs, but we gain 29... so it's a
toss-up. At least by choosing instructions in a better order all programs should
be slightly faster. Consider the trivial case of

uniform vec3 a, b;
void main() { gl_FragColor = vec4(cross(a, b), 0.0); }

Before the patch we compile this to

mov.sat(8)  m4<1>F  0F
mul(8)      g3<1>F  g2.4<0,1,0>F  g2<0,1,0>F
mad.sat(8)  m3<1>F  -g3<4,1,1>F   g2.3<4,1,1>F.x  g2.1<4,1,1>F.x
mul(8)      g3<1>F  g2.3<0,1,0>F  g2.2<0,1,0>F
mad.sat(8)  m2<1>F  -g3<4,1,1>F   g2.5<4,1,1>F.x  g2<4,1,1>F.x
mul(8)      g3<1>F  g2.5<0,1,0>F  g2.1<0,1,0>F
mad.sat(8)  m1<1>F  -g3<4,1,1>F   g2.4<4,1,1>F.x  g2.2<4,1,1>F.x
sendc(8)    null    m1<8,8,1>F

where we stall on each mad.sat waiting for the mul to finish. The sendc is
issued at cycle 66. After the patch it compiles to

mul(8)      g3<1>F  g2.5<0,1,0>F  g2.1<0,1,0>F
mul(8)      g4<1>F  g2.3<0,1,0>F  g2.2<0,1,0>F
mul(8)      g5<1>F  g2.4<0,1,0>F  g2<0,1,0>F
mov.sat(8)  m4<1>F  0F
mad.sat(8)  m1<1>F  -g3<4,1,1>F   g2.4<4,1,1>F.x  g2.2<4,1,1>F.x
mad.sat(8)  m2<1>F  -g4<4,1,1>F   g2.5<4,1,1>F.x  g2<4,1,1>F.x
mad.sat(8)  m3<1>F  -g5<4,1,1>F   g2.3<4,1,1>F.x  g2.1<4,1,1>F.x
sendc(8)    null    m1<8,8,1>F

By hiding much of the latency, the sendc instruction is issued by cycle 32.
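The effect of the reordering can be reproduced with a deliberately crude model. This is a hypothetical sketch, not the i965 scheduler or its latency tables: it assumes one instruction issues per cycle in program order, every result becomes available a fixed 14 cycles after issue (an arbitrary illustrative value), the g2 uniforms are always ready, and the register enum values are labels only. It tracks just the read-after-write dependencies of the two sequences above.

```c
#include <assert.h>

/* Toy in-order issue model (hypothetical, for illustration only):
 *  - one instruction issues per cycle, in program order;
 *  - an instruction stalls until every source it reads has been written;
 *  - every result is ready TOY_LATENCY cycles after its instruction issues.
 * Real i965 latencies differ per opcode and generation. */
#define TOY_LATENCY  14
#define TOY_MAX_SRCS 4
#define TOY_NUM_REGS 16

/* Register labels; g2 uniforms are always ready, so they are not tracked. */
enum { NO_REG = -1, G3 = 3, G4, G5, M1 = 11, M2, M3, M4 };

struct toy_inst {
   int dst;                 /* register written, or NO_REG */
   int num_srcs;
   int srcs[TOY_MAX_SRCS];  /* registers that must be ready before issue */
};

/* Return the cycle on which the last instruction in the sequence issues. */
static int
last_issue_cycle(const struct toy_inst *insts, int count)
{
   int ready[TOY_NUM_REGS] = { 0 }; /* cycle each register becomes ready */
   int cycle = -1;

   for (int i = 0; i < count; i++) {
      int issue = cycle + 1;        /* at most one issue per cycle */

      assert(insts[i].num_srcs <= TOY_MAX_SRCS);
      for (int s = 0; s < insts[i].num_srcs; s++) {
         int r = insts[i].srcs[s];
         if (ready[r] > issue)
            issue = ready[r];       /* stall on a read-after-write hazard */
      }
      if (insts[i].dst != NO_REG)
         ready[insts[i].dst] = issue + TOY_LATENCY;
      cycle = issue;
   }
   return cycle;
}

/* Dependency shape of the code before the patch. */
static const struct toy_inst before_order[] = {
   { M4, 0, { 0 } },                  /* mov.sat m4 */
   { G3, 0, { 0 } },                  /* mul g3 */
   { M3, 1, { G3 } },                 /* mad.sat m3, reads g3 */
   { G3, 0, { 0 } },                  /* mul g3 */
   { M2, 1, { G3 } },                 /* mad.sat m2, reads g3 */
   { G3, 0, { 0 } },                  /* mul g3 */
   { M1, 1, { G3 } },                 /* mad.sat m1, reads g3 */
   { NO_REG, 4, { M1, M2, M3, M4 } }, /* sendc */
};

/* Dependency shape of the code after the patch. */
static const struct toy_inst after_order[] = {
   { G3, 0, { 0 } },                  /* mul g3 */
   { G4, 0, { 0 } },                  /* mul g4 */
   { G5, 0, { 0 } },                  /* mul g5 */
   { M4, 0, { 0 } },                  /* mov.sat m4 */
   { M1, 1, { G3 } },                 /* mad.sat m1, reads g3 */
   { M2, 1, { G4 } },                 /* mad.sat m2, reads g4 */
   { M3, 1, { G5 } },                 /* mad.sat m3, reads g5 */
   { NO_REG, 4, { M1, M2, M3, M4 } }, /* sendc */
};
```

Under these assumptions the final send issues on cycle 59 for the original order and on cycle 30 for the reordered one. The absolute numbers differ from the 66 and 32 quoted above because the model ignores per-opcode latencies and real issue rules, but the shape is the same: issuing the three independent multiplies back to back lets their latencies overlap instead of being paid one after another.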
---
 .../dri/i965/brw_fs_schedule_instructions.cpp  |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
index 90f1a16..4d2dbe8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
@@ -753,9 +753,9 @@ instruction_scheduler::schedule_instructions(fs_inst *next_block_header)
   * but also the MRF setup for the next sampler message, which in turn
   * unblocks the next sampler message).
   */
- for (schedule_node *node = (schedule_node *)instructions.get_tail();
-  node != instructions.get_head()->prev;
-  node = (schedule_node *)node->prev) {
+ for (schedule_node *node = (schedule_node *)instructions.get_head();
+  node != instructions.get_tail()->next;
+  node = (schedule_node *)node->next) {
 schedule_node *n = (schedule_node *)node;
 
 chosen = n;
-- 
1.7.8.6


[Mesa-dev] [PATCH 0/5] ARB_color_buffer_float vs GL Core + refactoring

2013-03-28 Thread Marek Olšák
This patch series adds the ability to disable ARB_color_buffer_float in core GL 
contexts, because let's be honest, the clamping controls don't make much sense 
with core GL and must be emulated by most drivers anyway.

There is also some cleanup in _mesa_update_state_locked and how clamping 
controls are updated. And finally, there is a patch which should reduce the 
number of shader recompilations for drivers which emulate fragment color 
clamping.

Please review.

Marek


[Mesa-dev] [PATCH 1/5] mesa: refactor clamping controls, get rid of _ClampReadColor

2013-03-28 Thread Marek Olšák
---
 src/mesa/main/blend.c   |   28 +++-
 src/mesa/main/blend.h   |8 
 src/mesa/main/fbobject.c|   10 ++
 src/mesa/main/framebuffer.c |1 +
 src/mesa/main/mtypes.h  |4 +++-
 src/mesa/main/readpix.c |5 +++--
 src/mesa/main/state.c   |   29 +++--
 7 files changed, 55 insertions(+), 30 deletions(-)

diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
index 309f1d5..876cbf2 100644
--- a/src/mesa/main/blend.c
+++ b/src/mesa/main/blend.c
@@ -782,7 +782,34 @@ _mesa_ClampColor(GLenum target, GLenum clamp)
}
 }
 
+static GLboolean _mesa_get_clamp_color(const struct gl_framebuffer *fb,
+   GLenum clamp)
+{
+   if (clamp == GL_TRUE || clamp == GL_FALSE)
+  return clamp;
+
+   ASSERT(clamp == GL_FIXED_ONLY);
+   if (!fb)
+  return GL_TRUE;
 
+   return fb->_AllColorBuffersFixedPoint;
+}
+
+GLboolean _mesa_get_clamp_fragment_color(const struct gl_context *ctx)
+{
+   return _mesa_get_clamp_color(ctx->DrawBuffer,
+ctx->Color.ClampFragmentColor);
+}
+
+GLboolean _mesa_get_clamp_vertex_color(const struct gl_context *ctx)
+{
+   return _mesa_get_clamp_color(ctx->DrawBuffer, ctx->Light.ClampVertexColor);
+}
+
+GLboolean _mesa_get_clamp_read_color(const struct gl_context *ctx)
+{
+   return _mesa_get_clamp_color(ctx->ReadBuffer, ctx->Color.ClampReadColor);
+}
 
 
 /**/
@@ -835,7 +862,6 @@ void _mesa_init_color( struct gl_context * ctx )
ctx->Color.ClampFragmentColor = GL_FIXED_ONLY_ARB;
ctx->Color._ClampFragmentColor = GL_TRUE;
ctx->Color.ClampReadColor = GL_FIXED_ONLY_ARB;
-   ctx->Color._ClampReadColor = GL_TRUE;
 
if (ctx->API == API_OPENGLES2) {
   /* GLES 3 behaves as though GL_FRAMEBUFFER_SRGB is always enabled. */
diff --git a/src/mesa/main/blend.h b/src/mesa/main/blend.h
index a539aa8..694fc5a 100644
--- a/src/mesa/main/blend.h
+++ b/src/mesa/main/blend.h
@@ -99,6 +99,14 @@ _mesa_ColorMaski( GLuint buf, GLboolean red, GLboolean green,
 extern void GLAPIENTRY
 _mesa_ClampColor(GLenum target, GLenum clamp);
 
+extern GLboolean
+_mesa_get_clamp_fragment_color(const struct gl_context *ctx);
+
+extern GLboolean
+_mesa_get_clamp_vertex_color(const struct gl_context *ctx);
+
+extern GLboolean
+_mesa_get_clamp_read_color(const struct gl_context *ctx);
 
 extern void  
 _mesa_init_color( struct gl_context * ctx );
diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
index c1f5299..67c1161 100644
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -784,6 +784,7 @@ _mesa_test_framebuffer_completeness(struct gl_context *ctx,
numImages = 0;
fb->Width = 0;
fb->Height = 0;
+   fb->_AllColorBuffersFixedPoint = GL_TRUE;
 
/* Start at -2 to more easily loop over all attachment points.
 *  -2: depth buffer
@@ -900,6 +901,15 @@ _mesa_test_framebuffer_completeness(struct gl_context *ctx,
   /* check if integer color */
   fb->_IntegerColor = _mesa_is_format_integer_color(attFormat);
 
+  /* Update _AllColorBuffersFixedPoint. */
+  if (i >= 0) {
+ GLenum type = _mesa_get_format_datatype(attFormat);
+
+ fb->_AllColorBuffersFixedPoint =
+fb->_AllColorBuffersFixedPoint &&
+(type == GL_UNSIGNED_NORMALIZED || type == GL_SIGNED_NORMALIZED);
+  }
+
   /* Error-check width, height, format */
   if (numImages == 1) {
  /* save format */
diff --git a/src/mesa/main/framebuffer.c b/src/mesa/main/framebuffer.c
index 619aaa3..5fb3677 100644
--- a/src/mesa/main/framebuffer.c
+++ b/src/mesa/main/framebuffer.c
@@ -154,6 +154,7 @@ _mesa_initialize_window_framebuffer(struct gl_framebuffer *fb,
 
fb->Delete = _mesa_destroy_framebuffer;
fb->_Status = GL_FRAMEBUFFER_COMPLETE_EXT;
+   fb->_AllColorBuffersFixedPoint = !visual->floatMode;
 
compute_depth_max(fb);
 }
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index a0e7e28..8c64a5b 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -683,7 +683,6 @@ struct gl_colorbuffer_attrib
GLenum ClampFragmentColor; /**< GL_TRUE, GL_FALSE or GL_FIXED_ONLY_ARB */
GLboolean _ClampFragmentColor; /** < with GL_FIXED_ONLY_ARB resolved */
GLenum ClampReadColor; /**< GL_TRUE, GL_FALSE or GL_FIXED_ONLY_ARB */
-   GLboolean _ClampReadColor; /** < with GL_FIXED_ONLY_ARB resolved */
 
GLboolean sRGBEnabled;  /**< Framebuffer sRGB blending/updating requested */
 };
@@ -2670,6 +2669,9 @@ struct gl_framebuffer
/** Integer color values */
GLboolean _IntegerColor;
 
+   /* ARB_color_buffer_float */
+   GLboolean _AllColorBuffersFixedPoint; /* no integer, no float */
+
/** Array of all renderbuffer attachments, indexed by BUFFER_* tokens. */
struct gl_renderbuffer_attachment Attachment[BUFFER_COUNT];
 
diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpi

[Mesa-dev] [PATCH 2/5] mesa: don't set _ClampFragmentColor to TRUE if it has no effect

2013-03-28 Thread Marek Olšák
This should reduce shader recompilations with drivers that emulate fragment
color clamping, because we want the clamping to be enabled only if there is
a signed normalized or floating-point colorbuffer.
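The clamping rule from PATCH 1/5 and this patch amounts to a small truth table. The following is a minimal standalone sketch, not Mesa code: the enums, `struct toy_fb`, and the `toy_*` helpers are hypothetical stand-ins for GL_TRUE/GL_FALSE/GL_FIXED_ONLY_ARB, the format datatypes, and the `_AllColorBuffersFixedPoint`, `_HasSNormOrFloatColorBuffer` and `_IntegerColor` fields. It also simplifies `_IntegerColor` to "any color attachment is integer".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GL_FALSE / GL_TRUE / GL_FIXED_ONLY_ARB. */
enum toy_clamp { TOY_CLAMP_FALSE, TOY_CLAMP_TRUE, TOY_CLAMP_FIXED_ONLY };

/* Hypothetical stand-ins for _mesa_get_format_datatype() results. */
enum toy_type { TOY_UNORM, TOY_SNORM, TOY_FLOAT, TOY_INT };

struct toy_fb {
   bool all_fixed_point;    /* mirrors _AllColorBuffersFixedPoint */
   bool has_snorm_or_float; /* mirrors _HasSNormOrFloatColorBuffer */
   bool integer_color;      /* simplified _IntegerColor: any integer buffer */
};

/* Fold the color attachments into the framebuffer flags. */
static struct toy_fb
toy_fb_from_attachments(const enum toy_type *types, int count)
{
   struct toy_fb fb = { true, false, false };

   for (int i = 0; i < count; i++) {
      fb.all_fixed_point = fb.all_fixed_point &&
         (types[i] == TOY_UNORM || types[i] == TOY_SNORM);
      fb.has_snorm_or_float = fb.has_snorm_or_float ||
         types[i] == TOY_SNORM || types[i] == TOY_FLOAT;
      fb.integer_color = fb.integer_color || types[i] == TOY_INT;
   }
   return fb;
}

/* Resolve FIXED_ONLY against the framebuffer, like _mesa_get_clamp_color(). */
static bool
toy_get_clamp(const struct toy_fb *fb, enum toy_clamp clamp)
{
   if (clamp == TOY_CLAMP_TRUE || clamp == TOY_CLAMP_FALSE)
      return clamp == TOY_CLAMP_TRUE;

   assert(clamp == TOY_CLAMP_FIXED_ONLY);
   if (fb == NULL)
      return true;
   return fb->all_fixed_point;
}

/* Derived fragment clamp flag: skip clamping whenever it cannot matter. */
static bool
toy_derived_clamp_fragment(const struct toy_fb *fb, enum toy_clamp clamp)
{
   if (fb == NULL || !fb->has_snorm_or_float || fb->integer_color)
      return false;
   return toy_get_clamp(fb, clamp);
}
```

With this model, a framebuffer holding only unsigned normalized buffers never gets the derived clamp flag (clamping could not change anything), a lone snorm buffer is clamped under GL_FIXED_ONLY_ARB, and adding a float buffer turns GL_FIXED_ONLY_ARB into "no clamp" while an explicit GL_TRUE still clamps.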
---
 src/mesa/main/blend.c   |2 +-
 src/mesa/main/fbobject.c|7 ++-
 src/mesa/main/framebuffer.c |1 +
 src/mesa/main/get.c |9 +
 src/mesa/main/mtypes.h  |1 +
 src/mesa/main/state.c   |   12 +++-
 src/mesa/main/texenv.c  |3 ++-
 src/mesa/main/texparam.c|3 ++-
 src/mesa/program/prog_statevars.c   |5 +++--
 src/mesa/state_tracker/st_atom_rasterizer.c |3 +--
 src/mesa/state_tracker/st_atom_shader.c |3 +--
 src/mesa/state_tracker/st_cb_bitmap.c   |3 +--
 src/mesa/state_tracker/st_cb_drawpixels.c   |6 ++
 13 files changed, 37 insertions(+), 21 deletions(-)

diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
index 876cbf2..c1b49b1 100644
--- a/src/mesa/main/blend.c
+++ b/src/mesa/main/blend.c
@@ -860,7 +860,7 @@ void _mesa_init_color( struct gl_context * ctx )
}
 
ctx->Color.ClampFragmentColor = GL_FIXED_ONLY_ARB;
-   ctx->Color._ClampFragmentColor = GL_TRUE;
+   ctx->Color._ClampFragmentColor = GL_FALSE;
ctx->Color.ClampReadColor = GL_FIXED_ONLY_ARB;
 
if (ctx->API == API_OPENGLES2) {
diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
index 67c1161..fc2b262 100644
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -785,6 +785,7 @@ _mesa_test_framebuffer_completeness(struct gl_context *ctx,
fb->Width = 0;
fb->Height = 0;
fb->_AllColorBuffersFixedPoint = GL_TRUE;
+   fb->_HasSNormOrFloatColorBuffer = GL_FALSE;
 
/* Start at -2 to more easily loop over all attachment points.
 *  -2: depth buffer
@@ -901,13 +902,17 @@ _mesa_test_framebuffer_completeness(struct gl_context *ctx,
   /* check if integer color */
   fb->_IntegerColor = _mesa_is_format_integer_color(attFormat);
 
-  /* Update _AllColorBuffersFixedPoint. */
+  /* Update _AllColorBuffersFixedPoint and _HasSNormOrFloatColorBuffer. */
   if (i >= 0) {
  GLenum type = _mesa_get_format_datatype(attFormat);
 
  fb->_AllColorBuffersFixedPoint =
 fb->_AllColorBuffersFixedPoint &&
 (type == GL_UNSIGNED_NORMALIZED || type == GL_SIGNED_NORMALIZED);
+
+ fb->_HasSNormOrFloatColorBuffer =
+fb->_HasSNormOrFloatColorBuffer ||
+type == GL_SIGNED_NORMALIZED || type == GL_FLOAT;
   }
 
   /* Error-check width, height, format */
diff --git a/src/mesa/main/framebuffer.c b/src/mesa/main/framebuffer.c
index 5fb3677..757c4b0 100644
--- a/src/mesa/main/framebuffer.c
+++ b/src/mesa/main/framebuffer.c
@@ -155,6 +155,7 @@ _mesa_initialize_window_framebuffer(struct gl_framebuffer *fb,
fb->Delete = _mesa_destroy_framebuffer;
fb->_Status = GL_FRAMEBUFFER_COMPLETE_EXT;
fb->_AllColorBuffersFixedPoint = !visual->floatMode;
+   fb->_HasSNormOrFloatColorBuffer = visual->floatMode;
 
compute_depth_max(fb);
 }
diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 582ef31..244e3ea 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -24,6 +24,7 @@
 
 #include "glheader.h"
 #include "context.h"
+#include "blend.h"
 #include "enable.h"
 #include "enums.h"
 #include "extensions.h"
@@ -767,13 +768,13 @@ find_custom_value(struct gl_context *ctx, const struct value_desc *d, union valu
   break;
 
case GL_FOG_COLOR:
-  if(ctx->Color._ClampFragmentColor)
+  if (_mesa_get_clamp_fragment_color(ctx))
  COPY_4FV(v->value_float_4, ctx->Fog.Color);
   else
  COPY_4FV(v->value_float_4, ctx->Fog.ColorUnclamped);
   break;
case GL_COLOR_CLEAR_VALUE:
-  if(ctx->Color._ClampFragmentColor) {
+  if (_mesa_get_clamp_fragment_color(ctx)) {
  v->value_float_4[0] = CLAMP(ctx->Color.ClearColor.f[0], 0.0F, 1.0F);
  v->value_float_4[1] = CLAMP(ctx->Color.ClearColor.f[1], 0.0F, 1.0F);
  v->value_float_4[2] = CLAMP(ctx->Color.ClearColor.f[2], 0.0F, 1.0F);
@@ -782,13 +783,13 @@ find_custom_value(struct gl_context *ctx, const struct value_desc *d, union valu
  COPY_4FV(v->value_float_4, ctx->Color.ClearColor.f);
   break;
case GL_BLEND_COLOR_EXT:
-  if(ctx->Color._ClampFragmentColor)
+  if (_mesa_get_clamp_fragment_color(ctx))
  COPY_4FV(v->value_float_4, ctx->Color.BlendColor);
   else
  COPY_4FV(v->value_float_4, ctx->Color.BlendColorUnclamped);
   break;
case GL_ALPHA_TEST_REF:
-  if(ctx->Color._ClampFragmentColor)
+  if (_mesa_get_clamp_fragment_color(ctx))
  v->value_float = ctx->Color.AlphaRef;
   else
  v->value_float = ctx->Color.AlphaRefUnclamped;
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 8c64a

[Mesa-dev] [PATCH 3/5] mesa: move updating clamp control derived state out of mesa_update_state_locked

2013-03-28 Thread Marek Olšák
It has two dependencies, glClampColor and the framebuffer, so we might as well
do the update where those two are changed.
---
 src/mesa/main/blend.c   |   28 
 src/mesa/main/blend.h   |6 ++
 src/mesa/main/framebuffer.c |4 
 src/mesa/main/state.c   |   36 
 4 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
index c1b49b1..6cc2310 100644
--- a/src/mesa/main/blend.c
+++ b/src/mesa/main/blend.c
@@ -767,10 +767,12 @@ _mesa_ClampColor(GLenum target, GLenum clamp)
case GL_CLAMP_VERTEX_COLOR_ARB:
   FLUSH_VERTICES(ctx, _NEW_LIGHT);
   ctx->Light.ClampVertexColor = clamp;
+  _mesa_update_clamp_vertex_color(ctx);
   break;
case GL_CLAMP_FRAGMENT_COLOR_ARB:
   FLUSH_VERTICES(ctx, _NEW_FRAG_CLAMP);
   ctx->Color.ClampFragmentColor = clamp;
+  _mesa_update_clamp_fragment_color(ctx);
   break;
case GL_CLAMP_READ_COLOR_ARB:
   FLUSH_VERTICES(ctx, _NEW_COLOR);
@@ -811,6 +813,32 @@ GLboolean _mesa_get_clamp_read_color(const struct gl_context *ctx)
return _mesa_get_clamp_color(ctx->ReadBuffer, ctx->Color.ClampReadColor);
 }
 
+/**
+ * Update the ctx->Color._ClampFragmentColor field
+ */
+void _mesa_update_clamp_fragment_color(struct gl_context *ctx)
+{
+   struct gl_framebuffer *fb = ctx->DrawBuffer;
+
+   /* Don't clamp if:
+* - there is no colorbuffer
+* - all colorbuffers are unsigned normalized, so clamping has no effect
+* - there is an integer colorbuffer
+*/
+   if (!fb || !fb->_HasSNormOrFloatColorBuffer || fb->_IntegerColor)
+  ctx->Color._ClampFragmentColor = GL_FALSE;
+   else
+  ctx->Color._ClampFragmentColor = _mesa_get_clamp_fragment_color(ctx);
+}
+
+/**
+ * Update the ctx->Color._ClampVertexColor field
+ */
+void _mesa_update_clamp_vertex_color(struct gl_context *ctx)
+{
+   ctx->Light._ClampVertexColor = _mesa_get_clamp_vertex_color(ctx);
+}
+
 
 /**/
 /** \name Initialization */
diff --git a/src/mesa/main/blend.h b/src/mesa/main/blend.h
index 694fc5a..621311d 100644
--- a/src/mesa/main/blend.h
+++ b/src/mesa/main/blend.h
@@ -108,6 +108,12 @@ _mesa_get_clamp_vertex_color(const struct gl_context *ctx);
 extern GLboolean
 _mesa_get_clamp_read_color(const struct gl_context *ctx);
 
+extern void
+_mesa_update_clamp_fragment_color(struct gl_context *ctx);
+
+extern void
+_mesa_update_clamp_vertex_color(struct gl_context *ctx);
+
 extern void  
 _mesa_init_color( struct gl_context * ctx );
 
diff --git a/src/mesa/main/framebuffer.c b/src/mesa/main/framebuffer.c
index 757c4b0..6c69774 100644
--- a/src/mesa/main/framebuffer.c
+++ b/src/mesa/main/framebuffer.c
@@ -32,6 +32,7 @@
 
 #include "glheader.h"
 #include "imports.h"
+#include "blend.h"
 #include "buffers.h"
 #include "context.h"
 #include "enums.h"
@@ -742,6 +743,9 @@ _mesa_update_framebuffer(struct gl_context *ctx)
update_framebuffer(ctx, drawFb);
if (readFb != drawFb)
   update_framebuffer(ctx, readFb);
+
+   _mesa_update_clamp_vertex_color(ctx);
+   _mesa_update_clamp_fragment_color(ctx);
 }
 
 
diff --git a/src/mesa/main/state.c b/src/mesa/main/state.c
index 73c5a1c..251c1ae 100644
--- a/src/mesa/main/state.c
+++ b/src/mesa/main/state.c
@@ -309,36 +309,6 @@ update_multisample(struct gl_context *ctx)
 
 
 /**
- * Update the ctx->Color._ClampFragmentColor field
- */
-static void
-update_clamp_fragment_color(struct gl_context *ctx)
-{
-   struct gl_framebuffer *fb = ctx->DrawBuffer;
-
-   /* Don't clamp if:
-* - there is no colorbuffer
-* - all colorbuffers are unsigned normalized, so clamping has no effect
-* - there is an integer colorbuffer
-*/
-   if (!fb || !fb->_HasSNormOrFloatColorBuffer || fb->_IntegerColor)
-  ctx->Color._ClampFragmentColor = GL_FALSE;
-   else
-  ctx->Color._ClampFragmentColor = _mesa_get_clamp_fragment_color(ctx);
-}
-
-
-/**
- * Update the ctx->Color._ClampVertexColor field
- */
-static void
-update_clamp_vertex_color(struct gl_context *ctx)
-{
-   ctx->Light._ClampVertexColor = _mesa_get_clamp_vertex_color(ctx);
-}
-
-
-/**
  * Update the ctx->VertexProgram._TwoSideEnabled flag.
  */
 static void
@@ -497,9 +467,6 @@ _mesa_update_state_locked( struct gl_context *ctx )
if (new_state & (_NEW_LIGHT | _NEW_PROGRAM))
   update_twoside( ctx );
 
-   if (new_state & (_NEW_LIGHT | _NEW_BUFFERS))
-  update_clamp_vertex_color(ctx);
-
if (new_state & (_NEW_STENCIL | _NEW_BUFFERS))
   _mesa_update_stencil( ctx );
 
@@ -515,9 +482,6 @@ _mesa_update_state_locked( struct gl_context *ctx )
if (new_state & (_NEW_MULTISAMPLE | _NEW_BUFFERS))
   update_multisample( ctx );
 
-   if(new_state & (_NEW_FRAG_CLAMP | _NEW_BUFFERS))
-  update_clamp_fragment_color(ctx);
-
 #if 0
if (new_state & (_NEW_POINT | _NEW_LINE | _NEW_POLYGON | _NEW_LIGHT
 | _NEW_STENCIL | _MESA_NEW_

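The rule that this patch moves out of state.c can be read as a small pure function. It appears to be reached now through _mesa_update_clamp_fragment_color(), which _mesa_update_framebuffer() calls. The sketch below is a simplified, hypothetical model, not Mesa's actual code. struct fb_info and its fields only stand in for the gl_framebuffer fields tested by the removed update_clamp_fragment_color(), and `requested` stands in for whatever _mesa_get_clamp_fragment_color(ctx) would report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the gl_framebuffer fields named in the
 * removed comment; not Mesa's real structure. */
struct fb_info {
   bool has_snorm_or_float_color;  /* like fb->_HasSNormOrFloatColorBuffer */
   bool has_integer_color;         /* like fb->_IntegerColor */
};

/* Returns the effective fragment-clamp setting.  'requested' stands in
 * for the value _mesa_get_clamp_fragment_color() would return. */
static bool
effective_clamp_fragment_color(const struct fb_info *fb, bool requested)
{
   /* Don't clamp if:
    * - there is no framebuffer (the original tests !fb)
    * - all colorbuffers are unsigned normalized, so clamping has no effect
    * - there is an integer colorbuffer
    */
   if (fb == NULL || !fb->has_snorm_or_float_color || fb->has_integer_color)
      return false;

   return requested;
}
```

The result depends on both the clamp request and the bound draw framebuffer. Recomputing it whenever the framebuffer is updated therefore presumably keeps the two in sync. The old approach recomputed it only on the _NEW_FRAG_CLAMP and _NEW_BUFFERS flags in _mesa_update_state_locked().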
[Mesa-dev] [PATCH 4/5] mesa: allow drivers not to expose ARB_color_buffer_float in GL core profile

2013-03-28 Thread Marek Olšák
---
 src/mesa/drivers/common/meta.c   |   18 ++++++++++++------
 src/mesa/main/attrib.c           |    9 +++++++--
 src/mesa/main/blend.c            |   19 ++++++++++++++++---
 src/mesa/main/get.c              |    9 ++++++++-
 src/mesa/main/get_hash_params.py |    2 +-
 src/mesa/main/light.c            |    3 ++-
 src/mesa/main/version.c          |    3 ++-
 7 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index 8114550..e3ab82b 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -757,7 +757,8 @@ _mesa_meta_begin(struct gl_context *ctx, GLbitfield state)
* it's for the pixel path (ClampFragmentColor is GL_TRUE),
* regardless of the internal implementation of the metaops.
*/
-  if (ctx->Color.ClampFragmentColor != GL_TRUE)
+  if (ctx->Color.ClampFragmentColor != GL_TRUE &&
+  ctx->Extensions.ARB_color_buffer_float)
 _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, GL_FALSE);
}
 
@@ -767,7 +768,8 @@ _mesa_meta_begin(struct gl_context *ctx, GLbitfield state)
   /* Generally in here we never want vertex color clamping --
* result clamping is only dependent on fragment clamping.
*/
-  _mesa_ClampColor(GL_CLAMP_VERTEX_COLOR, GL_FALSE);
+  if (ctx->Extensions.ARB_color_buffer_float)
+ _mesa_ClampColor(GL_CLAMP_VERTEX_COLOR, GL_FALSE);
}
 
if (state & MESA_META_CONDITIONAL_RENDER) {
@@ -1091,11 +1093,13 @@ _mesa_meta_end(struct gl_context *ctx)
   _mesa_DepthRange(save->DepthNear, save->DepthFar);
}
 
-   if (state & MESA_META_CLAMP_FRAGMENT_COLOR) {
+   if (state & MESA_META_CLAMP_FRAGMENT_COLOR &&
+   ctx->Extensions.ARB_color_buffer_float) {
   _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, save->ClampFragmentColor);
}
 
-   if (state & MESA_META_CLAMP_VERTEX_COLOR) {
+   if (state & MESA_META_CLAMP_VERTEX_COLOR &&
+   ctx->Extensions.ARB_color_buffer_float) {
   _mesa_ClampColor(GL_CLAMP_VERTEX_COLOR, save->ClampVertexColor);
}
 
@@ -2044,7 +2048,8 @@ _mesa_meta_Clear(struct gl_context *ctx, GLbitfield buffers)
   /* leave colormask, glDrawBuffer state as-is */
 
   /* Clears never have the color clamped. */
-  _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, GL_FALSE);
+  if (ctx->Extensions.ARB_color_buffer_float)
+ _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, GL_FALSE);
}
else {
   ASSERT(metaSave & MESA_META_COLOR_MASK);
@@ -2295,7 +2300,8 @@ _mesa_meta_glsl_Clear(struct gl_context *ctx, GLbitfield buffers)
   /* leave colormask, glDrawBuffer state as-is */
 
   /* Clears never have the color clamped. */
-  _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, GL_FALSE);
+  if (ctx->Extensions.ARB_color_buffer_float)
+ _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, GL_FALSE);
}
else {
   ASSERT(metaSave & MESA_META_COLOR_MASK);
diff --git a/src/mesa/main/attrib.c b/src/mesa/main/attrib.c
index 3b991bc..a672845 100644
--- a/src/mesa/main/attrib.c
+++ b/src/mesa/main/attrib.c
@@ -986,7 +986,9 @@ _mesa_PopAttrib(void)
_mesa_set_enable(ctx, GL_INDEX_LOGIC_OP,
 color->IndexLogicOpEnabled);
_mesa_set_enable(ctx, GL_DITHER, color->DitherFlag);
-   _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR_ARB, color->ClampFragmentColor);
+   if (ctx->Extensions.ARB_color_buffer_float)
+  _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR_ARB,
+   color->ClampFragmentColor);
_mesa_ClampColor(GL_CLAMP_READ_COLOR_ARB, color->ClampReadColor);
 
/* GL_ARB_framebuffer_sRGB / GL_EXT_framebuffer_sRGB */
@@ -1114,7 +1116,10 @@ _mesa_PopAttrib(void)
/* materials */
memcpy(&ctx->Light.Material, &light->Material,
   sizeof(struct gl_material));
-   _mesa_ClampColor(GL_CLAMP_VERTEX_COLOR_ARB, light->ClampVertexColor);
+   if (ctx->Extensions.ARB_color_buffer_float) {
+  _mesa_ClampColor(GL_CLAMP_VERTEX_COLOR_ARB,
+   light->ClampVertexColor);
+   }
 }
 break;
  case GL_LINE_BIT:
diff --git a/src/mesa/main/blend.c b/src/mesa/main/blend.c
index 6cc2310..4e1a7ee 100644
--- a/src/mesa/main/blend.c
+++ b/src/mesa/main/blend.c
@@ -765,11 +765,19 @@ _mesa_ClampColor(GLenum target, GLenum clamp)
 
switch (target) {
case GL_CLAMP_VERTEX_COLOR_ARB:
+  if (ctx->API == API_OPENGL_CORE &&
+  !ctx->Extensions.ARB_color_buffer_float) {
+ goto invalid_enum;
+  }
   FLUSH_VERTICES(ctx, _NEW_LIGHT);
   ctx->Light.ClampVertexColor = clamp;
   _mesa_update_clamp_vertex_color(ctx);
   break;
case GL_CLAMP_FRAGMENT_COLOR_ARB:
+  if (ctx->API == API_OPENGL_CORE &&
+  !ctx->Extensions.ARB_color_buffer_float) {

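In this patch, the callers in meta.c and attrib.c skip _mesa_ClampColor() when ARB_color_buffer_float is absent. _mesa_ClampColor() itself rejects the vertex-color target in a core-profile context without the extension. The fragment-color branch is cut off above, so this sketch assumes it ends in the same goto invalid_enum. The model below of that validation uses hypothetical enum and function names.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for ctx->API and the ClampColor targets. */
enum ctx_api { CTX_API_COMPAT, CTX_API_CORE };

enum clamp_target {
   CLAMP_TARGET_VERTEX_COLOR,
   CLAMP_TARGET_FRAGMENT_COLOR,
   CLAMP_TARGET_READ_COLOR
};

/* Returns true if a ClampColor-style call should raise GL_INVALID_ENUM. */
static bool
clamp_target_raises_invalid_enum(enum ctx_api api, bool has_color_buffer_float,
                                 enum clamp_target target)
{
   switch (target) {
   case CLAMP_TARGET_VERTEX_COLOR:
   case CLAMP_TARGET_FRAGMENT_COLOR:
      /* Only a core-profile context that hides the extension rejects these. */
      return api == CTX_API_CORE && !has_color_buffer_float;
   case CLAMP_TARGET_READ_COLOR:
      /* Not gated in the hunks shown; attrib.c still restores it
       * unconditionally. */
      return false;
   }
   return true; /* any other value is not a valid target */
}
```

Because meta.c and attrib.c check the extension before calling, internal save/restore never triggers this error. Only application calls can hit it.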
[Mesa-dev] [PATCH 5/5] st/mesa: don't expose ARB_color_buffer_float without driver support in GL core

2013-03-28 Thread Marek Olšák
---
 src/mesa/state_tracker/st_extensions.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 11db9d3..2d8b9ef 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -629,6 +629,7 @@ void st_init_extensions(struct st_context *st)
   ctx->Const.PrimitiveRestartInSoftware = GL_TRUE;
}
 
+   /* ARB_color_buffer_float. */
if (screen->get_param(screen, PIPE_CAP_VERTEX_COLOR_UNCLAMPED)) {
   ctx->Extensions.ARB_color_buffer_float = GL_TRUE;
 
@@ -639,6 +640,16 @@ void st_init_extensions(struct st_context *st)
   if (!screen->get_param(screen, PIPE_CAP_FRAGMENT_COLOR_CLAMPED)) {
  st->clamp_frag_color_in_shader = TRUE;
   }
+
+  /* For drivers which cannot do color clamping, it's better to just
+   * disable ARB_color_buffer_float in the core profile, because
+   * the clamping is deprecated there anyway. */
+  if (ctx->API == API_OPENGL_CORE &&
+  (st->clamp_frag_color_in_shader || st->clamp_vert_color_in_shader)) {
+ st->clamp_vert_color_in_shader = GL_FALSE;
+ st->clamp_frag_color_in_shader = GL_FALSE;
+ ctx->Extensions.ARB_color_buffer_float = GL_FALSE;
+  }
}
 
if (screen->fence_finish) {
-- 
1.7.10.4

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

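The decision added to st_init_extensions() can be modelled as follows. This is a hypothetical sketch: the struct and parameter names are invented. The two *_in_shader inputs stand in for however the state tracker sets clamp_vert_color_in_shader and clamp_frag_color_in_shader from the driver caps; only the fragment case (PIPE_CAP_FRAGMENT_COLOR_CLAMPED) is visible in the hunk.

```c
#include <stdbool.h>

/* Hypothetical summary of the state tracker's color-clamping setup. */
struct clamp_setup {
   bool expose_color_buffer_float;
   bool clamp_vert_in_shader;
   bool clamp_frag_in_shader;
};

static struct clamp_setup
choose_clamp_setup(bool core_profile, bool vertex_color_unclamped_cap,
                   bool vert_clamp_in_shader, bool frag_clamp_in_shader)
{
   struct clamp_setup s = { false, false, false };

   /* Without unclamped vertex colors the extension is never exposed. */
   if (!vertex_color_unclamped_cap)
      return s;

   s.expose_color_buffer_float = true;
   s.clamp_vert_in_shader = vert_clamp_in_shader;
   s.clamp_frag_in_shader = frag_clamp_in_shader;

   /* Core profile: rather than emulating clamping in shaders,
    * hide the extension altogether. */
   if (core_profile && (s.clamp_vert_in_shader || s.clamp_frag_in_shader)) {
      s.clamp_vert_in_shader = false;
      s.clamp_frag_in_shader = false;
      s.expose_color_buffer_float = false;
   }
   return s;
}
```

A compatibility context keeps the shader-based emulation. Core-profile hardware that clamps natively still exposes the extension.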

[Mesa-dev] [PATCH v2 2/3] i965/fs: Generate LOD sampler message from ir_lod.

2013-03-28 Thread Matt Turner
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_defines.h        |    2 ++
 src/mesa/drivers/dri/i965/brw_fs.cpp           |    4 +++-
 src/mesa/drivers/dri/i965/brw_fs_emit.cpp      |    4 ++++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |    6 ++++++
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |    5 +++++
 5 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index b45e4a4..1929035 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -711,6 +711,7 @@ enum opcode {
SHADER_OPCODE_TXS,
FS_OPCODE_TXB,
SHADER_OPCODE_TXF_MS,
+   SHADER_OPCODE_LOD,
 
SHADER_OPCODE_SHADER_TIME_ADD,
 
@@ -896,6 +897,7 @@ enum brw_message_target {
 #define GEN5_SAMPLER_MESSAGE_SAMPLE_BIAS_COMPARE 5
 #define GEN5_SAMPLER_MESSAGE_SAMPLE_LOD_COMPARE  6
 #define GEN5_SAMPLER_MESSAGE_SAMPLE_LD   7
+#define GEN5_SAMPLER_MESSAGE_LOD 9
 #define GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO  10
 #define HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE 20
 #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS   29
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 273761b..1b5887d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -338,7 +338,8 @@ fs_inst::is_tex()
opcode == SHADER_OPCODE_TXF ||
opcode == SHADER_OPCODE_TXF_MS ||
opcode == SHADER_OPCODE_TXL ||
-   opcode == SHADER_OPCODE_TXS);
+   opcode == SHADER_OPCODE_TXS ||
+   opcode == SHADER_OPCODE_LOD);
 }
 
 bool
@@ -732,6 +733,7 @@ fs_visitor::implied_mrf_writes(fs_inst *inst)
case SHADER_OPCODE_TXF_MS:
case SHADER_OPCODE_TXL:
case SHADER_OPCODE_TXS:
+   case SHADER_OPCODE_LOD:
   return 1;
case FS_OPCODE_FB_WRITE:
   return 2;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
index 5db481c..b64eb7a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
@@ -404,6 +404,9 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
  else
 msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
  break;
+  case SHADER_OPCODE_LOD:
+ msg_type = GEN6_SAMPLER_MESSAGE_LOD;
+ break;
   default:
 assert(!"not reached");
 break;
@@ -1276,6 +1279,7 @@ fs_generator::generate_code(exec_list *instructions)
   case SHADER_OPCODE_TXF_MS:
   case SHADER_OPCODE_TXL:
   case SHADER_OPCODE_TXS:
+  case SHADER_OPCODE_LOD:
 generate_tex(inst, dst, src[0]);
 break;
   case FS_OPCODE_DDX:
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 857f956..8556b56 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1084,6 +1084,9 @@ fs_visitor::emit_texture_gen5(ir_texture *ir, fs_reg dst, fs_reg coordinate,
   mlen += reg_width;
   inst = emit(SHADER_OPCODE_TXF_MS, dst);
   break;
+   case ir_lod:
+  inst = emit(SHADER_OPCODE_LOD, dst);
+  break;
}
inst->base_mrf = base_mrf;
inst->mlen = mlen;
@@ -1124,6 +1127,7 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate,
/* Set up the LOD info */
switch (ir->op) {
case ir_tex:
+   case ir_lod:
   break;
case ir_txb:
   emit(MOV(fs_reg(MRF, base_mrf + mlen), lod));
@@ -1237,6 +1241,7 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate,
case ir_txf: inst = emit(SHADER_OPCODE_TXF, dst); break;
case ir_txf_ms: inst = emit(SHADER_OPCODE_TXF_MS, dst); break;
case ir_txs: inst = emit(SHADER_OPCODE_TXS, dst); break;
+   case ir_lod: inst = emit(SHADER_OPCODE_LOD, dst); break;
}
inst->base_mrf = base_mrf;
inst->mlen = mlen;
@@ -1388,6 +1393,7 @@ fs_visitor::visit(ir_texture *ir)
fs_reg lod, lod2, sample_index;
switch (ir->op) {
case ir_tex:
+   case ir_lod:
   break;
case ir_txb:
   ir->lod_info.bias->accept(this);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 3229d82..8bd2fd8 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2127,6 +2127,7 @@ vec4_visitor::visit(ir_texture *ir)
   lod_type = ir->lod_info.grad.dPdx->type;
   break;
case ir_txb:
+   case ir_lod:
   break;
}
 
@@ -2150,6 +2151,10 @@ vec4_visitor::visit(ir_texture *ir)
   break;
case ir_txb:
   assert(!"TXB is not valid for vertex shaders.");
+  break;
+   case ir_lod:
+  assert(!"LOD is not valid for vertex shaders.");
+  break;
}
 
bool use_texture_offset = 

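Both the gen5 and gen7 emit paths lower ir_lod to SHADER_OPCODE_LOD, and generate_code() routes that opcode to generate_tex(). From there, the generator only has to pick the matching sampler message number. The hypothetical, trimmed-down model below of that switch uses the GEN5_* values from brw_defines.h shown in the patch. The TXS to SAMPLE_RESINFO pairing is an assumption made for illustration.

```c
#include <assert.h>

/* Sampler message numbers as listed in the brw_defines.h hunk above. */
#define GEN5_SAMPLER_MESSAGE_SAMPLE_LD        7
#define GEN5_SAMPLER_MESSAGE_LOD              9
#define GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO  10

/* Hypothetical, trimmed-down opcode list. */
enum tex_opcode { OP_TXF, OP_TXS, OP_LOD };

static int
sampler_msg_type(enum tex_opcode op)
{
   switch (op) {
   case OP_TXF: return GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
   case OP_TXS: return GEN5_SAMPLER_MESSAGE_SAMPLE_RESINFO; /* assumed */
   case OP_LOD: return GEN5_SAMPLER_MESSAGE_LOD;
   }
   assert(!"not reached");
   return -1;
}
```

The LOD case must name the GEN5_ define that the patch adds, because no GEN6_SAMPLER_MESSAGE_LOD is defined. The follow-up reply in this thread makes the same point about the generate_tex() hunk.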
Re: [Mesa-dev] [PATCH v2 2/3] i965/fs: Generate LOD sampler message from ir_lod.

2013-03-28 Thread Matt Turner
On Thu, Mar 28, 2013 at 3:02 PM, Matt Turner  wrote:
> @@ -404,6 +404,9 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
>   else
>  msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
>   break;
> +  case SHADER_OPCODE_LOD:
> + msg_type = GEN6_SAMPLER_MESSAGE_LOD;

Just imagine that I'd compile-tested this and noticed that it should
be s/GEN6/GEN5/.

> + break;

[Mesa-dev] [PATCH 3/3] i965: Enable ARB_texture_query_lod.

2013-03-28 Thread Matt Turner
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke 
---
 docs/GL3.txt                                  |    2 +-
 src/mesa/drivers/dri/intel/intel_extensions.c |    4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 640dec2..2c15a9b 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -87,7 +87,7 @@ GL_ARB_vertex_type_2_10_10_10_rev DONE (i965, r600)
 GL 4.0:
 
 GLSL 4.0 not started
-GL_ARB_texture_query_lod not started
+GL_ARB_texture_query_lod DONE (i965)
 GL_ARB_draw_buffers_blend DONE (i965, r600, softpipe)
 GL_ARB_draw_indirect not started
 GL_ARB_gpu_shader5   not started
diff --git a/src/mesa/drivers/dri/intel/intel_extensions.c b/src/mesa/drivers/dri/intel/intel_extensions.c
index 332fdd8..9efdee4 100755
--- a/src/mesa/drivers/dri/intel/intel_extensions.c
+++ b/src/mesa/drivers/dri/intel/intel_extensions.c
@@ -110,8 +110,10 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.ARB_texture_multisample = true;
}
 
-   if (intel->gen >= 5)
+   if (intel->gen >= 5) {
+  ctx->Extensions.ARB_texture_query_lod = true;
   ctx->Extensions.EXT_timer_query = true;
+   }
 
if (intel->gen >= 6) {
   uint64_t dummy;
-- 
1.7.8.6



[Mesa-dev] [Bug 62883] New: draw/draw_gs.c:684:4: error: implicit declaration of function 'draw_get_option_use_llvm' [-Werror=implicit-function-declaration]

2013-03-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=62883

          Priority: medium
            Bug ID: 62883
          Keywords: regression
                CC: za...@vmware.com
          Assignee: mesa-dev@lists.freedesktop.org
           Summary: draw/draw_gs.c:684:4: error: implicit declaration of
                    function 'draw_get_option_use_llvm'
                    [-Werror=implicit-function-declaration]
          Severity: blocker
    Classification: Unclassified
                OS: All
          Reporter: v...@freedesktop.org
          Hardware: x86-64 (AMD64)
            Status: NEW
           Version: git
         Component: Mesa core
           Product: Mesa

mesa: 9ad914191716631169427f44e08078812ce38ff8

$ ./autogen.sh --with-dri-drivers= --with-gallium-drivers=svga,swrast --enable-gallium-llvm=no
[...]
$ make
[...]
  CC draw/draw_gs.lo
In file included from draw/draw_gs.c:28:0:
draw/draw_gs.h:136:41: warning: 'struct draw_gs_llvm_variant' declared inside parameter list [enabled by default]
draw/draw_gs.h:136:41: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
draw/draw_gs.c: In function 'draw_create_geometry_shader':
draw/draw_gs.c:684:4: error: implicit declaration of function 'draw_get_option_use_llvm' [-Werror=implicit-function-declaration]
draw/draw_gs.c: At top level:
draw/draw_gs.c:803:41: warning: 'struct draw_gs_llvm_variant' declared inside parameter list [enabled by default]
draw/draw_gs.c:802:6: error: conflicting types for 'draw_gs_set_current_variant'
draw/draw_gs.h:135:6: note: previous declaration of 'draw_gs_set_current_variant' was here
draw/draw_gs.c: In function 'draw_gs_set_current_variant':
draw/draw_gs.c:805:10: error: 'struct draw_geometry_shader' has no member named 'current_variant'
cc1: some warnings being treated as errors

e96f4e3b853ff5fe4d927c69695c0b5f1966d448 is the first bad commit
commit e96f4e3b853ff5fe4d927c69695c0b5f1966d448
Author: Zack Rusin 
Date:   Mon Feb 18 04:00:19 2013 -0800

    gallium/llvm: implement geometry shaders in the llvm paths

    This commit implements code generation of the geometry shaders in
    the SOA paths. All the code is there but bugs are likely present.

    Signed-off-by: Zack Rusin 
    Reviewed-by: Brian Paul 
    Reviewed-by: José Fonseca 

:040000 040000 3e402aa649d27d70f4b2a1481038737873d81982 23436fbded4ca6db7db3c45a4fa47b0bc203190b M	src
bisect run success

-- 
You are receiving this mail because:
You are the assignee for the bug.

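The report itself does not show the fix, which comes later in the thread. The error suggests that draw_get_option_use_llvm() is not declared when Mesa is configured with --enable-gallium-llvm=no, so the call in draw_create_geometry_shader() trips -Werror=implicit-function-declaration. As a general illustration only, one common pattern is to provide a stub when an optional backend is configured out. The names backend_use_llvm(), choose_gs_backend() and the HAVE_LLVM guard below are assumptions for this sketch, not taken from the actual fix.

```c
#include <stdbool.h>

#ifdef HAVE_LLVM
/* The real implementation lives in the LLVM-enabled part of the build. */
bool backend_use_llvm(void);
#else
/* LLVM configured out: keep the function declared so callers still
 * compile with -Werror=implicit-function-declaration, and make it
 * report that the LLVM path is unavailable. */
static inline bool
backend_use_llvm(void)
{
   return false;
}
#endif

enum gs_backend { GS_BACKEND_INTERPRETER, GS_BACKEND_LLVM };

/* Hypothetical caller, analogous to the failing call site in
 * draw_create_geometry_shader(). */
static enum gs_backend
choose_gs_backend(void)
{
   return backend_use_llvm() ? GS_BACKEND_LLVM : GS_BACKEND_INTERPRETER;
}
```

With this layout the caller needs no #ifdef of its own. It simply takes the interpreter path whenever LLVM support is compiled out.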

[Mesa-dev] [Bug 62883] draw/draw_gs.c:684:4: error: implicit declaration of function 'draw_get_option_use_llvm' [-Werror=implicit-function-declaration]

2013-03-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=62883

--- Comment #1 from Brian Paul  ---
Created attachment 77174
  --> https://bugs.freedesktop.org/attachment.cgi?id=77174&action=edit
patch to test

Can you try this patch?



[Mesa-dev] [Bug 62883] draw/draw_gs.c:684:4: error: implicit declaration of function 'draw_get_option_use_llvm' [-Werror=implicit-function-declaration]

2013-03-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=62883

--- Comment #2 from Vinson Lee  ---
Attachment 77174 fixes the build error.

Tested-by: Vinson Lee 



[Mesa-dev] [PATCH] mesa: remove platform checks around __builtin_ffs, __builtin_ffsll

2013-03-28 Thread Brian Paul
Use the __builtin_ffs, __builtin_ffsll functions whenever we have GCC,
not just for specific platforms.  Fixes Solaris build.

Note: This is a candidate for the stable branches.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62868
Signed-off-by: Brian Paul 
---
 src/mesa/main/imports.h |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h
index 4b78381..c4110ef 100644
--- a/src/mesa/main/imports.h
+++ b/src/mesa/main/imports.h
@@ -500,17 +500,11 @@ _mesa_realloc( void *oldBuffer, size_t oldSize, size_t newSize );
 #ifndef FFS_DEFINED
 #define FFS_DEFINED 1
 #ifdef __GNUC__
-
-#if defined(__MINGW32__) || defined(__CYGWIN__) || defined(ANDROID) || defined(__APPLE__)
 #define ffs __builtin_ffs
 #define ffsll __builtin_ffsll
-#endif
-
 #else
-
 extern int ffs(int i);
 extern int ffsll(long long int i);
-
 #endif /*__ GNUC__ */
 #endif /* FFS_DEFINED */
 
-- 
1.7.3.4


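ffs() returns the 1-based index of the least significant set bit, or 0 when no bit is set. After this patch, any GCC build uses __builtin_ffs/__builtin_ffsll. Previously, GCC builds on platforms missing from the old list relied on the C library's ffs()/ffsll(). The sketch below shows the same compile-time selection with a portable fallback. portable_ffs(), portable_ffsll(), my_ffs and my_ffsll are hypothetical names, chosen so they don't collide with an ffs() that <strings.h> may declare.

```c
/* Portable fallback with the ffs() contract: 1-based index of the
 * least significant set bit, or 0 if no bit is set. */
static int
portable_ffs(int i)
{
   unsigned int v = (unsigned int) i;
   int pos = 1;

   if (v == 0)
      return 0;
   while ((v & 1u) == 0) {
      v >>= 1;
      pos++;
   }
   return pos;
}

/* Same contract for long long (the ffsll() variant). */
static int
portable_ffsll(long long int i)
{
   unsigned long long v = (unsigned long long) i;
   int pos = 1;

   if (v == 0)
      return 0;
   while ((v & 1ull) == 0) {
      v >>= 1;
      pos++;
   }
   return pos;
}

/* Mirror the patched imports.h: any GCC-compatible compiler gets the
 * builtins, everything else falls back to the portable versions. */
#ifdef __GNUC__
#define my_ffs   __builtin_ffs
#define my_ffsll __builtin_ffsll
#else
#define my_ffs   portable_ffs
#define my_ffsll portable_ffsll
#endif
```

The shift loop is deliberately simple. The point of the patch is that GCC already provides an efficient builtin on every target, so no per-platform list is needed.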

[Mesa-dev] [Bug 62883] draw/draw_gs.c:684:4: error: implicit declaration of function 'draw_get_option_use_llvm' [-Werror=implicit-function-declaration]

2013-03-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=62883

Brian Paul  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Brian Paul  ---
Fixed with commit 499aa3ddb448a7461c0bdea93c8f218db6992720



[Mesa-dev] [PATCH V2] mesa: provide default implementation of QuerySamplesForFormat

2013-03-28 Thread Chris Forbes
Previously at least i915 failed to provide an implementation, but
exposed ARB_internalformat_query anyway, leading to crashes when
GetInternalformativ was called.

The default implementation just returns 1 for everything, so it is
suitable for any driver which does not support multisampling.

V2: - Move from intel to core mesa.

Signed-off-by: Chris Forbes 
---
 src/mesa/drivers/common/driverfuncs.c |  3 ++-
 src/mesa/main/formatquery.c           | 15 +++++++++++++++
 src/mesa/main/formatquery.h           |  4 ++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/common/driverfuncs.c b/src/mesa/drivers/common/driverfuncs.c
index 731d466..a98dfc6 100644
--- a/src/mesa/drivers/common/driverfuncs.c
+++ b/src/mesa/drivers/common/driverfuncs.c
@@ -28,6 +28,7 @@
 #include "main/accum.h"
 #include "main/arrayobj.h"
 #include "main/context.h"
+#include "main/formatquery.h"
 #include "main/framebuffer.h"
 #include "main/mipmap.h"
 #include "main/queryobj.h"
@@ -90,7 +91,7 @@ _mesa_init_driver_functions(struct dd_function_table *driver)
 
/* Texture functions */
driver->ChooseTextureFormat = _mesa_choose_tex_format;
-   driver->QuerySamplesForFormat = NULL;
+   driver->QuerySamplesForFormat = _mesa_query_samples_for_format;
driver->TexImage = _mesa_store_teximage;
driver->TexSubImage = _mesa_store_texsubimage;
driver->GetTexImage = _mesa_meta_GetTexImage;
diff --git a/src/mesa/main/formatquery.c b/src/mesa/main/formatquery.c
index 78c5fbe..e45dc86 100644
--- a/src/mesa/main/formatquery.c
+++ b/src/mesa/main/formatquery.c
@@ -30,6 +30,21 @@
 #include "fbobject.h"
 #include "formatquery.h"
 
+/* default implementation of QuerySamplesForFormat driverfunc, for
+ * non-multisample-capable drivers. */
+size_t
+_mesa_query_samples_for_format(struct gl_context *ctx, GLenum target,
+   GLenum internalFormat, int samples[16])
+{
+   (void) target;
+   (void) internalFormat;
+   (void) ctx;
+
+   samples[0] = 1;
+   return 1;
+}
+
+
 void GLAPIENTRY
 _mesa_GetInternalformativ(GLenum target, GLenum internalformat, GLenum pname,
   GLsizei bufSize, GLint *params)
diff --git a/src/mesa/main/formatquery.h b/src/mesa/main/formatquery.h
index 585c3eb..6034000 100644
--- a/src/mesa/main/formatquery.h
+++ b/src/mesa/main/formatquery.h
@@ -28,6 +28,10 @@
 #include "compiler.h"
 #include "glheader.h"
 
+size_t
+_mesa_query_samples_for_format(struct gl_context *ctx, GLenum target,
+   GLenum internalFormat, int samples[16]);
+
 extern void GLAPIENTRY
 _mesa_GetInternalformativ(GLenum target, GLenum internalformat,
   GLenum pname, GLsizei bufSize, GLint *params);
-- 
1.8.2


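The key property of this patch is that the hook is never NULL: _mesa_init_driver_functions() now installs a default, so core code can call QuerySamplesForFormat unconditionally. Below is a hypothetical, cut-down model of that pattern. struct driver_funcs, default_query_samples() and num_sample_counts() are invented names, and the real hook also takes a gl_context and a target.

```c
#include <stddef.h>

/* Hypothetical, cut-down driver function table. */
struct driver_funcs {
   size_t (*query_samples)(int internal_format, int samples[16]);
};

/* Default: report a single sample count of 1, which is all a
 * driver without multisampling can offer. */
static size_t
default_query_samples(int internal_format, int samples[16])
{
   (void) internal_format;
   samples[0] = 1;
   return 1;
}

static void
init_driver_funcs(struct driver_funcs *f)
{
   /* Installing a default instead of NULL means callers can invoke
    * the hook without checking it first. */
   f->query_samples = default_query_samples;
}

/* Caller in the style of a GetInternalformativ-like query. */
static size_t
num_sample_counts(const struct driver_funcs *f, int internal_format)
{
   int buffer[16];
   return f->query_samples(internal_format, buffer);
}
```

A multisample-capable driver would presumably overwrite the pointer after calling the init function, as drivers do for other dd_function_table hooks.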

[Mesa-dev] [PATCH 1/2] draw/gs: Fix memory corruption with multiple primitives

2013-03-28 Thread Zack Rusin
We were flushing with an incorrect number of primitives. TGSI exec
can only work with a single primitive at a time. Plus, fetching
with multiple primitives on the llvm paths wasn't copying the last
element.

Signed-off-by: Zack Rusin 
---
 src/gallium/auxiliary/draw/draw_gs.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_gs.c b/src/gallium/auxiliary/draw/draw_gs.c
index 0115d47..2aae07c 100644
--- a/src/gallium/auxiliary/draw/draw_gs.c
+++ b/src/gallium/auxiliary/draw/draw_gs.c
@@ -247,8 +247,8 @@ llvm_fetch_gs_input(struct draw_geometry_shader *shader,
 shader->info.input_semantic_index[slot],
 shader->input_info);
 #if DEBUG_INPUTS
-debug_printf("\tSlot = %d, vs_slot = %d, idx = %d:\n",
- slot, vs_slot, idx);
+debug_printf("\tSlot = %d, vs_slot = %d, i = %d:\n",
+ slot, vs_slot, i);
 #endif
 #if 0
 assert(!util_is_inf_or_nan(input[vs_slot][0]));
@@ -302,7 +302,7 @@ llvm_fetch_gs_outputs(struct draw_geometry_shader *shader,
   if (current_verts != shader->max_output_vertices) {
  memcpy(output_ptr + (vertex_count + current_verts) * 
shader->vertex_size,
 output_ptr + (vertex_count + shader->max_output_vertices) * 
shader->vertex_size,
-shader->vertex_size * (total_verts - vertex_count - 
current_verts));
+shader->vertex_size * (total_verts - vertex_count));
   }
   vertex_count += current_verts;
}
@@ -384,7 +384,8 @@ static void gs_point(struct draw_geometry_shader *shader,
++shader->in_prim_idx;
++shader->fetched_prim_count;
 
-   gs_flush(shader);
+   if (draw_gs_should_flush(shader))
+  gs_flush(shader);
 }
 
 static void gs_line(struct draw_geometry_shader *shader,
@@ -399,8 +400,9 @@ static void gs_line(struct draw_geometry_shader *shader,
 shader->fetched_prim_count);
++shader->in_prim_idx;
++shader->fetched_prim_count;
-
-   gs_flush(shader);
+
+   if (draw_gs_should_flush(shader))
+  gs_flush(shader);
 }
 
 static void gs_line_adj(struct draw_geometry_shader *shader,
@@ -418,7 +420,8 @@ static void gs_line_adj(struct draw_geometry_shader *shader,
++shader->in_prim_idx;
++shader->fetched_prim_count;
 
-   gs_flush(shader);
+   if (draw_gs_should_flush(shader))
+  gs_flush(shader);
 }
 
 static void gs_tri(struct draw_geometry_shader *shader,
@@ -435,7 +438,8 @@ static void gs_tri(struct draw_geometry_shader *shader,
++shader->in_prim_idx;
++shader->fetched_prim_count;
 
-   gs_flush(shader);
+   if (draw_gs_should_flush(shader))
+  gs_flush(shader);
 }
 
 static void gs_tri_adj(struct draw_geometry_shader *shader,
@@ -456,7 +460,8 @@ static void gs_tri_adj(struct draw_geometry_shader *shader,
++shader->in_prim_idx;
++shader->fetched_prim_count;
 
-   gs_flush(shader);
+   if (draw_gs_should_flush(shader))
+  gs_flush(shader);
 }
 
 #define FUNC gs_run
@@ -688,7 +693,7 @@ draw_create_geometry_shader(struct draw_context *draw,
   gs->vector_length = lp_native_vector_width / 32;*/
   gs->vector_length = TGSI_NUM_CHANNELS;
} else {
-  gs->vector_length = TGSI_NUM_CHANNELS;
+  gs->vector_length = 1;
}
 
for (i = 0; i < gs->info.num_properties; ++i) {
-- 
1.7.10.4


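The commit message says the TGSI interpreter can only handle one primitive per run. The patch therefore sets vector_length to 1 on that path and calls gs_flush() only when draw_gs_should_flush() says so. That helper's body is not part of this patch. The sketch below assumes it compares the number of fetched primitives with vector_length, and all names in it are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of a geometry-shader input batch. */
struct gs_batch {
   unsigned vector_length;       /* primitives the executor handles per run */
   unsigned fetched_prim_count;  /* primitives gathered so far */
   unsigned runs;                /* how many times the batch was flushed */
};

/* Assumed flush condition: the batch is full. */
static bool
batch_should_flush(const struct gs_batch *b)
{
   return b->fetched_prim_count == b->vector_length;
}

static void
batch_flush(struct gs_batch *b)
{
   if (b->fetched_prim_count == 0)
      return;
   b->runs++;                    /* run the shader over the gathered prims */
   b->fetched_prim_count = 0;
}

/* Called once per input primitive, like gs_point()/gs_tri() above. */
static void
batch_add_prim(struct gs_batch *b)
{
   b->fetched_prim_count++;
   if (batch_should_flush(b))
      batch_flush(b);
}
```

With vector_length 1, every primitive is flushed on its own. With a wider vector, a partial batch is left over when the input runs out, so the sketch flushes it once more at the end with an explicit batch_flush() call.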

[Mesa-dev] [PATCH 2/2] draw/so: Fix bogus assert

2013-03-28 Thread Zack Rusin
We do support stream output (so) with multiple primitives.

Signed-off-by: Zack Rusin 
---
 src/gallium/auxiliary/draw/draw_so_emit_tmp.h |    1 -
 1 file changed, 1 deletion(-)

diff --git a/src/gallium/auxiliary/draw/draw_so_emit_tmp.h b/src/gallium/auxiliary/draw/draw_so_emit_tmp.h
index ec31c3f..4611cd0 100644
--- a/src/gallium/auxiliary/draw/draw_so_emit_tmp.h
+++ b/src/gallium/auxiliary/draw/draw_so_emit_tmp.h
@@ -12,7 +12,6 @@
const boolean quads_flatshade_last = FALSE;\
const boolean last_vertex_last = TRUE; \
do {   \
-  debug_assert(input_prims->primitive_count == 1);\
   switch (prim) { \
   case PIPE_PRIM_LINES_ADJACENCY: \
   case PIPE_PRIM_LINE_STRIP_ADJACENCY:\
-- 
1.7.10.4



Re: [Mesa-dev] [PATCH V2] mesa: provide default implementation of QuerySamplesForFormat

2013-03-28 Thread Kenneth Graunke

On 03/28/2013 08:28 PM, Chris Forbes wrote:

> Previously at least i915 failed to provide an implementation, but
> exposed ARB_internalformat_query anyway, leading to crashes when
> GetInternalformativ was called.
>
> The default implementation just returns 1 for everything, so it is
> suitable for any driver which does not support multisampling.
>
> V2: - Move from intel to core mesa.
>
> Signed-off-by: Chris Forbes 


Reviewed-by: Kenneth Graunke 
