Re: [Mesa-dev] [PATCH] st/dri/sw: Fix pitch calculation in drisw_update_tex_buffer

2013-06-19 Thread Stéphane Marchesin
On Tue, Jun 18, 2013 at 8:41 AM, Richard Sandiford
 wrote:
> swrastGetImage rounds the pitch up to 4 bytes for compatibility reasons
> that are explained in drisw_glx.c:bytes_per_line, so drisw_update_tex_buffer
> must do the same.
>
> Fixes window skew seen while running firefox over vnc on a 16-bit screen.
>
> Signed-off-by: Richard Sandiford 
> ---
>  src/gallium/state_trackers/dri/sw/drisw.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/state_trackers/dri/sw/drisw.c 
> b/src/gallium/state_trackers/dri/sw/drisw.c
> index 7a5f797..f9fddf7 100644
> --- a/src/gallium/state_trackers/dri/sw/drisw.c
> +++ b/src/gallium/state_trackers/dri/sw/drisw.c
> @@ -265,8 +265,9 @@ drisw_update_tex_buffer(struct dri_drawable *drawable,
> /* Copy the Drawable content to the mapped texture buffer */
> get_image(dPriv, x, y, w, h, map);
>
> -   /* The pipe transfer has a pitch rounded up to the nearest 64 pixels. */
> -   ximage_stride = w * cpp;
> +   /* The pipe transfer has a pitch rounded up to the nearest 64 pixels.
> +  get_image() has a patch rounded up to 4 bytes.  */

"has a pitch"

Other than that:
Reviewed-by: Stéphane Marchesin 

> +   ximage_stride = ((w * cpp) + 3) & -4;
> for (line = h-1; line; --line) {
>memmove(&map[line * transfer->stride],
>&map[line * ximage_stride],
> --
> 1.7.11.7
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V3 2/2] i965: Enable ext_framebuffer_multisample_blit_scaled on intel h/w

2013-06-19 Thread Anuj Phogat
This patch enables ext_framebuffer_multisample_blit_scaled extension
on intel h/w >= gen6.

Note: Patches for piglit tests to verify this functionality are out
for review on piglit mailing list.

Comment history:
Paul Berry on v1 of my implementation:
"I have some concerns about the image quality of the method you've
implemented.  As I understand it, the primary use case of this extension
is to allow the client to do multisampled rendering at slightly less
than screen resolution (e.g. 720p instead of 1080p), and then blit the
result to the screen in one step while keeping most of the quality
benefits of multisampling.  Since your implementation is effectively
equivalent to downsampling and then blitting using GL_NEAREST filtering,
my fear is that it will lead to blocky artifacts that are severe enough
to negate the benefit of multisampling in the first place.

Anuj Phogat proposed v2:
My latest implementation uses bilinear filtering of samples within a
texel and addresses most of the above concerns. It produces images
which are free from blocky artifacts and show big improvement in visual
quality. Here is a link to an image comparing the rendering quality on
Intel and NVIDIA drivers:
https://www.dropbox.com/s/m90lqqrj2vjps3g/scaled-blit.png

Paul Berry on v2:
There is a big improvement in quality of the image. But my concern is
that the algorithm used for filtering is not true bilinear filtering
of samples. For sampling texture coordinates on the edge of a texel,
we should also use samples from adjacent texels. We might not be
seeing these problems in the test case you've developed, but we can
hit them in some real world application.

Anuj Phogat proposed v3:
Algorithm used for filtering now assumes a rectangular grid of samples
roughly corresponding to sample locations. It also tests the boundary
conditions on the texture edges.
So, Now we do a true bilinear filtering of samples.

Piglit tests to verify the implementation can be found at:
https://github.com/aphogat/piglit.git
Branch: blit-3

Error conditions test:
ext_framebuffer_multisample-negative-blit-scaled
Accuracy test:
ext_framebuffer_multisample-blit-scaled-glsl
Visual quality test:
ext_framebuffer_multisample-blit-scaled

Signed-off-by: Anuj Phogat 
---
 src/mesa/drivers/dri/intel/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/intel/intel_extensions.c 
b/src/mesa/drivers/dri/intel/intel_extensions.c
index 5cb2fa3..23bc757 100644
--- a/src/mesa/drivers/dri/intel/intel_extensions.c
+++ b/src/mesa/drivers/dri/intel/intel_extensions.c
@@ -94,6 +94,7 @@ intelInitExtensions(struct gl_context *ctx)
if (intel->gen >= 6) {
   ctx->Extensions.EXT_framebuffer_multisample = true;
   ctx->Extensions.EXT_transform_feedback = true;
+  ctx->Extensions.EXT_framebuffer_multisample_blit_scaled = true;
   ctx->Extensions.ARB_blend_func_extended = 
!driQueryOptionb(&intel->optionCache, "disable_blend_func_extended");
   ctx->Extensions.ARB_draw_buffers_blend = true;
   ctx->Extensions.ARB_ES3_compatibility = true;
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V3 1/2] i965/blorp: Add bilinear filtering of samples for multisample scaled blits

2013-06-19 Thread Anuj Phogat
Current implementation of ext_framebuffer_multisample_blit_scaled in
i965/blorp uses nearest filtering for multisample scaled blits. Using
nearest filtering produces blocky artifacts and negates the benefits
of MSAA. That is the reason why extension was not enabled on i965.

This patch implements the bilinear filtering of samples in blorp engine.
Images generated with this patch are free from blocky artifacts and show
big improvement in visual quality.

Observed no piglit and gles3 regressions.

V3:
- Algorithm used for filtering assumes a rectangular grid of samples
  roughly corresponding to sample locations.
- Test the boundary conditions on the edges of texture.

Signed-off-by: Anuj Phogat 
---
 src/mesa/drivers/dri/i965/brw_blorp.h|  11 ++
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 257 +--
 2 files changed, 258 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h 
b/src/mesa/drivers/dri/i965/brw_blorp.h
index ffc27cc..0a15b89 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.h
+++ b/src/mesa/drivers/dri/i965/brw_blorp.h
@@ -319,6 +319,17 @@ struct brw_blorp_blit_prog_key
 * than one sample per pixel.
 */
bool persample_msaa_dispatch;
+
+   /* True for scaled blitting. */
+   bool blit_scaled;
+
+   /* Source rectangle dimensions. Used to test boundary conditions in shader
+* program.
+*/
+   float src_x0;
+   float src_y0;
+   float src_x1;
+   float src_y1;
 };
 
 class brw_blorp_blit_params : public brw_blorp_params
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index 8694128..e99c9df 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -622,7 +622,8 @@ private:
void kill_if_outside_dst_rect();
void translate_dst_to_src();
void single_to_blend();
-   void manual_blend(unsigned num_samples);
+   void manual_blend_average(unsigned num_samples);
+   void manual_blend_linear(unsigned num_samples);
void sample(struct brw_reg dst);
void texel_fetch(struct brw_reg dst);
void mcs_fetch();
@@ -676,6 +677,16 @@ private:
 */
struct brw_reg y_coords[2];
 
+   /* X, Y coordinates of the pixel from which we need to fetch the specific
+*  sample. These are used for multisample scaled blitting.
+*/
+   struct brw_reg x_sample_coords;
+   struct brw_reg y_sample_coords;
+
+   /* Store the values to use to interpolate in x and y directions */
+   struct brw_reg x_lerp;
+   struct brw_reg y_lerp;
+
/* Which element of x_coords and y_coords is currently in use.
 */
int xy_coord_index;
@@ -814,15 +825,17 @@ brw_blorp_blit_program::compile(struct brw_context *brw,
 * that we want to texture from.  Exception: if we are blending, then S is
 * irrelevant, because we are going to fetch all samples.
 */
-   if (key->blend) {
+   if (key->blend && !key->blit_scaled) {
   if (brw->intel.gen == 6) {
  /* Gen6 hardware an automatically blend using the SAMPLE message */
  single_to_blend();
  sample(texture_data[0]);
   } else {
  /* Gen7+ hardware doesn't automaticaly blend. */
- manual_blend(key->src_samples);
+ manual_blend_average(key->src_samples);
   }
+   } else if(key->blend && key->blit_scaled) {
+  manual_blend_linear(key->src_samples);
} else {
   /* We aren't blending, which means we just want to fetch a single sample
* from the source surface.  The address that we want to fetch from is
@@ -913,6 +926,18 @@ brw_blorp_blit_program::alloc_regs()
  = retype(brw_vec8_grf(reg, 0), BRW_REGISTER_TYPE_UD);
   reg += 2;
}
+
+   if (key->blit_scaled && key->blend) {
+  this->x_sample_coords = brw_vec8_grf(reg, 0);
+  reg += 2;
+  this->y_sample_coords = brw_vec8_grf(reg, 0);
+  reg += 2;
+  this->x_lerp = brw_vec8_grf(reg, 0);
+  reg += 2;
+  this->y_lerp = brw_vec8_grf(reg, 0);
+  reg += 2;
+   }
+
this->xy_coord_index = 0;
this->sample_index
   = retype(brw_vec8_grf(reg, 0), BRW_REGISTER_TYPE_UD);
@@ -1368,11 +1393,82 @@ brw_blorp_blit_program::translate_dst_to_src()
brw_MUL(&func, Y_f, Yp_f, y_transform.multiplier);
brw_ADD(&func, X_f, X_f, x_transform.offset);
brw_ADD(&func, Y_f, Y_f, y_transform.offset);
-   /* Round the float coordinates down to nearest integer by moving to
-* UD registers.
-*/
-   brw_MOV(&func, Xp, X_f);
-   brw_MOV(&func, Yp, Y_f);
+   if (key->blit_scaled && key->blend) {
+  float x_scale = 2.0;
+  float y_scale = key->src_samples / 2.0;
+  /* Translate coordinates to lay out the samples in a rectangular  grid
+   * roughly corresponding to sample locations.
+   */
+  brw_ADD(&func, X_f, X_f, brw_imm_f(-0.25));
+  brw_ADD(&func, Y_f, Y_f, brw_imm_f(-1.0 / key->src_samples));
+  brw_MUL(&func, X_f, X_f, brw_imm_f(x_scale));
+  brw_MUL(&func, Y_f, Y

Re: [Mesa-dev] [PATCH] i965: Remove interleaved user array upload optimization

2013-06-19 Thread Eric Anholt
Ian Romanick  writes:

> From: Ian Romanick 
>
> The checks to determine when the data can be uploaded in an interleaved
> fashion can be tricked by certain data layouts.  For example,
>
> float data[...];
>
> glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 16, &data[0]);
> glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 16, &data[4]);
> glDrawArrays(GL_POINTS, 0, 1);
>
> will hit the interleaved path with an incorrect size (16 bytes instead
> of 32 bytes).  As a result, the data for attribute 1 never gets
> uploaded.  The single element draw case is the only sensible case I can
> think of for non-interleaved-that-looks-like-interleaved data, but there
> may be others as well.
>
> I don't see an easy way to fix the checks, so I have chosen to just rip
> out the optimization.  This may cause performance issues on applications
> that use large arrays of data that are not in VBOs.

Check that count * strideb actually contains the data you're hoping to
read from the later attributes?  Seems pretty doable.

As is, this hurts cairo-gl performance by 1.9034% +/- 0.441538%
(n=33/34, 3 throttling outliers removed)


pgpEb4SBtrQyC.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] mesa: Move shader compiler API code to shaderapi.c

2013-06-19 Thread Eric Anholt
There was nothing ir_to_mesa-specific about this code, but it's not
exactly part of the compiler's core turning-source-into-IR job either.

v2: Split from the ir_to_mesa to glsl/ commit, avoid renaming the sh
variable.

Acked-by: Paul Berry  (v1)
---
 src/mesa/main/shaderapi.c   | 42 +
 src/mesa/program/ir_to_mesa.cpp | 31 --
 2 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 88518ca..d214204 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -48,10 +48,12 @@
 #include "main/transformfeedback.h"
 #include "main/uniforms.h"
 #include "program/program.h"
+#include "program/prog_print.h"
 #include "program/prog_parameter.h"
 #include "ralloc.h"
 #include 
 #include "../glsl/glsl_parser_extras.h"
+#include "../glsl/ir.h"
 #include "../glsl/ir_uniform.h"
 
 /** Define this to enable shader substitution (see below) */
@@ -743,10 +745,42 @@ compile_shader(struct gl_context *ctx, GLuint shaderObj)
/* set default pragma state for shader */
sh->Pragmas = options->DefaultPragmas;
 
-   /* this call will set the sh->CompileStatus field to indicate if
-* compilation was successful.
-*/
-   _mesa_glsl_compile_shader(ctx, sh);
+   if (!sh->Source) {
+  /* If the user called glCompileShader without first calling
+   * glShaderSource, we should fail to compile, but not raise a GL_ERROR.
+   */
+  sh->CompileStatus = GL_FALSE;
+   } else {
+  if (ctx->Shader.Flags & GLSL_DUMP) {
+ printf("GLSL source for %s shader %d:\n",
+_mesa_glsl_shader_target_name(sh->Type), sh->Name);
+ printf("%s\n", sh->Source);
+  }
+
+  /* this call will set the shader->CompileStatus field to indicate if
+   * compilation was successful.
+   */
+  _mesa_glsl_compile_shader(ctx, sh);
+
+  if (ctx->Shader.Flags & GLSL_LOG) {
+ _mesa_write_shader_to_file(sh);
+  }
+
+  if (ctx->Shader.Flags & GLSL_DUMP) {
+ if (sh->CompileStatus) {
+printf("GLSL IR for shader %d:\n", sh->Name);
+_mesa_print_ir(sh->ir, NULL);
+printf("\n\n");
+ } else {
+printf("GLSL shader %d failed to compile.\n", sh->Name);
+ }
+ if (sh->InfoLog && sh->InfoLog[0] != 0) {
+printf("GLSL shader %d info log:\n", sh->Name);
+printf("%s\n", sh->InfoLog);
+ }
+  }
+
+   }
 
if (sh->CompileStatus == GL_FALSE && 
(ctx->Shader.Flags & GLSL_REPORT_ERRORS)) {
diff --git a/src/mesa/program/ir_to_mesa.cpp b/src/mesa/program/ir_to_mesa.cpp
index a915974..96996ee 100644
--- a/src/mesa/program/ir_to_mesa.cpp
+++ b/src/mesa/program/ir_to_mesa.cpp
@@ -3108,23 +3108,10 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, 
struct gl_shader *shader)
   new(shader) _mesa_glsl_parse_state(ctx, shader->Type, shader);
 
const char *source = shader->Source;
-   /* Check if the user called glCompileShader without first calling
-* glShaderSource.  This should fail to compile, but not raise a GL_ERROR.
-*/
-   if (source == NULL) {
-  shader->CompileStatus = GL_FALSE;
-  return;
-   }
 
state->error = glcpp_preprocess(state, &source, &state->info_log,
 &ctx->Extensions, ctx);
 
-   if (ctx->Shader.Flags & GLSL_DUMP) {
-  printf("GLSL source for %s shader %d:\n",
-_mesa_glsl_shader_target_name(state->target), shader->Name);
-  printf("%s\n", shader->Source);
-   }
-
if (!state->error) {
  _mesa_glsl_lexer_ctor(state, source);
  _mesa_glsl_parse(state);
@@ -3160,24 +3147,6 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct 
gl_shader *shader)
  sizeof(shader->builtins_to_link[0]) * state->num_builtins_to_link);
shader->num_builtins_to_link = state->num_builtins_to_link;
 
-   if (ctx->Shader.Flags & GLSL_LOG) {
-  _mesa_write_shader_to_file(shader);
-   }
-
-   if (ctx->Shader.Flags & GLSL_DUMP) {
-  if (shader->CompileStatus) {
-printf("GLSL IR for shader %d:\n", shader->Name);
-_mesa_print_ir(shader->ir, NULL);
-printf("\n\n");
-  } else {
-printf("GLSL shader %d failed to compile.\n", shader->Name);
-  }
-  if (shader->InfoLog && shader->InfoLog[0] != 0) {
-printf("GLSL shader %d info log:\n", shader->Name);
-printf("%s\n", shader->InfoLog);
-  }
-   }
-
if (shader->UniformBlocks)
   ralloc_free(shader->UniformBlocks);
shader->NumUniformBlocks = state->num_uniform_blocks;
-- 
1.8.3.rc0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] mesa: Move the common _mesa_glsl_compile_shader() code to glsl/.

2013-06-19 Thread Eric Anholt
This code had no relation to ir_to_mesa.cpp, since it was also used by
intel and state_tracker, and most of it was duplicated with the standalone
compiler (which has periodically drifted from the Mesa copy).

v2: Split from the ir_to_mesa to shaderapi.c changes.

Acked-by: Paul Berry  (v1)
---
 src/glsl/glsl_parser_extras.cpp | 83 +
 src/glsl/main.cpp   | 59 +
 src/glsl/program.h  | 16 +++-
 src/mesa/main/shaderapi.c   |  3 +-
 src/mesa/program/ir_to_mesa.cpp | 63 ---
 src/mesa/program/ir_to_mesa.h   |  1 -
 6 files changed, 100 insertions(+), 125 deletions(-)

diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index 7b827ba..f4087df 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -28,6 +28,7 @@
 extern "C" {
 #include "main/core.h" /* for struct gl_context */
 #include "main/context.h"
+#include "main/shaderobj.h"
 }
 
 #include "ralloc.h"
@@ -1237,6 +1238,88 @@ ast_struct_specifier::ast_struct_specifier(const char 
*identifier,
this->declarations.push_degenerate_list_at_head(&declarator_list->link);
 }
 
+extern "C" {
+
+void
+_mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader,
+  bool dump_ast, bool dump_hir)
+{
+   struct _mesa_glsl_parse_state *state =
+  new(shader) _mesa_glsl_parse_state(ctx, shader->Type, shader);
+   const char *source = shader->Source;
+
+   state->error = glcpp_preprocess(state, &source, &state->info_log,
+ &ctx->Extensions, ctx);
+
+   if (!state->error) {
+ _mesa_glsl_lexer_ctor(state, source);
+ _mesa_glsl_parse(state);
+ _mesa_glsl_lexer_dtor(state);
+   }
+
+   if (dump_ast) {
+  foreach_list_const(n, &state->translation_unit) {
+ ast_node *ast = exec_node_data(ast_node, n, link);
+ ast->print();
+  }
+  printf("\n\n");
+   }
+
+   ralloc_free(shader->ir);
+   shader->ir = new(shader) exec_list;
+   if (!state->error && !state->translation_unit.is_empty())
+  _mesa_ast_to_hir(shader->ir, state);
+
+   if (!state->error) {
+  validate_ir_tree(shader->ir);
+
+  /* Print out the unoptimized IR. */
+  if (dump_hir) {
+ _mesa_print_ir(shader->ir, state);
+  }
+   }
+
+
+   if (!state->error && !shader->ir->is_empty()) {
+  struct gl_shader_compiler_options *options =
+ &ctx->ShaderCompilerOptions[_mesa_shader_type_to_index(shader->Type)];
+
+  /* Do some optimization at compile time to reduce shader IR size
+   * and reduce later work if the same shader is linked multiple times
+   */
+  while (do_common_optimization(shader->ir, false, false, 32, options))
+ ;
+
+  validate_ir_tree(shader->ir);
+   }
+
+   if (shader->InfoLog)
+  ralloc_free(shader->InfoLog);
+
+   shader->symbols = state->symbols;
+   shader->CompileStatus = !state->error;
+   shader->InfoLog = state->info_log;
+   shader->Version = state->language_version;
+   shader->InfoLog = state->info_log;
+   shader->IsES = state->es_shader;
+
+   memcpy(shader->builtins_to_link, state->builtins_to_link,
+  sizeof(shader->builtins_to_link[0]) * state->num_builtins_to_link);
+   shader->num_builtins_to_link = state->num_builtins_to_link;
+
+   if (shader->UniformBlocks)
+  ralloc_free(shader->UniformBlocks);
+   shader->NumUniformBlocks = state->num_uniform_blocks;
+   shader->UniformBlocks = state->uniform_blocks;
+   ralloc_steal(shader, shader->UniformBlocks);
+
+   /* Retain any live IR, but trash the rest. */
+   reparent_ir(shader->ir, shader->ir);
+
+   ralloc_free(state);
+}
+
+} /* extern "C" */
 /**
  * Do the set of common optimizations passes
  *
diff --git a/src/glsl/main.cpp b/src/glsl/main.cpp
index 5713ee5..60bc628 100644
--- a/src/glsl/main.cpp
+++ b/src/glsl/main.cpp
@@ -143,70 +143,13 @@ compile_shader(struct gl_context *ctx, struct gl_shader 
*shader)
struct _mesa_glsl_parse_state *state =
   new(shader) _mesa_glsl_parse_state(ctx, shader->Type, shader);
 
-   const char *source = shader->Source;
-   state->error = glcpp_preprocess(state, &source, &state->info_log,
-state->extensions, ctx) != 0;
-
-   if (!state->error) {
-  _mesa_glsl_lexer_ctor(state, source);
-  _mesa_glsl_parse(state);
-  _mesa_glsl_lexer_dtor(state);
-   }
-
-   if (dump_ast) {
-  foreach_list_const(n, &state->translation_unit) {
-ast_node *ast = exec_node_data(ast_node, n, link);
-ast->print();
-  }
-  printf("\n\n");
-   }
-
-   shader->ir = new(shader) exec_list;
-   if (!state->error && !state->translation_unit.is_empty())
-  _mesa_ast_to_hir(shader->ir, state);
-
-   /* Print out the unoptimized IR. */
-   if (!state->error && dump_hir) {
-  validate_ir_tree(shader->ir);
-  _mesa_print_ir(shader->ir, state);
-   }
-
-   /* Optimization passes */

Re: [Mesa-dev] [PATCH 5/6] mesa: Fix missing setting of shader->IsES.

2013-06-19 Thread Eric Anholt
Kenneth Graunke  writes:

> On 06/17/2013 04:10 PM, Eric Anholt wrote:
>> I noticed this while trying to merge code with the builtin compiler, which
>> does set it.
>>
>> Note that this causes two regressions in piglit in
>> default-precision-sampler.* which try to link without a vertex or fragment
>> shader, due to being run under the desktop glslparsertest binary (using
>> ARB_ES3_compatibility) that doesn't know about this requirement.
>
> That seems quite odd.

There are ifdefs in glslparsertest for adding appropriate stub shaders
for ES in an ES build, but if you run in ES3 compat on a desktop build,
it doesn't happen.  We should just be running these tests in the ES
build.


pgpIMwq2ZMhXA.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/6] mesa: Move the common _mesa_glsl_compile_shader() code to glsl/.

2013-06-19 Thread Eric Anholt
Kenneth Graunke  writes:

> On 06/17/2013 04:10 PM, Eric Anholt wrote:
>> ... and move the mesa-core-specific code into Mesa core.  This code had no
>> relation to ir_to_mesa.cpp, since it was also used by intel and
>> state_tracker, and most of it was duplicated with the standalone compiler
>> (which has periodically drifted from the Mesa copy).
>> ---
>>   src/glsl/glsl_parser_extras.cpp | 83 
>>   src/glsl/glsl_parser_extras.h   |  2 +
>>   src/glsl/main.cpp   | 59 +-
>>   src/glsl/program.h  | 16 ++-
>>   src/mesa/main/shaderapi.c   | 57 -
>>   src/mesa/program/ir_to_mesa.cpp | 94 
>> -
>>   src/mesa/program/ir_to_mesa.h   |  1 -
>>   7 files changed, 146 insertions(+), 166 deletions(-)
>>
>> diff --git a/src/glsl/glsl_parser_extras.cpp 
>> b/src/glsl/glsl_parser_extras.cpp
>> index 7b827ba..f142d73 100644
>> --- a/src/glsl/glsl_parser_extras.cpp
>> +++ b/src/glsl/glsl_parser_extras.cpp

>> @@ -1237,6 +1238,88 @@ ast_struct_specifier::ast_struct_specifier(const char 
>> *identifier,
>>  this->declarations.push_degenerate_list_at_head(&declarator_list->link);
>>   }
>>
>> +extern "C" {
>> +
>> +void
>> +_mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader,
>> +  bool dump_ast, bool dump_hir)
>> +{
>> +   struct _mesa_glsl_parse_state *state =
>> +  new(shader) _mesa_glsl_parse_state(ctx, shader->Type, shader);
>> +   const char *source = shader->Source;
>> +
>> +   state->error = glcpp_preprocess(state, &source, &state->info_log,
>> + &ctx->Extensions, ctx);
>> +
>> +   if (!state->error) {
>> + _mesa_glsl_lexer_ctor(state, source);
>> + _mesa_glsl_parse(state);
>> + _mesa_glsl_lexer_dtor(state);
>> +   }
>> +
>> +   if (dump_ast) {
>> +  foreach_list_const(n, &state->translation_unit) {
>> + ast_node *ast = exec_node_data(ast_node, n, link);
>> + ast->print();
>> +  }
>> +  printf("\n\n");
>> +   }
>> +
>> +   ralloc_free(shader->ir);
>> +   shader->ir = new(shader) exec_list;
>> +   if (!state->error && !state->translation_unit.is_empty())
>> +  _mesa_ast_to_hir(shader->ir, state);
>> +
>> +   if (!state->error) {
>> +  validate_ir_tree(shader->ir);
>> +
>> +  /* Print out the unoptimized IR. */
>> +  if (dump_hir) {
>> + _mesa_print_ir(shader->ir, state);
>> +  }
>> +   }
>> +
>> +
>> +   if (!state->error && !shader->ir->is_empty()) {
>> +  struct gl_shader_compiler_options *options =
>> + 
>> &ctx->ShaderCompilerOptions[_mesa_shader_type_to_index(shader->Type)];
>> +
>> +  /* Do some optimization at compile time to reduce shader IR size
>> +   * and reduce later work if the same shader is linked multiple times
>> +   */
>> +  while (do_common_optimization(shader->ir, false, false, 32, options))
>> + ;
>> +
>> +  validate_ir_tree(shader->ir);
>> +   }
>> +
>> +   if (shader->InfoLog)
>> +  ralloc_free(shader->InfoLog);
>> +
>> +   shader->symbols = state->symbols;
>> +   shader->CompileStatus = !state->error;
>> +   shader->InfoLog = state->info_log;
>> +   shader->Version = state->language_version;
>> +   shader->InfoLog = state->info_log;
>> +   shader->IsES = state->es_shader;
>> +
>> +   memcpy(shader->builtins_to_link, state->builtins_to_link,
>> +  sizeof(shader->builtins_to_link[0]) * state->num_builtins_to_link);
>> +   shader->num_builtins_to_link = state->num_builtins_to_link;
>> +
>> +   if (shader->UniformBlocks)
>> +  ralloc_free(shader->UniformBlocks);
>> +   shader->NumUniformBlocks = state->num_uniform_blocks;
>> +   shader->UniformBlocks = state->uniform_blocks;
>> +   ralloc_steal(shader, shader->UniformBlocks);
>> +
>> +   /* Retain any live IR, but trash the rest. */
>> +   reparent_ir(shader->ir, shader->ir);
>> +
>> +   ralloc_free(state);
>> +}
>> +
>> +} /* extern "C" */
>>   /**
>>* Do the set of common optimizations passes
>>*
>
> The above change looks good.  It makes a lot of sense for this to be in 
> the compiler.
>
> There are tabs, if you care.  Not a big deal either way.

Fixed.

>> diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
>> index 7f478df..cf296e9 100644
>> --- a/src/glsl/glsl_parser_extras.h
>> +++ b/src/glsl/glsl_parser_extras.h
>> @@ -342,6 +342,8 @@ extern int _mesa_glsl_lex(union YYSTYPE *yylval, YYLTYPE 
>> *yylloc,
>>
>>   extern int _mesa_glsl_parse(struct _mesa_glsl_parse_state *);
>>
>> +extern void compile_shader();
>> +
>>   /**
>>* Process elements of the #extension directive
>>*
>
> I am bewildered by this hunk.  There are compile_shader() functions in 
> both glsl/main.cpp and mesa/main/shaderapi.c, but both of those are 
> static, and your patch doesn't change that.  I can't find /anything/ 
> that this would refer to.
>
> It almost looks like you split things differently i

Re: [Mesa-dev] [PATCH 1/4] glsl: Add simple vector type accessor helpers.

2013-06-19 Thread Eric Anholt
Kenneth Graunke  writes:

> On 06/18/2013 06:47 AM, Ian Romanick wrote:
>> On 06/18/2013 04:22 AM, Kenneth Graunke wrote:
>>> This patch introduces new functions to quickly grab a pointer to a
>>> vector type.  For example:
>>>
>>> glsl_type::bvec(4)   returns   glsl_type::bvec4_type
>>> glsl_type::ivec(3)   returns   glsl_type::ivec3_type
>>> glsl_type::uvec(2)   returns   glsl_type::uvec2_type
>>> glsl_type::vec(1)returns   glsl_type::float_type
>>>
>>> This is less wordy than glsl_type::get_instance(GLSL_TYPE_BOOL, 4, 1),
>>> which can help avoid extra word wrapping.
>>
>> One thing that's annoying about this code is that we have so many ways
>> to get at types.  You can get them from a symbol table, you can get them
>> from the direct pointers, you can get them from pointer math, or you can
>> get them from get_instance.  I'm not sure replacing one with another
>> makes it much better.
>>
>> I remember considering having separate overloads of get_instance for
>> vectors and matrices.  I don't recall why I didn't do it that way.
>>
>>> Signed-off-by: Kenneth Graunke 
>>> ---
>>>   src/glsl/glsl_types.cpp | 52
>>> +
>>>   src/glsl/glsl_types.h   |  9 +
>>>   2 files changed, 61 insertions(+)
>>>
>>> diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
>>> index df9c5d3..a5491c5 100644
>>> --- a/src/glsl/glsl_types.cpp
>>> +++ b/src/glsl/glsl_types.cpp
>>> @@ -534,6 +534,58 @@ glsl_type::glsl_type(const glsl_type *array,
>>> unsigned length) :
>>>   }
>>>
>>>
>>> +const glsl_type *const
>>> +glsl_type::vec(unsigned components)
>>> +{
>>> +   if (components == 0 || components > 4)
>>> +  return error_type;
>>> +
>>> +   static const glsl_type *const ts[] = {
>>> +  float_type, vec2_type, vec3_type, vec4_type
>>> +   };
>>> +   return ts[components - 1];
>>
>> We could just embed this code in get_instance.  That seems better than
>> adding new methods that are only used in method that's part of the same
>> class.
>
> They're also used in patch 3, outside the class.  That was the 
> motivation for adding them...and I thought it was a bit tidier this way.
>
> But if you'd prefer just putting this in get_instance, that's fine by 
> me...it's what I originally did in v0 of this series anyway.

I really like the new accessors, and always hated get_instance() when
I'm trying to work on simple vectors.

The only bugs I found were the same shadow sampler issues for ES that
Ian caught.  Assuming Ian's issues are fixed, this series is:

Reviewed-by: Eric Anholt 


pgpgFKOWChBLR.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/8] indices: add some comments

2013-06-19 Thread Roland Scheidegger
Am 19.06.2013 18:39, schrieb Brian Paul:
> This is pretty complicated code with few/any comments.  Here's a first stab.
> ---
>  src/gallium/auxiliary/indices/u_indices.c  |   23 
> +---
>  src/gallium/auxiliary/indices/u_unfilled_indices.c |9 +++-
>  2 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/indices/u_indices.c 
> b/src/gallium/auxiliary/indices/u_indices.c
> index 03d7453..72c46f7 100644
> --- a/src/gallium/auxiliary/indices/u_indices.c
> +++ b/src/gallium/auxiliary/indices/u_indices.c
> @@ -150,9 +150,26 @@ int u_index_translator( unsigned hw_mask,
>  }
>  
>  
> -
> -
> -
> +/**
> + * If a driver does not support a particular gallium primitive type
> + * (such as PIPE_PRIM_QUAD_STRIP) this function can be used to help
> + * convert the primitive into a simpler type (like PIPE_PRIM_TRIANGLES).
> + *
> + * The generator functions generates a number of ushort or uint indexes
function


> + * for drawing the new type of primitive.
> + *
> + * \param hw_mask  a bitmask of (1 << PIPE_PRIM_x) values that indicates
> + * kind of primitives are supported by the driver.
> + * \param prim  the PIPE_PRIM_x that the user wants to draw
> + * \param start  index of first vertex to draw
> + * \param nr  number of vertices to draw
> + * \param in_pv  user's provoking vertex (PV_FIRST/LAST)
> + * \param out_pv  desired proking vertex for the hardware (PV_FIRST/LAST)
> + * \param out_prim  returns the new primitive type for the driver
> + * \param out_index_size  returns OUT_USHORT or OUT_UINT
> + * \param out_nr  returns new number of vertices to draw
> + * \param out_generate  returns pointer to the generator function
> + */
>  int u_index_generator( unsigned hw_mask,
> unsigned prim,
> unsigned start,
> diff --git a/src/gallium/auxiliary/indices/u_unfilled_indices.c 
> b/src/gallium/auxiliary/indices/u_unfilled_indices.c
> index c353717..25c61d9 100644
> --- a/src/gallium/auxiliary/indices/u_unfilled_indices.c
> +++ b/src/gallium/auxiliary/indices/u_unfilled_indices.c
> @@ -151,7 +151,14 @@ int u_unfilled_translator( unsigned prim,
>  }
>  
>  
> -
> +/**
> + * Utility for converting unfilled polygons into points, lines, triangles.
> + * Few drivers have direct support for OpenGL's glPolygonMode.
> + * This function helps with converting triangles into points or lines
> + * when the front and back fill modes are the same.  When there's
> + * different front/back fill modes, that can be handled with the
> + * 'draw' module.
> + */
>  int u_unfilled_generator( unsigned prim,
>unsigned start,
>unsigned nr,
> 

For the series:
Reviewed-by: Roland Scheidegger 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/8] svga: reindent svga_tgsi.c

2013-06-19 Thread Roland Scheidegger
Am 19.06.2013 18:39, schrieb Brian Paul:
> ---
>  src/gallium/drivers/svga/svga_tgsi.c |  128 
> +-
>  1 file changed, 65 insertions(+), 63 deletions(-)
> 
> diff --git a/src/gallium/drivers/svga/svga_tgsi.c 
> b/src/gallium/drivers/svga/svga_tgsi.c
> index 8bea7f8..56529c6 100644
> --- a/src/gallium/drivers/svga/svga_tgsi.c
> +++ b/src/gallium/drivers/svga/svga_tgsi.c
> @@ -50,7 +50,8 @@
>  static char err_buf[128];
>  
>  #if 0
> -static void svga_destroy_shader_emitter( struct svga_shader_emitter *emit )
> +static void
> +svga_destroy_shader_emitter(struct svga_shader_emitter *emit)
>  {
> if (emit->buf != err_buf)
>FREE(emit->buf);
> @@ -58,12 +59,13 @@ static void svga_destroy_shader_emitter( struct 
> svga_shader_emitter *emit )
>  #endif
>  
>  
> -static boolean svga_shader_expand( struct svga_shader_emitter *emit )
> +static boolean
> +svga_shader_expand(struct svga_shader_emitter *emit)
>  {
> char *new_buf;
> unsigned newsize = emit->size * 2;
>  
> -   if(emit->buf != err_buf)
> +   if (emit->buf != err_buf)
>new_buf = REALLOC(emit->buf, emit->size, newsize);
> else
>new_buf = NULL;
> @@ -79,70 +81,76 @@ static boolean svga_shader_expand( struct 
> svga_shader_emitter *emit )
> emit->ptr = new_buf + (emit->ptr - emit->buf);
> emit->buf = new_buf;
> return TRUE;
> -}   
> +}
> +
>  
> -static INLINE boolean reserve(  struct svga_shader_emitter *emit,
> -unsigned nr_dwords )
> +static INLINE boolean
> +reserve(struct svga_shader_emitter *emit, unsigned nr_dwords)
>  {
> if (emit->ptr - emit->buf + nr_dwords * sizeof(unsigned) >= emit->size) {
> -  if (!svga_shader_expand( emit ))
> +  if (!svga_shader_expand(emit)) {
>   return FALSE;
> +  }
> }
>  
> return TRUE;
>  }
>  
> -boolean svga_shader_emit_dword( struct svga_shader_emitter *emit,
> -unsigned dword )
> +
> +boolean
> +svga_shader_emit_dword(struct svga_shader_emitter * emit, unsigned dword)
>  {
> if (!reserve(emit, 1))
>return FALSE;
>  
> -   *(unsigned *)emit->ptr = dword;
> +   *(unsigned *) emit->ptr = dword;
> emit->ptr += sizeof dword;
> return TRUE;
>  }
>  
> -boolean svga_shader_emit_dwords( struct svga_shader_emitter *emit,
> - const unsigned *dwords,
> - unsigned nr )
> +
> +boolean
> +svga_shader_emit_dwords(struct svga_shader_emitter * emit,
> +const unsigned *dwords, unsigned nr)
>  {
> if (!reserve(emit, nr))
>return FALSE;
>  
> -   memcpy( emit->ptr, dwords, nr * sizeof *dwords );
> +   memcpy(emit->ptr, dwords, nr * sizeof *dwords);
> emit->ptr += nr * sizeof *dwords;
> return TRUE;
>  }
>  
> -boolean svga_shader_emit_opcode( struct svga_shader_emitter *emit,
> - unsigned opcode )
> +
> +boolean
> +svga_shader_emit_opcode(struct svga_shader_emitter * emit, unsigned opcode)
>  {
> SVGA3dShaderInstToken *here;
>  
> if (!reserve(emit, 1))
>return FALSE;
>  
> -   here = (SVGA3dShaderInstToken *)emit->ptr;
> +   here = (SVGA3dShaderInstToken *) emit->ptr;
> here->value = opcode;
>  
> if (emit->insn_offset) {
> -  SVGA3dShaderInstToken *prev = (SVGA3dShaderInstToken *)(emit->buf + 
> -  
> emit->insn_offset);
> +  SVGA3dShaderInstToken *prev =
> + (SVGA3dShaderInstToken *) (emit->buf + emit->insn_offset);
>prev->size = (here - prev) - 1;
> }
> -   
> +
> emit->insn_offset = emit->ptr - emit->buf;
> emit->ptr += sizeof(unsigned);
> return TRUE;
>  }
>  
>  
> -static boolean svga_shader_emit_header( struct svga_shader_emitter *emit )
> +static boolean
> +svga_shader_emit_header(struct svga_shader_emitter *emit)
>  {
> SVGA3dShaderVersion header;
>  
> -   memset( &header, 0, sizeof header );
> +   memset(&header, 0, sizeof header);
>  
> switch (emit->unit) {
> case PIPE_SHADER_FRAGMENT:
> @@ -152,8 +160,8 @@ static boolean svga_shader_emit_header( struct 
> svga_shader_emitter *emit )
>header.value = SVGA3D_VS_30;
>break;
> }
> - 
> -   return svga_shader_emit_dword( emit, header.value );
> +
> +   return svga_shader_emit_dword(emit, header.value);
>  }
>  
>  
> @@ -248,17 +256,17 @@ svga_remap_generic_index(int8_t 
> remap_table[MAX_GENERIC_VARYING],
>  }
>  
>  
> -/* Parse TGSI shader and translate to SVGA/DX9 serialized
> - * representation.  
> +/**
> + * Parse TGSI shader and translate to SVGA/DX9 serialized
> + * representation.
>   *
>   * In this function SVGA shader is emitted to an in-memory buffer that
>   * can be dynamically grown.  Once we've finished and know how large
>   * it is, it will be copied to a hardware buffer for upload.
>   */
>  static struct svga_shader_result *
> -svga_tgsi_translate( const struct svga

Re: [Mesa-dev] [PATCH 04/14] gallium: Document packed formats

2013-06-19 Thread Jose Fonseca
Series looks good to me too.

Jose

- Original Message -
> Signed-off-by: Adam Jackson 
> ---
>  src/gallium/docs/format.rst | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/docs/format.rst b/src/gallium/docs/format.rst
> index 2514656..e270d93 100644
> --- a/src/gallium/docs/format.rst
> +++ b/src/gallium/docs/format.rst
> @@ -6,7 +6,14 @@ Gallium format names mostly follow D3D10 conventions, with
> some extensions.
>  Format names like XnYnZnWn have the X component in the lowest-address n bits
>  and the W component in the highest-address n bits; for B8G8R8A8, byte 0 is
>  blue and byte 3 is alpha.  Note that platform endianness is not considered
> -in this definition.
> +in this definition.  In C:
> +
> +struct xyzw { uint8_t x, y, z, w; };
> +
> +Format aliases like XYZWstrq are (s+t+r+q)-bit integers in host endianness,
> +with the X component in the s least-significant bits of the integer.  In C:
> +
> +uint32_t xyzw = (x << 0) | (y << 8) | (z << 16) | (w << 24);
>  
>  Format suffixes affect the interpretation of the channel:
>  
> --
> 1.8.2.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/8] util: big-endian fixes for format generator

2013-06-19 Thread Jose Fonseca

- Original Message -
> This series is a replacement for the util part of:
> 
> http://lists.freedesktop.org/archives/mesa-dev/2013-May/039419.html
> 
> It doesn't include any of the controversial format name parts of
> those patches (although of course they're still needed in some form).

I've skimmed through the series and looks good to me.

> The original version took the endianness from the build system and
> only produced output for that endianness.  This version instead
> generates both big- and little-endian code, with preprocessor
> guards to choose the right version.

Yes, this is better -- it means that cross compilation between big-little 
endian machines (eg, powerpc -> mingw) will work well.

Jose

> The x8y8z8w8 formats could be handled by adding separate big- and
> little-endian shift amounts to u_format_parse.Channel.  The patches
> go a bit further and separate the entire channel and swizzle lists.
> That is, u_format_parse.Format now has separate channel and swizzle
> lists for each endianness.
> 
> The reason for this is that (AFAICT) depth-and-stencil formats are
> universally treated as uint32_t-based or (float x uint32_t)-based.
> E.g. PIPE_FORMAT_Z24_UNORM_S8_UINT is always a 32-bit int with the depth
> in the lower 24 bits and the stencil in the upper 8 bits.  (This is
> called S8_Z24 in mesa and elsewhere.)  The easiest way of dealing with
> that seemed to be to add the big-endian form directly to u_format.csv,
> as with the attached patch.  No other changes seem to be needed to support
> these particular formats on big-endian.
> 
> I'm not submitting the patch below yet because it doesn't make sense
> without the other endianness changes.  When applied on top of those
> changes though, it fixes glxgears on System z.  It also fixes many
> piglit tests.
> 
> No piglit regressions on x86_64.
> 
> Thanks,
> Richard
> 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 03/13] gallium: Introduce 32-bit bytewise format names

2013-06-19 Thread Jose Fonseca


- Original Message -
> Jose Fonseca  writes:
> > - Original Message -
> >> On Sun, 2013-06-16 at 10:22 -0700, Jose Fonseca wrote:
> >> 
> >> > Ok. I think this patch series is sound from an implementation POV. I
> >> > see no point in delaying further. We can tweak things afterwards if
> >> > deemed necessary.
> >> > 
> >> > Lets squash the commits, rename the XYZW formats to go from
> >> > low->high bit, and commit this into master.
> 
> Thanks.
> 
> >> Just so I'm clear, when you say "go from low to high bit", you mean that
> >> in the new format names the first component in the name maps to the
> >> least-significant bits in memory, thus:
> >> 
> >> #if defined(PIPE_ARCH_LITTLE_ENDIAN)
> >> #define PIPE_FORMAT_RGBA_UNORM PIPE_FORMAT_R8G8B8A8_UNORM
> >> #elif defined(PIPE_ARCH_BIG_ENDIAN)
> >> #define PIPE_FORMAT_RGBA_UNORM PIPE_FORMAT_A8B8G8R8_UNORM
> >> #endif
> >>
> >> If I have this correct I'll send out a corrected series so we can get
> >> this landed.
> >
> > Yep. That seems correct to me.
> >
> >   PIPE_FORMAT_XYZW_UNORM goes from least to most significant bit in
> > integer: uint32 color = (X << 0) | (Y << 8) | ...
> >
> >   PIPE_FORMAT_X8Y8Z8W8_UNORM matches memory ordering: uint8 color[4] =
> > {X, Y, Z, W};
> >
> >> (The patch series as given plugs the XYZW formats into enum
> >> pipe_format; I think it'd be clearer to do the #define the other way
> >> around at the end, as I have it above, since that way enum pipe_format
> >> remains endian-independent.)
> >
> > I didn't want to nitpick any further than I already have so far, but I
> > agree with you entirely.
> 
> What do you think about the alternative "util: big-endian fixes for
> format generator" implementation of the python bits:
> 
>   http://lists.freedesktop.org/archives/mesa-dev/2013-June/040594.html
> 
> ?  Would it be OK to use that instead of the original? 

Yep, that sounds good. Separate reply shortly.


> I have some
> follow-up patches for things like the zs, 565 and 10/10/10/2 formats
> that apply on top of that series.
> 
> Thanks,
> Richard

I will try to a least skim over them quickly

I haven't been able to keep up with email / todo things lately. I have too much 
going on. I need to cut the number of things and spend less time on them. It's 
easy for little things, but large reviews are particularly hard to slot in my 
schedule -- risk of regression, a lot of effort put in, so a minimal ammount of 
care and attention is necessary. But I can't afford to micro manage / nitpick 
anymore

Jose
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 8/8] svga: add some comments about primitive conversion

2013-06-19 Thread Brian Paul
And clean up the svga_translate_prim() function with better
variable names.
---
 src/gallium/drivers/svga/svga_draw_arrays.c  |5 
 src/gallium/drivers/svga/svga_draw_private.h |   32 +-
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_draw_arrays.c 
b/src/gallium/drivers/svga/svga_draw_arrays.c
index a506371..7adefd0 100644
--- a/src/gallium/drivers/svga/svga_draw_arrays.c
+++ b/src/gallium/drivers/svga/svga_draw_arrays.c
@@ -225,6 +225,7 @@ svga_hwtnl_draw_arrays( struct svga_hwtnl *hwtnl,
if (hwtnl->api_fillmode != PIPE_POLYGON_MODE_FILL &&
prim >= PIPE_PRIM_TRIANGLES)
{
+  /* Convert unfilled polygons into points, lines, triangles */
   gen_type = u_unfilled_generator( prim,
start,
count,
@@ -235,6 +236,10 @@ svga_hwtnl_draw_arrays( struct svga_hwtnl *hwtnl,
&gen_func );
}
else {
+  /* Convert PIPE_PRIM_LINE_LOOP to PIPE_PRIM_LINESTRIP,
+   * convert PIPE_PRIM_POLYGON to PIPE_PRIM_TRIANGLE_FAN,
+   * etc, if needed (as determined by svga_hw_prims mask).
+   */
   gen_type = u_index_generator( svga_hw_prims,
 prim,
 start,
diff --git a/src/gallium/drivers/svga/svga_draw_private.h 
b/src/gallium/drivers/svga/svga_draw_private.h
index 8126f7e..1b05403 100644
--- a/src/gallium/drivers/svga/svga_draw_private.h
+++ b/src/gallium/drivers/svga/svga_draw_private.h
@@ -35,7 +35,10 @@
 struct svga_context;
 struct u_upload_mgr;
 
-/* Should include polygon?
+/**
+ * Mask indicating which types of gallium primitives are actually
+ * handled by the svga device.  Other types will be converted to
+ * these types by the index/translation code.
  */
 static const unsigned svga_hw_prims = 
((1 << PIPE_PRIM_POINTS) |
@@ -46,38 +49,45 @@ static const unsigned svga_hw_prims =
 (1 << PIPE_PRIM_TRIANGLE_FAN));
 
 
-static INLINE unsigned svga_translate_prim(unsigned mode, 
-   unsigned count,
-   unsigned *out_count)
+/**
+ * Translate a gallium PIPE_PRIM_x value to an SVGA3D_PRIMITIVE_x value.
+ * Also, compute the number of primitives that'll be drawn given a
+ * vertex count.
+ * Note that this function doesn't have to handle PIPE_PRIM_LINE_LOOP,
+ * PIPE_PRIM_QUADS, PIPE_PRIM_QUAD_STRIP or PIPE_PRIM_POLYGON.  We convert
+ * those to other types of primitives with index/translation code.
+ */
+static INLINE unsigned
+svga_translate_prim(unsigned mode, unsigned vcount,unsigned *prim_count)
 {
switch (mode) {
case PIPE_PRIM_POINTS:
-  *out_count = count;
+  *prim_count = vcount;
   return SVGA3D_PRIMITIVE_POINTLIST;
 
case PIPE_PRIM_LINES:
-  *out_count = count / 2;
+  *prim_count = vcount / 2;
   return SVGA3D_PRIMITIVE_LINELIST; 
 
case PIPE_PRIM_LINE_STRIP:
-  *out_count = count - 1;
+  *prim_count = vcount - 1;
   return SVGA3D_PRIMITIVE_LINESTRIP; 
 
case PIPE_PRIM_TRIANGLES:
-  *out_count = count / 3;
+  *prim_count = vcount / 3;
   return SVGA3D_PRIMITIVE_TRIANGLELIST; 
 
case PIPE_PRIM_TRIANGLE_STRIP:
-  *out_count = count - 2;
+  *prim_count = vcount - 2;
   return SVGA3D_PRIMITIVE_TRIANGLESTRIP; 
 
case PIPE_PRIM_TRIANGLE_FAN:
-  *out_count = count - 2;
+  *prim_count = vcount - 2;
   return SVGA3D_PRIMITIVE_TRIANGLEFAN; 
 
default:
   assert(0);
-  *out_count = 0;
+  *prim_count = 0;
   return 0;
}
 }
-- 
1.7.10.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/8] svga: whitespace, comment, formatting fixes in svga_tgsi_emit.h

2013-06-19 Thread Brian Paul
---
 src/gallium/drivers/svga/svga_tgsi_emit.h |   98 -
 1 file changed, 56 insertions(+), 42 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_tgsi_emit.h 
b/src/gallium/drivers/svga/svga_tgsi_emit.h
index 949c39d..e36a955 100644
--- a/src/gallium/drivers/svga/svga_tgsi_emit.h
+++ b/src/gallium/drivers/svga/svga_tgsi_emit.h
@@ -38,16 +38,18 @@ struct src_register
 };
 
 
-struct svga_arl_consts {
+struct svga_arl_consts
+{
int number;
int idx;
int swizzle;
int arl_num;
 };
 
-/* Internal functions:
- */
 
+/**
+ * This is the context/state used during TGSI->SVGA shader translation.
+ */
 struct svga_shader_emitter
 {
unsigned size;
@@ -63,7 +65,7 @@ struct svga_shader_emitter
int nr_hw_float_const;
int nr_hw_int_const;
int nr_hw_temp;
-   
+
int insn_offset;
 
int internal_temp_count;
@@ -129,28 +131,24 @@ struct svga_shader_emitter
 };
 
 
-boolean svga_shader_emit_dword( struct svga_shader_emitter *emit,
-unsigned dword );
+boolean
+svga_shader_emit_dword(struct svga_shader_emitter *emit, unsigned dword);
 
-boolean svga_shader_emit_dwords( struct svga_shader_emitter *emit,
- const unsigned *dwords,
- unsigned nr );
+boolean
+svga_shader_emit_dwords(struct svga_shader_emitter *emit,
+const unsigned *dwords, unsigned nr);
 
-boolean svga_shader_emit_opcode( struct svga_shader_emitter *emit,
- unsigned opcode );
+boolean
+svga_shader_emit_opcode(struct svga_shader_emitter *emit,
+unsigned opcode);
 
-boolean svga_shader_emit_instructions( struct svga_shader_emitter *emit,
-   const struct tgsi_token *tokens );
+boolean
+svga_shader_emit_instructions(struct svga_shader_emitter *emit,
+  const struct tgsi_token *tokens);
 
-boolean svga_translate_decl_sm30( struct svga_shader_emitter *emit,
-   const struct tgsi_full_declaration *decl );
-
-
-static INLINE boolean emit_instruction( struct svga_shader_emitter *emit,
- SVGA3dShaderInstToken opcode )
-{
-   return svga_shader_emit_opcode( emit, opcode.value );
-}
+boolean
+svga_translate_decl_sm30(struct svga_shader_emitter *emit,
+ const struct tgsi_full_declaration *decl);
 
 
 #define TRANSLATE_SWIZZLE(x,y,z,w)  ((x) | ((y) << 2) | ((z) << 4) | ((w) << 
6))
@@ -166,9 +164,18 @@ static INLINE boolean emit_instruction( struct 
svga_shader_emitter *emit,
  TRANSLATE_SWIZZLE(TGSI_SWIZZLE_W,TGSI_SWIZZLE_W,TGSI_SWIZZLE_W,TGSI_SWIZZLE_W)
 
 
+/** Emit the given SVGA3dShaderInstToken opcode */
+static INLINE boolean
+emit_instruction(struct svga_shader_emitter *emit,
+ SVGA3dShaderInstToken opcode)
+{
+   return svga_shader_emit_opcode(emit, opcode.value);
+}
 
+
+/** Generate a SVGA3dShaderInstToken for the given SVGA3D shader opcode */
 static INLINE SVGA3dShaderInstToken
-inst_token( unsigned opcode )
+inst_token(unsigned opcode)
 {
SVGA3dShaderInstToken inst;
 
@@ -184,9 +191,8 @@ inst_token( unsigned opcode )
  * Note that this function is used to create tokens for output registers,
  * temp registers AND constants (see emit_def_const()).
  */
-static INLINE SVGA3dShaderDestToken 
-dst_register( unsigned file,
-  int number )
+static INLINE SVGA3dShaderDestToken
+dst_register(unsigned file, int number)
 {
SVGA3dShaderDestToken dest;
 
@@ -204,13 +210,17 @@ dst_register( unsigned file,
dest.shfScale = 0;
dest.type_lower = file & 0x7;
dest.reserved0 = 1;  /* is_reg */
-   
+
return dest;
 }
 
+
+/**
+ * Apply a writemask to the given SVGA3dShaderDestToken, returning a
+ * new SVGA3dShaderDestToken.
+ */
 static INLINE SVGA3dShaderDestToken
-writemask( SVGA3dShaderDestToken dest,
-   unsigned mask )
+writemask(SVGA3dShaderDestToken dest, unsigned mask)
 {
assert(dest.mask & mask);
dest.mask &= mask;
@@ -218,8 +228,9 @@ writemask( SVGA3dShaderDestToken dest,
 }
 
 
-static INLINE SVGA3dShaderSrcToken 
-src_token( unsigned file, int number )
+/** Create a SVGA3dShaderSrcToken given a register file and number */
+static INLINE SVGA3dShaderSrcToken
+src_token(unsigned file, int number)
 {
SVGA3dShaderSrcToken src;
 
@@ -241,28 +252,31 @@ src_token( unsigned file, int number )
 }
 
 
-static INLINE struct src_register 
-src_register( unsigned file, int number )
+/** Create a src_register given a register file and register number */
+static INLINE struct src_register
+src_register(unsigned file, int number)
 {
struct src_register src;
-   
-   src.base = src_token( file, number );
+
+   src.base = src_token(file, number);
src.indirect.value = 0;
 
return src;
 }
 
-static INLINE SVGA3dShaderDestToken dst( struct src_register src )
+/** Translate src_register into SVGA3dShaderDestToken */

[Mesa-dev] [PATCH 7/8] indices: add some comments

2013-06-19 Thread Brian Paul
This is pretty complicated code with few/any comments.  Here's a first stab.
---
 src/gallium/auxiliary/indices/u_indices.c  |   23 +---
 src/gallium/auxiliary/indices/u_unfilled_indices.c |9 +++-
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/src/gallium/auxiliary/indices/u_indices.c 
b/src/gallium/auxiliary/indices/u_indices.c
index 03d7453..72c46f7 100644
--- a/src/gallium/auxiliary/indices/u_indices.c
+++ b/src/gallium/auxiliary/indices/u_indices.c
@@ -150,9 +150,26 @@ int u_index_translator( unsigned hw_mask,
 }
 
 
-
-
-
+/**
+ * If a driver does not support a particular gallium primitive type
+ * (such as PIPE_PRIM_QUAD_STRIP) this function can be used to help
+ * convert the primitive into a simpler type (like PIPE_PRIM_TRIANGLES).
+ *
+ * The generator functions generates a number of ushort or uint indexes
+ * for drawing the new type of primitive.
+ *
+ * \param hw_mask  a bitmask of (1 << PIPE_PRIM_x) values that indicates
+ * kind of primitives are supported by the driver.
+ * \param prim  the PIPE_PRIM_x that the user wants to draw
+ * \param start  index of first vertex to draw
+ * \param nr  number of vertices to draw
+ * \param in_pv  user's provoking vertex (PV_FIRST/LAST)
+ * \param out_pv  desired proking vertex for the hardware (PV_FIRST/LAST)
+ * \param out_prim  returns the new primitive type for the driver
+ * \param out_index_size  returns OUT_USHORT or OUT_UINT
+ * \param out_nr  returns new number of vertices to draw
+ * \param out_generate  returns pointer to the generator function
+ */
 int u_index_generator( unsigned hw_mask,
unsigned prim,
unsigned start,
diff --git a/src/gallium/auxiliary/indices/u_unfilled_indices.c 
b/src/gallium/auxiliary/indices/u_unfilled_indices.c
index c353717..25c61d9 100644
--- a/src/gallium/auxiliary/indices/u_unfilled_indices.c
+++ b/src/gallium/auxiliary/indices/u_unfilled_indices.c
@@ -151,7 +151,14 @@ int u_unfilled_translator( unsigned prim,
 }
 
 
-
+/**
+ * Utility for converting unfilled polygons into points, lines, triangles.
+ * Few drivers have direct support for OpenGL's glPolygonMode.
+ * This function helps with converting triangles into points or lines
+ * when the front and back fill modes are the same.  When there's
+ * different front/back fill modes, that can be handled with the
+ * 'draw' module.
+ */
 int u_unfilled_generator( unsigned prim,
   unsigned start,
   unsigned nr,
-- 
1.7.10.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/8] svga: reindent svga_tgsi.c

2013-06-19 Thread Brian Paul
---
 src/gallium/drivers/svga/svga_tgsi.c |  128 +-
 1 file changed, 65 insertions(+), 63 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_tgsi.c 
b/src/gallium/drivers/svga/svga_tgsi.c
index 8bea7f8..56529c6 100644
--- a/src/gallium/drivers/svga/svga_tgsi.c
+++ b/src/gallium/drivers/svga/svga_tgsi.c
@@ -50,7 +50,8 @@
 static char err_buf[128];
 
 #if 0
-static void svga_destroy_shader_emitter( struct svga_shader_emitter *emit )
+static void
+svga_destroy_shader_emitter(struct svga_shader_emitter *emit)
 {
if (emit->buf != err_buf)
   FREE(emit->buf);
@@ -58,12 +59,13 @@ static void svga_destroy_shader_emitter( struct 
svga_shader_emitter *emit )
 #endif
 
 
-static boolean svga_shader_expand( struct svga_shader_emitter *emit )
+static boolean
+svga_shader_expand(struct svga_shader_emitter *emit)
 {
char *new_buf;
unsigned newsize = emit->size * 2;
 
-   if(emit->buf != err_buf)
+   if (emit->buf != err_buf)
   new_buf = REALLOC(emit->buf, emit->size, newsize);
else
   new_buf = NULL;
@@ -79,70 +81,76 @@ static boolean svga_shader_expand( struct 
svga_shader_emitter *emit )
emit->ptr = new_buf + (emit->ptr - emit->buf);
emit->buf = new_buf;
return TRUE;
-}   
+}
+
 
-static INLINE boolean reserve(  struct svga_shader_emitter *emit,
-unsigned nr_dwords )
+static INLINE boolean
+reserve(struct svga_shader_emitter *emit, unsigned nr_dwords)
 {
if (emit->ptr - emit->buf + nr_dwords * sizeof(unsigned) >= emit->size) {
-  if (!svga_shader_expand( emit ))
+  if (!svga_shader_expand(emit)) {
  return FALSE;
+  }
}
 
return TRUE;
 }
 
-boolean svga_shader_emit_dword( struct svga_shader_emitter *emit,
-unsigned dword )
+
+boolean
+svga_shader_emit_dword(struct svga_shader_emitter * emit, unsigned dword)
 {
if (!reserve(emit, 1))
   return FALSE;
 
-   *(unsigned *)emit->ptr = dword;
+   *(unsigned *) emit->ptr = dword;
emit->ptr += sizeof dword;
return TRUE;
 }
 
-boolean svga_shader_emit_dwords( struct svga_shader_emitter *emit,
- const unsigned *dwords,
- unsigned nr )
+
+boolean
+svga_shader_emit_dwords(struct svga_shader_emitter * emit,
+const unsigned *dwords, unsigned nr)
 {
if (!reserve(emit, nr))
   return FALSE;
 
-   memcpy( emit->ptr, dwords, nr * sizeof *dwords );
+   memcpy(emit->ptr, dwords, nr * sizeof *dwords);
emit->ptr += nr * sizeof *dwords;
return TRUE;
 }
 
-boolean svga_shader_emit_opcode( struct svga_shader_emitter *emit,
- unsigned opcode )
+
+boolean
+svga_shader_emit_opcode(struct svga_shader_emitter * emit, unsigned opcode)
 {
SVGA3dShaderInstToken *here;
 
if (!reserve(emit, 1))
   return FALSE;
 
-   here = (SVGA3dShaderInstToken *)emit->ptr;
+   here = (SVGA3dShaderInstToken *) emit->ptr;
here->value = opcode;
 
if (emit->insn_offset) {
-  SVGA3dShaderInstToken *prev = (SVGA3dShaderInstToken *)(emit->buf + 
-  
emit->insn_offset);
+  SVGA3dShaderInstToken *prev =
+ (SVGA3dShaderInstToken *) (emit->buf + emit->insn_offset);
   prev->size = (here - prev) - 1;
}
-   
+
emit->insn_offset = emit->ptr - emit->buf;
emit->ptr += sizeof(unsigned);
return TRUE;
 }
 
 
-static boolean svga_shader_emit_header( struct svga_shader_emitter *emit )
+static boolean
+svga_shader_emit_header(struct svga_shader_emitter *emit)
 {
SVGA3dShaderVersion header;
 
-   memset( &header, 0, sizeof header );
+   memset(&header, 0, sizeof header);
 
switch (emit->unit) {
case PIPE_SHADER_FRAGMENT:
@@ -152,8 +160,8 @@ static boolean svga_shader_emit_header( struct 
svga_shader_emitter *emit )
   header.value = SVGA3D_VS_30;
   break;
}
- 
-   return svga_shader_emit_dword( emit, header.value );
+
+   return svga_shader_emit_dword(emit, header.value);
 }
 
 
@@ -248,17 +256,17 @@ svga_remap_generic_index(int8_t 
remap_table[MAX_GENERIC_VARYING],
 }
 
 
-/* Parse TGSI shader and translate to SVGA/DX9 serialized
- * representation.  
+/**
+ * Parse TGSI shader and translate to SVGA/DX9 serialized
+ * representation.
  *
  * In this function SVGA shader is emitted to an in-memory buffer that
  * can be dynamically grown.  Once we've finished and know how large
  * it is, it will be copied to a hardware buffer for upload.
  */
 static struct svga_shader_result *
-svga_tgsi_translate( const struct svga_shader *shader,
- struct svga_compile_key key,
- unsigned unit )
+svga_tgsi_translate(const struct svga_shader *shader,
+struct svga_compile_key key, unsigned unit)
 {
struct svga_shader_result *result = NULL;
struct svga_shader_emitter emit;
@@ -275,10 +283,10 @@ svga_tgsi_translate( const struct sv

[Mesa-dev] [PATCH 4/8] svga: move some svga/tgsi functions

2013-06-19 Thread Brian Paul
Move some functions from the svga_tgsi_insn.h header into the
svga_tgsi_insn.c file since they're only used there.  Plus, add
comments and fix formatting.
---
 src/gallium/drivers/svga/svga_tgsi_decl_sm30.c |   20 
 src/gallium/drivers/svga/svga_tgsi_emit.h  |  125 ---
 src/gallium/drivers/svga/svga_tgsi_insn.c  |  129 
 3 files changed, 149 insertions(+), 125 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c 
b/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
index becd159..7faa275 100644
--- a/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
+++ b/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
@@ -491,6 +491,26 @@ vs30_output(struct svga_shader_emitter *emit,
 }
 
 
+/** Translate PIPE_TEXTURE_x to SVGA3DSAMP_x */
+static ubyte
+svga_tgsi_sampler_type(const struct svga_shader_emitter *emit, int idx)
+{
+   switch (emit->key.fkey.tex[idx].texture_target) {
+   case PIPE_TEXTURE_1D:
+  return SVGA3DSAMP_2D;
+   case PIPE_TEXTURE_2D:
+   case PIPE_TEXTURE_RECT:
+  return SVGA3DSAMP_2D;
+   case PIPE_TEXTURE_3D:
+  return SVGA3DSAMP_VOLUME;
+   case PIPE_TEXTURE_CUBE:
+  return SVGA3DSAMP_CUBE;
+   }
+
+   return SVGA3DSAMP_UNKNOWN;
+}
+
+
 static boolean
 ps30_sampler( struct svga_shader_emitter *emit,
   struct tgsi_declaration_semantic semantic,
diff --git a/src/gallium/drivers/svga/svga_tgsi_emit.h 
b/src/gallium/drivers/svga/svga_tgsi_emit.h
index 73c4cda..949c39d 100644
--- a/src/gallium/drivers/svga/svga_tgsi_emit.h
+++ b/src/gallium/drivers/svga/svga_tgsi_emit.h
@@ -146,30 +146,6 @@ boolean svga_translate_decl_sm30( struct 
svga_shader_emitter *emit,
const struct tgsi_full_declaration *decl );
 
 
-static INLINE boolean emit_dst( struct svga_shader_emitter *emit,
- SVGA3dShaderDestToken dest )
-{
-   assert(dest.reserved0);
-   assert(dest.mask);
-   return svga_shader_emit_dword( emit, dest.value );
-}
-
-static INLINE boolean emit_src( struct svga_shader_emitter *emit,
- const struct src_register src )
-{
-   if (src.base.relAddr) {
-  assert(src.base.reserved0);
-  assert(src.indirect.reserved0);
-  return (svga_shader_emit_dword( emit, src.base.value ) &&
-  svga_shader_emit_dword( emit, src.indirect.value ));
-   }
-   else {
-  assert(src.base.reserved0);
-  return svga_shader_emit_dword( emit, src.base.value );
-   }
-}
-
-
 static INLINE boolean emit_instruction( struct svga_shader_emitter *emit,
  SVGA3dShaderInstToken opcode )
 {
@@ -177,60 +153,6 @@ static INLINE boolean emit_instruction( struct 
svga_shader_emitter *emit,
 }
 
 
-static INLINE boolean emit_op1( struct svga_shader_emitter *emit,
- SVGA3dShaderInstToken inst,
- SVGA3dShaderDestToken dest,
- struct src_register src0 )
-{
-   return (emit_instruction( emit, inst ) &&
-   emit_dst( emit, dest ) &&
-   emit_src( emit, src0 ));
-}
-
-static INLINE boolean emit_op2( struct svga_shader_emitter *emit,
- SVGA3dShaderInstToken inst,
- SVGA3dShaderDestToken dest,
- struct src_register src0,
- struct src_register src1 )
-{
-   return (emit_instruction( emit, inst ) &&
-   emit_dst( emit, dest ) &&
-   emit_src( emit, src0 ) &&
-   emit_src( emit, src1 ));
-}
-
-static INLINE boolean emit_op3( struct svga_shader_emitter *emit,
- SVGA3dShaderInstToken inst,
- SVGA3dShaderDestToken dest,
- struct src_register src0,
- struct src_register src1,
- struct src_register src2 )
-{
-   return (emit_instruction( emit, inst ) &&
-   emit_dst( emit, dest ) &&
-   emit_src( emit, src0 ) &&
-   emit_src( emit, src1 ) &&
-   emit_src( emit, src2 ));
-}
-
-
-static INLINE boolean emit_op4( struct svga_shader_emitter *emit,
-SVGA3dShaderInstToken inst,
-SVGA3dShaderDestToken dest,
-struct src_register src0,
-struct src_register src1,
-struct src_register src2,
-struct src_register src3)
-{
-   return (emit_instruction( emit, inst ) &&
-   emit_dst( emit, dest ) &&
-   emit_src( emit, src0 ) &&
-   emit_src( emit, src1 ) &&
-   emit_src( emit, src2 ) &&
-   emit_src( emit, src3 ));
-}
-
-
 #define TRANSLATE_SWIZZLE(x,y,z,w)  ((x) | ((y) << 2) | ((z) << 4) | ((w) << 
6))
 #define SWIZZLE_XYZW  \
  TRANSLATE_SWIZZLE(TGSI_SWIZZLE_X,TGSI_SWIZZLE_Y,TGSI_SWIZZLE_Z,TGSI_SWIZZLE_W)
@@ -320,36 +242,6 @@ src_token( unsigned file, int number )
 
 
 static IN

[Mesa-dev] [PATCH 3/8] svga: formatting fixes in svga_tgsi_insn.c

2013-06-19 Thread Brian Paul
---
 src/gallium/drivers/svga/svga_tgsi_insn.c |  655 +
 1 file changed, 395 insertions(+), 260 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_tgsi_insn.c 
b/src/gallium/drivers/svga/svga_tgsi_insn.c
index f94f3b2..1e0579d 100644
--- a/src/gallium/drivers/svga/svga_tgsi_insn.c
+++ b/src/gallium/drivers/svga/svga_tgsi_insn.c
@@ -39,8 +39,7 @@ static boolean emit_ps_postamble( struct svga_shader_emitter 
*emit );
 
 
 static unsigned
-translate_opcode(
-   uint opcode )
+translate_opcode(uint opcode)
 {
switch (opcode) {
case TGSI_OPCODE_ABS:return SVGA3DOP_ABS;
@@ -64,7 +63,8 @@ translate_opcode(
 }
 
 
-static unsigned translate_file( unsigned file )
+static unsigned
+translate_file(unsigned file)
 {
switch (file) {
case TGSI_FILE_TEMPORARY: return SVGA3DREG_TEMP;
@@ -134,6 +134,7 @@ swizzle( struct src_register src,
return src;
 }
 
+
 static struct src_register
 scalar( struct src_register src,
 int comp )
@@ -141,7 +142,8 @@ scalar( struct src_register src,
return swizzle( src, comp, comp, comp, comp );
 }
 
-static INLINE boolean
+
+static boolean
 svga_arl_needs_adjustment( const struct svga_shader_emitter *emit )
 {
int i;
@@ -153,7 +155,8 @@ svga_arl_needs_adjustment( const struct svga_shader_emitter 
*emit )
return FALSE;
 }
 
-static INLINE int
+
+static int
 svga_arl_adjustment( const struct svga_shader_emitter *emit )
 {
int i;
@@ -165,6 +168,7 @@ svga_arl_adjustment( const struct svga_shader_emitter *emit 
)
return 0;
 }
 
+
 static struct src_register
 translate_src_register( const struct svga_shader_emitter *emit,
 const struct tgsi_full_src_register *reg )
@@ -190,7 +194,6 @@ translate_src_register( const struct svga_shader_emitter 
*emit,
default:
   src = src_register( translate_file( reg->Register.File ),
   reg->Register.Index );
-
   break;
}
 
@@ -258,7 +261,7 @@ translate_src_register( const struct svga_shader_emitter 
*emit,
  * Note: if we exceed the temporary register limit we just use
  * register SVGA3D_TEMPREG_MAX - 1.
  */
-static INLINE SVGA3dShaderDestToken
+static SVGA3dShaderDestToken
 get_temp( struct svga_shader_emitter *emit )
 {
int i = emit->nr_hw_temp + emit->internal_temp_count++;
@@ -267,11 +270,13 @@ get_temp( struct svga_shader_emitter *emit )
return dst_register( SVGA3DREG_TEMP, i );
 }
 
-/* Release a single temp.  Currently only effective if it was the last
+
+/**
+ * Release a single temp.  Currently only effective if it was the last
  * allocated temp, otherwise release will be delayed until the next
  * call to reset_temp_regs().
  */
-static INLINE void
+static void
 release_temp( struct svga_shader_emitter *emit,
   SVGA3dShaderDestToken temp )
 {
@@ -279,7 +284,9 @@ release_temp( struct svga_shader_emitter *emit,
   emit->internal_temp_count--;
 }
 
-static void reset_temp_regs( struct svga_shader_emitter *emit )
+
+static void
+reset_temp_regs(struct svga_shader_emitter *emit)
 {
emit->internal_temp_count = 0;
 }
@@ -290,9 +297,10 @@ static void reset_temp_regs( struct svga_shader_emitter 
*emit )
  * important given that several opcodes have constraints in the allowed
  * swizzles).
  */
-static boolean emit_repl( struct svga_shader_emitter *emit,
-  SVGA3dShaderDestToken dst,
-  struct src_register *src0)
+static boolean
+emit_repl(struct svga_shader_emitter *emit,
+  SVGA3dShaderDestToken dst,
+  struct src_register *src0)
 {
unsigned src0_swizzle;
unsigned chan;
@@ -320,35 +328,40 @@ static boolean emit_repl( struct svga_shader_emitter 
*emit,
 }
 
 
-static boolean submit_op0( struct svga_shader_emitter *emit,
-   SVGA3dShaderInstToken inst,
-   SVGA3dShaderDestToken dest )
+static boolean
+submit_op0(struct svga_shader_emitter *emit,
+   SVGA3dShaderInstToken inst,
+   SVGA3dShaderDestToken dest)
 {
return (emit_instruction( emit, inst ) &&
emit_dst( emit, dest ));
 }
 
-static boolean submit_op1( struct svga_shader_emitter *emit,
-   SVGA3dShaderInstToken inst,
-   SVGA3dShaderDestToken dest,
-   struct src_register src0 )
+
+static boolean
+submit_op1(struct svga_shader_emitter *emit,
+   SVGA3dShaderInstToken inst,
+   SVGA3dShaderDestToken dest,
+   struct src_register src0)
 {
return emit_op1( emit, inst, dest, src0 );
 }
 
 
-/* SVGA shaders may not refer to >1 constant register in a single
+/**
+ * SVGA shaders may not refer to >1 constant register in a single
  * instruction.  This function checks for that usage and inserts a
  * move to temporary if detected.
  *
  * The same applies to input registers -- at most a single input
  * register may be read by any instruction.
  */
-static boolean submit_op2( struc

[Mesa-dev] [PATCH 2/8] mesa: wrap comments, code to 78 columns in multisample.c

2013-06-19 Thread Brian Paul
---
 src/mesa/main/multisample.c |   17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/multisample.c b/src/mesa/main/multisample.c
index f4116ca..8b974c1 100644
--- a/src/mesa/main/multisample.c
+++ b/src/mesa/main/multisample.c
@@ -120,7 +120,8 @@ _mesa_SampleMaski(GLuint index, GLbitfield mask)
 }
 
 
-/* Helper for checking a requested sample count against the limit
+/**
+ * Helper for checking a requested sample count against the limit
  * for a particular (target, internalFormat) pair. The limit imposed,
  * and the error generated, both depend on which extensions are supported.
  *
@@ -131,8 +132,9 @@ GLenum
 _mesa_check_sample_count(struct gl_context *ctx, GLenum target,
  GLenum internalFormat, GLsizei samples)
 {
-   /* If ARB_internalformat_query is supported, then treat its highest 
returned sample
-* count as the absolute maximum for this format; it is allowed to exceed 
MAX_SAMPLES.
+   /* If ARB_internalformat_query is supported, then treat its highest
+* returned sample count as the absolute maximum for this format; it is
+* allowed to exceed MAX_SAMPLES.
 *
 * From the ARB_internalformat_query spec:
 *
@@ -141,7 +143,8 @@ _mesa_check_sample_count(struct gl_context *ctx, GLenum 
target,
 */
if (ctx->Extensions.ARB_internalformat_query) {
   GLint buffer[16];
-  int count = ctx->Driver.QuerySamplesForFormat(ctx, target, 
internalFormat, buffer);
+  int count = ctx->Driver.QuerySamplesForFormat(ctx, target,
+internalFormat, buffer);
   int limit = count ? buffer[0] : -1;
 
   return samples > limit ? GL_INVALID_OPERATION : GL_NO_ERROR;
@@ -159,7 +162,8 @@ _mesa_check_sample_count(struct gl_context *ctx, GLenum 
target,
 *
 * And when describing the operation of TexImage*Multisample:
 *
-* "The error INVALID_OPERATION may be generated if any of the following 
are true:
+* "The error INVALID_OPERATION may be generated if any of the following
+* are true:
 *
 * *  is a depth/stencil-renderable format and 
 *   is greater than the value of MAX_DEPTH_TEXTURE_SAMPLES
@@ -171,7 +175,8 @@ _mesa_check_sample_count(struct gl_context *ctx, GLenum 
target,
 
if (ctx->Extensions.ARB_texture_multisample) {
   if (_mesa_is_enum_format_integer(internalFormat))
- return samples > ctx->Const.MaxIntegerSamples ? GL_INVALID_OPERATION 
: GL_NO_ERROR;
+ return samples > ctx->Const.MaxIntegerSamples
+? GL_INVALID_OPERATION : GL_NO_ERROR;
 
   if (target == GL_TEXTURE_2D_MULTISAMPLE ||
   target == GL_TEXTURE_2D_MULTISAMPLE_ARRAY) {
-- 
1.7.10.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/8] mesa: remove unused BITSET64 macros

2013-06-19 Thread Brian Paul
---
 src/mesa/main/bitset.h |   61 
 1 file changed, 61 deletions(-)

diff --git a/src/mesa/main/bitset.h b/src/mesa/main/bitset.h
index c3b060b..601fd0e 100644
--- a/src/mesa/main/bitset.h
+++ b/src/mesa/main/bitset.h
@@ -96,65 +96,4 @@ __bitset_ffs(const BITSET_WORD *x, int n)
 
 #define BITSET_FFS(x) __bitset_ffs(x, Elements(x))
 
-/
- * 64-bit bitset implementation
- */
- 
-#define BITSET64_WORD GLuint
-#define BITSET64_WORDBITS (sizeof (BITSET64_WORD) * 8)
-
-/* bitset declarations
- */
-#define BITSET64_DECLARE(name, size) \
-   GLuint name[2]
-
-/* bitset operations
- */
-#define BITSET64_COPY(x, y) do { (x)[0] = (y)[0]; (x)[1] = (y)[1]; } while (0)
-#define BITSET64_EQUAL(x, y) ( (x)[0] == (y)[0] && (x)[1] == (y)[1] )
-#define BITSET64_ZERO(x) do { (x)[0] = 0; (x)[1] = 0; } while (0)
-#define BITSET64_ONES(x) do { (x)[0] = 0xFF; (x)[1] = 0xFF; } while (0)
-
-#define BITSET64_BITWORD(b) ((b) / BITSET64_WORDBITS)
-#define BITSET64_BIT(b) (1 << ((b) % BITSET64_WORDBITS))
-
-/* single bit operations
- */
-#define BITSET64_TEST(x, b) ((x)[BITSET64_BITWORD(b)] & BITSET64_BIT(b))
-#define BITSET64_SET(x, b) ((x)[BITSET64_BITWORD(b)] |= BITSET64_BIT(b))
-#define BITSET64_CLEAR(x, b) ((x)[BITSET64_BITWORD(b)] &= ~BITSET64_BIT(b))
-
-#define BITSET64_MASK(b) ((b) == BITSET64_WORDBITS ? ~0 : BITSET64_BIT(b) - 1)
-#define BITSET64_RANGE(b, e) (BITSET64_MASK((e) + 1) & ~BITSET64_MASK(b))
-
-/* bit range operations
- */
-#define BITSET64_TEST_SUBRANGE(x, b, e) \
-   (BITSET64_BITWORD(b) == BITSET64_BITWORD(e) ? \
-   ((x)[BITSET64_BITWORD(b)] & BITSET64_RANGE(b, e)) : \
-   (assert (!"BITSET64_TEST_RANGE: bit range crosses word boundary"), 0))
-#define BITSET64_TEST_RANGE(x, b, e) \
-   (BITSET64_BITWORD(b) == BITSET64_BITWORD(e) ? \
-   (BITSET64_TEST_SUBRANGE(x, b, e)) : \
-   (BITSET64_TEST_SUBRANGE(x, b, BITSET64_WORDBITS - 1) | \
-BITSET64_TEST_SUBRANGE(x, BITSET64_WORDBITS, e)))
-#define BITSET64_SET_SUBRANGE(x, b, e) \
-   (BITSET64_BITWORD(b) == BITSET64_BITWORD(e) ? \
-   ((x)[BITSET64_BITWORD(b)] |= BITSET64_RANGE(b, e)) : \
-   (assert (!"BITSET64_SET_RANGE: bit range crosses word boundary"), 0))
-#define BITSET64_SET_RANGE(x, b, e) \
-   (BITSET64_BITWORD(b) == BITSET64_BITWORD(e) ? \
-   (BITSET64_SET_SUBRANGE(x, b, e)) : \
-   (BITSET64_SET_SUBRANGE(x, b, BITSET64_WORDBITS - 1) | \
-BITSET64_SET_SUBRANGE(x, BITSET64_WORDBITS, e)))
-#define BITSET64_CLEAR_SUBRANGE(x, b, e) \
-   (BITSET64_BITWORD(b) == BITSET64_BITWORD(e) ? \
-   ((x)[BITSET64_BITWORD(b)] &= ~BITSET64_RANGE(b, e)) : \
-   (assert (!"BITSET64_CLEAR_RANGE: bit range crosses word boundary"), 0))
-#define BITSET64_CLEAR_RANGE(x, b, e) \
-   (BITSET64_BITWORD(b) == BITSET64_BITWORD(e) ? \
-   (BITSET64_CLEAR_SUBRANGE(x, b, e)) : \
-   (BITSET64_CLEAR_SUBRANGE(x, b, BITSET64_WORDBITS - 1) | \
-BITSET64_CLEAR_SUBRANGE(x, BITSET64_WORDBITS, e)))
-
 #endif
-- 
1.7.10.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] radeonsi: Handle TGSI_OPCODE_DDX/Y using local memory

2013-06-19 Thread Michel Dänzer
From: Michel Dänzer 

16 more little piglits.

Signed-off-by: Michel Dänzer 
---
 src/gallium/drivers/radeonsi/radeonsi_compute.c |   2 +-
 src/gallium/drivers/radeonsi/radeonsi_shader.c  | 101 +++-
 src/gallium/drivers/radeonsi/radeonsi_shader.h  |   1 +
 src/gallium/drivers/radeonsi/si_state_draw.c|   1 +
 4 files changed, 103 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_compute.c 
b/src/gallium/drivers/radeonsi/radeonsi_compute.c
index ed51587..a4a2856 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_compute.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_compute.c
@@ -191,7 +191,7 @@ static void radeonsi_launch_grid(
| S_00B84C_TGID_Z_EN(1)
| S_00B84C_TG_SIZE_EN(1)
| S_00B84C_TIDIG_COMP_CNT(2)
-   | S_00B84C_LDS_SIZE(0)
+   | S_00B84C_LDS_SIZE(shader->lds_size)
| S_00B84C_EXCP_EN(0))
;
si_pm4_set_reg(pm4, R_00B854_COMPUTE_RESOURCE_LIMITS, 0);
diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c 
b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 7f3d38b..fee6262 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -61,6 +61,9 @@ struct si_shader_context
unsigned type; /* TGSI_PROCESSOR_* specifies the type of shader. */
LLVMValueRef const_md;
LLVMValueRef const_resource;
+#if HAVE_LLVM >= 0x0304
+   LLVMValueRef ddxy_lds;
+#endif
LLVMValueRef *constants;
LLVMValueRef *resources;
LLVMValueRef *samplers;
@@ -82,6 +85,7 @@ static struct si_shader_context * si_shader_context(
 
 #define USE_SGPR_MAX_SUFFIX_LEN 5
 #define CONST_ADDR_SPACE 2
+#define LOCAL_ADDR_SPACE 3
 #define USER_SGPR_ADDR_SPACE 8
 
 /**
@@ -1065,6 +1069,78 @@ static void txq_fetch_args(
4);
 }
 
+#if HAVE_LLVM >= 0x0304
+
+static void si_llvm_emit_ddxy(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   struct si_shader_context *si_shader_ctx = si_shader_context(bld_base);
+   struct gallivm_state *gallivm = bld_base->base.gallivm;
+   struct lp_build_context * base = &bld_base->base;
+   const struct tgsi_full_instruction *inst = emit_data->inst;
+   unsigned opcode = inst->Instruction.Opcode;
+   LLVMValueRef indices[2];
+   LLVMValueRef store_ptr, load_ptr0, load_ptr1;
+   LLVMValueRef tl, trbl, result[4];
+   LLVMTypeRef i32;
+   unsigned swizzle[4];
+   unsigned c;
+
+   i32 = LLVMInt32TypeInContext(gallivm->context);
+
+   indices[0] = bld_base->uint_bld.zero;
+   indices[1] = build_intrinsic(gallivm->builder, "llvm.SI.tid", i32,
+NULL, 0, LLVMReadNoneAttribute);
+   store_ptr = LLVMBuildGEP(gallivm->builder, si_shader_ctx->ddxy_lds,
+indices, 2, "");
+
+   indices[1] = LLVMBuildAnd(gallivm->builder, indices[1],
+ lp_build_const_int32(gallivm, 0xfffc), 
"");
+   load_ptr0 = LLVMBuildGEP(gallivm->builder, si_shader_ctx->ddxy_lds,
+indices, 2, "");
+
+   indices[1] = LLVMBuildAdd(gallivm->builder, indices[1],
+ lp_build_const_int32(gallivm,
+  opcode == 
TGSI_OPCODE_DDX ? 1 : 2),
+ "");
+   load_ptr1 = LLVMBuildGEP(gallivm->builder, si_shader_ctx->ddxy_lds,
+indices, 2, "");
+
+   for (c = 0; c < 4; ++c) {
+   unsigned i;
+
+   swizzle[c] = 
tgsi_util_get_full_src_register_swizzle(&inst->Src[0], c);
+   for (i = 0; i < c; ++i) {
+   if (swizzle[i] == swizzle[c]) {
+   result[c] = result[i];
+   break;
+   }
+   }
+   if (i != c)
+   continue;
+
+   LLVMBuildStore(gallivm->builder,
+  LLVMBuildBitCast(gallivm->builder,
+   lp_build_emit_fetch(bld_base, 
inst, 0, c),
+   i32, ""),
+  store_ptr);
+
+   tl = LLVMBuildLoad(gallivm->builder, load_ptr0, "");
+   tl = LLVMBuildBitCast(gallivm->builder, tl, base->elem_type, 
"");
+
+   trbl = LLVMBuildLoad(gallivm->builder, load_ptr1, "");
+   trbl = LLVMBuildBitCast(gallivm->builder, trbl, 
base->elem_type, "");
+
+   result[c] = LLVMBuildFSub(gallivm->builder, trbl, tl, "");
+   }
+
+   emit_data->output[0] = lp_build_gather_values(gallivm, result, 4);
+}
+
+#endif /* HAVE_LLVM >= 0x0304 */
+
 static const struct lp_build_tgsi_act

[Mesa-dev] [PATCH v2 1/2] radeonsi: Handle TGSI_OPCODE_TXD

2013-06-19 Thread Michel Dänzer
From: Michel Dänzer 

One more little piglit.

Signed-off-by: Michel Dänzer 
---

v2: Only use the new functionality as of LLVM 3.4.

 src/gallium/drivers/radeonsi/radeonsi_shader.c | 27 --
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c 
b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index f6fdfae..7f3d38b 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -875,6 +875,7 @@ static void tex_fetch_args(
const struct tgsi_full_instruction * inst = emit_data->inst;
unsigned opcode = inst->Instruction.Opcode;
unsigned target = inst->Texture.Texture;
+   unsigned sampler_src;
LLVMValueRef coords[4];
LLVMValueRef address[16];
int ref_pos;
@@ -920,6 +921,15 @@ static void tex_fetch_args(
address[count++] = lp_build_emit_fetch(bld_base, inst, 1, 0);
}
 
+   /* Pack user derivatives */
+   if (opcode == TGSI_OPCODE_TXD) {
+   for (chan = 0; chan < 2; chan++) {
+   address[count++] = lp_build_emit_fetch(bld_base, inst, 
1, chan);
+   if (num_coords > 1)
+   address[count++] = 
lp_build_emit_fetch(bld_base, inst, 2, chan);
+   }
+   }
+
/* Pack texture coordinates */
address[count++] = coords[0];
if (num_coords > 1)
@@ -961,8 +971,10 @@ static void tex_fetch_args(
 "");
}
 
+   sampler_src = emit_data->inst->Instruction.NumSrcRegs - 1;
+
/* Resource */
-   emit_data->args[1] = 
si_shader_ctx->resources[emit_data->inst->Src[1].Register.Index];
+   emit_data->args[1] = 
si_shader_ctx->resources[emit_data->inst->Src[sampler_src].Register.Index];
 
if (opcode == TGSI_OPCODE_TXF) {
/* add tex offsets */
@@ -993,7 +1005,7 @@ static void tex_fetch_args(
emit_data->arg_count = 3;
} else {
/* Sampler */
-   emit_data->args[2] = 
si_shader_ctx->samplers[emit_data->inst->Src[1].Register.Index];
+   emit_data->args[2] = 
si_shader_ctx->samplers[emit_data->inst->Src[sampler_src].Register.Index];
 
emit_data->dst_type = LLVMVectorType(
LLVMFloatTypeInContext(bld_base->base.gallivm->context),
@@ -1065,6 +1077,14 @@ static const struct lp_build_tgsi_action txb_action = {
.intr_name = "llvm.SI.sampleb."
 };
 
+#if HAVE_LLVM >= 0x0304
+static const struct lp_build_tgsi_action txd_action = {
+   .fetch_args = tex_fetch_args,
+   .emit = build_tex_intrinsic,
+   .intr_name = "llvm.SI.sampled."
+};
+#endif
+
 static const struct lp_build_tgsi_action txf_action = {
.fetch_args = tex_fetch_args,
.emit = build_tex_intrinsic,
@@ -1321,6 +1341,9 @@ int si_pipe_shader_create(
 
bld_base->op_actions[TGSI_OPCODE_TEX] = tex_action;
bld_base->op_actions[TGSI_OPCODE_TXB] = txb_action;
+#if HAVE_LLVM >= 0x0304
+   bld_base->op_actions[TGSI_OPCODE_TXD] = txd_action;
+#endif
bld_base->op_actions[TGSI_OPCODE_TXF] = txf_action;
bld_base->op_actions[TGSI_OPCODE_TXL] = txl_action;
bld_base->op_actions[TGSI_OPCODE_TXP] = tex_action;
-- 
1.8.3.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] R600/SI: Support for local memory and derivatives

2013-06-19 Thread Michel Dänzer

These patches implement enough of local memory support to allow radeonsi
to use that for computing derivatives, as suggested by Tom.

They also almost allow test/CodeGen/R600/local-memory.ll to generate
code for SI. Right now it still fails because it tries to copy a VGPR to
an SGPR, which is not possible.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer
From f4ca359c4536aa53122b654196f2e007d50976f8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michel=20D=C3=A4nzer?= 
Date: Thu, 21 Feb 2013 16:12:45 +0100
Subject: [PATCH 1/6] R600/SI: Add intrinsics for texture sampling with user
 derivatives
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Michel Dänzer 
---
 lib/Target/R600/SIInstructions.td | 7 ++-
 lib/Target/R600/SIIntrinsics.td   | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
index 9c96c08..c9eac7d 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -535,7 +535,7 @@ def IMAGE_SAMPLE_B : MIMG_Sampler_Helper <0x0025, "IMAGE_SAMPLE_B">;
 //def IMAGE_SAMPLE_LZ : MIMG_NoPattern_ <"IMAGE_SAMPLE_LZ", 0x0027>;
 def IMAGE_SAMPLE_C : MIMG_Sampler_Helper <0x0028, "IMAGE_SAMPLE_C">;
 //def IMAGE_SAMPLE_C_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_CL", 0x0029>;
-//def IMAGE_SAMPLE_C_D : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D", 0x002a>;
+def IMAGE_SAMPLE_C_D : MIMG_Sampler_Helper <0x002a, "IMAGE_SAMPLE_C_D">;
 //def IMAGE_SAMPLE_C_D_CL : MIMG_NoPattern_ <"IMAGE_SAMPLE_C_D_CL", 0x002b>;
 def IMAGE_SAMPLE_C_L : MIMG_Sampler_Helper <0x002c, "IMAGE_SAMPLE_C_L">;
 def IMAGE_SAMPLE_C_B : MIMG_Sampler_Helper <0x002d, "IMAGE_SAMPLE_C_B">;
@@ -1296,6 +1296,11 @@ multiclass SamplePatterns {
   def : SampleArrayPattern ;
   def : SampleShadowPattern ;
   def : SampleShadowArrayPattern ;
+
+  def : SamplePattern ;
+  def : SampleArrayPattern ;
+  def : SampleShadowPattern ;
+  def : SampleShadowArrayPattern ;
 }
 
 defm : SamplePatterns;
diff --git a/lib/Target/R600/SIIntrinsics.td b/lib/Target/R600/SIIntrinsics.td
index 224cd2f..d2643e0 100644
--- a/lib/Target/R600/SIIntrinsics.td
+++ b/lib/Target/R600/SIIntrinsics.td
@@ -23,6 +23,7 @@ let TargetPrefix = "SI", isTarget = 1 in {
 
   def int_SI_sample : Sample;
   def int_SI_sampleb : Sample;
+  def int_SI_sampled : Sample;
   def int_SI_samplel : Sample;
 
   def int_SI_imageload : Intrinsic <[llvm_v4i32_ty], [llvm_anyvector_ty, llvm_v32i8_ty, llvm_i32_ty], [IntrNoMem]>;
-- 
1.8.3.1

From 7a0048bb2ab1b661831da2b764bf1a52f66bec15 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michel=20D=C3=A4nzer?= 
Date: Thu, 21 Feb 2013 18:51:38 +0100
Subject: [PATCH v3 2/6] R600/SI: Initial support for LDS/GDS instructions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Michel Dänzer 
---

v3: Drop vdst operand from DS_Store_Helper class, and adapt
SIInsertWaits::getHwCounts() to handle that. Unfortunately, this seems
to mess up the asm string output somehow, not sure what's going on
there.

 lib/Target/R600/SIInsertWaits.cpp  |  2 ++
 lib/Target/R600/SIInstrFormats.td  | 24 
 lib/Target/R600/SIInstrInfo.td | 23 +++
 lib/Target/R600/SIInstructions.td  |  3 +++
 lib/Target/R600/SILowerControlFlow.cpp | 16 
 5 files changed, 68 insertions(+)

diff --git a/lib/Target/R600/SIInsertWaits.cpp b/lib/Target/R600/SIInsertWaits.cpp
index c36e1dc..d31da45 100644
--- a/lib/Target/R600/SIInsertWaits.cpp
+++ b/lib/Target/R600/SIInsertWaits.cpp
@@ -134,6 +134,8 @@ Counters SIInsertWaits::getHwCounts(MachineInstr &MI) {
   if (TSFlags & SIInstrFlags::LGKM_CNT) {
 
 MachineOperand &Op = MI.getOperand(0);
+if (!Op.isReg())
+  Op = MI.getOperand(1);
 assert(Op.isReg() && "First LGKM operand must be a register!");
 
 unsigned Reg = Op.getReg();
diff --git a/lib/Target/R600/SIInstrFormats.td b/lib/Target/R600/SIInstrFormats.td
index 51f323d..434aa7e 100644
--- a/lib/Target/R600/SIInstrFormats.td
+++ b/lib/Target/R600/SIInstrFormats.td
@@ -281,6 +281,30 @@ class VINTRP  op, dag outs, dag ins, string asm, list pattern> :
 
 let Uses = [EXEC] in {
 
+class DS  op, dag outs, dag ins, string asm, list pattern> :
+Enc64  {
+
+  bits<8> vdst;
+  bits<1> gds;
+  bits<8> addr;
+  bits<8> data0;
+  bits<8> data1;
+  bits<8> offset0;
+  bits<8> offset1;
+
+  let Inst{7-0} = offset0;
+  let Inst{15-8} = offset1;
+  let Inst{17} = gds;
+  let Inst{25-18} = op;
+  let Inst{31-26} = 0x36; //encoding
+  let Inst{39-32} = addr;
+  let Inst{47-40} = data0;
+  let Inst{55-48} = data1;
+  let Inst{63-56} = vdst;
+
+  let LGKM_CNT = 1;
+}
+
 class MUBUF  op, dag outs, dag ins, string asm, list pattern> :
 Enc64 {
 
diff --git a/lib/Target/R600/SIInstr

[Mesa-dev] [PATCH] st/dri/sw: Fix pitch calculation in drisw_update_tex_buffer

2013-06-19 Thread Richard Sandiford
swrastGetImage rounds the pitch up to 4 bytes for compatibility reasons
that are explained in drisw_glx.c:bytes_per_line, so drisw_update_tex_buffer
must do the same.

Fixes window skew seen while running firefox over vnc on a 16-bit screen.

Signed-off-by: Richard Sandiford 
---
 src/gallium/state_trackers/dri/sw/drisw.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/state_trackers/dri/sw/drisw.c 
b/src/gallium/state_trackers/dri/sw/drisw.c
index 7a5f797..f9fddf7 100644
--- a/src/gallium/state_trackers/dri/sw/drisw.c
+++ b/src/gallium/state_trackers/dri/sw/drisw.c
@@ -265,8 +265,9 @@ drisw_update_tex_buffer(struct dri_drawable *drawable,
/* Copy the Drawable content to the mapped texture buffer */
get_image(dPriv, x, y, w, h, map);
 
-   /* The pipe transfer has a pitch rounded up to the nearest 64 pixels. */
-   ximage_stride = w * cpp;
+   /* The pipe transfer has a pitch rounded up to the nearest 64 pixels.
+  get_image() has a patch rounded up to 4 bytes.  */
+   ximage_stride = ((w * cpp) + 3) & -4;
for (line = h-1; line; --line) {
   memmove(&map[line * transfer->stride],
   &map[line * ximage_stride],
-- 
1.7.11.7

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] R600/SI: Expand sub for v2i32 and v4i32 for SI

2013-06-19 Thread Tom Stellard
On Mon, Jun 17, 2013 at 04:11:40PM -0500, Aaron Watry wrote:
> Also add a v2i32 test to the existing v4i32 test.
> 
> Note: v2i32 for EG seems slightly out of order based on the normal
> ordering. i.e. "SUB_INT * T..." comes before the "SUB_INT T..."
> I am not sure if this is correct, but it's how the current R600 back-end
> emits the operation order.
> 
Reviewed-by: Tom Stellard 
> Signed-off-by: Aaron Watry
> ---
>  lib/Target/R600/SIISelLowering.cpp |  3 +++
>  test/CodeGen/R600/sub.ll   | 37 +++--
>  2 files changed, 34 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/Target/R600/SIISelLowering.cpp 
> b/lib/Target/R600/SIISelLowering.cpp
> index bf4918a..ea2b123 100644
> --- a/lib/Target/R600/SIISelLowering.cpp
> +++ b/lib/Target/R600/SIISelLowering.cpp
> @@ -68,6 +68,9 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
>setOperationAction(ISD::ADD, MVT::v4i32, Expand);
>setOperationAction(ISD::ADD, MVT::v2i32, Expand);
>  
> +  setOperationAction(ISD::SUB, MVT::v2i32, Expand);
> +  setOperationAction(ISD::SUB, MVT::v4i32, Expand);
> +
>setOperationAction(ISD::SELECT_CC, MVT::f32, Custom);
>setOperationAction(ISD::SELECT_CC, MVT::i32, Custom);
>  
> diff --git a/test/CodeGen/R600/sub.ll b/test/CodeGen/R600/sub.ll
> index 12bfba3..10fce6c 100644
> --- a/test/CodeGen/R600/sub.ll
> +++ b/test/CodeGen/R600/sub.ll
> @@ -1,11 +1,36 @@
> -;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
> +;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK 
> %s
> +;RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK %s
>  
> -;CHECK: SUB_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;CHECK: SUB_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;CHECK: SUB_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;CHECK: SUB_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG-CHECK: @test2
> +;EG-CHECK: SUB_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG-CHECK: SUB_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
>  
> -define void @test(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) 
> {
> +;SI-CHECK: @test2
> +;SI-CHECK: V_SUB_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
> +;SI-CHECK: V_SUB_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
> +
> +define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* 
> %in) {
> +  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
> +  %a = load <2 x i32> addrspace(1) * %in
> +  %b = load <2 x i32> addrspace(1) * %b_ptr
> +  %result = sub <2 x i32> %a, %b
> +  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
> +  ret void
> +}
> +
> +;EG-CHECK: @test4
> +;EG-CHECK: SUB_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG-CHECK: SUB_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG-CHECK: SUB_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG-CHECK: SUB_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +
> +;SI-CHECK: @test4
> +;SI-CHECK: V_SUB_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
> +;SI-CHECK: V_SUB_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
> +;SI-CHECK: V_SUB_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
> +;SI-CHECK: V_SUB_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
> +
> +define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* 
> %in) {
>%b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
>%a = load <4 x i32> addrspace(1) * %in
>%b = load <4 x i32> addrspace(1) * %b_ptr
> -- 
> 1.8.1.2
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/3] R600/SI: Expand add for v2i32 and v4i32

2013-06-19 Thread Tom Stellard
On Mon, Jun 17, 2013 at 04:11:39PM -0500, Aaron Watry wrote:
> Also add SI tests to existing file and a v2i32 test for both
> R600 and SI.
> 
Reviewed-by: Tom Stellard 
> Signed-off-by: Aaron Watry 
> ---
>  lib/Target/R600/SIISelLowering.cpp |  2 ++
>  test/CodeGen/R600/add.ll   | 37 +++--
>  2 files changed, 33 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/Target/R600/SIISelLowering.cpp 
> b/lib/Target/R600/SIISelLowering.cpp
> index d74f401..bf4918a 100644
> --- a/lib/Target/R600/SIISelLowering.cpp
> +++ b/lib/Target/R600/SIISelLowering.cpp
> @@ -65,6 +65,8 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
>  
>setOperationAction(ISD::ADD, MVT::i64, Legal);
>setOperationAction(ISD::ADD, MVT::i32, Legal);
> +  setOperationAction(ISD::ADD, MVT::v4i32, Expand);
> +  setOperationAction(ISD::ADD, MVT::v2i32, Expand);
>  
>setOperationAction(ISD::SELECT_CC, MVT::f32, Custom);
>setOperationAction(ISD::SELECT_CC, MVT::i32, Custom);
> diff --git a/test/CodeGen/R600/add.ll b/test/CodeGen/R600/add.ll
> index 185998b..dd590e5 100644
> --- a/test/CodeGen/R600/add.ll
> +++ b/test/CodeGen/R600/add.ll
> @@ -1,11 +1,36 @@
> -;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
> +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck 
> --check-prefix=EG-CHECK %s
> +; RUN: llc < %s -march=r600 -mcpu=verde | FileCheck --check-prefix=SI-CHECK 
> %s
>  
> -;CHECK: ADD_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> -;CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG-CHECK: @test2
> +;EG-CHECK: ADD_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG-CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], literal\.[xyzw]}}
>  
> -define void @test(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) 
> {
> +;SI-CHECK: @test2
> +;SI-CHECK: V_ADD_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
> +;SI-CHECK: V_ADD_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
> +
> +define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* 
> %in) {
> +  %b_ptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1
> +  %a = load <2 x i32> addrspace(1) * %in
> +  %b = load <2 x i32> addrspace(1) * %b_ptr
> +  %result = add <2 x i32> %a, %b
> +  store <2 x i32> %result, <2 x i32> addrspace(1)* %out
> +  ret void
> +}
> +
> +;EG-CHECK: @test4
> +;EG-CHECK: ADD_INT T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG-CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG-CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +;EG-CHECK: ADD_INT * T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
> +
> +;SI-CHECK: @test4
> +;SI-CHECK: V_ADD_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
> +;SI-CHECK: V_ADD_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
> +;SI-CHECK: V_ADD_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
> +;SI-CHECK: V_ADD_I32_e32 VGPR{{[0-9]+, VGPR[0-9]+, VGPR[0-9]+}}
> +
> +define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* 
> %in) {
>%b_ptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1
>%a = load <4 x i32> addrspace(1) * %in
>%b = load <4 x i32> addrspace(1) * %b_ptr
> -- 
> 1.8.1.2
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] R600: Expand v2i32 load/store instead of custom lowering

2013-06-19 Thread Tom Stellard
On Mon, Jun 17, 2013 at 04:11:38PM -0500, Aaron Watry wrote:
> The custom lowering causes llc to crash with a segfault.
> 
> Ideally, the custom lowering can be fixed, but this allows
> programs which load/store v2i32 to work without crashing.
> 
> Signed-off-by: Aaron Watry
> ---
>  lib/Target/R600/R600ISelLowering.cpp | 4 ++--
>  test/CodeGen/R600/load.vec.ll| 6 ++
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/Target/R600/R600ISelLowering.cpp 
> b/lib/Target/R600/R600ISelLowering.cpp
> index 9cedadb..812df83 100644
> --- a/lib/Target/R600/R600ISelLowering.cpp
> +++ b/lib/Target/R600/R600ISelLowering.cpp
> @@ -86,7 +86,7 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
>  
>// Legalize loads and stores to the private address space.
>setOperationAction(ISD::LOAD, MVT::i32, Custom);
> -  setOperationAction(ISD::LOAD, MVT::v2i32, Custom);
> +  setOperationAction(ISD::LOAD, MVT::v2i32, Expand);
>setOperationAction(ISD::LOAD, MVT::v4i32, Custom);
>setLoadExtAction(ISD::EXTLOAD, MVT::v4i8, Custom);
>setLoadExtAction(ISD::EXTLOAD, MVT::i8, Custom);
> @@ -94,7 +94,7 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
>setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i8, Custom);
>setOperationAction(ISD::STORE, MVT::i8, Custom);
>setOperationAction(ISD::STORE, MVT::i32, Custom);
> -  setOperationAction(ISD::STORE, MVT::v2i32, Custom);
> +  setOperationAction(ISD::STORE, MVT::v2i32, Expand);

We have to use custom lowering in order for things like constant loads
and global stores to work, so I think we need to fix whatever crash is
happening with custom lowering.

-Tom
>setOperationAction(ISD::STORE, MVT::v4i32, Custom);
>  
>setOperationAction(ISD::LOAD, MVT::i32, Custom);
> diff --git a/test/CodeGen/R600/load.vec.ll b/test/CodeGen/R600/load.vec.ll
> index 08e034e..da1149a 100644
> --- a/test/CodeGen/R600/load.vec.ll
> +++ b/test/CodeGen/R600/load.vec.ll
> @@ -1,6 +1,10 @@
> +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck 
> --check-prefix=EG-CHECK  %s
>  ; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck --check-prefix=SI-CHECK  %s
>  
>  ; load a v2i32 value from the global address space.
> +; EG-CHECK: @load_v2i32
> +; EG-CHECK: VTX_READ_32 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 4
> +; EG-CHECK: VTX_READ_32 T{{[0-9]+}}.X, T{{[0-9]+}}.X, 0
>  ; SI-CHECK: @load_v2i32
>  ; SI-CHECK: BUFFER_LOAD_DWORDX2 VGPR{{[0-9]+}}
>  define void @load_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> 
> addrspace(1)* %in) {
> @@ -10,6 +14,8 @@ define void @load_v2i32(<2 x i32> addrspace(1)* %out, <2 x 
> i32> addrspace(1)* %i
>  }
>  
>  ; load a v4i32 value from the global address space.
> +; EG-CHECK: @load_v4i32
> +; EG-CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 0
>  ; SI-CHECK: @load_v4i32
>  ; SI-CHECK: BUFFER_LOAD_DWORDX4 VGPR{{[0-9]+}}
>  define void @load_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> 
> addrspace(1)* %in) {
> -- 
> 1.8.1.2
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] R600/SI: Add support for v4i32 and v4f32 kernel args

2013-06-19 Thread Tom Stellard
On Tue, Jun 18, 2013 at 08:26:53PM -0500, Aaron Watry wrote:
> Tested on Pitcairn by: Aaron Watry 
>
Thanks.
 
> Follow-up question: Would it be as easy as it looks to add v2i32 right away?
>

I think so.

-Tom
> On Tue, Jun 18, 2013 at 6:21 PM, Tom Stellard  wrote:
> > From: Tom Stellard 
> >
> > ---
> >  lib/Target/R600/AMDGPUCallingConv.td|  9 +
> >  test/CodeGen/R600/128bit-kernel-args.ll | 16 ++--
> >  2 files changed, 15 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/Target/R600/AMDGPUCallingConv.td 
> > b/lib/Target/R600/AMDGPUCallingConv.td
> > index 84e4f3a..826932b 100644
> > --- a/lib/Target/R600/AMDGPUCallingConv.td
> > +++ b/lib/Target/R600/AMDGPUCallingConv.td
> > @@ -38,10 +38,11 @@ def CC_SI : CallingConv<[
> >
> >  // Calling convention for SI compute kernels
> >  def CC_SI_Kernel : CallingConv<[
> > -  CCIfType<[i64],  CCAssignToStack <8, 4>>,
> > -  CCIfType<[i32, f32], CCAssignToStack <4, 4>>,
> > -  CCIfType<[i16],  CCAssignToStack <2, 4>>,
> > -  CCIfType<[i8],   CCAssignToStack <1, 4>>
> > +  CCIfType<[v4i32, v4f32], CCAssignToStack <16, 4>>,
> > +  CCIfType<[i64],  CCAssignToStack < 8, 4>>,
> > +  CCIfType<[i32, f32], CCAssignToStack < 4, 4>>,
> > +  CCIfType<[i16],  CCAssignToStack < 2, 4>>,
> > +  CCIfType<[i8],   CCAssignToStack < 1, 4>>
> >  ]>;
> >
> >  def CC_AMDGPU : CallingConv<[
> > diff --git a/test/CodeGen/R600/128bit-kernel-args.ll 
> > b/test/CodeGen/R600/128bit-kernel-args.ll
> > index 114f9e7..bd60385 100644
> > --- a/test/CodeGen/R600/128bit-kernel-args.ll
> > +++ b/test/CodeGen/R600/128bit-kernel-args.ll
> > @@ -1,16 +1,20 @@
> > -;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
> > -
> > -; CHECK: @v4i32_kernel_arg
> > -; CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
> > +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s 
> > --check-prefix=R600-CHECK
> > +; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s --check-prefix=SI-CHECK
> >
> > +; R600-CHECK: @v4i32_kernel_arg
> > +; R600-CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
> > +; SI-CHECK: @v4i32_kernel_arg
> > +; SI-CHECK: BUFFER_STORE_DWORDX4
> >  define void @v4i32_kernel_arg(<4 x i32> addrspace(1)* %out, <4 x i32>  
> > %in) {
> >  entry:
> >store <4 x i32> %in, <4 x i32> addrspace(1)* %out
> >ret void
> >  }
> >
> > -; CHECK: @v4f32_kernel_arg
> > -; CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
> > +; R600-CHECK: @v4f32_kernel_arg
> > +; R600-CHECK: VTX_READ_128 T{{[0-9]+}}.XYZW, T{{[0-9]+}}.X, 40
> > +; SI-CHECK: @v4f32_kernel_arg
> > +; SI-CHECK: BUFFER_STORE_DWORDX4
> >  define void @v4f32_kernel_args(<4 x float> addrspace(1)* %out, <4 x float> 
> >  %in) {
> >  entry:
> >store <4 x float> %in, <4 x float> addrspace(1)* %out
> > --
> > 1.7.11.4
> >
> > ___
> > llvm-commits mailing list
> > llvm-comm...@cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] gallium: fix PIPE_QUERY_TIMESTAMP_DISJOINT

2013-06-19 Thread Jose Fonseca


- Original Message -
> 
> 
> - Original Message -
> > From: Roland Scheidegger 
> > 
> > The semantics didn't really make sense, not really matching neither d3d9
> > (though the docs are all broken there) nor d3d10. So make it match d3d10
> > semantics, which actually gives meaning to the "disjoint" part.
> > Drivers are fixed up in a very primitive way, I have no idea what could
> > actually cause the counter to become unreliable so just always return
> > FALSE for the disjoint part.
> > ---
> >  src/gallium/docs/source/context.rst   |   10 ++
> >  src/gallium/drivers/nv50/nv50_query.c |5 +++--
> >  src/gallium/drivers/nvc0/nvc0_query.c |4 +---
> >  3 files changed, 10 insertions(+), 9 deletions(-)
> > 
> > diff --git a/src/gallium/docs/source/context.rst
> > b/src/gallium/docs/source/context.rst
> > index ede89be..bfd58a4 100644
> > --- a/src/gallium/docs/source/context.rst
> > +++ b/src/gallium/docs/source/context.rst
> > @@ -330,11 +330,13 @@ scaled to nanoseconds, recorded after all commands
> > issued prior to
> >  This query does not require a call to ``begin_query``.
> >  The result is an unsigned 64-bit integer.
> >  
> > -``PIPE_QUERY_TIMESTAMP_DISJOINT`` can be used to check whether the
> > -internal timer resolution is good enough to distinguish between the
> > -events at ``begin_query`` and ``end_query``.
> > +``PIPE_QUERY_TIMESTAMP_DISJOINT`` can be used to check the
> > +internal timer resolution and whether the timestamp counter has become
> > +unreliable due to things like throttling etc. - only if this is FALSE
> > +a timestamp query (within the timestamp_disjoint query) should be trusted.
> >  The result is a 64-bit integer specifying the timer resolution in Hz,
> > -followed by a boolean value indicating whether the timer has incremented.
> > +followed by a boolean value indicating whether the timestamp counter
> > +is discontinuous or disjoint.
> >  
> >  ``PIPE_QUERY_PRIMITIVES_GENERATED`` returns a 64-bit integer indicating
> >  the number of primitives processed by the pipeline (regardless of whether
> > diff --git a/src/gallium/drivers/nv50/nv50_query.c
> > b/src/gallium/drivers/nv50/nv50_query.c
> > index 656ff9d..b97eff2 100644
> > --- a/src/gallium/drivers/nv50/nv50_query.c
> > +++ b/src/gallium/drivers/nv50/nv50_query.c
> > @@ -181,7 +181,6 @@ nv50_query_begin(struct pipe_context *pipe, struct
> > pipe_query *pq)
> >nv50_query_get(push, q, 0x20, 0x05805002);
> >nv50_query_get(push, q, 0x30, 0x06805002);
> >break;
> > -   case PIPE_QUERY_TIMESTAMP_DISJOINT:
> > case PIPE_QUERY_TIME_ELAPSED:
> >nv50_query_get(push, q, 0x10, 0x5002);
> >break;
> > @@ -229,6 +228,8 @@ nv50_query_end(struct pipe_context *pipe, struct
> > pipe_query *pq)
> > case NVA0_QUERY_STREAM_OUTPUT_BUFFER_OFFSET:
> >nv50_query_get(push, q, 0, 0x0d005002 | (q->index << 5));
> >break;
> > +   case PIPE_QUERY_TIMESTAMP_DISJOINT:
> > +  break;
> > default:
> >assert(0);
> >break;
> > @@ -289,7 +290,7 @@ nv50_query_result(struct pipe_context *pipe, struct
> > pipe_query *pq,
> >break;
> > case PIPE_QUERY_TIMESTAMP_DISJOINT: /* u32 sequence, u32 0, u64 time */
> >res64[0] = 10;
> 
> I don't think this statement is needed. Same below. Otherwise looks good.
> 
> Jose
> 
> > -  res8[8] = (data64[1] == data64[3]) ? FALSE : TRUE;
> > +  res8[8] = FALSE;
> >break;
> > case PIPE_QUERY_TIME_ELAPSED:
> >res64[0] = data64[1] - data64[3];
> > diff --git a/src/gallium/drivers/nvc0/nvc0_query.c
> > b/src/gallium/drivers/nvc0/nvc0_query.c
> > index 8e584c9..3f5a9fb 100644
> > --- a/src/gallium/drivers/nvc0/nvc0_query.c
> > +++ b/src/gallium/drivers/nvc0/nvc0_query.c
> > @@ -285,7 +285,6 @@ nvc0_query_begin(struct pipe_context *pipe, struct
> > pipe_query *pq)
> > case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
> >nvc0_query_get(push, q, 0x10, 0x03005002 | (q->index << 5));
> >break;
> > -   case PIPE_QUERY_TIMESTAMP_DISJOINT:
> > case PIPE_QUERY_TIME_ELAPSED:
> >nvc0_query_get(push, q, 0x10, 0x5002);
> >break;
> > @@ -360,7 +359,6 @@ nvc0_query_end(struct pipe_context *pipe, struct
> > pipe_query *pq)
> >nvc0_query_get(push, q, 0x20, 0x5002);
> >break;
> > case PIPE_QUERY_TIMESTAMP:
> > -   case PIPE_QUERY_TIMESTAMP_DISJOINT:
> > case PIPE_QUERY_TIME_ELAPSED:
> >nvc0_query_get(push, q, 0, 0x5002);
> >break;
> > @@ -478,7 +476,7 @@ nvc0_query_result(struct pipe_context *pipe, struct
> > pipe_query *pq,
> >break;
> > case PIPE_QUERY_TIMESTAMP_DISJOINT: /* u32 sequence, u32 0, u64 time */
> >res64[0] = 10;
> 

Actually, jsut noticed it should be using a `struct 
pipe_query_data_timestamp_disjoint *` instead of res64/res8.

Jose

> 
> 
> > -  res8[8] = (data64[1] == data64[3]) ? FALSE : TRUE;
> > +  res8[8] = FALSE;
> >break;
> >  

Re: [Mesa-dev] [PATCH 2/2] softpipe: handle all queries, and change for the new disjoint semantics

2013-06-19 Thread Jose Fonseca
Looks good to me.

Jose

- Original Message -
> From: Roland Scheidegger 
> 
> The driver can do render_condition but wasn't handling the occlusion
> and so_overflow predicates (though the latter might not work yet due
> to gs support).
> ---
>  src/gallium/drivers/softpipe/sp_query.c |   39
>  ++-
>  1 file changed, 28 insertions(+), 11 deletions(-)
> 
> diff --git a/src/gallium/drivers/softpipe/sp_query.c
> b/src/gallium/drivers/softpipe/sp_query.c
> index b5bc0db..daeef53 100644
> --- a/src/gallium/drivers/softpipe/sp_query.c
> +++ b/src/gallium/drivers/softpipe/sp_query.c
> @@ -60,8 +60,10 @@ softpipe_create_query(struct pipe_context *pipe,
> struct softpipe_query* sq;
>  
> assert(type == PIPE_QUERY_OCCLUSION_COUNTER ||
> +  type == PIPE_QUERY_OCCLUSION_PREDICATE ||
>type == PIPE_QUERY_TIME_ELAPSED ||
>type == PIPE_QUERY_SO_STATISTICS ||
> +  type == PIPE_QUERY_SO_OVERFLOW_PREDICATE ||
>type == PIPE_QUERY_PRIMITIVES_EMITTED ||
>type == PIPE_QUERY_PRIMITIVES_GENERATED ||
>type == PIPE_QUERY_PIPELINE_STATISTICS ||
> @@ -90,9 +92,9 @@ softpipe_begin_query(struct pipe_context *pipe, struct
> pipe_query *q)
>  
> switch (sq->type) {
> case PIPE_QUERY_OCCLUSION_COUNTER:
> +   case PIPE_QUERY_OCCLUSION_PREDICATE:
>sq->start = softpipe->occlusion_count;
>break;
> -   case PIPE_QUERY_TIMESTAMP_DISJOINT:
> case PIPE_QUERY_TIME_ELAPSED:
>sq->start = os_time_get_nano();
>break;
> @@ -102,6 +104,10 @@ softpipe_begin_query(struct pipe_context *pipe, struct
> pipe_query *q)
>softpipe->num_primitives_generated = 0;
>sq->so.num_primitives_written = 0;
>softpipe->so_stats.num_primitives_written = 0;
> +  break;
> +   case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
> +  sq->end = FALSE;
> +  break;
> case PIPE_QUERY_PRIMITIVES_EMITTED:
>sq->so.num_primitives_written = 0;
>softpipe->so_stats.num_primitives_written = 0;
> @@ -112,6 +118,7 @@ softpipe_begin_query(struct pipe_context *pipe, struct
> pipe_query *q)
>break;
> case PIPE_QUERY_TIMESTAMP:
> case PIPE_QUERY_GPU_FINISHED:
> +   case PIPE_QUERY_TIMESTAMP_DISJOINT:
>break;
> case PIPE_QUERY_PIPELINE_STATISTICS:
>/* reset our cache */
> @@ -141,15 +148,19 @@ softpipe_end_query(struct pipe_context *pipe, struct
> pipe_query *q)
> softpipe->active_query_count--;
> switch (sq->type) {
> case PIPE_QUERY_OCCLUSION_COUNTER:
> +   case PIPE_QUERY_OCCLUSION_PREDICATE:
>sq->end = softpipe->occlusion_count;
>break;
> case PIPE_QUERY_TIMESTAMP:
>sq->start = 0;
>/* fall through */
> -   case PIPE_QUERY_TIMESTAMP_DISJOINT:
> case PIPE_QUERY_TIME_ELAPSED:
>sq->end = os_time_get_nano();
>break;
> +   case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
> +  sq->end = (softpipe->num_primitives_generated >
> + softpipe->so_stats.num_primitives_written);
> +  break;
> case PIPE_QUERY_SO_STATISTICS:
>sq->num_primitives_generated =
>   softpipe->num_primitives_generated;
> @@ -164,6 +175,7 @@ softpipe_end_query(struct pipe_context *pipe, struct
> pipe_query *q)
>sq->num_primitives_generated = softpipe->num_primitives_generated;
>break;
> case PIPE_QUERY_GPU_FINISHED:
> +   case PIPE_QUERY_TIMESTAMP_DISJOINT:
>break;
> case PIPE_QUERY_PIPELINE_STATISTICS:
>sq->stats.ia_vertices =
> @@ -195,9 +207,9 @@ softpipe_end_query(struct pipe_context *pipe, struct
> pipe_query *q)
>  
>  static boolean
>  softpipe_get_query_result(struct pipe_context *pipe,
> -   struct pipe_query *q,
> -   boolean wait,
> -   union pipe_query_result *vresult)
> +  struct pipe_query *q,
> +  boolean wait,
> +  union pipe_query_result *vresult)
>  {
> struct softpipe_query *sq = softpipe_query(q);
> uint64_t *result = (uint64_t*)vresult;
> @@ -215,15 +227,17 @@ softpipe_get_query_result(struct pipe_context *pipe,
>   sizeof(struct pipe_query_data_pipeline_statistics));;
>break;
> case PIPE_QUERY_GPU_FINISHED:
> -  *result = TRUE;
> +  vresult->b = TRUE;
> +  break;
> +   case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
> +  vresult->b = sq->end != 0;
>break;
> case PIPE_QUERY_TIMESTAMP_DISJOINT: {
> -  struct pipe_query_data_timestamp_disjoint td;
> +  struct pipe_query_data_timestamp_disjoint *td =
> +  (struct pipe_query_data_timestamp_disjoint *)vresult;
>/* os_get_time_nano return nanoseconds */
> -  td.frequency = UINT64_C(10);
> -  td.disjoint = sq->end != sq->start;
> -  memcpy(vresult, &td,
> - sizeof(struct pipe_query_data_timestamp_disjoint));
> +  td->frequency = UINT64_C(10);
> +  

Re: [Mesa-dev] [PATCH 1/2] gallium: fix PIPE_QUERY_TIMESTAMP_DISJOINT

2013-06-19 Thread Jose Fonseca


- Original Message -
> From: Roland Scheidegger 
> 
> The semantics didn't really make sense, not really matching neither d3d9
> (though the docs are all broken there) nor d3d10. So make it match d3d10
> semantics, which actually gives meaning to the "disjoint" part.
> Drivers are fixed up in a very primitive way, I have no idea what could
> actually cause the counter to become unreliable so just always return
> FALSE for the disjoint part.
> ---
>  src/gallium/docs/source/context.rst   |   10 ++
>  src/gallium/drivers/nv50/nv50_query.c |5 +++--
>  src/gallium/drivers/nvc0/nvc0_query.c |4 +---
>  3 files changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/src/gallium/docs/source/context.rst
> b/src/gallium/docs/source/context.rst
> index ede89be..bfd58a4 100644
> --- a/src/gallium/docs/source/context.rst
> +++ b/src/gallium/docs/source/context.rst
> @@ -330,11 +330,13 @@ scaled to nanoseconds, recorded after all commands
> issued prior to
>  This query does not require a call to ``begin_query``.
>  The result is an unsigned 64-bit integer.
>  
> -``PIPE_QUERY_TIMESTAMP_DISJOINT`` can be used to check whether the
> -internal timer resolution is good enough to distinguish between the
> -events at ``begin_query`` and ``end_query``.
> +``PIPE_QUERY_TIMESTAMP_DISJOINT`` can be used to check the
> +internal timer resolution and whether the timestamp counter has become
> +unreliable due to things like throttling etc. - only if this is FALSE
> +a timestamp query (within the timestamp_disjoint query) should be trusted.
>  The result is a 64-bit integer specifying the timer resolution in Hz,
> -followed by a boolean value indicating whether the timer has incremented.
> +followed by a boolean value indicating whether the timestamp counter
> +is discontinuous or disjoint.
>  
>  ``PIPE_QUERY_PRIMITIVES_GENERATED`` returns a 64-bit integer indicating
>  the number of primitives processed by the pipeline (regardless of whether
> diff --git a/src/gallium/drivers/nv50/nv50_query.c
> b/src/gallium/drivers/nv50/nv50_query.c
> index 656ff9d..b97eff2 100644
> --- a/src/gallium/drivers/nv50/nv50_query.c
> +++ b/src/gallium/drivers/nv50/nv50_query.c
> @@ -181,7 +181,6 @@ nv50_query_begin(struct pipe_context *pipe, struct
> pipe_query *pq)
>nv50_query_get(push, q, 0x20, 0x05805002);
>nv50_query_get(push, q, 0x30, 0x06805002);
>break;
> -   case PIPE_QUERY_TIMESTAMP_DISJOINT:
> case PIPE_QUERY_TIME_ELAPSED:
>nv50_query_get(push, q, 0x10, 0x5002);
>break;
> @@ -229,6 +228,8 @@ nv50_query_end(struct pipe_context *pipe, struct
> pipe_query *pq)
> case NVA0_QUERY_STREAM_OUTPUT_BUFFER_OFFSET:
>nv50_query_get(push, q, 0, 0x0d005002 | (q->index << 5));
>break;
> +   case PIPE_QUERY_TIMESTAMP_DISJOINT:
> +  break;
> default:
>assert(0);
>break;
> @@ -289,7 +290,7 @@ nv50_query_result(struct pipe_context *pipe, struct
> pipe_query *pq,
>break;
> case PIPE_QUERY_TIMESTAMP_DISJOINT: /* u32 sequence, u32 0, u64 time */
>res64[0] = 10;

I don't think this statement is needed. Same below. Otherwise looks good.

Jose

> -  res8[8] = (data64[1] == data64[3]) ? FALSE : TRUE;
> +  res8[8] = FALSE;
>break;
> case PIPE_QUERY_TIME_ELAPSED:
>res64[0] = data64[1] - data64[3];
> diff --git a/src/gallium/drivers/nvc0/nvc0_query.c
> b/src/gallium/drivers/nvc0/nvc0_query.c
> index 8e584c9..3f5a9fb 100644
> --- a/src/gallium/drivers/nvc0/nvc0_query.c
> +++ b/src/gallium/drivers/nvc0/nvc0_query.c
> @@ -285,7 +285,6 @@ nvc0_query_begin(struct pipe_context *pipe, struct
> pipe_query *pq)
> case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
>nvc0_query_get(push, q, 0x10, 0x03005002 | (q->index << 5));
>break;
> -   case PIPE_QUERY_TIMESTAMP_DISJOINT:
> case PIPE_QUERY_TIME_ELAPSED:
>nvc0_query_get(push, q, 0x10, 0x5002);
>break;
> @@ -360,7 +359,6 @@ nvc0_query_end(struct pipe_context *pipe, struct
> pipe_query *pq)
>nvc0_query_get(push, q, 0x20, 0x5002);
>break;
> case PIPE_QUERY_TIMESTAMP:
> -   case PIPE_QUERY_TIMESTAMP_DISJOINT:
> case PIPE_QUERY_TIME_ELAPSED:
>nvc0_query_get(push, q, 0, 0x5002);
>break;
> @@ -478,7 +476,7 @@ nvc0_query_result(struct pipe_context *pipe, struct
> pipe_query *pq,
>break;
> case PIPE_QUERY_TIMESTAMP_DISJOINT: /* u32 sequence, u32 0, u64 time */
>res64[0] = 10;



> -  res8[8] = (data64[1] == data64[3]) ? FALSE : TRUE;
> +  res8[8] = FALSE;
>break;
> case PIPE_QUERY_TIME_ELAPSED:
>res64[0] = data64[1] - data64[3];
> --
> 1.7.9.5
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/6] Eliminating unused built-in varyings

2013-06-19 Thread Marek Olšák
Could somebody please review this series?

Marek

On Thu, Jun 13, 2013 at 2:25 PM, Marek Olšák  wrote:
> Hi everyone,
>
> this series adds a new GLSL compiler optimization pass which eliminates 
> unused and set-but-unused built-in varyings and adds a few improvements to 
> the GLSL linker in the process.
>
> Before I show you how it works, I wanna say that there are patches which are 
> related to and will most probably conflict with the geometry shader work, but 
> they are necessary because the linkage of varyings is largely suboptimal.
>
> Also, the GL_EXT_separate_shader_objects extension must be disabled for this 
> optimization to be enabled. The reason is a program object with both a VS and 
> FS can be bound partially, e.g. by glUseShaderProgramEXT(GL_VERTEX_SHADER, 
> prog), so the extension makes every program object be just a set of "separate 
> shaders". The extension is not important anyway.
>
> Now, to illustrate how the optimization works, consider these 2 shader IR 
> dumps:
>
>
> Vertex shader (8 varyings):
> ...
> (declare (shader_out ) vec4 gl_FrontColor)
> (declare (shader_out ) vec4 gl_FrontSecondaryColor)
> (declare (shader_out ) (array vec4 6) gl_TexCoord)
> (function main
>   (signature void
> (parameters
> )
> (
>   ...
>   (assign  (xyzw) (var_ref gl_FrontColor)  (var_ref gl_Color) )
>   (assign  (xyzw) (var_ref gl_FrontSecondaryColor)  (var_ref 
> gl_SecondaryColor) )
>   (assign  (xyzw) (array_ref (var_ref gl_TexCoord) (constant int (1)) )  
> (var_ref gl_MultiTexCoord1) )
>   (assign  (xyzw) (array_ref (var_ref gl_TexCoord) (constant int (4)) )  
> (var_ref gl_MultiTexCoord4) )
>   (assign  (xyzw) (array_ref (var_ref gl_TexCoord) (constant int (5)) )  
> (var_ref gl_MultiTexCoord5) )
> ))
> )
>
> Fragment shader (6 varyings):
> ...
> (declare (shader_in ) vec4 gl_SecondaryColor)
> (declare (shader_in ) (array vec4 5) gl_TexCoord)
> (function main
>   (signature void
> (parameters
> )
> (
>   (declare () vec4 r)
>   (assign  (xyzw) (var_ref r)  ... (var_ref gl_SecondaryColor) ) )
>   (assign  (xyzw) (var_ref r)  ... (array_ref (var_ref gl_TexCoord) 
> (constant int (1)) ) ) ) )
>   (assign  (xyzw) (var_ref r)  ... (array_ref (var_ref gl_TexCoord) 
> (constant int (2)) ) ) ) )
>   (assign  (xyzw) (var_ref r)  ... (array_ref (var_ref gl_TexCoord) 
> (constant int (3)) ) ) ) )
>   (declare (temporary ) vec4 assignment_tmp)
>   (assign  (xyzw) (var_ref assignment_tmp)  ... (array_ref (var_ref 
> gl_TexCoord) (constant int (4)) ) ) ) )
>   ...
> ))
> )
>
>
> Note that only gl_TexCoord[1], gl_TexCoord[4], and gl_SecondaryColor are used 
> by both shaders. The optimization replaces all occurences of varyings which 
> are unused by the other stage by temporary variables. It also breaks down the 
> gl_TexCoord array into separate vec4 variables if needed. Here's the result:
>
>
> Vertex shader (3 varyings instead of 8):
> ...
> (declare (shader_out ) vec4 gl_out_TexCoord1)
> (declare (shader_out ) vec4 gl_out_TexCoord4)
> (declare (temporary ) vec4 gl_out_TexCoord5_dummy)
> (declare (temporary ) vec4 gl_out_FrontColor0_dummy)
> (declare (shader_out ) vec4 gl_FrontSecondaryColor)
> (function main
>   (signature void
> (parameters
> )
> (
>   ...
>   (assign  (xyzw) (var_ref gl_out_FrontColor0_dummy)  (var_ref gl_Color) )
>   (assign  (xyzw) (var_ref gl_FrontSecondaryColor)  (var_ref 
> gl_SecondaryColor) )
>   (assign  (xyzw) (var_ref gl_out_TexCoord1)  (var_ref gl_MultiTexCoord1) 
> )
>   (assign  (xyzw) (var_ref gl_out_TexCoord4)  (var_ref gl_MultiTexCoord4) 
> )
>   (assign  (xyzw) (var_ref gl_out_TexCoord5_dummy)  (var_ref 
> gl_MultiTexCoord5) )
> ))
> )
>
> Fragment shader (3 varyings instead of 6):
> ...
> (declare (shader_in ) vec4 gl_in_TexCoord1)
> (declare (temporary ) vec4 gl_in_TexCoord2_dummy)
> (declare (temporary ) vec4 gl_in_TexCoord3_dummy)
> (declare (shader_in ) vec4 gl_in_TexCoord4)
> (declare (shader_in ) vec4 gl_SecondaryColor)
> (function main
>   (signature void
> (parameters
> )
> (
>   (declare () vec4 r)
>   (assign  (xyzw) (var_ref r)  ... (var_ref gl_SecondaryColor) ) )
>   (assign  (xyzw) (var_ref r)  ... (var_ref gl_in_TexCoord1) ) ) )
>   (assign  (xyzw) (var_ref r)  ... (var_ref gl_in_TexCoord2_dummy) ) ) )
>   (assign  (xyzw) (var_ref r)  ... (var_ref gl_in_TexCoord3_dummy) ) ) )
>   (declare (temporary ) vec4 assignment_tmp)
>   (assign  (xyzw) (var_ref assignment_tmp)  ... (var_ref gl_in_TexCoord4) 
> ) ) )
>   ...
> ))
> )
>
> The locations of varyings remain the same. That's all. Please review.
>
> Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Remove interleaved user array upload optimization

2013-06-19 Thread Ian Romanick
From: Ian Romanick 

The checks to determine when the data can be uploaded in an interleaved
fashion can be tricked by certain data layouts.  For example,

float data[...];

glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 16, &data[0]);
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, 16, &data[4]);
glDrawArrays(GL_POINTS, 0, 1);

will hit the interleaved path with an incorrect size (16 bytes instead
of 32 bytes).  As a result, the data for attribute 1 never gets
uploaded.  The single element draw case is the only sensible case I can
think of for non-interleaved-that-looks-like-interleaved data, but there
may be others as well.

I don't see an easy way to fix the checks, so I have chosen to just rip
out the optimization.  This may cause performance issues on applications
that use large arrays of data that are not in VBOs.

Signed-off-by: Ian Romanick 
Cc: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_draw_upload.c | 37 +
 1 file changed, 1 insertion(+), 36 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c 
b/src/mesa/drivers/dri/i965/brw_draw_upload.c
index 2ded14b..9d46fee 100644
--- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
@@ -403,8 +403,6 @@ static void brw_prepare_vertices(struct brw_context *brw)
struct intel_context *intel = intel_context(ctx);
/* CACHE_NEW_VS_PROG */
GLbitfield64 vs_inputs = brw->vs.prog_data->inputs_read;
-   const unsigned char *ptr = NULL;
-   GLuint interleaved = 0;
unsigned int min_index = brw->vb.min_index + brw->basevertex;
unsigned int max_index = brw->vb.max_index + brw->basevertex;
int delta, i, j;
@@ -498,19 +496,8 @@ static void brw_prepare_vertices(struct brw_context *brw)
  */
 assert(input->offset < brw->vb.buffers[input->buffer].bo->size);
   } else {
-/* Queue the buffer object up to be uploaded in the next pass,
- * when we've decided if we're doing interleaved or not.
+/* Queue the buffer object up to be uploaded in the next pass.
  */
-if (nr_uploads == 0) {
-   interleaved = glarray->StrideB;
-   ptr = glarray->Ptr;
-}
-else if (interleaved != glarray->StrideB ||
- (uintptr_t)(glarray->Ptr - ptr) > interleaved)
-{
-   interleaved = 0;
-}
-
 upload[nr_uploads++] = input;
   }
}
@@ -527,28 +514,6 @@ static void brw_prepare_vertices(struct brw_context *brw)
}
 
/* Handle any arrays to be uploaded. */
-   if (nr_uploads > 1) {
-  if (interleaved) {
-struct brw_vertex_buffer *buffer = &brw->vb.buffers[j];
-/* All uploads are interleaved, so upload the arrays together as
- * interleaved.  First, upload the contents and set up upload[0].
- */
-copy_array_to_vbo_array(brw, upload[0], min_index, max_index,
-buffer, interleaved);
-buffer->offset -= delta * interleaved;
-
-for (i = 0; i < nr_uploads; i++) {
-   /* Then, just point upload[i] at upload[0]'s buffer. */
-   upload[i]->offset =
-  ((const unsigned char *)upload[i]->glarray->Ptr - ptr);
-   upload[i]->buffer = j;
-}
-j++;
-
-nr_uploads = 0;
-  }
-   }
-   /* Upload non-interleaved arrays */
for (i = 0; i < nr_uploads; i++) {
   struct brw_vertex_buffer *buffer = &brw->vb.buffers[j];
   if (upload[i]->glarray->InstanceDivisor == 0) {
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] prog_parameter.c ASAN Patch

2013-06-19 Thread Myles C. Maxfield
Any word on this?

Thanks,
Myles


On Mon, Jun 17, 2013 at 12:09 PM, Myles C. Maxfield <
myles.maxfi...@gmail.com> wrote:

> Sure. I was under the impression that |size| couldn't be both greater than
> 4 and a non-multiple of 4, but I've reworked the patch to incorporate this
> and to be a little more straightforward.
>
> Is the only way to replace "ASAN" with "Address Sanitizer" to change the
> subject of this email thread?
>
> Anyway, here's a similar but modified patch:
>
> From: Myles C. Maxfield 
> Date: Mon, 17 Jun 2013 11:50:05 -0700
> Subject: [PATCH] Appeasing Address Sanitizer
>
> ---
>  src/mesa/program/prog_parameter.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/program/prog_parameter.c
> b/src/mesa/program/prog_parameter.c
> index 95b153e..1d46476 100644
> --- a/src/mesa/program/prog_parameter.c
> +++ b/src/mesa/program/prog_parameter.c
> @@ -155,7 +155,18 @@ _mesa_add_parameter(struct gl_program_parameter_list
> *paramList,
>   p->Size = size;
>   p->DataType = datatype;
>   if (values) {
> -COPY_4V(paramList->ParameterValues[oldNum + i], values);
> +if (size >= (i+1)*4) {
> +COPY_4V(paramList->ParameterValues[oldNum + i], values);
> +} else {
> +/* silence asan */
> +for (j = 0; j < 4; j++) {
> +if (i*4+j < size) {
> +paramList->ParameterValues[oldNum + i][j] =
> values[i*4+j];
> +} else {
> +paramList->ParameterValues[oldNum + i][j].f =
> 0.0f;
> +}
> +}
> +}
>  values += 4;
>  p->Initialized = GL_TRUE;
>   }
> --
> 1.7.12.4 (Apple Git-37)
>
>
> On Mon, Jun 17, 2013 at 8:13 AM, Brian Paul  wrote:
>
>> On 06/14/2013 05:12 PM, Myles C. Maxfield wrote:
>>
>>> Sorry for the triple post; I received a bounce email the first time and
>>> got sent to the spam folder the second time, so I'm trying a third time.
>>>
>>> Hello, all. I was running Mesa with Address Sanitizer [1] turned on, and
>>> found one place where ASAN pointed out a read-before-initialized problem.
>>> In particular, in _mesa_add_parameter, in prog_parameter.c, |values|
>>> represents an array holding a variable number of values. These values get
>>> copied out of the array 4 at a time with the COPY_4V macro, however, the
>>> array might only contain a single element. In this case, ASAN reports a
>>> read-before-initialize because the last 3 of the 4 elements haven't been
>>> written to yet. I was hoping to contribute a patch that will silence this
>>> problem that ASAN reports. I'm happy to incorporate any feedback anyone has
>>> into this patch.
>>>
>>> Thanks,
>>> Myles C. Maxfield
>>>
>>> [1]https://code.google.com/p/**address-sanitizer/
>>>
>>> diff --git a/src/mesa/program/prog_**parameter.c
>>> b/src/mesa/program/prog_**parameter.c
>>> index 2018fa5..63915fb 100644
>>> --- a/src/mesa/program/prog_**parameter.c
>>> +++ b/src/mesa/program/prog_**parameter.c
>>> @@ -158,7 +158,17 @@ _mesa_add_parameter(struct
>>> gl_program_parameter_list *paramList,
>>>p->DataType = datatype;
>>>p->Flags = flags;
>>>if (values) {
>>> -COPY_4V(paramList->**ParameterValues[oldNum + i], values);
>>> +if (size & 3) {
>>> +  for (j = 0; j < size; j++) {
>>> +paramList->ParameterValues[**oldNum + i][j] =
>>> values[j];
>>> +  }
>>> +  /* silence asan */
>>> +  for (j = size; j < 4; j++) {
>>> +paramList->ParameterValues[**oldNum + i][j].f = 0;
>>> +  }
>>> +} else {
>>> +  COPY_4V(paramList->**ParameterValues[oldNum + i],
>>> values);
>>> +}
>>>   values += 4;
>>>   p->Initialized = GL_TRUE;
>>>}
>>>
>>
>> The value of 'size' can actually be greater than 4 (IIRC, and the
>> function comment are still correct).  For example, for a matrix, size=16.
>>  So the first for-loop should be fixed, just to be safe.
>>
>> In the commit message, let's not use "ASAN" since it's not obvious that
>> it means Address Sanitizer.
>>
>> -Brian
>>
>>
>>
>>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev