Re: [Mesa-dev] [PATCH 02/15] mesa: Share common code between ARB_debug_output and KHR_debug functions

2013-09-05 Thread Ian Romanick
On 09/04/2013 04:46 PM, Timothy Arceri wrote:
 Since the calling functions all should alias, all of this should
 be removed too.
 
 The functions have to do different validation on types, so they are not
 exactly the same. If there is some way to signal that this should be done
 using an alias, can you please give me some advice on how to do it
 so I can learn?

The general guideline is that if an extension (or core GL version) only
adds functionality, the functions should alias.  In other words, alias
if the new thing is a proper superset of the old thing.

In this case, all of the functionality in the ARB_debug_output is
unchanged in KHR_debug.  Both

glDebugMessageControl(GL_DEBUG_SOURCE_THIRD_PARTY,
  GL_DONT_CARE, GL_DONT_CARE, 0,
  NULL, GL_FALSE);

and

glDebugMessageControlARB(GL_DEBUG_SOURCE_THIRD_PARTY_ARB,
 GL_DONT_CARE, GL_DONT_CARE, 0,
 NULL, GL_FALSE);

behave identically.

In addition, notice that GL_DEBUG_CALLBACK_FUNCTION_ARB is the same as
GL_DEBUG_CALLBACK_FUNCTION, and GL_DEBUG_CALLBACK_USER_PARAM_ARB is the
same as GL_DEBUG_CALLBACK_USER_PARAM.  If glDebugMessageCallbackARB and
glDebugMessageCallback are different, which state does
glGetPointerv(GL_DEBUG_CALLBACK_FUNCTION) return?
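
To make the state question concrete, here is a minimal C sketch. It is not Mesa's dispatch code: `debug_state`, `set_debug_callback`, `get_debug_callback`, and the two function-pointer names are hypothetical stand-ins. When both entry points resolve to one implementation there is only one piece of callback state, so a query has a single unambiguous answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type and context state; not Mesa's real structures. */
typedef void (*debug_callback_fn)(const char *message, const void *user_param);

struct debug_state {
   debug_callback_fn callback;  /* what GL_DEBUG_CALLBACK_FUNCTION would report */
   const void *user_param;      /* what GL_DEBUG_CALLBACK_USER_PARAM would report */
};

static struct debug_state state;

/* The single shared implementation. */
static void
set_debug_callback(debug_callback_fn callback, const void *user_param)
{
   state.callback = callback;
   state.user_param = user_param;
}

/* Both entry points alias the same implementation. */
static void (*const DebugMessageCallback)(debug_callback_fn, const void *) =
   set_debug_callback;
static void (*const DebugMessageCallbackARB)(debug_callback_fn, const void *) =
   set_debug_callback;

/* Query helper: one answer, whichever entry point stored it. */
static debug_callback_fn
get_debug_callback(void)
{
   return state.callback;
}

/* A do-nothing callback used only to have something to register. */
static void
example_callback(const char *message, const void *user_param)
{
   (void) message;
   (void) user_param;
}
```

If the two entry points instead kept separate state, the query helper would have to pick one of them arbitrarily, which is the ambiguity described above.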

It used to be easier in the days when specs still defined GLX protocol.
 If two functions had the same protocol opcode assigned, they had to alias.

We (I) have made mistakes in the other direction in the past.  In Mesa 9.1
and earlier, glBindFramebufferEXT and glBindFramebuffer alias, but they
should not.

Let's be cautious... let's poke at the AMD and NVIDIA drivers to see
what they do.  It should be easy enough to figure out if they use the
same implementation for the KHR and ARB functions.

There's a Khronos face-to-face next week, and I'll raise this as a bug.
 KHR_debug really should have listed interactions with ARB_debug_output.

 I think this entire patch should be removed.
 
 I'm trying to stay positive but it's hard when your emails and review
 have an obvious overtone of annoyance.

Several people reviewed the patch series, and yet a bunch of obvious
problems slipped through.  *That* is very annoying.

 The entire patch is not junk. For example the changes to the

Well... I don't think I said junk.  I don't think there's anything in
the series that is junk, per se.  There are a bunch of little things
that should have been noticed and fixed before the code landed.

 validation function are still valid as mentioned above however you
 want to call them, and the generic message_insert function is reused
 by the push/pop implementation. If answer my question from above I
 can create a new patch.

To me, this sounds like the patch does a bunch of different things...
and it should have been split into multiple patches.  Can I add that to
the list of things the reviewers didn't catch? :)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/21] glsl: Add missing type inference support for ARB_gpu_shader5 unops.

2013-09-05 Thread Kenneth Graunke

On 09/04/2013 07:11 PM, Matt Turner wrote:

On Wed, Sep 4, 2013 at 3:22 PM, Kenneth Graunke kenn...@whitecape.org wrote:

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
  src/glsl/ir.cpp | 4 
  1 file changed, 4 insertions(+)

diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
index e9317f8..4abadd8 100644
--- a/src/glsl/ir.cpp
+++ b/src/glsl/ir.cpp
@@ -250,6 +250,7 @@ ir_expression::ir_expression(int op, ir_rvalue *op0)
 case ir_unop_cos_reduced:
 case ir_unop_dFdx:
 case ir_unop_dFdy:
+   case ir_unop_bitfield_reverse:
this->type = op0->type;
break;

@@ -257,6 +258,9 @@ ir_expression::ir_expression(int op, ir_rvalue *op0)
 case ir_unop_b2i:
 case ir_unop_u2i:
 case ir_unop_bitcast_f2i:
+   case ir_unop_bit_count:
+   case ir_unop_find_msb:
+   case ir_unop_find_lsb:
this->type = glsl_type::get_instance(GLSL_TYPE_INT,
op0->type->vector_elements, 1);
break;
--
1.8.3.4


ir_binop_bfm, ir_triop_bfi, ir_triop_bitfield_extract, and
ir_quadop_bitfield_insert too?


Adding ir_binop_bfm makes sense.

The next patch adds triop support.

I didn't add an ir_expression constructor with type inference for 
quadops since I wasn't sure how to handle ir_quadop_vector (and it's the 
only other quadop).  But I guess it can be handled as:


glsl_type::get_instance(operands[0]->type->base_type,
(operands[0] != NULL) +
(operands[1] != NULL) +
(operands[2] != NULL) +
(operands[3] != NULL),
1);

Then ir_builder::bitfield_extract() could use it.
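
A small C sketch of the operand-counting idea, for illustration only. `count_operands` is a hypothetical helper, not part of the GLSL IR. Each comparison is parenthesized because `+` binds tighter than `!=` in C and C++, so an unparenthesized chain would not add up four booleans.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many of the four operand slots are in use.  The result is what
 * would be passed as the vector size for an ir_quadop_vector-style
 * expression in this sketch. */
static unsigned
count_operands(const void *const *operands)
{
   return (operands[0] != NULL) +
          (operands[1] != NULL) +
          (operands[2] != NULL) +
          (operands[3] != NULL);
}
```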

--Ken


Re: [Mesa-dev] [PATCH 00/15] Implement KHR_debug

2013-09-05 Thread Ian Romanick
On 09/04/2013 03:11 PM, Timothy Arceri wrote:

 Since you obviously didn't run 'make check', I will be reverting
 this entire series later today.
 
 YOU MUST RUN 'make check'.
 
 Ok, well what can I say? I didn't know (or I guess didn't check) that
 Mesa had a test suite. I will submit a patch to add a section to
 devinfo.html about submitting patches that includes this information and
 reminds people to use git send-email rather than attaching patches
 to emails; this seems to be the other common MISTAKE new contributors
 make.

That would be excellent.  You might also add the bit about using
--in-reply-to for updated versions of patches.


Re: [Mesa-dev] [PATCH 10/15] mesa: Implement KHR_debug ObjectLabel functions

2013-09-05 Thread Ian Romanick
On 09/04/2013 05:43 PM, Timothy Arceri wrote:
 +/**
 
 + * Helper for _mesa_ObjectLabel() and _mesa_ObjectPtrLabel().
 + */
 +static void
 +set_label(struct gl_context *ctx, char **labelPtr, const char *label,
 +  int length, const char *caller)
 +{
 +   if (*labelPtr) {
 +  /* free old label string */
 +  free(*labelPtr);
 +   }
 +
 +   if (label) {
 +   /* set new label string */
 +
 +  if (length >= 0) {

 This should be > 0.  malloc(0) is not portable.

 Shouldn't there also be a MAX_LABEL_LENGTH test for this patch?
 
 I think you are reviewing an old version of the patch. New version has that 
 test.

Yeah, I noticed that there were some v2 and v3 patches on the list.  If
you're using git-send-email, you can use --in-reply-to to make v2 and v3
patches show up as replies to the originals.  This usually works well
when you're sending out updates to individual patches (as opposed to
re-sending the whole series).  Not everyone does this, but it does make
it harder for reviewers to accidentally review old patches.

 + /* explicit length */
 + *labelPtr = (char *) malloc(length);
 + if (*labelPtr) {
 +memcpy(*labelPtr, label, length);
 + }
 +  }
 +  else {
 + /* null-terminated string */
 + int len = strlen(label);
 + if (len >= MAX_LABEL_LENGTH) {
 
 The reason MAX_LABEL_LENGTH exists is so that you can have a fixed-size
 array in your structure (so you don't have to malloc a buffer).  Either
 make a fixed size buffer, or make MAX_LABEL_LENGTH be the maximum size
 representable in a GLsizei (and eliminate this check).
 
 Ok, makes sense. However, the check is still valid; it's in the spec. We
 shouldn't just truncate the string. I know for sure that the AMD driver
 does the same thing. I have posted extensive tests for the objectlabel
 code on the piglit list.

Which behavior are you saying AMD has?  Generate the error or truncate the
string?  Any idea what NVIDIA does?  If both AMD and NVIDIA do the same
thing that's different from what the spec says, I typically submit a
spec bug.
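
For reference, a minimal C sketch of the "generate the error" behavior. This is not the code from the patch: `set_label_checked` and `enum label_result` are hypothetical, and 256 is assumed for MAX_LABEL_LENGTH because that is the minimum mentioned in this thread. It rejects over-long labels instead of truncating them, and it always stores a NUL-terminated copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LABEL_LENGTH 256   /* assumed: the spec minimum discussed here */

enum label_result { LABEL_OK, LABEL_INVALID_VALUE, LABEL_OUT_OF_MEMORY };

/* Replace *label_ptr with a copy of label.
 * length < 0 means label is NUL-terminated; otherwise length is explicit.
 * A NULL label clears the stored label. */
static enum label_result
set_label_checked(char **label_ptr, const char *label, int length)
{
   size_t len;
   char *copy;

   if (label == NULL) {
      free(*label_ptr);
      *label_ptr = NULL;
      return LABEL_OK;
   }

   len = (length < 0) ? strlen(label) : (size_t) length;

   /* Reject, rather than truncate, labels that are too long. */
   if (len >= MAX_LABEL_LENGTH)
      return LABEL_INVALID_VALUE;

   copy = malloc(len + 1);      /* never malloc(0) */
   if (copy == NULL)
      return LABEL_OUT_OF_MEMORY;
   memcpy(copy, label, len);
   copy[len] = '\0';

   free(*label_ptr);
   *label_ptr = copy;
   return LABEL_OK;
}
```

Keeping the old label when validation fails is a choice made for this sketch; the thread does not settle what should happen to the previous label on error.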

Assuming that at least one of them follows the spec, I was suggesting we
do one of two things:

A. Set MAX_LABEL_LENGTH to the minimum required by the spec, and include
some method to statically allocate a buffer of that size for every
object.  I looked at patch 7, and there are a couple of ways to do that.
 I suspect that AMD puts 'char Label[0];' at the end of their data
structures.  If it's a debug context, they just allocate an extra 256
bytes for each of their structures.

B. Set MAX_LABEL_LENGTH so large, such as MAX_INT (0x7fffffff), that it
is impossible to have a value that is larger (due to the limited range
of GLsizei).  Eliminate the checks for length > MAX_LABEL_LENGTH.

After working through the mental exercise of A, I'm not very excited
about it.  It saves a pointer in every data structure in non-debug
contexts, but I'm not sure it's worth it.  Hmm... since the patch that
landed includes the missing check, it's probably not worth B either.  We
haven't done any intense cache analysis of Mesa data structures to even
know if having that extra pointer in the middle of the structure will
cause performance problems.  Let's avoid the evil of premature
optimization and leave the code as-is.
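
For anyone curious what option A could look like, here is a rough C sketch. Everything in it is hypothetical (`struct example_object` and both helpers are invented for illustration), it uses a C99 flexible array member rather than the `char Label[0]` spelling, and it says nothing about what AMD actually does. Label storage is appended to the object only when a debug context asks for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LABEL_LENGTH 256   /* assumed: the spec minimum discussed here */

/* Hypothetical object; label storage exists only if it was allocated. */
struct example_object {
   unsigned name;
   bool has_label_storage;
   char label[];               /* C99 flexible array member */
};

/* Allocate the object, with trailing label storage only for debug contexts. */
static struct example_object *
example_object_create(unsigned name, bool debug_context)
{
   size_t extra = debug_context ? MAX_LABEL_LENGTH : 0;
   struct example_object *obj = calloc(1, sizeof(*obj) + extra);

   if (obj != NULL) {
      obj->name = name;
      obj->has_label_storage = debug_context;
   }
   return obj;
}

/* Returns false if there is no storage or the label does not fit. */
static bool
example_object_set_label(struct example_object *obj, const char *label)
{
   size_t len = strlen(label);

   if (!obj->has_label_storage || len >= MAX_LABEL_LENGTH)
      return false;
   memcpy(obj->label, label, len + 1);   /* includes the NUL terminator */
   return true;
}
```

The trade-off is the one described above: non-debug objects carry no label pointer at all, at the cost of a special allocation path for every labelable type.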

 +/* An INVALID_VALUE error is generated if the number of characters
 + * in label, excluding the null terminator when length is
 + * negative, is not less than the value of MAX_LABEL_LENGTH.
 + */
 +_mesa_error(ctx, GL_INVALID_VALUE,
 +"%s(length=%d, which is not less than "
 +"GL_MAX_LABEL_LENGTH=%d)", caller, length,
 +MAX_LABEL_LENGTH);
 +return;
 + }
 + *labelPtr = _mesa_strdup(label);
 +  }
 +   }
 +}
 +
 +/**
 + * Helper for _mesa_GetObjectLabel() and _mesa_GetObjectPtrLabel().
 + */
 +static void
 +copy_label(char **labelPtr, char *label, int *length, int bufSize)
 +{
 +   int labelLen = 0;
 +
 +   if (*labelPtr)
 +  labelLen = strlen(*labelPtr);
 +
 +   if (label) {

 There should be a spec quote here explaining why this value is returned.
 Other places in OpenGL include the NUL terminator in the length.

/* The KHR_debug spec says:
 *
 * The string representation of the message is stored in
 * message and its length (excluding the null-terminator)
 * is stored in length
 */
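
A short C sketch of the "length excludes the terminator" convention, under stated assumptions: `copy_label_sketch` is a hypothetical helper, not the function from the patch, and reporting the full label length when no output buffer is given is how I read the spec rather than something confirmed in this thread.

```c
#include <assert.h>
#include <string.h>

/* Copy src into label (capacity buf_size bytes) and report in *length the
 * number of characters written, not counting the NUL terminator.
 * A NULL src is treated as an empty label.  If label is NULL, nothing is
 * written and *length receives the full label length. */
static void
copy_label_sketch(const char *src, char *label, int *length, int buf_size)
{
   int label_len = (src != NULL) ? (int) strlen(src) : 0;

   if (label != NULL && buf_size > 0) {
      if (label_len > buf_size - 1)
         label_len = buf_size - 1;      /* leave room for the terminator */
      if (label_len > 0)
         memcpy(label, src, (size_t) label_len);
      label[label_len] = '\0';
   }

   if (length != NULL)
      *length = label_len;
}
```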
 
 ok
 
 So are you going to revert the patchset or do I create a fixup patchset?

I don't think reverting a bunch of stuff is going to help at this
point... and it will probably discourage you even further.  That also
won't help.

After replying to your e-mail about patch 2, I think any significant
work should wait until there is a conclusion from Khronos.  It would be
truly awful to re-work the patches only to find that everyone else
treats the KHR and ARB entry points 

Re: [Mesa-dev] [PATCH 14/21] glsl: Add a new ir_builder::dotlike() function.

2013-09-05 Thread Kenneth Graunke

On 09/04/2013 07:11 PM, Matt Turner wrote:

On Wed, Sep 4, 2013 at 3:22 PM, Kenneth Graunke kenn...@whitecape.org wrote:

dotlike() uses ir_binop_mul for scalars, and ir_binop_dot for vectors.

When generating built-in functions, we often want to use regular
multiply for scalar signatures, and dot() for vector signatures.
ir_binop_dot only works on vectors, so we have to switch opcodes,
even if the code is otherwise identical.  dotlike() makes this easy.


Why not just make dot() do this?


I wasn't sure how I felt about dot() returning something other than 
ir_binop_dot.  But it kind of makes sense, so if people would prefer 
that, I'm fine with doing that instead.
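
The opcode choice being discussed boils down to something like the following C sketch. The enum and `dotlike_opcode` are hypothetical names standing in for ir_binop_mul, ir_binop_dot, and the builder helper; the real helper would build an expression rather than return an opcode.

```c
#include <assert.h>

/* Hypothetical opcode names standing in for ir_binop_mul / ir_binop_dot. */
enum example_opcode { EXAMPLE_OP_MUL, EXAMPLE_OP_DOT };

/* Pick the opcode a dotlike()-style helper would emit for operands with the
 * given number of vector components: plain multiply for scalars, dot
 * product for vectors. */
static enum example_opcode
dotlike_opcode(unsigned vector_elements)
{
   return (vector_elements == 1) ? EXAMPLE_OP_MUL : EXAMPLE_OP_DOT;
}
```

Whether this lives in a separate dotlike() or inside dot() itself only changes the name callers use; the selection logic would be the same.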


--Ken


Re: [Mesa-dev] [PATCH 10/15] mesa: Implement KHR_debug ObjectLabel functions

2013-09-05 Thread Timothy Arceri
On Wed, 2013-09-04 at 20:09 -0700, Ian Romanick wrote:
 On 09/04/2013 05:43 PM, Timothy Arceri wrote:
  +/**
  
  + * Helper for _mesa_ObjectLabel() and _mesa_ObjectPtrLabel().
  + */
  +static void
  +set_label(struct gl_context *ctx, char **labelPtr, const char *label,
  +  int length, const char *caller)
  +{
  +   if (*labelPtr) {
  +  /* free old label string */
  +  free(*labelPtr);
  +   }
  +
  +   if (label) {
  +   /* set new label string */
  +
  +  if (length >= 0) {
 
  This should be > 0.  malloc(0) is not portable.
 
  Shouldn't there also be a MAX_LABEL_LENGTH test for this patch?
  
  I think you are reviewing an old version of the patch. New version has that 
  test.
 
 Yeah, I noticed that there were some v2 and v3 patches on the list.  If
 you're using git-send-email, you can use --in-reply-to to make v2 and v3
 patches show up as replies to the originals.  This usually works well
 when you're sending out updates to individual patches (as opposed to
 re-sending the whole series).  Not everyone does this, but it does make
 it harder for reviewers to accidentally review old patches.

ok thanks I was wondering how you guys manage to keep track of
everything.

 
  + /* explicit length */
  + *labelPtr = (char *) malloc(length);
  + if (*labelPtr) {
  +memcpy(*labelPtr, label, length);
  + }
  +  }
  +  else {
  + /* null-terminated string */
  + int len = strlen(label);
  + if (len >= MAX_LABEL_LENGTH) {
  
  The reason MAX_LABEL_LENGTH exists is so that you can have a fixed-size
  array in your structure (so you don't have to malloc a buffer).  Either
  make a fixed size buffer, or make MAX_LABEL_LENGTH be the maximum size
  representable in a GLsizei (and eliminate this check).
  
  Ok, makes sense. However, the check is still valid; it's in the spec. We
  shouldn't just truncate the string. I know for sure that the AMD driver
  does the same thing. I have posted extensive tests for the objectlabel
  code on the piglit list.
 
 Which behavior are you saying AMD has?  Generate the error or truncate the
 string?  Any idea what NVIDIA does?  If both AMD and NVIDIA do the same
 thing that's different from what the spec says, I typically submit a
 spec bug.

The AMD driver generates the error. I don't have an NVIDIA card, so I have no
idea what they do. If someone with an NVIDIA card feels like testing this,
the ObjectLabel piglit tests cover both cases of this error
message:
http://lists.freedesktop.org/archives/piglit/2013-August/007139.html

 
 Assuming that at least one of them follows the spec, I was suggesting we
 do one of two things:
 
 A. Set MAX_LABEL_LENGTH to the minimum required by the spec, and include
 some method to statically allocate a buffer of that size for every
 object.  I looked at patch 7, and there are a couple of ways to do that.
  I suspect that AMD puts 'char Label[0];' at the end of their data
 structures.  If it's a debug context, they just allocate an extra 256
 bytes for each of their structures.
 
 B. Set MAX_LABEL_LENGTH so large, such as MAX_INT (0x7fffffff), that it
 is impossible to have a value that is larger (due to the limited range
 of GLsizei).  Eliminate the checks for length > MAX_LABEL_LENGTH.
 
 After working through the mental exercise of A, I'm not very excited
 about it.  It saves a pointer in every data structure in non-debug
 contexts, but I'm not sure it's worth it.  Hmm... since the patch that
 landed includes the missing check, it's probably not worth B either.  We
 haven't done any intense cache analysis of Mesa data structures to even
 know if having that extra pointer in the middle of the structure will
 cause performance problems.  Let's avoid the evil of premature
 optimization and leave the code as-is.
 
  +/* An INVALID_VALUE error is generated if the number of characters
  + * in label, excluding the null terminator when length is
  + * negative, is not less than the value of MAX_LABEL_LENGTH.
  + */
  +_mesa_error(ctx, GL_INVALID_VALUE,
  +"%s(length=%d, which is not less than "
  +"GL_MAX_LABEL_LENGTH=%d)", caller, length,
  +MAX_LABEL_LENGTH);
  +return;
  + }
  + *labelPtr = _mesa_strdup(label);
  +  }
  +   }
  +}
  +
  +/**
  + * Helper for _mesa_GetObjectLabel() and _mesa_GetObjectPtrLabel().
  + */
  +static void
  +copy_label(char **labelPtr, char *label, int *length, int bufSize)
  +{
  +   int labelLen = 0;
  +
  +   if (*labelPtr)
  +  labelLen = strlen(*labelPtr);
  +
  +   if (label) {
 
  There should be a spec quote here explaining why this value is returned.
  Other places in OpenGL include the NUL terminator in the length.
 
 /* The KHR_debug spec says:
  *
  * The string representation of the message is stored in
  * message and its length (excluding 

[Mesa-dev] [PATCH] i965/gen7: always lower textureGrad() on gen7

2013-09-05 Thread Chia-I Wu
sample_d is slower than the lowered version on gen7.  For gen7, this improves
Xonotic benchmark with Ultimate effects by as much as 25%:

 before the change:  40.06 fps
 after the change:   51.10 fps
 after the change with INTEL_DEBUG=no16: 44.46 fps

As sample_d is not allowed in SIMD16 mode, I first thought the difference
was from SIMD8 versus SIMD16.  If that was the case, we would want to apply
brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode.

But, as the numbers show, there is still a 10% improvement when SIMD16 is forced
off after the change.  Thus textureGrad() is lowered unconditionally for now.
Because of this, and because I haven't tried it on Haswell, this is still an RFC.
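
The lowering decision made by the diff that follows can be summarized in a few lines of C. This is a sketch only: `needs_txd_lowering` is a hypothetical free function, whereas the patch implements the same conditions inside the visitor using `has_sample_d` and `has_sample_d_c`.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether textureGrad() should be lowered to an explicit-LOD lookup.
 * gen and is_haswell mirror the fields read from brw_context in the patch;
 * is_shadow and is_cube describe the sampler being used. */
static bool
needs_txd_lowering(int gen, bool is_haswell, bool is_shadow, bool is_cube)
{
   bool has_sample_d = gen != 7;                  /* treated as missing on gen7 */
   bool has_sample_d_c = gen >= 8 || is_haswell;

   if (is_shadow)
      return is_cube || !has_sample_d_c;

   return !has_sample_d;
}
```

Note that Haswell reports gen 7 in this scheme, so non-shadow lookups are lowered there as well, which is the untested case mentioned above.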

No piglit regressions.

Signed-off-by: Chia-I Wu olva...@gmail.com
---
 .../dri/i965/brw_lower_texture_gradients.cpp   | 54 ++
 1 file changed, 36 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
index 1589a20..f3fcb56 100644
--- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
+++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
@@ -34,8 +34,8 @@ using namespace ir_builder;
 
 class lower_texture_grad_visitor : public ir_hierarchical_visitor {
 public:
-   lower_texture_grad_visitor(bool has_sample_d_c)
-  : has_sample_d_c(has_sample_d_c)
+   lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c)
+  : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c)
{
   progress = false;
}
@@ -44,6 +44,7 @@ public:
 
 
bool progress;
+   bool has_sample_d;
bool has_sample_d_c;
 
 private:
@@ -90,22 +91,33 @@ txs_type(const glsl_type *type)
 ir_visitor_status
 lower_texture_grad_visitor::visit_leave(ir_texture *ir)
 {
-   /* Only lower textureGrad with shadow samplers */
-   if (ir->op != ir_txd || !ir->shadow_comparitor)
+   if (ir->op != ir_txd)
   return visit_continue;
 
-   /* Lower textureGrad() with samplerCubeShadow even if we have the sample_d_c
-* message.  GLSL provides gradients for the 'r' coordinate.  Unfortunately:
-*
-* From the Ivybridge PRM, Volume 4, Part 1, sample_d message description:
-* The r coordinate contains the faceid, and the r gradients are ignored
-*  by hardware.
-*
-* We likely need to do a similar treatment for samplerCube and
-* samplerCubeArray, but we have insufficient testing for that at the moment.
-*/
-   bool need_lowering = !has_sample_d_c ||
-  ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
+   bool need_lowering = false;
+
+   if (ir-shadow_comparitor) {
+  /* Lower textureGrad() with samplerCubeShadow even if we have the
+   * sample_d_c message.  GLSL provides gradients for the 'r' coordinate.
+   * Unfortunately:
+   *
+   * From the Ivybridge PRM, Volume 4, Part 1, sample_d message
+   * description: The r coordinate contains the faceid, and the r
+   * gradients are ignored by hardware.
+   */
+  if (ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE)
+ need_lowering = true;
+  else if (!has_sample_d_c)
+ need_lowering = true;
+   }
+   else {
+  /* We likely need to do a similar treatment for samplerCube and
+   * samplerCubeArray, but we have insufficient testing for that at the
+   * moment.
+   */
+  if (!has_sample_d)
+ need_lowering = true;
+   }
 
if (!need_lowering)
   return visit_continue;
@@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture *ir)
   expr(ir_unop_sqrt, dot(dPdy, dPdy)));
}
 
-   /* lambda_base = log2(rho).  We're ignoring GL state biases for now. */
+   /* lambda_base = log2(rho).  It will be biased and clamped by values
+* defined in SAMPLER_STATE to get the final lambda.
+*/
ir->op = ir_txl;
ir->lod_info.lod = expr(ir_unop_log2, rho);
 
@@ -168,8 +182,12 @@ bool
 brw_lower_texture_gradients(struct brw_context *brw,
 struct exec_list *instructions)
 {
+   /* sample_d is slower than the lowered version on gen7, and is not allowed
+* in SIMD16 mode.  Treating it as unsupported improves the performance.
+*/
+   bool has_sample_d = brw->gen != 7;
bool has_sample_d_c = brw->gen >= 8 || brw->is_haswell;
-   lower_texture_grad_visitor v(has_sample_d_c);
+   lower_texture_grad_visitor v(has_sample_d, has_sample_d_c);
 
visit_list_elements(&v, instructions);
 
-- 
1.8.3.1



[Mesa-dev] [PATCH] docs: Add some notes on submitting patches

2013-09-05 Thread Timothy Arceri

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 docs/devinfo.html |   23 +++
 1 file changed, 23 insertions(+)

diff --git a/docs/devinfo.html b/docs/devinfo.html
index 4c1099c..d921e0d 100644
--- a/docs/devinfo.html
+++ b/docs/devinfo.html
@@ -155,6 +155,29 @@ of <tt>bool</tt>, <tt>true</tt>, and
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp can serve as examples.
 </p>
 
+<h2>Submitting patches</h2>
+
+<p>
+You should always run the Mesa Testsuite before submitting patches.
+The Testsuite can be run using the 'make check' command. All test
+must pass before patches will be accepted, this may mean you have
+to update the tests themselves.
+</p>
+
+<p>
+Patches should be sent to the Mesa mailing list for review.
+When submitting a patch make sure to use git send-email rather than attaching
+patches to emails. Sending patches as attachments prevents people from being
+able to provide in-line review comments.
+</p>
+
+<p>
+When submitting follow-up patches you can use --in-reply-to to make v2, v3,
+etc patches show up as replies to the originals. This usually works well
+when you're sending out updates to individual patches (as opposed to
+re-sending the whole series). Using --in-reply-to makes
+it harder for reviewers to accidentally review old patches.
+</p>
 
 <h2>Marking a commit as a candidate for a stable branch</h2>
 
-- 
1.7.9.5



Re: [Mesa-dev] [PATCH] i965/gen7: always lower textureGrad() on gen7

2013-09-05 Thread Chris Forbes
A possible explanation for the perf change is that Xonotic uses
anisotropic filtering at this quality level. Lowering to txl defeats
it.

It would be worth doing an image quality comparison before and after the change.
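
For context, the lowered path reduces the two gradient vectors to a single isotropic level of detail. Going by the sqrt/dot/log2 expressions visible in the quoted patch, and assuming rho is the larger of the two gradient lengths (the max itself is not shown in the quoted hunk, and the partial-derivative notation is mine), the computation is roughly:

```latex
\rho = \max\left(
         \sqrt{\frac{\partial P}{\partial x} \cdot \frac{\partial P}{\partial x}},\;
         \sqrt{\frac{\partial P}{\partial y} \cdot \frac{\partial P}{\partial y}}
       \right),
\qquad
\lambda_{\text{base}} = \log_2 \rho
```

Only the scalar lambda reaches the sampler through txl, so the ratio between the two gradient lengths, which anisotropic filtering would need, is no longer available to the hardware.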

-- Chris

On Thu, Sep 5, 2013 at 8:35 PM, Chia-I Wu olva...@gmail.com wrote:
 sample_d is slower than the lowered version on gen7.  For gen7, this improves
 Xonotic benchmark with Ultimate effects by as much as 25%:

  before the change:  40.06 fps
  after the change:   51.10 fps
  after the change with INTEL_DEBUG=no16: 44.46 fps

 As sample_d is not allowed in SIMD16 mode, I first thought the difference
 was from SIMD8 versus SIMD16.  If that was the case, we would want to apply
 brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode.

 But, as the numbers show, there is still a 10% improvement when SIMD16 is forced
 off after the change.  Thus textureGrad() is lowered unconditionally for now.
 Because of this, and because I haven't tried it on Haswell, this is still an RFC.

 No piglit regressions.

 Signed-off-by: Chia-I Wu olva...@gmail.com
 ---
  .../dri/i965/brw_lower_texture_gradients.cpp   | 54 ++
  1 file changed, 36 insertions(+), 18 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
 index 1589a20..f3fcb56 100644
 --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
 @@ -34,8 +34,8 @@ using namespace ir_builder;

  class lower_texture_grad_visitor : public ir_hierarchical_visitor {
  public:
 -   lower_texture_grad_visitor(bool has_sample_d_c)
 -  : has_sample_d_c(has_sample_d_c)
 +   lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c)
 +  : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c)
 {
progress = false;
 }
 @@ -44,6 +44,7 @@ public:


 bool progress;
 +   bool has_sample_d;
 bool has_sample_d_c;

  private:
 @@ -90,22 +91,33 @@ txs_type(const glsl_type *type)
  ir_visitor_status
  lower_texture_grad_visitor::visit_leave(ir_texture *ir)
  {
 -   /* Only lower textureGrad with shadow samplers */
 -   if (ir->op != ir_txd || !ir->shadow_comparitor)
 +   if (ir->op != ir_txd)
return visit_continue;

 -   /* Lower textureGrad() with samplerCubeShadow even if we have the sample_d_c
 -* message.  GLSL provides gradients for the 'r' coordinate.  Unfortunately:
 -*
 -* From the Ivybridge PRM, Volume 4, Part 1, sample_d message description:
 -* The r coordinate contains the faceid, and the r gradients are ignored
 -*  by hardware.
 -*
 -* We likely need to do a similar treatment for samplerCube and
 -* samplerCubeArray, but we have insufficient testing for that at the moment.
 -*/
 -   bool need_lowering = !has_sample_d_c ||
 -  ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
 +   bool need_lowering = false;
 +
 +   if (ir-shadow_comparitor) {
 +  /* Lower textureGrad() with samplerCubeShadow even if we have the
 +   * sample_d_c message.  GLSL provides gradients for the 'r' coordinate.
 +   * Unfortunately:
 +   *
 +   * From the Ivybridge PRM, Volume 4, Part 1, sample_d message
 +   * description: The r coordinate contains the faceid, and the r
 +   * gradients are ignored by hardware.
 +   */
 +  if (ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE)
 + need_lowering = true;
 +  else if (!has_sample_d_c)
 + need_lowering = true;
 +   }
 +   else {
 +  /* We likely need to do a similar treatment for samplerCube and
 +   * samplerCubeArray, but we have insufficient testing for that at the
 +   * moment.
 +   */
 +  if (!has_sample_d)
 + need_lowering = true;
 +   }

 if (!need_lowering)
return visit_continue;
 @@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture *ir)
expr(ir_unop_sqrt, dot(dPdy, dPdy)));
 }

 -   /* lambda_base = log2(rho).  We're ignoring GL state biases for now. */
 +   /* lambda_base = log2(rho).  It will be biased and clamped by values
 +* defined in SAMPLER_STATE to get the final lambda.
 +*/
 ir->op = ir_txl;
 ir->lod_info.lod = expr(ir_unop_log2, rho);

 @@ -168,8 +182,12 @@ bool
  brw_lower_texture_gradients(struct brw_context *brw,
  struct exec_list *instructions)
  {
 +   /* sample_d is slower than the lowered version on gen7, and is not allowed
 +* in SIMD16 mode.  Treating it as unsupported improves the performance.
 +*/
 +   bool has_sample_d = brw->gen != 7;
 bool has_sample_d_c = brw->gen >= 8 || brw->is_haswell;
 -   lower_texture_grad_visitor v(has_sample_d_c);
 +   lower_texture_grad_visitor v(has_sample_d, has_sample_d_c);

 visit_list_elements(&v, instructions);

 --
 

Re: [Mesa-dev] [PATCH 10/11] radeonsi: implement streamout shader support

2013-09-05 Thread Michel Dänzer
On Tue, 2013-09-03 at 15:23 +0200, Marek Olšák wrote:
 The shader is responsible for writing to streamout buffers using
 the TBUFFER_STORE_FORMAT_* instructions.
 
 The locations of some input SGPRs and VGPRs are assigned dynamically, because
 the input SGPRs controlling streamout are not declared if they are not needed,
 decreasing the indices of all following inputs.
 ---
  src/gallium/drivers/radeonsi/radeonsi_shader.c | 279 -
  src/gallium/drivers/radeonsi/radeonsi_shader.h |   5 +-
  src/gallium/drivers/radeonsi/si_state_draw.c   |   7 +-
  3 files changed, 276 insertions(+), 15 deletions(-)
 
 diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c b/src/gallium/drivers/radeonsi/radeonsi_shader.c
 index 335cd79..92f7cf5 100644
 --- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
 +++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c

[...]

 + /* Pack the output. */
 + LLVMValueRef vdata;
 +
 + switch (num_comps) {
 + case 1: /* as i32 */
 + vdata = out[0];
 + break;
 + case 2: /* as v2i32 */
 + case 3: /* as v4i32 (aligned to 4) */
 + case 4: /* as v4i32 */
 + vdata = LLVMGetUndef(LLVMVectorType(i32, util_next_power_of_two(num_comps)));
 + for (j = 0; j < num_comps; j++) {
 + vdata = LLVMBuildInsertElement(builder, vdata, out[j],
 + LLVMConstInt(i32, j, 0), "");
 + }
 + break;
 + }

This introduces a warning:

.../radeonsi_shader.c: In function 'si_llvm_emit_epilogue':
.../radeonsi_shader.c:708:15: warning: 'vdata' may be used uninitialized in this function [-Wmaybe-uninitialized]
  LLVMValueRef args[] = {
   ^
.../radeonsi_shader.c:840:17: note: 'vdata' was declared here
LLVMValueRef vdata;
 ^

Other than that, the series looks good to me.
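
The warning comes from the switch having no branch for values outside 1..4, so the compiler sees a path where the variable is never assigned. A minimal C sketch of one common way to address that is below. `packed_lanes` is a hypothetical stand-in that returns a lane count instead of building an LLVM value, and this is not necessarily the fix that ends up in the series.

```c
#include <assert.h>

/* Number of 32-bit lanes used to pack num_comps components, or 0 if
 * num_comps is out of range. */
static unsigned
packed_lanes(unsigned num_comps)
{
   unsigned lanes;

   switch (num_comps) {
   case 1: /* as i32 */
      lanes = 1;
      break;
   case 2: /* as v2i32 */
      lanes = 2;
      break;
   case 3: /* as v4i32 (aligned to 4) */
   case 4: /* as v4i32 */
      lanes = 4;
      break;
   default:
      /* Without this branch, 'lanes' could be read uninitialized. */
      lanes = 0;
      break;
   }

   return lanes;
}
```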


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer



[Mesa-dev] [PATCH 1/2] mesa: fix coding style of case statement

2013-09-05 Thread Timothy Arceri

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 src/mesa/main/objectlabel.c |  124 +--
 1 file changed, 62 insertions(+), 62 deletions(-)

diff --git a/src/mesa/main/objectlabel.c b/src/mesa/main/objectlabel.c
index 90d9e09..7e39c92 100644
--- a/src/mesa/main/objectlabel.c
+++ b/src/mesa/main/objectlabel.c
@@ -117,78 +117,78 @@ get_label_pointer(struct gl_context *ctx, GLenum identifier, GLuint name,
char **labelPtr = NULL;
 
switch (identifier) {
-   case GL_BUFFER:
-  {
- struct gl_buffer_object *bufObj = _mesa_lookup_bufferobj(ctx, name);
- if (bufObj)
-labelPtr = &bufObj->Label;
-  }
+   case GL_BUFFER: {
+  struct gl_buffer_object *bufObj = _mesa_lookup_bufferobj(ctx, name);
+  if (bufObj)
+ labelPtr = &bufObj->Label;
   break;
-   case GL_SHADER:
-  {
- struct gl_shader *shader = _mesa_lookup_shader(ctx, name);
- if (shader)
-labelPtr = &shader->Label;
-  }
+   }
+
+   case GL_SHADER: {
+  struct gl_shader *shader = _mesa_lookup_shader(ctx, name);
+  if (shader)
+ labelPtr = &shader->Label;
   break;
-   case GL_PROGRAM:
-  {
- struct gl_shader_program *program =
-_mesa_lookup_shader_program(ctx, name);
- if (program)
-labelPtr = &program->Label;
-  }
+   }
+
+   case GL_PROGRAM: {
+  struct gl_shader_program *program =
+ _mesa_lookup_shader_program(ctx, name);
+  if (program)
+ labelPtr = &program->Label;
   break;
-   case GL_VERTEX_ARRAY:
-  {
- struct gl_array_object *obj = _mesa_lookup_arrayobj(ctx, name);
- if (obj)
-labelPtr = &obj->Label;
-  }
+   }
+
+   case GL_VERTEX_ARRAY: {
+  struct gl_array_object *obj = _mesa_lookup_arrayobj(ctx, name);
+  if (obj)
+ labelPtr = &obj->Label;
   break;
-   case GL_QUERY:
-  {
- struct gl_query_object *query = _mesa_lookup_query_object(ctx, name);
- if (query)
-labelPtr = &query->Label;
-  }
+   }
+
+   case GL_QUERY: {
+  struct gl_query_object *query = _mesa_lookup_query_object(ctx, name);
+  if (query)
+ labelPtr = &query->Label;
   break;
-   case GL_TRANSFORM_FEEDBACK:
-  {
- struct gl_transform_feedback_object *tfo =
-_mesa_lookup_transform_feedback_object(ctx, name);
- if (tfo)
-labelPtr = &tfo->Label;
-  }
+   }
+
+   case GL_TRANSFORM_FEEDBACK: {
+  struct gl_transform_feedback_object *tfo =
+ _mesa_lookup_transform_feedback_object(ctx, name);
+  if (tfo)
+ labelPtr = &tfo->Label;
   break;
-   case GL_SAMPLER:
-  {
- struct gl_sampler_object *so = _mesa_lookup_samplerobj(ctx, name);
- if (so)
-labelPtr = &so->Label;
-  }
+   }
+
+   case GL_SAMPLER: {
+  struct gl_sampler_object *so = _mesa_lookup_samplerobj(ctx, name);
+  if (so)
+ labelPtr = &so->Label;
   break;
-   case GL_TEXTURE:
-  {
- struct gl_texture_object *texObj = _mesa_lookup_texture(ctx, name);
- if (texObj)
-labelPtr = &texObj->Label;
-  }
+   }
+
+   case GL_TEXTURE: {
+  struct gl_texture_object *texObj = _mesa_lookup_texture(ctx, name);
+  if (texObj)
+ labelPtr = &texObj->Label;
   break;
-   case GL_RENDERBUFFER:
-  {
- struct gl_renderbuffer *rb = _mesa_lookup_renderbuffer(ctx, name);
- if (rb)
-labelPtr = rb->Label;
-  }
+   }
+
+   case GL_RENDERBUFFER: {
+  struct gl_renderbuffer *rb = _mesa_lookup_renderbuffer(ctx, name);
+  if (rb)
+ labelPtr = rb->Label;
   break;
-   case GL_FRAMEBUFFER:
-  {
- struct gl_framebuffer *rb = _mesa_lookup_framebuffer(ctx, name);
- if (rb)
-labelPtr = rb->Label;
-  }
+   }
+
+   case GL_FRAMEBUFFER: {
+  struct gl_framebuffer *rb = _mesa_lookup_framebuffer(ctx, name);
+  if (rb)
+ labelPtr = rb->Label;
   break;
+   }
+
case GL_DISPLAY_LIST:
   if (ctx->API == API_OPENGL_COMPAT) {
  struct gl_display_list *list = _mesa_lookup_list(ctx, name);
-- 
1.7.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] mesa: add spec quote explaining why value is returned

2013-09-05 Thread Timothy Arceri
Add this comment because other places in OpenGL include the NULL terminator in 
the length.

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 src/mesa/main/objectlabel.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/main/objectlabel.c b/src/mesa/main/objectlabel.c
index 7e39c92..0942f72 100644
--- a/src/mesa/main/objectlabel.c
+++ b/src/mesa/main/objectlabel.c
@@ -92,6 +92,11 @@ copy_label(char **labelPtr, char *label, int *length, int bufSize)
 {
int labelLen = 0;
 
+   /* The KHR_debug spec says:
+    *
+    *     "The actual number of characters written into <label>,
+    *     excluding the null terminator, is returned in <length>."
+    */
if (*labelPtr)
   labelLen = strlen(*labelPtr);
 
-- 
1.7.9.5
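
The length convention quoted in the comment above can be illustrated with a minimal sketch. This is not Mesa's copy_label(); copy_label_sketch() and its parameters are hypothetical names, and the handling of a NULL destination (report the full length) is an assumption made for the example. The point is only that the returned count excludes the null terminator even though bufSize includes room for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Mesa's copy_label(): copies src into dst
 * (capacity buf_size bytes, terminator included) and returns the number
 * of characters written, excluding the null terminator.
 */
static int
copy_label_sketch(const char *src, char *dst, int buf_size)
{
   int len = src ? (int) strlen(src) : 0;

   assert(buf_size >= 0);

   if (!dst)
      return len;              /* query only: report the full label length */

   if (buf_size == 0)
      return 0;                /* no room at all, nothing is written */

   if (len > buf_size - 1)
      len = buf_size - 1;      /* reserve one byte for the terminator */

   if (len > 0)
      memcpy(dst, src, (size_t) len);
   dst[len] = '\0';

   return len;
}
```

With a 4-byte buffer and the label "shader", the sketch writes "sha" plus the terminator and returns 3, not 4.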



Re: [Mesa-dev] i915: crash in test, only OpenGL 1.4, on Windows OpenGL 2

2013-09-05 Thread Ville Syrjälä
On Thu, Sep 05, 2013 at 03:03:33AM +0200, Ondrej Riha wrote:
 Hello,
 
 I have i915: Atom N455 and have problem with OpenGL ES 2.0 test. On Windows I 
 have OpenGL 2, but on Linux I have only OpenGL 1.4 and therefore OpenGL ES 
 2.0 test crashed. For more info see:
 
 https://bugs.launchpad.net/glmark2/+bug/1220783

You've managed to mail the entirely wrong person for this question. I'm Cc:ing
the Mesa list as they should be able to give you an accurate answer.

But I do believe there was a relatively recent change to make the i915
Mesa driver always advertise OpenGL 2.1 support for gen3 hardware. So maybe
Mesa 9.2ish or so?

I think your hardware is a pineview (can't really check w/o the pci id
though), and that should be gen3 AFAIK.

-- 
Ville Syrjälä
Intel OTC


Re: [Mesa-dev] [PATCH 18/21] glsl: Write a new built-in function module.

2013-09-05 Thread Pohjolainen, Topi
On Wed, Sep 04, 2013 at 03:22:41PM -0700, Kenneth Graunke wrote:
 This creates a new replacement for the existing built-in function code.
 The new module lives in builtin_functions.cpp (not builtin_function.cpp)
 and exists in parallel with the existing system.  It isn't used yet.
 
 The new built-in function code takes a significantly different approach:
 
 Instead of implementing built-ins via printed IR, build time scripts,
 and run time parsing, we now implement them directly in C++, using
 ir_builder.  This translates to faster load times, and a much less
 complex build system.
 
 It also takes a different approach to built-in availability: each
 signature now stores a boolean predicate, which makes it easy to
 construct arbitrary expressions based on _mesa_glsl_parse_state's
 fields.  This is much more flexible than the old system, and also
 easier to use.
 
 Built-ins are also now stored in a single gl_shader object, rather
 than being spread out across a number of shaders that need to be linked.
 When searching for a matching prototype, we simply consult the
 availability predicate.  This also simplifies the code.
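
 The predicate approach described above can be sketched in plain C. This is
 only an illustration under stated assumptions: parse_state_sketch stands in
 for the handful of _mesa_glsl_parse_state fields a predicate would read,
 the table entries and find_builtin() are hypothetical, and the real module
 is C++ built on ir_builder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parse state fields used by predicates. */
struct parse_state_sketch {
   bool is_vertex_shader;
   unsigned language_version;
   bool es_shader;
};

typedef bool (*availability_predicate)(const struct parse_state_sketch *);

static bool
always_available(const struct parse_state_sketch *state)
{
   (void) state;
   return true;
}

/* Assumed condition: desktop GLSL vertex shaders up to version 1.30. */
static bool
legacy_vs_only(const struct parse_state_sketch *state)
{
   return state->is_vertex_shader &&
          state->language_version <= 130 &&
          !state->es_shader;
}

/* Each signature carries its own predicate (illustrative entries only). */
struct builtin_signature_sketch {
   const char *name;
   availability_predicate is_available;
};

static const struct builtin_signature_sketch builtins[] = {
   { "sin",        always_available },
   { "ftransform", legacy_vs_only },
};

/* Lookup consults the predicate instead of linking per-version shaders. */
static const struct builtin_signature_sketch *
find_builtin(const struct parse_state_sketch *state, const char *name)
{
   assert(state != NULL && name != NULL);

   for (size_t i = 0; i < sizeof(builtins) / sizeof(builtins[0]); i++) {
      if (strcmp(builtins[i].name, name) == 0 &&
          builtins[i].is_available(state))
         return &builtins[i];
   }
   return NULL;
}
```

 Because availability is an ordinary function of the parse state, a new
 condition (an extension flag, for instance) only needs a new predicate
 rather than another set of built-in shaders.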
 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/glsl/Makefile.sources  |1 +
  src/glsl/builtin_functions.cpp | 3466 
 
  src/glsl/ir.h  |   10 +
  3 files changed, 3477 insertions(+)
  create mode 100644 src/glsl/builtin_functions.cpp
 
 diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
 index 979c416..3e706ef 100644
 --- a/src/glsl/Makefile.sources
 +++ b/src/glsl/Makefile.sources
 @@ -21,6 +21,7 @@ LIBGLSL_FILES = \
   $(GLSL_SRCDIR)/ast_function.cpp \
   $(GLSL_SRCDIR)/ast_to_hir.cpp \
   $(GLSL_SRCDIR)/ast_type.cpp \
 + $(GLSL_SRCDIR)/builtin_functions.cpp \
   $(GLSL_SRCDIR)/builtin_types.cpp \
   $(GLSL_SRCDIR)/builtin_variables.cpp \
   $(GLSL_SRCDIR)/glsl_parser_extras.cpp \
 diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
 new file mode 100644
 index 000..440ec41
 --- /dev/null
 +++ b/src/glsl/builtin_functions.cpp
 @@ -0,0 +1,3466 @@
 +/*
 + * Copyright © 2013 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the "Software"),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 + * DEALINGS IN THE SOFTWARE.
 + */
 +
 +/**
 + * \file builtin_functions.cpp
 + *
 + * Support for GLSL built-in functions.
 + *
 + * This file is split into several main components:
 + *
 + * 1. Availability predicates
 + *
 + *A series of small functions that check whether the current shader
 + *supports the version/extensions required to expose a built-in.
 + *
 + * 2. Core builtin_builder class functionality
 + *
 + * 3. Lists of built-in functions
 + *
 + *The builtin_builder::create_builtins() function contains lists of all
 + *built-in function signatures, where they're available, what types they
 + *take, and so on.
 + *
 + * 4. Implementations of built-in function signatures
 + *
 + *A series of functions which create ir_function_signatures and emit IR
 + *via ir_builder to implement them.
 + *
 + * 5. External API
 + *
 + *A few functions the rest of the compiler can use to interact with the
 + *built-in function module.  For example, searching for a built-in by
 + *name and parameters.
 + */
 +
 +#include <stdarg.h>
 +#include <stdio.h>
 +#include "main/core.h" /* for struct gl_shader */
 +#include "ir_builder.h"
 +#include "glsl_parser_extras.h"
 +#include "program/prog_instruction.h"
 +#include <limits>
 +
 +using namespace ir_builder;
 +
 +/**
 + * Availability predicates:
 + *  @{
 + */
 +static bool
 +always_available(const _mesa_glsl_parse_state *state)
 +{
 +   return true;
 +}
 +
 +static bool
 +legacy_vs_only(const _mesa_glsl_parse_state *state)
 +{
 +   return state->target == vertex_shader &&
 +          state->language_version <= 130 &&
 +          !state->es_shader;
 +}
 +
 +static bool
 +fs_only(const 

Re: [Mesa-dev] [PATCH 18/21] glsl: Write a new built-in function module.

2013-09-05 Thread Pohjolainen, Topi
On Thu, Sep 05, 2013 at 02:27:04PM +0300, Pohjolainen, Topi wrote:
 On Wed, Sep 04, 2013 at 03:22:41PM -0700, Kenneth Graunke wrote:
  This creates a new replacement for the existing built-in function code.
  The new module lives in builtin_functions.cpp (not builtin_function.cpp)
  and exists in parallel with the existing system.  It isn't used yet.
  
  The new built-in function code takes a significantly different approach:
  
  Instead of implementing built-ins via printed IR, build time scripts,
  and run time parsing, we now implement them directly in C++, using
  ir_builder.  This translates to faster load times, and a much less
  complex build system.
  
  It also takes a different approach to built-in availability: each
  signature now stores a boolean predicate, which makes it easy to
  construct arbitrary expressions based on _mesa_glsl_parse_state's
  fields.  This is much more flexible than the old system, and also
  easier to use.
  
  Built-ins are also now stored in a single gl_shader object, rather
  than being spread out across a number of shaders that need to be linked.
  When searching for a matching prototype, we simply consult the
  availability predicate.  This also simplifies the code.
  
  Signed-off-by: Kenneth Graunke kenn...@whitecape.org
  ---
   src/glsl/Makefile.sources  |1 +
   src/glsl/builtin_functions.cpp | 3466 
  
   src/glsl/ir.h  |   10 +
   3 files changed, 3477 insertions(+)
   create mode 100644 src/glsl/builtin_functions.cpp
  
  diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
  index 979c416..3e706ef 100644
  --- a/src/glsl/Makefile.sources
  +++ b/src/glsl/Makefile.sources
  @@ -21,6 +21,7 @@ LIBGLSL_FILES = \
  $(GLSL_SRCDIR)/ast_function.cpp \
  $(GLSL_SRCDIR)/ast_to_hir.cpp \
  $(GLSL_SRCDIR)/ast_type.cpp \
  +   $(GLSL_SRCDIR)/builtin_functions.cpp \
  $(GLSL_SRCDIR)/builtin_types.cpp \
  $(GLSL_SRCDIR)/builtin_variables.cpp \
  $(GLSL_SRCDIR)/glsl_parser_extras.cpp \
  diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
  new file mode 100644
  index 000..440ec41
  --- /dev/null
  +++ b/src/glsl/builtin_functions.cpp
  @@ -0,0 +1,3466 @@
  +/*
  + * Copyright © 2013 Intel Corporation
  + *
  + * Permission is hereby granted, free of charge, to any person obtaining a
  + * copy of this software and associated documentation files (the "Software"),
  + * to deal in the Software without restriction, including without limitation
  + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
  + * and/or sell copies of the Software, and to permit persons to whom the
  + * Software is furnished to do so, subject to the following conditions:
  + *
  + * The above copyright notice and this permission notice (including the next
  + * paragraph) shall be included in all copies or substantial portions of the
  + * Software.
  + *
  + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
  + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
  + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
  + * DEALINGS IN THE SOFTWARE.
  + */
  +
  +/**
  + * \file builtin_functions.cpp
  + *
  + * Support for GLSL built-in functions.
  + *
  + * This file is split into several main components:
  + *
  + * 1. Availability predicates
  + *
  + *A series of small functions that check whether the current shader
  + *supports the version/extensions required to expose a built-in.
  + *
  + * 2. Core builtin_builder class functionality
  + *
  + * 3. Lists of built-in functions
  + *
  + *The builtin_builder::create_builtins() function contains lists of all
  + *built-in function signatures, where they're available, what types they
  + *take, and so on.
  + *
  + * 4. Implementations of built-in function signatures
  + *
  + *A series of functions which create ir_function_signatures and emit IR
  + *via ir_builder to implement them.
  + *
  + * 5. External API
  + *
  + *A few functions the rest of the compiler can use to interact with the
  + *built-in function module.  For example, searching for a built-in by
  + *name and parameters.
  + */
  +
  +#include <stdarg.h>
  +#include <stdio.h>
  +#include "main/core.h" /* for struct gl_shader */
  +#include "ir_builder.h"
  +#include "glsl_parser_extras.h"
  +#include "program/prog_instruction.h"
  +#include <limits>
  +
  +using namespace ir_builder;
  +
  +/**
  + * Availability predicates:
  + *  @{
  + */
  +static bool
  +always_available(const _mesa_glsl_parse_state *state)
  +{
  +   return true;
  +}
  +
  +static bool
  

Re: [Mesa-dev] [PATCH 10/11] radeonsi: implement streamout shader support

2013-09-05 Thread Marek Olšák
The warning is wrong and my gcc 4.7.3 doesn't show it. This code at
the beginning of the block where vdata is declared ensures vdata is
always initialized:

if (!num_comps || num_comps > 4)
continue;

You might be seeing one of these bugs:

http://gcc.gnu.org/bugzilla/buglist.cgi?quicksearch=may%20be%20uninitialized

That said, I'm gonna set vdata to NULL anyway.
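
The pattern under discussion can be reduced to a small stand-alone sketch. The names below (total_packed_width, width) are hypothetical and unrelated to the radeonsi code; the sketch only shows how an early continue restricts the switch to values that every path handles, and how an explicit initializer quiets the compiler anyway.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of the guard-then-switch pattern: sums the
 * padded width chosen for each entry with 1 to 4 components.
 */
static unsigned
total_packed_width(const unsigned *num_comps, size_t count)
{
   unsigned total = 0;

   for (size_t i = 0; i < count; i++) {
      /* Guard: only values 1..4 reach the switch below. */
      if (!num_comps[i] || num_comps[i] > 4)
         continue;

      /* Initialized only to silence -Wmaybe-uninitialized; the guard
       * already guarantees that one of the cases assigns it.
       */
      unsigned width = 0;

      switch (num_comps[i]) {
      case 1:                  /* scalar */
         width = 1;
         break;
      case 2:                  /* 2-vector */
         width = 2;
         break;
      case 3:                  /* padded to 4 */
      case 4:
         width = 4;
         break;
      }

      assert(width != 0);
      total += width;
   }

   return total;
}
```

The compiler cannot always connect the guard to the switch, which is why the initializer is cheap insurance even when the warning is a false positive.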

Marek

On Thu, Sep 5, 2013 at 11:38 AM, Michel Dänzer daen...@debian.org wrote:
 On Tue, 2013-09-03 at 15:23 +0200, Marek Olšák wrote:
 The shader is responsible for writing to streamout buffers using
 the TBUFFER_STORE_FORMAT_* instructions.

 The locations of some input SGPRs and VGPRs are assigned dynamically, because
 the input SGPRs controlling streamout are not declared if they are not needed,
 decreasing the indices of all following inputs.
 ---
  src/gallium/drivers/radeonsi/radeonsi_shader.c | 279 
 -
  src/gallium/drivers/radeonsi/radeonsi_shader.h |   5 +-
  src/gallium/drivers/radeonsi/si_state_draw.c   |   7 +-
  3 files changed, 276 insertions(+), 15 deletions(-)

 diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c b/src/gallium/drivers/radeonsi/radeonsi_shader.c
 index 335cd79..92f7cf5 100644
 --- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
 +++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c

 [...]

 + /* Pack the output. */
 + LLVMValueRef vdata;
 +
 + switch (num_comps) {
 + case 1: /* as i32 */
 + vdata = out[0];
 + break;
 + case 2: /* as v2i32 */
 + case 3: /* as v4i32 (aligned to 4) */
 + case 4: /* as v4i32 */
 + vdata = LLVMGetUndef(LLVMVectorType(i32, util_next_power_of_two(num_comps)));
 + for (j = 0; j < num_comps; j++) {
 + vdata = LLVMBuildInsertElement(builder, vdata, out[j],
 +LLVMConstInt(i32, j, 0), "");
 + }
 + break;
 + }

 This introduces a warning:

 .../radeonsi_shader.c: In function 'si_llvm_emit_epilogue':
 .../radeonsi_shader.c:708:15: warning: 'vdata' may be used uninitialized in this function [-Wmaybe-uninitialized]
   LLVMValueRef args[] = {
^
 .../radeonsi_shader.c:840:17: note: 'vdata' was declared here
 LLVMValueRef vdata;
  ^

 Other than that, the series looks good to me.


 --
 Earthling Michel Dänzer   |   http://www.amd.com
 Libre software enthusiast |  Debian, X and DRI developer


Re: [Mesa-dev] [PATCH] i965/gen7: always lower textureGrad() on gen7

2013-09-05 Thread Chia-I Wu
On Thu, Sep 5, 2013 at 5:12 PM, Chris Forbes chr...@ijw.co.nz wrote:
 A possible explanation for the perf change is that Xonotic uses
 anisotropic filtering at this quality level. Lowering to txl defeats
 it.
I had a look at that.  gl_sampler-MaxAnisotropy is never greater than
1.0 in gen7_update_sampler_state() so there is no anisotropic
filtering in this case.

It makes sense to me that avoiding punting to SIMD8 helps the
performance.  But it is not clear to me why a 10% performance change
can still be observed when INTEL_DEBUG=no16 is specified.  A
reasonable explanation is that the image quality is degraded in some
way, which is why I am still nervous about the change.

An alternative approach to avoid punting seems to be to emulate SIMD16
sample_d with two SIMD8 sample_d messages.  It will take longer to
implement given my limited familiarity with the code, and may be less
performant.  But that would allow things like anisotropic filtering to
be honored.


 It would be worth doing an image quality comparison before and after the 
 change.
Yeah, that is worth doing.  I will do that.


 -- Chris

 On Thu, Sep 5, 2013 at 8:35 PM, Chia-I Wu olva...@gmail.com wrote:
 sample_d is slower than the lowered version on gen7.  For gen7, this improves
 Xonotic benchmark with Ultimate effects by as much as 25%:

  before the change:  40.06 fps
  after the change:   51.10 fps
  after the change with INTEL_DEBUG=no16: 44.46 fps

 As sample_d is not allowed in SIMD16 mode, I first thought the difference
 was from SIMD8 versus SIMD16.  If that were the case, we would want to apply
 brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode.

 But, as the numbers show, there is still a 10% improvement when SIMD16 is
 forced off after the change.  Thus textureGrad() is lowered unconditionally
 for now.  Because of this, and because I haven't tried it on Haswell, this
 is still an RFC.

 No piglit regressions.

 Signed-off-by: Chia-I Wu olva...@gmail.com
 ---
  .../dri/i965/brw_lower_texture_gradients.cpp   | 54 
 ++
  1 file changed, 36 insertions(+), 18 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
 index 1589a20..f3fcb56 100644
 --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
 @@ -34,8 +34,8 @@ using namespace ir_builder;

  class lower_texture_grad_visitor : public ir_hierarchical_visitor {
  public:
 -   lower_texture_grad_visitor(bool has_sample_d_c)
 -  : has_sample_d_c(has_sample_d_c)
 +   lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c)
 +  : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c)
 {
progress = false;
 }
 @@ -44,6 +44,7 @@ public:


 bool progress;
 +   bool has_sample_d;
 bool has_sample_d_c;

  private:
 @@ -90,22 +91,33 @@ txs_type(const glsl_type *type)
  ir_visitor_status
  lower_texture_grad_visitor::visit_leave(ir_texture *ir)
  {
 -   /* Only lower textureGrad with shadow samplers */
 -   if (ir->op != ir_txd || !ir->shadow_comparitor)
 +   if (ir->op != ir_txd)
return visit_continue;

 -   /* Lower textureGrad() with samplerCubeShadow even if we have the sample_d_c
 -* message.  GLSL provides gradients for the 'r' coordinate.  Unfortunately:
 -*
 -* From the Ivybridge PRM, Volume 4, Part 1, sample_d message description:
 -* "The r coordinate contains the faceid, and the r gradients are ignored
 -*  by hardware."
 -*
 -* We likely need to do a similar treatment for samplerCube and
 -* samplerCubeArray, but we have insufficient testing for that at the moment.
 -*/
 -   bool need_lowering = !has_sample_d_c ||
 -  ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
 +   bool need_lowering = false;
 +
 +   if (ir->shadow_comparitor) {
 +  /* Lower textureGrad() with samplerCubeShadow even if we have the
 +   * sample_d_c message.  GLSL provides gradients for the 'r' coordinate.
 +   * Unfortunately:
 +   *
 +   * From the Ivybridge PRM, Volume 4, Part 1, sample_d message
 +   * description: "The r coordinate contains the faceid, and the r
 +   * gradients are ignored by hardware."
 +   */
 +  if (ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE)
 + need_lowering = true;
 +  else if (!has_sample_d_c)
 + need_lowering = true;
 +   }
 +   else {
 +  /* We likely need to do a similar treatment for samplerCube and
 +   * samplerCubeArray, but we have insufficient testing for that at the
 +   * moment.
 +   */
 +  if (!has_sample_d)
 + need_lowering = true;
 +   }

 if (!need_lowering)
return visit_continue;
 @@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture *ir)
expr(ir_unop_sqrt, dot(dPdy, dPdy)));

Re: [Mesa-dev] [PATCH] Fix for throwing BadDrawable (invalid Pixmap or Window parameter) by Xserver

2013-09-05 Thread Maniak, Lukasz
Original explanatory text:

Fix for throwing BadDrawable (invalid Pixmap or Window parameter)
by Xserver when calling XCreatePixmap or fill_bitmap from
DRI_glXUseXFont.

currentDrawable which came from __glXGetCurrentContext was not correct in
this case, we use DefaultRootWindow(dpy) instead.

To reproduce this error you just have to call this:

dpy = (machine.display)->handle();
fontStruct = XLoadQueryFont(dpy, "fixed");
int firstGlyph = fontStruct->min_char_or_byte2;
int dispListBase = 1;
glXUseXFont(fontStruct->fid, firstGlyph, 1, dispListBase);

And you will get this:

X Error of failed request: BadDrawable (invalid Pixmap or Window parameter)
Major opcode of failed request: 53 (X_CreatePixmap)

Feedback from CC reporters would be really helpful.

-Original Message-
From: Ian Romanick [mailto:i...@freedesktop.org] 
Sent: Wednesday, September 04, 2013 7:09 PM
To: Maniak, Lukasz
Cc: mesa-dev@lists.freedesktop.org; Alexander Monakov; djee...@gmail.com
Subject: Re: [Mesa-dev] [PATCH] Fix for throwing BadDrawable (invalid Pixmap or 
Window parameter) by Xserver

The explanatory text from the mangled version of the patch should be included 
here as well.

Can you give some more explanation of how this can occur?  I'd like to have a 
piglit test.  Looking at the documentation for glXUseXFont, there needs to be a 
valid context bound, and there is some commentary about invalid windows:

GLXBadCurrentWindow is generated if the drawable associated
with the current context of the calling thread is a window,
and that window is no longer valid.

Also, this seems related to

https://bugs.freedesktop.org/show_bug.cgi?id=56922

and possibly

https://bugs.freedesktop.org/show_bug.cgi?id=54080

Perhaps the reporters of those bugs (added to CC) can comment on whether this 
change fixes their bugs.

On 09/04/2013 09:40 AM, Lukasz Maniak wrote:
 Signed-off-by: Lukasz Maniak lukasz.man...@intel.com
 ---
  src/glx/xfont.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)
 
 diff --git a/src/glx/xfont.c b/src/glx/xfont.c
 index 316c585..60e28ab 100644
 --- a/src/glx/xfont.c
 +++ b/src/glx/xfont.c
 @@ -215,7 +215,6 @@ _X_HIDDEN void
  DRI_glXUseXFont(struct glx_context *CC, Font font, int first, int count, int listbase)
  {
 Display *dpy;
 -   Window win;
 Pixmap pixmap;
 GC gc;
 XGCValues values;
 @@ -231,7 +230,6 @@ DRI_glXUseXFont(struct glx_context *CC, Font font, int 
 first, int count, int lis
 int i;
  
    dpy = CC->currentDpy;
 -   win = CC->currentDrawable;
  
 fs = XQueryFont(dpy, font);
 if (!fs) {
 @@ -279,7 +277,7 @@ DRI_glXUseXFont(struct glx_context *CC, Font font, int 
 first, int count, int lis
 glPixelStorei(GL_UNPACK_SKIP_PIXELS, 0);
 glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
  
 -   pixmap = XCreatePixmap(dpy, win, 10, 10, 1);
 +   pixmap = XCreatePixmap(dpy, DefaultRootWindow(dpy), 10, 10, 1);
 values.foreground = BlackPixel(dpy, DefaultScreen(dpy));
 values.background = WhitePixel(dpy, DefaultScreen(dpy));
    values.font = fs->fid;
 @@ -342,7 +340,7 @@ DRI_glXUseXFont(struct glx_context *CC, Font font, int 
 first, int count, int lis
   if (valid && (bm_width > 0) && (bm_height > 0)) {
  
   memset(bm, '\0', bm_width * bm_height);
 - fill_bitmap(dpy, win, gc, bm_width, bm_height, x, y, c, bm);
 + fill_bitmap(dpy, DefaultRootWindow(dpy), gc, bm_width, 
 + bm_height, x, y, c, bm);
  
   glBitmap(width, height, x0, y0, dx, dy, bm);
  #ifdef DEBUG
 




Re: [Mesa-dev] [PATCH] Fix for throwing BadDrawable (invalid Pixmap or Window parameter) by Xserver

2013-09-05 Thread Alexander Monakov
On Thu, Sep 5, 2013 at 6:25 PM, Maniak, Lukasz lukasz.man...@intel.com wrote:
 Original explanatory text:

 Fix for throwing BadDrawable (invalid Pixmap or Window parameter)
 by Xserver when calling XCreatePixmap or fill_bitmap from
 DRI_glXUseXFont.

 currentDrawable which came from __glXGetCurrentContext was not correct in
 this case, we use DefaultRootWindow(dpy) instead.

 To reproduce this error you just have to call this:

 dpy = (machine.display)->handle();
 fontStruct = XLoadQueryFont(dpy, "fixed");
 int firstGlyph = fontStruct->min_char_or_byte2;
 int dispListBase = 1;
 glXUseXFont(fontStruct->fid, firstGlyph, 1, dispListBase);

But it doesn't make sense to call glXUseXFont without a current context.

 And you will get this:

 X Error of failed request: BadDrawable (invalid Pixmap or Window parameter)
 Major opcode of failed request: 53 (X_CreatePixmap)

 Feedback from CC reporters would be really helpful.

Ian added me as the reporter of bug 54080, but that issue does not
reference glXUseXFont at all.

Alexander


[Mesa-dev] [PATCH 1/2] gallium: comment that INSTANCEID doesn't include start_instance

2013-09-05 Thread Marek Olšák
---
 src/gallium/include/pipe/p_shader_tokens.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index 872dfe9..1beec05 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -153,7 +153,7 @@ struct tgsi_declaration_interp
 #define TGSI_SEMANTIC_FACE   7
 #define TGSI_SEMANTIC_EDGEFLAG   8
 #define TGSI_SEMANTIC_PRIMID 9
-#define TGSI_SEMANTIC_INSTANCEID 10
+#define TGSI_SEMANTIC_INSTANCEID 10 /** doesn't include start_instance */
 #define TGSI_SEMANTIC_VERTEXID   11
 #define TGSI_SEMANTIC_STENCIL12
 #define TGSI_SEMANTIC_CLIPDIST   13
-- 
1.8.1.2



[Mesa-dev] [PATCH 2/2] radeonsi: fix gl_InstanceID with non-zero start_instance

2013-09-05 Thread Marek Olšák
start_instance doesn't affect gl_InstanceID.

There's no piglit test, but it's kinda obvious the code was wrong.
---
 src/gallium/drivers/radeonsi/radeonsi_shader.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 80dd773..867a385 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -122,7 +122,7 @@ static LLVMValueRef build_indexed_load(
return result;
 }
 
-static LLVMValueRef get_instance_index(
+static LLVMValueRef get_instance_index_for_fetch(
struct radeon_llvm_context * radeon_bld,
unsigned divisor)
 {
@@ -174,7 +174,7 @@ static void declare_input_vs(
if (divisor) {
/* Build index from instance ID, start instance and divisor */
 si_shader_ctx->shader->shader.uses_instanceid = true;
-   buffer_index = get_instance_index(&si_shader_ctx->radeon_bld, divisor);
+   buffer_index = get_instance_index_for_fetch(&si_shader_ctx->radeon_bld, divisor);
} else {
/* Load the buffer index, which is always stored in VGPR0
 * for Vertex Shaders */
@@ -414,7 +414,8 @@ static void declare_system_value(
 
 switch (decl->Semantic.Name) {
 case TGSI_SEMANTIC_INSTANCEID:
-   value = get_instance_index(radeon_bld, 1);
+   value = LLVMGetParam(radeon_bld->main_fn,
+                        si_shader_ctx->param_instance_id);
break;
 
case TGSI_SEMANTIC_VERTEXID:
-- 
1.8.1.2
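
As a rough illustration of the distinction these two patches draw, here is a minimal standalone sketch (it is not the radeonsi code). It assumes the usual OpenGL base-instance rule: an instanced attribute with a non-zero divisor is fetched at start_instance + instance / divisor, while gl_InstanceID is the bare instance counter. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: element index used to fetch an instanced vertex
 * attribute.  Assumes divisor != 0 (a zero divisor means the attribute
 * is per-vertex and is indexed by the vertex ID instead). */
static uint32_t instanced_fetch_index(uint32_t instance_id,
                                      uint32_t start_instance,
                                      uint32_t divisor)
{
    return start_instance + instance_id / divisor;
}

/* Hypothetical helper: value the shader reads as gl_InstanceID.
 * start_instance is deliberately ignored. */
static uint32_t shader_instance_id(uint32_t instance_id,
                                   uint32_t start_instance)
{
    (void)start_instance;
    return instance_id;
}
```

With instance 5, start_instance 10 and divisor 2, the fetch index is 12 while the shader still sees 5.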



Re: [Mesa-dev] [PATCH] i965/gen7: always lower textureGrad() on gen7

2013-09-05 Thread Roland Scheidegger
Hmm, I don't think the math works out here, which may explain
why it's faster.
I believe the derivatives need to be transformed to the cube coordinate
system, and I don't see that being done here (I haven't yet figured out
the math to do this with reasonable effort for llvmpipe).
OTOH, you could simplify the rho calculation a bit: the sqrt can be
done after the max, so only one sqrt is needed instead of two
(though if your hw has a blazing-fast sqrt it won't matter...).

Roland


On 05.09.2013 10:35, Chia-I Wu wrote:
 sample_d is slower than the lowered version on gen7.  For gen7, this improves
 Xonotic benchmark with Ultimate effects by as much as 25%:
 
  before the change:  40.06 fps
  after the change:   51.10 fps
  after the change with INTEL_DEBUG=no16: 44.46 fps
 
 As sample_d is not allowed in SIMD16 mode, I first thought the difference
 was from SIMD8 versus SIMD16.  If that was the case, we would want to apply
 brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode.
 
 But, as the numbers show, there is still a 10% improvement when SIMD16 is
 forced off after the change.  Thus textureGrad() is lowered unconditionally
 for now.  Because of this, and because I haven't tried it on Haswell, this
 is still an RFC.
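
For reference, the relative gains implied by the three fps figures above can be recomputed with a few lines of C. This is only a worked-arithmetic sketch; the helper name is made up.

```c
/* Relative improvement of after_fps over before_fps, in percent. */
static double percent_gain(double before_fps, double after_fps)
{
    return (after_fps - before_fps) / before_fps * 100.0;
}
```

Plugging in 40.06 -> 51.10 fps gives roughly 27.6%, and 40.06 -> 44.46 fps (SIMD16 forced off) gives roughly 11.0%, in line with the rounded figures quoted in the message.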
 
 No piglit regressions.
 
 Signed-off-by: Chia-I Wu olva...@gmail.com
 ---
  .../dri/i965/brw_lower_texture_gradients.cpp   | 54 
 ++
  1 file changed, 36 insertions(+), 18 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp 
 b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
 index 1589a20..f3fcb56 100644
 --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp
 @@ -34,8 +34,8 @@ using namespace ir_builder;
  
  class lower_texture_grad_visitor : public ir_hierarchical_visitor {
  public:
 -   lower_texture_grad_visitor(bool has_sample_d_c)
 -  : has_sample_d_c(has_sample_d_c)
 +   lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c)
 +  : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c)
 {
progress = false;
 }
 @@ -44,6 +44,7 @@ public:
  
  
 bool progress;
 +   bool has_sample_d;
 bool has_sample_d_c;
  
  private:
 @@ -90,22 +91,33 @@ txs_type(const glsl_type *type)
  ir_visitor_status
  lower_texture_grad_visitor::visit_leave(ir_texture *ir)
  {
 -   /* Only lower textureGrad with shadow samplers */
 -   if (ir->op != ir_txd || !ir->shadow_comparitor)
 +   if (ir->op != ir_txd)
        return visit_continue;
  
 -   /* Lower textureGrad() with samplerCubeShadow even if we have the sample_d_c
 -    * message.  GLSL provides gradients for the 'r' coordinate.  Unfortunately:
 -    *
 -    * From the Ivybridge PRM, Volume 4, Part 1, sample_d message description:
 -    * The r coordinate contains the faceid, and the r gradients are ignored
 -    *  by hardware.
 -    *
 -    * We likely need to do a similar treatment for samplerCube and
 -    * samplerCubeArray, but we have insufficient testing for that at the moment.
 -    */
 -   bool need_lowering = !has_sample_d_c ||
 -      ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE;
 +   bool need_lowering = false;
 +
 +   if (ir->shadow_comparitor) {
 +      /* Lower textureGrad() with samplerCubeShadow even if we have the
 +       * sample_d_c message.  GLSL provides gradients for the 'r' coordinate.
 +       * Unfortunately:
 +       *
 +       * From the Ivybridge PRM, Volume 4, Part 1, sample_d message
 +       * description: The r coordinate contains the faceid, and the r
 +       * gradients are ignored by hardware.
 +       */
 +      if (ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE)
 + need_lowering = true;
 +  else if (!has_sample_d_c)
 + need_lowering = true;
 +   }
 +   else {
 +  /* We likely need to do a similar treatment for samplerCube and
 +   * samplerCubeArray, but we have insufficient testing for that at the
 +   * moment.
 +   */
 +  if (!has_sample_d)
 + need_lowering = true;
 +   }
  
 if (!need_lowering)
return visit_continue;
 @@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture *ir)
  expr(ir_unop_sqrt, dot(dPdy, dPdy)));
 }
  
 -   /* lambda_base = log2(rho).  We're ignoring GL state biases for now. */
 +   /* lambda_base = log2(rho).  It will be biased and clamped by values
 +* defined in SAMPLER_STATE to get the final lambda.
 +*/
     ir->op = ir_txl;
     ir->lod_info.lod = expr(ir_unop_log2, rho);
  
 @@ -168,8 +182,12 @@ bool
  brw_lower_texture_gradients(struct brw_context *brw,
  struct exec_list *instructions)
  {
 +   /* sample_d is slower than the lowered version on gen7, and is not allowed
 +* in SIMD16 mode.  Treating it as unsupported 

Re: [Mesa-dev] [PATCH 1/2] gallium: comment that INSTANCEID doesn't include start_instance

2013-09-05 Thread Christian König
I was aware that this was only correct in one use case, but didn't know 
which one. Probably should have added a comment about that.


Thanks for fixing it, both patches are: Reviewed-by: Christian König 
christian.koe...@amd.com


On 05.09.2013 16:41, Marek Olšák wrote:

---
  src/gallium/include/pipe/p_shader_tokens.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/include/pipe/p_shader_tokens.h 
b/src/gallium/include/pipe/p_shader_tokens.h
index 872dfe9..1beec05 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -153,7 +153,7 @@ struct tgsi_declaration_interp
  #define TGSI_SEMANTIC_FACE   7
  #define TGSI_SEMANTIC_EDGEFLAG   8
  #define TGSI_SEMANTIC_PRIMID 9
-#define TGSI_SEMANTIC_INSTANCEID 10
+#define TGSI_SEMANTIC_INSTANCEID 10 /** doesn't include start_instance */
  #define TGSI_SEMANTIC_VERTEXID   11
  #define TGSI_SEMANTIC_STENCIL12
  #define TGSI_SEMANTIC_CLIPDIST   13




Re: [Mesa-dev] [PATCH] Fix for throwing BadDrawable (invalid Pixmap or Window parameter) by Xserver

2013-09-05 Thread Alexander Monakov
On Thu, Sep 5, 2013 at 6:40 PM, Alexander Monakov amona...@gmail.com wrote:
 But it doesn't make sense to call glXUseXFont without a current context.

However, ARB_create_context allows making a context current without an
associated drawable.


Re: [Mesa-dev] [PATCH] R600/SI: expose TBUFFER_STORE_FORMAT_* for OpenGL transform feedback

2013-09-05 Thread Tom Stellard
On Mon, Sep 02, 2013 at 09:07:18PM +0200, Marek Olšák wrote:
 For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.
 
 The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
 a resource descriptor might be nicer.
 
 The maximum number of input SGPRs is bumped to 17.
 
 Signed-off-by: Marek Olšák marek.ol...@amd.com
 ---
  lib/Target/R600/AMDGPUCallingConv.td   |  3 ++-
  lib/Target/R600/AMDGPUISelLowering.cpp |  1 +
  lib/Target/R600/AMDGPUISelLowering.h   |  1 +
  lib/Target/R600/SIISelLowering.cpp | 39 
 ++
  lib/Target/R600/SIInstrInfo.td | 27 +++
  lib/Target/R600/SIInstructions.td  | 29 +
  lib/Target/R600/SIIntrinsics.td| 18 
  7 files changed, 113 insertions(+), 5 deletions(-)
 
 diff --git a/lib/Target/R600/AMDGPUCallingConv.td 
 b/lib/Target/R600/AMDGPUCallingConv.td
 index 84d3118..d26be32 100644
 --- a/lib/Target/R600/AMDGPUCallingConv.td
 +++ b/lib/Target/R600/AMDGPUCallingConv.td
 @@ -19,7 +19,8 @@ def CC_SI : CallingConv[
  
CCIfInRegCCIfType[f32, i32] , CCAssignToReg[
  SGPR0, SGPR1, SGPR2, SGPR3, SGPR4, SGPR5, SGPR6, SGPR7,
 -SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15
 +SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,
 +SGPR16

Why is this necessary?  Are we using all 16 user sgprs now?

],
  
CCIfInRegCCIfType[i64] , CCAssignToRegWithShadow
 diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp 
 b/lib/Target/R600/AMDGPUISelLowering.cpp
 index 1237323..30d9503 100644
 --- a/lib/Target/R600/AMDGPUISelLowering.cpp
 +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
 @@ -718,5 +718,6 @@ const char* 
 AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
NODE_NAME_CASE(SAMPLED)
NODE_NAME_CASE(SAMPLEL)
NODE_NAME_CASE(STORE_MSKOR)
 +  NODE_NAME_CASE(TBUFFER_STORE_FORMAT)
}
  }
 diff --git a/lib/Target/R600/AMDGPUISelLowering.h 
 b/lib/Target/R600/AMDGPUISelLowering.h
 index 75ac4c2..8a68356 100644
 --- a/lib/Target/R600/AMDGPUISelLowering.h
 +++ b/lib/Target/R600/AMDGPUISelLowering.h
 @@ -160,6 +160,7 @@ enum {
FIRST_MEM_OPCODE_NUMBER = ISD::FIRST_TARGET_MEMORY_OPCODE,
STORE_MSKOR,
LOAD_CONSTANT,
 +  TBUFFER_STORE_FORMAT,
LAST_AMDGPU_ISD_NUMBER
  };
  
 diff --git a/lib/Target/R600/SIISelLowering.cpp 
 b/lib/Target/R600/SIISelLowering.cpp
 index f196059..6fa0c85 100644
 --- a/lib/Target/R600/SIISelLowering.cpp
 +++ b/lib/Target/R600/SIISelLowering.cpp
 @@ -86,6 +86,8 @@ SITargetLowering::SITargetLowering(TargetMachine TM) :
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v16i8, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v4f32, Custom);
  
 +  setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);
 +
setLoadExtAction(ISD::SEXTLOAD, MVT::i32, Expand);
  
setLoadExtAction(ISD::EXTLOAD, MVT::f32, Expand);
 @@ -462,6 +464,43 @@ SDValue SITargetLowering::LowerOperation(SDValue Op, 
 SelectionDAG DAG) const {
   Op.getOperand(3));
  }
}
 +
 +  case ISD::INTRINSIC_VOID:
 +SDValue Chain = Op.getOperand(0);
 +unsigned IntrinsicID = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
 +
 +switch (IntrinsicID) {
 +  case AMDGPUIntrinsic::SI_tbuffer_store: {
 +SDLoc DL(Op);
 +SDValue Ops [] = {
 +  Chain,
 +  ResourceDescriptorToi128(Op.getOperand(2), DAG),
 +  Op.getOperand(3),
 +  Op.getOperand(4),
 +  Op.getOperand(5),
 +  Op.getOperand(6),
 +  Op.getOperand(7),
 +  Op.getOperand(8),
 +  Op.getOperand(9),
 +  Op.getOperand(10),
 +  Op.getOperand(11),
 +  Op.getOperand(12),
 +  Op.getOperand(13),
 +  Op.getOperand(14)
 +};
 +EVT VT = Op.getOperand(3).getValueType();
 +
 +MachineMemOperand *MMO = MF.getMachineMemOperand(
 +MachinePointerInfo(),
 +MachineMemOperand::MOStore,
 +VT.getSizeInBits() / 8, 4);
 +return DAG.getMemIntrinsicNode(AMDGPUISD::TBUFFER_STORE_FORMAT, DL,
 +   Op->getVTList(), Ops,
 +   sizeof(Ops)/sizeof(Ops[0]), VT, MMO);
 +  }
 +  default:
 +break;
 +}
}
return SDValue();
  }
 diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
 index ecc4718..c902feb 100644
 --- a/lib/Target/R600/SIInstrInfo.td
 +++ b/lib/Target/R600/SIInstrInfo.td
 @@ -21,6 +21,25 @@ def SIload_constant : SDNodeAMDGPUISD::LOAD_CONSTANT,
[SDNPMayLoad, SDNPMemOperand]
  ;
  
 +def SItbuffer_store : SDNodeAMDGPUISD::TBUFFER_STORE_FORMAT,
 +  SDTypeProfile0, 13,
 +[SDTCisVT0, i128,   // rsrc(SGPR)
 + SDTCisVT1, iAny,   // vdata(VGPR)
 + SDTCisVT2, i32,// num_channels(imm)
 + SDTCisVT3, i32,// vaddr(VGPR)
 + SDTCisVT4, i32,// 

Re: [Mesa-dev] [PATCH] docs: Add some notes on submitting patches

2013-09-05 Thread Brian Paul

On 09/05/2013 02:54 AM, Timothy Arceri wrote:


Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
  docs/devinfo.html |   23 +++
  1 file changed, 23 insertions(+)

diff --git a/docs/devinfo.html b/docs/devinfo.html
index 4c1099c..d921e0d 100644
--- a/docs/devinfo.html
+++ b/docs/devinfo.html
@@ -155,6 +155,29 @@ of ttbool/tt, tttrue/tt, and
  src/mesa/state_tracker/st_glsl_to_tgsi.cpp can serve as examples.
  /p

+h2Submitting patches/h2
+
+p
+You should always run the Mesa Testsuite before submitting patches.
+The Testsuite can be run using the 'make check' command. All test


All tests



+must pass before patches will be accepted, this may mean you have
+to update the tests themselves.
+/p
+
+p
+Patches should be sent to the Mesa mailing list for review.
+When submitting a patch make sure to use git send-email rather than attaching
+patches to emails. Sending patches as attachments prevents people from being
+able to provide in-line review comments.
+/p
+
+p
+When submitting follow-up patches you can use --in-reply-to to make v2, v3,
+etc patches show up as replies to the originals. This usually works well
+when you're sending out updates to individual patches (as opposed to
+re-sending the whole series). Using --in-reply-to makes
+it harder for reviewers to accidentally review old patches.
+/p

  h2Marking a commit as a candidate for a stable branch/h2




Reviewed-by: Brian Paul bri...@vmware.com

I can fix the typo above and commit/push.

-Brian



Re: [Mesa-dev] [PATCH 10/15] mesa: Implement KHR_debug ObjectLabel functions

2013-09-05 Thread Brian Paul

On 09/04/2013 09:09 PM, Ian Romanick wrote:


In the mean time, we should land Brian's patch to fix 'make check'.


Does that imply your R-b?

-Brian



Re: [Mesa-dev] [PATCH] glx: Fix for throwing BadDrawable (invalid Pixmap or Window parameter) by Xserver

2013-09-05 Thread Brian Paul

On 09/04/2013 10:30 AM, Maniak, Lukasz wrote:

 Fix for throwing BadDrawable (invalid Pixmap or Window parameter)
 by Xserver when calling XCreatePixmap or fill_bitmap from
 DRI_glXUseXFont.

 currentDrawable which came from __glXGetCurrentContext was not correct in
 this case, we use DefaultRootWindow(dpy) instead.


By "was not correct", do you mean it was zero?

The drawable/window parameter to XCreatePixmap() is really just used to 
determine the depth for the Pixmap.  I think any window/depth should be 
OK since we're just looking for zero and non-zero pixels when generating 
the gl bitmaps.





Signed-off-by: Lukasz Maniak lukasz.man...@intel.com
---
 src/glx/xfont.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/glx/xfont.c b/src/glx/xfont.c
index 316c585..60e28ab 100644
--- a/src/glx/xfont.c
+++ b/src/glx/xfont.c
@@ -215,7 +215,6 @@ _X_HIDDEN void
 DRI_glXUseXFont(struct glx_context *CC, Font font, int first, int count, int listbase)
 {
    Display *dpy;
-   Window win;
    Pixmap pixmap;
    GC gc;
    XGCValues values;
@@ -231,7 +230,6 @@ DRI_glXUseXFont(struct glx_context *CC, Font font, int first, int count, int lis
    int i;

    dpy = CC->currentDpy;
-   win = CC->currentDrawable;

    fs = XQueryFont(dpy, font);
    if (!fs) {
@@ -279,7 +277,7 @@ DRI_glXUseXFont(struct glx_context *CC, Font font, int first, int count, int lis
    glPixelStorei(GL_UNPACK_SKIP_PIXELS, 0);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);

-   pixmap = XCreatePixmap(dpy, win, 10, 10, 1);
+   pixmap = XCreatePixmap(dpy, DefaultRootWindow(dpy), 10, 10, 1);
    values.foreground = BlackPixel(dpy, DefaultScreen(dpy));
    values.background = WhitePixel(dpy, DefaultScreen(dpy));
    values.font = fs->fid;
@@ -342,7 +340,7 @@ DRI_glXUseXFont(struct glx_context *CC, Font font, int first, int count, int lis
       if (valid && (bm_width > 0) && (bm_height > 0)) {
          memset(bm, '\0', bm_width * bm_height);
-         fill_bitmap(dpy, win, gc, bm_width, bm_height, x, y, c, bm);
+         fill_bitmap(dpy, DefaultRootWindow(dpy), gc, bm_width, bm_height, x, y, c, bm);

          glBitmap(width, height, x0, y0, dx, dy, bm);
 #ifdef DEBUG



A less invasive change would be to simply replace

 win = CC->currentDrawable;

with

 win = DefaultRootWindow(dpy);

I'd prefer that.

-Brian



Re: [Mesa-dev] [PATCH 10/15] mesa: Implement KHR_debug ObjectLabel functions

2013-09-05 Thread Brian Paul

On 09/05/2013 01:05 AM, Timothy Arceri wrote:


I also have one more question about working on Mesa. Is there a wiki
page or something where developers register who is working on what
extension to avoid double up?


There's no such page right now.  People sometimes will post a message to 
say they're working on a new feature.


I'd be happy to have a wiki page that gives a heads-up on who's doing what.

-Brian



Re: [Mesa-dev] [PATCH] R600/SI: expose TBUFFER_STORE_FORMAT_* for OpenGL transform feedback

2013-09-05 Thread Marek Olšák
No, we use 11 user data SGPRs for the vertex shader, but there are
also 6 additional SGPRs loaded by the hw based on the VGT state (4
streamout offsets, streamout_enable, and streamout_write_index). The 6
SGPRs can be enabled by setting SPI_SHADER_PGM_RSRC2_VS.SO_* = 1.
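
The arithmetic behind the bump to 17 input SGPRs can be written out as a tiny sketch. The constant and function names below are made up for illustration; only the counts (11 user SGPRs, 4 streamout offsets, streamout_enable, streamout_write_index) come from the explanation above.

```c
/* Counts taken from the explanation above; names are hypothetical. */
enum {
    VS_USER_SGPRS         = 11, /* user data SGPRs for the vertex shader */
    STREAMOUT_OFFSETS     = 4,  /* one offset per streamout buffer */
    STREAMOUT_ENABLE      = 1,
    STREAMOUT_WRITE_INDEX = 1
};

/* SGPRs loaded by the hardware based on the VGT state. */
static unsigned streamout_sgprs(void)
{
    return STREAMOUT_OFFSETS + STREAMOUT_ENABLE + STREAMOUT_WRITE_INDEX;
}

/* Total input SGPRs the calling convention has to cover. */
static unsigned max_input_sgprs(void)
{
    return VS_USER_SGPRS + streamout_sgprs();
}
```

Seventeen registers numbered from zero end at SGPR16, which is why the calling convention list grows by exactly one entry.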

Marek

On Thu, Sep 5, 2013 at 5:44 PM, Tom Stellard t...@stellard.net wrote:
 On Mon, Sep 02, 2013 at 09:07:18PM +0200, Marek Olšák wrote:
 For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.

 The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
 a resource descriptor might be nicer.

 The maximum number of input SGPRs is bumped to 17.

 Signed-off-by: Marek Olšák marek.ol...@amd.com
 ---
  lib/Target/R600/AMDGPUCallingConv.td   |  3 ++-
  lib/Target/R600/AMDGPUISelLowering.cpp |  1 +
  lib/Target/R600/AMDGPUISelLowering.h   |  1 +
  lib/Target/R600/SIISelLowering.cpp | 39 
 ++
  lib/Target/R600/SIInstrInfo.td | 27 +++
  lib/Target/R600/SIInstructions.td  | 29 +
  lib/Target/R600/SIIntrinsics.td| 18 
  7 files changed, 113 insertions(+), 5 deletions(-)

 diff --git a/lib/Target/R600/AMDGPUCallingConv.td 
 b/lib/Target/R600/AMDGPUCallingConv.td
 index 84d3118..d26be32 100644
 --- a/lib/Target/R600/AMDGPUCallingConv.td
 +++ b/lib/Target/R600/AMDGPUCallingConv.td
 @@ -19,7 +19,8 @@ def CC_SI : CallingConv[

CCIfInRegCCIfType[f32, i32] , CCAssignToReg[
  SGPR0, SGPR1, SGPR2, SGPR3, SGPR4, SGPR5, SGPR6, SGPR7,
 -SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15
 +SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,
 +SGPR16

 Why is this necessary?  Are we using all 16 user sgprs now?

],

CCIfInRegCCIfType[i64] , CCAssignToRegWithShadow
 diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp 
 b/lib/Target/R600/AMDGPUISelLowering.cpp
 index 1237323..30d9503 100644
 --- a/lib/Target/R600/AMDGPUISelLowering.cpp
 +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
 @@ -718,5 +718,6 @@ const char* 
 AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
NODE_NAME_CASE(SAMPLED)
NODE_NAME_CASE(SAMPLEL)
NODE_NAME_CASE(STORE_MSKOR)
 +  NODE_NAME_CASE(TBUFFER_STORE_FORMAT)
}
  }
 diff --git a/lib/Target/R600/AMDGPUISelLowering.h 
 b/lib/Target/R600/AMDGPUISelLowering.h
 index 75ac4c2..8a68356 100644
 --- a/lib/Target/R600/AMDGPUISelLowering.h
 +++ b/lib/Target/R600/AMDGPUISelLowering.h
 @@ -160,6 +160,7 @@ enum {
FIRST_MEM_OPCODE_NUMBER = ISD::FIRST_TARGET_MEMORY_OPCODE,
STORE_MSKOR,
LOAD_CONSTANT,
 +  TBUFFER_STORE_FORMAT,
LAST_AMDGPU_ISD_NUMBER
  };

 diff --git a/lib/Target/R600/SIISelLowering.cpp 
 b/lib/Target/R600/SIISelLowering.cpp
 index f196059..6fa0c85 100644
 --- a/lib/Target/R600/SIISelLowering.cpp
 +++ b/lib/Target/R600/SIISelLowering.cpp
 @@ -86,6 +86,8 @@ SITargetLowering::SITargetLowering(TargetMachine TM) :
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v16i8, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::v4f32, Custom);

 +  setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);
 +
setLoadExtAction(ISD::SEXTLOAD, MVT::i32, Expand);

setLoadExtAction(ISD::EXTLOAD, MVT::f32, Expand);
 @@ -462,6 +464,43 @@ SDValue SITargetLowering::LowerOperation(SDValue Op, 
 SelectionDAG DAG) const {
   Op.getOperand(3));
  }
}
 +
 +  case ISD::INTRINSIC_VOID:
 +SDValue Chain = Op.getOperand(0);
 +unsigned IntrinsicID = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
 +
 +switch (IntrinsicID) {
 +  case AMDGPUIntrinsic::SI_tbuffer_store: {
 +SDLoc DL(Op);
 +SDValue Ops [] = {
 +  Chain,
 +  ResourceDescriptorToi128(Op.getOperand(2), DAG),
 +  Op.getOperand(3),
 +  Op.getOperand(4),
 +  Op.getOperand(5),
 +  Op.getOperand(6),
 +  Op.getOperand(7),
 +  Op.getOperand(8),
 +  Op.getOperand(9),
 +  Op.getOperand(10),
 +  Op.getOperand(11),
 +  Op.getOperand(12),
 +  Op.getOperand(13),
 +  Op.getOperand(14)
 +};
 +EVT VT = Op.getOperand(3).getValueType();
 +
 +MachineMemOperand *MMO = MF.getMachineMemOperand(
 +MachinePointerInfo(),
 +MachineMemOperand::MOStore,
 +VT.getSizeInBits() / 8, 4);
 +return DAG.getMemIntrinsicNode(AMDGPUISD::TBUFFER_STORE_FORMAT, DL,
 +   Op->getVTList(), Ops,
 +   sizeof(Ops)/sizeof(Ops[0]), VT, MMO);
 +  }
 +  default:
 +break;
 +}
}
return SDValue();
  }
 diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
 index ecc4718..c902feb 100644
 --- a/lib/Target/R600/SIInstrInfo.td
 +++ b/lib/Target/R600/SIInstrInfo.td
 @@ -21,6 +21,25 @@ def SIload_constant : SDNodeAMDGPUISD::LOAD_CONSTANT,
  

Re: [Mesa-dev] [PATCH 1/2] mesa: fix coding style of case statement

2013-09-05 Thread Brian Paul

On 09/05/2013 03:57 AM, Timothy Arceri wrote:


Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
  src/mesa/main/objectlabel.c |  124 +--
  1 file changed, 62 insertions(+), 62 deletions(-)

diff --git a/src/mesa/main/objectlabel.c b/src/mesa/main/objectlabel.c
index 90d9e09..7e39c92 100644
--- a/src/mesa/main/objectlabel.c
+++ b/src/mesa/main/objectlabel.c
@@ -117,78 +117,78 @@ get_label_pointer(struct gl_context *ctx, GLenum 
identifier, GLuint name,
 char **labelPtr = NULL;

 switch (identifier) {
-   case GL_BUFFER:
-  {
- struct gl_buffer_object *bufObj = _mesa_lookup_bufferobj(ctx, name);
- if (bufObj)
-labelPtr = bufObj-Label;
-  }
+   case GL_BUFFER: {
+  struct gl_buffer_object *bufObj = _mesa_lookup_bufferobj(ctx, name);
+  if (bufObj)
+ labelPtr = bufObj-Label;
break;
-   case GL_SHADER:
-  {
- struct gl_shader *shader = _mesa_lookup_shader(ctx, name);
- if (shader)
-labelPtr = shader-Label;
-  }
+   }
+
+   case GL_SHADER: {
+  struct gl_shader *shader = _mesa_lookup_shader(ctx, name);
+  if (shader)
+ labelPtr = shader-Label;
break;
-   case GL_PROGRAM:
-  {
- struct gl_shader_program *program =
-_mesa_lookup_shader_program(ctx, name);
- if (program)
-labelPtr = program-Label;
-  }
+   }
+
+   case GL_PROGRAM: {
+  struct gl_shader_program *program =
+ _mesa_lookup_shader_program(ctx, name);
+  if (program)
+ labelPtr = program-Label;
break;
-   case GL_VERTEX_ARRAY:
-  {
- struct gl_array_object *obj = _mesa_lookup_arrayobj(ctx, name);
- if (obj)
-labelPtr = obj-Label;
-  }
+   }
+
+   case GL_VERTEX_ARRAY: {
+  struct gl_array_object *obj = _mesa_lookup_arrayobj(ctx, name);
+  if (obj)
+ labelPtr = obj-Label;
break;
-   case GL_QUERY:
-  {
- struct gl_query_object *query = _mesa_lookup_query_object(ctx, name);
- if (query)
-labelPtr = query-Label;
-  }
+   }
+
+   case GL_QUERY: {
+  struct gl_query_object *query = _mesa_lookup_query_object(ctx, name);
+  if (query)
+ labelPtr = query-Label;
break;
-   case GL_TRANSFORM_FEEDBACK:
-  {
- struct gl_transform_feedback_object *tfo =
-_mesa_lookup_transform_feedback_object(ctx, name);
- if (tfo)
-labelPtr = tfo-Label;
-  }
+   }
+
+   case GL_TRANSFORM_FEEDBACK: {
+  struct gl_transform_feedback_object *tfo =
+ _mesa_lookup_transform_feedback_object(ctx, name);
+  if (tfo)
+ labelPtr = tfo-Label;
break;
-   case GL_SAMPLER:
-  {
- struct gl_sampler_object *so = _mesa_lookup_samplerobj(ctx, name);
- if (so)
-labelPtr = so-Label;
-  }
+   }
+
+   case GL_SAMPLER: {
+  struct gl_sampler_object *so = _mesa_lookup_samplerobj(ctx, name);
+  if (so)
+ labelPtr = so-Label;
break;
-   case GL_TEXTURE:
-  {
- struct gl_texture_object *texObj = _mesa_lookup_texture(ctx, name);
- if (texObj)
-labelPtr = texObj-Label;
-  }
+   }
+
+   case GL_TEXTURE: {
+  struct gl_texture_object *texObj = _mesa_lookup_texture(ctx, name);
+  if (texObj)
+ labelPtr = texObj-Label;
break;
-   case GL_RENDERBUFFER:
-  {
- struct gl_renderbuffer *rb = _mesa_lookup_renderbuffer(ctx, name);
- if (rb)
-labelPtr = rb-Label;
-  }
+   }
+
+   case GL_RENDERBUFFER: {
+  struct gl_renderbuffer *rb = _mesa_lookup_renderbuffer(ctx, name);
+  if (rb)
+ labelPtr = rb-Label;
break;
-   case GL_FRAMEBUFFER:
-  {
- struct gl_framebuffer *rb = _mesa_lookup_framebuffer(ctx, name);
- if (rb)
-labelPtr = rb-Label;
-  }
+   }
+
+   case GL_FRAMEBUFFER: {
+  struct gl_framebuffer *rb = _mesa_lookup_framebuffer(ctx, name);
+  if (rb)
+ labelPtr = rb-Label;
break;
+   }
+
 case GL_DISPLAY_LIST:
if (ctx-API == API_OPENGL_COMPAT) {
   struct gl_display_list *list = _mesa_lookup_list(ctx, name);



FWIW, I'm happy with the code as-is.  I wrote it that way.

-Brian



Re: [Mesa-dev] [PATCH 2/4] mesa: Don't allow glSamplerParameteriv(GL_TEXTURE_CUBE_MAP_SEAMLESS) in ES

2013-09-05 Thread Paul Berry
On 4 September 2013 11:29, Ian Romanick i...@freedesktop.org wrote:

 From: Ian Romanick ian.d.roman...@intel.com

 There is no GL_TEXTURE_CUBE_MAP_SEAMLESS in any version of OpenGL ES or
 in any extension that applies to OpenGL ES.  The same error check
 already occurs for glTexParameteri.

 Signed-off-by: Ian Romanick ian.d.roman...@intel.com
 Cc: Maxence Le Dore maxence.led...@gmail.com
 ---
  src/mesa/main/samplerobj.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/main/samplerobj.c b/src/mesa/main/samplerobj.c
 index 39cfcd0..c3b612c 100644
 --- a/src/mesa/main/samplerobj.c
 +++ b/src/mesa/main/samplerobj.c
 @@ -569,7 +569,8 @@ static GLuint
  set_sampler_cube_map_seamless(struct gl_context *ctx,
                                struct gl_sampler_object *samp, GLboolean param)
  {
 -   if (!ctx->Extensions.AMD_seamless_cubemap_per_texture)
 +   if (!_mesa_is_desktop_gl(ctx)
 +       || !ctx->Extensions.AMD_seamless_cubemap_per_texture)
        return INVALID_PNAME;

     if (samp->CubeMapSeamless == param)
 --
 1.8.1.4


Should we add a similar check to these functions too?

- _mesa_GetSamplerParameteriv()
- _mesa_GetSamplerParameterfv()
- _mesa_GetSamplerParameterIiv()
- _mesa_GetSamplerParameterIuiv()
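
A shared predicate is one way the setter and the four getters listed above could apply the same rule. The following is a standalone sketch with stand-in types; the struct, enum and function names are hypothetical and are not Mesa's actual definitions.

```c
#include <stdbool.h>

/* Stand-ins for the real context state; hypothetical names. */
enum api_sketch { API_SKETCH_DESKTOP_GL, API_SKETCH_GLES };

struct ctx_sketch {
    enum api_sketch api;
    bool amd_seamless_cubemap_per_texture;
};

/* GL_TEXTURE_CUBE_MAP_SEAMLESS is a valid sampler pname only on desktop
 * GL with the per-texture seamless extension enabled.  Both the set and
 * the get paths could call this and report an invalid pname otherwise. */
static bool cube_map_seamless_pname_valid(const struct ctx_sketch *ctx)
{
    return ctx->api == API_SKETCH_DESKTOP_GL &&
           ctx->amd_seamless_cubemap_per_texture;
}
```

Keeping the condition in one place would avoid the set and get paths drifting apart.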


Re: [Mesa-dev] [PATCH 21/22] i965/gen7: Generalize gen7_vs_state in preparation for GS.

2013-09-05 Thread Kenneth Graunke

On 09/01/2013 07:45 AM, Paul Berry wrote:

On 29 August 2013 21:31, Kenneth Graunke wrote:

[snip]
  I definitely don't want to share portions of 3DSTATE_VS.


In an effort to help us make a more informed decision about this, I've
investigated the following questions:

1. Are there any historical commits that would have been simpler if we
had shared code between VS and FS in the way that we're currently
discussing sharing code between VS and GS?  Are there any that would
have been more complex?


These are good questions to be asking - thanks for looking into this.


Are there any historical commits where we made
mistakes because the code wasn't shared (e.g. changed one function but
forgot to change the other)?  I looked just at gen7, and I looked at
patches since the beginning of 2012.  I considered both the code sharing
that I've proposed in this patch as well as the counter-proposal to just
share the code for constant emission.

These two commits would have been helped by the code sharing I've
proposed; they would also have been helped if we just shared code for
constant emission:

e6893b9 (Set MOCS L3 cacheability for IVB/BYT (v2))
2273b65 (Change L3 MOCS of 3DSTATE_CONSTANT_VS/PS)



These two commits would have been helped by the code sharing I've
proposed, but not by sharing code for constant emission.

decc708 (Upload separate per-stage sampler state tables)
f5a690c (i965: Split sampler count variable to be per-stage)

In all of the above cases the savings would have been small--one diff
hunk instead of two.


In the context of the whole MOCS series, the savings are fairly minimal. 
 We had to set MOCS fields in 10 different places:


brw_emit_vertices
gen6_blorp_emit_vertices
gen7_blorp_emit_surface_state
gen7_blorp_emit_constant_ps
gen7_blorp_emit_depth_stencil_config
gen7_emit_depth_stencil_hiz
gen7_vs_state (1 line)
gen7_wm_state
gen7_update_texture_surface
gen7_update_renderbuffer_surface

If geometry shaders had landed first, we would have had to add MOCS to 
3DSTATE_GS as well.  This would have meant adding 1 additional line to a 
+32 -9 series.


If these are the only patches that really would have been affected in 
the last year, and the savings are 1-2 lines of code, I think that shows 
that the sharing is not significantly useful.



I didn't find any commits that would have been more complex under either
my or Eric's proposal.  I didn't find any instances of past mistakes
where we modified one function but not the other.



2. If we start sharing the code to populate the first 4 DWORDs of
3DSTATE_{VS,GS} now, is it likely that in the future we'll be able to
extend the code sharing to 3DSTATE_{HS,DS,PS}?  To answer this question,
I looked at the documentation for all of these commands in Gen7 (IVB and
HSW).  I found these differences:

- 3DSTATE_HS uses a completely different layout from the others; for
example its kernel start pointer isn't until dw3, whereas the other
stages use dw1.

- dw2[26] is denormal mode for PS; it's MBZ for VS, DS, and GS.

- dw2[17] is thread priority for HSW only on VS, DS, and PS; it's
thread priority for both HSW and IVB on GS.  I suspect this is a
documentation bug, and it's really meant to be HSW only for all stages.

- dw2[15:14] are rounding mode on PS only.

- Accesses UAV (a HSW-only bit) is on dw2[12] for VS and GS, dw2[14]
for DS, and dw4[5] for PS.  I suspect this bit is related to
ARB_shader_image_load_store functionality, but I'm not certain.

- dw2[11] is mask stack exception enable on GS and PS; it's MBZ on VS
and DS.

At the moment we never set any of the bits that differ between VS, DS,
GS, and PS, so we could share the code to populate dw0-3 between those 4
stages without having to make stage-specific exceptions.  It's possible
that we might have to make stage-specific exceptions in the future if we
ever decide to set some of those bits, but the only likely candidate
seems to be Accesses UAV.  It's pretty clear that we couldn't share
any code to populate dw0-3 of 3DSTATE_HS.


We will need to support UAVs eventually (they're an upcoming GL 
feature), but I don't know when or on what platforms.


Since 3DSTATE_HS isn't shareable, this means that we need to add 
3DSTATE_GS today, and 3DSTATE_DS in a little while.


I'm still not convinced that the added complexity is worth the benefit. 
 I would still like to see 3DSTATE_VS, 3DSTATE_GS emitted within a 
single function, in one single BEGIN...OUT...ADVANCE block.


--Ken


Re: [Mesa-dev] [PATCH 4/4] i965: Enable AMD_seamless_cubemap_per_texture

2013-09-05 Thread Paul Berry
On 4 September 2013 11:29, Ian Romanick i...@freedesktop.org wrote:

 From: Ian Romanick ian.d.roman...@intel.com

 The change is very small.  Do seamless filtering if either the context
 enable is set or the sampler enable is set.

 The AMD_seamless_cubemap_per_texture says:

 If TEXTURE_CUBE_MAP_SEAMLESS_ARB is emabled (sic) globally or the
 value of the texture's TEXTURE_CUBE_MAP_SEAMLESS_ARB parameter is
 TRUE, seamless cube map sampling is enabled...

 Signed-off-by: Ian Romanick ian.d.roman...@intel.com
 ---
  docs/relnotes/9.3.html   | 4 
  src/mesa/drivers/dri/i965/brw_wm_sampler_state.c | 2 +-
  src/mesa/drivers/dri/i965/gen7_sampler_state.c   | 2 +-
  src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
  4 files changed, 7 insertions(+), 2 deletions(-)


Patch 1 is:

Acked-by: Paul Berry stereotype...@gmail.com

I sent a comment on patch 2.

Patches 3-4 are:

Reviewed-by: Paul Berry stereotype...@gmail.com


[Mesa-dev] [PATCH 2/2] R600/SI: Merge offset0 and offset1 fields for single address DS instructions

2013-09-05 Thread Tom Stellard
From: Tom Stellard thomas.stell...@amd.com

Also remove unused data fields from the DS_Load_Helper class.
---
 lib/Target/R600/SIInstrInfo.td| 28 
 lib/Target/R600/SIInstructions.td |  6 +++---
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
index 09d5f01..292d650 100644
--- a/lib/Target/R600/SIInstrInfo.td
+++ b/lib/Target/R600/SIInstrInfo.td
@@ -339,13 +339,22 @@ class VOP3_64 <bits<9> op, string opName, list<dag> pattern> : VOP3 <
 // Vector I/O classes
 //===----------------------------------------------------------------------===//
 
-class DS_Load_Helper <bits<8> op, string asm, RegisterClass regClass> : DS <
+class DS_1A <bits<8> op, dag outs, dag ins, string asm, list<dag> pat> :
+    DS <op, outs, ins, asm, pat> {
+  bits<16> offset;
+
+  let offset0 = offset{7-0};
+  let offset1 = offset{15-8};
+}
+
+class DS_Load_Helper <bits<8> op, string asm, RegisterClass regClass> : DS_1A <
   op,
   (outs regClass:$vdst),
-  (ins i1imm:$gds, VReg_32:$addr, VReg_32:$data0, VReg_32:$data1,
-       i8imm:$offset0, i8imm:$offset1),
-  asm#" $vdst, $gds, $addr, $data0, $data1, $offset0, $offset1, [M0]",
+  (ins i1imm:$gds, VReg_32:$addr, i16imm:$offset),
+  asm#" $vdst, $gds, $addr, $offset, [M0]",
   []> {
+  let data0 = 0;
+  let data1 = 0;
   let mayLoad = 1;
   let mayStore = 0;
 }
@@ -362,16 +371,19 @@ class DS_Store_Helper <bits<8> op, string asm, RegisterClass regClass> : DS <
   let vdst = 0;
 }
 
-class DS_1A1D_RET <bits<8> op, string asm, RegisterClass rc> : DS <
+class DS_1A1D_RET <bits<8> op, string asm, RegisterClass rc> : DS_1A <
   op,
   (outs rc:$vdst),
-  (ins i1imm:$gds, VReg_32:$addr, VReg_32:$data0, i8imm:$offset0,
-       i8imm:$offset1),
-  asm#" $gds, $vdst, $addr, $data0, $offset0, $offset1, [M0]",
+  (ins i1imm:$gds, VReg_32:$addr, VReg_32:$data0, i16imm:$offset),
+  asm#" $gds, $vdst, $addr, $data0, $offset, [M0]",
   []> {
+  bits<16> offset;
+
   let mayStore = 1;
   let mayLoad = 1;
   let data1 = 0;
+  let offset0 = offset{7-0};
+  let offset1 = offset{15-8};
 }
 
 class MTBUF_Store_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
diff --git a/lib/Target/R600/SIInstructions.td 
b/lib/Target/R600/SIInstructions.td
index 31a5ad2..94dcf2c 100644
--- a/lib/Target/R600/SIInstructions.td
+++ b/lib/Target/R600/SIInstructions.td
@@ -1754,7 +1754,7 @@ def : Pat <
 
 class DSReadPat <DS inst, ValueType vt, PatFrag frag> : Pat <
   (frag i32:$src0),
-  (vt (inst 0, $src0, $src0, $src0, 0, 0))
+  (vt (inst 0, $src0, 0))
 >;
 
 def : DSReadPat <DS_READ_I8,  i32, sextloadi8_local>;
@@ -1764,7 +1764,7 @@ def : DSReadPat <DS_READ_U16, i32, az_extloadi16_local>;
 def : DSReadPat <DS_READ_B32, i32, local_load>;
 def : Pat <
 (local_load i32:$src0),
-(i32 (DS_READ_B32 0, $src0, $src0, $src0, 0, 0))
+(i32 (DS_READ_B32 0, $src0, 0))
 >;
 
 class DSWritePat <DS inst, ValueType vt, PatFrag frag> : Pat <
@@ -1777,7 +1777,7 @@ def : DSWritePat <DS_WRITE_B16, i32, truncstorei16_local>;
 def : DSWritePat <DS_WRITE_B32, i32, local_store>;
 
 def : Pat <(atomic_load_add_local i32:$ptr, i32:$val),
-   (DS_ADD_U32_RTN 0, $ptr, $val, 0, 0)>;
+   (DS_ADD_U32_RTN 0, $ptr, $val, 0)>;
 
 /** == **/
 /**   SMRD Patterns**/
-- 
1.8.1.5
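
For readers less used to TableGen bit slices, the `offset{7-0}` / `offset{15-8}` assignments in DS_1A amount to splitting one 16-bit immediate into the two 8-bit encoding fields. A minimal C sketch of the same split follows; the struct and function names are invented for the example and do not exist in the backend.

```c
#include <stdint.h>

/* Illustrative only: the two 8-bit offset fields of a DS encoding. */
struct ds_offsets {
   uint8_t offset0; /* low byte of the merged offset  */
   uint8_t offset1; /* high byte of the merged offset */
};

/* Split a merged 16-bit offset the way DS_1A does with
 * offset{7-0} and offset{15-8}. */
static struct ds_offsets split_ds_offset(uint16_t offset)
{
   struct ds_offsets o;
   o.offset0 = (uint8_t)(offset & 0xff);
   o.offset1 = (uint8_t)(offset >> 8);
   return o;
}
```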



[Mesa-dev] [PATCH 1/2] R600/SI: Add isDS helper function to SIInstrInfo

2013-09-05 Thread Tom Stellard
From: Tom Stellard thomas.stell...@amd.com

---
 lib/Target/R600/SIDefines.h|  3 ++-
 lib/Target/R600/SIInstrFormats.td  |  3 +++
 lib/Target/R600/SIInstrInfo.cpp|  4 
 lib/Target/R600/SIInstrInfo.h  |  1 +
 lib/Target/R600/SILowerControlFlow.cpp | 12 +++-
 5 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/lib/Target/R600/SIDefines.h b/lib/Target/R600/SIDefines.h
index f5445ad..259145d 100644
--- a/lib/Target/R600/SIDefines.h
+++ b/lib/Target/R600/SIDefines.h
@@ -14,7 +14,8 @@
 namespace SIInstrFlags {
 enum {
   MIMG = 1 << 3,
-  SMRD = 1 << 4
+  SMRD = 1 << 4,
+  DS   = 1 << 5
 };
 }
 
diff --git a/lib/Target/R600/SIInstrFormats.td 
b/lib/Target/R600/SIInstrFormats.td
index 9576c05..59cc16b 100644
--- a/lib/Target/R600/SIInstrFormats.td
+++ b/lib/Target/R600/SIInstrFormats.td
@@ -19,12 +19,14 @@ class InstSI <dag outs, dag ins, string asm, list<dag> pattern> :
   field bits<1> LGKM_CNT = 0;
   field bits<1> MIMG = 0;
   field bits<1> SMRD = 0;
+  field bits<1> DS = 0;
 
   let TSFlags{0} = VM_CNT;
   let TSFlags{1} = EXP_CNT;
   let TSFlags{2} = LGKM_CNT;
   let TSFlags{3} = MIMG;
   let TSFlags{4} = SMRD;
+  let TSFlags{5} = DS;
 }
 
 class Enc32 <dag outs, dag ins, string asm, list<dag> pattern> :
@@ -308,6 +310,7 @@ class DS <bits<8> op, dag outs, dag ins, string asm, list<dag> pattern> :
   let Inst{63-56} = vdst;
 
   let LGKM_CNT = 1;
+  let DS = 1;
 }
 
 class MUBUF <bits<7> op, dag outs, dag ins, string asm, list<dag> pattern> :
diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
index e23eef3..356cf24 100644
--- a/lib/Target/R600/SIInstrInfo.cpp
+++ b/lib/Target/R600/SIInstrInfo.cpp
@@ -232,6 +232,10 @@ int SIInstrInfo::isSMRD(uint16_t Opcode) const {
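   return get(Opcode).TSFlags & SIInstrFlags::SMRD;
 }
 
+bool SIInstrInfo::isDS(uint16_t Opcode) const {
+  return get(Opcode).TSFlags & SIInstrFlags::DS;
+}
+
 //===----------------------------------------------------------------------===//
 // Indirect addressing callbacks
 //===----------------------------------------------------------------------===//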
diff --git a/lib/Target/R600/SIInstrInfo.h b/lib/Target/R600/SIInstrInfo.h
index 87b8063..4ccd4ce 100644
--- a/lib/Target/R600/SIInstrInfo.h
+++ b/lib/Target/R600/SIInstrInfo.h
@@ -49,6 +49,7 @@ public:
   virtual bool isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const;
   int isMIMG(uint16_t Opcode) const;
   int isSMRD(uint16_t Opcode) const;
+  bool isDS(uint16_t Opcode) const;
 
   virtual int getIndirectIndexBegin(const MachineFunction &MF) const;
 
diff --git a/lib/Target/R600/SILowerControlFlow.cpp 
b/lib/Target/R600/SILowerControlFlow.cpp
index a6c43bb..5a8836c 100644
--- a/lib/Target/R600/SILowerControlFlow.cpp
+++ b/lib/Target/R600/SILowerControlFlow.cpp
@@ -67,7 +67,7 @@ private:
 
   static char ID;
   const TargetRegisterInfo *TRI;
-  const TargetInstrInfo *TII;
+  const SIInstrInfo *TII;
 
   bool shouldSkip(MachineBasicBlock *From, MachineBasicBlock *To);
 
@@ -407,7 +407,7 @@ void SILowerControlFlowPass::IndirectDst(MachineInstr &MI) {
 }
 
 bool SILowerControlFlowPass::runOnMachineFunction(MachineFunction &MF) {
-  TII = MF.getTarget().getInstrInfo();
+  TII = static_cast<const SIInstrInfo*>(MF.getTarget().getInstrInfo());
   TRI = MF.getTarget().getRegisterInfo();
   SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
 
@@ -425,6 +425,11 @@ bool SILowerControlFlowPass::runOnMachineFunction(MachineFunction &MF) {
 
   Next = llvm::next(I);
   MachineInstr &MI = *I;
+
+  if (TII->isDS(MI.getOpcode())) {
+    NeedM0 = true;
+  }
+
+
   switch (MI.getOpcode()) {
 default: break;
 case AMDGPU::SI_IF:
@@ -487,9 +492,6 @@ bool SILowerControlFlowPass::runOnMachineFunction(MachineFunction &MF) {
 case AMDGPU::DS_READ_B32:
   NeedWQM = true;
   // Fall through
-case AMDGPU::DS_WRITE_B32:
-case AMDGPU::DS_ADD_U32_RTN:
-  NeedM0 = true;
   break;
 
 case AMDGPU::V_INTERP_P1_F32:
-- 
1.8.1.5
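
The isDS() helper above is just a bit test on the per-instruction flags word, which is what lets the pass drop the explicit list of DS opcodes. A minimal C sketch of the same idea, assuming only the bit positions from SIDefines.h; the `FLAG_*` names and `has_flag` are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions mirror SIInstrFlags in the patch; names are illustrative. */
enum {
   FLAG_MIMG = 1 << 3,
   FLAG_SMRD = 1 << 4,
   FLAG_DS   = 1 << 5
};

/* True if the given flag bit is set in an instruction's flags word. */
static bool has_flag(uint64_t ts_flags, uint64_t flag)
{
   return (ts_flags & flag) != 0;
}
```

The explicit `!= 0` is a matter of taste here; with a bool return type the masked value would be converted to true or false either way.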



Re: [Mesa-dev] Update: UVD status on loongson 3a platform

2013-09-05 Thread Jerome Glisse
On Thu, Sep 05, 2013 at 10:14:32PM +0800, Chen Jie wrote:
 Hi all,
 
 This thread is about
 http://lists.freedesktop.org/archives/dri-devel/2013-April/037598.html.
 
 We recently found some interesting things about UVD-based playback on
 the loongson 3a platform, and also found a way to fix the problem.
 
 First, we found that memcpy in [mesa]src/gallium/drivers/radeon/radeon_uvd.c
 caused the problem:
 * If memcpy is implemented through 16B or 8B load/store instructions,
 it will normally cause video mosaic. When we insert a memcmp after the
 copying code in memcpy, it reports that the src and dest are not equal.
 * If memcpy uses 1B load/store instructions only, the memcmp after the
 copying code reports equal.
 
 Then we found that the following changeset fixes our problem:
 
 diff --git a/src/gallium/drivers/radeon/radeon_uvd.c
 b/src/gallium/drivers/radeon/radeon_uvd.c
 index 2f98de2..f9599b6 100644
 --- a/src/gallium/drivers/radeon/radeon_uvd.c
 +++ b/src/gallium/drivers/radeon/radeon_uvd.c
 @@ -162,7 +162,7 @@ static bool create_buffer(struct ruvd_decoder *dec,
unsigned size)
  {
   buffer->buf = dec->ws->buffer_create(dec->ws, size, 4096, false,
 - RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM);
 + RADEON_DOMAIN_GTT);
   if (!buffer->buf)
   return false;
 
 The VRAM is mapped to an uncached area on our platform, so my
 question is: what could go wrong while using  4B load/store
 instructions in the UVD workflow? Any idea?
 

How do you map the VRAM into the user process address space? I.e., do
you have something like Intel PAT, something like MTRR, or something else?

In other words, can you map a region of I/O memory (GPU VRAM in this
case) into the process address space and mark it as uncached, so that
none of the accesses to it go through the CPU cache?

Cheers,
Jerome


Re: [Mesa-dev] Mesa (git 20130828) fails to build on MIPS

2013-09-05 Thread Christophe Jarry
 According to this
 https://lists.gnu.org/archive/html/bug-tar/2005-02/msg1.html error 141
 is 128+13 = SIGPIPE (broken pipe signal)
 And this may be relevant
 https://groups.google.com/forum/#!topic/golang-nuts/xjZ8jJx0IFw
 Check whether you are using right yacc and bison?

Thanks for your suggestions: I used SIGPIPE during the build and this gave me
Error 141. Without it, make gives me the following error:

make[2]: Entering directory `/usr/src/mesa/mesa-20130828/src/glsl'
/usr/lib/pkgusr/mkdir -p ../../src/glsl/glcpp
  LEX  glsl_lexer.cpp
  YACC glsl_parser.cpp
/usr/lib/pkgusr/mkdir -p ../../src/glsl/glcpp
  YACC glcpp/glcpp-parse.c
  LEX  glcpp/glcpp-lex.c
/bin/sh: line 1:  5657 Segmentation fault \
 flex -o glsl_lexer.cpp glsl_lexer.ll
make[2]: *** [glsl_lexer.cpp] Error 139
make[2]: *** Waiting for unfinished jobs
/bin/sh: line 1:  5673 Segmentation fault \
 flex -o glcpp/glcpp-lex.c glcpp/glcpp-lex.l
make[2]: *** [glcpp/glcpp-lex.c] Error 139
bison: m4 subprocess failed
make[2]: *** [glcpp/glcpp-parse.c] Error 1
bison: m4 subprocess failed
make[2]: *** [glsl_parser.cpp] Error 1
make[2]: Leaving directory `/usr/src/mesa/mesa-20130828/src/glsl'
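
The make error numbers in this log follow the usual shell convention of 128 plus the signal number, so "Error 139" is signal 11 (the segmentation fault that flex reports) and the earlier "Error 141" is signal 13. A small C sketch of that decoding; the function name is made up for the example, and the numeric values 11 and 13 are the common Linux ones.

```c
/* Decode a shell-style exit status: values above 128 mean the process
 * was killed by signal (status - 128).  Returns 0 if no signal is encoded. */
static int signal_from_exit_status(int status)
{
   return status > 128 ? status - 128 : 0;
}
```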


My versions of Flex and Bison:

$ flex --version
2.5.37

$ bison --version
bison (GNU Bison) 2.7
[...]


According to mesa documentation (docs/install.html):

On GNU/Linux systems, flex and bison are used. Versions 2.5.35 and 2.4.1,
respectively, (or later) should work.


Obviously, it does not work. What may I do?


Re: [Mesa-dev] Update: UVD status on loongson 3a platform

2013-09-05 Thread Jerome Glisse
On Thu, Sep 05, 2013 at 03:29:52PM -0400, Jerome Glisse wrote:
 How do you map the VRAM into the user process address space? I.e., do
 you have something like Intel PAT, something like MTRR, or something else?
 
 In other words, can you map a region of I/O memory (GPU VRAM in this
 case) into the process address space and mark it as uncached, so that
 none of the accesses to it go through the CPU cache?
 
 Cheers,
 Jerome

Also, it might be that you can't do write combining on your platform,
which would be a major drawback, as radeon userspace assumes it.
I would need to check the PCIe specification, but write combining is
probably not mandatory, meaning that your architecture might not have
it. This would explain why only memcpy with byte-size copies works.

Don't think there is any easy way to work around that.
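
For anyone who wants to repeat the experiment from the original report, here is a minimal sketch of a copy that uses only 1-byte stores and then reads the destination back to compare. It is a diagnostic under stated assumptions, not a fix and not code from radeon_uvd.c: the function name is invented, and `volatile` is only there to keep the compiler from merging the byte accesses into wider ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes with 1-byte stores only, then read back and compare.
 * Returns 1 if the destination matches the source, 0 otherwise. */
static int copy_bytewise_and_verify(volatile uint8_t *dst,
                                    const uint8_t *src, size_t n)
{
   size_t i;

   for (i = 0; i < n; i++)
      dst[i] = src[i];

   for (i = 0; i < n; i++) {
      if (dst[i] != src[i])
         return 0;
   }
   return 1;
}
```

On ordinary cached memory this always returns 1; it only tells you something when `dst` points into the mapped VRAM buffer.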

Cheers,
Jerome


[Mesa-dev] [PATCH] R600: Don't use trans slot for instructions that read LDS source registers

2013-09-05 Thread Tom Stellard
From: Tom Stellard thomas.stell...@amd.com

This fixes some regressions in the piglit local memory store tests
introduced by recent commits which made the scheduler aware of the trans
slot.

It's not possible to test this using lit, because there is no way to
determine from the assembly dumps whether or not an instruction is in
the trans slot.

Even if this were possible, the test would be highly sensitive to
changes in the scheduler and might generate confusing false negatives.
---
 lib/Target/R600/R600InstrInfo.cpp| 17 +
 lib/Target/R600/R600InstrInfo.h  |  1 +
 lib/Target/R600/R600MachineScheduler.cpp |  5 +
 lib/Target/R600/R600Packetizer.cpp   |  5 +
 lib/Target/R600/R600RegisterInfo.td  | 10 +-
 5 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/lib/Target/R600/R600InstrInfo.cpp 
b/lib/Target/R600/R600InstrInfo.cpp
index 0e7cfb4..60a3f7d 100644
--- a/lib/Target/R600/R600InstrInfo.cpp
+++ b/lib/Target/R600/R600InstrInfo.cpp
@@ -204,6 +204,23 @@ bool R600InstrInfo::mustBeLastInClause(unsigned Opcode) const {
   }
 }
 
+bool R600InstrInfo::readsLDSSrcReg(const MachineInstr *MI) const {
+  if (!isALUInstr(MI->getOpcode())) {
+    return false;
+  }
+  for (MachineInstr::const_mop_iterator I = MI->operands_begin(),
+                                        E = MI->operands_end(); I != E; ++I) {
+    if (!I->isReg() || !I->isUse() ||
+        TargetRegisterInfo::isVirtualRegister(I->getReg())) {
+      continue;
+    }
+    if (AMDGPU::R600_LDS_SRC_REGRegClass.contains(I->getReg())) {
+      return true;
+    }
+  }
+  return false;
+}
+
+
 int R600InstrInfo::getSrcIdx(unsigned Opcode, unsigned SrcNum) const {
   static const unsigned OpTable[] = {
 AMDGPU::OpName::src0,
diff --git a/lib/Target/R600/R600InstrInfo.h b/lib/Target/R600/R600InstrInfo.h
index 24cc43d..0d1ffc8 100644
--- a/lib/Target/R600/R600InstrInfo.h
+++ b/lib/Target/R600/R600InstrInfo.h
@@ -78,6 +78,7 @@ namespace llvm {
   bool usesTextureCache(const MachineInstr *MI) const;
 
   bool mustBeLastInClause(unsigned Opcode) const;
+  bool readsLDSSrcReg(const MachineInstr *MI) const;
 
   /// \returns The operand index for the given source number.  Legal values
   /// for SrcNum are 0, 1, and 2.
diff --git a/lib/Target/R600/R600MachineScheduler.cpp 
b/lib/Target/R600/R600MachineScheduler.cpp
index 0499dd5..f67ba89 100644
--- a/lib/Target/R600/R600MachineScheduler.cpp
+++ b/lib/Target/R600/R600MachineScheduler.cpp
@@ -314,6 +314,11 @@ R600SchedStrategy::AluKind R600SchedStrategy::getAluKind(SUnit *SU) const {
 if (regBelongsToClass(DestReg, AMDGPU::R600_Reg128RegClass))
   return AluT_XYZW;
 
+// LDS src registers cannot be used in the Trans slot.
+if (TII->readsLDSSrcReg(MI)) {
+  return AluT_XYZW;
+}
+
+
 return AluAny;
 
 }
diff --git a/lib/Target/R600/R600Packetizer.cpp 
b/lib/Target/R600/R600Packetizer.cpp
index 6c70052..ee256d5 100644
--- a/lib/Target/R600/R600Packetizer.cpp
+++ b/lib/Target/R600/R600Packetizer.cpp
@@ -272,6 +272,11 @@ public:
   return false;
 }
 
+// We cannot read LDS source registers from the Trans slot.
+if (isTransSlot && TII->readsLDSSrcReg(MI)) {
+  return false;
+}
+
 CurrentPacketMIs.pop_back();
 return true;
   }
diff --git a/lib/Target/R600/R600RegisterInfo.td 
b/lib/Target/R600/R600RegisterInfo.td
index fa987cf..514427e 100644
--- a/lib/Target/R600/R600RegisterInfo.td
+++ b/lib/Target/R600/R600RegisterInfo.td
@@ -95,6 +95,12 @@ foreach Index = 448-480 in {
 
 // Special Registers
 
+def OQA : R600Reg<"OQA", 219>;
+def OQB : R600Reg<"OQB", 220>;
+def OQAP : R600Reg<"OQAP", 221>;
+def OQBP : R600Reg<"OQAP", 222>;
+def LDS_DIRECT_A : R600Reg<"LDS_DIRECT_A", 223>;
+def LDS_DIRECT_B : R600Reg<"LDS_DIRECT_B", 224>;
 def ZERO : R600Reg<"0.0", 248>;
 def ONE : R600Reg<"1.0", 249>;
 def NEG_ONE : R600Reg<"-1.0", 249>;
@@ -115,7 +121,6 @@ def PRED_SEL_OFF: R600Reg<"Pred_sel_off", 0>;
 def PRED_SEL_ZERO : R600Reg<"Pred_sel_zero", 2>;
 def PRED_SEL_ONE : R600Reg<"Pred_sel_one", 3>;
 def AR_X : R600Reg<"AR.x", 0>;
-def OQAP : R600Reg<"OQAP", 221>;
 
 def R600_ArrayBase : RegisterClass <"AMDGPU", [f32, i32], 32,
   (add (sequence "ArrayBase%u", 448, 480))>;
@@ -130,6 +135,9 @@ let isAllocatable = 0 in {
 // XXX: Only use the X channel, until we support wider stack widths
 def R600_Addr : RegisterClass <"AMDGPU", [i32], 127, (add (sequence "Addr%u_X", 0, 127))>;
 
+def R600_LDS_SRC_REG : RegisterClass<"AMDGPU", [i32], 32,
+  (add OQA, OQB, OQAP, OQBP, LDS_DIRECT_A, LDS_DIRECT_B)>;
+
 } // End isAllocatable = 0
 
 def R600_KC0_X : RegisterClass <"AMDGPU", [f32, i32], 32,
-- 
1.7.11.4



Re: [Mesa-dev] [PATCH] r600g: pad IBs to a multiple of 4 DWs on r6xx

2013-09-05 Thread Marek Olšák
Reviewed-by: Marek Olšák marek.ol...@amd.com

Though I'm not sure if 0x8000 is correct.

Marek

On Wed, Sep 4, 2013 at 11:55 PM, Alex Deucher alexdeuc...@gmail.com wrote:
 IBs need to be a multiple of 4 dwords on r6xx asics
 to avoid a hw bug.

 Signed-off-by: Alex Deucher alexander.deuc...@amd.com
 CC: 9.2 mesa-sta...@lists.freedesktop.org
 CC: 9.1 mesa-sta...@lists.freedesktop.org
 ---
  src/gallium/drivers/r600/r600_hw_context.c | 13 +
  1 file changed, 13 insertions(+)

 diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
 b/src/gallium/drivers/r600/r600_hw_context.c
 index 97b0f9c..0a219af 100644
 --- a/src/gallium/drivers/r600/r600_hw_context.c
 +++ b/src/gallium/drivers/r600/r600_hw_context.c
 @@ -347,6 +347,19 @@ void r600_context_flush(struct r600_context *ctx, 
 unsigned flags)
 flags |= RADEON_FLUSH_KEEP_TILING_FLAGS;
 }

 +   /* Pad the GFX CS to a multiple of 4 dwords on rv6xx
 +* to avoid a hw bug.
 +*/
 +   if (ctx->chip_class < R700) {
 +   unsigned i;
 +   unsigned padding_dw = 4 - cs->cdw % 4;
 +   if (padding_dw < 4) {
 +   for (i = 0; i < padding_dw; i++) {
 +   cs->buf[cs->cdw++] = 0x8000;
 +   }
 +   }
 +   }
 +
  /* Flush the CS. */
  ctx->ws->cs_flush(ctx->rings.gfx.cs, flags, ctx->screen->cs_count++);
  }
 --
 1.8.3.1
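
The padding arithmetic in the quoted patch (compute `4 - cdw % 4`, skip it when that comes out as 4) can also be written as a loop that appends filler dwords until the count is a multiple of 4. A standalone C sketch of that equivalent form follows. The function name is made up, the filler value is a caller-supplied placeholder rather than the real no-op packet, and the caller is assumed to have left room for up to 3 extra dwords.

```c
#include <stdint.h>

/* Append 'filler' dwords to buf until cdw is a multiple of 4.
 * Returns the new dword count.  The caller must guarantee that buf
 * has room for up to 3 more entries. */
static unsigned pad_to_multiple_of_4(uint32_t *buf, unsigned cdw,
                                     uint32_t filler)
{
   while (cdw % 4 != 0)
      buf[cdw++] = filler;
   return cdw;
}
```

Which filler is safe for the hardware is exactly the open question in the reply above, so the sketch leaves it as a parameter.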



Re: [Mesa-dev] [PATCH 4/4] i965: Enable AMD_seamless_cubemap_per_texture

2013-09-05 Thread Kenneth Graunke

On 09/04/2013 11:29 AM, Ian Romanick wrote:

From: Ian Romanick ian.d.roman...@intel.com

The change is very small.  Do seamless filtering if either the context
enable is set or the sampler enable is set.

The AMD_seamless_cubemap_per_texture says:

 If TEXTURE_CUBE_MAP_SEAMLESS_ARB is emabled (sic) globally or the
 value of the texture's TEXTURE_CUBE_MAP_SEAMLESS_ARB parameter is
 TRUE, seamless cube map sampling is enabled...

Signed-off-by: Ian Romanick ian.d.roman...@intel.com
---
  docs/relnotes/9.3.html   | 4 
  src/mesa/drivers/dri/i965/brw_wm_sampler_state.c | 2 +-
  src/mesa/drivers/dri/i965/gen7_sampler_state.c   | 2 +-
  src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
  4 files changed, 7 insertions(+), 2 deletions(-)


Chris actually did this a year and a half ago:
http://lists.freedesktop.org/archives/mesa-dev/2012-April/021267.html

Eric didn't seem to think it was useful at the time.

I'm fine with enabling it.

Patch 4 is:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org


[Mesa-dev] [PATCH] configure.ac: Add a more informative warning when libclc.pc is not found

2013-09-05 Thread Tom Stellard
From: Tom Stellard thomas.stell...@amd.com

---
 configure.ac | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index b19ab18..702a58b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1387,7 +1387,10 @@ if test x$enable_opencl = xyes; then
 fi
 
 if test x$LIBCLC_INCLUDEDIR == x || test x$LIBCLC_LIBEXECDIR == x; then
-AC_MSG_ERROR([pkg-config cannot use libclc.pc which is required to 
build clover])
+AC_MSG_ERROR([pkg-config cannot find libclc.pc which is required to 
build clover.
+Make sure the directory containing libclc.pc is specified 
in your
+PKG_CONFIG_PATH environment variable.
+By default libclc.pc is installed to 
/usr/local/share/pkgconfig/])
 fi
 
 GALLIUM_STATE_TRACKERS_DIRS=$GALLIUM_STATE_TRACKERS_DIRS clover
-- 
1.7.11.4



Re: [Mesa-dev] [PATCH] configure.ac: Add a more informative warning when libclc.pc is not found

2013-09-05 Thread Matt Turner
On Thu, Sep 5, 2013 at 4:27 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 ---
  configure.ac | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

 diff --git a/configure.ac b/configure.ac
 index b19ab18..702a58b 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -1387,7 +1387,10 @@ if test x$enable_opencl = xyes; then
  fi

  if test x$LIBCLC_INCLUDEDIR == x || test x$LIBCLC_LIBEXECDIR == x; 
 then
 -AC_MSG_ERROR([pkg-config cannot use libclc.pc which is required to 
 build clover])
 +AC_MSG_ERROR([pkg-config cannot find libclc.pc which is required to 
 build clover.
 +Make sure the directory containing libclc.pc is 
 specified in your
 +PKG_CONFIG_PATH environment variable.
 +By default libclc.pc is installed to 
 /usr/local/share/pkgconfig/])
  fi

  GALLIUM_STATE_TRACKERS_DIRS=$GALLIUM_STATE_TRACKERS_DIRS clover
 --
 1.7.11.4

Just responding to this because it's kind of relevant:

Is it possible to use PKG_CHECK_EXISTS and PKG_CHECK_MODULES instead
of calling pkg-config directly? Users who don't have libclc (and
didn't request it) currently see something that looks like an error,
twice no less.

Package libclc was not found in the pkg-config search path.
Perhaps you should add the directory containing `libclc.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libclc' found
Package libclc was not found in the pkg-config search path.
Perhaps you should add the directory containing `libclc.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libclc' found


[Mesa-dev] [PATCH 2/3] i965: Increase the size of brw_stage_state::surf_offset.

2013-09-05 Thread Kenneth Graunke
Since BRW_MAX_WM_SURFACES is greater than BRW_MAX_VEC4_SURFACES, the
existing array isn't large enough to be used by the WM.  Increasing it
will make it possible to share them.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_context.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 97c66ab..87bcd3c 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -838,7 +838,7 @@ struct brw_stage_state
 
/* Binding table: pointers to SURFACE_STATE entries. */
uint32_t bind_bo_offset;
-   uint32_t surf_offset[BRW_MAX_VEC4_SURFACES];
+   uint32_t surf_offset[BRW_MAX_WM_SURFACES];
 
/** SAMPLER_STATE count and table offset */
uint32_t sampler_count;
-- 
1.8.3.4
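
To make the intent of this series easier to follow, here is a small standalone C sketch of the pattern: one shared per-stage struct, sized for the stage that needs the most surfaces, embedded as `base` in each stage's own state. Everything in it is hypothetical; the struct names, the two limits and their values are placeholders and not the real i965 definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder limits; the real BRW_MAX_*_SURFACES values differ. */
#define MAX_VEC4_SURFACES 4
#define MAX_WM_SURFACES   8
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))
#define MAX_STAGE_SURFACES MAX_OF(MAX_VEC4_SURFACES, MAX_WM_SURFACES)

/* State common to every shader stage. */
struct stage_state {
   uint32_t bind_bo_offset;
   uint32_t surf_offset[MAX_STAGE_SURFACES];
};

/* Per-stage state embeds the shared part and adds its own fields. */
struct vs_state { struct stage_state base; };
struct wm_state { struct stage_state base; unsigned render_surf; };

/* Code that only touches shared fields can take the base struct. */
static void set_binding_table(struct stage_state *s, uint32_t offset)
{
   s->bind_bo_offset = offset;
}

/* Number of entries the shared surf_offset array can hold. */
static size_t surf_capacity(void)
{
   return sizeof(((struct stage_state *)0)->surf_offset) / sizeof(uint32_t);
}
```

Sizing the array with the larger of the two limits is the sketch's way of expressing what the patch does by switching the bound to the WM constant.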



[Mesa-dev] [PATCH 3/3] i965: Use brw_stage_state for WM data as well.

2013-09-05 Thread Kenneth Graunke
This gets the VS, GS, and PS all using the same data structure.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_context.h   | 28 +---
 src/mesa/drivers/dri/i965/brw_draw.c  |  3 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp  |  4 +--
 src/mesa/drivers/dri/i965/brw_misc_state.c|  6 ++--
 src/mesa/drivers/dri/i965/brw_vtbl.c  |  2 +-
 src/mesa/drivers/dri/i965/brw_wm.c|  8 ++---
 src/mesa/drivers/dri/i965/brw_wm_sampler_state.c  |  6 ++--
 src/mesa/drivers/dri/i965/brw_wm_state.c  | 28 
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 40 +++
 src/mesa/drivers/dri/i965/gen6_sampler_state.c|  2 +-
 src/mesa/drivers/dri/i965/gen6_wm_state.c | 14 
 src/mesa/drivers/dri/i965/gen7_wm_state.c | 15 +
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |  8 ++---
 13 files changed, 71 insertions(+), 93 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 87bcd3c..4a8b0dd 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1220,43 +1220,17 @@ struct brw_context
} sf;
 
struct {
+  struct brw_stage_state base;
   struct brw_wm_prog_data *prog_data;
 
   GLuint render_surf;
 
-  drm_intel_bo *scratch_bo;
-
   /**
* Buffer object used in place of multisampled null render targets on
* Gen6.  See brw_update_null_renderbuffer_surface().
*/
   drm_intel_bo *multisampled_null_render_target_bo;
 
-  /** Offset in the program cache to the WM program */
-  uint32_t prog_offset;
-
-  uint32_t state_offset; /* offset in batchbuffer to pre-gen6 WM state */
-
-  drm_intel_bo *const_bo; /* pull constant buffer. */
-  /**
-   * This is offset in the batch to the push constants on gen6.
-   *
-   * Pre-gen6, push constants live in the CURBE.
-   */
-  uint32_t push_const_offset;
-
-  /** Binding table of pointers to surf_bo entries */
-  uint32_t bind_bo_offset;
-  uint32_t surf_offset[BRW_MAX_WM_SURFACES];
-
-  /** SAMPLER_STATE count and table offset */
-  uint32_t sampler_count;
-  uint32_t sampler_offset;
-
-  /** Offsets in the batch to sampler default colors (texture border color)
-   */
-  uint32_t sdc_offset[BRW_MAX_TEX_UNIT];
-
   struct {
  struct ra_regs *regs;
 
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index 37f5e38..42f2685 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -333,7 +333,8 @@ static bool brw_try_draw_prims( struct gl_context *ctx,
 * won't work since ARB programs use the texture unit number as the sampler
 * index.
 */
-   brw->wm.sampler_count = _mesa_fls(ctx->FragmentProgram._Current->Base.SamplersUsed);
+   brw->wm.base.sampler_count =
+  _mesa_fls(ctx->FragmentProgram._Current->Base.SamplersUsed);
    brw->gs.base.sampler_count = ctx->GeometryProgram._Current ?
   _mesa_fls(ctx->GeometryProgram._Current->Base.SamplersUsed) : 0;
    brw->vs.base.sampler_count =
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 96cb2ee..daa23b4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3186,12 +3186,12 @@ brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program *prog)
 
    key.program_string_id = bfp->id;
 
-   uint32_t old_prog_offset = brw->wm.prog_offset;
+   uint32_t old_prog_offset = brw->wm.base.prog_offset;
    struct brw_wm_prog_data *old_prog_data = brw->wm.prog_data;
 
    bool success = do_wm_prog(brw, prog, bfp, &key);
 
-   brw->wm.prog_offset = old_prog_offset;
+   brw->wm.base.prog_offset = old_prog_offset;
    brw->wm.prog_data = old_prog_data;
 
    return success;
diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c 
b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 16a41cc..a951493 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -81,7 +81,7 @@ static void upload_binding_table_pointers(struct brw_context *brw)
    OUT_BATCH(0); /* gs */
    OUT_BATCH(0); /* clip */
    OUT_BATCH(0); /* sf */
-   OUT_BATCH(brw->wm.bind_bo_offset);
+   OUT_BATCH(brw->wm.base.bind_bo_offset);
    ADVANCE_BATCH();
 }
 
@@ -115,7 +115,7 @@ static void upload_gen6_binding_table_pointers(struct brw_context *brw)
  (4 - 2));
    OUT_BATCH(brw->vs.base.bind_bo_offset); /* vs */
    OUT_BATCH(brw->ff_gs.bind_bo_offset); /* gs */
-   OUT_BATCH(brw->wm.bind_bo_offset); /* wm/ps */
+   OUT_BATCH(brw->wm.base.bind_bo_offset); /* wm/ps */
    ADVANCE_BATCH();
 }
 
@@ -161,7 +161,7 @@ static void upload_pipelined_state_pointers(struct brw_context *brw )
    OUT_RELOC(brw->batch.bo, I915_GEM_DOMAIN_INSTRUCTION, 0,
 

[Mesa-dev] [PATCH 1/3] i965: Add comments to the new brw_stage_state structure's fields.

2013-09-05 Thread Kenneth Graunke
These are largely based on the similar fields in brw->wm.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_context.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 57f086b..97c66ab 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -821,15 +821,22 @@ struct brw_query_object {
  */
 struct brw_stage_state
 {
+   /** Scratch buffer */
drm_intel_bo *scratch_bo;
+
+   /** Pull constant buffer */
drm_intel_bo *const_bo;
+
/** Offset in the program cache to the program */
uint32_t prog_offset;
+
+   /** Offset in the batchbuffer to Gen4-5 pipelined state (VS/WM/GS_STATE). */
uint32_t state_offset;
 
uint32_t push_const_offset; /* Offset in the batchbuffer */
int push_const_size; /* in 256-bit register increments */
 
+   /* Binding table: pointers to SURFACE_STATE entries. */
uint32_t bind_bo_offset;
uint32_t surf_offset[BRW_MAX_VEC4_SURFACES];
 
-- 
1.8.3.4



Re: [Mesa-dev] [PATCH 10/15] mesa: Implement KHR_debug ObjectLabel functions

2013-09-05 Thread Ian Romanick
On 09/05/2013 09:57 AM, Brian Paul wrote:
 On 09/04/2013 09:09 PM, Ian Romanick wrote:
 
 In the mean time, we should land Brian's patch to fix 'make check'.
 
 Does that imply your R-b?

Yeah.  I meant to go ahead and reply (again) to the patch.  It was
getting late, and I forgot.

 -Brian
 


Re: [Mesa-dev] [PATCH 2/4] mesa: Don't allow glSamplerParameteriv(GL_TEXTURE_CUBE_MAP_SEAMLESS) in ES

2013-09-05 Thread Ian Romanick
On 09/05/2013 11:32 AM, Paul Berry wrote:
 On 4 September 2013 11:29, Ian Romanick i...@freedesktop.org wrote:
 
 From: Ian Romanick ian.d.roman...@intel.com
 
 There is no GL_TEXTURE_CUBE_MAP_SEAMLESS in any version of OpenGL ES or
 in any extension that applies to OpenGL ES.  The same error check
 already occurs for glTexParameteri.
 
 Signed-off-by: Ian Romanick ian.d.roman...@intel.com
 Cc: Maxence Le Dore maxence.led...@gmail.com
 ---
  src/mesa/main/samplerobj.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/src/mesa/main/samplerobj.c b/src/mesa/main/samplerobj.c
 index 39cfcd0..c3b612c 100644
 --- a/src/mesa/main/samplerobj.c
 +++ b/src/mesa/main/samplerobj.c
 @@ -569,7 +569,8 @@ static GLuint
  set_sampler_cube_map_seamless(struct gl_context *ctx,
                                struct gl_sampler_object *samp,
                                GLboolean param)
  {
 -   if (!ctx->Extensions.AMD_seamless_cubemap_per_texture)
 +   if (!_mesa_is_desktop_gl(ctx)
 +       || !ctx->Extensions.AMD_seamless_cubemap_per_texture)
        return INVALID_PNAME;
 
     if (samp->CubeMapSeamless == param)
 --
 1.8.1.4
 
 
 Should we add a similar check to these functions too?
 
 - _mesa_GetSamplerParameteriv()
 - _mesa_GetSamplerParameterfv()
 - _mesa_GetSamplerParameterIiv()
 - _mesa_GetSamplerParameterIuiv()

Yes.  I forgot about the getters.
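
To make the agreed follow-up concrete, here is a minimal sketch of the predicate that the setter and the four getters could share. It is written against stand-in types: `fake_context`, `api_kind`, and `cube_map_seamless_allowed` are hypothetical names used only for illustration and are not Mesa's real structures or helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the parts of the GL context the check reads. */
enum api_kind { API_DESKTOP_GL, API_GLES };

struct fake_context {
    enum api_kind api;
    bool has_seamless_cubemap_per_texture;
};

/* True only when GL_TEXTURE_CUBE_MAP_SEAMLESS is a legal sampler pname:
 * the context must be desktop GL and must expose the extension. */
bool cube_map_seamless_allowed(const struct fake_context *ctx)
{
    assert(ctx != NULL);
    return ctx->api == API_DESKTOP_GL &&
           ctx->has_seamless_cubemap_per_texture;
}
```

A getter would consult the same predicate before reading the sampler state and report an invalid pname otherwise, which keeps the set and get paths from drifting apart.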



[Mesa-dev] [PATCH] mesa: Don't return any data for GL_SHADER_BINARY_FORMATS

2013-09-05 Thread Ian Romanick
From: Ian Romanick ian.d.roman...@intel.com

We return 0 for GL_NUM_SHADER_BINARY_FORMATS, so
GL_SHADER_BINARY_FORMATS should not write any data to the application
buffer.

Fixes piglit test 'arb_get_program_binary-overrun shader'.

Signed-off-by: Ian Romanick ian.d.roman...@intel.com
---
 src/mesa/main/get_hash_params.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py
index 30855c3..cd75944 100644
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -308,7 +308,7 @@ descriptor=[
   [ "MAX_VERTEX_UNIFORM_VECTORS", "LOC_CUSTOM, TYPE_INT, 0", "extra_ARB_ES2_compatibility_api_es2" ],
   [ "MAX_FRAGMENT_UNIFORM_VECTORS", "LOC_CUSTOM, TYPE_INT, 0", "extra_ARB_ES2_compatibility_api_es2" ],
   [ "NUM_SHADER_BINARY_FORMATS", "CONST(0)", "extra_ARB_ES2_compatibility_api_es2" ],
-  [ "SHADER_BINARY_FORMATS", "CONST(0)", "extra_ARB_ES2_compatibility_api_es2" ],
+  [ "SHADER_BINARY_FORMATS", "LOC_CUSTOM, TYPE_INVALID, 0", "extra_ARB_ES2_compatibility_api_es2" ],
 
 # GL_ARB_get_program_binary / GL_OES_get_program_binary
   [ "NUM_PROGRAM_BINARY_FORMATS", "CONST(0)", "NO_EXTRA" ],
-- 
1.8.1.4
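
As an illustration of the behaviour the commit message describes, the following C sketch shows a query that copies exactly as many entries as the advertised count, so a count of zero leaves the application's buffer untouched. `write_binary_formats` is a hypothetical helper written for this note; it is not the code path used by Mesa's generated getters.

```c
#include <assert.h>
#include <stddef.h>

/* Copy `count` format tokens into the caller's buffer and return how many
 * were written. When count is 0 the loop body never runs, so nothing in
 * `out` is modified and `formats` is never dereferenced. */
size_t write_binary_formats(int *out, const int *formats, size_t count)
{
    assert(out != NULL);
    for (size_t i = 0; i < count; i++)
        out[i] = formats[i];
    return count;
}
```

An overrun test of the kind named in the commit message can then be approximated by pre-filling the buffer with a sentinel and checking that the sentinel survives the query.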



[Mesa-dev] [PATCH] configure.ac: Add a more informative warning when libclc.pc is not found v2

2013-09-05 Thread Tom Stellard
From: Tom Stellard thomas.stell...@amd.com

v2:
  - Don't display an error message when the user doesn't ask for libclc.
---
 configure.ac | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/configure.ac b/configure.ac
index b19ab18..fcfa4f7 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1372,10 +1372,7 @@ AC_ARG_WITH([clang-libdir],
    [CLANG_LIBDIR="$withval"],
    [CLANG_LIBDIR=""])
 
-LIBCLC_INCLUDEDIR=`pkg-config --variable=includedir libclc`
-LIBCLC_LIBEXECDIR=`pkg-config --variable=libexecdir libclc`
-AC_SUBST([LIBCLC_INCLUDEDIR])
-AC_SUBST([LIBCLC_LIBEXECDIR])
+PKG_CHECK_EXISTS([libclc], [have_libclc=yes], [have_libclc=no])
 
 if test "x$enable_opencl" = xyes; then
     if test "x$with_gallium_drivers" = x; then
@@ -1386,8 +1383,16 @@ if test "x$enable_opencl" = xyes; then
         AC_MSG_ERROR([gcc >= 4.6 is required to build clover])
     fi
 
-    if test "x$LIBCLC_INCLUDEDIR" == x || test "x$LIBCLC_LIBEXECDIR" == x; then
-        AC_MSG_ERROR([pkg-config cannot use libclc.pc which is required to build clover])
+    if test "x$have_libclc" = xno; then
+        AC_MSG_ERROR([pkg-config cannot find libclc.pc which is required to build clover.
+                    Make sure the directory containing libclc.pc is specified in your
+                    PKG_CONFIG_PATH environment variable.
+                    By default libclc.pc is installed to /usr/local/share/pkgconfig/])
+    else
+        LIBCLC_INCLUDEDIR=`pkg-config --variable=includedir libclc`
+        LIBCLC_LIBEXECDIR=`pkg-config --variable=libexecdir libclc`
+        AC_SUBST([LIBCLC_INCLUDEDIR])
+        AC_SUBST([LIBCLC_LIBEXECDIR])
     fi
 
     GALLIUM_STATE_TRACKERS_DIRS="$GALLIUM_STATE_TRACKERS_DIRS clover"
-- 
1.7.11.4



Re: [Mesa-dev] [PATCH 18/21] glsl: Write a new built-in function module.

2013-09-05 Thread Kenneth Graunke

On 09/05/2013 04:20 AM, Pohjolainen, Topi wrote:

On Wed, Sep 04, 2013 at 03:22:41PM -0700, Kenneth Graunke wrote:

[snip]

+/**
+ * builtin_builder: A singleton object representing the core of the built-in
+ * function module.
+ *
+ * It has code to generate
+ * It generates IR for every built-in function signature, and organizes them
+ * into functions.


I guess there are some leftovers here in the comment (or something missing
perhaps).


Whoops.  Thanks!  I deleted the first partial sentence.

--Ken


[Mesa-dev] [PATCH] glsl: Add missing type inference for ir_binop_bfm.

2013-09-05 Thread Kenneth Graunke
Matt noticed that this was missing.  Nothing uses this currently.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/glsl/ir.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
index 8769c32..b654950 100644
--- a/src/glsl/ir.cpp
+++ b/src/glsl/ir.cpp
@@ -400,6 +400,7 @@ ir_expression::ir_expression(int op, ir_rvalue *op0, ir_rvalue *op1)
 
    case ir_binop_lshift:
    case ir_binop_rshift:
+   case ir_binop_bfm:
       this->type = op0->type;
       break;
 
-- 
1.8.3.4
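
For readers unfamiliar with the opcode: the patch gives the result the type of the first operand, the same rule the shifts use. The sketch below shows the scalar arithmetic under the assumption that bfm is a "bit-field mask", i.e. `bits` one-bits placed starting at bit `offset`. That reading and the helper name `bitfield_mask` are assumptions made for this note and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics: build a mask of `bits` consecutive one-bits whose
 * lowest set bit is at position `offset`. Both arguments are restricted
 * to 0..31 so that neither shift is undefined in C. */
uint32_t bitfield_mask(unsigned bits, unsigned offset)
{
    assert(bits < 32 && offset < 32);
    return ((UINT32_C(1) << bits) - 1u) << offset;
}
```

Because the result is just a shifted pattern of the same width as the inputs, inheriting the operand type is the natural choice for type inference.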



Re: [Mesa-dev] [PATCH 16/21] glsl: Add IR builder shortcuts for a bunch of random opcodes.

2013-09-05 Thread Kenneth Graunke

On 09/04/2013 07:11 PM, Matt Turner wrote:

On Wed, Sep 4, 2013 at 3:22 PM, Kenneth Graunke kenn...@whitecape.org wrote:

[snip]

Name these arguments x/y/a to match GLSL mix()?

Also, add fma() maybe?


Good idea.  Done.
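
As a reminder of why x/y/a is the natural naming, here is the scalar arithmetic of GLSL's mix() together with a plain multiply-add. Both functions are illustrative C written for this note, not the IR builder helpers under review; `mul_add_f` is unfused, whereas a real fma() rounds only once.

```c
#include <assert.h>

/* GLSL mix(x, y, a): linear blend that returns x when a == 0 and y when
 * a == 1. */
float mix_f(float x, float y, float a)
{
    return x * (1.0f - a) + y * a;
}

/* a * b + c, computed as a separate multiply and add. */
float mul_add_f(float a, float b, float c)
{
    return a * b + c;
}
```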


Re: [Mesa-dev] [PATCH] build: Delete cross-compiling macros.

2013-09-05 Thread Matt Turner
Reviewed-by: Matt Turner matts...@gmail.com


[Mesa-dev] [PATCH] build: Delete cross-compiling macros.

2013-09-05 Thread Kenneth Graunke
Now that builtin_compiler is gone, nothing uses these.

Cc: Matt Turner matts...@gmail.com
---
 configure.ac|  31 ---
 m4/ax_prog_cc_for_build.m4  | 125 
 m4/ax_prog_cxx_for_build.m4 | 109 --
 3 files changed, 265 deletions(-)
 delete mode 100644 m4/ax_prog_cc_for_build.m4
 delete mode 100644 m4/ax_prog_cxx_for_build.m4

diff --git a/configure.ac b/configure.ac
index 382f26f..7731a99 100644
--- a/configure.ac
+++ b/configure.ac
@@ -45,9 +45,7 @@ LIBKMS_XORG_REQUIRED=1.0.0
 dnl Check for progs
 AC_PROG_CPP
 AC_PROG_CC
-AX_PROG_CC_FOR_BUILD
 AC_PROG_CXX
-AX_PROG_CXX_FOR_BUILD
 AM_PROG_CC_C_O
 AM_PROG_AS
 AC_CHECK_PROGS([MAKE], [gmake make])
@@ -142,21 +140,6 @@ dnl Cache LDFLAGS and CPPFLAGS so we can add to them and restore later
 _SAVE_LDFLAGS="$LDFLAGS"
 _SAVE_CPPFLAGS="$CPPFLAGS"
 
-dnl build host compiler macros
-DEFINES_FOR_BUILD=""
-AC_SUBST([DEFINES_FOR_BUILD])
-case "$build_os" in
-linux*|*-gnu*|gnu*)
-    DEFINES_FOR_BUILD="$DEFINES_FOR_BUILD -D_GNU_SOURCE"
-    ;;
-solaris*)
-    DEFINES_FOR_BUILD="$DEFINES_FOR_BUILD -DSVR4"
-    ;;
-cygwin*)
-    DEFINES_FOR_BUILD="$DEFINES_FOR_BUILD"
-    ;;
-esac
-
 dnl Compiler macros
 DEFINES=""
 AC_SUBST([DEFINES])
@@ -179,7 +162,6 @@ if test "x$GCC" = xyes; then
         CFLAGS="$CFLAGS -Wall -std=gnu99"
         ;;
     *)
-        CFLAGS_FOR_BUILD="$CFLAGS_FOR_BUILD -Wall -std=c99"
         CFLAGS="$CFLAGS -Wall -std=c99"
         ;;
     esac
@@ -209,16 +191,13 @@ if test "x$GCC" = xyes; then
     CFLAGS=$save_CFLAGS
 
     # Work around aliasing bugs - developers should comment this out
-    CFLAGS_FOR_BUILD="$CFLAGS_FOR_BUILD -fno-strict-aliasing"
     CFLAGS="$CFLAGS -fno-strict-aliasing"
 
     # gcc's builtin memcmp is slower than glibc's
     # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
-    CFLAGS_FOR_BUILD="$CFLAGS_FOR_BUILD -fno-builtin-memcmp"
     CFLAGS="$CFLAGS -fno-builtin-memcmp"
 fi
 if test "x$GXX" = xyes; then
-    CXXFLAGS_FOR_BUILD="$CXXFLAGS_FOR_BUILD -Wall"
     CXXFLAGS="$CXXFLAGS -Wall"
 
     # Enable -fvisibility=hidden if using a gcc that supports it
@@ -235,12 +214,10 @@ if test "x$GXX" = xyes; then
     CXXFLAGS=$save_CXXFLAGS
 
     # Work around aliasing bugs - developers should comment this out
-    CXXFLAGS_FOR_BUILD="$CXXFLAGS_FOR_BUILD -fno-strict-aliasing"
     CXXFLAGS="$CXXFLAGS -fno-strict-aliasing"
 
     # gcc's builtin memcmp is slower than glibc's
     # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
-    CXXFLAGS_FOR_BUILD="$CXXFLAGS_FOR_BUILD -fno-builtin-memcmp"
     CXXFLAGS="$CXXFLAGS -fno-builtin-memcmp"
 fi
 
@@ -315,14 +292,6 @@ AC_ARG_ENABLE([debug],
     [enable_debug=no]
 )
 if test "x$enable_debug" = xyes; then
-    DEFINES_FOR_BUILD="$DEFINES_FOR_BUILD -DDEBUG"
-    if test "x$GCC_FOR_BUILD" = xyes; then
-        CFLAGS_FOR_BUILD="$CFLAGS_FOR_BUILD -g -O0"
-    fi
-    if test "x$GXX_FOR_BUILD" = xyes; then
-        CXXFLAGS_FOR_BUILD="$CXXFLAGS_FOR_BUILD -g -O0"
-    fi
-
     DEFINES="$DEFINES -DDEBUG"
     if test "x$GCC" = xyes; then
         CFLAGS="$CFLAGS -g -O0"
diff --git a/m4/ax_prog_cc_for_build.m4 b/m4/ax_prog_cc_for_build.m4
deleted file mode 100644
index 6369809..000
--- a/m4/ax_prog_cc_for_build.m4
+++ /dev/null
@@ -1,125 +0,0 @@
-# ===
-#   http://www.gnu.org/software/autoconf-archive/ax_prog_cc_for_build.html
-# ===
-#
-# SYNOPSIS
-#
-#   AX_PROG_CC_FOR_BUILD
-#
-# DESCRIPTION
-#
-#   This macro searches for a C compiler that generates native executables,
-#   that is a C compiler that surely is not a cross-compiler. This can be
-#   useful if you have to generate source code at compile-time like for
-#   example GCC does.
-#
-#   The macro sets the CC_FOR_BUILD and CPP_FOR_BUILD macros to anything
-#   needed to compile or link (CC_FOR_BUILD) and preprocess (CPP_FOR_BUILD).
-#   The value of these variables can be overridden by the user by specifying
-#   a compiler with an environment variable (like you do for standard CC).
-#
-#   It also sets BUILD_EXEEXT and BUILD_OBJEXT to the executable and object
-#   file extensions for the build platform, and GCC_FOR_BUILD to `yes' if
-#   the compiler we found is GCC. All these variables but GCC_FOR_BUILD are
-#   substituted in the Makefile.
-#
-# LICENSE
-#
-#   Copyright (c) 2008 Paolo Bonzini bonz...@gnu.org
-#
-#   Copying and distribution of this file, with or without modification, are
-#   permitted in any medium without royalty provided the copyright notice
-#   and this notice are preserved. This file is offered as-is, without any
-#   warranty.
-
-#serial 5
-
-AU_ALIAS([AC_PROG_CC_FOR_BUILD], [AX_PROG_CC_FOR_BUILD])
-AC_DEFUN([AX_PROG_CC_FOR_BUILD], [dnl
-AC_REQUIRE([AC_PROG_CC])dnl
-AC_REQUIRE([AC_PROG_CPP])dnl
-AC_REQUIRE([AC_EXEEXT])dnl
-AC_REQUIRE([AC_CANONICAL_SYSTEM])dnl
-
-dnl Use the standard macros, but make them use other 

Re: [Mesa-dev] [PATCH] glsl: Add missing type inference for ir_binop_bfm.

2013-09-05 Thread Matt Turner
Reviewed-by: Matt Turner matts...@gmail.com


Re: [Mesa-dev] [PATCH] configure.ac: Add a more informative warning when libclc.pc is not found v2

2013-09-05 Thread Matt Turner
Reviewed-by: Matt Turner matts...@gmail.com


[Mesa-dev] [PATCH] Move nv30, nv50 and nvc0 to nouveau.

2013-09-05 Thread Johannes Obermayr
---

Sorry for annoying the mailing list but ...

<irc_dri-devel>
[Tuesday, 20 August 2013] [21:23:56] <jobermayr> calim: Would you accept such a patch: https://github.com/jobermayr/mesa/commit/b859d1d
[Tuesday, 20 August 2013] [21:56:05] <calim> jobermayr: what's that good for?
[Tuesday, 20 August 2013] [21:56:33] <calim> ah, you moved everything into a nouveau subdir
[Tuesday, 20 August 2013] [21:59:42] <calim> hm, I don't care, doesn't really have an effect other than requiring more key presses to reach the driver dir
<key_statement>
[Tuesday, 20 August 2013] [21:59:58] <calim> so, I'd accept it
</key_statement>
[Tuesday, 20 August 2013] [22:01:00] <calim> but you remove the ability to not build nv30 support ...
[Tuesday, 20 August 2013] [22:02:45] <calim> I mean, you could have kept the separate libnvXX.a
<note_from_today>
The dependent targets (dri-nouveau, egl-static, pipe-loader, vdpau-nouveau,
xorg-nouveau and xvmc-nouveau) require nv30_screen_create, nv50_screen_create
and nvc0_screen_create in nouveau_drm_screen_create (libnouveaudrm.la). So it
is not possible to skip building nv30, and since all three former libnvXX.la
are required, it makes sense to build only one libnouveau.la ...
</note_from_today>
[Tuesday, 20 August 2013] [22:38:05] <jobermayr> calim: It only builds one libnouveau library, a bit faster compile times on -jX and all things which go into it are better structured
</irc_dri-devel>

<email_in_german>
On Tuesday, 20 August 2013, 23:27:59, Johannes Obermayr wrote to Christoph
Bumiller:
 Hello Christoph,
 
 attached is the restructuring patch (~4 MB unpacked, hence not sent to the
 list ...).
 
 If aboll's and my wish ever comes true and we are allowed to land the
 shared-libs patches, only the three *_screen_create symbols will need to be
 exported from libnouveau.so.
 
 As announced earlier on the list, there is also a small compile-time
 speedup on top of that.

 Regards,
 Johannes
</email_in_german>

<irc_dri-devel>
[Sunday, 1 September 2013] [23:23:37] <jobermayr> calim: This commit also contains whiteline and new blank line at EOF fixes: https://github.com/jobermayr/mesa/commit/5a677fc . Is it something you will push to master or must I maintain it in my branch?
[Thursday, 5 September 2013] [17:56:33] <jobermayr_> calim: What about pushing https://github.com/jobermayr/mesa/commit/def1781 and for 9.2: https://github.com/jobermayr/mesa/commit/03073db ? Don't you accept it anymore?
</irc_dri-devel>

<general_question>
Why is it so difficult to get an agreed patch into master?
</general_question>

---
 configure.ac   |5 +-
 src/gallium/Android.mk |5 +-
 src/gallium/drivers/Makefile.am|2 +-
 src/gallium/drivers/nouveau/Android.mk |8 +-
 src/gallium/drivers/nouveau/Makefile.am|   14 +-
 src/gallium/drivers/nouveau/Makefile.sources   |   91 +
 src/gallium/drivers/nouveau/codegen/nv50_ir.cpp| 1231 
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  | 1197 
 src/gallium/drivers/nouveau/codegen/nv50_ir_bb.cpp |  550 
 .../drivers/nouveau/codegen/nv50_ir_build_util.cpp |  614 
 .../drivers/nouveau/codegen/nv50_ir_build_util.h   |  324 +++
 .../drivers/nouveau/codegen/nv50_ir_driver.h   |  220 ++
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 1682 +++
 .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp  | 1962 +
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 2988 
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 2852 +++
 .../drivers/nouveau/codegen/nv50_ir_graph.cpp  |  436 +++
 .../drivers/nouveau/codegen/nv50_ir_graph.h|  228 ++
 .../drivers/nouveau/codegen/nv50_ir_inlines.h  |  420 +++
 .../nouveau/codegen/nv50_ir_lowering_nv50.cpp  | 1101 
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  | 1597 +++
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 2464 
 .../drivers/nouveau/codegen/nv50_ir_print.cpp  |  698 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 2050 ++
 .../drivers/nouveau/codegen/nv50_ir_ssa.cpp|  552 
 .../drivers/nouveau/codegen/nv50_ir_target.cpp |  469 +++
 .../drivers/nouveau/codegen/nv50_ir_target.h   |  235 ++
 .../nouveau/codegen/nv50_ir_target_nv50.cpp|  552 
 .../drivers/nouveau/codegen/nv50_ir_target_nv50.h  |   72 +
 .../nouveau/codegen/nv50_ir_target_nvc0.cpp|  604 
 .../drivers/nouveau/codegen/nv50_ir_target_nvc0.h  |   74 +
 .../drivers/nouveau/codegen/nv50_ir_util.cpp   |  390 +++
 src/gallium/drivers/nouveau/codegen/nv50_ir_util.h |  788 ++
 .../drivers/nouveau/codegen/target_lib_nvc0.asm|   96 +
 .../drivers/nouveau/codegen/target_lib_nvc0.asm.h  |  112 +
 .../drivers/nouveau/codegen/target_lib_nve4.asm|  698 +
 

Re: [Mesa-dev] Mesa (git 20130828) fails to build on MIPS

2013-09-05 Thread Dominik Behr
It looks like you have segfaults in /bin/sh. That reminds me why I am not
using my Fuloong anymore ;-)
Maybe you should consider cross-compiling Mesa on a stable system.



On Thu, Sep 5, 2013 at 1:14 PM, Christophe Jarry 
christophe.ja...@ouvaton.org wrote:

  According to this
  https://lists.gnu.org/archive/html/bug-tar/2005-02/msg1.html error 141
  is 128+13 = SIGPIPE (broken pipe signal)
  And this may be relevant
  https://groups.google.com/forum/#!topic/golang-nuts/xjZ8jJx0IFw
  Check whether you are using the right yacc and bison?

 Thanks for your suggestions: I used SIGPIPE during the build and this gave me
 Error 141. Without it, make gives me the following error:

 make[2]: Entering directory `/usr/src/mesa/mesa-20130828/src/glsl'
 /usr/lib/pkgusr/mkdir -p ../../src/glsl/glcpp
   LEX  glsl_lexer.cpp
   YACC glsl_parser.cpp
 /usr/lib/pkgusr/mkdir -p ../../src/glsl/glcpp
   YACC glcpp/glcpp-parse.c
   LEX  glcpp/glcpp-lex.c
 /bin/sh: line 1:  5657 Segmentation fault \
  flex -o glsl_lexer.cpp glsl_lexer.ll
 make[2]: *** [glsl_lexer.cpp] Error 139
 make[2]: *** Waiting for unfinished jobs
 /bin/sh: line 1:  5673 Segmentation fault \
  flex -o glcpp/glcpp-lex.c glcpp/glcpp-lex.l
 make[2]: *** [glcpp/glcpp-lex.c] Error 139
 bison: m4 subprocess failed
 make[2]: *** [glcpp/glcpp-parse.c] Error 1
 bison: m4 subprocess failed
 make[2]: *** [glsl_parser.cpp] Error 1
 make[2]: Leaving directory `/usr/src/mesa/mesa-20130828/src/glsl'


 My versions of Flex and Bison:

 $ flex --version
 2.5.37

 $ bison --version
 bison (GNU Bison) 2.7
 [...]


 According to mesa documentation (docs/install.html):

 On GNU/Linux systems, flex and bison are used. Versions 2.5.35 and 2.4.1,
 respectively, (or later) should work.


 Obviously, it does not work. What may I do?
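
The two error numbers in this thread follow the same shell convention: a command killed by signal N is reported with status 128+N, so 141 is SIGPIPE and the "Error 139" lines in the make log are the segmentation faults shown next to them. A minimal C sketch of that decoding follows; `signal_from_shell_status` is a hypothetical helper name, and the signal numbers are whatever `<signal.h>` defines on the build platform.

```c
#include <assert.h>
#include <signal.h>

/* Given a shell-style exit status (0..255), return the signal number if
 * the status encodes "killed by signal" (128 + N), otherwise 0. */
int signal_from_shell_status(int status)
{
    assert(status >= 0 && status <= 255);
    return status > 128 ? status - 128 : 0;
}
```

Read this way, the log says that flex itself crashed, which is consistent with the suggestion that the build machine, rather than Mesa's build system, is at fault.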


Re: [Mesa-dev] [PATCH] r600g: pad IBs to a multiple of 4 DWs on r6xx

2013-09-05 Thread Dominik Behr
0x8000 is a Type 2 NOP.
You could make it a little better/faster by inserting a single multi-DWORD
Type 3 NOP and padding to 8 DWORDs. CP fetches are 32 bytes each, and R600
requires padding. The same applies to CP ring buffer updates: pad to
32 bytes before you update CP_RB_WPTR.
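
The padding arithmetic under discussion can be sketched as follows. This is only an illustration: `pad_dwords` and `pad_buffer` are hypothetical helpers, the alignment (4 dwords in the patch, 8 dwords for a 32-byte fetch) is a parameter, and the NOP value is passed in by the caller rather than hard-coded. The sketch fills with repeated single-dword NOPs as the patch does and does not build the multi-dword Type 3 NOP suggested above.

```c
#include <assert.h>
#include <stdint.h>

/* Number of dwords needed to round `cdw` up to a multiple of `align_dw`.
 * Yields 0 when cdw is already aligned, matching the patch's
 * "padding_dw < 4" guard. */
unsigned pad_dwords(unsigned cdw, unsigned align_dw)
{
    assert(align_dw != 0);
    return (align_dw - cdw % align_dw) % align_dw;
}

/* Append `nop_dw` until the command stream length is aligned and return
 * the new length. The caller must guarantee that the buffer has room for
 * up to align_dw - 1 additional dwords. */
unsigned pad_buffer(uint32_t *buf, unsigned cdw, unsigned align_dw,
                    uint32_t nop_dw)
{
    unsigned n = pad_dwords(cdw, align_dw);
    for (unsigned i = 0; i < n; i++)
        buf[cdw++] = nop_dw;
    return cdw;
}
```

With a 5-dword stream, aligning to 4 appends three NOPs; aligning to 8 also appends three, while a 9-dword stream would need seven.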


On Thu, Sep 5, 2013 at 3:56 PM, Marek Olšák mar...@gmail.com wrote:

 Reviewed-by: Marek Olšák marek.ol...@amd.com

 Though I'm not sure if 0x8000 is correct.

 Marek

 On Wed, Sep 4, 2013 at 11:55 PM, Alex Deucher alexdeuc...@gmail.com
 wrote:
  IBs need to be a multiple of 4 dwords on r6xx asics
  to avoid a hw bug.
 
  Signed-off-by: Alex Deucher alexander.deuc...@amd.com
  CC: 9.2 mesa-sta...@lists.freedesktop.org
  CC: 9.1 mesa-sta...@lists.freedesktop.org
  ---
   src/gallium/drivers/r600/r600_hw_context.c | 13 +
   1 file changed, 13 insertions(+)
 
  diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
  index 97b0f9c..0a219af 100644
  --- a/src/gallium/drivers/r600/r600_hw_context.c
  +++ b/src/gallium/drivers/r600/r600_hw_context.c
  @@ -347,6 +347,19 @@ void r600_context_flush(struct r600_context *ctx, unsigned flags)
                  flags |= RADEON_FLUSH_KEEP_TILING_FLAGS;
          }
 
  +       /* Pad the GFX CS to a multiple of 4 dwords on rv6xx
  +        * to avoid a hw bug.
  +        */
  +       if (ctx->chip_class < R700) {
  +               unsigned i;
  +               unsigned padding_dw = 4 - cs->cdw % 4;
  +               if (padding_dw < 4) {
  +                       for (i = 0; i < padding_dw; i++) {
  +                               cs->buf[cs->cdw++] = 0x8000;
  +                       }
  +               }
  +       }
  +
          /* Flush the CS. */
          ctx->ws->cs_flush(ctx->rings.gfx.cs, flags, ctx->screen->cs_count++);
   }
  --
  1.8.3.1
 


Re: [Mesa-dev] [PATCH] vbo: Implement new gs prim types in vbo_count_tessellated_primitives.

2013-09-05 Thread Matt Turner
Reviewed-by: Matt Turner matts...@gmail.com


Re: [Mesa-dev] Update: UVD status on loongson 3a platform

2013-09-05 Thread cee1
2013/9/6 Jerome Glisse j.gli...@gmail.com:
 On Thu, Sep 05, 2013 at 03:29:52PM -0400, Jerome Glisse wrote:
 On Thu, Sep 05, 2013 at 10:14:32PM +0800, Chen Jie wrote:
  Hi all,
 
  This thread is about
  http://lists.freedesktop.org/archives/dri-devel/2013-April/037598.html.
 
  We recently found something interesting about UVD-based playback on
  the Loongson 3A platform, and also found a way to fix the problem.
 
  First, we found that memcpy in [mesa]src/gallium/drivers/radeon/radeon_uvd.c
  caused the problem:
  * If memcpy is implemented through 16B or 8B load/store instructions,
  it normally causes video mosaic. When a memcmp is inserted after the
  copying code in memcpy, it reports that the src and dest are not equal.
  * If memcpy uses 1B load/store instructions only, the memcmp after the
  copying code reports equal.
 
  Then we found that the following changeset fixes our problem:
 
  diff --git a/src/gallium/drivers/radeon/radeon_uvd.c b/src/gallium/drivers/radeon/radeon_uvd.c
  index 2f98de2..f9599b6 100644
  --- a/src/gallium/drivers/radeon/radeon_uvd.c
  +++ b/src/gallium/drivers/radeon/radeon_uvd.c
  @@ -162,7 +162,7 @@ static bool create_buffer(struct ruvd_decoder *dec, unsigned size)
   {
          buffer->buf = dec->ws->buffer_create(dec->ws, size, 4096, false,
  -                                            RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM);
  +                                            RADEON_DOMAIN_GTT);
          if (!buffer->buf)
                  return false;
 
  The VRAM is mapped to an uncached area on our platform, so my
  question is: what could go wrong while using > 4B load/store
  instructions in the UVD workflow? Any idea?
 

 How do you map the VRAM into the user process address space? I.e., do you
 have something like Intel PAT or something like MTRR or something else?

 In other words, can you map into the process address space a region of
 IO memory (GPU VRAM in this case) and mark it as uncached so that
 none of the accesses to it go through the CPU cache?

 Cheers,
 Jerome

 Also it might be that you can't do write combining on your platform,
 which would be a major drawback as it's assumed by radeon userspace.
 I would need to check the PCIe specification, but write combining is
 probably not mandatory, meaning that your architecture might not have
 it. This would explain why only memcpy with byte-sized copies works.

 Don't think there is any easy way to work around that.
The original Mesa code allows the buffer to be allocated in either the GTT
or the VRAM domain. We changed it so that all buffers are allocated in the
GTT domain, and that seems to fix our problem.
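
For anyone who wants to reproduce the observation quoted above, the experiment amounts to copying with 1-byte accesses only and then reading the destination back. The C sketch below is a diagnostic aid under stated assumptions, not a proposed fix: `copy_bytewise` and `first_mismatch` are hypothetical helpers, and the `volatile` qualifier is used only to discourage the compiler from merging the byte accesses into wider ones.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n bytes using one-byte loads and stores only. */
void copy_bytewise(volatile unsigned char *dst, const unsigned char *src,
                   size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Read the destination back byte by byte and return the index of the
 * first byte that differs from the source, or n if all bytes match. */
size_t first_mismatch(const volatile unsigned char *dst,
                      const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        if (dst[i] != src[i])
            return i;
    return n;
}
```

Running the same comparison after a regular memcpy into the mapped buffer would show at which offset the wide stores start to go wrong, which may help distinguish an alignment issue from a write-combining one.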


-- 
Regards,

- cee1


Re: [Mesa-dev] Update: UVD status on loongson 3a platform

2013-09-05 Thread cee1
2013/9/6 Jerome Glisse j.gli...@gmail.com:
 On Thu, Sep 05, 2013 at 10:14:32PM +0800, Chen Jie wrote:
 Hi all,

 This thread is about
 http://lists.freedesktop.org/archives/dri-devel/2013-April/037598.html.

 We recently found something interesting about UVD-based playback on
 the Loongson 3A platform, and also found a way to fix the problem.

 First, we found that memcpy in [mesa]src/gallium/drivers/radeon/radeon_uvd.c
 caused the problem:
 * If memcpy is implemented through 16B or 8B load/store instructions,
 it normally causes video mosaic. When a memcmp is inserted after the
 copying code in memcpy, it reports that the src and dest are not equal.
 * If memcpy uses 1B load/store instructions only, the memcmp after the
 copying code reports equal.

 Then we found that the following changeset fixes our problem:

 diff --git a/src/gallium/drivers/radeon/radeon_uvd.c b/src/gallium/drivers/radeon/radeon_uvd.c
 index 2f98de2..f9599b6 100644
 --- a/src/gallium/drivers/radeon/radeon_uvd.c
 +++ b/src/gallium/drivers/radeon/radeon_uvd.c
 @@ -162,7 +162,7 @@ static bool create_buffer(struct ruvd_decoder *dec, unsigned size)
  {
         buffer->buf = dec->ws->buffer_create(dec->ws, size, 4096, false,
 -                                            RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM);
 +                                            RADEON_DOMAIN_GTT);
         if (!buffer->buf)
                 return false;

 The VRAM is mapped to an uncached area on our platform, so my
 question is: what could go wrong while using > 4B load/store
 instructions in the UVD workflow? Any idea?


 How do you map the VRAM into the user process address space? I.e., do you
 have something like Intel PAT or something like MTRR or something else?

 In other words, can you map into the process address space a region of
 IO memory (GPU VRAM in this case) and mark it as uncached so that
 none of the accesses to it go through the CPU cache?
Yes, of course.

On MIPS, there is a specific range of the address space that is used to
access IO memory directly, and the addresses of VRAM BOs fall within this
range.



-- 
Regards,

- cee1