Re: [Mesa-dev] [PATCH 1/6] radeonsi: move sampler descriptors from IB to memory

2014-07-16 Thread Michel Dänzer
On 16.07.2014 00:07, Marek Olšák wrote:
> On Tue, Jul 15, 2014 at 11:53 AM, Michel Dänzer  wrote:
>> On 13.07.2014 01:35, Marek Olšák wrote:
>>>
>>> Border colors have been broken if texturing from multiple shader stages is
>>> used. This patch doesn't change that.
>>
>> [...]
>>
>>> +/* Upload border colors and update the pointers in resource descriptors.
>>> + * There can only be 4096 border colors per context.
>>> + *
>>> + * XXX: This is broken if sampler states are bound to multiple shader stages,
>>> + *  because TA_BC_BASE_ADDR is shared by all of them and we overwrite it
>>> + *  for stages which were set earlier. This is also broken for
>>> + *  fine-grained sampler state updates.
>>> + */
>>
>> I don't think that's accurate, as the BO for storing the border colours
>> is per-context, not per-shader-stage.
> 
> Ah yes. The problem only occurs when the BO is reallocated. Consider this:
> 
> set_sampler_states(SHADER_VERTEX)
>   // This sets TA_BC_BASE_ADDR and sets the border color pointers
>   // in the sampler descriptors. The pointers are relative to the base address.
> 
> set_sampler_states(SHADER_FRAGMENT)
>   // If the buffer is reallocated, TA_BC_BASE_ADDR is changed.
>   // All border color pointers for fragment sampler states are set and valid.
>   // All border color pointers for vertex sampler states are now invalid,
>   // because TA_BC_BASE_ADDR has been changed.
> 
> The reallocation can also happen halfway through setting up border
> colors, e.g. you set border colors 0,1,2,3, then you have to
> reallocate, and then you set border colors 4,5,6,7, so the first four
> border color pointers end up being incorrect, because the previous
> buffer has been thrown away.

Exactly, except in that case border colours 4,5,6,7 won't work properly
either, because their values are written to the old buffer, because the
border_color_table pointer isn't updated when reallocating the BO.


> I think the proper solution would be to update all border colors for
> all bound sampler states again when the buffer is reallocated. Also,
> to prevent frequent reallocations, we can check if the current border
> color pointer in a sampler state is still valid and if it is, we can
> skip the upload.

Sounds good, are you going to give this a shot?
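
For illustration only, here is a rough sketch of that approach in plain C.
Every name in it (bc_table, sampler, bc_update, the generation counter) is
made up for the example and is not radeonsi code; the malloc'ed array merely
stands in for the border colour BO. When the table is full, a fresh buffer is
allocated, the table pointer is switched to it before anything else is
written, all bound sampler states are uploaded again, and a sampler whose slot
still belongs to the current buffer skips the upload:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from the driver. */
struct bc_table {
    float (*colors)[4];  /* current buffer, plays the role of the BO */
    unsigned capacity;   /* number of slots in the buffer */
    unsigned used;       /* slots handed out so far */
    unsigned generation; /* bumped on every reallocation */
};

struct sampler {
    float border_color[4];
    unsigned slot;       /* index relative to the table base */
    unsigned generation; /* table generation the slot belongs to, 0 = none */
    int bound;
};

static int bc_table_init(struct bc_table *t, unsigned capacity)
{
    t->colors = malloc(capacity * sizeof(*t->colors));
    t->capacity = capacity;
    t->used = 0;
    t->generation = 1;
    return t->colors != NULL;
}

/* Copy one colour into the current buffer and record where it went. */
static int bc_upload_one(struct bc_table *t, struct sampler *s)
{
    if (t->used == t->capacity)
        return -1;
    memcpy(t->colors[t->used], s->border_color, sizeof(s->border_color));
    s->slot = t->used++;
    s->generation = t->generation;
    return 0;
}

/* Returns 0 if the upload was skipped, 1 if it happened, -1 on failure. */
static int bc_update(struct bc_table *t, struct sampler *all, unsigned count,
                     struct sampler *s)
{
    /* The slot still points into the current buffer: nothing to do. */
    if (s->generation == t->generation)
        return 0;

    if (t->used == t->capacity) {
        float (*fresh)[4] = malloc(t->capacity * sizeof(*fresh));
        if (!fresh)
            return -1;
        free(t->colors);
        t->colors = fresh; /* later writes must target the new buffer */
        t->used = 0;
        t->generation++;
        /* Every other bound sampler lost its slot: upload it again. */
        for (unsigned i = 0; i < count; i++) {
            if (all[i].bound && &all[i] != s && bc_upload_one(t, &all[i]) < 0)
                return -1;
        }
    }
    return bc_upload_one(t, s) < 0 ? -1 : 1;
}
```

The generation counter is just one possible way to express "is the current
border colour pointer still valid"; comparing against the buffer address would
serve the same purpose.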


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/6] radeonsi: move vertex buffer descriptors from IB to memory

2014-07-16 Thread Michel Dänzer
On 13.07.2014 01:35, Marek Olšák wrote:
> From: Marek Olšák 
> 
> This removes the intermediate storage (pm4 state) and generates descriptors
> directly in a staging buffer.
> 
> It also reduces the number of flushes, because the descriptors no longer
> take CS space.

Cool.


> diff --git a/src/gallium/drivers/radeonsi/si_pm4.h b/src/gallium/drivers/radeonsi/si_pm4.h
> index a719586..0702bd4 100644
> --- a/src/gallium/drivers/radeonsi/si_pm4.h
> +++ b/src/gallium/drivers/radeonsi/si_pm4.h
> @@ -76,10 +76,6 @@ void si_pm4_add_bo(struct si_pm4_state *state,
>  enum radeon_bo_usage usage,
>  enum radeon_bo_priority priority);
>  
> -void si_pm4_sh_data_begin(struct si_pm4_state *state);
> -void si_pm4_sh_data_add(struct si_pm4_state *state, uint32_t dw);
> -void si_pm4_sh_data_end(struct si_pm4_state *state, unsigned base, unsigned idx);
> -
>  void si_pm4_inval_shader_cache(struct si_pm4_state *state);
>  void si_pm4_inval_texture_cache(struct si_pm4_state *state);
>  

It might be better to split out the removal of the si_pm4_sh_data_*
functions to a separate patch.

Either way though, patches 2-6 are

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer


Re: [Mesa-dev] A proposal for new testing requirements for stable releases

2014-07-16 Thread Michel Dänzer
On 16.07.2014 06:16, Carl Worth wrote:
> Michel Dänzer  writes:
> 
> Quite frankly, my real concern with all of this is not that the driver
> maintainers will propose something bad, but that I will inadvertently
> botch something while cherry-picking or merging a conflict, etc. that I
> won't be able to notice in my touch testing.

[...]

> But my proposal was not intended to make that a requirement. You can
> continue to get your commits in with no additional testing on your part,
> by just affirming for each release that you're OK with what the
> stable-branch maintainer has put together.
> 
> When I phrase things that way, does it seem more reasonable to you?
> 
> And if this feels like a more bureaucratic means for doing nothing
> effectually different than what we did before, please humor me.

It does seem like that to me.


> (Maybe I'm just a wimp, but it's been tough at times to trust myself to
> resolve conflicts in code that I know I don't have any way to test. I do
> ask for help if the conflict looks really messy. But I don't like to
> bother people for things that look trivial. And even the trivial, manual
> conflict resolution can give me misgivings that I might be breaking the
> release.)

I'm fine with you asking the patch author or another developer of the
affected subsystem for a backport if there is any conflict, however trivial.


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer





[Mesa-dev] [Bug 80266] Many instances of 1<<31, which is undefined in C99

2014-07-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=80266

--- Comment #15 from Michel Dänzer  ---
Please submit patches fixing these issues to the mesa-dev mailing list, instead
of waiting here for anyone else to do it.
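
For anyone picking this up, a minimal sketch of the usual fix, assuming a
32-bit int: 1 << 31 shifts a one into the sign bit of a signed int, which C99
(6.5.7) leaves undefined, while the same shift on an unsigned operand is well
defined. The helper name bit32 is hypothetical and only serves the example:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with only bit n set, for n in 0..31.
 * UINT32_C(1) makes the shifted operand unsigned, so n == 31 is fine. */
static inline uint32_t bit32(unsigned n)
{
    return UINT32_C(1) << n;
}
```

In existing code the same effect is usually had by writing 1u << 31 instead of
1 << 31.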

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH 1/6] radeonsi: move sampler descriptors from IB to memory

2014-07-16 Thread Marek Olšák
On Wed, Jul 16, 2014 at 10:17 AM, Michel Dänzer  wrote:
> On 16.07.2014 00:07, Marek Olšák wrote:
>> On Tue, Jul 15, 2014 at 11:53 AM, Michel Dänzer  wrote:
>>> On 13.07.2014 01:35, Marek Olšák wrote:

>>>> Border colors have been broken if texturing from multiple shader stages is
>>>> used. This patch doesn't change that.
>>>
>>> [...]
>>>
>>>> +/* Upload border colors and update the pointers in resource descriptors.
>>>> + * There can only be 4096 border colors per context.
>>>> + *
>>>> + * XXX: This is broken if sampler states are bound to multiple shader stages,
>>>> + *  because TA_BC_BASE_ADDR is shared by all of them and we overwrite it
>>>> + *  for stages which were set earlier. This is also broken for
>>>> + *  fine-grained sampler state updates.
>>>> + */
>>>
>>> I don't think that's accurate, as the BO for storing the border colours
>>> is per-context, not per-shader-stage.
>>
>> Ah yes. The problem only occurs when the BO is reallocated. Consider this:
>>
>> set_sampler_states(SHADER_VERTEX)
>>   // This sets TA_BC_BASE_ADDR and sets the border color pointers
>>   // in the sampler descriptors. The pointers are relative to the base address.
>>
>> set_sampler_states(SHADER_FRAGMENT)
>>   // If the buffer is reallocated, TA_BC_BASE_ADDR is changed.
>>   // All border color pointers for fragment sampler states are set and valid.
>>   // All border color pointers for vertex sampler states are now invalid,
>>   // because TA_BC_BASE_ADDR has been changed.
>>
>> The reallocation can also happen halfway through setting up border
>> colors, e.g. you set border colors 0,1,2,3, then you have to
>> reallocate, and then you set border colors 4,5,6,7, so the first four
>> border color pointers end up being incorrect, because the previous
>> buffer has been thrown away.
>
> Exactly, except in that case border colours 4,5,6,7 won't work properly
> either, because their values are written to the old buffer, because the
> border_color_table pointer isn't updated when reallocating the BO.
>
>
>> I think the proper solution would be to update all border colors for
>> all bound sampler states again when the buffer is reallocated. Also,
>> to prevent frequent reallocations, we can check if the current border
>> color pointer in a sampler state is still valid and if it is, we can
>> skip the upload.
>
> Sounds good, are you going to give this a shot?

Yes, but not in this patch. For now, I'll just change the comment to
"this code is very broken".

Marek


Re: [Mesa-dev] Release-candidate branch for upcoming 10.2.4

2014-07-16 Thread Ilia Mirkin
On Tue, Jul 15, 2014 at 12:49 AM, Carl Worth  wrote:
> Hi folks,
>
> I've pushed out an update to the 10.2 branch and I need some specific
> testing in the next three days.
>
> I've tested the branch on Intel (Haswell) as well as both swrast and
> Gallium softpipe and found no piglit regressions compared to the 10.2.3
> release.
>
> The branch includes a few patches to nouveau and radeonsi which I have
> not been able to test. If someone will test one of these drivers with
> piglit and let me know that all looks good, I'll be happy to include the
> patches in the release. Otherwise, I'll drop any untested patches before
> making the final release on Friday.
>
> Also, there's still time in the next three days for someone to nominate
> further driver-specific changes. I'll just need positive piglit test
> results for any such patches, (on top of the branch as it stands now),
> before I'll accept them.

Tested nouveau with GK107 (nvc0 driver) and GT218 (nv50 driver). No
regressions, and nvc0 had the expected improvements. Looks good.

Cheers,

  -ilia


Re: [Mesa-dev] [PATCH 1/5] wglinfo: query and report multisample information

2014-07-16 Thread Jose Fonseca

On 15/07/14 15:39, Brian Paul wrote:

Before, we always reported zeros in the multisample columns of the
format list.  Since PIXELFORMATDESCRIPTOR doesn't have fields for
multisample, we use a new format_info structure to extend that type.

We can only query this info with the wglGetPixelFormatAttribivARB()
function which is part of the WGL_ARB_pixel_format extension.
---
  src/wgl/wglinfo.c |  237 -
  1 file changed, 178 insertions(+), 59 deletions(-)

diff --git a/src/wgl/wglinfo.c b/src/wgl/wglinfo.c
index 2b2c921..e14ebd6 100644
--- a/src/wgl/wglinfo.c
+++ b/src/wgl/wglinfo.c
@@ -50,6 +50,21 @@ typedef enum
  } InfoMode;


+static GLboolean have_WGL_ARB_pixel_format;
+static GLboolean have_WGL_ARB_multisample;
+
+static PFNWGLGETPIXELFORMATATTRIBIVARBPROC wglGetPixelFormatAttribivARB_func;
+
+
+/**
+ * An extension of PIXELFORMATDESCRIPTOR to handle multisample, etc.
+ */
+struct format_info {
+   PIXELFORMATDESCRIPTOR pfd;
+   int sampleBuffers, numSamples;
+   int transparency;
+};
+

  static LRESULT CALLBACK
  WndProc(HWND hWnd,
@@ -159,6 +174,12 @@ print_screen_info(HDC _hdc, GLboolean limits, GLboolean singleLine)
  printf("WGL extensions:\n");
  print_extension_list(wglExtensions, singleLine);
   }
+ if (extension_supported("WGL_ARB_pixel_format", wglExtensions)) {
+have_WGL_ARB_pixel_format = GL_TRUE;
+ }
+ if (extension_supported("WGL_ARB_multisample", wglExtensions)) {
+have_WGL_ARB_multisample = GL_TRUE;
+ }
}
  #endif
printf("OpenGL vendor string: %s\n", glVendor);
@@ -208,27 +229,27 @@ visual_render_type_name(BYTE iPixelType)
  }

  static void
-print_visual_attribs_verbose(int iPixelFormat, LPPIXELFORMATDESCRIPTOR ppfd)
+print_visual_attribs_verbose(int iPixelFormat, const struct format_info *info)
  {
 printf("Visual ID: %x  generic=%d  native=%d\n",
iPixelFormat,
-  ppfd->dwFlags & PFD_GENERIC_FORMAT ? 1 : 0,
-  ppfd->dwFlags & PFD_DRAW_TO_WINDOW ? 1 : 0);
+  info->pfd.dwFlags & PFD_GENERIC_FORMAT ? 1 : 0,
+  info->pfd.dwFlags & PFD_DRAW_TO_WINDOW ? 1 : 0);
 printf("bufferSize=%d level=%d renderType=%s doubleBuffer=%d stereo=%d\n",
-  0 /* ppfd->bufferSize */, 0 /* ppfd->level */,
- visual_render_type_name(ppfd->iPixelType),
-  ppfd->dwFlags & PFD_DOUBLEBUFFER ? 1 : 0,
-  ppfd->dwFlags & PFD_STEREO ? 1 : 0);
+  0 /* info->pfd.bufferSize */, 0 /* info->pfd.level */,
+ visual_render_type_name(info->pfd.iPixelType),
+  info->pfd.dwFlags & PFD_DOUBLEBUFFER ? 1 : 0,
+  info->pfd.dwFlags & PFD_STEREO ? 1 : 0);
 printf("rgba: cRedBits=%d cGreenBits=%d cBlueBits=%d cAlphaBits=%d\n",
-  ppfd->cRedBits, ppfd->cGreenBits,
-  ppfd->cBlueBits, ppfd->cAlphaBits);
+  info->pfd.cRedBits, info->pfd.cGreenBits,
+  info->pfd.cBlueBits, info->pfd.cAlphaBits);
 printf("cAuxBuffers=%d cDepthBits=%d cStencilBits=%d\n",
-  ppfd->cAuxBuffers, ppfd->cDepthBits, ppfd->cStencilBits);
+  info->pfd.cAuxBuffers, info->pfd.cDepthBits, info->pfd.cStencilBits);
 printf("accum: cRedBits=%d cGreenBits=%d cBlueBits=%d cAlphaBits=%d\n",
-  ppfd->cAccumRedBits, ppfd->cAccumGreenBits,
-  ppfd->cAccumBlueBits, ppfd->cAccumAlphaBits);
+  info->pfd.cAccumRedBits, info->pfd.cAccumGreenBits,
+  info->pfd.cAccumBlueBits, info->pfd.cAccumAlphaBits);
 printf("multiSample=%d  multiSampleBuffers=%d\n",
-  0 /* ppfd->numSamples */, 0 /* ppfd->numMultisample */);
+  info->numSamples, info->sampleBuffers);
  }


@@ -242,32 +263,32 @@ print_visual_attribs_short_header(void)


  static void
-print_visual_attribs_short(int iPixelFormat, LPPIXELFORMATDESCRIPTOR ppfd)
+print_visual_attribs_short(int iPixelFormat, const struct format_info *info)
  {
 char *caveat = "None";

 printf("0x%02x %2d  %2d %2d %2d %2d %c%c %c  %c %2d %2d %2d %2d %2d %2d %2d",
iPixelFormat,
-  ppfd->dwFlags & PFD_GENERIC_FORMAT ? 1 : 0,
-  ppfd->dwFlags & PFD_DRAW_TO_WINDOW ? 1 : 0,
-  0,
-  0 /* ppfd->bufferSize */,
-  0 /* ppfd->level */,
-  ppfd->iPixelType == PFD_TYPE_RGBA ? 'r' : ' ',
-  ppfd->iPixelType == PFD_TYPE_COLORINDEX ? 'c' : ' ',
-  ppfd->dwFlags & PFD_DOUBLEBUFFER ? 'y' : '.',
-  ppfd->dwFlags & PFD_STEREO ? 'y' : '.',
-  ppfd->cRedBits, ppfd->cGreenBits,
-  ppfd->cBlueBits, ppfd->cAlphaBits,
-  ppfd->cAuxBuffers,
-  ppfd->cDepthBits,
-  ppfd->cStencilBits
+  info->pfd.dwFlags & PFD_GENERIC_FORMAT ? 1 : 0,
+  info->pfd.dwFlags & PFD_DRAW_TO_WINDOW ? 1 : 0,
+  info->transparency,
+  info->pfd.cColorBits,
+  0 /* info->pfd.level */,
+  info->pfd.iPixelType == PFD_TYPE_RGBA ? 'r' 

[Mesa-dev] [PATCH] r600g: Implement GL_ARB_texture_gather

2014-07-16 Thread Glenn Kennard
Only supported on evergreen and later. Currently limited
to single component textures as the hardware GATHER4
instruction ignores texture swizzles.

Piglit quick run passes on radeon 6670 with all
applicable textureGather tests, no regressions.

Signed-off-by: Glenn Kennard 
---
Changes from v1:
 Removed PIPE_CAP_TEXTURE_GATHER_SM5 cap
 
This patch should be equivalent to the ARB_texture_gather only
portions of David Airlie's work in progress gather implementation
http://cgit.freedesktop.org/~airlied/mesa/log/?h=r600g-texture-gather

Further work is needed to enable the GL_ARB_gpu_shader5 enhancements
to texture gather, in particular keying sampler swizzle state to
shader variants with the appropriate component selects.
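
As a reference for what the single-component case computes, here is a small
CPU-side sketch (not driver code). The struct and function names are invented
for the example, and clamp-to-edge addressing is an assumption. The output
order follows the ARB_texture_gather convention of (i0,j1), (i1,j1), (i1,j0),
(i0,j0) in x, y, z, w:

```c
#include <assert.h>

/* Hypothetical single-channel texture, row-major, row 0 first. */
struct tex_r8 {
    int width, height;
    const unsigned char *texels;
};

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Fetch one texel with clamp-to-edge addressing (an assumption here). */
static unsigned char fetch(const struct tex_r8 *t, int i, int j)
{
    i = clampi(i, 0, t->width - 1);
    j = clampi(j, 0, t->height - 1);
    return t->texels[j * t->width + i];
}

/* Gather the 2x2 footprint whose lower-left texel is (i0, j0). */
static void gather4(const struct tex_r8 *t, int i0, int j0,
                    unsigned char out[4])
{
    int i1 = i0 + 1, j1 = j0 + 1;

    out[0] = fetch(t, i0, j1); /* x */
    out[1] = fetch(t, i1, j1); /* y */
    out[2] = fetch(t, i1, j0); /* z */
    out[3] = fetch(t, i0, j0); /* w */
}
```

With more than one component, which channel is gathered would have to come
from the sampler swizzle, and that is the part the hardware instruction does
not handle on its own.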

 docs/GL3.txt   |  2 +-
 docs/relnotes/10.3.html|  2 +-
 src/gallium/drivers/r600/r600_pipe.c   |  3 ++-
 src/gallium/drivers/r600/r600_shader.c | 47 +-
 4 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index a2f438b..20e57b0 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -118,7 +118,7 @@ GL 4.0:
   GL_ARB_tessellation_shader           started (Fabian)
   GL_ARB_texture_buffer_object_rgb32   DONE (i965, nvc0, r600, radeonsi, softpipe)
   GL_ARB_texture_cube_map_array        DONE (i965, nv50, nvc0, r600, radeonsi, softpipe)
-  GL_ARB_texture_gather                DONE (i965, nv50, nvc0, radeonsi)
+  GL_ARB_texture_gather                DONE (i965, nv50, nvc0, radeonsi, r600)
   GL_ARB_texture_query_lod             DONE (i965, nv50, nvc0, radeonsi)
   GL_ARB_transform_feedback2           DONE (i965, nv50, nvc0, r600, radeonsi)
   GL_ARB_transform_feedback3           DONE (i965, nv50, nvc0, r600, radeonsi)
diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html
index 2e718fc..1c0fab6 100644
--- a/docs/relnotes/10.3.html
+++ b/docs/relnotes/10.3.html
@@ -49,7 +49,7 @@ Note: some of the new features are only available with certain drivers.
 GL_ARB_sample_shading on radeonsi
 GL_ARB_stencil_texturing on nv50, nvc0, r600, and radeonsi
 GL_ARB_texture_cube_map_array on radeonsi
-GL_ARB_texture_gather on radeonsi
+GL_ARB_texture_gather on radeonsi, r600
 GL_ARB_texture_query_levels on nv50, nvc0, llvmpipe, r600, radeonsi, softpipe
 GL_ARB_texture_query_lod on radeonsi
 GL_ARB_viewport_array on nvc0
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index ca6399f..a762b00 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -303,6 +303,8 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
case PIPE_CAP_CUBE_MAP_ARRAY:
case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT:
+   return 0;
+   case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
return family >= CHIP_CEDAR ? 1 : 0;
 
/* Unsupported features. */
@@ -312,7 +314,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
case PIPE_CAP_VERTEX_COLOR_CLAMPED:
case PIPE_CAP_USER_VERTEX_BUFFERS:
-   case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
case PIPE_CAP_TEXTURE_GATHER_SM5:
case PIPE_CAP_TEXTURE_QUERY_LOD:
case PIPE_CAP_SAMPLE_SHADING:
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 6952e3c..db928f3 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -4477,7 +4477,8 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
 
if (inst->Instruction.Opcode == TGSI_OPCODE_TEX2 ||
inst->Instruction.Opcode == TGSI_OPCODE_TXB2 ||
-   inst->Instruction.Opcode == TGSI_OPCODE_TXL2)
+   inst->Instruction.Opcode == TGSI_OPCODE_TXL2 ||
+   inst->Instruction.Opcode == TGSI_OPCODE_TG4)
sampler_src_reg = 2;
 
src_gpr = tgsi_tex_get_src_gpr(ctx, 0);
@@ -5079,6 +5080,13 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
case FETCH_OP_SAMPLE_G:
opcode = FETCH_OP_SAMPLE_C_G;
break;
+   /* Texture gather variants */
+   case FETCH_OP_GATHER4:
+   tex.op = FETCH_OP_GATHER4_C;
+   break;
+   case FETCH_OP_GATHER4_O:
+   tex.op = FETCH_OP_GATHER4_C_O;
+   break;
}
}
 
@@ -5089,9 +5097,21 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
tex.resource_id = tex.sampler_id + R600_MAX_CONST_BUFFERS;
tex.src_gpr = src_gpr;
tex.dst_gpr = ctx->file_offset[inst->Dst[0].Register.File] + inst->Dst[0].Register.Index;
-   tex.dst_se

Re: [Mesa-dev] [PATCH] r600g: Implement GL_ARB_texture_gather

2014-07-16 Thread Ilia Mirkin
On Wed, Jul 16, 2014 at 9:53 AM, Glenn Kennard  wrote:
> Only supported on evergreen and later. Currently limited
> to single component textures as the hardware GATHER4
> instruction ignores texture swizzles.
>
> Piglit quick run passes on radeon 6670 with all
> applicable textureGather tests, no regressions.
>
> Signed-off-by: Glenn Kennard 
> ---
> Changes from v1:
>  Removed PIPE_CAP_TEXTURE_GATHER_SM5 cap
>
> This patch should be equivalent to the ARB_texture_gather only
> portions of David Airlie's work in progress gather implementation
> http://cgit.freedesktop.org/~airlied/mesa/log/?h=r600g-texture-gather
>
> Further work is needed to enable the GL_ARB_gpu_shader5 enhancements
> to texture gather, in particular keying sampler swizzle state to
> shader variants with the appropriate component selects.
>
>  docs/GL3.txt   |  2 +-
>  docs/relnotes/10.3.html|  2 +-
>  src/gallium/drivers/r600/r600_pipe.c   |  3 ++-
>  src/gallium/drivers/r600/r600_shader.c | 47 +-
>  4 files changed, 45 insertions(+), 9 deletions(-)
>
> diff --git a/docs/GL3.txt b/docs/GL3.txt
> index a2f438b..20e57b0 100644
> --- a/docs/GL3.txt
> +++ b/docs/GL3.txt
> @@ -118,7 +118,7 @@ GL 4.0:
>    GL_ARB_tessellation_shader           started (Fabian)
>    GL_ARB_texture_buffer_object_rgb32   DONE (i965, nvc0, r600, radeonsi, softpipe)
>    GL_ARB_texture_cube_map_array        DONE (i965, nv50, nvc0, r600, radeonsi, softpipe)
> -  GL_ARB_texture_gather                DONE (i965, nv50, nvc0, radeonsi)
> +  GL_ARB_texture_gather                DONE (i965, nv50, nvc0, radeonsi, r600)
>    GL_ARB_texture_query_lod             DONE (i965, nv50, nvc0, radeonsi)
>    GL_ARB_transform_feedback2           DONE (i965, nv50, nvc0, r600, radeonsi)
>    GL_ARB_transform_feedback3           DONE (i965, nv50, nvc0, r600, radeonsi)
> diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html
> index 2e718fc..1c0fab6 100644
> --- a/docs/relnotes/10.3.html
> +++ b/docs/relnotes/10.3.html
> @@ -49,7 +49,7 @@ Note: some of the new features are only available with certain drivers.
>  GL_ARB_sample_shading on radeonsi
>  GL_ARB_stencil_texturing on nv50, nvc0, r600, and radeonsi
>  GL_ARB_texture_cube_map_array on radeonsi
> -GL_ARB_texture_gather on radeonsi
> +GL_ARB_texture_gather on radeonsi, r600
>  GL_ARB_texture_query_levels on nv50, nvc0, llvmpipe, r600, radeonsi, softpipe
>  GL_ARB_texture_query_lod on radeonsi
>  GL_ARB_viewport_array on nvc0
> diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
> index ca6399f..a762b00 100644
> --- a/src/gallium/drivers/r600/r600_pipe.c
> +++ b/src/gallium/drivers/r600/r600_pipe.c
> @@ -303,6 +303,8 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
> case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
> case PIPE_CAP_CUBE_MAP_ARRAY:
> case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT:
> +   return 0;

oops?

> +   case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
> return family >= CHIP_CEDAR ? 1 : 0;
>
> /* Unsupported features. */
> @@ -312,7 +314,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
> case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
> case PIPE_CAP_VERTEX_COLOR_CLAMPED:
> case PIPE_CAP_USER_VERTEX_BUFFERS:
> -   case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
> case PIPE_CAP_TEXTURE_GATHER_SM5:
> case PIPE_CAP_TEXTURE_QUERY_LOD:
> case PIPE_CAP_SAMPLE_SHADING:
> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
> index 6952e3c..db928f3 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -4477,7 +4477,8 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
>
> if (inst->Instruction.Opcode == TGSI_OPCODE_TEX2 ||
> inst->Instruction.Opcode == TGSI_OPCODE_TXB2 ||
> -   inst->Instruction.Opcode == TGSI_OPCODE_TXL2)
> +   inst->Instruction.Opcode == TGSI_OPCODE_TXL2 ||
> +   inst->Instruction.Opcode == TGSI_OPCODE_TG4)
> sampler_src_reg = 2;
>
> src_gpr = tgsi_tex_get_src_gpr(ctx, 0);
> @@ -5079,6 +5080,13 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
> case FETCH_OP_SAMPLE_G:
> opcode = FETCH_OP_SAMPLE_C_G;
> break;
> +   /* Texture gather variants */
> +   case FETCH_OP_GATHER4:
> +   tex.op = FETCH_OP_GATHER4_C;
> +   break;
> +   case FETCH_OP_GATHER4_O:
> +   tex.op = FETCH_OP_GATHER4_C_O;
> +   break;
> }
> }
>
> @@ -50

Re: [Mesa-dev] [PATCH 00/10] [RFC] Probably useless algebraic optimizations

2014-07-16 Thread Brian Paul
Some of these optimizations are pretty obscure and I can't imagine a 
real shader hitting them.


What's the cost of checking for these cases?  I don't know how expensive 
the equals() methods are.


Do we want to litter the optimizer with cases that may never be used in 
practice?


-Brian


On 07/15/2014 04:27 PM, Thomas Helland wrote:

So, a little update on these patches.
I've written some shaders for hitting each
specific case in the patch-series.

This shows that:
Patch 1 (X - X) == 0, and
Patch 9 (A - neg(B)) == A + B
have no effect at all.

The rest of the patches do indeed have
a positive effect on the special-case shader.

If anyone wants to have a look at the shaders
then let me know. I could always put them
in a dropbox-folder, or github, or something.

The report from shader-db (sorted by patch-number):

Patch 2:
helped: shaders/mine/a_or_nota.shader_test fs16: 16 -> 5 (-68.75%)
helped: shaders/mine/a_or_nota.shader_test fs8: 16 -> 5 (-68.75%)
helped: shaders/mine/a_or_nota.shader_test vs: 11 -> 5 (-54.55%)

Patch 3:
helped: shaders/mine/a_and_nota.shader_test fs16: 16 -> 5 (-68.75%)
helped: shaders/mine/a_and_nota.shader_test fs8: 16 -> 5 (-68.75%)
helped: shaders/mine/a_and_nota.shader_test vs: 11 -> 5 (-54.55%)

Patch 4:
helped: shaders/mine/or_and.shader_test fs16: 16 -> 14 (-12.50%)
helped: shaders/mine/or_and.shader_test fs8: 16 -> 14 (-12.50%)
helped: shaders/mine/or_and.shader_test vs: 11 -> 10 (-9.09%)

Patch 5:
helped: shaders/mine/minOver.shader_test fs16: 8 -> 5 (-37.50%)
helped: shaders/mine/minOver.shader_test fs8: 8 -> 5 (-37.50%)
helped: shaders/mine/minOver.shader_test vs: 6 -> 5 (-16.67%)
helped: shaders/mine/minUnder.shader_test fs16: 8 -> 5 (-37.50%)
helped: shaders/mine/minUnder.shader_test fs8: 8 -> 5 (-37.50%)
helped: shaders/mine/minUnder.shader_test vs: 6 -> 5 (-16.67%)

Patch 6:
helped: shaders/mine/maxOver.shader_test fs16: 8 -> 5 (-37.50%)
helped: shaders/mine/maxOver.shader_test fs8: 8 -> 5 (-37.50%)
helped: shaders/mine/maxOver.shader_test vs: 6 -> 5 (-16.67%)
helped: shaders/mine/maxUnder.shader_test fs16: 8 -> 5 (-37.50%)
helped: shaders/mine/maxUnder.shader_test fs8: 8 -> 5 (-37.50%)
helped: shaders/mine/maxUnder.shader_test vs: 6 -> 5 (-16.67%)

Patch 7:
helped: shaders/mine/loglog.shader_test fs16: 17 -> 11 (-35.29%)
helped: shaders/mine/loglog.shader_test fs8: 17 -> 11 (-35.29%)
helped: shaders/mine/loglog.shader_test vs: 7 -> 6 (-14.29%)

Patch 8:
helped: shaders/mine/expexp.shader_test fs16: 17 -> 11 (-35.29%)
helped: shaders/mine/expexp.shader_test fs8: 17 -> 11 (-35.29%)
helped: shaders/mine/expexp.shader_test vs: 7 -> 6 (-14.29%)

Patch 10:
helped: shaders/mine/pow0x.shader_test fs16: 8 -> 5 (-37.50%)
helped: shaders/mine/pow0x.shader_test fs8: 8 -> 5 (-37.50%)
helped: shaders/mine/pow0x.shader_test vs: 6 -> 5 (-16.67%)
helped: shaders/mine/powx-1.shader_test fs16: 11 -> 5 (-54.55%)
helped: shaders/mine/powx-1.shader_test fs8: 11 -> 5 (-54.55%)
helped: shaders/mine/powx-1.shader_test vs: 7 -> 5 (-28.57%)
helped: shaders/mine/powx0.shader_test fs16: 8 -> 5 (-37.50%)
helped: shaders/mine/powx0.shader_test fs8: 8 -> 5 (-37.50%)
helped: shaders/mine/powx0.shader_test vs: 6 -> 5 (-16.67%)



2014-07-15 0:22 GMT+02:00 <thomashellan...@gmail.com>:

From: Thomas Helland <thomashellan...@gmail.com>

When writing that A || (A && B) patch some
days ago I also wrote some other patches
that have no impact on my collection of shaders.
(shader-db + Some TF2 and Portal-shaders).
No reduction in instruction count, and no
significant increase in compilation time.

I decided to put them up here anyway, as
with your collection of shaders maybe YMMV.

These are mostly RFC-quality, and not all are
as complete and nicely formatted as they could be.
Possibly some are also implemented incorrectly.
(I'm still trying to get a good understanding of
the buildup of the ir, the visitors, etc)

Feel free to do with these patches as you please;
Ignore, test, review, flame, make cookies...

Thomas Helland (10):
   glsl: Optimize X - X -> 0
   glsl: Optimize !A || A == 1
   glsl: Optimize !A && A == 0
   glsl: Optimize (A || B) && A == A
   glsl: Optimize min(-8, sin(x)) == -8 and similar
   glsl: Optimize max(8, sin(x)) == 8 and similar
   glsl: Optimize log(x) + log(y) == log(x*y)
   glsl: Optimize exp(x)*exp(y) == exp(x+y)
   glsl: Optimize A - neg(B) == A + B  and  neg(A) - B == neg(A + B)
   glsl: Optimize some more pow() special case
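
As a sanity check on the purely boolean and integer rewrites listed above, the
identities can be verified exhaustively in a few lines of C. This is only a
sketch with made-up function names, not part of the series. The log, exp and
pow rewrites are left out on purpose: they are exact in real arithmetic but
only approximate in floating point.

```c
#include <assert.h>
#include <stdbool.h>

/* Checks the boolean rewrites for every combination of A and B. */
static bool bool_identities_hold(void)
{
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            bool A = a != 0, B = b != 0;

            if ((!A || A) != true)     /* !A || A  ->  1 */
                return false;
            if ((!A && A) != false)    /* !A && A  ->  0 */
                return false;
            if (((A || B) && A) != A)  /* (A || B) && A  ->  A */
                return false;
        }
    }
    return true;
}

/* Checks the subtraction rewrites on a small integer range [lo, hi]. */
static bool int_identities_hold(int lo, int hi)
{
    for (int a = lo; a <= hi; a++) {
        for (int b = lo; b <= hi; b++) {
            if (a - a != 0)              /* X - X  ->  0 */
                return false;
            if (a - (-b) != a + b)       /* A - neg(B)  ->  A + B */
                return false;
            if ((-a) - b != -(a + b))    /* neg(A) - B  ->  neg(A + B) */
                return false;
        }
    }
    return true;
}
```

Both functions are expected to return true; a small range is enough for the
integer case as long as none of the sums overflow.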

[Mesa-dev] [PATCH v3] r600g: Implement GL_ARB_texture_gather

2014-07-16 Thread Glenn Kennard
Only supported on evergreen and later. Currently limited
to single component textures as the hardware GATHER4
instruction ignores texture swizzles.

Piglit quick run passes on radeon 6670 with all
applicable textureGather tests, no regressions.

Signed-off-by: Glenn Kennard 
---
Changes from v2:
 Remove accidental disabling of unrelated caps that snuck in.
 Oddly enough not caught by comparing piglit "quick" runs.
Changes from v1:
 Removed PIPE_CAP_TEXTURE_GATHER_SM5 cap

 docs/GL3.txt   |  2 +-
 docs/relnotes/10.3.html|  2 +-
 src/gallium/drivers/r600/r600_pipe.c   |  2 +-
 src/gallium/drivers/r600/r600_shader.c | 47 +-
 4 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index a2f438b..20e57b0 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -118,7 +118,7 @@ GL 4.0:
   GL_ARB_tessellation_shader           started (Fabian)
   GL_ARB_texture_buffer_object_rgb32   DONE (i965, nvc0, r600, radeonsi, softpipe)
   GL_ARB_texture_cube_map_array        DONE (i965, nv50, nvc0, r600, radeonsi, softpipe)
-  GL_ARB_texture_gather                DONE (i965, nv50, nvc0, radeonsi)
+  GL_ARB_texture_gather                DONE (i965, nv50, nvc0, radeonsi, r600)
   GL_ARB_texture_query_lod             DONE (i965, nv50, nvc0, radeonsi)
   GL_ARB_transform_feedback2           DONE (i965, nv50, nvc0, r600, radeonsi)
   GL_ARB_transform_feedback3           DONE (i965, nv50, nvc0, r600, radeonsi)
diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html
index 2e718fc..1c0fab6 100644
--- a/docs/relnotes/10.3.html
+++ b/docs/relnotes/10.3.html
@@ -49,7 +49,7 @@ Note: some of the new features are only available with certain drivers.
 GL_ARB_sample_shading on radeonsi
 GL_ARB_stencil_texturing on nv50, nvc0, r600, and radeonsi
 GL_ARB_texture_cube_map_array on radeonsi
-GL_ARB_texture_gather on radeonsi
+GL_ARB_texture_gather on radeonsi, r600
 GL_ARB_texture_query_levels on nv50, nvc0, llvmpipe, r600, radeonsi, softpipe
 GL_ARB_texture_query_lod on radeonsi
 GL_ARB_viewport_array on nvc0
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index ca6399f..5bf9c00 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -303,6 +303,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
case PIPE_CAP_CUBE_MAP_ARRAY:
case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT:
+   case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
return family >= CHIP_CEDAR ? 1 : 0;
 
/* Unsupported features. */
@@ -312,7 +313,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
case PIPE_CAP_VERTEX_COLOR_CLAMPED:
case PIPE_CAP_USER_VERTEX_BUFFERS:
-   case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
case PIPE_CAP_TEXTURE_GATHER_SM5:
case PIPE_CAP_TEXTURE_QUERY_LOD:
case PIPE_CAP_SAMPLE_SHADING:
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 6952e3c..db928f3 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -4477,7 +4477,8 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
 
if (inst->Instruction.Opcode == TGSI_OPCODE_TEX2 ||
inst->Instruction.Opcode == TGSI_OPCODE_TXB2 ||
-   inst->Instruction.Opcode == TGSI_OPCODE_TXL2)
+   inst->Instruction.Opcode == TGSI_OPCODE_TXL2 ||
+   inst->Instruction.Opcode == TGSI_OPCODE_TG4)
sampler_src_reg = 2;
 
src_gpr = tgsi_tex_get_src_gpr(ctx, 0);
@@ -5079,6 +5080,13 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
case FETCH_OP_SAMPLE_G:
opcode = FETCH_OP_SAMPLE_C_G;
break;
+   /* Texture gather variants */
+   case FETCH_OP_GATHER4:
+   tex.op = FETCH_OP_GATHER4_C;
+   break;
+   case FETCH_OP_GATHER4_O:
+   tex.op = FETCH_OP_GATHER4_C_O;
+   break;
}
}
 
@@ -5089,9 +5097,21 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
tex.resource_id = tex.sampler_id + R600_MAX_CONST_BUFFERS;
tex.src_gpr = src_gpr;
tex.dst_gpr = ctx->file_offset[inst->Dst[0].Register.File] + inst->Dst[0].Register.Index;
-   tex.dst_sel_x = (inst->Dst[0].Register.WriteMask & 1) ? 0 : 7;
-   tex.dst_sel_y = (inst->Dst[0].Register.WriteMask & 2) ? 1 : 7;
-   tex.dst_sel_z = (inst->Dst[0].Register.WriteMask & 4) ? 2 : 7;
+
+   if (inst->Instruction.Opcode == TGSI_OPCODE_TG4) {
+   int8_t te

[Mesa-dev] [Bug 78716] Fix Mesa bugs for running Unreal Engine 4.1 Cave effects demo compiled for Linux

2014-07-16 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=78716

Eero Tamminen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eero.t.tammi...@intel.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3] glsl: handle a switch where default is in the middle of cases

2014-07-16 Thread Ian Romanick
Reviewed-by: Ian Romanick 

At some point we should do some significant re-work of the
switch-statement handling in Mesa.  The current structure makes it hard
to do a lot of things (e.g., jump-tables for uniform control flow).
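
The lowering in the patch quoted below can be illustrated outside the compiler.
What follows is only a minimal sketch in plain C, assuming a hypothetical switch
with the labels "case 1, default, case 2"; the function names and the exact use
of the fallthru/is_break flags are illustrative and are not Mesa's IR. The idea
it shows is the one from the patch: run_default starts out true and is cleared
by a check against every label that follows default.

```c
#include <assert.h>
#include <stdbool.h>

/* Reference behaviour: an ordinary switch with default in the middle. */
int reference_switch(int x)
{
    int r = 0;
    switch (x) {
    case 1:
        r += 1;
        break;
    default:
        r += 10;
        /* falls through */
    case 2:
        r += 100;
        break;
    }
    return r;
}

/* Hand-lowered form using boolean flags instead of jumps (illustrative). */
int lowered_switch(int x)
{
    int r = 0;
    bool fallthru = false;   /* true while case bodies should execute */
    bool is_break = false;   /* true once a break has been reached */
    bool run_default = true; /* assume default runs until proven otherwise */

    /* Checks against the labels that follow default, done up front. */
    if (x == 2)
        run_default = false;

    /* case 1: */
    if (x == 1)
        fallthru = true;
    if (fallthru && !is_break) {
        r += 1;
        is_break = true; /* break */
    }

    /* default: */
    if (run_default)
        fallthru = true;
    if (fallthru && !is_break)
        r += 10; /* no break here, so execution continues into case 2 */

    /* case 2: */
    if (x == 2)
        fallthru = true;
    if (fallthru && !is_break) {
        r += 100;
        is_break = true; /* break */
    }

    return r;
}

/* Returns true when both forms agree for every x in [lo, hi]. */
bool forms_agree(int lo, int hi)
{
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++) {
        if (lowered_switch(x) != reference_switch(x))
            return false;
    }
    return true;
}
```

Calling forms_agree(-4, 8) compares the two forms over a small range of
selector values, including ones that only default can handle.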

On 07/13/2014 11:45 PM, Tapani Pälli wrote:
> This fixes following tests in es3conform:
> 
>shaders.switch.default_not_last_dynamic_vertex
>shaders.switch.default_not_last_dynamic_fragment
> 
> and makes following tests in Piglit pass:
> 
>glsl-1.30/execution/switch/fs-default-notlast-fallthrough
>glsl-1.30/execution/switch/fs-default_notlast
> 
> No Piglit regressions.
> 
> v2: take away unnecessary ir_if, just use conditional assignment
> v3: use foreach_in_list instead of foreach_list
> 
> Signed-off-by: Tapani Pälli 
> Reviewed-by: Roland Scheidegger  (v2)
> ---
>  src/glsl/ast_to_hir.cpp   | 83 
> +--
>  src/glsl/glsl_parser_extras.h |  3 ++
>  2 files changed, 83 insertions(+), 3 deletions(-)
> 
> diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
> index 885bee5..0e0013e 100644
> --- a/src/glsl/ast_to_hir.cpp
> +++ b/src/glsl/ast_to_hir.cpp
> @@ -4513,6 +4513,12 @@ ast_switch_statement::hir(exec_list *instructions,
> instructions->push_tail(new(ctx) ir_assignment(deref_is_break_var,
>is_break_val));
>  
> +   state->switch_state.run_default =
> +  new(ctx) ir_variable(glsl_type::bool_type,
> + "run_default_tmp",
> + ir_var_temporary);
> +   instructions->push_tail(state->switch_state.run_default);
> +
> /* Cache test expression.
>  */
> test_to_hir(instructions, state);
> @@ -4567,8 +4573,71 @@ ir_rvalue *
>  ast_case_statement_list::hir(exec_list *instructions,
>   struct _mesa_glsl_parse_state *state)
>  {
> -   foreach_list_typed (ast_case_statement, case_stmt, link, & this->cases)
> -  case_stmt->hir(instructions, state);
> +   exec_list default_case, after_default, tmp;
> +
> +   foreach_list_typed (ast_case_statement, case_stmt, link, & this->cases) {
> +  case_stmt->hir(&tmp, state);
> +
> +  /* Default case. */
> +  if (state->switch_state.previous_default && default_case.is_empty()) {
> + default_case.append_list(&tmp);
> + continue;
> +  }
> +
> +  /* If default case found, append 'after_default' list. */
> +  if (!default_case.is_empty())
> + after_default.append_list(&tmp);
> +  else
> + instructions->append_list(&tmp);
> +   }
> +
> +   /* Handle the default case. This is done here because default might not be
> +* the last case. We need to add checks against following cases first to 
> see
> +* if default should be chosen or not.
> +*/
> +   if (!default_case.is_empty()) {
> +
> +  /* Default case was the last one, no checks required. */
> +  if (after_default.is_empty()) {
> + instructions->append_list(&default_case);
> + return NULL;
> +  }
> +
> +  ir_rvalue *const true_val = new (state) ir_constant(true);
> +  ir_dereference_variable *deref_run_default_var =
> + new(state) ir_dereference_variable(state->switch_state.run_default);
> +
> +  /* Choose to run default case initially, following conditional
> +   * assignments might change this.
> +   */
> +  ir_assignment *const init_var =
> + new(state) ir_assignment(deref_run_default_var, true_val);
> +  instructions->push_tail(init_var);
> +
> +  foreach_in_list(ir_instruction, ir, &after_default) {
> + ir_assignment *assign = ir->as_assignment();
> +
> + if (!assign)
> +continue;
> +
> + /* Clone the check between case label and init expression. */
> + ir_expression *exp = (ir_expression*) assign->condition;
> + ir_expression *clone = exp->clone(state, NULL);
> +
> + ir_dereference_variable *deref_var =
> +new(state) 
> ir_dereference_variable(state->switch_state.run_default);
> + ir_rvalue *const false_val = new (state) ir_constant(false);
> +
> + ir_assignment *const set_false =
> +new(state) ir_assignment(deref_var, false_val, clone);
> +
> + instructions->push_tail(set_false);
> +  }
> +
> +  /* Append default case and all cases after it. */
> +  instructions->append_list(&default_case);
> +  instructions->append_list(&after_default);
> +   }
>  
> /* Case statements do not have r-values. */
> return NULL;
> @@ -4728,9 +4797,17 @@ ast_case_label::hir(exec_list *instructions,
>}
>state->switch_state.previous_default = this;
>  
> +  /* Set fallthru condition on 'run_default' bool. */
> +  ir_dereference_variable *deref_run_default =
> + new(ctx) ir_dereference_variable(state->switch_state.run_default);
> +  ir_rvalue *const cond_true = new(ctx) ir_constant(true);
> +   

[Mesa-dev] [PATCH 1/2] clover: Call end_query before getting timestamp result

2014-07-16 Thread Tom Stellard
Also change the wait parameter from false to true.
---

I'm really not sure what is correct here, but this patch fixes event profiling 
on SI.

 src/gallium/state_trackers/clover/core/timestamp.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/core/timestamp.cpp b/src/gallium/state_trackers/clover/core/timestamp.cpp
index 481c4f9..a6edaf6 100644
--- a/src/gallium/state_trackers/clover/core/timestamp.cpp
+++ b/src/gallium/state_trackers/clover/core/timestamp.cpp
@@ -47,7 +47,8 @@ cl_ulong
 timestamp::query::operator()() const {
pipe_query_result result;
 
-   if (!q().pipe->get_query_result(q().pipe, _query, false, &result))
+   q().pipe->end_query(q().pipe, _query);
+   if (!q().pipe->get_query_result(q().pipe, _query, true, &result))
   throw error(CL_PROFILING_INFO_NOT_AVAILABLE);
 
return result.u64;
-- 
1.8.1.5



[Mesa-dev] [PATCH 2/2] clover: Use 1 as default value for CL_DEVICE_PROFILING_TIMER_RESOLUTION

2014-07-16 Thread Tom Stellard
We use PIPE_QUERY_TIMESTAMP for profiling events, and gallium specifies
that the timestamp be in nanoseconds.
---
 src/gallium/state_trackers/clover/api/device.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/state_trackers/clover/api/device.cpp b/src/gallium/state_trackers/clover/api/device.cpp
index 1176668..25d29f5 100644
--- a/src/gallium/state_trackers/clover/api/device.cpp
+++ b/src/gallium/state_trackers/clover/api/device.cpp
@@ -249,7 +249,9 @@ clGetDeviceInfo(cl_device_id d_dev, cl_device_info param,
   break;
 
case CL_DEVICE_PROFILING_TIMER_RESOLUTION:
-  buf.as_scalar() = 0;
+  // PIPE_QUERY_TIMESTAMP returns a timestamp in units of nanoseconds,
+  // so we default to 1 here.
+  buf.as_scalar() = 1;
   break;
 
case CL_DEVICE_ENDIAN_LITTLE:
-- 
1.8.1.5



Re: [Mesa-dev] [PATCH 1/2] clover: Call end_query before getting timestamp result

2014-07-16 Thread Niels Ole Salscheider
On Wednesday 16 July 2014, 16:49:08, Tom Stellard wrote:
> Also change the wait parameter from false to true.
> ---
> 
> I'm really not sure what is correct here, but this patch fixes event
> profiling on SI.

I think you should call end_query in the constructor right after the call to 
create_query. That is because you want the corresponding packet to be emitted 
as soon as the query is created and not when you are interested in the results 
(i.e. when the corresponding event has occurred).
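
To make the timing argument concrete, here is a small toy model in C. It is
only a sketch under stated assumptions: toy_queue, toy_query and the toy_*
functions are hypothetical stand-ins, not the gallium or clover API, and the
"clock" is just a counter that advances as work is submitted. The point it
illustrates is that the timestamp is latched when the end-of-query packet is
emitted, not when the result is read back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command queue whose clock advances as work is submitted. */
struct toy_queue {
    uint64_t now_ns;
};

/* Hypothetical timestamp query. */
struct toy_query {
    uint64_t stamp_ns;
    bool ended;
};

/* Submitting work moves the queue's clock forward. */
void toy_submit_work(struct toy_queue *q, uint64_t cost_ns)
{
    q->now_ns += cost_ns;
}

/* Emitting the end-of-query packet latches the current time. */
void toy_end_query(const struct toy_queue *q, struct toy_query *query)
{
    assert(!query->ended);
    query->stamp_ns = q->now_ns;
    query->ended = true;
}

/* Reading the result never changes the latched value. */
bool toy_get_query_result(const struct toy_query *query, uint64_t *result)
{
    if (!query->ended)
        return false;
    *result = query->stamp_ns;
    return true;
}
```

In this model a query ended at creation time reports the moment of creation,
while a query ended just before reading the result reports the (later) moment
of the read, which is not what event profiling wants.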

>  src/gallium/state_trackers/clover/core/timestamp.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/state_trackers/clover/core/timestamp.cpp
> b/src/gallium/state_trackers/clover/core/timestamp.cpp index
> 481c4f9..a6edaf6 100644
> --- a/src/gallium/state_trackers/clover/core/timestamp.cpp
> +++ b/src/gallium/state_trackers/clover/core/timestamp.cpp
> @@ -47,7 +47,8 @@ cl_ulong
>  timestamp::query::operator()() const {
> pipe_query_result result;
> 
> -   if (!q().pipe->get_query_result(q().pipe, _query, false, &result))
> +   q().pipe->end_query(q().pipe, _query);
> +   if (!q().pipe->get_query_result(q().pipe, _query, true, &result))
>throw error(CL_PROFILING_INFO_NOT_AVAILABLE);
> 
> return result.u64;
> --
> 1.8.1.5
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev



[Mesa-dev] [PATCH 3/5] r600g/compute: Defrag the pool if it's necesary

2014-07-16 Thread Bruno Jiménez
This patch adds a new member to the pool to track its status.
For now it is used only for the 'fragmented' status, but if
needed it could be used for more statuses.

The pool is considered fragmented if an item that isn't the
last one is freed or demoted.

This strategy has a drawback, although it shouldn't cause any bugs.
Suppose we have two items, A and B. If we free A first, the pool
gets the 'fragmented' status. If we then free B, the pool keeps
its 'fragmented' status even though it is no longer fragmented.
---
 src/gallium/drivers/r600/compute_memory_pool.c | 32 --
 src/gallium/drivers/r600/compute_memory_pool.h |  4 
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/r600/compute_memory_pool.c b/src/gallium/drivers/r600/compute_memory_pool.c
index 00b28bc..b158f5e 100644
--- a/src/gallium/drivers/r600/compute_memory_pool.c
+++ b/src/gallium/drivers/r600/compute_memory_pool.c
@@ -262,23 +262,10 @@ int compute_memory_finalize_pending(struct compute_memory_pool* pool,
unallocated += align(item->size_in_dw, ITEM_ALIGNMENT);
}
 
-   /* If we require more space than the size of the pool, then grow the
-* pool.
-*
-* XXX: I'm pretty sure this won't work.  Imagine this scenario:
-*
-* Offset Item Size
-*   0A50
-* 200B50
-* 400C50
-*
-* Total size = 450
-* Allocated size = 150
-* Pending Item D Size = 200
-*
-* In this case, there are 300 units of free space in the pool, but
-* they aren't contiguous, so it will be impossible to allocate Item D.
-*/
+   if (pool->status & POOL_FRAGMENTED) {
+   compute_memory_defrag(pool, pipe);
+   }
+
if (pool->size_in_dw < allocated + unallocated) {
err = compute_memory_grow_pool(pool, pipe, allocated + unallocated);
if (err == -1)
@@ -324,6 +311,8 @@ void compute_memory_defrag(struct compute_memory_pool *pool,
 
last_pos += align(item->size_in_dw, ITEM_ALIGNMENT);
}
+
+   pool->status &= ~POOL_FRAGMENTED;
 }
 
 int compute_memory_promote_item(struct compute_memory_pool *pool,
@@ -430,6 +419,10 @@ void compute_memory_demote_item(struct compute_memory_pool *pool,
 
/* Remember to mark the buffer as 'pending' by setting start_in_dw to -1 */
item->start_in_dw = -1;
+
+   if (item->link.next != pool->item_list) {
+   pool->status |= POOL_FRAGMENTED;
+   }
 }
 
 /**
@@ -533,6 +526,11 @@ void compute_memory_free(struct compute_memory_pool* pool, int64_t id)
LIST_FOR_EACH_ENTRY_SAFE(item, next, pool->item_list, link) {
 
if (item->id == id) {
+
+   if (item->link.next != pool->item_list) {
+   pool->status |= POOL_FRAGMENTED;
+   }
+
list_del(&item->link);
 
if (item->real_buffer) {
diff --git a/src/gallium/drivers/r600/compute_memory_pool.h b/src/gallium/drivers/r600/compute_memory_pool.h
index 5d18777..acc68ea 100644
--- a/src/gallium/drivers/r600/compute_memory_pool.h
+++ b/src/gallium/drivers/r600/compute_memory_pool.h
@@ -32,6 +32,8 @@
 #define ITEM_FOR_PROMOTING  (1<<2)
 #define ITEM_FOR_DEMOTING   (1<<3)
 
+#define POOL_FRAGMENTED (1<<0)
+
 struct compute_memory_pool;
 
 struct compute_memory_item
@@ -60,6 +62,8 @@ struct compute_memory_pool
 
uint32_t *shadow; ///host copy of the pool, used for defragmentation
 
+   uint32_t status;/**< Status of the pool */
+
struct list_head *item_list; ///Allocated memory chunks in the buffer,they must be ordered by "start_in_dw"
struct list_head *unallocated_list; ///Unallocated memory chunks
 };
-- 
2.0.1
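
The false positive described in the commit message is easy to reproduce in a
toy model. The following is a minimal sketch, assuming a hypothetical
array-backed pool (toy_pool and its helpers are not part of the r600g code);
only the POOL_FRAGMENTED bit and the "freed item is not the last one" rule are
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POOL_FRAGMENTED (1u << 0)
#define MAX_ITEMS 8 /* arbitrary capacity for this toy model */

/* Hypothetical pool: item ids kept in allocation order. */
struct toy_pool {
    int ids[MAX_ITEMS];
    int count;
    uint32_t status;
};

/* Appends an item at the end of the pool. */
void toy_pool_add(struct toy_pool *p, int id)
{
    assert(p->count < MAX_ITEMS);
    p->ids[p->count++] = id;
}

/* Frees an item; returns 0 on success, -1 if the id is unknown. */
int toy_pool_free(struct toy_pool *p, int id)
{
    for (int i = 0; i < p->count; i++) {
        if (p->ids[i] != id)
            continue;

        /* Freeing anything but the last item may leave a hole. */
        if (i != p->count - 1)
            p->status |= POOL_FRAGMENTED;

        /* Close the gap in the bookkeeping array (not in the "memory"). */
        memmove(&p->ids[i], &p->ids[i + 1],
                (size_t)(p->count - i - 1) * sizeof(p->ids[0]));
        p->count--;
        return 0;
    }
    return -1;
}

/* Defragmenting clears the status bit again. */
void toy_pool_defrag(struct toy_pool *p)
{
    p->status &= ~POOL_FRAGMENTED;
}
```

Adding A and B, freeing A and then B leaves an empty pool that still carries
POOL_FRAGMENTED. The only cost is one unnecessary defragmentation pass, which
has nothing to move.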



[Mesa-dev] [PATCH 2/5] r600g/compute: Add a function for defragmenting the pool

2014-07-16 Thread Bruno Jiménez
This new function will move items forward in the pool, so that
there's no gap between them, effectively defragmenting the pool.

For now this function is a bit dumb as it just moves items
forward without trying to see if other items in the pool could
fit in the gaps.
---
 src/gallium/drivers/r600/compute_memory_pool.c | 25 +
 src/gallium/drivers/r600/compute_memory_pool.h |  3 +++
 2 files changed, 28 insertions(+)

diff --git a/src/gallium/drivers/r600/compute_memory_pool.c b/src/gallium/drivers/r600/compute_memory_pool.c
index 0b41318..00b28bc 100644
--- a/src/gallium/drivers/r600/compute_memory_pool.c
+++ b/src/gallium/drivers/r600/compute_memory_pool.c
@@ -302,6 +302,30 @@ int compute_memory_finalize_pending(struct compute_memory_pool* pool,
return 0;
 }
 
+/**
+ * Defragments the pool, so that there's no gap between items.
+ * \param pool The pool to be defragmented
+ */
+void compute_memory_defrag(struct compute_memory_pool *pool,
+   struct pipe_context *pipe)
+{
+   struct compute_memory_item *item;
+   int64_t last_pos;
+
+   COMPUTE_DBG(pool->screen, "* compute_memory_defrag()\n");
+
+   last_pos = 0;
+   LIST_FOR_EACH_ENTRY(item, pool->item_list, link) {
+   if (item->start_in_dw != last_pos) {
+   assert(last_pos < item->start_in_dw);
+
+   compute_memory_move_item(pool, item, last_pos, pipe);
+   }
+
+   last_pos += align(item->size_in_dw, ITEM_ALIGNMENT);
+   }
+}
+
 int compute_memory_promote_item(struct compute_memory_pool *pool,
struct compute_memory_item *item, struct pipe_context *pipe,
int64_t allocated)
@@ -417,6 +441,7 @@ void compute_memory_demote_item(struct compute_memory_pool *pool,
  *
  * \param item The item that will be moved
  * \param new_start_in_dw  The new position of the item in \a item_list
+ * \see compute_memory_defrag
  */
 void compute_memory_move_item(struct compute_memory_pool *pool,
struct compute_memory_item *item, uint64_t new_start_in_dw,
diff --git a/src/gallium/drivers/r600/compute_memory_pool.h b/src/gallium/drivers/r600/compute_memory_pool.h
index 7332010..5d18777 100644
--- a/src/gallium/drivers/r600/compute_memory_pool.h
+++ b/src/gallium/drivers/r600/compute_memory_pool.h
@@ -86,6 +86,9 @@ void compute_memory_shadow(struct compute_memory_pool* pool,
 int compute_memory_finalize_pending(struct compute_memory_pool* pool,
struct pipe_context * pipe);
 
+void compute_memory_defrag(struct compute_memory_pool *pool,
+   struct pipe_context *pipe);
+
 int compute_memory_promote_item(struct compute_memory_pool *pool,
struct compute_memory_item *item, struct pipe_context *pipe,
int64_t allocated);
-- 
2.0.1
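
The compaction loop above can be tried out on a plain array. This is a minimal
sketch under assumptions: toy_item and toy_defrag are hypothetical, the value
of ITEM_ALIGNMENT is assumed to be 1024 for the example, and the data copy done
by compute_memory_move_item is replaced by simply updating the offset.

```c
#include <assert.h>
#include <stdint.h>

#define ITEM_ALIGNMENT 1024 /* assumed value, for illustration only */

/* Hypothetical item: only the fields the loop needs. */
struct toy_item {
    int64_t start_in_dw;
    int64_t size_in_dw;
};

/* Rounds v up to the next multiple of a. */
static int64_t align_up(int64_t v, int64_t a)
{
    return (v + a - 1) / a * a;
}

/*
 * Packs items (sorted by start_in_dw) towards offset 0 so that no gaps
 * remain between them. Returns the number of items that had to move.
 */
int toy_defrag(struct toy_item *items, int n)
{
    int64_t last_pos = 0;
    int moved = 0;

    for (int i = 0; i < n; i++) {
        if (items[i].start_in_dw != last_pos) {
            /* Items only ever move towards the start of the pool. */
            assert(last_pos < items[i].start_in_dw);
            items[i].start_in_dw = last_pos;
            moved++;
        }
        last_pos += align_up(items[i].size_in_dw, ITEM_ALIGNMENT);
    }
    return moved;
}
```

With three 50-dword items at offsets 0, 2048 and 4096, the loop leaves the
first item alone and moves the other two to 1024 and 2048. A second call moves
nothing.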



[Mesa-dev] [PATCH 5/5] r600g/compute: Remove unneeded code from compute_memory_promote_item

2014-07-16 Thread Bruno Jiménez
Now that we know that the pool is defragmented, we know for certain
that allocated + unallocated will be the total size of the
current pool plus all the items that will be promoted. So we only
need to grow the pool once.

This allows us to just add the new items to the end of the
item_list without having to look for a place for each new item.
---
 src/gallium/drivers/r600/compute_memory_pool.c | 46 ++
 src/gallium/drivers/r600/compute_memory_pool.h |  2 +-
 2 files changed, 12 insertions(+), 36 deletions(-)

diff --git a/src/gallium/drivers/r600/compute_memory_pool.c b/src/gallium/drivers/r600/compute_memory_pool.c
index 75a8bd3..04aaac9 100644
--- a/src/gallium/drivers/r600/compute_memory_pool.c
+++ b/src/gallium/drivers/r600/compute_memory_pool.c
@@ -239,6 +239,7 @@ int compute_memory_finalize_pending(struct compute_memory_pool* pool,
 
int64_t allocated = 0;
int64_t unallocated = 0;
+   int64_t last_pos;
 
int err = 0;
 
@@ -276,14 +277,18 @@ int compute_memory_finalize_pending(struct compute_memory_pool* pool,
return -1;
}
 
+   /* After defragmenting the pool, allocated is equal to the first available
+* position for new items in the pool */
+   last_pos = allocated;
+
/* Loop through all the unallocated items, check if they are marked
 * for promoting, allocate space for them and add them to the item_list. */
LIST_FOR_EACH_ENTRY_SAFE(item, next, pool->unallocated_list, link) {
if (item->status & ITEM_FOR_PROMOTING) {
-   err = compute_memory_promote_item(pool, item, pipe, allocated);
-   item->status ^= ITEM_FOR_PROMOTING;
+   err = compute_memory_promote_item(pool, item, pipe, last_pos);
+   item->status &= ~ITEM_FOR_PROMOTING;
 
-   allocated += align(item->size_in_dw, ITEM_ALIGNMENT);
+   last_pos += align(item->size_in_dw, ITEM_ALIGNMENT);
 
if (err == -1)
return -1;
@@ -321,42 +326,14 @@ void compute_memory_defrag(struct compute_memory_pool *pool,
 
 int compute_memory_promote_item(struct compute_memory_pool *pool,
struct compute_memory_item *item, struct pipe_context *pipe,
-   int64_t allocated)
+   int64_t start_in_dw)
 {
struct pipe_screen *screen = (struct pipe_screen *)pool->screen;
struct r600_context *rctx = (struct r600_context *)pipe;
struct pipe_resource *src = (struct pipe_resource *)item->real_buffer;
-   struct pipe_resource *dst = NULL;
+   struct pipe_resource *dst = (struct pipe_resource *)pool->bo;
struct pipe_box box;
 
-   struct list_head *pos;
-   int64_t start_in_dw;
-   int err = 0;
-
-
-   /* Search for free space in the pool for this item. */
-   while ((start_in_dw=compute_memory_prealloc_chunk(pool,
-   item->size_in_dw)) == -1) {
-   int64_t need = item->size_in_dw + 2048 -
-   (pool->size_in_dw - allocated);
-
-   if (need <= 0) {
-   /* There's enough free space, but it's too
-* fragmented. Assume half of the item can fit
-* int the last chunk */
-   need = (item->size_in_dw / 2) + ITEM_ALIGNMENT;
-   }
-
-   need = align(need, ITEM_ALIGNMENT);
-
-   err = compute_memory_grow_pool(pool,
-   pipe,
-   pool->size_in_dw + need);
-
-   if (err == -1)
-   return -1;
-   }
-   dst = (struct pipe_resource *)pool->bo;
COMPUTE_DBG(pool->screen, "  + Found space for Item %p id = %u "
"start_in_dw = %u (%u bytes) size_in_dw = %u (%u 
bytes)\n",
item, item->id, start_in_dw, start_in_dw * 4,
@@ -366,8 +343,7 @@ int compute_memory_promote_item(struct compute_memory_pool *pool,
list_del(&item->link);
 
/* Add it back to the item_list */
-   pos = compute_memory_postalloc_chunk(pool, start_in_dw);
-   list_add(&item->link, pos);
+   list_addtail(&item->link, pool->item_list);
item->start_in_dw = start_in_dw;
 
if (src != NULL) {
diff --git a/src/gallium/drivers/r600/compute_memory_pool.h b/src/gallium/drivers/r600/compute_memory_pool.h
index acc68ea..5a1b33b 100644
--- a/src/gallium/drivers/r600/compute_memory_pool.h
+++ b/src/gallium/drivers/r600/compute_memory_pool.h
@@ -95,7 +95,7 @@ void compute_memory_defrag(struct compute_memory_pool *pool,
 
 int compute_memory_promote_item(struct compute_memory_pool *pool,
struct compute_memory_item *item, struct pipe_context *pipe,
-   int64_t allocated);
+   int64_t start_in_dw);
 

[Mesa-dev] [PATCH 0/5] [RFC] r600g/compute: Adding support for defragmenting compute_memory_pool

2014-07-16 Thread Bruno Jiménez
Hi,

This series finally adds support for defragmenting the pool for
OpenCL buffers in the r600g driver. It is mostly a rewrite of
the series that I wrote some months ago.

For defragmenting the pool I have thought of two different
possibilities:

- Creating a new pool and moving every item there into the correct
  position. This has the advantage of being very simple to
  implement and of allowing the pool to be grown at the
  same time. But it has a couple of problems, namely that it
  has a high peak memory usage (sum of current pool + new pool)
  and that, even when the pool is not very fragmented, you
  have to copy every item to its new place.
- Using the same pool by moving the items within it. This has the
  advantage of using less memory (sum of current pool + biggest
  item in it) and of making it easier to handle the case where
  only a few elements are out of place. The disadvantages
  are that it doesn't allow growing the pool at the same time
  and that it may involve twice the number of item copies in
  the worst case.
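
The peak memory figures of the two options can be written down as a small
calculation. This is just a sketch of the arithmetic stated in the list above;
the function names are hypothetical and sizes are in dwords.

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: the old and the new pool exist side by side during the copy. */
int64_t peak_with_new_pool(int64_t old_pool_dw, int64_t new_pool_dw)
{
    assert(old_pool_dw >= 0 && new_pool_dw >= 0);
    return old_pool_dw + new_pool_dw;
}

/*
 * Option 2: items move inside the same pool; at worst one temporary
 * buffer as large as the biggest item is alive at any time.
 */
int64_t peak_in_place(int64_t pool_dw, const int64_t *item_sizes_dw, int n)
{
    int64_t biggest = 0;

    assert(pool_dw >= 0);
    for (int i = 0; i < n; i++) {
        if (item_sizes_dw[i] > biggest)
            biggest = item_sizes_dw[i];
    }
    return pool_dw + biggest;
}
```

For a 4096-dword pool holding items of 512, 1024 and 256 dwords, the in-place
approach peaks at 5120 dwords, while copying into a new pool of the same size
peaks at 8192.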

I have chosen to implement the second option, but if you think
that the first one is better I can rewrite the series for it.
(^_^)

The worst case I have mentioned is this: Imagine that you have
a series of items in which the first is, at least, 1 'unit'
smaller than the rest. You now free this item and create a new
one with the same size [why would anyone do this? I don't know]
For now, the defragmenter code is so dumb that it will move
every item to the front of the pool without trying first to
put this new item in the available space.

Hopefully situations like this won't be very common.

If you want me to explain any detail about any of the patches
just ask. And as said, if you prefer the first version of the
defragmenter, just ask. [In fact, after having written this,
I may add it for the case grow+defrag]

Also, no regressions found in piglit.

Thanks in advance!
Bruno

Bruno Jiménez (5):
  r600g/compute: Add a function for moving items in the pool
  r600g/compute: Add a function for defragmenting the pool
  r600g/compute: Defrag the pool if it's necesary
  r600g/compute: Quick exit if there's nothing to add to the pool
  r600g/compute: Remove unneeded code from compute_memory_promote_item

 src/gallium/drivers/r600/compute_memory_pool.c | 196 ++---
 src/gallium/drivers/r600/compute_memory_pool.h |  13 +-
 2 files changed, 156 insertions(+), 53 deletions(-)

-- 
2.0.1



[Mesa-dev] [PATCH 4/5] r600g/compute: Quick exit if there's nothing to add to the pool

2014-07-16 Thread Bruno Jiménez
This way we can avoid defragmenting the pool, even if it is
fragmented, and avoid looping again through the list of unallocated
items.
---
 src/gallium/drivers/r600/compute_memory_pool.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/drivers/r600/compute_memory_pool.c b/src/gallium/drivers/r600/compute_memory_pool.c
index b158f5e..75a8bd3 100644
--- a/src/gallium/drivers/r600/compute_memory_pool.c
+++ b/src/gallium/drivers/r600/compute_memory_pool.c
@@ -262,6 +262,10 @@ int compute_memory_finalize_pending(struct compute_memory_pool* pool,
unallocated += align(item->size_in_dw, ITEM_ALIGNMENT);
}
 
+   if (unallocated == 0) {
+   return 0;
+   }
+
if (pool->status & POOL_FRAGMENTED) {
compute_memory_defrag(pool, pipe);
}
-- 
2.0.1




[Mesa-dev] [PATCH 1/5] r600g/compute: Add a function for moving items in the pool

2014-07-16 Thread Bruno Jiménez
This function will be used in the future by compute_memory_defrag
to move items forward in the pool.

It does so by first checking for overlapping ranges. If the ranges
don't overlap, it copies the contents directly. If they overlap,
it first tries to use a temporary buffer; if that buffer fails
to allocate, it finally falls back to a mapping.

Note that items only ever need to be moved forward, so the function
only checks for overlapping ranges in that case. If needed, support
for the other direction can easily be added by changing the first if.
---
 src/gallium/drivers/r600/compute_memory_pool.c | 89 ++
 src/gallium/drivers/r600/compute_memory_pool.h |  4 ++
 2 files changed, 93 insertions(+)

diff --git a/src/gallium/drivers/r600/compute_memory_pool.c b/src/gallium/drivers/r600/compute_memory_pool.c
index fe19d9e..0b41318 100644
--- a/src/gallium/drivers/r600/compute_memory_pool.c
+++ b/src/gallium/drivers/r600/compute_memory_pool.c
@@ -408,6 +408,95 @@ void compute_memory_demote_item(struct compute_memory_pool *pool,
item->start_in_dw = -1;
 }
 
+/**
+ * Moves the item \a item forward in the pool to \a new_start_in_dw
+ *
+ * This function assumes two things:
+ * 1) The item is \b only moved forward
+ * 2) The item \b won't change it's position inside the \a item_list
+ *
+ * \param item The item that will be moved
+ * \param new_start_in_dw  The new position of the item in \a item_list
+ */
+void compute_memory_move_item(struct compute_memory_pool *pool,
+   struct compute_memory_item *item, uint64_t new_start_in_dw,
+   struct pipe_context *pipe)
+{
+   struct pipe_screen *screen = (struct pipe_screen *)pool->screen;
+   struct r600_context *rctx = (struct r600_context *)pipe;
+   struct pipe_resource *src = (struct pipe_resource *)pool->bo;
+   struct pipe_resource *dst;
+   struct pipe_box box;
+
+   struct compute_memory_item *prev;
+
+   COMPUTE_DBG(pool->screen, "* compute_memory_move_item()\n"
+   "  + Moving item %i from %u (%u bytes) to %u (%u 
bytes)\n",
+   item->id, item->start_in_dw, item->start_in_dw * 4,
+   new_start_in_dw, new_start_in_dw * 4);
+
+   if (pool->item_list != item->link.prev) {
+   prev = container_of(item->link.prev, item, link);
+   assert(prev->start_in_dw + prev->size_in_dw <= new_start_in_dw);
+   }
+
+   u_box_1d(item->start_in_dw * 4, item->size_in_dw * 4, &box);
+
+   /* If the ranges don't overlap, we can just copy the item directly */
+   if (new_start_in_dw + item->size_in_dw <= item->start_in_dw) {
+   dst = (struct pipe_resource *)pool->bo;
+
+   rctx->b.b.resource_copy_region(pipe,
+   dst, 0, new_start_in_dw * 4, 0, 0,
+   src, 0, &box);
+   } else {
+   /* The ranges overlap, we will try first to use an intermediate
+* resource to move the item */
+   dst = (struct pipe_resource *)r600_compute_buffer_alloc_vram(
+   pool->screen, item->size_in_dw * 4);
+
+   if (dst != NULL) {
+   rctx->b.b.resource_copy_region(pipe,
+   dst, 0, 0, 0, 0,
+   src, 0, &box);
+
+   src = dst;
+   dst = (struct pipe_resource *)pool->bo;
+
+   box.x = 0;
+
+   rctx->b.b.resource_copy_region(pipe,
+   dst, 0, new_start_in_dw * 4, 0, 0,
+   src, 0, &box);
+
+   pool->screen->b.b.resource_destroy(screen, src);
+
+   } else {
+   /* The allocation of the temporary resource failed,
+* falling back to use mappings */
+   uint32_t *map;
+   int64_t offset;
+   struct pipe_transfer *trans;
+
+   offset = item->start_in_dw - new_start_in_dw;
+
+   u_box_1d(new_start_in_dw * 4, (offset + item->size_in_dw) * 4, &box);
+
+   map = pipe->transfer_map(pipe, src, 0, PIPE_TRANSFER_READ_WRITE,
+   &box, &trans);
+
+   assert(map);
+   assert(trans);
+
+   memmove(map, map + offset, item->size_in_dw * 4);
+
+   pipe->transfer_unmap(pipe, trans);
+   }
+   }
+
+   item->start_in_dw = new_start_in_dw;
+}
+
 void compute_memory_free(struct compute_memory_pool* pool, int64_t id)
 {
struct compute_memory_item *item, *next;
diff --git a/src/gallium/drivers/r600/compute_memory_pool.h b/src/gallium/drivers/r600/compute_memory_pool.h
index 259474a..7332010 100644
--- a/src/gallium/drivers/r600/compute_memory_pool.h
+++ b/src/gallium/drivers/r600/compute_memory_
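
The overlap test used by the patch can be shown on an ordinary array. This is
a minimal sketch, assuming the pool is a flat host array of dwords:
toy_move_item is hypothetical, memcpy stands in for the direct
resource_copy_region path, and memmove stands in for both overlap-safe
fallbacks (temporary buffer and mapping).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Moves size_in_dw dwords from start_in_dw to new_start_in_dw, where the
 * new position is never past the old one. Returns 1 if the ranges did not
 * overlap and a direct copy was used, 0 if the overlap-safe path was taken.
 */
int toy_move_item(uint32_t *pool, int64_t start_in_dw, int64_t size_in_dw,
                  int64_t new_start_in_dw)
{
    assert(new_start_in_dw >= 0 && size_in_dw >= 0);
    assert(new_start_in_dw <= start_in_dw);

    /* Same condition as in the patch: destination ends before source starts. */
    if (new_start_in_dw + size_in_dw <= start_in_dw) {
        memcpy(pool + new_start_in_dw, pool + start_in_dw,
               (size_t)size_in_dw * sizeof(*pool));
        return 1;
    }

    /* Ranges overlap: memmove handles that correctly, memcpy would not. */
    memmove(pool + new_start_in_dw, pool + start_in_dw,
            (size_t)size_in_dw * sizeof(*pool));
    return 0;
}
```

Moving a 3-dword item from offset 4 to offset 0 takes the direct path; moving
a 4-dword item from offset 2 to offset 1 overlaps and takes the safe path.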

[Mesa-dev] [PATCH 1/2] clover: Call end_query before getting timestamp result v2

2014-07-16 Thread Tom Stellard
v2:
  - Move the end_query() call into the timestamp constructor.
  - Still pass false as the wait parameter to get_query_result().
---
 src/gallium/state_trackers/clover/core/timestamp.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/state_trackers/clover/core/timestamp.cpp b/src/gallium/state_trackers/clover/core/timestamp.cpp
index 481c4f9..3fd341f 100644
--- a/src/gallium/state_trackers/clover/core/timestamp.cpp
+++ b/src/gallium/state_trackers/clover/core/timestamp.cpp
@@ -30,6 +30,7 @@ using namespace clover;
 timestamp::query::query(command_queue &q) :
q(q),
_query(q.pipe->create_query(q.pipe, PIPE_QUERY_TIMESTAMP, 0)) {
+   q.pipe->end_query(q.pipe, _query);
 }
 
 timestamp::query::query(query &&other) :
-- 
1.8.1.5



Re: [Mesa-dev] [PATCH 1/2] clover: Call end_query before getting timestamp result v2

2014-07-16 Thread Niels Ole Salscheider
Reviewed-by: Niels Ole Salscheider 

On Wednesday 16 July 2014, 17:37:48, Tom Stellard wrote:
> v2:
>   - Move the end_query() call into the timestamp constructor.
>   - Still pass false as the wait parameter to get_query_result().
> ---
>  src/gallium/state_trackers/clover/core/timestamp.cpp | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/gallium/state_trackers/clover/core/timestamp.cpp
> b/src/gallium/state_trackers/clover/core/timestamp.cpp index
> 481c4f9..3fd341f 100644
> --- a/src/gallium/state_trackers/clover/core/timestamp.cpp
> +++ b/src/gallium/state_trackers/clover/core/timestamp.cpp
> @@ -30,6 +30,7 @@ using namespace clover;
>  timestamp::query::query(command_queue &q) :
> q(q),
> _query(q.pipe->create_query(q.pipe, PIPE_QUERY_TIMESTAMP, 0)) {
> +   q.pipe->end_query(q.pipe, _query);
>  }
> 
>  timestamp::query::query(query &&other) :
> --
> 1.8.1.5



[Mesa-dev] [PATCH 00/10] [RFC] Probably useless algebraic optimizations

2014-07-16 Thread Thomas Helland
>Some of these optimizations are pretty obscure and I can't imagine a
>real shader hitting them.
>

Some of these optimizations are indeed obscure.
The only reason I can think of for any of these
to succeed is if some other optimization
has pushed constants up the tree, or eliminated things.

>What's the cost of checking for these cases?  I don't know how expensive
>the equals() methods are.
>

The time it takes to run a shader-db run on my machine
has remained pretty constant when adding these.
(Within 140 +/- 5 seconds).

>Do we want to litter the optimizer with cases that may never be used in
>practice?
>

Obviously we don't want to fill our optimizer with junk.
If nobody sees these as useful, then let's not waste more
time on them, and get on with more important things.

(This mail will probably not get threaded properly,
 shitty gmail web interface)

-Thomas


[Mesa-dev] [PATCH] i965: Fix z_offset computation in intel_miptree_unmap_depthstencil()

2014-07-16 Thread Anuj Phogat
The bug is triggered by using glTexSubImage2D() with GL_DEPTH_STENCIL
as the base internal format and non-zero x, y offsets. Currently the x, y
offsets are ignored while updating the texture image.

Fixes Khronos GLES3 CTS tests:
npot_tex_sub_image_2d
npot_tex_sub_image_3d
npot_pbo_tex_sub_image_2d
npot_pbo_tex_sub_image_2d

Cc: 
Signed-off-by: Anuj Phogat 
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 2ab0faa..b36ffc7 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -2129,9 +2129,9 @@ intel_miptree_unmap_depthstencil(struct brw_context *brw,
 x + s_image_x + map->x,
 y + s_image_y + map->y,
 brw->has_swizzling);
-   ptrdiff_t z_offset = ((y + z_image_y) *
+   ptrdiff_t z_offset = ((y + z_image_y + map->y) *
   (z_mt->pitch / 4) +
- (x + z_image_x));
+ (x + z_image_x + map->x));
 
if (map_z32f_x24s8) {
   z_map[z_offset] = packed_map[(y * map->w + x) * 2 + 0];
-- 
1.9.3
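
For readers following the arithmetic, here is a minimal sketch of the corrected index computation. The struct and function names are hypothetical stand-ins, not the i965 types; the only thing taken from the patch is the formula, which adds the map origin (map->x, map->y) to the per-pixel coordinates before scaling by the row pitch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the mapped sub-rectangle of an image. */
struct map_rect {
   int x, y;   /* origin of the mapped region inside the image */
   int w, h;   /* size of the mapped region */
};

/* Index, in 32-bit elements, of pixel (x, y) of the mapped region inside
 * a depth buffer whose rows are pitch_bytes apart. image_x/image_y locate
 * the image inside the buffer.
 */
static ptrdiff_t
z_offset(const struct map_rect *map, int image_x, int image_y,
         int pitch_bytes, int x, int y)
{
   assert(pitch_bytes % 4 == 0);
   return (ptrdiff_t)(y + image_y + map->y) * (pitch_bytes / 4) +
          (x + image_x + map->x);
}
```

With a 64-byte pitch (16 elements per row) and a map origin of (3, 2), the first pixel of the mapped region lands at element 2 * 16 + 3 = 35 rather than 0, which is the difference the old code missed.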



[Mesa-dev] [PATCH] Revert "i965: Extend compute-to-mrf pass to understand blocks of MOVs"

2014-07-16 Thread Anuj Phogat
This reverts commit bbefb15e01e1c16af69646898918982ae00f8c92.
Fixes the 11 regressions it caused in the framebuffer_blit tests of
the Khronos GLES3 CTS.

The original patch reduced the instruction count but had no performance
benefit, so it should be safe to revert without causing any performance
regressions.

Signed-off-by: Anuj Phogat 
Acked-by: Kristian Høgsberg 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 63 ++--
 1 file changed, 10 insertions(+), 53 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index a3ad375..ccd9ac1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2039,8 +2039,7 @@ bool
 fs_visitor::compute_to_mrf()
 {
bool progress = false;
-   int next_ip = 0, block_size = 0, step = dispatch_width / 8;
-   fs_inst *block_start = NULL, *block_end = NULL;
+   int next_ip = 0;
 
calculate_live_intervals();
 
@@ -2054,27 +2053,8 @@ fs_visitor::compute_to_mrf()
  inst->dst.type != inst->src[0].type ||
  inst->src[0].abs || inst->src[0].negate ||
   !inst->src[0].is_contiguous() ||
-  inst->src[0].subreg_offset) {
- block_start = NULL;
+  inst->src[0].subreg_offset)
 continue;
-  }
-
-  /* We're trying to identify a block of GRF-to-MRF MOVs for the purpose
-   * of rewriting the send that assigned the GRFs to just return in the
-   * MRFs directly.  send can't saturate, so if any of the MOVs do that,
-   * cancel the block.
-   */
-  if (inst->saturate) {
- block_start = NULL;
-  } else if (block_start && inst->dst.reg == block_end->dst.reg + step &&
- inst->src[0].reg == block_end->src[0].reg &&
- inst->src[0].reg_offset == block_end->src[0].reg_offset + 1) {
- block_size++;
- block_end = inst;
-  } else if (inst->src[0].reg_offset == 0) {
- block_size = 1;
- block_start = block_end = inst;
-  }
 
   /* Work out which hardware MRF registers are written by this
* instruction.
@@ -2117,8 +2097,14 @@ fs_visitor::compute_to_mrf()
if (scan_inst->is_partial_write())
   break;
 
-   /* SEND instructions can't have MRF as a destination before Gen7. */
-   if (brw->gen < 7 && scan_inst->mlen)
+/* Things returning more than one register would need us to
+ * understand coalescing out more than one MOV at a time.
+ */
+if (scan_inst->regs_written > 1)
+   break;
+
+   /* SEND instructions can't have MRF as a destination. */
+   if (scan_inst->mlen)
   break;
 
if (brw->gen == 6) {
@@ -2130,35 +2116,6 @@ fs_visitor::compute_to_mrf()
   }
}
 
-/* We have a contiguous block of mov to MRF that aligns with the
- * return registers of a send instruction.  Modify the send
- * instruction to just return in the MRFs.
- */
-if (scan_inst->mlen > 0 &&
-scan_inst->regs_written == block_size && block_size > 1) {
-   int i = 0;
-
-   scan_inst->dst.file = MRF;
-   scan_inst->dst.reg = block_start->dst.reg;
-   assert(!block_start->saturate);
-
-   for (fs_inst *next, *mov = block_start;
-i < block_size;
-mov = next, i++) {
-  next = (fs_inst *) mov->next;
-  mov->remove();
-   }
-
-   progress = true;
-   break;
-}
-
-/* If the block size we've tracked doesn't match the regs_written
- * of the instruction, we can't do anything.
- */
-if (scan_inst->regs_written > 1)
-   break;
-
if (scan_inst->dst.reg_offset == inst->src[0].reg_offset) {
   /* Found the creator of our MRF's source value. */
   scan_inst->dst.file = MRF;
-- 
1.9.3



Re: [Mesa-dev] Backends and support for pow-instructions

2014-07-16 Thread Thomas Helland
2014-07-13 20:13 GMT+02:00 Matt Turner :
>
> On Sun, Jul 13, 2014 at 10:50 AM, Thomas Helland
>  wrote:
> > I've considered writing an algebraic optimization to convert
> > this into an ir_binop_pow. If my understanding is correct the backend
> > will then implement this in a similar fashion as above if it does not
> > have a native pow() instruction.
> >
> > If, on the other hand, we have a pow() instruction, my guess is
> > we'd see reduced instruction-counts.
> >
> > Is my understanding correct? Is this something that's worth doing?
>
> Yes and yes :)
>
> It's something I've thought about doing for a while. The only hang-up
> is that we don't get nice expression trees to match in opt_algebraic.
> Ideally, we'd get an ir_instruction with an rvalue that looked like
>
> (assign (xyz) (var_ref r3) (expression vec3 exp2 (expression vec3 *
> (expression vec3 log2 (swiz xyz (var_ref r3))) (constant vec3
> (2.20 2.20 2.20)))))
>
> and then the bit of code in opt_algebraic is simple. Unfortunately, r3
> is likely a vec4 and is used repeatedly throughout the shader for many
> unrelated things. If we were able to split up these variables (i.e.,
> recognize that the use of r3 for log2/mul/exp2 is a distinct live
> range from the other uses of r3, and give it a new variable name) then
> tree grafting would be able to give us the expression tree that we
> want.
>

So a UD-chain would probably help, together with a pass that
creates a new variable for each new definition?
As far as I have been able to tell from the code base, the
GLSL compiler does not have such a feature yet?
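
To make the "simple bit of code" concrete, here is a rough sketch of the match on a toy expression tree. The IR below is invented for illustration and is not Mesa's ir_expression; it only shows the shape of the rewrite exp2(log2(x) * y) -> pow(x, y), which holds for x > 0.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression IR; names are illustrative, not Mesa's. */
enum op { OP_VAR, OP_CONST, OP_MUL, OP_LOG2, OP_EXP2, OP_POW };

struct expr {
   enum op op;
   struct expr *src[2];   /* operands; unused slots are NULL */
   float value;           /* only meaningful for OP_CONST */
};

/* Rewrite exp2(mul(log2(x), y)) or exp2(mul(y, log2(x))) into pow(x, y),
 * in place. Returns 1 if the tree was changed, 0 otherwise.
 */
static int
try_fold_pow(struct expr *e)
{
   assert(e != NULL);
   if (e->op != OP_EXP2 || e->src[0] == NULL || e->src[0]->op != OP_MUL)
      return 0;

   struct expr *mul = e->src[0];
   for (int i = 0; i < 2; i++) {
      struct expr *lg = mul->src[i];
      if (lg != NULL && lg->op == OP_LOG2) {
         e->op = OP_POW;
         e->src[0] = lg->src[0];        /* base x */
         e->src[1] = mul->src[1 - i];   /* exponent y */
         return 1;
      }
   }
   return 0;
}
```

The match only works once the whole chain sits in one tree, which is why the variable-splitting and tree-grafting question comes first.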

> That would let a lot of existing optimization passes perform better as well.
>
> Ken and I worked on this kind of pass in the i965 backend [0]. It
> looked for full register writes outside of control flow, assigned the
> result to a new register, and rewrote future uses of the old with the
> new register. Something like that at the GLSL IR level would do the
> trick. One problem to solve is how to handle partial writes of
> variables, since in the case you brought up the shader only uses 3
> components of a vec4, but they're still a distinct live range.
>

I guess we would need to keep track of the uses and defs for
each component in the vector, some kind of fancy UD-chain
that works component-wise, and also globally on the vector.
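
As a rough sketch of what component-wise tracking could look like, the following numbers the definitions of each component of a single vec4 in straight-line code. It is a simplification under stated assumptions: no control flow, one variable, and invented type and function names. A real pass would then give each numbered definition its own variable.

```c
#include <assert.h>

#define NUM_COMPONENTS 4

/* One straight-line instruction touching a single vec4 variable.
 * Bit i of each mask refers to component i (x=1, y=2, z=4, w=8).
 */
struct access {
   unsigned read_mask;
   unsigned write_mask;
};

/* For each instruction and component, record which definition of that
 * component is read: -1 if the component is not read, 0 if it is read
 * before any write in this block, n for the n-th write. Within one
 * instruction, reads are considered to happen before writes.
 */
static void
number_component_defs(const struct access *insts, int count,
                      int reads_def[][NUM_COMPONENTS])
{
   int current_def[NUM_COMPONENTS] = { 0, 0, 0, 0 };

   assert(count >= 0);
   for (int i = 0; i < count; i++) {
      for (int c = 0; c < NUM_COMPONENTS; c++) {
         unsigned bit = 1u << c;
         reads_def[i][c] = (insts[i].read_mask & bit) ? current_def[c] : -1;
         if (insts[i].write_mask & bit)
            current_def[c]++;
      }
   }
}
```

For a sequence like "r3.xyz = a; r3.xyz = f(r3.xyz); r3.w = b; use r3", the xyz components end up with two definitions and w with one, so the xyz chain can be renamed without touching w.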

I accidentally stumbled across some work in Eric's git repo that
looks pretty useful as a basis for how to go about this. [1]
It seems to implement a live-variable analysis that is both
control-flow and swizzle-aware, and works component-wise.
I have only glanced at it, but it seems promising.

> I'd be happy to help if you're interested in giving this a shot. I'm
> always on Freenode (#dri-devel, #intel-gfx).
>
> [0] http://lists.freedesktop.org/archives/mesa-dev/2014-April/057812.html

[1] 
http://cgit.freedesktop.org/~anholt/mesa/commit/?h=deadcode&id=1752b0916424a0e9d1596832b7961bc56a618de2


Re: [Mesa-dev] [PATCH] glapi: add indexed blend functions (GL 4.0)

2014-07-16 Thread Anuj Phogat
On Mon, Jul 14, 2014 at 9:38 PM, Tapani Pälli  wrote:
> This makes some of the UE4 engine demos (Stylized, Mobile Temple)
> render correctly, tested on Intel Haswell machine.
>
> Signed-off-by: Tapani Pälli 
> ---
>  src/mapi/glapi/gen/GL4x.xml | 26 ++
>  src/mesa/main/tests/dispatch_sanity.cpp | 10 +-
>  2 files changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/src/mapi/glapi/gen/GL4x.xml b/src/mapi/glapi/gen/GL4x.xml
> index 8efef0b..848316e 100644
> --- a/src/mapi/glapi/gen/GL4x.xml
> +++ b/src/mapi/glapi/gen/GL4x.xml
> @@ -12,6 +12,32 @@
>
>  
>
> +
> +  
> +
> +
> +
> +  
> +
> +   alias="BlendFuncSeparateiARB">
> +
> +
> +
> +
> +
> +  
> +
> +   alias="BlendEquationiARB">
> +
> +
> +  
> +
> +   alias="BlendEquationSeparateiARB" >
> +
> +
> +
> +  
> +
>  
>
>  
> diff --git a/src/mesa/main/tests/dispatch_sanity.cpp 
> b/src/mesa/main/tests/dispatch_sanity.cpp
> index 0e57653..1a2c4d0 100644
> --- a/src/mesa/main/tests/dispatch_sanity.cpp
> +++ b/src/mesa/main/tests/dispatch_sanity.cpp
> @@ -542,11 +542,11 @@ const struct function gl_core_functions_possible[] = {
> { "glVertexAttribDivisor", 33, -1 },
>
> /* GL 4.0 */
> -   { "glMinSampleShading", 40, -1 },// XXX: Add to xml
> -// { "glBlendEquationi", 40, -1 },  // XXX: Add to xml
> -// { "glBlendEquationSeparatei", 40, -1 },  // XXX: Add to xml
> -// { "glBlendFunci", 40, -1 },  // XXX: Add to xml
> -// { "glBlendFuncSeparatei", 40, -1 },  // XXX: Add to xml
> +   { "glMinSampleShading", 40, -1 },
> +   { "glBlendEquationi", 40, -1 },
> +   { "glBlendEquationSeparatei", 40, -1 },
> +   { "glBlendFunci", 40, -1 },
> +   { "glBlendFuncSeparatei", 40, -1 },
>
> /* GL 4.3 */
> { "glIsRenderbuffer", 43, -1 },
> --
> 1.9.3
>

I agree with Ken's comment #11 on Bugzilla.
Acked-by: Anuj Phogat 


Re: [Mesa-dev] Updated debdiff for mesa to compile on m68k

2014-07-16 Thread John Paul Adrian Glaubitz
Hi Thorsten!

On 07/16/2014 01:03 PM, Thorsten Glaser wrote:
> updated debdiff attached. Package compiles fine with it
> (on i386 and m68k), and the invalid alignment assumptions
> were made explicit with no ABI breakage.

Awesome! That's great to hear!

> Please apply.

Absolutely. Could the upstream Mesa developers maybe apply the patch
as well?

We're putting a lot of effort into the m68k port, and we have many
users who love running Debian on retro m68k hardware and emulators.
We even got some official funding through Debian to buy hardware.
Even Greg Kroah-Hartman said he appreciates the port when I asked
him about it at LinuxTag; it helps spot regressions :).

So, please help us and apply Thorsten's patch provided it won't
break anything else.

Thanks!

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


[Mesa-dev] [PATCH] Fix crash in update_framebuffer_state

2014-07-16 Thread David Weber
Hi,

GPU: Radeon HD 5770
mesa: 10.2.2 with gallium/llvm backend
llvm: 3.4.2
linux: 3.15.3
xf86-video-ati: 7.4.0

Switching from the software to the OpenGL backend in Gwenview with an
EGL-enabled Qt4 crashes with the following backtrace:
state_tracker/st_atom_framebuffer.c:60:update_framebuffer_size:
Assertion `surface' failed.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x7fffe09e7ec1 in _debug_assert_fail (expr=0x7fffe0f9f85c
"surface", file=0x7fffe0f9f838 "state_tracker/st_atom_framebuffer.c",
line=60,
function=0x7fffe0f9f9a0 <__func__.33915>
"update_framebuffer_size") at util/u_debug.c:277
277 util/u_debug.c: No such file or directory.
(gdb) bt
#0  0x7fffe09e7ec1 in _debug_assert_fail (expr=0x7fffe0f9f85c
"surface", file=0x7fffe0f9f838 "state_tracker/st_atom_framebuffer.c",
line=60,
function=0x7fffe0f9f9a0 <__func__.33915>
"update_framebuffer_size") at util/u_debug.c:277
#1  0x7fffe0c8715d in update_framebuffer_size
(framebuffer=0x17f82b0, surface=0x0) at
state_tracker/st_atom_framebuffer.c:60
#2  0x7fffe0c87446 in update_framebuffer_state (st=0x17f76d0) at
state_tracker/st_atom_framebuffer.c:132
#3  0x7fffe0c84457 in st_validate_state (st=0x17f76d0) at
state_tracker/st_atom.c:213
#4  0x7fffe0c91618 in st_Clear (ctx=0x17b3a30, mask=2) at
state_tracker/st_cb_clear.c:446
#5  0x7fffe0b10a39 in _mesa_Clear (mask=16384) at main/clear.c:226
#6  0x720c9aaa in ?? () from /usr/lib64/qt4/libQtOpenGL.so.4
#7  0x74a21cfb in QPainter::begin(QPaintDevice*) () from
/usr/lib64/qt4/libQtGui.so.4
#8  0x74a22768 in QPainter::QPainter(QPaintDevice*) () from
/usr/lib64/qt4/libQtGui.so.4
#9  0x74ec9544 in QGraphicsView::paintEvent(QPaintEvent*) ()
from /usr/lib64/qt4/libQtGui.so.4
#10 0x749221f0 in QWidget::event(QEvent*) () from
/usr/lib64/qt4/libQtGui.so.4
#11 0x74cb595e in QFrame::event(QEvent*) () from
/usr/lib64/qt4/libQtGui.so.4
#12 0x74ecd32b in QGraphicsView::viewportEvent(QEvent*) ()
from /usr/lib64/qt4/libQtGui.so.4
#13 0x76a9f223 in
QCoreApplicationPrivate::sendThroughObjectEventFilters(QObject*,
QEvent*) () from /usr/lib64/qt4/libQtCore.so.4
#14 0x748d4bac in QApplicationPrivate::notify_helper(QObject*,
QEvent*) () from /usr/lib64/qt4/libQtGui.so.4
#15 0x748d7602 in QApplication::notify(QObject*, QEvent*) ()
from /usr/lib64/qt4/libQtGui.so.4
#16 0x75600a08 in KApplication::notify(QObject*, QEvent*) ()
from /usr/lib64/libkdeui.so.5
#17 0x76a9f0ad in QCoreApplication::notifyInternal(QObject*,
QEvent*) () from /usr/lib64/qt4/libQtCore.so.4
#18 0x7492705f in QWidgetPrivate::drawWidget(QPaintDevice*,
QRegion const&, QPoint const&, int, QPainter*, QWidgetBackingStore*)
() from /usr/lib64/qt4/libQtGui.so.4
#19 0x74ae5639 in QWidgetPrivate::repaint_sys(QRegion const&)
() from /usr/lib64/qt4/libQtGui.so.4
#20 0x749159e4 in QWidgetPrivate::syncBackingStore() () from
/usr/lib64/qt4/libQtGui.so.4
#21 0x74922691 in QWidget::event(QEvent*) () from
/usr/lib64/qt4/libQtGui.so.4
#22 0x7209fd0a in QGLWidget::event(QEvent*) () from
/usr/lib64/qt4/libQtOpenGL.so.4
#23 0x748d4bcc in QApplicationPrivate::notify_helper(QObject*,
QEvent*) () from /usr/lib64/qt4/libQtGui.so.4
#24 0x748d7602 in QApplication::notify(QObject*, QEvent*) ()
from /usr/lib64/qt4/libQtGui.so.4
#25 0x75600a08 in KApplication::notify(QObject*, QEvent*) ()
from /usr/lib64/libkdeui.so.5
#26 0x76a9f0ad in QCoreApplication::notifyInternal(QObject*,
QEvent*) () from /usr/lib64/qt4/libQtCore.so.4
#27 0x76aa26e8 in
QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*)
() from /usr/lib64/qt4/libQtCore.so.4
#28 0x76acd653 in ?? () from /usr/lib64/qt4/libQtCore.so.4
#29 0x705e4a94 in g_main_context_dispatch () from
/usr/lib64/libglib-2.0.so.0
#30 0x705e4df0 in ?? () from /usr/lib64/libglib-2.0.so.0
#31 0x705e4eac in g_main_context_iteration () from
/usr/lib64/libglib-2.0.so.0
#32 0x76acd7c6 in
QEventDispatcherGlib::processEvents(QFlags)
() from /usr/lib64/qt4/libQtCore.so.4
#33 0x74975f26 in ?? () from /usr/lib64/qt4/libQtGui.so.4
#34 0x76a9dcef in
QEventLoop::processEvents(QFlags) ()
from /usr/lib64/qt4/libQtCore.so.4
#35 0x76a9dfd0 in
QEventLoop::exec(QFlags) () from
/usr/lib64/qt4/libQtCore.so.4
#36 0x74d6e147 in QDialog::exec() () from /usr/lib64/qt4/libQtGui.so.4
#37 0x0044c7f3 in Gwenview::MainWindow::showConfigDialog
(this=0xaf1e30) at /home/weber/work/gwenview/app/mainwindow.cpp:1468
#38 0x76ab3644 in QMetaObject::activate(QObject*, QMetaObject
const*, int, void**) () from /usr/lib64/qt4/libQtCore.so.4
#39 0x748ce752 in QAction::triggered(bool) () from
/usr/lib64/qt4/libQtGui.so.4
#40 0x748cfb00 in QAction::activate(QAction::ActionEvent) ()
from /usr/lib64/qt4/libQtGui.so.4
#41 0x74ceef83 in ?? () from /usr/lib64

Re: [Mesa-dev] Updated debdiff for mesa to compile on m68k

2014-07-16 Thread Thorsten Glaser
On Wed, 16 Jul 2014, John Paul Adrian Glaubitz wrote:

> Absolutely. Could the upstream Mesa developers maybe apply the patch
> as well?

They are not taking us seriously; see #728053 for their feedback…

> We're putting a lot of effort into the m68k port, and we have many
> users who love running Debian on retro m68k hardware and emulators.
> We even got some official funding through Debian to buy hardware.
> Even Greg Kroah-Hartman said he appreciates the port when I asked
> him about it at LinuxTag; it helps spot regressions :).

Not just that – I’ve seen people run MiNT on Atari with actual
ATI Radeon cards (not yet supported on Linux due to missing
kernel-side PCI bus glue, but probably not much work), and
it’s not a stretch to believe Nięvida could be next.

bye,
//mirabilos
-- 
Sometimes they [people] care too much: pretty printers [and syntax highligh-
ting, d.A.] mechanically produce pretty output that accentuates irrelevant
detail in the program, which is as sensible as putting all the prepositions
in English text in bold font.   -- Rob Pike in "Notes on Programming in C"


Re: [Mesa-dev] Updated debdiff for mesa to compile on m68k

2014-07-16 Thread Eero Tamminen
Hi,

On Wednesday 16 July 2014, Thorsten Glaser wrote:
> On Wed, 16 Jul 2014, John Paul Adrian Glaubitz wrote:
> > Absolutely. Could the upstream Mesa developers maybe apply the patch
> > as well?
> 
> They are not taking us seriously; see #728053 for their feedback…

While the effect of unaligned accesses is normally invisible,
that's not always the case, even on non-m68k platforms.

It's bad/sloppy coding, because:
- There's other (newer) HW besides m68k which has alignment
  requirements [1]. On ARM this can even be configured, both
  at the CPU and the (Linux) kernel level.  Even on Intel, e.g.
  atomic accesses need to be aligned.
- Even if the HW supports unaligned accesses, they're never faster
  or safer than aligned ones, and in some cases they can be a lot
  slower [2] (the extreme case is when each access causes an interrupt).
- If gcc happens to add padding [3] for (non-packed) unaligned
  structure members to align them (because the HW requires it or
  because it's faster), those structures use more memory than
  structures whose members are at their natural alignment.
  - If that causes the data not to fit in cache for frequently
    used structure(s), that's also slower.
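
A small sketch of the two points about padding and unaligned reads. The struct names are made up for illustration; the exact sizes depend on the ABI (m68k traditionally aligns to 2 bytes, most others to 4 or more), so the code only relies on the ordering that holds everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Members ordered carelessly: the compiler may insert padding before
 * 'value' and after 'tag' so that 'value' is naturally aligned.
 */
struct loose {
   uint8_t  flag;
   uint32_t value;
   uint8_t  tag;
};

/* Same members, largest first: no interior padding is needed. */
struct tight {
   uint32_t value;
   uint8_t  flag;
   uint8_t  tag;
};

/* Read a 32-bit value from an arbitrarily aligned address without
 * dereferencing a misaligned pointer. Compilers typically turn the
 * memcpy into whatever load sequence the target supports.
 */
static uint32_t
load_u32_unaligned(const void *p)
{
   uint32_t v;

   assert(p != NULL);
   memcpy(&v, p, sizeof v);
   return v;
}
```

On a typical x86-64 ABI, sizeof(struct loose) is 12 and sizeof(struct tight) is 8; pahole shows the same holes for real structures.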


- Eero

[1] Hasn't anybody else had issues with this on some other
architecture?

[2] http://stackoverflow.com/questions/12491578/whats-the-actual-effect-of-successful-unaligned-accesses-on-x86

[3] The "pahole" tool from the "dwarves" package can be used to inspect
how hole-ridden the structures in binaries (with debug data) are.
The Valgrind DHAT tool can be used to inspect how often those
structure members get accessed:
http://valgrind.org/docs/manual/dh-manual.html


Re: [Mesa-dev] [Mesa-stable] [PATCH 2/2] i965/fs: Set force_uncompressed and force_sechalf on samplepos setup.

2014-07-16 Thread Anuj Phogat
On Thu, Jul 10, 2014 at 8:51 PM, Kenneth Graunke  wrote:
> gen8_fs_generator uses these to decide whether to set the execution size
> to 8 or 16, so we incorrectly made both of these MOVs the full width in
> SIMD16 shaders.  (It happened to work out on Gen4-7.)
>
> Setting them should also help inform optimization passes what's really
> going on, which could help avoid bugs.
>
> Signed-off-by: Kenneth Graunke 
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index a3ad375..ceea32a 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1248,19 +1248,21 @@ fs_visitor::emit_samplepos_setup(ir_variable *ir)
>stride(retype(brw_vec1_grf(payload.sample_pos_reg, 0),
>  BRW_REGISTER_TYPE_B), 16, 8, 2);
>
> -   emit(MOV(int_sample_x, fs_reg(sample_pos_reg)));
> +   fs_inst *inst = emit(MOV(int_sample_x, fs_reg(sample_pos_reg)));
> if (dispatch_width == 16) {
> -  fs_inst *inst = emit(MOV(half(int_sample_x, 1),
> -   fs_reg(suboffset(sample_pos_reg, 16))));
> +  inst->force_uncompressed = true;
> +  inst = emit(MOV(half(int_sample_x, 1),
> +  fs_reg(suboffset(sample_pos_reg, 16))));
>inst->force_sechalf = true;
> }
> /* Compute gl_SamplePosition.x */
> compute_sample_position(pos, int_sample_x);
> pos.reg_offset++;
> -   emit(MOV(int_sample_y, fs_reg(suboffset(sample_pos_reg, 1))));
> +   inst = emit(MOV(int_sample_y, fs_reg(suboffset(sample_pos_reg, 1))));
> if (dispatch_width == 16) {
> -  fs_inst *inst = emit(MOV(half(int_sample_y, 1),
> -   fs_reg(suboffset(sample_pos_reg, 17))));
> +  inst->force_uncompressed = true;
> +  inst = emit(MOV(half(int_sample_y, 1),
> +  fs_reg(suboffset(sample_pos_reg, 17))));
>inst->force_sechalf = true;
> }
> /* Compute gl_SamplePosition.y */
> --
> 2.0.0
>
> ___
> mesa-stable mailing list
> mesa-sta...@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-stable

Both patches are:
Reviewed-by: Anuj Phogat 
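
For anyone skimming the thread, a hedged sketch of the decision the commit message describes: a generator that derives the execution size from these two flags. The struct and function below are hypothetical and are not the gen8_fs_generator code; only the flag names come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of per-instruction flags. */
struct inst_flags {
   bool force_uncompressed;   /* operate on the first 8 channels only */
   bool force_sechalf;        /* operate on the second 8 channels only */
};

/* Execution width such a generator could pick: full width only when the
 * shader is SIMD16 and neither half-width flag is set.
 */
static int
exec_size_for(const struct inst_flags *inst, int dispatch_width)
{
   assert(dispatch_width == 8 || dispatch_width == 16);
   if (dispatch_width == 16 &&
       !inst->force_uncompressed && !inst->force_sechalf)
      return 16;
   return 8;
}
```

Under that assumption, a first-half MOV left without force_uncompressed in a SIMD16 shader would be emitted 16 wide, which matches the symptom described above.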


Re: [Mesa-dev] [PATCH] glapi: add indexed blend functions (GL 4.0)

2014-07-16 Thread Matt Turner
On Wed, Jul 16, 2014 at 5:38 PM, Anuj Phogat  wrote:
> I agree with Ken's comment# 11 on bugzilla.

For those attempting to follow along:
https://bugs.freedesktop.org/show_bug.cgi?id=78716

Ian, there's a spec question in there for you.


Re: [Mesa-dev] [PATCH] glapi: add indexed blend functions (GL 4.0)

2014-07-16 Thread Matt Turner
On Mon, Jul 14, 2014 at 9:38 PM, Tapani Pälli  wrote:
> This makes some of the UE4 engine demos (Stylized, Mobile Temple)
> render correctly, tested on Intel Haswell machine.
>
> Signed-off-by: Tapani Pälli 

This should have a Bugzilla: tag.


Re: [Mesa-dev] [PATCH 0/5] RadeonSI Draw Indirect re-spin

2014-07-16 Thread Michel Dänzer
On 08.07.2014 10:37, Marek Olšák wrote:
> I'm re-sending what remains to be merged to get ARB_draw_indirect on 
> radeonsi. They are patches 3-5.
> 
> The first 2 patches bump the LLVM version requirement to 3.4.2, which 
> contains a fix for the calling convention to allow one more user SGPR.
> 
> Please review.

The series is

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer


Re: [Mesa-dev] [PATCH] glapi: add indexed blend functions (GL 4.0)

2014-07-16 Thread Tapani

On 07/17/2014 04:31 AM, Matt Turner wrote:

On Mon, Jul 14, 2014 at 9:38 PM, Tapani Pälli  wrote:

This makes some of the UE4 engine demos (Stylized, Mobile Temple)
render correctly, tested on Intel Haswell machine.

Signed-off-by: Tapani Pälli 

This should have a Bugzilla: tag.


Yep, I did not want to refer to bug #78716 because I think we should
open new bugs for the driver issues found (the original issues that led
to opening this bug were in UE4 itself and have been resolved). But
agreed, I was also too lazy to open a new bug for this :) I'll add the
Bugzilla tag.


// Tapani