date:20141106

On Thu, Aug 28, 2014 at 12:50 PM, Matt Turner matts...@gmail.com wrote:
 On Tue, Aug 12, 2014 at 9:48 AM, Kenneth Graunke kenn...@whitecape.org 
 wrote:
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
 index d3509a0..19f7ef8 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
 @@ -186,7 +186,7 @@ fs_generator::generate_fb_write(fs_inst *inst)
brw_imm_ud(0x1  11));
   }

 -if (inst-target  0) {
 +if (fp  inst-target  0) {
 /* Set the render target index for choosing BLEND_STATE. */
 brw_MOV(p, retype(brw_vec1_reg(BRW_MESSAGE_REGISTER_FILE,
inst-base_mrf, 2),

 I don't understand this change. I don't see fp being used elsewhere in
 this patch.

 Patch 2 is

 Reviewed-by: Matt Turner matts...@gmail.com

 and so is this one once this hunk is sorted out.

Ping?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)

2014-11-06 Thread Michel Dänzer


On 05.11.2014 20:14, Joonas Lahtinen wrote:


Modified not refer to DRI3, just uses the present extension to get rid
of the excess buffer invalidations.


AFAICT there's no fallback from your changes to the current behaviour if 
the X server doesn't support the Present extension. There probably needs 
to be such a fallback.



--
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 84566] Unify the format conversion code

https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #49 from Iago Toral ito...@igalia.com ---
Jason you had this commit in your original branch: MAYBEREVERT: Fill X
components with 1

That basically makes packing to padded formats (like RGBX) set X=1. In the
commit message you mention that you are not sure you like this... my opinion is
that this should not be necessary and it degrades performance when packing to
these format, so I would be more for removing this one. What do you think?

FWIW, I have checked that removing this commit does not affect piglit results.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping

2014-11-06 Thread Juha-Pekka Heikkila

On 05.11.2014 21:21, Ian Romanick wrote:
 On 11/04/2014 01:24 PM, Roland Scheidegger wrote:
 Am 04.11.2014 um 13:05 schrieb Juha-Pekka Heikkila:
 +   for(i = 0; i  n; i++) {
 +  _mesa_clamp_float_rgba(rgba_src[i], temp, min, max);
 +
 +  *operand = _mm_mul_ps(multiplier, *operand);
 +  truncated_integers = _mm_cvttps_epi32(*operand);
 +  mmove = _mm_set_ps(aMap[map_p[ACOMP]], bMap[map_p[BCOMP]],
 + gMap[map_p[GCOMP]], rMap[map_p[RCOMP]] );
 +
 +  _mm_storeu_ps(rgba_dst[i], mmove);
 The sse2 code at the end looks counterproductive to me. Not sure what
 gcc will generate but I'd suspect it involves some simd-int domain
 transition for the table lookups, plus another int-simd transition to
 get the values back into simd domain (alternatively it might use
 stores/load here) just so you can store them again...
 It would probably be better to just store the values directly after the
 table lookups.
 But in any case actually I'm beginning to suspect noone really cares
 about performance anyway for that path (who the hell uses these
 scale/map features?) so whatever works...
 
 Which raises another question... do we have any piglit tests that
 actually exercise this path?

No we don't. I made small test for this to see how it works, I was
planning to move my test to Piglit later.

/Juha-Pekka

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)

2014-11-06 Thread Joonas Lahtinen

On to, 2014-11-06 at 18:12 +0900, Michel Dänzer wrote:
 On 05.11.2014 20:14, Joonas Lahtinen wrote:
 
  Modified not refer to DRI3, just uses the present extension to get rid
  of the excess buffer invalidations.
 
 AFAICT there's no fallback from your changes to the current behaviour if 
 the X server doesn't support the Present extension. There probably needs 
 to be such a fallback.
 
 

It gets rid of such nasty hack (the intel_viewport one), that I thought
there is no point making fallback. Because without this, the egl dri2
backend is fundamentally broken anyway.

Regards, Joonas

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)

2014-11-06 Thread Joonas Lahtinen

On ke, 2014-11-05 at 15:19 +, Emil Velikov wrote:
 Hi Joonas,
 
 Does getting rid of the viewport hack give you any noticeable
 performance improvement?

Yes, it significantly reduces the CPU load when multiple glViewport
calls are made per frame (4x4 grid or so).

 Is there any interest in converting the
 egl_dri2 backend to dri3, rather than just copying over the present bits ?
 

This could be one thing to do. But in the meanwhile, I would commit this
present extension patch so that the affected use cases get the
improvements.

Regards, Joonas

 On 05/11/14 11:14, Joonas Lahtinen wrote:
  Hi,
  
  Modified not refer to DRI3, just uses the present extension to get rid
  of the excess buffer invalidations.
  
  Regards, Joonas
  
  From 257e2a8c750f9dcf868cce9da8632df3cae0fcec Mon Sep 17 00:00:00 2001
  From: Joonas Lahtinen joonas.lahti...@linux.intel.com
  Date: Wed, 5 Nov 2014 12:25:32 +0200
  Subject: [PATCH] egl: dri2: Use present extension.
  
  Present extension is used to avoid excess buffer invalidations, because
  the XCB interface doesn't supply that information.
  
  Signed-off-by: Daniel van der Wath danielx.j.van.der.w...@intel.com
  Signed-off-by: Joonas Lahtinen joonas.lahti...@linux.intel.com
  ---
   configure.ac|5 +-
   src/egl/drivers/dri2/egl_dri2.c |2 +-
   src/egl/drivers/dri2/egl_dri2.h |   24 ++-
   src/egl/drivers/dri2/platform_x11.c |  247 
  ---
   src/mesa/drivers/dri/i965/brw_context.c |9 +-
   5 files changed, 262 insertions(+), 25 deletions(-)
  
  diff --git a/configure.ac b/configure.ac
  index fc7d372..75d90c0 100644
  --- a/configure.ac
  +++ b/configure.ac
  @@ -952,7 +952,8 @@ xyesno)
   fi
   
   if test x$enable_dri = xyes; then
  -   dri_modules=$dri_modules xcb-dri2 = $XCBDRI2_REQUIRED
  +   PKG_CHECK_MODULES([PRESENTPROTO], [presentproto = 
  $PRESENTPROTO_REQUIRED])
  +   dri_modules=$dri_modules xcb-dri2 = $XCBDRI2_REQUIRED 
  xcb-present
 Afaics you are not changing anything on the dri modules (or glx/dri2) to
 require the above changes. Perhaps you need to push the presentproto
 check in the x11 case below ?
 
   fi
   
   if test x$enable_dri3 = xyes; then
  @@ -1564,7 +1565,7 @@ for plat in $egl_platforms; do
  ;;
   
  x11)
  -   PKG_CHECK_MODULES([XCB_DRI2], [x11-xcb xcb xcb-dri2 = 
  $XCBDRI2_REQUIRED xcb-xfixes])
  +   PKG_CHECK_MODULES([XCB_DRI2], [x11-xcb xcb xcb-dri2 = 
  $XCBDRI2_REQUIRED xcb-xfixes xcb-present])
  ;;
   
  drm)
 [snip]
  diff --git a/src/egl/drivers/dri2/platform_x11.c 
  b/src/egl/drivers/dri2/platform_x11.c
  index f8c4b70..a1445b2 100644
  --- a/src/egl/drivers/dri2/platform_x11.c
  +++ b/src/egl/drivers/dri2/platform_x11.c
  @@ -188,6 +188,205 @@ get_xcb_screen(xcb_screen_iterator_t iter, int screen)
   return NULL;
   }
   
  +/*
  + * Called by the XCB_PRESENT_COMPLETE_NOTIFY case.
  + */
  +static void
  +dri2_update_num_back(struct dri2_egl_surface *priv)
  +{
  +   priv-num_back = 1;
  +   if (priv-flipping)
  +  priv-num_back++;
  +   if (priv-base.SwapInterval == 0)
  +  priv-num_back++;
  +}
  +
 This seems to be out of sync with dri3_glx. Don't you need something
 similar to commit f7a36ef5fe23056299a77414f9ad8b5e5a1d ?
 
 [snip]
  +/**
  + *
  + * Process any present events that have been received from the X server
  + *
  + * From glx, we additionally invalidate the drawable here if there has a 
  been a special event.
  + */
  +static void
  +dri2_flush_present_events(struct dri2_egl_display *dri2_dpy, struct 
  dri2_egl_surface *priv)
  +{
  +   xcb_connection_t *c = dri2_dpy-conn;
  +
  +   /* Check to see if any configuration changes have occurred
  +* since we were last invoked
  +*/
  +   if (priv-special_event) {
  +  xcb_generic_event_t*ev;
  +
  +  while ((ev = xcb_poll_for_special_event(c, priv-special_event)) != 
  NULL) {
  + xcb_present_generic_event_t *ge = (void *) ev;
  + dri2_handle_present_event(priv, ge);
  + _eglLog(_EGL_INFO, DRI: Invalidating buffer 0x%x\n, 
  priv-dri_drawable);
  + (*dri2_dpy-flush-invalidate)(priv-dri_drawable);
 Hmm why does one need to invalidate at this stage - I take that it's
 related to lack of fence objects ?
 
 [snip]
  diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
  b/src/mesa/drivers/dri/i965/brw_context.c
  index e1a994a..dbadd10 100644
  --- a/src/mesa/drivers/dri/i965/brw_context.c
  +++ b/src/mesa/drivers/dri/i965/brw_context.c
  @@ -148,6 +148,9 @@ intel_viewport(struct gl_context *ctx)
  __DRIcontext *driContext = brw-driContext;
   
  if (_mesa_is_winsys_fbo(ctx-DrawBuffer)) {
  +  if (unlikely(INTEL_DEBUG  DEBUG_DRI))
  + fprintf(stderr, invalidating drawables\n);
  +
 dri2InvalidateDrawable(driContext-driDrawablePriv);

[Mesa-dev] [Bug 74563] Surfaceless contexts are not properly released by DRI drivers

https://bugs.freedesktop.org/show_bug.cgi?id=74563

Tapani Pälli lem...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Tapani Pälli lem...@gmail.com ---
was pushed to master

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/2] mesa: add runtime support for SSSE3

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 configure.ac   | 6 ++
 src/mesa/x86/common_x86.c  | 4 
 src/mesa/x86/common_x86_features.h | 4 +++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 03f1bca..cc0a661 100644
--- a/configure.ac
+++ b/configure.ac
@@ -258,6 +258,12 @@ if test x$SSE41_SUPPORTED = x1; then
 fi
 AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])
 
+AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0])
+if test x$SSSE3_SUPPORTED = x1; then
+DEFINES=$DEFINES -DUSE_SSSE3
+fi
+AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1])
+
 dnl Can't have static and shared libraries, default to static if user
 dnl explicitly requested. If both disabled, set to static since shared
 dnl was explicitly requested.
diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c
index 25f5c40..ad0648a 100644
--- a/src/mesa/x86/common_x86.c
+++ b/src/mesa/x86/common_x86.c
@@ -352,6 +352,10 @@ _mesa_get_x86_features(void)
 
   __get_cpuid(1, eax, ebx, ecx, edx);
 
+#ifdef bit_SSSE3
+  if (ecx  bit_SSSE3)
+ _mesa_x86_cpu_features |= X86_FEATURE_SSSE3;
+#endif
   if (ecx  bit_SSE4_1)
  _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1;
}
diff --git a/src/mesa/x86/common_x86_features.h 
b/src/mesa/x86/common_x86_features.h
index 66f2cf6..6eb2b38 100644
--- a/src/mesa/x86/common_x86_features.h
+++ b/src/mesa/x86/common_x86_features.h
@@ -43,7 +43,8 @@
 #define X86_FEATURE_XMM2   (16)
 #define X86_FEATURE_3DNOWEXT   (17)
 #define X86_FEATURE_3DNOW  (18)
-#define X86_FEATURE_SSE4_1 (19)
+#define X86_FEATURE_SSSE3  (19)
+#define X86_FEATURE_SSE4_1 (110)
 
 /* standard X86 CPU features */
 #define X86_CPU_FPU(10)
@@ -65,6 +66,7 @@
 #define cpu_has_xmm2   (_mesa_x86_cpu_features  X86_FEATURE_XMM2)
 #define cpu_has_3dnow  (_mesa_x86_cpu_features  X86_FEATURE_3DNOW)
 #define cpu_has_3dnowext   (_mesa_x86_cpu_features  X86_FEATURE_3DNOWEXT)
+#define cpu_has_ssse3  (_mesa_x86_cpu_features  X86_FEATURE_SSSE3)
 #define cpu_has_sse4_1 (_mesa_x86_cpu_features  X86_FEATURE_SSE4_1)
 
 #endif
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

Also cleans up some if statements in the *faster functions.

Callgrind cpu usage results from pts benchmarks:

For ytile_copy_faster()

Nexuiz 1.6.1: 2.16% - 1.20%

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 src/mesa/Makefile.am   |  8 +++
 src/mesa/drivers/dri/i965/intel_tex_subimage.c | 82 ++
 src/mesa/main/fast_rgba8_copy.c| 78 
 src/mesa/main/fast_rgba8_copy.h| 37 
 4 files changed, 141 insertions(+), 64 deletions(-)
 create mode 100644 src/mesa/main/fast_rgba8_copy.c
 create mode 100644 src/mesa/main/fast_rgba8_copy.h

diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
index e71bccb..2402096 100644
--- a/src/mesa/Makefile.am
+++ b/src/mesa/Makefile.am
@@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS)
 
 ARCH_LIBS =
 
+if SSSE3_SUPPORTED
+ARCH_LIBS += libmesa_ssse3.la
+endif
+
 if SSE41_SUPPORTED
 ARCH_LIBS += libmesa_sse41.la
 endif
@@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \
main/streaming-load-memcpy.c
 libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
 
+libmesa_ssse3_la_SOURCES = \
+   main/fast_rgba8_copy.c
+libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3
+
 pkgconfigdir = $(libdir)/pkgconfig
 pkgconfig_DATA = gl.pc
 
diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
index cb5738a..0deeb75 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
@@ -27,6 +27,7 @@
  **/
 
 #include main/bufferobj.h
+#include main/fast_rgba8_copy.h
 #include main/image.h
 #include main/macros.h
 #include main/mtypes.h
@@ -42,9 +43,7 @@
 #include intel_mipmap_tree.h
 #include intel_blit.h
 
-#ifdef __SSSE3__
-#include tmmintrin.h
-#endif
+#include x86/common_x86_asm.h
 
 #define FILE_DEBUG_FLAG DEBUG_TEXTURE
 
@@ -175,18 +174,6 @@ err:
return false;
 }
 
-#ifdef __SSSE3__
-static const uint8_t rgba8_permutation[16] =
-   { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };
-
-/* NOTE: dst must be 16 byte aligned */
-#define rgba8_copy_16(dst, src) \
-   *(__m128i *)(dst) = _mm_shuffle_epi8(\
-  (__m128i) _mm_loadu_ps((float *)(src)),   \
-  *(__m128i *) rgba8_permutation\
-   )
-#endif
-
 /**
  * Copy RGBA to BGRA - swap R and B.
  */
@@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes)
uint8_t *d = dst;
uint8_t const *s = src;
 
-#ifdef __SSSE3__
-   /* Fast copying for tile spans.
-*
-* As long as the destination texture is 16 aligned,
-* any 16 or 64 spans we get here should also be 16 aligned.
-*/
-
-   if (bytes == 16) {
-  assert(!(((uintptr_t)dst)  0xf));
-  rgba8_copy_16(d+ 0, s+ 0);
-  return dst;
-   }
-
-   if (bytes == 64) {
-  assert(!(((uintptr_t)dst)  0xf));
-  rgba8_copy_16(d+ 0, s+ 0);
-  rgba8_copy_16(d+16, s+16);
-  rgba8_copy_16(d+32, s+32);
-  rgba8_copy_16(d+48, s+48);
-  return dst;
-   }
-#endif
-
while (bytes = 4) {
   d[0] = s[2];
   d[1] = s[1];
@@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
   mem_copy_fn mem_copy)
 {
if (x0 == 0  x3 == xtile_width  y0 == 0  y1 == xtile_height) {
-  if (mem_copy == memcpy)
- return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
-   dst, src, src_pitch, swizzle_bit, memcpy);
-  else if (mem_copy == rgba8_copy)
- return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
-   dst, src, src_pitch, swizzle_bit, rgba8_copy);
-   } else {
-  if (mem_copy == memcpy)
- return xtile_copy(x0, x1, x2, x3, y0, y1,
-   dst, src, src_pitch, swizzle_bit, memcpy);
-  else if (mem_copy == rgba8_copy)
- return xtile_copy(x0, x1, x2, x3, y0, y1,
-   dst, src, src_pitch, swizzle_bit, rgba8_copy);
+  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
+dst, src, src_pitch, swizzle_bit, mem_copy);
}
xtile_copy(x0, x1, x2, x3, y0, y1,
   dst, src, src_pitch, swizzle_bit, mem_copy);
@@ -388,19 +341,8 @@ ytile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
   mem_copy_fn mem_copy)
 {
if (x0 == 0  x3 == ytile_width  y0 == 0  y1 == ytile_height) {
-  if (mem_copy == memcpy)
- return ytile_copy(0, 0, ytile_width, ytile_width, 0, ytile_height,
-   dst, src, src_pitch, swizzle_bit, memcpy);
-  else if (mem_copy == rgba8_copy)
- return ytile_copy(0, 0, ytile_width, ytile_width, 0, ytile_height,
-   dst, src, src_pitch, swizzle_bit, rgba8_copy);
-   } else {
-  if (mem_copy == memcpy)

Re: [Mesa-dev] [PATCH 0/2] Disable the EGL state tracker for Linux/DRI builds

On 05/11/14 21:11, Jose Fonseca wrote:
 How many people/companies use EGL for Windows/fbdev, how about OpenVG on
 any platform ?
 
 I already said this privately to Marek when he was RFC'ing on this change: 
 I'm fine if Linux-specific drivers abandon st/egl to focus solely on st/dri, 
 but removing st/egl altogether seems unnecessary and short-sighted: EGL is a 
 cross-platform API, Mesa is a cross-platform implementation of OpenGL and 
 friends, so sooner or later people will want to have Mesa's EGL support on 
 platforms others than Linux.
 
 This is not hypothetical:
 - See https://bugs.freedesktop.org/show_bug.cgi?id=40920 for an example of a 
 bug reported from an user using llvmpipe + egl + opengv on windows.
 - VMware doesn't currently ship or support EGL on Windows, but I suspect we 
 eventually we'll want to support EGL on non-linux platforms.
 
 Even if OpenVG is loosing popularity, but maybe Khronos will come up with 
 another cross-platform graphics API (maybe OpenGL NG) that's tied to EGL.
 
 So a cross-platform implementation of EGL is bound to be useful.
 
 
 I don't test, but I build egl-static and OpenVG on Windows nightly w/ 
 llvmpipe.  It's like a superset of OSMesa, and it seems more useful, as it 
 gives one more APIs than OSMesa, and through a standard API to create/bind 
 contexts .
 
 
 In short, stop caring about st/egl on Linux, maybe even remove DRI support 
 out st/egl if you must, but please don't go out of your way to break EGL on 
 non-linux platforms.
 
So let me justify why I brought this in the first place:

1. Over the last two years st/egl had the following patches
 * Build fixes  related - most of the patches (80%?)
 * Interface changes - ~10 patches
 * Bugfixes - ~3 patches
 * New features - 1 patch (already present with the dri2 backend)
2. Over the last two years I've not seen any bug reports from people
using either st/egl or st/vega. Must admit I was not looking too closely.
3. Afaict the VMWare or other commercial products do not use it.

So based on those my naive question was Is there anyone actually using
those state-trackers, rather than just building them - i.e. it was
meant as a question, rather than a message of hate wrt the code-base :-)

I must admit I cannot predict the future (i.e. what VMWare, Khronos
and/or others have in plan) but based on the lack of testers,
maintainers and new improvements imho it make sense to remove the stale
code. As soon as any of that changes we can always bring it back.

So I would not call it short-sighted, but imho it does not make sense to
cling onto something in the hopes that one day someone may use it.

Cheers,
Emil


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] megadriver: explicitly link against glapi, link with -no-undefined

Hi Frank,

On 31/10/14 01:11, Frank Henigman wrote:
 I was too hasty with my Tested-by.  While it worked in a
 shared-glapi configuration, it broke the build with the following:
 
 ./configure --prefix=/usr --build=x86_64-pc-linux-gnu
 --host=x86_64-cros-linux-gnu --mandir=/usr/share/man
 --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc
 --localstatedir=/var/lib --libdir=/usr/lib64
 --disable-dependency-tracking --disable-option-checking
 --with-driver=dri --disable-glu --disable-glut --without-demos
 --enable-texture-float --disable-dri3 --disable-llvm-shared-libs
 --enable-glx --enable-llvm-gallium --disable-egl --disable-gbm
 --disable-gles1 --disable-gles2 --disable-shared-glapi
 --disable-gallium --enable-debug --enable-glx-tls --enable-asm
 --disable-xlib-glx --enable-dri --with-dri-drivers=i965
 --with-gallium-drivers=
 
 the output is:
 ...
 make[5]: Entering directory
 `/build/lumpy/tmp/portage/media-libs/mesa-10.3-r11/work/Mesa-10.3/src/mesa/drivers/dri'
   CXXLD  mesa_dri_drivers.la
 ../../../../src/mapi/glapi/.libs/libglapi.a(glapi_libglapi_la-glapi_getproc.o):glapi_getproc.c:static_functions:
 error: undefined reference to 'glAreTexturesResidentEXT'
 ../../../../src/mapi/glapi/.libs/libglapi.a(glapi_libglapi_la-glapi_getproc.o):glapi_getproc.c:static_functions:
 error: undefined reference to 'glDeleteTexturesEXT'
 ...
 
Indeed. static-glapi does not get too much testing, plus it seems that
it's broken (in a way) for a lng time.
It seems that we'll have to (temporary) resolve with shoving
dlopen(libglapi.so) into gbm, so that in time programs can nuke it from
their codebase. glapi is internal (implementation) detail that they
should not need to bother/know.

-Emil

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs

Humble ping x2

On 14/10/14 15:25, Emil Velikov wrote:
 Humble ping.
 
 On 23/09/14 01:25, Emil Velikov wrote:
 From: Sjoerd Simons sjoerd.sim...@collabora.co.uk

 When using RGBA EGLConfigs allow both RGB and RGBA X visuals, such that
 application can decide whether they want to use RGBA (and have the
 compositor blend their windows).

 On my system with this change EGLConfigs with a 24 bit visual comes up
 first, as such applications blindly picking the first EGLConfig will
 still get an RGB X visual.

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676
 ---

 Hello gents,

 This patch has been stuck in bugzilla since February this year. Bringing 
 it around here to gather greater exposure and perhaps some 
 comments/reviews.

 -Emil

  src/egl/drivers/dri2/egl_dri2.c |  5 +
  src/egl/drivers/dri2/platform_x11.c | 17 +
  2 files changed, 14 insertions(+), 8 deletions(-)

 diff --git a/src/egl/drivers/dri2/egl_dri2.c 
 b/src/egl/drivers/dri2/egl_dri2.c
 index 20a7243..2ed90a7 100644
 --- a/src/egl/drivers/dri2/egl_dri2.c
 +++ b/src/egl/drivers/dri2/egl_dri2.c
 @@ -110,6 +110,11 @@ EGLint dri2_to_egl_attribute_map[] = {
  static EGLBoolean
  dri2_match_config(const _EGLConfig *conf, const _EGLConfig *criteria)
  {
 +
 +   if (criteria-NativeVisualID != EGL_DONT_CARE 
 +conf-NativeVisualID != criteria-NativeVisualID)
 +  return EGL_FALSE;
 +
 if (_eglCompareConfigs(conf, criteria, NULL, EGL_FALSE) != 0)
return EGL_FALSE;
  
 diff --git a/src/egl/drivers/dri2/platform_x11.c 
 b/src/egl/drivers/dri2/platform_x11.c
 index a7a7338..3395fb7 100644
 --- a/src/egl/drivers/dri2/platform_x11.c
 +++ b/src/egl/drivers/dri2/platform_x11.c
 @@ -672,14 +672,15 @@ dri2_x11_add_configs_for_visuals(struct 
 dri2_egl_display *dri2_dpy,
  dri2_add_config(disp, dri2_dpy-driver_configs[j], id++,
  surface_type, config_attrs, rgba_masks);
  
 -/* Allow a 24-bit RGB visual to match a 32-bit RGBA EGLConfig.
 - * Otherwise it will only match a 32-bit RGBA visual.  On a
 - * composited window manager on X11, this will make all of the
 - * EGLConfigs with destination alpha get blended by the
 - * compositor.  This is probably not what the application
 - * wants... especially on drivers that only have 32-bit RGBA
 - * EGLConfigs! */
 -if (d.data-depth == 24) {
 +/* Allow both 24-bit RGB visual and 32 bit RGBA to match a 
 32-bit
 + * RGBA EGLConfig.  Otherwise it will only match a 32-bit RGBA
 + * visual.  On a composited window manager on X11, this will 
 make
 + * all of the EGLConfigs with destination alpha get blended by 
 the
 + * compositor.  This is probably not what the application 
 wants...
 + * especially on drivers that only have 32-bit RGBA EGLConfigs!
 + * Allowing both allows applications to make the decision 
 whether
 + * 32 bit visuals are intended */
 +if (d.data-depth == 24 || d.data-depth == 32) {
 rgba_masks[3] =
~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]);
 dri2_add_config(disp, dri2_dpy-driver_configs[j], id++,

 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Juha-Pekka Heikkila

On 29.10.2014 14:05, Timothy Arceri wrote:
 Makes use of SSE to speed up compute of min and max elements
 
 Callgrind cpu usage results from pts benchmarks:
 
 Openarena 0.8.8: 3.67% - 1.03%
 UrbanTerror: 2.36% - 0.81%
 
 V5:
 - actually make use of the optimisation in android (Emil Velikov)
 - set a better array size limit for using SSE and added TODO
 
 V4:
 - fixed bugs with incrementing pointer and updating counters
 
 V3:
 - Removed sse_minmax.c from Makefile.sources
 - handle the first few values without SSE until the pointer is aligned
  and use _mm_load_si128 rather than _mm_loadu_si128
 - guard the call to the SSE code better at build time
 
 V2:
 - removed GL* types
 - use _mm_store_si128() rather than _mm_store_ps()
 - add runtime check for SSE
 - use aligned attribute for local mix/max
 - bunch of tidyups
 
 Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
 ---
  src/mesa/Android.libmesa_dricore.mk |  8 ++-
  src/mesa/Android.libmesa_st_mesa.mk |  5 ++
  src/mesa/Makefile.am|  3 +-
  src/mesa/main/sse_minmax.c  | 97 
 +
  src/mesa/main/sse_minmax.h  | 30 
  src/mesa/vbo/vbo_exec_array.c   | 14 --
  6 files changed, 152 insertions(+), 5 deletions(-)
  create mode 100644 src/mesa/main/sse_minmax.c
  create mode 100644 src/mesa/main/sse_minmax.h
 
 diff --git a/src/mesa/Android.libmesa_dricore.mk 
 b/src/mesa/Android.libmesa_dricore.mk
 index 1e6d948..2ab593d 100644
 --- a/src/mesa/Android.libmesa_dricore.mk
 +++ b/src/mesa/Android.libmesa_dricore.mk
 @@ -51,10 +51,16 @@ endif # MESA_ENABLE_ASM
  
  ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
  LOCAL_SRC_FILES += \
 - $(SRCDIR)main/streaming-load-memcpy.c
 + $(SRCDIR)main/streaming-load-memcpy.c \
 + $(SRCDIR)main/sse_minmax.c
  LOCAL_CFLAGS := -msse4.1
  endif
  
 +ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
 +LOCAL_CFLAGS += \
 +   -DUSE_SSE41
 +endif
 +
  LOCAL_C_INCLUDES := \
   $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \
   $(MESA_TOP)/src \
 diff --git a/src/mesa/Android.libmesa_st_mesa.mk 
 b/src/mesa/Android.libmesa_st_mesa.mk
 index 8b8d652..618d6bf 100644
 --- a/src/mesa/Android.libmesa_st_mesa.mk
 +++ b/src/mesa/Android.libmesa_st_mesa.mk
 @@ -48,6 +48,11 @@ ifeq ($(TARGET_ARCH),x86)
  endif # x86
  endif # MESA_ENABLE_ASM
  
 +ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
 +LOCAL_CFLAGS := \
 +   -DUSE_SSE41
 +endif
 +
  LOCAL_C_INCLUDES := \
   $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \
   $(MESA_TOP)/src/gallium/auxiliary \
 diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
 index e71bccb..932db4f 100644
 --- a/src/mesa/Makefile.am
 +++ b/src/mesa/Makefile.am
 @@ -151,7 +151,8 @@ libmesagallium_la_LIBADD = \
   $(ARCH_LIBS)
  
  libmesa_sse41_la_SOURCES = \
 - main/streaming-load-memcpy.c
 + main/streaming-load-memcpy.c \
 + main/sse_minmax.c
  libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
  
  pkgconfigdir = $(libdir)/pkgconfig
 diff --git a/src/mesa/main/sse_minmax.c b/src/mesa/main/sse_minmax.c
 new file mode 100644
 index 000..91a55e5
 --- /dev/null
 +++ b/src/mesa/main/sse_minmax.c
 @@ -0,0 +1,97 @@
 +/*
 + * Copyright © 2014 Timothy Arceri
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the Software),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
 DEALINGS
 + * IN THE SOFTWARE.
 + *
 + * Author:
 + *Timothy Arceri t_arc...@yahoo.com.au
 + *
 + */
 +
 +#ifdef __SSE4_1__
 +#include main/sse_minmax.h
 +#include smmintrin.h
 +#include stdint.h
 +
 +void
 +_mesa_uint_array_min_max(const unsigned *ui_indices, unsigned *min_index,
 + unsigned *max_index, const unsigned count)
 +{
 +   unsigned max_ui = 0;
 +   unsigned min_ui = ~0U;
 +   unsigned i = 0;
 +   unsigned aligned_count = count;
 +
 +   /* handle the first few values without SSE until the pointer is aligned */
 +   while (((uintptr_t)ui_indices  15)  aligned_count) {
 +  if

Re: [Mesa-dev] [PATCH v3 3/9] gallium/auxiliary: implement sw_probe_wrapped

Hi David,

Just a few nitpicks which I've missed the previously

On 02/11/14 18:31, David Heidelberg wrote:
 Implement pipe_loader_sw_probe_wrapped which allows to use the wrapped
 software renderer backend when using the pipe loader.
 
 Signed-off-by: David Heidelberg da...@ixit.cz
 ---
  src/gallium/auxiliary/pipe-loader/pipe_loader.h| 11 +++
  src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c | 22
 ++
  src/gallium/targets/gbm/Makefile.am|  1 +
  src/gallium/targets/opencl/Makefile.am |  1 +
  src/gallium/targets/xa/Makefile.am |  1 +
  src/gallium/tests/trivial/Makefile.am  |  1 +
  6 files changed, 37 insertions(+)
 
 diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader.h
 b/src/gallium/auxiliary/pipe-loader/pipe_loader.h
 index 6127a6a..9f43f17 100644
 --- a/src/gallium/auxiliary/pipe-loader/pipe_loader.h
 +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader.h
 @@ -166,6 +166,17 @@ pipe_loader_sw_probe_null(struct pipe_loader_device
 **devs);
  int
  pipe_loader_sw_probe(struct pipe_loader_device **devs, int ndev);
  +/**
 + * Get a software device wrapped atop another device.
 + *
 + * This function is platform-specific.
 + *
 + * \sa pipe_loader_probe
 + */
 +boolean
 +pipe_loader_sw_probe_wrapped(struct pipe_loader_device **dev,
 + struct pipe_screen *screen);
 +
  #ifdef HAVE_PIPE_LOADER_DRM
   /**
 diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c
 b/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c
 index b1b1ca6..b152f60 100644
 --- a/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c
 +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c
 @@ -29,8 +29,11 @@
   #include util/u_memory.h
  #include util/u_dl.h
 +#ifdef HAVE_PIPE_LOADER_DRI
  #include sw/dri/dri_sw_winsys.h
 +#endif
Does this bring any benefit ? Afaics the header does not pull any
dependencies - drm, Xlib, etc so it should be safe as is.

  #include sw/null/null_sw_winsys.h
 +#include sw/wrapper/wrapper_sw_winsys.h
  #ifdef HAVE_PIPE_LOADER_XLIB
  /* Explicitly wrap the header to ease build without X11 headers */
  #include sw/xlib/xlib_sw_winsys.h
 @@ -140,6 +143,25 @@ pipe_loader_sw_probe(struct pipe_loader_device
 **devs, int ndev)
 return i;
  }
  +boolean
 +pipe_loader_sw_probe_wrapped(struct pipe_loader_device **dev,
 + struct pipe_screen *screen)
 +{
 +   struct pipe_loader_sw_device *sdev =
 CALLOC_STRUCT(pipe_loader_sw_device);
 +
Can we check for CALLOC_STRUCT failure ?

 +   sdev-base.type = PIPE_LOADER_DEVICE_SOFTWARE;
 +   sdev-base.driver_name = swrast;
 +   sdev-base.ops = pipe_loader_sw_ops;
 +   sdev-ws = wrapper_sw_winsys_wrap_pipe_screen(screen);
 +
 +   if (!sdev-ws) {
 +  FREE(sdev);
 +  return FALSE;
 +   }
 +   *dev = sdev-base;
 +   return TRUE;
 +}
 +
  static void
  pipe_loader_sw_release(struct pipe_loader_device **dev)
  {
 diff --git a/src/gallium/targets/gbm/Makefile.am
 b/src/gallium/targets/gbm/Makefile.am
 index 2c9b425..679c994 100644
 --- a/src/gallium/targets/gbm/Makefile.am
 +++ b/src/gallium/targets/gbm/Makefile.am
 @@ -34,6 +34,7 @@ gbm_gallium_drm_la_SOURCES =
  gbm_gallium_drm_la_LIBADD = \
  $(top_builddir)/src/gallium/state_trackers/gbm/libgbm.la \
  $(top_builddir)/src/gallium/auxiliary/libgallium.la \
 +$(top_builddir)/src/gallium/winsys/sw/wrapper/libwsw.la \
You can avoid adding this for each target by add it to
GALLIUM_PIPE_LOADER_WINSYS_LIBS. Something like the following would do
the job.


--- a/src/gallium/Automake.inc
+++ b/src/gallium/Automake.inc

GALLIUM_PIPE_LOADER_WINSYS_LIBS = \
+  $(top_builddir)/src/gallium/winsys/sw/wrapper/libwsw.la \
   $(top_builddir)/src/gallium/winsys/sw/null/libws_null.la


Cheers,
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/3][RFC v2] Clamp rgba floats with sse

2014-11-06 Thread Juha-Pekka Heikkila

On 05.11.2014 17:25, Emil Velikov wrote:
 Hi Juha-Pekka,
 
 On 04/11/14 12:05, Juha-Pekka Heikkila wrote:
 [snip]
 I made 'x86' folder under
 src/mesa/main. The idea here being if there is optimization targeting
 architecture it'd exist directly under the place where it was used, in its
 own subdirectly indicating targeted architecture. I don't think majority
 of such code would be generic code thus this approach.

 IMHO adding x86 brings no benefit at this point considering all the
 optimisations are x86 based. Yet if you insist on having it can you move
 the other optimisations in there please.

This was just following the comments I got for my first rfc round for
this sse stuff, comments there reached consensus these should be
collected somewhere separate. I'll remove the x86 folder, if there are
later wishes to separate these to their own location I let someone else
continue with it.

/Juha-Pekka

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Fix an off-by-1 error in the draw upload code's size calculation.

Hi Ken,

From what I've gathered the proposed patch is incorrect and is (most
likely) working around a buggy application behaviour. Afaics Ian
suggested that we add a driconf option for such applications.

Should I consider this patch for the stable branch or the above sounds
about right and we can drop it ?

Thanks
Emil

On 14/10/14 23:42, Kenneth Graunke wrote:
 According to INTEL_DEBUG=perf, Borderlands: The Pre-Sequel was
 stalling on nearly every glBufferSubData call, with very slightly
 overlapping busy ranges.
 
 It turns out the draw upload code was accidentally including an extra
 stride's worth of data in the vertex buffer size due to a simple
 off-by-one error.  We considered this extra bit of buffer space to be
 busy (in use by the GPU), when it was actually idle.
 
 The new diagram should make it easier to understand the formula.  It's
 basically what I drew on paper when working through an actual
 glDrawRangeElements call.
 
 Eliminates all glBufferSubData stalls in Borderlands: The Pre-Sequel.
 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 Cc: mesa-sta...@lists.freedesktop.org
 ---
  src/mesa/drivers/dri/i965/brw_draw_upload.c | 22 +-
  1 file changed, 21 insertions(+), 1 deletion(-)
 
 No Piglit regressions on Haswell.  This might help Dota 2 and Serious Sam 3
 as well, but I haven't checked.
 
 diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c 
 b/src/mesa/drivers/dri/i965/brw_draw_upload.c
 index 5a12439..6cb653c 100644
 --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c
 +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c
 @@ -486,8 +486,28 @@ brw_prepare_vertices(struct brw_context *brw)
offset = 0;
size = intel_buffer-Base.Size;
 } else {
 +  /* Compute the size/amount of data referenced by the GPU.
 +   * If the data is interleaved, StrideB may be larger than
 +   * _ElementSize.  As an example, assume we have 2 
 interleaved
 +   * attributes A and B.  The data is organized like this:
 +   *
 +   *   StrideEltSize
 +   *_,,_,
 +   *   /\  / \
 +   *A: ---   ---   ---   ---   ---   ---
 +   *B:---   ---   ---   ---   ---   ---
 +   *
 +   *   |= 4 elts ==|  (4-1) * Stride + EltSize
 +   *
 +   * max_index - min_index gives the number of elements that
 +   * will be referenced.  Say we're drawing 4 elements.  On
 +   * the first three, we need the full stride in order to get
 +   * to the next element.  But on the last, we only want the
 +   * element size, since we don't actually read the other
 +   * interleaved vertex attributes stored beyond that.
 +   */
offset = buffer-offset + min_index * buffer-stride;
 -  size = (buffer-stride * (max_index - min_index) +
 +  size = (buffer-stride * MAX2(max_index - min_index - 1, 
 0) +
glarray-_ElementSize);
 }
  }
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 85918] Mesa: MSVC 2010/2012 Compile error

https://bugs.freedesktop.org/show_bug.cgi?id=85918

--- Comment #4 from Emil Velikov emil.l.veli...@gmail.com ---
Roland,
Indeed this issue is not related (does not seem) to the verison of msvc yet
it's a nice reminder about the topic, plus an humble ping for Michael :)

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 85918] Mesa: MSVC 2010/2012 Compile error

https://bugs.freedesktop.org/show_bug.cgi?id=85918

Emil Velikov emil.l.veli...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
Version|unspecified |10.3
 Resolution|--- |FIXED

--- Comment #5 from Emil Velikov emil.l.veli...@gmail.com ---
Pushed to the 10.3 branch. Issue will be fixed in mesa 10.3.3.

commit d5700dc276008decb2a5d63bfa38522c5f4ad3f3
Author: Brian Paul bri...@vmware.com
Date:   Wed Sep 10 08:16:24 2014 -0600

mesa: fix UNCLAMPED_FLOAT_TO_UBYTE() macro for MSVC

MSVC replaces the F in 255.0F with the macro argument which leads
to an error.  s/F/FLT/ to avoid that.

It turns out we weren't using this macro at all on MSVC until the
recent mesa: Drop USE_IEEE define. change.

Reviewed-by: Roland Scheidegger srol...@vmware.com
(cherry picked from commit 9608193cbc6ea14e49adcd0193f9e7c6058d5a2f)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85918
Nominated-by: Roland Scheidegger srol...@vmware.com

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] ARB_texture_buffer_range offsets

2014-11-06 Thread Marek Olšák

I'd say it's a spec bug. ARB_texture_buffer_range should say that the
offset should be a multiple of an element size, but it doesn't. The
question is, what should the element size be? One component or the
whole pixel?

Marek

On Wed, Nov 5, 2014 at 9:08 PM, Roland Scheidegger srol...@vmware.com wrote:
 Trying to fix some bug due to alignment issues in llvmpipe's vertex
 fetch, I came across some issue with ARB_texture_buffer_range.
 Namely, it looks like the offsets specified there are always in bytes,
 regardless the actual format (hence, as long as the
 TEXTURE_BUFFER_OFFSET_ALIGNMENT is 1, it would be allowed to have an
 offset of 15 bytes for a rgba32f format for instance making all fetches
 quite unaligned).
 However in gallium we actually have first_elem and last_elem parameters
 in the sampler views which are specified in number of elements (so takes
 the format into account), which is what d3d10 does and the state tracker
 translates to that apparently. IMHO d3d10 makes way more sense there
 because that way the necessary alignment scales automatically depending
 on the format (so, if the format is 2x16bit for instance you'd need 4
 byte alignment for the offset, and only need 16 bytes alignment for
 4x32bit, ensuring all lookups are always aligned). This means that 15
 byte offset in the example above is completely untranslatable.
 But if I see that right, OpenGL doesn't work like that, meaning
 effectively gallium drivers (and I doubt most other drivers neither)
 cannot actually claim to support TEXTURE_BUFFER_OFFSET_ALIGNMENT lower
 than 16, even if they'd only need that for 4x32bit formats. Though most
 gallium drivers indeed claim 1 right now.
 Looks quite messy...

 Roland
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 85918] Mesa: MSVC 2010/2012 Compile error

https://bugs.freedesktop.org/show_bug.cgi?id=85918

--- Comment #6 from Michael Bachmann mr.bachm...@web.de ---
Thank you for taking care about this issue and for including the fix in Version
10.3.3.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] ARB_texture_buffer_range offsets

2014-11-06 Thread Roland Scheidegger

Am 06.11.2014 um 16:15 schrieb Marek Olšák:
I'd say it's a spec bug. ARB_texture_buffer_range should say that the
offset should be a multiple of an element size, but it doesn't. The
question is, what should the element size be? One component or the
whole pixel?
Imho whole pixel (for block compressed that would be full block, for
things like packed 565 too but neither are possible in GL), i.e. format
granularity. That said, the whole alignment thing is problematic for
rgb32 (and the possiblity of that was added later,
ARB_texture_buffer_object_rgb32), so maybe it's things like that why the
offset can be just byte aligned (in other words, I'm not convinced it's
just a spec bug, d3d10 doesn't have that problem with alignment).

Roland

Marek

On Wed, Nov 5, 2014 at 9:08 PM, Roland Scheidegger srol...@vmware.com wrote:
Trying to fix some bug due to alignment issues in llvmpipe's vertex
fetch, I came across some issue with ARB_texture_buffer_range.
Namely, it looks like the offsets specified there are always in bytes,
regardless the actual format (hence, as long as the
TEXTURE_BUFFER_OFFSET_ALIGNMENT is 1, it would be allowed to have an
offset of 15 bytes for a rgba32f format for instance making all fetches
quite unaligned).
However in gallium we actually have first_elem and last_elem parameters
in the sampler views which are specified in number of elements (so takes
the format into account), which is what d3d10 does and the state tracker
translates to that apparently. IMHO d3d10 makes way more sense there
because that way the necessary alignment scales automatically depending
on the format (so, if the format is 2x16bit for instance you'd need 4
byte alignment for the offset, and only need 16 bytes alignment for
4x32bit, ensuring all lookups are always aligned). This means that 15
byte offset in the example above is completely untranslatable.
But if I see that right, OpenGL doesn't work like that, meaning
effectively gallium drivers (and I doubt most other drivers neither)
cannot actually claim to support TEXTURE_BUFFER_OFFSET_ALIGNMENT lower
than 16, even if they'd only need that for 4x32bit formats. Though most
gallium drivers indeed claim 1 right now.
Looks quite messy...

Roland
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddevd=AAIBaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_Im=Ds_jdCUhL1dGXrkeea1fzl6_iInrZFJOSltaM6dlF9ws=BNwWkIpsz9GFgPRoMLDU8tEVUPzmIxKINN3Uu9evnXse=

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/1] clover: Fix build after llvm r221375

2014-11-06 Thread Jan Vesely

Signed-off-by: Jan Vesely jan.ves...@rutgers.edu
---
 src/gallium/state_trackers/clover/llvm/invocation.cpp | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
b/src/gallium/state_trackers/clover/llvm/invocation.cpp
index e953822..3a4fcf0 100644
--- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
+++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
@@ -282,7 +282,11 @@ namespace {
 
   for (unsigned i = 0; i  kernel_node-getNumOperands(); ++i) {
  kernels.push_back(llvm::dyn_castllvm::Function(
+#if HAVE_LLVM = 0x0306
+
kernel_node-getOperandAsMDNode(i)-getOperand(0)));
+#else
 
kernel_node-getOperand(i)-getOperand(0)));
+#endif
   }
}
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] ARB_texture_buffer_range offsets

2014-11-06 Thread Marek Olšák

For radeonsi, I think only x8, x8y8, and x16 fetches can be
byte-aligned. Everything else is dword-aligned (the 2 lowest bits are
ignored). I guess the cap should be 4 then.

Marek

On Thu, Nov 6, 2014 at 4:55 PM, Roland Scheidegger srol...@vmware.com wrote:
Am 06.11.2014 um 16:15 schrieb Marek Olšák:
I'd say it's a spec bug. ARB_texture_buffer_range should say that the
offset should be a multiple of an element size, but it doesn't. The
question is, what should the element size be? One component or the
whole pixel?
Imho whole pixel (for block compressed that would be full block, for
things like packed 565 too but neither are possible in GL), i.e. format
granularity. That said, the whole alignment thing is problematic for
rgb32 (and the possiblity of that was added later,
ARB_texture_buffer_object_rgb32), so maybe it's things like that why the
offset can be just byte aligned (in other words, I'm not convinced it's
just a spec bug, d3d10 doesn't have that problem with alignment).

Roland

Marek

On Wed, Nov 5, 2014 at 9:08 PM, Roland Scheidegger srol...@vmware.com
wrote:
Trying to fix some bug due to alignment issues in llvmpipe's vertex
fetch, I came across some issue with ARB_texture_buffer_range.
Namely, it looks like the offsets specified there are always in bytes,
regardless the actual format (hence, as long as the
TEXTURE_BUFFER_OFFSET_ALIGNMENT is 1, it would be allowed to have an
offset of 15 bytes for a rgba32f format for instance making all fetches
quite unaligned).
However in gallium we actually have first_elem and last_elem parameters
in the sampler views which are specified in number of elements (so takes
the format into account), which is what d3d10 does and the state tracker
translates to that apparently. IMHO d3d10 makes way more sense there
because that way the necessary alignment scales automatically depending
on the format (so, if the format is 2x16bit for instance you'd need 4
byte alignment for the offset, and only need 16 bytes alignment for
4x32bit, ensuring all lookups are always aligned). This means that 15
byte offset in the example above is completely untranslatable.
But if I see that right, OpenGL doesn't work like that, meaning
effectively gallium drivers (and I doubt most other drivers neither)
cannot actually claim to support TEXTURE_BUFFER_OFFSET_ALIGNMENT lower
than 16, even if they'd only need that for 4x32bit formats. Though most
gallium drivers indeed claim 1 right now.
Looks quite messy...

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/4] i965: Add code to verify the CFG is sane.

2014-11-06 Thread Jason Ekstrand

In general, it seems as if this can miss several things.  For instance, it
checks that all the predicessors are valid but never that we have all the
predecessors.  Same for successors.  If we really want to be able to
validate a CFG, maybe a stack-based approach like calculate_cfg would work
better?  Also, did you run this on piglit/shader-db to ensure that
everything coming out of calculate_cfg actually passes?

More comments inline

On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner matts...@gmail.com wrote:

 ---
  src/mesa/drivers/dri/i965/test_verify_cfg.cpp | 273
 ++
  src/mesa/drivers/dri/i965/test_verify_cfg.h   |  26 +++
  2 files changed, 299 insertions(+)
  create mode 100644 src/mesa/drivers/dri/i965/test_verify_cfg.cpp
  create mode 100644 src/mesa/drivers/dri/i965/test_verify_cfg.h

 diff --git a/src/mesa/drivers/dri/i965/test_verify_cfg.cpp
 b/src/mesa/drivers/dri/i965/test_verify_cfg.cpp
 new file mode 100644
 index 000..0aa74c5
 --- /dev/null
 +++ b/src/mesa/drivers/dri/i965/test_verify_cfg.cpp
 @@ -0,0 +1,273 @@
 +/*
 + * Copyright © 2014 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the
 Software),
 + * to deal in the Software without restriction, including without
 limitation
 + * the rights to use, copy, modify, merge, publish, distribute,
 sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the
 next
 + * paragraph) shall be included in all copies or substantial portions of
 the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
 EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
 SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
 OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 DEALINGS
 + * IN THE SOFTWARE.
 + */
 +
 +#include gtest/gtest.h
 +#include test_verify_cfg.h
 +#include brw_cfg.h
 +
 +static bool
 +is_unconditional_jump(const backend_instruction *inst)
 +{
 +   return (inst-opcode == BRW_OPCODE_BREAK ||
 +   inst-opcode == BRW_OPCODE_CONTINUE ||
 +   inst-opcode == BRW_OPCODE_WHILE) 
 +  inst-predicate == BRW_PREDICATE_NONE;
 +}
 +
 +void
 +verify_cfg(backend_visitor *v)
 +{
 +   foreach_block(block, v-cfg) {
 +  switch (block-start()-opcode) {
 +  case BRW_OPCODE_ENDIF: {
 + /* Has two predecessors:
 +  *- the previous block is always a predecessor
 +  *- always a predecessor ending in an IF or an ELSE
 +  *
 +  * Note that if the body of the if block is empty, then the
 +  * previous block *is* the block that ends with IF, so the ENDIF
 +  * block will have the same predecessor twice.
 +  */
 + if (is_unconditional_jump(block-prev()-end())) {
 +EXPECT_EQ(block-parents.length(), 1u);
 + } else {
 +EXPECT_EQ(block-parents.length(), 2u);
 + }


This obviously catches

if (foo) {
/* Do stuff */
} else {
break;
}

but what about

if (foo) {
break;
} else {
/* Do stuff */
}

Or worse, what about

if (foo) {
break;
} else {
continue;
}

Maybe there's something I'm missing here.


 +
 + bool found_prev = false, found_if = false, found_else = false,
 +  found_other = false;
 + foreach_list_typed(bblock_link, parent, link, block-parents) {
 +if (parent-block == block-prev()  !found_prev)
 +   found_prev = true;
 +else if (parent-block-end()-opcode == BRW_OPCODE_IF)
 +   found_if = true;
 +else if (parent-block-end()-opcode == BRW_OPCODE_ELSE)
 +   found_else = true;
 +else
 +   found_other = true;
 + }


how do we detect if a parent is simply missing?  Or are we trusting in the
check above?


 + EXPECT_NE(found_prev,
 is_unconditional_jump(block-prev()-end()));
 + EXPECT_NE(found_if, found_else);
 + EXPECT_FALSE(found_other);
 + break;
 +  }
 +  case BRW_OPCODE_DO: {
 + /* Has two or more predecessors
 +  *- the previous block is always a predecessor
 +  *- always a predecessor ending in a WHILE
 +  *- some number of predecessors ending in continue
 +  */
 + if (is_unconditional_jump(block-prev()-end())) {
 +EXPECT_GE(block-parents.length(), 1u);
 + } else {
 +EXPECT_GE(block-parents.length(), 2u);
 + }
 +
 + bool found_prev = false,

Re: [Mesa-dev] [PATCH 1/1] clover: Fix build after llvm r221375

2014-11-06 Thread Tom Stellard

On Thu, Nov 06, 2014 at 11:46:41AM -0500, Jan Vesely wrote:
 Signed-off-by: Jan Vesely jan.ves...@rutgers.edu

I've pushed this, thanks!

-Tom

 ---
  src/gallium/state_trackers/clover/llvm/invocation.cpp | 4 
  1 file changed, 4 insertions(+)
 
 diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
 b/src/gallium/state_trackers/clover/llvm/invocation.cpp
 index e953822..3a4fcf0 100644
 --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
 +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
 @@ -282,7 +282,11 @@ namespace {
  
for (unsigned i = 0; i  kernel_node-getNumOperands(); ++i) {
   kernels.push_back(llvm::dyn_castllvm::Function(
 +#if HAVE_LLVM = 0x0306
 +
 kernel_node-getOperandAsMDNode(i)-getOperand(0)));
 +#else
  
 kernel_node-getOperand(i)-getOperand(0)));
 +#endif
}
 }
  
 -- 
 1.9.3
 
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/4] i965/cfg: Add functions to get first and last non-CF instructions.

2014-11-06 Thread Jason Ekstrand

1 and 4 are Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com

On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner matts...@gmail.com wrote:

 ---
  src/mesa/drivers/dri/i965/brw_cfg.h | 74
 +
  1 file changed, 74 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h
 b/src/mesa/drivers/dri/i965/brw_cfg.h
 index e6403bd..c2029cc 100644
 --- a/src/mesa/drivers/dri/i965/brw_cfg.h
 +++ b/src/mesa/drivers/dri/i965/brw_cfg.h
 @@ -71,6 +71,12 @@ struct bblock_t {
 const bblock_t *next() const;
 bblock_t *prev();
 const bblock_t *prev() const;
 +
 +   bool starts_with_control_flow() const;
 +   bool ends_with_control_flow() const;
 +
 +   backend_instruction *first_non_control_flow_inst();
 +   backend_instruction *last_non_control_flow_inst();
  #endif

 struct exec_node link;
 @@ -142,6 +148,50 @@ bblock_prev_const(const struct bblock_t *block)
 return (const struct bblock_t *)block-link.prev;
  }

 +static inline bool
 +bblock_starts_with_control_flow(const struct bblock_t *block)
 +{
 +   enum opcode op = bblock_start_const(block)-opcode;
 +   return op == BRW_OPCODE_DO || op == BRW_OPCODE_ENDIF;
 +}
 +
 +static inline bool
 +bblock_ends_with_control_flow(const struct bblock_t *block)
 +{
 +   enum opcode op = bblock_end_const(block)-opcode;
 +   return op == BRW_OPCODE_IF ||
 +  op == BRW_OPCODE_ELSE ||
 +  op == BRW_OPCODE_WHILE ||
 +  op == BRW_OPCODE_BREAK ||
 +  op == BRW_OPCODE_CONTINUE;
 +}
 +
 +static inline struct backend_instruction *
 +bblock_first_non_control_flow_inst(struct bblock_t *block)
 +{
 +   struct backend_instruction *inst = bblock_start(block);
 +   if (bblock_starts_with_control_flow(block))
 +#ifdef __cplusplus
 +  inst = (struct backend_instruction *)inst-next;
 +#else
 +  inst = (struct backend_instruction *)inst-link.next;
 +#endif
 +   return inst;
 +}
 +
 +static inline struct backend_instruction *
 +bblock_last_non_control_flow_inst(struct bblock_t *block)
 +{
 +   struct backend_instruction *inst = bblock_end(block);
 +   if (bblock_ends_with_control_flow(block))
 +#ifdef __cplusplus
 +  inst = (struct backend_instruction *)inst-prev;
 +#else
 +  inst = (struct backend_instruction *)inst-link.prev;
 +#endif
 +   return inst;
 +}
 +
  #ifdef __cplusplus
  inline backend_instruction *
  bblock_t::start()
 @@ -190,6 +240,30 @@ bblock_t::prev() const
  {
 return bblock_prev_const(this);
  }
 +
 +inline bool
 +bblock_t::starts_with_control_flow() const
 +{
 +   return bblock_starts_with_control_flow(this);
 +}
 +
 +inline bool
 +bblock_t::ends_with_control_flow() const
 +{
 +   return bblock_ends_with_control_flow(this);
 +}
 +
 +inline backend_instruction *
 +bblock_t::first_non_control_flow_inst()
 +{
 +   return bblock_first_non_control_flow_inst(this);
 +}
 +
 +inline backend_instruction *
 +bblock_t::last_non_control_flow_inst()
 +{
 +   return bblock_last_non_control_flow_inst(this);
 +}
  #endif

  struct cfg_t {
 --
 2.0.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 6/6] st/va: MPEG4 call vlVaDecoderFixMPEG4Startcode()

From: Michael Varga michael.va...@amd.com

If the VOP and GOV headers were truncated they will be regenerated.

Signed-off-by: Michael Varga michael.va...@amd.com
---
 src/gallium/state_trackers/va/picture.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 7107231..9b94b39 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -503,6 +503,13 @@ handleVASliceDataBufferType(vlVaContext *context, 
vlVaBuffer *buf)
  buffers[num_buffers] = (void *const)start_code_vc1;
  sizes[num_buffers++] = sizeof(start_code_vc1);
   break;
+   case PIPE_VIDEO_FORMAT_MPEG4:
+  if (bufHasStartcode(buf, 0x01, 24))
+ break;
+
+  vlVaDecoderFixMPEG4Startcode(context);
+  buffers[num_buffers] = (void *)context-mpeg4.start_code;
+  sizes[num_buffers++] = context-mpeg4.start_code_size;
default:
   break;
}
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] mesa: add runtime support for SSSE3

On Thu, Nov 6, 2014 at 4:18 AM, Timothy Arceri t_arc...@yahoo.com.au wrote:
 Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
 ---
  configure.ac   | 6 ++
  src/mesa/x86/common_x86.c  | 4 
  src/mesa/x86/common_x86_features.h | 4 +++-
  3 files changed, 13 insertions(+), 1 deletion(-)

 diff --git a/configure.ac b/configure.ac
 index 03f1bca..cc0a661 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -258,6 +258,12 @@ if test x$SSE41_SUPPORTED = x1; then
  fi
  AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])

 +AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0])
 +if test x$SSSE3_SUPPORTED = x1; then
 +DEFINES=$DEFINES -DUSE_SSSE3
 +fi
 +AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1])
 +
  dnl Can't have static and shared libraries, default to static if user
  dnl explicitly requested. If both disabled, set to static since shared
  dnl was explicitly requested.
 diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c
 index 25f5c40..ad0648a 100644
 --- a/src/mesa/x86/common_x86.c
 +++ b/src/mesa/x86/common_x86.c
 @@ -352,6 +352,10 @@ _mesa_get_x86_features(void)

__get_cpuid(1, eax, ebx, ecx, edx);

 +#ifdef bit_SSSE3

Why did you put this in an #ifdef bit_SSSE3? That's defined everywhere.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/6] st/va: MPEG4 populate the PPS structure

From: Michael Varga michael.va...@amd.com

Signed-off-by: Michael Varga michael.va...@amd.com
---
 src/gallium/state_trackers/va/picture.c| 72 ++
 src/gallium/state_trackers/va/va_private.h |  9 
 2 files changed, 81 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 8775681..a4eb26b 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -80,6 +80,12 @@ handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext 
*context, vlVaBuffer *
VAPictureParameterBufferMPEG2 *mpeg2;
VAPictureParameterBufferH264 *h264;
VAPictureParameterBufferVC1 * vc1;
+   VAPictureParameterBufferMPEG4 *mpeg4;
+   vlVaSurface *surf_forward;
+   vlVaSurface *surf_backward;
+   unsigned int i;
+   static const uint8_t default_intra_quant_matrix[64] = { 0 };
+   static const uint8_t default_non_intra_quant_matrix[64] = { 0 };
 
switch (u_reduce_video_profile(context-decoder-profile)) {
case PIPE_VIDEO_FORMAT_MPEG12:
@@ -214,6 +220,72 @@ handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext 
*context, vlVaBuffer *
   context-desc.vc1.pquant = 
vc1-pic_quantizer_fields.bits.pic_quantizer_scale;
   break;
 
+   case PIPE_VIDEO_FORMAT_MPEG4:
+  assert(buf-size = sizeof(VAPictureParameterBufferMPEG4)  
buf-num_elements == 1);
+  mpeg4 = buf-data;
+
+  context-mpeg4.pps = *mpeg4;
+
+  /* vop_width */
+  /* vop_height */
+  /* forward_reference_picture */
+  /* backward_reference_picture */
+  context-desc.mpeg4.short_video_header =
+mpeg4-vol_fields.bits.short_video_header;
+  /* chroma_format */
+  context-desc.mpeg4.interlaced = mpeg4-vol_fields.bits.interlaced;
+  /* obmc_disable */
+  /* sprite_enable */
+  /* sprite_warping_accuracy */
+  context-desc.mpeg4.quant_type = mpeg4-vol_fields.bits.quant_type;
+  context-desc.mpeg4.quarter_sample = 
mpeg4-vol_fields.bits.quarter_sample;
+  /* data_partitioned */
+  /* reversible_vlc */
+  context-desc.mpeg4.resync_marker_disable =
+mpeg4-vol_fields.bits.resync_marker_disable;
+  /* no_of_sprite_warping_points */
+  /* sprite_trajectory_du */
+  /* sprite_trajectory_dv */
+  /* quant_precision */
+  context-desc.mpeg4.vop_coding_type = 
mpeg4-vop_fields.bits.vop_coding_type;
+  /* backward_reference_vop_coding_type */
+  /* vop_rounding_type */
+  /* intra_dc_vlc_thr */
+  context-desc.mpeg4.top_field_first =
+mpeg4-vop_fields.bits.top_field_first;
+  context-desc.mpeg4.alternate_vertical_scan_flag =
+mpeg4-vop_fields.bits.alternate_vertical_scan_flag;
+  context-desc.mpeg4.vop_fcode_forward = mpeg4-vop_fcode_forward;
+  context-desc.mpeg4.vop_fcode_backward = mpeg4-vop_fcode_backward;
+  context-desc.mpeg4.vop_time_increment_resolution =
+mpeg4-vop_time_increment_resolution;
+  /* num_gobs_in_vop */
+  /* num_macroblocks_in_gob */
+  context-desc.mpeg4.trb[0] = mpeg4-TRB;
+  context-desc.mpeg4.trb[1] = mpeg4-TRB;
+  context-desc.mpeg4.trd[0] = mpeg4-TRD;
+  context-desc.mpeg4.trd[1] = mpeg4-TRD;
+
+  /* default [non-]intra quant matrix because mpv does not set these
+ matrices */
+  if (!context-desc.mpeg4.intra_matrix)
+ context-desc.mpeg4.intra_matrix = default_intra_quant_matrix;
+  if (!context-desc.mpeg4.non_intra_matrix)
+ context-desc.mpeg4.non_intra_matrix = default_non_intra_quant_matrix;
+
+  surf_forward = handle_table_get(drv-htab, 
mpeg4-forward_reference_picture);
+  if (surf_forward)
+ context-desc.mpeg4.ref[0] = surf_forward-buffer;
+  surf_backward = handle_table_get(drv-htab, 
mpeg4-backward_reference_picture);
+  if (surf_backward)
+ context-desc.mpeg4.ref[1] = surf_backward-buffer;
+
+  context-mpeg4.vti_bits = 0;
+  for (i = context-desc.mpeg4.vop_time_increment_resolution; i  0; i /= 
2)
+ ++context-mpeg4.vti_bits;
+
+  break;
+
default:
   break;
}
diff --git a/src/gallium/state_trackers/va/va_private.h 
b/src/gallium/state_trackers/va/va_private.h
index 060a1fa..7d2fc24 100644
--- a/src/gallium/state_trackers/va/va_private.h
+++ b/src/gallium/state_trackers/va/va_private.h
@@ -162,6 +162,15 @@ typedef struct {
   struct pipe_vc1_picture_desc vc1;
   struct pipe_h264_picture_desc h264;
} desc;
+
+   struct {
+  unsigned long long int frame_num;
+  unsigned int start_code_size;
+  unsigned int vti_bits;
+  unsigned int quant_scale;
+  VAPictureParameterBufferMPEG4 pps;
+  uint8_t start_code[32];
+   } mpeg4;
 } vlVaContext;
 
 typedef struct {
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

On Thu, Nov 6, 2014 at 4:20 AM, Timothy Arceri t_arc...@yahoo.com.au wrote:
 Also cleans up some if statements in the *faster functions.

 Callgrind cpu usage results from pts benchmarks:

 For ytile_copy_faster()

 Nexuiz 1.6.1: 2.16% - 1.20%

 Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
 ---
  src/mesa/Makefile.am   |  8 +++
  src/mesa/drivers/dri/i965/intel_tex_subimage.c | 82 
 ++
  src/mesa/main/fast_rgba8_copy.c| 78 
  src/mesa/main/fast_rgba8_copy.h| 37 
  4 files changed, 141 insertions(+), 64 deletions(-)
  create mode 100644 src/mesa/main/fast_rgba8_copy.c
  create mode 100644 src/mesa/main/fast_rgba8_copy.h

 diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
 index e71bccb..2402096 100644
 --- a/src/mesa/Makefile.am
 +++ b/src/mesa/Makefile.am
 @@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS)

  ARCH_LIBS =

 +if SSSE3_SUPPORTED
 +ARCH_LIBS += libmesa_ssse3.la
 +endif
 +
  if SSE41_SUPPORTED
  ARCH_LIBS += libmesa_sse41.la
  endif
 @@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \
 main/streaming-load-memcpy.c
  libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1

 +libmesa_ssse3_la_SOURCES = \
 +   main/fast_rgba8_copy.c
 +libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3
 +
  pkgconfigdir = $(libdir)/pkgconfig
  pkgconfig_DATA = gl.pc

 diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
 b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
 index cb5738a..0deeb75 100644
 --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
 +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
 @@ -27,6 +27,7 @@
   **/

  #include main/bufferobj.h
 +#include main/fast_rgba8_copy.h
  #include main/image.h
  #include main/macros.h
  #include main/mtypes.h
 @@ -42,9 +43,7 @@
  #include intel_mipmap_tree.h
  #include intel_blit.h

 -#ifdef __SSSE3__
 -#include tmmintrin.h
 -#endif
 +#include x86/common_x86_asm.h

  #define FILE_DEBUG_FLAG DEBUG_TEXTURE

 @@ -175,18 +174,6 @@ err:
 return false;
  }

 -#ifdef __SSSE3__
 -static const uint8_t rgba8_permutation[16] =
 -   { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };
 -
 -/* NOTE: dst must be 16 byte aligned */
 -#define rgba8_copy_16(dst, src) \
 -   *(__m128i *)(dst) = _mm_shuffle_epi8(\
 -  (__m128i) _mm_loadu_ps((float *)(src)),   \
 -  *(__m128i *) rgba8_permutation\
 -   )
 -#endif
 -
  /**
   * Copy RGBA to BGRA - swap R and B.
   */
 @@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes)
 uint8_t *d = dst;
 uint8_t const *s = src;

 -#ifdef __SSSE3__
 -   /* Fast copying for tile spans.
 -*
 -* As long as the destination texture is 16 aligned,
 -* any 16 or 64 spans we get here should also be 16 aligned.
 -*/
 -
 -   if (bytes == 16) {
 -  assert(!(((uintptr_t)dst)  0xf));
 -  rgba8_copy_16(d+ 0, s+ 0);
 -  return dst;
 -   }
 -
 -   if (bytes == 64) {
 -  assert(!(((uintptr_t)dst)  0xf));
 -  rgba8_copy_16(d+ 0, s+ 0);
 -  rgba8_copy_16(d+16, s+16);
 -  rgba8_copy_16(d+32, s+32);
 -  rgba8_copy_16(d+48, s+48);
 -  return dst;
 -   }
 -#endif
 -
 while (bytes = 4) {
d[0] = s[2];
d[1] = s[1];
 @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, 
 uint32_t x3,
mem_copy_fn mem_copy)
  {
 if (x0 == 0  x3 == xtile_width  y0 == 0  y1 == xtile_height) {
 -  if (mem_copy == memcpy)
 - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
 -   dst, src, src_pitch, swizzle_bit, memcpy);
 -  else if (mem_copy == rgba8_copy)
 - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
 -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
 -   } else {
 -  if (mem_copy == memcpy)
 - return xtile_copy(x0, x1, x2, x3, y0, y1,
 -   dst, src, src_pitch, swizzle_bit, memcpy);
 -  else if (mem_copy == rgba8_copy)
 - return xtile_copy(x0, x1, x2, x3, y0, y1,
 -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
 +  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
 +dst, src, src_pitch, swizzle_bit, mem_copy);
 }
 xtile_copy(x0, x1, x2, x3, y0, y1,
dst, src, src_pitch, swizzle_bit, mem_copy);
 @@ -388,19 +341,8 @@ ytile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, 
 uint32_t x3,
mem_copy_fn mem_copy)
  {
 if (x0 == 0  x3 == ytile_width  y0 == 0  y1 == ytile_height) {
 -  if (mem_copy == memcpy)
 - return ytile_copy(0, 0, ytile_width, ytile_width, 0, ytile_height,
 -   dst, src, src_pitch, swizzle_bit, memcpy);
 -  else if (mem_copy ==

Re: [Mesa-dev] [PATCH 2/4] i965/fs: Wire up control flow correctly in predicated break pass.

2014-11-06 Thread Jason Ekstrand

On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner matts...@gmail.com wrote:

 When the earlier block ended with control flow, we'd mistakenly remove
 some of its links to its children. The same happened with the later
 block.
 ---
  src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp | 10
 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

 diff --git
 a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
 b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
 index b7a1d7e..047c2c0 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
 @@ -107,10 +107,14 @@ fs_visitor::opt_peephole_predicated_break()
}
endif_inst-remove(endif_block);

 -  earlier_block-children.make_empty();
 -  later_block-parents.make_empty();
 +  if (!earlier_block-ends_with_control_flow()) {
 + earlier_block-children.make_empty();
 + earlier_block-add_successor(cfg-mem_ctx, jump_block);
 +  }

 -  earlier_block-add_successor(cfg-mem_ctx, jump_block);
 +  if (!later_block-starts_with_control_flow()) {
 + later_block-parents.make_empty();
 +  }


I *think* this is correct.  I didn't really understand it before and I
don't 100% now.

Acked-by: Jason Ekstrand jason.ekstr...@intel.com


jump_block-add_successor(cfg-mem_ctx, later_block);

if (earlier_block-can_combine_with(jump_block)) {
 --
 2.0.4

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 84566] Unify the format conversion code

https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #50 from Jason Ekstrand ja...@jlekstrand.net ---
(In reply to Iago Toral from comment #49)
 Jason you had this commit in your original branch: MAYBEREVERT: Fill X
 components with 1
 
 That basically makes packing to padded formats (like RGBX) set X=1. In the
 commit message you mention that you are not sure you like this... my opinion
 is that this should not be necessary and it degrades performance when
 packing to these format, so I would be more for removing this one. What do
 you think?
 
 FWIW, I have checked that removing this commit does not affect piglit
 results.

Go ahead and remove it.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/6] st/va: refactored handleVASliceDataBufferType

From: Michael Varga michael.va...@amd.com

This patch cleans the function handleVASliceDataBufferType() for better
readability.

Signed-off-by: Michael Varga michael.va...@amd.com
---
 src/gallium/state_trackers/va/picture.c | 75 ++---
 1 file changed, 40 insertions(+), 35 deletions(-)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 57d1fb1..8775681 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -272,51 +272,56 @@ handleSliceParameterBuffer(vlVaContext *context, 
vlVaBuffer *buf)
}
 }
 
+static unsigned int
+bufHasStartcode(vlVaBuffer *buf, unsigned int code, unsigned int bits)
+{
+   struct vl_vlc vlc = {0};
+   int i;
+
+   /* search the first 64 bytes for a startcode */
+   vl_vlc_init(vlc, 1, (const void * const*)buf-data, buf-size);
+   for (i = 0; i  64  vl_vlc_bits_left(vlc) = bits; ++i) {
+  if (vl_vlc_peekbits(vlc, bits) == code)
+ return 1;
+  vl_vlc_eatbits(vlc, 8);
+  vl_vlc_fillbits(vlc);
+   }
+
+   return 0;
+}
+
 static void
 handleVASliceDataBufferType(vlVaContext *context, vlVaBuffer *buf)
 {
+   enum pipe_video_format format;
unsigned num_buffers = 0;
void * const *buffers[2];
unsigned sizes[2];
-   enum pipe_video_format format;
+   static const uint8_t start_code_h264[] = { 0x00, 0x00, 0x01 };
+   static const uint8_t start_code_vc1[] = { 0x00, 0x00, 0x01, 0x0d };
 
format = u_reduce_video_profile(context-decoder-profile);
-   if (format == PIPE_VIDEO_FORMAT_MPEG4_AVC ||
-   format == PIPE_VIDEO_FORMAT_VC1) {
-  struct vl_vlc vlc = {0};
-  bool found = false;
-  int peek_bits, i;
-
-  /* search the first 64 bytes for a startcode */
-  vl_vlc_init(vlc, 1, (const void * const*)buf-data, buf-size);
-  peek_bits = (format == PIPE_VIDEO_FORMAT_MPEG4_AVC) ? 24 : 32;
-  for (i = 0; i  64  vl_vlc_bits_left(vlc) = peek_bits; ++i) {
- uint32_t value = vl_vlc_peekbits(vlc, peek_bits);
- if ((format == PIPE_VIDEO_FORMAT_MPEG4_AVC  value == 0x01) ||
-(format == PIPE_VIDEO_FORMAT_VC1  (value == 0x010d ||
-value == 0x010c || value == 0x010b))) {
-found = true;
+   switch (format) {
+   case PIPE_VIDEO_FORMAT_MPEG4_AVC:
+ if (bufHasStartcode(buf, 0x01, 24))
 break;
- }
- vl_vlc_eatbits(vlc, 8);
- vl_vlc_fillbits(vlc);
-  }
-  /* none found, ok add one manually */
-  if (!found) {
- static const uint8_t start_code_h264[] = { 0x00, 0x00, 0x01 };
- static const uint8_t start_code_vc1[] = { 0x00, 0x00, 0x01, 0x0d };
-
- if (format == PIPE_VIDEO_FORMAT_MPEG4_AVC) {
-buffers[num_buffers] = (void *const)start_code_h264;
-sizes[num_buffers] = sizeof(start_code_h264);
- }
- else {
-buffers[num_buffers] = (void *const)start_code_vc1;
-sizes[num_buffers] = sizeof(start_code_vc1);
- }
- ++num_buffers;
-  }
+
+ buffers[num_buffers] = (void *const)start_code_h264;
+ sizes[num_buffers++] = sizeof(start_code_h264);
+  break;
+   case PIPE_VIDEO_FORMAT_VC1:
+  if (bufHasStartcode(buf, 0x010d, 32) ||
+  bufHasStartcode(buf, 0x010c, 32) ||
+  bufHasStartcode(buf, 0x010b, 32))
+ break;
+
+ buffers[num_buffers] = (void *const)start_code_vc1;
+ sizes[num_buffers++] = sizeof(start_code_vc1);
+  break;
+   default:
+  break;
}
+
buffers[num_buffers] = buf-data;
sizes[num_buffers] = buf-size;
++num_buffers;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/6] st/va: MPEG4 populate the SPS structure

From: Michael Varga michael.va...@amd.com

Signed-off-by: Michael Varga michael.va...@amd.com
---
 src/gallium/state_trackers/va/picture.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 327c33d..ff13bc6 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -344,6 +344,7 @@ static void
 handleSliceParameterBuffer(vlVaContext *context, vlVaBuffer *buf)
 {
VASliceParameterBufferH264 *h264;
+   VASliceParameterBufferMPEG4 *mpeg4;
 
switch (u_reduce_video_profile(context-decoder-profile)) {
case PIPE_VIDEO_FORMAT_MPEG4_AVC:
@@ -354,7 +355,12 @@ handleSliceParameterBuffer(vlVaContext *context, 
vlVaBuffer *buf)
   context-desc.h264.num_ref_idx_l1_active_minus1 =
  h264-num_ref_idx_l1_active_minus1;
   break;
+   case PIPE_VIDEO_FORMAT_MPEG4:
+  assert(buf-size = sizeof(VASliceParameterBufferMPEG4)  
buf-num_elements == 1);
+  mpeg4 = buf-data;
 
+  context-mpeg4.quant_scale = mpeg4-quant_scale;
+  break;
default:
   break;
}
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/6] st/va: MPEG4 populate the iq matrix buffers

From: Michael Varga michael.va...@amd.com

Signed-off-by: Michael Varga michael.va...@amd.com
---
 src/gallium/state_trackers/va/picture.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index a4eb26b..327c33d 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -296,6 +296,7 @@ handleIQMatrixBuffer(vlVaContext *context, vlVaBuffer *buf)
 {
VAIQMatrixBufferMPEG2 *mpeg2;
VAIQMatrixBufferH264 *h264;
+   VAIQMatrixBufferMPEG4 *mpeg4;
 
switch (u_reduce_video_profile(context-decoder-profile)) {
case PIPE_VIDEO_FORMAT_MPEG12:
@@ -319,6 +320,21 @@ handleIQMatrixBuffer(vlVaContext *context, vlVaBuffer *buf)
   memcpy(context-desc.h264.pps-ScalingList8x8, h264-ScalingList8x8, 2 
* 64);
   break;
 
+   case PIPE_VIDEO_FORMAT_MPEG4:
+  assert(buf-size = sizeof(VAIQMatrixBufferMPEG4)  buf-num_elements 
== 1);
+  mpeg4 = buf-data;
+
+  if (mpeg4-load_intra_quant_mat)
+ context-desc.mpeg4.intra_matrix = mpeg4-intra_quant_mat;
+  else
+ context-desc.mpeg4.intra_matrix = NULL;
+
+  if (mpeg4-load_non_intra_quant_mat)
+ context-desc.mpeg4.non_intra_matrix = mpeg4-non_intra_quant_mat;
+  else
+ context-desc.mpeg4.non_intra_matrix = NULL;
+  break;
+
default:
   break;
}
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs

2014-11-06 Thread Marek Olšák

If there's no feedback for so long, I suggest you commit the patch
without review. If it fixes a bug, that's one more reason to commit
it.

Marek

On Thu, Nov 6, 2014 at 2:12 PM, Emil Velikov emil.l.veli...@gmail.com wrote:
 Humble ping x2

 On 14/10/14 15:25, Emil Velikov wrote:
 Humble ping.

 On 23/09/14 01:25, Emil Velikov wrote:
 From: Sjoerd Simons sjoerd.sim...@collabora.co.uk

 When using RGBA EGLConfigs allow both RGB and RGBA X visuals, such that
 application can decide whether they want to use RGBA (and have the
 compositor blend their windows).

 On my system with this change EGLConfigs with a 24 bit visual comes up
 first, as such applications blindly picking the first EGLConfig will
 still get an RGB X visual.

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676
 ---

 Hello gents,

 This patch has been stuck in bugzilla since February this year. Bringing
 it around here to gather greater exposure and perhaps some
 comments/reviews.

 -Emil

  src/egl/drivers/dri2/egl_dri2.c |  5 +
  src/egl/drivers/dri2/platform_x11.c | 17 +
  2 files changed, 14 insertions(+), 8 deletions(-)

 diff --git a/src/egl/drivers/dri2/egl_dri2.c 
 b/src/egl/drivers/dri2/egl_dri2.c
 index 20a7243..2ed90a7 100644
 --- a/src/egl/drivers/dri2/egl_dri2.c
 +++ b/src/egl/drivers/dri2/egl_dri2.c
 @@ -110,6 +110,11 @@ EGLint dri2_to_egl_attribute_map[] = {
  static EGLBoolean
  dri2_match_config(const _EGLConfig *conf, const _EGLConfig *criteria)
  {
 +
 +   if (criteria-NativeVisualID != EGL_DONT_CARE 
 +conf-NativeVisualID != criteria-NativeVisualID)
 +  return EGL_FALSE;
 +
 if (_eglCompareConfigs(conf, criteria, NULL, EGL_FALSE) != 0)
return EGL_FALSE;

 diff --git a/src/egl/drivers/dri2/platform_x11.c 
 b/src/egl/drivers/dri2/platform_x11.c
 index a7a7338..3395fb7 100644
 --- a/src/egl/drivers/dri2/platform_x11.c
 +++ b/src/egl/drivers/dri2/platform_x11.c
 @@ -672,14 +672,15 @@ dri2_x11_add_configs_for_visuals(struct 
 dri2_egl_display *dri2_dpy,
  dri2_add_config(disp, dri2_dpy-driver_configs[j], id++,
  surface_type, config_attrs, rgba_masks);

 -/* Allow a 24-bit RGB visual to match a 32-bit RGBA EGLConfig.
 - * Otherwise it will only match a 32-bit RGBA visual.  On a
 - * composited window manager on X11, this will make all of the
 - * EGLConfigs with destination alpha get blended by the
 - * compositor.  This is probably not what the application
 - * wants... especially on drivers that only have 32-bit RGBA
 - * EGLConfigs! */
 -if (d.data-depth == 24) {
 +/* Allow both 24-bit RGB visual and 32 bit RGBA to match a 
 32-bit
 + * RGBA EGLConfig.  Otherwise it will only match a 32-bit RGBA
 + * visual.  On a composited window manager on X11, this will 
 make
 + * all of the EGLConfigs with destination alpha get blended by 
 the
 + * compositor.  This is probably not what the application 
 wants...
 + * especially on drivers that only have 32-bit RGBA EGLConfigs!
 + * Allowing both allows applications to make the decision 
 whether
 + * 32 bit visuals are intended */
 +if (d.data-depth == 24 || d.data-depth == 32) {
 rgba_masks[3] =
~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]);
 dri2_add_config(disp, dri2_dpy-driver_configs[j], id++,



 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/4] i965: Add code to verify the CFG is sane.

On Thu, Nov 6, 2014 at 9:41 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
 In general, it seems as if this can miss several things.  For instance, it
 checks that all the predicessors are valid but never that we have all the
 predecessors.

I'm not sure what you mean. That we don't validate that if A - B then
B has an incoming edge from A? That's true.

I think that would be a good addition. I'm not sure I want to do that
in this patch. It was pretty big as is.

Same for successors.  If we really want to be able to
 validate a CFG, maybe a stack-based approach like calculate_cfg would work
 better?  Also, did you run this on piglit/shader-db to ensure that
 everything coming out of calculate_cfg actually passes?

I don't think it will pass because of the things you pointed out about
unconditional jumps.

 More comments inline

 On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner matts...@gmail.com wrote:

 ---
  src/mesa/drivers/dri/i965/test_verify_cfg.cpp | 273
 ++
  src/mesa/drivers/dri/i965/test_verify_cfg.h   |  26 +++
  2 files changed, 299 insertions(+)
  create mode 100644 src/mesa/drivers/dri/i965/test_verify_cfg.cpp
  create mode 100644 src/mesa/drivers/dri/i965/test_verify_cfg.h

 diff --git a/src/mesa/drivers/dri/i965/test_verify_cfg.cpp
 b/src/mesa/drivers/dri/i965/test_verify_cfg.cpp
 new file mode 100644
 index 000..0aa74c5
 --- /dev/null
 +++ b/src/mesa/drivers/dri/i965/test_verify_cfg.cpp
 @@ -0,0 +1,273 @@
 +/*
 + * Copyright © 2014 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining
 a
 + * copy of this software and associated documentation files (the
 Software),
 + * to deal in the Software without restriction, including without
 limitation
 + * the rights to use, copy, modify, merge, publish, distribute,
 sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the
 next
 + * paragraph) shall be included in all copies or substantial portions of
 the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
 EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
 SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
 OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 DEALINGS
 + * IN THE SOFTWARE.
 + */
 +
 +#include gtest/gtest.h
 +#include test_verify_cfg.h
 +#include brw_cfg.h
 +
 +static bool
 +is_unconditional_jump(const backend_instruction *inst)
 +{
 +   return (inst-opcode == BRW_OPCODE_BREAK ||
 +   inst-opcode == BRW_OPCODE_CONTINUE ||
 +   inst-opcode == BRW_OPCODE_WHILE) 
 +  inst-predicate == BRW_PREDICATE_NONE;
 +}
 +
 +void
 +verify_cfg(backend_visitor *v)
 +{
 +   foreach_block(block, v-cfg) {
 +  switch (block-start()-opcode) {
 +  case BRW_OPCODE_ENDIF: {
 + /* Has two predecessors:
 +  *- the previous block is always a predecessor
 +  *- always a predecessor ending in an IF or an ELSE
 +  *
 +  * Note that if the body of the if block is empty, then the
 +  * previous block *is* the block that ends with IF, so the ENDIF
 +  * block will have the same predecessor twice.
 +  */
 + if (is_unconditional_jump(block-prev()-end())) {
 +EXPECT_EQ(block-parents.length(), 1u);
 + } else {
 +EXPECT_EQ(block-parents.length(), 2u);
 + }


 This obviously catches

 if (foo) {
 /* Do stuff */
 } else {
 break;
 }

 but what about

 if (foo) {
 break;
 } else {
 /* Do stuff */
 }

 Or worse, what about

 if (foo) {
 break;
 } else {
 continue;
 }

 Maybe there's something I'm missing here.

I think you're right. For purposes of testing the predicated break
pass I didn't really care about unconditional jumps (since they would
be removed by the pass!). I added the unconditional jump cases as an
after thought.

So the question is whether I can actually test the cases you mention
in the current model? I'm not super excited expending a bunch more
effort to rewrite some testing code that already tests the thing I
want.


 +
 + bool found_prev = false, found_if = false, found_else = false,
 +  found_other = false;
 + foreach_list_typed(bblock_link, parent, link, block-parents) {
 +if (parent-block == block-prev()  !found_prev)
 +   found_prev = true;
 +else if (parent-block-end()-opcode == BRW_OPCODE_IF)
 +   found_if = true;
 +else if (parent-block-end()-opcode == BRW_OPCODE_ELSE)
 +   found_else = true;
 +

Re: [Mesa-dev] [PATCH 2/4] i965/fs: Wire up control flow correctly in predicated break pass.

On Thu, Nov 6, 2014 at 10:07 AM, Jason Ekstrand ja...@jlekstrand.net wrote:


 On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner matts...@gmail.com wrote:

 When the earlier block ended with control flow, we'd mistakenly remove
 some of its links to its children. The same happened with the later
 block.
 ---
  src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp | 10
 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

 diff --git
 a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
 b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
 index b7a1d7e..047c2c0 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp
 @@ -107,10 +107,14 @@ fs_visitor::opt_peephole_predicated_break()
}
endif_inst-remove(endif_block);

 -  earlier_block-children.make_empty();
 -  later_block-parents.make_empty();
 +  if (!earlier_block-ends_with_control_flow()) {
 + earlier_block-children.make_empty();
 + earlier_block-add_successor(cfg-mem_ctx, jump_block);
 +  }

 -  earlier_block-add_successor(cfg-mem_ctx, jump_block);
 +  if (!later_block-starts_with_control_flow()) {
 + later_block-parents.make_empty();
 +  }


 I *think* this is correct.  I didn't really understand it before and I don't
 100% now.

Basically, the earlier code incorrectly assumed that the blocks were
were attempting to merge did not start/end in control flow and
therefore had only a single successor. In that case it was safe to
erase all of the successor/predecessors and recreate them.

When the blocks actually start/end with control flow, they have
multiple successors/predecessors and that assumption is false. In that
case, we're not going to be able to merge blocks and we shouldn't
destroy their successor/predecessor information.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl/list: Add an exec_list_validate function

On Wed, Nov 5, 2014 at 5:12 PM, Jason Ekstrand ja...@jlekstrand.net wrote:


 On Wed, Nov 5, 2014 at 2:46 PM, Matt Turner matts...@gmail.com wrote:

 On Wed, Nov 5, 2014 at 2:00 PM, Jason Ekstrand ja...@jlekstrand.net
 wrote:
  This can be very useful for trying to debug list corruptions.
 
  Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
  Cc: Ian Romanick i...@freedesktop.org
  ---
   src/glsl/list.h | 14 ++
   1 file changed, 14 insertions(+)
 
  diff --git a/src/glsl/list.h b/src/glsl/list.h
  index aac13fd..1d18ec9 100644
  --- a/src/glsl/list.h
  +++ b/src/glsl/list.h
  @@ -521,6 +521,20 @@ exec_node_insert_list_before(struct exec_node *n,
  struct exec_list *before)
  exec_list_make_empty(before);
   }
 
  +static inline void
  +exec_list_validate(struct exec_list *list)
  +{
  +   assert(list-head-prev == (struct exec_node *) list-head);
  +   assert(list-tail == NULL);
  +   assert(list-tail_pred-next == (struct exec_node *) list-tail);
  +
  +   for (struct exec_node *node = list-head;
  +node-next != NULL; node = node-next) {

 Just use foreach_in_list().


 Sure, I can do that.

  +  assert(node-next-prev == node);
  +  assert(node-prev-next == node);
  +   }
  +}

 Are you intending to call this from gdb? I'm having a hard time
 imagining committing code that *sometimes* corrupts lists, which seems
 like why this function would be useful to call from real code.


 It is useful to call from gdb but I also call it all over nir_validate.c.  I
 spent most of today fighting linked list corruptions, and this was very
 helpful for tracking them down.  Another option would be rename it to
 exec_list_is_valid and make it return a bool.  Then the standard procedure
 would be assert(exec_list_is_valid(list)).  Would that be better?


 If you just want to call it from gdb, wrap the whole thing in #ifndef
 NDEBUG. I don't want to ever accidentally call this function and think
 it validated something when it actually did nothing.

Yeah, ignore all this.

Reviewed-by: Matt Turner matts...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] gallivm: Properly update for removal of JITMemoryManager in LLVM 3.6.

On Wed, Oct 22, 2014 at 11:11 AM,  jfons...@vmware.com wrote:
 From: José Fonseca jfons...@vmware.com

 JITMemoryManager was removed in LLVM 3.6, and replaced by its base
 class RTDyldMemoryManager.

 This change fixes our JIT memory managers specializations to derive
 from RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager.

 This enables llvmpipe to run with LLVM 3.6.

Should these be marked for stable? Or does 10.3.x work with 3.6?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] New stable-branch 10.3 candidate pushed

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello list,

Barring a slight delay, during which I was expecting more
non-freedreno patches, it's time for yet another 10.3 stable
candidate. Currently we have:
 - 102 queued (89 of which are for freedreno)
 - 4 nominated (outstanding)
 - and 3 rejected patches


In a nut shell this gives us
 - freedreno: 0ad is playable (+ 395 piglit fixes)
 - r600/radeonsi: 3 rendering fixes and one crash
 - glsl: 3 bug-fixes/crashes
 - build: fixed static linking against llvm, fix msvc and android builds

Take a look at section Mesa stable queue for more information.


Testing
- ---
The following results are against piglit a636a3610d7.


Fixes - gallium swrast
- --
Tests:
 - arb_explicit_uniform_location-use-of-unused-loc (crash  pass)
 - linker/set-location-to-sampler (fail  pass)


Fixes - classic i965(snb)
- -
Tests:
 - arb_explicit_uniform_location-use-of-unused-loc (crash  pass)
 - linker/set-location-to-sampler (fail  pass)
 - GLX_OML_sync_control/timing -divisor 2  (warn  pass)
 - GLX_OML_sync_control/timing -msc-delta 1 (warn  pass)


Regressions - classic i965(snb)
- ---
Tests:
 - GLX_OML_sync_control/timing -divisor 1  (pass  warn)

The OMX_sync_control tests continue to oscillate between warn and
pass, with the result depending on the moon cycle :P


Testing reports/general approval
- 
Any testing reports (or general approval of the state of the branch)
will be greatly appreciated.


Trivial merge conflicts
- ---
Here are the commits where I manually merged conflicts, (so these might
merit additional review):

commit 4956788a5f40fff2ea72d0f5bd6fcdb116492896
Author: Anuj Phogat anuj.pho...@gmail.com
Date:   Mon Sep 22 15:10:28 2014 -0700

glsl: Use signed array index in update_max_array_access()

(cherry picked from commit 7a652c41b4de4bdbb954a4ebf6cdb605d197e999)


As usual the plan is to have the next stable (10.3.3) released this
Friday, so if you have any questions or comments that you would like to
share before the release, please go ahead.


Cheers,
Emil


Mesa stable queue
- -

Nominated (4)
=

Kenneth Graunke (1):
  i965: Fix an off-by-1 error in the draw upload code's size
calculation.

Tom Stellard (2):
  radeonsi: Program RASTER_CONFIG for harvested GPUs v4
  R600/SI: radeonsi: Program RASTER_CONFIG for harvested GPUs v3

Tomasz Figa (1):
  glsl: Fix no return value for non-void function


Rejected(3)
===

Anuj Phogat (2):
  glsl: No compile error for out of bounds array index
  glsl: Don't abort if array index is out of bounds

Mauro Rossi (1):
  gallium/nouveau: use std::isfinite in c++ sources


Queued (102)


Anuj Phogat (2):
  glsl: Fix crash due to negative array index
  glsl: Use signed array index in update_max_array_access()

Brian Paul (1):
  mesa: fix UNCLAMPED_FLOAT_TO_UBYTE() macro for MSVC

Ilia Mirkin (27):
  freedreno/ir3: INEG operates on src0, not src1
  freedreno/ir3: add UARL support
  freedreno/ir3: negate result of USLT/etc
  freedreno/ir3: use unsigned comparison for UIF
  freedreno/ir3: add TXL support
  freedreno/ir3: fix UCMP handling
  freedreno/ir3: implement UMUL correctly
  freedreno: add default .dir-locals.el for emacs settings
  freedreno/ir3: make texture instruction construction more dynamic
  freedreno/ir3: fix TXB/TXL to actually pull the bias/lod argument
  freedreno/ir3: add TXQ support
  freedreno/ir3: add TXB2 support
  freedreno: dual-source render targets are not supported
  freedreno: instanced drawing/compute not yet supported
  freedreno/ir3: avoid fan-in sources referring to same instruction
  freedreno/ir3: add IDIV/UDIV support
  freedreno/ir3: add UMOD support, based on UDIV
  freedreno/ir3: add MOD support
  freedreno/ir3: add ISSG support
  freedreno/ir3: add UMAD support
  freedreno/ir3: make TXQ return integers, not floats
  freedreno/ir3: shadow comes before array
  freedreno/ir3: add texture offset support
  freedreno/ir3: add TXD support and expose ARB_shader_texture_lod
  freedreno/ir3: add TXF support
  freedreno: positions come out as integers, not half-integers
  freedreno/ir3: fix FSLT/etc handling to return 0/-1 instead of 0/1.0

Jan Vesely (1):
  configure: include llvm systemlibs when using static llvm

Marek Olšák (5):
  r600g: fix polygon mode for points and lines and point/line fill
modes
  radeonsi: fix polygon mode for points and lines and point/line
fill modes
  radeonsi: fix incorrect index buffer max size for lowered 8-bit
indices
  Revert st/mesa: set MaxUnrollIterations = 255
  r300g: remove enabled/disabled hyperz and AA compression messages

Mauro Rossi (1):
  gallium/nouveau: fully build the driver under android

Michel

Re: [Mesa-dev] [PATCH 3/3] gallivm: Properly update for removal of JITMemoryManager in LLVM 3.6.

On 06/11/14 19:20, Matt Turner wrote:
 On Wed, Oct 22, 2014 at 11:11 AM,  jfons...@vmware.com wrote:
 From: José Fonseca jfons...@vmware.com

 JITMemoryManager was removed in LLVM 3.6, and replaced by its base
 class RTDyldMemoryManager.

 This change fixes our JIT memory managers specializations to derive
 from RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager.

 This enables llvmpipe to run with LLVM 3.6.
 
 Should these be marked for stable? Or does 10.3.x work with 3.6?
If I understood Michel correctly a while back - it's not the best of
ideas to add support for unreleased llvm in the mesa stable branches.

As the llvm release is made we can obviously pick them up when needed.

Cheers,
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Fix an off-by-1 error in the draw upload code's size calculation.

On Thursday, November 06, 2014 02:55:25 PM Emil Velikov wrote:
 Hi Ken,
 
 From what I've gathered the proposed patch is incorrect and is (most
 likely) working around a buggy application behaviour. Afaics Ian
 suggested that we add a driconf option for such applications.
 
 Should I consider this patch for the stable branch or the above sounds
 about right and we can drop it ?
 
 Thanks
 Emil

We should drop it.  Sorry for the noise...

--Ken

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 3/3] gallivm: Properly update for removal of JITMemoryManager in LLVM 3.6.

On Thu, Nov 6, 2014 at 11:27 AM, Emil Velikov emil.l.veli...@gmail.com wrote:
 On 06/11/14 19:20, Matt Turner wrote:
 On Wed, Oct 22, 2014 at 11:11 AM,  jfons...@vmware.com wrote:
 From: José Fonseca jfons...@vmware.com

 JITMemoryManager was removed in LLVM 3.6, and replaced by its base
 class RTDyldMemoryManager.

 This change fixes our JIT memory managers specializations to derive
 from RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager.

 This enables llvmpipe to run with LLVM 3.6.

 Should these be marked for stable? Or does 10.3.x work with 3.6?
 If I understood Michel correctly a while back - it's not the best of
 ideas to add support for unreleased llvm in the mesa stable branches.

 As the llvm release is made we can obviously pick them up when needed.

 Cheers,
 Emil

Ah, sure. That makes sense.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Fix an off-by-1 error in the draw upload code's size calculation.

On 06/11/14 19:29, Kenneth Graunke wrote:
 On Thursday, November 06, 2014 02:55:25 PM Emil Velikov wrote:
 Hi Ken,

 From what I've gathered the proposed patch is incorrect and is (most
 likely) working around a buggy application behaviour. Afaics Ian
 suggested that we add a driconf option for such applications.

 Should I consider this patch for the stable branch or the above sounds
 about right and we can drop it ?

 Thanks
 Emil
 
 We should drop it.  Sorry for the noise...
 
Ack. There is nothing to apologise for :)

-Emil
 --Ken
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

On Wed, Nov 5, 2014 at 12:54 PM, Matt Turner matts...@gmail.com wrote:
 On Wed, Nov 5, 2014 at 12:50 PM, Timothy Arceri t_arc...@yahoo.com.au wrote:
 There have been quite a few eyes over this now but nobody has given it a
 reviewed by yet.

 Would be nice to get it in before the code freeze. Any takers?

 Yes, I'll make sure that happens.

I made a couple of trivial changes to the commit message and added
some spaces between __m128i and * in casts and pushed it with review.

Thanks!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 84566] Unify the format conversion code

https://bugs.freedesktop.org/show_bug.cgi?id=84566

--- Comment #51 from Jason Ekstrand ja...@jlekstrand.net ---
(In reply to Samuel Iglesias from comment #48)
 (In reply to Jason Ekstrand from comment #34)
  (In reply to Samuel Iglesias from comment #33)
   Jason, I would like to know your opinion about the integer RGBA clamping
   done in pack.c (_mesa_pack_rgba_span_from_ints()).
   
   glReadPixels() and glGetTexImage() specs said (for example, in OpenGL 3.3.
   core specification) that for an integer RGBA color, each component is
   clamped to the representable range of type.
   
   Those GL functions end up calling pack.c's functions for performing the
   conversion (mesa_pack_rgba_span_from_ints() and others).
   
   It's possible to replace some of those pack.c's conversion functions by 
   the
   master conversion but the problem is in the clamping operation. I would 
   like
   to add a boolean argument called clamp to the master conversion function
   which is passed to _mesa_swizzle_and_convert() where each of its
   convert_{uint,int,byte,ubyte,short,ushort}() do the right thing when 
   clamp
   is true.
   
   This clamp parameter would be false for every call to either master
   conversion function or _mesa_swizzle_and_convert() except when they are
   being called from pack.c.
   
   What do you think?
  
  Supporting clamping is probably fine.  I think we determined we needed a
  clamp parameter anyway for some of the float conversions.  I guess it makes
  sense for integers too.  Let's watch out for performance when we implement
  it though.  Changing the loop inside mesa_swizzle_and_convert without
  hurting performance can be tricky.  The teximage-colors test in Piglit has a
  -benchmark flag that can be used for testing that.
 
 In the end, we did not make that change in pack.c as we could just use the
 autogenerated format pack/unpack functions. However I am retaking this topic
 again because we found another issue which would require a similar solution:
 
 The convert_*() functions in format_utils.c convert between source and
 destination data and are used by _mesa_swizzle_and_convert. We found that
 these were not good enough for various conversions that involved
 non-normalized types of different sizes: INT to SHORT, INT to BYTE, etc.
 Because of that, several piglit tests related to glGetTexImage() and others
 failed, like for example bin/ext_texture_integer-getteximage-clamping.
 
 In order to fix that we added the clamp expressions for these cases [1]  and
 with that we achieved no regressions when executing a full piglit run on
 i965 driver.
 Unfortunately, when testing our patches on a couple of Gallium drivers, we
 found a regression that we tracked down back to that patch:
 bin/arb_clear_buffer_object-formats.
 Reverting the patch makes fixes the problem with these Gallium drivers but
 then, bin/ext_texture_integer-getteximage-clamping fails again on i965. We
 wonder if there could be more cases like this that piglit is not covering,
 since it looks weird that this affects just this one test.
 
 So, we wonder if that patch for _mesa_swizzle_and_convert is correct and we
 should fix the failed gallium cases elsewhere, or if we should revert that
 patch and fix the cases it fixed in a different way. What do you think? Was
 _mesa_swizzle_and_convert implemented like that  on purpose or are these
 real bugs?

From my brief reading of the GL spec, it looks like clamping integers to the
max representable range is what it expects by default.  From the glTexImage
spec:

The selected groups are transferred to the GL as described in section 3.7.2
and then clamped to the representable range of the internal format. If the
internalformat of the texture is signed or unsigned integer, components are
clamped to [-2^(n-1), 2^(n-1)-1] or [0, 2^(n-1)-1], respectively, where n is
the number of bits per component. For color component groups, if the
internalformat of the texture is signed or unsigned normalized fixed-point,
components are clamped t0 [-1, 1] or [0, 1], respectively.

Therefore, it seems as if we want to be clamping when we have integer
destinations.  I'm not sure why the gallium drivers are regressing when you do.

One more observation is that it doesn't look like your clamping patch is
complete.  If we're going to clamp when we have an integer destination, we
should always clamp with integer destinations, not just in the few cases that
piglit hits.  I wouldn't be surprised if piglit's coverage in those areas is
terrible.

 If we decide to revert our clamp patch, then a solution could be to have a
 separate implementation of _mesa_swizzle_and_convert() and its convert_*()
 functions that clamps. We would have to use that version in glGetTexImage()
 (maybe in glReadPIxels too)  and use the normal version for texture uploads
 (texstore). If we do this, then we would have to add a clamp flag to
 _mesa_format_convert, the same flag to _mesa_swizzle_and_convert and the
 have a new

Re: [Mesa-dev] ARB_texture_buffer_range offsets

2014-11-06 Thread Roland Scheidegger

But even dword offsets aren't translatable right now to
pipe_sampler_view first_elements if the format has more than 32 bits.

Roland

Am 06.11.2014 um 18:21 schrieb Marek Olšák:
For radeonsi, I think only x8, x8y8, and x16 fetches can be
byte-aligned. Everything else is dword-aligned (the 2 lowest bits are
ignored). I guess the cap should be 4 then.

Marek

Roland

Marek

On Wed, Nov 5, 2014 at 9:08 PM, Roland Scheidegger srol...@vmware.com
wrote:
Trying to fix some bug due to alignment issues in llvmpipe's vertex
fetch, I came across some issue with ARB_texture_buffer_range.
Namely, it looks like the offsets specified there are always in bytes,
regardless the actual format (hence, as long as the
TEXTURE_BUFFER_OFFSET_ALIGNMENT is 1, it would be allowed to have an
offset of 15 bytes for a rgba32f format for instance making all fetches
quite unaligned).
However in gallium we actually have first_elem and last_elem parameters
in the sampler views which are specified in number of elements (so takes
the format into account), which is what d3d10 does and the state tracker
translates to that apparently. IMHO d3d10 makes way more sense there
because that way the necessary alignment scales automatically depending
on the format (so, if the format is 2x16bit for instance you'd need 4
byte alignment for the offset, and only need 16 bytes alignment for
4x32bit, ensuring all lookups are always aligned). This means that 15
byte offset in the example above is completely untranslatable.
But if I see that right, OpenGL doesn't work like that, meaning
effectively gallium drivers (and I doubt most other drivers neither)
cannot actually claim to support TEXTURE_BUFFER_OFFSET_ALIGNMENT lower
than 16, even if they'd only need that for 4x32bit formats. Though most
gallium drivers indeed claim 1 right now.
Looks quite messy...

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs

I thought Eric and Chad already NAKed it in bugzilla.  The problem is
that applications ask for an RGBA visual for GL blending.  They use the
alpha channel to generate their images, but the final alpha values are,
basically, random... and the composited result would be pure garbage.

As Chad points out in comment #1, EGL just doesn't let applications do
the thing the patch is trying to do.

On 11/06/2014 05:12 AM, Emil Velikov wrote:
 Humble ping x2
 
 On 14/10/14 15:25, Emil Velikov wrote:
 Humble ping.

 On 23/09/14 01:25, Emil Velikov wrote:
 From: Sjoerd Simons sjoerd.sim...@collabora.co.uk

 When using RGBA EGLConfigs allow both RGB and RGBA X visuals, such that
 application can decide whether they want to use RGBA (and have the
 compositor blend their windows).

 On my system with this change EGLConfigs with a 24 bit visual comes up
 first, as such applications blindly picking the first EGLConfig will
 still get an RGB X visual.

 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676
 ---

 Hello gents,

 This patch has been stuck in bugzilla since February this year. Bringing 
 it around here to gather greater exposure and perhaps some 
 comments/reviews.

 -Emil

  src/egl/drivers/dri2/egl_dri2.c |  5 +
  src/egl/drivers/dri2/platform_x11.c | 17 +
  2 files changed, 14 insertions(+), 8 deletions(-)

 diff --git a/src/egl/drivers/dri2/egl_dri2.c 
 b/src/egl/drivers/dri2/egl_dri2.c
 index 20a7243..2ed90a7 100644
 --- a/src/egl/drivers/dri2/egl_dri2.c
 +++ b/src/egl/drivers/dri2/egl_dri2.c
 @@ -110,6 +110,11 @@ EGLint dri2_to_egl_attribute_map[] = {
  static EGLBoolean
  dri2_match_config(const _EGLConfig *conf, const _EGLConfig *criteria)
  {
 +
 +   if (criteria-NativeVisualID != EGL_DONT_CARE 
 +conf-NativeVisualID != criteria-NativeVisualID)
 +  return EGL_FALSE;
 +
 if (_eglCompareConfigs(conf, criteria, NULL, EGL_FALSE) != 0)
return EGL_FALSE;
  
 diff --git a/src/egl/drivers/dri2/platform_x11.c 
 b/src/egl/drivers/dri2/platform_x11.c
 index a7a7338..3395fb7 100644
 --- a/src/egl/drivers/dri2/platform_x11.c
 +++ b/src/egl/drivers/dri2/platform_x11.c
 @@ -672,14 +672,15 @@ dri2_x11_add_configs_for_visuals(struct 
 dri2_egl_display *dri2_dpy,
 dri2_add_config(disp, dri2_dpy-driver_configs[j], id++,
 surface_type, config_attrs, rgba_masks);
  
 -/* Allow a 24-bit RGB visual to match a 32-bit RGBA EGLConfig.
 - * Otherwise it will only match a 32-bit RGBA visual.  On a
 - * composited window manager on X11, this will make all of the
 - * EGLConfigs with destination alpha get blended by the
 - * compositor.  This is probably not what the application
 - * wants... especially on drivers that only have 32-bit RGBA
 - * EGLConfigs! */
 -if (d.data-depth == 24) {
 +/* Allow both 24-bit RGB visual and 32 bit RGBA to match a 
 32-bit
 + * RGBA EGLConfig.  Otherwise it will only match a 32-bit RGBA
 + * visual.  On a composited window manager on X11, this will 
 make
 + * all of the EGLConfigs with destination alpha get blended by 
 the
 + * compositor.  This is probably not what the application 
 wants...
 + * especially on drivers that only have 32-bit RGBA EGLConfigs!
 + * Allowing both allows applications to make the decision 
 whether
 + * 32 bit visuals are intended */
 +if (d.data-depth == 24 || d.data-depth == 32) {
 rgba_masks[3] =
~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]);
 dri2_add_config(disp, dri2_dpy-driver_configs[j], id++,


 
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] mesa: add runtime support for SSSE3

On Thu, 2014-11-06 at 09:59 -0800, Matt Turner wrote:
 On Thu, Nov 6, 2014 at 4:18 AM, Timothy Arceri t_arc...@yahoo.com.au wrote:
  Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
  ---
   configure.ac   | 6 ++
   src/mesa/x86/common_x86.c  | 4 
   src/mesa/x86/common_x86_features.h | 4 +++-
   3 files changed, 13 insertions(+), 1 deletion(-)
 
  diff --git a/configure.ac b/configure.ac
  index 03f1bca..cc0a661 100644
  --- a/configure.ac
  +++ b/configure.ac
  @@ -258,6 +258,12 @@ if test x$SSE41_SUPPORTED = x1; then
   fi
   AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])
 
  +AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0])
  +if test x$SSSE3_SUPPORTED = x1; then
  +DEFINES=$DEFINES -DUSE_SSSE3
  +fi
  +AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1])
  +
   dnl Can't have static and shared libraries, default to static if user
   dnl explicitly requested. If both disabled, set to static since shared
   dnl was explicitly requested.
  diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c
  index 25f5c40..ad0648a 100644
  --- a/src/mesa/x86/common_x86.c
  +++ b/src/mesa/x86/common_x86.c
  @@ -352,6 +352,10 @@ _mesa_get_x86_features(void)
 
 __get_cpuid(1, eax, ebx, ecx, edx);
 
  +#ifdef bit_SSSE3
 
 Why did you put this in an #ifdef bit_SSSE3? That's defined everywhere.

I have no idea. I'll fix it.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

On Thu, 2014-11-06 at 10:03 -0800, Matt Turner wrote:
 On Thu, Nov 6, 2014 at 4:20 AM, Timothy Arceri t_arc...@yahoo.com.au wrote:
  Also cleans up some if statements in the *faster functions.
 
  Callgrind cpu usage results from pts benchmarks:
 
  For ytile_copy_faster()
 
  Nexuiz 1.6.1: 2.16% - 1.20%
 
  Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
  ---
   src/mesa/Makefile.am   |  8 +++
   src/mesa/drivers/dri/i965/intel_tex_subimage.c | 82 
  ++
   src/mesa/main/fast_rgba8_copy.c| 78 
  
   src/mesa/main/fast_rgba8_copy.h| 37 
   4 files changed, 141 insertions(+), 64 deletions(-)
   create mode 100644 src/mesa/main/fast_rgba8_copy.c
   create mode 100644 src/mesa/main/fast_rgba8_copy.h
 
  diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
  index e71bccb..2402096 100644
  --- a/src/mesa/Makefile.am
  +++ b/src/mesa/Makefile.am
  @@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS)
 
   ARCH_LIBS =
 
  +if SSSE3_SUPPORTED
  +ARCH_LIBS += libmesa_ssse3.la
  +endif
  +
   if SSE41_SUPPORTED
   ARCH_LIBS += libmesa_sse41.la
   endif
  @@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \
  main/streaming-load-memcpy.c
   libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
 
  +libmesa_ssse3_la_SOURCES = \
  +   main/fast_rgba8_copy.c
  +libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3
  +
   pkgconfigdir = $(libdir)/pkgconfig
   pkgconfig_DATA = gl.pc
 
  diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
  b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
  index cb5738a..0deeb75 100644
  --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
  +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
  @@ -27,6 +27,7 @@

  **/
 
   #include main/bufferobj.h
  +#include main/fast_rgba8_copy.h
   #include main/image.h
   #include main/macros.h
   #include main/mtypes.h
  @@ -42,9 +43,7 @@
   #include intel_mipmap_tree.h
   #include intel_blit.h
 
  -#ifdef __SSSE3__
  -#include tmmintrin.h
  -#endif
  +#include x86/common_x86_asm.h
 
   #define FILE_DEBUG_FLAG DEBUG_TEXTURE
 
  @@ -175,18 +174,6 @@ err:
  return false;
   }
 
  -#ifdef __SSSE3__
  -static const uint8_t rgba8_permutation[16] =
  -   { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };
  -
  -/* NOTE: dst must be 16 byte aligned */
  -#define rgba8_copy_16(dst, src) \
  -   *(__m128i *)(dst) = _mm_shuffle_epi8(\
  -  (__m128i) _mm_loadu_ps((float *)(src)),   \
  -  *(__m128i *) rgba8_permutation\
  -   )
  -#endif
  -
   /**
* Copy RGBA to BGRA - swap R and B.
*/
  @@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes)
  uint8_t *d = dst;
  uint8_t const *s = src;
 
  -#ifdef __SSSE3__
  -   /* Fast copying for tile spans.
  -*
  -* As long as the destination texture is 16 aligned,
  -* any 16 or 64 spans we get here should also be 16 aligned.
  -*/
  -
  -   if (bytes == 16) {
  -  assert(!(((uintptr_t)dst)  0xf));
  -  rgba8_copy_16(d+ 0, s+ 0);
  -  return dst;
  -   }
  -
  -   if (bytes == 64) {
  -  assert(!(((uintptr_t)dst)  0xf));
  -  rgba8_copy_16(d+ 0, s+ 0);
  -  rgba8_copy_16(d+16, s+16);
  -  rgba8_copy_16(d+32, s+32);
  -  rgba8_copy_16(d+48, s+48);
  -  return dst;
  -   }
  -#endif
  -
  while (bytes = 4) {
 d[0] = s[2];
 d[1] = s[1];
  @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t 
  x2, uint32_t x3,
 mem_copy_fn mem_copy)
   {
  if (x0 == 0  x3 == xtile_width  y0 == 0  y1 == xtile_height) {
  -  if (mem_copy == memcpy)
  - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
  -   dst, src, src_pitch, swizzle_bit, memcpy);
  -  else if (mem_copy == rgba8_copy)
  - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
  -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
  -   } else {
  -  if (mem_copy == memcpy)
  - return xtile_copy(x0, x1, x2, x3, y0, y1,
  -   dst, src, src_pitch, swizzle_bit, memcpy);
  -  else if (mem_copy == rgba8_copy)
  - return xtile_copy(x0, x1, x2, x3, y0, y1,
  -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
  +  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
  +dst, src, src_pitch, swizzle_bit, mem_copy);
  }
  xtile_copy(x0, x1, x2, x3, y0, y1,
 dst, src, src_pitch, swizzle_bit, mem_copy);
  @@ -388,19 +341,8 @@ ytile_copy_faster(uint32_t x0, uint32_t x1, uint32_t 
  x2, uint32_t x3,
 mem_copy_fn mem_copy)
   {
  if (x0 == 0  x3 == ytile_width  y0 == 0  y1 == ytile_height) {
  -

[Mesa-dev] [PATCH] gbm: dlopen libglapi so gbm_create_device works

From: Frank Henigman fjhenig...@chromium.org

Dri driver libs are not linked to pull in libglapi so gbm_create_device()
fails when it tries to dlopen them (unless the application is linked
with something that does pull in libglapi, like libGL).
Until dri drivers can be fixed properly, dlopen libglapi before trying
to dlopen them.
https://bugs.freedesktop.org/show_bug.cgi?id=57702

Signed-off-by: Frank Henigman fjhenig...@google.com
---
 src/gbm/backends/dri/gbm_dri.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c
index f637e32..6ea2294 100644
--- a/src/gbm/backends/dri/gbm_dri.c
+++ b/src/gbm/backends/dri/gbm_dri.c
@@ -311,6 +311,11 @@ dri_open_driver(struct gbm_dri_device *dri)
if (search_paths == NULL)
   search_paths = DEFAULT_DRIVER_DIR;
 
+   /* Temporarily work around dri driver libs that need symbols in libglapi
+* but don't automatically link it in.
+*/
+   dlopen(libglapi.so.0, RTLD_LAZY | RTLD_GLOBAL);
+
dri-driver = NULL;
end = search_paths + strlen(search_paths);
for (p = search_paths; p  end  dri-driver == NULL; p = next + 1) {
-- 
2.1.0.rc2.206.gedb03e5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] mesa: add runtime support for SSSE3

On 11/06/2014 04:18 AM, Timothy Arceri wrote:
 Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
 ---
  configure.ac   | 6 ++
  src/mesa/x86/common_x86.c  | 4 
  src/mesa/x86/common_x86_features.h | 4 +++-
  3 files changed, 13 insertions(+), 1 deletion(-)
 
 diff --git a/configure.ac b/configure.ac
 index 03f1bca..cc0a661 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -258,6 +258,12 @@ if test x$SSE41_SUPPORTED = x1; then
  fi
  AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1])
  
 +AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0])
 +if test x$SSSE3_SUPPORTED = x1; then
 +DEFINES=$DEFINES -DUSE_SSSE3
 +fi
 +AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1])
 +

Just so people can find things, the SSSE3 check should logically come
before the SSE4.1 check.

  dnl Can't have static and shared libraries, default to static if user
  dnl explicitly requested. If both disabled, set to static since shared
  dnl was explicitly requested.
 diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c
 index 25f5c40..ad0648a 100644
 --- a/src/mesa/x86/common_x86.c
 +++ b/src/mesa/x86/common_x86.c
 @@ -352,6 +352,10 @@ _mesa_get_x86_features(void)
  
__get_cpuid(1, eax, ebx, ecx, edx);
  
 +#ifdef bit_SSSE3
 +  if (ecx  bit_SSSE3)
 + _mesa_x86_cpu_features |= X86_FEATURE_SSSE3;
 +#endif
if (ecx  bit_SSE4_1)
   _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1;
 }
 diff --git a/src/mesa/x86/common_x86_features.h 
 b/src/mesa/x86/common_x86_features.h
 index 66f2cf6..6eb2b38 100644
 --- a/src/mesa/x86/common_x86_features.h
 +++ b/src/mesa/x86/common_x86_features.h
 @@ -43,7 +43,8 @@
  #define X86_FEATURE_XMM2 (16)
  #define X86_FEATURE_3DNOWEXT (17)
  #define X86_FEATURE_3DNOW(18)
 -#define X86_FEATURE_SSE4_1   (19)
 +#define X86_FEATURE_SSSE3(19)
 +#define X86_FEATURE_SSE4_1   (110)
  
  /* standard X86 CPU features */
  #define X86_CPU_FPU  (10)
 @@ -65,6 +66,7 @@
  #define cpu_has_xmm2 (_mesa_x86_cpu_features  X86_FEATURE_XMM2)
  #define cpu_has_3dnow(_mesa_x86_cpu_features  
 X86_FEATURE_3DNOW)
  #define cpu_has_3dnowext (_mesa_x86_cpu_features  X86_FEATURE_3DNOWEXT)
 +#define cpu_has_ssse3(_mesa_x86_cpu_features  
 X86_FEATURE_SSSE3)
  #define cpu_has_sse4_1   (_mesa_x86_cpu_features  
 X86_FEATURE_SSE4_1)
  
  #endif
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] megadriver: explicitly link against glapi, link with -no-undefined

On Thu, Nov 6, 2014 at 8:09 AM, Emil Velikov emil.l.veli...@gmail.com wrote:

 Indeed. static-glapi does not get too much testing, plus it seems that
 it's broken (in a way) for a lng time.
 It seems that we'll have to (temporary) resolve with shoving
 dlopen(libglapi.so) into gbm, so that in time programs can nuke it from
 their codebase. glapi is internal (implementation) detail that they
 should not need to bother/know.

Here you go
https://freedesktop.org/patch/36392/
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

On Thu, Nov 6, 2014 at 1:22 PM, Timothy Arceri t_arc...@yahoo.com.au wrote:
 On Thu, 2014-11-06 at 10:03 -0800, Matt Turner wrote:
 On Thu, Nov 6, 2014 at 4:20 AM, Timothy Arceri t_arc...@yahoo.com.au wrote:
  Also cleans up some if statements in the *faster functions.
 
  Callgrind cpu usage results from pts benchmarks:
 
  For ytile_copy_faster()
 
  Nexuiz 1.6.1: 2.16% - 1.20%
 
  Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
  ---
   src/mesa/Makefile.am   |  8 +++
   src/mesa/drivers/dri/i965/intel_tex_subimage.c | 82 
  ++
   src/mesa/main/fast_rgba8_copy.c| 78 
  
   src/mesa/main/fast_rgba8_copy.h| 37 
   4 files changed, 141 insertions(+), 64 deletions(-)
   create mode 100644 src/mesa/main/fast_rgba8_copy.c
   create mode 100644 src/mesa/main/fast_rgba8_copy.h
 
  diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
  index e71bccb..2402096 100644
  --- a/src/mesa/Makefile.am
  +++ b/src/mesa/Makefile.am
  @@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS)
 
   ARCH_LIBS =
 
  +if SSSE3_SUPPORTED
  +ARCH_LIBS += libmesa_ssse3.la
  +endif
  +
   if SSE41_SUPPORTED
   ARCH_LIBS += libmesa_sse41.la
   endif
  @@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \
  main/streaming-load-memcpy.c
   libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
 
  +libmesa_ssse3_la_SOURCES = \
  +   main/fast_rgba8_copy.c
  +libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3
  +
   pkgconfigdir = $(libdir)/pkgconfig
   pkgconfig_DATA = gl.pc
 
  diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
  b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
  index cb5738a..0deeb75 100644
  --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
  +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
  @@ -27,6 +27,7 @@

  **/
 
   #include main/bufferobj.h
  +#include main/fast_rgba8_copy.h
   #include main/image.h
   #include main/macros.h
   #include main/mtypes.h
  @@ -42,9 +43,7 @@
   #include intel_mipmap_tree.h
   #include intel_blit.h
 
  -#ifdef __SSSE3__
  -#include tmmintrin.h
  -#endif
  +#include x86/common_x86_asm.h
 
   #define FILE_DEBUG_FLAG DEBUG_TEXTURE
 
  @@ -175,18 +174,6 @@ err:
  return false;
   }
 
  -#ifdef __SSSE3__
  -static const uint8_t rgba8_permutation[16] =
  -   { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };
  -
  -/* NOTE: dst must be 16 byte aligned */
  -#define rgba8_copy_16(dst, src) \
  -   *(__m128i *)(dst) = _mm_shuffle_epi8(\
  -  (__m128i) _mm_loadu_ps((float *)(src)),   \
  -  *(__m128i *) rgba8_permutation\
  -   )
  -#endif
  -
   /**
* Copy RGBA to BGRA - swap R and B.
*/
  @@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes)
  uint8_t *d = dst;
  uint8_t const *s = src;
 
  -#ifdef __SSSE3__
  -   /* Fast copying for tile spans.
  -*
  -* As long as the destination texture is 16 aligned,
  -* any 16 or 64 spans we get here should also be 16 aligned.
  -*/
  -
  -   if (bytes == 16) {
  -  assert(!(((uintptr_t)dst)  0xf));
  -  rgba8_copy_16(d+ 0, s+ 0);
  -  return dst;
  -   }
  -
  -   if (bytes == 64) {
  -  assert(!(((uintptr_t)dst)  0xf));
  -  rgba8_copy_16(d+ 0, s+ 0);
  -  rgba8_copy_16(d+16, s+16);
  -  rgba8_copy_16(d+32, s+32);
  -  rgba8_copy_16(d+48, s+48);
  -  return dst;
  -   }
  -#endif
  -
  while (bytes = 4) {
 d[0] = s[2];
 d[1] = s[1];
  @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t 
  x2, uint32_t x3,
 mem_copy_fn mem_copy)
   {
  if (x0 == 0  x3 == xtile_width  y0 == 0  y1 == xtile_height) {
  -  if (mem_copy == memcpy)
  - return xtile_copy(0, 0, xtile_width, xtile_width, 0, 
  xtile_height,
  -   dst, src, src_pitch, swizzle_bit, memcpy);
  -  else if (mem_copy == rgba8_copy)
  - return xtile_copy(0, 0, xtile_width, xtile_width, 0, 
  xtile_height,
  -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
  -   } else {
  -  if (mem_copy == memcpy)
  - return xtile_copy(x0, x1, x2, x3, y0, y1,
  -   dst, src, src_pitch, swizzle_bit, memcpy);
  -  else if (mem_copy == rgba8_copy)
  - return xtile_copy(x0, x1, x2, x3, y0, y1,
  -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
  +  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
  +dst, src, src_pitch, swizzle_bit, mem_copy);
  }
  xtile_copy(x0, x1, x2, x3, y0, y1,
 dst, src, src_pitch, swizzle_bit, mem_copy);
  @@ -388,19 +341,8 @@ ytile_copy_faster(uint32_t x0, uint32_t x1, uint32_t 
  x2, uint32_t x3,
 mem_copy_fn mem_copy)

Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

On 11/06/2014 02:12 PM, Matt Turner wrote:
 On Thu, Nov 6, 2014 at 1:22 PM, Timothy Arceri t_arc...@yahoo.com.au wrote:
 On Thu, 2014-11-06 at 10:03 -0800, Matt Turner wrote:
 On Thu, Nov 6, 2014 at 4:20 AM, Timothy Arceri t_arc...@yahoo.com.au 
 wrote:
 +#include assert.h
 +#include stdint.h
 +#include stddef.h

 I don't think you need these three includes for a single prototype.

 Right I can move assert to the .c
 
 Presumably one of the others can be removed as well? I don't know what
 defines size_t.

stddef.h since C89 at least.

 +
 +/* Fast copying for tile spans.
 + *
 + * As long as the destination texture is 16 aligned,
 + * any 16 or 64 spans we get here should also be 16 aligned.
 + */
 +void *
 +_mesa_fast_rgba8_copy(void *dst, const void *src, size_t n);
 --
 1.9.3

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation forglDrawElements

2014-11-06 Thread Marc Dietrich

Am Donnerstag, 6. November 2014, 11:55:59 schrieb Matt Turner:
 On Wed, Nov 5, 2014 at 12:54 PM, Matt Turner matts...@gmail.com wrote:
  On Wed, Nov 5, 2014 at 12:50 PM, Timothy Arceri t_arc...@yahoo.com.au 
wrote:
  There have been quite a few eyes over this now but nobody has given it a
  reviewed by yet.
  
  Would be nice to get it in before the code freeze. Any takers?
  
  Yes, I'll make sure that happens.
 
 I made a couple of trivial changes to the commit message and added
 some spaces between __m128i and * in casts and pushed it with review.

somehow this managed to successfully break my build - so finally a nice 
testcase :-)

Using CFLAGS=-march=native -fLTO. My cpu does not support sse4.1. During 
compile time, -mno-sse4.1 -msse4.1 is passed, while during linktime, only -
mno-sse4.1 is passed.

usr/lib64/gcc/x86_64-suse-linux/4.9/include/smmintrin.h: In function 
'vbo_get_minmax_indices':
/usr/lib64/gcc/x86_64-suse-linux/4.9/include/smmintrin.h:320:20: error: 
'__builtin_ia32_pmaxud128' needs isa option -m32 -msse4.1
   return (__m128i) __builtin_ia32_pmaxud128 ((__v4si)__X, (__v4si)__Y);
^
/usr/lib64/gcc/x86_64-suse-linux/4.9/include/smmintrin.h:314:20: error: 
'__builtin_ia32_pminud128' needs isa option -m32 -msse4.1
   return (__m128i) __builtin_ia32_pminud128 ((__v4si)__X, (__v4si)__Y);
^
make[4]: *** [/tmp/cc5QZcUd.ltrans1.ltrans.o] Error 1

So opposite to my earlier thoughts, the -m flags are not kept per-file 
during link. On the other hand, removing -march=native gives:

  CXXLDgallium_dri.la
../../src/mesa/vbo/vbo_exec_array.c: In function 'vbo_get_minmax_indices':
../../src/mesa/vbo/vbo_exec_array.c:197:0: internal compiler error: in 
propagate_rhs_into_lhs, at tree-ssa-dom.c:2913
Please submit a full bug report,
with preprocessed source if appropriate.
See http://bugs.opensuse.org/ for instructions.

with gcc-4.9.2 from opensuse/factory, so who knows what is really happening.

Trying with gcc-4.8 also fails like in the first example, even without 
march=native, maybe because the default march is x86_64 which also does not 
support see4.1.

So this is only a problem with Link-Time-Opt.

Marc

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation forglDrawElements

On Thu, Nov 6, 2014 at 2:33 PM, Marc Dietrich marvi...@gmx.de wrote:
 So this is only a problem with Link-Time-Opt.

I don't actually know how you're getting to this point in the build
with LTO. It fails for me in src/mapi.

I think LTO is something we should make work at some point, but I
don't think we should gate contributions on whether LTO works or not.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

2014-11-06 Thread Chad Versace


On Thu 06 Nov 2014, Timothy Arceri wrote:

Also cleans up some if statements in the *faster functions.


I have comments about the cleanup below.


diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c 
b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
index cb5738a..0deeb75 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c


/**
* Copy texture data from linear to X tile layout, faster.
*
* Same as \ref xtile_copy but faster, because it passes constant parameters
* for common cases, allowing the compiler to inline code optimized for those
* cases.
*
* \copydoc tile_copy_fn
*/
static FLATTEN void
xtile_copy_faster(...)


@@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, 
uint32_t x3,
  mem_copy_fn mem_copy)




{
   if (x0 == 0  x3 == xtile_width  y0 == 0  y1 == xtile_height) {
-  if (mem_copy == memcpy)
- return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
-   dst, src, src_pitch, swizzle_bit, memcpy);
-  else if (mem_copy == rgba8_copy)
- return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
-   dst, src, src_pitch, swizzle_bit, rgba8_copy);
-   } else {
-  if (mem_copy == memcpy)
- return xtile_copy(x0, x1, x2, x3, y0, y1,
-   dst, src, src_pitch, swizzle_bit, memcpy);
-  else if (mem_copy == rgba8_copy)
- return xtile_copy(x0, x1, x2, x3, y0, y1,
-   dst, src, src_pitch, swizzle_bit, rgba8_copy);
+  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
+dst, src, src_pitch, swizzle_bit, mem_copy);
   }
   xtile_copy(x0, x1, x2, x3, y0, y1,
  dst, src, src_pitch, swizzle_bit, mem_copy);


The cleanup of this if tree concerns me. Accoring the function
comment, the original author of this function, fjhenigman, clearly 
created the weird 'if' tree with the intentation that the compiler would 
inline code optimized for those cases.


Without one of the following, I object to this cleanup:
   - Frank's approval, or
   - Proof that gcc never does the desired optimizations, or
   - Proof that this change does not harm's Chrome's texture upload 
   performance.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

I tested your patch with the teximage program in mesa demos, the
same thing I used to benchmark when I developed this code.
As Matt and Chad point out, the odd-looking _faster functions are
there for a reason.  Your change causes a huge slowdown.
I tested on a sandybridge system with a Intel(R) Celeron(R) CPU 857 @
1.20GHz.  Mesa compiled with -O2.

original code:
  TexSubImage(RGBA/ubyte 256 x 256): 9660.4 images/sec, 2415.1 MB/sec
  TexSubImage(RGBA/ubyte 1024 x 1024): 821.2 images/sec, 3284.7 MB/sec
  TexSubImage(RGBA/ubyte 4096 x 4096): 76.3 images/sec, 4884.9 MB/sec

  TexSubImage(BGRA/ubyte 256 x 256): 11307.1 images/sec, 2826.8 MB/sec
  TexSubImage(BGRA/ubyte 1024 x 1024): 944.6 images/sec, 3778.6 MB/sec
  TexSubImage(BGRA/ubyte 4096 x 4096): 76.7 images/sec, 4908.3 MB/sec

  TexSubImage(L/ubyte 256 x 256): 17847.5 images/sec, 1115.5 MB/sec
  TexSubImage(L/ubyte 1024 x 1024): 3068.2 images/sec, 3068.2 MB/sec
  TexSubImage(L/ubyte 4096 x 4096): 224.6 images/sec, 3593.0 MB/sec

your code:
  TexSubImage(RGBA/ubyte 256 x 256): 3271.6 images/sec, 817.9 MB/sec
  TexSubImage(RGBA/ubyte 1024 x 1024): 232.3 images/sec, 929.2 MB/sec
  TexSubImage(RGBA/ubyte 4096 x 4096): 47.5 images/sec, 3038.6 MB/sec

  TexSubImage(BGRA/ubyte 256 x 256): 2426.5 images/sec, 606.6 MB/sec
  TexSubImage(BGRA/ubyte 1024 x 1024): 164.1 images/sec, 656.4 MB/sec
  TexSubImage(BGRA/ubyte 4096 x 4096): 13.4 images/sec, 854.8 MB/sec

  TexSubImage(L/ubyte 256 x 256): 9514.5 images/sec, 594.7 MB/sec
  TexSubImage(L/ubyte 1024 x 1024): 864.1 images/sec, 864.1 MB/sec
  TexSubImage(L/ubyte 4096 x 4096): 59.7 images/sec, 955.2 MB/sec

This is just one run, not an average, but you can see it's slower
across the board up to a factor of around 6.
Also I couldn't configure the build after your patch.  I think you
left out a change to configure.ac to define SSSE3_SUPPORTED.

On Thu, Nov 6, 2014 at 6:08 PM, Chad Versace chad.vers...@intel.com wrote:
 On Thu 06 Nov 2014, Timothy Arceri wrote:

 Also cleans up some if statements in the *faster functions.


 I have comments about the cleanup below.

 diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
 b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
 index cb5738a..0deeb75 100644
 --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
 +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c


 /**
 * Copy texture data from linear to X tile layout, faster.
 *
 * Same as \ref xtile_copy but faster, because it passes constant parameters
 * for common cases, allowing the compiler to inline code optimized for those
 * cases.
 *
 * \copydoc tile_copy_fn
 */
 static FLATTEN void
 xtile_copy_faster(...)

 @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t
 x2, uint32_t x3,
   mem_copy_fn mem_copy)



 {
if (x0 == 0  x3 == xtile_width  y0 == 0  y1 == xtile_height) {
 -  if (mem_copy == memcpy)
 - return xtile_copy(0, 0, xtile_width, xtile_width, 0,
 xtile_height,
 -   dst, src, src_pitch, swizzle_bit, memcpy);
 -  else if (mem_copy == rgba8_copy)
 - return xtile_copy(0, 0, xtile_width, xtile_width, 0,
 xtile_height,
 -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
 -   } else {
 -  if (mem_copy == memcpy)
 - return xtile_copy(x0, x1, x2, x3, y0, y1,
 -   dst, src, src_pitch, swizzle_bit, memcpy);
 -  else if (mem_copy == rgba8_copy)
 - return xtile_copy(x0, x1, x2, x3, y0, y1,
 -   dst, src, src_pitch, swizzle_bit, rgba8_copy);
 +  return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height,
 +dst, src, src_pitch, swizzle_bit, mem_copy);
}
xtile_copy(x0, x1, x2, x3, y0, y1,
   dst, src, src_pitch, swizzle_bit, mem_copy);


 The cleanup of this if tree concerns me. Accoring the function
 comment, the original author of this function, fjhenigman, clearly created
 the weird 'if' tree with the intentation that the compiler would inline
 code optimized for those cases.

 Without one of the following, I object to this cleanup:
- Frank's approval, or
- Proof that gcc never does the desired optimizations, or
- Proof that this change does not harm's Chrome's texture upload
 performance.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy

On Thu, Nov 6, 2014 at 7:30 PM, Frank Henigman fjhenig...@google.com wrote:

 Also I couldn't configure the build after your patch.  I think you
 left out a change to configure.ac to define SSSE3_SUPPORTED.

Ah, that was in patch 1/2.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Siavash Eliasi

How and when is cpu_has_sse4_1 true? Is it controllable at runtime 
through setting some environmental variable? or is it set once during 
startup by detecting CPU features?


I guess checking for cpu_has_sse4_1 is unnecessary if it isn't 
controllable by user at runtime; because USE_SSE41 is a compile time 
check and requires the target machine to be SSE 4.1 capable already.


Best regards,
Siavash Eliasi.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

On Thu, Nov 6, 2014 at 1:30 AM, Siavash Eliasi siavashser...@gmail.com wrote:
 How and when is cpu_has_sse4_1 true? Is it controllable at runtime through
 setting some environmental variable? or is it set once during startup by
 detecting CPU features?

It's actually a macro, but yes, see the end of
src/mesa/x86/common_x86.c. It's set by using the CPUID instruction to
detect SSE 4.1 capabilities.

  if (ecx  bit_SSE4_1)
 _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1;

 I guess checking for cpu_has_sse4_1 is unnecessary if it isn't
 controllable by user at runtime; because USE_SSE41 is a compile time check
 and requires the target machine to be SSE 4.1 capable already.

Right.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/blorp: Fix hiz ops on MSAA surfaces

2014-11-06 Thread Anuj Phogat

On Tue, Nov 4, 2014 at 8:40 AM, Chris Forbes chr...@ijw.co.nz wrote:

 Two things were broken here:
 - The depth/stencil surface dimensions were broken for MSAA.
 - Sample count was programmed incorrectly.

 Result was the depth resolve didn't work correctly on MSAA surfaces, and
 so sampling the surface later produced garbage.

 Fixes the new piglit test arb_texture_multisample-sample-depth, and
 various artifacts in 'tesseract' with msaa=4 glineardepth=0.

 Not observed any piglit regressions on Haswell.

 Signed-off-by: Chris Forbes chr...@ijw.co.nz
 ---
  src/mesa/drivers/dri/i965/brw_blorp.h|  4 
  src/mesa/drivers/dri/i965/gen7_blorp.cpp | 24 +---
  2 files changed, 21 insertions(+), 7 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h
 b/src/mesa/drivers/dri/i965/brw_blorp.h
 index ff68000..c4ff0f7 100644
 --- a/src/mesa/drivers/dri/i965/brw_blorp.h
 +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
 @@ -236,6 +236,10 @@ public:
 bool use_wm_prog;
 brw_blorp_wm_push_constants wm_push_consts;
 bool color_write_disable[4];
 +
 +   unsigned num_samples() const {
 +  return dst.mt ? dst.num_samples : depth.mt-num_samples;
 +   }
  };


 diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
 b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
 index 206a6ff..cc57ffe 100644
 --- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
 +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
 @@ -415,7 +415,7 @@ gen7_blorp_emit_sf_config(struct brw_context *brw,
OUT_BATCH(_3DSTATE_SF  16 | (7 - 2));
OUT_BATCH(params-depth_format 
  GEN7_SF_DEPTH_BUFFER_SURFACE_FORMAT_SHIFT);
 -  OUT_BATCH(params-dst.num_samples  1 ? GEN6_SF_MSRAST_ON_PATTERN :
 0);
 +  OUT_BATCH(params-num_samples()  1 ? GEN6_SF_MSRAST_ON_PATTERN :
 0);
OUT_BATCH(0);
OUT_BATCH(0);
OUT_BATCH(0);
 @@ -470,7 +470,7 @@ gen7_blorp_emit_wm_config(struct brw_context *brw,
dw1 |= GEN7_WM_DISPATCH_ENABLE; /* We are rendering */
 }

 -  if (params-dst.num_samples  1) {
 +  if (params-num_samples()  1) {
   dw1 |= GEN7_WM_MSRAST_ON_PATTERN;
   if (prog_data  prog_data-persample_msaa_dispatch)
  dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE;
 @@ -661,8 +661,17 @@ gen7_blorp_emit_depth_stencil_config(struct
 brw_context *brw,
 * larger to allow the fast depth clear to fit the hardware
 * alignment requirements. (8x4)
 */
 -  surfwidth = params-depth.width;
 -  surfheight = params-depth.height;
 +
 +  if (params-num_samples()  1) {
 + /* If this is an MSAA + HIZ op, we need to program the
 +  * aligned logical size of the depth surface.
 +  */
 + surfwidth = ALIGN(params-depth.mt-logical_width0, 8);
 + surfheight = ALIGN(params-depth.mt-logical_height0, 4);
 +  } else {
 + surfwidth = params-depth.width;
 + surfheight = params-depth.height;
 +  }
 } else {
surfwidth = params-depth.mt-logical_width0;
surfheight = params-depth.mt-logical_height0;
 @@ -805,10 +814,11 @@ gen7_blorp_exec(struct brw_context *brw,
 uint32_t sampler_offset = 0;

 uint32_t prog_offset = params-get_wm_prog(brw, prog_data);
 -   gen6_emit_3dstate_multisample(brw, params-dst.num_samples);
 +   unsigned num_samples = params-num_samples();
 +   gen6_emit_3dstate_multisample(brw, num_samples);
 gen6_emit_3dstate_sample_mask(brw,
 - params-dst.num_samples  1 ?
 - (1  params-dst.num_samples) - 1 : 1);
 + num_samples  1 ?
 + (1  num_samples) - 1 : 1);
 gen6_blorp_emit_state_base_address(brw, params);
 gen6_blorp_emit_vertices(brw, params);
 gen7_blorp_emit_urb_config(brw, params);
 --
 2.1.2

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Looks good to me. Verified the requirement in IVB PRM.
Reviewed-by: Anuj Phogat anuj.pho...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)

2014-11-06 Thread Michel Dänzer


On 06.11.2014 19:18, Joonas Lahtinen wrote:

On to, 2014-11-06 at 18:12 +0900, Michel Dänzer wrote:

On 05.11.2014 20:14, Joonas Lahtinen wrote:


Modified not refer to DRI3, just uses the present extension to get rid
of the excess buffer invalidations.


AFAICT there's no fallback from your changes to the current behaviour if
the X server doesn't support the Present extension. There probably needs
to be such a fallback.


It gets rid of such nasty hack (the intel_viewport one), that I thought
there is no point making fallback. Because without this, the egl dri2
backend is fundamentally broken anyway.


Well, AFAICT your code uses Present extension functionality 
unconditionally, without checking that the X server supports Present. I 
can't see how that could possibly work on an X server which doesn't 
support Present, but I think it would be better to keep it working at 
least as badly as it does now in that case. :)



--
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/blorp: Fix hiz ops on MSAA surfaces

2014-11-06 Thread Chris Forbes

Thanks for the review. Ken has a slightly cleaner version of the patch
which avoids adding the helper function.
On Nov 7, 2014 2:29 AM, Anuj Phogat anuj.pho...@gmail.com wrote:



 On Tue, Nov 4, 2014 at 8:40 AM, Chris Forbes chr...@ijw.co.nz wrote:

 Two things were broken here:
 - The depth/stencil surface dimensions were broken for MSAA.
 - Sample count was programmed incorrectly.

 Result was the depth resolve didn't work correctly on MSAA surfaces, and
 so sampling the surface later produced garbage.

 Fixes the new piglit test arb_texture_multisample-sample-depth, and
 various artifacts in 'tesseract' with msaa=4 glineardepth=0.

 Not observed any piglit regressions on Haswell.

 Signed-off-by: Chris Forbes chr...@ijw.co.nz
 ---
  src/mesa/drivers/dri/i965/brw_blorp.h|  4 
  src/mesa/drivers/dri/i965/gen7_blorp.cpp | 24 +---
  2 files changed, 21 insertions(+), 7 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h
 b/src/mesa/drivers/dri/i965/brw_blorp.h
 index ff68000..c4ff0f7 100644
 --- a/src/mesa/drivers/dri/i965/brw_blorp.h
 +++ b/src/mesa/drivers/dri/i965/brw_blorp.h
 @@ -236,6 +236,10 @@ public:
 bool use_wm_prog;
 brw_blorp_wm_push_constants wm_push_consts;
 bool color_write_disable[4];
 +
 +   unsigned num_samples() const {
 +  return dst.mt ? dst.num_samples : depth.mt-num_samples;
 +   }
  };


 diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
 b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
 index 206a6ff..cc57ffe 100644
 --- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp
 +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp
 @@ -415,7 +415,7 @@ gen7_blorp_emit_sf_config(struct brw_context *brw,
OUT_BATCH(_3DSTATE_SF  16 | (7 - 2));
OUT_BATCH(params-depth_format 
  GEN7_SF_DEPTH_BUFFER_SURFACE_FORMAT_SHIFT);
 -  OUT_BATCH(params-dst.num_samples  1 ? GEN6_SF_MSRAST_ON_PATTERN
 : 0);
 +  OUT_BATCH(params-num_samples()  1 ? GEN6_SF_MSRAST_ON_PATTERN :
 0);
OUT_BATCH(0);
OUT_BATCH(0);
OUT_BATCH(0);
 @@ -470,7 +470,7 @@ gen7_blorp_emit_wm_config(struct brw_context *brw,
dw1 |= GEN7_WM_DISPATCH_ENABLE; /* We are rendering */
 }

 -  if (params-dst.num_samples  1) {
 +  if (params-num_samples()  1) {
   dw1 |= GEN7_WM_MSRAST_ON_PATTERN;
   if (prog_data  prog_data-persample_msaa_dispatch)
  dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE;
 @@ -661,8 +661,17 @@ gen7_blorp_emit_depth_stencil_config(struct
 brw_context *brw,
 * larger to allow the fast depth clear to fit the hardware
 * alignment requirements. (8x4)
 */
 -  surfwidth = params-depth.width;
 -  surfheight = params-depth.height;
 +
 +  if (params-num_samples()  1) {
 + /* If this is an MSAA + HIZ op, we need to program the
 +  * aligned logical size of the depth surface.
 +  */
 + surfwidth = ALIGN(params-depth.mt-logical_width0, 8);
 + surfheight = ALIGN(params-depth.mt-logical_height0, 4);
 +  } else {
 + surfwidth = params-depth.width;
 + surfheight = params-depth.height;
 +  }
 } else {
surfwidth = params-depth.mt-logical_width0;
surfheight = params-depth.mt-logical_height0;
 @@ -805,10 +814,11 @@ gen7_blorp_exec(struct brw_context *brw,
 uint32_t sampler_offset = 0;

 uint32_t prog_offset = params-get_wm_prog(brw, prog_data);
 -   gen6_emit_3dstate_multisample(brw, params-dst.num_samples);
 +   unsigned num_samples = params-num_samples();
 +   gen6_emit_3dstate_multisample(brw, num_samples);
 gen6_emit_3dstate_sample_mask(brw,
 - params-dst.num_samples  1 ?
 - (1  params-dst.num_samples) - 1 : 1);
 + num_samples  1 ?
 + (1  num_samples) - 1 : 1);
 gen6_blorp_emit_state_base_address(brw, params);
 gen6_blorp_emit_vertices(brw, params);
 gen7_blorp_emit_urb_config(brw, params);
 --
 2.1.2

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev


 Looks good to me. Verified the requirement in IVB PRM.
 Reviewed-by: Anuj Phogat anuj.pho...@gmail.com

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/3] egl_dri2: add a note about dri2_create_screen

The function is not called by platform_drm. As such one needs to
pay special attention at teardown.

Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
---
 src/egl/drivers/dri2/egl_dri2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 609afde..2094ffd 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -543,6 +543,9 @@ dri2_setup_screen(_EGLDisplay *disp)
}
 }
 
+/* All platforms but DRM call this function to create the screen, query the
+ * dri extensions, setup the vtables and populate the driver_configs.
+ * DRM inherits all that information from its display - GBM. */
 EGLBoolean
 dri2_create_screen(_EGLDisplay *disp)
 {
-- 
2.1.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/3] egl_dri2: fix double free on drm platforms

Earlier commit failed to attribure that for drm platforms one does not
call dri2_create_screen, thus it does not create the screen and
driver_configs but inherits them from the display - gbm.

As such wrap cleanup in Platform != _EGL_PLATFORM_DRM to prevent
the issue and still cleanup correctly for non-drm platforms.

Cc: Kenneth Graunke kenn...@whitecape.org
Cc: Mark Janes mark.a.ja...@intel.com
Reported-by: Kenneth Graunke kenn...@whitecape.org
Reported-by: Mark Janes mark.a.ja...@intel.com
Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
---
 src/egl/drivers/dri2/egl_dri2.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index dcc3239..609afde 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -707,9 +707,18 @@ dri2_terminate(_EGLDriver *drv, _EGLDisplay *disp)
   break;
}
 
+   /* The drm platform does not create the screen/driver_configs but reuses
+* the ones from the gbm device. As such the gbm itself is responsible
+* for the cleanup. */
+#ifdef HAVE_DRM_PLATFORM
+   if (disp-Platform != _EGL_PLATFORM_DRM) {
+#endif
for (i = 0; dri2_dpy-driver_configs[i]; i++)
   free((__DRIconfig *) dri2_dpy-driver_configs[i]);
free(dri2_dpy-driver_configs);
+#ifdef HAVE_DRM_PLATFORM
+   }
+#endif
free(dri2_dpy);
disp-DriverData = NULL;
 
-- 
2.1.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/3] gbm/dri: cleanup memory leak on teardown

During teardown we free the driver_configs list pointer, but we forget
to deallocate each config in that list.

Signed-off-by: Emil Velikov emil.l.veli...@gmail.com
---
 src/gbm/backends/dri/gbm_dri.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c
index f637e32..0664266 100644
--- a/src/gbm/backends/dri/gbm_dri.c
+++ b/src/gbm/backends/dri/gbm_dri.c
@@ -914,8 +914,11 @@ static void
 dri_destroy(struct gbm_device *gbm)
 {
struct gbm_dri_device *dri = gbm_dri_device(gbm);
+   unsigned i;
 
dri-core-destroyScreen(dri-screen);
+   for (i = 0; dri-driver_configs[i]; i++)
+  free((__DRIconfig *) dri-driver_configs[i]);
free(dri-driver_configs);
dlclose(dri-driver);
free(dri-base.driver_name);
-- 
2.1.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] Bogus bounds checking in api_validate.c?

While working on some other things, I came across some bounds checking
code in _mesa_validate_DrawElements (and related functions) in
api_validate.c.

  /* use indices in the buffer object */
  /* make sure count doesn't go outside buffer bounds */
  if (index_bytes(type, count)  ctx-Array.VAO-IndexBufferObj-Size) {
 _mesa_warning(ctx, glDrawElements index out of buffer bounds);
 return GL_FALSE;
  }

index_bytes calculates how many bytes of data count indices will
occupy based on the type.  The problem is that this doesn't consider
the base pointer.  As far as I can tell, if I had a 64 byte buffer
object for my index data, and I did

glDrawElements(GL_POINTS, 16, GL_UNSIGNED_INT, 60);

_mesa_validate_DrawElements would say, Ok!

Am I missing something, or is this just broken?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote:
 Then I do recommend removing the if (cpu_has_sse4_1) from this patch and
 similar places, because there is no runtime CPU dispatching happening for
 SSE optimized code paths in action and just adds extra overhead (unnecessary
 branches) to the generated code.

No. Sorry, I realize I misread your previous question:

 I guess checking for cpu_has_sse4_1 is unnecessary if it isn't
 controllable by user at runtime; because USE_SSE41 is a compile time check
 and requires the target machine to be SSE 4.1 capable already.

USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
to build the code and then use it only on systems that actually
support it.

All of this could have been pretty easily answered by a few greps though...
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)

On 11/06/2014 06:16 PM, Michel Dänzer wrote:
 On 06.11.2014 19:18, Joonas Lahtinen wrote:
 On to, 2014-11-06 at 18:12 +0900, Michel Dänzer wrote:
 On 05.11.2014 20:14, Joonas Lahtinen wrote:

 Modified not refer to DRI3, just uses the present extension to get rid
 of the excess buffer invalidations.

 AFAICT there's no fallback from your changes to the current behaviour if
 the X server doesn't support the Present extension. There probably needs
 to be such a fallback.

 It gets rid of such nasty hack (the intel_viewport one), that I thought
 there is no point making fallback. Because without this, the egl dri2
 backend is fundamentally broken anyway.
 
 Well, AFAICT your code uses Present extension functionality
 unconditionally, without checking that the X server supports Present. I
 can't see how that could possibly work on an X server which doesn't
 support Present, but I think it would be better to keep it working at
 least as badly as it does now in that case. :)

I was going to say pretty much the same thing.  Aren't there (non-Intel)
drivers that don't do Present?  If I'm not mistaken, some parts of DRI3
(not sure about Present) are even disabled in the Intel driver when SNA
is in use... or at least that was the case at one point.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] mesa: Remove _mesa_max_buffer_index

From: Ian Romanick ian.d.roman...@intel.com

It appears to be completely unused since f9be8543 (February 2012).

Signed-off-by: Ian Romanick ian.d.roman...@intel.com
Cc: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/main/api_validate.c | 46 
 src/mesa/main/api_validate.h |  6 --
 2 files changed, 52 deletions(-)

diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
index 0d206d8..d4b962f 100644
--- a/src/mesa/main/api_validate.c
+++ b/src/mesa/main/api_validate.c
@@ -77,52 +77,6 @@ index_bytes(GLenum type, GLsizei count)
 
 
 /**
- * Find the max index in the given element/index buffer
- */
-GLuint
-_mesa_max_buffer_index(struct gl_context *ctx, GLuint count, GLenum type,
-   const void *indices,
-   struct gl_buffer_object *elementBuf)
-{
-   const GLubyte *map = NULL;
-   GLuint max = 0;
-   GLuint i;
-
-   if (_mesa_is_bufferobj(elementBuf)) {
-  /* elements are in a user-defined buffer object.  need to map it */
-  map = ctx-Driver.MapBufferRange(ctx, 0, elementBuf-Size,
-  GL_MAP_READ_BIT, elementBuf,
-   MAP_INTERNAL);
-  /* Actual address is the sum of pointers */
-  indices = (const GLvoid *) ADD_POINTERS(map, (const GLubyte *) indices);
-   }
-
-   if (type == GL_UNSIGNED_INT) {
-  for (i = 0; i  count; i++)
- if (((GLuint *) indices)[i]  max)
-max = ((GLuint *) indices)[i];
-   }
-   else if (type == GL_UNSIGNED_SHORT) {
-  for (i = 0; i  count; i++)
- if (((GLushort *) indices)[i]  max)
-max = ((GLushort *) indices)[i];
-   }
-   else {
-  ASSERT(type == GL_UNSIGNED_BYTE);
-  for (i = 0; i  count; i++)
- if (((GLubyte *) indices)[i]  max)
-max = ((GLubyte *) indices)[i];
-   }
-
-   if (map) {
-  ctx-Driver.UnmapBuffer(ctx, elementBuf, MAP_INTERNAL);
-   }
-
-   return max;
-}
-
-
-/**
  * Check if OK to draw arrays/elements.
  */
 static GLboolean
diff --git a/src/mesa/main/api_validate.h b/src/mesa/main/api_validate.h
index 8238df1..0bb91c6 100644
--- a/src/mesa/main/api_validate.h
+++ b/src/mesa/main/api_validate.h
@@ -35,12 +35,6 @@ struct gl_context;
 struct gl_transform_feedback_object;
 
 
-extern GLuint
-_mesa_max_buffer_index(struct gl_context *ctx, GLuint count, GLenum type,
-   const void *indices,
-   struct gl_buffer_object *elementBuf);
-
-
 extern bool
 _mesa_is_valid_prim_mode(struct gl_context *ctx, GLenum mode);
 
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Bogus bounds checking in api_validate.c?

On Thursday, November 06, 2014 08:09:18 PM Ian Romanick wrote:
 While working on some other things, I came across some bounds checking
 code in _mesa_validate_DrawElements (and related functions) in
 api_validate.c.
 
   /* use indices in the buffer object */
   /* make sure count doesn't go outside buffer bounds */
   if (index_bytes(type, count)  ctx-Array.VAO-IndexBufferObj-Size) {
  _mesa_warning(ctx, glDrawElements index out of buffer bounds);
  return GL_FALSE;
   }
 
 index_bytes calculates how many bytes of data count indices will
 occupy based on the type.  The problem is that this doesn't consider
 the base pointer.  As far as I can tell, if I had a 64 byte buffer
 object for my index data, and I did
 
 glDrawElements(GL_POINTS, 16, GL_UNSIGNED_INT, 60);
 
 _mesa_validate_DrawElements would say, Ok!
 
 Am I missing something, or is this just broken?

It sure seems broken to me - but, thankfully, in a conservative fashion.  (It 
will say some invalid things are OK, but won't say legal things are invalid.)

Software drivers may be relying on this working to avoid a crash.

I checked the Ivybridge documentation, and found:

Software is responsible for ensuring that accesses outside the IB do not
 occur. This is possible as software can compute the range of IB values
 referenced by a 3DPRIMITIVE command (knowing the StartVertexLocation,
 InstanceCount, and VerticesPerInstance values) and can then compare this
 range to the IB extent.

which makes it sound like an accurate computation is necessary.  But, right 
below that, it says:

this field contains the address of the last valid byte in the index buffer.
 Any index buffer reads past this address returns an index value of 0 (as if
 the index buffer was zero-extended).

So the earlier statement is false; i965 will draw the in-bounds elements 
correctly, and then repeat element 0 over and over for any out-of-bounds data, 
resulting in one strange primitive and a lot of degenerate ones.

It's proabbly worth fixing, but I doubt it's critical either.

A more interesting thing to fix, I think, would be enforcing alignment 
restrictions (i.e. your offset has to be a multiple of the IB element size).

--Ken

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa: Remove _mesa_max_buffer_index

On Thursday, November 06, 2014 11:00:13 PM Ian Romanick wrote:
 From: Ian Romanick ian.d.roman...@intel.com

 It appears to be completely unused since f9be8543 (February 2012).

 Signed-off-by: Ian Romanick ian.d.roman...@intel.com
 Cc: Kenneth Graunke kenn...@whitecape.org

Yep, looks unused to me.

Reviewed-by: Kenneth Graunke kenn...@whitecape.org

My 2012 commit message in f9be8543 is not quite true - glDrawRangeElements is 
actually useful even when working with VBOs.  The driver can use the given 
range to decide what portion of a VBO is busy/going to be read by the GPU.  It 
can then optimize glBufferSubData and glMapBufferRange to avoid unnecessary 
synchronization when an application tries to overwrite an unused subregion.

It turns out that's also rather crucial for performance in many modern apps.

--Ken

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/3] egl_dri2: fix double free on drm platforms