Re: [Mesa-dev] [PATCH 1/2] i965/fs: Remove FS_OPCODE_BLORP_FB_WRITE.
On Thu, Aug 28, 2014 at 12:50 PM, Matt Turner matts...@gmail.com wrote: On Tue, Aug 12, 2014 at 9:48 AM, Kenneth Graunke kenn...@whitecape.org wrote: diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index d3509a0..19f7ef8 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -186,7 +186,7 @@ fs_generator::generate_fb_write(fs_inst *inst) brw_imm_ud(0x1 11)); } -if (inst-target 0) { +if (fp inst-target 0) { /* Set the render target index for choosing BLEND_STATE. */ brw_MOV(p, retype(brw_vec1_reg(BRW_MESSAGE_REGISTER_FILE, inst-base_mrf, 2), I don't understand this change. I don't see fp being used elsewhere in this patch. Patch 2 is Reviewed-by: Matt Turner matts...@gmail.com and so is this one once this hunk is sorted out. Ping? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)
On 05.11.2014 20:14, Joonas Lahtinen wrote: Modified not refer to DRI3, just uses the present extension to get rid of the excess buffer invalidations. AFAICT there's no fallback from your changes to the current behaviour if the X server doesn't support the Present extension. There probably needs to be such a fallback. -- Earthling Michel Dänzer| http://www.amd.com Libre software enthusiast |Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 84566] Unify the format conversion code
https://bugs.freedesktop.org/show_bug.cgi?id=84566 --- Comment #49 from Iago Toral ito...@igalia.com --- Jason you had this commit in your original branch: MAYBEREVERT: Fill X components with 1 That basically makes packing to padded formats (like RGBX) set X=1. In the commit message you mention that you are not sure you like this... my opinion is that this should not be necessary and it degrades performance when packing to these format, so I would be more for removing this one. What do you think? FWIW, I have checked that removing this commit does not affect piglit results. -- You are receiving this mail because: You are the QA Contact for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3][RFC v2] mesa/main/x86: Add sse2 streaming clamping
On 05.11.2014 21:21, Ian Romanick wrote: On 11/04/2014 01:24 PM, Roland Scheidegger wrote: Am 04.11.2014 um 13:05 schrieb Juha-Pekka Heikkila: + for(i = 0; i n; i++) { + _mesa_clamp_float_rgba(rgba_src[i], temp, min, max); + + *operand = _mm_mul_ps(multiplier, *operand); + truncated_integers = _mm_cvttps_epi32(*operand); + mmove = _mm_set_ps(aMap[map_p[ACOMP]], bMap[map_p[BCOMP]], + gMap[map_p[GCOMP]], rMap[map_p[RCOMP]] ); + + _mm_storeu_ps(rgba_dst[i], mmove); The sse2 code at the end looks counterproductive to me. Not sure what gcc will generate but I'd suspect it involves some simd-int domain transition for the table lookups, plus another int-simd transition to get the values back into simd domain (alternatively it might use stores/load here) just so you can store them again... It would probably be better to just store the values directly after the table lookups. But in any case actually I'm beginning to suspect noone really cares about performance anyway for that path (who the hell uses these scale/map features?) so whatever works... Which raises another question... do we have any piglit tests that actually exercise this path? No we don't. I made small test for this to see how it works, I was planning to move my test to Piglit later. /Juha-Pekka ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)
On to, 2014-11-06 at 18:12 +0900, Michel Dänzer wrote: On 05.11.2014 20:14, Joonas Lahtinen wrote: Modified not refer to DRI3, just uses the present extension to get rid of the excess buffer invalidations. AFAICT there's no fallback from your changes to the current behaviour if the X server doesn't support the Present extension. There probably needs to be such a fallback. It gets rid of such nasty hack (the intel_viewport one), that I thought there is no point making fallback. Because without this, the egl dri2 backend is fundamentally broken anyway. Regards, Joonas ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)
On ke, 2014-11-05 at 15:19 +, Emil Velikov wrote: Hi Joonas, Does getting rid of the viewport hack give you any noticeable performance improvement? Yes, it significantly reduces the CPU load when multiple glViewport calls are made per frame (4x4 grid or so). Is there any interest in converting the egl_dri2 backend to dri3, rather than just copying over the present bits ? This could be one thing to do. But in the meanwhile, I would commit this present extension patch so that the affected use cases get the improvements. Regards, Joonas On 05/11/14 11:14, Joonas Lahtinen wrote: Hi, Modified not refer to DRI3, just uses the present extension to get rid of the excess buffer invalidations. Regards, Joonas From 257e2a8c750f9dcf868cce9da8632df3cae0fcec Mon Sep 17 00:00:00 2001 From: Joonas Lahtinen joonas.lahti...@linux.intel.com Date: Wed, 5 Nov 2014 12:25:32 +0200 Subject: [PATCH] egl: dri2: Use present extension. Present extension is used to avoid excess buffer invalidations, because the XCB interface doesn't supply that information. Signed-off-by: Daniel van der Wath danielx.j.van.der.w...@intel.com Signed-off-by: Joonas Lahtinen joonas.lahti...@linux.intel.com --- configure.ac|5 +- src/egl/drivers/dri2/egl_dri2.c |2 +- src/egl/drivers/dri2/egl_dri2.h | 24 ++- src/egl/drivers/dri2/platform_x11.c | 247 --- src/mesa/drivers/dri/i965/brw_context.c |9 +- 5 files changed, 262 insertions(+), 25 deletions(-) diff --git a/configure.ac b/configure.ac index fc7d372..75d90c0 100644 --- a/configure.ac +++ b/configure.ac @@ -952,7 +952,8 @@ xyesno) fi if test x$enable_dri = xyes; then - dri_modules=$dri_modules xcb-dri2 = $XCBDRI2_REQUIRED + PKG_CHECK_MODULES([PRESENTPROTO], [presentproto = $PRESENTPROTO_REQUIRED]) + dri_modules=$dri_modules xcb-dri2 = $XCBDRI2_REQUIRED xcb-present Afaics you are not changing anything on the dri modules (or glx/dri2) to require the above changes. Perhaps you need to push the presentproto check in the x11 case below ? fi if test x$enable_dri3 = xyes; then @@ -1564,7 +1565,7 @@ for plat in $egl_platforms; do ;; x11) - PKG_CHECK_MODULES([XCB_DRI2], [x11-xcb xcb xcb-dri2 = $XCBDRI2_REQUIRED xcb-xfixes]) + PKG_CHECK_MODULES([XCB_DRI2], [x11-xcb xcb xcb-dri2 = $XCBDRI2_REQUIRED xcb-xfixes xcb-present]) ;; drm) [snip] diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c index f8c4b70..a1445b2 100644 --- a/src/egl/drivers/dri2/platform_x11.c +++ b/src/egl/drivers/dri2/platform_x11.c @@ -188,6 +188,205 @@ get_xcb_screen(xcb_screen_iterator_t iter, int screen) return NULL; } +/* + * Called by the XCB_PRESENT_COMPLETE_NOTIFY case. + */ +static void +dri2_update_num_back(struct dri2_egl_surface *priv) +{ + priv-num_back = 1; + if (priv-flipping) + priv-num_back++; + if (priv-base.SwapInterval == 0) + priv-num_back++; +} + This seems to be out of sync with dri3_glx. Don't you need something similar to commit f7a36ef5fe23056299a77414f9ad8b5e5a1d ? [snip] +/** + * + * Process any present events that have been received from the X server + * + * From glx, we additionally invalidate the drawable here if there has a been a special event. + */ +static void +dri2_flush_present_events(struct dri2_egl_display *dri2_dpy, struct dri2_egl_surface *priv) +{ + xcb_connection_t *c = dri2_dpy-conn; + + /* Check to see if any configuration changes have occurred +* since we were last invoked +*/ + if (priv-special_event) { + xcb_generic_event_t*ev; + + while ((ev = xcb_poll_for_special_event(c, priv-special_event)) != NULL) { + xcb_present_generic_event_t *ge = (void *) ev; + dri2_handle_present_event(priv, ge); + _eglLog(_EGL_INFO, DRI: Invalidating buffer 0x%x\n, priv-dri_drawable); + (*dri2_dpy-flush-invalidate)(priv-dri_drawable); Hmm why does one need to invalidate at this stage - I take that it's related to lack of fence objects ? [snip] diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index e1a994a..dbadd10 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -148,6 +148,9 @@ intel_viewport(struct gl_context *ctx) __DRIcontext *driContext = brw-driContext; if (_mesa_is_winsys_fbo(ctx-DrawBuffer)) { + if (unlikely(INTEL_DEBUG DEBUG_DRI)) + fprintf(stderr, invalidating drawables\n); + dri2InvalidateDrawable(driContext-driDrawablePriv);
[Mesa-dev] [Bug 74563] Surfaceless contexts are not properly released by DRI drivers
https://bugs.freedesktop.org/show_bug.cgi?id=74563 Tapani Pälli lem...@gmail.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #3 from Tapani Pälli lem...@gmail.com --- was pushed to master -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] mesa: add runtime support for SSSE3
Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au --- configure.ac | 6 ++ src/mesa/x86/common_x86.c | 4 src/mesa/x86/common_x86_features.h | 4 +++- 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index 03f1bca..cc0a661 100644 --- a/configure.ac +++ b/configure.ac @@ -258,6 +258,12 @@ if test x$SSE41_SUPPORTED = x1; then fi AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1]) +AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0]) +if test x$SSSE3_SUPPORTED = x1; then +DEFINES=$DEFINES -DUSE_SSSE3 +fi +AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1]) + dnl Can't have static and shared libraries, default to static if user dnl explicitly requested. If both disabled, set to static since shared dnl was explicitly requested. diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c index 25f5c40..ad0648a 100644 --- a/src/mesa/x86/common_x86.c +++ b/src/mesa/x86/common_x86.c @@ -352,6 +352,10 @@ _mesa_get_x86_features(void) __get_cpuid(1, eax, ebx, ecx, edx); +#ifdef bit_SSSE3 + if (ecx bit_SSSE3) + _mesa_x86_cpu_features |= X86_FEATURE_SSSE3; +#endif if (ecx bit_SSE4_1) _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1; } diff --git a/src/mesa/x86/common_x86_features.h b/src/mesa/x86/common_x86_features.h index 66f2cf6..6eb2b38 100644 --- a/src/mesa/x86/common_x86_features.h +++ b/src/mesa/x86/common_x86_features.h @@ -43,7 +43,8 @@ #define X86_FEATURE_XMM2 (16) #define X86_FEATURE_3DNOWEXT (17) #define X86_FEATURE_3DNOW (18) -#define X86_FEATURE_SSE4_1 (19) +#define X86_FEATURE_SSSE3 (19) +#define X86_FEATURE_SSE4_1 (110) /* standard X86 CPU features */ #define X86_CPU_FPU(10) @@ -65,6 +66,7 @@ #define cpu_has_xmm2 (_mesa_x86_cpu_features X86_FEATURE_XMM2) #define cpu_has_3dnow (_mesa_x86_cpu_features X86_FEATURE_3DNOW) #define cpu_has_3dnowext (_mesa_x86_cpu_features X86_FEATURE_3DNOWEXT) +#define cpu_has_ssse3 (_mesa_x86_cpu_features X86_FEATURE_SSSE3) #define cpu_has_sse4_1 (_mesa_x86_cpu_features X86_FEATURE_SSE4_1) #endif -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy
Also cleans up some if statements in the *faster functions. Callgrind cpu usage results from pts benchmarks: For ytile_copy_faster() Nexuiz 1.6.1: 2.16% - 1.20% Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au --- src/mesa/Makefile.am | 8 +++ src/mesa/drivers/dri/i965/intel_tex_subimage.c | 82 ++ src/mesa/main/fast_rgba8_copy.c| 78 src/mesa/main/fast_rgba8_copy.h| 37 4 files changed, 141 insertions(+), 64 deletions(-) create mode 100644 src/mesa/main/fast_rgba8_copy.c create mode 100644 src/mesa/main/fast_rgba8_copy.h diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am index e71bccb..2402096 100644 --- a/src/mesa/Makefile.am +++ b/src/mesa/Makefile.am @@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS) ARCH_LIBS = +if SSSE3_SUPPORTED +ARCH_LIBS += libmesa_ssse3.la +endif + if SSE41_SUPPORTED ARCH_LIBS += libmesa_sse41.la endif @@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \ main/streaming-load-memcpy.c libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1 +libmesa_ssse3_la_SOURCES = \ + main/fast_rgba8_copy.c +libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3 + pkgconfigdir = $(libdir)/pkgconfig pkgconfig_DATA = gl.pc diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c b/src/mesa/drivers/dri/i965/intel_tex_subimage.c index cb5738a..0deeb75 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c @@ -27,6 +27,7 @@ **/ #include main/bufferobj.h +#include main/fast_rgba8_copy.h #include main/image.h #include main/macros.h #include main/mtypes.h @@ -42,9 +43,7 @@ #include intel_mipmap_tree.h #include intel_blit.h -#ifdef __SSSE3__ -#include tmmintrin.h -#endif +#include x86/common_x86_asm.h #define FILE_DEBUG_FLAG DEBUG_TEXTURE @@ -175,18 +174,6 @@ err: return false; } -#ifdef __SSSE3__ -static const uint8_t rgba8_permutation[16] = - { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 }; - -/* NOTE: dst must be 16 byte aligned */ -#define rgba8_copy_16(dst, src) \ - *(__m128i *)(dst) = _mm_shuffle_epi8(\ - (__m128i) _mm_loadu_ps((float *)(src)), \ - *(__m128i *) rgba8_permutation\ - ) -#endif - /** * Copy RGBA to BGRA - swap R and B. */ @@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes) uint8_t *d = dst; uint8_t const *s = src; -#ifdef __SSSE3__ - /* Fast copying for tile spans. -* -* As long as the destination texture is 16 aligned, -* any 16 or 64 spans we get here should also be 16 aligned. -*/ - - if (bytes == 16) { - assert(!(((uintptr_t)dst) 0xf)); - rgba8_copy_16(d+ 0, s+ 0); - return dst; - } - - if (bytes == 64) { - assert(!(((uintptr_t)dst) 0xf)); - rgba8_copy_16(d+ 0, s+ 0); - rgba8_copy_16(d+16, s+16); - rgba8_copy_16(d+32, s+32); - rgba8_copy_16(d+48, s+48); - return dst; - } -#endif - while (bytes = 4) { d[0] = s[2]; d[1] = s[1]; @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, mem_copy_fn mem_copy) { if (x0 == 0 x3 == xtile_width y0 == 0 y1 == xtile_height) { - if (mem_copy == memcpy) - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, - dst, src, src_pitch, swizzle_bit, rgba8_copy); - } else { - if (mem_copy == memcpy) - return xtile_copy(x0, x1, x2, x3, y0, y1, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return xtile_copy(x0, x1, x2, x3, y0, y1, - dst, src, src_pitch, swizzle_bit, rgba8_copy); + return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, +dst, src, src_pitch, swizzle_bit, mem_copy); } xtile_copy(x0, x1, x2, x3, y0, y1, dst, src, src_pitch, swizzle_bit, mem_copy); @@ -388,19 +341,8 @@ ytile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, mem_copy_fn mem_copy) { if (x0 == 0 x3 == ytile_width y0 == 0 y1 == ytile_height) { - if (mem_copy == memcpy) - return ytile_copy(0, 0, ytile_width, ytile_width, 0, ytile_height, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return ytile_copy(0, 0, ytile_width, ytile_width, 0, ytile_height, - dst, src, src_pitch, swizzle_bit, rgba8_copy); - } else { - if (mem_copy == memcpy)
Re: [Mesa-dev] [PATCH 0/2] Disable the EGL state tracker for Linux/DRI builds
On 05/11/14 21:11, Jose Fonseca wrote: How many people/companies use EGL for Windows/fbdev, how about OpenVG on any platform ? I already said this privately to Marek when he was RFC'ing on this change: I'm fine if Linux-specific drivers abandon st/egl to focus solely on st/dri, but removing st/egl altogether seems unnecessary and short-sighted: EGL is a cross-platform API, Mesa is a cross-platform implementation of OpenGL and friends, so sooner or later people will want to have Mesa's EGL support on platforms others than Linux. This is not hypothetical: - See https://bugs.freedesktop.org/show_bug.cgi?id=40920 for an example of a bug reported from an user using llvmpipe + egl + opengv on windows. - VMware doesn't currently ship or support EGL on Windows, but I suspect we eventually we'll want to support EGL on non-linux platforms. Even if OpenVG is loosing popularity, but maybe Khronos will come up with another cross-platform graphics API (maybe OpenGL NG) that's tied to EGL. So a cross-platform implementation of EGL is bound to be useful. I don't test, but I build egl-static and OpenVG on Windows nightly w/ llvmpipe. It's like a superset of OSMesa, and it seems more useful, as it gives one more APIs than OSMesa, and through a standard API to create/bind contexts . In short, stop caring about st/egl on Linux, maybe even remove DRI support out st/egl if you must, but please don't go out of your way to break EGL on non-linux platforms. So let me justify why I brought this in the first place: 1. Over the last two years st/egl had the following patches * Build fixes related - most of the patches (80%?) * Interface changes - ~10 patches * Bugfixes - ~3 patches * New features - 1 patch (already present with the dri2 backend) 2. Over the last two years I've not seen any bug reports from people using either st/egl or st/vega. Must admit I was not looking too closely. 3. Afaict the VMWare or other commercial products do not use it. So based on those my naive question was Is there anyone actually using those state-trackers, rather than just building them - i.e. it was meant as a question, rather than a message of hate wrt the code-base :-) I must admit I cannot predict the future (i.e. what VMWare, Khronos and/or others have in plan) but based on the lack of testers, maintainers and new improvements imho it make sense to remove the stale code. As soon as any of that changes we can always bring it back. So I would not call it short-sighted, but imho it does not make sense to cling onto something in the hopes that one day someone may use it. Cheers, Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] megadriver: explicitly link against glapi, link with -no-undefined
Hi Frank, On 31/10/14 01:11, Frank Henigman wrote: I was too hasty with my Tested-by. While it worked in a shared-glapi configuration, it broke the build with the following: ./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-cros-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib64 --disable-dependency-tracking --disable-option-checking --with-driver=dri --disable-glu --disable-glut --without-demos --enable-texture-float --disable-dri3 --disable-llvm-shared-libs --enable-glx --enable-llvm-gallium --disable-egl --disable-gbm --disable-gles1 --disable-gles2 --disable-shared-glapi --disable-gallium --enable-debug --enable-glx-tls --enable-asm --disable-xlib-glx --enable-dri --with-dri-drivers=i965 --with-gallium-drivers= the output is: ... make[5]: Entering directory `/build/lumpy/tmp/portage/media-libs/mesa-10.3-r11/work/Mesa-10.3/src/mesa/drivers/dri' CXXLD mesa_dri_drivers.la ../../../../src/mapi/glapi/.libs/libglapi.a(glapi_libglapi_la-glapi_getproc.o):glapi_getproc.c:static_functions: error: undefined reference to 'glAreTexturesResidentEXT' ../../../../src/mapi/glapi/.libs/libglapi.a(glapi_libglapi_la-glapi_getproc.o):glapi_getproc.c:static_functions: error: undefined reference to 'glDeleteTexturesEXT' ... Indeed. static-glapi does not get too much testing, plus it seems that it's broken (in a way) for a lng time. It seems that we'll have to (temporary) resolve with shoving dlopen(libglapi.so) into gbm, so that in time programs can nuke it from their codebase. glapi is internal (implementation) detail that they should not need to bother/know. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs
Humble ping x2 On 14/10/14 15:25, Emil Velikov wrote: Humble ping. On 23/09/14 01:25, Emil Velikov wrote: From: Sjoerd Simons sjoerd.sim...@collabora.co.uk When using RGBA EGLConfigs allow both RGB and RGBA X visuals, such that application can decide whether they want to use RGBA (and have the compositor blend their windows). On my system with this change EGLConfigs with a 24 bit visual comes up first, as such applications blindly picking the first EGLConfig will still get an RGB X visual. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676 --- Hello gents, This patch has been stuck in bugzilla since February this year. Bringing it around here to gather greater exposure and perhaps some comments/reviews. -Emil src/egl/drivers/dri2/egl_dri2.c | 5 + src/egl/drivers/dri2/platform_x11.c | 17 + 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c index 20a7243..2ed90a7 100644 --- a/src/egl/drivers/dri2/egl_dri2.c +++ b/src/egl/drivers/dri2/egl_dri2.c @@ -110,6 +110,11 @@ EGLint dri2_to_egl_attribute_map[] = { static EGLBoolean dri2_match_config(const _EGLConfig *conf, const _EGLConfig *criteria) { + + if (criteria-NativeVisualID != EGL_DONT_CARE +conf-NativeVisualID != criteria-NativeVisualID) + return EGL_FALSE; + if (_eglCompareConfigs(conf, criteria, NULL, EGL_FALSE) != 0) return EGL_FALSE; diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c index a7a7338..3395fb7 100644 --- a/src/egl/drivers/dri2/platform_x11.c +++ b/src/egl/drivers/dri2/platform_x11.c @@ -672,14 +672,15 @@ dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy, dri2_add_config(disp, dri2_dpy-driver_configs[j], id++, surface_type, config_attrs, rgba_masks); -/* Allow a 24-bit RGB visual to match a 32-bit RGBA EGLConfig. - * Otherwise it will only match a 32-bit RGBA visual. On a - * composited window manager on X11, this will make all of the - * EGLConfigs with destination alpha get blended by the - * compositor. This is probably not what the application - * wants... especially on drivers that only have 32-bit RGBA - * EGLConfigs! */ -if (d.data-depth == 24) { +/* Allow both 24-bit RGB visual and 32 bit RGBA to match a 32-bit + * RGBA EGLConfig. Otherwise it will only match a 32-bit RGBA + * visual. On a composited window manager on X11, this will make + * all of the EGLConfigs with destination alpha get blended by the + * compositor. This is probably not what the application wants... + * especially on drivers that only have 32-bit RGBA EGLConfigs! + * Allowing both allows applications to make the decision whether + * 32 bit visuals are intended */ +if (d.data-depth == 24 || d.data-depth == 32) { rgba_masks[3] = ~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]); dri2_add_config(disp, dri2_dpy-driver_configs[j], id++, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On 29.10.2014 14:05, Timothy Arceri wrote: Makes use of SSE to speed up compute of min and max elements Callgrind cpu usage results from pts benchmarks: Openarena 0.8.8: 3.67% - 1.03% UrbanTerror: 2.36% - 0.81% V5: - actually make use of the optimisation in android (Emil Velikov) - set a better array size limit for using SSE and added TODO V4: - fixed bugs with incrementing pointer and updating counters V3: - Removed sse_minmax.c from Makefile.sources - handle the first few values without SSE until the pointer is aligned and use _mm_load_si128 rather than _mm_loadu_si128 - guard the call to the SSE code better at build time V2: - removed GL* types - use _mm_store_si128() rather than _mm_store_ps() - add runtime check for SSE - use aligned attribute for local mix/max - bunch of tidyups Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au --- src/mesa/Android.libmesa_dricore.mk | 8 ++- src/mesa/Android.libmesa_st_mesa.mk | 5 ++ src/mesa/Makefile.am| 3 +- src/mesa/main/sse_minmax.c | 97 + src/mesa/main/sse_minmax.h | 30 src/mesa/vbo/vbo_exec_array.c | 14 -- 6 files changed, 152 insertions(+), 5 deletions(-) create mode 100644 src/mesa/main/sse_minmax.c create mode 100644 src/mesa/main/sse_minmax.h diff --git a/src/mesa/Android.libmesa_dricore.mk b/src/mesa/Android.libmesa_dricore.mk index 1e6d948..2ab593d 100644 --- a/src/mesa/Android.libmesa_dricore.mk +++ b/src/mesa/Android.libmesa_dricore.mk @@ -51,10 +51,16 @@ endif # MESA_ENABLE_ASM ifeq ($(ARCH_X86_HAVE_SSE4_1),true) LOCAL_SRC_FILES += \ - $(SRCDIR)main/streaming-load-memcpy.c + $(SRCDIR)main/streaming-load-memcpy.c \ + $(SRCDIR)main/sse_minmax.c LOCAL_CFLAGS := -msse4.1 endif +ifeq ($(ARCH_X86_HAVE_SSE4_1),true) +LOCAL_CFLAGS += \ + -DUSE_SSE41 +endif + LOCAL_C_INCLUDES := \ $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \ $(MESA_TOP)/src \ diff --git a/src/mesa/Android.libmesa_st_mesa.mk b/src/mesa/Android.libmesa_st_mesa.mk index 8b8d652..618d6bf 100644 --- a/src/mesa/Android.libmesa_st_mesa.mk +++ b/src/mesa/Android.libmesa_st_mesa.mk @@ -48,6 +48,11 @@ ifeq ($(TARGET_ARCH),x86) endif # x86 endif # MESA_ENABLE_ASM +ifeq ($(ARCH_X86_HAVE_SSE4_1),true) +LOCAL_CFLAGS := \ + -DUSE_SSE41 +endif + LOCAL_C_INCLUDES := \ $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \ $(MESA_TOP)/src/gallium/auxiliary \ diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am index e71bccb..932db4f 100644 --- a/src/mesa/Makefile.am +++ b/src/mesa/Makefile.am @@ -151,7 +151,8 @@ libmesagallium_la_LIBADD = \ $(ARCH_LIBS) libmesa_sse41_la_SOURCES = \ - main/streaming-load-memcpy.c + main/streaming-load-memcpy.c \ + main/sse_minmax.c libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1 pkgconfigdir = $(libdir)/pkgconfig diff --git a/src/mesa/main/sse_minmax.c b/src/mesa/main/sse_minmax.c new file mode 100644 index 000..91a55e5 --- /dev/null +++ b/src/mesa/main/sse_minmax.c @@ -0,0 +1,97 @@ +/* + * Copyright © 2014 Timothy Arceri + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Author: + *Timothy Arceri t_arc...@yahoo.com.au + * + */ + +#ifdef __SSE4_1__ +#include main/sse_minmax.h +#include smmintrin.h +#include stdint.h + +void +_mesa_uint_array_min_max(const unsigned *ui_indices, unsigned *min_index, + unsigned *max_index, const unsigned count) +{ + unsigned max_ui = 0; + unsigned min_ui = ~0U; + unsigned i = 0; + unsigned aligned_count = count; + + /* handle the first few values without SSE until the pointer is aligned */ + while (((uintptr_t)ui_indices 15) aligned_count) { + if
Re: [Mesa-dev] [PATCH v3 3/9] gallium/auxiliary: implement sw_probe_wrapped
Hi David, Just a few nitpicks which I've missed the previously On 02/11/14 18:31, David Heidelberg wrote: Implement pipe_loader_sw_probe_wrapped which allows to use the wrapped software renderer backend when using the pipe loader. Signed-off-by: David Heidelberg da...@ixit.cz --- src/gallium/auxiliary/pipe-loader/pipe_loader.h| 11 +++ src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c | 22 ++ src/gallium/targets/gbm/Makefile.am| 1 + src/gallium/targets/opencl/Makefile.am | 1 + src/gallium/targets/xa/Makefile.am | 1 + src/gallium/tests/trivial/Makefile.am | 1 + 6 files changed, 37 insertions(+) diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader.h b/src/gallium/auxiliary/pipe-loader/pipe_loader.h index 6127a6a..9f43f17 100644 --- a/src/gallium/auxiliary/pipe-loader/pipe_loader.h +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader.h @@ -166,6 +166,17 @@ pipe_loader_sw_probe_null(struct pipe_loader_device **devs); int pipe_loader_sw_probe(struct pipe_loader_device **devs, int ndev); +/** + * Get a software device wrapped atop another device. + * + * This function is platform-specific. + * + * \sa pipe_loader_probe + */ +boolean +pipe_loader_sw_probe_wrapped(struct pipe_loader_device **dev, + struct pipe_screen *screen); + #ifdef HAVE_PIPE_LOADER_DRM /** diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c b/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c index b1b1ca6..b152f60 100644 --- a/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader_sw.c @@ -29,8 +29,11 @@ #include util/u_memory.h #include util/u_dl.h +#ifdef HAVE_PIPE_LOADER_DRI #include sw/dri/dri_sw_winsys.h +#endif Does this bring any benefit ? Afaics the header does not pull any dependencies - drm, Xlib, etc so it should be safe as is. #include sw/null/null_sw_winsys.h +#include sw/wrapper/wrapper_sw_winsys.h #ifdef HAVE_PIPE_LOADER_XLIB /* Explicitly wrap the header to ease build without X11 headers */ #include sw/xlib/xlib_sw_winsys.h @@ -140,6 +143,25 @@ pipe_loader_sw_probe(struct pipe_loader_device **devs, int ndev) return i; } +boolean +pipe_loader_sw_probe_wrapped(struct pipe_loader_device **dev, + struct pipe_screen *screen) +{ + struct pipe_loader_sw_device *sdev = CALLOC_STRUCT(pipe_loader_sw_device); + Can we check for CALLOC_STRUCT failure ? + sdev-base.type = PIPE_LOADER_DEVICE_SOFTWARE; + sdev-base.driver_name = swrast; + sdev-base.ops = pipe_loader_sw_ops; + sdev-ws = wrapper_sw_winsys_wrap_pipe_screen(screen); + + if (!sdev-ws) { + FREE(sdev); + return FALSE; + } + *dev = sdev-base; + return TRUE; +} + static void pipe_loader_sw_release(struct pipe_loader_device **dev) { diff --git a/src/gallium/targets/gbm/Makefile.am b/src/gallium/targets/gbm/Makefile.am index 2c9b425..679c994 100644 --- a/src/gallium/targets/gbm/Makefile.am +++ b/src/gallium/targets/gbm/Makefile.am @@ -34,6 +34,7 @@ gbm_gallium_drm_la_SOURCES = gbm_gallium_drm_la_LIBADD = \ $(top_builddir)/src/gallium/state_trackers/gbm/libgbm.la \ $(top_builddir)/src/gallium/auxiliary/libgallium.la \ +$(top_builddir)/src/gallium/winsys/sw/wrapper/libwsw.la \ You can avoid adding this for each target by add it to GALLIUM_PIPE_LOADER_WINSYS_LIBS. Something like the following would do the job. --- a/src/gallium/Automake.inc +++ b/src/gallium/Automake.inc GALLIUM_PIPE_LOADER_WINSYS_LIBS = \ + $(top_builddir)/src/gallium/winsys/sw/wrapper/libwsw.la \ $(top_builddir)/src/gallium/winsys/sw/null/libws_null.la Cheers, Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/3][RFC v2] Clamp rgba floats with sse
On 05.11.2014 17:25, Emil Velikov wrote: Hi Juha-Pekka, On 04/11/14 12:05, Juha-Pekka Heikkila wrote: [snip] I made 'x86' folder under src/mesa/main. The idea here being if there is optimization targeting architecture it'd exist directly under the place where it was used, in its own subdirectly indicating targeted architecture. I don't think majority of such code would be generic code thus this approach. IMHO adding x86 brings no benefit at this point considering all the optimisations are x86 based. Yet if you insist on having it can you move the other optimisations in there please. This was just following the comments I got for my first rfc round for this sse stuff, comments there reached consensus these should be collected somewhere separate. I'll remove the x86 folder, if there are later wishes to separate these to their own location I let someone else continue with it. /Juha-Pekka ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Fix an off-by-1 error in the draw upload code's size calculation.
Hi Ken, From what I've gathered the proposed patch is incorrect and is (most likely) working around a buggy application behaviour. Afaics Ian suggested that we add a driconf option for such applications. Should I consider this patch for the stable branch or the above sounds about right and we can drop it ? Thanks Emil On 14/10/14 23:42, Kenneth Graunke wrote: According to INTEL_DEBUG=perf, Borderlands: The Pre-Sequel was stalling on nearly every glBufferSubData call, with very slightly overlapping busy ranges. It turns out the draw upload code was accidentally including an extra stride's worth of data in the vertex buffer size due to a simple off-by-one error. We considered this extra bit of buffer space to be busy (in use by the GPU), when it was actually idle. The new diagram should make it easier to understand the formula. It's basically what I drew on paper when working through an actual glDrawRangeElements call. Eliminates all glBufferSubData stalls in Borderlands: The Pre-Sequel. Signed-off-by: Kenneth Graunke kenn...@whitecape.org Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/dri/i965/brw_draw_upload.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) No Piglit regressions on Haswell. This might help Dota 2 and Serious Sam 3 as well, but I haven't checked. diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c b/src/mesa/drivers/dri/i965/brw_draw_upload.c index 5a12439..6cb653c 100644 --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c @@ -486,8 +486,28 @@ brw_prepare_vertices(struct brw_context *brw) offset = 0; size = intel_buffer-Base.Size; } else { + /* Compute the size/amount of data referenced by the GPU. + * If the data is interleaved, StrideB may be larger than + * _ElementSize. As an example, assume we have 2 interleaved + * attributes A and B. The data is organized like this: + * + * StrideEltSize + *_,,_, + * /\ / \ + *A: --- --- --- --- --- --- + *B:--- --- --- --- --- --- + * + * |= 4 elts ==| (4-1) * Stride + EltSize + * + * max_index - min_index gives the number of elements that + * will be referenced. Say we're drawing 4 elements. On + * the first three, we need the full stride in order to get + * to the next element. But on the last, we only want the + * element size, since we don't actually read the other + * interleaved vertex attributes stored beyond that. + */ offset = buffer-offset + min_index * buffer-stride; - size = (buffer-stride * (max_index - min_index) + + size = (buffer-stride * MAX2(max_index - min_index - 1, 0) + glarray-_ElementSize); } } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 85918] Mesa: MSVC 2010/2012 Compile error
https://bugs.freedesktop.org/show_bug.cgi?id=85918 --- Comment #4 from Emil Velikov emil.l.veli...@gmail.com --- Roland, Indeed this issue is not related (does not seem) to the verison of msvc yet it's a nice reminder about the topic, plus an humble ping for Michael :) -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 85918] Mesa: MSVC 2010/2012 Compile error
https://bugs.freedesktop.org/show_bug.cgi?id=85918 Emil Velikov emil.l.veli...@gmail.com changed: What|Removed |Added Status|NEW |RESOLVED Version|unspecified |10.3 Resolution|--- |FIXED --- Comment #5 from Emil Velikov emil.l.veli...@gmail.com --- Pushed to the 10.3 branch. Issue will be fixed in mesa 10.3.3. commit d5700dc276008decb2a5d63bfa38522c5f4ad3f3 Author: Brian Paul bri...@vmware.com Date: Wed Sep 10 08:16:24 2014 -0600 mesa: fix UNCLAMPED_FLOAT_TO_UBYTE() macro for MSVC MSVC replaces the F in 255.0F with the macro argument which leads to an error. s/F/FLT/ to avoid that. It turns out we weren't using this macro at all on MSVC until the recent mesa: Drop USE_IEEE define. change. Reviewed-by: Roland Scheidegger srol...@vmware.com (cherry picked from commit 9608193cbc6ea14e49adcd0193f9e7c6058d5a2f) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85918 Nominated-by: Roland Scheidegger srol...@vmware.com -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] ARB_texture_buffer_range offsets
I'd say it's a spec bug. ARB_texture_buffer_range should say that the offset should be a multiple of an element size, but it doesn't. The question is, what should the element size be? One component or the whole pixel? Marek On Wed, Nov 5, 2014 at 9:08 PM, Roland Scheidegger srol...@vmware.com wrote: Trying to fix some bug due to alignment issues in llvmpipe's vertex fetch, I came across some issue with ARB_texture_buffer_range. Namely, it looks like the offsets specified there are always in bytes, regardless the actual format (hence, as long as the TEXTURE_BUFFER_OFFSET_ALIGNMENT is 1, it would be allowed to have an offset of 15 bytes for a rgba32f format for instance making all fetches quite unaligned). However in gallium we actually have first_elem and last_elem parameters in the sampler views which are specified in number of elements (so takes the format into account), which is what d3d10 does and the state tracker translates to that apparently. IMHO d3d10 makes way more sense there because that way the necessary alignment scales automatically depending on the format (so, if the format is 2x16bit for instance you'd need 4 byte alignment for the offset, and only need 16 bytes alignment for 4x32bit, ensuring all lookups are always aligned). This means that 15 byte offset in the example above is completely untranslatable. But if I see that right, OpenGL doesn't work like that, meaning effectively gallium drivers (and I doubt most other drivers neither) cannot actually claim to support TEXTURE_BUFFER_OFFSET_ALIGNMENT lower than 16, even if they'd only need that for 4x32bit formats. Though most gallium drivers indeed claim 1 right now. Looks quite messy... Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 85918] Mesa: MSVC 2010/2012 Compile error
https://bugs.freedesktop.org/show_bug.cgi?id=85918 --- Comment #6 from Michael Bachmann mr.bachm...@web.de --- Thank you for taking care about this issue and for including the fix in Version 10.3.3. -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] ARB_texture_buffer_range offsets
Am 06.11.2014 um 16:15 schrieb Marek Olšák: I'd say it's a spec bug. ARB_texture_buffer_range should say that the offset should be a multiple of an element size, but it doesn't. The question is, what should the element size be? One component or the whole pixel? Imho whole pixel (for block compressed that would be full block, for things like packed 565 too but neither are possible in GL), i.e. format granularity. That said, the whole alignment thing is problematic for rgb32 (and the possiblity of that was added later, ARB_texture_buffer_object_rgb32), so maybe it's things like that why the offset can be just byte aligned (in other words, I'm not convinced it's just a spec bug, d3d10 doesn't have that problem with alignment). Roland Marek On Wed, Nov 5, 2014 at 9:08 PM, Roland Scheidegger srol...@vmware.com wrote: Trying to fix some bug due to alignment issues in llvmpipe's vertex fetch, I came across some issue with ARB_texture_buffer_range. Namely, it looks like the offsets specified there are always in bytes, regardless the actual format (hence, as long as the TEXTURE_BUFFER_OFFSET_ALIGNMENT is 1, it would be allowed to have an offset of 15 bytes for a rgba32f format for instance making all fetches quite unaligned). However in gallium we actually have first_elem and last_elem parameters in the sampler views which are specified in number of elements (so takes the format into account), which is what d3d10 does and the state tracker translates to that apparently. IMHO d3d10 makes way more sense there because that way the necessary alignment scales automatically depending on the format (so, if the format is 2x16bit for instance you'd need 4 byte alignment for the offset, and only need 16 bytes alignment for 4x32bit, ensuring all lookups are always aligned). This means that 15 byte offset in the example above is completely untranslatable. But if I see that right, OpenGL doesn't work like that, meaning effectively gallium drivers (and I doubt most other drivers neither) cannot actually claim to support TEXTURE_BUFFER_OFFSET_ALIGNMENT lower than 16, even if they'd only need that for 4x32bit formats. Though most gallium drivers indeed claim 1 right now. Looks quite messy... Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddevd=AAIBaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_Im=Ds_jdCUhL1dGXrkeea1fzl6_iInrZFJOSltaM6dlF9ws=BNwWkIpsz9GFgPRoMLDU8tEVUPzmIxKINN3Uu9evnXse= ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/1] clover: Fix build after llvm r221375
Signed-off-by: Jan Vesely jan.ves...@rutgers.edu --- src/gallium/state_trackers/clover/llvm/invocation.cpp | 4 1 file changed, 4 insertions(+) diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp index e953822..3a4fcf0 100644 --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp @@ -282,7 +282,11 @@ namespace { for (unsigned i = 0; i kernel_node-getNumOperands(); ++i) { kernels.push_back(llvm::dyn_castllvm::Function( +#if HAVE_LLVM = 0x0306 + kernel_node-getOperandAsMDNode(i)-getOperand(0))); +#else kernel_node-getOperand(i)-getOperand(0))); +#endif } } -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] ARB_texture_buffer_range offsets
For radeonsi, I think only x8, x8y8, and x16 fetches can be byte-aligned. Everything else is dword-aligned (the 2 lowest bits are ignored). I guess the cap should be 4 then. Marek On Thu, Nov 6, 2014 at 4:55 PM, Roland Scheidegger srol...@vmware.com wrote: Am 06.11.2014 um 16:15 schrieb Marek Olšák: I'd say it's a spec bug. ARB_texture_buffer_range should say that the offset should be a multiple of an element size, but it doesn't. The question is, what should the element size be? One component or the whole pixel? Imho whole pixel (for block compressed that would be full block, for things like packed 565 too but neither are possible in GL), i.e. format granularity. That said, the whole alignment thing is problematic for rgb32 (and the possiblity of that was added later, ARB_texture_buffer_object_rgb32), so maybe it's things like that why the offset can be just byte aligned (in other words, I'm not convinced it's just a spec bug, d3d10 doesn't have that problem with alignment). Roland Marek On Wed, Nov 5, 2014 at 9:08 PM, Roland Scheidegger srol...@vmware.com wrote: Trying to fix some bug due to alignment issues in llvmpipe's vertex fetch, I came across some issue with ARB_texture_buffer_range. Namely, it looks like the offsets specified there are always in bytes, regardless the actual format (hence, as long as the TEXTURE_BUFFER_OFFSET_ALIGNMENT is 1, it would be allowed to have an offset of 15 bytes for a rgba32f format for instance making all fetches quite unaligned). However in gallium we actually have first_elem and last_elem parameters in the sampler views which are specified in number of elements (so takes the format into account), which is what d3d10 does and the state tracker translates to that apparently. IMHO d3d10 makes way more sense there because that way the necessary alignment scales automatically depending on the format (so, if the format is 2x16bit for instance you'd need 4 byte alignment for the offset, and only need 16 bytes alignment for 4x32bit, ensuring all lookups are always aligned). This means that 15 byte offset in the example above is completely untranslatable. But if I see that right, OpenGL doesn't work like that, meaning effectively gallium drivers (and I doubt most other drivers neither) cannot actually claim to support TEXTURE_BUFFER_OFFSET_ALIGNMENT lower than 16, even if they'd only need that for 4x32bit formats. Though most gallium drivers indeed claim 1 right now. Looks quite messy... Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddevd=AAIBaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_Im=Ds_jdCUhL1dGXrkeea1fzl6_iInrZFJOSltaM6dlF9ws=BNwWkIpsz9GFgPRoMLDU8tEVUPzmIxKINN3Uu9evnXse= ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/4] i965: Add code to verify the CFG is sane.
In general, it seems as if this can miss several things. For instance, it checks that all the predicessors are valid but never that we have all the predecessors. Same for successors. If we really want to be able to validate a CFG, maybe a stack-based approach like calculate_cfg would work better? Also, did you run this on piglit/shader-db to ensure that everything coming out of calculate_cfg actually passes? More comments inline On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner matts...@gmail.com wrote: --- src/mesa/drivers/dri/i965/test_verify_cfg.cpp | 273 ++ src/mesa/drivers/dri/i965/test_verify_cfg.h | 26 +++ 2 files changed, 299 insertions(+) create mode 100644 src/mesa/drivers/dri/i965/test_verify_cfg.cpp create mode 100644 src/mesa/drivers/dri/i965/test_verify_cfg.h diff --git a/src/mesa/drivers/dri/i965/test_verify_cfg.cpp b/src/mesa/drivers/dri/i965/test_verify_cfg.cpp new file mode 100644 index 000..0aa74c5 --- /dev/null +++ b/src/mesa/drivers/dri/i965/test_verify_cfg.cpp @@ -0,0 +1,273 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include gtest/gtest.h +#include test_verify_cfg.h +#include brw_cfg.h + +static bool +is_unconditional_jump(const backend_instruction *inst) +{ + return (inst-opcode == BRW_OPCODE_BREAK || + inst-opcode == BRW_OPCODE_CONTINUE || + inst-opcode == BRW_OPCODE_WHILE) + inst-predicate == BRW_PREDICATE_NONE; +} + +void +verify_cfg(backend_visitor *v) +{ + foreach_block(block, v-cfg) { + switch (block-start()-opcode) { + case BRW_OPCODE_ENDIF: { + /* Has two predecessors: + *- the previous block is always a predecessor + *- always a predecessor ending in an IF or an ELSE + * + * Note that if the body of the if block is empty, then the + * previous block *is* the block that ends with IF, so the ENDIF + * block will have the same predecessor twice. + */ + if (is_unconditional_jump(block-prev()-end())) { +EXPECT_EQ(block-parents.length(), 1u); + } else { +EXPECT_EQ(block-parents.length(), 2u); + } This obviously catches if (foo) { /* Do stuff */ } else { break; } but what about if (foo) { break; } else { /* Do stuff */ } Or worse, what about if (foo) { break; } else { continue; } Maybe there's something I'm missing here. + + bool found_prev = false, found_if = false, found_else = false, + found_other = false; + foreach_list_typed(bblock_link, parent, link, block-parents) { +if (parent-block == block-prev() !found_prev) + found_prev = true; +else if (parent-block-end()-opcode == BRW_OPCODE_IF) + found_if = true; +else if (parent-block-end()-opcode == BRW_OPCODE_ELSE) + found_else = true; +else + found_other = true; + } how do we detect if a parent is simply missing? Or are we trusting in the check above? + EXPECT_NE(found_prev, is_unconditional_jump(block-prev()-end())); + EXPECT_NE(found_if, found_else); + EXPECT_FALSE(found_other); + break; + } + case BRW_OPCODE_DO: { + /* Has two or more predecessors + *- the previous block is always a predecessor + *- always a predecessor ending in a WHILE + *- some number of predecessors ending in continue + */ + if (is_unconditional_jump(block-prev()-end())) { +EXPECT_GE(block-parents.length(), 1u); + } else { +EXPECT_GE(block-parents.length(), 2u); + } + + bool found_prev = false,
Re: [Mesa-dev] [PATCH 1/1] clover: Fix build after llvm r221375
On Thu, Nov 06, 2014 at 11:46:41AM -0500, Jan Vesely wrote: Signed-off-by: Jan Vesely jan.ves...@rutgers.edu I've pushed this, thanks! -Tom --- src/gallium/state_trackers/clover/llvm/invocation.cpp | 4 1 file changed, 4 insertions(+) diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp b/src/gallium/state_trackers/clover/llvm/invocation.cpp index e953822..3a4fcf0 100644 --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp @@ -282,7 +282,11 @@ namespace { for (unsigned i = 0; i kernel_node-getNumOperands(); ++i) { kernels.push_back(llvm::dyn_castllvm::Function( +#if HAVE_LLVM = 0x0306 + kernel_node-getOperandAsMDNode(i)-getOperand(0))); +#else kernel_node-getOperand(i)-getOperand(0))); +#endif } } -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] i965/cfg: Add functions to get first and last non-CF instructions.
1 and 4 are Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner matts...@gmail.com wrote: --- src/mesa/drivers/dri/i965/brw_cfg.h | 74 + 1 file changed, 74 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h b/src/mesa/drivers/dri/i965/brw_cfg.h index e6403bd..c2029cc 100644 --- a/src/mesa/drivers/dri/i965/brw_cfg.h +++ b/src/mesa/drivers/dri/i965/brw_cfg.h @@ -71,6 +71,12 @@ struct bblock_t { const bblock_t *next() const; bblock_t *prev(); const bblock_t *prev() const; + + bool starts_with_control_flow() const; + bool ends_with_control_flow() const; + + backend_instruction *first_non_control_flow_inst(); + backend_instruction *last_non_control_flow_inst(); #endif struct exec_node link; @@ -142,6 +148,50 @@ bblock_prev_const(const struct bblock_t *block) return (const struct bblock_t *)block-link.prev; } +static inline bool +bblock_starts_with_control_flow(const struct bblock_t *block) +{ + enum opcode op = bblock_start_const(block)-opcode; + return op == BRW_OPCODE_DO || op == BRW_OPCODE_ENDIF; +} + +static inline bool +bblock_ends_with_control_flow(const struct bblock_t *block) +{ + enum opcode op = bblock_end_const(block)-opcode; + return op == BRW_OPCODE_IF || + op == BRW_OPCODE_ELSE || + op == BRW_OPCODE_WHILE || + op == BRW_OPCODE_BREAK || + op == BRW_OPCODE_CONTINUE; +} + +static inline struct backend_instruction * +bblock_first_non_control_flow_inst(struct bblock_t *block) +{ + struct backend_instruction *inst = bblock_start(block); + if (bblock_starts_with_control_flow(block)) +#ifdef __cplusplus + inst = (struct backend_instruction *)inst-next; +#else + inst = (struct backend_instruction *)inst-link.next; +#endif + return inst; +} + +static inline struct backend_instruction * +bblock_last_non_control_flow_inst(struct bblock_t *block) +{ + struct backend_instruction *inst = bblock_end(block); + if (bblock_ends_with_control_flow(block)) +#ifdef __cplusplus + inst = (struct backend_instruction *)inst-prev; +#else + inst = (struct backend_instruction *)inst-link.prev; +#endif + return inst; +} + #ifdef __cplusplus inline backend_instruction * bblock_t::start() @@ -190,6 +240,30 @@ bblock_t::prev() const { return bblock_prev_const(this); } + +inline bool +bblock_t::starts_with_control_flow() const +{ + return bblock_starts_with_control_flow(this); +} + +inline bool +bblock_t::ends_with_control_flow() const +{ + return bblock_ends_with_control_flow(this); +} + +inline backend_instruction * +bblock_t::first_non_control_flow_inst() +{ + return bblock_first_non_control_flow_inst(this); +} + +inline backend_instruction * +bblock_t::last_non_control_flow_inst() +{ + return bblock_last_non_control_flow_inst(this); +} #endif struct cfg_t { -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] st/va: MPEG4 call vlVaDecoderFixMPEG4Startcode()
From: Michael Varga michael.va...@amd.com If the VOP and GOV headers were truncated they will be regenerated. Signed-off-by: Michael Varga michael.va...@amd.com --- src/gallium/state_trackers/va/picture.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/gallium/state_trackers/va/picture.c b/src/gallium/state_trackers/va/picture.c index 7107231..9b94b39 100644 --- a/src/gallium/state_trackers/va/picture.c +++ b/src/gallium/state_trackers/va/picture.c @@ -503,6 +503,13 @@ handleVASliceDataBufferType(vlVaContext *context, vlVaBuffer *buf) buffers[num_buffers] = (void *const)start_code_vc1; sizes[num_buffers++] = sizeof(start_code_vc1); break; + case PIPE_VIDEO_FORMAT_MPEG4: + if (bufHasStartcode(buf, 0x01, 24)) + break; + + vlVaDecoderFixMPEG4Startcode(context); + buffers[num_buffers] = (void *)context-mpeg4.start_code; + sizes[num_buffers++] = context-mpeg4.start_code_size; default: break; } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] mesa: add runtime support for SSSE3
On Thu, Nov 6, 2014 at 4:18 AM, Timothy Arceri t_arc...@yahoo.com.au wrote: Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au --- configure.ac | 6 ++ src/mesa/x86/common_x86.c | 4 src/mesa/x86/common_x86_features.h | 4 +++- 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index 03f1bca..cc0a661 100644 --- a/configure.ac +++ b/configure.ac @@ -258,6 +258,12 @@ if test x$SSE41_SUPPORTED = x1; then fi AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1]) +AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0]) +if test x$SSSE3_SUPPORTED = x1; then +DEFINES=$DEFINES -DUSE_SSSE3 +fi +AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1]) + dnl Can't have static and shared libraries, default to static if user dnl explicitly requested. If both disabled, set to static since shared dnl was explicitly requested. diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c index 25f5c40..ad0648a 100644 --- a/src/mesa/x86/common_x86.c +++ b/src/mesa/x86/common_x86.c @@ -352,6 +352,10 @@ _mesa_get_x86_features(void) __get_cpuid(1, eax, ebx, ecx, edx); +#ifdef bit_SSSE3 Why did you put this in an #ifdef bit_SSSE3? That's defined everywhere. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/6] st/va: MPEG4 populate the PPS structure
From: Michael Varga michael.va...@amd.com Signed-off-by: Michael Varga michael.va...@amd.com --- src/gallium/state_trackers/va/picture.c| 72 ++ src/gallium/state_trackers/va/va_private.h | 9 2 files changed, 81 insertions(+) diff --git a/src/gallium/state_trackers/va/picture.c b/src/gallium/state_trackers/va/picture.c index 8775681..a4eb26b 100644 --- a/src/gallium/state_trackers/va/picture.c +++ b/src/gallium/state_trackers/va/picture.c @@ -80,6 +80,12 @@ handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext *context, vlVaBuffer * VAPictureParameterBufferMPEG2 *mpeg2; VAPictureParameterBufferH264 *h264; VAPictureParameterBufferVC1 * vc1; + VAPictureParameterBufferMPEG4 *mpeg4; + vlVaSurface *surf_forward; + vlVaSurface *surf_backward; + unsigned int i; + static const uint8_t default_intra_quant_matrix[64] = { 0 }; + static const uint8_t default_non_intra_quant_matrix[64] = { 0 }; switch (u_reduce_video_profile(context-decoder-profile)) { case PIPE_VIDEO_FORMAT_MPEG12: @@ -214,6 +220,72 @@ handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext *context, vlVaBuffer * context-desc.vc1.pquant = vc1-pic_quantizer_fields.bits.pic_quantizer_scale; break; + case PIPE_VIDEO_FORMAT_MPEG4: + assert(buf-size = sizeof(VAPictureParameterBufferMPEG4) buf-num_elements == 1); + mpeg4 = buf-data; + + context-mpeg4.pps = *mpeg4; + + /* vop_width */ + /* vop_height */ + /* forward_reference_picture */ + /* backward_reference_picture */ + context-desc.mpeg4.short_video_header = +mpeg4-vol_fields.bits.short_video_header; + /* chroma_format */ + context-desc.mpeg4.interlaced = mpeg4-vol_fields.bits.interlaced; + /* obmc_disable */ + /* sprite_enable */ + /* sprite_warping_accuracy */ + context-desc.mpeg4.quant_type = mpeg4-vol_fields.bits.quant_type; + context-desc.mpeg4.quarter_sample = mpeg4-vol_fields.bits.quarter_sample; + /* data_partitioned */ + /* reversible_vlc */ + context-desc.mpeg4.resync_marker_disable = +mpeg4-vol_fields.bits.resync_marker_disable; + /* no_of_sprite_warping_points */ + /* sprite_trajectory_du */ + /* sprite_trajectory_dv */ + /* quant_precision */ + context-desc.mpeg4.vop_coding_type = mpeg4-vop_fields.bits.vop_coding_type; + /* backward_reference_vop_coding_type */ + /* vop_rounding_type */ + /* intra_dc_vlc_thr */ + context-desc.mpeg4.top_field_first = +mpeg4-vop_fields.bits.top_field_first; + context-desc.mpeg4.alternate_vertical_scan_flag = +mpeg4-vop_fields.bits.alternate_vertical_scan_flag; + context-desc.mpeg4.vop_fcode_forward = mpeg4-vop_fcode_forward; + context-desc.mpeg4.vop_fcode_backward = mpeg4-vop_fcode_backward; + context-desc.mpeg4.vop_time_increment_resolution = +mpeg4-vop_time_increment_resolution; + /* num_gobs_in_vop */ + /* num_macroblocks_in_gob */ + context-desc.mpeg4.trb[0] = mpeg4-TRB; + context-desc.mpeg4.trb[1] = mpeg4-TRB; + context-desc.mpeg4.trd[0] = mpeg4-TRD; + context-desc.mpeg4.trd[1] = mpeg4-TRD; + + /* default [non-]intra quant matrix because mpv does not set these + matrices */ + if (!context-desc.mpeg4.intra_matrix) + context-desc.mpeg4.intra_matrix = default_intra_quant_matrix; + if (!context-desc.mpeg4.non_intra_matrix) + context-desc.mpeg4.non_intra_matrix = default_non_intra_quant_matrix; + + surf_forward = handle_table_get(drv-htab, mpeg4-forward_reference_picture); + if (surf_forward) + context-desc.mpeg4.ref[0] = surf_forward-buffer; + surf_backward = handle_table_get(drv-htab, mpeg4-backward_reference_picture); + if (surf_backward) + context-desc.mpeg4.ref[1] = surf_backward-buffer; + + context-mpeg4.vti_bits = 0; + for (i = context-desc.mpeg4.vop_time_increment_resolution; i 0; i /= 2) + ++context-mpeg4.vti_bits; + + break; + default: break; } diff --git a/src/gallium/state_trackers/va/va_private.h b/src/gallium/state_trackers/va/va_private.h index 060a1fa..7d2fc24 100644 --- a/src/gallium/state_trackers/va/va_private.h +++ b/src/gallium/state_trackers/va/va_private.h @@ -162,6 +162,15 @@ typedef struct { struct pipe_vc1_picture_desc vc1; struct pipe_h264_picture_desc h264; } desc; + + struct { + unsigned long long int frame_num; + unsigned int start_code_size; + unsigned int vti_bits; + unsigned int quant_scale; + VAPictureParameterBufferMPEG4 pps; + uint8_t start_code[32]; + } mpeg4; } vlVaContext; typedef struct { -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy
On Thu, Nov 6, 2014 at 4:20 AM, Timothy Arceri t_arc...@yahoo.com.au wrote: Also cleans up some if statements in the *faster functions. Callgrind cpu usage results from pts benchmarks: For ytile_copy_faster() Nexuiz 1.6.1: 2.16% - 1.20% Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au --- src/mesa/Makefile.am | 8 +++ src/mesa/drivers/dri/i965/intel_tex_subimage.c | 82 ++ src/mesa/main/fast_rgba8_copy.c| 78 src/mesa/main/fast_rgba8_copy.h| 37 4 files changed, 141 insertions(+), 64 deletions(-) create mode 100644 src/mesa/main/fast_rgba8_copy.c create mode 100644 src/mesa/main/fast_rgba8_copy.h diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am index e71bccb..2402096 100644 --- a/src/mesa/Makefile.am +++ b/src/mesa/Makefile.am @@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS) ARCH_LIBS = +if SSSE3_SUPPORTED +ARCH_LIBS += libmesa_ssse3.la +endif + if SSE41_SUPPORTED ARCH_LIBS += libmesa_sse41.la endif @@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \ main/streaming-load-memcpy.c libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1 +libmesa_ssse3_la_SOURCES = \ + main/fast_rgba8_copy.c +libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3 + pkgconfigdir = $(libdir)/pkgconfig pkgconfig_DATA = gl.pc diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c b/src/mesa/drivers/dri/i965/intel_tex_subimage.c index cb5738a..0deeb75 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c @@ -27,6 +27,7 @@ **/ #include main/bufferobj.h +#include main/fast_rgba8_copy.h #include main/image.h #include main/macros.h #include main/mtypes.h @@ -42,9 +43,7 @@ #include intel_mipmap_tree.h #include intel_blit.h -#ifdef __SSSE3__ -#include tmmintrin.h -#endif +#include x86/common_x86_asm.h #define FILE_DEBUG_FLAG DEBUG_TEXTURE @@ -175,18 +174,6 @@ err: return false; } -#ifdef __SSSE3__ -static const uint8_t rgba8_permutation[16] = - { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 }; - -/* NOTE: dst must be 16 byte aligned */ -#define rgba8_copy_16(dst, src) \ - *(__m128i *)(dst) = _mm_shuffle_epi8(\ - (__m128i) _mm_loadu_ps((float *)(src)), \ - *(__m128i *) rgba8_permutation\ - ) -#endif - /** * Copy RGBA to BGRA - swap R and B. */ @@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes) uint8_t *d = dst; uint8_t const *s = src; -#ifdef __SSSE3__ - /* Fast copying for tile spans. -* -* As long as the destination texture is 16 aligned, -* any 16 or 64 spans we get here should also be 16 aligned. -*/ - - if (bytes == 16) { - assert(!(((uintptr_t)dst) 0xf)); - rgba8_copy_16(d+ 0, s+ 0); - return dst; - } - - if (bytes == 64) { - assert(!(((uintptr_t)dst) 0xf)); - rgba8_copy_16(d+ 0, s+ 0); - rgba8_copy_16(d+16, s+16); - rgba8_copy_16(d+32, s+32); - rgba8_copy_16(d+48, s+48); - return dst; - } -#endif - while (bytes = 4) { d[0] = s[2]; d[1] = s[1]; @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, mem_copy_fn mem_copy) { if (x0 == 0 x3 == xtile_width y0 == 0 y1 == xtile_height) { - if (mem_copy == memcpy) - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, - dst, src, src_pitch, swizzle_bit, rgba8_copy); - } else { - if (mem_copy == memcpy) - return xtile_copy(x0, x1, x2, x3, y0, y1, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return xtile_copy(x0, x1, x2, x3, y0, y1, - dst, src, src_pitch, swizzle_bit, rgba8_copy); + return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, +dst, src, src_pitch, swizzle_bit, mem_copy); } xtile_copy(x0, x1, x2, x3, y0, y1, dst, src, src_pitch, swizzle_bit, mem_copy); @@ -388,19 +341,8 @@ ytile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, mem_copy_fn mem_copy) { if (x0 == 0 x3 == ytile_width y0 == 0 y1 == ytile_height) { - if (mem_copy == memcpy) - return ytile_copy(0, 0, ytile_width, ytile_width, 0, ytile_height, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy ==
Re: [Mesa-dev] [PATCH 2/4] i965/fs: Wire up control flow correctly in predicated break pass.
On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner matts...@gmail.com wrote: When the earlier block ended with control flow, we'd mistakenly remove some of its links to its children. The same happened with the later block. --- src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp index b7a1d7e..047c2c0 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp @@ -107,10 +107,14 @@ fs_visitor::opt_peephole_predicated_break() } endif_inst-remove(endif_block); - earlier_block-children.make_empty(); - later_block-parents.make_empty(); + if (!earlier_block-ends_with_control_flow()) { + earlier_block-children.make_empty(); + earlier_block-add_successor(cfg-mem_ctx, jump_block); + } - earlier_block-add_successor(cfg-mem_ctx, jump_block); + if (!later_block-starts_with_control_flow()) { + later_block-parents.make_empty(); + } I *think* this is correct. I didn't really understand it before and I don't 100% now. Acked-by: Jason Ekstrand jason.ekstr...@intel.com jump_block-add_successor(cfg-mem_ctx, later_block); if (earlier_block-can_combine_with(jump_block)) { -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 84566] Unify the format conversion code
https://bugs.freedesktop.org/show_bug.cgi?id=84566 --- Comment #50 from Jason Ekstrand ja...@jlekstrand.net --- (In reply to Iago Toral from comment #49) Jason you had this commit in your original branch: MAYBEREVERT: Fill X components with 1 That basically makes packing to padded formats (like RGBX) set X=1. In the commit message you mention that you are not sure you like this... my opinion is that this should not be necessary and it degrades performance when packing to these format, so I would be more for removing this one. What do you think? FWIW, I have checked that removing this commit does not affect piglit results. Go ahead and remove it. -- You are receiving this mail because: You are the QA Contact for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/6] st/va: refactored handleVASliceDataBufferType
From: Michael Varga michael.va...@amd.com This patch cleans the function handleVASliceDataBufferType() for better readability. Signed-off-by: Michael Varga michael.va...@amd.com --- src/gallium/state_trackers/va/picture.c | 75 ++--- 1 file changed, 40 insertions(+), 35 deletions(-) diff --git a/src/gallium/state_trackers/va/picture.c b/src/gallium/state_trackers/va/picture.c index 57d1fb1..8775681 100644 --- a/src/gallium/state_trackers/va/picture.c +++ b/src/gallium/state_trackers/va/picture.c @@ -272,51 +272,56 @@ handleSliceParameterBuffer(vlVaContext *context, vlVaBuffer *buf) } } +static unsigned int +bufHasStartcode(vlVaBuffer *buf, unsigned int code, unsigned int bits) +{ + struct vl_vlc vlc = {0}; + int i; + + /* search the first 64 bytes for a startcode */ + vl_vlc_init(vlc, 1, (const void * const*)buf-data, buf-size); + for (i = 0; i 64 vl_vlc_bits_left(vlc) = bits; ++i) { + if (vl_vlc_peekbits(vlc, bits) == code) + return 1; + vl_vlc_eatbits(vlc, 8); + vl_vlc_fillbits(vlc); + } + + return 0; +} + static void handleVASliceDataBufferType(vlVaContext *context, vlVaBuffer *buf) { + enum pipe_video_format format; unsigned num_buffers = 0; void * const *buffers[2]; unsigned sizes[2]; - enum pipe_video_format format; + static const uint8_t start_code_h264[] = { 0x00, 0x00, 0x01 }; + static const uint8_t start_code_vc1[] = { 0x00, 0x00, 0x01, 0x0d }; format = u_reduce_video_profile(context-decoder-profile); - if (format == PIPE_VIDEO_FORMAT_MPEG4_AVC || - format == PIPE_VIDEO_FORMAT_VC1) { - struct vl_vlc vlc = {0}; - bool found = false; - int peek_bits, i; - - /* search the first 64 bytes for a startcode */ - vl_vlc_init(vlc, 1, (const void * const*)buf-data, buf-size); - peek_bits = (format == PIPE_VIDEO_FORMAT_MPEG4_AVC) ? 24 : 32; - for (i = 0; i 64 vl_vlc_bits_left(vlc) = peek_bits; ++i) { - uint32_t value = vl_vlc_peekbits(vlc, peek_bits); - if ((format == PIPE_VIDEO_FORMAT_MPEG4_AVC value == 0x01) || -(format == PIPE_VIDEO_FORMAT_VC1 (value == 0x010d || -value == 0x010c || value == 0x010b))) { -found = true; + switch (format) { + case PIPE_VIDEO_FORMAT_MPEG4_AVC: + if (bufHasStartcode(buf, 0x01, 24)) break; - } - vl_vlc_eatbits(vlc, 8); - vl_vlc_fillbits(vlc); - } - /* none found, ok add one manually */ - if (!found) { - static const uint8_t start_code_h264[] = { 0x00, 0x00, 0x01 }; - static const uint8_t start_code_vc1[] = { 0x00, 0x00, 0x01, 0x0d }; - - if (format == PIPE_VIDEO_FORMAT_MPEG4_AVC) { -buffers[num_buffers] = (void *const)start_code_h264; -sizes[num_buffers] = sizeof(start_code_h264); - } - else { -buffers[num_buffers] = (void *const)start_code_vc1; -sizes[num_buffers] = sizeof(start_code_vc1); - } - ++num_buffers; - } + + buffers[num_buffers] = (void *const)start_code_h264; + sizes[num_buffers++] = sizeof(start_code_h264); + break; + case PIPE_VIDEO_FORMAT_VC1: + if (bufHasStartcode(buf, 0x010d, 32) || + bufHasStartcode(buf, 0x010c, 32) || + bufHasStartcode(buf, 0x010b, 32)) + break; + + buffers[num_buffers] = (void *const)start_code_vc1; + sizes[num_buffers++] = sizeof(start_code_vc1); + break; + default: + break; } + buffers[num_buffers] = buf-data; sizes[num_buffers] = buf-size; ++num_buffers; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] st/va: MPEG4 populate the SPS structure
From: Michael Varga michael.va...@amd.com Signed-off-by: Michael Varga michael.va...@amd.com --- src/gallium/state_trackers/va/picture.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/gallium/state_trackers/va/picture.c b/src/gallium/state_trackers/va/picture.c index 327c33d..ff13bc6 100644 --- a/src/gallium/state_trackers/va/picture.c +++ b/src/gallium/state_trackers/va/picture.c @@ -344,6 +344,7 @@ static void handleSliceParameterBuffer(vlVaContext *context, vlVaBuffer *buf) { VASliceParameterBufferH264 *h264; + VASliceParameterBufferMPEG4 *mpeg4; switch (u_reduce_video_profile(context-decoder-profile)) { case PIPE_VIDEO_FORMAT_MPEG4_AVC: @@ -354,7 +355,12 @@ handleSliceParameterBuffer(vlVaContext *context, vlVaBuffer *buf) context-desc.h264.num_ref_idx_l1_active_minus1 = h264-num_ref_idx_l1_active_minus1; break; + case PIPE_VIDEO_FORMAT_MPEG4: + assert(buf-size = sizeof(VASliceParameterBufferMPEG4) buf-num_elements == 1); + mpeg4 = buf-data; + context-mpeg4.quant_scale = mpeg4-quant_scale; + break; default: break; } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/6] st/va: MPEG4 populate the iq matrix buffers
From: Michael Varga michael.va...@amd.com Signed-off-by: Michael Varga michael.va...@amd.com --- src/gallium/state_trackers/va/picture.c | 16 1 file changed, 16 insertions(+) diff --git a/src/gallium/state_trackers/va/picture.c b/src/gallium/state_trackers/va/picture.c index a4eb26b..327c33d 100644 --- a/src/gallium/state_trackers/va/picture.c +++ b/src/gallium/state_trackers/va/picture.c @@ -296,6 +296,7 @@ handleIQMatrixBuffer(vlVaContext *context, vlVaBuffer *buf) { VAIQMatrixBufferMPEG2 *mpeg2; VAIQMatrixBufferH264 *h264; + VAIQMatrixBufferMPEG4 *mpeg4; switch (u_reduce_video_profile(context-decoder-profile)) { case PIPE_VIDEO_FORMAT_MPEG12: @@ -319,6 +320,21 @@ handleIQMatrixBuffer(vlVaContext *context, vlVaBuffer *buf) memcpy(context-desc.h264.pps-ScalingList8x8, h264-ScalingList8x8, 2 * 64); break; + case PIPE_VIDEO_FORMAT_MPEG4: + assert(buf-size = sizeof(VAIQMatrixBufferMPEG4) buf-num_elements == 1); + mpeg4 = buf-data; + + if (mpeg4-load_intra_quant_mat) + context-desc.mpeg4.intra_matrix = mpeg4-intra_quant_mat; + else + context-desc.mpeg4.intra_matrix = NULL; + + if (mpeg4-load_non_intra_quant_mat) + context-desc.mpeg4.non_intra_matrix = mpeg4-non_intra_quant_mat; + else + context-desc.mpeg4.non_intra_matrix = NULL; + break; + default: break; } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs
If there's no feedback for so long, I suggest you commit the patch without review. If it fixes a bug, that's one more reason to commit it. Marek On Thu, Nov 6, 2014 at 2:12 PM, Emil Velikov emil.l.veli...@gmail.com wrote: Humble ping x2 On 14/10/14 15:25, Emil Velikov wrote: Humble ping. On 23/09/14 01:25, Emil Velikov wrote: From: Sjoerd Simons sjoerd.sim...@collabora.co.uk When using RGBA EGLConfigs allow both RGB and RGBA X visuals, such that application can decide whether they want to use RGBA (and have the compositor blend their windows). On my system with this change EGLConfigs with a 24 bit visual comes up first, as such applications blindly picking the first EGLConfig will still get an RGB X visual. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676 --- Hello gents, This patch has been stuck in bugzilla since February this year. Bringing it around here to gather greater exposure and perhaps some comments/reviews. -Emil src/egl/drivers/dri2/egl_dri2.c | 5 + src/egl/drivers/dri2/platform_x11.c | 17 + 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c index 20a7243..2ed90a7 100644 --- a/src/egl/drivers/dri2/egl_dri2.c +++ b/src/egl/drivers/dri2/egl_dri2.c @@ -110,6 +110,11 @@ EGLint dri2_to_egl_attribute_map[] = { static EGLBoolean dri2_match_config(const _EGLConfig *conf, const _EGLConfig *criteria) { + + if (criteria-NativeVisualID != EGL_DONT_CARE +conf-NativeVisualID != criteria-NativeVisualID) + return EGL_FALSE; + if (_eglCompareConfigs(conf, criteria, NULL, EGL_FALSE) != 0) return EGL_FALSE; diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c index a7a7338..3395fb7 100644 --- a/src/egl/drivers/dri2/platform_x11.c +++ b/src/egl/drivers/dri2/platform_x11.c @@ -672,14 +672,15 @@ dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy, dri2_add_config(disp, dri2_dpy-driver_configs[j], id++, surface_type, config_attrs, rgba_masks); -/* Allow a 24-bit RGB visual to match a 32-bit RGBA EGLConfig. - * Otherwise it will only match a 32-bit RGBA visual. On a - * composited window manager on X11, this will make all of the - * EGLConfigs with destination alpha get blended by the - * compositor. This is probably not what the application - * wants... especially on drivers that only have 32-bit RGBA - * EGLConfigs! */ -if (d.data-depth == 24) { +/* Allow both 24-bit RGB visual and 32 bit RGBA to match a 32-bit + * RGBA EGLConfig. Otherwise it will only match a 32-bit RGBA + * visual. On a composited window manager on X11, this will make + * all of the EGLConfigs with destination alpha get blended by the + * compositor. This is probably not what the application wants... + * especially on drivers that only have 32-bit RGBA EGLConfigs! + * Allowing both allows applications to make the decision whether + * 32 bit visuals are intended */ +if (d.data-depth == 24 || d.data-depth == 32) { rgba_masks[3] = ~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]); dri2_add_config(disp, dri2_dpy-driver_configs[j], id++, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/4] i965: Add code to verify the CFG is sane.
On Thu, Nov 6, 2014 at 9:41 AM, Jason Ekstrand ja...@jlekstrand.net wrote: In general, it seems as if this can miss several things. For instance, it checks that all the predicessors are valid but never that we have all the predecessors. I'm not sure what you mean. That we don't validate that if A - B then B has an incoming edge from A? That's true. I think that would be a good addition. I'm not sure I want to do that in this patch. It was pretty big as is. Same for successors. If we really want to be able to validate a CFG, maybe a stack-based approach like calculate_cfg would work better? Also, did you run this on piglit/shader-db to ensure that everything coming out of calculate_cfg actually passes? I don't think it will pass because of the things you pointed out about unconditional jumps. More comments inline On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner matts...@gmail.com wrote: --- src/mesa/drivers/dri/i965/test_verify_cfg.cpp | 273 ++ src/mesa/drivers/dri/i965/test_verify_cfg.h | 26 +++ 2 files changed, 299 insertions(+) create mode 100644 src/mesa/drivers/dri/i965/test_verify_cfg.cpp create mode 100644 src/mesa/drivers/dri/i965/test_verify_cfg.h diff --git a/src/mesa/drivers/dri/i965/test_verify_cfg.cpp b/src/mesa/drivers/dri/i965/test_verify_cfg.cpp new file mode 100644 index 000..0aa74c5 --- /dev/null +++ b/src/mesa/drivers/dri/i965/test_verify_cfg.cpp @@ -0,0 +1,273 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include gtest/gtest.h +#include test_verify_cfg.h +#include brw_cfg.h + +static bool +is_unconditional_jump(const backend_instruction *inst) +{ + return (inst-opcode == BRW_OPCODE_BREAK || + inst-opcode == BRW_OPCODE_CONTINUE || + inst-opcode == BRW_OPCODE_WHILE) + inst-predicate == BRW_PREDICATE_NONE; +} + +void +verify_cfg(backend_visitor *v) +{ + foreach_block(block, v-cfg) { + switch (block-start()-opcode) { + case BRW_OPCODE_ENDIF: { + /* Has two predecessors: + *- the previous block is always a predecessor + *- always a predecessor ending in an IF or an ELSE + * + * Note that if the body of the if block is empty, then the + * previous block *is* the block that ends with IF, so the ENDIF + * block will have the same predecessor twice. + */ + if (is_unconditional_jump(block-prev()-end())) { +EXPECT_EQ(block-parents.length(), 1u); + } else { +EXPECT_EQ(block-parents.length(), 2u); + } This obviously catches if (foo) { /* Do stuff */ } else { break; } but what about if (foo) { break; } else { /* Do stuff */ } Or worse, what about if (foo) { break; } else { continue; } Maybe there's something I'm missing here. I think you're right. For purposes of testing the predicated break pass I didn't really care about unconditional jumps (since they would be removed by the pass!). I added the unconditional jump cases as an after thought. So the question is whether I can actually test the cases you mention in the current model? I'm not super excited expending a bunch more effort to rewrite some testing code that already tests the thing I want. + + bool found_prev = false, found_if = false, found_else = false, + found_other = false; + foreach_list_typed(bblock_link, parent, link, block-parents) { +if (parent-block == block-prev() !found_prev) + found_prev = true; +else if (parent-block-end()-opcode == BRW_OPCODE_IF) + found_if = true; +else if (parent-block-end()-opcode == BRW_OPCODE_ELSE) + found_else = true; +
Re: [Mesa-dev] [PATCH 2/4] i965/fs: Wire up control flow correctly in predicated break pass.
On Thu, Nov 6, 2014 at 10:07 AM, Jason Ekstrand ja...@jlekstrand.net wrote: On Wed, Nov 5, 2014 at 4:13 PM, Matt Turner matts...@gmail.com wrote: When the earlier block ended with control flow, we'd mistakenly remove some of its links to its children. The same happened with the later block. --- src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp index b7a1d7e..047c2c0 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_peephole_predicated_break.cpp @@ -107,10 +107,14 @@ fs_visitor::opt_peephole_predicated_break() } endif_inst-remove(endif_block); - earlier_block-children.make_empty(); - later_block-parents.make_empty(); + if (!earlier_block-ends_with_control_flow()) { + earlier_block-children.make_empty(); + earlier_block-add_successor(cfg-mem_ctx, jump_block); + } - earlier_block-add_successor(cfg-mem_ctx, jump_block); + if (!later_block-starts_with_control_flow()) { + later_block-parents.make_empty(); + } I *think* this is correct. I didn't really understand it before and I don't 100% now. Basically, the earlier code incorrectly assumed that the blocks were were attempting to merge did not start/end in control flow and therefore had only a single successor. In that case it was safe to erase all of the successor/predecessors and recreate them. When the blocks actually start/end with control flow, they have multiple successors/predecessors and that assumption is false. In that case, we're not going to be able to merge blocks and we shouldn't destroy their successor/predecessor information. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl/list: Add an exec_list_validate function
On Wed, Nov 5, 2014 at 5:12 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Wed, Nov 5, 2014 at 2:46 PM, Matt Turner matts...@gmail.com wrote: On Wed, Nov 5, 2014 at 2:00 PM, Jason Ekstrand ja...@jlekstrand.net wrote: This can be very useful for trying to debug list corruptions. Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com Cc: Ian Romanick i...@freedesktop.org --- src/glsl/list.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/src/glsl/list.h b/src/glsl/list.h index aac13fd..1d18ec9 100644 --- a/src/glsl/list.h +++ b/src/glsl/list.h @@ -521,6 +521,20 @@ exec_node_insert_list_before(struct exec_node *n, struct exec_list *before) exec_list_make_empty(before); } +static inline void +exec_list_validate(struct exec_list *list) +{ + assert(list-head-prev == (struct exec_node *) list-head); + assert(list-tail == NULL); + assert(list-tail_pred-next == (struct exec_node *) list-tail); + + for (struct exec_node *node = list-head; +node-next != NULL; node = node-next) { Just use foreach_in_list(). Sure, I can do that. + assert(node-next-prev == node); + assert(node-prev-next == node); + } +} Are you intending to call this from gdb? I'm having a hard time imagining committing code that *sometimes* corrupts lists, which seems like why this function would be useful to call from real code. It is useful to call from gdb but I also call it all over nir_validate.c. I spent most of today fighting linked list corruptions, and this was very helpful for tracking them down. Another option would be rename it to exec_list_is_valid and make it return a bool. Then the standard procedure would be assert(exec_list_is_valid(list)). Would that be better? If you just want to call it from gdb, wrap the whole thing in #ifndef NDEBUG. I don't want to ever accidentally call this function and think it validated something when it actually did nothing. Yeah, ignore all this. Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] gallivm: Properly update for removal of JITMemoryManager in LLVM 3.6.
On Wed, Oct 22, 2014 at 11:11 AM, jfons...@vmware.com wrote: From: José Fonseca jfons...@vmware.com JITMemoryManager was removed in LLVM 3.6, and replaced by its base class RTDyldMemoryManager. This change fixes our JIT memory managers specializations to derive from RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager. This enables llvmpipe to run with LLVM 3.6. Should these be marked for stable? Or does 10.3.x work with 3.6? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] New stable-branch 10.3 candidate pushed
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hello list, Barring a slight delay, during which I was expecting more non-freedreno patches, it's time for yet another 10.3 stable candidate. Currently we have: - 102 queued (89 of which are for freedreno) - 4 nominated (outstanding) - and 3 rejected patches In a nut shell this gives us - freedreno: 0ad is playable (+ 395 piglit fixes) - r600/radeonsi: 3 rendering fixes and one crash - glsl: 3 bug-fixes/crashes - build: fixed static linking against llvm, fix msvc and android builds Take a look at section Mesa stable queue for more information. Testing - --- The following results are against piglit a636a3610d7. Fixes - gallium swrast - -- Tests: - arb_explicit_uniform_location-use-of-unused-loc (crash pass) - linker/set-location-to-sampler (fail pass) Fixes - classic i965(snb) - - Tests: - arb_explicit_uniform_location-use-of-unused-loc (crash pass) - linker/set-location-to-sampler (fail pass) - GLX_OML_sync_control/timing -divisor 2 (warn pass) - GLX_OML_sync_control/timing -msc-delta 1 (warn pass) Regressions - classic i965(snb) - --- Tests: - GLX_OML_sync_control/timing -divisor 1 (pass warn) The OMX_sync_control tests continue to oscillate between warn and pass, with the result depending on the moon cycle :P Testing reports/general approval - Any testing reports (or general approval of the state of the branch) will be greatly appreciated. Trivial merge conflicts - --- Here are the commits where I manually merged conflicts, (so these might merit additional review): commit 4956788a5f40fff2ea72d0f5bd6fcdb116492896 Author: Anuj Phogat anuj.pho...@gmail.com Date: Mon Sep 22 15:10:28 2014 -0700 glsl: Use signed array index in update_max_array_access() (cherry picked from commit 7a652c41b4de4bdbb954a4ebf6cdb605d197e999) As usual the plan is to have the next stable (10.3.3) released this Friday, so if you have any questions or comments that you would like to share before the release, please go ahead. Cheers, Emil Mesa stable queue - - Nominated (4) = Kenneth Graunke (1): i965: Fix an off-by-1 error in the draw upload code's size calculation. Tom Stellard (2): radeonsi: Program RASTER_CONFIG for harvested GPUs v4 R600/SI: radeonsi: Program RASTER_CONFIG for harvested GPUs v3 Tomasz Figa (1): glsl: Fix no return value for non-void function Rejected(3) === Anuj Phogat (2): glsl: No compile error for out of bounds array index glsl: Don't abort if array index is out of bounds Mauro Rossi (1): gallium/nouveau: use std::isfinite in c++ sources Queued (102) Anuj Phogat (2): glsl: Fix crash due to negative array index glsl: Use signed array index in update_max_array_access() Brian Paul (1): mesa: fix UNCLAMPED_FLOAT_TO_UBYTE() macro for MSVC Ilia Mirkin (27): freedreno/ir3: INEG operates on src0, not src1 freedreno/ir3: add UARL support freedreno/ir3: negate result of USLT/etc freedreno/ir3: use unsigned comparison for UIF freedreno/ir3: add TXL support freedreno/ir3: fix UCMP handling freedreno/ir3: implement UMUL correctly freedreno: add default .dir-locals.el for emacs settings freedreno/ir3: make texture instruction construction more dynamic freedreno/ir3: fix TXB/TXL to actually pull the bias/lod argument freedreno/ir3: add TXQ support freedreno/ir3: add TXB2 support freedreno: dual-source render targets are not supported freedreno: instanced drawing/compute not yet supported freedreno/ir3: avoid fan-in sources referring to same instruction freedreno/ir3: add IDIV/UDIV support freedreno/ir3: add UMOD support, based on UDIV freedreno/ir3: add MOD support freedreno/ir3: add ISSG support freedreno/ir3: add UMAD support freedreno/ir3: make TXQ return integers, not floats freedreno/ir3: shadow comes before array freedreno/ir3: add texture offset support freedreno/ir3: add TXD support and expose ARB_shader_texture_lod freedreno/ir3: add TXF support freedreno: positions come out as integers, not half-integers freedreno/ir3: fix FSLT/etc handling to return 0/-1 instead of 0/1.0 Jan Vesely (1): configure: include llvm systemlibs when using static llvm Marek Olšák (5): r600g: fix polygon mode for points and lines and point/line fill modes radeonsi: fix polygon mode for points and lines and point/line fill modes radeonsi: fix incorrect index buffer max size for lowered 8-bit indices Revert st/mesa: set MaxUnrollIterations = 255 r300g: remove enabled/disabled hyperz and AA compression messages Mauro Rossi (1): gallium/nouveau: fully build the driver under android Michel
Re: [Mesa-dev] [PATCH 3/3] gallivm: Properly update for removal of JITMemoryManager in LLVM 3.6.
On 06/11/14 19:20, Matt Turner wrote: On Wed, Oct 22, 2014 at 11:11 AM, jfons...@vmware.com wrote: From: José Fonseca jfons...@vmware.com JITMemoryManager was removed in LLVM 3.6, and replaced by its base class RTDyldMemoryManager. This change fixes our JIT memory managers specializations to derive from RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager. This enables llvmpipe to run with LLVM 3.6. Should these be marked for stable? Or does 10.3.x work with 3.6? If I understood Michel correctly a while back - it's not the best of ideas to add support for unreleased llvm in the mesa stable branches. As the llvm release is made we can obviously pick them up when needed. Cheers, Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Fix an off-by-1 error in the draw upload code's size calculation.
On Thursday, November 06, 2014 02:55:25 PM Emil Velikov wrote: Hi Ken, From what I've gathered the proposed patch is incorrect and is (most likely) working around a buggy application behaviour. Afaics Ian suggested that we add a driconf option for such applications. Should I consider this patch for the stable branch or the above sounds about right and we can drop it ? Thanks Emil We should drop it. Sorry for the noise... --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] gallivm: Properly update for removal of JITMemoryManager in LLVM 3.6.
On Thu, Nov 6, 2014 at 11:27 AM, Emil Velikov emil.l.veli...@gmail.com wrote: On 06/11/14 19:20, Matt Turner wrote: On Wed, Oct 22, 2014 at 11:11 AM, jfons...@vmware.com wrote: From: José Fonseca jfons...@vmware.com JITMemoryManager was removed in LLVM 3.6, and replaced by its base class RTDyldMemoryManager. This change fixes our JIT memory managers specializations to derive from RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager. This enables llvmpipe to run with LLVM 3.6. Should these be marked for stable? Or does 10.3.x work with 3.6? If I understood Michel correctly a while back - it's not the best of ideas to add support for unreleased llvm in the mesa stable branches. As the llvm release is made we can obviously pick them up when needed. Cheers, Emil Ah, sure. That makes sense. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Fix an off-by-1 error in the draw upload code's size calculation.
On 06/11/14 19:29, Kenneth Graunke wrote: On Thursday, November 06, 2014 02:55:25 PM Emil Velikov wrote: Hi Ken, From what I've gathered the proposed patch is incorrect and is (most likely) working around a buggy application behaviour. Afaics Ian suggested that we add a driconf option for such applications. Should I consider this patch for the stable branch or the above sounds about right and we can drop it ? Thanks Emil We should drop it. Sorry for the noise... Ack. There is nothing to apologise for :) -Emil --Ken ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Wed, Nov 5, 2014 at 12:54 PM, Matt Turner matts...@gmail.com wrote: On Wed, Nov 5, 2014 at 12:50 PM, Timothy Arceri t_arc...@yahoo.com.au wrote: There have been quite a few eyes over this now but nobody has given it a reviewed by yet. Would be nice to get it in before the code freeze. Any takers? Yes, I'll make sure that happens. I made a couple of trivial changes to the commit message and added some spaces between __m128i and * in casts and pushed it with review. Thanks! ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 84566] Unify the format conversion code
https://bugs.freedesktop.org/show_bug.cgi?id=84566 --- Comment #51 from Jason Ekstrand ja...@jlekstrand.net --- (In reply to Samuel Iglesias from comment #48) (In reply to Jason Ekstrand from comment #34) (In reply to Samuel Iglesias from comment #33) Jason, I would like to know your opinion about the integer RGBA clamping done in pack.c (_mesa_pack_rgba_span_from_ints()). glReadPixels() and glGetTexImage() specs said (for example, in OpenGL 3.3. core specification) that for an integer RGBA color, each component is clamped to the representable range of type. Those GL functions end up calling pack.c's functions for performing the conversion (mesa_pack_rgba_span_from_ints() and others). It's possible to replace some of those pack.c's conversion functions by the master conversion but the problem is in the clamping operation. I would like to add a boolean argument called clamp to the master conversion function which is passed to _mesa_swizzle_and_convert() where each of its convert_{uint,int,byte,ubyte,short,ushort}() do the right thing when clamp is true. This clamp parameter would be false for every call to either master conversion function or _mesa_swizzle_and_convert() except when they are being called from pack.c. What do you think? Supporting clamping is probably fine. I think we determined we needed a clamp parameter anyway for some of the float conversions. I guess it makes sense for integers too. Let's watch out for performance when we implement it though. Changing the loop inside mesa_swizzle_and_convert without hurting performance can be tricky. The teximage-colors test in Piglit has a -benchmark flag that can be used for testing that. In the end, we did not make that change in pack.c as we could just use the autogenerated format pack/unpack functions. However I am retaking this topic again because we found another issue which would require a similar solution: The convert_*() functions in format_utils.c convert between source and destination data and are used by _mesa_swizzle_and_convert. We found that these were not good enough for various conversions that involved non-normalized types of different sizes: INT to SHORT, INT to BYTE, etc. Because of that, several piglit tests related to glGetTexImage() and others failed, like for example bin/ext_texture_integer-getteximage-clamping. In order to fix that we added the clamp expressions for these cases [1] and with that we achieved no regressions when executing a full piglit run on i965 driver. Unfortunately, when testing our patches on a couple of Gallium drivers, we found a regression that we tracked down back to that patch: bin/arb_clear_buffer_object-formats. Reverting the patch makes fixes the problem with these Gallium drivers but then, bin/ext_texture_integer-getteximage-clamping fails again on i965. We wonder if there could be more cases like this that piglit is not covering, since it looks weird that this affects just this one test. So, we wonder if that patch for _mesa_swizzle_and_convert is correct and we should fix the failed gallium cases elsewhere, or if we should revert that patch and fix the cases it fixed in a different way. What do you think? Was _mesa_swizzle_and_convert implemented like that on purpose or are these real bugs? From my brief reading of the GL spec, it looks like clamping integers to the max representable range is what it expects by default. From the glTexImage spec: The selected groups are transferred to the GL as described in section 3.7.2 and then clamped to the representable range of the internal format. If the internalformat of the texture is signed or unsigned integer, components are clamped to [-2^(n-1), 2^(n-1)-1] or [0, 2^(n-1)-1], respectively, where n is the number of bits per component. For color component groups, if the internalformat of the texture is signed or unsigned normalized fixed-point, components are clamped t0 [-1, 1] or [0, 1], respectively. Therefore, it seems as if we want to be clamping when we have integer destinations. I'm not sure why the gallium drivers are regressing when you do. One more observation is that it doesn't look like your clamping patch is complete. If we're going to clamp when we have an integer destination, we should always clamp with integer destinations, not just in the few cases that piglit hits. I wouldn't be surprised if piglit's coverage in those areas is terrible. If we decide to revert our clamp patch, then a solution could be to have a separate implementation of _mesa_swizzle_and_convert() and its convert_*() functions that clamps. We would have to use that version in glGetTexImage() (maybe in glReadPIxels too) and use the normal version for texture uploads (texstore). If we do this, then we would have to add a clamp flag to _mesa_format_convert, the same flag to _mesa_swizzle_and_convert and the have a new
Re: [Mesa-dev] ARB_texture_buffer_range offsets
But even dword offsets aren't translatable right now to pipe_sampler_view first_elements if the format has more than 32 bits. Roland Am 06.11.2014 um 18:21 schrieb Marek Olšák: For radeonsi, I think only x8, x8y8, and x16 fetches can be byte-aligned. Everything else is dword-aligned (the 2 lowest bits are ignored). I guess the cap should be 4 then. Marek On Thu, Nov 6, 2014 at 4:55 PM, Roland Scheidegger srol...@vmware.com wrote: Am 06.11.2014 um 16:15 schrieb Marek Olšák: I'd say it's a spec bug. ARB_texture_buffer_range should say that the offset should be a multiple of an element size, but it doesn't. The question is, what should the element size be? One component or the whole pixel? Imho whole pixel (for block compressed that would be full block, for things like packed 565 too but neither are possible in GL), i.e. format granularity. That said, the whole alignment thing is problematic for rgb32 (and the possiblity of that was added later, ARB_texture_buffer_object_rgb32), so maybe it's things like that why the offset can be just byte aligned (in other words, I'm not convinced it's just a spec bug, d3d10 doesn't have that problem with alignment). Roland Marek On Wed, Nov 5, 2014 at 9:08 PM, Roland Scheidegger srol...@vmware.com wrote: Trying to fix some bug due to alignment issues in llvmpipe's vertex fetch, I came across some issue with ARB_texture_buffer_range. Namely, it looks like the offsets specified there are always in bytes, regardless the actual format (hence, as long as the TEXTURE_BUFFER_OFFSET_ALIGNMENT is 1, it would be allowed to have an offset of 15 bytes for a rgba32f format for instance making all fetches quite unaligned). However in gallium we actually have first_elem and last_elem parameters in the sampler views which are specified in number of elements (so takes the format into account), which is what d3d10 does and the state tracker translates to that apparently. IMHO d3d10 makes way more sense there because that way the necessary alignment scales automatically depending on the format (so, if the format is 2x16bit for instance you'd need 4 byte alignment for the offset, and only need 16 bytes alignment for 4x32bit, ensuring all lookups are always aligned). This means that 15 byte offset in the example above is completely untranslatable. But if I see that right, OpenGL doesn't work like that, meaning effectively gallium drivers (and I doubt most other drivers neither) cannot actually claim to support TEXTURE_BUFFER_OFFSET_ALIGNMENT lower than 16, even if they'd only need that for 4x32bit formats. Though most gallium drivers indeed claim 1 right now. Looks quite messy... Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddevd=AAIBaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_Im=Ds_jdCUhL1dGXrkeea1fzl6_iInrZFJOSltaM6dlF9ws=BNwWkIpsz9GFgPRoMLDU8tEVUPzmIxKINN3Uu9evnXse= ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs
I thought Eric and Chad already NAKed it in bugzilla. The problem is that applications ask for an RGBA visual for GL blending. They use the alpha channel to generate their images, but the final alpha values are, basically, random... and the composited result would be pure garbage. As Chad points out in comment #1, EGL just doesn't let applications do the thing the patch is trying to do. On 11/06/2014 05:12 AM, Emil Velikov wrote: Humble ping x2 On 14/10/14 15:25, Emil Velikov wrote: Humble ping. On 23/09/14 01:25, Emil Velikov wrote: From: Sjoerd Simons sjoerd.sim...@collabora.co.uk When using RGBA EGLConfigs allow both RGB and RGBA X visuals, such that application can decide whether they want to use RGBA (and have the compositor blend their windows). On my system with this change EGLConfigs with a 24 bit visual comes up first, as such applications blindly picking the first EGLConfig will still get an RGB X visual. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67676 --- Hello gents, This patch has been stuck in bugzilla since February this year. Bringing it around here to gather greater exposure and perhaps some comments/reviews. -Emil src/egl/drivers/dri2/egl_dri2.c | 5 + src/egl/drivers/dri2/platform_x11.c | 17 + 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c index 20a7243..2ed90a7 100644 --- a/src/egl/drivers/dri2/egl_dri2.c +++ b/src/egl/drivers/dri2/egl_dri2.c @@ -110,6 +110,11 @@ EGLint dri2_to_egl_attribute_map[] = { static EGLBoolean dri2_match_config(const _EGLConfig *conf, const _EGLConfig *criteria) { + + if (criteria-NativeVisualID != EGL_DONT_CARE +conf-NativeVisualID != criteria-NativeVisualID) + return EGL_FALSE; + if (_eglCompareConfigs(conf, criteria, NULL, EGL_FALSE) != 0) return EGL_FALSE; diff --git a/src/egl/drivers/dri2/platform_x11.c b/src/egl/drivers/dri2/platform_x11.c index a7a7338..3395fb7 100644 --- a/src/egl/drivers/dri2/platform_x11.c +++ b/src/egl/drivers/dri2/platform_x11.c @@ -672,14 +672,15 @@ dri2_x11_add_configs_for_visuals(struct dri2_egl_display *dri2_dpy, dri2_add_config(disp, dri2_dpy-driver_configs[j], id++, surface_type, config_attrs, rgba_masks); -/* Allow a 24-bit RGB visual to match a 32-bit RGBA EGLConfig. - * Otherwise it will only match a 32-bit RGBA visual. On a - * composited window manager on X11, this will make all of the - * EGLConfigs with destination alpha get blended by the - * compositor. This is probably not what the application - * wants... especially on drivers that only have 32-bit RGBA - * EGLConfigs! */ -if (d.data-depth == 24) { +/* Allow both 24-bit RGB visual and 32 bit RGBA to match a 32-bit + * RGBA EGLConfig. Otherwise it will only match a 32-bit RGBA + * visual. On a composited window manager on X11, this will make + * all of the EGLConfigs with destination alpha get blended by the + * compositor. This is probably not what the application wants... + * especially on drivers that only have 32-bit RGBA EGLConfigs! + * Allowing both allows applications to make the decision whether + * 32 bit visuals are intended */ +if (d.data-depth == 24 || d.data-depth == 32) { rgba_masks[3] = ~(rgba_masks[0] | rgba_masks[1] | rgba_masks[2]); dri2_add_config(disp, dri2_dpy-driver_configs[j], id++, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] mesa: add runtime support for SSSE3
On Thu, 2014-11-06 at 09:59 -0800, Matt Turner wrote: On Thu, Nov 6, 2014 at 4:18 AM, Timothy Arceri t_arc...@yahoo.com.au wrote: Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au --- configure.ac | 6 ++ src/mesa/x86/common_x86.c | 4 src/mesa/x86/common_x86_features.h | 4 +++- 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index 03f1bca..cc0a661 100644 --- a/configure.ac +++ b/configure.ac @@ -258,6 +258,12 @@ if test x$SSE41_SUPPORTED = x1; then fi AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1]) +AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0]) +if test x$SSSE3_SUPPORTED = x1; then +DEFINES=$DEFINES -DUSE_SSSE3 +fi +AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1]) + dnl Can't have static and shared libraries, default to static if user dnl explicitly requested. If both disabled, set to static since shared dnl was explicitly requested. diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c index 25f5c40..ad0648a 100644 --- a/src/mesa/x86/common_x86.c +++ b/src/mesa/x86/common_x86.c @@ -352,6 +352,10 @@ _mesa_get_x86_features(void) __get_cpuid(1, eax, ebx, ecx, edx); +#ifdef bit_SSSE3 Why did you put this in an #ifdef bit_SSSE3? That's defined everywhere. I have no idea. I'll fix it. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy
On Thu, 2014-11-06 at 10:03 -0800, Matt Turner wrote: On Thu, Nov 6, 2014 at 4:20 AM, Timothy Arceri t_arc...@yahoo.com.au wrote: Also cleans up some if statements in the *faster functions. Callgrind cpu usage results from pts benchmarks: For ytile_copy_faster() Nexuiz 1.6.1: 2.16% - 1.20% Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au --- src/mesa/Makefile.am | 8 +++ src/mesa/drivers/dri/i965/intel_tex_subimage.c | 82 ++ src/mesa/main/fast_rgba8_copy.c| 78 src/mesa/main/fast_rgba8_copy.h| 37 4 files changed, 141 insertions(+), 64 deletions(-) create mode 100644 src/mesa/main/fast_rgba8_copy.c create mode 100644 src/mesa/main/fast_rgba8_copy.h diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am index e71bccb..2402096 100644 --- a/src/mesa/Makefile.am +++ b/src/mesa/Makefile.am @@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS) ARCH_LIBS = +if SSSE3_SUPPORTED +ARCH_LIBS += libmesa_ssse3.la +endif + if SSE41_SUPPORTED ARCH_LIBS += libmesa_sse41.la endif @@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \ main/streaming-load-memcpy.c libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1 +libmesa_ssse3_la_SOURCES = \ + main/fast_rgba8_copy.c +libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3 + pkgconfigdir = $(libdir)/pkgconfig pkgconfig_DATA = gl.pc diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c b/src/mesa/drivers/dri/i965/intel_tex_subimage.c index cb5738a..0deeb75 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c @@ -27,6 +27,7 @@ **/ #include main/bufferobj.h +#include main/fast_rgba8_copy.h #include main/image.h #include main/macros.h #include main/mtypes.h @@ -42,9 +43,7 @@ #include intel_mipmap_tree.h #include intel_blit.h -#ifdef __SSSE3__ -#include tmmintrin.h -#endif +#include x86/common_x86_asm.h #define FILE_DEBUG_FLAG DEBUG_TEXTURE @@ -175,18 +174,6 @@ err: return false; } -#ifdef __SSSE3__ -static const uint8_t rgba8_permutation[16] = - { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 }; - -/* NOTE: dst must be 16 byte aligned */ -#define rgba8_copy_16(dst, src) \ - *(__m128i *)(dst) = _mm_shuffle_epi8(\ - (__m128i) _mm_loadu_ps((float *)(src)), \ - *(__m128i *) rgba8_permutation\ - ) -#endif - /** * Copy RGBA to BGRA - swap R and B. */ @@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes) uint8_t *d = dst; uint8_t const *s = src; -#ifdef __SSSE3__ - /* Fast copying for tile spans. -* -* As long as the destination texture is 16 aligned, -* any 16 or 64 spans we get here should also be 16 aligned. -*/ - - if (bytes == 16) { - assert(!(((uintptr_t)dst) 0xf)); - rgba8_copy_16(d+ 0, s+ 0); - return dst; - } - - if (bytes == 64) { - assert(!(((uintptr_t)dst) 0xf)); - rgba8_copy_16(d+ 0, s+ 0); - rgba8_copy_16(d+16, s+16); - rgba8_copy_16(d+32, s+32); - rgba8_copy_16(d+48, s+48); - return dst; - } -#endif - while (bytes = 4) { d[0] = s[2]; d[1] = s[1]; @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, mem_copy_fn mem_copy) { if (x0 == 0 x3 == xtile_width y0 == 0 y1 == xtile_height) { - if (mem_copy == memcpy) - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, - dst, src, src_pitch, swizzle_bit, rgba8_copy); - } else { - if (mem_copy == memcpy) - return xtile_copy(x0, x1, x2, x3, y0, y1, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return xtile_copy(x0, x1, x2, x3, y0, y1, - dst, src, src_pitch, swizzle_bit, rgba8_copy); + return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, +dst, src, src_pitch, swizzle_bit, mem_copy); } xtile_copy(x0, x1, x2, x3, y0, y1, dst, src, src_pitch, swizzle_bit, mem_copy); @@ -388,19 +341,8 @@ ytile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, mem_copy_fn mem_copy) { if (x0 == 0 x3 == ytile_width y0 == 0 y1 == ytile_height) { -
[Mesa-dev] [PATCH] gbm: dlopen libglapi so gbm_create_device works
From: Frank Henigman fjhenig...@chromium.org Dri driver libs are not linked to pull in libglapi so gbm_create_device() fails when it tries to dlopen them (unless the application is linked with something that does pull in libglapi, like libGL). Until dri drivers can be fixed properly, dlopen libglapi before trying to dlopen them. https://bugs.freedesktop.org/show_bug.cgi?id=57702 Signed-off-by: Frank Henigman fjhenig...@google.com --- src/gbm/backends/dri/gbm_dri.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c index f637e32..6ea2294 100644 --- a/src/gbm/backends/dri/gbm_dri.c +++ b/src/gbm/backends/dri/gbm_dri.c @@ -311,6 +311,11 @@ dri_open_driver(struct gbm_dri_device *dri) if (search_paths == NULL) search_paths = DEFAULT_DRIVER_DIR; + /* Temporarily work around dri driver libs that need symbols in libglapi +* but don't automatically link it in. +*/ + dlopen(libglapi.so.0, RTLD_LAZY | RTLD_GLOBAL); + dri-driver = NULL; end = search_paths + strlen(search_paths); for (p = search_paths; p end dri-driver == NULL; p = next + 1) { -- 2.1.0.rc2.206.gedb03e5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] mesa: add runtime support for SSSE3
On 11/06/2014 04:18 AM, Timothy Arceri wrote: Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au --- configure.ac | 6 ++ src/mesa/x86/common_x86.c | 4 src/mesa/x86/common_x86_features.h | 4 +++- 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/configure.ac b/configure.ac index 03f1bca..cc0a661 100644 --- a/configure.ac +++ b/configure.ac @@ -258,6 +258,12 @@ if test x$SSE41_SUPPORTED = x1; then fi AM_CONDITIONAL([SSE41_SUPPORTED], [test x$SSE41_SUPPORTED = x1]) +AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_SUPPORTED=1], [SSSE3_SUPPORTED=0]) +if test x$SSSE3_SUPPORTED = x1; then +DEFINES=$DEFINES -DUSE_SSSE3 +fi +AM_CONDITIONAL([SSSE3_SUPPORTED], [test x$SSSE3_SUPPORTED = x1]) + Just so people can find things, the SSSE3 check should logically come before the SSE4.1 check. dnl Can't have static and shared libraries, default to static if user dnl explicitly requested. If both disabled, set to static since shared dnl was explicitly requested. diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c index 25f5c40..ad0648a 100644 --- a/src/mesa/x86/common_x86.c +++ b/src/mesa/x86/common_x86.c @@ -352,6 +352,10 @@ _mesa_get_x86_features(void) __get_cpuid(1, eax, ebx, ecx, edx); +#ifdef bit_SSSE3 + if (ecx bit_SSSE3) + _mesa_x86_cpu_features |= X86_FEATURE_SSSE3; +#endif if (ecx bit_SSE4_1) _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1; } diff --git a/src/mesa/x86/common_x86_features.h b/src/mesa/x86/common_x86_features.h index 66f2cf6..6eb2b38 100644 --- a/src/mesa/x86/common_x86_features.h +++ b/src/mesa/x86/common_x86_features.h @@ -43,7 +43,8 @@ #define X86_FEATURE_XMM2 (16) #define X86_FEATURE_3DNOWEXT (17) #define X86_FEATURE_3DNOW(18) -#define X86_FEATURE_SSE4_1 (19) +#define X86_FEATURE_SSSE3(19) +#define X86_FEATURE_SSE4_1 (110) /* standard X86 CPU features */ #define X86_CPU_FPU (10) @@ -65,6 +66,7 @@ #define cpu_has_xmm2 (_mesa_x86_cpu_features X86_FEATURE_XMM2) #define cpu_has_3dnow(_mesa_x86_cpu_features X86_FEATURE_3DNOW) #define cpu_has_3dnowext (_mesa_x86_cpu_features X86_FEATURE_3DNOWEXT) +#define cpu_has_ssse3(_mesa_x86_cpu_features X86_FEATURE_SSSE3) #define cpu_has_sse4_1 (_mesa_x86_cpu_features X86_FEATURE_SSE4_1) #endif ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] megadriver: explicitly link against glapi, link with -no-undefined
On Thu, Nov 6, 2014 at 8:09 AM, Emil Velikov emil.l.veli...@gmail.com wrote: Indeed. static-glapi does not get too much testing, plus it seems that it's broken (in a way) for a lng time. It seems that we'll have to (temporary) resolve with shoving dlopen(libglapi.so) into gbm, so that in time programs can nuke it from their codebase. glapi is internal (implementation) detail that they should not need to bother/know. Here you go https://freedesktop.org/patch/36392/ ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy
On Thu, Nov 6, 2014 at 1:22 PM, Timothy Arceri t_arc...@yahoo.com.au wrote: On Thu, 2014-11-06 at 10:03 -0800, Matt Turner wrote: On Thu, Nov 6, 2014 at 4:20 AM, Timothy Arceri t_arc...@yahoo.com.au wrote: Also cleans up some if statements in the *faster functions. Callgrind cpu usage results from pts benchmarks: For ytile_copy_faster() Nexuiz 1.6.1: 2.16% - 1.20% Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au --- src/mesa/Makefile.am | 8 +++ src/mesa/drivers/dri/i965/intel_tex_subimage.c | 82 ++ src/mesa/main/fast_rgba8_copy.c| 78 src/mesa/main/fast_rgba8_copy.h| 37 4 files changed, 141 insertions(+), 64 deletions(-) create mode 100644 src/mesa/main/fast_rgba8_copy.c create mode 100644 src/mesa/main/fast_rgba8_copy.h diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am index e71bccb..2402096 100644 --- a/src/mesa/Makefile.am +++ b/src/mesa/Makefile.am @@ -107,6 +107,10 @@ AM_CXXFLAGS = $(LLVM_CFLAGS) $(VISIBILITY_CXXFLAGS) ARCH_LIBS = +if SSSE3_SUPPORTED +ARCH_LIBS += libmesa_ssse3.la +endif + if SSE41_SUPPORTED ARCH_LIBS += libmesa_sse41.la endif @@ -154,6 +158,10 @@ libmesa_sse41_la_SOURCES = \ main/streaming-load-memcpy.c libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1 +libmesa_ssse3_la_SOURCES = \ + main/fast_rgba8_copy.c +libmesa_ssse3_la_CFLAGS = $(AM_CFLAGS) -mssse3 + pkgconfigdir = $(libdir)/pkgconfig pkgconfig_DATA = gl.pc diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c b/src/mesa/drivers/dri/i965/intel_tex_subimage.c index cb5738a..0deeb75 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c @@ -27,6 +27,7 @@ **/ #include main/bufferobj.h +#include main/fast_rgba8_copy.h #include main/image.h #include main/macros.h #include main/mtypes.h @@ -42,9 +43,7 @@ #include intel_mipmap_tree.h #include intel_blit.h -#ifdef __SSSE3__ -#include tmmintrin.h -#endif +#include x86/common_x86_asm.h #define FILE_DEBUG_FLAG DEBUG_TEXTURE @@ -175,18 +174,6 @@ err: return false; } -#ifdef __SSSE3__ -static const uint8_t rgba8_permutation[16] = - { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 }; - -/* NOTE: dst must be 16 byte aligned */ -#define rgba8_copy_16(dst, src) \ - *(__m128i *)(dst) = _mm_shuffle_epi8(\ - (__m128i) _mm_loadu_ps((float *)(src)), \ - *(__m128i *) rgba8_permutation\ - ) -#endif - /** * Copy RGBA to BGRA - swap R and B. */ @@ -196,29 +183,6 @@ rgba8_copy(void *dst, const void *src, size_t bytes) uint8_t *d = dst; uint8_t const *s = src; -#ifdef __SSSE3__ - /* Fast copying for tile spans. -* -* As long as the destination texture is 16 aligned, -* any 16 or 64 spans we get here should also be 16 aligned. -*/ - - if (bytes == 16) { - assert(!(((uintptr_t)dst) 0xf)); - rgba8_copy_16(d+ 0, s+ 0); - return dst; - } - - if (bytes == 64) { - assert(!(((uintptr_t)dst) 0xf)); - rgba8_copy_16(d+ 0, s+ 0); - rgba8_copy_16(d+16, s+16); - rgba8_copy_16(d+32, s+32); - rgba8_copy_16(d+48, s+48); - return dst; - } -#endif - while (bytes = 4) { d[0] = s[2]; d[1] = s[1]; @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, mem_copy_fn mem_copy) { if (x0 == 0 x3 == xtile_width y0 == 0 y1 == xtile_height) { - if (mem_copy == memcpy) - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, - dst, src, src_pitch, swizzle_bit, rgba8_copy); - } else { - if (mem_copy == memcpy) - return xtile_copy(x0, x1, x2, x3, y0, y1, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return xtile_copy(x0, x1, x2, x3, y0, y1, - dst, src, src_pitch, swizzle_bit, rgba8_copy); + return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, +dst, src, src_pitch, swizzle_bit, mem_copy); } xtile_copy(x0, x1, x2, x3, y0, y1, dst, src, src_pitch, swizzle_bit, mem_copy); @@ -388,19 +341,8 @@ ytile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, mem_copy_fn mem_copy)
Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy
On 11/06/2014 02:12 PM, Matt Turner wrote: On Thu, Nov 6, 2014 at 1:22 PM, Timothy Arceri t_arc...@yahoo.com.au wrote: On Thu, 2014-11-06 at 10:03 -0800, Matt Turner wrote: On Thu, Nov 6, 2014 at 4:20 AM, Timothy Arceri t_arc...@yahoo.com.au wrote: +#include assert.h +#include stdint.h +#include stddef.h I don't think you need these three includes for a single prototype. Right I can move assert to the .c Presumably one of the others can be removed as well? I don't know what defines size_t. stddef.h since C89 at least. + +/* Fast copying for tile spans. + * + * As long as the destination texture is 16 aligned, + * any 16 or 64 spans we get here should also be 16 aligned. + */ +void * +_mesa_fast_rgba8_copy(void *dst, const void *src, size_t n); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation forglDrawElements
Am Donnerstag, 6. November 2014, 11:55:59 schrieb Matt Turner: On Wed, Nov 5, 2014 at 12:54 PM, Matt Turner matts...@gmail.com wrote: On Wed, Nov 5, 2014 at 12:50 PM, Timothy Arceri t_arc...@yahoo.com.au wrote: There have been quite a few eyes over this now but nobody has given it a reviewed by yet. Would be nice to get it in before the code freeze. Any takers? Yes, I'll make sure that happens. I made a couple of trivial changes to the commit message and added some spaces between __m128i and * in casts and pushed it with review. somehow this managed to successfully break my build - so finally a nice testcase :-) Using CFLAGS=-march=native -fLTO. My cpu does not support sse4.1. During compile time, -mno-sse4.1 -msse4.1 is passed, while during linktime, only - mno-sse4.1 is passed. usr/lib64/gcc/x86_64-suse-linux/4.9/include/smmintrin.h: In function 'vbo_get_minmax_indices': /usr/lib64/gcc/x86_64-suse-linux/4.9/include/smmintrin.h:320:20: error: '__builtin_ia32_pmaxud128' needs isa option -m32 -msse4.1 return (__m128i) __builtin_ia32_pmaxud128 ((__v4si)__X, (__v4si)__Y); ^ /usr/lib64/gcc/x86_64-suse-linux/4.9/include/smmintrin.h:314:20: error: '__builtin_ia32_pminud128' needs isa option -m32 -msse4.1 return (__m128i) __builtin_ia32_pminud128 ((__v4si)__X, (__v4si)__Y); ^ make[4]: *** [/tmp/cc5QZcUd.ltrans1.ltrans.o] Error 1 So opposite to my earlier thoughts, the -m flags are not kept per-file during link. On the other hand, removing -march=native gives: CXXLDgallium_dri.la ../../src/mesa/vbo/vbo_exec_array.c: In function 'vbo_get_minmax_indices': ../../src/mesa/vbo/vbo_exec_array.c:197:0: internal compiler error: in propagate_rhs_into_lhs, at tree-ssa-dom.c:2913 Please submit a full bug report, with preprocessed source if appropriate. See http://bugs.opensuse.org/ for instructions. with gcc-4.9.2 from opensuse/factory, so who knows what is really happening. Trying with gcc-4.8 also fails like in the first example, even without march=native, maybe because the default march is x86_64 which also does not support see4.1. So this is only a problem with Link-Time-Opt. Marc ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation forglDrawElements
On Thu, Nov 6, 2014 at 2:33 PM, Marc Dietrich marvi...@gmx.de wrote: So this is only a problem with Link-Time-Opt. I don't actually know how you're getting to this point in the build with LTO. It fails for me in src/mapi. I think LTO is something we should make work at some point, but I don't think we should gate contributions on whether LTO works or not. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy
On Thu 06 Nov 2014, Timothy Arceri wrote: Also cleans up some if statements in the *faster functions. I have comments about the cleanup below. diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c b/src/mesa/drivers/dri/i965/intel_tex_subimage.c index cb5738a..0deeb75 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c /** * Copy texture data from linear to X tile layout, faster. * * Same as \ref xtile_copy but faster, because it passes constant parameters * for common cases, allowing the compiler to inline code optimized for those * cases. * * \copydoc tile_copy_fn */ static FLATTEN void xtile_copy_faster(...) @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, mem_copy_fn mem_copy) { if (x0 == 0 x3 == xtile_width y0 == 0 y1 == xtile_height) { - if (mem_copy == memcpy) - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, - dst, src, src_pitch, swizzle_bit, rgba8_copy); - } else { - if (mem_copy == memcpy) - return xtile_copy(x0, x1, x2, x3, y0, y1, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return xtile_copy(x0, x1, x2, x3, y0, y1, - dst, src, src_pitch, swizzle_bit, rgba8_copy); + return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, +dst, src, src_pitch, swizzle_bit, mem_copy); } xtile_copy(x0, x1, x2, x3, y0, y1, dst, src, src_pitch, swizzle_bit, mem_copy); The cleanup of this if tree concerns me. Accoring the function comment, the original author of this function, fjhenigman, clearly created the weird 'if' tree with the intentation that the compiler would inline code optimized for those cases. Without one of the following, I object to this cleanup: - Frank's approval, or - Proof that gcc never does the desired optimizations, or - Proof that this change does not harm's Chrome's texture upload performance. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy
I tested your patch with the teximage program in mesa demos, the same thing I used to benchmark when I developed this code. As Matt and Chad point out, the odd-looking _faster functions are there for a reason. Your change causes a huge slowdown. I tested on a sandybridge system with a Intel(R) Celeron(R) CPU 857 @ 1.20GHz. Mesa compiled with -O2. original code: TexSubImage(RGBA/ubyte 256 x 256): 9660.4 images/sec, 2415.1 MB/sec TexSubImage(RGBA/ubyte 1024 x 1024): 821.2 images/sec, 3284.7 MB/sec TexSubImage(RGBA/ubyte 4096 x 4096): 76.3 images/sec, 4884.9 MB/sec TexSubImage(BGRA/ubyte 256 x 256): 11307.1 images/sec, 2826.8 MB/sec TexSubImage(BGRA/ubyte 1024 x 1024): 944.6 images/sec, 3778.6 MB/sec TexSubImage(BGRA/ubyte 4096 x 4096): 76.7 images/sec, 4908.3 MB/sec TexSubImage(L/ubyte 256 x 256): 17847.5 images/sec, 1115.5 MB/sec TexSubImage(L/ubyte 1024 x 1024): 3068.2 images/sec, 3068.2 MB/sec TexSubImage(L/ubyte 4096 x 4096): 224.6 images/sec, 3593.0 MB/sec your code: TexSubImage(RGBA/ubyte 256 x 256): 3271.6 images/sec, 817.9 MB/sec TexSubImage(RGBA/ubyte 1024 x 1024): 232.3 images/sec, 929.2 MB/sec TexSubImage(RGBA/ubyte 4096 x 4096): 47.5 images/sec, 3038.6 MB/sec TexSubImage(BGRA/ubyte 256 x 256): 2426.5 images/sec, 606.6 MB/sec TexSubImage(BGRA/ubyte 1024 x 1024): 164.1 images/sec, 656.4 MB/sec TexSubImage(BGRA/ubyte 4096 x 4096): 13.4 images/sec, 854.8 MB/sec TexSubImage(L/ubyte 256 x 256): 9514.5 images/sec, 594.7 MB/sec TexSubImage(L/ubyte 1024 x 1024): 864.1 images/sec, 864.1 MB/sec TexSubImage(L/ubyte 4096 x 4096): 59.7 images/sec, 955.2 MB/sec This is just one run, not an average, but you can see it's slower across the board up to a factor of around 6. Also I couldn't configure the build after your patch. I think you left out a change to configure.ac to define SSSE3_SUPPORTED. On Thu, Nov 6, 2014 at 6:08 PM, Chad Versace chad.vers...@intel.com wrote: On Thu 06 Nov 2014, Timothy Arceri wrote: Also cleans up some if statements in the *faster functions. I have comments about the cleanup below. diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c b/src/mesa/drivers/dri/i965/intel_tex_subimage.c index cb5738a..0deeb75 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c +++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c /** * Copy texture data from linear to X tile layout, faster. * * Same as \ref xtile_copy but faster, because it passes constant parameters * for common cases, allowing the compiler to inline code optimized for those * cases. * * \copydoc tile_copy_fn */ static FLATTEN void xtile_copy_faster(...) @@ -352,19 +316,8 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, mem_copy_fn mem_copy) { if (x0 == 0 x3 == xtile_width y0 == 0 y1 == xtile_height) { - if (mem_copy == memcpy) - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, - dst, src, src_pitch, swizzle_bit, rgba8_copy); - } else { - if (mem_copy == memcpy) - return xtile_copy(x0, x1, x2, x3, y0, y1, - dst, src, src_pitch, swizzle_bit, memcpy); - else if (mem_copy == rgba8_copy) - return xtile_copy(x0, x1, x2, x3, y0, y1, - dst, src, src_pitch, swizzle_bit, rgba8_copy); + return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, +dst, src, src_pitch, swizzle_bit, mem_copy); } xtile_copy(x0, x1, x2, x3, y0, y1, dst, src, src_pitch, swizzle_bit, mem_copy); The cleanup of this if tree concerns me. Accoring the function comment, the original author of this function, fjhenigman, clearly created the weird 'if' tree with the intentation that the compiler would inline code optimized for those cases. Without one of the following, I object to this cleanup: - Frank's approval, or - Proof that gcc never does the desired optimizations, or - Proof that this change does not harm's Chrome's texture upload performance. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy
On Thu, Nov 6, 2014 at 7:30 PM, Frank Henigman fjhenig...@google.com wrote: Also I couldn't configure the build after your patch. I think you left out a change to configure.ac to define SSSE3_SUPPORTED. Ah, that was in patch 1/2. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
How and when is cpu_has_sse4_1 true? Is it controllable at runtime through setting some environmental variable? or is it set once during startup by detecting CPU features? I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by user at runtime; because USE_SSE41 is a compile time check and requires the target machine to be SSE 4.1 capable already. Best regards, Siavash Eliasi. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Thu, Nov 6, 2014 at 1:30 AM, Siavash Eliasi siavashser...@gmail.com wrote: How and when is cpu_has_sse4_1 true? Is it controllable at runtime through setting some environmental variable? or is it set once during startup by detecting CPU features? It's actually a macro, but yes, see the end of src/mesa/x86/common_x86.c. It's set by using the CPUID instruction to detect SSE 4.1 capabilities. if (ecx bit_SSE4_1) _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1; I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by user at runtime; because USE_SSE41 is a compile time check and requires the target machine to be SSE 4.1 capable already. Right. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/blorp: Fix hiz ops on MSAA surfaces
On Tue, Nov 4, 2014 at 8:40 AM, Chris Forbes chr...@ijw.co.nz wrote: Two things were broken here: - The depth/stencil surface dimensions were broken for MSAA. - Sample count was programmed incorrectly. Result was the depth resolve didn't work correctly on MSAA surfaces, and so sampling the surface later produced garbage. Fixes the new piglit test arb_texture_multisample-sample-depth, and various artifacts in 'tesseract' with msaa=4 glineardepth=0. Not observed any piglit regressions on Haswell. Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/drivers/dri/i965/brw_blorp.h| 4 src/mesa/drivers/dri/i965/gen7_blorp.cpp | 24 +--- 2 files changed, 21 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h b/src/mesa/drivers/dri/i965/brw_blorp.h index ff68000..c4ff0f7 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.h +++ b/src/mesa/drivers/dri/i965/brw_blorp.h @@ -236,6 +236,10 @@ public: bool use_wm_prog; brw_blorp_wm_push_constants wm_push_consts; bool color_write_disable[4]; + + unsigned num_samples() const { + return dst.mt ? dst.num_samples : depth.mt-num_samples; + } }; diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp b/src/mesa/drivers/dri/i965/gen7_blorp.cpp index 206a6ff..cc57ffe 100644 --- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp @@ -415,7 +415,7 @@ gen7_blorp_emit_sf_config(struct brw_context *brw, OUT_BATCH(_3DSTATE_SF 16 | (7 - 2)); OUT_BATCH(params-depth_format GEN7_SF_DEPTH_BUFFER_SURFACE_FORMAT_SHIFT); - OUT_BATCH(params-dst.num_samples 1 ? GEN6_SF_MSRAST_ON_PATTERN : 0); + OUT_BATCH(params-num_samples() 1 ? GEN6_SF_MSRAST_ON_PATTERN : 0); OUT_BATCH(0); OUT_BATCH(0); OUT_BATCH(0); @@ -470,7 +470,7 @@ gen7_blorp_emit_wm_config(struct brw_context *brw, dw1 |= GEN7_WM_DISPATCH_ENABLE; /* We are rendering */ } - if (params-dst.num_samples 1) { + if (params-num_samples() 1) { dw1 |= GEN7_WM_MSRAST_ON_PATTERN; if (prog_data prog_data-persample_msaa_dispatch) dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE; @@ -661,8 +661,17 @@ gen7_blorp_emit_depth_stencil_config(struct brw_context *brw, * larger to allow the fast depth clear to fit the hardware * alignment requirements. (8x4) */ - surfwidth = params-depth.width; - surfheight = params-depth.height; + + if (params-num_samples() 1) { + /* If this is an MSAA + HIZ op, we need to program the + * aligned logical size of the depth surface. + */ + surfwidth = ALIGN(params-depth.mt-logical_width0, 8); + surfheight = ALIGN(params-depth.mt-logical_height0, 4); + } else { + surfwidth = params-depth.width; + surfheight = params-depth.height; + } } else { surfwidth = params-depth.mt-logical_width0; surfheight = params-depth.mt-logical_height0; @@ -805,10 +814,11 @@ gen7_blorp_exec(struct brw_context *brw, uint32_t sampler_offset = 0; uint32_t prog_offset = params-get_wm_prog(brw, prog_data); - gen6_emit_3dstate_multisample(brw, params-dst.num_samples); + unsigned num_samples = params-num_samples(); + gen6_emit_3dstate_multisample(brw, num_samples); gen6_emit_3dstate_sample_mask(brw, - params-dst.num_samples 1 ? - (1 params-dst.num_samples) - 1 : 1); + num_samples 1 ? + (1 num_samples) - 1 : 1); gen6_blorp_emit_state_base_address(brw, params); gen6_blorp_emit_vertices(brw, params); gen7_blorp_emit_urb_config(brw, params); -- 2.1.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Looks good to me. Verified the requirement in IVB PRM. Reviewed-by: Anuj Phogat anuj.pho...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)
On 06.11.2014 19:18, Joonas Lahtinen wrote: On to, 2014-11-06 at 18:12 +0900, Michel Dänzer wrote: On 05.11.2014 20:14, Joonas Lahtinen wrote: Modified not refer to DRI3, just uses the present extension to get rid of the excess buffer invalidations. AFAICT there's no fallback from your changes to the current behaviour if the X server doesn't support the Present extension. There probably needs to be such a fallback. It gets rid of such nasty hack (the intel_viewport one), that I thought there is no point making fallback. Because without this, the egl dri2 backend is fundamentally broken anyway. Well, AFAICT your code uses Present extension functionality unconditionally, without checking that the X server supports Present. I can't see how that could possibly work on an X server which doesn't support Present, but I think it would be better to keep it working at least as badly as it does now in that case. :) -- Earthling Michel Dänzer| http://www.amd.com Libre software enthusiast |Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/blorp: Fix hiz ops on MSAA surfaces
Thanks for the review. Ken has a slightly cleaner version of the patch which avoids adding the helper function. On Nov 7, 2014 2:29 AM, Anuj Phogat anuj.pho...@gmail.com wrote: On Tue, Nov 4, 2014 at 8:40 AM, Chris Forbes chr...@ijw.co.nz wrote: Two things were broken here: - The depth/stencil surface dimensions were broken for MSAA. - Sample count was programmed incorrectly. Result was the depth resolve didn't work correctly on MSAA surfaces, and so sampling the surface later produced garbage. Fixes the new piglit test arb_texture_multisample-sample-depth, and various artifacts in 'tesseract' with msaa=4 glineardepth=0. Not observed any piglit regressions on Haswell. Signed-off-by: Chris Forbes chr...@ijw.co.nz --- src/mesa/drivers/dri/i965/brw_blorp.h| 4 src/mesa/drivers/dri/i965/gen7_blorp.cpp | 24 +--- 2 files changed, 21 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h b/src/mesa/drivers/dri/i965/brw_blorp.h index ff68000..c4ff0f7 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.h +++ b/src/mesa/drivers/dri/i965/brw_blorp.h @@ -236,6 +236,10 @@ public: bool use_wm_prog; brw_blorp_wm_push_constants wm_push_consts; bool color_write_disable[4]; + + unsigned num_samples() const { + return dst.mt ? dst.num_samples : depth.mt-num_samples; + } }; diff --git a/src/mesa/drivers/dri/i965/gen7_blorp.cpp b/src/mesa/drivers/dri/i965/gen7_blorp.cpp index 206a6ff..cc57ffe 100644 --- a/src/mesa/drivers/dri/i965/gen7_blorp.cpp +++ b/src/mesa/drivers/dri/i965/gen7_blorp.cpp @@ -415,7 +415,7 @@ gen7_blorp_emit_sf_config(struct brw_context *brw, OUT_BATCH(_3DSTATE_SF 16 | (7 - 2)); OUT_BATCH(params-depth_format GEN7_SF_DEPTH_BUFFER_SURFACE_FORMAT_SHIFT); - OUT_BATCH(params-dst.num_samples 1 ? GEN6_SF_MSRAST_ON_PATTERN : 0); + OUT_BATCH(params-num_samples() 1 ? GEN6_SF_MSRAST_ON_PATTERN : 0); OUT_BATCH(0); OUT_BATCH(0); OUT_BATCH(0); @@ -470,7 +470,7 @@ gen7_blorp_emit_wm_config(struct brw_context *brw, dw1 |= GEN7_WM_DISPATCH_ENABLE; /* We are rendering */ } - if (params-dst.num_samples 1) { + if (params-num_samples() 1) { dw1 |= GEN7_WM_MSRAST_ON_PATTERN; if (prog_data prog_data-persample_msaa_dispatch) dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE; @@ -661,8 +661,17 @@ gen7_blorp_emit_depth_stencil_config(struct brw_context *brw, * larger to allow the fast depth clear to fit the hardware * alignment requirements. (8x4) */ - surfwidth = params-depth.width; - surfheight = params-depth.height; + + if (params-num_samples() 1) { + /* If this is an MSAA + HIZ op, we need to program the + * aligned logical size of the depth surface. + */ + surfwidth = ALIGN(params-depth.mt-logical_width0, 8); + surfheight = ALIGN(params-depth.mt-logical_height0, 4); + } else { + surfwidth = params-depth.width; + surfheight = params-depth.height; + } } else { surfwidth = params-depth.mt-logical_width0; surfheight = params-depth.mt-logical_height0; @@ -805,10 +814,11 @@ gen7_blorp_exec(struct brw_context *brw, uint32_t sampler_offset = 0; uint32_t prog_offset = params-get_wm_prog(brw, prog_data); - gen6_emit_3dstate_multisample(brw, params-dst.num_samples); + unsigned num_samples = params-num_samples(); + gen6_emit_3dstate_multisample(brw, num_samples); gen6_emit_3dstate_sample_mask(brw, - params-dst.num_samples 1 ? - (1 params-dst.num_samples) - 1 : 1); + num_samples 1 ? + (1 num_samples) - 1 : 1); gen6_blorp_emit_state_base_address(brw, params); gen6_blorp_emit_vertices(brw, params); gen7_blorp_emit_urb_config(brw, params); -- 2.1.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Looks good to me. Verified the requirement in IVB PRM. Reviewed-by: Anuj Phogat anuj.pho...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] egl_dri2: add a note about dri2_create_screen
The function is not called by platform_drm. As such one needs to pay special attention at teardown. Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/egl/drivers/dri2/egl_dri2.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c index 609afde..2094ffd 100644 --- a/src/egl/drivers/dri2/egl_dri2.c +++ b/src/egl/drivers/dri2/egl_dri2.c @@ -543,6 +543,9 @@ dri2_setup_screen(_EGLDisplay *disp) } } +/* All platforms but DRM call this function to create the screen, query the + * dri extensions, setup the vtables and populate the driver_configs. + * DRM inherits all that information from its display - GBM. */ EGLBoolean dri2_create_screen(_EGLDisplay *disp) { -- 2.1.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] egl_dri2: fix double free on drm platforms
Earlier commit failed to attribure that for drm platforms one does not call dri2_create_screen, thus it does not create the screen and driver_configs but inherits them from the display - gbm. As such wrap cleanup in Platform != _EGL_PLATFORM_DRM to prevent the issue and still cleanup correctly for non-drm platforms. Cc: Kenneth Graunke kenn...@whitecape.org Cc: Mark Janes mark.a.ja...@intel.com Reported-by: Kenneth Graunke kenn...@whitecape.org Reported-by: Mark Janes mark.a.ja...@intel.com Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/egl/drivers/dri2/egl_dri2.c | 9 + 1 file changed, 9 insertions(+) diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c index dcc3239..609afde 100644 --- a/src/egl/drivers/dri2/egl_dri2.c +++ b/src/egl/drivers/dri2/egl_dri2.c @@ -707,9 +707,18 @@ dri2_terminate(_EGLDriver *drv, _EGLDisplay *disp) break; } + /* The drm platform does not create the screen/driver_configs but reuses +* the ones from the gbm device. As such the gbm itself is responsible +* for the cleanup. */ +#ifdef HAVE_DRM_PLATFORM + if (disp-Platform != _EGL_PLATFORM_DRM) { +#endif for (i = 0; dri2_dpy-driver_configs[i]; i++) free((__DRIconfig *) dri2_dpy-driver_configs[i]); free(dri2_dpy-driver_configs); +#ifdef HAVE_DRM_PLATFORM + } +#endif free(dri2_dpy); disp-DriverData = NULL; -- 2.1.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] gbm/dri: cleanup memory leak on teardown
During teardown we free the driver_configs list pointer, but we forget to deallocate each config in that list. Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/gbm/backends/dri/gbm_dri.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c index f637e32..0664266 100644 --- a/src/gbm/backends/dri/gbm_dri.c +++ b/src/gbm/backends/dri/gbm_dri.c @@ -914,8 +914,11 @@ static void dri_destroy(struct gbm_device *gbm) { struct gbm_dri_device *dri = gbm_dri_device(gbm); + unsigned i; dri-core-destroyScreen(dri-screen); + for (i = 0; dri-driver_configs[i]; i++) + free((__DRIconfig *) dri-driver_configs[i]); free(dri-driver_configs); dlclose(dri-driver); free(dri-base.driver_name); -- 2.1.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Bogus bounds checking in api_validate.c?
While working on some other things, I came across some bounds checking code in _mesa_validate_DrawElements (and related functions) in api_validate.c. /* use indices in the buffer object */ /* make sure count doesn't go outside buffer bounds */ if (index_bytes(type, count) ctx-Array.VAO-IndexBufferObj-Size) { _mesa_warning(ctx, glDrawElements index out of buffer bounds); return GL_FALSE; } index_bytes calculates how many bytes of data count indices will occupy based on the type. The problem is that this doesn't consider the base pointer. As far as I can tell, if I had a 64 byte buffer object for my index data, and I did glDrawElements(GL_POINTS, 16, GL_UNSIGNED_INT, 60); _mesa_validate_DrawElements would say, Ok! Am I missing something, or is this just broken? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements
On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote: Then I do recommend removing the if (cpu_has_sse4_1) from this patch and similar places, because there is no runtime CPU dispatching happening for SSE optimized code paths in action and just adds extra overhead (unnecessary branches) to the generated code. No. Sorry, I realize I misread your previous question: I guess checking for cpu_has_sse4_1 is unnecessary if it isn't controllable by user at runtime; because USE_SSE41 is a compile time check and requires the target machine to be SSE 4.1 capable already. USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you to build the code and then use it only on systems that actually support it. All of this could have been pretty easily answered by a few greps though... ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] egl: dri2: Use present extension. (Was: Re: [RFC] egl: Add DRI3 support to the EGL backend.)
On 11/06/2014 06:16 PM, Michel Dänzer wrote: On 06.11.2014 19:18, Joonas Lahtinen wrote: On to, 2014-11-06 at 18:12 +0900, Michel Dänzer wrote: On 05.11.2014 20:14, Joonas Lahtinen wrote: Modified not refer to DRI3, just uses the present extension to get rid of the excess buffer invalidations. AFAICT there's no fallback from your changes to the current behaviour if the X server doesn't support the Present extension. There probably needs to be such a fallback. It gets rid of such nasty hack (the intel_viewport one), that I thought there is no point making fallback. Because without this, the egl dri2 backend is fundamentally broken anyway. Well, AFAICT your code uses Present extension functionality unconditionally, without checking that the X server supports Present. I can't see how that could possibly work on an X server which doesn't support Present, but I think it would be better to keep it working at least as badly as it does now in that case. :) I was going to say pretty much the same thing. Aren't there (non-Intel) drivers that don't do Present? If I'm not mistaken, some parts of DRI3 (not sure about Present) are even disabled in the Intel driver when SNA is in use... or at least that was the case at one point. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: Remove _mesa_max_buffer_index
From: Ian Romanick ian.d.roman...@intel.com It appears to be completely unused since f9be8543 (February 2012). Signed-off-by: Ian Romanick ian.d.roman...@intel.com Cc: Kenneth Graunke kenn...@whitecape.org --- src/mesa/main/api_validate.c | 46 src/mesa/main/api_validate.h | 6 -- 2 files changed, 52 deletions(-) diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c index 0d206d8..d4b962f 100644 --- a/src/mesa/main/api_validate.c +++ b/src/mesa/main/api_validate.c @@ -77,52 +77,6 @@ index_bytes(GLenum type, GLsizei count) /** - * Find the max index in the given element/index buffer - */ -GLuint -_mesa_max_buffer_index(struct gl_context *ctx, GLuint count, GLenum type, - const void *indices, - struct gl_buffer_object *elementBuf) -{ - const GLubyte *map = NULL; - GLuint max = 0; - GLuint i; - - if (_mesa_is_bufferobj(elementBuf)) { - /* elements are in a user-defined buffer object. need to map it */ - map = ctx-Driver.MapBufferRange(ctx, 0, elementBuf-Size, - GL_MAP_READ_BIT, elementBuf, - MAP_INTERNAL); - /* Actual address is the sum of pointers */ - indices = (const GLvoid *) ADD_POINTERS(map, (const GLubyte *) indices); - } - - if (type == GL_UNSIGNED_INT) { - for (i = 0; i count; i++) - if (((GLuint *) indices)[i] max) -max = ((GLuint *) indices)[i]; - } - else if (type == GL_UNSIGNED_SHORT) { - for (i = 0; i count; i++) - if (((GLushort *) indices)[i] max) -max = ((GLushort *) indices)[i]; - } - else { - ASSERT(type == GL_UNSIGNED_BYTE); - for (i = 0; i count; i++) - if (((GLubyte *) indices)[i] max) -max = ((GLubyte *) indices)[i]; - } - - if (map) { - ctx-Driver.UnmapBuffer(ctx, elementBuf, MAP_INTERNAL); - } - - return max; -} - - -/** * Check if OK to draw arrays/elements. */ static GLboolean diff --git a/src/mesa/main/api_validate.h b/src/mesa/main/api_validate.h index 8238df1..0bb91c6 100644 --- a/src/mesa/main/api_validate.h +++ b/src/mesa/main/api_validate.h @@ -35,12 +35,6 @@ struct gl_context; struct gl_transform_feedback_object; -extern GLuint -_mesa_max_buffer_index(struct gl_context *ctx, GLuint count, GLenum type, - const void *indices, - struct gl_buffer_object *elementBuf); - - extern bool _mesa_is_valid_prim_mode(struct gl_context *ctx, GLenum mode); -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Bogus bounds checking in api_validate.c?
On Thursday, November 06, 2014 08:09:18 PM Ian Romanick wrote: While working on some other things, I came across some bounds checking code in _mesa_validate_DrawElements (and related functions) in api_validate.c. /* use indices in the buffer object */ /* make sure count doesn't go outside buffer bounds */ if (index_bytes(type, count) ctx-Array.VAO-IndexBufferObj-Size) { _mesa_warning(ctx, glDrawElements index out of buffer bounds); return GL_FALSE; } index_bytes calculates how many bytes of data count indices will occupy based on the type. The problem is that this doesn't consider the base pointer. As far as I can tell, if I had a 64 byte buffer object for my index data, and I did glDrawElements(GL_POINTS, 16, GL_UNSIGNED_INT, 60); _mesa_validate_DrawElements would say, Ok! Am I missing something, or is this just broken? It sure seems broken to me - but, thankfully, in a conservative fashion. (It will say some invalid things are OK, but won't say legal things are invalid.) Software drivers may be relying on this working to avoid a crash. I checked the Ivybridge documentation, and found: Software is responsible for ensuring that accesses outside the IB do not occur. This is possible as software can compute the range of IB values referenced by a 3DPRIMITIVE command (knowing the StartVertexLocation, InstanceCount, and VerticesPerInstance values) and can then compare this range to the IB extent. which makes it sound like an accurate computation is necessary. But, right below that, it says: this field contains the address of the last valid byte in the index buffer. Any index buffer reads past this address returns an index value of 0 (as if the index buffer was zero-extended). So the earlier statement is false; i965 will draw the in-bounds elements correctly, and then repeat element 0 over and over for any out-of-bounds data, resulting in one strange primitive and a lot of degenerate ones. It's proabbly worth fixing, but I doubt it's critical either. A more interesting thing to fix, I think, would be enforcing alignment restrictions (i.e. your offset has to be a multiple of the IB element size). --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Remove _mesa_max_buffer_index
On Thursday, November 06, 2014 11:00:13 PM Ian Romanick wrote: From: Ian Romanick ian.d.roman...@intel.com It appears to be completely unused since f9be8543 (February 2012). Signed-off-by: Ian Romanick ian.d.roman...@intel.com Cc: Kenneth Graunke kenn...@whitecape.org Yep, looks unused to me. Reviewed-by: Kenneth Graunke kenn...@whitecape.org My 2012 commit message in f9be8543 is not quite true - glDrawRangeElements is actually useful even when working with VBOs. The driver can use the given range to decide what portion of a VBO is busy/going to be read by the GPU. It can then optimize glBufferSubData and glMapBufferRange to avoid unnecessary synchronization when an application tries to overwrite an unused subregion. It turns out that's also rather crucial for performance in many modern apps. --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] egl_dri2: fix double free on drm platforms
On Friday, November 07, 2014 03:50:42 AM Emil Velikov wrote: Earlier commit failed to attribure that for drm platforms one does not call dri2_create_screen, thus it does not create the screen and driver_configs but inherits them from the display - gbm. As such wrap cleanup in Platform != _EGL_PLATFORM_DRM to prevent the issue and still cleanup correctly for non-drm platforms. Cc: Kenneth Graunke kenn...@whitecape.org Cc: Mark Janes mark.a.ja...@intel.com Reported-by: Kenneth Graunke kenn...@whitecape.org Reported-by: Mark Janes mark.a.ja...@intel.com Signed-off-by: Emil Velikov emil.l.veli...@gmail.com --- src/egl/drivers/dri2/egl_dri2.c | 9 + 1 file changed, 9 insertions(+) diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c index dcc3239..609afde 100644 --- a/src/egl/drivers/dri2/egl_dri2.c +++ b/src/egl/drivers/dri2/egl_dri2.c @@ -707,9 +707,18 @@ dri2_terminate(_EGLDriver *drv, _EGLDisplay *disp) break; } + /* The drm platform does not create the screen/driver_configs but reuses +* the ones from the gbm device. As such the gbm itself is responsible +* for the cleanup. */ Usually */ goes on its own line (same applies to later patches). +#ifdef HAVE_DRM_PLATFORM I don't think you need these #ifdefs - _EGL_PLATFORM_DRM is an enum value defined by src/egl/main/egldisplay.h (not the public EGL headers) regardless of build options. I imagine if you didn't build with HAVE_DRM_PLATFORM, disp- Platform will never be _EGL_PLATFORM_DRM. + if (disp-Platform != _EGL_PLATFORM_DRM) { +#endif for (i = 0; dri2_dpy-driver_configs[i]; i++) free((__DRIconfig *) dri2_dpy-driver_configs[i]); free(dri2_dpy-driver_configs); +#ifdef HAVE_DRM_PLATFORM + } +#endif free(dri2_dpy); disp-DriverData = NULL; Thanks for fixing this so quickly! The series fixes my crashes, and is: Reviewed-and-tested-by: Kenneth Graunke kenn...@whitecape.org with, of course, the caveat that I don't know the EGL code worth beans. :) --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev