[Mesa-dev] [PATCH 1/3] intel: Recognize GL_DEPTH_COMPONENT32 in get_teximage_readbuffer.
gl_texture_image::InternalFormat is actually the user requested internal format, not what the texture actually is. Thus, even though we don't support 32-bit depth buffers, we need to recognize the enumeration here. Otherwise, it wrongly returns the color read buffer instead of the depth read buffer.

Fixes an issue in PlaneShift 0.5.7 when casting spells. The game calls CopyTexSubImage2D on buffers with a GL_DEPTH_COMPONENT32 internal format, which (prior to this patch) resulted in an attempt to copy an ARGB to S8_Z24. This patch fixes the behavior, but does not yet eliminate the software fallback.

NOTE: This is a candidate for the 7.10 and 7.11 branches.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/intel/intel_tex_copy.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

I kind of wonder if we should just be using TexFormat (the actual format) rather than InternalFormat (the user requested format).

diff --git a/src/mesa/drivers/dri/intel/intel_tex_copy.c b/src/mesa/drivers/dri/intel/intel_tex_copy.c
index eda07a4..8b5c3f0 100644
--- a/src/mesa/drivers/dri/intel/intel_tex_copy.c
+++ b/src/mesa/drivers/dri/intel/intel_tex_copy.c
@@ -58,6 +58,7 @@ get_teximage_readbuffer(struct intel_context *intel, GLenum internalFormat)
    switch (internalFormat) {
    case GL_DEPTH_COMPONENT:
    case GL_DEPTH_COMPONENT16:
+   case GL_DEPTH_COMPONENT32:
    case GL_DEPTH24_STENCIL8_EXT:
    case GL_DEPTH_STENCIL_EXT:
       return intel_get_renderbuffer(intel->ctx.ReadBuffer, BUFFER_DEPTH);
-- 
1.7.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
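A minimal sketch of the classification the patched switch performs, written as a hypothetical standalone helper (`internal_format_reads_depth` is not a Mesa function). It shows why every depth enum an application may request has to be listed: the switch keys on the user-requested InternalFormat, so an unlisted GL_DEPTH_COMPONENT32 falls through to the color read buffer even though the texture itself is stored in a depth format.

```c
#include <assert.h>
#include <stdbool.h>

/* Standard OpenGL enum values, defined here so the sketch compiles
 * without GL headers. */
typedef unsigned int GLenum;
#define GL_DEPTH_COMPONENT       0x1902
#define GL_RGBA                  0x1908
#define GL_DEPTH_COMPONENT16     0x81A5
#define GL_DEPTH_COMPONENT32     0x81A7
#define GL_DEPTH_STENCIL_EXT     0x84F9
#define GL_DEPTH24_STENCIL8_EXT  0x88F0

/* Hypothetical helper: returns true if a texture with this user-requested
 * internal format should be copied from the depth buffer rather than the
 * color buffer. The case labels mirror those visible in the patched hunk. */
static bool
internal_format_reads_depth(GLenum internalFormat)
{
   switch (internalFormat) {
   case GL_DEPTH_COMPONENT:
   case GL_DEPTH_COMPONENT16:
   case GL_DEPTH_COMPONENT32:   /* apps may request it even though the
                                   driver has no 32-bit depth format */
   case GL_DEPTH24_STENCIL8_EXT:
   case GL_DEPTH_STENCIL_EXT:
      return true;
   default:
      return false;             /* everything else reads the color buffer */
   }
}
```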
[Mesa-dev] [PATCH 2/3] intel: Remove restriction against Y-tiling in intel_copy_texsubimage.
intelEmitCopyBlit already checks for this, so the check is redundant and unnecessary. This consolidates the logic (which will soon change).

NOTE: This is a candidate for the 7.10 and 7.11 branches.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/intel/intel_tex_copy.c |    5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/intel/intel_tex_copy.c b/src/mesa/drivers/dri/intel/intel_tex_copy.c
index 8b5c3f0..6a297c0 100644
--- a/src/mesa/drivers/dri/intel/intel_tex_copy.c
+++ b/src/mesa/drivers/dri/intel/intel_tex_copy.c
@@ -128,11 +128,6 @@ intel_copy_texsubimage(struct intel_context *intel,
                           0,
                           image_x, image_y);
 
-   /* The blitter can't handle Y-tiled buffers. */
-   if (intelImage->mt->region->tiling == I915_TILING_Y) {
-      return GL_FALSE;
-   }
-
    if (ctx->ReadBuffer->Name == 0) {
       /* Flip vertical orientation for system framebuffers */
       y = ctx->ReadBuffer->Height - (y + height);
-- 
1.7.6
[Mesa-dev] [PATCH 3/3] intel: Add support copying Y-tiled buffers with the Gen6 blitter.
According to the Sandybridge PRM, Volume 1, Part 5, Section 1.9.15, Gen6's blitter supports Y-tiled buffers as well as X-tiled. Pitch is specified in 512-byte granularity for X-tiled, but 128-byte for Y-tiled. Gen5 and earlier unfortunately only support X-tiled buffers.

Fixes a software fallback in PlaneShift 0.5.7 when casting spells.

NOTE: This is a candidate for the 7.10 and 7.11 branches.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/intel/intel_blit.c |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/intel/intel_blit.c b/src/mesa/drivers/dri/intel/intel_blit.c
index 30be1b9..de752f2 100644
--- a/src/mesa/drivers/dri/intel/intel_blit.c
+++ b/src/mesa/drivers/dri/intel/intel_blit.c
@@ -111,13 +111,13 @@ intelEmitCopyBlit(struct intel_context *intel,
    if (dst_tiling != I915_TILING_NONE) {
       if (dst_offset & 4095)
          return GL_FALSE;
-      if (dst_tiling == I915_TILING_Y)
+      if (intel->gen < 6 && dst_tiling == I915_TILING_Y)
          return GL_FALSE;
    }
    if (src_tiling != I915_TILING_NONE) {
       if (src_offset & 4095)
          return GL_FALSE;
-      if (src_tiling == I915_TILING_Y)
+      if (intel->gen < 6 && src_tiling == I915_TILING_Y)
          return GL_FALSE;
    }
 
@@ -172,13 +172,15 @@ intelEmitCopyBlit(struct intel_context *intel,
    }
 
 #ifndef I915
-   if (dst_tiling != I915_TILING_NONE) {
+   if (dst_tiling == I915_TILING_NONE) {
       CMD |= XY_DST_TILED;
-      dst_pitch /= 4;
+      if (dst_tiling == I915_TILING_X)
+         dst_pitch /= 4;
    }
    if (src_tiling != I915_TILING_NONE) {
       CMD |= XY_SRC_TILED;
-      src_pitch /= 4;
+      if (src_tiling == I915_TILING_X)
+         src_pitch /= 4;
    }
 #endif
-- 
1.7.6
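The blitter checks touched by this patch can be sketched as a single predicate. This is an illustration under stated assumptions: `blit_pitch_for_surface` and the `tiling` enum are hypothetical stand-ins for the code in intelEmitCopyBlit, the pitch handling copies the posted patch (only X-tiled pitches are divided by 4), and the BCS_SWCTRL requirement raised later in review is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for I915_TILING_NONE/X/Y. */
enum tiling { TILING_NONE, TILING_X, TILING_Y };

/* Returns false when the blitter path must be rejected (the caller would
 * fall back to software); otherwise stores the pitch value to program.
 * Real Gen6 Y-tiled blits also need BCS_SWCTRL set (not modeled here). */
static bool
blit_pitch_for_surface(int gen, enum tiling tiling, unsigned offset,
                       int pitch_bytes, int *pitch_out)
{
   if (tiling != TILING_NONE) {
      if (offset & 4095)
         return false;          /* tiled surface must start on a 4 KiB boundary */
      if (gen < 6 && tiling == TILING_Y)
         return false;          /* Gen5 and earlier: X-tiling only */
   }

   *pitch_out = pitch_bytes;
   if (tiling == TILING_X)
      *pitch_out /= 4;          /* as in the patch: only X-tiled pitches are divided by 4 */
   return true;
}
```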
[Mesa-dev] [PATCH 1/9] svga: Flush when switching between HW to SW TNL, after updating need_swtnl.
From: José Fonseca jfons...@vmware.com

Also, only flush when going from HW TNL to SW TNL, given it is impossible for the buffers resulting from SW TNL to ever be referenced by the HW TNL path.
---
 src/gallium/drivers/svga/svga_context.h   |    3 ---
 src/gallium/drivers/svga/svga_pipe_draw.c |   23 +++
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_context.h b/src/gallium/drivers/svga/svga_context.h
index eca529d..34b9e85 100644
--- a/src/gallium/drivers/svga/svga_context.h
+++ b/src/gallium/drivers/svga/svga_context.h
@@ -372,9 +372,6 @@ struct svga_context
 
    /** List of buffers with queued transfers */
    struct list_head dirty_buffers;
-
-   /** Was the previous draw done with the SW path? */
-   boolean prev_draw_swtnl;
 };
 
 /* A flag for each state_tracker state object:
diff --git a/src/gallium/drivers/svga/svga_pipe_draw.c b/src/gallium/drivers/svga/svga_pipe_draw.c
index 2093bca..a632fb1 100644
--- a/src/gallium/drivers/svga/svga_pipe_draw.c
+++ b/src/gallium/drivers/svga/svga_pipe_draw.c
@@ -141,18 +141,11 @@ svga_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
    unsigned reduced_prim = u_reduced_prim( info->mode );
    unsigned count = info->count;
    enum pipe_error ret = 0;
+   boolean needed_swtnl;
 
    if (!u_trim_pipe_prim( info->mode, count ))
       return;
 
-   if (svga->state.sw.need_swtnl != svga->prev_draw_swtnl) {
-      /* We're switching between SW and HW drawing.  Do a flush to avoid
-       * mixing HW and SW rendering with the same vertex buffer.
-       */
-      pipe->flush(pipe, NULL);
-      svga->prev_draw_swtnl = svga->state.sw.need_swtnl;
-   }
-
    /*
     * Mark currently bound target surfaces as dirty
     * doesn't really matter if it is done before drawing.
@@ -167,6 +160,8 @@ svga_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
       svga->dirty |= SVGA_NEW_REDUCED_PRIMITIVE;
    }
 
+   needed_swtnl = svga->state.sw.need_swtnl;
+
    svga_update_state_retry( svga, SVGA_STATE_NEED_SWTNL );
 
 #ifdef DEBUG
@@ -176,6 +171,18 @@ svga_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
 #endif
 
    if (svga->state.sw.need_swtnl) {
+      if (!needed_swtnl) {
+         /*
+          * We're switching from HW to SW TNL.  SW TNL will require mapping all
+          * currently bound vertex buffers, some of which may already be
+          * referenced in the current command buffer as result of previous HW
+          * TNL. So flush now, to prevent the context to flush while a referred
+          * vertex buffer is mapped.
+          */
+
+         svga_context_flush(svga, NULL);
+      }
+
       ret = svga_swtnl_draw_vbo( svga, info );
    }
    else {
-- 
1.6.2.5
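The ordering in this patch matters: the old check compared need_swtnl before svga_update_state_retry() had recomputed it for the current draw. A minimal sketch of the new logic, using a hypothetical `ctx_sketch` in place of struct svga_context and a counter in place of svga_context_flush():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified context. */
struct ctx_sketch {
   bool need_swtnl;   /* TNL path chosen by the last state validation */
   int  flushes;      /* number of times the stand-in flush ran */
};

static void ctx_flush(struct ctx_sketch *c)
{
   c->flushes++;      /* stands in for svga_context_flush() */
}

static void draw(struct ctx_sketch *c, bool validated_swtnl)
{
   bool needed_swtnl = c->need_swtnl;   /* sample BEFORE state validation */
   c->need_swtnl = validated_swtnl;     /* stands in for svga_update_state_retry() */

   if (c->need_swtnl) {
      if (!needed_swtnl)
         ctx_flush(c);                  /* only on a HW -> SW transition */
      /* ... the SW TNL draw would map the bound vertex buffers here ... */
   }
   /* else: HW TNL draw; switching SW -> HW never flushes */
}
```

Feeding it the sequence HW, SW, SW, HW, SW flushes exactly twice, once per HW to SW switch.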
[Mesa-dev] [PATCH 2/9] gallium/util: Upload manager optimizations
Make sure that the upload manager doesn't upload data that's not dirty. This speeds up the viewperf test proe-04/1 by a factor of 5 or so on svga.

Also introduce a u_upload_unmap() function that can be used instead of u_upload_flush() so that we can pack even more data in upload buffers. With this we can basically reuse the upload buffer across flushes.

Signed-off-by: Thomas Hellstrom thellst...@vmware.com
---
 src/gallium/auxiliary/util/u_upload_mgr.c |   35 +---
 src/gallium/auxiliary/util/u_upload_mgr.h |   20 +---
 2 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_upload_mgr.c b/src/gallium/auxiliary/util/u_upload_mgr.c
index 9562acb..d36697d 100644
--- a/src/gallium/auxiliary/util/u_upload_mgr.c
+++ b/src/gallium/auxiliary/util/u_upload_mgr.c
@@ -72,6 +72,22 @@ struct u_upload_mgr *u_upload_create( struct pipe_context *pipe,
    return upload;
 }
 
+void u_upload_unmap( struct u_upload_mgr *upload )
+{
+   if (upload->transfer) {
+      struct pipe_box *box = &upload->transfer->box;
+      if (upload->offset > box->x) {
+
+         pipe_buffer_flush_mapped_range(upload->pipe, upload->transfer,
+                                        box->x, upload->offset - box->x);
+      }
+      pipe_transfer_unmap(upload->pipe, upload->transfer);
+      pipe_transfer_destroy(upload->pipe, upload->transfer);
+      upload->transfer = NULL;
+      upload->map = NULL;
+   }
+}
+
 /* Release old buffer.
  * 
  * This must usually be called prior to firing the command stream
@@ -84,15 +100,7 @@ struct u_upload_mgr *u_upload_create( struct pipe_context *pipe,
 void u_upload_flush( struct u_upload_mgr *upload )
 {
    /* Unmap and unreference the upload buffer. */
-   if (upload->transfer) {
-      if (upload->offset) {
-         pipe_buffer_flush_mapped_range(upload->pipe, upload->transfer,
-                                        0, upload->offset);
-      }
-      pipe_transfer_unmap(upload->pipe, upload->transfer);
-      pipe_transfer_destroy(upload->pipe, upload->transfer);
-      upload->transfer = NULL;
-   }
+   u_upload_unmap(upload);
    pipe_resource_reference( &upload->buffer, NULL );
    upload->size = 0;
 }
@@ -172,6 +180,15 @@ enum pipe_error u_upload_alloc( struct u_upload_mgr *upload,
 
    offset = MAX2(upload->offset, alloc_offset);
 
+   if (!upload->map) {
+      upload->map = pipe_buffer_map_range(upload->pipe, upload->buffer,
+                                          offset, upload->size - offset,
+                                          PIPE_TRANSFER_WRITE |
+                                          PIPE_TRANSFER_FLUSH_EXPLICIT |
+                                          PIPE_TRANSFER_UNSYNCHRONIZED,
+                                          &upload->transfer);
+   }
+
    assert(offset < upload->buffer->width0);
    assert(offset + size <= upload->buffer->width0);
    assert(size);
diff --git a/src/gallium/auxiliary/util/u_upload_mgr.h b/src/gallium/auxiliary/util/u_upload_mgr.h
index c9a2ffe..9891513 100644
--- a/src/gallium/auxiliary/util/u_upload_mgr.h
+++ b/src/gallium/auxiliary/util/u_upload_mgr.h
@@ -56,15 +56,27 @@ struct u_upload_mgr *u_upload_create( struct pipe_context *pipe,
  */
 void u_upload_destroy( struct u_upload_mgr *upload );
 
-/* Unmap and release old buffer.
+/* Unmap and release old upload buffer.
  * 
+ * This is like u_upload_unmap() except the upload buffer is released for
+ * recycling. This should be called on real hardware flushes on systems
+ * that don't support the PIPE_TRANSFER_UNSYNCHRONIZED flag, as otherwise
+ * the next u_upload_buffer will cause a sync on the buffer.
+ */
+
+void u_upload_flush( struct u_upload_mgr *upload );
+
+/**
+ * Unmap upload buffer
+ *
+ * \param upload           Upload manager
+ *
  * This must usually be called prior to firing the command stream
  * which references the upload buffer, as many memory managers either
  * don't like firing a mapped buffer or cause subsequent maps of a
- * fired buffer to wait. For now, it's easiest just to grab a new
- * buffer.
+ * fired buffer to wait.
  */
-void u_upload_flush( struct u_upload_mgr *upload );
+void u_upload_unmap( struct u_upload_mgr *upload );
 
 /**
  * Sub-allocate new memory from the upload buffer.
-- 
1.6.2.5
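A simplified model of the bookkeeping behind u_upload_unmap() and u_upload_flush(), assuming hypothetical names (`upload_sketch`, `sketch_alloc`, and so on) and ignoring alignment and buffer growth. The point it illustrates: because the buffer now stays alive across unmaps, each unmap flushes only [box.x, offset), the bytes written since the current mapping began, instead of [0, offset); only a full flush releases the buffer and resets the offset.

```c
#include <assert.h>

/* Hypothetical stand-in for the upload manager's bookkeeping. */
struct upload_sketch {
   int      mapped;                  /* is a transfer currently mapped? */
   unsigned map_start;               /* plays the role of transfer->box.x */
   unsigned offset;                  /* next free byte in the upload buffer */
   unsigned flushed_lo, flushed_hi;  /* last range passed to the explicit flush */
};

static void sketch_alloc(struct upload_sketch *u, unsigned size)
{
   if (!u->mapped) {                 /* map lazily, starting at the current offset */
      u->mapped = 1;
      u->map_start = u->offset;
   }
   u->offset += size;
}

static void sketch_unmap(struct upload_sketch *u)
{
   if (u->mapped) {
      if (u->offset > u->map_start) {  /* flush only bytes written under this mapping */
         u->flushed_lo = u->map_start;
         u->flushed_hi = u->offset;
      }
      u->mapped = 0;                 /* buffer and offset survive for reuse */
   }
}

static void sketch_flush(struct upload_sketch *u)
{
   sketch_unmap(u);
   u->offset = 0;                    /* buffer released; next alloc starts afresh */
}
```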
[Mesa-dev] [PATCH 3/9] gallium/svga: Make use of u_upload_flush().
This enables us to pack more data into single upload buffers.

Signed-off-by: Thomas Hellstrom thellst...@vmware.com
---
 src/gallium/drivers/svga/svga_context.c |    8 
 src/gallium/drivers/svga/svga_draw.c    |    4 ++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_context.c b/src/gallium/drivers/svga/svga_context.c
index dbbc249..cfb1b9d 100644
--- a/src/gallium/drivers/svga/svga_context.c
+++ b/src/gallium/drivers/svga/svga_context.c
@@ -207,6 +207,14 @@ void svga_context_flush( struct svga_context *svga,
 
    svga->curr.nr_fbs = 0;
 
+   /* Flush the upload managers to ensure recycling of upload buffers
+    * without throttling. This should really be conditioned on
+    * pipe_buffer_map_range not supporting PIPE_TRANSFER_UNSYNCHRONIZED.
+    */
+
+   u_upload_flush(svga->upload_vb);
+   u_upload_flush(svga->upload_ib);
+
    /* Ensure that texture dma uploads are processed
     * before submitting commands.
     */
diff --git a/src/gallium/drivers/svga/svga_draw.c b/src/gallium/drivers/svga/svga_draw.c
index d8af615..28ba470 100644
--- a/src/gallium/drivers/svga/svga_draw.c
+++ b/src/gallium/drivers/svga/svga_draw.c
@@ -145,7 +145,7 @@ svga_hwtnl_flush( struct svga_hwtnl *hwtnl )
       unsigned i;
 
       /* Unmap upload manager vertex buffers */
-      u_upload_flush(svga->upload_vb);
+      u_upload_unmap(svga->upload_vb);
 
       for (i = 0; i < hwtnl->cmd.vdecl_count; i++) {
          handle = svga_buffer_handle(svga, hwtnl->cmd.vdecl_vb[i]);
@@ -156,7 +156,7 @@ svga_hwtnl_flush( struct svga_hwtnl *hwtnl )
       }
 
       /* Unmap upload manager index buffers */
-      u_upload_flush(svga->upload_ib);
+      u_upload_unmap(svga->upload_ib);
 
       for (i = 0; i < hwtnl->cmd.prim_count; i++) {
          if (hwtnl->cmd.prim_ib[i]) {
-- 
1.6.2.5
[Mesa-dev] [PATCH 4/9] gallium/svga: Upload only parts of user-buffers that we actually use
Stream user buffer contents rather than trying to maintain persistent host / hardware copies. Resulting negative array offsets are not allowed by the hardware (well, at least not according to header files), so adjust index bias to make all array offsets positive.

Signed-off-by: Thomas Hellstrom thellst...@vmware.com
---
 src/gallium/drivers/svga/svga_draw.c            |   13 ++-
 src/gallium/drivers/svga/svga_draw.h            |    3 +
 src/gallium/drivers/svga/svga_draw_private.h    |    7 ++
 src/gallium/drivers/svga/svga_pipe_draw.c       |  130 ++-
 src/gallium/drivers/svga/svga_resource_buffer.h |    7 ++
 src/gallium/drivers/svga/svga_state_vdecl.c     |  119 +
 6 files changed, 208 insertions(+), 71 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_draw.c b/src/gallium/drivers/svga/svga_draw.c
index 28ba470..aa09669 100644
--- a/src/gallium/drivers/svga/svga_draw.c
+++ b/src/gallium/drivers/svga/svga_draw.c
@@ -242,6 +242,11 @@ svga_hwtnl_flush( struct svga_hwtnl *hwtnl )
 }
 
 
+void svga_hwtnl_set_index_bias( struct svga_hwtnl *hwtnl,
+                                int index_bias)
+{
+   hwtnl->index_bias = index_bias;
+}
 
 
 
@@ -265,15 +270,16 @@ enum pipe_error svga_hwtnl_prim( struct svga_hwtnl *hwtnl,
          unsigned size = vb ? vb->width0 : 0;
          unsigned offset = hwtnl->cmd.vdecl[i].array.offset;
          unsigned stride = hwtnl->cmd.vdecl[i].array.stride;
-         unsigned index_bias = range->indexBias;
+         int index_bias = (int) range->indexBias + hwtnl->index_bias;
          unsigned width;
 
          assert(vb);
          assert(size);
          assert(offset < size);
-         assert(index_bias >= 0);
          assert(min_index <= max_index);
-         assert(offset + index_bias*stride < size);
+         if (index_bias >= 0) {
+            assert(offset + index_bias*stride < size);
+         }
          if (min_index != ~0) {
             assert(offset + (index_bias + min_index) * stride < size);
          }
@@ -394,6 +400,7 @@ enum pipe_error svga_hwtnl_prim( struct svga_hwtnl *hwtnl,
    hwtnl->cmd.max_index[hwtnl->cmd.prim_count] = max_index;
 
    hwtnl->cmd.prim[hwtnl->cmd.prim_count] = *range;
+   hwtnl->cmd.prim[hwtnl->cmd.prim_count].indexBias += hwtnl->index_bias;
 
    pipe_resource_reference(&hwtnl->cmd.prim_ib[hwtnl->cmd.prim_count], ib);
    hwtnl->cmd.prim_count++;
diff --git a/src/gallium/drivers/svga/svga_draw.h b/src/gallium/drivers/svga/svga_draw.h
index a2403d8..1dac174 100644
--- a/src/gallium/drivers/svga/svga_draw.h
+++ b/src/gallium/drivers/svga/svga_draw.h
@@ -79,5 +79,8 @@ svga_hwtnl_draw_range_elements( struct svga_hwtnl *hwtnl,
 enum pipe_error
 svga_hwtnl_flush( struct svga_hwtnl *hwtnl );
 
+void svga_hwtnl_set_index_bias( struct svga_hwtnl *hwtnl,
+                                int index_bias);
+
 
 #endif /* SVGA_DRAW_H_ */
diff --git a/src/gallium/drivers/svga/svga_draw_private.h b/src/gallium/drivers/svga/svga_draw_private.h
index ca658ac..8126f7e 100644
--- a/src/gallium/drivers/svga/svga_draw_private.h
+++ b/src/gallium/drivers/svga/svga_draw_private.h
@@ -116,6 +116,13 @@ struct draw_cmd {
 struct svga_hwtnl {
    struct svga_context *svga;
    struct u_upload_mgr *upload_ib;
+
+   /* Additional negative index bias due to partial buffer uploads
+    * This is compensated for in the offset associated with all
+    * vertex buffers.
+    */
+
+   int index_bias;
  
    /* Flatshade information:
     */
diff --git a/src/gallium/drivers/svga/svga_pipe_draw.c b/src/gallium/drivers/svga/svga_pipe_draw.c
index a632fb1..8e1c764 100644
--- a/src/gallium/drivers/svga/svga_pipe_draw.c
+++ b/src/gallium/drivers/svga/svga_pipe_draw.c
@@ -37,6 +37,116 @@
 #include "svga_state.h"
 #include "svga_swtnl.h"
 #include "svga_debug.h"
+#include "svga_resource_buffer.h"
+#include "util/u_upload_mgr.h"
+
+/**
+ * svga_upload_user_buffers - upload parts of user buffers
+ *
+ * This function streams a part of a user buffer to hw and sets
+ * svga_buffer::source_offset to the first byte uploaded. After upload
+ * also svga_buffer::uploaded::buffer is set to !NULL
+ */
+
+static int
+svga_upload_user_buffers(struct svga_context *svga,
+                         unsigned start,
+                         unsigned count,
+                         unsigned instance_count)
+{
+   const struct pipe_vertex_element *ve = svga->curr.velems->velem;
+   unsigned i;
+   int ret;
+
+   for (i=0; i < svga->curr.velems->count; i++) {
+      struct pipe_vertex_buffer *vb =
+         &svga->curr.vb[ve[i].vertex_buffer_index];
+
+      if (vb->buffer && svga_buffer_is_user_buffer(vb->buffer)) {
+         struct svga_buffer *buffer = svga_buffer(vb->buffer);
+         unsigned first, size;
+         boolean flushed;
+         unsigned instance_div = ve[i].instance_divisor;
+
+         svga->dirty |= SVGA_NEW_VBUFFER;
+
+         if (instance_div) {
+            first = 0;
+            size = vb->stride *
+               (instance_count + instance_div - 1) /
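Why streaming only part of a user buffer produces negative array offsets, and how an index bias compensates, can be shown with plain arithmetic. This sketch assumes the hardware fetch address is array_offset + (index + index_bias) * stride; `streamed_array`, `naive_offset` and `rebased` are hypothetical helpers, not the svga code (whose remainder is truncated above).

```c
#include <assert.h>

/* Hypothetical description of one streamed user array. */
struct streamed_array {
   long stride;       /* bytes between consecutive vertices */
   long upload_off;   /* HW-buffer offset where user byte stride*start landed */
};

/* Byte address the hardware fetches vertex 'index' from (assumed model). */
static long
hw_address(long array_offset, long index, long index_bias, long stride)
{
   return array_offset + (index + index_bias) * stride;
}

/* Naive: keep index_bias == 0 and shift the array offset back by
 * stride*start; negative whenever upload_off < stride*start. */
static long
naive_offset(const struct streamed_array *a, long start)
{
   return a->upload_off - a->stride * start;
}

/* Rebased: keep the offset at upload_off and fold -start into the bias,
 * so the offset the hardware sees is never negative. */
static void
rebased(const struct streamed_array *a, long start,
        long *array_offset, long *index_bias)
{
   *array_offset = a->upload_off;
   *index_bias = -start;
}
```

Both choices fetch the same bytes (vertex 1003 lands at offset 304 in the test below); only the rebased one keeps the array offset non-negative.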
[Mesa-dev] [PATCH 5/9] svga: Handle null buffers in svga_buffer_is_user_buffer().
From: José Fonseca jfons...@vmware.com
---
 src/gallium/drivers/svga/svga_resource_buffer.h |    6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_resource_buffer.h b/src/gallium/drivers/svga/svga_resource_buffer.h
index 2ae44d2..69d6f72 100644
--- a/src/gallium/drivers/svga/svga_resource_buffer.h
+++ b/src/gallium/drivers/svga/svga_resource_buffer.h
@@ -200,7 +200,11 @@ svga_buffer(struct pipe_resource *buffer)
 static INLINE boolean
 svga_buffer_is_user_buffer( struct pipe_resource *buffer )
 {
-   return svga_buffer(buffer)->user;
+   if (buffer) {
+      return svga_buffer(buffer)->user;
+   } else {
+      return FALSE;
+   }
 }
 
 
-- 
1.6.2.5
[Mesa-dev] [PATCH 6/9] svga: fix incorrect user buffer size computation
From: Brian Paul bri...@vmware.com

Viewperf uses some unusual vertex arrays where the stride is less than the element size. In this case, the stride was 4 while the element size was 12. The difference of 8 bytes causes us to miss uploading the tail bit of the array data.

Typically the stride is >= the element size so there was no problem with other apps.
---
 src/gallium/drivers/svga/svga_pipe_draw.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_draw.c b/src/gallium/drivers/svga/svga_pipe_draw.c
index 8e1c764..78f5aa1 100644
--- a/src/gallium/drivers/svga/svga_pipe_draw.c
+++ b/src/gallium/drivers/svga/svga_pipe_draw.c
@@ -25,6 +25,7 @@
 
 #include "svga_cmd.h"
 
+#include "util/u_format.h"
 #include "util/u_inlines.h"
 #include "util/u_prim.h"
 #include "util/u_time.h"
@@ -75,8 +76,9 @@ svga_upload_user_buffers(struct svga_context *svga,
             size = vb->stride *
                (instance_count + instance_div - 1) / instance_div;
          } else if (vb->stride) {
+            uint elemSize = util_format_get_blocksize(ve->src_format);
             first = vb->stride * start;
-            size = vb->stride * count;
+            size = vb->stride * (count - 1) + elemSize;
          } else {
             /* Only a single vertex!
              * Upload with the largest vertex size the hw supports,
-- 
1.6.2.5
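The corrected size computation as a worked example with the numbers from the commit message (stride 4, element size 12). `user_buffer_span` is a hypothetical name for the expression the patch inlines: the last of count elements starts at stride * (count - 1) and still needs elem_size bytes, so stride * count comes up 8 bytes short here.

```c
#include <assert.h>

/* Bytes needed to cover 'count' elements of 'elem_size' bytes spaced
 * 'stride' apart. When stride < elem_size, stride * count is too small. */
static unsigned
user_buffer_span(unsigned stride, unsigned count, unsigned elem_size)
{
   assert(count > 0);
   return stride * (count - 1) + elem_size;
}
```

For 10 vertices the old formula yields 40 bytes and the new one 48; with a more typical stride of 16 the new formula gives 156 rather than 160, so it never over-reads past the last element either.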
[Mesa-dev] [PATCH 7/9] svga: fix incorrect user buffer size computation for instance divisor case
From: Brian Paul bri...@vmware.com

See preceding commit for more info.
---
 src/gallium/drivers/svga/svga_pipe_draw.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_draw.c b/src/gallium/drivers/svga/svga_pipe_draw.c
index 78f5aa1..358ef82 100644
--- a/src/gallium/drivers/svga/svga_pipe_draw.c
+++ b/src/gallium/drivers/svga/svga_pipe_draw.c
@@ -68,15 +68,15 @@ svga_upload_user_buffers(struct svga_context *svga,
          unsigned first, size;
          boolean flushed;
          unsigned instance_div = ve[i].instance_divisor;
+         unsigned elemSize = util_format_get_blocksize(ve->src_format);
 
          svga->dirty |= SVGA_NEW_VBUFFER;
 
          if (instance_div) {
             first = 0;
-            size = vb->stride *
-               (instance_count + instance_div - 1) / instance_div;
+            count = (instance_count + instance_div - 1) / instance_div;
+            size = vb->stride * (count - 1) + elemSize;
          } else if (vb->stride) {
-            uint elemSize = util_format_get_blocksize(ve->src_format);
             first = vb->stride * start;
             size = vb->stride * (count - 1) + elemSize;
          } else {
-- 
1.6.2.5
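The same span formula applied to the instanced case: with an instance divisor d, instance_count instances fetch ceil(instance_count / d) distinct elements. The helper names below are hypothetical; the old expression appears in the test only for comparison.

```c
#include <assert.h>

/* Distinct per-instance elements fetched when each element is reused for
 * 'divisor' consecutive instances: ceil(instance_count / divisor). */
static unsigned
instanced_element_count(unsigned instance_count, unsigned divisor)
{
   return (instance_count + divisor - 1) / divisor;
}

/* Bytes to upload: last element starts at stride * (n - 1), needs elem_size. */
static unsigned
instanced_span(unsigned stride, unsigned instance_count, unsigned divisor,
               unsigned elem_size)
{
   unsigned n = instanced_element_count(instance_count, divisor);
   assert(n > 0);
   return stride * (n - 1) + elem_size;
}
```

With 10 instances, divisor 3, stride 16 and 12-byte elements, 4 elements are fetched and 60 bytes suffice; the old expression gives 64.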
[Mesa-dev] [PATCH 9/9] svga: Fix multiple uploads of the same user-buffer.
If a user-buffer was referenced twice by a draw command, the affected ranges were uploaded separately, with only the last one being referenced by the hardware. Make sure we upload only a single range.

Signed-off-by: Thomas Hellstrom thellst...@vmware.com
---
 src/gallium/drivers/svga/svga_pipe_draw.c       |  101 ++-
 src/gallium/drivers/svga/svga_resource_buffer.h |   13 ++--
 src/gallium/drivers/svga/svga_state_vdecl.c     |    6 +-
 3 files changed, 90 insertions(+), 30 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_draw.c b/src/gallium/drivers/svga/svga_pipe_draw.c
index 358ef82..0b4d41b 100644
--- a/src/gallium/drivers/svga/svga_pipe_draw.c
+++ b/src/gallium/drivers/svga/svga_pipe_draw.c
@@ -42,22 +42,42 @@
 #include "util/u_upload_mgr.h"
 
 /**
- * svga_upload_user_buffers - upload parts of user buffers
+ * Determine the ranges to upload for the user-buffers referenced
+ * by the next draw command.
  *
- * This function streams a part of a user buffer to hw and sets
- * svga_buffer::source_offset to the first byte uploaded. After upload
- * also svga_buffer::uploaded::buffer is set to !NULL
+ * TODO: It might be beneficial to support multiple ranges. In that case,
+ * the struct svga_buffer::uploaded member should be made an array or a
+ * list, since we need to account for the possibility that different ranges
+ * may be uploaded to different hardware buffers chosen by the utility
+ * upload manager.
  */
 
-static int
-svga_upload_user_buffers(struct svga_context *svga,
-                         unsigned start,
-                         unsigned count,
-                         unsigned instance_count)
+static void
+svga_user_buffer_range(struct svga_context *svga,
+                       unsigned start,
+                       unsigned count,
+                       unsigned instance_count)
 {
    const struct pipe_vertex_element *ve = svga->curr.velems->velem;
-   unsigned i;
-   int ret;
+   int i;
+
+   /*
+    * Release old uploaded range (if not done already) and
+    * initialize new ranges.
+    */
+
+   for (i=0; i < svga->curr.velems->count; i++) {
+      struct pipe_vertex_buffer *vb =
+         &svga->curr.vb[ve[i].vertex_buffer_index];
+
+      if (vb->buffer && svga_buffer_is_user_buffer(vb->buffer)) {
+         struct svga_buffer *buffer = svga_buffer(vb->buffer);
+
+         pipe_resource_reference(&buffer->uploaded.buffer, NULL);
+         buffer->uploaded.start = ~0;
+         buffer->uploaded.end = 0;
+      }
+   }
 
    for (i=0; i < svga->curr.velems->count; i++) {
       struct pipe_vertex_buffer *vb =
@@ -66,30 +86,71 @@ svga_upload_user_buffers(struct svga_context *svga,
       if (vb->buffer && svga_buffer_is_user_buffer(vb->buffer)) {
          struct svga_buffer *buffer = svga_buffer(vb->buffer);
          unsigned first, size;
-         boolean flushed;
          unsigned instance_div = ve[i].instance_divisor;
          unsigned elemSize = util_format_get_blocksize(ve->src_format);
 
          svga->dirty |= SVGA_NEW_VBUFFER;
 
          if (instance_div) {
-            first = 0;
+            first = ve[i].src_offset;
             count = (instance_count + instance_div - 1) / instance_div;
             size = vb->stride * (count - 1) + elemSize;
          } else if (vb->stride) {
-            first = vb->stride * start;
+            first = vb->stride * start + ve[i].src_offset;
             size = vb->stride * (count - 1) + elemSize;
          } else {
             /* Only a single vertex!
              * Upload with the largest vertex size the hw supports,
              * if possible.
              */
-            first = 0;
+            first = ve[i].src_offset;
             size = MIN2(16, vb->buffer->width0);
          }
 
+         buffer->uploaded.start = MIN2(buffer->uploaded.start, first);
+         buffer->uploaded.end = MAX2(buffer->uploaded.end, first + size);
+      }
+   }
+}
+
+/**
+ * svga_upload_user_buffers - upload parts of user buffers
+ *
+ * This function streams a part of a user buffer to hw and fills
+ * svga_buffer::uploaded with information on the upload.
+ */
+
+static int
+svga_upload_user_buffers(struct svga_context *svga,
+                         unsigned start,
+                         unsigned count,
+                         unsigned instance_count)
+{
+   const struct pipe_vertex_element *ve = svga->curr.velems->velem;
+   unsigned i;
+   int ret;
+
+   svga_user_buffer_range(svga, start, count, instance_count);
+
+   for (i=0; i < svga->curr.velems->count; i++) {
+      struct pipe_vertex_buffer *vb =
+         &svga->curr.vb[ve[i].vertex_buffer_index];
+
+      if (vb->buffer && svga_buffer_is_user_buffer(vb->buffer)) {
+         struct svga_buffer *buffer = svga_buffer(vb->buffer);
+         boolean flushed;
+
+         /*
+          * Check if already uploaded. Otherwise go ahead and upload.
+          */
+
+         if (buffer->uploaded.buffer)
+            continue;
+
          ret = u_upload_buffer( svga->upload_vb,
-
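The merged-range bookkeeping introduced here amounts to an interval union that starts empty (start = ~0, end = 0) and grows with each vertex element that references the buffer. A minimal sketch with hypothetical helper names, using an interleaved buffer where a position (src_offset 0) and a normal (src_offset 12) share a 24-byte stride:

```c
#include <assert.h>

struct uploaded_range { unsigned start, end; };   /* [start, end) in bytes */

static void range_reset(struct uploaded_range *r)
{
   r->start = ~0u;   /* empty range: start > end */
   r->end = 0;
}

/* Grow the range to cover [first, first + size), like the MIN2/MAX2 pair. */
static void range_add(struct uploaded_range *r, unsigned first, unsigned size)
{
   if (first < r->start)
      r->start = first;
   if (first + size > r->end)
      r->end = first + size;
}

/* Hypothetical helpers mirroring the non-instanced branch of the patch. */
static unsigned element_first(unsigned stride, unsigned start, unsigned src_offset)
{
   return stride * start + src_offset;
}

static unsigned element_size(unsigned stride, unsigned count, unsigned elem_size)
{
   return stride * (count - 1) + elem_size;
}
```

For start 2 and count 3, the two elements cover [48, 108) and [60, 120), so a single upload of [48, 120) serves both instead of two separate uploads.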
Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup
Hmm. Forgive my ignorance, but isn't memcmp() on structs pretty prone to give incorrect != results, given that there may be padding between members in structs and that IIRC gcc struct assignment is member-wise?

What happens if there's padding between the jit_context and variant members of struct lp_rast_state?

I seem to recall hitting similar issues a number of times in the past.

/Thomas

On 06/30/2011 03:36 AM, Roland Scheidegger wrote:
Ok, in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
In short, gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no knowledge about alignment etc.) even when comparing only very few bytes (and loses BIG time for lots of bytes to compare). Oops. Well, at least if the strings are the same (I'd guess if the first byte is different it's hard to beat the gcc builtin...).
So this is really a gcc bug. The bug is quite old though, with no fix in sight apparently, so we might need to think about some workaround (but just not doing the comparison doesn't look like the right idea, since apparently it would be faster with the comparison if gcc's memcmp got fixed).

Roland

On 30.06.2011 01:47, Roland Scheidegger wrote:
I didn't even look at that, was just curious why the memcmp (which is used a lot in other places) is slow. However, none of the other memcmp calls seem to show up prominently (cso functions are quite low in profiles, _mesa_search_program_cache uses memcmp too but it's not that high either). So I guess those functions either aren't called that often or the sizes they compare are small.
So we should maybe just file a gcc bug for memcmp and look at that particular llvmpipe issue again :-).

Roland

On 30.06.2011 01:16, Corbin Simpson wrote:
Okay, so maybe I'm failing to recognize the exact situation here, but wouldn't it be possible to mark the FS state with a serial number and just compare those? Or are these FS states not CSO-cached?

~ C.

On Wed, Jun 29, 2011 at 3:44 PM, Roland Scheidegger srol...@vmware.com wrote:
Actually I ran some numbers here and tried out an optimized struct compare:
original ipers: 12.1 fps
ajax patch: 15.5 fps
optimized struct compare: 16.8 fps

This is the function I used for that (just enabled in that lp_setup function):

static INLINE int
util_cmp_struct(const void *src1, const void *src2, unsigned count)
{
  /* hmm pointer casting is evil */
  const uint32_t *src1_ptr = (uint32_t *)src1;
  const uint32_t *src2_ptr = (uint32_t *)src2;
  unsigned i;
  assert(count % 4 == 0);
  for (i = 0; i < count/4; i++) {
    if (*src1_ptr != *src2_ptr) {
      return 1;
    }
    src1_ptr++;
    src2_ptr++;
  }
  return 0;
}

(And no, this doesn't use repz cmpsd here.)

So, unless I made some mistake, memcmp is just dead slow (*), most of the slowness probably coming from the bytewise comparison (and apparently I was wrong in assuming the comparison there might never be the same for ipers).
Of course, the optimized struct compare relies on structs really being dword aligned (I think this is always the case), and additionally it relies on the struct size being a whole multiple of dwords - likely the struct needs padding to ensure that (at least I don't think this is always guaranteed for all structs).
But since memcmp is used extensively (cso for one), maybe some optimization along these lines might be worth it (though of course for small structs the win isn't going to be as big - and it can't beat the repz cmps in code size...).

Roland

(*) I actually found some references that some implementations might be better: they don't just use repz cmpsb but split this up into parts which do dword (or even qword - well, for really large structs they could use sse2) comparisons for the parts where it's possible, and only byte comparisons for the remaining bytes (and if the compiler does that, it probably would know the size at compile time here, hence could leave out much of the code). Of course memcmp requires that the return value isn't just a true or false value, hence there's more code needed once an unequal dword is found, though the compiler could optimize that away too in case it's not needed. Much the same as memcpy is usually optimized really, so blame gcc :-).

On 29.06.2011 20:33, Roland Scheidegger wrote:
Ohh, that's interesting, you'd think the comparison shouldn't be that expensive (though I guess in ipers' case the comparison is never true).
memcmp is quite extensively used everywhere. Maybe we could replace that with something faster (since we only ever care whether the blocks are the same, not about the lexicographic ordering, and always compare whole structs, we should compare dwords instead of bytes for a 4 times speedup)? Or isn't that the reason cmpsb instead of cmpsd is used?
Also I guess it would help if the values which are more likely to be unequal are first in the struct (if we can tell that).
Of course though if it's unlikely to be the same as the
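Thomas's concern about padding can be made concrete. In this hypothetical struct, common ABIs insert 3 padding bytes after `flag`; the C standard leaves their contents unspecified, and struct assignment need not copy them, so a byte-wise memcmp() or dword compare is only reliable if every instance is zeroed before its members are set, as `state_init` does below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state struct: on common ABIs, 3 padding bytes follow 'flag'. */
struct state_sketch {
   uint8_t  flag;
   uint32_t value;
};

/* Zero the whole object first so padding bytes have a defined value,
 * then set the members. Two states built this way compare equal with
 * memcmp() whenever their members are equal. */
static void
state_init(struct state_sketch *s, uint8_t flag, uint32_t value)
{
   memset(s, 0, sizeof *s);
   s->flag = flag;
   s->value = value;
}
```

Without the memset, leftover stack or heap bytes in the padding could make two logically identical states compare unequal, which is the spurious != result Thomas describes.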
Re: [Mesa-dev] [PATCH 3/3] intel: Add support copying Y-tiled buffers with the Gen6 blitter.
On Wed, 29 Jun 2011 23:04:04 -0700, Kenneth Graunke kenn...@whitecape.org wrote:
According to the Sandybridge PRM, Volume 1, Part 5, Section 1.9.15, Gen6's blitter supports Y-tiled buffers as well as X-tiled. Pitch is specified in 512-byte granularity for X-tiled, but 128-byte for Y-tiled.

To support Y-tiling, afaics, in the SNB blitter, you have to set a register that forces it to treat all source and/or destination surfaces as Y-tiled:

BCS_SWCTRL 0x22200

Fortunately they did permit us to modify it with MI_LOAD_REGISTER_IMM.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup
On Wed, 2011-06-29 at 16:16 -0700, Corbin Simpson wrote:
Okay, so maybe I'm failing to recognize the exact situation here, but wouldn't it be possible to mark the FS state with a serial number and just compare those? Or are these FS states not CSO-cached?

No, the struct being compared is poorly named and collides with a CSO entity. It's really all the state which the compiled fragment shader will reference when it is later invoked. It's all packed into a single struct because it's easier to pass a single parameter to llvm-compiled shaders and add/change that parameter, but it is somewhat non-orthogonal and we end up generating too many of them.

Keith
Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no knowledge about alignment etc.) even when comparing only very few bytes (and loses BIG time for lots of bytes to compare). Oops. Well at least if the strings are the same (I'd guess if the first byte is different it's hard to beat the gcc builtin...). So this is really a gcc bug. The bug is quite old though with no fix in sight apparently so might need to think about some workaround (but just not doing the comparison doesn't look like the right idea, since apparently it would be faster with the comparison if gcc's memcmp got fixed). Looking at the struct again (it's been a while), it seems like it could be rearranged to be variable-sized and on average significantly smaller: struct lp_rast_state { struct lp_jit_context jit_context; struct lp_fragment_shader_variant *variant; }; struct lp_jit_context { const float *constants; float alpha_ref_value; uint32_t stencil_ref_front, stencil_ref_back; uint8_t *blend_color; struct lp_jit_texture textures[PIPE_MAX_SAMPLERS]; }; If we moved the jit_context part behind variant, we could then hopefully take advantage of the fact that most of those lp_jit_texture structs are not in use. That would save time on the memcmp *and* space in the binned data. It's weird this wasn't showing up in past profiling. Keith
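A minimal sketch of that rearrangement, using simplified stand-in structs: all `sketch_*` names and the `num_textures` field are hypothetical, not llvmpipe's actual types. It assumes the variant records how many texture slots its JIT code touches, puts the variant pointer first, and compares only those slots instead of the whole `PIPE_MAX_SAMPLERS` array.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_SAMPLERS 16   /* stand-in for PIPE_MAX_SAMPLERS */

/* Simplified stand-in for struct lp_jit_texture. Four uint32_t followed by
 * a pointer leaves no interior padding on common ABIs. */
struct sketch_texture {
   uint32_t width, height, depth, last_level;
   const void *data;
};

struct sketch_variant {
   unsigned num_textures;   /* hypothetical: recorded when the variant is generated */
};

struct sketch_rast_state {
   const struct sketch_variant *variant;   /* moved in front of the bulky part */
   const float *constants;
   float alpha_ref_value;
   struct sketch_texture textures[SKETCH_MAX_SAMPLERS];
};

/* Compare the variant pointer first (cheap, and it decides how many texture
 * slots matter), then the small fields, then only the slots the variant uses. */
static bool rast_state_equal(const struct sketch_rast_state *a,
                             const struct sketch_rast_state *b)
{
   unsigned n;

   if (a->variant != b->variant)
      return false;
   if (a->constants != b->constants || a->alpha_ref_value != b->alpha_ref_value)
      return false;

   n = a->variant ? a->variant->num_textures : 0;
   return memcmp(a->textures, b->textures, n * sizeof a->textures[0]) == 0;
}
```

A difference in an unused slot no longer forces a new state to be binned, and the bytes compared scale with the shader's actual texture count.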
Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup
Great work Roland! And thanks Ajax for finding this hot spot. We use memcmp a lot -- all CSO caching, so we should use this everywhere. We should also code an SSE2 version with intrinsics for x86-64, which is guaranteed to always have SSE2. Jose - Original Message - Actually I ran some numbers here and tried out an optimized struct compare: original ipers: 12.1 fps ajax patch: 15.5 fps optimized struct compare: 16.8 fps This is the function I used for that (just enabled in that lp_setup function): static INLINE int util_cmp_struct(const void *src1, const void *src2, unsigned count) { /* hmm pointer casting is evil */ const uint32_t *src1_ptr = (const uint32_t *)src1; const uint32_t *src2_ptr = (const uint32_t *)src2; unsigned i; assert(count % 4 == 0); for (i = 0; i < count/4; i++) { if (*src1_ptr != *src2_ptr) { return 1; } src1_ptr++; src2_ptr++; } return 0; } (And no this doesn't use repz cmpsd here.) So, unless I made some mistake, memcmp is just dead slow (*), most of the slowness probably coming from the bytewise comparison (and apparently I was wrong in assuming the comparison there might never be the same for ipers). Of course, the optimized struct compare relies on structs really being dword aligned (I think this is always the case), and additionally it relies on the struct size being a whole multiple of dwords - likely struct needs padding to ensure that (at least I don't think this is always guaranteed for all structs). But since memcmp is used extensively (cso for one) maybe some optimization along these lines might be worth it (though of course for small structs the win isn't going to be as big - and can't beat the repz cmps in code size...). 
Roland (*) I actually found some references that some implementations might be better: they don't just use repz cmpsb but split this up into parts which do dword (or even qword - well, for really large structs they could use SSE2) comparisons for the parts where it's possible and only byte comparisons for the remaining bytes (and if the compiler does that it probably would know the size at compile time here, hence could leave out much of the code). Of course memcmp requires that the return value isn't just a true or false value, hence there's more code needed once an unequal dword is found, though the compiler could optimize that away too in case it's not needed. Much the same as memcpy is usually optimized really, so blame gcc :-). On 29.06.2011 20:33, Roland Scheidegger wrote: Ohh that's interesting, you'd think the comparison shouldn't be that expensive (though I guess in ipers case the comparison is never true). memcmp is quite extensively used everywhere. Maybe we could replace that with something faster (since we only ever care whether the blocks are the same, not about the lexicographic ordering, and always compare whole structs, we should compare dwords instead of bytes for a 4x speedup)? Or isn't that the reason cmpsb instead of cmpsd is used? Also I guess it would help if the values which are more likely to be unequal are first in the struct (if we can tell that). Of course though if it's unlikely to be the same as the compared value anyway, not comparing at all still might be a win (here). Roland On 29.06.2011 19:19, Adam Jackson wrote: Perversely, do this by eliminating the comparison between stored and current fs state. On ipers, a perf trace showed try_update_scene_state using 31% of a CPU, and 98% of that was in 'repz cmpsb', i.e., the memcmp. Taking that out takes try_update_scene_state down to 6.5% of the profile; more importantly, ipers goes from 10 to 14fps and gears goes from 790 to 830fps. 
Signed-off-by: Adam Jackson a...@redhat.com --- src/gallium/drivers/llvmpipe/lp_setup.c | 61 ++- 1 files changed, 27 insertions(+), 34 deletions(-) diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c b/src/gallium/drivers/llvmpipe/lp_setup.c index cbe06e5..9118db5 100644 --- a/src/gallium/drivers/llvmpipe/lp_setup.c +++ b/src/gallium/drivers/llvmpipe/lp_setup.c @@ -839,42 +839,35 @@ try_update_scene_state( struct lp_setup_context *setup ) setup->dirty |= LP_SETUP_NEW_FS; } - if (setup->dirty & LP_SETUP_NEW_FS) { - if (!setup->fs.stored || - memcmp(setup->fs.stored, - &setup->fs.current, - sizeof setup->fs.current) != 0) - { - struct lp_rast_state *stored; - uint i; - - /* The fs state that's been stored in the scene is different from - * the new, current state. So allocate a new lp_rast_state object - * and append it to the bin's setup data buffer. - */ - stored = (struct lp_rast_state *) lp_scene_alloc(scene, sizeof *stored); - if (!stored) {
Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup
- Original Message - Hmm. Forgive my ignorance, but isn't memcmp() on structs pretty prone to give incorrect != results, given that there may be padding between members in structs and that IIRC gcc struct assignment is member-wise. There's no alternative to bitwise comparison in C: $ cat cmp.c struct foo { int a; int b; }; int cmp(const struct foo *a, const struct foo *b) { return *a == *b; } $ gcc -c cmp.c cmp.c: In function ‘cmp’: cmp.c:7:12: error: invalid operands to binary == (have ‘const struct foo’ and ‘const struct foo’) What happens if there's padding between the jit_context and variant members of struct lp_rast_state? I seem to recall hitting similar issues a number of times in the past. I recall that as well, but my memory is the other way around: struct assignment is considered harmful because it breaks memcmp. Instead all structures should be initialized with memset(0) first, and always copied with memcpy. This should ensure that padding doesn't get clobbered. But now that you mention it, I'm still not 100% sure whether unused bits in bitfields are preserved like that or not when being assigned. We probably should ensure that all bits in bitfields are used, using reserved members, so that the zeros there stay zero. Jose /Thomas On 06/30/2011 03:36 AM, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no knowledge about alignment etc.) even when comparing only very few bytes (and loses BIG time for lots of bytes to compare). Oops. Well at least if the strings are the same (I'd guess if the first byte is different it's hard to beat the gcc builtin...). So this is really a gcc bug. 
The bug is quite old though with no fix in sight apparently so might need to think about some workaround (but just not doing the comparison doesn't look like the right idea, since apparently it would be faster with the comparison if gcc's memcmp got fixed). Roland On 30.06.2011 01:47, Roland Scheidegger wrote: I didn't even look at that, I was just curious why the memcmp (which is used a lot in other places) is slow. However, none of the other memcmp seem to show up prominently (cso functions are quite low in profiles, _mesa_search_program_cache uses memcmp too but it's not that high either). So I guess those functions either aren't called that often or the sizes they compare are small. So should maybe just file a gcc bug for memcmp and look at that particular llvmpipe issue again :-). Roland On 30.06.2011 01:16, Corbin Simpson wrote: Okay, so maybe I'm failing to recognize the exact situation here, but wouldn't it be possible to mark the FS state with a serial number and just compare those? Or are these FS states not CSO-cached? ~ C. On Wed, Jun 29, 2011 at 3:44 PM, Roland Scheidegger srol...@vmware.com wrote: Actually I ran some numbers here and tried out an optimized struct compare: original ipers: 12.1 fps ajax patch: 15.5 fps optimized struct compare: 16.8 fps This is the function I used for that (just enabled in that lp_setup function): static INLINE int util_cmp_struct(const void *src1, const void *src2, unsigned count) { /* hmm pointer casting is evil */ const uint32_t *src1_ptr = (const uint32_t *)src1; const uint32_t *src2_ptr = (const uint32_t *)src2; unsigned i; assert(count % 4 == 0); for (i = 0; i < count/4; i++) { if (*src1_ptr != *src2_ptr) { return 1; } src1_ptr++; src2_ptr++; } return 0; } (And no this doesn't use repz cmpsd here.) So, unless I made some mistake, memcmp is just dead slow (*), most of the slowness probably coming from the bytewise comparison (and apparently I was wrong in assuming the comparison there might never be the same for ipers). 
Of course, the optimized struct compare relies on structs really being dword aligned (I think this is always the case), and additionally it relies on the struct size being a whole multiple of dwords - likely struct needs padding to ensure that (at least I don't think this is always guaranteed for all structs). But since memcmp is used extensively (cso for one) maybe some optimization along these lines might be worth it (though of course for small structs the win isn't going to be as big - and can't beat the repz cmps in code size...). Roland (*) I actually found some references that some implementations might be better: they don't just use repz cmpsb but split this up into parts which do dword (or even qword - well, for really large structs they could use SSE2) comparisons for the parts where it's possible and only byte comparisons for the remaining
Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup
- Original Message - On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no knowledge about alignment etc.) even when comparing only very few bytes (and loses BIG time for lots of bytes to compare). Oops. Well at least if the strings are the same (I'd guess if the first byte is different it's hard to beat the gcc builtin...). So this is really a gcc bug. The bug is quite old though with no fix in sight apparently so might need to think about some workaround (but just not doing the comparison doesn't look like the right idea, since apparently it would be faster with the comparison if gcc's memcmp got fixed). Looking at the struct again (it's been a while), it seems like it could be rearranged to be variable-sized and on average significantly smaller: struct lp_rast_state { struct lp_jit_context jit_context; struct lp_fragment_shader_variant *variant; }; struct lp_jit_context { const float *constants; float alpha_ref_value; uint32_t stencil_ref_front, stencil_ref_back; uint8_t *blend_color; struct lp_jit_texture textures[PIPE_MAX_SAMPLERS]; }; If we moved the jit_context part behind variant, we could then hopefully take advantage of the fact that most of those lp_jit_texture structs are not in use. That would save time on the memcmp *and* space in the binned data. Yeah, sounds like a good idea. But there's some subtlety to computing the number of textures: it can't just skip the NULL textures, because they may be referred to by the JIT code, which has no NULL checks and relies on the state setup to provide storage for all textures, or dummy memory if one is not bound. 
I think a better idea would be: - split the texture/sampler state - to make the lp_jit_context::textures an array of pointers, and put the struct lp_jit_texture in the pipe_texture objects themselves - to make the lp_jit_context::samplers an array of pointers, and put the struct lp_jit_sampler in the pipe_sampler_state CSO struct lp_jit_context { struct lp_jit_texture *textures[PIPE_MAX_SAMPLERS]; struct lp_jit_sampler *samplers[PIPE_MAX_SAMPLERS]; }; struct lp_jit_texture { uint32_t width; uint32_t height; uint32_t depth; uint32_t first_level; uint32_t last_level; uint32_t row_stride[LP_MAX_TEXTURE_LEVELS]; uint32_t img_stride[LP_MAX_TEXTURE_LEVELS]; const void *data[LP_MAX_TEXTURE_LEVELS]; /* sampler state, actually */ float min_lod; float max_lod; float lod_bias; float border_color[4]; }; struct lp_jit_sampler { float min_lod; float max_lod; float lod_bias; float border_color[4]; }; Jose
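Because the JIT code has no NULL checks, the pointer-array layout still has to give every slot valid storage. A minimal sketch of that setup step, assuming a shared static dummy texture for unbound slots; the `sketch_*` types and `setup_jit_textures()` are hypothetical simplifications, not llvmpipe's code.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SAMPLERS 16   /* stand-in for PIPE_MAX_SAMPLERS */

/* Simplified stand-in for struct lp_jit_texture. */
struct sketch_jit_texture {
   uint32_t width, height, depth, last_level;
   const void *data;
};

/* Pointer-array layout: each slot points at storage that lives with the
 * texture object rather than inside the context. */
struct sketch_jit_context {
   const struct sketch_jit_texture *textures[SKETCH_MAX_SAMPLERS];
};

/* 1x1x1 dummy so JIT code, which does no NULL checks, always has something
 * valid to read from. */
static const uint32_t dummy_texel = 0;
static const struct sketch_jit_texture dummy_texture = { 1, 1, 1, 0, &dummy_texel };

/* Fill every slot: bound textures point at their own storage, unbound or
 * out-of-range slots point at the dummy. */
static void setup_jit_textures(struct sketch_jit_context *ctx,
                               const struct sketch_jit_texture *const bound[],
                               unsigned num_bound)
{
   unsigned i;

   for (i = 0; i < SKETCH_MAX_SAMPLERS; i++)
      ctx->textures[i] = (i < num_bound && bound[i]) ? bound[i] : &dummy_texture;
}
```

With this layout, comparing two contexts means comparing pointers rather than whole texture structs, and the per-texture data is no longer copied into every binned state.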
Re: [Mesa-dev] [PATCH 1/3] intel: Recognize GL_DEPTH_COMPONENT32 in get_teximage_readbuffer.
On Thu, Jun 30, 2011 at 12:04 AM, Kenneth Graunke kenn...@whitecape.org wrote: gl_texture_image::InternalFormat is actually the user requested internal format, not what the texture actually is. Thus, even though we don't support 32-bit depth buffers, we need to recognize the enumeration here. Otherwise, it wrongly returns the color read buffer instead of the depth read buffer. Fixes an issue in PlaneShift 0.5.7 when casting spells. The game calls CopyTexSubImage2D on buffers with a GL_DEPTH_COMPONENT32 internal format, which (prior to this patch) resulted in an attempt to copy an ARGB to S8_Z24. This patch fixes the behavior, but does not yet eliminate the software fallback. NOTE: This is a candidate for the 7.10 and 7.11 branches. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/intel/intel_tex_copy.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) I kind of wonder if we should just be using TexFormat (the actual format) rather than InternalFormat (the user requested format). diff --git a/src/mesa/drivers/dri/intel/intel_tex_copy.c b/src/mesa/drivers/dri/intel/intel_tex_copy.c index eda07a4..8b5c3f0 100644 --- a/src/mesa/drivers/dri/intel/intel_tex_copy.c +++ b/src/mesa/drivers/dri/intel/intel_tex_copy.c @@ -58,6 +58,7 @@ get_teximage_readbuffer(struct intel_context *intel, GLenum internalFormat) switch (internalFormat) { case GL_DEPTH_COMPONENT: case GL_DEPTH_COMPONENT16: + case GL_DEPTH_COMPONENT32: case GL_DEPTH24_STENCIL8_EXT: case GL_DEPTH_STENCIL_EXT: return intel_get_renderbuffer(intel->ctx.ReadBuffer, BUFFER_DEPTH); In the interest of covering all current and future depth formats, you could replace the switch with a call to _mesa_is_depth_format() || _mesa_is_depthstencil_format(). Or don't use internalFormat at all: query _mesa_get_format_bits(texImage->TexFormat, GL_DEPTH_BITS) > 0. 
In fact, any place where we're doing a switch on a texture/image format, we should check whether the job can be done better with a call to a format predicate function. -Brian
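A minimal sketch of the predicate approach, assuming a standalone helper that plays the role of `_mesa_is_depth_format() || _mesa_is_depthstencil_format()`; `is_depth_or_depthstencil()` and `choose_read_buffer()` are hypothetical and not Mesa's implementations. The enum values are the standard OpenGL ones, defined locally only so the sketch compiles without GL headers. Adding a format (for example the 32F depth formats) in one place then fixes every caller.

```c
#include <stdbool.h>

/* Standard OpenGL enum values, defined here only so the sketch compiles
 * without GL headers. */
#ifndef GL_DEPTH_COMPONENT
typedef unsigned int GLenum;
#define GL_DEPTH_COMPONENT       0x1902
#define GL_DEPTH_COMPONENT16     0x81A5
#define GL_DEPTH_COMPONENT24     0x81A6
#define GL_DEPTH_COMPONENT32     0x81A7
#define GL_DEPTH_STENCIL_EXT     0x84F9
#define GL_DEPTH24_STENCIL8_EXT  0x88F0
#define GL_DEPTH_COMPONENT32F    0x8CAC
#define GL_DEPTH32F_STENCIL8     0x8CAD
#define GL_RGBA8                 0x8058
#endif

/* Hypothetical single predicate covering depth and depth/stencil formats. */
static bool is_depth_or_depthstencil(GLenum internalFormat)
{
   switch (internalFormat) {
   case GL_DEPTH_COMPONENT:
   case GL_DEPTH_COMPONENT16:
   case GL_DEPTH_COMPONENT24:
   case GL_DEPTH_COMPONENT32:
   case GL_DEPTH_COMPONENT32F:
   case GL_DEPTH_STENCIL_EXT:
   case GL_DEPTH24_STENCIL8_EXT:
   case GL_DEPTH32F_STENCIL8:
      return true;
   default:
      return false;
   }
}

/* Caller side: choose the depth read buffer for any depth format instead of
 * maintaining its own list of case labels. */
enum read_buffer_kind { READ_COLOR, READ_DEPTH };

static enum read_buffer_kind choose_read_buffer(GLenum internalFormat)
{
   return is_depth_or_depthstencil(internalFormat) ? READ_DEPTH : READ_COLOR;
}
```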
[Mesa-dev] [Bug 37177] Mathematica Plot3D Crash
https://bugs.freedesktop.org/show_bug.cgi?id=37177 --- Comment #3 from Mark van Rossum mvanr...@inf.ed.ac.uk 2011-06-30 05:34:57 PDT --- Confirmed for me on Fedora 15 with Mathematica 7 on VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c). -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup
On Thu, 2011-06-30 at 03:27 -0700, Jose Fonseca wrote: - Original Message - On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no knowledge about alignment etc.) even when comparing only very few bytes (and loses BIG time for lots of bytes to compare). Oops. Well at least if the strings are the same (I'd guess if the first byte is different it's hard to beat the gcc builtin...). So this is really a gcc bug. The bug is quite old though with no fix in sight apparently so might need to think about some workaround (but just not doing the comparison doesn't look like the right idea, since apparently it would be faster with the comparison if gcc's memcmp got fixed). Looking at the struct again (it's been a while), it seems like it could be rearranged to be variable-sized and on average significantly smaller: struct lp_rast_state { struct lp_jit_context jit_context; struct lp_fragment_shader_variant *variant; }; struct lp_jit_context { const float *constants; float alpha_ref_value; uint32_t stencil_ref_front, stencil_ref_back; uint8_t *blend_color; struct lp_jit_texture textures[PIPE_MAX_SAMPLERS]; }; If we moved the jit_context part behind variant, we could then hopefully take advantage of the fact that most of those lp_jit_texture structs are not in use. That would save time on the memcmp *and* space in the binned data. Yeah, sounds like a good idea. But there's some subtlety to computing the number of textures: it can't just skip the NULL textures, because they may be referred to by the JIT code, which has no NULL checks and relies on the state setup to provide storage for all textures, or dummy memory if one is not bound. So it's a property of the variant, right? We should just store that information when we generate the llvm variant.
I think a better idea would be: - split the texture/sampler state - to make the lp_jit_context::textures an array of pointers, and put the struct lp_jit_texture in the pipe_texture objects themselves - to make the lp_jit_context::samplers an array of pointers, and put the struct lp_jit_sampler in the pipe_sampler_state CSO I like this too - it's somewhat more involved of course. In fact the two are orthogonal -- the struct below can still be shrunk significantly by knowing how many samplers and textures the variant refers to. Interleaving them or packing them would reduce the bytes to be compared. Alternatively there could be just a pointer in jit_context to textures/samplers binned elsewhere. struct lp_jit_context { struct lp_jit_texture *textures[PIPE_MAX_SAMPLERS]; struct lp_jit_sampler *samplers[PIPE_MAX_SAMPLERS]; }; The jit context above seems to have lost some of its fields... The next step might be to split the context into four parts: textures, samplers, constants, other, and have jit_context just be a set of pointers into the binned data: struct lp_jit_context { struct lp_jit_texture **textures; struct lp_jit_sampler **samplers; const float *constants; const struct lp_jit_other *other; }; struct lp_jit_other { float alpha_ref_value; uint32_t stencil_ref_front; uint32_t stencil_ref_back; uint8_t *blend_color; }; struct lp_jit_texture { uint32_t width; uint32_t height; uint32_t depth; uint32_t first_level; uint32_t last_level; uint32_t row_stride[LP_MAX_TEXTURE_LEVELS]; uint32_t img_stride[LP_MAX_TEXTURE_LEVELS]; const void *data[LP_MAX_TEXTURE_LEVELS]; /* sampler state, actually */ float min_lod; float max_lod; float lod_bias; float border_color[4]; }; struct lp_jit_sampler { float min_lod; float max_lod; float lod_bias; float border_color[4]; }; Jose
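If each of the four parts is binned once and reused while it is unchanged, the state check collapses to four pointer comparisons. A minimal sketch of that comparison, mirroring the proposed layout with hypothetical `sketch_*` stand-ins (this is not llvmpipe code). Pointer equality is sufficient but not necessary: re-binning identical content under a new address only costs a redundant state, never a wrong one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-ins; only their addresses matter for the comparison below. */
struct sketch_texture;
struct sketch_sampler;

struct sketch_jit_other {
   float alpha_ref_value;
   uint32_t stencil_ref_front;
   uint32_t stencil_ref_back;
   uint8_t *blend_color;
};

/* Mirrors the proposed layout: the context is just four pointers into binned data. */
struct sketch_jit_context {
   struct sketch_texture **textures;
   struct sketch_sampler **samplers;
   const float *constants;
   const struct sketch_jit_other *other;
};

/* Four pointer comparisons instead of a byte-wise compare of every field. */
static bool jit_context_same(const struct sketch_jit_context *a,
                             const struct sketch_jit_context *b)
{
   return a->textures == b->textures &&
          a->samplers == b->samplers &&
          a->constants == b->constants &&
          a->other == b->other;
}
```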
Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no knowledge about alignment etc.) even when comparing only very few bytes (and loses BIG time for lots of bytes to compare). Oops. Well at least if the strings are the same (I'd guess if the first byte is different it's hard to beat the gcc builtin...). So this is really a gcc bug. The bug is quite old though with no fix in sight apparently so might need to think about some workaround (but just not doing the comparison doesn't look like the right idea, since apparently it would be faster with the comparison if gcc's memcmp got fixed). How do things fare if you build with -fno-builtin-memcmp? - ajax
Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup
On 30.06.2011 12:14, Jose Fonseca wrote: - Original Message - Hmm. Forgive my ignorance, but isn't memcmp() on structs pretty prone to give incorrect != results, given that there may be padding between members in structs and that IIRC gcc struct assignment is member-wise. There's no alternative to bitwise comparison in C: $ cat cmp.c struct foo { int a; int b; }; int cmp(const struct foo *a, const struct foo *b) { return *a == *b; } $ gcc -c cmp.c cmp.c: In function ‘cmp’: cmp.c:7:12: error: invalid operands to binary == (have ‘const struct foo’ and ‘const struct foo’) What happens if there's padding between the jit_context and variant members of struct lp_rast_state? I seem to recall hitting similar issues a number of times in the past. I recall that as well, but my memory is the other way around: struct assignment is considered harmful because it breaks memcmp. Instead all structures should be initialized with memset(0) first, and always copied with memcpy. This should ensure that padding doesn't get clobbered. But now that you mention it, I'm still not 100% sure whether unused bits in bitfields are preserved like that or not when being assigned. We probably should ensure that all bits in bitfields are used, using reserved members, so that the zeros there stay zero. We've definitely hit issues like that in the past - I think if you use struct assignment you'll need to initialize the dst struct to 0 initially (but only once - even though the padding is probably undefined after such an assignment, any implementation should either copy everything including the padding or leave padding alone). I don't think anything else will touch the unused parts, though I guess it might be possible for instance if a 32bit int is assigned to a 16 bit bitfield which has padding after it. But generally using memcmp/memcpy should work ok, and it gives the compiler all the information it needs to do it fast. Whether it actually uses it is another question... 
I think the problem with gcc is that when it inserts the comparisons with repz cmpsb it knows alignment and size to compare but doesn't know the result is only used as an equality test - that makes it impossible to generate really optimized code there (as you need some byte comparison to get correct memcmp semantics unless you use bswap or do dword comparison then byte comparison on a non-match, both of which are probably slower if you expect the comparison to fail on first byte which is the case for instance in substring searches) and later it can't optimize that into something more sensible. So this might not be trivial to fix in gcc. Too bad no builtin is available which only returns true/false... Roland
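A minimal sketch of both points from this exchange, assuming plain C with no compiler builtins. Objects are zero-filled with memset before members are set, so padding bytes start out as zeros. Then an equality-only helper walks 32-bit words and falls back to bytes for any tail, so neither a dword-multiple size nor aligned pointers are required. `bytes_equal()`, `struct sketch_state` and `sketch_state_init()` are hypothetical names. Compilers in practice leave padding alone on member stores, but the C standard leaves padding bytes unspecified after a store, so this is a convention rather than a guarantee.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Example struct with padding: on common ABIs there are 3 bytes after 'flag'. */
struct sketch_state {
   uint8_t flag;
   uint32_t value;
};

/* Equality-only comparison: walks 32-bit words, then any tail bytes.
 * memcpy is used for the loads so no aliasing or alignment assumption is made
 * (compilers typically turn these into plain loads). Unlike memcmp it reports
 * no ordering, only whether all 'size' bytes match. */
static bool bytes_equal(const void *a, const void *b, size_t size)
{
   const unsigned char *pa = a, *pb = b;
   size_t i = 0;

   for (; i + sizeof(uint32_t) <= size; i += sizeof(uint32_t)) {
      uint32_t wa, wb;
      memcpy(&wa, pa + i, sizeof wa);
      memcpy(&wb, pb + i, sizeof wb);
      if (wa != wb)
         return false;
   }
   for (; i < size; i++) {
      if (pa[i] != pb[i])
         return false;
   }
   return true;
}

/* Zero the whole object (padding included) before setting members, so a
 * byte-wise comparison reflects only the member values. */
static void sketch_state_init(struct sketch_state *s, uint8_t flag, uint32_t value)
{
   memset(s, 0, sizeof *s);
   s->flag = flag;
   s->value = value;
}
```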
[Mesa-dev] [Bug 37177] Mathematica Plot3D Crash
https://bugs.freedesktop.org/show_bug.cgi?id=37177 --- Comment #4 from Ivan Iakoupov vox...@gmail.com 2011-06-30 07:43:07 PDT --- I did a bisect and it worked prior to this commit: commit dea5e57861ec998cb7ee913a8819752cb9fa946b Author: Eric Anholt Date: Mon Feb 14 18:57:49 2011 -0800 intel: Use the current context rather than last bound context for a drawable. If another thread bound a context to the drawable then unbound it, the driContextPriv would end up NULL. With the previous two fixes, this fixes glx-multithread-makecurrent-2, despite the issue not being about the multithreaded makecurrent. Currently I also get a full gpu freeze when opening a document with a Plot3D with mesa master so this bug gets hidden by that one. I'll try bisecting that freeze and file a new bug report.
Re: [Mesa-dev] [PATCH] r600g: fix check for empty cs
On Wed, 2011-06-29 at 16:29 +0400, Vadim Girlin wrote: --- There is a ~20% fps boost with the etqw timenetdemo after this patch. Btw, it seems that the bad performance without that patch is caused by the occlusion queries, so probably they need some fixes too. I've added some counters to check it and found that there are thousands of redundant calls to r600_context_flush per second (or ~97% of all flushes), which are skipped with that patch. And there are no such flushes at all when I set r_useOcclusionQueries to 0 in the etqw settings. Default for r_useOcclusionQueries is 1 for me, but it seems the defaults may be different for others. There are probably unneeded flushes, for example in r600_get_query_result. It calls ctx->flush, then r600_context_query_result, which also calls r600_context_flush. Then it calls r600_query_result -> r600_bo_map -> one more possible flush. Are all of these flush calls really needed? Vadim
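A minimal sketch of the early-out that makes back-to-back flushes cheap, assuming a command stream that tracks how many dwords were emitted since the last submission. `struct sketch_cs` and `sketch_flush()` are hypothetical and not the r600 winsys structures; the counters echo the instrumentation Vadim describes.

```c
#include <stdbool.h>

/* Hypothetical command-stream state; not the real r600 winsys structures. */
struct sketch_cs {
   unsigned cdw;              /* dwords emitted since the last flush */
   unsigned flushes_issued;
   unsigned flushes_skipped;
};

/* Submit only when there is something to submit. Returns true if a flush happened. */
static bool sketch_flush(struct sketch_cs *cs)
{
   if (cs->cdw == 0) {
      cs->flushes_skipped++;
      return false;
   }
   /* ... submission to the kernel would go here ... */
   cs->cdw = 0;
   cs->flushes_issued++;
   return true;
}
```

With such a check, a chain like ctx->flush followed by r600_context_flush only submits once; the second call finds an empty stream and returns immediately.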
Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup
On 30.06.2011 16:14, Adam Jackson wrote: On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no knowledge about alignment etc.) even when comparing only very few bytes (and loses BIG time for lots of bytes to compare). Oops. Well at least if the strings are the same (I'd guess if the first byte is different it's hard to beat the gcc builtin...). So this is really a gcc bug. The bug is quite old though with no fix in sight apparently so might need to think about some workaround (but just not doing the comparison doesn't look like the right idea, since apparently it would be faster with the comparison if gcc's memcmp got fixed). How do things fare if you build with -fno-builtin-memcmp? This is even faster: original ipers: 12.1 fps ajax patch: 15.5 fps optimized struct compare: 16.8 fps -fno-builtin-memcmp: 18.1 fps Looks like we have a winner :-) I guess glibc optimizes the hell out of it (in contrast to the other results, this affected all memcmp, though I don't know if any others benefited from that on average). As noted by Keith, though, the struct we compare is really large (over 4k), so trimming the size might be a good idea anyway (of course the 4k size also meant any call overhead and non-optimal code due to glibc not knowing alignment beforehand and usage of return value is completely insignificant). A 50% improvement from disabling a compiler optimization, lol. Roland
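For reference, here are the frame rates from Roland's measurements expressed relative to the original 12.1 fps:

```latex
\frac{15.5}{12.1} \approx 1.28, \qquad \frac{16.8}{12.1} \approx 1.39, \qquad \frac{18.1}{12.1} \approx 1.50
```

Building with -fno-builtin-memcmp therefore gives roughly a 50% gain over the original, consistent with the figure quoted. It is also about 17% faster than Adam's patch (18.1 / 15.5 ≈ 1.17), even though that patch skipped the comparison entirely.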
Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup
On Thu, 2011-06-30 at 17:53 +0200, Roland Scheidegger wrote: On 30.06.2011 16:14, Adam Jackson wrote: On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no knowledge about alignment etc.) even when comparing only very few bytes (and loses BIG time for lots of bytes to compare). Oops. Well at least if the strings are the same (I'd guess if the first byte is different it's hard to beat the gcc builtin...). So this is really a gcc bug. The bug is quite old though with no fix in sight apparently so might need to think about some workaround (but just not doing the comparison doesn't look like the right idea, since apparently it would be faster with the comparison if gcc's memcmp got fixed). How do things fare if you build with -fno-builtin-memcmp? This is even faster: original ipers: 12.1 fps ajax patch: 15.5 fps optimized struct compare: 16.8 fps -fno-builtin-memcmp: 18.1 fps Looks like we have a winner :-) I guess glibc optimizes the hell out of it (in contrast to the other results, this affected all memcmp, though I don't know if any others benefited from that on average). As noted by Keith, though, the struct we compare is really large (over 4k), so trimming the size might be a good idea anyway (of course the 4k size also meant any call overhead and non-optimal code due to glibc not knowing alignment beforehand and usage of return value is completely insignificant). A 50% improvement from disabling a compiler optimization, lol. We probably want this everywhere throughout Mesa and Gallium... Keith
Re: [Mesa-dev] [PATCH 1/3] intel: Recognize GL_DEPTH_COMPONENT32 in get_teximage_readbuffer.
On Thu, 30 Jun 2011 06:28:13 -0600, Brian Paul brian.e.p...@gmail.com wrote: On Thu, Jun 30, 2011 at 12:04 AM, Kenneth Graunke kenn...@whitecape.org wrote: gl_texture_image::InternalFormat is actually the user requested internal format, not what the texture actually is. Thus, even though we don't support 32-bit depth buffers, we need to recognize the enumeration here. Otherwise, it wrongly returns the color read buffer instead of the depth read buffer. Fixes an issue in PlaneShift 0.5.7 when casting spells. The game calls CopyTexSubImage2D on buffers with a GL_DEPTH_COMPONENT32 internal format, which (prior to this patch) resulted in an attempt to copy an ARGB to S8_Z24. This patch fixes the behavior, but does not yet eliminate the software fallback. NOTE: This is a candidate for the 7.10 and 7.11 branches. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/intel/intel_tex_copy.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) I kind of wonder if we should just be using TexFormat (the actual format) rather than InternalFormat (the user requested format). diff --git a/src/mesa/drivers/dri/intel/intel_tex_copy.c b/src/mesa/drivers/dri/intel/intel_tex_copy.c index eda07a4..8b5c3f0 100644 --- a/src/mesa/drivers/dri/intel/intel_tex_copy.c +++ b/src/mesa/drivers/dri/intel/intel_tex_copy.c @@ -58,6 +58,7 @@ get_teximage_readbuffer(struct intel_context *intel, GLenum internalFormat) switch (internalFormat) { case GL_DEPTH_COMPONENT: case GL_DEPTH_COMPONENT16: + case GL_DEPTH_COMPONENT32: case GL_DEPTH24_STENCIL8_EXT: case GL_DEPTH_STENCIL_EXT: return intel_get_renderbuffer(intel->ctx.ReadBuffer, BUFFER_DEPTH); In the interest of covering all current and future depth formats, you could replace the switch with a call to _mesa_is_depth_format() || _mesa_is_depthstencil_format(). Or don't use internalFormat at all: query _mesa_get_format_bits(texImage->TexFormat, GL_DEPTH_BITS) > 0. 
internalFormat in this case is supposed to determine what is copied, so
we have to look at it, not the (current, if any) texture format.  So, I
think _mesa_is_*_format() are going to be the right way to go.

Also, we've got separate stencil issues in this path we need to look
into.
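The thread converges on classifying the *requested* internal format through one shared predicate instead of a driver-local list of enums. A minimal, self-contained sketch of that idea: `is_depth_or_depthstencil()`, `pick_read_buffer()` and `enum read_buffer` are hypothetical names standing in for Mesa's `_mesa_is_depth_format()`/`_mesa_is_depthstencil_format()` and `intel_get_renderbuffer()`; only the GL token values are taken from the GL headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Standard GL token values for the formats discussed in this thread. */
#define GL_DEPTH_COMPONENT     0x1902
#define GL_RGBA8               0x8058
#define GL_DEPTH_COMPONENT16   0x81A5
#define GL_DEPTH_COMPONENT24   0x81A6
#define GL_DEPTH_COMPONENT32   0x81A7
#define GL_DEPTH_STENCIL       0x84F9
#define GL_DEPTH24_STENCIL8    0x88F0
#define GL_DEPTH_COMPONENT32F  0x8CAC
#define GL_DEPTH32F_STENCIL8   0x8CAD

enum read_buffer { READ_COLOR, READ_DEPTH };

/* Hypothetical stand-in for
 * _mesa_is_depth_format() || _mesa_is_depthstencil_format():
 * one shared predicate instead of a per-driver switch. */
static bool is_depth_or_depthstencil(unsigned internal_format)
{
   switch (internal_format) {
   case GL_DEPTH_COMPONENT:
   case GL_DEPTH_COMPONENT16:
   case GL_DEPTH_COMPONENT24:
   case GL_DEPTH_COMPONENT32:
   case GL_DEPTH_COMPONENT32F:
   case GL_DEPTH_STENCIL:
   case GL_DEPTH24_STENCIL8:
   case GL_DEPTH32F_STENCIL8:
      return true;
   default:
      return false;
   }
}

/* Pick the source buffer from the requested internal format, which is
 * what decides what CopyTexSubImage copies. */
static enum read_buffer pick_read_buffer(unsigned internal_format)
{
   return is_depth_or_depthstencil(internal_format) ? READ_DEPTH : READ_COLOR;
}
```

Because the predicate lives in one place, a format such as GL_DEPTH_COMPONENT32F only has to be added once rather than in every driver's copy path.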
[Mesa-dev] [Bug 38842] New: Various valid GLX attributes are rejected by MESA glxChooseFBConfig
https://bugs.freedesktop.org/show_bug.cgi?id=38842

           Summary: Various valid GLX attributes are rejected by MESA
                    glxChooseFBConfig
           Product: Mesa
           Version: 7.10
          Platform: x86 (IA32)
        OS/Version: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Drivers/X11
        AssignedTo: mesa-dev@lists.freedesktop.org
        ReportedBy: jonathan.kirk...@arm.com

The following configuration attributes, which are listed as valid in the GLX
specification, are rejected by the Mesa implementation of glXChooseFBConfig:

GLX_MAX_PBUFFER_WIDTH
GLX_MAX_PBUFFER_HEIGHT
GLX_MAX_PBUFFER_PIXELS
GLX_VISUAL_ID
GLX_X_VISUAL_TYPE

The rejection occurs within the choose_visual() function in fakeglx.c (called
from glXChooseFBConfig).
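One way to address the report is for the attribute parser to consume and ignore attributes it recognizes but does not use for matching, and to fail only on names it has never heard of. A minimal sketch under that assumption: `parse_fbconfig_attribs()` is hypothetical and is not the actual `choose_visual()` code in fakeglx.c; the token values are the standard ones from `<GL/glx.h>`, and the sketch relies on glXChooseFBConfig attribute lists being name/value pairs terminated by None.

```c
#include <assert.h>
#include <stdbool.h>

/* Token values from <GL/glx.h>; the list ends with X11's None (0). */
#define ATTRIB_LIST_END          0
#define GLX_DOUBLEBUFFER         5
#define GLX_X_VISUAL_TYPE        0x22
#define GLX_VISUAL_ID            0x800B
#define GLX_MAX_PBUFFER_WIDTH    0x8016
#define GLX_MAX_PBUFFER_HEIGHT   0x8017
#define GLX_MAX_PBUFFER_PIXELS   0x8018

/* Hypothetical parser: every attribute is a name/value pair.  Known but
 * unused attributes are skipped along with their value; only unknown
 * names make the whole request fail. */
static bool parse_fbconfig_attribs(const int *attribs, bool *double_buffer)
{
   *double_buffer = false;
   while (attribs[0] != ATTRIB_LIST_END) {
      switch (attribs[0]) {
      case GLX_DOUBLEBUFFER:
         *double_buffer = (attribs[1] != 0);
         break;
      case GLX_X_VISUAL_TYPE:
      case GLX_VISUAL_ID:
      case GLX_MAX_PBUFFER_WIDTH:
      case GLX_MAX_PBUFFER_HEIGHT:
      case GLX_MAX_PBUFFER_PIXELS:
         /* Valid per the GLX spec; not used for matching in this sketch. */
         break;
      default:
         return false;
      }
      attribs += 2;
   }
   return true;
}
```

A real fix would still need to honor GLX_X_VISUAL_TYPE and GLX_VISUAL_ID during matching; the sketch only shows the difference between "not used" and "rejected".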
[Mesa-dev] mesa master: commit 2699fce0d69db5158427c8b6c8194b2eefc5e58b
Commit 2699fce0d69db5158427c8b6c8194b2eefc5e58b:

The first chunk (/common.py) looks really strange.
[Mesa-dev] [PATCH 1/2] st/mesa: use the first non-VOID channel in st_format_datatype
Otherwise PIPE_FORMAT_X8B8G8R8_UNORM and friends would fail.

NOTE: This is a candidate for the 7.10 and 7.11 branches.
---
 src/mesa/state_tracker/st_format.c |   19 ---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/src/mesa/state_tracker/st_format.c b/src/mesa/state_tracker/st_format.c
index fa5d8f5..3260297 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -68,10 +68,18 @@ GLenum
 st_format_datatype(enum pipe_format format)
 {
    const struct util_format_description *desc;
+   int i;
 
    desc = util_format_description(format);
    assert(desc);
 
+   /* Find the first non-VOID channel. */
+   for (i = 0; i < 4; i++) {
+       if (desc->channel[i].type != UTIL_FORMAT_TYPE_VOID) {
+           break;
+       }
+   }
+
    if (desc->layout == UTIL_FORMAT_LAYOUT_PLAIN) {
       if (format == PIPE_FORMAT_B5G5R5A1_UNORM ||
           format == PIPE_FORMAT_B5G6R5_UNORM) {
@@ -85,21 +93,26 @@ st_format_datatype(enum pipe_format format)
       }
       else {
          const GLuint size = format_max_bits(format);
+
+         assert(i < 4);
+         if (i == 4)
+            return GL_NONE;
+
          if (size == 8) {
-            if (desc->channel[0].type == UTIL_FORMAT_TYPE_UNSIGNED)
+            if (desc->channel[i].type == UTIL_FORMAT_TYPE_UNSIGNED)
                return GL_UNSIGNED_BYTE;
             else
                return GL_BYTE;
          }
          else if (size == 16) {
-            if (desc->channel[0].type == UTIL_FORMAT_TYPE_UNSIGNED)
+            if (desc->channel[i].type == UTIL_FORMAT_TYPE_UNSIGNED)
                return GL_UNSIGNED_SHORT;
             else
                return GL_SHORT;
          }
          else {
             assert( size <= 32 );
-            if (desc->channel[0].type == UTIL_FORMAT_TYPE_UNSIGNED)
+            if (desc->channel[i].type == UTIL_FORMAT_TYPE_UNSIGNED)
                return GL_UNSIGNED_INT;
             else
                return GL_INT;
-- 
1.7.4.1
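The fix boils down to scanning the channel descriptors for the first one that is not padding before looking at its type. A simplified sketch, assuming a stripped-down `struct format_desc` (hypothetical, not gallium's `util_format_description`) in which channel 0 of an X8B8G8R8-style format is VOID:

```c
#include <assert.h>

/* Simplified stand-ins for gallium's per-channel format description. */
enum chan_type { CHAN_VOID, CHAN_UNSIGNED, CHAN_SIGNED, CHAN_FLOAT };

struct format_desc {
   enum chan_type channel[4];   /* channels in memory order */
};

/* Index of the first channel that carries data, or 4 if all four are
 * padding; callers must treat 4 as "no usable channel". */
static int first_non_void_channel(const struct format_desc *desc)
{
   int i;
   for (i = 0; i < 4; i++) {
      if (desc->channel[i] != CHAN_VOID)
         break;
   }
   return i;
}
```

Checking `channel[0]` unconditionally would report VOID for the X8B8G8R8 case, which is why the patch switches every `channel[0]` lookup to `channel[i]`.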
[Mesa-dev] [PATCH 2/2] st/mesa: handle float formats in st_format_datatype
NOTE: This is a candidate for the 7.11 branch.
---
 src/mesa/state_tracker/st_format.c |   16 ++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_format.c b/src/mesa/state_tracker/st_format.c
index 3260297..d1995f1 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -85,6 +85,10 @@ st_format_datatype(enum pipe_format format)
           format == PIPE_FORMAT_B5G6R5_UNORM) {
          return GL_UNSIGNED_SHORT;
       }
+      else if (format == PIPE_FORMAT_R11G11B10_FLOAT ||
+               format == PIPE_FORMAT_R9G9B9E5_FLOAT) {
+         return GL_FLOAT;
+      }
       else if (format == PIPE_FORMAT_Z24_UNORM_S8_USCALED ||
                format == PIPE_FORMAT_S8_USCALED_Z24_UNORM ||
                format == PIPE_FORMAT_Z24X8_UNORM ||
@@ -105,18 +109,26 @@ st_format_datatype(enum pipe_format format)
                return GL_BYTE;
          }
          else if (size == 16) {
+            if (desc->channel[i].type == UTIL_FORMAT_TYPE_FLOAT)
+               return GL_HALF_FLOAT;
             if (desc->channel[i].type == UTIL_FORMAT_TYPE_UNSIGNED)
                return GL_UNSIGNED_SHORT;
             else
                return GL_SHORT;
          }
-         else {
-            assert( size <= 32 );
+         else if (size <= 32) {
+            if (desc->channel[i].type == UTIL_FORMAT_TYPE_FLOAT)
+               return GL_FLOAT;
             if (desc->channel[i].type == UTIL_FORMAT_TYPE_UNSIGNED)
                return GL_UNSIGNED_INT;
             else
                return GL_INT;
          }
+         else {
+            assert(size == 64);
+            assert(desc->channel[i].type == UTIL_FORMAT_TYPE_FLOAT);
+            return GL_DOUBLE;
+         }
       }
    }
    else if (format == PIPE_FORMAT_UYVY) {
-- 
1.7.4.1
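With the float cases added, the GL data type depends on both the widest channel size and the type of the first non-VOID channel. The mapping can be sketched as a pure function; `datatype_for()` and the `DT_*`/`CHAN_*` enums are hypothetical stand-ins for the GL types and UTIL_FORMAT_TYPE_* constants, and the special packed formats handled earlier in `st_format_datatype()` are out of scope:

```c
#include <assert.h>

/* Hypothetical stand-ins for UTIL_FORMAT_TYPE_* and the GL type enums. */
enum chan_type { CHAN_UNSIGNED, CHAN_SIGNED, CHAN_FLOAT };
enum datatype {
   DT_UBYTE, DT_BYTE, DT_USHORT, DT_SHORT, DT_HALF_FLOAT,
   DT_UINT, DT_INT, DT_FLOAT, DT_DOUBLE
};

/* Map (widest channel size, channel type) to a data type, following the
 * branch order of the patched st_format_datatype(). */
static enum datatype datatype_for(unsigned max_bits, enum chan_type type)
{
   if (max_bits == 8)
      return type == CHAN_UNSIGNED ? DT_UBYTE : DT_BYTE;
   if (max_bits == 16) {
      if (type == CHAN_FLOAT)
         return DT_HALF_FLOAT;
      return type == CHAN_UNSIGNED ? DT_USHORT : DT_SHORT;
   }
   if (max_bits <= 32) {
      if (type == CHAN_FLOAT)
         return DT_FLOAT;
      return type == CHAN_UNSIGNED ? DT_UINT : DT_INT;
   }
   /* Only 64-bit float channels are expected beyond 32 bits. */
   assert(max_bits == 64 && type == CHAN_FLOAT);
   return DT_DOUBLE;
}
```

As in the patch, the 8-bit branch has no float case, since there are no 8-bit float channel types to handle.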
[Mesa-dev] [PATCH] i965/gen7: Remove gratuitous dirty flags from WM and PS state.
Commit b46dc45ceef3deb17ba2b0b4300eeb93e9cf7833 claimed that
NEW_POLYGONSTIPPLE is gratuitous, but somehow just changed comments and
whitespace instead of actually removing the flag.

While we're at it, 3DSTATE_PS doesn't appear to need NEW_LINE or
NEW_POLYGON either (those are in 3DSTATE_WM).

Cc: Eric Anholt e...@anholt.net
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/gen7_wm_state.c |    7 ++-
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c b/src/mesa/drivers/dri/i965/gen7_wm_state.c
index 17f7535..0688963 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
@@ -138,7 +138,7 @@ upload_wm_state(struct brw_context *brw)
 
 const struct brw_tracked_state gen7_wm_state = {
    .dirty = {
-      .mesa  = (_NEW_LINE | _NEW_POLYGON | _NEW_POLYGONSTIPPLE |
+      .mesa  = (_NEW_LINE | _NEW_POLYGON
                 _NEW_COLOR | _NEW_BUFFERS),
       .brw   = (BRW_NEW_CURBE_OFFSETS |
                 BRW_NEW_FRAGMENT_PROGRAM |
@@ -240,10 +240,7 @@ upload_ps_state(struct brw_context *brw)
 
 const struct brw_tracked_state gen7_ps_state = {
    .dirty = {
-      .mesa  = (_NEW_LINE |
-                _NEW_POLYGON |
-                _NEW_POLYGONSTIPPLE |
-                _NEW_PROGRAM_CONSTANTS),
+      .mesa  = _NEW_PROGRAM_CONSTANTS,
       .brw   = (BRW_NEW_CURBE_OFFSETS |
                 BRW_NEW_FRAGMENT_PROGRAM |
                 BRW_NEW_NR_WM_SURFACES |
-- 
1.7.4.4
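Each tracked-state atom lists the state bits that should trigger re-emission of its packet, so a gratuitous flag makes the packet get re-uploaded whenever unrelated state changes. A simplified model of that check, assuming a single 32-bit mask and hypothetical `NEW_*` bit values rather than Mesa's real `_NEW_*` constants or the full `brw_tracked_state` structure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state bits, loosely modelled on Mesa's _NEW_* flags. */
#define NEW_LINE              (1u << 0)
#define NEW_POLYGON           (1u << 1)
#define NEW_POLYGONSTIPPLE    (1u << 2)
#define NEW_PROGRAM_CONSTANTS (1u << 3)

struct tracked_state {
   uint32_t mesa_dirty;   /* state changes this atom listens to */
};

/* An atom is re-emitted whenever any bit it listens to has changed. */
static bool needs_upload(const struct tracked_state *atom, uint32_t changed)
{
   return (atom->mesa_dirty & changed) != 0;
}
```

With the trimmed mask, a polygon-stipple change no longer forces 3DSTATE_PS to be re-emitted, while constant-buffer changes still do.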
[Mesa-dev] [PATCH 00/13] Floating-point depth buffers
Hi,

this patch series implements ARB_depth_buffer_float in Mesa and Gallium.
There is complete r600g/r600-r700 support in my private branch, which passes
the same tests that pass for Z24S8. Softpipe has only sampler support.

This has turned out to be not so trivial, so it's possible I missed
something.

I did not implement NV_depth_buffer_float, because it's not compatible with
the ARB variant (GL_DEPTH_COMPONENT32F != GL_DEPTH_COMPONENT32F_NV, etc.).
The NV extension can operate on unclamped depth values, whereas the ARB one
always clamps them.

Please review.

Marek Olšák (13):
  mesa: initial ARB_depth_buffer_float support
  mesa: implement texfetch functions for depth_buffer_float
  mesa: implement stencil unpacking for GL_FLOAT_32_UNSIGNED_INT_24_8_REV
  mesa: implement depth unpacking for GL_FLOAT_32_UNSIGNED_INT_24_8_REV
  mesa: implement texstore for DEPTH_COMPONENT32F
  mesa: implement texstore for DEPTH32F_STENCIL8
  mesa: implement generatemipmap for GL_FLOAT_32_UNSIGNED_INT_24_8_REV
  mesa: implement depth/stencil renderbuffer wrapper accessors for Z32F_X24S8
  st/mesa: initial ARB_depth_buffer_float support
  st/mesa: implement read/draw/copypixels for Z32F and Z32F_S8X24
  gallium/util: implement pack functions for Z32F and Z32F_S8X24
  gallium/util: implement software Z32F_S8X24 depth-stencil clear
  gallium/util: handle Z32F_FLOAT_S8X24_USCALED in pipe_tile_raw_to_rgba

 src/gallium/auxiliary/util/u_pack_color.h |   64 ++
 src/gallium/auxiliary/util/u_surface.c    |   35 +++-
 src/gallium/auxiliary/util/u_tile.c       |   35 +++
 src/mesa/main/depthstencil.c              |  322 +++--
 src/mesa/main/depthstencil.h              |    5 +
 src/mesa/main/fbobject.c                  |   19 ++
 src/mesa/main/formats.c                   |   29 +++
 src/mesa/main/formats.h                   |    3 +
 src/mesa/main/framebuffer.c               |   10 +-
 src/mesa/main/image.c                     |   18 ++-
 src/mesa/main/mipmap.c                    |   20 ++
 src/mesa/main/pack.c                      |   62 +-
 src/mesa/main/readpix.c                   |   29 +++-
 src/mesa/main/renderbuffer.c              |    3 +
 src/mesa/main/texfetch.c                  |   14 ++
 src/mesa/main/texfetch_tmp.h              |   23 ++
 src/mesa/main/texformat.c                 |   13 ++
 src/mesa/main/texstore.c                  |   79 +++-
 src/mesa/state_tracker/st_cb_clear.c      |    6 +-
 src/mesa/state_tracker/st_cb_drawpixels.c |   64 +-
 src/mesa/state_tracker/st_cb_readpixels.c |   43 
 src/mesa/state_tracker/st_extensions.c    |   11 +
 src/mesa/state_tracker/st_format.c        |   19 ++
 23 files changed, 875 insertions(+), 51 deletions(-)

Marek
[Mesa-dev] [PATCH 01/13] mesa: initial ARB_depth_buffer_float support
Using GL_NONE as DataType of Z32_FLOAT_X24S8, not sure what I should put
there. The spec says the type is n/a.
---
 src/mesa/main/fbobject.c     |   19 +++
 src/mesa/main/formats.c      |   29 +
 src/mesa/main/formats.h      |    3 +++
 src/mesa/main/image.c        |   18 --
 src/mesa/main/readpix.c      |   29 +
 src/mesa/main/renderbuffer.c |    3 +++
 src/mesa/main/texfetch.c     |   14 ++
 src/mesa/main/texformat.c    |   13 +
 src/mesa/main/texstore.c     |    3 +++
 9 files changed, 125 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
index 8cc3fd4..d094dd3 100644
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -1131,6 +1131,16 @@ _mesa_base_fbo_format(struct gl_context *ctx, GLenum internalFormat)
          return GL_DEPTH_STENCIL_EXT;
       else
          return 0;
+   case GL_DEPTH_COMPONENT32F:
+      if (ctx->Extensions.ARB_depth_buffer_float)
+         return GL_DEPTH_COMPONENT;
+      else
+         return 0;
+   case GL_DEPTH32F_STENCIL8:
+      if (ctx->Extensions.ARB_depth_buffer_float)
+         return GL_DEPTH_STENCIL;
+      else
+         return 0;
    case GL_RED:
    case GL_R8:
    case GL_R16:
@@ -2266,6 +2276,15 @@ _mesa_GetFramebufferAttachmentParameterivEXT(GLenum target, GLenum attachment,
             /* special cases */
             *params = GL_INDEX;
          }
+         else if (format == MESA_FORMAT_Z32_FLOAT_X24S8) {
+            /* depends on the attachment parameter */
+            if (attachment == GL_STENCIL_ATTACHMENT) {
+               *params = GL_INDEX;
+            }
+            else {
+               *params = GL_FLOAT;
+            }
+         }
          else {
             *params = _mesa_get_format_datatype(format);
          }
diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c
index e88ba43..f58b197 100644
--- a/src/mesa/main/formats.c
+++ b/src/mesa/main/formats.c
@@ -1091,6 +1091,25 @@ static struct gl_format_info format_info[MESA_FORMAT_COUNT] =
       0, 0, 0, 0, 0,
       1, 1, 4
    },
+   /* ARB_depth_buffer_float */
+   {
+      MESA_FORMAT_Z32_FLOAT,         /* Name */
+      "MESA_FORMAT_Z32_FLOAT",       /* StrName */
+      GL_DEPTH_COMPONENT,            /* BaseFormat */
+      GL_FLOAT,                      /* DataType */
+      0, 0, 0, 0,                    /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 32, 0,                /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 4                        /* BlockWidth/Height,Bytes */
+   },
+   {
+      MESA_FORMAT_Z32_FLOAT_X24S8,   /* Name */
+      "MESA_FORMAT_Z32_FLOAT_X24S8", /* StrName */
+      GL_DEPTH_STENCIL,              /* BaseFormat */
+      GL_NONE /* XXX */,             /* DataType */
+      0, 0, 0, 0,                    /* Red/Green/Blue/AlphaBits */
+      0, 0, 0, 32, 8,                /* Lum/Int/Index/Depth/StencilBits */
+      1, 1, 8                        /* BlockWidth/Height,Bytes */
+   },
 };
 
 
@@ -1654,6 +1673,16 @@ _mesa_format_to_type_and_comps(gl_format format,
       *comps = 1;
       return;
 
+   case MESA_FORMAT_Z32_FLOAT:
+      *datatype = GL_FLOAT;
+      *comps = 1;
+      return;
+
+   case MESA_FORMAT_Z32_FLOAT_X24S8:
+      *datatype = GL_FLOAT_32_UNSIGNED_INT_24_8_REV;
+      *comps = 1;
+      return;
+
    case MESA_FORMAT_DUDV8:
       *datatype = GL_BYTE;
       *comps = 2;
diff --git a/src/mesa/main/formats.h b/src/mesa/main/formats.h
index 0640bbc..5b8c017 100644
--- a/src/mesa/main/formats.h
+++ b/src/mesa/main/formats.h
@@ -209,6 +209,9 @@ typedef enum
    MESA_FORMAT_RGB9_E5_FLOAT,
    MESA_FORMAT_R11_G11_B10_FLOAT,
 
+   MESA_FORMAT_Z32_FLOAT,
+   MESA_FORMAT_Z32_FLOAT_X24S8,
+
    MESA_FORMAT_COUNT
 } gl_format;
 
diff --git a/src/mesa/main/image.c b/src/mesa/main/image.c
index 6d7bc73..37127dc 100644
--- a/src/mesa/main/image.c
+++ b/src/mesa/main/image.c
@@ -84,6 +84,7 @@ _mesa_type_is_packed(GLenum type)
    case GL_UNSIGNED_INT_24_8_EXT:
    case GL_UNSIGNED_INT_5_9_9_9_REV:
    case GL_UNSIGNED_INT_10F_11F_11F_REV:
+   case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
       return GL_TRUE;
    }
 
@@ -228,6 +229,8 @@ _mesa_sizeof_packed_type( GLenum type )
          return sizeof(GLuint);
       case GL_UNSIGNED_INT_10F_11F_11F_REV:
          return sizeof(GLuint);
+      case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
+         return 8;
       default:
          return -1;
    }
@@ -379,6 +382,11 @@ _mesa_bytes_per_pixel( GLenum format, GLenum type )
             return sizeof(GLuint);
          else
             return -1;
+      case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
+         if (format == GL_DEPTH_STENCIL)
+            return 8;
+         else
+            return -1;
       default:
          return -1;
    }
@@ -531,8 +539,10 @@ _mesa_is_legal_format_and_type(const struct gl_context *ctx,
             else
                return GL_FALSE;
          case GL_DEPTH_STENCIL_EXT:
-            if
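The new `format_info` entry gives MESA_FORMAT_Z32_FLOAT_X24S8 8 bytes per texel (32 depth bits, 8 stencil bits). The later patches in this series treat such a texel as two 32-bit words: the float depth first, then a word whose low 8 bits hold stencil. A sketch of that layout as a C struct; `struct z32f_x24s8` and its helpers are hypothetical and only illustrate the indexing, they are not part of Mesa:

```c
#include <assert.h>
#include <stdint.h>

/* One Z32_FLOAT_X24S8 texel: 4 bytes of float depth, then a dword with
 * 24 unused bits above an 8-bit stencil value in the low byte. */
struct z32f_x24s8 {
   float    depth;
   uint32_t x24s8;
};

static uint8_t texel_stencil(const struct z32f_x24s8 *t)
{
   return (uint8_t) (t->x24s8 & 0xff);
}

static void texel_set_stencil(struct z32f_x24s8 *t, uint8_t s)
{
   /* Replace only the stencil byte; the 24 padding bits are left as-is. */
   t->x24s8 = (t->x24s8 & ~0xffu) | s;
}
```

Viewed as an array of 32-bit words, texel `i` is then words `2*i` (depth) and `2*i + 1` (stencil), which matches the `[i*2]` / `[i*2+1]` indexing in the unpack and fetch code.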
[Mesa-dev] [PATCH 02/13] mesa: implement texfetch functions for depth_buffer_float
---
 src/mesa/main/texfetch.c     |   16 
 src/mesa/main/texfetch_tmp.h |   23 +++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/mesa/main/texfetch.c b/src/mesa/main/texfetch.c
index 4b85bc3..72283eb 100644
--- a/src/mesa/main/texfetch.c
+++ b/src/mesa/main/texfetch.c
@@ -916,17 +916,17 @@ texfetch_funcs[MESA_FORMAT_COUNT] =
    },
    {
       MESA_FORMAT_Z32_FLOAT,
-      NULL, /* XXX */
-      NULL,
-      NULL,
-      NULL
+      fetch_texel_1d_f_r_f32, /* Reuse the R32F functions. */
+      fetch_texel_2d_f_r_f32,
+      fetch_texel_3d_f_r_f32,
+      store_texel_r_f32
    },
    {
       MESA_FORMAT_Z32_FLOAT_X24S8,
-      NULL, /* XXX */
-      NULL,
-      NULL,
-      NULL
+      fetch_texel_1d_z32f_x24s8,
+      fetch_texel_2d_z32f_x24s8,
+      fetch_texel_3d_z32f_x24s8,
+      store_texel_z32f_x24s8
    }
 };
 
diff --git a/src/mesa/main/texfetch_tmp.h b/src/mesa/main/texfetch_tmp.h
index e6fd81d..3b1eedf 100644
--- a/src/mesa/main/texfetch_tmp.h
+++ b/src/mesa/main/texfetch_tmp.h
@@ -2374,6 +2374,29 @@ static void store_texel_r11_g11_b10f(struct gl_texture_image *texImage,
 #endif
 
 
+/* MESA_FORMAT_Z32_FLOAT_X24S8 ***/
+
+static void FETCH(z32f_x24s8)(const struct gl_texture_image *texImage,
+                              GLint i, GLint j, GLint k, GLfloat *texel)
+{
+   const GLfloat *src = TEXEL_ADDR(GLfloat, texImage, i, j, k, 2);
+   texel[RCOMP] = src[0];
+   texel[GCOMP] = 0.0F;
+   texel[BCOMP] = 0.0F;
+   texel[ACOMP] = 1.0F;
+}
+
+#if DIM == 3
+static void store_texel_z32f_x24s8(struct gl_texture_image *texImage,
+                                   GLint i, GLint j, GLint k, const void *texel)
+{
+   const GLfloat *src = (const GLfloat *) texel;
+   GLfloat *dst = TEXEL_ADDR(GLfloat, texImage, i, j, k, 2);
+   dst[0] = src[0];
+}
+#endif
+
+
 #undef TEXEL_ADDR
 #undef DIM
 #undef FETCH
-- 
1.7.4.1
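`FETCH(z32f_x24s8)` addresses the image in units of two floats per texel and returns only the depth word, in the red channel. A minimal 2D sketch of the same addressing, assuming a hypothetical `struct depth_image` with a row stride measured in texels (Mesa's `TEXEL_ADDR` macro and `gl_texture_image` are not reproduced here):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2D image of Z32_FLOAT_X24S8 texels viewed as floats:
 * each texel occupies two floats, depth first. */
struct depth_image {
   const float *data;
   int row_stride;   /* texels per row */
};

/* Return the texel as RGBA with depth in R, as the fetch function does;
 * the stencil word is ignored. */
static void fetch_z32f_x24s8_2d(const struct depth_image *img, int i, int j,
                                float texel[4])
{
   const float *src = img->data + ((size_t) j * img->row_stride + i) * 2;
   texel[0] = src[0];
   texel[1] = 0.0f;
   texel[2] = 0.0f;
   texel[3] = 1.0f;
}
```

The factor of 2 in the address computation plays the same role as the last argument of `TEXEL_ADDR(GLfloat, texImage, i, j, k, 2)`.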
[Mesa-dev] [PATCH 03/13] mesa: implement stencil unpacking for GL_FLOAT_32_UNSIGNED_INT_24_8_REV
---
 src/mesa/main/pack.c |   35 ---
 1 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/pack.c b/src/mesa/main/pack.c
index a232a51..c284c7d 100644
--- a/src/mesa/main/pack.c
+++ b/src/mesa/main/pack.c
@@ -1971,7 +1971,8 @@ extract_uint_indexes(GLuint n, GLuint indexes[],
           srcType == GL_INT ||
           srcType == GL_UNSIGNED_INT_24_8_EXT ||
           srcType == GL_HALF_FLOAT_ARB ||
-          srcType == GL_FLOAT);
+          srcType == GL_FLOAT ||
+          srcType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
 
    switch (srcType) {
       case GL_BITMAP:
@@ -2142,6 +2143,23 @@ extract_uint_indexes(GLuint n, GLuint indexes[],
             }
          }
          break;
+      case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
+         {
+            GLuint i;
+            const GLuint *s = (const GLuint *) src;
+            if (unpack->SwapBytes) {
+               for (i = 0; i < n; i++) {
+                  GLuint value = s[i*2+1];
+                  SWAP4BYTE(value);
+                  indexes[i] = value & 0xff;  /* lower 8 bits */
+               }
+            }
+            else {
+               for (i = 0; i < n; i++)
+                  indexes[i] = s[i*2+1] & 0xff;  /* lower 8 bits */
+            }
+         }
+         break;
 
       default:
          _mesa_problem(NULL, "bad srcType in extract_uint_indexes");
@@ -4412,11 +4430,13 @@ _mesa_unpack_stencil_span( struct gl_context *ctx, GLuint n,
           srcType == GL_INT ||
           srcType == GL_UNSIGNED_INT_24_8_EXT ||
           srcType == GL_HALF_FLOAT_ARB ||
-          srcType == GL_FLOAT);
+          srcType == GL_FLOAT ||
+          srcType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
 
    ASSERT(dstType == GL_UNSIGNED_BYTE ||
           dstType == GL_UNSIGNED_SHORT ||
-          dstType == GL_UNSIGNED_INT);
+          dstType == GL_UNSIGNED_INT ||
+          dstType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
 
    /* only shift and offset apply to stencil */
    transferOps = IMAGE_SHIFT_OFFSET_BIT;
@@ -4488,6 +4508,15 @@ _mesa_unpack_stencil_span( struct gl_context *ctx, GLuint n,
          case GL_UNSIGNED_INT:
             memcpy(dest, indexes, n * sizeof(GLuint));
             break;
+         case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
+            {
+               GLuint *dst = (GLuint *) dest;
+               GLuint i;
+               for (i = 0; i < n; i++) {
+                  dst[i*2+1] = indexes[i] & 0xff;  /* lower 8 bits */
+               }
+            }
+            break;
          default:
             _mesa_problem(ctx, "bad dstType in _mesa_unpack_stencil_span");
          }
-- 
1.7.4.1
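The new `extract_uint_indexes()` case reads the second dword of each 64-bit pixel, byte-swaps it if the client asked for that, and keeps the low 8 bits. A self-contained sketch of the same logic; `extract_stencil()` is hypothetical, and `swap4()` is written out here as an analogue of Mesa's `SWAP4BYTE` macro:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap4(uint32_t v)
{
   return (v >> 24) | ((v >> 8) & 0xff00u) |
          ((v << 8) & 0xff0000u) | (v << 24);
}

/* Extract n stencil indices from FLOAT_32_UNSIGNED_INT_24_8_REV pixels:
 * the stencil value lives in the low 8 bits of each pixel's second dword. */
static void extract_stencil(const uint32_t *src, uint32_t n, int swap_bytes,
                            uint32_t *indexes)
{
   uint32_t i;
   for (i = 0; i < n; i++) {
      uint32_t value = src[i * 2 + 1];
      if (swap_bytes)
         value = swap4(value);
      indexes[i] = value & 0xff;
   }
}
```

The swap has to happen before masking: masking first would keep the wrong byte whenever the client data uses the opposite byte order.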
[Mesa-dev] [PATCH 04/13] mesa: implement depth unpacking for GL_FLOAT_32_UNSIGNED_INT_24_8_REV
---
 src/mesa/main/pack.c |   27 +--
 1 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/pack.c b/src/mesa/main/pack.c
index c284c7d..d42ae7b 100644
--- a/src/mesa/main/pack.c
+++ b/src/mesa/main/pack.c
@@ -4827,6 +4827,20 @@ _mesa_unpack_depth_span( struct gl_context *ctx, GLuint n,
             }
          }
          break;
+      case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
+         {
+            GLuint i;
+            const GLfloat *src = (const GLfloat *)source;
+            for (i = 0; i < n; i++) {
+               GLfloat value = src[i * 2];
+               if (srcPacking->SwapBytes) {
+                  SWAP4BYTE(value);
+               }
+               depthValues[i] = value;
+            }
+            needClamp = GL_TRUE;
+         }
+         break;
       case GL_FLOAT:
          DEPTH_VALUES(GLfloat, 1*);
          needClamp = GL_TRUE;
@@ -4903,9 +4917,18 @@ _mesa_unpack_depth_span( struct gl_context *ctx, GLuint n,
          zValues[i] = (GLushort) (depthValues[i] * (GLfloat) depthMax);
       }
    }
+   else if (dstType == GL_FLOAT) {
+      /* Nothing to do. depthValues is pointing to dest. */
+   }
+   else if (dstType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV) {
+      GLfloat *zValues = (GLfloat*) dest;
+      GLuint i;
+      for (i = 0; i < n; i++) {
+         zValues[i*2] = depthValues[i];
+      }
+   }
    else {
-      ASSERT(dstType == GL_FLOAT);
-      /*ASSERT(depthMax == 1.0F);*/
+      ASSERT(0);
    }
 
    free(depthTemp);
-- 
1.7.4.1
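The depth path mirrors the stencil one: read the first dword of each 64-bit source pixel, clamp it (the cover letter notes that the ARB extension always clamps), and write only the first dword of each destination pixel. A self-contained sketch, assuming hypothetical names and using `memcpy` for the float/dword reinterpretation instead of Mesa's pointer casts:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy n depth values between FLOAT_32_UNSIGNED_INT_24_8_REV buffers,
 * clamping to [0, 1].  dst[i*2+1] (the stencil dword) is never written,
 * which is what lets depth-only and stencil-only uploads share a texel. */
static void unpack_depth_interleaved(const uint32_t *src, uint32_t *dst,
                                     uint32_t n)
{
   uint32_t i;
   for (i = 0; i < n; i++) {
      float z;
      memcpy(&z, &src[i * 2], sizeof z);   /* depth is the first dword */
      if (z < 0.0f)
         z = 0.0f;
      else if (z > 1.0f)
         z = 1.0f;
      memcpy(&dst[i * 2], &z, sizeof z);
   }
}
```

The sketch assumes a 4-byte `float`, which is what the 8-byte texel layout relies on anyway.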
[Mesa-dev] [PATCH 05/13] mesa: implement texstore for DEPTH_COMPONENT32F
---
 src/mesa/main/texstore.c |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
index f1de31a..cdac214 100644
--- a/src/mesa/main/texstore.c
+++ b/src/mesa/main/texstore.c
@@ -1002,15 +1002,17 @@ memcpy_texture(struct gl_context *ctx,
 
 
 /**
- * Store a 32-bit integer depth component texture image.
+ * Store a 32-bit integer or float depth component texture image.
  */
 static GLboolean
 _mesa_texstore_z32(TEXSTORE_PARAMS)
 {
    const GLuint depthScale = 0x;
    const GLuint texelBytes = _mesa_get_format_bytes(dstFormat);
+   const GLenum dstType = _mesa_get_format_datatype(dstFormat);
    (void) dims;
-   ASSERT(dstFormat == MESA_FORMAT_Z32);
+   ASSERT(dstFormat == MESA_FORMAT_Z32 ||
+          dstFormat == MESA_FORMAT_Z32_FLOAT);
    ASSERT(texelBytes == sizeof(GLuint));
 
    if (ctx->Pixel.DepthScale == 1.0f &&
@@ -1018,7 +1020,7 @@ _mesa_texstore_z32(TEXSTORE_PARAMS)
        !srcPacking->SwapBytes &&
        baseInternalFormat == GL_DEPTH_COMPONENT &&
        srcFormat == GL_DEPTH_COMPONENT &&
-       srcType == GL_UNSIGNED_INT) {
+       srcType == dstType) {
       /* simple memcpy path */
       memcpy_texture(ctx, dims,
                      dstFormat, dstAddr, dstXoffset, dstYoffset, dstZoffset,
@@ -1039,7 +1041,7 @@ _mesa_texstore_z32(TEXSTORE_PARAMS)
          const GLvoid *src = _mesa_image_address(dims, srcPacking,
                srcAddr, srcWidth, srcHeight, srcFormat, srcType, img, row, 0);
          _mesa_unpack_depth_span(ctx, srcWidth,
-                                 GL_UNSIGNED_INT, (GLuint *) dstRow,
+                                 dstType, dstRow,
                                  depthScale, srcType, src, srcPacking);
          dstRow += dstRowStride;
       }
@@ -4423,7 +4425,7 @@ texstore_funcs[MESA_FORMAT_COUNT] =
    { MESA_FORMAT_RGB9_E5_FLOAT, _mesa_texstore_rgb9_e5 },
    { MESA_FORMAT_R11_G11_B10_FLOAT, _mesa_texstore_r11_g11_b10f },
 
-   { MESA_FORMAT_Z32_FLOAT, NULL /* XXX */ },
+   { MESA_FORMAT_Z32_FLOAT, _mesa_texstore_z32 },
    { MESA_FORMAT_Z32_FLOAT_X24S8, /* XXX */ },
 };
 
-- 
1.7.4.1
[Mesa-dev] [PATCH 06/13] mesa: implement texstore for DEPTH32F_STENCIL8
---
 src/mesa/main/texstore.c |   68 +-
 1 files changed, 67 insertions(+), 1 deletions(-)

diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
index cdac214..7e2cafc 100644
--- a/src/mesa/main/texstore.c
+++ b/src/mesa/main/texstore.c
@@ -4290,6 +4290,72 @@ _mesa_texstore_r11_g11_b10f(TEXSTORE_PARAMS)
 }
 
 
+static GLboolean
+_mesa_texstore_z32f_x24s8(TEXSTORE_PARAMS)
+{
+   ASSERT(dstFormat == MESA_FORMAT_Z32_FLOAT_X24S8);
+   ASSERT(srcFormat == GL_DEPTH_STENCIL ||
+          srcFormat == GL_DEPTH_COMPONENT ||
+          srcFormat == GL_STENCIL_INDEX);
+   ASSERT(srcFormat != GL_DEPTH_STENCIL ||
+          srcType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
+
+   if (srcFormat == GL_DEPTH_STENCIL &&
+       ctx->Pixel.DepthScale == 1.0f &&
+       ctx->Pixel.DepthBias == 0.0f &&
+       !srcPacking->SwapBytes) {
+      /* simple path */
+      memcpy_texture(ctx, dims,
+                     dstFormat, dstAddr, dstXoffset, dstYoffset, dstZoffset,
+                     dstRowStride,
+                     dstImageOffsets,
+                     srcWidth, srcHeight, srcDepth, srcFormat, srcType,
+                     srcAddr, srcPacking);
+   }
+   else if (srcFormat == GL_DEPTH_COMPONENT ||
+            srcFormat == GL_STENCIL_INDEX) {
+      GLint img, row;
+      const GLint srcRowStride
+         = _mesa_image_row_stride(srcPacking, srcWidth, srcFormat, srcType)
+         / sizeof(uint64_t);
+
+      /* In case we only upload depth we need to preserve the stencil */
+      for (img = 0; img < srcDepth; img++) {
+         uint64_t *dstRow = (uint64_t *) dstAddr
+            + dstImageOffsets[dstZoffset + img]
+            + dstYoffset * dstRowStride / sizeof(uint64_t)
+            + dstXoffset;
+         const uint64_t *src
+            = (const uint64_t *) _mesa_image_address(dims, srcPacking, srcAddr,
+                                                     srcWidth, srcHeight,
+                                                     srcFormat, srcType,
+                                                     img, 0, 0);
+         for (row = 0; row < srcHeight; row++) {
+            /* The unpack functions with:
+             *    dstType = GL_FLOAT_32_UNSIGNED_INT_24_8_REV
+             * only write their own dword, so the other dword (stencil
+             * or depth) is preserved. */
+            if (srcFormat != GL_STENCIL_INDEX)
+               _mesa_unpack_depth_span(ctx, srcWidth,
+                                       GL_FLOAT_32_UNSIGNED_INT_24_8_REV, /* dst type */
+                                       dstRow, /* dst addr */
+                                       1.0f, srcType, src, srcPacking);
+
+            if (srcFormat != GL_DEPTH_COMPONENT)
+               _mesa_unpack_stencil_span(ctx, srcWidth,
+                                         GL_FLOAT_32_UNSIGNED_INT_24_8_REV, /* dst type */
+                                         dstRow, /* dst addr */
+                                         srcType, src, srcPacking,
+                                         ctx->_ImageTransferState);
+
+            src += srcRowStride;
+            dstRow += dstRowStride / sizeof(uint64_t);
+         }
+      }
+   }
+   return GL_TRUE;
+}
+
 
 /**
  * Table mapping MESA_FORMAT_* to _mesa_texstore_*()
@@ -4426,7 +4492,7 @@ texstore_funcs[MESA_FORMAT_COUNT] =
    { MESA_FORMAT_R11_G11_B10_FLOAT, _mesa_texstore_r11_g11_b10f },
 
    { MESA_FORMAT_Z32_FLOAT, _mesa_texstore_z32 },
-   { MESA_FORMAT_Z32_FLOAT_X24S8, /* XXX */ },
+   { MESA_FORMAT_Z32_FLOAT_X24S8, _mesa_texstore_z32f_x24s8 },
 };
 
 
-- 
1.7.4.1
[Mesa-dev] [PATCH 07/13] mesa: implement generatemipmap for GL_FLOAT_32_UNSIGNED_INT_24_8_REV
---
 src/mesa/main/mipmap.c |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/src/mesa/main/mipmap.c b/src/mesa/main/mipmap.c
index f2724db..8a811cb 100644
--- a/src/mesa/main/mipmap.c
+++ b/src/mesa/main/mipmap.c
@@ -706,6 +706,17 @@ do_row(GLenum datatype, GLuint comps, GLint srcWidth,
       }
    }
 
+   else if (datatype == GL_FLOAT_32_UNSIGNED_INT_24_8_REV && comps == 1) {
+      GLuint i, j, k;
+      const GLfloat *rowA = (const GLfloat *) srcRowA;
+      const GLfloat *rowB = (const GLfloat *) srcRowB;
+      GLfloat *dst = (GLfloat *) dstRow;
+      for (i = j = 0, k = k0; i < (GLuint) dstWidth;
+           i++, j += colStride, k += colStride) {
+         dst[i*2] = (rowA[j*2] + rowA[k*2] + rowB[j*2] + rowB[k*2]) * 0.25F;
+      }
+   }
+
    else {
       _mesa_problem(NULL, "bad format in do_row()");
    }
@@ -1341,6 +1352,15 @@ do_row_3D(GLenum datatype, GLuint comps, GLint srcWidth,
       }
    }
 
+   else if (datatype == GL_FLOAT_32_UNSIGNED_INT_24_8_REV && comps == 1) {
+      DECLARE_ROW_POINTERS(GLfloat, 2);
+
+      for (i = j = 0, k = k0; i < (GLuint) dstWidth;
+           i++, j += colStride, k += colStride) {
+         FILTER_F_3D(0);
+      }
+   }
+
    else {
       _mesa_problem(NULL, "bad format in do_row()");
    }
-- 
1.7.4.1
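The new `do_row()` branch is a 2x2 box filter over the depth words only, stepping two floats per texel. A simplified sketch, assuming an even source width so that destination texel `i` averages source columns `2i` and `2i+1` (the real code walks columns with `j`, `k`, `k0` and `colStride`, which also cover the odd-width case); `downsample_depth_row()` is a hypothetical name:

```c
#include <assert.h>

/* Downsample one row pair of interleaved (depth, stencil) texels stored as
 * floats.  Only dst[i*2] (depth) is written, as in the patch; the stencil
 * words of the destination are left untouched. */
static void downsample_depth_row(const float *rowA, const float *rowB,
                                 float *dst, int dst_width)
{
   int i;
   for (i = 0; i < dst_width; i++) {
      int j = 2 * i;       /* left source column */
      int k = 2 * i + 1;   /* right source column */
      dst[i * 2] = (rowA[j * 2] + rowA[k * 2] +
                    rowB[j * 2] + rowB[k * 2]) * 0.25f;
   }
}
```

Since the patch never writes the stencil half, generated mipmap levels of this format carry filtered depth but no stencil contribution from the source level.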
[Mesa-dev] [PATCH 09/13] st/mesa: initial ARB_depth_buffer_float support
---
 src/mesa/state_tracker/st_cb_clear.c   |    6 --
 src/mesa/state_tracker/st_extensions.c |   11 +++
 src/mesa/state_tracker/st_format.c     |   19 +++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_clear.c b/src/mesa/state_tracker/st_cb_clear.c
index 181fedd..117000b 100644
--- a/src/mesa/state_tracker/st_cb_clear.c
+++ b/src/mesa/state_tracker/st_cb_clear.c
@@ -381,7 +381,8 @@ check_clear_depth_stencil_with_quad(struct gl_context *ctx, struct gl_renderbuff
 
    assert(rb->Format == MESA_FORMAT_S8 ||
           rb->Format == MESA_FORMAT_Z24_S8 ||
-          rb->Format == MESA_FORMAT_S8_Z24);
+          rb->Format == MESA_FORMAT_S8_Z24 ||
+          rb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
 
    if (ctx->Scissor.Enabled &&
        (ctx->Scissor.X != 0 ||
@@ -436,7 +437,8 @@ check_clear_stencil_with_quad(struct gl_context *ctx, struct gl_renderbuffer *rb
 
    assert(rb->Format == MESA_FORMAT_S8 ||
           rb->Format == MESA_FORMAT_Z24_S8 ||
-          rb->Format == MESA_FORMAT_S8_Z24);
+          rb->Format == MESA_FORMAT_S8_Z24 ||
+          rb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
 
    if (maskStencil)
       return GL_TRUE;
diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index d3aebe5..99b231d 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -607,4 +607,15 @@ void st_init_extensions(struct st_context *st)
    if (screen->get_param(screen, PIPE_CAP_SM3)) {
       ctx->Extensions.ARB_shader_texture_lod = GL_TRUE;
    }
+
+   if (screen->is_format_supported(screen, PIPE_FORMAT_Z32_FLOAT,
+                                   PIPE_TEXTURE_2D, 0,
+                                   PIPE_BIND_DEPTH_STENCIL |
+                                   PIPE_BIND_SAMPLER_VIEW) &&
+       screen->is_format_supported(screen, PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED,
+                                   PIPE_TEXTURE_2D, 0,
+                                   PIPE_BIND_DEPTH_STENCIL |
+                                   PIPE_BIND_SAMPLER_VIEW)) {
+      ctx->Extensions.ARB_depth_buffer_float = GL_TRUE;
+   }
 }
diff --git a/src/mesa/state_tracker/st_format.c b/src/mesa/state_tracker/st_format.c
index d1995f1..bd4f086 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -95,6 +95,9 @@ st_format_datatype(enum pipe_format format)
                format == PIPE_FORMAT_X8Z24_UNORM) {
          return GL_UNSIGNED_INT_24_8;
       }
+      else if (format == PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED) {
+         return GL_FLOAT_32_UNSIGNED_INT_24_8_REV;
+      }
       else {
          const GLuint size = format_max_bits(format);
 
@@ -205,6 +208,10 @@ st_mesa_format_to_pipe_format(gl_format mesaFormat)
       return PIPE_FORMAT_Z24X8_UNORM;
    case MESA_FORMAT_S8:
       return PIPE_FORMAT_S8_USCALED;
+   case MESA_FORMAT_Z32_FLOAT:
+      return PIPE_FORMAT_Z32_FLOAT;
+   case MESA_FORMAT_Z32_FLOAT_X24S8:
+      return PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED;
    case MESA_FORMAT_YCBCR:
       return PIPE_FORMAT_UYVY;
 #if FEATURE_texture_s3tc
@@ -427,6 +434,10 @@ st_pipe_format_to_mesa_format(enum pipe_format format)
       return MESA_FORMAT_X8_Z24;
    case PIPE_FORMAT_Z24_UNORM_S8_USCALED:
       return MESA_FORMAT_S8_Z24;
+   case PIPE_FORMAT_Z32_FLOAT:
+      return MESA_FORMAT_Z32_FLOAT;
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+      return MESA_FORMAT_Z32_FLOAT_X24S8;
 
    case PIPE_FORMAT_UYVY:
       return MESA_FORMAT_YCBCR;
@@ -784,6 +795,10 @@ static const struct format_mapping format_map[] = {
       { GL_DEPTH_COMPONENT, 0 },
       { DEFAULT_DEPTH_FORMATS }
    },
+   {
+      { GL_DEPTH_COMPONENT32F, 0 },
+      { PIPE_FORMAT_Z32_FLOAT, 0 }
+   },
 
    /* stencil formats */
    {
@@ -800,6 +815,10 @@ static const struct format_mapping format_map[] = {
       { GL_DEPTH_STENCIL_EXT, GL_DEPTH24_STENCIL8_EXT, 0 },
       { PIPE_FORMAT_Z24_UNORM_S8_USCALED, PIPE_FORMAT_S8_USCALED_Z24_UNORM, 0 }
    },
+   {
+      { GL_DEPTH32F_STENCIL8, 0 },
+      { PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED, 0 }
+   },
 
    /* sRGB formats */
    {
-- 
1.7.4.1
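The `st_init_extensions()` hunk exposes ARB_depth_buffer_float only if the driver accepts both new formats for depth/stencil rendering *and* for sampling. A sketch of that gating logic with a function-pointer query; the `BIND_*`/`FMT_*` enums, `format_supported_fn` and the two example drivers are hypothetical stand-ins for gallium's `pipe_screen::is_format_supported()` and its constants:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bind flags and format ids standing in for gallium's. */
enum { BIND_DEPTH_STENCIL = 1 << 0, BIND_SAMPLER_VIEW = 1 << 1 };
enum fmt { FMT_Z32_FLOAT, FMT_Z32_FLOAT_S8X24 };

/* Driver query: true only if the format supports every requested bind. */
typedef bool (*format_supported_fn)(enum fmt f, unsigned bind);

/* Expose the extension only when both formats can be rendered to as
 * depth/stencil and sampled from. */
static bool can_expose_depth_buffer_float(format_supported_fn supported)
{
   const unsigned bind = BIND_DEPTH_STENCIL | BIND_SAMPLER_VIEW;
   return supported(FMT_Z32_FLOAT, bind) &&
          supported(FMT_Z32_FLOAT_S8X24, bind);
}

/* Two example drivers used by the checks below. */
static bool supports_everything(enum fmt f, unsigned bind)
{
   (void) f;
   (void) bind;
   return true;
}

static bool cannot_sample_z32f_s8(enum fmt f, unsigned bind)
{
   return !(f == FMT_Z32_FLOAT_S8X24 && (bind & BIND_SAMPLER_VIEW));
}
```

Requiring both formats matters because the extension defines GL_DEPTH_COMPONENT32F and GL_DEPTH32F_STENCIL8 together; advertising it with only one backed by the driver would leave the other internal format without a pipe format.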
[Mesa-dev] [PATCH 10/13] st/mesa: implement read/draw/copypixels for Z32F and Z32F_S8X24
---
 src/mesa/state_tracker/st_cb_drawpixels.c |   64 +
 src/mesa/state_tracker/st_cb_readpixels.c |   43 +++
 2 files changed, 98 insertions(+), 9 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c b/src/mesa/state_tracker/st_cb_drawpixels.c
index d61d7ac..dca3324 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -812,6 +812,7 @@ draw_stencil_pixels(struct gl_context *ctx, GLint x, GLint y,
       for (row = 0; row < height; row++) {
          GLubyte sValues[MAX_WIDTH];
          GLuint zValues[MAX_WIDTH];
+         GLfloat *zValuesFloat = (GLfloat*)zValues;
          GLenum destType = GL_UNSIGNED_BYTE;
          const GLvoid *source = _mesa_image_address2d(&clippedUnpack, pixels,
                                                       width, height,
@@ -822,7 +823,11 @@ draw_stencil_pixels(struct gl_context *ctx, GLint x, GLint y,
                                    ctx->_ImageTransferState);
 
          if (format == GL_DEPTH_STENCIL) {
-            _mesa_unpack_depth_span(ctx, spanWidth, GL_UNSIGNED_INT, zValues,
+            GLenum ztype =
+               pt->resource->format == PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED ?
+               GL_FLOAT : GL_UNSIGNED_INT;
+
+            _mesa_unpack_depth_span(ctx, spanWidth, ztype, zValues,
                                     (1 << 24) - 1, type, source, &clippedUnpack);
          }
 
@@ -887,6 +892,26 @@ draw_stencil_pixels(struct gl_context *ctx, GLint x, GLint y,
                   }
                }
                break;
+            case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+               if (format == GL_DEPTH_STENCIL) {
+                  uint *dest = (uint *) (stmap + spanY * pt->stride + spanX*4);
+                  GLfloat *destf = (GLfloat*)dest;
+                  GLint k;
+                  assert(usage == PIPE_TRANSFER_WRITE);
+                  for (k = 0; k < spanWidth; k++) {
+                     destf[k*2] = zValuesFloat[k];
+                     dest[k*2+1] = sValues[k] & 0xff;
+                  }
+               }
+               else {
+                  uint *dest = (uint *) (stmap + spanY * pt->stride + spanX*4);
+                  GLint k;
+                  assert(usage == PIPE_TRANSFER_READ_WRITE);
+                  for (k = 0; k < spanWidth; k++) {
+                     dest[k*2+1] = sValues[k] & 0xff;
+                  }
+               }
+               break;
             default:
                assert(0);
             }
@@ -994,14 +1019,23 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y,
                                               GL_NONE, GL_NONE,
                                               PIPE_TEXTURE_2D, 0,
                                               PIPE_BIND_SAMPLER_VIEW);
-      if (tex_format == PIPE_FORMAT_Z24_UNORM_S8_USCALED)
-         stencil_format = PIPE_FORMAT_X24S8_USCALED;
-      else if (tex_format == PIPE_FORMAT_S8_USCALED_Z24_UNORM)
-         stencil_format = PIPE_FORMAT_S8X24_USCALED;
-      else
-         stencil_format = PIPE_FORMAT_S8_USCALED;
-      if (stencil_format == PIPE_FORMAT_NONE)
-         goto stencil_fallback;
+
+      switch (tex_format) {
+      case PIPE_FORMAT_Z24_UNORM_S8_USCALED:
+         stencil_format = PIPE_FORMAT_X24S8_USCALED;
+         break;
+      case PIPE_FORMAT_S8_USCALED_Z24_UNORM:
+         stencil_format = PIPE_FORMAT_S8X24_USCALED;
+         break;
+      case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+         stencil_format = PIPE_FORMAT_X32_S8X24_USCALED;
+         break;
+      case PIPE_FORMAT_S8_USCALED:
+         stencil_format = PIPE_FORMAT_S8_USCALED;
+         break;
+      default:
+         goto stencil_fallback;
+      }
    }
 
    /* Mesa state should be up to date by now */
@@ -1188,6 +1222,18 @@ copy_stencil_pixels(struct gl_context *ctx, GLint srcx, GLint srcy,
             assert(usage == PIPE_TRANSFER_WRITE);
             memcpy(dst, src, width);
             break;
+         case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+            {
+               uint *dst4 = (uint *) dst;
+               int j;
+               dst4++;
+               assert(usage == PIPE_TRANSFER_READ_WRITE);
+               for (j = 0; j < width; j++) {
+                  *dst4 = src[j] & 0xff;
+                  dst4 += 2;
+               }
+            }
+            break;
          default:
             assert(0);
          }
diff --git a/src/mesa/state_tracker/st_cb_readpixels.c b/src/mesa/state_tracker/st_cb_readpixels.c
index 67926e3..02ddad7 100644
--- a/src/mesa/state_tracker/st_cb_readpixels.c
+++ b/src/mesa/state_tracker/st_cb_readpixels.c
@@ -151,6 +151,24 @@ st_read_stencil_pixels(struct gl_context *ctx, GLint x, GLint y,
             }
          }
          break;
+      case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+         if (format == GL_DEPTH_STENCIL) {
+            const uint *src = (uint *) (stmap + srcY * pt->stride);
+            const GLfloat *srcf = (const GLfloat*)src;
+            GLint
[Mesa-dev] [PATCH 11/13] gallium/util: implement pack functions for Z32F and Z32F_S8X24
The 64 suffix means the function returns uint64_t.
---
 src/gallium/auxiliary/util/u_pack_color.h |   64 +
 1 files changed, 64 insertions(+), 0 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_pack_color.h b/src/gallium/auxiliary/util/u_pack_color.h
index 5378f2d..d2dfba5 100644
--- a/src/gallium/auxiliary/util/u_pack_color.h
+++ b/src/gallium/auxiliary/util/u_pack_color.h
@@ -458,6 +458,19 @@ util_pack_mask_z(enum pipe_format format, uint32_t z)
    }
 }
 
+
+static INLINE uint64_t
+util_pack_mask_z64(enum pipe_format format, uint32_t z)
+{
+   switch (format) {
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+      return z;
+   default:
+      return util_pack_mask_z(format, z);
+   }
+}
+
+
 static INLINE uint32_t
 util_pack_mask_z_stencil(enum pipe_format format, uint32_t z, uint8_t s)
 {
@@ -481,6 +494,21 @@ util_pack_mask_z_stencil(enum pipe_format format, uint32_t z, uint8_t s)
 }
 
 
+static INLINE uint64_t
+util_pack_mask_z_stencil64(enum pipe_format format, uint32_t z, uint8_t s)
+{
+   uint64_t packed;
+
+   switch (format) {
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+      packed = util_pack_mask_z64(format, z);
+      packed |= (uint64_t)s << 32ull;
+      return packed;
+   default:
+      return util_pack_mask_z_stencil(format, z, s);
+   }
+}
+
 
 /**
  * Note: it's assumed that z is in [0,1]
@@ -525,6 +553,24 @@ util_pack_z(enum pipe_format format, double z)
       return 0;
    }
 }
+
+
+static INLINE uint64_t
+util_pack_z64(enum pipe_format format, double z)
+{
+   union fi fui;
+
+   if (z == 0)
+      return 0;
+
+   switch (format) {
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+      fui.f = (float)z;
+      return fui.ui;
+   default:
+      return util_pack_z(format, z);
+   }
+}
 
 
 /**
@@ -554,6 +600,24 @@ util_pack_z_stencil(enum pipe_format format, double z, uint8_t s)
 }
 
 
+static INLINE uint64_t
+util_pack_z_stencil64(enum pipe_format format, double z, uint8_t s)
+{
+   uint64_t packed;
+
+   switch (format) {
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+      packed = util_pack_z64(format, z);
+      packed |= (uint64_t)s << 32ull;
+      break;
+   default:
+      return util_pack_z_stencil(format, z, s);
+   }
+
+   return packed;
+}
+
+
 /**
  * Pack 4 ubytes into a 4-byte word
  */
--
1.7.4.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
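For readers following the 64-bit helpers above: a minimal sketch, assuming the Z32_FLOAT_S8X24 layout implied by util_pack_z_stencil64() (float depth in the low 32 bits, 8-bit stencil in bits 32..39, upper 24 bits unused). The names pack_z32f_s8x24() and unpack_z32f_s8x24() are hypothetical and are not part of u_pack_color.h; memcpy() stands in for the union fi type pun the patch uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (not in u_pack_color.h). Assumed layout:
 * bits 0..31  = IEEE-754 float depth
 * bits 32..39 = 8-bit stencil
 * bits 40..63 = unused "X24" padding */
static uint64_t
pack_z32f_s8x24(float z, uint8_t s)
{
   uint32_t zbits;

   /* Copy the float's bit pattern, as the union fi trick does in the patch. */
   memcpy(&zbits, &z, sizeof zbits);
   /* Widen before shifting so the stencil lands in bits 32..39. */
   return (uint64_t)zbits | ((uint64_t)s << 32);
}

static void
unpack_z32f_s8x24(uint64_t packed, float *z, uint8_t *s)
{
   uint32_t zbits = (uint32_t)(packed & 0xffffffffull);

   assert(z != NULL && s != NULL);
   memcpy(z, &zbits, sizeof *z);
   *s = (uint8_t)((packed >> 32) & 0xff);
}
```

For example, packing depth 1.0f (bit pattern 0x3f800000) with stencil 0x80 gives 0x000000803f800000. The early return for z == 0 in util_pack_z64() yields the same value as this sketch would, since +0.0f has an all-zero bit pattern.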
[Mesa-dev] [PATCH 12/13] gallium/util: implement software Z32F_S8X24 depth-stencil clear
---
 src/gallium/auxiliary/util/u_surface.c |   35 +++-
 1 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_surface.c b/src/gallium/auxiliary/util/u_surface.c
index 4c5cc4d..8fcf6b9 100644
--- a/src/gallium/auxiliary/util/u_surface.c
+++ b/src/gallium/auxiliary/util/u_surface.c
@@ -358,8 +358,41 @@ util_clear_depth_stencil(struct pipe_context *pipe,
                dst_map += dst_stride;
             }
          }
-break;
+         break;
       case 8:
+         {
+            uint64_t zstencil = util_pack_z_stencil64(dst->texture->format,
+                                                      depth, stencil);
+
+            assert(dst->format == PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED);
+
+            if (!need_rmw) {
+               for (i = 0; i < height; i++) {
+                  uint64_t *row = (uint64_t *)dst_map;
+                  for (j = 0; j < width; j++)
+                     *row++ = zstencil;
+                  dst_map += dst_stride;
+               }
+            }
+            else {
+               uint64_t src_mask;
+
+               if (clear_flags & PIPE_CLEAR_DEPTH)
+                  src_mask = 0x00000000ffffffffull;
+               else
+                  src_mask = 0x000000ff00000000ull;
+
+               for (i = 0; i < height; i++) {
+                  uint64_t *row = (uint64_t *)dst_map;
+                  for (j = 0; j < width; j++) {
+                     uint64_t tmp = *row & ~src_mask;
+                     *row++ = tmp | (zstencil & src_mask);
+                  }
+                  dst_map += dst_stride;
+               }
+            }
+            break;
+         }
       default:
          assert(0);
          break;
--
1.7.4.1
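The need_rmw branch above merges only the cleared channel into existing pixels. Below is a hedged sketch of that read-modify-write step in isolation. The mask values are assumptions derived from the packing layout in patch 11 (depth in bits 0..31, stencil in bits 32..39), and clear_row_rmw64() is a hypothetical name, not a Gallium function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed masks for Z32_FLOAT_S8X24 (depth bits 0..31, stencil bits 32..39). */
#define ZS64_DEPTH_MASK   0x00000000ffffffffull
#define ZS64_STENCIL_MASK 0x000000ff00000000ull

/* Hypothetical read-modify-write clear of one row of 64-bit pixels:
 * bits selected by `mask` are taken from `zstencil`; all other bits
 * keep their previous contents. */
static void
clear_row_rmw64(uint64_t *row, size_t width, uint64_t zstencil, uint64_t mask)
{
   size_t j;

   assert(mask != 0);
   for (j = 0; j < width; j++) {
      uint64_t tmp = row[j] & ~mask;      /* keep the channel not being cleared */
      row[j] = tmp | (zstencil & mask);   /* merge in the cleared channel */
   }
}
```

A depth-only clear would pass ZS64_DEPTH_MASK and a stencil-only clear ZS64_STENCIL_MASK; when both channels are cleared together, the plain store loop of the !need_rmw path is enough.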
[Mesa-dev] [PATCH 13/13] gallium/util: handle Z32F_FLOAT_S8X24_USCALED in pipe_tile_raw_to_rgba
And make pipe_put_tile_rgba_format no-op like the other Z formats.
---
 src/gallium/auxiliary/util/u_tile.c |   35 +++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_tile.c b/src/gallium/auxiliary/util/u_tile.c
index e3c7085..23f12e5 100644
--- a/src/gallium/auxiliary/util/u_tile.c
+++ b/src/gallium/auxiliary/util/u_tile.c
@@ -318,6 +318,32 @@ z32f_get_tile_rgba(const float *src,
    }
 }
 
+/*** PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED ***/
+
+/**
+ * Return each Z value as four floats in [0,1].
+ */
+static void
+z32f_x24s8_get_tile_rgba(const float *src,
+                         unsigned w, unsigned h,
+                         float *p,
+                         unsigned dst_stride)
+{
+   unsigned i, j;
+
+   for (i = 0; i < h; i++) {
+      float *pRow = p;
+      for (j = 0; j < w; j++, pRow += 4) {
+         pRow[0] =
+         pRow[1] =
+         pRow[2] =
+         pRow[3] = *src;
+         src += 2;
+      }
+      p += dst_stride;
+   }
+}
+
 
 void
 pipe_tile_raw_to_rgba(enum pipe_format format,
@@ -352,6 +378,9 @@ pipe_tile_raw_to_rgba(enum pipe_format format,
    case PIPE_FORMAT_Z32_FLOAT:
       z32f_get_tile_rgba((float *) src, w, h, dst, dst_stride);
       break;
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+      z32f_x24s8_get_tile_rgba((float *) src, w, h, dst, dst_stride);
+      break;
    default:
       util_format_read_4f(format,
                           dst, dst_stride * sizeof(float),
@@ -445,6 +474,12 @@ pipe_put_tile_rgba_format(struct pipe_context *pipe,
    case PIPE_FORMAT_X8Z24_UNORM:
       /*z24s8_put_tile_rgba((unsigned *) packed, w, h, p, src_stride);*/
       break;
+   case PIPE_FORMAT_Z32_FLOAT:
+      /*z32f_put_tile_rgba((unsigned *) packed, w, h, p, src_stride);*/
+      break;
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+      /*z32f_s8x24_put_tile_rgba((unsigned *) packed, w, h, p, src_stride);*/
+      break;
    default:
       util_format_write_4f(format,
                            p, src_stride * sizeof(float),
--
1.7.4.1
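z32f_x24s8_get_tile_rgba() reads only the depth word of each 64-bit pixel and skips the stencil word by advancing src two floats at a time. A standalone sketch of the same loop, assuming the source tile is tightly packed (no row padding) and dst_stride is counted in floats; z32f_s8x24_to_rgba() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical re-statement of the tile loop. Each source pixel is two
 * 32-bit words: word 0 = float depth, word 1 = stencil bits (ignored).
 * The depth is replicated into R, G, B and A of the destination pixel. */
static void
z32f_s8x24_to_rgba(const float *src, unsigned w, unsigned h,
                   float *dst, unsigned dst_stride)
{
   unsigned i, j;

   assert(dst_stride >= 4 * w);
   for (i = 0; i < h; i++) {
      float *row = dst;
      for (j = 0; j < w; j++, row += 4) {
         row[0] = row[1] = row[2] = row[3] = *src;
         src += 2;   /* skip the stencil word */
      }
      dst += dst_stride;
   }
}
```

With a 2x2 tile and a dst_stride of 8 floats, each depth value ends up in four consecutive floats of its destination row.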
[Mesa-dev] [PATCH 08/13] mesa: implement depth/stencil renderbuffer wrapper accessors for Z32F_X24S8
---
 src/mesa/main/depthstencil.c |  322 +++---
 src/mesa/main/depthstencil.h |    5 +
 src/mesa/main/framebuffer.c  |   10 +-
 3 files changed, 313 insertions(+), 24 deletions(-)

diff --git a/src/mesa/main/depthstencil.c b/src/mesa/main/depthstencil.c
index ab62c97..f979045 100644
--- a/src/mesa/main/depthstencil.c
+++ b/src/mesa/main/depthstencil.c
@@ -393,6 +393,217 @@ _mesa_new_z24_renderbuffer_wrapper(struct gl_context *ctx,
 }
 
 
+static void
+get_row_z32f(struct gl_context *ctx, struct gl_renderbuffer *z32frb, GLuint count,
+             GLint x, GLint y, void *values)
+{
+   struct gl_renderbuffer *dsrb = z32frb->Wrapped;
+   GLfloat temp[MAX_WIDTH*2];
+   GLfloat *dst = (GLfloat *) values;
+   const GLfloat *src = (const GLfloat *) dsrb->GetPointer(ctx, dsrb, x, y);
+   GLuint i;
+   ASSERT(z32frb->DataType == GL_FLOAT);
+   ASSERT(dsrb->DataType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
+   ASSERT(dsrb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
+   if (!src) {
+      dsrb->GetRow(ctx, dsrb, count, x, y, temp);
+      src = temp;
+   }
+   for (i = 0; i < count; i++) {
+      dst[i] = src[i*2];
+   }
+}
+
+static void
+get_values_z32f(struct gl_context *ctx, struct gl_renderbuffer *z32frb, GLuint count,
+                const GLint x[], const GLint y[], void *values)
+{
+   struct gl_renderbuffer *dsrb = z32frb->Wrapped;
+   GLfloat temp[MAX_WIDTH*2];
+   GLfloat *dst = (GLfloat *) values;
+   GLuint i;
+   ASSERT(z32frb->DataType == GL_FLOAT);
+   ASSERT(dsrb->DataType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
+   ASSERT(dsrb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
+   ASSERT(count <= MAX_WIDTH);
+   /* don't bother trying direct access */
+   dsrb->GetValues(ctx, dsrb, count, x, y, temp);
+   for (i = 0; i < count; i++) {
+      dst[i] = temp[i*2];
+   }
+}
+
+static void
+put_row_z32f(struct gl_context *ctx, struct gl_renderbuffer *z32frb, GLuint count,
+             GLint x, GLint y, const void *values, const GLubyte *mask)
+{
+   struct gl_renderbuffer *dsrb = z32frb->Wrapped;
+   const GLfloat *src = (const GLfloat *) values;
+   GLfloat *dst = (GLfloat *) dsrb->GetPointer(ctx, dsrb, x, y);
+   ASSERT(z32frb->DataType == GL_FLOAT);
+   ASSERT(dsrb->DataType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
+   ASSERT(dsrb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
+   if (dst) {
+      /* direct access */
+      GLuint i;
+      for (i = 0; i < count; i++) {
+         if (!mask || mask[i]) {
+            dst[i*2] = src[i];
+         }
+      }
+   }
+   else {
+      /* get, modify, put */
+      GLfloat temp[MAX_WIDTH*2];
+      GLuint i;
+      dsrb->GetRow(ctx, dsrb, count, x, y, temp);
+      for (i = 0; i < count; i++) {
+         if (!mask || mask[i]) {
+            temp[i*2] = src[i];
+         }
+      }
+      dsrb->PutRow(ctx, dsrb, count, x, y, temp, mask);
+   }
+}
+
+static void
+put_mono_row_z32f(struct gl_context *ctx, struct gl_renderbuffer *z32frb, GLuint count,
+                  GLint x, GLint y, const void *value, const GLubyte *mask)
+{
+   struct gl_renderbuffer *dsrb = z32frb->Wrapped;
+   GLfloat *dst = (GLfloat *) dsrb->GetPointer(ctx, dsrb, x, y);
+   ASSERT(z32frb->DataType == GL_FLOAT);
+   ASSERT(dsrb->DataType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
+   ASSERT(dsrb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
+   if (dst) {
+      /* direct access */
+      GLuint i;
+      const GLfloat val = *(GLfloat*)value;
+      for (i = 0; i < count; i++) {
+         if (!mask || mask[i]) {
+            dst[i*2] = val;
+         }
+      }
+   }
+   else {
+      /* get, modify, put */
+      GLfloat temp[MAX_WIDTH*2];
+      GLuint i;
+      const GLfloat val = *(GLfloat *)value;
+      dsrb->GetRow(ctx, dsrb, count, x, y, temp);
+      for (i = 0; i < count; i++) {
+         if (!mask || mask[i]) {
+            temp[i*2] = val;
+         }
+      }
+      dsrb->PutRow(ctx, dsrb, count, x, y, temp, mask);
+   }
+}
+
+static void
+put_values_z32f(struct gl_context *ctx, struct gl_renderbuffer *z32frb, GLuint count,
+                const GLint x[], const GLint y[],
+                const void *values, const GLubyte *mask)
+{
+   struct gl_renderbuffer *dsrb = z32frb->Wrapped;
+   const GLfloat *src = (const GLfloat *) values;
+   ASSERT(z32frb->DataType == GL_FLOAT);
+   ASSERT(dsrb->DataType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
+   ASSERT(dsrb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
+   if (dsrb->GetPointer(ctx, dsrb, 0, 0)) {
+      /* direct access */
+      GLuint i;
+      for (i = 0; i < count; i++) {
+         if (!mask || mask[i]) {
+            GLfloat *dst = (GLfloat *) dsrb->GetPointer(ctx, dsrb, x[i], y[i]);
+            dst[1] = src[i];
+         }
+      }
+   }
+   else {
+      /* get, modify, put */
+      GLfloat temp[MAX_WIDTH*2];
+      GLuint i;
+      dsrb->GetValues(ctx, dsrb, count, x, y, temp);
+      for (i = 0; i < count; i++) {
+         if (!mask || mask[i]) {
+            temp[i*2] = src[i];
+         }
+      }
+      dsrb->PutValues(ctx, dsrb, count, x, y,
[Mesa-dev] [Bug 5002] indirect rendering of glDrawArrays() to an NVidia machine is broke.
https://bugs.freedesktop.org/show_bug.cgi?id=5002

Trevor Forbes t...@internode.on.net changed:

           What    |Removed                     |Added
                 CC|                            |t...@internode.on.net

--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
Re: [Mesa-dev] [PATCH 08/13] mesa: implement depth/stencil renderbuffer wrapper accessors for Z32F_X24S8
On 06/30/2011 05:29 PM, Marek Olšák wrote:
> ---
>  src/mesa/main/depthstencil.c |  322 +++---
>  src/mesa/main/depthstencil.h |    5 +
>  src/mesa/main/framebuffer.c  |   10 +-
>  3 files changed, 313 insertions(+), 24 deletions(-)
>
> diff --git a/src/mesa/main/depthstencil.c b/src/mesa/main/depthstencil.c
> index ab62c97..f979045 100644
> --- a/src/mesa/main/depthstencil.c
> +++ b/src/mesa/main/depthstencil.c
[snip]
> +static void
> +put_values_z32f(struct gl_context *ctx, struct gl_renderbuffer *z32frb, GLuint count,
> +                const GLint x[], const GLint y[],
> +                const void *values, const GLubyte *mask)
> +{
> +   struct gl_renderbuffer *dsrb = z32frb->Wrapped;
> +   const GLfloat *src = (const GLfloat *) values;
> +   ASSERT(z32frb->DataType == GL_FLOAT);
> +   ASSERT(dsrb->DataType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
> +   ASSERT(dsrb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
> +   if (dsrb->GetPointer(ctx, dsrb, 0, 0)) {
> +      /* direct access */
> +      GLuint i;
> +      for (i = 0; i < count; i++) {
> +         if (!mask || mask[i]) {
> +            GLfloat *dst = (GLfloat *) dsrb->GetPointer(ctx, dsrb, x[i], y[i]);
> +            dst[1] = src[i];

Don't you mean dst[0] = src[i] here?  With dst[1], you'll be assigning to
the stencil value...

> +         }
> +      }
> +   }
> +   else {
> +      /* get, modify, put */
> +      GLfloat temp[MAX_WIDTH*2];
> +      GLuint i;
> +      dsrb->GetValues(ctx, dsrb, count, x, y, temp);
> +      for (i = 0; i < count; i++) {
> +         if (!mask || mask[i]) {
> +            temp[i*2] = src[i];

...when clearly this is assigning to the depth value.

> +         }
> +      }
> +      dsrb->PutValues(ctx, dsrb, count, x, y, temp, mask);
> +   }
> +}
[snip]

With that fixed, this patch is:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org
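The review hinges on the pixel layout: viewed through a GLfloat pointer, each Z32F_X24S8 pixel is two 32-bit words, with the depth at index 0 and the stencil word at index 1. That is why the direct-access path should write dst[0], matching temp[i*2] in the get/modify/put path. A minimal sketch of a masked depth-only scattered write, modelling a single row with a hypothetical struct instead of the GetPointer() and x[i], y[i] addressing used by the real renderbuffer code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one Z32F_X24S8 pixel as two 32-bit words.
 * Through a GLfloat pointer, dst[0] is the depth and dst[1] is the
 * word holding the 8 stencil bits plus 24 bits of padding. */
typedef struct {
   float depth;           /* word 0 */
   uint32_t stencil_x24;  /* word 1: stencil in the low 8 bits */
} z32f_x24s8_pixel;

/* Scattered depth write into one row, honoring an optional mask.
 * Only word 0 is written, so the stencil word is preserved. */
static void
put_depth_values(z32f_x24s8_pixel *row, size_t count, const int x[],
                 const float values[], const unsigned char *mask)
{
   size_t i;

   assert(row != NULL && x != NULL && values != NULL);
   for (i = 0; i < count; i++) {
      if (!mask || mask[i])
         row[x[i]].depth = values[i];   /* the dst[0] slot, not dst[1] */
   }
}
```

Writing through the stencil_x24 member instead (the dst[1] case) would overwrite stencil data with float bits, which is the bug the review points out.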
[Mesa-dev] [Bug 37177] Mathematica Plot3D Crash
https://bugs.freedesktop.org/show_bug.cgi?id=37177

Jason Tibbitts ti...@math.uh.edu changed:

           What    |Removed                     |Added
                 CC|                            |ti...@math.uh.edu
Re: [Mesa-dev] [PATCH 00/13] Floating-point depth buffers
On 06/30/2011 05:29 PM, Marek Olšák wrote:
> Hi,
>
> this patch series implements ARB_depth_buffer_float in Mesa and Gallium.
> There is complete r600g/r600-r700 support in my private branch, which
> passes the same tests that pass for Z24S8. Softpipe has only sampler
> support.
>
> This has turned out to be not so trivial, so it's possible I missed
> something.
>
> I did not implement NV_depth_buffer_float, because it's not compatible
> with the ARB variant. (GL_DEPTH_COMPONENT32F != GL_DEPTH_COMPONENT32F_NV
> etc.) The NV extension can operate on unclamped depth values, whereas
> the ARB one always clamps them.
>
> Please review.

Marek,

You are awesome!  Thanks so much for implementing this!

Patches 1-7 look great:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org

I replied with a comment on patch 8, and am not planning on reviewing
patches 9-13 as I'm not familiar with Gallium.
[Mesa-dev] [Bug 5002] indirect rendering of glDrawArrays() to an NVidia machine is broke.
https://bugs.freedesktop.org/show_bug.cgi?id=5002

--- Comment #8 from Trevor Forbes t...@internode.on.net 2011-06-30 21:39:59 PDT ---
I also keep running into this problem with workstations running NVIDIA drivers 180. Try running Google Earth from a remote server, for example.

Some people are getting around it by using an old NVIDIA driver, or by copying the proprietary NVIDIA lib to the server and replacing the Mesa lib. In my case, I am setting LIBGL_NO_DRAWARRAYS=1 on the server before launching the application, which works but is really a workaround rather than a solution.