[Mesa-dev] [PATCH 1/3] intel: Recognize GL_DEPTH_COMPONENT32 in get_teximage_readbuffer.

2011-06-30 Thread Kenneth Graunke
gl_texture_image::InternalFormat is actually the user requested internal
format, not what the texture actually is.  Thus, even though we don't
support 32-bit depth buffers, we need to recognize the enumeration here.
Otherwise, it wrongly returns the color read buffer instead of the depth
read buffer.
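
As a rough illustration of the point above (not the driver code itself): the
lookup keys off the requested enum, so every depth-like request has to be
listed, even one the hardware ends up storing in a different layout. The enum
and function names below are hypothetical stand-ins, not Mesa identifiers.

```c
#include <assert.h>

/* Hypothetical stand-ins for the user-requested internal formats. */
enum sketch_format {
   SK_DEPTH_COMPONENT,
   SK_DEPTH_COMPONENT16,
   SK_DEPTH_COMPONENT32,
   SK_DEPTH24_STENCIL8,
   SK_DEPTH_STENCIL,
   SK_RGBA8
};

enum sketch_buffer { SK_BUFFER_COLOR, SK_BUFFER_DEPTH };

/* Pick the read buffer from the *requested* format.  Leaving out
 * SK_DEPTH_COMPONENT32 would send that request down the default
 * (color) path, which is the failure described above. */
static enum sketch_buffer
sketch_read_buffer_for(enum sketch_format requested)
{
   switch (requested) {
   case SK_DEPTH_COMPONENT:
   case SK_DEPTH_COMPONENT16:
   case SK_DEPTH_COMPONENT32:
   case SK_DEPTH24_STENCIL8:
   case SK_DEPTH_STENCIL:
      return SK_BUFFER_DEPTH;
   default:
      return SK_BUFFER_COLOR;
   }
}
```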

Fixes an issue in PlaneShift 0.5.7 when casting spells.  The game calls
CopyTexSubImage2D on buffers with a GL_DEPTH_COMPONENT32 internal
format, which (prior to this patch) resulted in an attempt to copy an
ARGB to S8_Z24.  This patch fixes the behavior, but does not yet
eliminate the software fallback.

NOTE: This is a candidate for the 7.10 and 7.11 branches.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/intel/intel_tex_copy.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

I kind of wonder if we should just be using TexFormat (the actual format)
rather than InternalFormat (the user requested format).

diff --git a/src/mesa/drivers/dri/intel/intel_tex_copy.c 
b/src/mesa/drivers/dri/intel/intel_tex_copy.c
index eda07a4..8b5c3f0 100644
--- a/src/mesa/drivers/dri/intel/intel_tex_copy.c
+++ b/src/mesa/drivers/dri/intel/intel_tex_copy.c
@@ -58,6 +58,7 @@ get_teximage_readbuffer(struct intel_context *intel, GLenum internalFormat)
    switch (internalFormat) {
    case GL_DEPTH_COMPONENT:
    case GL_DEPTH_COMPONENT16:
+   case GL_DEPTH_COMPONENT32:
    case GL_DEPTH24_STENCIL8_EXT:
    case GL_DEPTH_STENCIL_EXT:
       return intel_get_renderbuffer(intel->ctx.ReadBuffer, BUFFER_DEPTH);
-- 
1.7.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] intel: Remove restriction against Y-tiling in intel_copy_texsubimage.

2011-06-30 Thread Kenneth Graunke
intelEmitCopyBlit already checks for this, so the check is redundant and
unnecessary.  This consolidates the logic (which will soon change).

NOTE: This is a candidate for the 7.10 and 7.11 branches.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/intel/intel_tex_copy.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/intel/intel_tex_copy.c 
b/src/mesa/drivers/dri/intel/intel_tex_copy.c
index 8b5c3f0..6a297c0 100644
--- a/src/mesa/drivers/dri/intel/intel_tex_copy.c
+++ b/src/mesa/drivers/dri/intel/intel_tex_copy.c
@@ -128,11 +128,6 @@ intel_copy_texsubimage(struct intel_context *intel,
 0,
 image_x, image_y);
 
-      /* The blitter can't handle Y-tiled buffers. */
-      if (intelImage->mt->region->tiling == I915_TILING_Y) {
-         return GL_FALSE;
-      }
-
       if (ctx->ReadBuffer->Name == 0) {
          /* Flip vertical orientation for system framebuffers */
          y = ctx->ReadBuffer->Height - (y + height);
-- 
1.7.6



[Mesa-dev] [PATCH 3/3] intel: Add support copying Y-tiled buffers with the Gen6 blitter.

2011-06-30 Thread Kenneth Graunke
According to the Sandybridge PRM, Volume 1, Part 5, Section 1.9.15,
Gen6's blitter supports Y-tiled buffers as well as X-tiled.  Pitch is
specified in 512-byte granularity for X-tiled, but 128-byte for Y-tiled.

Gen5 and earlier unfortunately only support X-tiled buffers.

Fixes a software fallback in PlaneShift 0.5.7 when casting spells.
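
A minimal sketch of the checks this patch ends up with, assuming the non-I915
build (where the X-tiled pitch is divided by 4). It mirrors only the logic of
the diff below and makes no claim about what the hardware needs beyond that;
the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_tiling { SK_TILING_NONE, SK_TILING_X, SK_TILING_Y };

/* Returns false when the blit has to fall back; otherwise stores the
 * pitch value the patch would program for this surface. */
static bool
sketch_blit_pitch(int gen, enum sketch_tiling tiling,
                  uint32_t offset, uint32_t pitch_bytes,
                  uint32_t *pitch_out)
{
   if (tiling != SK_TILING_NONE) {
      if (offset & 4095)                       /* must be page aligned */
         return false;
      if (gen < 6 && tiling == SK_TILING_Y)    /* Gen5 and earlier: X only */
         return false;
   }

   /* Only the X-tiled pitch is scaled; linear and Y-tiled stay as-is. */
   *pitch_out = (tiling == SK_TILING_X) ? pitch_bytes / 4 : pitch_bytes;
   return true;
}
```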

NOTE: This is a candidate for the 7.10 and 7.11 branches.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/intel/intel_blit.c |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/intel/intel_blit.c 
b/src/mesa/drivers/dri/intel/intel_blit.c
index 30be1b9..de752f2 100644
--- a/src/mesa/drivers/dri/intel/intel_blit.c
+++ b/src/mesa/drivers/dri/intel/intel_blit.c
@@ -111,13 +111,13 @@ intelEmitCopyBlit(struct intel_context *intel,
    if (dst_tiling != I915_TILING_NONE) {
       if (dst_offset & 4095)
          return GL_FALSE;
-      if (dst_tiling == I915_TILING_Y)
+      if (intel->gen < 6 && dst_tiling == I915_TILING_Y)
          return GL_FALSE;
    }
    if (src_tiling != I915_TILING_NONE) {
       if (src_offset & 4095)
          return GL_FALSE;
-      if (src_tiling == I915_TILING_Y)
+      if (intel->gen < 6 && src_tiling == I915_TILING_Y)
          return GL_FALSE;
    }
 
@@ -172,13 +172,15 @@ intelEmitCopyBlit(struct intel_context *intel,
}
 
 #ifndef I915
-   if (dst_tiling != I915_TILING_NONE) {
+   if (dst_tiling != I915_TILING_NONE) {
   CMD |= XY_DST_TILED;
-  dst_pitch /= 4;
+  if (dst_tiling == I915_TILING_X)
+dst_pitch /= 4;
}
if (src_tiling != I915_TILING_NONE) {
   CMD |= XY_SRC_TILED;
-  src_pitch /= 4;
+  if (src_tiling == I915_TILING_X)
+src_pitch /= 4;
}
 #endif
 
-- 
1.7.6



[Mesa-dev] [PATCH 1/9] svga: Flush when switching between HW to SW TNL, after updating need_swtnl.

2011-06-30 Thread Thomas Hellstrom
From: José Fonseca jfons...@vmware.com

Also, only flush when going from HW TNL to SW TNL, since buffers produced
by the SW TNL path can never be referenced by the HW TNL path.
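
A small sketch of the ordering this relies on, with hypothetical names: sample
the old value, update the state, then flush only on a HW to SW transition. The
counter stands in for the real flush call.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_ctx {
   bool need_swtnl;       /* current TNL path: true means software */
   unsigned flush_count;  /* stands in for issuing a context flush */
};

static void
sketch_draw(struct sketch_ctx *ctx, bool new_need_swtnl)
{
   /* Remember the path used before this draw's state update. */
   bool needed_swtnl = ctx->need_swtnl;

   /* Stands in for the state update that recomputes need_swtnl. */
   ctx->need_swtnl = new_need_swtnl;

   /* Flush only when going from HW to SW, never the other way. */
   if (ctx->need_swtnl && !needed_swtnl)
      ctx->flush_count++;
}
```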
---
 src/gallium/drivers/svga/svga_context.h   |3 ---
 src/gallium/drivers/svga/svga_pipe_draw.c |   23 +++
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_context.h 
b/src/gallium/drivers/svga/svga_context.h
index eca529d..34b9e85 100644
--- a/src/gallium/drivers/svga/svga_context.h
+++ b/src/gallium/drivers/svga/svga_context.h
@@ -372,9 +372,6 @@ struct svga_context
 
/** List of buffers with queued transfers */
struct list_head dirty_buffers;
-
-   /** Was the previous draw done with the SW path? */
-   boolean prev_draw_swtnl;
 };
 
 /* A flag for each state_tracker state object:
diff --git a/src/gallium/drivers/svga/svga_pipe_draw.c 
b/src/gallium/drivers/svga/svga_pipe_draw.c
index 2093bca..a632fb1 100644
--- a/src/gallium/drivers/svga/svga_pipe_draw.c
+++ b/src/gallium/drivers/svga/svga_pipe_draw.c
@@ -141,18 +141,11 @@ svga_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
    unsigned reduced_prim = u_reduced_prim( info->mode );
    unsigned count = info->count;
    enum pipe_error ret = 0;
+   boolean needed_swtnl;
 
    if (!u_trim_pipe_prim( info->mode, &count ))
       return;
 
-   if (svga->state.sw.need_swtnl != svga->prev_draw_swtnl) {
-      /* We're switching between SW and HW drawing.  Do a flush to avoid
-       * mixing HW and SW rendering with the same vertex buffer.
-       */
-      pipe->flush(pipe, NULL);
-      svga->prev_draw_swtnl = svga->state.sw.need_swtnl;
-   }
-
/*
 * Mark currently bound target surfaces as dirty
 * doesn't really matter if it is done before drawing.
@@ -167,6 +160,8 @@ svga_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
       svga->dirty |= SVGA_NEW_REDUCED_PRIMITIVE;
    }
    
+   needed_swtnl = svga->state.sw.need_swtnl;
+
    svga_update_state_retry( svga, SVGA_STATE_NEED_SWTNL );
 
 #ifdef DEBUG
@@ -176,6 +171,18 @@ svga_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
 #endif
 
    if (svga->state.sw.need_swtnl) {
+      if (!needed_swtnl) {
+         /*
+          * We're switching from HW to SW TNL.  SW TNL will require mapping all
+          * currently bound vertex buffers, some of which may already be
+          * referenced in the current command buffer as result of previous HW
+          * TNL. So flush now, to prevent the context to flush while a referred
+          * vertex buffer is mapped.
+          */
+
+         svga_context_flush(svga, NULL);
+      }
+
       ret = svga_swtnl_draw_vbo( svga, info );
    }
    else {
else {
-- 
1.6.2.5



[Mesa-dev] [PATCH 2/9] gallium/util: Upload manager optimizations

2011-06-30 Thread Thomas Hellstrom
Make sure that the upload manager doesn't upload data that's not
dirty. This speeds up the viewperf test proe-04/1 by a factor of 5 or so
on svga.

Also introduce an u_upload_unmap() function that can be used
instead of u_upload_flush() so that we can pack
even more data in upload buffers. With this we can basically reuse the
upload buffer across flushes.
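
To make the unmap/flush split concrete, here is a toy model with hypothetical
names (no gallium calls): unmapping hands over only the bytes written since the
mapping began and keeps the buffer, while flushing also releases the buffer.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_upload {
   unsigned size;          /* capacity of the current buffer, 0 if none */
   unsigned offset;        /* first unused byte */
   unsigned map_start;     /* offset at which the current mapping began */
   bool mapped;
   unsigned flushed_bytes; /* running total of bytes handed over */
};

/* Sub-allocate; maps lazily, starting at the current offset. */
static bool
sketch_alloc(struct sketch_upload *u, unsigned bytes)
{
   if (u->offset + bytes > u->size)
      return false;
   if (!u->mapped) {
      u->map_start = u->offset;
      u->mapped = true;
   }
   u->offset += bytes;
   return true;
}

/* Hand over only the dirty range [map_start, offset); keep the buffer. */
static void
sketch_unmap(struct sketch_upload *u)
{
   if (!u->mapped)
      return;
   if (u->offset > u->map_start)
      u->flushed_bytes += u->offset - u->map_start;
   u->mapped = false;
}

/* Like unmap, but the buffer is also released for recycling. */
static void
sketch_flush(struct sketch_upload *u)
{
   sketch_unmap(u);
   u->size = 0;
   u->offset = 0;
}
```

In this model a second unmap after 50 more bytes hands over just those 50
bytes, not the 100 already submitted.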

Signed-off-by: Thomas Hellstrom thellst...@vmware.com
---
 src/gallium/auxiliary/util/u_upload_mgr.c |   35 +---
 src/gallium/auxiliary/util/u_upload_mgr.h |   20 +---
 2 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_upload_mgr.c 
b/src/gallium/auxiliary/util/u_upload_mgr.c
index 9562acb..d36697d 100644
--- a/src/gallium/auxiliary/util/u_upload_mgr.c
+++ b/src/gallium/auxiliary/util/u_upload_mgr.c
@@ -72,6 +72,22 @@ struct u_upload_mgr *u_upload_create( struct pipe_context *pipe,
    return upload;
 }
 
+void u_upload_unmap( struct u_upload_mgr *upload )
+{
+   if (upload->transfer) {
+      struct pipe_box *box = &upload->transfer->box;
+      if (upload->offset > box->x) {
+
+         pipe_buffer_flush_mapped_range(upload->pipe, upload->transfer,
+                                        box->x, upload->offset - box->x);
+      }
+      pipe_transfer_unmap(upload->pipe, upload->transfer);
+      pipe_transfer_destroy(upload->pipe, upload->transfer);
+      upload->transfer = NULL;
+      upload->map = NULL;
+   }
+}
+
 /* Release old buffer.
  * 
  * This must usually be called prior to firing the command stream
@@ -84,15 +100,7 @@ struct u_upload_mgr *u_upload_create( struct pipe_context *pipe,
 void u_upload_flush( struct u_upload_mgr *upload )
 {
    /* Unmap and unreference the upload buffer. */
-   if (upload->transfer) {
-      if (upload->offset) {
-         pipe_buffer_flush_mapped_range(upload->pipe, upload->transfer,
-                                        0, upload->offset);
-      }
-      pipe_transfer_unmap(upload->pipe, upload->transfer);
-      pipe_transfer_destroy(upload->pipe, upload->transfer);
-      upload->transfer = NULL;
-   }
+   u_upload_unmap(upload);
    pipe_resource_reference( &upload->buffer, NULL );
    upload->size = 0;
 }
@@ -172,6 +180,15 @@ enum pipe_error u_upload_alloc( struct u_upload_mgr *upload,
 
    offset = MAX2(upload->offset, alloc_offset);
 
+   if (!upload->map) {
+      upload->map = pipe_buffer_map_range(upload->pipe, upload->buffer,
+                                          offset, upload->size - offset,
+                                          PIPE_TRANSFER_WRITE |
+                                          PIPE_TRANSFER_FLUSH_EXPLICIT |
+                                          PIPE_TRANSFER_UNSYNCHRONIZED,
+                                          &upload->transfer);
+   }
+
    assert(offset < upload->buffer->width0);
    assert(offset + size <= upload->buffer->width0);
    assert(size);
diff --git a/src/gallium/auxiliary/util/u_upload_mgr.h 
b/src/gallium/auxiliary/util/u_upload_mgr.h
index c9a2ffe..9891513 100644
--- a/src/gallium/auxiliary/util/u_upload_mgr.h
+++ b/src/gallium/auxiliary/util/u_upload_mgr.h
@@ -56,15 +56,27 @@ struct u_upload_mgr *u_upload_create( struct pipe_context 
*pipe,
  */
 void u_upload_destroy( struct u_upload_mgr *upload );
 
-/* Unmap and release old buffer.
+/* Unmap and release old upload buffer.
  * 
+ * This is like u_upload_unmap() except the upload buffer is released for
+ * recycling. This should be called on real hardware flushes on systems
+ * that don't support the PIPE_TRANSFER_UNSYNCHRONIZED flag, as otherwise
+ * the next u_upload_buffer will cause a sync on the buffer.
+ */
+
+void u_upload_flush( struct u_upload_mgr *upload );
+
+/**
+ * Unmap upload buffer
+ *
+ * \param upload   Upload manager
+ *
  * This must usually be called prior to firing the command stream
  * which references the upload buffer, as many memory managers either
  * don't like firing a mapped buffer or cause subsequent maps of a
- * fired buffer to wait.  For now, it's easiest just to grab a new
- * buffer.
+ * fired buffer to wait.
  */
-void u_upload_flush( struct u_upload_mgr *upload );
+void u_upload_unmap( struct u_upload_mgr *upload );
 
 /**
  * Sub-allocate new memory from the upload buffer.
-- 
1.6.2.5



[Mesa-dev] [PATCH 3/9] gallium/svga: Make use of u_upload_flush().

2011-06-30 Thread Thomas Hellstrom
This enables us to pack more data into single upload buffers.

Signed-off-by: Thomas Hellstrom thellst...@vmware.com
---
 src/gallium/drivers/svga/svga_context.c |8 
 src/gallium/drivers/svga/svga_draw.c|4 ++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_context.c 
b/src/gallium/drivers/svga/svga_context.c
index dbbc249..cfb1b9d 100644
--- a/src/gallium/drivers/svga/svga_context.c
+++ b/src/gallium/drivers/svga/svga_context.c
@@ -207,6 +207,14 @@ void svga_context_flush( struct svga_context *svga,
 
    svga->curr.nr_fbs = 0;
 
+   /* Flush the upload managers to ensure recycling of upload buffers
+    * without throttling. This should really be conditioned on
+    * pipe_buffer_map_range not supporting PIPE_TRANSFER_UNSYNCHRONIZED.
+    */
+
+   u_upload_flush(svga->upload_vb);
+   u_upload_flush(svga->upload_ib);
+
/* Ensure that texture dma uploads are processed
 * before submitting commands.
 */
diff --git a/src/gallium/drivers/svga/svga_draw.c 
b/src/gallium/drivers/svga/svga_draw.c
index d8af615..28ba470 100644
--- a/src/gallium/drivers/svga/svga_draw.c
+++ b/src/gallium/drivers/svga/svga_draw.c
@@ -145,7 +145,7 @@ svga_hwtnl_flush( struct svga_hwtnl *hwtnl )
   unsigned i;
 
   /* Unmap upload manager vertex buffers */
-      u_upload_flush(svga->upload_vb);
+      u_upload_unmap(svga->upload_vb);
 
       for (i = 0; i < hwtnl->cmd.vdecl_count; i++) {
          handle = svga_buffer_handle(svga, hwtnl->cmd.vdecl_vb[i]);
@@ -156,7 +156,7 @@ svga_hwtnl_flush( struct svga_hwtnl *hwtnl )
       }
 
       /* Unmap upload manager index buffers */
-      u_upload_flush(svga->upload_ib);
+      u_upload_unmap(svga->upload_ib);
 
       for (i = 0; i < hwtnl->cmd.prim_count; i++) {
          if (hwtnl->cmd.prim_ib[i]) {
-- 
1.6.2.5



[Mesa-dev] [PATCH 4/9] gallium/svga: Upload only parts of user-buffers that we actually use

2011-06-30 Thread Thomas Hellstrom
Stream user buffer contents rather than trying to maintain persistent
host / hardware copies.
The resulting negative array offsets are not allowed by the hardware
(at least not according to the header files), so adjust the index bias
to make all array offsets positive.
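
One way to picture the adjustment, as a sketch for a single array and under the
assumption that a vertex is fetched from offset + (index + index_bias) * stride:
adding k strides to the offset and subtracting k from the bias leaves every
address unchanged. The struct and function names are hypothetical; the patch
itself applies one shared bias across all vertex buffers.

```c
#include <assert.h>

struct sketch_array {
   int offset;       /* byte offset of element 0, may start out negative */
   unsigned stride;  /* bytes between consecutive elements */
   int index_bias;   /* added to every index before the fetch */
};

/* Byte address the array yields for a given index (assumed model). */
static int
sketch_vertex_address(const struct sketch_array *a, int index)
{
   return a->offset + (index + a->index_bias) * (int) a->stride;
}

/* Shift whole strides from the bias into the offset until the
 * offset is no longer negative; addresses stay the same. */
static void
sketch_make_offset_positive(struct sketch_array *a)
{
   int stride = (int) a->stride;
   int k;

   if (a->offset >= 0 || stride == 0)
      return;

   k = (-a->offset + stride - 1) / stride;  /* ceil(-offset / stride) */
   a->offset += k * stride;
   a->index_bias -= k;
}
```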

Signed-off-by: Thomas Hellstrom thellst...@vmware.com
---
 src/gallium/drivers/svga/svga_draw.c|   13 ++-
 src/gallium/drivers/svga/svga_draw.h|3 +
 src/gallium/drivers/svga/svga_draw_private.h|7 ++
 src/gallium/drivers/svga/svga_pipe_draw.c   |  130 ++-
 src/gallium/drivers/svga/svga_resource_buffer.h |7 ++
 src/gallium/drivers/svga/svga_state_vdecl.c |  119 +
 6 files changed, 208 insertions(+), 71 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_draw.c 
b/src/gallium/drivers/svga/svga_draw.c
index 28ba470..aa09669 100644
--- a/src/gallium/drivers/svga/svga_draw.c
+++ b/src/gallium/drivers/svga/svga_draw.c
@@ -242,6 +242,11 @@ svga_hwtnl_flush( struct svga_hwtnl *hwtnl )
 }
 
 
+void svga_hwtnl_set_index_bias( struct svga_hwtnl *hwtnl,
+   int index_bias)
+{
+   hwtnl->index_bias = index_bias;
+}
 
 
 
@@ -265,15 +270,16 @@ enum pipe_error svga_hwtnl_prim( struct svga_hwtnl *hwtnl,
          unsigned size = vb ? vb->width0 : 0;
          unsigned offset = hwtnl->cmd.vdecl[i].array.offset;
          unsigned stride = hwtnl->cmd.vdecl[i].array.stride;
-         unsigned index_bias = range->indexBias;
+         int index_bias = (int) range->indexBias + hwtnl->index_bias;
          unsigned width;
 
          assert(vb);
          assert(size);
          assert(offset < size);
-         assert(index_bias >= 0);
          assert(min_index <= max_index);
-         assert(offset + index_bias*stride < size);
+         if (index_bias >= 0) {
+            assert(offset + index_bias*stride < size);
+         }
          if (min_index != ~0) {
             assert(offset + (index_bias + min_index) * stride < size);
          }
@@ -394,6 +400,7 @@ enum pipe_error svga_hwtnl_prim( struct svga_hwtnl *hwtnl,
    hwtnl->cmd.max_index[hwtnl->cmd.prim_count] = max_index;
 
    hwtnl->cmd.prim[hwtnl->cmd.prim_count] = *range;
+   hwtnl->cmd.prim[hwtnl->cmd.prim_count].indexBias += hwtnl->index_bias;
 
    pipe_resource_reference(&hwtnl->cmd.prim_ib[hwtnl->cmd.prim_count], ib);
    hwtnl->cmd.prim_count++;
diff --git a/src/gallium/drivers/svga/svga_draw.h 
b/src/gallium/drivers/svga/svga_draw.h
index a2403d8..1dac174 100644
--- a/src/gallium/drivers/svga/svga_draw.h
+++ b/src/gallium/drivers/svga/svga_draw.h
@@ -79,5 +79,8 @@ svga_hwtnl_draw_range_elements( struct svga_hwtnl *hwtnl,
 enum pipe_error
 svga_hwtnl_flush( struct svga_hwtnl *hwtnl );
 
+void svga_hwtnl_set_index_bias( struct svga_hwtnl *hwtnl,
+int index_bias);
+
 
 #endif /* SVGA_DRAW_H_ */
diff --git a/src/gallium/drivers/svga/svga_draw_private.h 
b/src/gallium/drivers/svga/svga_draw_private.h
index ca658ac..8126f7e 100644
--- a/src/gallium/drivers/svga/svga_draw_private.h
+++ b/src/gallium/drivers/svga/svga_draw_private.h
@@ -116,6 +116,13 @@ struct draw_cmd {
 struct svga_hwtnl {
struct svga_context *svga;
struct u_upload_mgr *upload_ib;
+
+   /* Additional negative index bias due to partial buffer uploads
+* This is compensated for in the offset associated with all
+* vertex buffers.
+*/
+
+   int index_bias;

/* Flatshade information:
 */
diff --git a/src/gallium/drivers/svga/svga_pipe_draw.c 
b/src/gallium/drivers/svga/svga_pipe_draw.c
index a632fb1..8e1c764 100644
--- a/src/gallium/drivers/svga/svga_pipe_draw.c
+++ b/src/gallium/drivers/svga/svga_pipe_draw.c
@@ -37,6 +37,116 @@
 #include "svga_state.h"
 #include "svga_swtnl.h"
 #include "svga_debug.h"
+#include "svga_resource_buffer.h"
+#include "util/u_upload_mgr.h"
+
+/**
+ * svga_upload_user_buffers - upload parts of user buffers
+ *
+ * This function streams a part of a user buffer to hw and sets
+ * svga_buffer::source_offset to the first byte uploaded. After upload
+ * also svga_buffer::uploaded::buffer is set to !NULL
+ */
+
+static int
+svga_upload_user_buffers(struct svga_context *svga,
+ unsigned start,
+ unsigned count,
+ unsigned instance_count)
+{
+   const struct pipe_vertex_element *ve = svga->curr.velems->velem;
+   unsigned i;
+   int ret;
+
+   for (i=0; i < svga->curr.velems->count; i++) {
+      struct pipe_vertex_buffer *vb =
+         &svga->curr.vb[ve[i].vertex_buffer_index];
+
+      if (vb->buffer && svga_buffer_is_user_buffer(vb->buffer)) {
+         struct svga_buffer *buffer = svga_buffer(vb->buffer);
+         unsigned first, size;
+         boolean flushed;
+         unsigned instance_div = ve[i].instance_divisor;
+
+         svga->dirty |= SVGA_NEW_VBUFFER;
+
+         if (instance_div) {
+            first = 0;
+            size = vb->stride *
+   (instance_count + instance_div - 1) / 

[Mesa-dev] [PATCH 5/9] svga: Handle null buffers in svga_buffer_is_user_buffer().

2011-06-30 Thread Thomas Hellstrom
From: José Fonseca jfons...@vmware.com

---
 src/gallium/drivers/svga/svga_resource_buffer.h |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_resource_buffer.h 
b/src/gallium/drivers/svga/svga_resource_buffer.h
index 2ae44d2..69d6f72 100644
--- a/src/gallium/drivers/svga/svga_resource_buffer.h
+++ b/src/gallium/drivers/svga/svga_resource_buffer.h
@@ -200,7 +200,11 @@ svga_buffer(struct pipe_resource *buffer)
 static INLINE boolean 
 svga_buffer_is_user_buffer( struct pipe_resource *buffer )
 {
-   return svga_buffer(buffer)->user;
+   if (buffer) {
+      return svga_buffer(buffer)->user;
+   } else {
+  return FALSE;
+   }
 }
 
 
-- 
1.6.2.5



[Mesa-dev] [PATCH 6/9] svga: fix incorrect user buffer size computation

2011-06-30 Thread Thomas Hellstrom
From: Brian Paul bri...@vmware.com

Viewperf uses some unusual vertex arrays where the stride is less
than the element size.  In this case, the stride was 4 while the
element size was 12.  The difference of 8 bytes causes us to miss
uploading the tail bit of the array data.

Typically the stride is >= the element size so there was no problem
with other apps.
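
The arithmetic, worked through as a small sketch (the helper name is
hypothetical): the last element starts at stride * (count - 1) and extends for
a full element, so that is where the upload has to end.

```c
#include <assert.h>

/* Bytes needed to cover `count` elements laid out `stride` bytes
 * apart, each `elem_size` bytes long.  Also correct when the stride
 * is smaller than the element size. */
static unsigned
sketch_upload_size(unsigned stride, unsigned count, unsigned elem_size)
{
   if (count == 0)
      return 0;
   return stride * (count - 1) + elem_size;
}
```

With stride 4, element size 12 and 10 elements this gives 4 * 9 + 12 = 48
bytes, whereas stride * count gives 40, which is the 8 missing tail bytes
mentioned above.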
---
 src/gallium/drivers/svga/svga_pipe_draw.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_draw.c 
b/src/gallium/drivers/svga/svga_pipe_draw.c
index 8e1c764..78f5aa1 100644
--- a/src/gallium/drivers/svga/svga_pipe_draw.c
+++ b/src/gallium/drivers/svga/svga_pipe_draw.c
@@ -25,6 +25,7 @@
 
 #include "svga_cmd.h"
 
+#include "util/u_format.h"
 #include "util/u_inlines.h"
 #include "util/u_prim.h"
 #include "util/u_time.h"
@@ -75,8 +76,9 @@ svga_upload_user_buffers(struct svga_context *svga,
             size = vb->stride *
                (instance_count + instance_div - 1) / instance_div;
          } else if (vb->stride) {
+            uint elemSize = util_format_get_blocksize(ve->src_format);
             first = vb->stride * start;
-            size = vb->stride * count;
+            size = vb->stride * (count - 1) + elemSize;
  } else {
 /* Only a single vertex!
  * Upload with the largest vertex size the hw supports,
-- 
1.6.2.5



[Mesa-dev] [PATCH 7/9] svga: fix incorrect user buffer size computation for instance divisor case

2011-06-30 Thread Thomas Hellstrom
From: Brian Paul bri...@vmware.com

See preceding commit for more info.
---
 src/gallium/drivers/svga/svga_pipe_draw.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_draw.c 
b/src/gallium/drivers/svga/svga_pipe_draw.c
index 78f5aa1..358ef82 100644
--- a/src/gallium/drivers/svga/svga_pipe_draw.c
+++ b/src/gallium/drivers/svga/svga_pipe_draw.c
@@ -68,15 +68,15 @@ svga_upload_user_buffers(struct svga_context *svga,
  unsigned first, size;
  boolean flushed;
  unsigned instance_div = ve[i].instance_divisor;
+         unsigned elemSize = util_format_get_blocksize(ve->src_format);
 
          svga->dirty |= SVGA_NEW_VBUFFER;
 
          if (instance_div) {
             first = 0;
-            size = vb->stride *
-               (instance_count + instance_div - 1) / instance_div;
+            count = (instance_count + instance_div - 1) / instance_div;
+            size = vb->stride * (count - 1) + elemSize;
          } else if (vb->stride) {
-            uint elemSize = util_format_get_blocksize(ve->src_format);
             first = vb->stride * start;
             size = vb->stride * (count - 1) + elemSize;
  } else {
-- 
1.6.2.5



[Mesa-dev] [PATCH 9/9] svga: Fix multiple uploads of the same user-buffer.

2011-06-30 Thread Thomas Hellstrom
If a user-buffer was referenced twice by a draw command, the affected ranges
were uploaded separately, with only the last one being referenced by the
hardware. Make sure we upload only a single range.
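
The single-range idea can be sketched like this, with hypothetical names: start
from an empty range (start at the maximum value, end at 0) and widen it with
each reference, so two references to one buffer end up covered by one upload.

```c
#include <assert.h>

struct sketch_range {
   unsigned start;  /* first byte to upload */
   unsigned end;    /* one past the last byte to upload */
};

/* Empty range: any real reference will shrink start and grow end. */
static void
sketch_range_reset(struct sketch_range *r)
{
   r->start = ~0u;
   r->end = 0;
}

/* Widen the range so it also covers [first, first + size). */
static void
sketch_range_add(struct sketch_range *r, unsigned first, unsigned size)
{
   if (first < r->start)
      r->start = first;
   if (first + size > r->end)
      r->end = first + size;
}
```

The price is that any gap between two referenced regions gets uploaded too.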

Signed-off-by: Thomas Hellstrom thellst...@vmware.com
---
 src/gallium/drivers/svga/svga_pipe_draw.c   |  101 ++-
 src/gallium/drivers/svga/svga_resource_buffer.h |   13 ++--
 src/gallium/drivers/svga/svga_state_vdecl.c |6 +-
 3 files changed, 90 insertions(+), 30 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_draw.c 
b/src/gallium/drivers/svga/svga_pipe_draw.c
index 358ef82..0b4d41b 100644
--- a/src/gallium/drivers/svga/svga_pipe_draw.c
+++ b/src/gallium/drivers/svga/svga_pipe_draw.c
@@ -42,22 +42,42 @@
 #include "util/u_upload_mgr.h"
 
 /**
- * svga_upload_user_buffers - upload parts of user buffers
+ * Determine the ranges to upload for the user-buffers referenced
+ * by the next draw command.
  *
- * This function streams a part of a user buffer to hw and sets
- * svga_buffer::source_offset to the first byte uploaded. After upload
- * also svga_buffer::uploaded::buffer is set to !NULL
+ * TODO: It might be beneficial to support multiple ranges. In that case,
+ * the struct svga_buffer::uploaded member should be made an array or a
+ * list, since we need to account for the possibility that different ranges
+ * may be uploaded to different hardware buffers chosen by the utility
+ * upload manager.
  */
 
-static int
-svga_upload_user_buffers(struct svga_context *svga,
- unsigned start,
- unsigned count,
- unsigned instance_count)
+static void
+svga_user_buffer_range(struct svga_context *svga,
+   unsigned start,
+   unsigned count,
+   unsigned instance_count)
 {
    const struct pipe_vertex_element *ve = svga->curr.velems->velem;
-   unsigned i;
-   int ret;
+   int i;
+
+   /*
+    * Release old uploaded range (if not done already) and
+    * initialize new ranges.
+    */
+
+   for (i=0; i < svga->curr.velems->count; i++) {
+      struct pipe_vertex_buffer *vb =
+         &svga->curr.vb[ve[i].vertex_buffer_index];
+
+      if (vb->buffer && svga_buffer_is_user_buffer(vb->buffer)) {
+         struct svga_buffer *buffer = svga_buffer(vb->buffer);
+
+         pipe_resource_reference(&buffer->uploaded.buffer, NULL);
+         buffer->uploaded.start = ~0;
+         buffer->uploaded.end = 0;
+      }
+   }
 
    for (i=0; i < svga->curr.velems->count; i++) {
       struct pipe_vertex_buffer *vb =
@@ -66,30 +86,71 @@ svga_upload_user_buffers(struct svga_context *svga,
       if (vb->buffer && svga_buffer_is_user_buffer(vb->buffer)) {
          struct svga_buffer *buffer = svga_buffer(vb->buffer);
          unsigned first, size;
-         boolean flushed;
          unsigned instance_div = ve[i].instance_divisor;
          unsigned elemSize = util_format_get_blocksize(ve->src_format);
 
          svga->dirty |= SVGA_NEW_VBUFFER;
 
          if (instance_div) {
-            first = 0;
+            first = ve[i].src_offset;
             count = (instance_count + instance_div - 1) / instance_div;
             size = vb->stride * (count - 1) + elemSize;
          } else if (vb->stride) {
-            first = vb->stride * start;
+            first = vb->stride * start + ve[i].src_offset;
             size = vb->stride * (count - 1) + elemSize;
          } else {
             /* Only a single vertex!
              * Upload with the largest vertex size the hw supports,
              * if possible.
              */
-            first = 0;
+            first = ve[i].src_offset;
             size = MIN2(16, vb->buffer->width0);
          }
 
+         buffer->uploaded.start = MIN2(buffer->uploaded.start, first);
+         buffer->uploaded.end = MAX2(buffer->uploaded.end, first + size);
+      }
+   }
+}
+
+/**
+ * svga_upload_user_buffers - upload parts of user buffers
+ *
+ * This function streams a part of a user buffer to hw and fills
+ * svga_buffer::uploaded with information on the upload.
+ */
+
+static int
+svga_upload_user_buffers(struct svga_context *svga,
+                         unsigned start,
+                         unsigned count,
+                         unsigned instance_count)
+{
+   const struct pipe_vertex_element *ve = svga->curr.velems->velem;
+   unsigned i;
+   int ret;
+
+   svga_user_buffer_range(svga, start, count, instance_count);
+
+   for (i=0; i < svga->curr.velems->count; i++) {
+      struct pipe_vertex_buffer *vb =
+         &svga->curr.vb[ve[i].vertex_buffer_index];
+
+      if (vb->buffer && svga_buffer_is_user_buffer(vb->buffer)) {
+         struct svga_buffer *buffer = svga_buffer(vb->buffer);
+         boolean flushed;
+
+         /*
+          * Check if already uploaded. Otherwise go ahead and upload.
+          */
+
+         if (buffer->uploaded.buffer)
+            continue;
+
          ret = u_upload_buffer( svga->upload_vb,
- 

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Thomas Hellstrom

Hmm.
Forgive my ignorance, but isn't memcmp() on structs pretty prone to giving
incorrect != results, given that there may be padding between members in
structs and that, IIRC, gcc struct assignment is member-wise?


What happens if there's padding between the jit_context and variant 
members of struct lp_rast_state?
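
To illustrate the concern with a made-up struct (not the real lp_rast_state):
on common ABIs a one-byte member followed by a four-byte member leaves padding
bytes that a member-wise comparison ignores but memcmp() would look at. The
names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_state {
   uint8_t flag;     /* typically followed by padding up to alignment */
   uint32_t value;
};

/* Bytes in the struct that belong to no member. */
static size_t
sketch_padding_bytes(void)
{
   return sizeof(struct sketch_state) - sizeof(uint8_t) - sizeof(uint32_t);
}

/* Member-wise equality: padding contents cannot affect the result. */
static bool
sketch_state_equal(const struct sketch_state *a,
                   const struct sketch_state *b)
{
   return a->flag == b->flag && a->value == b->value;
}
```

If two such objects were filled from differently initialized memory and then
assigned the same member values, the member-wise check reports them equal
while a memcmp() over sizeof(struct sketch_state) bytes is not guaranteed to.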


I seem to recall hitting similar issues a number of times in the past.

/Thomas

On 06/30/2011 03:36 AM, Roland Scheidegger wrote:

Ok in fact there's a gcc bug about memcmp:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
In short gcc's memcmp builtin is totally lame and loses to glibc's
memcmp (including call overhead, no knowledge about alignment etc.) even
when comparing only very few bytes (and loses BIG time for lots of bytes
to compare). Oops. Well at least if the strings are the same (I'd guess
if the first byte is different it's hard to beat the gcc builtin...).
So this is really a gcc bug. The bug is quite old though with no fix in
sight apparently so might need to think about some workaround (but just
not doing the comparison doesn't look like the right idea, since
apparently it would be faster with the comparison if gcc's memcmp got
fixed).


Roland



Am 30.06.2011 01:47, schrieb Roland Scheidegger:
   

I didn't even look at that; I was just curious why the memcmp (which is
used a lot in other places) is slow. However, none of the other memcmp
calls seem to show up prominently (cso functions are quite low in profiles;
_mesa_search_program_cache uses memcmp too, but it's not that high
either). So I guess those functions either aren't called that often or
the sizes they compare are small.
So we should maybe just file a gcc bug for memcmp and look at that
particular llvmpipe issue again :-).

Roland

Am 30.06.2011 01:16, schrieb Corbin Simpson:
 

Okay, so maybe I'm failing to recognize the exact situation here, but
wouldn't it be possible to mark the FS state with a serial number and
just compare those? Or are these FS states not CSO-cached?

~ C.

On Wed, Jun 29, 2011 at 3:44 PM, Roland Scheidegger srol...@vmware.com wrote:
   

Actually I ran some numbers here and tried out an optimized struct compare:
original ipers: 12.1 fps
ajax patch: 15.5 fps
optimized struct compare: 16.8 fps


This is the function I used for that (just enabled in that lp_setup
function):

static INLINE int util_cmp_struct(const void *src1, const void *src2,
                                  unsigned count)
{
  /* hmm pointer casting is evil */
  const uint32_t *src1_ptr = (const uint32_t *)src1;
  const uint32_t *src2_ptr = (const uint32_t *)src2;
  unsigned i;
  /* count is the struct size in bytes and must be a multiple of 4 */
  assert(count % 4 == 0);
  for (i = 0; i < count/4; i++) {
    if (*src1_ptr != *src2_ptr) {
      return 1;
    }
    src1_ptr++;
    src2_ptr++;
  }
  return 0;
}

(And no this doesn't use repz cmpsd here.)

So, unless I made some mistake, memcmp is just dead slow (*), most of
the slowness probably coming from the bytewise comparison (and
apparently I was wrong in assuming the comparison there might never be
the same for ipers).
Of course, the optimized struct compare relies on structs really being
dword aligned (I think this is always the case), and additionally it
relies on the struct size being a whole multiple of dwords - the struct
likely needs padding to ensure that (at least I don't think this is
always guaranteed for all structs).
But since memcmp is used extensively (cso for one) maybe some
optimization along these lines might be worth it (though of course for
small structs the win isn't going to be as big - and can't beat the repz
cmps in code size...).
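
A possible variant that drops both requirements, as a sketch only and not what
was benchmarked above: load dwords through memcpy() so alignment doesn't
matter, and finish any leftover bytes one at a time. The function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the two blocks differ, 0 if they are identical.
 * Unlike memcmp() it gives no ordering, only equal / not equal. */
static int
sketch_blocks_differ(const void *a, const void *b, size_t size)
{
   const unsigned char *pa = a;
   const unsigned char *pb = b;
   size_t i = 0;

   /* Whole dwords first; memcpy avoids alignment and aliasing issues. */
   for (; i + 4 <= size; i += 4) {
      uint32_t wa, wb;
      memcpy(&wa, pa + i, 4);
      memcpy(&wb, pb + i, 4);
      if (wa != wb)
         return 1;
   }

   /* Remaining 0..3 tail bytes. */
   for (; i < size; i++) {
      if (pa[i] != pb[i])
         return 1;
   }
   return 0;
}
```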

Roland

(*) I actually found some references suggesting some implementations might
be better: they don't just use repz cmpsb but split the work into parts,
doing dword (or even qword; for really large structs they could use
sse2) comparisons where possible and byte comparisons only for the
remaining bytes (and if the compiler does that, it probably knows the
size at compile time here and hence could leave out much of the code).
Of course memcmp requires that the return value isn't just a true or
false value, hence more code is needed once an unequal dword is found,
though the compiler could optimize that away too in case it's not
needed. Much the same as memcpy is usually optimized, really, so blame
gcc :-).



Am 29.06.2011 20:33, schrieb Roland Scheidegger:
 

Ohh that's interesting, you'd think the comparison shouldn't be that
expensive (though I guess in ipers case the comparison is never true).
memcmp is quite extensively used everywhere. Maybe we could replace that
with something faster (since we only ever care whether the blocks are the
same, don't care about the lexicographic ordering, and always compare
whole structs, we should compare dwords instead of bytes for a 4x
speedup)? Or isn't that the reason cmpsb instead of cmpsd is used?
Also I guess it would help if the values which are more likely to be
unequal are first in the struct (if we can tell that).
Of course though if it's unlikely to be the same as the 

Re: [Mesa-dev] [PATCH 3/3] intel: Add support copying Y-tiled buffers with the Gen6 blitter.

2011-06-30 Thread Chris Wilson
On Wed, 29 Jun 2011 23:04:04 -0700, Kenneth Graunke kenn...@whitecape.org 
wrote:
> According to the Sandybridge PRM, Volume 1, Part 5, Section 1.9.15,
> Gen6's blitter supports Y-tiled buffers as well as X-tiled.  Pitch is
> specified in 512-byte granularity for X-tiled, but 128-byte for Y-tiled.

To support Y-tiling in the SNB blitter, afaics, you have to set a register
that forces it to treat all source and/or destination surfaces as Y-tiled:

  BCS_SWCTRL 0x22200

Fortunately they did permit use to modify it with MI_LOAD_REGISTER_IMM.
-Chris
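
A minimal sketch of the register write described above, building the three
dwords of a single-register MI_LOAD_REGISTER_IMM packet into a plain array.
Only the BCS_SWCTRL offset comes from the message; the bit positions, the
masked-write layout (upper 16 bits as write enables) and the command opcode
are assumptions here and should be checked against the PRM. The helper name
build_swctrl_packet is hypothetical.

```c
#include <stdint.h>

/* Register offset quoted in the message above. */
#define BCS_SWCTRL            0x22200

/* ASSUMPTIONS, not stated in the thread: tiling bit positions, the
 * masked-register convention, and the MI command opcode. */
#define BCS_SWCTRL_SRC_Y      (1u << 0)
#define BCS_SWCTRL_DST_Y      (1u << 1)
#define MI_LOAD_REGISTER_IMM  (0x22u << 23)

/* Fill out[0..2] with one MI_LOAD_REGISTER_IMM packet for BCS_SWCTRL. */
static void
build_swctrl_packet(uint32_t out[3], int src_y_tiled, int dst_y_tiled)
{
   uint32_t mask = BCS_SWCTRL_SRC_Y | BCS_SWCTRL_DST_Y;
   uint32_t value = (src_y_tiled ? BCS_SWCTRL_SRC_Y : 0) |
                    (dst_y_tiled ? BCS_SWCTRL_DST_Y : 0);

   out[0] = MI_LOAD_REGISTER_IMM | (3 - 2); /* length = total dwords - 2 */
   out[1] = BCS_SWCTRL;                     /* register offset */
   out[2] = (mask << 16) | value;           /* assumed: high half enables writes */
}
```

A real driver would emit these dwords into the blitter batch before the copy
and presumably restore the register afterwards, since the setting affects
every subsequent blit.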

-- 
Chris Wilson, Intel Open Source Technology Centre
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell
On Wed, 2011-06-29 at 16:16 -0700, Corbin Simpson wrote:
 Okay, so maybe I'm failing to recognize the exact situation here, but
 wouldn't it be possible to mark the FS state with a serial number and
 just compare those? Or are these FS states not CSO-cached?

No, the struct being compared is poorly named and collides with a CSO
entity.  It's really all the state which the compiled fragment shader
will reference when it is later invoked.  It's all packed into a single
struct because it's easier to pass a single parameter to llvm-compiled
shaders and add/change that parameter, but it is somewhat non-orthogonal
and we end up generating too many of them.

Keith



Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
 Ok in fact there's a gcc bug about memcmp:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
 In short gcc's memcmp builtin is totally lame and loses to glibc's
 memcmp (including call overhead, no knowledge about alignment etc.) even
 when comparing only very few bytes (and loses BIG time for lots of bytes
 to compare). Oops. Well at least if the strings are the same (I'd guess
 if the first byte is different it's hard to beat the gcc builtin...).
 So this is really a gcc bug. The bug is quite old though with no fix in
 sight apparently so might need to think about some workaround (but just
 not doing the comparison doesn't look like the right idea, since
 apparently it would be faster with the comparison if gcc's memcmp got
 fixed).

Looking at the struct again (it's been a while), it seems like it could
be rearranged to be variable-sized and on average significantly smaller:

struct lp_rast_state {
   struct lp_jit_context jit_context;
   struct lp_fragment_shader_variant *variant;
};

struct lp_jit_context {
   const float *constants;
   float alpha_ref_value;
   uint32_t stencil_ref_front, stencil_ref_back;
   uint8_t *blend_color;
   struct lp_jit_texture textures[PIPE_MAX_SAMPLERS];
};

If we moved the jit_context part behind variant, we could then take
advantage of the fact that most of those lp_jit_texture structs are not
in use.  That would save time on the memcmp *and* space in the binned data.
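
As a rough sketch of that rearrangement (not the actual llvmpipe layout): put
the fixed-size fields first and the texture array last, then compare only the
prefix that is in use. All names below (rast_state, jit_texture, num_textures,
MAX_SAMPLERS) are simplified stand-ins, and the sketch assumes both structs
were zero-filled before being written so that padding bytes match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SAMPLERS 16   /* stand-in for PIPE_MAX_SAMPLERS */

/* Simplified stand-in for struct lp_jit_texture. */
struct jit_texture {
   uint32_t width, height, depth;
   const void *data;
};

/* Hypothetical layout: fixed-size fields first, texture array last. */
struct rast_state {
   const void *variant;        /* stand-in for the shader variant pointer */
   unsigned num_textures;      /* how many entries the variant references */
   const float *constants;
   float alpha_ref_value;
   uint32_t stencil_ref_front, stencil_ref_back;
   const uint8_t *blend_color;
   struct jit_texture textures[MAX_SAMPLERS];
};

/* Bytes that actually carry state when n textures are in use. */
static size_t
rast_state_used_size(unsigned n)
{
   return offsetof(struct rast_state, textures) +
          n * sizeof(struct jit_texture);
}

/* Returns nonzero if the used prefixes of a and b are identical. */
static int
rast_state_equal(const struct rast_state *a, const struct rast_state *b)
{
   if (a->num_textures != b->num_textures)
      return 0;
   return memcmp(a, b, rast_state_used_size(a->num_textures)) == 0;
}
```

The same rast_state_used_size() value could also be the allocation size when
the state is copied into the bin, which is where the space saving would come
from.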

It's weird this wasn't showing up in past profiling.

Keith




Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Jose Fonseca
Great work Roland! And thanks to Ajax for finding this hot spot.

We use memcmp a lot -- all CSO caching, so we should use this everywhere.

We should also code an SSE2 version with intrinsics for x86-64, which is
guaranteed to always have SSE2.
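
One way such an SSE2 variant might look, as a sketch only: it reports whether
two blocks differ (no ordering), 16 bytes at a time, using unaligned loads.
The name util_differs_sse2 and the multiple-of-16 size requirement are
assumptions made for illustration, not an existing Gallium helper.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>

/* Returns nonzero if the two blocks differ, zero if they are identical.
 * size must be a multiple of 16 bytes; pointers need not be aligned. */
static int
util_differs_sse2(const void *a, const void *b, size_t size)
{
   const char *pa = (const char *)a;
   const char *pb = (const char *)b;
   size_t i;

   for (i = 0; i < size; i += 16) {
      __m128i va = _mm_loadu_si128((const __m128i *)(pa + i));
      __m128i vb = _mm_loadu_si128((const __m128i *)(pb + i));
      /* Equal bytes become 0xff; movemask collects their top bits. */
      __m128i eq = _mm_cmpeq_epi8(va, vb);
      if (_mm_movemask_epi8(eq) != 0xffff)
         return 1;
   }
   return 0;
}
```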

Jose

- Original Message -
 Actually I ran some numbers here and tried out a optimized struct
 compare:
 original ipers: 12.1 fps
 ajax patch: 15.5 fps
 optimized struct compare: 16.8 fps
 
 
 This is the function I used for that (just enabled in that lp_setup
 function):
 
 static INLINE int util_cmp_struct(const void *src1, const void *src2,
                                   unsigned count)
 {
    /* hmm pointer casting is evil */
    const uint32_t *src1_ptr = (const uint32_t *)src1;
    const uint32_t *src2_ptr = (const uint32_t *)src2;
    unsigned i;
    assert(count % 4 == 0);
    for (i = 0; i < count/4; i++) {
       if (*src1_ptr != *src2_ptr) {
          return 1;
       }
       src1_ptr++;
       src2_ptr++;
    }
    return 0;
 }
 
 (And no this doesn't use repz cmpsd here.)
 
 So, unless I made some mistake, memcmp is just dead slow (*), most of
 the slowness probably coming from the bytewise comparison (and
 apparently I was wrong in assuming the comparison there might never
 be
 the same for ipers).
 Of course, the optimized struct compare relies on structs really
 being
 dword aligned (I think this is always the case), and additionally it
 relies on the struct size being a whole multiple of dwords - likely
 struct needs padding to ensure that (at least I don't think this is
 always guaranteed for all structs).
 But since memcmp is used extensively (cso for one) maybe some
 optimization along these lines might be worth it (though of course
 for
 small structs the win isn't going to be as big - and can't beat the
 repz
 cmps in code size...).
 
 Roland
 
 (*) I actually found some references some implementations might be
 better they don't just use repz cmpsb but they split this up in parts
 which do dword (or qword even - well for really large structs could
 use
 sse2) comparisons for the parts where it's possible and only byte
 comparisons for the remaining bytes (and if the compiler does that it
 probably would know the size at compile time here hence could leave
 out
 much of the code). Of course memcmp requires that the return value
 isn't
 just a true or false value, hence there's more code needed once an
 unequal dword is found, though the compiler could optimize that away
 too
 in case it's not needed. Much the same as memcpy is optimized usually
 really, so blame gcc :-).
 
 
 
 Am 29.06.2011 20:33, schrieb Roland Scheidegger:
  Ohh that's interesting, you'd think the comparison shouldn't be
  that
  expensive (though I guess in ipers case the comparison is never
  true).
  memcmp is quite extensively used everywhere. Maybe we could replace
  that
  with something faster (since we only ever care if the blocks are
  the
  same but not care about the lexicographic ordering and always compare
  whole structs, should compare dwords instead of bytes for a 4 time
  speedup)? Or isn't that the reason cmpsb instead of cmpsd is used?
  Also I guess it would help if the values which are more likely to
  be
  unequal are first in the struct (if we can tell that).
  Of course though if it's unlikely to be the same as the compared
  value
  anyway not comparing at all still might be a win (here).
  
  Roland
  
  Am 29.06.2011 19:19, schrieb Adam Jackson:
  Perversely, do this by eliminating the comparison between stored
  and
  current fs state.  On ipers, a perf trace showed
  try_update_scene_state
  using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the
  memcmp.
  Taking that out takes try_update_scene_state down to 6.5% of the
  profile; more importantly, ipers goes from 10 to 14fps and gears
  goes
  from 790 to 830fps.
 
  Signed-off-by: Adam Jackson a...@redhat.com
  ---
   src/gallium/drivers/llvmpipe/lp_setup.c |   61
   ++-
   1 files changed, 27 insertions(+), 34 deletions(-)
 
  diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c
  b/src/gallium/drivers/llvmpipe/lp_setup.c
  index cbe06e5..9118db5 100644
  --- a/src/gallium/drivers/llvmpipe/lp_setup.c
  +++ b/src/gallium/drivers/llvmpipe/lp_setup.c
  @@ -839,42 +839,35 @@ try_update_scene_state( struct
  lp_setup_context *setup )
  setup->dirty |= LP_SETUP_NEW_FS;
  }
   
  -
  if (setup->dirty & LP_SETUP_NEW_FS) {
  -  if (!setup->fs.stored ||
  -  memcmp(setup->fs.stored,
  - setup->fs.current,
  - sizeof setup->fs.current) != 0)
  -  {
  - struct lp_rast_state *stored;
  - uint i;
  -
  - /* The fs state that's been stored in the scene is
  different from
  -  * the new, current state.  So allocate a new
  lp_rast_state object
  -  * and append it to the bin's setup data buffer.
  -  */
  - stored = (struct lp_rast_state *) lp_scene_alloc(scene,
  sizeof *stored);
  - if (!stored) {
  - 

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Jose Fonseca


- Original Message -
 Hmm.
 Forgive my ignorance, but isn't memcmp() on structs pretty prone to
 give
 incorrect != results, given that there may be padding between members
 in
 structs and that IIRC gcc struct assignment is member-wise.

There's no alternative to bitwise comparison in C:

$ cat cmp.c
struct foo {
    int a;
    int b;
};

int cmp(const struct foo *a, const struct foo *b) {
    return *a == *b;
}
$ gcc -c cmp.c
cmp.c: In function ‘cmp’:
cmp.c:7:12: error: invalid operands to binary == (have ‘const struct foo’ and ‘const struct foo’)

 What happens if there's padding between the jit_context and variant
 members of struct lp_rast_state?
 
 I seem to recall hitting similar issues a number of times in the
 past.

I recall that as well, but my memory is the other way around: struct assignment
is considered harmful because it breaks memcmp. Instead all structures should
be initialized with memset(0) first, and always copied with memcpy.  This
should ensure that padding doesn't get clobbered.

But now that you mention it, I'm still not 100% sure whether unused bits in
bitfields are preserved when being assigned. We should probably ensure that
all bits in bitfields are used, using reserved members, so that the zeros
there stay zero.
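
The memset/memcpy discipline described above can be sketched as follows. The
struct and helper names are hypothetical. Note that the C standard leaves
padding bytes unspecified after a store to a member, so this relies on
compilers leaving zeroed padding alone in practice, which is exactly the
behaviour being debated in this thread.

```c
#include <string.h>

struct padded {
   char tag;      /* typically followed by padding bytes */
   int value;
};

/* Zero the whole object first so padding is known, then set fields. */
static void
padded_init(struct padded *p, char tag, int value)
{
   memset(p, 0, sizeof *p);
   p->tag = tag;
   p->value = value;
}

/* Copy with memcpy so the zeroed padding travels with the data. */
static void
padded_copy(struct padded *dst, const struct padded *src)
{
   memcpy(dst, src, sizeof *dst);
}

/* Bitwise comparison, meaningful only if padding was kept consistent. */
static int
padded_equal(const struct padded *a, const struct padded *b)
{
   return memcmp(a, b, sizeof *a) == 0;
}
```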

Jose


 /Thomas
 
 On 06/30/2011 03:36 AM, Roland Scheidegger wrote:
  Ok in fact there's a gcc bug about memcmp:
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
  In short gcc's memcmp builtin is totally lame and loses to glibc's
  memcmp (including call overhead, no knowledge about alignment etc.)
  even
  when comparing only very few bytes (and loses BIG time for lots of
  bytes
  to compare). Oops. Well at least if the strings are the same (I'd
  guess
  if the first byte is different it's hard to beat the gcc
  builtin...).
  So this is really a gcc bug. The bug is quite old though with no
  fix in
  sight apparently so might need to think about some workaround (but
  just
  not doing the comparison doesn't look like the right idea, since
  apparently it would be faster with the comparison if gcc's memcmp
  got
  fixed).
 
 
  Roland
 
 
 
  Am 30.06.2011 01:47, schrieb Roland Scheidegger:
 
  I didn't even look at that was just curious why the memcmp (which
  is
  used a lot in other places) is slow. However, none of the other
  memcmp
  seem to show up prominently (cso functions are quite low in
  profiles,
  _mesa_search_program_cache uses memcmp too but it's not that high
  neither). So I guess those functions either aren't called that
  often or
  the sizes they compare are small.
  So should maybe just file a gcc bug for memcmp and look at that
  particular llvmpipe issue again :-).
 
  Roland
 
  Am 30.06.2011 01:16, schrieb Corbin Simpson:
   
  Okay, so maybe I'm failing to recognize the exact situation here,
  but
  wouldn't it be possible to mark the FS state with a serial number
  and
  just compare those? Or are these FS states not CSO-cached?
 
  ~ C.
 
  On Wed, Jun 29, 2011 at 3:44 PM, Roland
  Scheideggersrol...@vmware.com  wrote:
 
  Actually I ran some numbers here and tried out a optimized
  struct compare:
  original ipers: 12.1 fps
  ajax patch: 15.5 fps
  optimized struct compare: 16.8 fps
 
 
  This is the function I used for that (just enabled in that
  lp_setup
  function):
 
  static INLINE int util_cmp_struct(const void *src1, const void
  *src2,
  unsigned count)
  {
/* hmm pointer casting is evil */
const uint32_t *src1_ptr = (uint32_t *)src1;
const uint32_t *src2_ptr = (uint32_t *)src2;
unsigned i;
assert(count % 4 == 0);
for (i = 0; i < count/4; i++) {
  if (*src1_ptr != *src2_ptr) {
return 1;
  }
  src1_ptr++;
  src2_ptr++;
}
return 0;
  }
 
  (And no this doesn't use repz cmpsd here.)
 
  So, unless I made some mistake, memcmp is just dead slow (*),
  most of
  the slowness probably coming from the bytewise comparison (and
  apparently I was wrong in assuming the comparison there might
  never be
  the same for ipers).
  Of course, the optimized struct compare relies on structs really
  being
  dword aligned (I think this is always the case), and
  additionally it
  relies on the struct size being a whole multiple of dwords -
  likely
  struct needs padding to ensure that (at least I don't think this
  is
  always guaranteed for all structs).
  But since memcmp is used extensively (cso for one) maybe some
  optimization along these lines might be worth it (though of
  course for
  small structs the win isn't going to be as big - and can't beat
  the repz
  cmps in code size...).
 
  Roland
 
  (*) I actually found some references some implementations might
  be
  better they don't just use repz cmpsb but they split this up in
  parts
  which do dword (or qword even - well for really large structs
  could use
  sse2) comparisons for the parts where it's possible and only
  byte
  comparisons for the remaining 

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Jose Fonseca


- Original Message -
 On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
  Ok in fact there's a gcc bug about memcmp:
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
  In short gcc's memcmp builtin is totally lame and loses to glibc's
  memcmp (including call overhead, no knowledge about alignment etc.)
  even
  when comparing only very few bytes (and loses BIG time for lots of
  bytes
  to compare). Oops. Well at least if the strings are the same (I'd
  guess
  if the first byte is different it's hard to beat the gcc
  builtin...).
  So this is really a gcc bug. The bug is quite old though with no
  fix in
  sight apparently so might need to think about some workaround (but
  just
  not doing the comparison doesn't look like the right idea, since
  apparently it would be faster with the comparison if gcc's memcmp
  got
  fixed).
 
 Looking at the struct again (it's been a while), it seems like it
 could
 be rearranged to be variable-sized and on average significantly
 smaller:
 
 struct lp_rast_state {
struct lp_jit_context jit_context;
struct lp_fragment_shader_variant *variant;
 };
 
 struct lp_jit_context {
const float *constants;
float alpha_ref_value;
uint32_t stencil_ref_front, stencil_ref_back;
uint8_t *blend_color;
struct lp_jit_texture textures[PIPE_MAX_SAMPLERS];
 };
 
 If we moved the jit_context part behind variant, and then hopefully
 note that most of those lp_jit_texture structs are not in use.  That
 would save time on the memcmp *and* space in the binned data.

Yeah, sounds like a good idea.

But there's some subtlety to computing the number of textures: it can't be
just the NULL textures, because they may be referred to by the JIT code, which
has no NULL checks and relies on the state setup to provide storage for all
textures, or dummy memory if one is not bound.

I think a better idea would be:
- split the texture/sampler state
- to make the lp_jit_context::textures an array of pointers, and put the struct 
lp_jit_texture in the pipe_texture object themselves
- to make the lp_jit_context::samplers an array of pointers, and put the struct 
lp_jit_sampler in the pipe_sampler_state CSO

struct lp_jit_context {
struct lp_jit_texture *textures[PIPE_MAX_SAMPLERS];
struct lp_jit_sampler *samplers[PIPE_MAX_SAMPLERS];
};

struct lp_jit_texture
{
   uint32_t width;
   uint32_t height;
   uint32_t depth;
   uint32_t first_level;
   uint32_t last_level;
   uint32_t row_stride[LP_MAX_TEXTURE_LEVELS];
   uint32_t img_stride[LP_MAX_TEXTURE_LEVELS];
   const void *data[LP_MAX_TEXTURE_LEVELS];
   /* sampler state, actually */
   float min_lod;
   float max_lod;
   float lod_bias;
   float border_color[4];
};

struct lp_jit_sampler
{
   float min_lod;
   float max_lod;
   float lod_bias;
   float border_color[4];
};


Jose


Re: [Mesa-dev] [PATCH 1/3] intel: Recognize GL_DEPTH_COMPONENT32 in get_teximage_readbuffer.

2011-06-30 Thread Brian Paul
On Thu, Jun 30, 2011 at 12:04 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 gl_texture_image::InternalFormat is actually the user requested internal
 format, not what the texture actually is.  Thus, even though we don't
 support 32-bit depth buffers, we need to recognize the enumeration here.
 Otherwise, it wrongly returns the color read buffer instead of the depth
 read buffer.

 Fixes an issue in PlaneShift 0.5.7 when casting spells.  The game calls
 CopyTexSubImage2D on buffers with a GL_DEPTH_COMPONENT32 internal
 format, which (prior to this patch) resulted in an attempt to copy an
 ARGB to S8_Z24.  This patch fixes the behavior, but does not yet
 eliminate the software fallback.

 NOTE: This is a candidate for the 7.10 and 7.11 branches.

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 ---
  src/mesa/drivers/dri/intel/intel_tex_copy.c |    1 +
  1 files changed, 1 insertions(+), 0 deletions(-)

 I kind of wonder if we should just be using TexFormat (the actual format)
 rather than InternalFormat (the user requested format).

 diff --git a/src/mesa/drivers/dri/intel/intel_tex_copy.c 
 b/src/mesa/drivers/dri/intel/intel_tex_copy.c
 index eda07a4..8b5c3f0 100644
 --- a/src/mesa/drivers/dri/intel/intel_tex_copy.c
 +++ b/src/mesa/drivers/dri/intel/intel_tex_copy.c
 @@ -58,6 +58,7 @@ get_teximage_readbuffer(struct intel_context *intel, GLenum 
 internalFormat)
    switch (internalFormat) {
    case GL_DEPTH_COMPONENT:
    case GL_DEPTH_COMPONENT16:
 +   case GL_DEPTH_COMPONENT32:
    case GL_DEPTH24_STENCIL8_EXT:
    case GL_DEPTH_STENCIL_EXT:
        return intel_get_renderbuffer(intel->ctx.ReadBuffer, BUFFER_DEPTH);

In the interest of covering all current and future depth formats, you
could replace the switch with a call to _mesa_is_depth_format() ||
_mesa_is_depthstencil_format().  Or don't use internalFormat at all:
query _mesa_get_format_bits(texImage->TexFormat, GL_DEPTH_BITS) > 0.

In fact, any place where we're doing a switch on a texture/image
format we should look if the job can be done better with a call to a
format predicate function.
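
To illustrate the predicate idea in isolation, here is a standalone sketch. It
does not use Mesa's helpers; format_has_depth and pick_read_buffer are
hypothetical names, and the GL enum values are defined locally so the snippet
compiles without GL headers.

```c
#include <stdbool.h>

/* GL enum values, defined locally to keep the sketch standalone. */
#define GL_RGBA                  0x1908
#define GL_DEPTH_COMPONENT       0x1902
#define GL_DEPTH_COMPONENT16     0x81A5
#define GL_DEPTH_COMPONENT24     0x81A6
#define GL_DEPTH_COMPONENT32     0x81A7
#define GL_DEPTH_STENCIL         0x84F9
#define GL_DEPTH24_STENCIL8      0x88F0

/* Hypothetical predicate: one place that knows every depth format. */
static bool
format_has_depth(unsigned internal_format)
{
   switch (internal_format) {
   case GL_DEPTH_COMPONENT:
   case GL_DEPTH_COMPONENT16:
   case GL_DEPTH_COMPONENT24:
   case GL_DEPTH_COMPONENT32:
   case GL_DEPTH_STENCIL:
   case GL_DEPTH24_STENCIL8:
      return true;
   default:
      return false;
   }
}

enum read_buffer { READ_BUFFER_COLOR, READ_BUFFER_DEPTH };

/* Callers ask the predicate instead of carrying their own switch. */
static enum read_buffer
pick_read_buffer(unsigned internal_format)
{
   return format_has_depth(internal_format) ? READ_BUFFER_DEPTH
                                            : READ_BUFFER_COLOR;
}
```

The point of the design is that a newly added depth format only has to be
taught to the predicate once, rather than to every switch in every driver.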

-Brian


[Mesa-dev] [Bug 37177] Mathematica Plot3D Crash

2011-06-30 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=37177

--- Comment #3 from Mark van Rossum mvanr...@inf.ed.ac.uk 2011-06-30 05:34:57 
PDT ---
Confirmed for me on Fedora 15 with Mathematica 7
on VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated
Graphics Controller (rev 0c).



Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell
On Thu, 2011-06-30 at 03:27 -0700, Jose Fonseca wrote:
 
 - Original Message -
  On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
   Ok in fact there's a gcc bug about memcmp:
   http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
   In short gcc's memcmp builtin is totally lame and loses to glibc's
   memcmp (including call overhead, no knowledge about alignment etc.)
   even
   when comparing only very few bytes (and loses BIG time for lots of
   bytes
   to compare). Oops. Well at least if the strings are the same (I'd
   guess
   if the first byte is different it's hard to beat the gcc
   builtin...).
   So this is really a gcc bug. The bug is quite old though with no
   fix in
   sight apparently so might need to think about some workaround (but
   just
   not doing the comparison doesn't look like the right idea, since
   apparently it would be faster with the comparison if gcc's memcmp
   got
   fixed).
  
  Looking at the struct again (it's been a while), it seems like it
  could
  be rearranged to be variable-sized and on average significantly
  smaller:
  
  struct lp_rast_state {
 struct lp_jit_context jit_context;
 struct lp_fragment_shader_variant *variant;
  };
  
  struct lp_jit_context {
 const float *constants;
 float alpha_ref_value;
 uint32_t stencil_ref_front, stencil_ref_back;
 uint8_t *blend_color;
 struct lp_jit_texture textures[PIPE_MAX_SAMPLERS];
  };
  
  If we moved the jit_context part behind variant, and then hopefully
  note that most of those lp_jit_texture structs are not in use.  That
  would save time on the memcmp *and* space in the binned data.
 
 Yeah, sounds a good idea.
 
 But there's some subtletly to computing the number of textures: it
  can't be just the NULL textures, because they may be reffered by the
  JIT code, which has no NULL checks and  relies on the state setup to
  provide storage for all textures, or dummy memory if one is not bound.

So it's a property of the variant, right?  We should just store that
information when we generate the llvm variant.

 I think a better idea would be:
 - split the texture/sampler state
 - to make the lp_jit_context::textures an array of pointers, and put the 
 struct lp_jit_texture in the pipe_texture object themselves
 - to make the lp_jit_context::samplers an array of pointers, and put the 
 struct lp_jit_sampler in the pipe_sampler_state CSO

I like this too - it's somewhat more involved of course.

In fact the two are orthogonal -- the struct below can still be shrunk
significantly by knowing how many samplers & textures the variant refers
to.  Interleaving them or packing them would reduce the bytes to be
compared.

Alternatively there could be just a pointer in jit_context to
textures/samplers binned elsewhere.

 struct lp_jit_context {
 struct lp_jit_texture *textures[PIPE_MAX_SAMPLERS];
 struct lp_jit_sampler *samplers[PIPE_MAX_SAMPLERS];
 };

The jit context above seems to have lost some of its fields...

The next step might be to split the context into four parts: textures,
samplers, constants, other, and have jit_context just be a set of
pointers into the binned data:

struct lp_jit_context {
   struct lp_jit_texture **textures;
   struct lp_jit_sampler **samplers;
   const float *constants;
   const struct lp_jit_other *other;
};

struct lp_jit_other {
   float alpha_ref_value;
   uint32_t stencil_ref_front;
   uint32_t stencil_ref_back;
   uint8_t *blend_color;
};

 struct lp_jit_texture
 {
uint32_t width;
uint32_t height;
uint32_t depth;
uint32_t first_level;
uint32_t last_level;
uint32_t row_stride[LP_MAX_TEXTURE_LEVELS];
uint32_t img_stride[LP_MAX_TEXTURE_LEVELS];
const void *data[LP_MAX_TEXTURE_LEVELS];
/* sampler state, actually */
float min_lod;
float max_lod;
float lod_bias;
float border_color[4];
 };
 
 struct lp_jit_sampler
 {
float min_lod;
float max_lod;
float lod_bias;
float border_color[4];
 };
 
 
 Jose




Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Adam Jackson
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
 Ok in fact there's a gcc bug about memcmp:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
 In short gcc's memcmp builtin is totally lame and loses to glibc's
 memcmp (including call overhead, no knowledge about alignment etc.) even
 when comparing only very few bytes (and loses BIG time for lots of bytes
 to compare). Oops. Well at least if the strings are the same (I'd guess
 if the first byte is different it's hard to beat the gcc builtin...).
 So this is really a gcc bug. The bug is quite old though with no fix in
 sight apparently so might need to think about some workaround (but just
 not doing the comparison doesn't look like the right idea, since
 apparently it would be faster with the comparison if gcc's memcmp got
 fixed).

How do things fare if you build with -fno-builtin-memcmp?

- ajax




Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Roland Scheidegger
Am 30.06.2011 12:14, schrieb Jose Fonseca:
 
 
 - Original Message -
 Hmm.
 Forgive my ignorance, but isn't memcmp() on structs pretty prone to
 give
 incorrect != results, given that there may be padding between members
 in
 structs and that IIRC gcc struct assignment is member-wise.
 
 There's no alternative to bitwise comparison on C:
 
 $ cat cmp.c 
 struct foo {
   int a;
   int b;
 };
 
 
 int cmp(const struct foo *a, const struct foo *b) {
   return *a == *b;
 }
 $ gcc -c cmp.c 
 cmp.c: In function ‘cmp’:
 cmp.c:7:12: error: invalid operands to binary == (have ‘const struct foo’ and 
 ‘const struct foo’)
 
 What happens if there's padding between the jit_context and variant
 members of struct lp_rast_state?

 I seem to recall hitting similar issues a number of times in the
 past.
 
 I recall that as well, but my memory is the other way around: struct 
 assignment is considerer harmful because it breaks memcmp. Instead all 
 structures should be initialized with memset(0) first, and always copied with 
 memcpy.  This should ensure that padding doesn't get clobbered.
 
 But now that you mention it, I'm not still 100% that unsed bits on bitfields 
 are preserved like that or not, when being assigned. We probably should 
 ensure that the  all bits  in bitfields are used, using reserved members, 
 so that the zeros there stay zero.

We've definitely hit issues like that in the past - I think if you use
struct assignment you'll need to initialize the dst struct to 0
initially (but only once - even though the padding is probably undefined
after such an assignment, any implementation should either copy
everything including the padding or leave padding alone).
I don't think anything else will touch the unused parts, though I guess
it might be possible for instance if a 32bit int is assigned to a 16 bit
bitfield which has padding after it.
But generally using memcmp/memcpy should work ok, and it gives the
compiler all the information it needs to do it fast. Well if it uses it
or not is another question... I think the problem with gcc is that when
it inserts the comparisons with repz cmpsb it knows alignment and size
to copy but doesn't know the result is only used as a strict comparison
- that makes it impossible to generate really optimized code there (as
you need some byte comparison to get correct memcmp semantics unless you
use bswap or do dword comparison then byte comparison on a non-match,
both of which are probably slower if you expect the comparison to fail
on first byte which is the case for instance in substring searches) and
later it can't optimize that into something more sensible. So this might
not be trivial to fix in gcc. Too bad no builtin is available which only
returns true/false...
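
For illustration, an equality-only comparison of the kind wished for above
could look like the sketch below. Because it never has to report ordering, it
can compare whole 64-bit words and fall back to bytes only for the tail. The
name util_mem_equal is hypothetical; the memcpy loads are there to sidestep
alignment and aliasing concerns and are normally compiled to plain loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero if the blocks are identical. Only equality is
 * reported, never ordering, so wide words can be compared directly. */
static int
util_mem_equal(const void *a, const void *b, size_t size)
{
   const unsigned char *pa = a, *pb = b;

   while (size >= sizeof(uint64_t)) {
      uint64_t wa, wb;
      memcpy(&wa, pa, sizeof wa);
      memcpy(&wb, pb, sizeof wb);
      if (wa != wb)
         return 0;
      pa += sizeof wa;
      pb += sizeof wb;
      size -= sizeof wa;
   }
   /* Remaining tail, fewer than 8 bytes. */
   while (size--) {
      if (*pa++ != *pb++)
         return 0;
   }
   return 1;
}
```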

Roland



[Mesa-dev] [Bug 37177] Mathematica Plot3D Crash

2011-06-30 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=37177

--- Comment #4 from Ivan Iakoupov vox...@gmail.com 2011-06-30 07:43:07 PDT ---
I did a bisect and it worked prior to this commit:

commit dea5e57861ec998cb7ee913a8819752cb9fa946b
Author: Eric Anholt
Date:   Mon Feb 14 18:57:49 2011 -0800

intel: Use the current context rather than last bound context for a
drawable.

If another thread bound a context to the drawable then unbound it, the
driContextPriv would end up NULL.

With the previous two fixes, this fixes glx-multithread-makecurrent-2,
despite the issue not being about the multithreaded makecurrent.

Currently I also get a full gpu freeze when opening a document with a Plot3D
with mesa master so this bug gets hidden by that one. I'll try bisecting that
freeze and file a new bug report.



Re: [Mesa-dev] [PATCH] r600g: fix check for empty cs

2011-06-30 Thread Vadim Girlin
On Wed, 2011-06-29 at 16:29 +0400, Vadim Girlin wrote:
 ---
 
 There is ~20% fps boost with the etqw timenetdemo after this patch.

Btw, it seems that bad performance without that patch is caused by the
occlusion queries, so probably they need some fixes too. I've added some
counters to check it and found that there are thousands of redundant
calls to r600_context_flush per second (or ~97% of all flushes), which
are skipped with that patch. And there are no such flushes at all when I
set r_useOcclusionQueries to 0 in the etqw settings. Default for
r_useOcclusionQueries is 1 for me, but it seems the defaults may be
different for others.

There are probably unneeded flushes, for example in
r600_get_query_result. It calls ctx->flush, then
r600_context_query_result, which also calls r600_context_flush. Then it
calls r600_query_result -> r600_bo_map, which is one more possible flush.
Are all of these flush calls really needed?
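
A toy model of the effect being measured, not the r600g code: a flush that
returns early when nothing beyond the fixed preamble has been queued, with
counters like the ones mentioned above. Every name here (cs_context, cdwords,
init_dwords, cs_flush) is hypothetical.

```c
/* Hypothetical miniature of a command-stream context. */
struct cs_context {
   unsigned cdwords;      /* dwords currently queued */
   unsigned init_dwords;  /* fixed preamble emitted after each flush */
   unsigned flushes;      /* flushes that were actually submitted */
   unsigned skipped;      /* redundant flush requests */
};

static void
cs_flush(struct cs_context *ctx)
{
   /* Only the preamble is queued: submitting would be wasted work. */
   if (ctx->cdwords == ctx->init_dwords) {
      ctx->skipped++;
      return;
   }
   ctx->flushes++;
   /* Pretend the buffer was submitted and the preamble re-emitted. */
   ctx->cdwords = ctx->init_dwords;
}
```

With a chain of back-to-back flush requests like the one in the query path,
only the first would reach the kernel in this model; the rest would land in
the skipped counter.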
 
Vadim





Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Roland Scheidegger
Am 30.06.2011 16:14, schrieb Adam Jackson:
 On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
 Ok in fact there's a gcc bug about memcmp:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
 In short gcc's memcmp builtin is totally lame and loses to glibc's
 memcmp (including call overhead, no knowledge about alignment etc.) even
 when comparing only very few bytes (and loses BIG time for lots of bytes
 to compare). Oops. Well at least if the strings are the same (I'd guess
 if the first byte is different it's hard to beat the gcc builtin...).
 So this is really a gcc bug. The bug is quite old though with no fix in
 sight apparently so might need to think about some workaround (but just
 not doing the comparison doesn't look like the right idea, since
 apparently it would be faster with the comparison if gcc's memcmp got
 fixed).
 
 How do things fare if you build with -fno-builtin-memcmp?

This is even faster:
original ipers: 12.1 fps
ajax patch: 15.5 fps
optimized struct compare: 16.8 fps
-fno-builtin-memcmp: 18.1 fps

Looks like we have a winner :-) I guess glibc optimizes the hell out of
it (in contrast to the other results, this affected all memcmp though I
don't know if any others benefited from that on average).
As noted by Keith though the struct we compare is really large (over 4k)
so trimming the size might be a good idea anyway (of course the 4k size
also meant any call overhead and non-optimal code due to glibc not
knowing alignment beforehand and usage of return value is completely
insignificant).
A 50% improvement from disabling a compiler optimization, lol.

Roland


Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell
On Thu, 2011-06-30 at 17:53 +0200, Roland Scheidegger wrote:
 Am 30.06.2011 16:14, schrieb Adam Jackson:
  On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote:
  Ok in fact there's a gcc bug about memcmp:
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
  In short gcc's memcmp builtin is totally lame and loses to glibc's
  memcmp (including call overhead, no knowledge about alignment etc.) even
  when comparing only very few bytes (and loses BIG time for lots of bytes
  to compare). Oops. Well at least if the strings are the same (I'd guess
  if the first byte is different it's hard to beat the gcc builtin...).
  So this is really a gcc bug. The bug is quite old though with no fix in
  sight apparently so might need to think about some workaround (but just
  not doing the comparison doesn't look like the right idea, since
  apparently it would be faster with the comparison if gcc's memcmp got
  fixed).
  
  How do things fare if you build with -fno-builtin-memcmp?
 
 This is even faster:
 original ipers: 12.1 fps
 ajax patch: 15.5 fps
 optimized struct compare: 16.8 fps
 -fno-builtin-memcmp: 18.1 fps
 
 Looks like we have a winner :-) I guess glibc optimizes the hell out of
 it (in contrast to the other results, this affected all memcmp though I
 don't know if any others benefited from that on average).
 As noted by Keith though the struct we compare is really large (over 4k)
 so trimming the size might be a good idea anyway (of course the 4k size
 also meant any call overhead and non-optimal code due to glibc not
 knowing alignment beforehand and usage of return value is completely
 insignificant).
 A 50% improvement from disabling a compiler optimization, lol.

We probably want this everywhere throughout Mesa & Gallium...

Keith



Re: [Mesa-dev] [PATCH 1/3] intel: Recognize GL_DEPTH_COMPONENT32 in get_teximage_readbuffer.

2011-06-30 Thread Eric Anholt
On Thu, 30 Jun 2011 06:28:13 -0600, Brian Paul brian.e.p...@gmail.com wrote:
 On Thu, Jun 30, 2011 at 12:04 AM, Kenneth Graunke kenn...@whitecape.org 
 wrote:
  gl_texture_image::InternalFormat is actually the user requested internal
  format, not what the texture actually is.  Thus, even though we don't
  support 32-bit depth buffers, we need to recognize the enumeration here.
  Otherwise, it wrongly returns the color read buffer instead of the depth
  read buffer.
 
  Fixes an issue in PlaneShift 0.5.7 when casting spells.  The game calls
  CopyTexSubImage2D on buffers with a GL_DEPTH_COMPONENT32 internal
  format, which (prior to this patch) resulted in an attempt to copy an
  ARGB to S8_Z24.  This patch fixes the behavior, but does not yet
  eliminate the software fallback.
 
  NOTE: This is a candidate for the 7.10 and 7.11 branches.
 
  Signed-off-by: Kenneth Graunke kenn...@whitecape.org
  ---
   src/mesa/drivers/dri/intel/intel_tex_copy.c |    1 +
   1 files changed, 1 insertions(+), 0 deletions(-)
 
  I kind of wonder if we should just be using TexFormat (the actual format)
  rather than InternalFormat (the user requested format).
 
  diff --git a/src/mesa/drivers/dri/intel/intel_tex_copy.c 
  b/src/mesa/drivers/dri/intel/intel_tex_copy.c
  index eda07a4..8b5c3f0 100644
  --- a/src/mesa/drivers/dri/intel/intel_tex_copy.c
  +++ b/src/mesa/drivers/dri/intel/intel_tex_copy.c
  @@ -58,6 +58,7 @@ get_teximage_readbuffer(struct intel_context *intel, 
  GLenum internalFormat)
     switch (internalFormat) {
     case GL_DEPTH_COMPONENT:
     case GL_DEPTH_COMPONENT16:
  +   case GL_DEPTH_COMPONENT32:
     case GL_DEPTH24_STENCIL8_EXT:
     case GL_DEPTH_STENCIL_EXT:
         return intel_get_renderbuffer(intel->ctx.ReadBuffer, BUFFER_DEPTH);
 
 In the interest of covering all current and future depth formats, you
 could replace the switch with a call to _mesa_is_depth_format() ||
 _mesa_is_depthstencil_format().  Or don't use internalFormat at all-
 query _mesa_get_format_bits(texImage->TexFormat, GL_DEPTH_BITS) > 0.

internalFormat in this case is supposed to determine what is copied, so
we have to look at it, not the (current, if any) texture format.  So, I
think _mesa_is_*_format() are going to be the right way to go.
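
A rough sketch of that predicate-based selection, as a standalone illustration rather than driver code. Everything prefixed `sk_`/`SK_` is a hypothetical stand-in: the predicates only mimic what _mesa_is_depth_format() and _mesa_is_depthstencil_format() are expected to do, and the enum values are copied from the GL headers so the sketch builds without GL installed.

```c
#include <assert.h>
#include <stdbool.h>

/* GL enum values, renamed so the sketch compiles on its own. */
#define SK_GL_DEPTH_COMPONENT    0x1902
#define SK_GL_RGBA               0x1908
#define SK_GL_DEPTH_COMPONENT16  0x81A5
#define SK_GL_DEPTH_COMPONENT24  0x81A6
#define SK_GL_DEPTH_COMPONENT32  0x81A7
#define SK_GL_DEPTH_STENCIL      0x84F9
#define SK_GL_DEPTH24_STENCIL8   0x88F0

enum sk_buffer { SK_BUFFER_COLOR, SK_BUFFER_DEPTH };

/* Stand-in for a "is this a depth format?" predicate. */
static bool
sk_is_depth_format(unsigned format)
{
   switch (format) {
   case SK_GL_DEPTH_COMPONENT:
   case SK_GL_DEPTH_COMPONENT16:
   case SK_GL_DEPTH_COMPONENT24:
   case SK_GL_DEPTH_COMPONENT32:
      return true;
   default:
      return false;
   }
}

/* Stand-in for a "is this a packed depth/stencil format?" predicate. */
static bool
sk_is_depthstencil_format(unsigned format)
{
   return format == SK_GL_DEPTH_STENCIL ||
          format == SK_GL_DEPTH24_STENCIL8;
}

/* Choose the read buffer from the user-requested internal format,
 * without enumerating every depth enum at the call site. */
static enum sk_buffer
sk_pick_read_buffer(unsigned internal_format)
{
   if (sk_is_depth_format(internal_format) ||
       sk_is_depthstencil_format(internal_format))
      return SK_BUFFER_DEPTH;
   return SK_BUFFER_COLOR;
}
```

The point of the design is that a newly added depth enum only has to be taught to the predicates once, instead of to every switch that cares.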

Also, we've got separate stencil issues in this path we need to look
into.




[Mesa-dev] [Bug 38842] New: Various valid GLX attributes are rejected by MESA glxChooseFBConfig

2011-06-30 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=38842

   Summary: Various valid GLX attributes are rejected by MESA
glxChooseFBConfig
   Product: Mesa
   Version: 7.10
  Platform: x86 (IA32)
OS/Version: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: Drivers/X11
AssignedTo: mesa-dev@lists.freedesktop.org
ReportedBy: jonathan.kirk...@arm.com


The following configuration attributes, which are listed as valid in the GLX
specification, are rejected by the Mesa implementation of glXChooseFBConfig:

GLX_MAX_PBUFFER_WIDTH
GLX_MAX_PBUFFER_HEIGHT
GLX_MAX_PBUFFER_PIXELS
GLX_VISUAL_ID
GLX_X_VISUAL_TYPE

The rejection occurs within the choose_visual function in fakeglx.c (having
been called from glXChooseFBConfig).



[Mesa-dev] mesa master: commit 2699fce0d69db5158427c8b6c8194b2eefc5e58b

2011-06-30 Thread Gustaw Smolarczyk
Commit 2699fce0d69db5158427c8b6c8194b2eefc5e58b:
The first chunk (/common.py) looks really strange.


[Mesa-dev] [PATCH 1/2] st/mesa: use the first non-VOID channel in st_format_datatype

2011-06-30 Thread Marek Olšák
Otherwise PIPE_FORMAT_X8B8G8R8_UNORM and friends would fail.

NOTE: This is a candidate for the 7.10 and 7.11 branches.
---
 src/mesa/state_tracker/st_format.c |   19 ---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/src/mesa/state_tracker/st_format.c 
b/src/mesa/state_tracker/st_format.c
index fa5d8f5..3260297 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -68,10 +68,18 @@ GLenum
 st_format_datatype(enum pipe_format format)
 {
const struct util_format_description *desc;
+   int i;
 
desc = util_format_description(format);
assert(desc);
 
+   /* Find the first non-VOID channel. */
+   for (i = 0; i < 4; i++) {
+   if (desc->channel[i].type != UTIL_FORMAT_TYPE_VOID) {
+   break;
+   }
+   }
+
if (desc->layout == UTIL_FORMAT_LAYOUT_PLAIN) {
   if (format == PIPE_FORMAT_B5G5R5A1_UNORM ||
   format == PIPE_FORMAT_B5G6R5_UNORM) {
@@ -85,21 +93,26 @@ st_format_datatype(enum pipe_format format)
   }
   else {
  const GLuint size = format_max_bits(format);
+
+ assert(i < 4);
+ if (i == 4)
+return GL_NONE;
+
  if (size == 8) {
-if (desc->channel[0].type == UTIL_FORMAT_TYPE_UNSIGNED)
+if (desc->channel[i].type == UTIL_FORMAT_TYPE_UNSIGNED)
return GL_UNSIGNED_BYTE;
 else
return GL_BYTE;
  }
  else if (size == 16) {
-if (desc->channel[0].type == UTIL_FORMAT_TYPE_UNSIGNED)
+if (desc->channel[i].type == UTIL_FORMAT_TYPE_UNSIGNED)
return GL_UNSIGNED_SHORT;
 else
return GL_SHORT;
  }
  else {
 assert( size <= 32 );
-if (desc->channel[0].type == UTIL_FORMAT_TYPE_UNSIGNED)
+if (desc->channel[i].type == UTIL_FORMAT_TYPE_UNSIGNED)
return GL_UNSIGNED_INT;
 else
return GL_INT;
-- 
1.7.4.1
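
The channel search added by this patch can be tried in isolation with the sketch below. The `sk_` types are hypothetical stand-ins for the util_format description structures, and the test format only imitates the X8B8G8R8 case from the commit message (a padding channel first, real channels after).

```c
#include <assert.h>

/* Hypothetical stand-ins for the channel description used in the patch. */
enum sk_type { SK_TYPE_VOID, SK_TYPE_UNSIGNED, SK_TYPE_SIGNED, SK_TYPE_FLOAT };

struct sk_channel {
   enum sk_type type;
   unsigned size;   /* bits */
};

struct sk_format_desc {
   struct sk_channel channel[4];
};

/* Returns the index of the first channel that carries data,
 * or 4 when every channel is VOID. */
static int
sk_first_non_void_channel(const struct sk_format_desc *desc)
{
   int i;

   for (i = 0; i < 4; i++) {
      if (desc->channel[i].type != SK_TYPE_VOID)
         break;
   }
   return i;
}
```

With an X8-first layout the function returns 1, so a later type check looks at a real channel instead of the padding in channel 0.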



[Mesa-dev] [PATCH 2/2] st/mesa: handle float formats in st_format_datatype

2011-06-30 Thread Marek Olšák
NOTE: This is a candidate for the 7.11 branch.
---
 src/mesa/state_tracker/st_format.c |   16 ++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_format.c 
b/src/mesa/state_tracker/st_format.c
index 3260297..d1995f1 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -85,6 +85,10 @@ st_format_datatype(enum pipe_format format)
   format == PIPE_FORMAT_B5G6R5_UNORM) {
  return GL_UNSIGNED_SHORT;
   }
+  else if (format == PIPE_FORMAT_R11G11B10_FLOAT ||
+   format == PIPE_FORMAT_R9G9B9E5_FLOAT) {
+ return GL_FLOAT;
+  }
   else if (format == PIPE_FORMAT_Z24_UNORM_S8_USCALED ||
format == PIPE_FORMAT_S8_USCALED_Z24_UNORM ||
format == PIPE_FORMAT_Z24X8_UNORM ||
@@ -105,18 +109,26 @@ st_format_datatype(enum pipe_format format)
return GL_BYTE;
  }
  else if (size == 16) {
+if (desc->channel[i].type == UTIL_FORMAT_TYPE_FLOAT)
+   return GL_HALF_FLOAT;
 if (desc->channel[i].type == UTIL_FORMAT_TYPE_UNSIGNED)
return GL_UNSIGNED_SHORT;
 else
return GL_SHORT;
  }
- else {
-assert( size <= 32 );
+ else if (size <= 32) {
+if (desc->channel[i].type == UTIL_FORMAT_TYPE_FLOAT)
+   return GL_FLOAT;
 if (desc->channel[i].type == UTIL_FORMAT_TYPE_UNSIGNED)
return GL_UNSIGNED_INT;
 else
return GL_INT;
  }
+ else {
+assert(size == 64);
+assert(desc->channel[i].type == UTIL_FORMAT_TYPE_FLOAT);
+return GL_DOUBLE;
+ }
   }
}
else if (format == PIPE_FORMAT_UYVY) {
-- 
1.7.4.1
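
The size/type ladder after this patch can be summarised by the sketch below. The `sk_` enums are hypothetical stand-ins for the UTIL_FORMAT_TYPE_* and GL datatype enums, and returning SK_DT_NONE for unexpected 64-bit types is an assumption of the sketch (the patch asserts there instead).

```c
#include <assert.h>

/* Hypothetical stand-ins for channel types and GL datatypes. */
enum sk_chan_type {
   SK_CHAN_VOID, SK_CHAN_UNSIGNED, SK_CHAN_SIGNED, SK_CHAN_FLOAT
};

enum sk_datatype {
   SK_DT_NONE,
   SK_DT_BYTE, SK_DT_UNSIGNED_BYTE,
   SK_DT_SHORT, SK_DT_UNSIGNED_SHORT, SK_DT_HALF_FLOAT,
   SK_DT_INT, SK_DT_UNSIGNED_INT, SK_DT_FLOAT,
   SK_DT_DOUBLE
};

/* Map (channel type, largest channel size in bits) to a datatype,
 * following the same order of checks as the patched function. */
static enum sk_datatype
sk_datatype_for(enum sk_chan_type type, unsigned size)
{
   if (size == 8)
      return type == SK_CHAN_UNSIGNED ? SK_DT_UNSIGNED_BYTE : SK_DT_BYTE;

   if (size == 16) {
      if (type == SK_CHAN_FLOAT)
         return SK_DT_HALF_FLOAT;
      return type == SK_CHAN_UNSIGNED ? SK_DT_UNSIGNED_SHORT : SK_DT_SHORT;
   }

   if (size <= 32) {
      if (type == SK_CHAN_FLOAT)
         return SK_DT_FLOAT;
      return type == SK_CHAN_UNSIGNED ? SK_DT_UNSIGNED_INT : SK_DT_INT;
   }

   /* Only 64-bit floats are expected past this point. */
   return (size == 64 && type == SK_CHAN_FLOAT) ? SK_DT_DOUBLE : SK_DT_NONE;
}
```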



[Mesa-dev] [PATCH] i965/gen7: Remove gratuitous dirty flags from WM and PS state.

2011-06-30 Thread Kenneth Graunke
Commit b46dc45ceef3deb17ba2b0b4300eeb93e9cf7833 claimed that
NEW_POLYGONSTIPPLE is gratuitous, but somehow just changed comments
and whitespace instead of actually removing the flag.

While we're at it, 3DSTATE_PS doesn't appear to need NEW_LINE or
NEW_POLYGON either (those are in 3DSTATE_WM).

Cc: Eric Anholt e...@anholt.net
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/gen7_wm_state.c |7 ++-
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c 
b/src/mesa/drivers/dri/i965/gen7_wm_state.c
index 17f7535..0688963 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
@@ -138,7 +138,7 @@ upload_wm_state(struct brw_context *brw)
 
 const struct brw_tracked_state gen7_wm_state = {
.dirty = {
-  .mesa  = (_NEW_LINE | _NEW_POLYGON | _NEW_POLYGONSTIPPLE |
+  .mesa  = (_NEW_LINE | _NEW_POLYGON |
_NEW_COLOR | _NEW_BUFFERS),
   .brw   = (BRW_NEW_CURBE_OFFSETS |
BRW_NEW_FRAGMENT_PROGRAM |
@@ -240,10 +240,7 @@ upload_ps_state(struct brw_context *brw)
 
 const struct brw_tracked_state gen7_ps_state = {
.dirty = {
-  .mesa  = (_NEW_LINE |
-   _NEW_POLYGON |
-   _NEW_POLYGONSTIPPLE |
-   _NEW_PROGRAM_CONSTANTS),
+  .mesa  = _NEW_PROGRAM_CONSTANTS,
   .brw   = (BRW_NEW_CURBE_OFFSETS |
BRW_NEW_FRAGMENT_PROGRAM |
 BRW_NEW_NR_WM_SURFACES |
-- 
1.7.4.4



[Mesa-dev] [PATCH 00/13] Floating-point depth buffers

2011-06-30 Thread Marek Olšák
Hi,

this patch series implements ARB_depth_buffer_float in Mesa and Gallium. There 
is complete r600g/r600-r700 support in my private branch, which passes the same 
tests that pass for Z24S8. Softpipe has only sampler support. This has turned 
out to be not so trivial, so it's possible I missed something.

I did not implement NV_depth_buffer_float, because it's not compatible with the 
ARB variant. (GL_DEPTH_COMPONENT32F != GL_DEPTH_COMPONENT32F_NV etc.) The NV 
extension can operate on unclamped depth values, whereas the ARB one always 
clamps them.

Please review.

Marek Olšák (13):
  mesa: initial ARB_depth_buffer_float support
  mesa: implement texfetch functions for depth_buffer_float
  mesa: implement stencil unpacking for GL_FLOAT_32_UNSIGNED_INT_24_8_REV
  mesa: implement depth unpacking for GL_FLOAT_32_UNSIGNED_INT_24_8_REV
  mesa: implement texstore for DEPTH_COMPONENT32F
  mesa: implement texstore for DEPTH32F_STENCIL8
  mesa: implement generatemipmap for GL_FLOAT_32_UNSIGNED_INT_24_8_REV
  mesa: implement depth/stencil renderbuffer wrapper accessors for 
Z32F_X24S8
  st/mesa: initial ARB_depth_buffer_float support
  st/mesa: implement read/draw/copypixels for Z32F and Z32F_S8X24
  gallium/util: implement pack functions for Z32F and Z32F_S8X24
  gallium/util: implement software Z32F_S8X24 depth-stencil clear
  gallium/util: handle Z32F_FLOAT_S8X24_USCALED in pipe_tile_raw_to_rgba

 src/gallium/auxiliary/util/u_pack_color.h |   64 ++
 src/gallium/auxiliary/util/u_surface.c|   35 +++-
 src/gallium/auxiliary/util/u_tile.c   |   35 +++
 src/mesa/main/depthstencil.c  |  322 +++--
 src/mesa/main/depthstencil.h  |5 +
 src/mesa/main/fbobject.c  |   19 ++
 src/mesa/main/formats.c   |   29 +++
 src/mesa/main/formats.h   |3 +
 src/mesa/main/framebuffer.c   |   10 +-
 src/mesa/main/image.c |   18 ++-
 src/mesa/main/mipmap.c|   20 ++
 src/mesa/main/pack.c  |   62 +-
 src/mesa/main/readpix.c   |   29 +++-
 src/mesa/main/renderbuffer.c  |3 +
 src/mesa/main/texfetch.c  |   14 ++
 src/mesa/main/texfetch_tmp.h  |   23 ++
 src/mesa/main/texformat.c |   13 ++
 src/mesa/main/texstore.c  |   79 +++-
 src/mesa/state_tracker/st_cb_clear.c  |6 +-
 src/mesa/state_tracker/st_cb_drawpixels.c |   64 +-
 src/mesa/state_tracker/st_cb_readpixels.c |   43 
 src/mesa/state_tracker/st_extensions.c|   11 +
 src/mesa/state_tracker/st_format.c|   19 ++
 23 files changed, 875 insertions(+), 51 deletions(-)

 Marek


[Mesa-dev] [PATCH 01/13] mesa: initial ARB_depth_buffer_float support

2011-06-30 Thread Marek Olšák
Using GL_NONE as DataType of Z32_FLOAT_X24S8, not sure what I should put there.
The spec says the type is n/a.
---
 src/mesa/main/fbobject.c |   19 +++
 src/mesa/main/formats.c  |   29 +
 src/mesa/main/formats.h  |3 +++
 src/mesa/main/image.c|   18 --
 src/mesa/main/readpix.c  |   29 +
 src/mesa/main/renderbuffer.c |3 +++
 src/mesa/main/texfetch.c |   14 ++
 src/mesa/main/texformat.c|   13 +
 src/mesa/main/texstore.c |3 +++
 9 files changed, 125 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
index 8cc3fd4..d094dd3 100644
--- a/src/mesa/main/fbobject.c
+++ b/src/mesa/main/fbobject.c
@@ -1131,6 +1131,16 @@ _mesa_base_fbo_format(struct gl_context *ctx, GLenum 
internalFormat)
  return GL_DEPTH_STENCIL_EXT;
   else
  return 0;
+   case GL_DEPTH_COMPONENT32F:
+  if (ctx->Extensions.ARB_depth_buffer_float)
+ return GL_DEPTH_COMPONENT;
+  else
+ return 0;
+   case GL_DEPTH32F_STENCIL8:
+  if (ctx->Extensions.ARB_depth_buffer_float)
+ return GL_DEPTH_STENCIL;
+  else
+ return 0;
case GL_RED:
case GL_R8:
case GL_R16:
@@ -2266,6 +2276,15 @@ _mesa_GetFramebufferAttachmentParameterivEXT(GLenum 
target, GLenum attachment,
 /* special cases */
 *params = GL_INDEX;
  }
+ else if (format == MESA_FORMAT_Z32_FLOAT_X24S8) {
+/* depends on the attachment parameter */
+if (attachment == GL_STENCIL_ATTACHMENT) {
+   *params = GL_INDEX;
+}
+else {
+   *params = GL_FLOAT;
+}
+ }
  else {
 *params = _mesa_get_format_datatype(format);
  }
diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c
index e88ba43..f58b197 100644
--- a/src/mesa/main/formats.c
+++ b/src/mesa/main/formats.c
@@ -1091,6 +1091,25 @@ static struct gl_format_info 
format_info[MESA_FORMAT_COUNT] =
   0, 0, 0, 0, 0,
   1, 1, 4
},
+   /* ARB_depth_buffer_float */
+   {
+  MESA_FORMAT_Z32_FLOAT,   /* Name */
+  "MESA_FORMAT_Z32_FLOAT", /* StrName */
+  GL_DEPTH_COMPONENT,  /* BaseFormat */
+  GL_FLOAT,/* DataType */
+  0, 0, 0, 0,  /* Red/Green/Blue/AlphaBits */
+  0, 0, 0, 32, 0,  /* Lum/Int/Index/Depth/StencilBits */
+  1, 1, 4  /* BlockWidth/Height,Bytes */
+   },
+   {
+  MESA_FORMAT_Z32_FLOAT_X24S8, /* Name */
+  "MESA_FORMAT_Z32_FLOAT_X24S8", /* StrName */
+  GL_DEPTH_STENCIL,/* BaseFormat */
+  GL_NONE /* XXX */,   /* DataType */
+  0, 0, 0, 0,  /* Red/Green/Blue/AlphaBits */
+  0, 0, 0, 32, 8,  /* Lum/Int/Index/Depth/StencilBits */
+  1, 1, 8  /* BlockWidth/Height,Bytes */
+   },
 };
 
 
@@ -1654,6 +1673,16 @@ _mesa_format_to_type_and_comps(gl_format format,
   *comps = 1;
   return;
 
+   case MESA_FORMAT_Z32_FLOAT:
+  *datatype = GL_FLOAT;
+  *comps = 1;
+  return;
+
+   case MESA_FORMAT_Z32_FLOAT_X24S8:
+  *datatype = GL_FLOAT_32_UNSIGNED_INT_24_8_REV;
+  *comps = 1;
+  return;
+
case MESA_FORMAT_DUDV8:
   *datatype = GL_BYTE;
   *comps = 2;
diff --git a/src/mesa/main/formats.h b/src/mesa/main/formats.h
index 0640bbc..5b8c017 100644
--- a/src/mesa/main/formats.h
+++ b/src/mesa/main/formats.h
@@ -209,6 +209,9 @@ typedef enum
MESA_FORMAT_RGB9_E5_FLOAT,
MESA_FORMAT_R11_G11_B10_FLOAT,
 
+   MESA_FORMAT_Z32_FLOAT,
+   MESA_FORMAT_Z32_FLOAT_X24S8,
+
MESA_FORMAT_COUNT
 } gl_format;
 
diff --git a/src/mesa/main/image.c b/src/mesa/main/image.c
index 6d7bc73..37127dc 100644
--- a/src/mesa/main/image.c
+++ b/src/mesa/main/image.c
@@ -84,6 +84,7 @@ _mesa_type_is_packed(GLenum type)
case GL_UNSIGNED_INT_24_8_EXT:
case GL_UNSIGNED_INT_5_9_9_9_REV:
case GL_UNSIGNED_INT_10F_11F_11F_REV:
+   case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
   return GL_TRUE;
}
 
@@ -228,6 +229,8 @@ _mesa_sizeof_packed_type( GLenum type )
  return sizeof(GLuint);
   case GL_UNSIGNED_INT_10F_11F_11F_REV:
  return sizeof(GLuint);
+  case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
+ return 8;
   default:
  return -1;
}
@@ -379,6 +382,11 @@ _mesa_bytes_per_pixel( GLenum format, GLenum type )
 return sizeof(GLuint);
  else
 return -1;
+  case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
+ if (format == GL_DEPTH_STENCIL)
+return 8;
+ else
+return -1;
   default:
  return -1;
}
@@ -531,8 +539,10 @@ _mesa_is_legal_format_and_type(const struct gl_context 
*ctx,
  else
 return GL_FALSE;
   case GL_DEPTH_STENCIL_EXT:
- if 

[Mesa-dev] [PATCH 02/13] mesa: implement texfetch functions for depth_buffer_float

2011-06-30 Thread Marek Olšák
---
 src/mesa/main/texfetch.c |   16 
 src/mesa/main/texfetch_tmp.h |   23 +++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/mesa/main/texfetch.c b/src/mesa/main/texfetch.c
index 4b85bc3..72283eb 100644
--- a/src/mesa/main/texfetch.c
+++ b/src/mesa/main/texfetch.c
@@ -916,17 +916,17 @@ texfetch_funcs[MESA_FORMAT_COUNT] =
},
{
   MESA_FORMAT_Z32_FLOAT,
-  NULL, /* XXX */
-  NULL,
-  NULL,
-  NULL
+  fetch_texel_1d_f_r_f32, /* Reuse the R32F functions. */
+  fetch_texel_2d_f_r_f32,
+  fetch_texel_3d_f_r_f32,
+  store_texel_r_f32
},
{
   MESA_FORMAT_Z32_FLOAT_X24S8,
-  NULL, /* XXX */
-  NULL,
-  NULL,
-  NULL
+  fetch_texel_1d_z32f_x24s8,
+  fetch_texel_2d_z32f_x24s8,
+  fetch_texel_3d_z32f_x24s8,
+  store_texel_z32f_x24s8
}
 };
 
diff --git a/src/mesa/main/texfetch_tmp.h b/src/mesa/main/texfetch_tmp.h
index e6fd81d..3b1eedf 100644
--- a/src/mesa/main/texfetch_tmp.h
+++ b/src/mesa/main/texfetch_tmp.h
@@ -2374,6 +2374,29 @@ static void store_texel_r11_g11_b10f(struct 
gl_texture_image *texImage,
 #endif
 
 
+/* MESA_FORMAT_Z32_FLOAT_X24S8 ***/
+
+static void FETCH(z32f_x24s8)(const struct gl_texture_image *texImage,
+ GLint i, GLint j, GLint k, GLfloat *texel)
+{
+   const GLfloat *src = TEXEL_ADDR(GLfloat, texImage, i, j, k, 2);
+   texel[RCOMP] = src[0];
+   texel[GCOMP] = 0.0F;
+   texel[BCOMP] = 0.0F;
+   texel[ACOMP] = 1.0F;
+}
+
+#if DIM == 3
+static void store_texel_z32f_x24s8(struct gl_texture_image *texImage,
+   GLint i, GLint j, GLint k, const void 
*texel)
+{
+   const GLfloat *src = (const GLfloat *) texel;
+   GLfloat *dst = TEXEL_ADDR(GLfloat, texImage, i, j, k, 2);
+   dst[0] = src[0];
+}
+#endif
+
+
 #undef TEXEL_ADDR
 #undef DIM
 #undef FETCH
-- 
1.7.4.1



[Mesa-dev] [PATCH 03/13] mesa: implement stencil unpacking for GL_FLOAT_32_UNSIGNED_INT_24_8_REV

2011-06-30 Thread Marek Olšák
---
 src/mesa/main/pack.c |   35 ---
 1 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/pack.c b/src/mesa/main/pack.c
index a232a51..c284c7d 100644
--- a/src/mesa/main/pack.c
+++ b/src/mesa/main/pack.c
@@ -1971,7 +1971,8 @@ extract_uint_indexes(GLuint n, GLuint indexes[],
   srcType == GL_INT ||
   srcType == GL_UNSIGNED_INT_24_8_EXT ||
   srcType == GL_HALF_FLOAT_ARB ||
-  srcType == GL_FLOAT);
+  srcType == GL_FLOAT ||
+  srcType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
 
switch (srcType) {
   case GL_BITMAP:
@@ -2142,6 +2143,23 @@ extract_uint_indexes(GLuint n, GLuint indexes[],
 }
  }
  break;
+  case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
+ {
+GLuint i;
+const GLuint *s = (const GLuint *) src;
+if (unpack->SwapBytes) {
+   for (i = 0; i < n; i++) {
+  GLuint value = s[i*2+1];
+  SWAP4BYTE(value);
+  indexes[i] = value & 0xff;  /* lower 8 bits */
+   }
+}
+else {
+   for (i = 0; i < n; i++)
+  indexes[i] = s[i*2+1] & 0xff;  /* lower 8 bits */
+}
+ }
+ break;
 
   default:
  _mesa_problem(NULL, "bad srcType in extract_uint_indexes");
@@ -4412,11 +4430,13 @@ _mesa_unpack_stencil_span( struct gl_context *ctx, 
GLuint n,
   srcType == GL_INT ||
   srcType == GL_UNSIGNED_INT_24_8_EXT ||
   srcType == GL_HALF_FLOAT_ARB ||
-  srcType == GL_FLOAT);
+  srcType == GL_FLOAT ||
+  srcType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
 
ASSERT(dstType == GL_UNSIGNED_BYTE ||
   dstType == GL_UNSIGNED_SHORT ||
-  dstType == GL_UNSIGNED_INT);
+  dstType == GL_UNSIGNED_INT ||
+  dstType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
 
/* only shift and offset apply to stencil */
transferOps = IMAGE_SHIFT_OFFSET_BIT;
@@ -4488,6 +4508,15 @@ _mesa_unpack_stencil_span( struct gl_context *ctx, 
GLuint n,
  case GL_UNSIGNED_INT:
 memcpy(dest, indexes, n * sizeof(GLuint));
 break;
+ case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
+{
+   GLuint *dst = (GLuint *) dest;
+   GLuint i;
+   for (i = 0; i < n; i++) {
+  dst[i*2+1] = indexes[i] & 0xff; /* lower 8 bits */
+   }
+}
+break;
  default:
 _mesa_problem(ctx, "bad dstType in _mesa_unpack_stencil_span");
   }
-- 
1.7.4.1



[Mesa-dev] [PATCH 04/13] mesa: implement depth unpacking for GL_FLOAT_32_UNSIGNED_INT_24_8_REV

2011-06-30 Thread Marek Olšák
---
 src/mesa/main/pack.c |   27 +--
 1 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/pack.c b/src/mesa/main/pack.c
index c284c7d..d42ae7b 100644
--- a/src/mesa/main/pack.c
+++ b/src/mesa/main/pack.c
@@ -4827,6 +4827,20 @@ _mesa_unpack_depth_span( struct gl_context *ctx, GLuint 
n,
 }
  }
  break;
+  case GL_FLOAT_32_UNSIGNED_INT_24_8_REV:
+ {
+GLuint i;
+const GLfloat *src = (const GLfloat *)source;
+for (i = 0; i < n; i++) {
+   GLfloat value = src[i * 2];
+   if (srcPacking->SwapBytes) {
+  SWAP4BYTE(value);
+   }
+   depthValues[i] = value;
+}
+needClamp = GL_TRUE;
+ }
+ break;
   case GL_FLOAT:
  DEPTH_VALUES(GLfloat, 1*);
  needClamp = GL_TRUE;
@@ -4903,9 +4917,18 @@ _mesa_unpack_depth_span( struct gl_context *ctx, GLuint 
n,
  zValues[i] = (GLushort) (depthValues[i] * (GLfloat) depthMax);
   }
}
+   else if (dstType == GL_FLOAT) {
+  /* Nothing to do. depthValues is pointing to dest. */
+   }
+   else if (dstType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV) {
+  GLfloat *zValues = (GLfloat*) dest;
+  GLuint i;
+  for (i = 0; i < n; i++) {
+ zValues[i*2] = depthValues[i];
+  }
+   }
else {
-  ASSERT(dstType == GL_FLOAT);
-  /*ASSERT(depthMax == 1.0F);*/
+  ASSERT(0);
}
 
free(depthTemp);
-- 
1.7.4.1
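
The indexing used in patches 03 and 04 (i*2 for depth, i*2+1 for stencil) implies that each GL_FLOAT_32_UNSIGNED_INT_24_8_REV pixel is a pair of 32-bit words: the float depth first, then a word whose low 8 bits hold the stencil. A minimal sketch of that layout, with hypothetical `sk_` helper names and no byte swapping:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Depth of pixel i: the first 32-bit word of the pair, read as a float. */
static float
sk_z32f_x24s8_depth(const uint32_t *pixels, size_t i)
{
   float z;
   memcpy(&z, &pixels[i * 2], sizeof z);
   return z;
}

/* Stencil of pixel i: the low 8 bits of the second word. */
static uint8_t
sk_z32f_x24s8_stencil(const uint32_t *pixels, size_t i)
{
   return (uint8_t) (pixels[i * 2 + 1] & 0xff);
}

/* Write only the depth word, leaving the stencil word untouched. */
static void
sk_z32f_x24s8_set_depth(uint32_t *pixels, size_t i, float z)
{
   memcpy(&pixels[i * 2], &z, sizeof z);
}
```

Because the two components live in separate words, a depth-only or stencil-only upload can update its own word and leave the other one as it was.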



[Mesa-dev] [PATCH 05/13] mesa: implement texstore for DEPTH_COMPONENT32F

2011-06-30 Thread Marek Olšák
---
 src/mesa/main/texstore.c |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
index f1de31a..cdac214 100644
--- a/src/mesa/main/texstore.c
+++ b/src/mesa/main/texstore.c
@@ -1002,15 +1002,17 @@ memcpy_texture(struct gl_context *ctx,
 
 
 /**
- * Store a 32-bit integer depth component texture image.
+ * Store a 32-bit integer or float depth component texture image.
  */
 static GLboolean
 _mesa_texstore_z32(TEXSTORE_PARAMS)
 {
const GLuint depthScale = 0x;
const GLuint texelBytes = _mesa_get_format_bytes(dstFormat);
+   const GLenum dstType = _mesa_get_format_datatype(dstFormat);
(void) dims;
-   ASSERT(dstFormat == MESA_FORMAT_Z32);
+   ASSERT(dstFormat == MESA_FORMAT_Z32 ||
+  dstFormat == MESA_FORMAT_Z32_FLOAT);
ASSERT(texelBytes == sizeof(GLuint));
 
if (ctx->Pixel.DepthScale == 1.0f &&
@@ -1018,7 +1020,7 @@ _mesa_texstore_z32(TEXSTORE_PARAMS)
!srcPacking->SwapBytes &&
baseInternalFormat == GL_DEPTH_COMPONENT &&
srcFormat == GL_DEPTH_COMPONENT &&
-   srcType == GL_UNSIGNED_INT) {
+   srcType == dstType) {
   /* simple memcpy path */
   memcpy_texture(ctx, dims,
  dstFormat, dstAddr, dstXoffset, dstYoffset, dstZoffset,
@@ -1039,7 +1041,7 @@ _mesa_texstore_z32(TEXSTORE_PARAMS)
 const GLvoid *src = _mesa_image_address(dims, srcPacking,
 srcAddr, srcWidth, srcHeight, srcFormat, srcType, img, row, 0);
 _mesa_unpack_depth_span(ctx, srcWidth,
-GL_UNSIGNED_INT, (GLuint *) dstRow,
+dstType, dstRow,
 depthScale, srcType, src, srcPacking);
 dstRow += dstRowStride;
  }
@@ -4423,7 +4425,7 @@ texstore_funcs[MESA_FORMAT_COUNT] =
{ MESA_FORMAT_RGB9_E5_FLOAT, _mesa_texstore_rgb9_e5 },
{ MESA_FORMAT_R11_G11_B10_FLOAT, _mesa_texstore_r11_g11_b10f },
 
-   { MESA_FORMAT_Z32_FLOAT, NULL /* XXX */ },
+   { MESA_FORMAT_Z32_FLOAT, _mesa_texstore_z32 },
{ MESA_FORMAT_Z32_FLOAT_X24S8, /* XXX */ },
 };
 
-- 
1.7.4.1



[Mesa-dev] [PATCH 06/13] mesa: implement texstore for DEPTH32F_STENCIL8

2011-06-30 Thread Marek Olšák
---
 src/mesa/main/texstore.c |   68 +-
 1 files changed, 67 insertions(+), 1 deletions(-)

diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
index cdac214..7e2cafc 100644
--- a/src/mesa/main/texstore.c
+++ b/src/mesa/main/texstore.c
@@ -4290,6 +4290,72 @@ _mesa_texstore_r11_g11_b10f(TEXSTORE_PARAMS)
 }
 
 
+static GLboolean
+_mesa_texstore_z32f_x24s8(TEXSTORE_PARAMS)
+{
+   ASSERT(dstFormat == MESA_FORMAT_Z32_FLOAT_X24S8);
+   ASSERT(srcFormat == GL_DEPTH_STENCIL ||
+  srcFormat == GL_DEPTH_COMPONENT ||
+  srcFormat == GL_STENCIL_INDEX);
+   ASSERT(srcFormat != GL_DEPTH_STENCIL ||
+  srcType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
+
+   if (srcFormat == GL_DEPTH_STENCIL &&
+   ctx->Pixel.DepthScale == 1.0f &&
+   ctx->Pixel.DepthBias == 0.0f &&
+   !srcPacking->SwapBytes) {
+  /* simple path */
+  memcpy_texture(ctx, dims,
+ dstFormat, dstAddr, dstXoffset, dstYoffset, dstZoffset,
+ dstRowStride,
+ dstImageOffsets,
+ srcWidth, srcHeight, srcDepth, srcFormat, srcType,
+ srcAddr, srcPacking);
+   }
+   else if (srcFormat == GL_DEPTH_COMPONENT ||
+srcFormat == GL_STENCIL_INDEX) {
+  GLint img, row;
+  const GLint srcRowStride
+ = _mesa_image_row_stride(srcPacking, srcWidth, srcFormat, srcType)
+ / sizeof(uint64_t);
+
+  /* In case we only upload depth we need to preserve the stencil */
+  for (img = 0; img < srcDepth; img++) {
+ uint64_t *dstRow = (uint64_t *) dstAddr
++ dstImageOffsets[dstZoffset + img]
++ dstYoffset * dstRowStride / sizeof(uint64_t)
++ dstXoffset;
+ const uint64_t *src
+= (const uint64_t *) _mesa_image_address(dims, srcPacking, srcAddr,
+  srcWidth, srcHeight,
+  srcFormat, srcType,
+  img, 0, 0);
+ for (row = 0; row < srcHeight; row++) {
+/* The unpack functions with:
+ *dstType = GL_FLOAT_32_UNSIGNED_INT_24_8_REV
+ * only write their own dword, so the other dword (stencil
+ * or depth) is preserved. */
+if (srcFormat != GL_STENCIL_INDEX)
+   _mesa_unpack_depth_span(ctx, srcWidth,
+   GL_FLOAT_32_UNSIGNED_INT_24_8_REV, /* dst type */
+   dstRow, /* dst addr */
+   1.0f, srcType, src, srcPacking);
+
+if (srcFormat != GL_DEPTH_COMPONENT)
+   _mesa_unpack_stencil_span(ctx, srcWidth,
+ GL_FLOAT_32_UNSIGNED_INT_24_8_REV, /* dst type */
+ dstRow, /* dst addr */
+ srcType, src, srcPacking,
+ ctx->_ImageTransferState);
+
+src += srcRowStride;
+dstRow += dstRowStride / sizeof(uint64_t);
+ }
+  }
+   }
+   return GL_TRUE;
+}
+
 
 /**
  * Table mapping MESA_FORMAT_* to _mesa_texstore_*()
@@ -4426,7 +4492,7 @@ texstore_funcs[MESA_FORMAT_COUNT] =
{ MESA_FORMAT_R11_G11_B10_FLOAT, _mesa_texstore_r11_g11_b10f },
 
{ MESA_FORMAT_Z32_FLOAT, _mesa_texstore_z32 },
-   { MESA_FORMAT_Z32_FLOAT_X24S8, /* XXX */ },
+   { MESA_FORMAT_Z32_FLOAT_X24S8, _mesa_texstore_z32f_x24s8 },
 };
 
 
-- 
1.7.4.1



[Mesa-dev] [PATCH 07/13] mesa: implement generatemipmap for GL_FLOAT_32_UNSIGNED_INT_24_8_REV

2011-06-30 Thread Marek Olšák
---
 src/mesa/main/mipmap.c |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/src/mesa/main/mipmap.c b/src/mesa/main/mipmap.c
index f2724db..8a811cb 100644
--- a/src/mesa/main/mipmap.c
+++ b/src/mesa/main/mipmap.c
@@ -706,6 +706,17 @@ do_row(GLenum datatype, GLuint comps, GLint srcWidth,
   }
}
 
+   else if (datatype == GL_FLOAT_32_UNSIGNED_INT_24_8_REV && comps == 1) {
+  GLuint i, j, k;
+  const GLfloat *rowA = (const GLfloat *) srcRowA;
+  const GLfloat *rowB = (const GLfloat *) srcRowB;
+  GLfloat *dst = (GLfloat *) dstRow;
+  for (i = j = 0, k = k0; i < (GLuint) dstWidth;
+   i++, j += colStride, k += colStride) {
+ dst[i*2] = (rowA[j*2] + rowA[k*2] + rowB[j*2] + rowB[k*2]) * 0.25F;
+  }
+   }
+
else {
   _mesa_problem(NULL, "bad format in do_row()");
}
@@ -1341,6 +1352,15 @@ do_row_3D(GLenum datatype, GLuint comps, GLint srcWidth,
   }
}
 
+   else if (datatype == GL_FLOAT_32_UNSIGNED_INT_24_8_REV && comps == 1) {
+  DECLARE_ROW_POINTERS(GLfloat, 2);
+
+  for (i = j = 0, k = k0; i < (GLuint) dstWidth;
+   i++, j += colStride, k += colStride) {
+ FILTER_F_3D(0);
+  }
+   }
+
else {
   _mesa_problem(NULL, "bad format in do_row()");
}
-- 
1.7.4.1
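
The 2D case above is a 2x2 box filter applied to the depth word of each pixel. A simplified sketch follows, assuming the source rows are exactly twice as wide as the destination (so the two source columns are 2*i and 2*i+1, rather than the colStride/k0 bookkeeping in do_row). The function name is hypothetical.

```c
#include <assert.h>

/* Average the depth of a 2x2 block of source pixels into each
 * destination pixel. Every pixel occupies two floats' worth of storage
 * (depth word + stencil word); only the depth word is read or written. */
static void
sk_downsample_depth_row(const float *rowA, const float *rowB,
                        float *dst, unsigned dst_width)
{
   unsigned i;

   for (i = 0; i < dst_width; i++) {
      const unsigned j = 2 * i;       /* left source pixel  */
      const unsigned k = 2 * i + 1;   /* right source pixel */

      dst[i * 2] = (rowA[j * 2] + rowA[k * 2] +
                    rowB[j * 2] + rowB[k * 2]) * 0.25F;
   }
}
```

Note that the stencil word of the destination is never written, which matches the patch: only dst[i*2] is assigned.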



[Mesa-dev] [PATCH 09/13] st/mesa: initial ARB_depth_buffer_float support

2011-06-30 Thread Marek Olšák
---
 src/mesa/state_tracker/st_cb_clear.c   |6 --
 src/mesa/state_tracker/st_extensions.c |   11 +++
 src/mesa/state_tracker/st_format.c |   19 +++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_clear.c 
b/src/mesa/state_tracker/st_cb_clear.c
index 181fedd..117000b 100644
--- a/src/mesa/state_tracker/st_cb_clear.c
+++ b/src/mesa/state_tracker/st_cb_clear.c
@@ -381,7 +381,8 @@ check_clear_depth_stencil_with_quad(struct gl_context *ctx, 
struct gl_renderbuff
 
assert(rb->Format == MESA_FORMAT_S8 ||
   rb->Format == MESA_FORMAT_Z24_S8 ||
-  rb->Format == MESA_FORMAT_S8_Z24);
+  rb->Format == MESA_FORMAT_S8_Z24 ||
+  rb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
 
if (ctx->Scissor.Enabled &&
(ctx->Scissor.X != 0 ||
@@ -436,7 +437,8 @@ check_clear_stencil_with_quad(struct gl_context *ctx, 
struct gl_renderbuffer *rb
 
assert(rb->Format == MESA_FORMAT_S8 ||
   rb->Format == MESA_FORMAT_Z24_S8 ||
-  rb->Format == MESA_FORMAT_S8_Z24);
+  rb->Format == MESA_FORMAT_S8_Z24 ||
+  rb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
 
if (maskStencil) 
   return GL_TRUE;
diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index d3aebe5..99b231d 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -607,4 +607,15 @@ void st_init_extensions(struct st_context *st)
if (screen->get_param(screen, PIPE_CAP_SM3)) {
   ctx->Extensions.ARB_shader_texture_lod = GL_TRUE;
}
+
+   if (screen->is_format_supported(screen, PIPE_FORMAT_Z32_FLOAT,
+   PIPE_TEXTURE_2D, 0,
+   PIPE_BIND_DEPTH_STENCIL |
+   PIPE_BIND_SAMPLER_VIEW) &&
+   screen->is_format_supported(screen, PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED,
+   PIPE_TEXTURE_2D, 0,
+   PIPE_BIND_DEPTH_STENCIL |
+   PIPE_BIND_SAMPLER_VIEW)) {
+  ctx->Extensions.ARB_depth_buffer_float = GL_TRUE;
+   }
 }
diff --git a/src/mesa/state_tracker/st_format.c 
b/src/mesa/state_tracker/st_format.c
index d1995f1..bd4f086 100644
--- a/src/mesa/state_tracker/st_format.c
+++ b/src/mesa/state_tracker/st_format.c
@@ -95,6 +95,9 @@ st_format_datatype(enum pipe_format format)
format == PIPE_FORMAT_X8Z24_UNORM) {
  return GL_UNSIGNED_INT_24_8;
   }
+  else if (format == PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED) {
+ return GL_FLOAT_32_UNSIGNED_INT_24_8_REV;
+  }
   else {
  const GLuint size = format_max_bits(format);
 
@@ -205,6 +208,10 @@ st_mesa_format_to_pipe_format(gl_format mesaFormat)
   return PIPE_FORMAT_Z24X8_UNORM;
case MESA_FORMAT_S8:
   return PIPE_FORMAT_S8_USCALED;
+   case MESA_FORMAT_Z32_FLOAT:
+  return PIPE_FORMAT_Z32_FLOAT;
+   case MESA_FORMAT_Z32_FLOAT_X24S8:
+  return PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED;
case MESA_FORMAT_YCBCR:
   return PIPE_FORMAT_UYVY;
 #if FEATURE_texture_s3tc
@@ -427,6 +434,10 @@ st_pipe_format_to_mesa_format(enum pipe_format format)
   return MESA_FORMAT_X8_Z24;
case PIPE_FORMAT_Z24_UNORM_S8_USCALED:
   return MESA_FORMAT_S8_Z24;
+   case PIPE_FORMAT_Z32_FLOAT:
+  return MESA_FORMAT_Z32_FLOAT;
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+  return MESA_FORMAT_Z32_FLOAT_X24S8;
 
case PIPE_FORMAT_UYVY:
   return MESA_FORMAT_YCBCR;
@@ -784,6 +795,10 @@ static const struct format_mapping format_map[] = {
   { GL_DEPTH_COMPONENT, 0 },
   { DEFAULT_DEPTH_FORMATS }
},
+   {
+  { GL_DEPTH_COMPONENT32F, 0 },
+  { PIPE_FORMAT_Z32_FLOAT, 0 }
+   },
 
/* stencil formats */
{
@@ -800,6 +815,10 @@ static const struct format_mapping format_map[] = {
   { GL_DEPTH_STENCIL_EXT, GL_DEPTH24_STENCIL8_EXT, 0 },
   { PIPE_FORMAT_Z24_UNORM_S8_USCALED, PIPE_FORMAT_S8_USCALED_Z24_UNORM, 0 }
},
+   {
+  { GL_DEPTH32F_STENCIL8, 0 },
+  { PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED, 0 }
+   },
 
/* sRGB formats */
{
-- 
1.7.4.1



[Mesa-dev] [PATCH 10/13] st/mesa: implement read/draw/copypixels for Z32F and Z32F_S8X24

2011-06-30 Thread Marek Olšák
---
 src/mesa/state_tracker/st_cb_drawpixels.c |   64 +
 src/mesa/state_tracker/st_cb_readpixels.c |   43 +++
 2 files changed, 98 insertions(+), 9 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c 
b/src/mesa/state_tracker/st_cb_drawpixels.c
index d61d7ac..dca3324 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -812,6 +812,7 @@ draw_stencil_pixels(struct gl_context *ctx, GLint x, GLint 
y,
   for (row = 0; row < height; row++) {
  GLubyte sValues[MAX_WIDTH];
  GLuint zValues[MAX_WIDTH];
+ GLfloat *zValuesFloat = (GLfloat*)zValues;
  GLenum destType = GL_UNSIGNED_BYTE;
  const GLvoid *source = _mesa_image_address2d(clippedUnpack, pixels,
   width, height,
@@ -822,7 +823,11 @@ draw_stencil_pixels(struct gl_context *ctx, GLint x, GLint 
y,
ctx->_ImageTransferState);
 
  if (format == GL_DEPTH_STENCIL) {
-_mesa_unpack_depth_span(ctx, spanWidth, GL_UNSIGNED_INT, zValues,
+GLenum ztype =
+   pt->resource->format == PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED ?
+   GL_FLOAT : GL_UNSIGNED_INT;
+
+_mesa_unpack_depth_span(ctx, spanWidth, ztype, zValues,
 (1 << 24) - 1, type, source,
 clippedUnpack);
  }
@@ -887,6 +892,26 @@ draw_stencil_pixels(struct gl_context *ctx, GLint x, GLint 
y,
   }
}
break;
+case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+   if (format == GL_DEPTH_STENCIL) {
+  uint *dest = (uint *) (stmap + spanY * pt->stride + spanX*4);
+  GLfloat *destf = (GLfloat*)dest;
+  GLint k;
+  assert(usage == PIPE_TRANSFER_WRITE);
+  for (k = 0; k < spanWidth; k++) {
+ destf[k*2] = zValuesFloat[k];
+ dest[k*2+1] = sValues[k] & 0xff;
+  }
+   }
+   else {
+  uint *dest = (uint *) (stmap + spanY * pt->stride + spanX*4);
+  GLint k;
+  assert(usage == PIPE_TRANSFER_READ_WRITE);
+  for (k = 0; k < spanWidth; k++) {
+ dest[k*2+1] = sValues[k] & 0xff;
+  }
+   }
+   break;
 default:
assert(0);
 }
@@ -994,14 +1019,23 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint y,
 GL_NONE, GL_NONE,
 PIPE_TEXTURE_2D,
0, PIPE_BIND_SAMPLER_VIEW);
-  if (tex_format == PIPE_FORMAT_Z24_UNORM_S8_USCALED)
-stencil_format = PIPE_FORMAT_X24S8_USCALED;
-  else if (tex_format == PIPE_FORMAT_S8_USCALED_Z24_UNORM)
-stencil_format = PIPE_FORMAT_S8X24_USCALED;
-  else
-stencil_format = PIPE_FORMAT_S8_USCALED;
-  if (stencil_format == PIPE_FORMAT_NONE)
-goto stencil_fallback;
+
+  switch (tex_format) {
+  case PIPE_FORMAT_Z24_UNORM_S8_USCALED:
+ stencil_format = PIPE_FORMAT_X24S8_USCALED;
+ break;
+  case PIPE_FORMAT_S8_USCALED_Z24_UNORM:
+ stencil_format = PIPE_FORMAT_S8X24_USCALED;
+ break;
+  case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+ stencil_format = PIPE_FORMAT_X32_S8X24_USCALED;
+ break;
+  case PIPE_FORMAT_S8_USCALED:
+ stencil_format = PIPE_FORMAT_S8_USCALED;
+ break;
+  default:
+ goto stencil_fallback;
+  }
}
 
/* Mesa state should be up to date by now */
@@ -1188,6 +1222,18 @@ copy_stencil_pixels(struct gl_context *ctx, GLint srcx, 
GLint srcy,
  assert(usage == PIPE_TRANSFER_WRITE);
  memcpy(dst, src, width);
  break;
+  case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+ {
+uint *dst4 = (uint *) dst;
+int j;
+dst4++;
+assert(usage == PIPE_TRANSFER_READ_WRITE);
+for (j = 0; j < width; j++) {
+   *dst4 = src[j] & 0xff;
+   dst4 += 2;
+}
+ }
+ break;
   default:
  assert(0);
   }
diff --git a/src/mesa/state_tracker/st_cb_readpixels.c 
b/src/mesa/state_tracker/st_cb_readpixels.c
index 67926e3..02ddad7 100644
--- a/src/mesa/state_tracker/st_cb_readpixels.c
+++ b/src/mesa/state_tracker/st_cb_readpixels.c
@@ -151,6 +151,24 @@ st_read_stencil_pixels(struct gl_context *ctx, GLint x, 
GLint y,
 }
  }
  break;
+  case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+ if (format == GL_DEPTH_STENCIL) {
+const uint *src = (uint *) (stmap + srcY * pt->stride);
+const GLfloat *srcf = (const GLfloat*)src;
+GLint 

[Mesa-dev] [PATCH 11/13] gallium/util: implement pack functions for Z32F and Z32F_S8X24

2011-06-30 Thread Marek Olšák
The suffix of 64 means it returns uint64_t.
---
 src/gallium/auxiliary/util/u_pack_color.h |   64 +
 1 files changed, 64 insertions(+), 0 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_pack_color.h 
b/src/gallium/auxiliary/util/u_pack_color.h
index 5378f2d..d2dfba5 100644
--- a/src/gallium/auxiliary/util/u_pack_color.h
+++ b/src/gallium/auxiliary/util/u_pack_color.h
@@ -458,6 +458,19 @@ util_pack_mask_z(enum pipe_format format, uint32_t z)
}
 }
 
+
+static INLINE uint64_t
+util_pack_mask_z64(enum pipe_format format, uint32_t z)
+{
+   switch (format) {
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+  return z;
+   default:
+  return util_pack_mask_z(format, z);
+   }
+}
+
+
 static INLINE uint32_t
 util_pack_mask_z_stencil(enum pipe_format format, uint32_t z, uint8_t s)
 {
@@ -481,6 +494,21 @@ util_pack_mask_z_stencil(enum pipe_format format, uint32_t 
z, uint8_t s)
 }
 
 
+static INLINE uint64_t
+util_pack_mask_z_stencil64(enum pipe_format format, uint32_t z, uint8_t s)
+{
+   uint64_t packed;
+
+   switch (format) {
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+  packed = util_pack_mask_z64(format, z);
+  packed |= (uint64_t)s << 32ull;
+  return packed;
+   default:
+  return util_pack_mask_z_stencil(format, z, s);
+   }
+}
+
 
 /**
  * Note: it's assumed that z is in [0,1]
@@ -525,6 +553,24 @@ util_pack_z(enum pipe_format format, double z)
   return 0;
}
 }
+
+
+static INLINE uint64_t
+util_pack_z64(enum pipe_format format, double z)
+{
+   union fi fui;
+
+   if (z == 0)
+  return 0;
+
+   switch (format) {
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+  fui.f = (float)z;
+  return fui.ui;
+   default:
+  return util_pack_z(format, z);
+   }
+}
  
 
 /**
@@ -554,6 +600,24 @@ util_pack_z_stencil(enum pipe_format format, double z, 
uint8_t s)
 }
 
 
+static INLINE uint64_t
+util_pack_z_stencil64(enum pipe_format format, double z, uint8_t s)
+{
+   uint64_t packed;
+
+   switch (format) {
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+  packed = util_pack_z64(format, z);
+  packed |= (uint64_t)s << 32ull;
+  break;
+   default:
+  return util_pack_z_stencil(format, z, s);
+   }
+
+   return packed;
+}
+
+
 /**
  * Pack 4 ubytes into a 4-byte word
  */
-- 
1.7.4.1
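
The 64-bit layout that the `*64` pack functions in this patch produce can be illustrated outside Mesa. The following is a minimal sketch, not Mesa code: the helper names `pack_z32f_s8x24`, `unpack_depth` and `unpack_stencil` are hypothetical, and it assumes IEEE 754 single-precision floats. It places the depth float's bit pattern in the low 32 bits and the stencil byte in bits 32..39, matching the `(uint64_t)s << 32` step above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of u_pack_color.h): pack a float depth and
 * an 8-bit stencil into one 64-bit pixel. Depth bits go in the low word,
 * stencil in bits 32..39; bits 40..63 are left as zero padding. */
static uint64_t
pack_z32f_s8x24(float z, uint8_t s)
{
   uint32_t zbits;

   assert(sizeof(float) == sizeof(uint32_t));
   memcpy(&zbits, &z, sizeof zbits);   /* copy the bit pattern, no conversion */
   return (uint64_t)zbits | ((uint64_t)s << 32);
}

/* Recover the depth float from the low 32 bits. */
static float
unpack_depth(uint64_t packed)
{
   uint32_t zbits = (uint32_t)(packed & 0xffffffffull);
   float z;

   memcpy(&z, &zbits, sizeof z);
   return z;
}

/* Recover the stencil byte from bits 32..39. */
static uint8_t
unpack_stencil(uint64_t packed)
{
   return (uint8_t)((packed >> 32) & 0xff);
}
```

The sketch uses `memcpy` instead of the patch's `union fi` purely to keep it self-contained; both approaches copy the float's bits unchanged.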



[Mesa-dev] [PATCH 12/13] gallium/util: implement software Z32F_S8X24 depth-stencil clear

2011-06-30 Thread Marek Olšák
---
 src/gallium/auxiliary/util/u_surface.c |   35 +++-
 1 files changed, 34 insertions(+), 1 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_surface.c 
b/src/gallium/auxiliary/util/u_surface.c
index 4c5cc4d..8fcf6b9 100644
--- a/src/gallium/auxiliary/util/u_surface.c
+++ b/src/gallium/auxiliary/util/u_surface.c
@@ -358,8 +358,41 @@ util_clear_depth_stencil(struct pipe_context *pipe,
dst_map += dst_stride;
 }
  }
-break;
+ break;
   case 8:
+  {
+ uint64_t zstencil = util_pack_z_stencil64(dst->texture->format,
+   depth, stencil);
+
+ assert(dst->format == PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED);
+
+ if (!need_rmw) {
+for (i = 0; i < height; i++) {
+   uint64_t *row = (uint64_t *)dst_map;
+   for (j = 0; j < width; j++)
+  *row++ = zstencil;
+   dst_map += dst_stride;
+}
+ }
+ else {
+uint64_t src_mask;
+
+if (clear_flags & PIPE_CLEAR_DEPTH)
+   src_mask = 0x00000000ffffffffull;
+else
+   src_mask = 0x000000ff00000000ull;
+
+for (i = 0; i < height; i++) {
+   uint64_t *row = (uint64_t *)dst_map;
+   for (j = 0; j < width; j++) {
+  uint64_t tmp = *row & ~src_mask;
+  *row++ = tmp | (zstencil & src_mask);
+   }
+   dst_map += dst_stride;
+}
+ }
+ break;
+  }
   default:
  assert(0);
  break;
-- 
1.7.4.1
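
The read-modify-write branch of this clear can be sketched in isolation. This is an illustrative stand-alone version, not the Gallium implementation: `masked_clear_u64`, `DEPTH_MASK` and `STENCIL_MASK` are hypothetical names, and the mask values assume the layout produced by `util_pack_z_stencil64` in patch 11 (depth in the low 32 bits, stencil in bits 32..39).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: depth float bits in bits 0..31, stencil in bits 32..39. */
#define DEPTH_MASK   0x00000000ffffffffull
#define STENCIL_MASK 0x000000ff00000000ull

/* Hypothetical helper: overwrite only the bits selected by 'mask' in each
 * 64-bit pixel, preserving every other bit of the existing contents. */
static void
masked_clear_u64(uint64_t *pixels, size_t count, uint64_t value, uint64_t mask)
{
   size_t i;

   assert(pixels != NULL || count == 0);
   for (i = 0; i < count; i++)
      pixels[i] = (pixels[i] & ~mask) | (value & mask);
}
```

Clearing only depth passes `DEPTH_MASK` and clearing only stencil passes `STENCIL_MASK`. When both are cleared no mask is needed, which corresponds to the `!need_rmw` fast path in the patch.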



[Mesa-dev] [PATCH 13/13] gallium/util: handle Z32F_FLOAT_S8X24_USCALED in pipe_tile_raw_to_rgba

2011-06-30 Thread Marek Olšák
And make pipe_put_tile_rgba_format no-op like the other Z formats.
---
 src/gallium/auxiliary/util/u_tile.c |   35 +++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_tile.c 
b/src/gallium/auxiliary/util/u_tile.c
index e3c7085..23f12e5 100644
--- a/src/gallium/auxiliary/util/u_tile.c
+++ b/src/gallium/auxiliary/util/u_tile.c
@@ -318,6 +318,32 @@ z32f_get_tile_rgba(const float *src,
}
 }
 
+/*** PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED ***/
+
+/**
+ * Return each Z value as four floats in [0,1].
+ */
+static void
+z32f_x24s8_get_tile_rgba(const float *src,
+ unsigned w, unsigned h,
+ float *p,
+ unsigned dst_stride)
+{
+   unsigned i, j;
+
+   for (i = 0; i < h; i++) {
+  float *pRow = p;
+  for (j = 0; j < w; j++, pRow += 4) {
+ pRow[0] =
+ pRow[1] =
+ pRow[2] =
+ pRow[3] = *src;
+ src += 2;
+  }
+  p += dst_stride;
+   }
+}
+
 
 void
 pipe_tile_raw_to_rgba(enum pipe_format format,
@@ -352,6 +378,9 @@ pipe_tile_raw_to_rgba(enum pipe_format format,
case PIPE_FORMAT_Z32_FLOAT:
   z32f_get_tile_rgba((float *) src, w, h, dst, dst_stride);
   break;
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+  z32f_x24s8_get_tile_rgba((float *) src, w, h, dst, dst_stride);
+  break;
default:
   util_format_read_4f(format,
   dst, dst_stride * sizeof(float),
@@ -445,6 +474,12 @@ pipe_put_tile_rgba_format(struct pipe_context *pipe,
case PIPE_FORMAT_X8Z24_UNORM:
   /*z24s8_put_tile_rgba((unsigned *) packed, w, h, p, src_stride);*/
   break;
+   case PIPE_FORMAT_Z32_FLOAT:
+  /*z32f_put_tile_rgba((unsigned *) packed, w, h, p, src_stride);*/
+  break;
+   case PIPE_FORMAT_Z32_FLOAT_S8X24_USCALED:
+  /*z32f_s8x24_put_tile_rgba((unsigned *) packed, w, h, p, src_stride);*/
+  break;
default:
   util_format_write_4f(format,
p, src_stride * sizeof(float),
-- 
1.7.4.1



[Mesa-dev] [PATCH 08/13] mesa: implement depth/stencil renderbuffer wrapper accessors for Z32F_X24S8

2011-06-30 Thread Marek Olšák
---
 src/mesa/main/depthstencil.c |  322 +++---
 src/mesa/main/depthstencil.h |5 +
 src/mesa/main/framebuffer.c  |   10 +-
 3 files changed, 313 insertions(+), 24 deletions(-)

diff --git a/src/mesa/main/depthstencil.c b/src/mesa/main/depthstencil.c
index ab62c97..f979045 100644
--- a/src/mesa/main/depthstencil.c
+++ b/src/mesa/main/depthstencil.c
@@ -393,6 +393,217 @@ _mesa_new_z24_renderbuffer_wrapper(struct gl_context *ctx,
 }
 
 
+static void
+get_row_z32f(struct gl_context *ctx, struct gl_renderbuffer *z32frb, GLuint 
count,
+ GLint x, GLint y, void *values)
+{
+   struct gl_renderbuffer *dsrb = z32frb->Wrapped;
+   GLfloat temp[MAX_WIDTH*2];
+   GLfloat *dst = (GLfloat *) values;
+   const GLfloat *src = (const GLfloat *) dsrb->GetPointer(ctx, dsrb, x, y);
+   GLuint i;
+   ASSERT(z32frb->DataType == GL_FLOAT);
+   ASSERT(dsrb->DataType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
+   ASSERT(dsrb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
+   if (!src) {
+  dsrb->GetRow(ctx, dsrb, count, x, y, temp);
+  src = temp;
+   }
+   for (i = 0; i < count; i++) {
+  dst[i] = src[i*2];
+   }
+}
+
+static void
+get_values_z32f(struct gl_context *ctx, struct gl_renderbuffer *z32frb, GLuint 
count,
+const GLint x[], const GLint y[], void *values)
+{
+   struct gl_renderbuffer *dsrb = z32frb->Wrapped;
+   GLfloat temp[MAX_WIDTH*2];
+   GLfloat *dst = (GLfloat *) values;
+   GLuint i;
+   ASSERT(z32frb->DataType == GL_FLOAT);
+   ASSERT(dsrb->DataType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
+   ASSERT(dsrb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
+   ASSERT(count <= MAX_WIDTH);
+   /* don't bother trying direct access */
+   dsrb->GetValues(ctx, dsrb, count, x, y, temp);
+   for (i = 0; i < count; i++) {
+  dst[i] = temp[i*2];
+   }
+}
+
+static void
+put_row_z32f(struct gl_context *ctx, struct gl_renderbuffer *z32frb, GLuint 
count,
+ GLint x, GLint y, const void *values, const GLubyte *mask)
+{
+   struct gl_renderbuffer *dsrb = z32frb->Wrapped;
+   const GLfloat *src = (const GLfloat *) values;
+   GLfloat *dst = (GLfloat *) dsrb->GetPointer(ctx, dsrb, x, y);
+   ASSERT(z32frb->DataType == GL_FLOAT);
+   ASSERT(dsrb->DataType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
+   ASSERT(dsrb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
+   if (dst) {
+  /* direct access */
+  GLuint i;
+  for (i = 0; i < count; i++) {
+ if (!mask || mask[i]) {
+dst[i*2] = src[i];
+ }
+  }
+   }
+   else {
+  /* get, modify, put */
+  GLfloat temp[MAX_WIDTH*2];
+  GLuint i;
+  dsrb->GetRow(ctx, dsrb, count, x, y, temp);
+  for (i = 0; i < count; i++) {
+ if (!mask || mask[i]) {
+temp[i*2] = src[i];
+ }
+  }
+  dsrb->PutRow(ctx, dsrb, count, x, y, temp, mask);
+   }
+}
+
+static void
+put_mono_row_z32f(struct gl_context *ctx, struct gl_renderbuffer *z32frb, 
GLuint count,
+  GLint x, GLint y, const void *value, const GLubyte *mask)
+{
+   struct gl_renderbuffer *dsrb = z32frb->Wrapped;
+   GLfloat *dst = (GLfloat *) dsrb->GetPointer(ctx, dsrb, x, y);
+   ASSERT(z32frb->DataType == GL_FLOAT);
+   ASSERT(dsrb->DataType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
+   ASSERT(dsrb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
+   if (dst) {
+  /* direct access */
+  GLuint i;
+  const GLfloat val = *(GLfloat*)value;
+  for (i = 0; i < count; i++) {
+ if (!mask || mask[i]) {
+dst[i*2] = val;
+ }
+  }
+   }
+   else {
+  /* get, modify, put */
+  GLfloat temp[MAX_WIDTH*2];
+  GLuint i;
+  const GLfloat val = *(GLfloat *)value;
+  dsrb->GetRow(ctx, dsrb, count, x, y, temp);
+  for (i = 0; i < count; i++) {
+ if (!mask || mask[i]) {
+temp[i*2] = val;
+ }
+  }
+  dsrb->PutRow(ctx, dsrb, count, x, y, temp, mask);
+   }
+}
+
+static void
+put_values_z32f(struct gl_context *ctx, struct gl_renderbuffer *z32frb, GLuint 
count,
+const GLint x[], const GLint y[],
+const void *values, const GLubyte *mask)
+{
+   struct gl_renderbuffer *dsrb = z32frb->Wrapped;
+   const GLfloat *src = (const GLfloat *) values;
+   ASSERT(z32frb->DataType == GL_FLOAT);
+   ASSERT(dsrb->DataType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
+   ASSERT(dsrb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
+   if (dsrb->GetPointer(ctx, dsrb, 0, 0)) {
+  /* direct access */
+  GLuint i;
+  for (i = 0; i < count; i++) {
+ if (!mask || mask[i]) {
+GLfloat *dst = (GLfloat *) dsrb->GetPointer(ctx, dsrb, x[i], y[i]);
+dst[1] = src[i];
+ }
+  }
+   }
+   else {
+  /* get, modify, put */
+  GLfloat temp[MAX_WIDTH*2];
+  GLuint i;
+  dsrb->GetValues(ctx, dsrb, count, x, y, temp);
+  for (i = 0; i < count; i++) {
+ if (!mask || mask[i]) {
+temp[i*2] = src[i];
+ }
+  }
+  dsrb->PutValues(ctx, dsrb, count, x, y, 

[Mesa-dev] [Bug 5002] indirect rendering of glDrawArrays() to an NVidia machine is broke.

2011-06-30 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=5002

Trevor Forbes t...@internode.on.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |t...@internode.on.net

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH 08/13] mesa: implement depth/stencil renderbuffer wrapper accessors for Z32F_X24S8

2011-06-30 Thread Kenneth Graunke
On 06/30/2011 05:29 PM, Marek Olšák wrote:
> ---
>  src/mesa/main/depthstencil.c |  322 +++---
>  src/mesa/main/depthstencil.h |5 +
>  src/mesa/main/framebuffer.c  |   10 +-
>  3 files changed, 313 insertions(+), 24 deletions(-)
>
> diff --git a/src/mesa/main/depthstencil.c b/src/mesa/main/depthstencil.c
> index ab62c97..f979045 100644
> --- a/src/mesa/main/depthstencil.c
> +++ b/src/mesa/main/depthstencil.c
[snip]
> +static void
> +put_values_z32f(struct gl_context *ctx, struct gl_renderbuffer *z32frb, GLuint count,
> +const GLint x[], const GLint y[],
> +const void *values, const GLubyte *mask)
> +{
> +   struct gl_renderbuffer *dsrb = z32frb->Wrapped;
> +   const GLfloat *src = (const GLfloat *) values;
> +   ASSERT(z32frb->DataType == GL_FLOAT);
> +   ASSERT(dsrb->DataType == GL_FLOAT_32_UNSIGNED_INT_24_8_REV);
> +   ASSERT(dsrb->Format == MESA_FORMAT_Z32_FLOAT_X24S8);
> +   if (dsrb->GetPointer(ctx, dsrb, 0, 0)) {
> +  /* direct access */
> +  GLuint i;
> +  for (i = 0; i < count; i++) {
> + if (!mask || mask[i]) {
> +GLfloat *dst = (GLfloat *) dsrb->GetPointer(ctx, dsrb, x[i], y[i]);
> +dst[1] = src[i];

Don't you mean dst[0] = src[i] here?  With dst[1], you'll be assigning
to the stencil value...

> + }
> +  }
> +   }
> +   else {
> +  /* get, modify, put */
> +  GLfloat temp[MAX_WIDTH*2];
> +  GLuint i;
> +  dsrb->GetValues(ctx, dsrb, count, x, y, temp);
> +  for (i = 0; i < count; i++) {
> + if (!mask || mask[i]) {
> +temp[i*2] = src[i];

...when clearly this is assigning to the depth value.

> + }
> +  }
> +  dsrb->PutValues(ctx, dsrb, count, x, y, temp, mask);
> +   }
> +}
[snip]

With that fixed, this patch is:

Reviewed-by: Kenneth Graunke kenn...@whitecape.org
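
For reference, the indexing issue raised above can be shown on a plain array. This is a minimal sketch with a hypothetical helper, `put_depth_values`, operating on a flat interleaved buffer rather than a `gl_renderbuffer`. It assumes the Z32_FLOAT_X24S8 layout used throughout the patch: two 32-bit slots per pixel, slot 0 holding the float depth and slot 1 holding the stencil word.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the wrapped depth/stencil buffer.
 * Pixel i occupies interleaved[i*2] (depth) and interleaved[i*2 + 1]
 * (stencil word). Only the depth slot is written; a NULL mask means
 * "write every pixel", as in the renderbuffer accessors. */
static void
put_depth_values(float *interleaved, const float *src, size_t count,
                 const unsigned char *mask)
{
   size_t i;

   assert(interleaved != NULL && src != NULL);
   for (i = 0; i < count; i++) {
      if (!mask || mask[i])
         interleaved[i * 2] = src[i];   /* offset 0 within the pixel: depth */
   }
}
```

Writing to offset 1 within a pixel instead would overwrite the stencil word with float bits, which is the problem with `dst[1]` in the direct-access path.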


[Mesa-dev] [Bug 37177] Mathematica Plot3D Crash

2011-06-30 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=37177

Jason Tibbitts ti...@math.uh.edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ti...@math.uh.edu



Re: [Mesa-dev] [PATCH 00/13] Floating-point depth buffers

2011-06-30 Thread Kenneth Graunke
On 06/30/2011 05:29 PM, Marek Olšák wrote:
> Hi,
>
> this patch series implements ARB_depth_buffer_float in Mesa and Gallium.
> There is complete r600g/r600-r700 support in my private branch, which passes
> the same tests that pass for Z24S8. Softpipe has only sampler support. This
> has turned out to be not so trivial, so it's possible I missed something.
>
> I did not implement NV_depth_buffer_float, because it's not compatible with
> the ARB variant. (GL_DEPTH_COMPONENT32F != GL_DEPTH_COMPONENT32F_NV etc.) The
> NV extension can operate on unclamped depth values, whereas the ARB one
> always clamps them.
>
> Please review.

Marek,

You are awesome!  Thanks so much for implementing this!

Patches 1-7 look great:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org

I replied with a comment on patch 8, and am not planning on reviewing
patches 9-13 as I'm not familiar with Gallium.


[Mesa-dev] [Bug 5002] indirect rendering of glDrawArrays() to an NVidia machine is broke.

2011-06-30 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=5002

--- Comment #8 from Trevor Forbes t...@internode.on.net 2011-06-30 21:39:59 PDT ---
I also keep running into this problem with workstations running NVIDIA
drivers > 180. Try running Google Earth from a remote server, for example.

Some people are getting around it by using an old NVIDIA driver or by copying
the proprietary NVIDIA lib to the server and replacing the Mesa lib.

In my case, I am setting LIBGL_NO_DRAWARRAYS=1 on the server before launching
the application, which works but is really a workaround rather than a solution.
