[Mesa3d-dev] Remove static_dispatch=false from GL functions exported by ATI or nVidia?

2010-09-14 Thread Luca Barbieri
Currently, there are several functions for which static dispatch has been
disabled, but which are exported by the ATI libraries, the nVidia libraries, or both.

To prevent compatibility issues, it seems like a good idea to export at
least those as well.

What do you think?
Should we export all functions exported by both nVidia and ATI, those
exported by either of them, or simply export all functions?

static_dispatch=false but exported by both ATI and nVidia:
glBlendEquationSeparateEXT
glBlitFramebufferEXT
glGetQueryObjecti64vEXT
glGetQueryObjectui64vEXT
glProgramEnvParameters4fvEXT
glProgramLocalParameters4fvEXT

static_dispatch=false but exported by ATI, not by nVidia:
glGetHistogramEXT
glGetHistogramParameterfvEXT
glGetHistogramParameterivEXT
glGetMinmaxEXT
glGetMinmaxParameterfvEXT
glGetMinmaxParameterivEXT
glGetTexParameterPointervAPPLE
glHistogramEXT
glMinmaxEXT
glResetHistogramEXT
glResetMinmaxEXT
glStencilFuncSeparateATI
glStencilOpSeparateATI
glTextureRangeAPPLE

static_dispatch=false but exported by nVidia, not by ATI:
glActiveStencilFaceEXT
glColorSubTableEXT
glDeleteFencesNV
glDepthBoundsEXT
glFinishFenceNV
glGenFencesNV
glGetFenceivNV
glIsFenceNV
glSetFenceNV
glTestFenceNV



Re: [Mesa3d-dev] Remove static_dispatch=false from GL functions exported by ATI or nVidia?

2010-09-14 Thread Luca Barbieri
> No. The libGL ABI is well defined:
>
> http://www.opengl.org/registry/ABI/

Does the ABI forbid exporting additional functions from libGL.so?

As far as I can tell, it doesn't, and only says what is required to be
exported, and that applications shouldn't statically link to other
functions.

The way things are now, some applications that work with nVidia or ATI
implementations will fail to link or fail to load with Mesa, which
seems undesirable.
A bug on this has just been reported.
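
For illustration, here is a minimal sketch of the two linkage models involved
(EXT_framebuffer_blit names; GL_GLEXT_PROTOTYPES and glXGetProcAddress are
standard, nothing here is Mesa-specific):

#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>
#include <GL/glx.h>

/* Static dispatch: this only links (and loads) if libGL.so actually
 * exports the symbol, which is exactly the compatibility issue above. */
static void blit_static(void)
{
   glBlitFramebufferEXT(0, 0, 64, 64, 0, 0, 64, 64,
                        GL_COLOR_BUFFER_BIT, GL_NEAREST);
}

/* The ABI-sanctioned way works regardless of which symbols libGL exports. */
static void blit_dynamic(void)
{
   PFNGLBLITFRAMEBUFFEREXTPROC blit = (PFNGLBLITFRAMEBUFFEREXTPROC)
      glXGetProcAddress((const GLubyte *)"glBlitFramebufferEXT");
   if (blit)
      blit(0, 0, 64, 64, 0, 0, 64, 64, GL_COLOR_BUFFER_BIT, GL_NEAREST);
}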



Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatible) formats

2010-09-06 Thread Luca Barbieri
On Mon, Sep 6, 2010 at 3:57 PM, José Fonseca jfons...@vmware.com wrote:
> I'd like to know if there's any objection to change the
> resource_copy_region semantics to allow copies between different yet
> compatible formats, where the definition of compatible formats is:

I was about to propose something like this.

How about a much more powerful change though, one that would make any pair
of non-blocked formats of the same bit depth compatible?
This way you could copy z24s8 to r8g8b8a8, for instance.

In addition to this, how about explicitly allowing sampler views to
use a compatible format, and adding the ability for surfaces to use a
compatible format too (with a new parameter to get_tex_surface)?

This would, for instance, allow implementing glBlitFramebuffer on
stencil buffers by reinterpreting the buffer as r8g8b8a8, and would allow
the blitter module to copy depth/stencil buffers by simply treating
them as color buffers.

The only issue is that some drivers might hold depth/stencil surfaces
in compressed formats that cannot be interpreted as a color format,
and not have any mechanism for keeping temporaries or doing
conversions internally.

DirectX seems to have something like this with the _TYPELESS formats.
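
To make the reinterpretation idea concrete, here is a minimal sketch assuming
the proposed ability for a sampler view to use a format different from the
resource's own (create_sampler_view and the u_sampler helper exist today;
accepting a mismatched format is the part being proposed, and the format
names are just examples):

#include "pipe/p_context.h"
#include "util/u_sampler.h"

/* zs_resource was created as PIPE_FORMAT_Z24_UNORM_S8_USCALED; view it as a
 * plain 32-bit color format so the blitter can treat it like any color
 * buffer, giving access to the stencil bits from the fragment shader. */
static struct pipe_sampler_view *
view_zs_as_color(struct pipe_context *pipe, struct pipe_resource *zs_resource)
{
   struct pipe_sampler_view templ;
   u_sampler_view_default_template(&templ, zs_resource,
                                   PIPE_FORMAT_B8G8R8A8_UNORM);
   return pipe->create_sampler_view(pipe, zs_resource, &templ);
}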



Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatible) formats

2010-09-06 Thread Luca Barbieri
How about dropping the idea that resource_copy_region must be just a
memcpy, and instead having the driver instruct the hardware 2D blitter to
write 1s into the alpha channel where the hardware supports it, or having
u_blitter do this in the shader?

nv30/nv40 and apparently nv50 can do this in the 2D blitter, and all
Radeons seem to use the 3D engine, which obviously can do it in the
shader.

We may also want to allow actual conversion between arbitrary formats,
since again u_blitter can do it trivially, and so can most/all
hardware 2D engines.



Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatible) formats

2010-09-06 Thread Luca Barbieri
>> This way you could copy z24s8 to r8g8b8a8, for instance.
>
> I am not sure this makes a lot of sense. There's no guarantee the bit
> layout of these is even remotely similar (and it likely won't be on any
> decent hardware). I think the dx10 restriction makes sense here.

Yes, it depends on the flexibility of the hardware and the driver.
Due to depth textures, I think it is actually likely that you can
easily treat depth as color.

The worst issue right now is that stencil cannot be accessed in a
sensible way at all, which makes implementing glBlitFramebuffer of
STENCIL_BIT with NEAREST and different rect sizes impossible.
Some cards (r600+ at least) can write stencil in shaders, but on some
you must reinterpret the surface.
And resource_copy_region does not support stretching, so it can't be used.

Since not all cards can write stencil in shaders, one either needs to
be able to bind depth/stencil as a color buffer, or extend
resource_copy_region to support stretching with nearest filtering, or
both (possibly in addition to having the option of using stencil
export in shaders).

Other things would likely benefit, such as GL_NV_copy_depth_to_color.



Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatible) formats

2010-09-06 Thread Luca Barbieri
Yes, if x8 is interpreted as "writes can write arbitrary data, reads
must return 1" (as you said), then this is not necessary in
resource_copy_region even if A8 -> X8 copies become supported.

You are right that format conversions would probably be better added
as a separate function (if at all), in addition to the
reinterpret_cast mechanism you proposed to add.



Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatible) formats

2010-09-06 Thread Luca Barbieri
> When I said it won't work with decent hardware, I really meant it won't
> work due to compression. Now, it's quite possible this can be disabled
> on any chip, but you don't know that before hence you need to jump
> through hoops to get an uncompressed version of your compressed buffer
> later.

Well, you can render to a compressed depth buffer and then bind it as
a depth texture (routinely done for shadows), so there needs to be a
way to get compressed data to the sampler either directly or via the
driver automagically converting it with a blit beforehand.

Of course, this may not actually work for stencil too, might not let
you interpret depth as 8-bit color components, or might not allow
direct use as a render target, but it seems possible, especially
on modern flexible hardware and on older, dumber hardware that
lacks (or doesn't force) compression.

I haven't checked any hardware docs though, beyond the fact that nvfx
currently doesn't support any compression and thus can just do it.



Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatible) formats

2010-09-06 Thread Luca Barbieri
> The dst blending parameter is just a factor the real dst value is multiplied
> by (except for min/max). There is no way to multiply an arbitrary value by a
> constant and get 1.0. But you can force 0, of course. I don't think there is
> hardware which supports such flexible swizzling in the blender. If x8 is
> just padding as you say, the value of it should be undefined and every
> operation using the padding bits should be undefined too except for texture
> sampling. It's not like I have any other choice.

As far as I can tell, the only problem with blending into an X8 surface
holding random garbage (but whose reads return 1) is if any of the blend
factors is DST_ALPHA or INV_DST_ALPHA (or DST_COLOR used as an alpha
factor). In that case you can solve the issue by replacing the
offending factor with ONE or ZERO, as long as you have support for
separate RGB/alpha blend functions (which Gallium currently assumes, afaik).

You can also disable the alpha channel in the writemask to avoid
unnecessary work.

On nv30/nv40, there is an actual render target format that instructs
the card to read dst alpha as 1 (you can also choose whether to write
0 or 1).

Of course, one could argue that mesa/st should do this transformation
instead of the Gallium drivers whose hardware lacks such support.

I suppose just not advertising X8 formats as render target formats
could also work.
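
A minimal sketch of the factor replacement described above (state tracker or
driver side; the helper name is made up, and the alpha-factor cases are
elided for brevity):

#include "pipe/p_defines.h"
#include "pipe/p_state.h"

/* If the bound color buffer is an x8 format whose reads are defined to
 * return 1, DST_ALPHA-based factors collapse to constants and the alpha
 * write can be dropped. */
static void
fixup_blend_for_x8(struct pipe_rt_blend_state *rt)
{
   if (rt->rgb_src_factor == PIPE_BLENDFACTOR_DST_ALPHA)
      rt->rgb_src_factor = PIPE_BLENDFACTOR_ONE;
   else if (rt->rgb_src_factor == PIPE_BLENDFACTOR_INV_DST_ALPHA)
      rt->rgb_src_factor = PIPE_BLENDFACTOR_ZERO;

   if (rt->rgb_dst_factor == PIPE_BLENDFACTOR_DST_ALPHA)
      rt->rgb_dst_factor = PIPE_BLENDFACTOR_ONE;
   else if (rt->rgb_dst_factor == PIPE_BLENDFACTOR_INV_DST_ALPHA)
      rt->rgb_dst_factor = PIPE_BLENDFACTOR_ZERO;

   /* same treatment for alpha_src_factor/alpha_dst_factor, omitted */

   /* avoid unnecessary work on a channel nobody can read back */
   rt->colormask &= ~PIPE_MASK_A;
}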



Re: [Mesa3d-dev] ARB draw buffers + texenv program

2010-04-13 Thread Luca Barbieri
On nv30/nv40 support for patching fragment programs is already
necessary (constants must be patched in as immediates), and this can
be handled by just patching the end of the fragment program to include
a variable number of instructions to copy a temp to COLOR[x].

It's possible that there is a hardware mechanism too; I haven't checked.

If other MRT-capable hardware already has this kind of fragment
program patching or supports this in hardware, then a new TGSI
semantic or register file can be added for this, and drivers can
easily implement that without recompilation.

Drivers could also just unconditionally write all color outputs as a
first implementation or if that doesn't affect performance.
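
A sketch of the "copy a temp to every COLOR[x]" idea, written with the TGSI
ureg helpers purely for illustration (the real nv30/nv40 path patches
hardware instructions directly, and emit_broadcast_color is a made-up name):

#include "pipe/p_shader_tokens.h"
#include "tgsi/tgsi_ureg.h"

/* Append one MOV per bound render target so every color output receives
 * the single computed result. */
static void
emit_broadcast_color(struct ureg_program *ureg, struct ureg_src result,
                     unsigned nr_cbufs)
{
   unsigned i;
   for (i = 0; i < nr_cbufs; i++) {
      struct ureg_dst out = ureg_DECL_output(ureg, TGSI_SEMANTIC_COLOR, i);
      ureg_MOV(ureg, out, result);
   }
   ureg_END(ureg);
}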



[Mesa3d-dev] [PATCH 4/6] gallium/auxiliary: add semantic linkage utility code

2010-04-13 Thread Luca Barbieri
---
 src/gallium/auxiliary/Makefile |1 +
 src/gallium/auxiliary/util/u_linkage.c |  119 
 src/gallium/auxiliary/util/u_linkage.h |   38 ++
 3 files changed, 158 insertions(+), 0 deletions(-)
 create mode 100644 src/gallium/auxiliary/util/u_linkage.c
 create mode 100644 src/gallium/auxiliary/util/u_linkage.h

diff --git a/src/gallium/auxiliary/Makefile b/src/gallium/auxiliary/Makefile
index c4d6b52..44c2f8b 100644
--- a/src/gallium/auxiliary/Makefile
+++ b/src/gallium/auxiliary/Makefile
@@ -120,6 +120,7 @@ C_SOURCES = \
util/u_hash.c \
util/u_keymap.c \
util/u_linear.c \
+   util/u_linkage.c \
util/u_network.c \
util/u_math.c \
util/u_mm.c \
diff --git a/src/gallium/auxiliary/util/u_linkage.c 
b/src/gallium/auxiliary/util/u_linkage.c
new file mode 100644
index 000..8a76378
--- /dev/null
+++ b/src/gallium/auxiliary/util/u_linkage.c
@@ -0,0 +1,119 @@
+#include "util/u_debug.h"
+#include "pipe/p_shader_tokens.h"
+#include "tgsi/tgsi_parse.h"
+#include "tgsi/tgsi_scan.h"
+#include "util/u_linkage.h"
+
+/* we must only record the registers that are actually used, not just declared 
*/
+static INLINE boolean
+util_semantic_set_test_and_set(struct util_semantic_set *set, unsigned value)
+{
+   unsigned mask = 1 << (value % (sizeof(long) * 8));
+   unsigned long *p = &set->masks[value / (sizeof(long) * 8)];
+   unsigned long v = *p & mask;
+   *p |= mask;
+   return !!v;
+}
+
+unsigned
+util_semantic_set_from_program_file(struct util_semantic_set *set, const struct tgsi_token *tokens, enum tgsi_file_type file)
+{
+   struct tgsi_shader_info info;
+   struct tgsi_parse_context parse;
+   unsigned count = 0;
+   ubyte *semantic_name;
+   ubyte *semantic_index;
+
+   tgsi_scan_shader(tokens, &info);
+
+   if(file == TGSI_FILE_INPUT)
+   {
+  semantic_name = info.input_semantic_name;
+  semantic_index = info.input_semantic_index;
+   }
+   else if(file == TGSI_FILE_OUTPUT)
+   {
+  semantic_name = info.output_semantic_name;
+  semantic_index = info.output_semantic_index;
+   }
+   else
+  assert(0);
+
+   tgsi_parse_init(&parse, tokens);
+
+   memset(set->masks, 0, sizeof(set->masks));
+   while(!tgsi_parse_end_of_tokens(&parse))
+   {
+      tgsi_parse_token(&parse);
+
+  if(parse.FullToken.Token.Type == TGSI_TOKEN_TYPE_INSTRUCTION)
+  {
+         const struct tgsi_full_instruction *finst = &parse.FullToken.FullInstruction;
+         unsigned i;
+         for(i = 0; i < finst->Instruction.NumDstRegs; ++i)
+         {
+            if(finst->Dst[i].Register.File == file)
+            {
+               unsigned idx = finst->Dst[i].Register.Index;
+               if(semantic_name[idx] == TGSI_SEMANTIC_GENERIC)
+               {
+                  if(!util_semantic_set_test_and_set(set, semantic_index[idx]))
+                     ++count;
+               }
+            }
+         }
+
+         for(i = 0; i < finst->Instruction.NumSrcRegs; ++i)
+         {
+            if(finst->Src[i].Register.File == file)
+            {
+               unsigned idx = finst->Src[i].Register.Index;
+               if(semantic_name[idx] == TGSI_SEMANTIC_GENERIC)
+               {
+                  if(!util_semantic_set_test_and_set(set, semantic_index[idx]))
+                     ++count;
+               }
+            }
+         }
+      }
+   }
+   tgsi_parse_free(&parse);
+
+   return count;
+}
+
+#define UTIL_SEMANTIC_SET_FOR_EACH(i, set) for(i = 0; i < 256; ++i) if(set->masks[i / (sizeof(long) * 8)] & (1 << (i % (sizeof(long) * 8))))
+
+void
+util_semantic_layout_from_set(unsigned char *layout, const struct util_semantic_set *set, unsigned efficient_slots, unsigned num_slots)
+{
+   int first = -1;
+   int last = -1;
+   unsigned i;
+
+   memset(layout, 0xff, num_slots);
+
+   UTIL_SEMANTIC_SET_FOR_EACH(i, set)
+   {
+      if(first < 0)
+         first = i;
+      last = i;
+   }
+
+   if(last < efficient_slots)
+   {
+  UTIL_SEMANTIC_SET_FOR_EACH(i, set)
+ layout[i] = i;
+   }
+   else if((last - first) < efficient_slots)
+   {
+  UTIL_SEMANTIC_SET_FOR_EACH(i, set)
+ layout[i - first] = i;
+   }
+   else
+   {
+  unsigned idx = 0;
+  UTIL_SEMANTIC_SET_FOR_EACH(i, set)
+ layout[idx++] = i;
+   }
+}
diff --git a/src/gallium/auxiliary/util/u_linkage.h 
b/src/gallium/auxiliary/util/u_linkage.h
new file mode 100644
index 000..e73e0fd
--- /dev/null
+++ b/src/gallium/auxiliary/util/u_linkage.h
@@ -0,0 +1,38 @@
+#ifndef U_LINKAGE_H_
+#define U_LINKAGE_H_
+
+#include "pipe/p_compiler.h"
+
+struct util_semantic_set
+{
+   unsigned long masks[256 / 8 / sizeof(unsigned long)];
+};
+
+static INLINE bool
+util_semantic_set_contains(struct util_semantic_set *set, unsigned char value)
+{
+   return !!(set->masks[value / (sizeof(long) * 8)] & (1 << (value % (sizeof(long) * 8))));
+}
+
+unsigned util_semantic_set_from_program_file(struct util_semantic_set *set, const struct tgsi_token *tokens, enum tgsi_file_type file);
+
+/* 

[Mesa3d-dev] [PATCH 0/6] [RFC] Formalization of the Gallium shader semantics linkage model

2010-04-13 Thread Luca Barbieri
> only.

Any API that requires special semantics for COLOR and BCOLOR (i.e.
non-SM3) seems to only want 0-1 indices.

Note that SM3 does *not* include BCOLOR, so basically the limits for
generic indices would need to be conditional on BCOLOR being present
or not (e.g. if it is present, we must reserve two semantic slots in
svga for it).

POSITION0 is obviously special.
PSIZE0 is also special for points.

FOG0 seems right now to just be a GENERIC with a single component.
Gallium could be extended to support fixed function fog, which most
DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal
to the semantic issue.

==
Current Gallium users
==

Right now no open-source users of Gallium fundamentally require arbitrary 
indices.
In particular:
1. GLSL and anything with similar link-by-name can of course be modified to use 
sequential indices
2. ARB fragment program and vertex program use index-limited texcoord slots
3. g3dvl needs and uses 8 texcoord slots, indices 0-7
4. vega and xorg use indices 0-1
5. DX10 seems to restrict semantics to 0-N range, if I'm not mistaken
6. The GL_EXT_separate_shader_objects extension does not provide
arbitrary index matching for GLSL, but merely lets it use a model
similar to ARB fp/vp

However, the GLSL linker needs them in its current form, and the capability can 
be generally useful anyway.

===
Discussion of possible options
===

[Options from Keith Whitwell, see 
http://www.opensource-archive.org/showthread.php?p=180719]
a) Picking a lower number like 128, that an SM3 state tracker could
usually be able to directly translate incoming semantics into, but which
would force it to renumber under rare circumstances. This would make
life easier for the open drivers at the expense of the closed code.

b) Picking 256 to make life easier for some closed-source SM3 state
tracker, but harder for open drivers.

c) Picking 219 (or some other magic number) that happens to work with
the current set of constraints, but makes gallium fragile in the face of
new constraints.

d) Abandoning the current gallium linkage rules and coming up with
something new, for instance forcing the state trackers to renumber
always and making life trivial for the drivers...

[Options from me]

(e) Allow arbitrary 32-bit indices. This requires slightly more
complicated data structures in some cases, and will require svga and
r600 to fallback to software linkage if numbers are too high.

(f) Limit semantic indices to hardware interpolators _and_ introduce
an interface to let the user specify an

Personally I think the simplest idea for now could be to have all
drivers support 256 indices or, in the case of r600 and svga, the
maximum value supported by the hardware, and expose that as a cap (as
well as another cap for the number of different semantic values
supported at once).
The minimum guaranteed value is set to the lowest hardware constraint,
which would be svga with 219 indices (assuming no bcolor is used).
If some new constraints pop up, we just lower it and change SM3 state
trackers to check for it and fallback otherwise.

This should just require simple fixes to svga and r300, and
significant code for nv30/nv40, which is however already implemented.
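
To make the proposal concrete, a state tracker would do something along these
lines (PIPE_CAP_MAX_GENERIC_SEMANTIC_INDEX is a hypothetical cap name used
only for illustration, not part of the current interface):

/* hypothetical query of the per-driver limit on generic semantic indices */
int max_index = screen->get_param(screen, PIPE_CAP_MAX_GENERIC_SEMANTIC_INDEX);

if (needed_index > max_index) {
   /* an SM3-style state tracker renumbers its semantics sequentially here,
    * or falls back to software linkage */
}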

Luca Barbieri (5):
  tgsi: formalize limits on semantic indices
  tgsi: add support for packing semantics in SM3 byte values
  gallium/auxiliary: add semantic linkage utility code
  nvfx: support proper shader linkage - adds glsl support
  nvfx: expose GLSL

Michal Krol (1):
  gallium: Remove TGSI_SEMANTIC_NORMAL.




[Mesa3d-dev] [PATCH 2/6] tgsi: formalize limits on semantic indices

2010-04-13 Thread Luca Barbieri
---
 src/gallium/include/pipe/p_shader_tokens.h |   18 ++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/src/gallium/include/pipe/p_shader_tokens.h 
b/src/gallium/include/pipe/p_shader_tokens.h
index baff802..5d511ba 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -146,6 +146,24 @@ struct tgsi_declaration_dimension
 #define TGSI_SEMANTIC_INSTANCEID 10
 #define TGSI_SEMANTIC_COUNT  11 /** number of semantic values */
 
+/* 219 = (14 * 16 - 5)
+ * All SM3 semantics minus COLOR0, COLOR1, POSITION0, FOG0 and PSIZE0.
+ * This value is carefully chosen so that Gallium semantics/indices may be converted
+ * losslessly from and to SM3 semantics.
+ *
+ * Note that if BCOLOR is used, then this value is actually 211 - #MAX_BCOLOR_INDEX_USED - 1
+ * (SM3 does not support BCOLOR, and uses FACE instead).
+ *
+ * If any card supports more, this will be handled later.
+ *
+ * However, drivers should support 256 generic indices if the mechanism
+ * they use is not intrinsically limited to a lower value.
+ */
+#define TGSI_SEMANTIC_GENERIC_INDICES 219
+
+#define TGSI_SEMANTIC_INDICES(sem) (((sem) == TGSI_SEMANTIC_GENERIC) ? TGSI_SEMANTIC_GENERIC_INDICES : \
+   (((sem) == TGSI_SEMANTIC_COLOR || (sem) == TGSI_SEMANTIC_BCOLOR) ? 2 : 1))
+
 struct tgsi_declaration_semantic
 {
unsigned Name   : 8;  /** one of TGSI_SEMANTIC_x */
-- 
1.7.0.1.147.g6d84b




[Mesa3d-dev] [PATCH 3/6] tgsi: add support for packing semantics in SM3 byte values

2010-04-13 Thread Luca Barbieri
---
 src/gallium/auxiliary/util/u_semantics.h |  123 ++
 1 files changed, 123 insertions(+), 0 deletions(-)
 create mode 100644 src/gallium/auxiliary/util/u_semantics.h

diff --git a/src/gallium/auxiliary/util/u_semantics.h 
b/src/gallium/auxiliary/util/u_semantics.h
new file mode 100644
index 000..d620619
--- /dev/null
+++ b/src/gallium/auxiliary/util/u_semantics.h
@@ -0,0 +1,123 @@
+#ifndef U_SEMANTICS_H_
+#define U_SEMANTICS_H_
+
+#include "pipe/p_compiler.h"
+#include "pipe/p_shader_tokens.h"
+
+/* same as SM3 values */
+#define TGSI_SEMANTIC_BYTE_POSITION 0
+#define TGSI_SEMANTIC_BYTE_PSIZE (4 << 4)
+#define TGSI_SEMANTIC_BYTE_COLOR0 (10 << 4)
+#define TGSI_SEMANTIC_BYTE_COLOR1 (TGSI_SEMANTIC_BYTE_COLOR0 + 1)
+#define TGSI_SEMANTIC_BYTE_FOG (11 << 4)
+#define TGSI_SEMANTIC_BYTE_BCOLOR0 (14 << 4)
+#define TGSI_SEMANTIC_BYTE_BCOLOR1 (TGSI_SEMANTIC_BYTE_BCOLOR0 + 1)
+#define TGSI_SEMANTIC_BYTE_TGSI (15 << 4)
+
+static INLINE unsigned char
+pipe_semantic_to_byte(unsigned name, unsigned index)
+{
+   switch (name)
+   {
+   case TGSI_SEMANTIC_POSITION:
+  return TGSI_SEMANTIC_BYTE_POSITION;
+   case TGSI_SEMANTIC_PSIZE:
+  return TGSI_SEMANTIC_BYTE_PSIZE;
+   case TGSI_SEMANTIC_FOG:
+  return TGSI_SEMANTIC_BYTE_FOG;
+   case TGSI_SEMANTIC_COLOR:
+  return TGSI_SEMANTIC_BYTE_COLOR0 + index;
+   case TGSI_SEMANTIC_GENERIC:
+      ++index;
+      if(index >= TGSI_SEMANTIC_BYTE_PSIZE)
+      {
+         ++index;
+         if(index >= TGSI_SEMANTIC_BYTE_COLOR0)
+         {
+            index += 2;
+            if(index >= TGSI_SEMANTIC_BYTE_FOG)
+               ++index;
+         }
+      }
+      return index;
+   case TGSI_SEMANTIC_BCOLOR:
+  return TGSI_SEMANTIC_BYTE_BCOLOR0 + index;
+   default:
+  return TGSI_SEMANTIC_BYTE_TGSI + name;
+   }
+}
+
+/* this fits BCOLOR in the SM3 range, but is not reversible */
+static INLINE unsigned char
+pipe_semantic_to_byte_sm3(unsigned name, unsigned index)
+{
+   if(name == TGSI_SEMANTIC_BCOLOR)
+  return TGSI_SEMANTIC_BYTE_BCOLOR0 - 1 - index;
+   return pipe_semantic_to_byte(name, index);
+}
+
+static INLINE unsigned
+pipe_semantic_name_from_byte(unsigned char value)
+{
+   switch (value)
+   {
+   case TGSI_SEMANTIC_BYTE_POSITION:
+  return TGSI_SEMANTIC_POSITION;
+   case TGSI_SEMANTIC_BYTE_PSIZE:
+  return TGSI_SEMANTIC_PSIZE;
+   case TGSI_SEMANTIC_BYTE_FOG:
+  return TGSI_SEMANTIC_FOG;
+   case TGSI_SEMANTIC_BYTE_COLOR0:
+   case TGSI_SEMANTIC_BYTE_COLOR1:
+  return TGSI_SEMANTIC_COLOR;
+   case TGSI_SEMANTIC_BYTE_BCOLOR0:
+   case TGSI_SEMANTIC_BYTE_BCOLOR1:
+  return TGSI_SEMANTIC_BCOLOR;
+   default:
+      if(value < TGSI_SEMANTIC_BYTE_TGSI)
+         return TGSI_SEMANTIC_GENERIC;
+      else
+         return value - TGSI_SEMANTIC_BYTE_TGSI;
+   }
+}
+
+static INLINE unsigned
+pipe_semantic_index_from_byte(unsigned char value)
+{
+   if(value == TGSI_SEMANTIC_BYTE_POSITION)
+  return 0;
+
+   if(value <= TGSI_SEMANTIC_BYTE_PSIZE)
+   {
+      if(value < TGSI_SEMANTIC_BYTE_PSIZE)
+         return value - 1;
+      else
+         return 0;
+   }
+
+   if(value < (TGSI_SEMANTIC_BYTE_COLOR0 + 2))
+   {
+      if(value < TGSI_SEMANTIC_BYTE_COLOR0)
+         return value - 2;
+      else
+         return value - TGSI_SEMANTIC_BYTE_COLOR0;
+   }
+
+   if(value <= TGSI_SEMANTIC_BYTE_FOG)
+   {
+      if(value < TGSI_SEMANTIC_BYTE_FOG)
+         return value - 4;
+      else
+         return 0;
+   }
+
+   if(value < TGSI_SEMANTIC_BYTE_BCOLOR0)
+      return value - 5;
+
+   if(value == (TGSI_SEMANTIC_BYTE_BCOLOR1))
+  return 1;
+
+   return 0;
+}
+
+#endif /* U_SEMANTICS_H_ */
-- 
1.7.0.1.147.g6d84b




[Mesa3d-dev] [PATCH 6/6] nvfx: expose GLSL

2010-04-13 Thread Luca Barbieri
Still no control flow support, but basic stuff works.
---
 src/gallium/drivers/nvfx/nvfx_screen.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/drivers/nvfx/nvfx_screen.c 
b/src/gallium/drivers/nvfx/nvfx_screen.c
index 6742759..b935fa9 100644
--- a/src/gallium/drivers/nvfx/nvfx_screen.c
+++ b/src/gallium/drivers/nvfx/nvfx_screen.c
@@ -42,7 +42,7 @@ nvfx_screen_get_param(struct pipe_screen *pscreen, int param)
case PIPE_CAP_TWO_SIDED_STENCIL:
return 1;
case PIPE_CAP_GLSL:
-   return 0;
+   return 1;
case PIPE_CAP_ANISOTROPIC_FILTER:
return 1;
case PIPE_CAP_POINT_SPRITE:
-- 
1.7.0.1.147.g6d84b




[Mesa3d-dev] [PATCH 1/6] gallium: Remove TGSI_SEMANTIC_NORMAL.

2010-04-13 Thread Luca Barbieri
From: Michal Krol mic...@vmware.com

Use TGSI_SEMANTIC_GENERIC for this kind of stuff.
---
 src/gallium/auxiliary/tgsi/tgsi_dump.c |2 +-
 src/gallium/auxiliary/tgsi/tgsi_text.c |2 +-
 src/gallium/docs/source/tgsi.rst   |6 --
 src/gallium/drivers/svga/svga_tgsi_decl_sm30.c |4 
 src/gallium/include/pipe/p_shader_tokens.h |2 +-
 5 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c 
b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index 5703141..b6df249 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -120,7 +120,7 @@ static const char *semantic_names[] =
    "FOG",
    "PSIZE",
    "GENERIC",
-   "NORMAL",
+   "",
    "FACE",
    "EDGEFLAG",
    "PRIM_ID",
diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c 
b/src/gallium/auxiliary/tgsi/tgsi_text.c
index f918151..356eee0 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -933,7 +933,7 @@ static const char *semantic_names[TGSI_SEMANTIC_COUNT] =
    "FOG",
    "PSIZE",
    "GENERIC",
-   "NORMAL",
+   "",
    "FACE",
    "EDGEFLAG",
    "PRIM_ID",
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index c292cd3..d5e0220 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -1397,12 +1397,6 @@ These attributes are called generic because they may 
be used for anything
 else, including parameters, texture generation information, or anything that
 can be stored inside a four-component vector.
 
-TGSI_SEMANTIC_NORMAL
-
-
-Vertex normal; could be used to implement per-pixel lighting for legacy APIs
-that allow mixing fixed-function and programmable stages.
-
 TGSI_SEMANTIC_FACE
 
 
diff --git a/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c 
b/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
index 73102a7..05d9102 100644
--- a/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
+++ b/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
@@ -61,10 +61,6 @@ static boolean translate_vs_ps_semantic( struct 
tgsi_declaration_semantic semant
   *idx = semantic.Index + 1; /* texcoord[0] is reserved for fog */
   *usage = SVGA3D_DECLUSAGE_TEXCOORD;
   break;
-   case TGSI_SEMANTIC_NORMAL:
-  *idx = semantic.Index;
-  *usage = SVGA3D_DECLUSAGE_NORMAL;
-  break;
default:
   assert(0);
   *usage = SVGA3D_DECLUSAGE_TEXCOORD;
diff --git a/src/gallium/include/pipe/p_shader_tokens.h 
b/src/gallium/include/pipe/p_shader_tokens.h
index c5c480f..baff802 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -139,7 +139,7 @@ struct tgsi_declaration_dimension
 #define TGSI_SEMANTIC_FOG      3
 #define TGSI_SEMANTIC_PSIZE    4
 #define TGSI_SEMANTIC_GENERIC  5
-#define TGSI_SEMANTIC_NORMAL 6
+/* gap */
 #define TGSI_SEMANTIC_FACE   7
 #define TGSI_SEMANTIC_EDGEFLAG   8
 #define TGSI_SEMANTIC_PRIMID 9
-- 
1.7.0.1.147.g6d84b




[Mesa3d-dev] [PATCH 5/6] nvfx: support proper shader linkage - adds glsl support

2010-04-13 Thread Luca Barbieri
---
 src/gallium/drivers/nvfx/nvfx_fragprog.c   |  146 ++--
 src/gallium/drivers/nvfx/nvfx_shader.h |1 +
 src/gallium/drivers/nvfx/nvfx_state.c  |4 +
 src/gallium/drivers/nvfx/nvfx_state.h  |   15 +++
 src/gallium/drivers/nvfx/nvfx_state_emit.c |2 +-
 src/gallium/drivers/nvfx/nvfx_vertprog.c   |   40 ++--
 6 files changed, 143 insertions(+), 65 deletions(-)

diff --git a/src/gallium/drivers/nvfx/nvfx_fragprog.c 
b/src/gallium/drivers/nvfx/nvfx_fragprog.c
index 5fa825a..b4b63e2 100644
--- a/src/gallium/drivers/nvfx/nvfx_fragprog.c
+++ b/src/gallium/drivers/nvfx/nvfx_fragprog.c
@@ -1,6 +1,7 @@
 #include "pipe/p_context.h"
 #include "pipe/p_defines.h"
 #include "pipe/p_state.h"
+#include "util/u_semantics.h"
 #include "util/u_inlines.h"
 
 #include "pipe/p_shader_tokens.h"
@@ -16,8 +17,6 @@
 struct nvfx_fpc {
struct nvfx_fragment_program *fp;
 
-   uint attrib_map[PIPE_MAX_SHADER_INPUTS];
-
unsigned r_temps;
unsigned r_temps_discard;
struct nvfx_sreg r_result[PIPE_MAX_SHADER_OUTPUTS];
@@ -36,6 +35,8 @@ struct nvfx_fpc {
 
struct nvfx_sreg imm[MAX_IMM];
unsigned nr_imm;
+
+   unsigned char sem_table[256]; /* semantic idx for each input semantic */
 };
 
 static INLINE struct nvfx_sreg
@@ -111,6 +112,11 @@ emit_src(struct nvfx_fpc *fpc, int pos, struct nvfx_sreg 
src)
 	sr |= (NVFX_FP_REG_TYPE_TEMP << NVFX_FP_REG_TYPE_SHIFT);
 	sr |= (src.index << NVFX_FP_REG_SRC_SHIFT);
break;
+   case NVFXSR_RELOCATED:
+		sr |= (NVFX_FP_REG_TYPE_INPUT << NVFX_FP_REG_TYPE_SHIFT);
+		printf("adding relocation at %x for %x\n", fpc->inst_offset, src.index);
+		util_dynarray_append(&fpc->fp->sem_relocs[src.index], unsigned, fpc->inst_offset);
+		break;
case NVFXSR_CONST:
if (!fpc-have_const) {
grow_insns(fpc, 4);
@@ -241,8 +247,28 @@ tgsi_src(struct nvfx_fpc *fpc, const struct 
tgsi_full_src_register *fsrc)
 
 	switch (fsrc->Register.File) {
 	case TGSI_FILE_INPUT:
-		src = nvfx_sr(NVFXSR_INPUT,
-		              fpc->attrib_map[fsrc->Register.Index]);
+		if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_POSITION) {
+			assert(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0);
+			src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_POSITION);
+		} else if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_COLOR) {
+			if(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0)
+				src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_COL0);
+			else if(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 1)
+				src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_COL1);
+			else
+				assert(0);
+		} else if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_FOG) {
+			assert(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0);
+			src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_FOGC);
+		} else if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_FACE) {
+			/* TODO: check this has the correct values */
+			/* XXX: what do we do for nv30 here (assuming it lacks facing)?! */
+			assert(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0);
+			src = nvfx_sr(NVFXSR_INPUT, NV40_FP_OP_INPUT_SRC_FACING);
+		} else {
+			assert(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_GENERIC);
+			src = nvfx_sr(NVFXSR_RELOCATED, fpc->sem_table[fpc->fp->info.input_semantic_index[fsrc->Register.Index]]);
+		}
 		break;
 	case TGSI_FILE_CONSTANT:
 		src = constant(fpc, fsrc->Register.Index, NULL);
@@ -611,48 +637,6 @@ nvfx_fragprog_parse_instruction(struct nvfx_context* nvfx, 
struct nvfx_fpc *fpc,
 }
 
 static boolean
-nvfx_fragprog_parse_decl_attrib(struct nvfx_context* nvfx, struct nvfx_fpc *fpc,
-				const struct tgsi_full_declaration *fdec)
-{
-	int hw;
-
-	switch (fdec->Semantic.Name) {
-	case TGSI_SEMANTIC_POSITION:
-		hw = NVFX_FP_OP_INPUT_SRC_POSITION;
-		break;
-	case TGSI_SEMANTIC_COLOR:
-		if (fdec->Semantic.Index == 0) {
-			hw = NVFX_FP_OP_INPUT_SRC_COL0;
-		} else
-		if (fdec->Semantic.Index == 1) {
-			hw = NVFX_FP_OP_INPUT_SRC_COL1;
-		} else {
-			NOUVEAU_ERR("bad colour semantic index\n");
-   

Re: [Mesa3d-dev] r300g: hack around issue with doom3 and 0 stride

2010-04-10 Thread Luca Barbieri
> r300g: hack around issue with doom3 and 0 stride
>
> This is most likely a bug in the mesa state tracker, but do the quick hack
> for now to avoid the divide by 0.

This is not a bug: stride 0 means that the vertex attribute is
constant for all vertices.

It is not a special value either: advancing the vertex attribute
pointer by 0 will naturally result in always fetching the same value.

Thus, the patch is not likely to be correct: you should instead either
program stride 0 to the hardware if supported, or fetch the vertex
attribute with the CPU (I think it is always in a user buffer, but not
sure, maybe OpenGL allows explicitly specifying a VBO with stride 0)
and use whatever means Radeon provides to set a constant vertex
attribute (e.g. nVidia GPUs have a FIFO method exactly for that).
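
A minimal sketch of that handling (all helper names are hypothetical and the
buffer mapping is elided; the point is only that stride 0 selects a
constant-attribute path instead of dividing by the stride):

const struct pipe_vertex_buffer *vb =
   &r300->vertex_buffer[velem->vertex_buffer_index];

if (vb->stride == 0) {
   /* the same value is fetched for every vertex: read it once on the CPU
    * and program it through the hardware's constant-attribute mechanism */
   const uint8_t *data = (const uint8_t *)map_buffer(vb->buffer)          /* hypothetical */
                       + vb->buffer_offset + velem->src_offset;
   emit_constant_vertex_attribute(r300, attr, velem->src_format, data);   /* hypothetical */
} else {
   emit_vertex_fetch(r300, attr, vb, velem);   /* normal per-vertex path */
}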



Re: [Mesa3d-dev] u_half.? - u_format_half.? rename

2010-04-08 Thread Luca Barbieri
I named it that way because it is datatype-conversion functionality,
which is conceptually a lower layer than format conversion (which
operates on multi-component formats), and it is also totally independent
of the existing format-conversion code.

It is the only member of that layer because all other currently needed
datatype conversions can be performed with trivial C language
expressions: this could change as other unusual floating point
datatypes are needed (e.g. 6e5 and 5e5 for EXT_packed_float).
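
For reference, a table-free sketch of the kind of datatype conversion u_half
provides (this simplified version truncates instead of rounding and ignores
NaN and denormal corner cases; the real code uses lookup tables):

#include <stdint.h>
#include <string.h>

static uint16_t
float_to_half_simple(float f)
{
   uint32_t u;
   int e;

   memcpy(&u, &f, sizeof(u));                 /* raw IEEE-754 single-precision bits */
   e = (int)((u >> 23) & 0xff) - 127 + 15;    /* rebias exponent from 8 to 5 bits */

   if (e <= 0)
      return (uint16_t)((u >> 16) & 0x8000);             /* flush to signed zero */
   if (e >= 31)
      return (uint16_t)(((u >> 16) & 0x8000) | 0x7c00);  /* clamp to infinity */

   return (uint16_t)(((u >> 16) & 0x8000)   /* sign */
                   | (e << 10)              /* 5-bit exponent */
                   | ((u >> 13) & 0x3ff));  /* top 10 mantissa bits, truncated */
}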

That said, feel free to rename it: it's just a cosmetic issue.
Alternatively, maybe a new data-conversion prefix could be invented,
like u_convert_half.* or something like that.



Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler

2010-04-05 Thread Luca Barbieri
> This might depend on the target: R600+, for example, is quite
> scalar-oriented anyway (modulo a lot of subtle limitations), so just
> pretending that everything is scalar could work well there since
> revectorizing is almost unnecessary.

Interesting, nv50 is also almost fully scalar, and based on the
Gallium driver source, i965 seems to be scalar too.

So it seems it would really make sense to also have a scalar IR,
whether LLVM IR or something else.

Of course, scalar is usually actually SoA SIMD, but that's mostly
hidden, except for things like barriers, join points and nv50 voting
instructions.



Re: [Mesa3d-dev] Gallium: ARB_half_float_vertex

2010-04-04 Thread Luca Barbieri
There was some talk about doing the query with a vertex buffer target
for is_format_supported.

After gallium-resources is merged, this should be automatically possible.

BTW, the st/mesa patch was originally from Dave Airlie and was
slightly changed by me.



Re: [Mesa3d-dev] Gallium: ARB_half_float_vertex

2010-04-04 Thread Luca Barbieri
> Does it mean there will be format fallbacks? Because dword-unaligned but
> still pretty common (i.e. GL1.1) vertex formats aren't supported by r300,
> most often we hit R16G16B16. What will happen when is_format_supported says
> NO to such a format? I hope it won't share the fate of PIPE_CAP_SM3, which
> every in-tree state tracker ignores.

I'm not sure I understand correctly what you are saying.

The idea is to do what you did in your patch, but instead of calling
screen->get_param(screen, PIPE_CAP_HALF_FLOAT_VERTEX), calling
screen->is_format_supported(screen, PIPE_FORMAT_R16G16B16A16_FLOAT,
PIPE_BUFFER, ..., ...).

The PIPE_BUFFER target is supported in gallium-resources, but I'm not
sure whether this way of querying vertex formats is supported; it
would probably need to be added first.
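
Spelled out, the query would look roughly like this (the exact parameter list
of is_format_supported has changed over time, and using
PIPE_BIND_VERTEX_BUFFER as the usage flag is my assumption, not something
settled in this thread):

/* sketch only: ask the driver whether half-float vertex fetches work */
if (screen->is_format_supported(screen, PIPE_FORMAT_R16G16B16A16_FLOAT,
                                PIPE_BUFFER, PIPE_BIND_VERTEX_BUFFER, 0)) {
   /* safe to advertise GL_ARB_half_float_vertex */
} else {
   /* translate the attribute to a supported format in the state tracker */
}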

If you mean that r300 doesn't support R16G16B16, I suppose you can
just use R16G16B16A16 and ignore the extra fetched w element (the
vertex buffer stride will make this work properly).

However, if non-dword-aligned vertex buffer strides or vertex element
offsets are not supported, I think you have a serious problem, which
is however independent of half float vertices since I don't think
OpenGL places any alignment constraints on those values (correct me if
I'm wrong).



Re: [Mesa3d-dev] Gallium: ARB_half_float_vertex

2010-04-04 Thread Luca Barbieri
>> If you mean that r300 doesn't support R16G16B16, I suppose you can
>> just use R16G16B16A16 and ignore the extra fetched w element (the
>> vertex buffer stride will make this work properly).
>
> I've tried to do it this way, it locks up (unless I am missing something).

Shouldn't there be official ATI hardware documentation for r300
describing such things? (just curious)

Otherwise, I guess you could trace the ATI binary driver and see what it does...



Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler

2010-04-04 Thread Luca Barbieri
> Way back I actually looked into LLVM for R300. I was totally
> unconvinced by their vector support back then, but that may well have
> changed. In particular, I'm curious about how LLVM deals with
> writemasks. Writing to only a select subsets of components of a vector
> is something I've seen in a lot of shaders, but it doesn't seem to be
> too popular in CPU-bound SSE code, which is probably why LLVM didn't
> support it well. Has that improved?
>
> The trouble with writemasks is that it's not something you can just
> implement one module for. All your optimization passes, from simple
> peephole to the smartest loop modifications need to understand the
> meaning of writemasks.

You should be able to just use
shufflevector/insertelement/extractelement to mix the new computed
values with the previous values in the vector register (as well as
doing swizzles).
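
For instance, a TGSI "MOV dst.xz, src" can be expressed with a single
shufflevector over the old destination value and the source; a sketch using
the LLVM C API (names are illustrative, this is not driver code):

#include <llvm-c/Core.h>

/* Build a vector that takes x and z from src and keeps y and w from the
 * previous value of dst: shuffle indices 0-3 select lanes of old_dst,
 * 4-7 select lanes of src. */
static LLVMValueRef
emit_mov_xz(LLVMBuilderRef b, LLVMValueRef old_dst, LLVMValueRef src)
{
   LLVMValueRef idx[4];
   idx[0] = LLVMConstInt(LLVMInt32Type(), 4, 0);   /* x <- src.x */
   idx[1] = LLVMConstInt(LLVMInt32Type(), 1, 0);   /* y <- dst.y */
   idx[2] = LLVMConstInt(LLVMInt32Type(), 6, 0);   /* z <- src.z */
   idx[3] = LLVMConstInt(LLVMInt32Type(), 3, 0);   /* w <- dst.w */
   return LLVMBuildShuffleVector(b, old_dst, src,
                                 LLVMConstVector(idx, 4), "mov_xz");
}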

There is also the option of immediately scalarizing, optimizing the
scalar code, and then revectorizing.
This risks pessimizing the input code, but might turn out to work well.

> I agree, though if I were to start an LLVM-based compilation project,
> I would do it for R600+, not for R300. That would be a very different
> kind of project.
>
> A LLVM->TGSI conversion is not the best way to go because TGSI doesn't
> match the hardware all that well, at least in the Radeon family.
> R300-R500 fragment programs have the weird RGB/A split, and R600+ is
> yet another beast that looks quite different from TGSI. So at least
> for Radeon, I believe it would be best to generate hardware-level
> instructions directly from LLVM, possibly via some Radeon-family
> specific intermediate representation.

The advantage of LLVM->TGSI would be that it works with all drivers
without any driver-specific code, so it probably makes sense as an
initial step.
nv30/nv40 fragment programs map almost directly to TGSI (with the
addition of condition codes, and half float precision, and a few other
things).
Things that end up using an existing graphics API like vmware svga, or
using the llvm optimizer for game development, also need tgsi-like
output.
Thus, even if TGSI itself becomes irrelevant at some point, any
nontrivial parts of the LLVM->TGSI code should be needed anyway for
those cases.



Re: [Mesa3d-dev] How do we init half float tables?

2010-04-03 Thread Luca Barbieri
They are not passing for me with current master and a 32-bit system:

Here are the failures:

Testing util_format_dxt1_rgb_pack_8unorm ...
FAILED: f2 d7 90 20 ae 2c 6f 97 obtained
f2 d7 b0 20 ae 2c 6f 97 expected

Testing util_format_dxt5_rgba_pack_8unorm ...
FAILED: f7 10 c5 0c 9a 73 b4 9c f6 8f ab 32 2a 9a 95 5a obtained
f8 11 c5 0c 9a 73 b4 9c f6 8f ab 32 2a 9a 95 5a expected

Testing util_format_dxt1_rgb_unpack_8unorm ...
FAILED: {0x99, 0xb0, 0x8e, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x99,
0xb0, 0x8e, 0xff}, {0x99, 0xb0, 0x8e, 0xff}, {0xd6, 0xff, 0x94, 0xff},
{0x5d, 0x62, 0x89, 0xff}, {0x99, 0xb0, 0x8e, 0xff}, {0xd6, 0xff, 0x94,
0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x99,
0xb0, 0x8e, 0xff}, {0x21, 0x14, 0x84, 0xff}, {0x5d, 0x62, 0x89, 0xff},
{0x21, 0x14, 0x84, 0xff}, {0x21, 0x14, 0x84, 0xff}, {0x99, 0xb0, 0x8e,
0xff} obtained
{0x98, 0xaf, 0x8e, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x98,
0xaf, 0x8e, 0xff}, {0x98, 0xaf, 0x8e, 0xff}, {0xd6, 0xff, 0x94, 0xff},
{0x5c, 0x62, 0x88, 0xff}, {0x98, 0xaf, 0x8e, 0xff}, {0xd6, 0xff, 0x94,
0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x98,
0xaf, 0x8e, 0xff}, {0x21, 0x13, 0x84, 0xff}, {0x5c, 0x62, 0x88, 0xff},
{0x21, 0x13, 0x84, 0xff}, {0x21, 0x13, 0x84, 0xff}, {0x98, 0xaf, 0x8e,
0xff} expected

Testing util_format_dxt1_rgba_unpack_8unorm ...
FAILED: {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff}, {0x4e,
0xaa, 0x90, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff},
{0x29, 0xff, 0xff, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90,
0xff}, {0x73, 0x55, 0x21, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x00,
0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff}, {0x4e, 0xaa, 0x90, 0xff},
{0x00, 0x00, 0x00, 0x00}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90,
0xff} obtained
{0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff}, {0x4e,
0xa9, 0x8f, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff},
{0x29, 0xff, 0xff, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f,
0xff}, {0x73, 0x54, 0x21, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x00,
0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff}, {0x4e, 0xa9, 0x8f, 0xff},
{0x00, 0x00, 0x00, 0x00}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f,
0xff} expected

Testing util_format_dxt3_rgba_unpack_8unorm ...
FAILED: {0x6d, 0xc6, 0x96, 0x77}, {0x6d, 0xc6, 0x96, 0xee}, {0x6d,
0xc6, 0x96, 0xaa}, {0x8c, 0xff, 0xb5, 0x44}, {0x6d, 0xc6, 0x96, 0xff},
{0x6d, 0xc6, 0x96, 0x88}, {0x31, 0x55, 0x5a, 0x66}, {0x6d, 0xc6, 0x96,
0x99}, {0x31, 0x55, 0x5a, 0xbb}, {0x31, 0x55, 0x5a, 0x55}, {0x31,
0x55, 0x5a, 0x11}, {0x6d, 0xc6, 0x96, 0xcc}, {0x6d, 0xc6, 0x96, 0xcc},
{0x6d, 0xc6, 0x96, 0x11}, {0x31, 0x55, 0x5a, 0x44}, {0x31, 0x55, 0x5a,
0x88} obtained
{0x6c, 0xc6, 0x96, 0x77}, {0x6c, 0xc6, 0x96, 0xee}, {0x6c,
0xc6, 0x96, 0xa9}, {0x8c, 0xff, 0xb5, 0x43}, {0x6c, 0xc6, 0x96, 0xff},
{0x6c, 0xc6, 0x96, 0x87}, {0x31, 0x54, 0x5a, 0x66}, {0x6c, 0xc6, 0x96,
0x98}, {0x31, 0x54, 0x5a, 0xba}, {0x31, 0x54, 0x5a, 0x54}, {0x31,
0x54, 0x5a, 0x10}, {0x6c, 0xc6, 0x96, 0xcc}, {0x6c, 0xc6, 0x96, 0xcc},
{0x6c, 0xc6, 0x96, 0x10}, {0x31, 0x54, 0x5a, 0x43}, {0x31, 0x54, 0x5a,
0x87} expected

Testing util_format_dxt5_rgba_unpack_8unorm ...
FAILED: {0x6d, 0xc6, 0x96, 0x74}, {0x6d, 0xc6, 0x96, 0xf8}, {0x6d,
0xc6, 0x96, 0xb6}, {0x8c, 0xff, 0xb5, 0x53}, {0x6d, 0xc6, 0x96, 0xf8},
{0x6d, 0xc6, 0x96, 0x95}, {0x31, 0x55, 0x5a, 0x53}, {0x6d, 0xc6, 0x96,
0x95}, {0x31, 0x55, 0x5a, 0xb6}, {0x31, 0x55, 0x5a, 0x53}, {0x31,
0x55, 0x5a, 0x11}, {0x6d, 0xc6, 0x96, 0xd7}, {0x6d, 0xc6, 0x96, 0xb6},
{0x6d, 0xc6, 0x96, 0x11}, {0x31, 0x55, 0x5a, 0x32}, {0x31, 0x55, 0x5a,
0x95} obtained
{0x6c, 0xc6, 0x96, 0x73}, {0x6c, 0xc6, 0x96, 0xf7}, {0x6c,
0xc6, 0x96, 0xb6}, {0x8c, 0xff, 0xb5, 0x53}, {0x6c, 0xc6, 0x96, 0xf7},
{0x6c, 0xc6, 0x96, 0x95}, {0x31, 0x54, 0x5a, 0x53}, {0x6c, 0xc6, 0x96,
0x95}, {0x31, 0x54, 0x5a, 0xb6}, {0x31, 0x54, 0x5a, 0x53}, {0x31,
0x54, 0x5a, 0x10}, {0x6c, 0xc6, 0x96, 0xd7}, {0x6c, 0xc6, 0x96, 0xb6},
{0x6c, 0xc6, 0x96, 0x10}, {0x31, 0x54, 0x5a, 0x31}, {0x31, 0x54, 0x5a,
0x95} expected

Compiling libtxc_dxtn with -O0 or with -march=core2 -msse2
-mfpmath=sse did not make them work.

As you can see the tests seem mostly off-by-one, which makes me think
of an approximation problem.

libtxc_dxtn seems to take 8-bit input instead of floating point input,
so it seems to be inherently hard to get it to roundtrip sensibly.

Since only integer-coordinate points can be used, they are unlikely to
be exactly on a line unless specifically crafted to be so.

Thus, a possible solution could be to actually pick a starting color,
pick an increment, and generate an exact line by adding multiples of
that increment to the starting color.
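
A small sketch of that construction (per-channel and illustrative only; real
decoders may weight the interpolants differently, which is exactly what makes
arbitrary endpoints fragile):

#include <stdbool.h>
#include <stdint.h>

/* expand a 5-bit DXT1 endpoint channel to 8 bits the way decoders do */
static uint8_t expand5(uint8_t v) { return (uint8_t)((v << 3) | (v >> 2)); }

/* true if the 1/3 and 2/3 interpolants between the expanded endpoints are
 * exact 8-bit values, so encode/decode can round-trip bit-exactly */
static bool exact_line5(uint8_t v0, uint8_t v1)
{
   unsigned c0 = expand5(v0), c1 = expand5(v1);
   return (2 * c0 + c1) % 3 == 0;   /* implies (c0 + 2*c1) % 3 == 0 as well */
}

/* e.g. endpoints 31 and 0 expand to 255 and 0, giving exact interpolants
 * 170 and 85; endpoints 26 and 4 expand to 214 and 33, whose interpolants
 * are not integers, hence the off-by-one results above. */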


Re: [Mesa3d-dev] How do we init half float tables?

2010-04-03 Thread Luca Barbieri
For instance, the DXT1 test is wrong.

The red values used are:
33
93
153
214

93 - 33 = 60
153 - 93 = 60
214 - 153 = 61

213 should be used instead (i.e. 0xd5 instead of 0xd6).



[Mesa3d-dev] [PATCH] progs/gallium/unit: improve error detection in u_format_test and make it more lenient for S3TC

2010-04-03 Thread Luca Barbieri
Collect the maximum error for fetch/unpack tests, and ratio of flipped
to total bits for pack tests.

Add lenient thresholds for S3TC tests.
---
 progs/gallium/unit/u_format_test.c |  163 +++-
 1 files changed, 86 insertions(+), 77 deletions(-)

diff --git a/progs/gallium/unit/u_format_test.c 
b/progs/gallium/unit/u_format_test.c
index 53e0284..1911dad 100644
--- a/progs/gallium/unit/u_format_test.c
+++ b/progs/gallium/unit/u_format_test.c
@@ -36,22 +36,48 @@
 #include util/u_format_s3tc.h
 
 
+static float
+float_error(float x, float y)
+{
+   return fabsf(y - x);
+}
+
+static float
+byte_error(uint8_t x, uint8_t y)
+{
+   return float_error(x / 255.0, y / 255.0);
+}
+
+/* this is done in this terrible way only because these are unit tests.
+ * a real implementation must use a lookup table, or the mask/shift/add
+ * algorithm in the Linux source
+ * it should also use the builtin/intrinsic if available
+ */
+static unsigned
+popcnt8(uint8_t v)
+{
+   unsigned i;
+   unsigned cnt = 0;
+   for(i = 0; i < 8; ++i)
+      cnt += ((v >> i) & 1);
+   return cnt;
+}
+
 static boolean
-compare_float(float x, float y)
+print_max_error(const struct util_format_description *format_desc, float max_error)
 {
-   float error = y - x;
+   if(max_error <= FLT_EPSILON)
+      return TRUE;
 
-   if (error < 0.0f)
-      error = -error;
+   printf("MAX ABS ERROR: %f float, %.1f 8scaled\n", max_error, max_error * 255.0);
 
-   if (error > FLT_EPSILON) {
-      return FALSE;
-   }
+   /* compression tests aren't currently perfect, so be lenient here */
+   if(format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC && max_error < 0.01f)
+      return TRUE;
 
-   return TRUE;
+   return FALSE;
 }
 
-
 static void
 print_packed(const struct util_format_description *format_desc,
  const char *prefix,
@@ -69,6 +95,31 @@ print_packed(const struct util_format_description 
*format_desc,
printf(%s, suffix);
 }
 
+static boolean
+print_packed_results(const struct util_format_description *format_desc, const 
struct util_format_test_case *test, uint8_t* packed)
+{
+   unsigned flipped_bits = 0;
+   unsigned total_bits = 0;
+   float flipped_bits_ratio;
+   unsigned i;
+   for (i = 0; i < format_desc->block.bits/8; ++i) {
+      flipped_bits += popcnt8((test->packed[i] ^ packed[i]) & test->mask[i]);
+      total_bits += popcnt8(test->mask[i]);
+   }
+
+   flipped_bits_ratio = (float)flipped_bits / total_bits;
+
+   if (flipped_bits)
+      printf("FLIPPED BITS: %u (%u %%)\n", flipped_bits, (unsigned)(flipped_bits_ratio * 100.0));
+
+   /* TODO: S3TC threshold is random */
+   if (flipped_bits_ratio > (format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC ? 0.1 : 0)) {
+      print_packed(format_desc, "FAILED: ", packed, " obtained\n");
+      print_packed(format_desc, "        ", test->packed, " expected\n");
+  return FALSE;
+   }
+   return TRUE;
+}
 
 static void
 print_unpacked_doubl(const struct util_format_description *format_desc,
@@ -94,7 +145,7 @@ print_unpacked_doubl(const struct util_format_description 
*format_desc,
 static void
 print_unpacked_float(const struct util_format_description *format_desc,
  const char *prefix,
- const float 
unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
+ float 
unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
  const char *suffix)
 {
unsigned i, j;
@@ -115,7 +166,7 @@ print_unpacked_float(const struct util_format_description 
*format_desc,
 static void
 print_unpacked_8unorm(const struct util_format_description *format_desc,
   const char *prefix,
-  const uint8_t 
unpacked[][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
+  uint8_t unpacked[][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
   const char *suffix)
 {
unsigned i, j;
@@ -138,26 +189,23 @@ test_format_fetch_float(const struct 
util_format_description *format_desc,
 {
float 
unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4] = 
{ { { 0 } } };
unsigned i, j, k;
-   boolean success;
+   float max_error = 0.0f;
 
-   success = TRUE;
    for (i = 0; i < format_desc->block.height; ++i) {
       for (j = 0; j < format_desc->block.width; ++j) {
          format_desc->fetch_float(unpacked[i][j], test->packed, j, i);
-         for (k = 0; k < 4; ++k) {
-            if (!compare_float(test->unpacked[i][j][k], unpacked[i][j][k])) {
-               success = FALSE;
-            }
-         }
+         for (k = 0; k < 4; ++k)
+            max_error = MAX2(max_error, float_error(test->unpacked[i][j][k], unpacked[i][j][k]));
   }
}
 
-   if (!success) {
+   if (!print_max_error(format_desc, max_error)) {
       print_unpacked_float(format_desc, "FAILED: ", unpacked, " obtained\n");
       print_unpacked_doubl(format_desc, "        ", test->unpacked, " expected\n");
+  return FALSE;
}
 
-   return success;
+  

Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler

2010-04-03 Thread Luca Barbieri
This is somewhat nice, but without using a real compiler, the result
will still be just a toy, unless you employ hundreds of compiler
experts working full time on the project.

For instance, Wikipedia lists the following loop optimizations:
# loop interchange : These optimizations exchange inner loops with
outer loops. When the loop variables index into an array, such a
transformation can improve locality of reference, depending on the
array's layout. This is also known as loop permutation.

# loop splitting/loop peeling : Loop splitting attempts to simplify a
loop or eliminate dependencies by breaking it into multiple loops
which have the same bodies but iterate over different contiguous
portions of the index range. A useful special case is loop peeling,
which can simplify a loop with a problematic first iteration by
performing that iteration separately before entering the loop.

# loop fusion or loop combining : Another technique which attempts to
reduce loop overhead. When two adjacent loops would iterate the same
number of times (whether or not that number is known at compile time),
their bodies can be combined as long as they make no reference to each
other's data.

# loop fission or loop distribution : Loop fission attempts to break a
loop into multiple loops over the same index range but each taking
only a part of the loop's body. This can improve locality of
reference, both of the data being accessed in the loop and the code in
the loop's body.

# loop unrolling: Duplicates the body of the loop multiple times, in
order to decrease the number of times the loop condition is tested and
the number of jumps, which may degrade performance by impairing the
instruction pipeline. Completely unrolling a loop eliminates all
overhead (except multiple instruction fetches  increased program load
time), but requires that the number of iterations be known at compile
time (except in the case of JIT compilers). Care must also be taken to
ensure that multiple re-calculation of indexed variables is not a
greater overhead than advancing pointers within the original loop.

# loop unswitching : Unswitching moves a conditional inside a loop
outside of it by duplicating the loop's body, and placing a version of
it inside each of the if and else clauses of the conditional.

# loop inversion : This technique changes a standard while loop into a
do/while (a.k.a. repeat/until) loop wrapped in an if conditional,
reducing the number of jumps by two, for cases when the loop is
executed. Doing so duplicates the condition check (increasing the size
of the code) but is more efficient because jumps usually cause a
pipeline stall. Additionally, if the initial condition is known at
compile-time and is known to be side-effect-free, the if guard can be
skipped.

# loop-invariant code motion : If a quantity is computed inside a loop
during every iteration, and its value is the same for each iteration,
it can vastly improve efficiency to hoist it outside the loop and
compute its value just once before the loop begins. This is
particularly important with the address-calculation expressions
generated by loops over arrays. For correct implementation, this
technique must be used with loop inversion, because not all code is
safe to be hoisted outside the loop.

# loop reversal : Loop reversal reverses the order in which values are
assigned to the index variable. This is a subtle optimization which
can help eliminate dependencies and thus enable other optimizations.
Also, certain architectures utilise looping constructs at Assembly
language level that count in a single direction only (e.g.
decrement-jump-if-not-zero (DJNZ)).

# loop tiling/loop blocking : Loop tiling reorganizes a loop to
iterate over blocks of data sized to fit in the cache.

# loop skewing : Loop skewing takes a nested loop iterating over a
multidimensional array, where each iteration of the inner loop depends
on previous iterations, and rearranges its array accesses so that the
only dependencies are between iterations of the outer loop.
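
To make one of these concrete, here is a hand-written sketch (plain C,
not code from any driver) of what loop unrolling plus common
subexpression elimination buys you on a trivial fixed-count loop:

static float weight_sum(const float weight[4], float scale, float bias)
{
   float accum = 0.0f;
   int i;
   for (i = 0; i < 4; ++i)
      accum += weight[i] * (scale * bias); /* loop-invariant product */
   return accum;
}

/* After unrolling + CSE, the optimizer can emit the equivalent of: */
static float weight_sum_unrolled(const float weight[4], float scale, float bias)
{
   float k = scale * bias; /* computed once */
   return weight[0] * k + weight[1] * k + weight[2] * k + weight[3] * k;
}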



Good luck doing all this on TGSI (especially if the developer does not
have serious experience writing production compilers).

Also, this does not mention all the other optimizations and analyses
required to do the above well (likely another 10-20 things).

Using a real compiler (e.g. LLVM, but also gcc or Open64), those
optimizations are already implemented, or at least there is already a
team of experienced compiler developers who are working full time to
implement such optimizations, allowing you to then just turn them on
without having to do any of the work yourself.

Note: all "X compiler is bad for VLIW or whatever GPU architecture"
objections are irrelevant, since almost all optimizations are totally
architecture independent.

Also note that we should support OpenCL/compute shaders (already
available for *3* years on e.g. nv50) and those *really* need a real
compiler (as in, something developed for years by a team of compiler
experts, 

Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler

2010-04-03 Thread Luca Barbieri
As a further example that just came to mind, nv40 (GeForce 6-7 and PS3
RSX) supports control flow in fragment shaders, but does not
apparently support the continue keyword (since NV_fragment_program2,
which maps almost directly to the hardware, does not have it either).

I implemented TGSI control flow in a private branch, but did not
implement the continue keyword.

Implementing continue requires transforming the code to generate and
carry around "should continue" flags, or performing even less trivial
transformations, including code duplication.
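
For illustration, here is a hand-written C-level sketch of that
transformation (a made-up example, not nv40 driver code):

/* Original loop, using continue: */
float sum_positive(const float *v, int n)
{
   float sum = 0.0f;
   int i;
   for (i = 0; i < n; ++i) {
      if (v[i] < 0.0f)
         continue; /* no CONT-style instruction available */
      sum += v[i];
   }
   return sum;
}

/* Transformed: carry a "should continue" flag and predicate the rest
 * of the loop body on it, which only needs if/endif support: */
float sum_positive_no_continue(const float *v, int n)
{
   float sum = 0.0f;
   int i;
   for (i = 0; i < n; ++i) {
      int skip = 0;
      if (v[i] < 0.0f)
         skip = 1; /* was: continue */
      if (!skip)
         sum += v[i];
   }
   return sum;
}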

Unfortunately, doing so requires non-local modifications, and thus would
require doing something beyond just scanning the TGSI source code as
the nv30/nv40 driver currently does.

If there was a TGSI->LLVM->TGSI module, the LLVM->TGSI control flow
reconstruction would already handle this, and it would be enough to
tell it to not make use of the continue instruction: it would then
automatically generate the proper if/endif structure, duplicating code
and/or introducing flags as needed in a generic way.

As things stand now, I'm faced with either just hoping the GLSL
programs don't use continue, implementing a hack in the nv40 shader
backend (where such a high-level optimization does not belong at all
and can't be done cleanly), or writing the LLVM module myself before
tackling this.

With an LLVM-based infrastructure, there would be a clear and
straightforward way to solve this, with all the supporting
infrastructure already available and the ability to create an
optimization pass reusable by other drivers that may face the same
issue.

This is just an example, by the way: others can be found.



Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler

2010-04-03 Thread Luca Barbieri
By the way, if you want a simple, limited and temporary, but very
effective, way to optimize shaders, here it is:
1. Trivially convert TGSI to GLSL
2. Feed the GLSL to the nVidia Cg compiler, telling it to produce
optimized output in ARB_fragment_program format
3. Ask the Mesa frontend/state tracker to parse the
ARB_fragment_program and give you back TGSI

This does actually optimize the program well and does all the nice
control flow transformations desired.
If your GPU can support predicates or condition codes, you can also
ask the Cg compiler to give you NV_fragment_program_option, which will
use them efficiently.
If it also supports control flow, you can ask for NV_fragment_program2
and get control flow too where appropriate.

Of course, if this does not happen to do exactly what you want, you
are totally out of luck, since it is closed source.
With an ad-hoc TGSI optimizer, you can modify it, but that will often
require rearchitecting the module, since it may be too primitive
for the new feature you want, and implementing everything from scratch
with no supporting tools to help you.

With a real compiler framework, you already have the optimization
ready for use, or you at least have a comprehensive conceptual
framework and IR and a full set of analyses, frameworks and tools to
use, not to mention a whole community of compiler developers that can
at least tell you what is the best way of doing what you want
(actually giving out competent advice), if not even have already done
or planned to do it themselves.



Re: [Mesa3d-dev] gallium-util-format-is-supported

2010-04-03 Thread Luca Barbieri
 I don't agree with this. Making the format description table mutable when the 
 only formats that are potentially unsupported due to patent issues are s3tc 
 variants makes no sense. S3TC formats *are* special. There is nothing to 
 generalize here.

Yes, I don't like this very much either.
The immediate alternative is to have separate is_supported flags for
externally-implemented formats, but this also doesn't look perfect to
me.

Another thing to look at is to remove both is_supported and the
pack/unpack functions, and put them in a separate, possibly mutable,
table.
In some sense pack/unpack functionality does not really belong in the
format description, since many interfaces are possible (for instance
llvmpipe has another interface that is code-generated separately for
SoA tiles).

This last option, with a mutable format access table, seems
conceptually the cleanest to me, but not sure.

 Replacing the conditionals with a no-op stubs is a good optimization.
 But attempting to load s3tc shared library from the stubs is unnecessary. 
 Stubs should have an assert(0) -- it is an error to attempt any S3TC 
 (de)compression when there's no support for it.

The fundamental issue here seems to be: what to do if the application
tries to read/write an unsupported format?

Currently, unsupported formats have empty functions rather than
assert(0), so I just kept with that convention.
Since it is permissible to call other format functions without
checking they are supported, I made S3TC work consistently with that,
which requires on-demand loading upon format access.

In general it seems to me that the fact that S3TC (or any other)
formats are somehow special should be completely hidden to any user.
This allows writing generic, robust, format-independent code. Explicit
initialization or ad-hoc format checking goes counter to this, and
requires sprinkling code everywhere (for instance, I suspect the rbug
texture-examination tools don't work right now in master on S3TC
because they don't call util_format_s3tc_init).

It might make sense to make all unsupported formats assert(0). A C++
exception would be the perfect thing since you could catch it, but
unfortunately we aren't using C++ right now.

Another option that seems better to me is to have an
util_format_get_functions that would either give you a pointer to a
table of functions, or return NULL if unsupported, and make this the
only way of accessing format conversions.
This way, applications will naturally have to check for support before
usage, and both GCC and a static checker can be told to flag an error
if the util_format_get_functions return value is not checked for null
before use.
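
A rough sketch of what I have in mind (struct members and names are
hypothetical; this is just the proposal, not existing code):

#include <stdint.h>
/* enum pipe_format comes from the usual Gallium headers (p_format.h) */

struct util_format_funcs {
   void (*unpack_8unorm)(uint8_t *dst, unsigned dst_stride,
                         const uint8_t *src, unsigned src_stride,
                         unsigned width, unsigned height);
   void (*pack_8unorm)(uint8_t *dst, unsigned dst_stride,
                       const uint8_t *src, unsigned src_stride,
                       unsigned width, unsigned height);
   /* ... float variants, fetch, etc. ... */
};

/* Returns NULL if the format is not usable (e.g. S3TC without
 * libtxc_dxtn), so callers are forced to check before converting. */
const struct util_format_funcs *
util_format_get_functions(enum pipe_format format);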

BTW, note that the indirect function calls are also generally slow,
and we may want to switch Gallium to C++ and use C++ templates to
specialize (and fully inline) whole algorithms for specific formats.
llvmpipe and the code generation facilities lessen the need for this,
but it might perhaps be worthwhile in some cases.
This a wholly separate issue, but may be worth keeping in mind.



Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler

2010-04-03 Thread Luca Barbieri
 I agree with you that doing these kinds of optimizations is a difficult
 task, but I am trying to focus my proposal on emulating branches and
 loops for older hardware that don't have branching instructions rather
 than performing global optimizations on the TGSI code.  I don't think
 most of the loop optimizations you listed are even possible on hardware
 without branching instructions.

Yes, that's possible.
In fact, if you unroll loops, those optimizations can be done after
loop unrolling.

This does not however necessarily change things, since while you can
e.g. avoid loop-invariant code motion, you still need common
subexpression elimination to remove the mutiple redundant copies of
the loop-invariant code generated by unrolling.

Also even loop unrolling needs to find the number of iterations, which
at the very least requires simple constant folding, and potentially a
whole suite of complex optimization to work in all possible

Some of the challenges of this were mentioned in a previous thread, as
well as LLVM-related issues

 (2) Write a LLVM->TGSI backend, restricted to programs without any control 
 flow

 I think (2) is probably the closest to what I am proposing, and it is
 something I can take a look at.

Note that this means an _input_ program without control flow, that is
a control flow graph with a single basic block.

Once you have more than one basic block, you need to convert the CFG
for an arbitrary graph to something made of structured loops and
conditionals.

The problem here is that GPUs often use a SIMT approach.
This means that the GPU internally works like an SSE CPU with vector
registers (but often much wider, with up to 32 elements or even more).
However, this is hidden from the programmer by putting the variables
related to several pixels in the vector, and making you think
everything is a scalar or just a 4-component vector.

This works fine as long as there is no control flow; however when you
reach a conditional jump, some pixels may want to take one path and
some others another path.
The solution is to have an execution mask and not write to any
pixels not in the execution mask.

When an if/else/endif structure is encountered, if the pixels all
take the same path, things work like CPUs; if that is not the case,
both branches are executed with the appropriate execution masks, and
things continue normally after the endif.
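
As a hand-written illustration (plain C, not how any driver actually
implements it), the hardware effectively does something like this for a
group of pixels running in lockstep:

#include <stdbool.h>

#define GROUP 4 /* pretend 4 pixels execute together */

static void simt_if_else(const float in[GROUP], float out[GROUP])
{
   bool mask[GROUP];
   int i;

   for (i = 0; i < GROUP; ++i)
      mask[i] = in[i] > 0.5f;   /* the IF condition, per pixel */

   for (i = 0; i < GROUP; ++i)  /* "then" branch, under the mask */
      if (mask[i])
         out[i] = in[i] * 2.0f;

   for (i = 0; i < GROUP; ++i)  /* "else" branch, inverted mask */
      if (!mask[i])
         out[i] = 0.0f;
}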

The problem here is that this needs a structured if/else/endif
formulation as opposed to arbitrary gotos.

However LLVM and most optimizers work in arbitrary-goto formulation,
which needs to be converted to a structured approach.

The above all applies for GPU with hardware control flow.
However, even without it, you have the same issue of reconstructing
if/else/endif blocks, since you need to basically do the same in
software, using the if conditional to choose between results
computed by the branches.

Converting a control flow graph to a structured program is always
possible, but doing it well requires some thought.
In particular, you need to be careful to not break DDX instructions,
which operate on a 2x2 block of pixels, and will thus behave
differently if some of the other things have diverged away due to
control flow modifications.
This may require to make sure control flow optimizations do not
duplicate them, and possibly other issues.

Using an ad-hoc optimizer does indeed sidestep the issue, but only as
long as you don't try to do non-trivial control flow optimization or
changes.
In that case, those may be best expressed on an arbitrary control flow
graph (e.g. the issue with converting continue to if/end), and at
this point you would need to add that logic anyway.


At any rate, I'm not sure whether this is suitable for your GSoC project or not.

My impression is that using an existing compiler would prove to be
more widely useful and more long lasting, especially considering that
we are moving towards applications and hardware with very complex
shader support (consider the CUDA/OpenCL shaders and the very generic
GPU shading capabilities).

An ad-hoc TGSI optimizer will probably prove unsuitable for efficient
code generation for, say, scientific applications using OpenCL, and
would need to be later replaced.

So my personal impression (which could be wrong) is that using an
existing optimizer, while possibly requiring an higher initial
investment, should have much better payoffs in the long run, by making
everything beyond the initial TGSI->LLVM->TGSI work already done or
easier to do.

From a coding perspective, you lose the "design and write everything
myself from scratch" aspect, but you gain experience with a complex
and real-world compiler, and are able to write more complex
optimizations and transforms due to having a well-developed
infrastructure allowing to express them easily.

Furthermore, hopefully using a real compiler would result in seeing
your work producing very good code in all cases, while an ad-hoc
optimizer would improve the 

Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler

2010-04-03 Thread Luca Barbieri
 Gallium. Obviously a code-generator that can handle control-flow (to be honest
 I'm really not sure why you want to restrict it to something without control-
 flow in the first place).

The no-control-flow was just for the first step, with a second step
supporting everything.

 Having said that I'm not sure whether this is something that's a good GSOC
 project. It's a fairly difficult piece of code to write. One that to do right
 will depend on adding some features to TGSI (a good source of inspiration for
 those would be AMD's CAL and NVIDIA's PTX
 http://developer.amd.com/gpu_assets/ATI_Intermediate_Language_(IL)_Specification_v2b.pdf
 http://www.nvidia.com/content/CUDA-ptx_isa_1.4.pdf )

This would be required to handle arbitrary LLVM code (e.g. for
clang/OpenCL use), but since GLSL shader code starts as TGSI, it
should be possible to convert it back without extending TGSI.

 I thought the initial proposal was likely a lot more feasible for a GSOC (of
 course there one has to point out that Mesa's GLSL compiler already does
 unroll loops and in general simplifies control-flow so the points #1 and #2 
 are
 largely no-ops, but surely there's enough work on Gallium Radeon's drivers
 left to keep Tom busy). Otherwise having a well-defined and reduced scope with
 clear deliverables would be rather necessary for LLVM-TGSI code because that
 is not something that you could get rock solid over a summer.

I'd say, as an initial step, restricting to code produced by
TGSI->LLVM (AoS) that can be expressed with no intrinsics, having a
single basic block, with no optimization passes having been run on it.
All 4 restrictions (from TGSI->LLVM, no intrinsics, single BB and no
optimizations) can then be lifted in successive iterations.

Of course, yes, it has a different scope than the original proposal.

The problem I see is that since OpenCL will be hopefully done at some
point, then as you say TGSI-LLVM will also be done, and that will
probably make any other optimization work irrelevant.

So basically the r300 optimization work looks doomed from the
beginning to be eventually obsoleted.
That said, you may want to do it anyway.

But if you really want a quick fix for r300, seriously, just use the
nVidia Cg compiler.
It's closed source, but being produced by the nVidia team, you can
generally rely on it not sucking.
It takes GLSL input and spits out optimized ARB_fragment_program (or
optionally other languages) so it is trivial to interface with it.
It could even be useful to compare the output/performance of that with
a more serious LLVM-based solution, to make sure we get the latter
right.

For instance, personally, I did work on the nv30/nv40 shader assembler
(note the word assembler here), and haven't done anything more than
simple local transforms, for exactly this reason.

The only thing I've done for LLVM-TGSI is trying to recover Stephane
Marchesin's work on LLVM (forgot to CC him too), lost in an hard drive
crash, but failed to find anyone having pulled it.



Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler

2010-04-03 Thread Luca Barbieri
 So basically the r300 optimization work looks doomed from the
 beginning to be eventually obsoleted.

 Please consider there are hw-specific optimizations in place which I think
 no other compiler framework provides, and I believe this SSA thing will do

Sure, but it seemed to me that all the optimizations proposed were
hardware-independent and valid for any driver (other than having to
know about generic capabilities like having control flow or not).

 even better job for superscalar r600. So yes, we need both LLVM to do global
 optimizations and RC to efficiently map code to hw.

LLVM also uses SSA form (actually, it is totally built around it),
assuming that's what you meant.

There are doubts about whether the LLVM backend framework works well
for GPUs or not (apparently because some GPUs are VLIW and only IA-64
is VLIW too, so LLVM support for it is either nonexistent or not
necessarily a major focus), but using LLVM->TGSI makes this irrelevant,
since the existing TGSI-accepting backend will still run.



Re: [Mesa3d-dev] gallium-resources branch merge

2010-04-02 Thread Luca Barbieri
How about merging gallium-buffer-usage-cleanup as well, which is based
on gallium-resources?
Unless it changed recently, the gallium-resources branch left a mix
of old PIPE_BUFFER_USAGE_* and new PIPE_TRANSFER_* flags.

It would be nice to convert drivers having both branches, so that it is done once.
However, note that I may be misunderstanding the branches; correct me
if I'm wrong.



Re: [Mesa3d-dev] How do we init half float tables?

2010-04-02 Thread Luca Barbieri
 FWIW, I don't see any new s3tc formats. rgtc will not be handled by s3tc
 library since it isn't patent encumbered. util_format_is_s3tc will not
 include rgtc formats.
 (Though I guess that external decoding per-pixel is really rather lame,
 should do it per-block...)

Yes the other formats (rgtc and bptc) have no patent claims listed.



Re: [Mesa3d-dev] How do we init half float tables?

2010-04-02 Thread Luca Barbieri
 So far, there are no dependencies on Gallium in core Mesa.

 We've talked about refactoring some of the Gallium code into a separate
 module that'd be sharable between Gallium and Mesa.  The format code would
 probably fit into that.

Can't we just unconditionally pull gallium/auxiliary in Mesa? (unused
stuff will be ignored by the linker due to .a behavior)



Re: [Mesa3d-dev] How do we init half float tables?

2010-04-02 Thread Luca Barbieri
What are you seeing a regression on?
texcompress and texcompsub seemed to work for me: I'll try to test
something else and recheck the code.



Re: [Mesa3d-dev] How do we init half float tables?

2010-04-02 Thread Luca Barbieri
Sorry for the regression.
This whole thing was done to fix the u_gctors.cpp issue, originally
done by me, sent out without full testing since I saw duplicate work
being done, and then merged by Roland if I recall correctly.
I probably should not have fixed s3tc/util_format like it was done for
u_half and instead put it in a branch and sent it to the ML first.

Note that everything that reads pixels and does not call
util_format_s3tc_init (e.g. I think rbug tools) needs something like
this, or an explicit call which is likely to be forgotten (even
finding out everything that ends up calling util_format is
nontrivial).

Anyway, this patch fixes a couple of bugs that may have caused the regression.

How can I reproduce it locally?

The DXTn unit tests do fail, but the values usually have a difference
of 1, so I assume it's an approximation error.

commit 80214ef6265d406496dc4fd3c76d8ac782cd012b
Author: Luca Barbieri l...@luca-barbieri.com
Date:   Sat Apr 3 01:55:27 2010 +0200

gallium/util: fix inverted if is_nop logic in s3tc

diff --git a/src/gallium/auxiliary/util/u_format_s3tc.c
b/src/gallium/auxiliary/util/u_format_s3tc.c
index d48551f..7808210 100644
--- a/src/gallium/auxiliary/util/u_format_s3tc.c
+++ b/src/gallium/auxiliary/util/u_format_s3tc.c
@@ -303,7 +303,7 @@ util_format_dxt3_rgba_unpack_8unorm(uint8_t
*dst_row, unsigned dst_stride, const
 void
 util_format_dxt5_rgba_unpack_8unorm(uint8_t *dst_row, unsigned
dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned
width, unsigned height)
 {
-   if (is_nop(util_format_dxt5_rgba_fetch)) {
+   if (!is_nop(util_format_dxt5_rgba_fetch)) {
   unsigned x, y, i, j;
   for(y = 0; y < height; y += 4) {
  const uint8_t *src = src_row;
@@ -324,7 +324,7 @@ util_format_dxt5_rgba_unpack_8unorm(uint8_t
*dst_row, unsigned dst_stride, const
 void
 util_format_dxt1_rgb_unpack_float(float *dst_row, unsigned
dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned
width, unsigned height)
 {
-   if (is_nop(util_format_dxt1_rgb_fetch)) {
+   if (!is_nop(util_format_dxt1_rgb_fetch)) {
   unsigned x, y, i, j;
   for(y = 0; y < height; y += 4) {
  const uint8_t *src = src_row;



Re: [Mesa3d-dev] How do we init half float tables?

2010-04-02 Thread Luca Barbieri
 One more thing: I'm maintaining the u_format* modules. I'm not speaking 
 just in the long term, but in the sense I'm actually working on this as we 
 speak.  Please do not make this kind of deep reaching changes to the u_format 
 stuff in master without clearing them first with me.

Yes sorry, it was an attempt to fix breakage originally caused by code
of mine that was sent out in a non-fully-mergeable state (to prevent
duplicate work on half float conversion) and got merged anyway.

Since master was already broken (due to u_gctors.cpp not being picked
up by ld), it seemed a good idea to try to fix it.

Unfortunately what seemed to be an easy fix gradually became something
much more invasive than originally envisioned.

After realizing the util_format_init thing wouldn't work out, I should
have made these call util_format_s3tc_init again (was changed so they
would init util_half as well) and then sent the util_format changes
for review.

I added a gallium-util-format-is-supported branch to hold the work and
the fix I just sent.
Sorry for not doing that in the first place.



Re: [Mesa3d-dev] How do we init half float tables?

2010-04-02 Thread Luca Barbieri
The s3tc-teximage test seems fixed by the two line change I put in
gallium-util-format-is-supported.

s3tc-texsubimage prints:
Mesa: User error: GL_INVALID_VALUE in glTexSubImage2D(xoffset+width)
Probe at (285,12)
  Expected: 1.00 0.00 0.00
  Observed: 0.00 0.00 0.00

which seems to be due to a Mesa or testcase bug.

As for u_format_test.c, it looks like it simply fails to account for
DXTn being lossy.



Re: [Mesa3d-dev] gallium raw interfaces

2010-04-01 Thread Luca Barbieri
 Once MS changes interfaces, then there's _no advantage_ to using DX10
 internally, regardless of what WINE does, and one might as well use
 OpenGL.  Wine doesn't change that.

[resent to ML, inadvertently replied only to Miles]

Note that my proposal was not to use DirectX 10 internally, but rather
to expose DirectX 10 and promote it initially as an API to test
Gallium and later as the preferred Linux graphics API instead of
OpenGL, for the technical reason that a DirectX 10 over Gallium
implementation carries much less performance overhead than an OpenGL
implementation and is much simpler, due to the superior design of
DirectX 10.

Using an extended version of DirectX 10 internally could also be an
option, but I don't think it's worth doing that right now and likely
it's not worth at all.

Also note that Microsoft does not use DirectX 10 or 11 internally
either, but rather uses the DirectX 10 DDI or DirectX 10 Device
Driver Interface, which is also publicly documented.

The last time Microsoft did an incompatible interface change (DX10),
it was to move away from fixed pipeline support with scattered state
towards a shader-only pipeline with constant state objects.

Exactly the same change was achieved by the move from the classic Mesa
architecture to the Gallium architecture: you could think of the move
to Gallium as switching to something like DX10 internally, done purely
for technical reasons, partially the same as the ones that prompted
Microsoft to make the transition.

Actually, while this is not generally explicitly stated by Gallium
designers, Gallium itself is generally evolving towards being closer
to DirectX 10.
The biggest deviations are additional features needed to support
OpenGL features not included in DirectX 10.

For instance, looking at recent changes:
- Vertex element CSOs, recently added, are equivalent to DX10 input layouts
- Sampler views, also recently added, are equivalent to DX10 shader
resource views
- Doing transfers per-context (recent work by Keith Whitwell) is what DX10 does
- Having a resource concept (also recent work by Keith Whitwell) is
what DX10 does
- Gallium format values were changed from self-describing to a set of
stock values like DX10
- Gallium format names were later changed and made identical to DX10
ones (except for the fact that the names of the former start with
PIPE_FORMAT_ and the ones of the latter with DXGI_FORMAT_, and the
enumerated values are different)
- It has been decided to follow the DX9 SM3/DX10 model for shader
semantic linkage as opposed to the OpenGL one

I recently systematically compared Gallium and DirectX 10, and found
them to be mostly equivalent, where the exceptions were usually either
additional features Gallium had for the sake of OpenGL, or Gallium
misdesigns that are being changed or looked into.

This is likely not for the sake of imitating Microsoft, but just
because they made a good design, having made the decision to
redesign the whole API from scratch when making DirectX 10.
It's also probably because VMWare is apparently funding DirectX 10
support over Gallium, which obviously makes all discrepancies evident
for people working on that, and those are generally because DirectX 10
is better, leading those people to improve the Gallium design taking
inspiration from DirectX 10.

Presumably if Microsoft were to change interfaces incompatibly again
(notice that DX11 is a compatible change), Mesa would probably benefit
from introducing a further abstraction layer similar to the new Microsoft
interface and having a Gallium->NewLayer module, since such a change
would most likely be a result of a paradigm shift in graphics hardware
itself (e.g. a switch to fully software-based GPUs like Larrabee).

Also, unless Microsoft holds patents to DirectX 10 (which would be a
showstopper, even though Gallium may violate them anyway), I don't see
any difference from having to implement DirectX 10 or OpenGL, or any
difference in openness of the APIs.
It is indeed possible to participate in the ARB standardization
process and some Mesa contributors/leaders do, but I'm not sure
whether this is particularly advantageous: decisions that work well
for Microsoft and Windows are also likely to work well for Linux/Mesa
since the hardware is the same and the software works mostly
equivalently.

And should some decisions not work well, it is technically trivial to
provide an alternative API.



Re: [Mesa3d-dev] Mesa (master): gallium/util: add fast half float conversion functions

2010-04-01 Thread Luca Barbieri
 This constructor scheme is not working for me. I think that's because
 there isn't any symbol here that's used elsewhere, hence the linker is
 not linking this file.

I replaced the system with a different mechanism.

It should now work correctly, but only GCC and MSVC are supported, and
the latter is untested.

 Please put copyright headers. *Especially* when basing your work from
 external references, as it gives the impression that this code was
 copied, and not your own creation.

Done: the code was a reimplementation from scratch of the code
provided by them, with slight changes.



[Mesa3d-dev] How do we init half float tables?

2010-04-01 Thread Luca Barbieri
The half float conversion code initially used a global C++ constructor
to initialize the tables.

That fails to work since ld does not get the symbol from the shared
library, so I changed it to register a global constructor from C,
using GCC- or MSVC-specific code.

This is not necessarily the best option, but clearly putting a check
in the functions as Corbin did is a bad idea performance-wise.
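
For reference, this is roughly the shape of the compiler-specific code
in question (a sketch with made-up names, not the exact contents of
u_init.h; non-GCC compilers need their own variant):

#ifdef __GNUC__
#define UTIL_INIT(f) \
   static void __attribute__((constructor)) f##_ctor(void) { f(); }
#else
/* MSVC needs the equivalent trick of placing a function pointer into
 * the CRT's initializer section; other compilers need yet another
 * variant, which is exactly the maintenance cost of option 1. */
#define UTIL_INIT(f)
#endif

static void init_tables(void)
{
   /* fill the half-float (or other) lookup tables here */
}
UTIL_INIT(init_tables)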

So, how should this be done?

Options are:
1. Revert Corbin Simpson's commit and if anyone complains about an
unsupported compiler, implement UTIL_INIT for that compiler too
2a. Write the init module in C++ and use portable global constructor
syntax (but with other C++-related problems)
2b. Write an auxiliary file in C++ with a global constructor and
reference it from the C init file so the static linker pulls it from
the .a
3. Have all modules that need half float conversion directly or
indirectly call the init functions in their init code
4. Make the build pregenerate the tables and ship them in the executables

Option 1:
Pros: just works, other auxiliary modules can use the same system
Cons: need to write UTIL_INIT for each compiler, only GCC and MSVC
(and compatible ones) are currently supported

Option 2a:
Pros: no compiler-specific UTIL_INIT
Cons: significant code written in C++ instead of C and you must link
all targets with a C++ compiler or use compiler-specific options to
prevent stuff like the G++ personality causing the link to fail

Option 2b:
Like option 2a, but
Pro: less code written in C++
Con: need an extra C++ file for every module with data to be initialized

Option 3:
Pros: none of the cons of the other options
Cons: cumbersome to do, must not forget to call the init function or
you get silent corruption. Init calls creep through the whole
codebase.

Option 4:
Pros: no need to do anything at runtime, pages can be shared between OpenGL apps
Cons: need to write a special table generator, all DRI drivers get
10KB larger, solution does not apply to all similar problems

I would lean for either option 1 or option 4, perhaps option 4
considering there seems to be disagreement over option 1.
Option 4 however is likely not universally applicable (not everything
can necessarily be pregenerated), so I'd keep the UTIL_INIT code
anyway.

Which one do we pick?



Re: [Mesa3d-dev] How do we init half float tables?

2010-04-01 Thread Luca Barbieri
OK.

I'd like to add that u_atomic.h already requires either GCC, MSVC or
Solaris, and GCC and MSVC are already supported by this system.
Thus we do indeed now have a simple way to do global constructors,
that only removes support for non-GCC Solaris until someone figures
out how to do that.

And it's relatively simple, you just have to figure out the section
name of the global constructor table, and how to instruct the specific
compiler to put a variable in a specific section.

GCC even has __attribute__((constructor)) which does it all for you.

At any rate, util_format_s3tc_init has similar issues, and is
currently called from a few places.
I think the best thing to do to implement your suggestion is adding an
util_format_init that calls both init functions and leave the
UTIL_INIT code in place (since it seems we now got it right): it is
easy to remove by deleting u_init.h if desired.



Re: [Mesa3d-dev] How do we init half float tables?

2010-04-01 Thread Luca Barbieri
Are you sure about this?

I've tried doing it, and it turns out that basically every Gallium
module needs this initialized.

For instance:
st/mesa due to glReadPixels
vg/mesa due to vgReadPixels
st/python due to mesa sampling
several programs in rbug to dump textures to disk
softpipe due to texture sampling
nv50 due to static attributes

Also, if translate did not needlessly duplicate the generic format
support, it would also need it, and draw would too.

Basically everything in Gallium will end up having util_format
initialized, and it seems there are at least 10 different places in
the code where such a call would need to be added (including strange
places like rbug, which calls pipe_*_tile*, which in turn calls util_format_read*).

I added it for nv50 before realizing the extent of the changes needed,
but now think it is not really a feasible solution.

In other words, I think this should be revisited as it results in
cluttering the codebase and creating a somewhat unreliable system.

I believe that we should either use the global constructor-like
technique I introduced, or do the following:
1. Pregenerate half float tables
2. Initialize the S3TC function pointers to stubs that dlopen the
library, initialize the function pointers to the real functions and
then delegate to the real function corresponding to the stub

More specifically, I think this two-step approach is superior to the
global constructor, but that the global constructor technique may be
useful in other cases where it is not possible to either pregenerate
or have a free initialization check due to the S3TC dynamic loading.
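
A sketch of the stub idea (hand-written illustration, not the actual
util_format_s3tc code; the libtxc_dxtn symbol name below is from memory
and should be treated as an assumption):

#include <dlfcn.h>

typedef void (*dxt1_fetch_fn)(int src_stride, const unsigned char *src,
                              int i, int j, unsigned char *texel);

static void dxt1_fetch_stub(int src_stride, const unsigned char *src,
                            int i, int j, unsigned char *texel);
static void dxt1_fetch_noop(int src_stride, const unsigned char *src,
                            int i, int j, unsigned char *texel);

/* Starts out pointing at the stub; the first call replaces it. */
static dxt1_fetch_fn dxt1_fetch = dxt1_fetch_stub;

static void
dxt1_fetch_noop(int src_stride, const unsigned char *src,
                int i, int j, unsigned char *texel)
{
   /* keep the current "unsupported formats do nothing" behaviour */
   (void)src_stride; (void)src; (void)i; (void)j; (void)texel;
}

static void
dxt1_fetch_stub(int src_stride, const unsigned char *src,
                int i, int j, unsigned char *texel)
{
   void *lib = dlopen("libtxc_dxtn.so", RTLD_LAZY | RTLD_LOCAL);
   dxt1_fetch_fn real = NULL;

   if (lib)
      real = (dxt1_fetch_fn)dlsym(lib, "fetch_2d_texel_rgb_dxt1");

   dxt1_fetch = real ? real : dxt1_fetch_noop;
   dxt1_fetch(src_stride, src, i, j, texel); /* delegate */
}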



Re: [Mesa3d-dev] gallium raw interfaces

2010-03-31 Thread Luca Barbieri
 WINE can deal with that. The real showstopper is that WINE has to also
 work on MacOS X and Linux + NVIDIA blob, where Gallium is unavailable.

We could actually consider making a Gallium driver that uses OpenGL to
do rendering.

If the app uses DirectX 10, this may not significantly degrade
performance, and should instead appreciably increase it if a Gallium
driver is available.

On the other hand, for DirectX 9 apps, this could decrease performance
significantly (because DirectX 9 has immediate mode and doesn't
require CSOs).



[Mesa3d-dev] Gallium half float conversion/support

2010-03-31 Thread Luca Barbieri
[sent to ML too]

Michal,
I noticed you made some commits related to half float support in Gallium.

I had already done this work and implemented a fast conversion
algorithm capable of handling all cases based on a paper cited in the
commit, but hadn't gotten around to proposing it yet.

I created a gallium-fast-half-float branch and put my work there, so
it may be useful to you.
Feel free to merge, rebase and/or adapt it against Mesa master.

The conversion function itself has been tested separately from
Gallium, but I haven't tested softpipe on fp16 data.
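
For reference, this is what has to be computed per value; a naive
scalar version (written here only for illustration -- the branch
instead precomputes tables following the paper, so the conversion
becomes a couple of lookups) looks roughly like this:

#include <stdint.h>
#include <string.h>

static float half_to_float_naive(uint16_t h)
{
   uint32_t sign = (uint32_t)(h >> 15) << 31;
   uint32_t exp  = (h >> 10) & 0x1f;
   uint32_t man  = h & 0x3ff;
   uint32_t f;
   float result;

   if (exp == 0) {
      if (man == 0) {
         f = sign;                             /* +/- zero */
      } else {
         exp = 127 - 15 + 1;                   /* renormalize a denormal */
         while (!(man & 0x400)) {
            man <<= 1;
            --exp;
         }
         man &= 0x3ff;
         f = sign | (exp << 23) | (man << 13);
      }
   } else if (exp == 31) {
      f = sign | 0x7f800000u | (man << 13);    /* inf / NaN */
   } else {
      f = sign | ((exp - 15 + 127) << 23) | (man << 13);
   }

   memcpy(&result, &f, sizeof result);
   return result;
}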

Ideally we should find a way to have Mesa use this improved converter
instead of the one it currently uses, but I'm not sure how to arrange
this with the current buildsystem.



Re: [Mesa3d-dev] Current tinderbox regression (swrastg_dri, ppc64)

2010-03-31 Thread Luca Barbieri
Should be fixed now.

BTW, if it is still not compiling due to the __sync* issues, try
adding CFLAGS=-march=v9 to the build: it should fix that.



Re: [Mesa3d-dev] GSOC: Gallium R300 driver

2010-03-30 Thread Luca Barbieri
 Another idea was to convert TGSI to a SSA form. That would make unrolling
 branches much easier as the Phi function would basically become a linear
 interpolation, loops and subroutines with conditional return statements
 might be trickier. The r300 compiler already uses SSA for its optimization
 passes so maybe you wouldn't need to mess with TGSI that much...


 Is the conditional translation something that only needs to be done
 in the Gallium drivers, or would it be useful to apply the translation
 before the Mesa IR is converted into TGSI?  Are any of the other drivers
 (Gallium or Mesa) currently doing this kind of translation?

 Not that I know of. You may do it wherever you want theoretically, even in
 the r300 compiler and leaving TGSI untouched, but I think most people would
 appreciate if these translation were done in TGSI.

It would be nice to have a driver-independent TGSI optimization module.
It could either operate directly on TGSI (probably only good for
simple optimization), or convert to LLVM IR, optimize, and convert
back.

This would allow using this for all drivers: note that at least
inlining and loop unrolling should generally be performed even for
hardware with full control flow support.
Lots of other optimizations would then be possible (using LLVM, with a
single line of code to request the appropriate LLVM pass), and would
automatically be available for all drivers, instead of being only
available for r300 by putting them in the radeon compiler.



Re: [Mesa3d-dev] GSOC: Gallium R300 driver

2010-03-30 Thread Luca Barbieri
 There are several deep challenges in making TGSI - LLVM IR translation
 lossless -- I'm sure we'll get around to overcome them -- but I don't
 think that using LLVM is a requirement for this module. Having a shared
 IR for simple TGSI optimization module would go a long way by itself.

What are these challenges?
If you keep vectors and don't scalarize, I don't see why it shouldn't
just work, especially if you just roundtrip without running any
passes.
The DAG instruction matcher should be able to match writemasks,
swizzles, etc. fine.

Control flow may not be exactly reconstructed, but I think LLVM has
control flow canonicalization that should allow to reconstruct a
loop/if control flow structure of equivalent efficiency.

Using LLVM has the obvious advantage that all optimizations have
already been written and tested.
And for complex shaders, you may really need a good full optimizer
(that can do inter-basic-block and interprocedural optimizations,
alias analysis, advanced loop optmizations, and so on), especially if
we start supporting OpenCL over TGSI.

There is also the option of having the driver directly consume the
LLVM IR, and the frontend directly produce it (e.g. clang supports
OpenCL -> LLVM).

Some things, like inlining, are easy to do directly in TGSI (but only
because all regs are global).
However, even determining the minimum number of loop iterations for
loop unrolling is very hard to do without a full compiler.

For instance, consider code like this:
if(foo >= 6)
{
  if(foo == 1)
    iters = foo + 3;
  else if(bar == 1)
    iters = foo + 5 + bar;
  else
    iters = foo + 7;

  for(i = 0; i < iters; ++i) LOOP_BODY;

}

You need a non-trivial optimizer (with control flow support, value
range propagation, and constant folding) to find out that the loop
always executes at least 12 iterations, which you need to know to
unroll it optimally.
More complex examples are possible.

In general, anything that requires (approximately) determining any
property of the program potentially benefits from having the most
complex and powerful optimizer available.



Re: [Mesa3d-dev] GSOC: Gallium R300 driver

2010-03-30 Thread Luca Barbieri
DDX/DDY could cause miscompilation, but I think that only happens if
LLVM clones them or causes some paths to not execute them.

Someone proposed some time ago on llvmdev to add a flag to tell llvm
to never duplicate an intrinsic, not sure if that went through (iirc,
it was for a barrier instruction that relied on the instruction
pointer).
Alternatively, it should be possible to just disable any passes that
clone basic blocks if those instructions are present.

The non-execution problem should be fixable by declaring DDX/DDY to
have global-write-like side effects (this will prevent dead code
elimination of them if they are totally unused, but hopefully shaders
are not written so badly they need that).



Re: [Mesa3d-dev] GSOC: Gallium R300 driver

2010-03-30 Thread Luca Barbieri
 On Tue, 2010-03-30 at 09:52 -0700, Luca Barbieri wrote:
  There are several deep challenges in making TGSI - LLVM IR translation
  lossless -- I'm sure we'll get around to overcome them -- but I don't
  think that using LLVM is a requirement for this module. Having a shared
  IR for simple TGSI optimization module would go a long way by itself.

 What are these challenges?

 - Control flow as you mentioned -- gets broken into jump spaghetti.

LoopSimplify seems to do at least some of the work for loops.

Not sure if there is an if-construction pass, but it should be relatively easy.

Once you have an acyclic CFG subgraph (which hopefully LoopSimplify
easily gives you), every basic block with more than one outedge will
need to have an if/else block generated.

Now find the first block in topological sort order such that any path
from the if start block reaches that block before any later ones in
topological sort order.
I think this is called the forward dominator, and LLVM should have
analysis that gives you that easily.

After that, just duplicate the CFG between the if block start and the
forward dominator to build each branch of the if, and recursively
process the branches.

If you have a DDX/DDY present in multiple if parts, you are screwed,
but that won't happen without optimization and hopefully you can tune
fragment program optimization so that doesn't happen at all.

 - Predicates can't be represented -- you need to use AND / NAND masking.
 I know people have asked support for this in the LLVM list so it might
 change someday.

For the LLVM->TGSI part, x86 has condition codes.
Not sure how LLVM represents them, but I suppose predicates can be
handled in the same way

Multiple predicate registers may not work well, but GPUs probably
don't have them in hardware anyway (e.g. nv30/nv40 only have one or
two).

For the TGSI->LLVM part, Mesa never outputs predicates afaik.

 - missing intrinsics -- TGSI has a much richer instruction set than
 LLVM's builtin instructions, so it would be necessary to add several
 llvm.tgsi.xxx instrinsics (e.g., for max, min, madd, exp2, log2, etc),
 and teach LLVM to do constant propagation for every single one of them.

Yes, of course.
Initially you could do without constant propagation.
Also again x86/SSE has many of the same intrinsics, so their approach
can be imitated.

I think MAD can be handled by mul + add, if you don't care about
whether an extra rounding is done or not (and I think, for GPU
shaders, it's not really a high priority issue).
Anyway SSE5 has fused multiply/add, so LLVM has/will have a way.

 - Constants -- you often want to make specialized version of shaders for
 certain constants, especially when you have control flow statements
 whose arguments are constants (e.g., when doing TNL with a big glsl
 shader), and therefore should be factored out. You also may want to do
 factor out constant operations (e.g., MUL TEMP[1], CONST[0], CONST[1])
 But LLVM can't help you with that given that for LLVM IR constants are
 ordinary memory, like the inputs. LLVM doesn't know that a shader will
 be invoked million of times with the same constants but varying inputs.

If you want to do that, you must of course run LLVM for each constant
set, telling it what the constant values are.
You can probably identify branch-relevant constants from the LLVM SSA
form to restrict that set.

For the MUL TEMP[1], CONST[0], CONST[1], I suppose you could enclose
the shader code in a big loop to simulate the rasterizer.

LLVM will then move the CONST[0] * CONST[1] outside the loop, and you
can codegen the part outside the loop using an LLVM CPU backend.

In this case, using LLVM will give you automatic pre-shader
generation for the CPU mostly for free.
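
A hand-written sketch of that idea (plain C, names made up):

/* The per-pixel shader wrapped in a loop over the primitive's pixels:
 * the constant-only product is loop-invariant, so a stock optimizer
 * hoists it out, effectively generating the "pre-shader" for free. */
void run_shader(float *out, const float *in, int n_pixels,
                float const0, float const1)
{
   int i;
   for (i = 0; i < n_pixels; ++i) {
      float k = const0 * const1; /* the MUL TEMP[1], CONST[0], CONST[1] */
      out[i] = in[i] * k;
   }
}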

Alternatively, you could have a basic IF-simplifier on TGSI that only
supports the conditional being the comparison of a constant to
something else (using the rasterizer loop trick can allow you to get
simpler conditionals).

 If people can make this TGSI optimization module work quickly on top of
 LLVM then it's fine by me. I'm just pointing out that between the
 extreme of sharing nothing between each pipe driver compiler, and
 sharing everything with LLVM, there's a middle ground which is sharing
 between pipe drivers but not LLVM.  Once that module exists having it
 use LLVM internally would then be pretty easy.  It looks to me a better
 way to parallize the effort than to be blocked for quite some time on
 making TGSI - LLVM IR be lossless.

Yes, sure, a minimal module can be written first and then LLVM use can
be investigated later.

In other words, it's not necessarily trivial, but definitely seems doable.
In particular getting it to work on  anything non-GLSL should be
relatively straightforward.


Re: [Mesa3d-dev] gallium raw interfaces

2010-03-30 Thread Luca Barbieri
An interesting option could be to provide a DirectX 10 implementation
using TGSI text as the shader interface, which should be much easier
than one would think at first.

DirectX 10 + TGSI text would provide a very thin binary compatible
layer over Gallium, unlike all existing state trackers.

It could even run Windows games if integrated with Wine and something
producing TGSI from either HLSL text or D3D10 bytecode (e.g. whatever
Wine uses to produce GLSL + the Mesa GLSL frontend + st_mesa_to_tgsi).

In fact, given the Gallium architecture, it may even make sense to
support a variant of DirectX 10 as the main Mesa/Gallium API on all
platforms, instead of OpenGL.



Re: [Mesa3d-dev] [Nouveau] [radeonhd] Re: Status of s3tc patent in respect to open-source drivers and workarounds

2010-03-29 Thread Luca Barbieri
Interestingly, the post-trial judge opinion at
http://wi.findacase.com/research/wfrmDocViewer.aspx/xq/fac.%5CFDCT%5CWWI%5C2008%5C20080801_734.WWI.htm/qx
contains the following text:


Plaintiff’s expert, Dr. Stevenson, testified that the ‘327 patent is
directed to “a special
purpose hardware component designed and optimized specifically for
high speed graphics
processing. 
The specification makes it plain that the invention does not relate to
software for graphics. As the inventors noted, such programs “are well
known in the art.
[...]
Claim 17 does not say in so many words that the method it discloses is
a rasterization
circuit operating on a floating point format, but that is what it describes.
Reading the disputed claims as disclosing hardware is not reading a
preferred embodiment in the claims; it is simply
reading the claims as the person of ordinary skill would read a patent
directed to special purpose hardware.


This seems to indicate that it would be safe to implement floating
point textures/framebuffers in Mesa, as both SGI and ATI and the court
seemed to agree that the patent applies specifically to hardware.



Re: [Mesa3d-dev] Status of s3tc patent in respect to open-source drivers and workarounds

2010-03-28 Thread Luca Barbieri
If the application provides s3tc-encoded data through
glCompressedTexImage (usually loaded from a pre-compressed texture
stored on disk), Mesa will pass it unaltered to the graphics card (as
long as the driver/card supports DXT* format ids) and will not need to
use any encoding or decoding algorithms.

The problem is that if the application supplies uncompressed data,
Mesa would need to run an encoding algorithm to be able to use
compressed textures.
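
For illustration (a hedged sketch of the two upload paths, not from the original mail; GL_COMPRESSED_RGB_S3TC_DXT1_EXT comes from EXT_texture_compression_s3tc):

#include <GL/gl.h>
#include <GL/glext.h>

/* Pre-compressed data: Mesa can pass the DXT1 blocks to the driver unaltered,
 * no encoder needed. */
void upload_precompressed(GLsizei w, GLsizei h, GLsizei size, const void *dxt1)
{
   glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT,
                          w, h, 0, size, dxt1);
}

/* Uncompressed data with a compressed internal format: Mesa would need an
 * encoder (e.g. libtxc_dxtn) to honor this request. */
void upload_uncompressed(GLsizei w, GLsizei h, const void *rgba)
{
   glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT,
                w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, rgba);
}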

Conversely, if software rendering is necessary, and the application
provides compressed textures, Mesa will need to run a decoding
algorithm to be able to sample from the texture.

So the workaround (and what commercial games usually do) is to ship
pre-compressed textures along with the game, as well as uncompressed
textures in case the card/renderer does not support texture compression.
An end-user side solution is to download, compile and install
libtxc_dxtn.so, which Mesa will use if present to decode and encode
compressed textures.

Furthermore, a GPU can be used to do decoding using its native
sampling support, and some may also support encoding.



Re: [Mesa3d-dev] Current tinderbox regression (swrastg_dri, sparc64)

2010-03-28 Thread Luca Barbieri
On Sun, Mar 28, 2010 at 7:36 PM, Chris Ball c...@laptop.org wrote:
 Hi,

    http://tinderbox.x.org/builds/2010-03-25-0018/logs/libGL/#build
   
    swrastg_dri.so.tmp: undefined reference to `__sync_sub_and_fetch_4'
    swrastg_dri.so.tmp: undefined reference to `__sync_add_and_fetch_4'

 This regression is still present -- could we get a fix or a revert?

I believe the problem is that sparc does not support atomic operations
in the basic architecture: I think someone who knows about sparc and
has such a machine should look into it.

If you don't know anything about sparc, try rebuilding with the
highest possible sparc -march= level and if that fixes the problem,
perform a binary search to find the minimum one, and then report the
results.
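
A tiny test program (my own sketch, not something that exists in the tree) can tell whether a given architecture level provides the builtins:

/* sync_test.c -- if this compiles and links at a given architecture level
 * (the -march=/-mcpu= flag being tested; the exact spelling on sparc is an
 * assumption), gcc provides the __sync_*_4 builtins that swrastg needs. */
int main(void)
{
   static int v;
   __sync_add_and_fetch(&v, 1);
   return __sync_sub_and_fetch(&v, 1);
}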

If it does not solve the problem, see if anything in /lib or /usr/lib
exports those symbols.

Also maybe check whether the built swrastg_dri or xlib softpipe
actually works there.



Re: [Mesa3d-dev] gallium raw interfaces

2010-03-28 Thread Luca Barbieri
I posted something similar some time ago, that however could use
hardware accelerated drivers with DRI2 or KMS, provided a substantial
set of helpers and offered a complement of 3 demos.

My solution to window-system issues was to simply have the application
provide a draw callback to the framework, which would automatically
create a maximized window with the application name in the title, and
call draw in a loop, presenting the results.
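
A hedged sketch of what that application-facing interface looked like (the names here are illustrative, not the actual code):

struct pipe_context;

struct raw_app {
   const char *name;                        /* used as the window title */
   void (*draw)(struct pipe_context *ctx);  /* called once per frame */
};

/* Provided by the framework: picks DRI2 or KMS, creates a maximized window,
 * then loops calling app->draw() and presenting the result. */
int raw_main(const struct raw_app *app);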

Then I had a path that would use the X DRI2 interface if possible, and
another path that would use the Linux DRM KMS API (and initially some
EGL+ad-hoc extension paths that were later dropped).

It no longer works due to Gallium interface changes, but maybe it can
be resurrected and merged with graw.

However, there is a disadvantage to having Gallium programs in-tree:
they break every time the Gallium interface is changed, and avoiding
that means that, in addition to fixing all drivers and state trackers,
you also need to fix all programs for each change.



Re: [Mesa3d-dev] Status of s3tc patent in respect to open-source drivers and workarounds

2010-03-28 Thread Luca Barbieri
I just noticed a potentially interesting thing.

nVidia publishes under the MIT license a software suite called nVidia
texture tools.
This includes a library called nvtt that provides DXT* compression,
plus a library called nvimage that provides decompression.
It looks like the libraries can be used unmodified and nVidia is
almost surely a licensee of that patent.

So, if using and shipping a possibly-patent-covered library published
by a patent licensee does not risk violating the patent, we may have a
workable solution.



Re: [Mesa3d-dev] Rationale of gallium-resources changes?

2010-03-27 Thread Luca Barbieri
 To me this speaks to another aspect of the gallium interface which is
 a bit odd -- in particular the way several of our interfaces basically
 copy their inputs into a structure and pass that back to the state
 tracker.  Why are we doing that?  The state tracker already knows what
 it asked us to do, and there is no reason to assume that it needs us
 to re-present that information back to it.

Yes, only the CSOs don't have this form of copying: all other
structures include the input parameters there.

As a random example, pipe_sampler_view has lots of parameters that a
driver will have converted into the hardware format and that are thus
redundant, and unlikely to be read by the state tracker.

Textures and buffers also have many visible data members that the
state tracker may or may not read.
In particular, the Mesa state tracker already keeps everything in the
Mesa internal structures, and so benefits little from such data members.

We may want to consider going toward making _all_ Gallium structures
opaque (and, by the way, using declared-only structs instead of void*
like we do for CSOs, which are not checkable by the compiler).


Another serious data duplication issue is drivers that just copy the
input state into internal structures and return, processing
everything later in draw calls.

This usually results in state being duplicated (and copied) 3 times:
in Mesa internal structures, in the state tracker structures and then
in the driver.
The draw module may also keep a 4th copy of the state.
Note that when reference counting is involved, copies are even more
expensive since they now need atomic operations.

Usually drivers do this because:
1. They need to pass data to the draw module in case of fallbacks, and
thus cannot send it to hardware and forget about it
2. They need to recreate the whole hardware context state in some cases
3. They multiplex multiple pipe_contexts on a single screen
4. They need a global view of state, rather than a single state change
at a time, to decide what to do

A possible solution is to remove all set_* and bind_* calls and
replace them with data members of pipe_context that the state tracker
would use instead of its own internal structures.

In addition, a new "what's new" bitmask would be added, and the
driver would check it on draw calls.

Performance-wise, this replaces num_state_changes dynamic function
calls to the driver, with (log2(total_states) + num_state_changes)
branches to check the what's new bitmask.
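
A minimal sketch of the idea (names invented for illustration; the real pipe_context and the state structs live in p_context.h/p_state.h):

#include "pipe/p_state.h"

/* State lives directly in the context; the state tracker writes it in place
 * and sets a bit instead of calling a set_*()/bind_*() hook. */
struct proposed_context {
   struct pipe_blend_state blend;
   struct pipe_rasterizer_state rasterizer;
   struct pipe_viewport_state viewport;
   unsigned dirty;                    /* the "what's new" bitmask */
};

#define NEW_BLEND      (1 << 0)
#define NEW_RASTERIZER (1 << 1)
#define NEW_VIEWPORT   (1 << 2)

static void example_draw(struct proposed_context *ctx)
{
   if (ctx->dirty & NEW_BLEND)
      ; /* emit blend state to the hardware */
   if (ctx->dirty & NEW_RASTERIZER)
      ; /* emit rasterizer state */
   if (ctx->dirty & NEW_VIEWPORT)
      ; /* emit viewport state */
   ctx->dirty = 0;
}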

Furthermore:
1. State is never copied, since the state tracker constructs it in place
2. There is no longer any need for the state save helpers in the blitter
module and similar
3. The draw module can potentially read state directly from
pipe_context instead of duplicating it yet again
4. Drivers no longer need all the functions that store the
parameters, set a dirty flag and return

Note that the Direct3D DDI does not do this, but they have to keep
binary compatibility, which is easier with Set* calls than this
scheme.

softpipe, nvfx, nv50, r300 and probably others already do this
internally, and having the state tracker itself construct the data
would remove a lot of redundant copying code and increase performance.

Having drivers capable of doing send-to-hardware-and-forget-about-it
on arbitrary state setting could be a nice thing instead, but
unfortunately a lot of hardware fundamentally can't do this, since for
instance:
1. Shaders need to be all seen to be linked, possibly modifying the
shaders themselves (nv30)
2. Constants need to be written directly into the fragment program (nv30-nv40)
3. Fragment programs depend on the viewport to implement
fragment.position (r300)
4. Fragment programs depend on bound textures to specify normalization
type and emulate NPOT (r300, r600?, nv30)
5. Sometimes sampler state and textures must be seen together since
the hardware mixes them
and so on...


 The only really new information provided by the driver to the state
 tracker by transfer_create + transfer_map is:
 - the pointer to the data
 - stride
 - slice stride

There is also the 3D box, unless transfers start covering the whole
resource, which seems really suboptimal for stuff like glTexSubImage.

This needs to be provided to the driver unless a buffer-specialized
interface is made (then a 1D box is enough).


 Thanks for the summary.  I'd add that there is also some information
 available publicly about the D3D10 DDI, which follows a slightly
 different interface to the API.  In that world, there is a single
 create resource function:

It is indeed extremely interesting, and it looks like it should be the
first place to look for inspiration for the Gallium interface.

I added a comparison of the D3D11 DDI and Gallium to src/gallium/docs.

 There is however clearly concern about the possible need for
 specialized transfer mechanisms for particular buffer types.  It seems
 like they've taken an approach that leaves the choice to the driver
 whether to specialize or not -- 

Re: [Mesa3d-dev] Current tinderbox regression (swrastg_dri, sparc64)

2010-03-25 Thread Luca Barbieri
Are you sure that swrastg and/or any Gallium driver actually load
correctly and work on sparc64?

This seems to indicate that they use __sync_add_and_fetch_4 assuming
it is a GCC builtin, but GCC does not implement it as a builtin on
sparc64 and neither libgcc nor libc have an implementation of the
function.

I don't know anything about sparc64, but according to the linux
kernel, I vaguely guess that specifying a high enough -march= to gcc
could solve it by enabling use of atomic instructions that are
otherwise not used.

The root cause is likely that we set PIPE_ATOMIC_GCC_INTRINSIC even
though not all __sync builtins are actually supported: we should
probably fix that.
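
One possible fix (only a sketch, and it assumes the GCC-predefined __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 macro is available on the GCC versions we care about) would be to select the intrinsic backend only when the 4-byte builtins are really provided:

/* In u_atomic.h (sketch): only pick the GCC intrinsic backend when gcc can
 * actually provide the 4-byte __sync builtins for the chosen target. */
#if defined(__GNUC__) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
#define PIPE_ATOMIC_GCC_INTRINSIC
#else
/* select one of the other PIPE_ATOMIC_* backends here; which one is the
 * right portable fallback is left open, this is only a sketch */
#endif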



Re: [Mesa3d-dev] Rationale of gallium-resources changes?

2010-03-24 Thread Luca Barbieri
Thanks for providing a long insightful reply.

 Transfers can then be split in texture transfers and buffer transfers.
 Note that they are often inherently different, since one often uses
 memcpy-like GPU functionality, and the other often uses 2D blitter or
 3D engine functionality (and needs to worry about swizzling or tiling)
 Thus, they are probably better split and not unified.

 My experience is that there is more in common than different about the
 paths.  There are the same set of constraints about not wanting to
 stall the GPU by mapping the underlying storage directly if it is
 still in flight, and allocating a dma buffer for the upload if it is.
 There will always be some differences, but probably no more than the
 differences between uploading to eg a constant buffer and a vertex
 buffer, or uploading to a swizzled and linear texture.

The considerations you mentioned are indeed common between buffers and
textures, but the actual mechanisms for performing the copy are often
significantly different.

For instance, r300g ends up calling the 3D engine via
surface_copy->util_blitter for texture transfers, which I suppose it
wouldn't do for buffer transfers.

nv30/nv40 don't have a single way to deal with swizzled textures, and
the driver must choose between many paths depending on whether the
source/destination is swizzled or not, a 3D texture or not, and even
its alignment or pitch (the current driver doesn't fully do that, and
is partially broken for this reason).
Buffers can instead be copied very simply with MEMORY_TO_MEMORY_FORMAT.

nv50 does indeed have a common copy functionality that can handle all
buffers and textures in a unified way (implemented as a revamped
MEMORY_TO_MEMORY_FORMAT).
However, an additional buffer-only path would surely be faster than
going through the common texture path.
In particular, for buffers tile_flags are always 0 and height is
always 1, making it possible to write a significantly simplified buffer-only
version of nv50_transfer_rect_m2mf with no branches and no
multiplications at all.

In other words, I think most drivers would be better off implementing
unified transfers with an if switching between a buffer and a
texture path, so it may be worth using two interfaces.

Also note that a buffer-only interface is significantly simplified
since you don't need to specify:
- face
- level
- zslice
- y
- height
- z
- depth
- stride
- slice stride
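
Something along the lines of the following sketch is all that would remain (purely illustrative; the field names are not a proposal for an exact struct layout):

struct pipe_resource;

/* What a buffer-only transfer could be reduced to: a 1D box plus usage. */
struct buffer_transfer_sketch {
   struct pipe_resource *buffer;
   unsigned offset;   /* 1D box: start of the mapped range */
   unsigned length;   /* 1D box: size of the mapped range */
   unsigned usage;    /* PIPE_TRANSFER_READ / PIPE_TRANSFER_WRITE */
};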

While this may seem a micro-optimization, note that 3D applications
often spend much of their time in the OpenGL driver, and Mesa/Gallium
functions are already too heavy in profiles, so I think it's important
to always keep CPU performance in mind.

The code is also streamlined and easier to follow if it does not have
to default-initialize a lot of stuff.

A utility function calling the right interface can be created for
state trackers that really need it (maybe Direct3D10, if the driver
interface follows the user API).

 In DX they have
 different nomenclature for this - the graphics API level entities are
 resources and the underlying VMM buffers are labelled as allocations.
 In gallium, we're exposing the resource concept, but allocations are
 driver-internal entities, usually called winsys_buffers, or some
 similar name.

D3D10 uses buffers, sampler views and render target views as entities
bindable to the pipeline, and the latter are constructed over either
textures or buffers.
Note however, that the description structure is actually different
in the buffer and texture cases.

For render target views, they are respectively D3D10_BUFFER_RTV and
D3D10_TEX2D_RTV (and others for other texture types).
The first specifies an offset and stride, while the second specifies a
mipmap level.
Other views have similar behavior.
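
For reference, a hedged sketch of how those two variants are filled in (based on my reading of the D3D10 headers; treat the exact field names as an assumption):

#include <d3d10.h>

static void fill_rtv_descs(void)
{
   D3D10_RENDER_TARGET_VIEW_DESC buf_desc, tex_desc;

   /* Buffer RTV: described by an element offset and width. */
   buf_desc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
   buf_desc.ViewDimension = D3D10_RTV_DIMENSION_BUFFER;
   buf_desc.Buffer.ElementOffset = 0;
   buf_desc.Buffer.ElementWidth = 256;

   /* 2D texture RTV: described by a mipmap level. */
   tex_desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
   tex_desc.ViewDimension = D3D10_RTV_DIMENSION_TEXTURE2D;
   tex_desc.Texture2D.MipSlice = 0;
}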

Buffers are directly used in the interfaces that allow binding
vertex/index/constant buffers.

Both buffers and textures are subclasses of ID3D10Resource, which is
used by CopyResource, CopySubresourceRegion and UpdateSubresource,
which provide a subset of the Gallium transfer functionality in
gallium-resources.

Note however that the two resources specified to CopyResource and
CopySubresourceRegion must be of the same type.

So in summary, D3D10 does indeed in some sense go in the
buffer/texture unification, but with some important differences:
1. Buffers and textures still exist as separate types. Note that
there is no texture type, but rather a separate interface for each
texture type, which directly inherits from ID3D10Resource
2. Textures are never used directly by the pipeline, but rather
through views which have texture-type-specific creation methods and
have separate interfaces
3. Buffers are directly used by the pipeline for vertex, index and
constant buffers
4. Resources are used in copying and transfer functionality
5. D3D10 has a more memory-centric view of resources, providing for
instance a D3D10_USAGE_STAGING flag, for "a resource that supports
data transfer (copy) from the GPU to the CPU".

D3D11 seems to 

Re: [Mesa3d-dev] Segfault on glClear of non-existent stencil buffer caused by bd1ce874728c06d08a1f9881f51edbdd2f1c9db0

2010-03-23 Thread Luca Barbieri
We have a visual with haveStencilBuffer == 1 but stencilBits == 0 (and no
stencil renderbuffer), which I suppose shouldn't be happening.
visualid and fbconfigid are also 0.

Here is the full structure:
$1 = {next = 0x0, rgbMode = 1 '\001', floatMode = 0 '\000',
colorIndexMode = 0 '\000', doubleBufferMode = 1, stereoMode = 0,
haveAccumBuffer = 0 '\000', haveDepthBuffer = 1 '\001',
  haveStencilBuffer = 1 '\001', redBits = 8, greenBits = 8, blueBits =
8, alphaBits = 8, redMask = 0, greenMask = 0, blueMask = 0, alphaMask
= 0, rgbBits = 32, indexBits = 0, accumRedBits = 0,
  accumGreenBits = 0, accumBlueBits = 0, accumAlphaBits = 0, depthBits
= 24, stencilBits = 0, numAuxBuffers = 0, level = 0, pixmapMode = 0,
visualID = 0, visualType = 0, visualRating = 0, transparentPixel = 0,
  transparentRed = 0, transparentGreen = 0, transparentBlue = 0,
transparentAlpha = 0, transparentIndex = 0, sampleBuffers = 0, samples
= 0, drawableType = 0, renderType = 0, xRenderable = 0, fbconfigID =
0,
  maxPbufferWidth = 0, maxPbufferHeight = 0, maxPbufferPixels = 0,
optimalPbufferWidth = 0, optimalPbufferHeight = 0, visualSelectGroup =
0, swapMethod = 0, screen = 0, bindToTextureRgb = 0,
  bindToTextureRgba = 0, bindToMipmapTexture = 0, bindToTextureTargets
= 0, yInverted = 0}

BTW, what's the purpose of having haveStencilBuffer at all? Isn't
checking stencilBits != 0 enough?



Re: [Mesa3d-dev] Segfault on glClear of non-existent stencil buffer caused by bd1ce874728c06d08a1f9881f51edbdd2f1c9db0

2010-03-23 Thread Luca Barbieri
The problem seems to be in st_manager.c:
   if (visual->depth_stencil_format != PIPE_FORMAT_NONE) {
      mode->haveDepthBuffer = GL_TRUE;
      mode->haveStencilBuffer = GL_TRUE;

      mode->depthBits =
         util_format_get_component_bits(visual->depth_stencil_format,
                                        UTIL_FORMAT_COLORSPACE_ZS, 0);
      mode->stencilBits =
         util_format_get_component_bits(visual->depth_stencil_format,
                                        UTIL_FORMAT_COLORSPACE_ZS, 1);
   }

This sets haveStencilBuffer even for depth-only buffers.

How about fixing this to set haveDepthBuffer and haveStencilBuffer
only if bits > 0, and later removing haveStencilBuffer,
haveDepthBuffer and haveAccumBuffer in favor of just testing the *bits
counterparts?

BTW, what if we have a floating-point depth buffer, or, say, a shared
exponent floating-point color buffer? How do we represent that with
the visual structure?
Shouldn't we be using the actual formats instead of this *bits stuff,
maybe by having Mesa look at its internal structures instead of a
GLXVisual-like struct?



Re: [Mesa3d-dev] Current tinderbox regression (dri)

2010-03-23 Thread Luca Barbieri
The issue should be hopefully completely fixed by
7e246e6aa63979d53731a591f4caee3651c1d96b.



Re: [Mesa3d-dev] Current tinderbox regression (dri)

2010-03-23 Thread Luca Barbieri
On Tue, Mar 23, 2010 at 10:45 PM, Sedat Dilek
sedat.di...@googlemail.com wrote:
The issue should be hopefully completely fixed by
7e246e6aa63979d53731a591f4caee3651c1d96b.

 Unfortunately, build breaks here.
 Not sure which of the last changes really breaks it.

Hopefully fixed that too now.



Re: [Mesa3d-dev] Current tinderbox regression (dri: fix dri_test.c for non-TLS build)

2010-03-23 Thread Luca Barbieri
According to the logs, that build was not based on that commit, which
instead actually fixes that issue.

http://tinderbox.x.org/builds/2010-03-23-0040/ was actually the first
tinderbox build using that, and it went past that issue to fail on
an xeglthreads problem, which is unrelated.

Thanks anyway for reporting this.



Re: [Mesa3d-dev] DRI SDK and modularized drivers.

2010-03-19 Thread Luca Barbieri
 It may seem e.g. like the DRM interface is the worst because of rather large 
 threads caused by certain kernel developer's problems, but that doesn't mean 
 problems wouldn't be created by splitting other areas.

This would probably be best solved by merging libdrm into the Linux kernel tree.
Ingo Molnar's rationale for having tools/perf in the kernel tree
applies even more in this case.



Re: [Mesa3d-dev] [PATCH] dri: test whether the built drivers have unresolved symbols

2010-03-19 Thread Luca Barbieri
How about applying this?

It should prevent introducing regressions similar to ones that
happened in the past, with very little downside.



Re: [Mesa3d-dev] [PATCH] dri: test whether the built drivers have unresolved symbols

2010-03-19 Thread Luca Barbieri
 Can we just put this program in the demos? Or at least just make it a
 separate target (make test-link)? It seems excessive to make this part
 of the default build path.

The whole purpose is to run this as part of the standard build, so
that the build fails if any driver is unloadable (i.e. a modification
to it was botched), and the tree hopefully doesn't get pushed to
master.

You can test it separately by just running glxinfo/glxgears, obviously.



Re: [Mesa3d-dev] [PATCH] dri: test whether the built drivers have unresolved symbols

2010-03-19 Thread Luca Barbieri
 For developers that makes a lot of sense, but I've never seen any
 other projects impose this type of thing on regular users.

Why do you see it as an onerous imposition?
It just tries to compile a program linked with a couple of libraries
(the DRI driver, plus libGL) and makes the build fail if that fails.
It doesn't even execute the built program (and could not always do so
even if it were desired, since you could be cross-compiling).



[Mesa3d-dev] [PATCH] nv40: remove leftover nv40_transfer.c from unification into nvfx

2010-03-15 Thread Luca Barbieri
---
 src/gallium/drivers/nv40/nv40_transfer.c |  181 --
 1 files changed, 0 insertions(+), 181 deletions(-)
 delete mode 100644 src/gallium/drivers/nv40/nv40_transfer.c

diff --git a/src/gallium/drivers/nv40/nv40_transfer.c 
b/src/gallium/drivers/nv40/nv40_transfer.c
deleted file mode 100644
index 3d8c8e8..000
--- a/src/gallium/drivers/nv40/nv40_transfer.c
+++ /dev/null
@@ -1,181 +0,0 @@
-#include pipe/p_state.h
-#include pipe/p_defines.h
-#include util/u_inlines.h
-#include util/u_format.h
-#include util/u_memory.h
-#include util/u_math.h
-#include nouveau/nouveau_winsys.h
-#include nv40_context.h
-#include nvfx_screen.h
-#include nvfx_state.h
-
-struct nv40_transfer {
-   struct pipe_transfer base;
-   struct pipe_surface *surface;
-   boolean direct;
-};
-
-static void
-nv40_compatible_transfer_tex(struct pipe_texture *pt, unsigned width, unsigned 
height,
- struct pipe_texture *template)
-{
-   memset(template, 0, sizeof(struct pipe_texture));
-   template-target = pt-target;
-   template-format = pt-format;
-   template-width0 = width;
-   template-height0 = height;
-   template-depth0 = 1;
-   template-last_level = 0;
-   template-nr_samples = pt-nr_samples;
-
-   template-tex_usage = PIPE_TEXTURE_USAGE_DYNAMIC |
- NOUVEAU_TEXTURE_USAGE_LINEAR;
-}
-
-static struct pipe_transfer *
-nv40_transfer_new(struct pipe_context *pcontext, struct pipe_texture *pt,
- unsigned face, unsigned level, unsigned zslice,
- enum pipe_transfer_usage usage,
- unsigned x, unsigned y, unsigned w, unsigned h)
-{
-struct pipe_screen *pscreen = pcontext-screen;
-   struct nvfx_miptree *mt = (struct nvfx_miptree *)pt;
-   struct nv40_transfer *tx;
-   struct pipe_texture tx_tex_template, *tx_tex;
-
-   tx = CALLOC_STRUCT(nv40_transfer);
-   if (!tx)
-   return NULL;
-
-   pipe_texture_reference(tx-base.texture, pt);
-   tx-base.x = x;
-   tx-base.y = y;
-   tx-base.width = w;
-   tx-base.height = h;
-   tx-base.stride = mt-level[level].pitch;
-   tx-base.usage = usage;
-   tx-base.face = face;
-   tx-base.level = level;
-   tx-base.zslice = zslice;
-
-   /* Direct access to texture */
-   if ((pt-tex_usage  PIPE_TEXTURE_USAGE_DYNAMIC ||
-debug_get_bool_option(NOUVEAU_NO_TRANSFER, TRUE/*XXX:FALSE*/)) 
-   pt-tex_usage  NOUVEAU_TEXTURE_USAGE_LINEAR)
-   {
-   tx-direct = true;
-   tx-surface = pscreen-get_tex_surface(pscreen, pt,
-  face, level, zslice,
-  
pipe_transfer_buffer_flags(tx-base));
-   return tx-base;
-   }
-
-   tx-direct = false;
-
-   nv40_compatible_transfer_tex(pt, w, h, tx_tex_template);
-
-   tx_tex = pscreen-texture_create(pscreen, tx_tex_template);
-   if (!tx_tex)
-   {
-   FREE(tx);
-   return NULL;
-   }
-
-   tx-base.stride = ((struct nvfx_miptree*)tx_tex)-level[0].pitch;
-
-   tx-surface = pscreen-get_tex_surface(pscreen, tx_tex,
-  0, 0, 0,
-  
pipe_transfer_buffer_flags(tx-base));
-
-   pipe_texture_reference(tx_tex, NULL);
-
-   if (!tx-surface)
-   {
-   pipe_surface_reference(tx-surface, NULL);
-   FREE(tx);
-   return NULL;
-   }
-
-   if (usage  PIPE_TRANSFER_READ) {
-   struct nvfx_screen *nvscreen = nvfx_screen(pscreen);
-   struct pipe_surface *src;
-
-   src = pscreen-get_tex_surface(pscreen, pt,
-  face, level, zslice,
-  PIPE_BUFFER_USAGE_GPU_READ);
-
-   /* TODO: Check if SIFM can deal with x,y,w,h when swizzling */
-   /* TODO: Check if SIFM can un-swizzle */
-   nvscreen-eng2d-copy(nvscreen-eng2d,
- tx-surface, 0, 0,
- src, x, y,
- w, h);
-
-   pipe_surface_reference(src, NULL);
-   }
-
-   return tx-base;
-}
-
-static void
-nv40_transfer_del(struct pipe_context *pcontext, struct pipe_transfer *ptx)
-{
-   struct nv40_transfer *tx = (struct nv40_transfer *)ptx;
-
-   if (!tx-direct  (ptx-usage  PIPE_TRANSFER_WRITE)) {
-   struct pipe_screen *pscreen = pcontext-screen;
-   struct nvfx_screen *nvscreen = nvfx_screen(pscreen);
-   struct pipe_surface *dst;
-
-   dst = pscreen-get_tex_surface(pscreen, ptx-texture,
-  ptx-face, ptx-level, 
ptx-zslice,
-

Re: [Mesa3d-dev] undefined symbols and silent fallback to swrast

2010-03-15 Thread Luca Barbieri
Adding both -Wl,--no-undefined and -lGL (in
src/gallium/winsys/drm/Makefile.template) seems to work for me.

The driver loader is already loading libGL.so.1 with RTLD_NOW |
RTLD_GLOBAL, so I think that the DRI driver depending on libGL.so.1
shouldn't introduce any issue.
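
In other words (a simplified sketch of the loader behaviour being assumed here, not the actual loader code):

#include <dlfcn.h>

/* The GLX loader has already done something equivalent to the first dlopen
 * before any DRI driver is opened, so the driver's new DT_NEEDED entry for
 * libGL.so.1 resolves against the copy that is already in the process. */
void *open_dri_driver(const char *path)
{
   dlopen("libGL.so.1", RTLD_NOW | RTLD_GLOBAL);
   return dlopen(path, RTLD_NOW);
}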



[Mesa3d-dev] [PATCH] dri: link DRI drivers with -Wl, --no-undefined -lGL

2010-03-15 Thread Luca Barbieri
Right now undefined symbols in DRI drivers will still allow the
build to succeed.

As a result, people modifying drivers they cannot test risk creating
unloadable drivers with no easy way of automatically avoiding it.

For instance, the modifications to nv50 for context transfers caused
such an issue recently.

The fix is to build DRI drivers with -Wl,--no-undefined -lGL which
will cause make to fail in such cases.

Note that this introduces a dependency from the DRI drivers on libGL.so.1.
However, the driver loader calls dlopen on libGL.so.1 with
RTLD_GLOBAL | RTLD_NOW before loading any DRI driver, so the added
dependency shouldn't cause changes in runtime behavior.

Please double-check the correctness of this assumption before pushing.

All classic DRI drivers as well as all the Gallium drivers with configure
options compiled successfully with this change.

Thanks to Xavier Chantry chantry.xav...@gmail.com for helping with this.
---
 src/gallium/winsys/drm/Makefile.template |4 ++--
 src/mesa/drivers/dri/Makefile.template   |2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/winsys/drm/Makefile.template 
b/src/gallium/winsys/drm/Makefile.template
index f4cc0de..326cd59 100644
--- a/src/gallium/winsys/drm/Makefile.template
+++ b/src/gallium/winsys/drm/Makefile.template
@@ -66,9 +66,9 @@ default: depend symlinks $(TOP)/$(LIB_DIR)/gallium/$(LIBNAME)
 $(LIBNAME): $(OBJECTS) $(MESA_MODULES) $(PIPE_DRIVERS) Makefile \
$(TOP)/src/mesa/drivers/dri/Makefile.template
$(MKLIB) -o $@ -noprefix -linker '$(CC)' -ldflags '$(LDFLAGS)' \
-   $(OBJECTS) $(PIPE_DRIVERS) \
+   -Wl,--no-undefined $(OBJECTS) $(PIPE_DRIVERS) \
 -Wl,--start-group $(MESA_MODULES) -Wl,--end-group \
- $(DRI_LIB_DEPS) $(DRIVER_EXTRAS)
+ $(DRI_LIB_DEPS) $(DRIVER_EXTRAS) -L$(TOP)/lib -lGL
 
 $(TOP)/$(LIB_DIR)/gallium:
mkdir -p $@
diff --git a/src/mesa/drivers/dri/Makefile.template 
b/src/mesa/drivers/dri/Makefile.template
index a0c25d2..dcffa70 100644
--- a/src/mesa/drivers/dri/Makefile.template
+++ b/src/mesa/drivers/dri/Makefile.template
@@ -53,7 +53,7 @@ lib: symlinks subdirs depend
 $(LIBNAME): $(OBJECTS) $(MESA_MODULES) $(EXTRA_MODULES) Makefile \
$(TOP)/src/mesa/drivers/dri/Makefile.template
$(MKLIB) -o $@ -noprefix -linker '$(CC)' -ldflags '$(LDFLAGS)' \
-   $(OBJECTS) $(MESA_MODULES) $(EXTRA_MODULES) $(DRI_LIB_DEPS)
+   -Wl,--no-undefined $(OBJECTS) $(MESA_MODULES) $(EXTRA_MODULES) 
$(DRI_LIB_DEPS) -L$(TOP)/lib -lGL
 
 
 $(TOP)/$(LIB_DIR)/$(LIBNAME): $(LIBNAME)
-- 
1.6.3.3




Re: [Mesa3d-dev] [PATCH] nv30/nv40 Gallium drivers unification

2010-03-14 Thread Luca Barbieri
Perhaps try running make clean and make if you haven't already?
And perhaps make sure that the installed libGL.so and DRI drivers are
built from the same codebase.

The changes in my branch definitely shouldn't affect this.

 I wanted to merge Luca's branch into my copy of Mesa master to test it
 out, but it wouldn't let me for some reason, any advice on that?
What reason?
git merge should work.



Re: [Mesa3d-dev] Mesa (master): st/mesa: Always recalculate invalid index bounds.

2010-03-13 Thread Luca Barbieri
 But for any such technique, the mesa state tracker will need to figure
 out what memory is being referred to by those non-VBO vertex buffers
 and to do that requires knowing the index min/max values.

Isn't the min/max value only required to compute a sensible value for
the maximum user buffer length? (the base pointer is passed to
gl*Pointer)

The fact is that we don't need to know how large the user buffer is
if the CPU is accessing it (or if we have a very advanced driver that
faults memory in the GPU VM on demand, and/or a mechanism to let the
GPU share the process address space).
As you said, this happens for instance  with swtnl, but also with
drivers that scan the index buffer and copy the referenced vertex for
each index onto the GPU FIFO themselves (e.g. nv50 and experimental
versions of nv30/nv40).

So couldn't we pass ~0 or similar as the user buffer length, and have
the driver use an auxiliary module on draw calls to determine the real
length, if necessary?
Of course, drivers that upload user buffers on creation (if any
exists) would need to be changed to only do that on draw calls.



[Mesa3d-dev] [PATCH] nv30/nv40 Gallium drivers unification

2010-03-13 Thread Luca Barbieri
Currently the nv30 and nv40 Gallium drivers are very similar, and
contain about 5000 lines of essentially duplicate code.

I prepared a patchset (which can be found at
http://repo.or.cz/w/mesa/mesa-lb.git/shortlog/refs/heads/unification+fixes)
which gradually unifies the drivers, one file per commit.

A new nvfx directory is created, and unified files are put there one by one.
After all patches are applied, the nv30 and nv40 directories are
removed and the only the new nvfx directory remains.

The first patches unify the engine naming (s/curie/eng3d/g;
s/rankine/eng3d), and switch nv40 to use the NV34TCL_ constants.
Initial versions of this work changed renouveau.xml to create a new
NVFXTCL object, but the current version doesn't need any
renouveau.xml modification at all.

The unification+fixes branch referenced above is the one that should
be tested.
The unification branch contains just the unification, with no
behavior changes, while unification+fixes also fixes swtnl and quad
rendering, allowing to better test the unification. Some cleanups on
top of the unfication are also included.

That same repository also contains other branches with significant
improvements on top of the unification, but I'm still not proposing
them for inclusion as they need more testing and some fixes.

While there are some branches in the Mesa repository that would
conflict with this, such branches seem to be popping up continuously
(and this is good!), so waiting until they are merged probably won't
really work.

The conflicts are minimal anyway and the driver fixes can be very
easily reconstructed over the unified codebase.

How about merging this?
Any objections? Any comments?



Re: [Mesa3d-dev] Mesa (gallium-sampler-view): st/mesa: Associate a sampler view with an st texture object.

2010-03-12 Thread Luca Barbieri
What if you have a non-integer min LOD?
While the integer part may belong to the sampler view, the fractional
part really seems to be a sampler property.
Requiring min_lod < 1.0 also doesn't seem to make much sense, so
shouldn't it be kept as it is now?
Same thing for last_level / max_lod.



Re: [Mesa3d-dev] Mesa (master): st/mesa: Always recalculate invalid index bounds.

2010-03-12 Thread Luca Barbieri
Isn't it possible to compute the maximum legal index by just taking
the minimum of:
(vb->buffer->size - vb->buffer_offset - ve->src_offset) / vb->stride

over all vertex buffers/elements?
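
A sketch of that computation (using the pipe_vertex_buffer/pipe_vertex_element field names from the formula above; treat it as illustrative, e.g. a zero stride would need special-casing):

#include "pipe/p_state.h"

/* Largest index that still fetches entirely inside the buffer, per element;
 * the overall limit would be the minimum over all vertex elements. */
static unsigned
max_index_for_element(const struct pipe_vertex_buffer *vb,
                      const struct pipe_vertex_element *ve,
                      unsigned element_size)
{
   unsigned avail = vb->buffer->size - vb->buffer_offset - ve->src_offset;
   if (avail < element_size)
      return 0;
   return (avail - element_size) / vb->stride;
}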

Isn't the kernel checker doing something like this?



Re: [Mesa3d-dev] Mesa (master): st/mesa: Always recalculate invalid index bounds.

2010-03-12 Thread Luca Barbieri
Actually, why is the state tracker doing the min/max computation at all?

If the driver does the index lookup itself, as opposed to using a
hardware index buffer (e.g. the nouveau drivers do this in some
cases), this is unnecessary and slow.

Would completely removing the call to vbo_get_minmax_index break anything?

Also, how about removing the max_index field in pipe_vertex_buffer?
This seems to be set to the same value for all vertex buffers, and the
value is then passed to draw_range_elements too.
Isn't the value passed to draw_range_elements sufficient?



Re: [Mesa3d-dev] Revert ST_SURFACE_DEPTH = BUFFER_DEPTH in master too?

2010-03-11 Thread Luca Barbieri
I've looked into the issue, and found a workaround by looking at what
st_renderbuffer_alloc_storage (which is called to create the depth
buffer with ST_SURFACE_DEPTH != BUFFER_DEPTH) does.

Adding:
if (ctx) ctx->NewState |= _NEW_BUFFERS;

at the end of st_set_framebuffer_surface seems to solve the warsow
problem with no other regressions.

Brian, is this the right fix?
Marek, does it fix your r300g problems too?



Re: [Mesa3d-dev] Revert ST_SURFACE_DEPTH = BUFFER_DEPTH in master too?

2010-03-11 Thread Luca Barbieri
Solves the Warsow issue and seems to work.
Thanks!



Re: [Mesa3d-dev] Revert ST_SURFACE_DEPTH = BUFFER_DEPTH in master too?

2010-03-11 Thread Luca Barbieri
Shouldn't

_mesa_add_renderbuffer(&stfb->Base, BUFFER_FRONT_LEFT, rb);

be

_mesa_add_renderbuffer(&stfb->Base, surfIndex, rb);

instead, since you seem to make the on-demand creation mechanism
generic and no longer limited to the front buffer?



[Mesa3d-dev] Revert ST_SURFACE_DEPTH = BUFFER_DEPTH in master too?

2010-03-10 Thread Luca Barbieri
In mesa_7_7_branch, 52d83efdbc4735d721e6fc9b44f29bdd432d4d73 reverts
commit 9d17ad2891b58de9e33e943ff918a678c6a3c2bd.

How about cherry-picking that commit into master, until a fix for the
bugs the reverted commit introduces is found?

The reverted commit currently breaks the Warsow main menu for me,
making it show garbage.



Re: [Mesa3d-dev] gallium cached bufmgr busy change

2010-03-09 Thread Luca Barbieri
 We can do this optimisation with busy as well. As long as you add
 things to the busy list at the end, and stop testing after the first
 busy call. At least for a single linear GPU context, which is all
 I expect this code will ever be handling.

Wouldn't this just end up reinventing the fenced bufmgr?

Basically cached needs a list of all destroyed buffers (ideally in
destruction order, so it can do the stopping optimization when
expiring buffers), while the busy mechanism needs a list of all used
buffers (destroyed or not) in usage order.

So it seems it would need two lists, and essentially result in
something that replicates fenced inside cached.

BTW, right now I think all drivers use a single GPU context in
userspace. Even Nouveau multiplexes Gallium contexts on a single
channel (this is probably broken though).



Re: [Mesa3d-dev] gallium cached bufmgr busy change

2010-03-07 Thread Luca Barbieri
I think you are supposed to do this using the fenced bufmgr over
cached along with a (ideally userspace) fencing mechanism.
If you can implement pb_busy, you should be able to implement
fence_signalled in exactly the same way (making the fence handle a
pointer to buffers should work for this purpose, if standalone fences
are hard to do).
The fenced bufmgr will only pass destruction requests to the wrapped
bufmgr once the fences are signalled.



Re: [Mesa3d-dev] Mesa (master): glsl/pp: Add asserts to check for null pointer deferences.

2010-03-04 Thread Luca Barbieri
 For static analysis with Coverity Prevent, the added assert will clear a 
 defect report and/or allow it to continue parsing to the next possible defect.

Are these being checked manually and determined to be false positives?
If not, then it would be beneficial to not shut up static analysis,
since it actually could be a problem.
If yes, perhaps it would be useful to have a comment explaining why it
is a false positive, unless the reasoning is often trivial, which
means that the static analyzer isn't doing a very good job.

Also, is the whole concept of having a static analyzer assume that
asserts are true a good idea?
Shouldn't it instead specifically attempt to check whether the
assertions in the code are always true? (and have some other means to
flag false positives, perhaps not involving source modification)

Finally, does the checker provide some easy and license-allowed way of
making the analysis results public? (e.g. by putting up the same web
interface they used for their open source checking demos)

BTW, I just looked at one of the assert commits, and found it actually
_introduces_ a bug.
Look at the assert(attrib_list) added in
706fffbff59be0dc884e1938f1bdf731af1efa3e.

This ends up asserting that the attrib_list in glXCreatePixmap is not NULL.
But the GLX specification says that it can be NULL, and it will usually be.

The memcpy does not crash because when attrib_list is NULL, the length
parameter passed to it is 0, as the code before shows.

Thus, that commit should be reverted, and replaced with either no
change or by surrounding the memcpy with if(attrib_list) or if(i).
Ideally, we could also mark the if, as well as the if(attrib_list)
above with unlikely() while we are at it.

Are we sure all the other commits like this are correct and actually
flag false positives, as opposed to hiding real bugs?



Re: [Mesa3d-dev] Mesa (master): glsl/pp: Add asserts to check for null pointer deferences.

2010-03-04 Thread Luca Barbieri
Just noticed that has already been fixed in
5f40a7aed12500fd6792e2453f49c3b5c54d with an if(attrib_list).



Re: [Mesa3d-dev] RFC: gallium-format-cleanup branch (was Gallium format swizzles)

2010-03-03 Thread Luca Barbieri
 PIPE_FORMAT_X8B8G8R8_UNORM is being used by mesa. PIPE_FORMAT_R8G8B8X8_UNORM 
 doesn't exist hence it appears to be unnecessary. So it doesn't make sense to 
 rename.

How about D3DFMT_X8B8G8R8? That should map to PIPE_FORMAT_R8G8B8X8_UNORM.

BTW, we are also missing D3DFMT_X4R4G4B4, D3DFMT_X1R5G5B5,
D3DFMT_A4L4, D3DFMT_A1, D3DFMT_L6V5U5, D3DFMT_D15S1, D3DFMT_D24X4S4,
D3DFMT_CxV8U8 and perhaps others I did not notice.



Re: [Mesa3d-dev] Does DX9 SM3 - VMware svga with arbitrary semantics work? How?

2010-03-03 Thread Luca Barbieri
BTW, i915 is also limited to 0-7 generic indices, and thus doesn't
work with GLSL at all right now.

This should be relatively easy to fix since it should be enough to
store the generic indices in the texCoords arrays, and then pass
them to draw_find_shader_output.



[Mesa3d-dev] Does DX9 SM3 - VMware svga with arbitrary semantics work? How?

2010-03-02 Thread Luca Barbieri
I've been looking at shader semantics some more, and I'm a bit
surprised by how the svga driver works.
It seems that an obvious implementation of a DirectX 9 state tracker
just won't work with the svga driver.

In SM3, vertex/fragment semantics can be arbitrary (independent of
hardware resources), but indices are limited to a 0-15 range.

A DirectX 9 state tracker must convert those to TGSI_SEMANTIC_GENERIC.
How does the VMware one do that?
Assuming that it maps them directly, this means that the driver must
support GENERIC semantic indices up to a number that varies between
about 200 and 255.

The problem is that the vmware svga driver, as far as I can see,
doesn't support indices greater than 15.
This is caused by the fact that it maps all GENERIC semantics to
SVGA3D_DECLUSAGE_TEXCOORD, and the index bitfield in the svga virtual
interface only supports 4 bits.

In other words, SM3 under VMware with arbitrary semantics (allowed by
SM3 and other drivers) really seems broken, for a straightforward
DirectX9 state tracker implementation.

The only way it can work now is if the DirectX 9 state tracker looks
at both the vertex and pixel shaders, links them, and outputs
sequential semantic indices.

It seems to me that the svga driver should be fixed to map GENERIC to
*all* SM3 semantic types, ideally in a way that reverses the SM3 ->
GENERIC transformation done by the DX9 state tracker.

Doing this requires to specify a maximum index for
TGSI_SEMANTIC_GENERIC which is very carefully chosen to allow 1:1
mapping with SM3, so that DirectX 9 state trackers have enough indices
to represent all SM3, and the svga driver can fit all indices in the
SM3-like semantics of the VMware virtual GPU interface.

The correct value in this case seems to be 219 = 14 * 16 SM3 semantics
- 5 for COLOR0, COLOR1, PSIZE0, POSITION0, FOG0 which have specific
TGSI semantics which they need to mapped to/from.

I'm looking at this because this seems the strictest constraint on
choosing a limit for TGSI_SEMANTIC_GENERIC indices.
The other constraint is due to r600 supporting only byte-sized
semantic/index combinations, which is less strict than SM3.

BTW, glsl also looks artificially limited on svga, as only 6 varyings
will be supported, due to it starting from 10.



Re: [Mesa3d-dev] Does DX9 SM3 - VMware svga with arbitrary semantics work? How?

2010-03-02 Thread Luca Barbieri
 I don't think anybody has tried hooking it up - so far the primary
 purpose of the svga gallium driver has been GL support, but thinking
 about it you're probably right.

I'm a bit confused about this: I was under the impression that VMware
Tools for Windows used your DirectX state tracker and a WGL version of
Mesa, talking to the svga Gallium driver.
How does it actually work?
What do you normally use the DirectX 9 state tracker with?

 The details of the closed code aren't terribly important as they could
 always be changed.
Sure, but it currently is the only Gallium user that supports the SM3
model and thus the only one that really needs arbitrary semantic
indices, and puts constraints on them.

 The correct value in this case seems to be 219 = 14 * 16 SM3 semantics
 - 5 for COLOR0, COLOR1, PSIZE0, POSITION0, FOG0 which have specific
 TGSI semantics which they need to mapped to/from.

 Agree, though I'd opt for 255 as a round number.

The problem with this is that you only have 14 SM3 semantics with 16
indices each, so you can't map 256 generic indices into the VMware
interface, or directly into an SM3 shader.
You only have 14 * 16 minus the ones used for non-GENERIC semantics
(the ones mentioned above).
And of course, if you choose a smaller number, you can't map SM3
_into_ Gallium, so you need to choose the exact number required for
SM3.

Tying Gallium in this way to SM3 is surely a bit ugly, but it's just a
constant, and I don't see any other way to implement SM3 without doing
linkage in software in the r600 and svga drivers and/or in SM3 state
trackers.



Re: [Mesa3d-dev] Does DX9 SM3 - VMware svga with arbitrary semantics work? How?

2010-03-02 Thread Luca Barbieri
The difference between an easier and harder life for (some) drivers is
whether the limit is tied to hardware interpolators or not.
Once we decide to not tie it, whether the limit is 128 or 256 is of
course quite inconsequential.
Allowing arbitrary 32-bit values would however require use of binary
search or a hash table.
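
For example (purely hypothetical, not code from any driver), the
lookup such a driver would need could be a small open-addressed table
from the 32-bit index to a hardware slot:

#include <stdbool.h>
#include <stdint.h>

#define REMAP_SIZE 64   /* power of two, comfortably above the hw slot count */
#define HW_SLOTS   32

struct semantic_remap {
   uint32_t key[REMAP_SIZE];
   uint8_t  slot[REMAP_SIZE];
   bool     used[REMAP_SIZE];
   unsigned next_slot;
};

static bool
remap_lookup_or_assign(struct semantic_remap *r, uint32_t index, unsigned *slot)
{
   unsigned h = (index * 2654435761u) & (REMAP_SIZE - 1);

   while (r->used[h]) {
      if (r->key[h] == index) {
         *slot = r->slot[h];        /* already assigned a hw slot */
         return true;
      }
      h = (h + 1) & (REMAP_SIZE - 1);
   }
   if (r->next_slot >= HW_SLOTS)
      return false;                 /* out of hardware interpolators */
   r->used[h] = true;
   r->key[h]  = index;
   r->slot[h] = r->next_slot++;
   *slot = r->slot[h];
   return true;
}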

I think you or someone else from the Mesa team should decide how to
proceed, and most drivers would need to be fixed.

As I understand, the constraints are the following:

Hardware with no capabilities.
- nv30 does not support any mapping. However, we already need to patch
fragment programs to insert constants, so we can patch input register
numbers as well. The current driver only supports 0-7 generic indices,
but I already implemented support for 0-255 indices with in-driver
linkage and patching. Note that nv30 lacks control flow in fragment
programs.
- nv40 is like nv30, but supports fp control flow, and may have some
configurable mapping support, with unknown behavior

Hardware with capabilities that must be configured for each fp/vp pair.
- nv40 might have this but the nVidia OpenGL driver does not use them
- nv50 has configurable vp->gp and gp->fp mappings with 64 entries.
The current driver seems to support arbitrary 0-2^32 indices.
- r300 appears to have a configurable vp->fp mapping. The current
driver only supports 0-15 generic indices, but redefining
ATTR_GENERIC_COUNT could be enough to have it support larger numbers.

Hardware with automatic linkage when semantics match:
- VMware svga appears to support 14 * 16 semantics, but the current
driver only supports 0-15 generic indices. This could be fixed by
mapping GENERIC into all non-special SM3 semantics.

Hardware that can do both configurable mappings and automatic linkage:
- r600 supports linkage in hardware between matching semantic ids,
which are apparently byte-sized

Other hardware:
- i915 has no hardware vertex shading
- Not sure about i965

Software:
1. SM3 wants to use 14 * 16 indices overall. This is apparently only
supported by the VMware closed source state tracker.
2. SM2 and non-GLSL OpenGL just want to use as many indices as the
hardware interpolator count
3. Current GLSL wants to use at most about 10 indices more
than the hardware interpolator count. This can be fixed since we see
both the fragment and vertex shaders during linkage (the patch I sent
did that)
4. GLSL with EXT_separate_shader_objects does not add requirements
because only gl_TexCoord and other builtin varyings are supported.
User-defined varyings are not supported
5. A hypothetical version of EXT_separate_shader_objects extended to
support user-defined varyings would either want arbitrary 32-bit
generic indices (by interning strings to generate the indices) or the
ability to specify a custom mapping between shader indices
6. A hypothetical no-op implementation of the GLSL linker would have
the same requirement

Also note that non-GENERIC indices have peculiar properties.

For COLOR and BCOLOR:
1. SM3 and OpenGL with glClampColor appropriately set want it to
_not_ be clamped to [0, 1]
2. SM2 and normal OpenGL apparently want it to be clamped to [0, 1]
(sometimes for fixed point targets only) and may also allow using
U8_UNORM precision for it instead of FP32
3. OpenGL allows to enable two-sided lighting, in which case COLOR in
the fragment shader is automagically set to BCOLOR for back faces
4. Older hardware (e.g. nv30) tends to support BCOLOR but not FACING.
Some hardware (e.g. nv40) supports both FACING and BCOLOR in hardware.
The latest hardware probably supports FACING only.

Any API that requires special semantics for COLOR and BCOLOR (i.e.
non-SM3) seems to only want 0-1 indices.
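
As a side note, on hardware that has FACING but not BCOLOR the
two-sided case can in principle be lowered to an explicit per-fragment
select; a rough CPU-side illustration (assuming the TGSI convention
that FACING is positive for front-facing fragments) would be:

/* illustration only: a real driver would emit this as shader
 * instructions (a compare + select on the FACING input), not run it
 * on the CPU */
static void
select_two_sided_color(float facing, const float front[4],
                       const float back[4], float out[4])
{
   const float *src = (facing > 0.0f) ? front : back;
   int i;
   for (i = 0; i < 4; i++)
      out[i] = src[i];
}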

Note that SM3 does *not* include BCOLOR, so basically the limits for
generic indices would need to be conditional on BCOLOR being present
or not (e.g. if it is present, we must reserve two semantic slots in
svga for it).

POSITION0 is obviously special.
PSIZE0 is also special for points.

FOG0 seems right now to just be a GENERIC with a single component.
Gallium could be extended to support fixed function fog, which most
DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal
to the semantic issue.

TGSI_SEMANTIC_NORMAL is essentially unused and should probably be removed

The options are the ones you outlined, plus:
(e) Allow arbitrary 32-bit indices. This requires slightly more
complicated data structures in some cases, and will require svga and
r600 to fall back to software linkage if numbers are too high.
(f) Limit semantic indices to hardware interpolators _and_ introduce
an interface to let the user specify a custom mapping between shader
indices
Personally I think the simplest idea for now could be to have all
drivers support 256 indices or, in the case of r600 and svga, the
maximum value supported by the hardware, and expose that as a cap (as
well as another cap for the number of different semantic values
supported at once).
The minimum guaranteed value is set to the lowest 
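
To make the cap idea concrete, a hypothetical sketch (neither PIPE_CAP
value below exists today; the names are invented):

#include "pipe/p_screen.h"

/* hypothetical caps -- these enums do not exist, they only sketch
 * what a state tracker would query */
static void
query_semantic_limits(struct pipe_screen *screen,
                      int *max_generic_index, int *max_distinct_semantics)
{
   *max_generic_index =
      screen->get_param(screen, PIPE_CAP_MAX_GENERIC_SEMANTIC_INDEX);
   *max_distinct_semantics =
      screen->get_param(screen, PIPE_CAP_MAX_DISTINCT_SEMANTICS);
}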

Re: [Mesa3d-dev] Does DX9 SM3 -> VMware svga with arbitrary semantics work? How?

2010-03-02 Thread Luca Barbieri
On Tue, Mar 2, 2010 at 10:00 PM, Corbin Simpson
mostawesomed...@gmail.com wrote:
 FYI r300 only supports 24 interpolators: 16 linear and 8 perspective.
 (IIRC; not in front of the docs right now.) r600 supports 256 fully
 configurable interpolators.

Yes, but if you raised ATTR_GENERIC_COUNT, the current driver would
support higher semantic indices right? (of course, with a limit of
8/24 different semantic indices used at once).



Re: [Mesa3d-dev] RFC: gallium-format-cleanup branch (was Gallium format swizzles)

2010-03-02 Thread Luca Barbieri
Shouldn't
PIPE_FORMAT_X8B8G8R8_UNORM = 68,

instead be R8G8B8X8_UNORM, which is currently missing, for consistency with:
PIPE_FORMAT_R8G8B8X8_SNORM = 81,

with X8B8G8R8_UNORM perhaps put at the end next to PIPE_FORMAT_A8B8G8R8_UNORM?



Re: [Mesa3d-dev] Gallium software fallback/draw command failure

2010-03-01 Thread Luca Barbieri
Falling back to CPU rendering, while respecting the OpenGL spec, is
likely going to be unusably slow in most cases and thus not really
better for real usage than just not rendering.

I think the only way to have a usable fallback mechanism is to do
fallbacks with the GPU, by automatically introducing multiple
rendering passes.
For instance, if you were to run each fragment shader instruction in a
separate pass (using floating point targets), then you would never
have more than two texture operands.

If the render targets are too large, you can also just split them in
multiple portions, and you can limit texture size so that 2 textures
plus a render target portion always fit in memory. Alternatively, you
can split textures too, try to statically deduce the referenced
portion and KIL if you guessed wrong, combined with occlusion queries
to check if you KILled.

Control flow complicates things, but you can probably just put the
execution mask in a stencil buffer or secondary render target/texture,
and use occlusion queries to find out if it is empty.
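
A rough sketch of the emptiness test using the existing query
interface (the mask-rendering pass itself is omitted, and the exact
get_query_result signature may differ between trees):

#include "pipe/p_context.h"
#include "pipe/p_defines.h"

static boolean
mask_is_empty(struct pipe_context *pipe,
              void (*draw_mask_pass)(struct pipe_context *))
{
   struct pipe_query *q =
      pipe->create_query(pipe, PIPE_QUERY_OCCLUSION_COUNTER);
   uint64_t count = 0;

   pipe->begin_query(pipe, q);
   draw_mask_pass(pipe);          /* render the execution-mask pass */
   pipe->end_query(pipe, q);

   pipe->get_query_result(pipe, q, TRUE /* wait */, &count);
   pipe->destroy_query(pipe, q);

   return count == 0;             /* no fragments left: stop iterating */
}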

Of course, this requires writing and testing a very significant amount
of complex code (probably including a TGSI->LLVM->TGSI infrastructure,
since you likely need nontrivial compiler techniques to do this
optimally).

However, we may need part of this anyway to support multi-GPU
configurations, and it also makes it possible to emulate advanced
shader capabilities on less capable hardware (e.g. shaders with more
instructions or temporaries than the hardware allows, or
SM3+/GLSL shaders on SM2 hardware), with some hope of getting usable
performance.



[Mesa3d-dev] PK/UP* and NV_[vertex|fragment]_program* support in Gallium?

2010-03-01 Thread Luca Barbieri
I see that PK2US and friends are being removed.
These would be necessary to implement NV_fragment_program_option,
NV_fragment_program2 and NV_gpu_program4.

Currently no drivers (including Nouveau) support them, but since
we already have some support in Mesa (even parsers for the nVidia
syntax), it would be nice to support them in Gallium eventually.

Not sure about STR/SFL though: they can be encoded/decoded as MOV x,
0/1, but they complete the SETcond instruction set.

How about keeping them and adding a capability bit for them?
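
For reference, this is roughly the behaviour PK2US/UP2US would have to
provide (my reading of the NV_fragment_program spec, shown as plain C;
a driver would of course implement it in the shader):

#include <math.h>
#include <stdint.h>

static uint32_t
pk2us(float x, float y)   /* pack two floats as 16-bit unorms */
{
   uint32_t lo = (uint32_t)(fminf(fmaxf(x, 0.0f), 1.0f) * 65535.0f + 0.5f);
   uint32_t hi = (uint32_t)(fminf(fmaxf(y, 0.0f), 1.0f) * 65535.0f + 0.5f);
   return lo | (hi << 16);
}

static void
up2us(uint32_t v, float *x, float *y)   /* the matching unpack */
{
   *x = (float)(v & 0xffff) / 65535.0f;
   *y = (float)(v >> 16) / 65535.0f;
}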



[Mesa3d-dev] [PATCH] pipebuffer: avoid assert due to increasing a zeroed refcnt

2010-02-23 Thread Luca Barbieri
The cache manager stores buffers with a reference count that dropped to 0.
pipe_reference asserts in this case on debug builds,
so use pipe_reference_init instead.

---
 src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c b/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c
index 53bc019..86f9266 100644
--- a/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c
+++ b/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c
@@ -294,7 +294,7 @@ pb_cache_manager_create_buffer(struct pb_manager *_mgr,
   LIST_DEL(&buf->head);
   pipe_mutex_unlock(mgr->mutex);
   /* Increase refcount */
-  pipe_reference(NULL, &buf->base.base.reference);
+  pipe_reference_init(&buf->base.base.reference, 1);
   return &buf->base;
}

-- 
1.6.6.1.476.g01ddb
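
For context, a simplified illustration of why the old call asserted
(paraphrasing the p_refcnt.h helpers, whose details may differ):

#include <assert.h>

struct refcnt { int count; };   /* simplified stand-in for pipe_reference */

static void
old_way(struct refcnt *r)
{
   /* pipe_reference() debug-asserts the object is still referenced
    * before bumping the count, so a cached buffer at count == 0 trips it */
   assert(r->count != 0);
   r->count++;
}

static void
new_way(struct refcnt *r)
{
   /* pipe_reference_init() just sets the count, which is exactly what
    * resurrecting a cached buffer needs */
   r->count = 1;
}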




[Mesa3d-dev] [PATCH] pipebuffer: fix inverted signalled checking

2010-02-23 Thread Luca Barbieri
A return of 0 means the fence is signalled.
---
 .../auxiliary/pipebuffer/pb_buffer_fenced.c|2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/auxiliary/pipebuffer/pb_buffer_fenced.c b/src/gallium/auxiliary/pipebuffer/pb_buffer_fenced.c
index 95eb5f6..d97f749 100644
--- a/src/gallium/auxiliary/pipebuffer/pb_buffer_fenced.c
+++ b/src/gallium/auxiliary/pipebuffer/pb_buffer_fenced.c
@@ -696,7 +696,7 @@ fenced_buffer_map(struct pb_buffer *buf,
    * Don't wait for the GPU to finish accessing it, if blocking is forbidden.
    */
   if((flags & PIPE_BUFFER_USAGE_DONTBLOCK) &&
-     ops->fence_signalled(ops, fenced_buf->fence, 0) == 0) {
+     ops->fence_signalled(ops, fenced_buf->fence, 0) != 0) {
      goto done;
   }
 
-- 
1.6.6.1.476.g01ddb
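
For the record, the convention this patch relies on is that
fence_signalled() follows the 0 == success/signalled style. A tiny
wrapper like this (assuming the usual Gallium boolean/INLINE helpers
and the pb_fence_ops layout used in the diff above) would make call
sites read less ambiguously:

static INLINE boolean
fence_is_signalled(struct pb_fence_ops *ops, struct pipe_fence_handle *fence)
{
   /* 0 means the fence has signalled, i.e. the GPU is done with it */
   return ops->fence_signalled(ops, fence, 0) == 0;
}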




Re: [Mesa3d-dev] [PATCH] pipebuffer: check for unsynchronized usage before looking at flags

2010-02-23 Thread Luca Barbieri
 Good catch of the fence_signalled
 negated logic.

This was actually mentioned on IRC by Maarten Maathuis (who was
working on adding pipebuffer support to the nv50 driver).
Thanks to him :)



Re: [Mesa3d-dev] [PATCH] pipebuffer: check for unsynchronized usage before looking at flags

2010-02-23 Thread Luca Barbieri
 +   if (flags & PIPE_BUFFER_USAGE_UNSYNCHRONIZED) {
This should be:
if (!(flags & PIPE_BUFFER_USAGE_UNSYNCHRONIZED)) {

Sorry for this.


