[Mesa3d-dev] Remove static_dispatch=false from GL functions exported by ATI or nVidia?
Currently, there are several functions where static dispatch has been disabled, but that are exported by either ATI's libGL, nVidia's, or both. To prevent compatibility issues, it seems a good idea to export those too, at least. What do you think? Should we export all functions exported by both nVidia and ATI, those exported by either of them, or even just export all functions?

static_dispatch=false but exported by both ATI and nVidia:
glBlendEquationSeparateEXT
glBlitFramebufferEXT
glGetQueryObjecti64vEXT
glGetQueryObjectui64vEXT
glProgramEnvParameters4fvEXT
glProgramLocalParameters4fvEXT

static_dispatch=false but exported by ATI, not by nVidia:
glGetHistogramEXT
glGetHistogramParameterfvEXT
glGetHistogramParameterivEXT
glGetMinmaxEXT
glGetMinmaxParameterfvEXT
glGetMinmaxParameterivEXT
glGetTexParameterPointervAPPLE
glHistogramEXT
glMinmaxEXT
glResetHistogramEXT
glResetMinmaxEXT
glStencilFuncSeparateATI
glStencilOpSeparateATI
glTextureRangeAPPLE

static_dispatch=false but exported by nVidia, not by ATI:
glActiveStencilFaceEXT
glColorSubTableEXT
glDeleteFencesNV
glDepthBoundsEXT
glFinishFenceNV
glGenFencesNV
glGetFenceivNV
glIsFenceNV
glSetFenceNV
glTestFenceNV

--
Start uncovering the many advantages of virtual appliances and start using them to simplify application deployment and accelerate your shift to cloud computing. http://p.sf.net/sfu/novell-sfdev2dev
___
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
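[Editorial sketch, not Mesa code: the three proposed export policies amount to intersection, union, or "everything" over the vendor export sets. The toy lists below use a few of the functions named above; the predicates show how the intersection ("exported by both") and union ("exported by either") policies would classify a given entry point.]

```c
#include <assert.h>
#include <string.h>

/* Small samples from the lists above; the real lists are longer. */
static const char *ati_exports[] = {
   "glBlitFramebufferEXT", "glHistogramEXT", "glStencilFuncSeparateATI"
};
static const char *nvidia_exports[] = {
   "glBlitFramebufferEXT", "glDeleteFencesNV", "glDepthBoundsEXT"
};

static int in_list(const char *name, const char **list, unsigned n)
{
   unsigned i;
   for (i = 0; i < n; ++i)
      if (strcmp(list[i], name) == 0)
         return 1;
   return 0;
}

/* strictest policy: export only what both vendors export (intersection) */
int exported_by_both(const char *name)
{
   return in_list(name, ati_exports, sizeof(ati_exports) / sizeof(ati_exports[0]))
       && in_list(name, nvidia_exports, sizeof(nvidia_exports) / sizeof(nvidia_exports[0]));
}

/* looser policy: export what either vendor exports (union) */
int exported_by_either(const char *name)
{
   return in_list(name, ati_exports, sizeof(ati_exports) / sizeof(ati_exports[0]))
       || in_list(name, nvidia_exports, sizeof(nvidia_exports) / sizeof(nvidia_exports[0]));
}
```

The union policy maximizes link compatibility with applications written against either vendor's libGL, which is the failure mode described in the follow-up.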
Re: [Mesa3d-dev] Remove static_dispatch=false from GL functions exported by ATI or nVidia?
> No. The libGL ABI is well defined: http://www.opengl.org/registry/ABI/

Does the ABI forbid exporting additional functions from libGL.so? As far as I can tell, it doesn't: it only says what is required to be exported, and that applications shouldn't statically link to other functions. The way things are now, some applications that work with the nVidia or ATI implementations will fail to link or fail to load with Mesa, which seems undesirable. A bug on this has just been reported.
Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatible) formats
On Mon, Sep 6, 2010 at 3:57 PM, José Fonseca jfons...@vmware.com wrote:
> I'd like to know if there's any objection to changing the resource_copy_region semantics to allow copies between different yet compatible formats, where the definition of compatible formats is:

I was about to propose something like this. How about a much more powerful change, though, that would make any pair of non-block-compressed formats of the same bit depth compatible? This way you could copy z24s8 to r8g8b8a8, for instance.

In addition to this, how about explicitly allowing sampler views to use a compatible format, and adding the ability for surfaces to use a compatible format too (with a new parameter to get_tex_surface)? This would allow, for instance, implementing glBlitFramebuffer on stencil buffers by reinterpreting the buffer as r8g8b8a8, and would allow the blitter module to copy depth/stencil buffers by simply treating them as color buffers.

The only issue is that some drivers might hold depth/stencil surfaces in compressed formats that cannot be interpreted as a color format, and not have any mechanism for keeping temporaries or doing conversions internally.

DirectX seems to have something like this with the _TYPELESS formats.
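[Editorial sketch of the proposed compatibility rule. The struct and format descriptors here are invented for illustration, not the real pipe_format/util_format API: two formats are copy-compatible when neither is block-compressed and their per-pixel bit depths match.]

```c
#include <assert.h>

/* Toy format descriptor (hypothetical; stands in for util_format metadata). */
struct fmt_desc {
   unsigned bits_per_pixel;   /* bits per pixel for non-compressed formats */
   int      block_compressed; /* 1 for DXT/S3TC-style block formats */
};

static const struct fmt_desc fmt_z24s8    = { 32, 0 };
static const struct fmt_desc fmt_r8g8b8a8 = { 32, 0 };
static const struct fmt_desc fmt_r5g6b5   = { 16, 0 };
static const struct fmt_desc fmt_dxt1     = {  4, 1 };

/* The proposed rule: any pair of non-block-compressed formats of the same
 * bit depth may be reinterpreted into one another for copies. */
int fmt_copy_compatible(const struct fmt_desc *a, const struct fmt_desc *b)
{
   if (a->block_compressed || b->block_compressed)
      return 0;
   return a->bits_per_pixel == b->bits_per_pixel;
}
```

Under this rule z24s8 and r8g8b8a8 are compatible (both 32 bpp, neither blocked), which is exactly the stencil-blit reinterpretation case discussed above.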
Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatible) formats
How about dropping the idea that resource_copy_region must be just a memcpy, and having the driver instruct the hardware 2D blitter to write 1s in the alpha channel if supported by the hardware, or having u_blitter do this in the shader? nv30/nv40 and apparently nv50 can do this in the 2D blitter, and all Radeons seem to use the 3D engine, which obviously can do it in the shader.

We may also want to allow actual conversion between arbitrary formats, since again u_blitter can do it trivially, and so can most/all hardware 2D engines.
Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatible) formats
>> This way you could copy z24s8 to r8g8b8a8, for instance.
>
> I am not sure this makes a lot of sense. There's no guarantee the bit layout of these is even remotely similar (and it likely won't be on any decent hardware). I think the dx10 restriction makes sense here.

Yes, it depends on the flexibility of the hardware and the driver. Due to depth textures, I think it is actually likely that you can easily treat depth as color.

The worst issue right now is that stencil cannot be accessed in a sensible way at all, which makes implementing glBlitFramebuffer of STENCIL_BIT with NEAREST and different rect sizes impossible. Some cards (r600+ at least) can write stencil in shaders, but on some you must reinterpret the surface. And resource_copy_region does not support stretching, so it can't be used.

Since not all cards can write stencil in shaders, one either needs to be able to bind depth/stencil as a color buffer, or to extend resource_copy_region to support stretching with nearest filtering, or both (possibly in addition to having the option of using stencil export in shaders).

Other things would likely benefit too, such as GL_NV_copy_depth_to_color.
Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatible) formats
Yes: if x8 is interpreted as "writes can write arbitrary data, reads must return 1" (as you said), then this is not necessary in resource_copy_region even if A8 -> X8 becomes supported.

You are right that format conversions would probably be better added as a separate function (if at all), in addition to the reinterpret_cast mechanism you proposed to add.
Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatible) formats
> When I said it won't work with decent hardware, I really meant it won't work due to compression. Now, it's quite possible this can be disabled on any chip, but you don't know that beforehand, hence you need to jump through hoops to get an uncompressed version of your compressed buffer later.

Well, you can render to a compressed depth buffer and then bind it as a depth texture (routinely done for shadows), so there needs to be a way to get compressed data to the sampler, either directly or via the driver automagically converting it with a blit beforehand.

Of course, this may not actually work for stencil too, or might not let you interpret depth as 8-bit color components, or perhaps not allow direct use as a render target, but it seems possible, especially on modern flexible hardware and on older, dumber hardware that lacks/doesn't force compression. I haven't checked any hardware docs though, beyond the fact that nvfx currently doesn't support any compression and thus can just do it.
Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatible) formats
> The dst blending parameter is just a factor the real dst value is multiplied by (except for min/max). There is no way to multiply an arbitrary value by a constant and get 1.0. But you can force 0, of course. I don't think there is hardware which supports such flexible swizzling in the blender. If x8 is just padding as you say, the value of it should be undefined, and every operation using the padding bits should be undefined too, except for texture sampling. It's not like I have any other choice.

As far as I can tell, the only problem with blending on an X8 surface holding random garbage but reading as 1 is if any of the blending factors is DST_ALPHA or INV_DST_ALPHA (or one of the COLOR factors used as an alpha factor), in which case you can solve the issue by replacing the offending factor with ONE or ZERO, as long as you have support for separate RGB/A blend functions (which Gallium currently assumes, afaik). You can also disable the alpha channel in the writemask to avoid unnecessary work.

On nv30/nv40, there is an actual render target format that instructs the card to read dst alpha as 1 (you can also choose whether to write 0 or 1).

Of course, one could argue that mesa/st should do the transformation instead of Gallium drivers where hardware lacks such support. I suppose just not advertising X8 formats as render target formats could also work.
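[Editorial sketch of the factor rewrite described above, with hypothetical PIPE_BLENDFACTOR-style constants: when destination alpha always reads as 1.0 on an X8 surface, DST_ALPHA behaves as ONE and INV_DST_ALPHA as ZERO, so the offending factors can simply be substituted when the hardware lacks a "read dst alpha as 1" render target format.]

```c
#include <assert.h>

/* Hypothetical stand-ins for PIPE_BLENDFACTOR_* values. */
enum blendfactor {
   BF_ONE, BF_ZERO, BF_SRC_ALPHA, BF_DST_ALPHA, BF_INV_DST_ALPHA
};

/* Rewrite a blend factor for an X8 render target whose stored alpha is
 * garbage but whose read value is defined to be 1.0. */
enum blendfactor fixup_factor_for_x8(enum blendfactor f)
{
   switch (f) {
   case BF_DST_ALPHA:     return BF_ONE;  /* dst alpha always reads 1.0 */
   case BF_INV_DST_ALPHA: return BF_ZERO; /* 1.0 - 1.0 == 0.0 */
   default:               return f;       /* other factors are unaffected */
   }
}
```

As the message notes, this only works when RGB and alpha factors can be set separately, and the alpha writemask bit can additionally be cleared to skip the useless alpha write.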
Re: [Mesa3d-dev] ARB draw buffers + texenv program
On nv30/nv40, support for patching fragment programs is already necessary (constants must be patched in as immediates), and this can be handled by just patching the end of the fragment program to include a variable number of instructions copying a temp to COLOR[x]. It's possible that there is a hardware mechanism too; I haven't checked.

If other MRT-capable hardware already has this kind of fragment program patching, or supports this in hardware, then a new TGSI semantic or register file can be added for this, and drivers can easily implement it without recompilation. Drivers could also just unconditionally write all color outputs as a first implementation, or if that doesn't affect performance.
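[Editorial sketch of the patching idea: append one "MOV COLOR[i], TEMP" per bound render target at the end of the fragment program. The textual instruction encoding here is invented purely for illustration; a real driver would emit hardware opcodes and re-upload the program.]

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PATCH_INSTS 8

/* Illustrative instruction buffer appended to the end of the program. */
static char patched[MAX_PATCH_INSTS][32];

/* Append one color-output move per bound render target; returns the
 * number of instructions appended. */
unsigned append_color_moves(unsigned num_cbufs)
{
   unsigned i;
   for (i = 0; i < num_cbufs && i < MAX_PATCH_INSTS; ++i)
      snprintf(patched[i], sizeof(patched[i]), "MOV COLOR[%u], TEMP", i);
   return i;
}
```

The point is that the patch length varies with the number of bound color buffers, which is why this fits naturally into a driver that must patch programs at bind time anyway.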
[Mesa3d-dev] [PATCH 4/6] gallium/auxiliary: add semantic linkage utility code
---
 src/gallium/auxiliary/Makefile         |   1 +
 src/gallium/auxiliary/util/u_linkage.c | 119 ++++
 src/gallium/auxiliary/util/u_linkage.h |  38 ++
 3 files changed, 158 insertions(+), 0 deletions(-)
 create mode 100644 src/gallium/auxiliary/util/u_linkage.c
 create mode 100644 src/gallium/auxiliary/util/u_linkage.h

diff --git a/src/gallium/auxiliary/Makefile b/src/gallium/auxiliary/Makefile
index c4d6b52..44c2f8b 100644
--- a/src/gallium/auxiliary/Makefile
+++ b/src/gallium/auxiliary/Makefile
@@ -120,6 +120,7 @@ C_SOURCES = \
 	util/u_hash.c \
 	util/u_keymap.c \
 	util/u_linear.c \
+	util/u_linkage.c \
 	util/u_network.c \
 	util/u_math.c \
 	util/u_mm.c \
diff --git a/src/gallium/auxiliary/util/u_linkage.c b/src/gallium/auxiliary/util/u_linkage.c
new file mode 100644
index 000..8a76378
--- /dev/null
+++ b/src/gallium/auxiliary/util/u_linkage.c
@@ -0,0 +1,119 @@
+#include "util/u_debug.h"
+#include "pipe/p_shader_tokens.h"
+#include "tgsi/tgsi_parse.h"
+#include "tgsi/tgsi_scan.h"
+#include "util/u_linkage.h"
+
+/* we must only record the registers that are actually used, not just declared */
+static INLINE boolean
+util_semantic_set_test_and_set(struct util_semantic_set *set, unsigned value)
+{
+   unsigned long mask = 1UL << (value % (sizeof(long) * 8));
+   unsigned long *p = &set->masks[value / (sizeof(long) * 8)];
+   unsigned long v = *p & mask;
+   *p |= mask;
+   return !!v;
+}
+
+unsigned
+util_semantic_set_from_program_file(struct util_semantic_set *set, const struct tgsi_token *tokens, enum tgsi_file_type file)
+{
+   struct tgsi_shader_info info;
+   struct tgsi_parse_context parse;
+   unsigned count = 0;
+   ubyte *semantic_name;
+   ubyte *semantic_index;
+
+   tgsi_scan_shader(tokens, &info);
+
+   if(file == TGSI_FILE_INPUT)
+   {
+      semantic_name = info.input_semantic_name;
+      semantic_index = info.input_semantic_index;
+   }
+   else if(file == TGSI_FILE_OUTPUT)
+   {
+      semantic_name = info.output_semantic_name;
+      semantic_index = info.output_semantic_index;
+   }
+   else
+      assert(0);
+
+   tgsi_parse_init(&parse, tokens);
+
+   memset(set->masks, 0, sizeof(set->masks));
+   while(!tgsi_parse_end_of_tokens(&parse))
+   {
+      tgsi_parse_token(&parse);
+
+      if(parse.FullToken.Token.Type == TGSI_TOKEN_TYPE_INSTRUCTION)
+      {
+         const struct tgsi_full_instruction *finst = &parse.FullToken.FullInstruction;
+         unsigned i;
+         for(i = 0; i < finst->Instruction.NumDstRegs; ++i)
+         {
+            if(finst->Dst[i].Register.File == file)
+            {
+               unsigned idx = finst->Dst[i].Register.Index;
+               if(semantic_name[idx] == TGSI_SEMANTIC_GENERIC)
+               {
+                  if(!util_semantic_set_test_and_set(set, semantic_index[idx]))
+                     ++count;
+               }
+            }
+         }
+
+         for(i = 0; i < finst->Instruction.NumSrcRegs; ++i)
+         {
+            if(finst->Src[i].Register.File == file)
+            {
+               unsigned idx = finst->Src[i].Register.Index;
+               if(semantic_name[idx] == TGSI_SEMANTIC_GENERIC)
+               {
+                  if(!util_semantic_set_test_and_set(set, semantic_index[idx]))
+                     ++count;
+               }
+            }
+         }
+      }
+   }
+   tgsi_parse_free(&parse);
+
+   return count;
+}
+
+#define UTIL_SEMANTIC_SET_FOR_EACH(i, set) for(i = 0; i < 256; ++i) if(set->masks[i / (sizeof(long) * 8)] & (1UL << (i % (sizeof(long) * 8))))
+
+void
+util_semantic_layout_from_set(unsigned char *layout, const struct util_semantic_set *set, unsigned efficient_slots, unsigned num_slots)
+{
+   int first = -1;
+   int last = -1;
+   unsigned i;
+
+   memset(layout, 0xff, num_slots);
+
+   UTIL_SEMANTIC_SET_FOR_EACH(i, set)
+   {
+      if(first < 0)
+         first = i;
+      last = i;
+   }
+
+   if(last < efficient_slots)
+   {
+      UTIL_SEMANTIC_SET_FOR_EACH(i, set)
+         layout[i] = i;
+   }
+   else if((last - first) < efficient_slots)
+   {
+      UTIL_SEMANTIC_SET_FOR_EACH(i, set)
+         layout[i - first] = i;
+   }
+   else
+   {
+      unsigned idx = 0;
+      UTIL_SEMANTIC_SET_FOR_EACH(i, set)
+         layout[idx++] = i;
+   }
+}
diff --git a/src/gallium/auxiliary/util/u_linkage.h b/src/gallium/auxiliary/util/u_linkage.h
new file mode 100644
index 000..e73e0fd
--- /dev/null
+++ b/src/gallium/auxiliary/util/u_linkage.h
@@ -0,0 +1,38 @@
+#ifndef U_LINKAGE_H_
+#define U_LINKAGE_H_
+
+#include "pipe/p_compiler.h"
+
+struct util_semantic_set
+{
+   unsigned long masks[256 / 8 / sizeof(unsigned long)];
+};
+
+static INLINE bool
+util_semantic_set_contains(struct util_semantic_set *set, unsigned char value)
+{
+   return !!(set->masks[value / (sizeof(long) * 8)] & (1UL << (value % (sizeof(long) * 8))));
+}
+
+unsigned util_semantic_set_from_program_file(struct util_semantic_set *set, const struct tgsi_token *tokens, enum tgsi_file_type file);
+
+/*
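[Editorial note: a self-contained copy of the bitset logic from the patch above, to show the intended semantics in isolation: test_and_set returns whether an index was already recorded, so a caller can count distinct used semantic indices in a single pass, exactly as util_semantic_set_from_program_file does over TGSI registers.]

```c
#include <assert.h>
#include <string.h>

/* Same layout as struct util_semantic_set: a 256-bit set. */
struct semantic_set {
   unsigned long masks[256 / 8 / sizeof(unsigned long)];
};

/* Set the bit for `value`; return nonzero if it was already set. */
static int set_test_and_set(struct semantic_set *set, unsigned value)
{
   unsigned long mask = 1UL << (value % (sizeof(long) * 8));
   unsigned long *p = &set->masks[value / (sizeof(long) * 8)];
   unsigned long v = *p & mask;
   *p |= mask;
   return !!v;
}

/* Count distinct values in an index stream, the way the patch counts
 * distinct GENERIC semantic indices actually referenced by instructions. */
unsigned count_distinct(const unsigned *indices, unsigned n)
{
   struct semantic_set set;
   unsigned i, count = 0;
   memset(&set, 0, sizeof(set));
   for (i = 0; i < n; ++i)
      if (!set_test_and_set(&set, indices[i]))
         ++count;
   return count;
}

/* example: indices referenced by a hypothetical shader, with repeats */
unsigned example_count(void)
{
   static const unsigned used[] = { 3, 7, 3, 200, 7 };
   return count_distinct(used, 5);
}
```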
[Mesa3d-dev] [PATCH 0/6] [RFC] Formalization of the Gallium shader semantics linkage model
only. Any API that requires special semantics for COLOR and BCOLOR (i.e. non-SM3) seems to only want 0-1 indices.

Note that SM3 does *not* include BCOLOR, so basically the limits for generic indices would need to be conditional on BCOLOR being present or not (e.g. if it is present, we must reserve two semantic slots in svga for it).

POSITION0 is obviously special. PSIZE0 is also special for points. FOG0 seems right now to just be a GENERIC with a single component. Gallium could be extended to support fixed-function fog, which most DX9 hardware supports (nv30/nv40 and r300). This is mostly orthogonal to the semantic issue.

== Current Gallium users ==

Right now no open-source users of Gallium fundamentally require arbitrary indices. In particular:
1. GLSL and anything with similar link-by-name can of course be modified to use sequential indices
2. ARB fragment program and vertex program use index-limited texcoord slots
3. g3dvl needs and uses 8 texcoord slots, indices 0-7
4. vega and xorg use indices 0-1
5. DX10 seems to restrict semantics to the 0-N range, if I'm not mistaken
6. The GL_EXT_separate_shader_objects extension does not provide arbitrary index matching for GLSL, but merely lets it use a model similar to ARB fp/vp

However, the GLSL linker needs them in its current form, and the capability can be generally useful anyway.

=== Discussion of possible options ===

[Options from Keith Whitwell, see http://www.opensource-archive.org/showthread.php?p=180719]

a) Picking a lower number like 128, that an SM3 state tracker could usually be able to directly translate incoming semantics into, but which would force it to renumber under rare circumstances. This would make life easier for the open drivers at the expense of the closed code.
b) Picking 256 to make life easier for some closed-source SM3 state tracker, but harder for open drivers.
c) Picking 219 (or some other magic number) that happens to work with the current set of constraints, but makes gallium fragile in the face of new constraints.
d) Abandoning the current gallium linkage rules and coming up with something new, for instance forcing the state trackers to renumber always and making life trivial for the drivers...

[Options from me]

(e) Allow arbitrary 32-bit indices. This requires slightly more complicated data structures in some cases, and will require svga and r600 to fall back to software linkage if numbers are too high.
(f) Limit semantic indices to hardware interpolators _and_ introduce an interface to let the user specify an

Personally I think the simplest idea for now could be to have all drivers support 256 indices or, in the case of r600 and svga, the maximum value supported by the hardware, and expose that as a cap (as well as another cap for the number of different semantic values supported at once). The minimum guaranteed value is set to the lowest hardware constraint, which would be svga with 219 indices (assuming no bcolor is used). If some new constraints pop up, we just lower it and change SM3 state trackers to check for it and fall back otherwise.

This should just require simple fixes to svga and r300, and significant code for nv30/nv40, which is however already implemented.

Luca Barbieri (5):
  tgsi: formalize limits on semantic indices
  tgsi: add support for packing semantics in SM3 byte values
  gallium/auxiliary: add semantic linkage utility code
  nvfx: support proper shader linkage - adds glsl support
  nvfx: expose GLSL

Michal Krol (1):
  gallium: Remove TGSI_SEMANTIC_NORMAL.
[Mesa3d-dev] [PATCH 2/6] tgsi: formalize limits on semantic indices
---
 src/gallium/include/pipe/p_shader_tokens.h | 18 ++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index baff802..5d511ba 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -146,6 +146,24 @@ struct tgsi_declaration_dimension
 #define TGSI_SEMANTIC_INSTANCEID 10
 #define TGSI_SEMANTIC_COUNT 11 /**< number of semantic values */
 
+/* 219 = (14 * 16 - 5)
+ * All SM3 semantics minus COLOR0, COLOR1, POSITION0, FOG0 and PSIZE0.
+ * This value is accurately chosen so that Gallium semantic/indices may be converted
+ * losslessly from and to SM3 semantics.
+ *
+ * Note that if BCOLOR is used, then this value is actually 211 - #MAX_BCOLOR_INDEX_USED - 1
+ * (SM3 does not support BCOLOR, and uses FACE instead)
+ *
+ * If any card supports more, this will be handled later.
+ *
+ * However, drivers should support 256 generic indices if the mechanism
+ * they use is not intrinsically limited to a lower value.
+ */
+#define TGSI_SEMANTIC_GENERIC_INDICES 219
+
+#define TGSI_SEMANTIC_INDICES(sem) (((sem) == TGSI_SEMANTIC_GENERIC) ? TGSI_SEMANTIC_GENERIC_INDICES : \
+   (((sem) == TGSI_SEMANTIC_COLOR || (sem) == TGSI_SEMANTIC_BCOLOR) ? 2 : 1))
+
 struct tgsi_declaration_semantic
 {
    unsigned Name : 8; /**< one of TGSI_SEMANTIC_x */
-- 
1.7.0.1.147.g6d84b
[Mesa3d-dev] [PATCH 3/6] tgsi: add support for packing semantics in SM3 byte values
---
 src/gallium/auxiliary/util/u_semantics.h | 123 ++
 1 files changed, 123 insertions(+), 0 deletions(-)
 create mode 100644 src/gallium/auxiliary/util/u_semantics.h

diff --git a/src/gallium/auxiliary/util/u_semantics.h b/src/gallium/auxiliary/util/u_semantics.h
new file mode 100644
index 000..d620619
--- /dev/null
+++ b/src/gallium/auxiliary/util/u_semantics.h
@@ -0,0 +1,123 @@
+#ifndef U_SEMANTICS_H_
+#define U_SEMANTICS_H_
+
+#include "pipe/p_compiler.h"
+#include "pipe/p_shader_tokens.h"
+
+/* same as SM3 values */
+#define TGSI_SEMANTIC_BYTE_POSITION 0
+#define TGSI_SEMANTIC_BYTE_PSIZE (4 << 4)
+#define TGSI_SEMANTIC_BYTE_COLOR0 (10 << 4)
+#define TGSI_SEMANTIC_BYTE_COLOR1 (TGSI_SEMANTIC_BYTE_COLOR0 + 1)
+#define TGSI_SEMANTIC_BYTE_FOG (11 << 4)
+#define TGSI_SEMANTIC_BYTE_BCOLOR0 (14 << 4)
+#define TGSI_SEMANTIC_BYTE_BCOLOR1 (TGSI_SEMANTIC_BYTE_BCOLOR0 + 1)
+#define TGSI_SEMANTIC_BYTE_TGSI (15 << 4)
+
+static INLINE unsigned char
+pipe_semantic_to_byte(unsigned name, unsigned index)
+{
+   switch (name)
+   {
+   case TGSI_SEMANTIC_POSITION:
+      return TGSI_SEMANTIC_BYTE_POSITION;
+   case TGSI_SEMANTIC_PSIZE:
+      return TGSI_SEMANTIC_BYTE_PSIZE;
+   case TGSI_SEMANTIC_FOG:
+      return TGSI_SEMANTIC_BYTE_FOG;
+   case TGSI_SEMANTIC_COLOR:
+      return TGSI_SEMANTIC_BYTE_COLOR0 + index;
+   case TGSI_SEMANTIC_GENERIC:
+      ++index;
+      if(index >= TGSI_SEMANTIC_BYTE_PSIZE)
+      {
+         ++index;
+         if(index >= TGSI_SEMANTIC_BYTE_COLOR0)
+         {
+            index += 2;
+            if(index >= TGSI_SEMANTIC_BYTE_FOG)
+               ++index;
+         }
+      }
+      return index;
+   case TGSI_SEMANTIC_BCOLOR:
+      return TGSI_SEMANTIC_BYTE_BCOLOR0 + index;
+   default:
+      return TGSI_SEMANTIC_BYTE_TGSI + name;
+   }
+}
+
+/* this fits BCOLOR in the SM3 range, but is not reversible */
+static INLINE unsigned char
+pipe_semantic_to_byte_sm3(unsigned name, unsigned index)
+{
+   if(name == TGSI_SEMANTIC_BCOLOR)
+      return TGSI_SEMANTIC_BYTE_BCOLOR0 - 1 - index;
+   return pipe_semantic_to_byte(name, index);
+}
+
+static INLINE unsigned
+pipe_semantic_name_from_byte(unsigned char value)
+{
+   switch (value)
+   {
+   case TGSI_SEMANTIC_BYTE_POSITION:
+      return TGSI_SEMANTIC_POSITION;
+   case TGSI_SEMANTIC_BYTE_PSIZE:
+      return TGSI_SEMANTIC_PSIZE;
+   case TGSI_SEMANTIC_BYTE_FOG:
+      return TGSI_SEMANTIC_FOG;
+   case TGSI_SEMANTIC_BYTE_COLOR0:
+   case TGSI_SEMANTIC_BYTE_COLOR1:
+      return TGSI_SEMANTIC_COLOR;
+   case TGSI_SEMANTIC_BYTE_BCOLOR0:
+   case TGSI_SEMANTIC_BYTE_BCOLOR1:
+      return TGSI_SEMANTIC_BCOLOR;
+   default:
+      if(value < TGSI_SEMANTIC_BYTE_TGSI)
+         return TGSI_SEMANTIC_GENERIC;
+      else
+         return value - TGSI_SEMANTIC_BYTE_TGSI;
+   }
+}
+
+static INLINE unsigned
+pipe_semantic_index_from_byte(unsigned char value)
+{
+   if(value == TGSI_SEMANTIC_BYTE_POSITION)
+      return 0;
+
+   if(value <= TGSI_SEMANTIC_BYTE_PSIZE)
+   {
+      if(value < TGSI_SEMANTIC_BYTE_PSIZE)
+         return value - 1;
+      else
+         return 0;
+   }
+
+   if(value < (TGSI_SEMANTIC_BYTE_COLOR0 + 2))
+   {
+      if(value < TGSI_SEMANTIC_BYTE_COLOR0)
+         return value - 2;
+      else
+         return value - TGSI_SEMANTIC_BYTE_COLOR0;
+   }
+
+   if(value <= TGSI_SEMANTIC_BYTE_FOG)
+   {
+      if(value < TGSI_SEMANTIC_BYTE_FOG)
+         return value - 4;
+      else
+         return 0;
+   }
+
+   if(value < TGSI_SEMANTIC_BYTE_BCOLOR0)
+      return value - 5;
+
+   if(value == TGSI_SEMANTIC_BYTE_BCOLOR1)
+      return 1;
+
+   return 0;
+}
+
+#endif /* U_SEMANTICS_H_ */
-- 
1.7.0.1.147.g6d84b
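[Editorial check of the packing scheme in this patch, with the SM3 byte constants inlined (PSIZE = 4<<4 = 64, COLOR0 = 10<<4 = 160, FOG = 11<<4 = 176, BCOLOR0 = 14<<4 = 224): a GENERIC index maps to a byte that skips the five reserved SM3 slots, and unpacking recovers the index for all 219 values, which is why the conversion is lossless.]

```c
#include <assert.h>

/* GENERIC index -> SM3 byte, skipping POSITION(0), PSIZE(64),
 * COLOR0/1(160,161) and FOG(176), mirroring pipe_semantic_to_byte. */
static unsigned char generic_to_byte(unsigned index)
{
   ++index;                /* skip POSITION at byte 0 */
   if (index >= 64) {      /* skip PSIZE */
      ++index;
      if (index >= 160) {  /* skip COLOR0 and COLOR1 */
         index += 2;
         if (index >= 176) /* skip FOG */
            ++index;
      }
   }
   return (unsigned char)index;
}

/* Inverse, for bytes known to encode GENERIC (never a reserved slot),
 * mirroring pipe_semantic_index_from_byte. */
static unsigned generic_from_byte(unsigned char value)
{
   if (value <= 64)   /* 1..63: only POSITION was skipped */
      return value - 1;
   if (value < 162)   /* 65..159: POSITION and PSIZE skipped */
      return value - 2;
   if (value <= 176)  /* 162..175: plus COLOR0/COLOR1 */
      return value - 4;
   return value - 5;  /* 177..223: plus FOG */
}

/* Verify round-trip over all TGSI_SEMANTIC_GENERIC_INDICES = 219 indices. */
int roundtrip_ok(void)
{
   unsigned g;
   for (g = 0; g < 219; ++g)
      if (generic_from_byte(generic_to_byte(g)) != g)
         return 0;
   return 1;
}
```

Note that the largest generic index, 218, maps to byte 223, just below BCOLOR0 at 224, which is exactly where the 219 limit in patch 2/6 comes from.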
[Mesa3d-dev] [PATCH 6/6] nvfx: expose GLSL
Still no control flow support, but basic stuff works.
---
 src/gallium/drivers/nvfx/nvfx_screen.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/drivers/nvfx/nvfx_screen.c b/src/gallium/drivers/nvfx/nvfx_screen.c
index 6742759..b935fa9 100644
--- a/src/gallium/drivers/nvfx/nvfx_screen.c
+++ b/src/gallium/drivers/nvfx/nvfx_screen.c
@@ -42,7 +42,7 @@ nvfx_screen_get_param(struct pipe_screen *pscreen, int param)
 	case PIPE_CAP_TWO_SIDED_STENCIL:
 		return 1;
 	case PIPE_CAP_GLSL:
-		return 0;
+		return 1;
 	case PIPE_CAP_ANISOTROPIC_FILTER:
 		return 1;
 	case PIPE_CAP_POINT_SPRITE:
-- 
1.7.0.1.147.g6d84b
[Mesa3d-dev] [PATCH 1/6] gallium: Remove TGSI_SEMANTIC_NORMAL.
From: Michal Krol mic...@vmware.com

Use TGSI_SEMANTIC_GENERIC for this kind of stuff.
---
 src/gallium/auxiliary/tgsi/tgsi_dump.c         | 2 +-
 src/gallium/auxiliary/tgsi/tgsi_text.c         | 2 +-
 src/gallium/docs/source/tgsi.rst               | 6 --
 src/gallium/drivers/svga/svga_tgsi_decl_sm30.c | 4 ----
 src/gallium/include/pipe/p_shader_tokens.h     | 2 +-
 5 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index 5703141..b6df249 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -120,7 +120,7 @@ static const char *semantic_names[] =
    "FOG",
    "PSIZE",
    "GENERIC",
-   "NORMAL",
+   "",
    "FACE",
    "EDGEFLAG",
    "PRIM_ID",
diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
index f918151..356eee0 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -933,7 +933,7 @@ static const char *semantic_names[TGSI_SEMANTIC_COUNT] =
    "FOG",
    "PSIZE",
    "GENERIC",
-   "NORMAL",
+   "",
    "FACE",
    "EDGEFLAG",
    "PRIM_ID",
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index c292cd3..d5e0220 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -1397,12 +1397,6 @@ These attributes are called "generic" because they may be used for anything
 else, including parameters, texture generation information, or anything that
 can be stored inside a four-component vector.
 
-TGSI_SEMANTIC_NORMAL
-
-Vertex normal; could be used to implement per-pixel lighting for legacy APIs
-that allow mixing fixed-function and programmable stages.
-
 TGSI_SEMANTIC_FACE
 
diff --git a/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c b/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
index 73102a7..05d9102 100644
--- a/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
+++ b/src/gallium/drivers/svga/svga_tgsi_decl_sm30.c
@@ -61,10 +61,6 @@ static boolean translate_vs_ps_semantic( struct tgsi_declaration_semantic semant
       *idx = semantic.Index + 1; /* texcoord[0] is reserved for fog */
       *usage = SVGA3D_DECLUSAGE_TEXCOORD;
       break;
-   case TGSI_SEMANTIC_NORMAL:
-      *idx = semantic.Index;
-      *usage = SVGA3D_DECLUSAGE_NORMAL;
-      break;
    default:
       assert(0);
       *usage = SVGA3D_DECLUSAGE_TEXCOORD;
diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index c5c480f..baff802 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -139,7 +139,7 @@ struct tgsi_declaration_dimension
 #define TGSI_SEMANTIC_FOG      3
 #define TGSI_SEMANTIC_PSIZE    4
 #define TGSI_SEMANTIC_GENERIC  5
-#define TGSI_SEMANTIC_NORMAL   6
+/* gap */
 #define TGSI_SEMANTIC_FACE     7
 #define TGSI_SEMANTIC_EDGEFLAG 8
 #define TGSI_SEMANTIC_PRIMID   9
-- 
1.7.0.1.147.g6d84b
[Mesa3d-dev] [PATCH 5/6] nvfx: support proper shader linkage - adds glsl support
---
 src/gallium/drivers/nvfx/nvfx_fragprog.c   | 146 ++--
 src/gallium/drivers/nvfx/nvfx_shader.h     |   1 +
 src/gallium/drivers/nvfx/nvfx_state.c      |   4 +
 src/gallium/drivers/nvfx/nvfx_state.h      |  15 +++
 src/gallium/drivers/nvfx/nvfx_state_emit.c |   2 +-
 src/gallium/drivers/nvfx/nvfx_vertprog.c   |  40 ++--
 6 files changed, 143 insertions(+), 65 deletions(-)

diff --git a/src/gallium/drivers/nvfx/nvfx_fragprog.c b/src/gallium/drivers/nvfx/nvfx_fragprog.c
index 5fa825a..b4b63e2 100644
--- a/src/gallium/drivers/nvfx/nvfx_fragprog.c
+++ b/src/gallium/drivers/nvfx/nvfx_fragprog.c
@@ -1,6 +1,7 @@
 #include "pipe/p_context.h"
 #include "pipe/p_defines.h"
 #include "pipe/p_state.h"
+#include "util/u_semantics.h"
 #include "util/u_inlines.h"
 
 #include "pipe/p_shader_tokens.h"
@@ -16,8 +17,6 @@
 struct nvfx_fpc {
 	struct nvfx_fragment_program *fp;
 
-	uint attrib_map[PIPE_MAX_SHADER_INPUTS];
-
 	unsigned r_temps;
 	unsigned r_temps_discard;
 	struct nvfx_sreg r_result[PIPE_MAX_SHADER_OUTPUTS];
@@ -36,6 +35,8 @@ struct nvfx_fpc {
 	struct nvfx_sreg imm[MAX_IMM];
 	unsigned nr_imm;
+
+	unsigned char sem_table[256]; /* semantic idx for each input semantic */
 };
 
 static INLINE struct nvfx_sreg
@@ -111,6 +112,11 @@ emit_src(struct nvfx_fpc *fpc, int pos, struct nvfx_sreg src)
 		sr |= (NVFX_FP_REG_TYPE_TEMP << NVFX_FP_REG_TYPE_SHIFT);
 		sr |= (src.index << NVFX_FP_REG_SRC_SHIFT);
 		break;
+	case NVFXSR_RELOCATED:
+		sr |= (NVFX_FP_REG_TYPE_INPUT << NVFX_FP_REG_TYPE_SHIFT);
+		printf("adding relocation at %x for %x\n", fpc->inst_offset, src.index);
+		util_dynarray_append(&fpc->fp->sem_relocs[src.index], unsigned, fpc->inst_offset);
+		break;
 	case NVFXSR_CONST:
 		if (!fpc->have_const) {
 			grow_insns(fpc, 4);
@@ -241,8 +247,28 @@ tgsi_src(struct nvfx_fpc *fpc, const struct tgsi_full_src_register *fsrc)
 	switch (fsrc->Register.File) {
 	case TGSI_FILE_INPUT:
-		src = nvfx_sr(NVFXSR_INPUT,
-			      fpc->attrib_map[fsrc->Register.Index]);
+		if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_POSITION) {
+			assert(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0);
+			src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_POSITION);
+		} else if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_COLOR) {
+			if(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0)
+				src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_COL0);
+			else if(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 1)
+				src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_COL1);
+			else
+				assert(0);
+		} else if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_FOG) {
+			assert(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0);
+			src = nvfx_sr(NVFXSR_INPUT, NVFX_FP_OP_INPUT_SRC_FOGC);
+		} else if(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_FACE) {
+			/* TODO: check this has the correct values */
+			/* XXX: what do we do for nv30 here (assuming it lacks facing)?! */
+			assert(fpc->fp->info.input_semantic_index[fsrc->Register.Index] == 0);
+			src = nvfx_sr(NVFXSR_INPUT, NV40_FP_OP_INPUT_SRC_FACING);
+		} else {
+			assert(fpc->fp->info.input_semantic_name[fsrc->Register.Index] == TGSI_SEMANTIC_GENERIC);
+			src = nvfx_sr(NVFXSR_RELOCATED, fpc->sem_table[fpc->fp->info.input_semantic_index[fsrc->Register.Index]]);
+		}
 		break;
 	case TGSI_FILE_CONSTANT:
 		src = constant(fpc, fsrc->Register.Index, NULL);
@@ -611,48 +637,6 @@ nvfx_fragprog_parse_instruction(struct nvfx_context* nvfx, struct nvfx_fpc *fpc,
 }
 
 static boolean
-nvfx_fragprog_parse_decl_attrib(struct nvfx_context* nvfx, struct nvfx_fpc *fpc,
-				const struct tgsi_full_declaration *fdec)
-{
-	int hw;
-
-	switch (fdec->Semantic.Name) {
-	case TGSI_SEMANTIC_POSITION:
-		hw = NVFX_FP_OP_INPUT_SRC_POSITION;
-		break;
-	case TGSI_SEMANTIC_COLOR:
-		if (fdec->Semantic.Index == 0) {
-			hw = NVFX_FP_OP_INPUT_SRC_COL0;
-		} else
-		if (fdec->Semantic.Index == 1) {
-			hw = NVFX_FP_OP_INPUT_SRC_COL1;
-		} else {
-			NOUVEAU_ERR("bad colour semantic index\n");
-
Re: [Mesa3d-dev] r300g: hack around issue with doom3 and 0 stride
> r300g: hack around issue with doom3 and 0 stride
>
> This is most likely a bug in the mesa state tracker, but do the quick hack for now to avoid the divide by 0.

This is not a bug: stride 0 means that the vertex attribute is constant for all vertices. It is not a special value either: advancing the vertex attribute pointer by 0 will naturally result in always fetching the same value.

Thus, the patch is not likely to be correct: you should instead either program stride 0 to the hardware if supported, or fetch the vertex attribute with the CPU (I think it is always in a user buffer, but I'm not sure; maybe OpenGL allows explicitly specifying a VBO with stride 0) and use whatever means Radeon provides to set a constant vertex attribute (e.g. nVidia GPUs have a FIFO method exactly for that).

--
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
Re: [Mesa3d-dev] u_half.? -> u_format_half.? rename
I named it that way because it is datatype conversion functionality, which is conceptually a lower layer than format conversion (which operates on multi-component formats), and is also totally independent of the existing format conversion functionality.

It is the only member of that layer because all other currently needed datatype conversions can be performed with trivial C language expressions: this could change as other unusual floating point datatypes are needed (e.g. 6e5 and 5e5 for EXT_packed_float).

That said, feel free to rename it: it's just a cosmetic issue. Alternatively, maybe a new data conversion prefix could be invented, like u_convert_half.* or something like that.
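For reference, here is a rough sketch of the kind of scalar datatype conversion being discussed: a simplified float -> half (IEEE 754 binary16) converter. This is not the actual u_half implementation (a real one uses a lookup table and also handles denormals, infinities and NaN); it only handles normalized values, but shows why this is a single-value conversion rather than a format conversion.

```c
#include <stdint.h>
#include <string.h>

/* Simplified sketch, NOT the real u_half code: converts one float to a
 * half float (binary16), handling only normalized values. */
static uint16_t
float_to_half_simple(float f)
{
   uint32_t bits;
   memcpy(&bits, &f, sizeof bits);                   /* reinterpret the float's bits */

   uint32_t sign = (bits >> 16) & 0x8000;            /* move sign bit to half position */
   int32_t  exp  = ((bits >> 23) & 0xff) - 127 + 15; /* rebias exponent (127 -> 15) */
   uint32_t mant = (bits >> 13) & 0x3ff;             /* keep top 10 mantissa bits */

   if (exp <= 0)
      return (uint16_t)sign;                         /* underflow -> signed zero */
   if (exp >= 31)
      return (uint16_t)(sign | 0x7c00);              /* overflow -> infinity */
   return (uint16_t)(sign | (exp << 10) | mant);
}
```

A table-driven version would precompute the result for each of the 512 sign/exponent combinations, which is why this functionality warrants its own file in the first place.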
Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler
> This might depend on the target: R600+, for example, is quite scalar-oriented anyway (modulo a lot of subtle limitations), so just pretending that everything is scalar could work well there since revectorizing is almost unnecessary.

Interestingly, nv50 is also almost fully scalar, and based on the Gallium driver source, i965 seems to be scalar too. So it seems it would really make sense to also have a scalar IR, whether LLVM IR or something else.

Of course, "scalar" is usually actually SoA SIMD, but that's mostly hidden, except for things like barriers, join points and nv50 voting instructions.
Re: [Mesa3d-dev] Gallium: ARB_half_float_vertex
There was some talk about doing the query with a vertex buffer target for is_format_supported. After gallium-resources is merged, this should be automatically possible.

BTW, the st/mesa patch was originally from Dave Airlie and was slightly changed by me.
Re: [Mesa3d-dev] Gallium: ARB_half_float_vertex
> Does it mean there will be format fallbacks? Because dword-unaligned but still pretty common (i.e. GL 1.1) vertex formats aren't supported by r300, most often we hit R16G16B16. What will happen when is_format_supported says NO to such a format? I hope it won't share the fate of PIPE_CAP_SM3, which every in-tree state tracker ignores.

I'm not sure I understand correctly what you are saying. The idea is to do like you did in your patch, but instead of calling screen->get_param(screen, PIPE_CAP_HALF_FLOAT_VERTEX), calling screen->is_format_supported(screen, PIPE_FORMAT_R16G16B16A16, PIPE_BUFFER, ..., ...). The PIPE_BUFFER target is supported in gallium-resources, but I'm not sure whether this way of querying vertex formats is supported; it would probably need to be added first.

If you mean that r300 doesn't support R16G16B16, I suppose you can just use R16G16B16A16 and ignore the extra fetched w element (the vertex buffer stride will make this work properly).

However, if non-dword-aligned vertex buffer strides or vertex element offsets are not supported, I think you have a serious problem, which is however independent of half float vertices, since I don't think OpenGL places any alignment constraints on those values (correct me if I'm wrong).
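The w-overlap trick can be illustrated with a self-contained toy (hypothetical helper, not actual r300g code): because the vertex stride still advances by three components, fetching a fourth component just reads the next vertex's x, which the shader ignores.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy illustration: fetch an R16G16B16A16 element from a buffer that
 * stores only three 16-bit components per vertex. The stride stays at
 * three components, so the bogus w simply overlaps the next vertex's x.
 * Note the buffer needs one trailing element of padding, or the last
 * vertex's w fetch reads out of bounds. (uint16_t stands in for half
 * floats; this is not driver code.) */
struct vec4_u16 { uint16_t x, y, z, w; };

static struct vec4_u16
fetch_rgba16_from_rgb16(const uint16_t *buf, unsigned index)
{
   const unsigned stride = 3;                      /* components per vertex */
   const uint16_t *v = buf + (size_t)index * stride;
   struct vec4_u16 r = { v[0], v[1], v[2], v[3] }; /* v[3] = next vertex's x */
   return r;
}
```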
Re: [Mesa3d-dev] Gallium: ARB_half_float_vertex
> > If you mean that r300 doesn't support R16G16B16, I suppose you can just use R16G16B16A16 and ignore the extra fetched w element (the vertex buffer stride will make this work properly).
>
> I've tried to do it this way; it locks up (unless I am missing something).

Shouldn't there be official ATI hardware documentation for r300 describing such things? (just curious)

Otherwise, I guess you could trace the ATI binary driver and see what it does...
Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler
> Way back I actually looked into LLVM for R300. I was totally unconvinced by their vector support back then, but that may well have changed. In particular, I'm curious about how LLVM deals with writemasks. Writing to only a select subset of components of a vector is something I've seen in a lot of shaders, but it doesn't seem to be too popular in CPU-bound SSE code, which is probably why LLVM didn't support it well. Has that improved? The trouble with writemasks is that it's not something you can just implement one module for. All your optimization passes, from simple peephole to the smartest loop modifications, need to understand the meaning of writemasks.

You should be able to just use shufflevector/insertelement/extractelement to mix the newly computed values with the previous values in the vector register (as well as doing swizzles).

There is also the option of immediately scalarizing, optimizing the scalar code, and then revectorizing. This risks pessimizing the input code, but might turn out to work well.

I agree, though if I were to start an LLVM-based compilation project, I would do it for R600+, not for R300. That would be a very different kind of project. An LLVM->TGSI conversion is not the best way to go because TGSI doesn't match the hardware all that well, at least in the Radeon family. R300-R500 fragment programs have the weird RGB/A split, and R600+ is yet another beast that looks quite different from TGSI. So at least for Radeon, I believe it would be best to generate hardware-level instructions directly from LLVM, possibly via some Radeon-family specific intermediate representation.

The advantage of LLVM->TGSI would be that it works with all drivers without any driver-specific code, so it probably makes sense as an initial step. nv30/nv40 fragment programs map almost directly to TGSI (with the addition of condition codes, half float precision, and a few other things).
Things that end up using an existing graphics API (like the vmware svga driver), or using the LLVM optimizer for game development, also need TGSI-like output. Thus, even if TGSI itself becomes irrelevant at some point, any nontrivial parts of the LLVM->TGSI code should be needed anyway for those cases.
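The writemask point above — merging newly computed components into the previous register contents, which is what shufflevector/insertelement express in LLVM IR — can be sketched in plain C as a per-component select (toy example, not driver code):

```c
/* Toy sketch: a destination write with a writemask (e.g. "MOV dst.xz, src")
 * expressed as a per-component select that merges new values into the
 * previous register contents. Bit i of the writemask enables component i. */
struct vec4f { float v[4]; };

static struct vec4f
masked_write(struct vec4f dst, struct vec4f src, unsigned writemask)
{
   for (int i = 0; i < 4; ++i)
      if (writemask & (1u << i))
         dst.v[i] = src.v[i];   /* written component takes the new value */
   return dst;                  /* unwritten components keep old values */
}
```

Any pass that understands selects can then reason about the partial write without special writemask knowledge.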
Re: [Mesa3d-dev] How do we init half float tables?
They are not passing for me with current master and a 32-bit system. Here are the failures:

Testing util_format_dxt1_rgb_pack_8unorm ...
FAILED: f2 d7 90 20 ae 2c 6f 97 obtained
        f2 d7 b0 20 ae 2c 6f 97 expected

Testing util_format_dxt5_rgba_pack_8unorm ...
FAILED: f7 10 c5 0c 9a 73 b4 9c f6 8f ab 32 2a 9a 95 5a obtained
        f8 11 c5 0c 9a 73 b4 9c f6 8f ab 32 2a 9a 95 5a expected

Testing util_format_dxt1_rgb_unpack_8unorm ...
FAILED: {0x99, 0xb0, 0x8e, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x99, 0xb0, 0x8e, 0xff}, {0x99, 0xb0, 0x8e, 0xff}, {0xd6, 0xff, 0x94, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x99, 0xb0, 0x8e, 0xff}, {0xd6, 0xff, 0x94, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x99, 0xb0, 0x8e, 0xff}, {0x21, 0x14, 0x84, 0xff}, {0x5d, 0x62, 0x89, 0xff}, {0x21, 0x14, 0x84, 0xff}, {0x21, 0x14, 0x84, 0xff}, {0x99, 0xb0, 0x8e, 0xff} obtained
        {0x98, 0xaf, 0x8e, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x98, 0xaf, 0x8e, 0xff}, {0x98, 0xaf, 0x8e, 0xff}, {0xd6, 0xff, 0x94, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x98, 0xaf, 0x8e, 0xff}, {0xd6, 0xff, 0x94, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x98, 0xaf, 0x8e, 0xff}, {0x21, 0x13, 0x84, 0xff}, {0x5c, 0x62, 0x88, 0xff}, {0x21, 0x13, 0x84, 0xff}, {0x21, 0x13, 0x84, 0xff}, {0x98, 0xaf, 0x8e, 0xff} expected

Testing util_format_dxt1_rgba_unpack_8unorm ...
FAILED: {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff}, {0x4e, 0xaa, 0x90, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff}, {0x29, 0xff, 0xff, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff}, {0x73, 0x55, 0x21, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff}, {0x4e, 0xaa, 0x90, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xaa, 0x90, 0xff} obtained
        {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff}, {0x4e, 0xa9, 0x8f, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff}, {0x29, 0xff, 0xff, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff}, {0x73, 0x54, 0x21, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff}, {0x4e, 0xa9, 0x8f, 0xff}, {0x00, 0x00, 0x00, 0x00}, {0x00, 0x00, 0x00, 0x00}, {0x4e, 0xa9, 0x8f, 0xff} expected

Testing util_format_dxt3_rgba_unpack_8unorm ...
FAILED: {0x6d, 0xc6, 0x96, 0x77}, {0x6d, 0xc6, 0x96, 0xee}, {0x6d, 0xc6, 0x96, 0xaa}, {0x8c, 0xff, 0xb5, 0x44}, {0x6d, 0xc6, 0x96, 0xff}, {0x6d, 0xc6, 0x96, 0x88}, {0x31, 0x55, 0x5a, 0x66}, {0x6d, 0xc6, 0x96, 0x99}, {0x31, 0x55, 0x5a, 0xbb}, {0x31, 0x55, 0x5a, 0x55}, {0x31, 0x55, 0x5a, 0x11}, {0x6d, 0xc6, 0x96, 0xcc}, {0x6d, 0xc6, 0x96, 0xcc}, {0x6d, 0xc6, 0x96, 0x11}, {0x31, 0x55, 0x5a, 0x44}, {0x31, 0x55, 0x5a, 0x88} obtained
        {0x6c, 0xc6, 0x96, 0x77}, {0x6c, 0xc6, 0x96, 0xee}, {0x6c, 0xc6, 0x96, 0xa9}, {0x8c, 0xff, 0xb5, 0x43}, {0x6c, 0xc6, 0x96, 0xff}, {0x6c, 0xc6, 0x96, 0x87}, {0x31, 0x54, 0x5a, 0x66}, {0x6c, 0xc6, 0x96, 0x98}, {0x31, 0x54, 0x5a, 0xba}, {0x31, 0x54, 0x5a, 0x54}, {0x31, 0x54, 0x5a, 0x10}, {0x6c, 0xc6, 0x96, 0xcc}, {0x6c, 0xc6, 0x96, 0xcc}, {0x6c, 0xc6, 0x96, 0x10}, {0x31, 0x54, 0x5a, 0x43}, {0x31, 0x54, 0x5a, 0x87} expected

Testing util_format_dxt5_rgba_unpack_8unorm ...
FAILED: {0x6d, 0xc6, 0x96, 0x74}, {0x6d, 0xc6, 0x96, 0xf8}, {0x6d, 0xc6, 0x96, 0xb6}, {0x8c, 0xff, 0xb5, 0x53}, {0x6d, 0xc6, 0x96, 0xf8}, {0x6d, 0xc6, 0x96, 0x95}, {0x31, 0x55, 0x5a, 0x53}, {0x6d, 0xc6, 0x96, 0x95}, {0x31, 0x55, 0x5a, 0xb6}, {0x31, 0x55, 0x5a, 0x53}, {0x31, 0x55, 0x5a, 0x11}, {0x6d, 0xc6, 0x96, 0xd7}, {0x6d, 0xc6, 0x96, 0xb6}, {0x6d, 0xc6, 0x96, 0x11}, {0x31, 0x55, 0x5a, 0x32}, {0x31, 0x55, 0x5a, 0x95} obtained
        {0x6c, 0xc6, 0x96, 0x73}, {0x6c, 0xc6, 0x96, 0xf7}, {0x6c, 0xc6, 0x96, 0xb6}, {0x8c, 0xff, 0xb5, 0x53}, {0x6c, 0xc6, 0x96, 0xf7}, {0x6c, 0xc6, 0x96, 0x95}, {0x31, 0x54, 0x5a, 0x53}, {0x6c, 0xc6, 0x96, 0x95}, {0x31, 0x54, 0x5a, 0xb6}, {0x31, 0x54, 0x5a, 0x53}, {0x31, 0x54, 0x5a, 0x10}, {0x6c, 0xc6, 0x96, 0xd7}, {0x6c, 0xc6, 0x96, 0xb6}, {0x6c, 0xc6, 0x96, 0x10}, {0x31, 0x54, 0x5a, 0x31}, {0x31, 0x54, 0x5a, 0x95} expected

Compiling libtxc_dxtn with -O0 or with -march=core2 -msse2 -mfpmath=sse did not make them work.

As you can see, the tests seem mostly off-by-one, which makes me think of an approximation problem. libtxc_dxtn seems to take 8-bit input instead of floating point input, and it seems to be inherently hard to get it to roundtrip sensibly. Since only integer-coordinate points can be used, they are unlikely to be exactly on a line unless specifically crafted to be so.

Thus, a possible solution could be to actually pick a starting color, pick an increment, and generate an exact line by adding multiples of that increment to the starting color.
Re: [Mesa3d-dev] How do we init half float tables?
For instance, the DXT1 test is wrong. The red values used are: 33, 93, 153, 214.

93 - 33 = 60
153 - 93 = 60
214 - 153 = 61

213 should be used instead (i.e. 0xd5 instead of 0xd6).
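The arithmetic can be checked directly: DXT1 decodes two intermediate colors at 1/3 and 2/3 between the endpoints c0 and c1, so a ramp of four exactly-representable values needs (c1 - c0) divisible by 3. A sketch under idealized integer interpolation (real decoders round differently, which is exactly what produces the off-by-one failures):

```c
/* Idealized DXT1 intermediate color interpolation: the two extra colors
 * sit at 1/3 and 2/3 between the endpoints c0 and c1. */
static int dxt1_third(int c0, int c1)      { return (2 * c0 + c1) / 3; }
static int dxt1_two_thirds(int c0, int c1) { return (c0 + 2 * c1) / 3; }
```

With endpoints 33 and 213 the interpolated values are exactly 93 and 153; with 214, 181 is not divisible by 3 and the decoder's rounding choice leaks into the result.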
[Mesa3d-dev] [PATCH] progs/gallium/unit: improve error detection in u_format_test and make it more lenient for S3TC
Collect the maximum error for fetch/unpack tests, and the ratio of flipped to total bits for pack tests. Add lenient thresholds for S3TC tests.
---
 progs/gallium/unit/u_format_test.c |  163 +++-
 1 files changed, 86 insertions(+), 77 deletions(-)

diff --git a/progs/gallium/unit/u_format_test.c b/progs/gallium/unit/u_format_test.c
index 53e0284..1911dad 100644
--- a/progs/gallium/unit/u_format_test.c
+++ b/progs/gallium/unit/u_format_test.c
@@ -36,22 +36,48 @@
 #include "util/u_format_s3tc.h"
 
+static float
+float_error(float x, float y)
+{
+   return fabsf(y - x);
+}
+
+static float
+byte_error(uint8_t x, uint8_t y)
+{
+   return float_error(x / 255.0, y / 255.0);
+}
+
+/* this is done in this terrible way only because these are unit tests.
+ * a real implementation must use a lookup table, or the mask/shift/add
+ * algorithm in the Linux source.
+ * it should also use the builtin/intrinsic if available
+ */
+static unsigned
+popcnt8(uint8_t v)
+{
+   unsigned i;
+   unsigned cnt = 0;
+   for(i = 0; i < 8; ++i)
+      cnt += ((v >> i) & 1);
+   return cnt;
+}
+
 static boolean
-compare_float(float x, float y)
+print_max_error(const struct util_format_description *format_desc, float max_error)
 {
-   float error = y - x;
+   if(max_error <= FLT_EPSILON)
+      return TRUE;
 
-   if (error < 0.0f)
-      error = -error;
+   printf("MAX ABS ERROR: %f float, %.1f 8scaled\n", max_error, max_error * 255.0);
 
-   if (error > FLT_EPSILON) {
-      return FALSE;
-   }
+   /* compression tests aren't currently perfect, so be lenient here */
+   if(format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC && max_error < 0.01f)
+      return TRUE;
 
-   return TRUE;
+   return FALSE;
 }
 
-
 static void
 print_packed(const struct util_format_description *format_desc,
              const char *prefix,
@@ -69,6 +95,31 @@ print_packed(const struct util_format_description *format_desc,
    printf("%s", suffix);
 }
 
+static boolean
+print_packed_results(const struct util_format_description *format_desc, const struct util_format_test_case *test, uint8_t* packed)
+{
+   unsigned flipped_bits = 0;
+   unsigned total_bits = 0;
+   float flipped_bits_ratio;
+   unsigned i;
+   for (i = 0; i < format_desc->block.bits/8; ++i) {
+      flipped_bits += popcnt8((test->packed[i] ^ packed[i]) & test->mask[i]);
+      total_bits += popcnt8(test->mask[i]);
+   }
+
+   flipped_bits_ratio = (float)flipped_bits / total_bits;
+
+   if (flipped_bits)
+      printf("FLIPPED BITS: %u (%u %%)\n", flipped_bits, (unsigned)(flipped_bits_ratio * 100.0));
+
+   /* TODO: S3TC threshold is random */
+   if (flipped_bits_ratio > (format_desc->layout == UTIL_FORMAT_LAYOUT_S3TC ? 0.1 : 0)) {
+      print_packed(format_desc, "FAILED: ", packed, " obtained\n");
+      print_packed(format_desc, "        ", test->packed, " expected\n");
+      return FALSE;
+   }
+   return TRUE;
+}
 
 static void
 print_unpacked_doubl(const struct util_format_description *format_desc,
@@ -94,7 +145,7 @@ print_unpacked_doubl(const struct util_format_description *format_desc,
 static void
 print_unpacked_float(const struct util_format_description *format_desc,
                      const char *prefix,
-                     const float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
+                     float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
                      const char *suffix)
 {
    unsigned i, j;
@@ -115,7 +166,7 @@ print_unpacked_float(const struct util_format_description *format_desc,
 static void
 print_unpacked_8unorm(const struct util_format_description *format_desc,
                       const char *prefix,
-                      const uint8_t unpacked[][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
+                      uint8_t unpacked[][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4],
                       const char *suffix)
 {
    unsigned i, j;
@@ -138,26 +189,23 @@ test_format_fetch_float(const struct util_format_description *format_desc,
 {
    float unpacked[UTIL_FORMAT_MAX_UNPACKED_HEIGHT][UTIL_FORMAT_MAX_UNPACKED_WIDTH][4] = { { { 0 } } };
    unsigned i, j, k;
-   boolean success;
+   float max_error = 0.0f;
 
-   success = TRUE;
    for (i = 0; i < format_desc->block.height; ++i) {
       for (j = 0; j < format_desc->block.width; ++j) {
          format_desc->fetch_float(unpacked[i][j], test->packed, j, i);
-         for (k = 0; k < 4; ++k) {
-            if (!compare_float(test->unpacked[i][j][k], unpacked[i][j][k])) {
-               success = FALSE;
-            }
-         }
+         for (k = 0; k < 4; ++k)
+            max_error = MAX2(max_error, float_error(test->unpacked[i][j][k], unpacked[i][j][k]));
       }
    }
 
-   if (!success) {
+   if (!print_max_error(format_desc, max_error)) {
       print_unpacked_float(format_desc, "FAILED: ", unpacked, " obtained\n");
       print_unpacked_doubl(format_desc, "        ", test->unpacked, " expected\n");
+      return FALSE;
    }
 
-   return success;
+
Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler
This is somewhat nice, but without using a real compiler, the result will still be just a toy, unless you employ hundreds of compiler experts working full time on the project. For instance, Wikipedia lists the following loop optimizations:

- loop interchange: These optimizations exchange inner loops with outer loops. When the loop variables index into an array, such a transformation can improve locality of reference, depending on the array's layout. This is also known as loop permutation.

- loop splitting/loop peeling: Loop splitting attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range. A useful special case is loop peeling, which can simplify a loop with a problematic first iteration by performing that iteration separately before entering the loop.

- loop fusion or loop combining: Another technique which attempts to reduce loop overhead. When two adjacent loops would iterate the same number of times (whether or not that number is known at compile time), their bodies can be combined as long as they make no reference to each other's data.

- loop fission or loop distribution: Loop fission attempts to break a loop into multiple loops over the same index range, but each taking only a part of the loop's body. This can improve locality of reference, both of the data being accessed in the loop and the code in the loop's body.

- loop unrolling: Duplicates the body of the loop multiple times, in order to decrease the number of times the loop condition is tested and the number of jumps, which may degrade performance by impairing the instruction pipeline. Completely unrolling a loop eliminates all overhead (except multiple instruction fetches and increased program load time), but requires that the number of iterations be known at compile time (except in the case of JIT compilers). Care must also be taken to ensure that multiple re-calculation of indexed variables is not a greater overhead than advancing pointers within the original loop.

- loop unswitching: Unswitching moves a conditional inside a loop outside of it by duplicating the loop's body, and placing a version of it inside each of the if and else clauses of the conditional.

- loop inversion: This technique changes a standard while loop into a do/while (a.k.a. repeat/until) loop wrapped in an if conditional, reducing the number of jumps by two, for cases when the loop is executed. Doing so duplicates the condition check (increasing the size of the code) but is more efficient because jumps usually cause a pipeline stall. Additionally, if the initial condition is known at compile time and is known to be side-effect-free, the if guard can be skipped.

- loop-invariant code motion: If a quantity is computed inside a loop during every iteration, and its value is the same for each iteration, it can vastly improve efficiency to hoist it outside the loop and compute its value just once before the loop begins. This is particularly important with the address-calculation expressions generated by loops over arrays. For correct implementation, this technique must be used with loop inversion, because not all code is safe to be hoisted outside the loop.

- loop reversal: Loop reversal reverses the order in which values are assigned to the index variable. This is a subtle optimization which can help eliminate dependencies and thus enable other optimizations. Also, certain architectures utilise looping constructs at assembly language level that count in a single direction only (e.g. decrement-jump-if-not-zero (DJNZ)).

- loop tiling/loop blocking: Loop tiling reorganizes a loop to iterate over blocks of data sized to fit in the cache.

- loop skewing: Loop skewing takes a nested loop iterating over a multidimensional array, where each iteration of the inner loop depends on previous iterations, and rearranges its array accesses so that the only dependencies are between iterations of the outer loop.

Good luck doing all this on TGSI (especially if the developer does not have serious experience writing production compilers). Also, this does not mention all the other optimizations and analyses required to do the above stuff well (likely another 10-20 things).

Using a real compiler (e.g. LLVM, but also gcc or Open64), those optimizations are already implemented, or at least there is already a team of experienced compiler developers who are working full time to implement such optimizations, allowing you to then just turn them on without having to do any of the work yourself.

Note that all "compiler X is bad for VLIW or whatever GPU architecture" objections are irrelevant, since almost all optimizations are totally architecture independent. Also note that we should support OpenCL/compute shaders (already available for *3* years on e.g. nv50), and those *really* need a real compiler (as in, something developed for years by a team of compiler experts,
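To make one entry from that list concrete, here is a minimal before/after illustration of loop-invariant code motion (a toy example, not Mesa code): the product `scale * bias` does not depend on the loop index, so the compiler hoists it out and computes it once.

```c
/* Before: the invariant product is recomputed on every iteration. */
static void transform_naive(float *v, int n, float scale, float bias)
{
   for (int i = 0; i < n; ++i)
      v[i] *= scale * bias;        /* invariant, recomputed n times */
}

/* After loop-invariant code motion: computed once before the loop. */
static void transform_hoisted(float *v, int n, float scale, float bias)
{
   const float k = scale * bias;   /* hoisted outside the loop */
   for (int i = 0; i < n; ++i)
      v[i] *= k;
}
```

Every one of the listed transformations needs this kind of analysis (which values are invariant, which accesses alias), which is exactly the machinery a real compiler framework already provides.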
Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler
As a further example that just came to mind, nv40 (GeForce 6-7 and PS3 RSX) supports control flow in fragment shaders, but does not apparently support the continue keyword (since NV_fragment_program2, which maps almost directly to the hardware, does not have it either).

I implemented TGSI control flow in a private branch, but did not implement the continue keyword. Implementing continue requires transforming the code to generate and carry around "should continue" flags, or performing even less trivial transformations including code duplication. Unfortunately, doing so requires non-local modifications, and thus would require doing something beyond just scanning the TGSI source code as the nv30/nv40 driver currently does.

If there were a TGSI->LLVM->TGSI module, the LLVM->TGSI control flow reconstruction would already handle this, and it would be enough to tell it to not make use of the continue instruction: it would then automatically generate the proper if/endif structure, duplicating code and/or introducing flags as needed in a generic way.

As things stand now, I'm faced with either just hoping that GLSL programs don't use continue, implementing a hack in the nv40 shader backend (where such a high-level optimization does not belong at all and can't be done cleanly), or writing the LLVM module myself before tackling this. With an LLVM-based infrastructure, there would be a clear and straightforward way to solve this, with all the supporting infrastructure already available and the ability to create an optimization pass reusable by other drivers that may face the same issue.

This is just an example, by the way: others can be found.
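The "should continue" flag transformation mentioned above can be sketched on a scalar toy example (not driver code): the part of the loop body after the continue point is guarded by a flag, which maps to hardware that has if/endif but no continue.

```c
/* Original loop body, using continue. */
static int sum_odd_with_continue(const int *v, int n)
{
   int sum = 0;
   for (int i = 0; i < n; ++i) {
      if ((v[i] & 1) == 0)
         continue;                       /* skip even values */
      sum += v[i];
   }
   return sum;
}

/* Lowered form: continue replaced by a flag guarding the rest of the body.
 * On a GPU the flag would be a per-pixel predicate/condition code. */
static int sum_odd_continue_lowered(const int *v, int n)
{
   int sum = 0;
   for (int i = 0; i < n; ++i) {
      int skipped = ((v[i] & 1) == 0);   /* "continue was taken" flag */
      if (!skipped)                      /* guard the remaining body */
         sum += v[i];
   }
   return sum;
}
```

With nested control flow the flag must additionally suppress all later statements in every enclosing block up to the loop, which is why doing this cleanly requires non-local rewriting.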
Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler
By the way, if you want a simple, limited and temporary, but very effective, way to optimize shaders, here it is:

1. Trivially convert TGSI to GLSL
2. Feed the GLSL to the nVidia Cg compiler, telling it to produce optimized output in ARB_fragment_program format
3. Ask the Mesa frontend/state tracker to parse the ARB_fragment_program and give you back TGSI

This does actually optimize the program well and does all the nice control flow transformations desired. If your GPU can support predicates or condition codes, you can also ask the Cg compiler for NV_fragment_program_option output, which will use them efficiently. If it also supports control flow, you can ask for NV_fragment_program2 and get control flow too where appropriate.

Of course, if this does not happen to do exactly what you want, you are totally out of luck, since it is closed source. With an ad-hoc TGSI optimizer, you can modify it, but that will often require rearchitecting the module, since it may be too primitive for the new feature you want, and implementing everything from scratch with no supporting tools to help you.

With a real compiler framework, you already have the optimization ready for use, or you at least have a comprehensive conceptual framework and IR, and a full set of analyses, frameworks and tools to use, not to mention a whole community of compiler developers who can at least tell you the best way of doing what you want (actually giving out competent advice), if they have not already done it or planned to do it themselves.
Re: [Mesa3d-dev] gallium-util-format-is-supported
> I don't agree with this. Making the format description table mutable when the only formats that are potentially unsupported due to patent issues are s3tc variants makes no sense. S3TC formats *are* special. There is nothing to generalize here.

Yes, I don't like this very much either. The immediate alternative is to have separate is_supported flags for externally-implemented formats, but this also doesn't look perfect to me.

Another thing to look at is to remove both is_supported and the pack/unpack functions, and put them in a separate, possibly mutable, table. In some sense pack/unpack functionality does not really belong in the format description, since many interfaces are possible (for instance llvmpipe has another interface that is code-generated separately for SoA tiles). This last option, with a mutable format access table, seems conceptually the cleanest to me, but I'm not sure.

> Replacing the conditionals with no-op stubs is a good optimization. But attempting to load the s3tc shared library from the stubs is unnecessary. Stubs should have an assert(0) -- it is an error to attempt any S3TC (de)compression when there's no support for it.

The fundamental issue here seems to be: what to do if the application tries to read/write an unsupported format? Currently, unsupported formats have empty functions rather than assert(0), so I just kept with that convention. Since it is permissible to call other format functions without checking that they are supported, I made S3TC work consistently with that, which requires on-demand loading upon format access.

In general it seems to me that the fact that S3TC (or any other) formats are somehow special should be completely hidden from any user. This makes it possible to write generic, robust, format-independent code. Explicit initialization or ad-hoc format checking goes counter to this, and requires sprinkling code everywhere (for instance, I suspect the rbug texture-examination tools don't work right now in master on S3TC because they don't call util_format_s3tc_init).

It might make sense to make all unsupported formats assert(0). A C++ exception would be the perfect thing, since you could catch it, but unfortunately we aren't using C++ right now.

Another option that seems better to me is to have a util_format_get_functions that would either give you a pointer to a table of functions, or return NULL if unsupported, and make this the only way of accessing format conversions. This way, applications will naturally have to check for support before usage, and both GCC and a static checker can be told to flag an error if the util_format_get_functions return value is not checked for NULL before use.

BTW, note that indirect function calls are also generally slow, and we may want to switch Gallium to C++ and use C++ templates to specialize (and fully inline) whole algorithms for specific formats. llvmpipe and the code generation facilities lessen the need for this, but it might perhaps be worthwhile in some cases. This is a wholly separate issue, but may be worth keeping in mind.
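The util_format_get_functions idea floated above can be sketched as follows — all pack/unpack access goes through a table pointer that is NULL for unsupported formats, so callers cannot forget the support check. All names here are hypothetical illustrations, not the actual Gallium API.

```c
#include <stddef.h>

/* Hypothetical function table, standing in for the full set of
 * pack/unpack entry points a format would expose. */
struct format_funcs {
   void (*pack_8unorm)(unsigned char *dst, const float *src);
};

static void
pack_r8_8unorm(unsigned char *dst, const float *src)
{
   dst[0] = (unsigned char)(src[0] * 255.0f + 0.5f); /* clamp omitted for brevity */
}

enum toy_format { TOY_FORMAT_R8, TOY_FORMAT_DXT1 };

/* The single access point: NULL means "unsupported", e.g. the external
 * S3TC library is not available, so the caller must check. */
static const struct format_funcs *
toy_format_get_functions(enum toy_format format)
{
   static const struct format_funcs r8_funcs = { pack_r8_8unorm };
   switch (format) {
   case TOY_FORMAT_R8:
      return &r8_funcs;
   case TOY_FORMAT_DXT1:
      return NULL;
   }
   return NULL;
}
```

GCC's warn_unused_result attribute (or a static checker) can then flag any caller that dereferences the table without testing it first.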
Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler
I agree with you that doing these kinds of optimizations is a difficult task, but I am trying to focus my proposal on emulating branches and loops for older hardware that doesn't have branching instructions, rather than performing global optimizations on the TGSI code. I don't think most of the loop optimizations you listed are even possible on hardware without branching instructions.

Yes, that's possible. In fact, if you unroll loops, those optimizations can be done after loop unrolling. This does not necessarily change things, however: while you can e.g. avoid loop-invariant code motion, you still need common subexpression elimination to remove the multiple redundant copies of the loop-invariant code generated by unrolling. Also, even loop unrolling needs to find the number of iterations, which at the very least requires simple constant folding, and potentially a whole suite of complex optimizations to work in all possible cases.

Some of the challenges of this were mentioned in a previous thread, as well as LLVM-related issues: (2) Write an LLVM-TGSI backend, restricted to programs without any control flow

I think (2) is probably the closest to what I am proposing, and it is something I can take a look at.

Note that this means an _input_ program without control flow, that is, a control flow graph with a single basic block. Once you have more than one basic block, you need to convert the CFG for an arbitrary graph to something made of structured loops and conditionals. The problem here is that GPUs often use a SIMT approach. This means that the GPU internally works like an SSE CPU with vector registers (but often much wider, with up to 32 elements or even more).
However, this is hidden from the programmer by putting the variables related to several pixels in the vector, making you think everything is a scalar or just a 4-component vector. This works fine as long as there is no control flow; however, when you reach a conditional jump, some pixels may want to take one path and others another path. The solution is to have an execution mask and not write to any pixels that are not in the execution mask. When an if/else/endif structure is encountered, if the pixels all take the same path, things work like on CPUs; if that is not the case, both branches are executed with the appropriate execution masks, and things continue normally after the endif. The problem here is that this needs a structured if/else/endif formulation as opposed to arbitrary gotos. However, LLVM and most optimizers work in an arbitrary-goto formulation, which needs to be converted to a structured approach. The above all applies to GPUs with hardware control flow. However, even without it, you have the same issue of reconstructing if/else/endif blocks, since you need to do basically the same thing in software, using the if condition to choose between results computed by the branches. Converting a control flow graph to a structured program is always possible, but doing it well requires some thought. In particular, you need to be careful not to break DDX instructions, which operate on a 2x2 block of pixels, and will thus behave differently if some of the other pixels have diverged away due to control flow modifications. This may require making sure control flow optimizations do not duplicate them, and possibly other things. Using an ad-hoc optimizer does indeed sidestep the issue, but only as long as you don't try to do non-trivial control flow optimizations or changes. In that case, those may be best expressed on an arbitrary control flow graph (e.g. the issue with converting continue to if/end), and at that point you would need to add that logic anyway.
At any rate, I'm not sure whether this is suitable for your GSoC project or not. My impression is that using an existing compiler would prove more widely useful and longer lasting, especially considering that we are moving towards applications and hardware with very complex shader support (consider CUDA/OpenCL shaders and the very generic GPU shading capabilities). An ad-hoc TGSI optimizer will probably prove unsuitable for efficient code generation for, say, scientific applications using OpenCL, and would need to be replaced later. So my personal impression (which could be wrong) is that using an existing optimizer, while possibly requiring a higher initial investment, should have much better payoffs in the long run, by making everything beyond the initial TGSI-LLVM-TGSI work already done or easier to do. From a coding perspective, you lose the "design and write everything myself from scratch" aspect, but you gain experience with a complex, real-world compiler, and are able to write more complex optimizations and transforms thanks to having a well-developed infrastructure for expressing them easily. Furthermore, hopefully using a real compiler would result in seeing your work produce very good code in all cases, while an ad-hoc optimizer would improve the
Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler
Gallium. Obviously a code generator that can handle control flow (to be honest I'm really not sure why you want to restrict it to something without control flow in the first place).

The no-control-flow restriction was just for the first step, with a second step supporting everything.

Having said that, I'm not sure whether this is something that's a good GSOC project. It's a fairly difficult piece of code to write. One that, to do right, will depend on adding some features to TGSI (a good source of inspiration for those would be AMD's CAL and NVIDIA's PTX http://developer.amd.com/gpu_assets/ATI_Intermediate_Language_(IL)_Specification_v2b.pdf http://www.nvidia.com/content/CUDA-ptx_isa_1.4.pdf )

This would be required to handle arbitrary LLVM code (e.g. for clang/OpenCL use), but since GLSL shader code starts as TGSI, it should be possible to convert it back without them.

I thought the initial proposal was likely a lot more feasible for a GSOC (of course there one has to point out that Mesa's GLSL compiler already does unroll loops and in general simplifies control flow, so points #1 and #2 are largely no-ops, but surely there's enough work on Gallium Radeon's drivers left to keep Tom busy). Otherwise having a well-defined and reduced scope with clear deliverables would be rather necessary for LLVM-TGSI code, because that is not something that you could get rock solid over a summer.

I'd say, as an initial step, restricting to code produced by TGSI-LLVM (AoS) that can be expressed with no intrinsics, having a single basic block, with no optimization passes having been run on it. All 4 restrictions (from TGSI-LLVM, no intrinsics, single BB and no optimizations) can then be lifted in successive iterations. Of course, yes, it has a different scope than the original proposal. The problem I see is that since OpenCL will hopefully be done at some point, then as you say TGSI-LLVM will also be done, and that will probably make any other optimization work irrelevant.
So basically the r300 optimization work looks doomed from the beginning to be eventually obsoleted. That said, you may want to do it anyway. But if you really want a quick fix for r300, seriously, just use the nVidia Cg compiler. It's closed source, but being produced by the nVidia team, you can generally rely on it not sucking. It takes GLSL input and spits out optimized ARB_fragment_program (or optionally other languages), so it is trivial to interface with. It could even be useful to compare the output/performance of that with a more serious LLVM-based solution, to make sure we get the latter right. For instance, personally, I did work on the nv30/nv40 shader assembler (note the word assembler here), and haven't done anything more than simple local transforms, for exactly this reason. The only thing I've done for LLVM-TGSI is trying to recover Stephane Marchesin's work on LLVM (forgot to CC him too), lost in a hard drive crash, but I failed to find anyone who had pulled it.
Re: [Mesa3d-dev] RFC: GSOC Proposal: R300/Gallium GLSL Compiler
So basically the r300 optimization work looks doomed from the beginning to be eventually obsoleted.

Please consider there are hw-specific optimizations in place which I think no other compiler framework provides, and I believe this SSA thing will do

Sure, but it seemed to me that all the optimizations proposed were hardware-independent and valid for any driver (other than having to know about generic capabilities like having control flow or not).

even better job for superscalar r600. So yes, we need both LLVM to do global optimizations and RC to efficiently map code to hw.

LLVM also uses SSA form (actually, it is totally built around it), assuming that's what you meant. There are doubts about whether the LLVM backend framework works well for GPUs or not (apparently because some GPUs are VLIW and only IA-64 is VLIW among LLVM's CPU targets, so LLVM support for VLIW is either nonexistent or not necessarily a major focus), but using LLVM-TGSI makes this irrelevant, since the existing TGSI-accepting backend will still run.
Re: [Mesa3d-dev] gallium-resources branch merge
How about merging gallium-buffer-usage-cleanup as well, which is based on gallium-resources? Unless it changed recently, the gallium-resources branch left a mix of old PIPE_BUFFER_USAGE_* and new PIPE_TRANSFER_* flags. It would be nice to convert drivers with both branches merged, so that it is done once. However, note that I may be misunderstanding the branches; correct me if I'm wrong.
Re: [Mesa3d-dev] How do we init half float tables?
FWIW, I don't see any new s3tc formats. rgtc will not be handled by the s3tc library since it isn't patent encumbered. util_format_is_s3tc will not include rgtc formats. (Though I guess that external decoding per-pixel is really rather lame, should do it per-block...) Yes, the other formats (rgtc and bptc) have no patent claims listed.
Re: [Mesa3d-dev] How do we init half float tables?
So far, there are no dependencies on Gallium in core Mesa. We've talked about refactoring some of the Gallium code into a separate module that'd be sharable between Gallium and Mesa. The format code would probably fit into that.

Can't we just unconditionally pull gallium/auxiliary into Mesa? (unused stuff will be ignored by the linker due to .a behavior)
Re: [Mesa3d-dev] How do we init half float tables?
What are you seeing a regression on? texcompress and texcompsub seemed to work for me: I'll try to test something else and recheck the code.
Re: [Mesa3d-dev] How do we init half float tables?
Sorry for the regression. This whole thing was done to fix the u_gctors.cpp issue, originally done by me, sent out without full testing since I saw duplicate work being done, and then merged by Roland if I recall correctly. I probably should not have fixed s3tc/util_format like it was done for u_half, and should instead have put it in a branch and sent it to the ML first. Note that everything that reads pixels and does not call util_format_s3tc_init (e.g. I think the rbug tools) needs something like this, or an explicit call which is likely to be forgotten (even finding out everything that ends up calling util_format is nontrivial). Anyway, this patch fixes a couple of bugs that may have caused the regression. How can I reproduce it locally? The DXTn unit tests do fail, but the values usually have a difference of 1, so I assume it's an approximation error.

commit 80214ef6265d406496dc4fd3c76d8ac782cd012b
Author: Luca Barbieri <l...@luca-barbieri.com>
Date:   Sat Apr 3 01:55:27 2010 +0200

    gallium/util: fix inverted if is_nop logic in s3tc

diff --git a/src/gallium/auxiliary/util/u_format_s3tc.c b/src/gallium/auxiliary/util/u_format_s3tc.c
index d48551f..7808210 100644
--- a/src/gallium/auxiliary/util/u_format_s3tc.c
+++ b/src/gallium/auxiliary/util/u_format_s3tc.c
@@ -303,7 +303,7 @@ util_format_dxt3_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const
 void
 util_format_dxt5_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned width, unsigned height)
 {
-   if (is_nop(util_format_dxt5_rgba_fetch)) {
+   if (!is_nop(util_format_dxt5_rgba_fetch)) {
       unsigned x, y, i, j;
       for(y = 0; y < height; y += 4) {
          const uint8_t *src = src_row;
@@ -324,7 +324,7 @@ util_format_dxt5_rgba_unpack_8unorm(uint8_t *dst_row, unsigned dst_stride, const
 void
 util_format_dxt1_rgb_unpack_float(float *dst_row, unsigned dst_stride, const uint8_t *src_row, unsigned src_stride, unsigned width, unsigned height)
 {
-   if (is_nop(util_format_dxt1_rgb_fetch)) {
+   if (!is_nop(util_format_dxt1_rgb_fetch)) {
       unsigned x, y, i, j;
       for(y = 0; y < height; y += 4) {
          const uint8_t *src = src_row;
Re: [Mesa3d-dev] How do we init half float tables?
One more thing: I'm maintaining the u_format* modules. I'm not speaking just in the long term, but in the sense that I'm actually working on this as we speak. Please do not make these kinds of deep-reaching changes to the u_format stuff in master without clearing them first with me.

Yes, sorry; it was an attempt to fix breakage originally caused by code of mine that was sent out in a non-fully-mergeable state (to prevent duplicate work on half float conversion) and got merged anyway. Since master was already broken (due to u_gctors.cpp not being picked up by ld), it seemed a good idea to try to fix it. Unfortunately, what seemed to be an easy fix gradually became something much more invasive than originally envisioned. After realizing the util_format_init thing wouldn't work out, I should have made these call util_format_s3tc_init again (it was changed so they would init util_half as well) and then sent the util_format changes for review. I added a gallium-util-format-is-supported branch to hold the work and the fix I just sent. Sorry for not doing that in the first place.
Re: [Mesa3d-dev] How do we init half float tables?
The s3tc-teximage test seems fixed by the two-line change I put in gallium-util-format-is-supported. s3tc-texsubimage prints:

Mesa: User error: GL_INVALID_VALUE in glTexSubImage2D(xoffset+width)
Probe at (285,12)
  Expected: 1.00 0.00 0.00
  Observed: 0.00 0.00 0.00

which seems to be due to a Mesa or testcase bug. As for u_format_test.c, it looks like it simply fails to account for DXTn being lossy.
Re: [Mesa3d-dev] gallium raw interfaces
Once MS changes interfaces, then there's _no advantage_ to using DX10 internally, regardless of what WINE does, and one might as well use OpenGL. Wine doesn't change that.

[resent to ML, inadvertently replied only to Miles] Note that my proposal was not to use DirectX 10 internally, but rather to expose DirectX 10 and promote it, initially as an API to test Gallium and later as the preferred Linux graphics API instead of OpenGL, for the technical reason that a DirectX 10 implementation over Gallium carries much less performance overhead than an OpenGL implementation and is much simpler, due to the superior design of DirectX 10. Using an extended version of DirectX 10 internally could also be an option, but I don't think it's worth doing that right now, and likely it's not worth it at all. Also note that Microsoft does not use DirectX 10 or 11 internally either, but rather the DirectX 10 DDI (Device Driver Interface), which is also publicly documented. The last time Microsoft made an incompatible interface change (DX10), it was to move away from fixed pipeline support with scattered state towards a shader-only pipeline with constant state objects. Exactly the same change was achieved by the move from the classic Mesa architecture to the Gallium architecture: you could think of the move to Gallium as switching to something like DX10 internally, done purely for technical reasons, partially the same ones that prompted Microsoft to make the transition. Actually, while this is not generally explicitly stated by the Gallium designers, Gallium itself is generally evolving towards being closer to DirectX 10. The biggest deviations are additional features needed to support OpenGL features not included in DirectX 10.
For instance, looking at recent changes:
- Vertex element CSOs, recently added, are equivalent to DX10 input layouts
- Sampler views, also recently added, are equivalent to DX10 shader resource views
- Doing transfers per-context (recent work by Keith Whitwell) is what DX10 does
- Having a resource concept (also recent work by Keith Whitwell) is what DX10 does
- Gallium format values were changed from self-describing to a set of stock values like DX10
- Gallium format names were later changed and made identical to the DX10 ones (except that the names of the former start with PIPE_FORMAT_ and those of the latter with DXGI_FORMAT_, and the enumerated values are different)
- It has been decided to follow the DX9 SM3/DX10 model for shader semantic linkage as opposed to the OpenGL one

I recently systematically compared Gallium and DirectX 10, and found them to be mostly equivalent; the exceptions were usually either additional features Gallium has for the sake of OpenGL, or Gallium misdesigns that are being changed or looked into. This is likely not for the sake of imitating Microsoft, but just because they made a good design, having made the decision to redesign the whole API from scratch when creating DirectX 10. It's also probably because VMware is apparently funding DirectX 10 support over Gallium, which obviously makes all discrepancies evident to people working on that; and since those discrepancies generally exist because DirectX 10 is better, this leads those people to improve the Gallium design taking inspiration from DirectX 10. Presumably, if Microsoft were to change interfaces incompatibly again (notice that DX11 is a compatible change), Mesa would probably benefit from introducing a further abstraction layer similar to the new Microsoft interface and having a Gallium-NewLayer module, since such a change would most likely be the result of a paradigm shift in graphics hardware itself (e.g. a switch to fully software-based GPUs like Larrabee).
Also, unless Microsoft holds patents on DirectX 10 (which would be a showstopper, even though Gallium may violate them anyway), I don't see any difference between having to implement DirectX 10 or OpenGL, or any difference in the openness of the APIs. It is indeed possible to participate in the ARB standardization process, and some Mesa contributors/leaders do, but I'm not sure whether this is particularly advantageous: decisions that work well for Microsoft and Windows are also likely to work well for Linux/Mesa, since the hardware is the same and the software works mostly equivalently. And should some decisions not work well, it is technically trivial to provide an alternative API.
Re: [Mesa3d-dev] Mesa (master): gallium/util: add fast half float conversion functions
This constructor scheme is not working for me.

I think that's because there isn't any symbol here that's used elsewhere, hence the linker is not linking this file. I replaced the system with a different mechanism. It should now work correctly, but only GCC and MSVC are supported, and the latter is untested.

Please put copyright headers. *Especially* when basing your work on external references, as it gives the impression that this code was copied, and not your own creation.

Done: the code was a reimplementation from scratch of the code provided by them, with slight changes.
[Mesa3d-dev] How do we init half float tables?
The half float conversion code initially used a global C++ constructor to initialize the tables. That fails to work since ld does not get the symbol from the shared library, so I changed it to register a global constructor from C, using GCC- or MSVC-specific code. This is not necessarily the best option, but clearly putting a check in the functions, as Corbin did, is a bad idea performance-wise. So, how should this be done? Options are:

1. Revert Corbin Simpson's commit and, if anyone complains about an unsupported compiler, implement UTIL_INIT for that compiler too
2a. Write the init module in C++ and use portable global constructor syntax (but with other C++-related problems)
2b. Write an auxiliary file in C++ with a global constructor and reference it from the C init file so the static linker pulls it from the .a
3. Have all modules that need half float conversion directly or indirectly call the init functions in their init code
4. Make the build pregenerate the tables and ship them in the executables

Option 1: Pros: just works, and other auxiliary modules can use the same system. Cons: need to write UTIL_INIT for each compiler; only GCC and MSVC (and compatible ones) are currently supported.
Option 2a: Pros: no compiler-specific UTIL_INIT. Cons: significant code written in C++ instead of C, and you must link all targets with a C++ compiler or use compiler-specific options to prevent stuff like the G++ personality causing the link to fail.
Option 2b: Like option 2a, but Pro: less code written in C++. Con: need an extra C++ file for every module with data to be initialized.
Option 3: Pros: none of the cons of the other options. Cons: cumbersome to do; you must not forget to call the init function or you get silent corruption. Init calls creep through the whole codebase.
Option 4: Pros: no need to do anything at runtime, and pages can be shared between OpenGL apps. Cons: need to write a special table generator, all DRI drivers get 10KB larger, and the solution does not apply to all similar problems.

I would lean toward either option 1 or option 4; perhaps option 4, considering there seems to be disagreement over option 1. Option 4 however is likely not universally applicable (not everything can necessarily be pregenerated), so I'd keep the UTIL_INIT code anyway. Which one do we pick?
Re: [Mesa3d-dev] How do we init half float tables?
OK. I'd like to add that u_atomic.h already requires either GCC, MSVC or Solaris, and GCC and MSVC are already supported by this system. Thus we do indeed now have a simple way to do global constructors, which only removes support for non-GCC Solaris until someone figures out how to do it there. And it's relatively simple: you just have to figure out the section name of the global constructor table, and how to instruct the specific compiler to put a variable in a specific section. GCC even has __attribute__((constructor)), which does it all for you. At any rate, util_format_s3tc_init has similar issues, and is currently called from a few places. I think the best thing to do to implement your suggestion is adding a util_format_init that calls both init functions and leaving the UTIL_INIT code in place (since it seems we now got it right): it is easy to remove by deleting u_init.h if desired.
Re: [Mesa3d-dev] How do we init half float tables?
Are you sure about this? I've tried doing it, and it turns out that basically every Gallium module needs this initialized. For instance: st/mesa due to glReadPixels, vg/mesa due to vgReadPixels, st/python due to mesa sampling, several programs in rbug to dump textures to disk, softpipe due to texture sampling, nv50 due to static attributes. Also, if translate did not needlessly duplicate the generic format support, it would also need it, and draw would too. Basically everything in Gallium will end up having util_format initialized, and it seems there are at least 10 different places in the code where such a call would need to be added (including strange places like rbug, which calls pipe*tile*, which calls util_format_read*). I added it for nv50 before realizing the extent of the changes needed, but now think it is not really a feasible solution. In other words, I think this should be revisited, as it results in cluttering the codebase and creating a somewhat unreliable system. I believe that we should either use the global constructor-like technique I introduced, or do the following: 1. Pregenerate the half float tables 2. Initialize the S3TC function pointers to stubs that dlopen the library, initialize the function pointers to the real functions, and then delegate to the real function corresponding to the stub. More specifically, I think this two-step approach is superior to the global constructor, but the global constructor technique may be useful in other cases where it is not possible to either pregenerate or get a free initialization check the way the S3TC dynamic loading does.
Re: [Mesa3d-dev] gallium raw interfaces
WINE can deal with that. The real showstopper is that WINE also has to work on MacOS X and Linux + the NVIDIA blob, where Gallium is unavailable.

We could actually consider making a Gallium driver that uses OpenGL to do rendering. If the app uses DirectX 10, this may not significantly degrade performance, and should instead appreciably increase it if a Gallium driver is available. On the other hand, for DirectX 9 apps, this could decrease performance significantly (because DirectX 9 has immediate mode and doesn't require CSOs).
[Mesa3d-dev] Gallium half float conversion/support
[sent to ML too] Michal, I noticed you made some commits related to half float support in Gallium. I had already done this work and implemented a fast conversion algorithm capable of handling all cases, based on a paper cited in the commit, but hadn't gotten around to proposing it yet. I created a gallium-fast-half-float branch and put my work there, so it may be useful to you. Feel free to merge, rebase and/or adapt it against Mesa master. The conversion function itself has been tested separately from Gallium, but I haven't tested softpipe on fp16 data. Ideally we should find a way to have Mesa use this improved converter instead of the one it currently uses, but I'm not sure how to arrange this with the current build system.
Re: [Mesa3d-dev] Current tinderbox regression (swrastg_dri, ppc64)
Should be fixed now. BTW, if it is still not compiling due to the __sync* issues, try adding CFLAGS=-march=v9 to the build: it should fix that.
Re: [Mesa3d-dev] GSOC: Gallium R300 driver
Another idea was to convert TGSI to SSA form. That would make unrolling branches much easier, as the phi function would basically become a linear interpolation; loops and subroutines with conditional return statements might be trickier. The r300 compiler already uses SSA for its optimization passes, so maybe you wouldn't need to mess with TGSI that much... Is the conditional translation something that only needs to be done in the Gallium drivers, or would it be useful to apply the translation before the Mesa IR is converted into TGSI? Are any of the other drivers (Gallium or Mesa) currently doing this kind of translation? Not that I know of. You may theoretically do it wherever you want, even in the r300 compiler, leaving TGSI untouched, but I think most people would appreciate it if these translations were done in TGSI. It would be nice to have a driver-independent TGSI optimization module. It could either operate directly on TGSI (probably only good for simple optimizations), or convert to LLVM IR, optimize, and convert back. This would make it usable by all drivers: note that at least inlining and loop unrolling should generally be performed even for hardware with full control flow support. Lots of other optimizations would then be possible (using LLVM, with a single line of code to request the appropriate LLVM pass), and would automatically be available for all drivers, instead of being available only for r300 by putting them in the radeon compiler.
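The phi-becomes-interpolation idea above can be sketched in plain C: run both arms of the branch unconditionally, then blend the two phi inputs with a condition mask, which is how hardware without real branching flattens control flow. The arm computations here are made-up placeholders:

```c
/* Illustrative branch flattening: the SSA phi
 *   x = phi(then_val, else_val)
 * after an `if (cond)` becomes a select once both arms run
 * unconditionally.  With a 0.0/1.0 mask the select is literally a
 * linear interpolation between the two phi inputs. */
static float flatten_if(float cond, float a, float b)
{
    /* Both arms are computed (placeholder expressions)... */
    float then_val = a * 2.0f;   /* arm taken when cond != 0 */
    float else_val = b + 1.0f;   /* arm taken when cond == 0 */

    /* ...and the phi collapses to mask*then + (1-mask)*else. */
    float mask = (cond != 0.0f) ? 1.0f : 0.0f;
    return mask * then_val + (1.0f - mask) * else_val;
}
```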
Re: [Mesa3d-dev] GSOC: Gallium R300 driver
There are several deep challenges in making TGSI -> LLVM IR translation lossless -- I'm sure we'll get around to overcoming them -- but I don't think that using LLVM is a requirement for this module. Having a shared IR for a simple TGSI optimization module would go a long way by itself. What are these challenges? If you keep vectors and don't scalarize, I don't see why it shouldn't just work, especially if you just roundtrip without running any passes. The DAG instruction matcher should be able to match writemasks, swizzles, etc. fine. Control flow may not be exactly reconstructed, but I think LLVM has control flow canonicalization that should allow reconstructing a loop/if control flow structure of equivalent efficiency. Using LLVM has the obvious advantage that all optimizations have already been written and tested. And for complex shaders, you may really need a good full optimizer (one that can do inter-basic-block and interprocedural optimizations, alias analysis, advanced loop optimizations, and so on), especially if we start supporting OpenCL over TGSI. There is also the option of having the driver directly consume the LLVM IR, and the frontend directly produce it (e.g. clang supports OpenCL -> LLVM). Some things, like inlining, are easy to do directly in TGSI (but only because all regs are global). However, even determining the minimum number of loop iterations for loop unrolling is very hard to do without a full compiler. For instance, consider code like this:

if(foo >= 6) {
    if(foo == 1)
        iters = foo + 3;
    else if(bar == 1)
        iters = foo + 5 + bar;
    else
        iters = foo + 7;
    for(i = 0; i < iters; ++i)
        LOOP_BODY;
}

You need a non-trivial optimizer (with control flow support, value range propagation, and constant folding) to find out that the loop always executes at least 12 iterations, which you need to know to unroll it optimally. More complex examples are possible.
In general, anything that requires (approximately) determining any property of the program potentially benefits from having the most complex and powerful optimizer available.
Re: [Mesa3d-dev] GSOC: Gallium R300 driver
DDX/DDY could cause miscompilation, but I think that only happens if LLVM clones them or causes some paths to not execute them. Someone proposed some time ago on llvmdev to add a flag telling LLVM to never duplicate an intrinsic; not sure if that went through (iirc, it was for a barrier instruction that relied on the instruction pointer). Alternatively, it should be possible to just disable any passes that clone basic blocks if those instructions are present. The non-execution problem should be fixable by declaring DDX/DDY to have global-write-like side effects (this will prevent dead code elimination of them if they are totally unused, but hopefully shaders are not written so badly that they need that).
Re: [Mesa3d-dev] GSOC: Gallium R300 driver
On Tue, 2010-03-30 at 09:52 -0700, Luca Barbieri wrote: There are several deep challenges in making TGSI -> LLVM IR translation lossless -- I'm sure we'll get around to overcoming them -- but I don't think that using LLVM is a requirement for this module. Having a shared IR for a simple TGSI optimization module would go a long way by itself. What are these challenges? - Control flow as you mentioned -- gets broken into jump spaghetti. LoopSimplify seems to do at least some of the work for loops. Not sure if there is an if-construction pass, but it should be relatively easy. Once you have an acyclic CFG subgraph (which hopefully LoopSimplify easily gives you), every basic block with more than one outedge will need to have an if/else block generated. Now find the first block in topological sort order such that any path from the if start block reaches that block before any later ones in topological sort order. I think this is called the post-dominator, and LLVM should have an analysis that gives you that easily. After that, just duplicate the CFG between the if block start and the post-dominator to build each branch of the if, and recursively process the branches. If you have a DDX/DDY present in multiple if parts, you are screwed, but that won't happen without optimization, and hopefully you can tune fragment program optimization so that it doesn't happen at all. - Predicates can't be represented -- you need to use AND / NAND masking. I know people have asked for support for this on the LLVM list, so it might change someday. For the LLVM -> TGSI part, x86 has condition codes. Not sure how LLVM represents them, but I suppose predicates can be handled in the same way. Multiple predicate registers may not work well, but GPUs probably don't have them in hardware anyway (e.g. nv30/nv40 only have one or two). For the TGSI -> LLVM part, Mesa never outputs predicates afaik.
- missing intrinsics -- TGSI has a much richer instruction set than LLVM's builtin instructions, so it would be necessary to add several llvm.tgsi.xxx intrinsics (e.g., for max, min, madd, exp2, log2, etc), and teach LLVM to do constant propagation for every single one of them. Yes, of course. Initially you could do without constant propagation. Also, again, x86/SSE has many of the same intrinsics, so their approach can be imitated. I think MAD can be handled by mul + add, if you don't care about whether an extra rounding is done or not (and I think, for GPU shaders, it's not really a high priority issue). Anyway, SSE5 has fused multiply/add, so LLVM has/will have a way. - Constants -- you often want to make specialized versions of shaders for certain constants, especially when you have control flow statements whose arguments are constants (e.g., when doing TNL with a big glsl shader), and therefore they should be factored out. You also may want to factor out constant operations (e.g., MUL TEMP[1], CONST[0], CONST[1]). But LLVM can't help you with that, given that for LLVM IR constants are ordinary memory, like the inputs. LLVM doesn't know that a shader will be invoked millions of times with the same constants but varying inputs. If you want to do that, you must of course run LLVM for each constant set, telling it what the constant values are. You can probably identify branch-relevant constants from the LLVM SSA form to restrict that set. For the MUL TEMP[1], CONST[0], CONST[1] case, I suppose you could enclose the shader code in a big loop to simulate the rasterizer. LLVM will then move the CONST[0] * CONST[1] outside the loop, and you can codegen the part outside the loop using an LLVM CPU backend. In this case, using LLVM will give you automatic pre-shader generation for the CPU mostly for free.
Alternatively, you could have a basic IF-simplifier on TGSI that only supports conditionals comparing a constant to something else (using the rasterizer loop trick can allow you to get simpler conditionals). If people can make this TGSI optimization module work quickly on top of LLVM then it's fine by me. I'm just pointing out that between the extreme of sharing nothing between each pipe driver compiler, and sharing everything with LLVM, there's a middle ground which is sharing between pipe drivers but not LLVM. Once that module exists, having it use LLVM internally would then be pretty easy. It looks to me like a better way to parallelize the effort than to be blocked for quite some time on making TGSI -> LLVM IR be lossless. Yes, sure, a minimal module can be written first and then LLVM use can be investigated later. In other words, it's not necessarily trivial, but it definitely seems doable. In particular, getting it to work on anything non-GLSL should be relatively straightforward.
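The AND / NAND masking mentioned above for predicates can be sketched concretely: a predicated write keeps the old destination where the predicate is false, using an all-ones/all-zeros mask (illustrative C, not TGSI or LLVM code):

```c
#include <stdint.h>

/* Illustrative AND / NAND masking, emulating a predicated write:
 *   dst = pred ? src : dst
 * with a full-width mask instead of a branch or a predicate register. */
static uint32_t pred_write(uint32_t pred, uint32_t src, uint32_t dst)
{
    uint32_t mask = pred ? 0xffffffffu : 0u;  /* all ones or all zeros */
    return (src & mask) | (dst & ~mask);      /* AND the new, NAND the old */
}
```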
Re: [Mesa3d-dev] gallium raw interfaces
An interesting option could be to provide a DirectX 10 implementation using TGSI text as the shader interface, which should be much easier than one would think at first. DirectX 10 + TGSI text would provide a very thin binary-compatible layer over Gallium, unlike all existing state trackers. It could even run Windows games if integrated with Wine and something producing TGSI from either HLSL text or D3D10 bytecode (e.g. whatever Wine uses to produce GLSL + the Mesa GLSL frontend + st_mesa_to_tgsi). In fact, given the Gallium architecture, it may even make sense to support a variant of DirectX 10 as the main Mesa/Gallium API on all platforms, instead of OpenGL.
Re: [Mesa3d-dev] [Nouveau] [radeonhd] Re: Status of s3tc patent in respect to open-source drivers and workarounds
Interestingly, the post-trial judge opinion at http://wi.findacase.com/research/wfrmDocViewer.aspx/xq/fac.%5CFDCT%5CWWI%5C2008%5C20080801_734.WWI.htm/qx contains the following text: Plaintiff’s expert, Dr. Stevenson, testified that the ‘327 patent is directed to “a special purpose hardware component designed and optimized specifically for high speed graphics processing.” The specification makes it plain that the invention does not relate to software for graphics. As the inventors noted, such programs “are well known in the art.” [...] Claim 17 does not say in so many words that the method it discloses is a rasterization circuit operating on a floating point format, but that is what it describes. Reading the disputed claims as disclosing hardware is not reading a preferred embodiment in the claims; it is simply reading the claims as the person of ordinary skill would read a patent directed to special purpose hardware. This seems to indicate that it would be safe to implement floating point textures/framebuffers in Mesa, as both SGI and ATI and the court seemed to agree that the patent applies specifically to hardware.
Re: [Mesa3d-dev] Status of s3tc patent in respect to open-source drivers and workarounds
If the application provides s3tc-encoded data through glCompressedTexImage (usually loaded from a pre-compressed texture stored on disk), Mesa will pass it unaltered to the graphics card (as long as the driver/card supports DXT* format ids) and will not need to use any encoding or decoding algorithms. The problem is that if the application supplies uncompressed data, Mesa would need to run an encoding algorithm to be able to use compressed textures. Conversely, if software rendering is necessary, and the application provides compressed textures, Mesa will need to run a decoding algorithm to be able to sample from the texture. So the workaround (and what commercial games usually do) is to ship pre-compressed textures along with the game, as well as uncompressed textures in case the card/renderer does not support texture compression. An end-user side solution is to download, compile and install libtxc_dxtn.so, which Mesa will use if present to decode and encode compressed textures. Furthermore, a GPU can be used to do decoding using its native sampling support, and some may also support encoding.
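The decision logic described above can be condensed into a small sketch: only certain combinations of input format, desired format, and rendering path actually require a codec. This is an illustration of the paragraph's reasoning, not actual Mesa code; all names are made up:

```c
/* Illustrative summary of when an S3TC codec is needed (not Mesa code):
 * compressed input + hardware sampling needs no codec at all. */
enum s3tc_action { PASS_THROUGH, NEED_ENCODE, NEED_DECODE, PLAIN };

static enum s3tc_action choose_path(int data_is_compressed,
                                    int want_compressed,
                                    int software_rendering)
{
    if (data_is_compressed && software_rendering)
        return NEED_DECODE;    /* must sample the texture on the CPU */
    if (data_is_compressed)
        return PASS_THROUGH;   /* hand the DXT* blocks to the card */
    if (want_compressed)
        return NEED_ENCODE;    /* only this path needs an encoder */
    return PLAIN;              /* uncompressed in, uncompressed out */
}
```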
Re: [Mesa3d-dev] Current tinderbox regression (swrastg_dri, sparc64)
On Sun, Mar 28, 2010 at 7:36 PM, Chris Ball c...@laptop.org wrote: Hi, http://tinderbox.x.org/builds/2010-03-25-0018/logs/libGL/#build swrastg_dri.so.tmp: undefined reference to `__sync_sub_and_fetch_4' swrastg_dri.so.tmp: undefined reference to `__sync_add_and_fetch_4' This regression is still present -- could we get a fix or a revert? I believe the problem is that sparc does not support atomic operations in the basic architecture: I think someone who knows about sparc and has such a machine should look into it. If you don't know anything about sparc, try rebuilding with the highest possible sparc -march= level and if that fixes the problem, perform a binary search to find the minimum one, and then report the results. If it does not solve the problem, see if anything in /lib or /usr/lib exports those symbols. Also maybe check whether the built swrastg_dri or xlib softpipe actually works there.
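For reference, the missing symbols are GCC's __sync atomic builtins, which Gallium uses for reference counting. A minimal use that exercises them looks roughly like this (illustrative names, not the actual p_atomic_* code); on targets whose baseline ISA lacks atomics, GCC emits these as out-of-line calls that nothing provides, producing exactly the link errors quoted above:

```c
#include <stdint.h>

/* Minimal reference-counting use of the __sync builtins behind the
 * undefined __sync_*_4 symbols (illustrative, not Gallium code). */
static int32_t refcount_inc(int32_t *count)
{
    return __sync_add_and_fetch(count, 1);
}

static int32_t refcount_dec(int32_t *count)
{
    /* returns the new count; 0 means the object can be destroyed */
    return __sync_sub_and_fetch(count, 1);
}

static int32_t refcount_demo(void)
{
    int32_t c = 1;            /* object starts with one reference */
    refcount_inc(&c);         /* a second owner takes a reference */
    refcount_dec(&c);         /* first owner drops its reference */
    return refcount_dec(&c);  /* last owner drops: returns 0 */
}
```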
Re: [Mesa3d-dev] gallium raw interfaces
I posted something similar some time ago, which however could use hardware accelerated drivers with DRI2 or KMS, provided a substantial set of helpers and offered a set of 3 demos. My solution to window-system issues was to simply have the application provide a draw callback to the framework, which would automatically create a maximized window with the application name in the title, and call draw in a loop, presenting the results. Then I had a path that would use the X DRI2 interface if possible, and another path that would use the Linux DRM KMS API (and initially some EGL+ad-hoc extension paths that were later dropped). It no longer works due to Gallium interface changes, but maybe it can be resurrected and merged with graw. However, there is a disadvantage to having Gallium programs in-tree: they break every time the Gallium interface is changed, and avoiding that means that in addition to fixing all drivers and state trackers, you also need to fix all programs for each change.
Re: [Mesa3d-dev] Status of s3tc patent in respect to open-source drivers and workarounds
I just noticed a potentially interesting thing. nVidia publishes under the MIT license a software suite called nVidia texture tools. This includes a library called nvtt that provides DXT* compression, plus a library called nvimage that provides decompression. It looks like the libraries can be used unmodified and nVidia is almost surely a licensee of that patent. So, if using and shipping a possibly-patent-covered library published by a patent licensee does not risk violating the patent, we may have a workable solution.
Re: [Mesa3d-dev] Rationale of gallium-resources changes?
To me this speaks to another aspect of the gallium interface which is a bit odd -- in particular the way several of our interfaces basically copy their inputs into a structure and pass that back to the state tracker. Why are we doing that? The state tracker already knows what it asked us to do, and there is no reason to assume that it needs us to re-present that information back to it. Yes, only the CSOs don't have this form of copying: all other structures include the input parameters there. As a random example, pipe_sampler_view has lots of parameters that a driver would have converted into the hardware format and that are thus redundant and unlikely to be read by the state tracker. Textures and buffers also have many visible data members that the state tracker may or may not read. In particular, the Mesa state tracker already keeps everything in the Mesa internal structures, and so benefits little from such data. We may want to consider going toward making _all_ Gallium structures opaque (and, by the way, using declared-only structs instead of void* like we do for CSOs, which are not checkable by the compiler). Another serious data duplication issue are drivers that just copy the input state into internal structures and return, to then process everything in draw calls. This usually results in state being duplicated (and copied) 3 times: in Mesa internal structures, in the state tracker structures and then in the driver. The draw module may also keep a 4th copy of the state. Note that when reference counting is involved, copies are even more expensive since they now need atomic operations. Usually drivers do this because: 1. They need to pass data to the draw module in case of fallbacks, and thus cannot send it to hardware and forget about it 2. They need to recreate the whole hardware context state in some cases 3. They multiplex multiple pipe_contexts on a single screen 4.
They need a global view of state, rather than a single state change at a time, to decide what to do. A possible solution is to remove all set_* and bind_* calls and replace them with data members of pipe_context that the state tracker would use instead of its own internal structures. In addition, a new "what's new" bitmask would be added, and the driver would check it on draw calls. Performance-wise, this replaces num_state_changes dynamic function calls to the driver with (log2(total_states) + num_state_changes) branches to check the "what's new" bitmask. Furthermore: 1. State is never copied, since the state tracker constructs it in place 2. There is no longer any need for state-save helpers in the blitter module and similar 3. The draw module can potentially directly read state from pipe_context instead of duplicating it yet again 4. Drivers no longer need to have all the functions that store the parameters, set a dirty flag and return. Note that the Direct3D DDI does not do this, but they have to keep binary compatibility, which is easier with Set* calls than with this scheme. softpipe, nvfx, nv50, r300 and probably others already do this internally, and having the state tracker itself construct the data would remove a lot of redundant copying code and increase performance. Having drivers capable of doing send-to-hardware-and-forget-about-it on arbitrary state setting could be a nice thing instead, but unfortunately a lot of hardware fundamentally can't do this, since for instance: 1. Shaders need to all be seen to be linked, possibly modifying the shaders themselves (nv30) 2. Constants need to be written directly into the fragment program (nv30-nv40) 3. Fragment programs depend on the viewport to implement fragment.position (r300) 4. Fragment programs depend on bound textures to specify normalization type and emulate NPOT (r300, r600?, nv30) and so on... 5.
Sometimes sampler state and textures must be seen together, since the hardware mixes them. The only really new information provided by the driver to the state tracker by transfer_create + transfer_map is: - the pointer to the data - stride - slice stride There is also the 3D box, unless transfers start covering the whole resource, which seems really suboptimal for stuff like glTexSubImage. This needs to be provided to the driver unless a buffer-specialized interface is made (then a 1D box is enough). Thanks for the summary. I'd add that there is also some information available publicly about the D3D10 DDI, which follows a slightly different interface to the API. In that world, there is a single create resource function: It is indeed extremely interesting, and it looks like it should be the first place to look for inspiration for the Gallium interface. I added a comparison of the D3D11 DDI and Gallium to src/gallium/docs. There is however clearly concern about the possible need for specialized transfer mechanisms for particular buffer types. It seems like they've taken an approach that leaves the choice to the driver whether to specialize or not --
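The "state in the context plus a what's-new bitmask" scheme proposed earlier in this thread can be sketched as follows; all the names are illustrative, not an actual or proposed Gallium interface:

```c
#include <stdint.h>

/* Sketch of the proposal: state lives directly in the context, the
 * state tracker writes it in place and sets a dirty bit, and the
 * driver re-validates only what changed at draw time. */
enum {
    DIRTY_BLEND    = 1u << 0,
    DIRTY_VIEWPORT = 1u << 1,
};

struct demo_context {
    uint32_t dirty;
    int blend_enable;
    int validations;              /* states re-emitted to "hardware" */
};

static void demo_set_blend(struct demo_context *ctx, int enable)
{
    ctx->blend_enable = enable;   /* constructed in place, no copy */
    ctx->dirty |= DIRTY_BLEND;
}

static void demo_draw(struct demo_context *ctx)
{
    if (ctx->dirty & DIRTY_BLEND)
        ctx->validations++;       /* emit blend state */
    if (ctx->dirty & DIRTY_VIEWPORT)
        ctx->validations++;       /* emit viewport state */
    ctx->dirty = 0;               /* hardware now matches the context */
}

static int dirty_demo(void)
{
    struct demo_context ctx = {0, 0, 0};
    demo_set_blend(&ctx, 1);
    demo_draw(&ctx);              /* validates the one dirty state */
    demo_draw(&ctx);              /* nothing dirty: validates nothing */
    return ctx.validations;
}
```

The draw-time check is where the claimed cost model comes from: a handful of bit tests replaces one driver call per state change.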
Re: [Mesa3d-dev] Current tinderbox regression (swrastg_dri, sparc64)
Are you sure that swrastg and/or any Gallium driver actually load correctly and work on sparc64? This seems to indicate that they use __sync_add_and_fetch_4 assuming it is a GCC builtin, but GCC does not implement it as a builtin on sparc64 and neither libgcc nor libc has an implementation of the function. I don't know anything about sparc64, but according to the linux kernel, I vaguely guess that specifying a high enough -march= to gcc could solve it by enabling use of atomic instructions that are otherwise not used. The root cause is likely that we set PIPE_ATOMIC_GCC_INTRINSIC even though not all __sync builtins are actually supported: we should probably fix that.
Re: [Mesa3d-dev] Rationale of gallium-resources changes?
Thanks for providing a long insightful reply. Transfers can then be split into texture transfers and buffer transfers. Note that they are often inherently different, since one often uses memcpy-like GPU functionality, and the other often uses 2D blitter or 3D engine functionality (and needs to worry about swizzling or tiling). Thus, they are probably better split and not unified. My experience is that there is more in common than different about the paths. There are the same set of constraints about not wanting to stall the GPU by mapping the underlying storage directly if it is still in flight, and allocating a dma buffer for the upload if it is. There will always be some differences, but probably no more than the differences between uploading to e.g. a constant buffer and a vertex buffer, or uploading to a swizzled and a linear texture. The considerations you mentioned are indeed common between buffers and textures, but the actual mechanisms for performing the copy are often significantly different. For instance, r300g ends up calling the 3D engine via surface_copy -> util_blitter for texture transfers, which I suppose it wouldn't do for buffer transfers. nv30/nv40 don't have a single way to deal with swizzled textures, and the driver must choose between many paths depending on whether the source/destination is swizzled or not, a 3D texture or not, and even its alignment or pitch (the current driver doesn't fully do that, and is partially broken for this reason). Buffers can instead be copied very simply with MEMORY_TO_MEMORY_FORMAT. nv50 does indeed have a common copy functionality that can handle all buffers and textures in a unified way (implemented as a revamped MEMORY_TO_MEMORY_FORMAT). However, an additional buffer-only path would surely be faster than going through the common texture path.
In particular, for buffers tile_flags are always 0 and height is always 1, allowing one to write a significantly simplified buffer-only version of nv50_transfer_rect_m2mf with no branches and no multiplications at all. In other words, I think most drivers would be better off implementing unified transfers with an if switching between a buffer and a texture path, so it may be worth using two interfaces. Also note that a buffer-only interface is significantly simplified, since you don't need to specify: face, level, zslice, y, height, z, depth, stride, or slice stride. While this may seem a micro-optimization, note that 3D applications often spend all their time running the OpenGL driver, and Mesa/Gallium functions are already too heavy in profiles, so I think it's important to always keep CPU performance in mind. The code is also streamlined and easier to follow if it does not have to default-initialize a lot of stuff. A utility function calling the right interface can be created for state trackers that really need it (maybe Direct3D10, if the driver interface follows the user API). In DX they have different nomenclature for this - the graphics API level entities are resources and the underlying VMM buffers are labelled as allocations. In gallium, we're exposing the resource concept, but allocations are driver-internal entities, usually called winsys_buffers, or some similar name. D3D10 uses buffers, sampler views and render target views as entities bindable to the pipeline, and the latter are constructed over either textures or buffers. Note however that the description structure is actually different in the buffer and texture cases. For render target views, they are respectively D3D10_BUFFER_RTV and D3D10_TEX2D_RTV (and others for other texture types). The first specifies an offset and stride, while the second specifies a mipmap level. Other views have similar behavior. Buffers are directly used in the interfaces that allow binding vertex/index/constant buffers.
Both buffers and textures are subclasses of ID3D10Resource, which is used by CopyResource, CopySubresourceRegion and UpdateSubresource, which provide a subset of the Gallium transfer functionality in gallium-resources. Note however that the two resources specified to CopyResource and CopySubresourceRegion must be of the same type. So in summary, D3D10 does indeed in some sense go in the direction of buffer/texture unification, but with some important differences: 1. Buffers and textures still exist as separate types. Note that there is no texture type, but rather a separate interface for each texture type, which directly inherits from ID3D10Resource 2. Textures are never used directly by the pipeline, but rather through views, which have texture-type-specific creation methods and separate interfaces 3. Buffers are directly used by the pipeline for vertex, index and constant buffers 4. Resources are used in copying and transfer functionality 5. D3D10 has a more memory-centric view of resources, providing for instance a D3D10_USAGE_STAGING flag, for "A resource that supports data transfer (copy) from the GPU to the CPU". D3D11 seems to
Re: [Mesa3d-dev] Segfault on glClear of non-existent stencil buffer caused by bd1ce874728c06d08a1f9881f51edbdd2f1c9db0
We have a visual with haveStencilBuffer == 1 but stencilBits == 0 (and no stencil renderbuffer), which I suppose shouldn't be happening. visualID and fbconfigID are also 0. Here is the full structure: $1 = {next = 0x0, rgbMode = 1 '\001', floatMode = 0 '\000', colorIndexMode = 0 '\000', doubleBufferMode = 1, stereoMode = 0, haveAccumBuffer = 0 '\000', haveDepthBuffer = 1 '\001', haveStencilBuffer = 1 '\001', redBits = 8, greenBits = 8, blueBits = 8, alphaBits = 8, redMask = 0, greenMask = 0, blueMask = 0, alphaMask = 0, rgbBits = 32, indexBits = 0, accumRedBits = 0, accumGreenBits = 0, accumBlueBits = 0, accumAlphaBits = 0, depthBits = 24, stencilBits = 0, numAuxBuffers = 0, level = 0, pixmapMode = 0, visualID = 0, visualType = 0, visualRating = 0, transparentPixel = 0, transparentRed = 0, transparentGreen = 0, transparentBlue = 0, transparentAlpha = 0, transparentIndex = 0, sampleBuffers = 0, samples = 0, drawableType = 0, renderType = 0, xRenderable = 0, fbconfigID = 0, maxPbufferWidth = 0, maxPbufferHeight = 0, maxPbufferPixels = 0, optimalPbufferWidth = 0, optimalPbufferHeight = 0, visualSelectGroup = 0, swapMethod = 0, screen = 0, bindToTextureRgb = 0, bindToTextureRgba = 0, bindToMipmapTexture = 0, bindToTextureTargets = 0, yInverted = 0} BTW, what's the purpose of having haveStencilBuffer at all? Isn't checking stencilBits != 0 enough?
Re: [Mesa3d-dev] Segfault on glClear of non-existent stencil buffer caused by bd1ce874728c06d08a1f9881f51edbdd2f1c9db0
The problem seems to be in st_manager.c:

if (visual->depth_stencil_format != PIPE_FORMAT_NONE) {
   mode->haveDepthBuffer = GL_TRUE;
   mode->haveStencilBuffer = GL_TRUE;
   mode->depthBits =
      util_format_get_component_bits(visual->depth_stencil_format,
                                     UTIL_FORMAT_COLORSPACE_ZS, 0);
   mode->stencilBits =
      util_format_get_component_bits(visual->depth_stencil_format,
                                     UTIL_FORMAT_COLORSPACE_ZS, 1);
}

This sets haveStencilBuffer even for depth-only formats. How about fixing this to set haveDepthBuffer and haveStencilBuffer only if the respective bit counts are > 0, and later removing haveStencilBuffer, haveDepthBuffer and haveAccumBuffer in favor of just testing their *bits counterparts? BTW, what if we have a floating-point depth buffer, or, say, a shared exponent floating-point color buffer? How do we represent that with the visual structure? Shouldn't we be using the actual formats instead of this *bits stuff, maybe by having Mesa look at its internal structures instead of a GLXVisual-like struct?
Re: [Mesa3d-dev] Current tinderbox regression (dri)
The issue should hopefully be completely fixed by 7e246e6aa63979d53731a591f4caee3651c1d96b.
Re: [Mesa3d-dev] Current tinderbox regression (dri)
On Tue, Mar 23, 2010 at 10:45 PM, Sedat Dilek sedat.di...@googlemail.com wrote:
>> The issue should hopefully be completely fixed by 7e246e6aa63979d53731a591f4caee3651c1d96b.
> Unfortunately, the build breaks here. Not sure which of the last changes really breaks it.

Hopefully fixed that too now.
Re: [Mesa3d-dev] Current tinderbox regression (dri: fix dri_test.c for non-TLS build)
According to the logs, that build was not based on that commit, which actually fixes this issue. http://tinderbox.x.org/builds/2010-03-23-0040/ was the first tinderbox build using it, and it went past this issue, only to fail on an xeglthreads problem, which is unrelated. Thanks anyway for reporting this.
Re: [Mesa3d-dev] DRI SDK and modularized drivers.
It may seem, e.g., that the DRM interface is the worst off, because of the rather large threads caused by a certain kernel developer's problems, but that doesn't mean problems wouldn't be created by splitting other areas. This would probably be best solved by merging libdrm into the Linux kernel tree. Ingo Molnar's rationale for having tools/perf in the kernel tree applies even more in this case.
Re: [Mesa3d-dev] [PATCH] dri: test whether the built drivers have unresolved symbols
How about applying this? It should prevent introducing regressions similar to ones that happened in the past, with very little downside.
Re: [Mesa3d-dev] [PATCH] dri: test whether the built drivers have unresolved symbols
> Can we just put this program in the demos? Or at least just make it a separate target (make test-link)? It seems excessive to make this part of the default build path.

The whole purpose is to run this as part of the standard build, so that the build fails if any driver is unloadable (i.e., a modification to it was botched) and the tree hopefully doesn't get pushed to master. You can test it separately by just running glxinfo/glxgears, obviously.
Re: [Mesa3d-dev] [PATCH] dri: test whether the built drivers have unresolved symbols
> For developers that makes a lot of sense, but I've never seen any other projects impose this type of thing on regular users.

Why do you see it as an onerous imposition? It just tries to compile a program linked with a couple of libraries (the DRI driver, plus libGL) and makes the build fail if that fails. It doesn't even execute the built program (and could not always do so even if that were desired, since you could be cross-compiling).
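For illustration, the load-time failure this check guards against is what dlopen() reports: with RTLD_NOW, all relocations are resolved up front, so a module with unresolved symbols fails to load. This is a hypothetical stand-alone probe, not Mesa's actual loader code:

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Returns 1 if the module at `path` can be fully resolved and loaded.
 * RTLD_NOW forces eager relocation, so unresolved symbols fail here
 * instead of at first call.  The driver path argument is a made-up
 * example, not a path Mesa actually uses. */
static int is_loadable(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (!handle) {
        fprintf(stderr, "not loadable: %s\n", dlerror());
        return 0;
    }
    dlclose(handle);
    return 1;
}
```

The build-time -Wl,--no-undefined check catches the same class of breakage earlier, at link time, without having to run anything.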
[Mesa3d-dev] [PATCH] nv40: remove leftover nv40_transfer.c from unification into nvfx
---
 src/gallium/drivers/nv40/nv40_transfer.c |  181 ------------------------------
 1 files changed, 0 insertions(+), 181 deletions(-)
 delete mode 100644 src/gallium/drivers/nv40/nv40_transfer.c

diff --git a/src/gallium/drivers/nv40/nv40_transfer.c b/src/gallium/drivers/nv40/nv40_transfer.c
deleted file mode 100644
index 3d8c8e8..0000000
--- a/src/gallium/drivers/nv40/nv40_transfer.c
+++ /dev/null
@@ -1,181 +0,0 @@
-#include "pipe/p_state.h"
-#include "pipe/p_defines.h"
-#include "util/u_inlines.h"
-#include "util/u_format.h"
-#include "util/u_memory.h"
-#include "util/u_math.h"
-#include "nouveau/nouveau_winsys.h"
-#include "nv40_context.h"
-#include "nvfx_screen.h"
-#include "nvfx_state.h"
-
-struct nv40_transfer {
-	struct pipe_transfer base;
-	struct pipe_surface *surface;
-	boolean direct;
-};
-
-static void
-nv40_compatible_transfer_tex(struct pipe_texture *pt, unsigned width, unsigned height,
-			     struct pipe_texture *template)
-{
-	memset(template, 0, sizeof(struct pipe_texture));
-	template->target = pt->target;
-	template->format = pt->format;
-	template->width0 = width;
-	template->height0 = height;
-	template->depth0 = 1;
-	template->last_level = 0;
-	template->nr_samples = pt->nr_samples;
-
-	template->tex_usage = PIPE_TEXTURE_USAGE_DYNAMIC |
-	                      NOUVEAU_TEXTURE_USAGE_LINEAR;
-}
-
-static struct pipe_transfer *
-nv40_transfer_new(struct pipe_context *pcontext, struct pipe_texture *pt,
-		  unsigned face, unsigned level, unsigned zslice,
-		  enum pipe_transfer_usage usage,
-		  unsigned x, unsigned y, unsigned w, unsigned h)
-{
-	struct pipe_screen *pscreen = pcontext->screen;
-	struct nvfx_miptree *mt = (struct nvfx_miptree *)pt;
-	struct nv40_transfer *tx;
-	struct pipe_texture tx_tex_template, *tx_tex;
-
-	tx = CALLOC_STRUCT(nv40_transfer);
-	if (!tx)
-		return NULL;
-
-	pipe_texture_reference(&tx->base.texture, pt);
-	tx->base.x = x;
-	tx->base.y = y;
-	tx->base.width = w;
-	tx->base.height = h;
-	tx->base.stride = mt->level[level].pitch;
-	tx->base.usage = usage;
-	tx->base.face = face;
-	tx->base.level = level;
-	tx->base.zslice = zslice;
-
-	/* Direct access to texture */
-	if ((pt->tex_usage & PIPE_TEXTURE_USAGE_DYNAMIC ||
-	     debug_get_bool_option("NOUVEAU_NO_TRANSFER", TRUE/*XXX:FALSE*/)) &&
-	    pt->tex_usage & NOUVEAU_TEXTURE_USAGE_LINEAR)
-	{
-		tx->direct = true;
-		tx->surface = pscreen->get_tex_surface(pscreen, pt,
-		                                       face, level, zslice,
-		                                       pipe_transfer_buffer_flags(&tx->base));
-		return &tx->base;
-	}
-
-	tx->direct = false;
-
-	nv40_compatible_transfer_tex(pt, w, h, &tx_tex_template);
-
-	tx_tex = pscreen->texture_create(pscreen, &tx_tex_template);
-	if (!tx_tex)
-	{
-		FREE(tx);
-		return NULL;
-	}
-
-	tx->base.stride = ((struct nvfx_miptree*)tx_tex)->level[0].pitch;
-
-	tx->surface = pscreen->get_tex_surface(pscreen, tx_tex,
-	                                       0, 0, 0,
-	                                       pipe_transfer_buffer_flags(&tx->base));
-
-	pipe_texture_reference(&tx_tex, NULL);
-
-	if (!tx->surface)
-	{
-		pipe_surface_reference(&tx->surface, NULL);
-		FREE(tx);
-		return NULL;
-	}
-
-	if (usage & PIPE_TRANSFER_READ) {
-		struct nvfx_screen *nvscreen = nvfx_screen(pscreen);
-		struct pipe_surface *src;
-
-		src = pscreen->get_tex_surface(pscreen, pt,
-		                               face, level, zslice,
-		                               PIPE_BUFFER_USAGE_GPU_READ);
-
-		/* TODO: Check if SIFM can deal with x,y,w,h when swizzling */
-		/* TODO: Check if SIFM can un-swizzle */
-		nvscreen->eng2d->copy(nvscreen->eng2d,
-		                      tx->surface, 0, 0,
-		                      src, x, y,
-		                      w, h);
-
-		pipe_surface_reference(&src, NULL);
-	}
-
-	return &tx->base;
-}
-
-static void
-nv40_transfer_del(struct pipe_context *pcontext, struct pipe_transfer *ptx)
-{
-	struct nv40_transfer *tx = (struct nv40_transfer *)ptx;
-
-	if (!tx->direct && (ptx->usage & PIPE_TRANSFER_WRITE)) {
-		struct pipe_screen *pscreen = pcontext->screen;
-		struct nvfx_screen *nvscreen = nvfx_screen(pscreen);
-		struct pipe_surface *dst;
-
-		dst = pscreen->get_tex_surface(pscreen, ptx->texture,
-		                               ptx->face, ptx->level, ptx->zslice,
-
Re: [Mesa3d-dev] undefined symbols and silent fallback to swrast
Adding both -Wl,--no-undefined and -lGL (in src/gallium/winsys/drm/Makefile.template) seems to work for me. The driver loader is already loading libGL.so.1 with RTLD_NOW | RTLD_GLOBAL, so I think that the DRI driver depending on libGL.so.1 shouldn't introduce any issue.
[Mesa3d-dev] [PATCH] dri: link DRI drivers with -Wl,--no-undefined -lGL
Right now undefined symbols in DRI drivers will still allow the build to succeed. As a result, people modifying drivers they cannot test risk creating unloadable drivers with no easy way of automatically avoiding it. For instance, the modifications to nv50 for context transfers caused such an issue recently. The fix is to build DRI drivers with -Wl,--no-undefined -lGL, which will cause make to fail in such cases. Note that this introduces a dependency from the DRI drivers on libGL.so.1. However, the driver loader calls dlopen on libGL.so.1 with RTLD_GLOBAL | RTLD_NOW before loading any DRI driver, so the added dependency shouldn't cause changes in runtime behavior. Please double-check the correctness of this assumption before pushing. All classic DRI drivers as well as all the Gallium drivers with configure options compiled successfully with this change. Thanks to Xavier Chantry chantry.xav...@gmail.com for helping with this.
---
 src/gallium/winsys/drm/Makefile.template |    4 ++--
 src/mesa/drivers/dri/Makefile.template   |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/winsys/drm/Makefile.template b/src/gallium/winsys/drm/Makefile.template
index f4cc0de..326cd59 100644
--- a/src/gallium/winsys/drm/Makefile.template
+++ b/src/gallium/winsys/drm/Makefile.template
@@ -66,9 +66,9 @@ default: depend symlinks $(TOP)/$(LIB_DIR)/gallium/$(LIBNAME)
 $(LIBNAME): $(OBJECTS) $(MESA_MODULES) $(PIPE_DRIVERS) Makefile \
 		$(TOP)/src/mesa/drivers/dri/Makefile.template
 	$(MKLIB) -o $@ -noprefix -linker '$(CC)' -ldflags '$(LDFLAGS)' \
-		$(OBJECTS) $(PIPE_DRIVERS) \
+		-Wl,--no-undefined $(OBJECTS) $(PIPE_DRIVERS) \
 		-Wl,--start-group $(MESA_MODULES) -Wl,--end-group \
-		$(DRI_LIB_DEPS) $(DRIVER_EXTRAS)
+		$(DRI_LIB_DEPS) $(DRIVER_EXTRAS) -L$(TOP)/lib -lGL

 $(TOP)/$(LIB_DIR)/gallium:
 	mkdir -p $@
diff --git a/src/mesa/drivers/dri/Makefile.template b/src/mesa/drivers/dri/Makefile.template
index a0c25d2..dcffa70 100644
--- a/src/mesa/drivers/dri/Makefile.template
+++ b/src/mesa/drivers/dri/Makefile.template
@@ -53,7 +53,7 @@ lib: symlinks subdirs depend
 $(LIBNAME): $(OBJECTS) $(MESA_MODULES) $(EXTRA_MODULES) Makefile \
 		$(TOP)/src/mesa/drivers/dri/Makefile.template
 	$(MKLIB) -o $@ -noprefix -linker '$(CC)' -ldflags '$(LDFLAGS)' \
-		$(OBJECTS) $(MESA_MODULES) $(EXTRA_MODULES) $(DRI_LIB_DEPS)
+		-Wl,--no-undefined $(OBJECTS) $(MESA_MODULES) $(EXTRA_MODULES) $(DRI_LIB_DEPS) -L$(TOP)/lib -lGL

 $(TOP)/$(LIB_DIR)/$(LIBNAME): $(LIBNAME)
--
1.6.3.3
Re: [Mesa3d-dev] [PATCH] nv30/nv40 Gallium drivers unification
Perhaps try running make clean and make if you haven't already? And perhaps make sure that the installed libGL.so and DRI drivers are built from the same codebase. The changes in my branch definitely shouldn't affect this.

> I wanted to merge Luca's branch into my copy of Mesa master to test it out, but it wouldn't let me for some reason. Any advice on that?

What reason? git merge should work.
Re: [Mesa3d-dev] Mesa (master): st/mesa: Always recalculate invalid index bounds.
> But for any such technique, the mesa state tracker will need to figure out what memory is being referred to by those non-VBO vertex buffers, and to do that requires knowing the index min/max values.

Isn't the min/max value only required to compute a sensible value for the maximum user buffer length? (The base pointer is passed to gl*Pointer.) The fact is that we don't need to know how large the user buffer is if the CPU is accessing it (or if we have a very advanced driver that faults memory into the GPU VM on demand, and/or a mechanism to let the GPU share the process address space). As you said, this happens for instance with swtnl, but also with drivers that scan the index buffer and copy the referenced vertex for each index onto the GPU FIFO themselves (e.g. nv50 and experimental versions of nv30/nv40). So couldn't we pass ~0 or similar as the user buffer length, and have the driver use an auxiliary module on draw calls to determine the real length, if necessary? Of course, drivers that upload user buffers on creation (if any exist) would need to be changed to only do that on draw calls.
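For reference, the min/max scan being discussed is just a linear pass over the index array. This is plain illustrative C, not Mesa's actual vbo_get_minmax_index():

```c
#include <assert.h>

/* Scan an index array and report the smallest and largest index used.
 * This is the work the state tracker does per draw call when the
 * bounds are unknown, and which the post argues is wasted effort for
 * drivers that read the indices on the CPU anyway. */
static void get_minmax_index(const unsigned *indices, unsigned count,
                             unsigned *min_out, unsigned *max_out)
{
    unsigned i, lo = ~0u, hi = 0;
    for (i = 0; i < count; i++) {
        if (indices[i] < lo) lo = indices[i];
        if (indices[i] > hi) hi = indices[i];
    }
    *min_out = lo;
    *max_out = hi;
}
```

The cost is O(n) in the number of indices on every draw, which is why skipping it when the bounds aren't actually needed is attractive.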
[Mesa3d-dev] [PATCH] nv30/nv40 Gallium drivers unification
Currently the nv30 and nv40 Gallium drivers are very similar, and contain about 5000 lines of essentially duplicated code. I prepared a patchset (which can be found at http://repo.or.cz/w/mesa/mesa-lb.git/shortlog/refs/heads/unification+fixes) which gradually unifies the drivers, one file per commit. A new nvfx directory is created, and unified files are put there one by one. After all patches are applied, the nv30 and nv40 directories are removed and only the new nvfx directory remains. The first patches unify the engine naming (s/curie/eng3d/g; s/rankine/eng3d/g), and switch nv40 to use the NV34TCL_ constants. Initial versions of this work changed renouveau.xml to create a new NVFXTCL object, but the current version doesn't need any renouveau.xml modification at all. The unification+fixes branch referenced above is the one that should be tested. The unification branch contains just the unification, with no behavior changes, while unification+fixes also fixes swtnl and quad rendering, allowing the unification to be tested better. Some cleanups on top of the unification are also included. That same repository also contains other branches with significant improvements on top of the unification, but I'm not yet proposing them for inclusion, as they need more testing and some fixes. While there are some branches in the Mesa repository that would conflict with this, such branches seem to be popping up continuously (and this is good!), so waiting until they are merged probably won't really work. The conflicts are minimal anyway, and the driver fixes can be very easily reconstructed over the unified codebase. How about merging this? Any objections? Any comments?
Re: [Mesa3d-dev] Mesa (gallium-sampler-view): st/mesa: Associate a sampler view with an st texture object.
What if you have a non-integer min LOD? While the integer part may belong to the sampler view, the fractional part really seems to be a sampler property. Requiring min_lod < 1.0 also doesn't seem to make much sense, so shouldn't it be kept as it is now? The same applies to last_level / max_lod.
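A small sketch of the split being argued about, assuming a non-negative min_lod (the helper is hypothetical, not part of the Gallium API): the integer part could become the view's first mip level, while the fractional part would have to remain a sampler property.

```c
#include <assert.h>

/* Split a non-integer min LOD into an integer base level (a candidate
 * sampler-view property) and a fractional remainder (which can only
 * live in the sampler state).  Assumes min_lod >= 0. */
static void split_min_lod(float min_lod, unsigned *base_level, float *frac_lod)
{
    *base_level = (unsigned)min_lod;             /* integer part */
    *frac_lod   = min_lod - (float)*base_level;  /* fractional part */
}
```

The point of the post is that the fractional remainder cannot be expressed by the view at all, so moving min_lod wholesale into the view loses information.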
Re: [Mesa3d-dev] Mesa (master): st/mesa: Always recalculate invalid index bounds.
Isn't it possible to compute the maximum legal index by just taking the minimum of (vb->buffer->size - vb->buffer_offset - ve->src_offset) / vb->stride over all vertex buffers/elements? Isn't the kernel checker doing something like this?
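Spelled out in C, under the assumption that the struct fields loosely mirror pipe_vertex_buffer / pipe_vertex_element (the types here are illustrative only, and element size is ignored, exactly as in the formula above):

```c
#include <assert.h>

/* Illustrative stand-ins for pipe_vertex_buffer / pipe_vertex_element. */
struct vb { unsigned buffer_size, buffer_offset, stride; };
struct ve { unsigned vb_index, src_offset; };

/* The proposed bound: for each element, the largest index whose fetch
 * offset still lies inside its buffer, then the minimum over all
 * elements. */
static unsigned max_legal_index(const struct vb *vbs,
                                const struct ve *ves, unsigned num_ves)
{
    unsigned i, result = ~0u;
    for (i = 0; i < num_ves; i++) {
        const struct vb *b = &vbs[ves[i].vb_index];
        unsigned avail = b->buffer_size - b->buffer_offset - ves[i].src_offset;
        unsigned max_i = b->stride ? avail / b->stride : ~0u;
        if (max_i < result)
            result = max_i;
    }
    return result;
}
```

This is O(number of elements) rather than O(number of indices), which is the attraction over scanning the index buffer.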
Re: [Mesa3d-dev] Mesa (master): st/mesa: Always recalculate invalid index bounds.
Actually, why is the state tracker doing the min/max computation at all? If the driver does the index lookup itself, as opposed to using a hardware index buffer (e.g. the nouveau drivers do this in some cases), this is unnecessary and slow. Would completely removing the call to vbo_get_minmax_index break anything? Also, how about removing the max_index field in pipe_vertex_buffer? It seems to be set to the same value for all vertex buffers, and the value is then passed to draw_range_elements too. Isn't the value passed to draw_range_elements sufficient?
Re: [Mesa3d-dev] Revert ST_SURFACE_DEPTH = BUFFER_DEPTH in master too?
I've looked into the issue, and found a workaround by looking at what st_renderbuffer_alloc_storage (which is called to create the depth buffer with ST_SURFACE_DEPTH != BUFFER_DEPTH) does. Adding:

if (ctx)
   ctx->NewState |= _NEW_BUFFERS;

at the end of st_set_framebuffer_surface seems to solve the Warsow problem with no other regressions. Brian, is this the right fix? Marek, does it fix your r300g problems too?
Re: [Mesa3d-dev] Revert ST_SURFACE_DEPTH = BUFFER_DEPTH in master too?
Solves the Warsow issue and seems to work. Thanks!
Re: [Mesa3d-dev] Revert ST_SURFACE_DEPTH = BUFFER_DEPTH in master too?
Shouldn't _mesa_add_renderbuffer(&stfb->Base, BUFFER_FRONT_LEFT, rb); be _mesa_add_renderbuffer(&stfb->Base, surfIndex, rb); instead, since you seem to make the on-demand creation mechanism generic and no longer limited to the front buffer?
[Mesa3d-dev] Revert ST_SURFACE_DEPTH = BUFFER_DEPTH in master too?
In mesa_7_7_branch, 52d83efdbc4735d721e6fc9b44f29bdd432d4d73 reverts commit 9d17ad2891b58de9e33e943ff918a678c6a3c2bd. How about cherry-picking that revert into master, until a fix for the bugs the reverted commit introduces is found? The reverted commit currently breaks the Warsow main menu for me, making it show garbage.
Re: [Mesa3d-dev] gallium cached bufmgr busy change
> We can do this optimisation with busy as well, as long as you add things to the busy list at the end, and stop testing after the first busy call. At least for a single linear GPU context, which is all I expect this code will ever be handling.

Wouldn't this just end up reinventing the fenced bufmgr? Basically, cached needs a list of all destroyed buffers (ideally in destruction order, so it can do the stopping optimization when expiring buffers), while the busy mechanism needs a list of all used buffers (destroyed or not) in usage order. So it seems it would need two lists, and essentially result in something that replicates fenced inside cached. BTW, right now I think all drivers use a single GPU context in userspace. Even Nouveau multiplexes Gallium contexts on a single channel (though this is probably broken).
Re: [Mesa3d-dev] gallium cached bufmgr busy change
I think you are supposed to do this using the fenced bufmgr over cached, along with an (ideally userspace) fencing mechanism. If you can implement pb_busy, you should be able to implement fence_signalled in exactly the same way (making the fence handle a pointer to buffers should work for this purpose, if standalone fences are hard to do). The fenced bufmgr will only pass destruction requests to the wrapped bufmgr once the fences are signalled.
Re: [Mesa3d-dev] Mesa (master): glsl/pp: Add asserts to check for null pointer dereferences.
> For static analysis with Coverity Prevent, the added assert will clear a defect report and/or allow it to continue parsing to the next possible defect.

Are these being checked manually and determined to be false positives? If not, then it would be beneficial not to shut up the static analyzer, since it actually could be a problem. If yes, perhaps it would be useful to have a comment explaining why it is a false positive, unless the reasoning is often trivial, which would mean that the static analyzer isn't doing a very good job. Also, is the whole concept of having a static analyzer assume that asserts are true a good idea? Shouldn't it instead specifically attempt to check whether the assertions in the code are always true? (And have some other means to flag false positives, perhaps not involving source modification.) Finally, does the checker provide some easy and license-allowed way of making the analysis results public? (E.g. by putting up the same web interface they used for their open source checking demos.)

BTW, I just looked at one of the assert commits, and found that it actually _introduces_ a bug. Look at the assert(attrib_list) added in 706fffbff59be0dc884e1938f1bdf731af1efa3e. This ends up asserting that the attrib_list in glXCreatePixmap is not NULL. But the GLX specification says that it can be NULL, and it usually will be. The memcpy does not crash because when attrib_list is NULL, the length parameter passed to it is 0, as the code before it shows. Thus, that commit should be reverted, and replaced with either no change or by surrounding the memcpy with if (attrib_list) or if (i). Ideally, we could also mark that if, as well as the if (attrib_list) above, with unlikely() while we are at it. Are we sure all the other commits like this are correct and actually flag false positives, as opposed to hiding real bugs?
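For illustration, the safe pattern being suggested could look like the following; copy_attribs() is a made-up helper, not the actual GLX code, but it shows why a NULL attrib_list is harmless when the copy is guarded:

```c
#include <assert.h>
#include <string.h>

/* Copy a GLX-style attribute list (attr,value pairs terminated by
 * None == 0) into dst.  attrib_list may legally be NULL per the GLX
 * spec, so count pairs first and guard the memcpy rather than
 * asserting non-NULL.  Returns the number of pairs copied. */
static unsigned copy_attribs(const int *attrib_list, int *dst,
                             unsigned max_pairs)
{
    unsigned i = 0;
    if (attrib_list) {
        while (i < max_pairs && attrib_list[2 * i] != 0)
            i++;
        if (i)
            memcpy(dst, attrib_list, 2 * i * sizeof(int));
    }
    return i;
}
```

With this shape, a static analyzer can see that the NULL case never reaches the memcpy, without an assert that is simply false for valid callers.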
Re: [Mesa3d-dev] Mesa (master): glsl/pp: Add asserts to check for null pointer dereferences.
Just noticed that it has already been fixed in 5f40a7aed12500fd6792e2453f49c3b5c54d with an if (attrib_list).
Re: [Mesa3d-dev] RFC: gallium-format-cleanup branch (was Gallium format swizzles)
> PIPE_FORMAT_X8B8G8R8_UNORM is being used by mesa. PIPE_FORMAT_R8G8B8X8_UNORM doesn't exist, hence it appears to be unnecessary. So it doesn't make sense to rename.

How about D3DFMT_X8B8G8R8? That should map to PIPE_FORMAT_R8G8B8X8_UNORM. BTW, we are also missing D3DFMT_X4R4G4B4, D3DFMT_X1R5G5B5, D3DFMT_A4L4, D3DFMT_A1, D3DFMT_L6V5U5, D3DFMT_D15S1, D3DFMT_D24X4S4, D3DFMT_CxV8U8, and perhaps others I did not notice.
Re: [Mesa3d-dev] Does DX9 SM3 -> VMware svga with arbitrary semantics work? How?
BTW, i915 is also limited to 0-7 generic indices, and thus doesn't work with GLSL at all right now. This should be relatively easy to fix since it should be enough to store the generic indices in the texCoords arrays, and then pass them to draw_find_shader_output.
[Mesa3d-dev] Does DX9 SM3 -> VMware svga with arbitrary semantics work? How?
I've been looking at shader semantics some more, and I'm a bit surprised by how the svga driver works. It seems that an obvious implementation of a DirectX 9 state tracker just won't work with the svga driver. In SM3, vertex/fragment semantics can be arbitrary (independent of hardware resources), but indices are limited to a 0-15 range. A DirectX 9 state tracker must convert those to TGSI_SEMANTIC_GENERIC. How does the VMware one do that? Assuming that it maps them directly, this means that the driver must support GENERIC semantic indices up to a number that varies between about 200 and 255. The problem is that the VMware svga driver, as far as I can see, doesn't support indices greater than 15. This is caused by the fact that it maps all GENERIC semantics to SVGA3D_DECLUSAGE_TEXCOORD, and the index bitfield in the svga virtual interface only supports 4 bits. In other words, SM3 under VMware with arbitrary semantics (allowed by SM3 and other drivers) really seems broken for a straightforward DirectX 9 state tracker implementation. The only way it can work now is if the DirectX 9 state tracker looks at both the vertex and pixel shaders, links them, and outputs sequential semantic indices. It seems to me that the svga driver should be fixed to map GENERIC to *all* SM3 semantic types, ideally in a way that reverses the SM3 -> GENERIC transformation done by the DX9 state tracker. Doing this requires specifying a maximum index for TGSI_SEMANTIC_GENERIC which is very carefully chosen to allow a 1:1 mapping with SM3, so that DirectX 9 state trackers have enough indices to represent all of SM3, and the svga driver can fit all indices in the SM3-like semantics of the VMware virtual GPU interface. The correct value in this case seems to be 219 = 14 * 16 SM3 semantics - 5 for COLOR0, COLOR1, PSIZE0, POSITION0 and FOG0, which have specific TGSI semantics that they need to be mapped to/from. I'm looking at this because this seems to be the strictest constraint on choosing a limit for TGSI_SEMANTIC_GENERIC indices. The other constraint is due to r600 supporting only byte-sized semantic/index combinations, which is less strict than SM3. BTW, GLSL also looks artificially limited on svga, as only 6 varyings will be supported, due to it starting from 10.
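The arithmetic behind 219, written out (the constants come from the reasoning above; the helper itself is purely illustrative, not code from any state tracker):

```c
#include <assert.h>

/* 14 SM3 usages that can carry GENERIC varyings, 16 indices each,
 * minus the 5 usage/index combinations (COLOR0, COLOR1, PSIZE0,
 * POSITION0, FOG0) that already have dedicated TGSI semantics. */
enum { SM3_USAGES = 14, SM3_INDICES = 16, SM3_RESERVED = 5 };

static int sm3_generic_slots(void)
{
    return SM3_USAGES * SM3_INDICES - SM3_RESERVED;
}
```

Choosing exactly this value is what allows a lossless round trip: every remaining SM3 (usage, index) pair gets a distinct GENERIC index, and every GENERIC index maps back into the 4-bit-index SM3-like svga interface.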
Re: [Mesa3d-dev] Does DX9 SM3 -> VMware svga with arbitrary semantics work? How?
> I don't think anybody has tried hooking it up - so far the primary purpose of the svga gallium driver has been GL support, but thinking about it you're probably right.

I'm a bit confused about this: I was under the impression that VMware Tools for Windows used your DirectX state tracker and a WGL version of Mesa, talking to the svga Gallium driver. How does it actually work? What do you normally use the DirectX 9 state tracker with?

> The details of the closed code aren't terribly important as they could always be changed.

Sure, but it currently is the only Gallium user that supports the SM3 model, and thus the only one that really needs arbitrary semantic indices and puts constraints on them.

> > The correct value in this case seems to be 219 = 14 * 16 SM3 semantics - 5 for COLOR0, COLOR1, PSIZE0, POSITION0 and FOG0, which have specific TGSI semantics that they need to be mapped to/from.
> Agree, though I'd opt for 255 as a round number.

The problem with this is that you only have 14 SM3 semantics with 16 indices each, so you can't map 256 generic indices into the VMware interface, or directly into an SM3 shader. You only have 14 * 16 minus the ones used for non-GENERIC semantics (the ones mentioned above). And of course, if you choose a smaller number, you can't map SM3 _into_ Gallium, so you need to choose the exact number required for SM3.

Tying Gallium in this way to SM3 is surely a bit ugly, but it's just a constant, and I don't see any other way to implement SM3 without doing linkage in software in the r600 and svga drivers and/or in SM3 state trackers.
Re: [Mesa3d-dev] Does DX9 SM3 -> VMware svga with arbitrary semantics work? How?
The difference between an easier and a harder life for (some) drivers is whether the limit is tied to hardware interpolators or not. Once we decide not to tie it, whether the limit is 128 or 256 is of course quite inconsequential. Allowing arbitrary 32-bit values would however require the use of binary search or a hash table. I think you or someone else from the Mesa team should decide how to proceed, and most drivers would need to be fixed. As I understand it, the constraints are the following:

Hardware with no capabilities:
- nv30 does not support any mapping. However, we already need to patch fragment programs to insert constants, so we can patch input register numbers as well. The current driver only supports 0-7 generic indices, but I already implemented support for 0-255 indices with in-driver linkage and patching. Note that nv30 lacks control flow in fragment programs.
- nv40 is like nv30, but supports fp control flow, and may have some configurable mapping support, with unknown behavior.

Hardware with capabilities that must be configured for each fp/vp pair:
- nv40 might have this, but the nVidia OpenGL driver does not use it
- nv50 has configurable vp-gp and gp-fp mappings with 64 entries. The current driver seems to support arbitrary 0-2^32 indices.
- r300 appears to have a configurable vp-fp mapping. The current driver only supports 0-15 generic indices, but redefining ATTR_GENERIC_COUNT could be enough to have it support larger numbers.

Hardware with automatic linkage when semantics match:
- VMware svga appears to support 14 * 16 semantics, but the current driver only supports 0-15 generic indices. This could be fixed by mapping GENERIC onto all non-special SM3 semantics.

Hardware that can do both configurable mappings and automatic linkage:
- r600 supports linkage in hardware between matching, apparently byte-sized, semantic ids

Other hardware:
- i915 has no hardware vertex shading
- not sure about i965

Software:
1. SM3 wants to use 14 * 16 indices overall. This is apparently only supported by the VMware closed-source state tracker.
2. SM2 and non-GLSL OpenGL just want to use as many indices as the hardware interpolator count.
3. GLSL currently wants to use at most about 10 indices more than the hardware interpolator count. This can be fixed, since we see both the fragment and vertex shaders during linkage (the patch I sent did that).
4. GLSL with EXT_separate_shader_objects does not add requirements, because only gl_TexCoord and other builtin varyings are supported; user-defined varyings are not.
5. A hypothetical version of EXT_separate_shader_objects extended to support user-defined varyings would want either arbitrary 32-bit generic indices (by interning strings to generate the indices) or the ability to specify a custom mapping between shader indices.
6. A hypothetical no-op implementation of the GLSL linker would have the same requirement.

Also note that non-GENERIC indices have peculiar properties. For COLOR and BCOLOR:
1. SM3, and OpenGL with glClampColor set appropriately, want it to _not_ be clamped to [0, 1].
2. SM2 and normal OpenGL apparently want it clamped to [0, 1] (sometimes for fixed-point targets only) and may also allow using U8_UNORM precision for it instead of FP32.
3. OpenGL allows enabling two-sided lighting, in which case COLOR in the fragment shader is automagically set to BCOLOR for back faces.
4. Older hardware (e.g. nv30) tends to support BCOLOR but not FACING. Some hardware (e.g. nv40) supports both FACING and BCOLOR in hardware. The latest hardware probably supports FACING only.

Any API that requires special semantics for COLOR and BCOLOR (i.e. non-SM3) seems to only want indices 0-1. Note that SM3 does *not* include BCOLOR, so basically the limits for generic indices would need to be conditional on BCOLOR being present or not (e.g. if it is present, we must reserve two semantic slots in svga for it).

POSITION0 is obviously special. PSIZE0 is also special for points. FOG0 seems right now to be just a GENERIC with a single component. Gallium could be extended to support fixed-function fog, which most DX9 hardware (nv30/nv40 and r300) supports; this is mostly orthogonal to the semantic issue. TGSI_SEMANTIC_NORMAL is essentially unused and should probably be removed.

The options are the ones you outlined, plus:
(e) Allow arbitrary 32-bit indices. This requires slightly more complicated data structures in some cases, and will require svga and r600 to fall back to software linkage if the numbers are too high.
(f) Limit semantic indices to hardware interpolators _and_ introduce an interface to let the user specify an

Personally I think the simplest idea for now could be to have all drivers support 256 indices or, in the case of r600 and svga, the maximum value supported by the hardware, and expose that as a cap (as well as another cap for the number of different semantic values supported at once). The minimum guaranteed value is set to the lowest
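A driver-side remap for arbitrary 32-bit indices (option (e)) could be as simple as a small open-addressing hash table that assigns each semantic index it sees to the next free hardware slot. This is a hypothetical sketch under that assumption, not code from any existing driver:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: remap arbitrary 32-bit semantic indices to dense hardware
 * interpolator slots with a tiny linear-probing hash table.  A real
 * driver would also bound next_slot by the hardware interpolator count
 * and fall back to software linkage past it. */
#define REMAP_SIZE 64            /* power of two, > max hw interpolators */

struct semantic_remap {
   uint32_t keys[REMAP_SIZE];
   int      slots[REMAP_SIZE];
   int      used[REMAP_SIZE];
   int      next_slot;
};

static int remap_semantic(struct semantic_remap *r, uint32_t index)
{
   uint32_t h = (index * 2654435761u) & (REMAP_SIZE - 1); /* Knuth hash */
   while (r->used[h]) {
      if (r->keys[h] == index)
         return r->slots[h];          /* already assigned a slot */
      h = (h + 1) & (REMAP_SIZE - 1); /* linear probing */
   }
   r->used[h] = 1;
   r->keys[h] = index;
   r->slots[h] = r->next_slot++;      /* allocate next hardware slot */
   return r->slots[h];
}
```

Both the vertex and fragment side would run the indices through the same table, so matching semantics land in matching slots.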
Re: [Mesa3d-dev] Does DX9 SM3 -> VMware svga with arbitrary semantics work? How?
On Tue, Mar 2, 2010 at 10:00 PM, Corbin Simpson mostawesomed...@gmail.com wrote:
> FYI r300 only supports 24 interpolators: 16 linear and 8 perspective. (IIRC; not in front of the docs right now.) r600 supports 256 fully configurable interpolators.

Yes, but if you raised ATTR_GENERIC_COUNT, the current driver would support higher semantic indices, right? (Of course, with a limit of 8/24 different semantic indices used at once.)
Re: [Mesa3d-dev] RFC: gallium-format-cleanup branch (was Gallium format swizzles)
Shouldn't

   PIPE_FORMAT_X8B8G8R8_UNORM = 68,

instead be R8G8B8X8_UNORM, which is currently missing, for consistency with

   PIPE_FORMAT_R8G8B8X8_SNORM = 81,

with X8B8G8R8_UNORM perhaps put at the end, next to PIPE_FORMAT_A8B8G8R8_UNORM?
Re: [Mesa3d-dev] Gallium software fallback/draw command failure
Falling back to CPU rendering, while respecting the OpenGL spec, is likely going to be unusably slow in most cases, and thus not really better for real usage than just not rendering. I think the only way to have a usable fallback mechanism is to do fallbacks on the GPU, by automatically introducing multiple rendering passes.

For instance, if you were to run each fragment shader instruction in a separate pass (using floating-point targets), then you would never have more than two texture operands. If the render targets are too large, you can also just split them into multiple portions, and you can limit texture size so that 2 textures plus a render target portion always fit in memory. Alternatively, you can split textures too, try to statically deduce the referenced portion, and KIL if you guessed wrong, combined with occlusion queries to check whether you KILled. Control flow complicates things, but you can probably just put the execution mask in a stencil buffer or a secondary render target/texture, and use occlusion queries to find out if it is empty.

Of course, this requires writing and testing a very significant amount of complex code (probably including a TGSI -> LLVM -> TGSI infrastructure, since you likely need nontrivial compiler techniques to do this optimally). However, we may need part of this anyway to support multi-GPU configurations, and it also allows emulating advanced shader capabilities on less capable hardware (e.g. shaders with more instructions or temporaries than the hardware limits, or SM3+/GLSL shaders on SM2 hardware), with some hope of getting usable performance.
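The render-target splitting arithmetic above is straightforward. A back-of-the-envelope sketch (illustrative only; real code would account for alignment, tiling and per-surface overhead): given a memory budget and two textures that must stay resident, how many horizontal slices must a render target be split into so each pass fits?

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: number of rendering passes needed when a render target of
 * rt_size bytes must share mem_budget bytes with two resident textures.
 * Each pass renders one slice of the target. */
static unsigned rt_num_slices(size_t mem_budget, size_t tex0_size,
                              size_t tex1_size, size_t rt_size)
{
   assert(mem_budget > tex0_size + tex1_size);
   size_t avail = mem_budget - tex0_size - tex1_size;
   return (unsigned)((rt_size + avail - 1) / avail);  /* ceil division */
}
```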
[Mesa3d-dev] PK/UP* and NV_[vertex|fragment]_program* support in Gallium?
I see that PK2US and friends are being removed. These would be necessary to implement NV_fragment_program_option, NV_fragment_program2 and NV_gpu_program4. Currently no drivers (including Nouveau) support them, but since we already have some support in Mesa (even parsers for the nVidia syntax), it would be nice to support them in Gallium eventually. Not sure about STR/SFL though: they can be encoded/decoded as MOV x, 0/1, but they complete the SETcond instruction set. How about keeping them and adding a capability bit for them?
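For reference, a rough C model of what PK2US computes (as I recall the NV_fragment_program spec: clamp two components to [0, 1], convert each to a 16-bit unorm, and pack them into one 32-bit word with x in the low half; double-check against the spec before relying on the rounding details). A driver without native support could lower the opcode to arithmetic like this:

```c
#include <stdint.h>

/* Approximate PK2US semantics: pack two floats as 16-bit unorms into a
 * single 32-bit word, x in the low 16 bits, y in the high 16 bits. */
static uint32_t pk2us(float x, float y)
{
   if (x < 0.0f) x = 0.0f; else if (x > 1.0f) x = 1.0f;  /* clamp to [0,1] */
   if (y < 0.0f) y = 0.0f; else if (y > 1.0f) y = 1.0f;
   uint32_t xi = (uint32_t)(x * 65535.0f + 0.5f);        /* float -> unorm16 */
   uint32_t yi = (uint32_t)(y * 65535.0f + 0.5f);
   return xi | (yi << 16);
}
```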
[Mesa3d-dev] [PATCH] pipebuffer: avoid assert due to increasing a zeroed refcnt
The cache manager stores buffers whose reference count has dropped to 0. pipe_reference asserts in this case on debug builds, so use pipe_reference_init instead.
---
 src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c b/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c
index 53bc019..86f9266 100644
--- a/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c
+++ b/src/gallium/auxiliary/pipebuffer/pb_bufmgr_cache.c
@@ -294,7 +294,7 @@ pb_cache_manager_create_buffer(struct pb_manager *_mgr,
          LIST_DEL(&buf->head);
          pipe_mutex_unlock(mgr->mutex);
          /* Increase refcount */
-         pipe_reference(NULL, &buf->base.base.reference);
+         pipe_reference_init(&buf->base.base.reference, 1);
          return &buf->base;
       }
--
1.6.6.1.476.g01ddb
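A simplified model of why this patch is needed (not the real pipe_reference implementation): incrementing a reference count that has already reached zero trips a debug assertion, because it normally indicates use-after-free. Cached buffers are a deliberate exception, so they must re-initialize the count instead of incrementing it:

```c
#include <assert.h>

/* Toy reference-counting model illustrating the assert vs. init distinction. */
struct refcnt { int count; };

static void ref_acquire(struct refcnt *r)
{
   assert(r->count > 0);  /* catches use-after-free; would fire for a
                           * cached buffer whose count dropped to 0 */
   r->count++;
}

static void ref_init(struct refcnt *r, int count)
{
   r->count = count;      /* safe for objects revived from a cache */
}
```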
[Mesa3d-dev] [PATCH] pipebuffer: fix inverted signalled checking
A return of 0 means the fence is signalled.
---
 src/gallium/auxiliary/pipebuffer/pb_buffer_fenced.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/gallium/auxiliary/pipebuffer/pb_buffer_fenced.c b/src/gallium/auxiliary/pipebuffer/pb_buffer_fenced.c
index 95eb5f6..d97f749 100644
--- a/src/gallium/auxiliary/pipebuffer/pb_buffer_fenced.c
+++ b/src/gallium/auxiliary/pipebuffer/pb_buffer_fenced.c
@@ -696,7 +696,7 @@ fenced_buffer_map(struct pb_buffer *buf,
        * Don't wait for the GPU to finish accessing it, if blocking is forbidden.
        */
       if((flags & PIPE_BUFFER_USAGE_DONTBLOCK) &&
-         ops->fence_signalled(ops, fenced_buf->fence, 0) == 0) {
+         ops->fence_signalled(ops, fenced_buf->fence, 0) != 0) {
          goto done;
       }
--
1.6.6.1.476.g01ddb
Re: [Mesa3d-dev] [PATCH] pipebuffer: check for unsynchronized usage before looking at flags
> Good catch of the fence_signalled negated logic.

This was actually mentioned on IRC by Maarten Maathuis (who was working on adding pipebuffer support to the nv50 driver). Thanks to him :)
Re: [Mesa3d-dev] [PATCH] pipebuffer: check for unsynchronized usage before looking at flags
> +   if (flags & PIPE_BUFFER_USAGE_UNSYNCHRONIZED) {

This should be:

   if (!(flags & PIPE_BUFFER_USAGE_UNSYNCHRONIZED)) {

Sorry for this.