José Fonseca wrote on 2010-01-07 14:45:
> On Thu, 2010-01-07 at 05:25 -0800, Zack Rusin wrote:
>> On Thursday 07 January 2010 06:50:36 José Fonseca wrote:
>>> I wonder if storage size of registers is such a big issue. Knowing the
>>> storage size of a register matters mostly for indexable temps. For
>>> regular assignments and intermediate computations storage, everything
>>> gets transformed into SSA form, and the register size can be determined
>>> from the instructions where it is generated/used and there is no need
>>> for consistency.
>>>
>>> For example, imagine a shader that has:
>>>
>>>   TEX TEMP[0], SAMP[0], IN[0]  // SAMP[0] is a PIPE_FORMAT_R32G32B32_FLOAT --> use 4x32bit float registers
>>>   MAX ??
>>>   ...
>>>   TEX TEMP[0], SAMP[1], IN[0]  // SAMP[1] is a PIPE_FORMAT_R64G64B64A64_FLOAT --> use 4x64bit double registers
>>>   DMAX ????, TEMP[0], ???
>>>
>> That's not an issue because such a format doesn't exist. There's no 256bit
>> sampling in any api. It's one of the self-inflicted wounds that we have.
>> R64G64 is the most you'll get right now.
>
> That's interesting. Never realized that.
>
>>>   TEX TEMP[0], SAMP[2], IN[0]  // texture 0 and rendertarget are both PIPE_FORMAT_R8G8B8A8_UNORM --> use 4x8bit unorm registers
>>>   MOV OUT[0], TEMP[0]
>>>
>>> etc.
>>>
>>> There is actually programmable 3d hardware out there that has special
>>> 4x8bit registers, and for performance the compiler has to deduce where
>>> to use those 4x8bit registers. llvmpipe will need to do a similar thing,
>>> as the smaller the bit-width the higher the throughput. And at least
>>> current gallium statetrackers will reuse temps with no attempt to
>>> maintain consistency in use.
>>>
>>> So if the compilers already need to deal with this, I wonder if this
>>> notion that registers are 128bits is really necessary, and will prevail
>>> in the long term.
>>>
>> Somehow this is the core issue: it's the fact that TGSI is untyped.
>> Anything but "register size is constant" implies "TGSI is typed but the
>> actual types have to be deduced by the drivers", which goes against what
>> Gallium was about (we put the complexity in the driver).
>>
>> The question of 8bit vs 32bit and 64bit vs 32bit are really different
>> questions. The first one is about optimization - it will work perfectly
>> well if the 128bit registers will be used; the second one is about
>> correctness - it will not work if 128bit registers will be used for
>> doubles and it will not work if 256bit registers will be used for floats.
>
> True.
>
>> Also we don't have 4x8bit instructions, they're all 4x32bit instructions
>> (float, unsigned ints, signed ints), so doubles will be the first
>> differently sized instructions. Which in turn will mean that either TGSI
>> will have to be actually statically typed, but not typed declared, i.e.
>> D_ADD will only be able to take two 256bit registers as inputs and if
>> anything else is passed it has to throw an error, which is especially
>> difficult given that those registers didn't have a size declared but it
>> would have to be inferred from previous instructions, or we'd have to
>> allow mixing sizes of all inputs, e.g. D_ADD can operate on both 4x32 or
>> 4x64, which simply moves the problem from above into the driver.
>>
>> Really, unless we'll say "the entire pipeline can run in 4x64" like we
>> did for floats then I don't see an easier way of dealing with this than
>> the xy, zw, swizzle form.
>
> Ok. I didn't feel strongly either way, but now I'm more convinced that
> restricting xy zw swizzles is less painful. Thanks for explaining this
> Zack.

Zack,
1. Do I understand correctly that D_ADD dst.xy, src1.xy, src2.zw adds one
   double? If so, is D_ADD dst, src1, src2.zwxy also valid, and does it
   result in two doubles being added together?
2. Is the list of double-precision opcodes proposed by Igor roughly enough
   for an OpenCL implementation?

Thanks.
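To make question 1 concrete, here is a minimal C sketch of the semantics I have in mind. It is only an illustration under assumptions: a 128-bit register is modeled as four 32-bit channels, a double occupies either the xy or the zw channel pair, and the full-register form with a .zwxy swizzle is read as two independent pair-wise adds. Whether that second form is actually valid is exactly what is being asked. The names reg128, load_pair, store_pair, d_add_pair and d_add_zwxy are hypothetical and are not part of TGSI or Gallium.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one 128-bit register = four 32-bit channels x, y, z, w. */
typedef struct { uint32_t chan[4]; } reg128;

/* A double is assumed to live in one of two channel pairs. */
enum pair { PAIR_XY = 0, PAIR_ZW = 1 };

/* Read the 64-bit value stored in the selected channel pair. */
static double load_pair(const reg128 *r, enum pair p)
{
    double d;
    memcpy(&d, &r->chan[2 * p], sizeof d);
    return d;
}

/* Write a 64-bit value into the selected channel pair. */
static void store_pair(reg128 *r, enum pair p, double d)
{
    memcpy(&r->chan[2 * p], &d, sizeof d);
}

/* Models "D_ADD dst.<dp>, src1.<p1>, src2.<p2>": adds exactly one double. */
static void d_add_pair(reg128 *dst, enum pair dp,
                       const reg128 *src1, enum pair p1,
                       const reg128 *src2, enum pair p2)
{
    store_pair(dst, dp, load_pair(src1, p1) + load_pair(src2, p2));
}

/* Models "D_ADD dst, src1, src2.zwxy" under the ASSUMED reading that it
 * expands to two pair-wise adds:
 *   dst.xy = src1.xy + src2.zw
 *   dst.zw = src1.zw + src2.xy
 */
static void d_add_zwxy(reg128 *dst, const reg128 *src1, const reg128 *src2)
{
    reg128 tmp; /* temporary so dst may alias a source register */
    d_add_pair(&tmp, PAIR_XY, src1, PAIR_XY, src2, PAIR_ZW);
    d_add_pair(&tmp, PAIR_ZW, src1, PAIR_ZW, src2, PAIR_XY);
    *dst = tmp;
}
```

Under this reading, the first instruction touches only dst.xy and leaves dst.zw alone, while the second writes both pairs with the source pairs of src2 swapped.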