Re: [Mesa3d-dev] [RFC] add support to double opcodes

michal Thu, 07 Jan 2010 06:12:24 -0800

José Fonseca wrote on 2010-01-07 14:45:
> On Thu, 2010-01-07 at 05:25 -0800, Zack Rusin wrote:
>   
>> On Thursday 07 January 2010 06:50:36 José Fonseca wrote:
>>     
>>> I wonder if storage size of registers is such a big issue. Knowing the
>>> storage size of a register matters mostly for indexable temps. For
>>> regular assignments and intermediate computations, everything
>>> gets transformed into SSA form, and the register size can be determined
>>> from the instructions where it is generated/used and there is no need
>>> for consistency.
>>>
>>> For example, imagine a shader that has:
>>>
>>>    TEX TEMP[0], SAMP[0], IN[0]  // SAMP[0] is a PIPE_FORMAT_R32G32B32_FLOAT --> use 4x32bit float registers
>>>    MAX ??
>>>    ...
>>>    TEX TEMP[0], SAMP[1], IN[0]  // SAMP[1] is a PIPE_FORMAT_R64G64B64A64_FLOAT --> use 4x64bit double registers
>>>    DMAX ????, TEMP[0], ???
>>>       
>> That's not an issue because such a format doesn't exist. There's no 256bit
>> sampling in any API. It's one of the self-inflicted wounds that we have.
>> R64G64 is the most you'll get right now.
>>     
>
> That's interesting. Never realized that.
>
>   
>>>    TEX TEMP[0], SAMP[2], IN[0]  // texture 0 and rendertarget are both PIPE_FORMAT_R8G8B8A8_UNORM --> use 4x8bit unorm registers
>>>    MOV OUT[0], TEMP[0]
>>>
>>> etc.
>>>
>>> There is actually programmable 3d hardware out there that has special
>>> 4x8bit registers, and for performance the compiler has to deduce where
>>> to use those 4x8bit registers. llvmpipe will need to do a similar thing, as the
>>> smaller the bit-width the higher the throughput. And at least current
>>> gallium statetrackers will reuse temps with no attempt to maintain
>>> consistency in use.
>>>
>>> So if the compilers already need to deal with this, I wonder if this
>>> notion that registers are 128 bits is really necessary, and whether it
>>> will prevail in the long term.
>>>       
>> Somehow this is the core issue: TGSI is untyped. Anything other than
>> "register size is constant" implies "TGSI is typed but the actual types have
>> to be deduced by the drivers", which goes against what Gallium was about (we
>> put the complexity in the driver).
>>
>> The questions of 8bit vs 32bit and 64bit vs 32bit are really different
>> questions. The first one is about optimization: it will work perfectly well
>> if 128bit registers are used. The second one is about correctness: it
>> will not work if 128bit registers are used for doubles, and it will not
>> work if 256bit registers are used for floats.
>>     
>
> True.
>
>   
>> Also we don't have any 4x8bit
>> instructions, they're all 4x32bit instructions (float, unsigned ints, signed
>> ints), so doubles will be the first differently sized instructions. Which in
>> turn will mean one of two things. Either TGSI will have to be actually
>> statically typed but without declared types, i.e. D_ADD will only be able to
>> take two 256bit registers as inputs and has to throw an error if anything
>> else is passed, which is especially difficult because those registers don't
>> have a declared size, so it would have to be inferred from previous
>> instructions. Or we'd have to allow mixing sizes of all inputs, e.g. D_ADD
>> can operate on both 4x32 and 4x64, which simply moves the problem from above
>> into the driver.
>>
>> Really, unless we say "the entire pipeline can run in 4x64" like we did for
>> floats, I don't see an easier way of dealing with this than the xy, zw
>> swizzle form.
>>     
>
> Ok. I didn't feel strongly either way, but now I'm more convinced that
> restricting to xy and zw swizzles is less painful. Thanks for explaining this,
> Zack.
>
>   
Zack,


1. Do I understand correctly that, while

D_ADD dst.xy, src1.xy, src2.zw

performs one double addition, the following code

D_ADD dst, src1, src2.zwxy

is also valid, and performs two double additions?

2. Is the list of double-precision opcodes proposed by Igor roughly
enough for an OpenCL implementation?
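
For reference, the semantics assumed in question 1 can be sketched in C as
follows. This is only a minimal sketch, not Mesa code: reg128, store_pair,
load_pair, fetch_swizzled and d_add are hypothetical names, and it assumes that
a double occupies either the xy or the zw channel pair of a 4x32bit register,
with source swizzles applied per 32bit channel.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128bit register: four 32bit channels x, y, z, w. */
typedef struct { uint32_t chan[4]; } reg128;

enum { X, Y, Z, W };                            /* channel indices */
enum { PAIR_XY = 1 << 0, PAIR_ZW = 1 << 1 };    /* destination pair mask */

_Static_assert(sizeof(double) == 2 * sizeof(uint32_t),
               "a double must span exactly two 32bit channels");

/* Store one double into channel pair 0 (xy) or 1 (zw). */
static void store_pair(reg128 *r, int pair, double d)
{
    memcpy(&r->chan[2 * pair], &d, sizeof d);
}

/* Load the double held in channel pair 0 (xy) or 1 (zw). */
static double load_pair(const reg128 *r, int pair)
{
    double d;
    memcpy(&d, &r->chan[2 * pair], sizeof d);
    return d;
}

/* Fetch the double that a source swizzle routes to destination pair `pair`. */
static double fetch_swizzled(const reg128 *r, const int swz[4], int pair)
{
    uint32_t words[2] = { r->chan[swz[2 * pair]], r->chan[swz[2 * pair + 1]] };
    double d;
    memcpy(&d, words, sizeof d);
    return d;
}

/* Assumed D_ADD: one double add per enabled destination pair. */
static void d_add(reg128 *dst, unsigned pair_mask,
                  const reg128 *a, const int a_swz[4],
                  const reg128 *b, const int b_swz[4])
{
    double result[2];

    /* Compute both sums first so dst may alias a source register. */
    for (int p = 0; p < 2; p++)
        result[p] = fetch_swizzled(a, a_swz, p) + fetch_swizzled(b, b_swz, p);

    /* Write back only the pairs selected by the mask. */
    for (int p = 0; p < 2; p++)
        if (pair_mask & (1u << p))
            store_pair(dst, p, result[p]);
}
```

With src1 holding 1.5 in xy and 2.5 in zw, and src2 holding 10.0 in xy and
20.0 in zw, the first form (mask PAIR_XY, src2 swizzle zwxy) writes 21.5 to
dst.xy and leaves dst.zw untouched. The second form (both pairs enabled)
additionally writes 12.5 to dst.zw.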

Thanks.
