On Jul 23, 2012, at 1:24 PM, Dan Gohman wrote:

> 
> On Jul 23, 2012, at 11:34 AM, Tanya Lattner <[email protected]> wrote:
> 
>> 
>> On Jul 19, 2012, at 11:51 AM, Dan Gohman wrote:
>> 
>>> 
>>> On Jul 18, 2012, at 6:51 PM, John McCall <[email protected]> wrote:
>>> 
>>>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
>>>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
>>>>>> Hi Tanya,
>>>>>> Looks good and useful, but I'm not sure whether it should be clang's 
>>>>>> decision that storing and loading vec4s is better than vec3.
>>>>> 
>>>>> The idea was to have Clang generate code that the optimizers would be 
>>>>> more likely to do something useful and smart with. I understand the 
>>>>> concern, but I'm not sure where the best place for this would be then?
>>>> 
>>>> Hmm.  The IR size of a <3 x blah> is basically the size of a <4 x blah> 
>>>> anyway;  arguably the backend already has all the information it needs for 
>>>> this.  Dan, what do you think?
>>> 
>>> I guess optimizer passes won't be extraordinarily happy about all this
>>> bitcasting and shuffling. It seems to me that we have a problem in that
>>> we're splitting up the high-level task of "lower <3 x blah> to <4 x blah>"
>>> and doing some of it in the front-end and some of it in the backend.
>>> Ideally, we should do it all in one place, for conceptual simplicity, and
>>> to avoid the awkwardness of having the optimizer run in a world that's
>>> half one way and half the other, with awkward little bridges between the
>>> two halves.
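
For concreteness, here is a sketch of the kind of IR being discussed (illustrative only; value names and the exact shuffle pattern are assumptions, not taken from the patch). The front-end widens a 12-byte vec3 store into a single 16-byte vec4 store via a shuffle and a pointer bitcast:

```llvm
; Naive lowering: a <3 x float> store
store <3 x float> %v, <3 x float>* %p, align 16

; Widened lowering: pad to <4 x float> (lane 3 is undef),
; bitcast the pointer, and issue one 16-byte store
%wide = shufflevector <3 x float> %v, <3 x float> undef,
                      <4 x i32> <i32 0, i32 1, i32 2, i32 undef>
%p4 = bitcast <3 x float>* %p to <4 x float>*
store <4 x float> %wide, <4 x float>* %p4, align 16
```

The bitcast and shufflevector in the second form are the "awkward little bridges" referred to above: they carry the vec3-to-vec4 decision in the IR itself rather than leaving it to the backend.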
>> 
>> I think it's hard to speculate that the optimizer passes are unhappy about 
>> the bitcast and shuffling. I'm running with optimizations on, and the code 
>> is still much better than when Clang doesn't do this "optimization" for vec3.
> 
> Sorry for being unclear; I was speculating more about future optimization
> passes. I don't doubt your patch achieves its purpose today.
> 
>> I strongly feel that Clang can make the decision to output code like this if 
>> it leads to better code in the end. 
> 
> Ok. What do you think about having clang do all of the lowering
> of <3 x blah> to <4 x blah>, then? I mean all of the arithmetic,
> function arguments and return values, and so on? In other words, is
> there something special about loads and stores of vec3, or are they
> just one symptom of a broader vec3 problem?
> 

For function args and return values, the calling convention will coerce the 
types (on X86). I haven't had time to verify completely, but I think that 
arithmetic is handled correctly in the backend via widening. So it's mostly this 
one issue that we are trying to address. 
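
Roughly, the backend widening mentioned here works like this (a sketch of the effect of SelectionDAG type legalization, not literal IR the compiler prints; names are illustrative). On a target whose narrowest legal FP vector type is v4f32:

```llvm
; Source-level vec3 arithmetic as emitted by the front-end:
%sum = fadd <3 x float> %a, %b

; What the widening legalizer effectively computes: the same
; operation on <4 x float> operands, where lane 3 is a don't-care
%sum4 = fadd <4 x float> %a4, %b4
```

Because arithmetic legalizes cleanly this way, only the memory operations (loads and stores) remain as the problematic case.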

While it may still be a good idea for the backends to optimize situations such 
as this, I think it's still ok for Clang to go ahead and effectively widen the 
vector when doing its code generation, since it is a win for most targets 
(an assumption, as I can't test them all). vec3 is pretty important for the 
OpenCL community and we'd like it to have good performance. 

Does anyone have a firm objection to this going in? I realize that all backends 
could be modified to try to handle this, but I don't see this happening in the 
near future. 

-Tanya



> Of course, I'm not asking you do this work right now; I'm asking
> whether this would be a better overall design.
> 
> Dan
> 

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
