On Fri, Jul 27, 2012 at 10:42 AM, Tanya Lattner <[email protected]> wrote:
>
> On Jul 27, 2012, at 2:41 AM, Hal Finkel wrote:
>
>> On Mon, 23 Jul 2012 13:14:07 -0700
>> Tanya Lattner <[email protected]> wrote:
>>
>>>
>>> On Jul 18, 2012, at 6:51 PM, John McCall wrote:
>>>
>>>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
>>>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
>>>>>> Hi Tanya,
>>>>>> Looks good and useful, but I'm not sure it should be Clang's
>>>>>> decision whether storing and loading vec4s is better than vec3.
>>>>>
>>>>> The idea was to have Clang generate code that the optimizers would
>>>>> be more likely to do something useful and smart with. I understand
>>>>> the concern, but I'm not sure where the best place for this would
>>>>> be then?
>>>>
>>>> Hmm.  The IR size of a <3 x blah> is basically the size of a <4 x
>>>> blah> anyway; arguably the backend already has all the information
>>>> it needs for this.  Dan, what do you think?
>>>>
>>>> One objection to doing this in the frontend is that it's not clear
>>>> to me that this is a transformation we should be doing if <4 x
>>>> blah> isn't actually legal for the target.  But I'm amenable to the
>>>> idea that this belongs here.
>>>>
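John's size observation can be sketched numerically. The Python model below assumes LLVM's common default rule, where a vector's in-memory (alloc) size is its store size rounded up to the next power of two; actual targets can override this through their datalayout string, so treat the numbers as an illustration, not a guarantee.

```python
# Sketch of how a DataLayout-style rule sizes vector types
# (assumption: alloc size = store size rounded up to a power of two).

def store_size_bytes(num_elems, elem_bytes):
    """Bytes actually written by a store of the vector."""
    return num_elems * elem_bytes

def next_pow2(n):
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def alloc_size_bytes(num_elems, elem_bytes):
    """Bytes the vector occupies in memory under the assumed rule."""
    return next_pow2(store_size_bytes(num_elems, elem_bytes))

# <3 x float>: stores 12 bytes but occupies 16 in memory --
# the same footprint as <4 x float>.
print(store_size_bytes(3, 4), alloc_size_bytes(3, 4))  # 12 16
print(store_size_bytes(4, 4), alloc_size_bytes(4, 4))  # 16 16
```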
>>>
>>> I do not think it's Clang's job to care about this; we already have
>>> this problem for other vector sizes, and it's target lowering's job
>>> to fix it.
>>>
>>>> I'm also a little uncomfortable with this patch because it's so
>>>> special-cased to 3.  I understand that that might be all that
>>>> OpenCL really cares about, but it seems silly to add this code that
>>>> doesn't also kick in for, say, <7 x i16> or whatever.  It really
>>>> shouldn't be difficult to generalize.
>>>
>>> While it could be generalized, I am only 100% confident in the
>>> codegen for vec3, as I know for sure that it improves the quality of
>>> the code that is ultimately generated. This is also thoroughly tested
>>> by our OpenCL compiler, so I am confident we are not breaking anything
>>> and we are improving performance.
>>
>> On the request of several people, I recently enhanced the BB vectorizer
>> to produce odd-sized vector types (when possible). This was for two
>> reasons:
>>
>> 1. Some targets actually have instructions for length-3 vectors
>> (mostly for doing things on x,y,z triples), and they wanted
>> autovectorization support for these.
>>
>> 2. This is generally a win elsewhere as well because the odd-length
>> vectors will be promoted to even-length vectors (which is good
>> compared to leaving the code scalar).
>>
>> In this context, I am curious to know how generating length-4 vectors
>> in the frontend gives better performance. Is this something that the BB
>> vectorizer (or any other vectorizer) should be doing as well?
>>
>
> While I think you are doing great work with your vectorizer, it's not 
> something we can rely on yet to get back this performance, since it's not 
> on by default.
>
> A couple more comments about my patch. First, vec3 is an OpenCL type, and 
> the spec defines it to have the same size and alignment as vec4. So 
> generating this code pattern for loads/stores makes sense in that context. 
> I could enable this only for OpenCL, but I think anyone who uses vec3 
> would want this performance win.
>
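As a rough illustration (a toy model, not the actual Clang IRGen code), the lowering the patch performs amounts to: widen a vec3 store into a vec4 store whose fourth lane is undefined, and turn a vec3 load into a vec4 load followed by a shuffle back down to three lanes. Because OpenCL gives vec3 the size and alignment of vec4, the extra lane is always within the object:

```python
# Toy model of the vec3-as-vec4 load/store lowering.
# Memory is modeled as a flat list of 4-byte lanes.

UNDEF = None  # stands in for LLVM's undef in the padding lane

def store_vec3_as_vec4(memory, addr, v3):
    # shufflevector <3 x float> -> <4 x float> (undef fourth lane),
    # then a single 4-lane (16-byte) store
    memory[addr:addr + 4] = [v3[0], v3[1], v3[2], UNDEF]

def load_vec3_as_vec4(memory, addr):
    # a single 4-lane (16-byte) load, then
    # shufflevector <4 x float> -> <3 x float>
    v4 = memory[addr:addr + 4]
    return v4[:3]

mem = [0.0] * 8
store_vec3_as_vec4(mem, 0, [1.0, 2.0, 3.0])
print(load_vec3_as_vec4(mem, 0))  # [1.0, 2.0, 3.0]
```

The point of the wide form is that one aligned 16-byte access replaces the three scalar (or 8-byte-plus-4-byte) accesses a literal vec3 lowering tends to produce.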
> Secondly, I'm not arguing that this is the ultimate fix, but it is a valid, 
> simple one, and it is easy to remove once other areas of the compiler are 
> improved to handle this case.
>
> After discussing with others, the proper fix might be something like this:
> 1. Enhance target data to encode the "native" vector types for the target, 
> along the same lines as the "native" integer types it already supports.
> 2. Enhance the IR optimizer to force vectors to those types when it can get 
> away with it.
>
> This optimization needs to be early enough that other mid-level optimizations 
> can take advantage of it.
>
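Step 2 of that plan can be sketched as a simple length-promotion rule. The native lengths below are hypothetical placeholders for whatever the enhanced target data would report:

```python
# Sketch of promoting a vector length to the target's nearest
# "native" length (native set here is an assumed example).

def promote_length(n, native_lengths):
    """Return the smallest native length >= n, or n unchanged
    if the target has no native length that wide."""
    candidates = [w for w in sorted(native_lengths) if w >= n]
    return candidates[0] if candidates else n

NATIVE = {2, 4, 8, 16}  # hypothetical target description
print(promote_length(3, NATIVE))   # 4
print(promote_length(7, NATIVE))   # 8
print(promote_length(16, NATIVE))  # 16
```

Doing this as an IR-level pass, driven by target data, would give every frontend (not just OpenCL's vec3 path) the same benefit.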
> I'd still like to check this in, as no alternative solution exists right 
> now. Because there is some debate, I think the code owner needs to have 
> the final say here (cc-ing Doug).

I think it makes sense for this to go in. If our optimizers improve to
the point where they can handle vec3 directly as well as/better than
vec3-lowered-as-vec4, we can revisit this topic.

  - Doug

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
