On Fri, Jul 27, 2012 at 10:42 AM, Tanya Lattner <[email protected]> wrote:
>
> On Jul 27, 2012, at 2:41 AM, Hal Finkel wrote:
>
>> On Mon, 23 Jul 2012 13:14:07 -0700
>> Tanya Lattner <[email protected]> wrote:
>>
>>>
>>> On Jul 18, 2012, at 6:51 PM, John McCall wrote:
>>>
>>>> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
>>>>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
>>>>>> Hi Tanya,
>>>>>> Looks good and useful, but I'm not sure it should be Clang's
>>>>>> decision whether storing and loading vec4s is better than vec3.
>>>>>
>>>>> The idea was to have Clang generate code that the optimizers would
>>>>> be more likely to do something useful and smart with. I understand
>>>>> the concern, but I'm not sure where the best place for this would
>>>>> be then?
>>>>
>>>> Hmm. The IR size of a <3 x blah> is basically the size of a
>>>> <4 x blah> anyway; arguably the backend already has all the
>>>> information it needs for this. Dan, what do you think?
>>>>
>>>> One objection to doing this in the frontend is that it's not clear
>>>> to me that this is a transformation we should be doing if
>>>> <4 x blah> isn't actually legal for the target. But I'm amenable
>>>> to the idea that this belongs here.
>>>>
>>>
>>> I do not think it's Clang's job to care about this, as we already
>>> have this problem for other vector sizes and it's target lowering's
>>> job to fix it.
>>>
>>>> I'm also a little uncomfortable with this patch because it's so
>>>> special-cased to 3. I understand that that might be all that
>>>> OpenCL really cares about, but it seems silly to add this code that
>>>> doesn't also kick in for, say, <7 x i16> or whatever. It really
>>>> shouldn't be difficult to generalize.
>>>
>>> While it could be generalized, I am only 100% confident in the
>>> codegen for vec3, as I know for sure that it improves the code
>>> quality that is ultimately generated.
>>> This is also thoroughly tested by our OpenCL compiler, so I am
>>> confident we are not breaking anything and we are improving
>>> performance.
>>
>> At the request of several people, I recently enhanced the BB
>> vectorizer to produce odd-sized vector types (when possible). This
>> was for two reasons:
>>
>> 1. Some targets actually have instructions for length-3 vectors
>> (mostly for doing things on x,y,z triples), and they wanted
>> autovectorization support for these.
>>
>> 2. This is generally a win elsewhere as well, because the odd-length
>> vectors will be promoted to even-length vectors (which is good
>> compared to leaving scalar code).
>>
>> In this context, I am curious to know how generating length-4 vectors
>> in the frontend gives better performance. Is this something that the
>> BB vectorizer (or any other vectorizer) should be doing as well?
>>
>
> While I think you are doing great work with your vectorizer, it's not
> something we can rely on yet to get back this performance, since it's
> not on by default.
>
> A couple more comments about my patch. First, vec3 is an OpenCL type
> that is defined in the spec to have the same size and alignment as
> vec4, so generating this code pattern for loads and stores makes sense
> in that context. I could enable this only for OpenCL, but I think
> anyone who uses vec3 would want this performance win.
>
> Secondly, I'm not arguing that this is the ultimate fix, but it is a
> valid, simple one, and easy to remove once other areas of the compiler
> are improved to handle this case.
>
> After discussing with others, the proper fix might be something like
> this:
> 1. Enhance target data to encode the "native" vector types for the
> target, along the same lines as the "native" integer types that it
> can already encode.
> 2. Enhance the IR optimizer to force vectors to those types when it
> can get away with it.
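For context on point 1, LLVM's target datalayout string already declares a target's "native" integer widths via the `n` specifier, which is the mechanism the proposal would extend to vectors. A simplified x86-64 example (the exact string varies by target and LLVM version):

```llvm
; "n8:16:32:64" declares 8-, 16-, 32-, and 64-bit integers as natively
; supported; the proposal would add an analogous specifier for the
; target's native vector widths.
target datalayout = "e-p:64:64:64-i64:64:64-f80:128:128-n8:16:32:64-S128"
```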
>
> This optimization needs to happen early enough that other mid-level
> optimizations can take advantage of it.
>
> I'd still like to check this in, as no alternative solution exists
> right now. Because there is some debate, I think the code owner needs
> to have the final say here (cc-ing Doug).
I think it makes sense for this to go in. If our optimizers improve to
the point where they can handle vec3 directly as well as, or better
than, vec3-lowered-as-vec4, we can revisit this topic.

- Doug

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
