On Thu, May 5, 2011 at 12:19 AM, Xinliang David Li <davi...@google.com> wrote:
>>
>> I can think of some more-or-less obvious high-level forms, one would
>> for example simply stick a new DISPATCH tree into gimple_call_fn
>> (similar to how we can have OBJ_TYPE_REF there), the DISPATCH
>> tree would be of variable length, first operand the selector function
>> and further operands function addresses.  That would keep the
>> actual call visible (instead of a fake __builtin_dispatch call), something
>> I'd really like to see.
>
> This sounds like a good long term solution.

Thinking about it again: maybe, similar to OBJ_TYPE_REF, have the
selection itself lowered and only keep the set of functions as
additional info.  That is, instead of having the selector function as
the first operand, have a pointer to the selected function there (that
also avoids too much knowledge about the return value of the selector).
Thus,

  sel = selector ();
  switch (sel)
    {
    case A: fn = &bar; break;
    case B: fn = &foo; break;
    }
  val = (*DISPATCH (fn, bar, foo)) (...);

That way regular optimizations can apply to the selection and
eventually discard the dispatch if fn becomes a known direct function
(similar to devirtualization).  At expansion time the call address is
simply taken from the first operand and an indirect call is assembled.

Does the above still provide enough knowledge for the IPA path isolation?

>> Restricting ourselves to use the existing target attribute at the
>> beginning (with a single, compiler-generated selector function)
>> is probably good enough to get a prototype up and running.
>> Extending it to arbitrary selector-function, value pairs using a
>> new attribute is then probably easy (I don't see the exact use-case
>> for that yet, but I suppose it exists if you say so).
>
> For the use cases, CPU model will be looked at instead of just the
> core architecture -- this will give us more information about the
> number of cores, size of caches etc.  Intel's runtime library does
> this checking at start up time so that the multi-versioned code can
> look at those and make the appropriate decisions.
>
> It will be even more complicated for arm processors -- which can have
> the same processor cores but configured differently w.r.t VFP, NEON
> etc.

Ah, indeed.  I hadn't thought about the tuning for different variants
as opposed to enabling HW features.

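For illustration, the lowered selection sketched earlier in this message
(sel / switch / fn) can be approximated in plain C.  This is only a rough
sketch under assumptions: DISPATCH is a proposed tree code with no
source-level spelling, so only the indirect call is shown, and the names
impl_fn, select_impl and call_dispatched, the enum, and the int
signatures are made up for the example.

```c
/* Hypothetical selector values; A and B mirror the case labels above.  */
enum sel { A, B };

/* Two stand-in versions of the same function (bodies are invented).  */
static int bar (int x) { return x + 1; }
static int foo (int x) { return x * 2; }

/* Stand-in for the selector; a real one would inspect the CPU.  */
static enum sel selector (void) { return B; }

typedef int (*impl_fn) (int);

/* The lowered selection: an ordinary switch that yields a function
   pointer, so regular optimizations can see and simplify it.  */
static impl_fn select_impl (void)
{
  impl_fn fn = bar;             /* assumed default version */
  switch (selector ())
    {
    case A: fn = bar; break;
    case B: fn = foo; break;
    }
  return fn;
}

/* The call site.  In the proposal this would read
   (*DISPATCH (fn, bar, foo)) (x); plain C can only express the
   indirect call through the first operand.  */
int call_dispatched (int x)
{
  impl_fn fn = select_impl ();
  return (*fn) (x);
}
```

If the selector result becomes a compile-time constant, an optimizer can
fold the switch and turn the indirect call into a direct one, which is
the devirtualization-like effect described above.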
So the interface for overloading would be something like

  enum X { Foo = 0, Bar = 5 };
  enum X select () { return Bar; }
  void foo (void) __attribute__((dispatch(select, Bar)));

which either means having pairs of function / select return value in
the DISPATCH operands or having it partly lowered as I outlined above.

>> For the overloading to work we probably have to force that the
>> functions are local (so we can mangle them arbitrarily) and that
>> if the function should be visible externally people add an
>> externally visible dispatcher (foo in the above example would be one).
>>
>
> For most of the cases, probably only the primary/default version needs
> to be publicly visible ..

Yeah.  And that one we can eventually auto-transform to use IFUNC
relocations.

Richard.
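
For reference, the "pairs of function / select return value" variant of
the dispatch(select, Bar) interface above can be emulated by hand in
plain C with a small table.  This is a sketch under assumptions, not the
proposed implementation: the dispatch attribute does not exist, the
names select_version, foo_for_Foo, foo_for_Bar and foo_versions are
hypothetical, and foo returns int here only so the chosen version is
observable.

```c
#include <stddef.h>

enum X { Foo = 0, Bar = 5 };

/* Selector from the example, renamed in this sketch so it cannot be
   confused with the POSIX select() function.  */
static enum X select_version (void) { return Bar; }

/* Local versions; the compiler would be free to mangle these.  */
static int foo_for_Foo (void) { return 1; }
static int foo_for_Bar (void) { return 2; }

/* One (selector value, function) pair per version.  */
struct dispatch_entry
{
  enum X value;
  int (*fn) (void);
};

static const struct dispatch_entry foo_versions[] = {
  { Foo, foo_for_Foo },
  { Bar, foo_for_Bar },
};

/* Externally visible dispatcher: call the version whose value matches
   the selector result, else fall back to the first entry.  */
int foo (void)
{
  enum X sel = select_version ();
  size_t n = sizeof foo_versions / sizeof foo_versions[0];
  for (size_t i = 0; i < n; i++)
    if (foo_versions[i].value == sel)
      return foo_versions[i].fn ();
  return foo_versions[0].fn ();
}
```

Only foo needs external linkage in this arrangement, which matches the
point that the primary/default entry is the one that has to be publicly
visible.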