On Thu, May 5, 2011 at 12:19 AM, Xinliang David Li <davi...@google.com> wrote:
>>
>> I can think of some more-or-less obvious high-level forms, one would
>> for example simply stick a new DISPATCH tree into gimple_call_fn
>> (similar to how we can have OBJ_TYPE_REF there), the DISPATCH
>> tree would be of variable length, first operand the selector function
>> and further operands function addresses.  That would keep the
>> actual call visible (instead of a fake __builtin_dispatch call), something
>> I'd really like to see.
>
> This sounds like a good long term solution.

Thinking about it again, maybe, similar to OBJ_TYPE_REF, we should have the
selection itself lowered and only keep the set of functions as
additional info.  Thus instead of having the selector function as the
first operand, have a pointer to the selected function there (that also
avoids needing too much knowledge about the return value of the selector).
Thus,

  sel = selector ();
  switch (sel)
   {
   case A: fn = &bar; break;
   case B: fn = &foo; break;
   }
  val = (*DISPATCH (fn, bar, foo)) (...);

That way regular optimizations can apply to the selection and can eventually
discard the dispatch if fn becomes a known direct function (similar
to devirtualization).  At expansion time the call address is simply
taken from the first operand and an indirect call is assembled.

Does the above still provide enough knowledge for the IPA path isolation?

>> Restricting ourselves to use the existing target attribute at the
>> beginning (with a single, compiler-generated selector function)
>> is probably good enough to get a prototype up and running.
>> Extending it to arbitrary (selector function, value) pairs using a
>> new attribute is then probably easy (I don't see the exact use-case
>> for that yet, but I suppose it exists if you say so).
>
> For the use cases, the CPU model will be looked at instead of just the
> core architecture -- this will give us more information about the
> number of cores, size of caches etc. Intel's runtime library does
> this checking at start-up time so that the multi-versioned code can
> look at those and make the appropriate decisions.
>
> It will be even more complicated for ARM processors -- which can have
> the same processor cores but configured differently w.r.t. VFP, NEON
> etc.
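A minimal sketch of the start-up check described above, assuming a GCC/Clang-style constructor attribute. The `cpu_model` record, its fields and the hard-coded detection values are hypothetical; they are not the layout used by Intel's runtime library or any other real implementation.

```c
/* Hypothetical CPU-model record, filled once before main runs.  */
struct cpu_model
{
  int num_cores;
  int l2_cache_kb;
  int has_neon;      /* e.g. an ARM core configured with NEON */
};

static struct cpu_model the_cpu;

/* Stub detection with fixed values; a real implementation would
   query CPUID, getauxval (AT_HWCAP) or similar.  */
__attribute__((constructor))
static void detect_cpu_model (void)
{
  the_cpu.num_cores = 4;
  the_cpu.l2_cache_kb = 256;
  the_cpu.has_neon = 0;
}

/* Generic version.  */
static long sum_scalar (const int *v, int n)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Pretend "tuned" version: same result, two accumulators.  */
static long sum_unrolled (const int *v, int n)
{
  long s0 = 0, s1 = 0;
  int i = 0;
  for (; i + 1 < n; i += 2)
    {
      s0 += v[i];
      s1 += v[i + 1];
    }
  if (i < n)
    s0 += v[i];
  return s0 + s1;
}

/* Multi-versioned entry point: consults the record filled in at
   start-up instead of re-detecting on every call.  */
long sum (const int *v, int n)
{
  if (the_cpu.has_neon || the_cpu.l2_cache_kb >= 256)
    return sum_unrolled (v, n);
  return sum_scalar (v, n);
}
```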

Ah, indeed.  I hadn't thought about the tuning for different variants
as opposed to enabling HW features.  So the interface for overloading
would be something like

enum X { Foo = 0, Bar = 5 };

enum X select () { return Bar; }

void foo (void) __attribute__((dispatch(select, Bar)));

which either means having pairs of function / select return value in
the DISPATCH operands or having it partly lowered as I outlined
above.
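One way to picture the (function, select return value) pairs in plain C, as a hypothetical sketch: the dispatch attribute does not exist, so the pairs are written out as a table and the externally visible `foo` does the lookup by hand. The selector is renamed `select_x` to avoid clashing with POSIX `select ()`, and the versions return `int` only so the choice is observable.

```c
#include <stddef.h>

/* Enum from the example above; selector renamed to select_x.  */
enum X { Foo = 0, Bar = 5 };
static enum X select_x (void) { return Bar; }

/* Hypothetical versions; foo_default is the unannotated fallback.  */
static int foo_default (void) { return 0; }
static int foo_bar (void) { return 5; }

/* One (selector value, function) pair per DISPATCH operand.  */
struct dispatch_pair
{
  enum X value;
  int (*fn) (void);
};

static const struct dispatch_pair foo_versions[] = {
  { Bar, foo_bar },
};

/* Externally visible dispatcher: run the selector, then call the
   first version whose recorded value matches.  */
int foo (void)
{
  enum X v = select_x ();
  for (size_t i = 0; i < sizeof foo_versions / sizeof foo_versions[0]; i++)
    if (foo_versions[i].value == v)
      return foo_versions[i].fn ();
  return foo_default ();
}
```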

>> For the overloading to work we probably have to force that the
>> functions are local (so we can mangle them arbitrarily) and that
>> if the function should be visible externally people add an
>> externally visible dispatcher (foo in the above example would be one).
>>
>
> For most of the cases, probably only the primary/default version needs
> to be publicly visible.

Yeah.  And that one we eventually can auto-transform to use IFUNC
relocations.
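A sketch of the split discussed above, with the versions kept `static` (so they can be mangled freely) and only the primary entry point exported. Resolving the pointer once in a start-up constructor is a stand-in for what an IFUNC resolver does at relocation time; `scale`, `resolve_scale` and `have_fast_path` are hypothetical names, and the constructor attribute is a GCC/Clang extension.

```c
typedef int (*scale_fn) (int);

/* Local versions: static, so not exported and freely mangleable.  */
static int scale_generic (int x) { return x * 2; }
static int scale_tuned (int x) { return x + x; }

/* Hypothetical feature test; always reports the fast path here.  */
static int have_fast_path (void) { return 1; }

/* Resolver: picks a version.  With IFUNC the dynamic loader would
   call a function like this while processing relocations.  */
static scale_fn resolve_scale (void)
{
  return have_fast_path () ? scale_tuned : scale_generic;
}

/* Safe default in case scale is called before the constructor.  */
static scale_fn scale_impl = scale_generic;

__attribute__((constructor))
static void init_scale (void)
{
  scale_impl = resolve_scale ();
}

/* The only externally visible symbol: the primary/default entry.  */
int scale (int x)
{
  return scale_impl (x);
}
```

On ELF targets GCC's `ifunc` attribute lets the dynamic loader call the resolver itself, replacing the hand-written pointer variable with a relocation; the version above only shows the shape of the transformation.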

Richard.
