On Mon, 12 Mar 2012 04:25:54 -0500, Manu <turkey...@gmail.com> wrote:
On 12 March 2012 04:00, Robert Jacques <sandf...@jhu.edu> wrote:

On Sun, 11 Mar 2012 18:15:31 -0500, Timon Gehr <timon.g...@gmx.ch> wrote:

 On 03/11/2012 11:58 PM, Robert Jacques wrote:

Manu was arguing that MRV were somehow special and had mystical
optimization potential. That's simply not true.


Not exactly mystical, but it is certainly there.

void main(){
    auto a = foo(); // MRV/struct return
    bar(&a.x); // defined in a different compilation unit
}

struct return has to write out the whole struct on the stack because of
layout guarantees, probably making the optimized struct return calling
convention somewhat slower for this case. The same does not hold for MRV.


The layout of the struct only has to exist _when_ the address is taken.
Before that, the compiler/language/optimizer is free to (and does) do
whatever it wants. Besides, in your example only the address of a field is
taken, so the compiler will optimize away all the other pieces of a (dead
variable elimination).


No, it can't. That's the point. It must preserve the struct in case you
fiddle with the pointer. Taking the pointer is explicit in this case, but
if you passed anything in the struct to another function by ref, you've
set up the same scenario.
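
To make the disagreement concrete, here is a minimal C sketch (C because the C ABI is what is being argued about; struct pair and the body of bar are hypothetical, not taken from the thread). Because x is the first member, C permits converting a pointer to it back into a pointer to the whole struct, so a callee that only receives &a.x can still reach a.y. A compiler that cannot see the body of bar therefore has to keep the complete struct laid out in memory across the call.

```c
#include <assert.h>
#include <stddef.h>

struct pair { int x; int y; };

/* Returns a small struct by value (the "MRV/struct return" case). */
struct pair foo(void) {
    struct pair p = {1, 2};
    return p;
}

/* Stand-in for a function defined in a different compilation unit.
   It receives only the address of the field x, yet it can reach y:
   a pointer to the first member may be converted to a pointer to
   the enclosing struct. */
void bar(int *px) {
    assert(px != NULL);
    struct pair *whole = (struct pair *)px;
    whole->y += *px;
}

/* Mirrors the main() from the example above. */
int caller(void) {
    struct pair a = foo();
    bar(&a.x);
    return a.y; /* observes the write made through the field pointer */
}
```

If bar were visible to the optimizer and never looked past x, the other fields could be dropped, which is roughly the situation the dead-variable argument assumes.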

Okay, to be clear about things, once a struct is returned the optimizer can do
anything it wants to it. Certain compilers are extremely aggressive about this
because on their hardware it matters. C and C++ compilers do this today, so
yes, compilers can.

Wait, ARM?! That's really cool. However, as far as I know, D on ARM is very
experimental. Having an experimental compiler not eke out every last cycle
is not something that should be unexpected.

That said, I'm not sure what point you were trying to make, aside from
backend quality-of-implementation issues. I think bringing these issues up
is important, but they are tangential to the language changes you're asking
for.


This is using GCC's backend which is not really experimental, it has
decades of field use. The point here is that we are seeing the effect of
the C ABI applied directly to this problem, and it's completely un-workable.
I'm trying to show that D needs to declare something of an ABI promise when
applied to this problem if it is to be a useful+efficient feature. Again, C
can't express this problem, and we won't get any value from the C ABI to
make this construct efficient, but a very simple and efficient solution does
exist.

GCC is a very large collection of things and its backend has a general reputation
of being second place to the commercial vendors by a decent margin (25+%) and I
think also to LLVM. I was more referring to GDC's mapping to the GCC ARM
backend and the associated runtime issues, etc.

As for a simple and efficient solution existing: show me an academic paper or
compiler that gets it right. Then show me the study on a large codebase that
it's actually more efficient. Then we will listen. Until then, I'm liable to
trust existing wisdom.

Why should D place this constraint on future compilers? D currently only
specifies the ABI for x86. I'm fairly sure it would follow the best
practices for each of the other architectures, but none of them have been
established yet.


Constraint? Perhaps you mean 'liberation'...
The x86 ABI is not a *best* practice by a long shot. It is only banking on
a traditional x86 trick for small structs.

Let us assume for a moment that the x86 design is good for x86, but terrible
for ARM and vice versa. Why should either backend do something subpar for the
other? Generating code for an in-order (IOE) CPU vs. an out-of-order (OOE) CPU
vs. a stack machine vs. a register machine are all very different operations,
and the backend should have the liberty to do whatever is best.

I was giving you an example that seemed to satisfy your complaints. And
no, actually it can't return in those registers at zero cost. There is a
reason why we don't use all the registers to both pass and return
arguments: we need some registers free to work on them both before and
after the call.


"D should define an MRV ABI which is precisely the ABI for passing multiple
args TO a function, but in reverse, for any given architecture." .. I've
never said anything about using ALL the registers, I say to use all the
ARGUMENT registers.
On x64, that is 4 GPR regs, and 4 XMM regs.

The point is that increasing the number of return registers isn't free and that
simply matching the best number of argument registers is not, ipso facto, ideal.


I know Go has MRV. What does its ABI look like? What does ARM prefer? I'd
recommend citing some papers or a compiler or something. Otherwise, it
looks like you're ignoring the wisdom of the masses or simply ignorant.


I don't have a Go toolchain, do you wanna run my tests above?
Are you suggesting I have no idea what I'm talking about with respect to
efficient calling conventions? The very fastest way is to return in the
registers designed for the job. This is true for x64, ARM, everything. What
to do when you exceed the argument register limit is a question for each
architecture, but I maintain it should behave exactly as it does when
calling a function, this way you create the possibility of super-efficient
chain-calls.
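
The chain-call argument can be illustrated with a small C sketch (gun, fun, and struct two are hypothetical names, not from the thread). Under the System V x86-64 C ABI, a struct of two integer-sized fields such as this one is returned in RAX and RDX, whereas the first two integer arguments are passed in RDI and RSI. Forwarding the result of gun to fun therefore costs register moves at the call site unless the calls are inlined. Under the proposal above, the return registers would coincide with the argument registers and those moves would not be needed.

```c
#include <assert.h>

/* Two values returned together; with 8-byte longs this struct is
   small enough to come back in registers under System V x86-64. */
struct two { long a; long b; };

struct two gun(void) {
    struct two r = {40, 2};
    return r;
}

long fun(long a, long b) {
    return a + b;
}

/* The fun(gun()) pattern, written out by hand because C has no
   syntax for spreading a struct into an argument list. */
long chain(void) {
    /* Sanity check of the layout assumption: no padding between fields. */
    assert(sizeof(struct two) == 2 * sizeof(long));
    struct two r = gun();
    return fun(r.a, r.b);
}
```

Whether the saved moves matter in practice is exactly what is disputed in this thread; the sketch only shows where they would occur.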

But the return itself _isn't_ the core measure of the performance of the ABI;
if the returning function has to evacuate some of those return registers in
order to compute the other return values, then it has to unnecessarily copy
the evacuated values back into the registers. Similarly, if the function
receiving the values has to evacuate any of the return registers in order to
compute the next value, unnecessary copies happen.

LLVM has support for MRV as I describe:

The biggest change in LLVM 2.3 is Multiple Return Value (MRV) support. MRVs
allow LLVM IR to directly represent functions that return multiple values
without having to pass them "by reference" in the LLVM IR. This allows a
front-end to generate more efficient code, *as MRVs are generally returned
in registers if a target supports them*. See the LLVM IR Reference
<http://llvm.org/releases/2.3/docs/LangRef.html#i_getresult> for more details.

Thanks for looking this up. From the reference:

  %struct.A = type { i32, i8 }
  %r = call %struct.A @foo()
  %gr = getresult %struct.A %r, 0    ; yields i32:%gr
  %gr1 = getresult %struct.A %r, 1   ; yields i8:%gr1
  add i32 %gr, 42
  add i8 %gr1, 41

and

The 'getresult' instruction takes a call or invoke value as its first argument, 
or an undef value. The value must have structure type. The second argument is a 
constant unsigned index value which must be in range for the number of values 
returned by the call.

It would appear that LLVM implements MRV via structs. Furthermore, I'm not positive on 
what they mean by "by reference", but I know some languages implement MRV using 
arrays.

MRVs are fully supported in the LLVM IR, but are not yet fully supported
on all targets. However, it is generally safe to return up to 2 values from
a function: most targets should be able to handle at least that. MRV
support is a critical requirement for X86-64 ABI support, as X86-64
requires the ability to return multiple registers from functions, and we
use MRVs to accomplish this *in a direct way*.

In this case, if we have the expression defined in the language (the other
guys have convinced me we do, via tuples), it's conceivable the front end
could present it to LLVM in such a way that it can produce great code
already.

Digging into the x86-64 ABI, what they are talking about is the ability to 
support the two return GPR and two return XMM; i.e. the C ABI. Also, 
documentation on this changes between different systems.
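
As a rough illustration of that C ABI rule (a sketch; struct mixed and split are hypothetical names): on System V x86-64 each 8-byte half of a small struct is classified separately, so a struct holding one integer and one double is returned in RAX and XMM0 without going through memory. Other ABIs differ. The Microsoft x64 convention, for example, returns a 16-byte struct like this one through a hidden pointer.

```c
#include <assert.h>

/* One integer field and one floating-point field, 8 bytes each. */
struct mixed { long long whole; double frac; };

/* Splits v into its integer and fractional parts and returns both. */
struct mixed split(double v) {
    /* Precondition: keeps the conversion to long long well defined. */
    assert(v > -1e15 && v < 1e15);
    struct mixed r;
    r.whole = (long long)v;          /* truncates toward zero */
    r.frac = v - (double)r.whole;
    return r;
}
```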

P.S. The fun(gun()) case is interesting, but it seems like a corner case.
Designing the ABI around it feels wrong, if it hurts performance elsewhere.


It's certainly not the goal of the feature, just a nice little side effect.
And the MRV feature itself certainly doesn't hurt anything else
anywhere...
The whole thing is a feature that is missing from the C ABI, because C
simply can't express the concept, so there's never been a reason to define
it.

Except, the documentation you've linked to _is_ the C ABI.
