Re: jhc vs ghc and the surprising result involving ghc generated assembly.

2005-11-02 Thread Florian Weimer
* Lennart Augustsson:

> Simon Marlow wrote:
>>>Is it correct that you use indirect gotos across functions?  Such
>>>gotos aren't supported by GCC and work only by accident.
>> Yes, but cross-function gotos are always to the beginning of a
>> function.
>
> Is that enough to ensure that the constant pool base register
> is reloaded on the Alpha?

Good point.  Most of the restrictions I mentioned result from the need
to update the GP register.
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: jhc vs ghc and the surprising result involving ghc generated assembly.

2005-11-02 Thread Lennart Augustsson

Simon Marlow wrote:
>> Is it correct that you use indirect gotos across functions?  Such
>> gotos aren't supported by GCC and work only by accident.
>
> Yes, but cross-function gotos are always to the beginning of a function.

Is that enough to ensure that the constant pool base register
is reloaded on the Alpha?

-- Lennart


Re: jhc vs ghc and the surprising result involving ghc generated assembly.

2005-11-02 Thread Florian Weimer
* Simon Marlow:

>> However, beginning with GCC 3.4, you can use:
>> 
>> extern void bar();
>> 
>> void foo()
>> {
>>   void (*p)(void) = bar;
>>   p();
>> }
>
> Interesting.. though I'm not sure I'm comfortable with relying on gcc's
> tail call optimisation to do the right thing.  Aren't there side
> conditions that might prevent it from kicking in?

It's a target-specific optimization.  For i386, the requirements are,
roughly speaking: (a) it works with -fPIC only for very special cases
(direct calls within the same module), (b) the return values must be
the same, (c) for indirect calls, there must be a free register
(currently, this means that regparm must be less than 3; irrelevant
if you don't pass any arguments).

AMD64 has only very few restrictions, none of which seem particularly
relevant.

ia64 may need additional hints before the optimization is performed
(non-default visibility of the target function), otherwise the
optimization is only performed within the same translation unit.
PowerPC and SPARC cannot optimize indirect calls.

Common MIPS targets should be fine.

So your concern is valid; this optimization is not always available.
It might be possible to extend GCC with something that violates the
ABI and fits your needs, though, in case the current goto hack no
longer works.
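
A minimal sketch of the tail-call pattern discussed in this message, for illustration only.  The names (code_ptr, step, done, sum_to, acc, count) are hypothetical and this is not GHC's generated code.  Each code block ends by calling the next one through a function pointer, with no arguments and the same void return type, which is the shape the sibling-call optimization can turn into a jump.  Whether it actually does so depends on the target, the GCC version and the flags, as listed above; if it does not, the code still computes the same result but uses C stack proportional to n.

```c
#include <assert.h>

/* Hypothetical names throughout; a sketch, not GHC's generated code. */
typedef void (*code_ptr)(void);

static long acc;    /* stand-in for a register holding the result */
static long count;  /* stand-in for a register holding the argument */

/* Final continuation: nothing left to do. */
static void done(void)
{
}

/* One "basic block": does a little work, then transfers control. */
static void step(void)
{
    code_ptr next;

    if (count == 0) {
        next = done;
    } else {
        acc += count;
        count--;
        next = step;
    }
    next();  /* indirect call in tail position: no arguments, void result */
}

/* Driver: sums 1..n by chaining tail calls. */
long sum_to(long n)
{
    assert(n >= 0);
    acc = 0;
    count = n;
    step();
    return acc;
}
```

Comparing the assembly from gcc -O2 -fomit-frame-pointer -S with and without optimization is one way to check whether the call in step() became a jump on a given target.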


RE: jhc vs ghc and the surprising result involving ghc generated assembly.

2005-11-02 Thread Simon Marlow
On 02 November 2005 13:59, Florian Weimer wrote:

> However, beginning with GCC 3.4, you can use:
> 
> extern void bar();
> 
> void foo()
> {
>   void (*p)(void) = bar;
>   p();
> }

Interesting.. though I'm not sure I'm comfortable with relying on gcc's
tail call optimisation to do the right thing.  Aren't there side
conditions that might prevent it from kicking in?
 
> And the indirect call is turned into a direct jump.  Tail recursive
> calls and really indirect tail calls are also optimized.  Together with
> -fomit-frame-pointer, this could give you what you need, without
> post-processing the generated assembler code (which is desirable
> because the asm volatile statements inhibit further optimization).
> 
> Is it correct that you use indirect gotos across functions?  Such
> gotos aren't supported by GCC and work only by accident.

Yes, but cross-function gotos are always to the beginning of a function.
Also, our post-processor removes the function prologue from the asm.

GHC via C has always worked "by accident" :-)  But it has worked for a
long time with careful tweaking of the post-processor (known as the
mangler) for each new version of gcc.  Yes, we're living dangerously,
and it's getting harder, but we're still alive (just).
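
For contrast with the cross-function gotos discussed above, here is a sketch of the form of indirect goto that GNU C does document: taking the address of a label with && and jumping to it with goto *, all inside a single function.  The function name run, the three opcodes and the DISPATCH macro are made up for the example.  It needs GCC or a compatible compiler, since labels as values are an extension and not standard C.

```c
#include <assert.h>
#include <stddef.h>

/* GNU C only: uses the labels-as-values extension (&&label, goto *ptr).
 * Hypothetical toy interpreter: opcode 0 = increment, 1 = double, 2 = halt.
 * Every jump target lives in the same function as the goto. */
int run(const unsigned char *prog, size_t len)
{
    static void *const table[] = { &&op_inc, &&op_dbl, &&op_halt };
    int acc = 0;
    size_t pc = 0;

/* Fetch the next opcode and jump to its label; stop at end of program. */
#define DISPATCH()                      \
    do {                                \
        if (pc >= len)                  \
            goto op_halt;               \
        assert(prog[pc] < 3);           \
        goto *table[prog[pc++]];        \
    } while (0)

    DISPATCH();
op_inc:
    acc += 1;
    DISPATCH();
op_dbl:
    acc *= 2;
    DISPATCH();
op_halt:
    return acc;
#undef DISPATCH
}
```

Jumping through such a pointer into a different function is the part that is not supported, which is why the scheme described above depends on the mangler.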

Cheers,
Simon


RE: jhc vs ghc and the surprising result involving ghc generated assembly.

2005-10-27 Thread Simon Marlow
On 27 October 2005 12:12, John Meacham wrote:

>> Note that GHC's back end is really aimed at producing good code when
>> there are registers available for passing arguments - this isn't
>> true on x86 or x86_64 at the moment, though.
> 
> Hrm? why are registers not available on x86_64? I thought it had a
> plethora. (compared to the i386)

mutter mutter... a bunch of the registers are reserved for argument
passing in the C calling convention, and when I tried to steal them I
ran into trouble around foreign calls.  It should/might be possible to
work around this, I need to have another go.  It works fine with the
NCG, of course.

> I was thinking something like the worker/wrapper split, ghc would
> recognize when a function takes only unboxed arguments and returns an
> unboxed result (these can probably be relaxed, no evals is the key
> thing)
> 
> so in the case of fac, it would create
> 
> int fac(int n, int r) {
>     if (n == 1) return r;   /* r is the accumulated product */
>     return fac(n - 1, n * r);
> }
> 
> and (something like)
> 
> void fac_wrapper(void) {
> continuation = pop()   // I might be mixing up the order of these
> n = pop()
> r = pop()
> 
> x = fac(n,r)
> 
> push(x)
> jump(continuation)
> 
> }
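
The worker/wrapper idea quoted above can be made concrete with a small runnable sketch.  Everything here is an assumption for illustration: the slot union, the fixed-size stack, the push/pop helpers and the names store_answer and fac_via_wrapper are invented, and jump(continuation) is approximated by an ordinary call in tail position.  It does not model GHC's real stack layout or calling convention.

```c
#include <assert.h>

typedef void (*cont_fn)(void);

/* One stack slot holds either an unboxed integer or a continuation. */
typedef union { long i; cont_fn k; } slot;

#define STACK_MAX 16
static slot stack[STACK_MAX];
static int sp;        /* index of the next free slot */
static long answer;   /* written by the final continuation */

static void push_int(long v)     { assert(sp < STACK_MAX); stack[sp++].i = v; }
static void push_cont(cont_fn k) { assert(sp < STACK_MAX); stack[sp++].k = k; }
static long pop_int(void)        { assert(sp > 0); return stack[--sp].i; }
static cont_fn pop_cont(void)    { assert(sp > 0); return stack[--sp].k; }

/* Worker: plain C calling convention, unboxed arguments and result. */
static long fac(long n, long r)
{
    if (n <= 1)
        return r;
    return fac(n - 1, n * r);
}

/* Wrapper: unpack from the explicit stack, run the worker, pass it on. */
static void fac_wrapper(void)
{
    cont_fn k = pop_cont();
    long n = pop_int();
    long r = pop_int();

    push_int(fac(n, r));
    k();  /* stands in for jump(continuation) */
}

/* Final continuation: takes the result off the stack. */
static void store_answer(void)
{
    answer = pop_int();
}

long fac_via_wrapper(long n)
{
    sp = 0;
    push_int(1);              /* r: the accumulator starts at 1 */
    push_int(n);
    push_cont(store_answer);  /* popped first by the wrapper */
    fac_wrapper();
    return answer;
}
```

The push order is simply the reverse of the pop order used in the quoted wrapper.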

Well yes, but if the worker needs to return to the scheduler (i.e. if it
does a heap check or stack check) then the C stack is all messed up and
we need a setjmp/longjmp to get back to the scheduler.  You can do it in
the case where there are no heap/stack checks, but I think that's very
rare.
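
A rough sketch of the setjmp/longjmp escape described above, using only standard C.  The names (scheduler_ctx, heap_left, worker, run_worker) and the toy "heap budget" are invented for the example; the real heap and stack checks in GHC are not modelled.  The point is only that a worker running on the C stack can abandon all of its frames and land back in the scheduler.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf scheduler_ctx;  /* hypothetical: where the scheduler resumes */
static long heap_left;         /* toy budget standing in for a heap check */

/* Worker running on the C stack; each call "allocates" one unit. */
static long worker(long n, long r)
{
    if (heap_left-- <= 0)
        longjmp(scheduler_ctx, 1);  /* give up: unwind every C frame at once */
    if (n <= 1)
        return r;
    return worker(n - 1, n * r);
}

/* Returns 0 and stores the factorial in *result if the worker finished,
 * or 1 if it had to bail out to the scheduler part-way through. */
int run_worker(long n, long budget, long *result)
{
    assert(result != 0);
    heap_left = budget;
    if (setjmp(scheduler_ctx) != 0)
        return 1;                   /* arrived here via longjmp */
    *result = worker(n, 1);
    return 0;
}
```

After a bail-out the partial computation is lost, so a real scheduler would have to be able to restart or resume the work some other way.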

> I am not sure how much sense this makes though. I am no expert on the
> spineless tagless G machine  (which would make an excellent name for a
> band BTW)

:-D

> fortunately, modern CPUs anticipate this conundrum and provide
> 'write-combining' forms of their memory access functions, these will
> write a value directly to RAM without touching the cache at all. This
> will always be a win when updating thunks due to the reasons mentioned
> above and is potentially a big benefit. selective write-combining is
> in the top 3 performance enhancing things according to the cpu
> optimization manuals.
> 
> I think the easiest way to do this would be to have a MACRO defined to
> an appropriate bit of assembly or a simple C assignment if the
> write-combining mov's aren't available.

very good idea, I must try that.  Any more progress on why our x86_64
code is slow?
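
One possible shape for the macro suggested above, assuming the "write-combining" stores meant here are the x86 non-temporal stores (MOVNTI and friends) as exposed by the SSE2 intrinsic _mm_stream_si32.  The macro names STORE_NT and STORE_FENCE, the thunk struct and update_thunk are all hypothetical, and the layout is not GHC's.  Where SSE2 is not available the macro falls back to a plain C assignment, as proposed.

```c
#include <assert.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32, _mm_sfence */
/* Non-temporal store: asks the CPU to write around the cache. */
#define STORE_NT(p, v) _mm_stream_si32((p), (v))
#define STORE_FENCE()  _mm_sfence()
#else
/* Fallback: an ordinary assignment. */
#define STORE_NT(p, v) (*(p) = (v))
#define STORE_FENCE()  ((void)0)
#endif

/* Hypothetical thunk layout, for illustration only. */
struct thunk {
    int evaluated;  /* 0 until the value has been written */
    int value;
};

/* Overwrite a thunk with its computed value. */
void update_thunk(struct thunk *t, int value)
{
    assert(t != 0);
    STORE_NT(&t->value, value);
    STORE_NT(&t->evaluated, 1);
    STORE_FENCE();  /* order the streaming stores before later stores */
}
```

Whether this helps in practice would need measuring on the target CPU, since a thunk that is read again soon after being updated may be better off staying in the cache.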

Cheers,
Simon