Re: jhc vs ghc and the surprising result involving ghc generated assembly.

* Lennart Augustsson:

> Simon Marlow wrote:
>
>>>Is it correct that you use indirect gotos across functions? Such
>>>gotos aren't supported by GCC and work only by accident.
>>
>> Yes, but cross-function gotos are always to the beginning of a
>> function.
>
> Is that enough to ensure that the constant pool base register
> is reloaded on the Alpha?

Good point, most of the restrictions I mentioned result from the need
to update the GP pointer.
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: jhc vs ghc and the surprising result involving ghc generated assembly.

Simon Marlow wrote:

>> Is it correct that you use indirect gotos across functions? Such
>> gotos aren't supported by GCC and work only by accident.
>
> Yes, but cross-function gotos are always to the beginning of a
> function.

Is that enough to ensure that the constant pool base register
is reloaded on the Alpha?

-- Lennart
Re: jhc vs ghc and the surprising result involving ghc generated assembly.

* Simon Marlow:

>> However, beginning with GCC 3.4, you can use:
>>
>> extern void bar();
>>
>> void foo()
>> {
>>   void (*p)(void) = bar;
>>   p();
>> }
>
> Interesting.. though I'm not sure I'm comfortable with relying on gcc's
> tail call optimisation to do the right thing. Aren't there side
> conditions that might prevent it from kicking in?

It's a target-specific optimization. For i386, the requirements are,
roughly speaking:

  (a) it works with -fPIC only for very special cases (direct calls
      within the same module),
  (b) the return values must be the same,
  (c) for indirect calls, there must be a free register (currently,
      this means that regparm must be less than 3; irrelevant if you
      don't pass any arguments).

AMD64 has only very few restrictions, none of which seem particularly
relevant. ia64 may need additional hints before the optimization is
performed (non-default visibility of the target function); otherwise
the optimization is only performed within the same translation unit.
PowerPC and SPARC cannot optimize indirect calls. Common MIPS targets
should be fine.

So your concern is valid; this optimization is not always available.
It might be possible to extend GCC with something that violates the
ABI and fits your needs, though, in case the current goto hack no
longer works.
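The kind of call under discussion can be sketched in C as follows. This is a minimal illustration, not GHC output: `code_ptr`, `step`, `finish` and `run_chain` are hypothetical names, and the machine state is kept in globals so that the calls take no arguments. The call `p()` sits in tail position, so GCC may turn it into a jump when sibling-call optimisation is enabled (`-foptimize-sibling-calls`, which is part of `-O2`), subject to the target restrictions listed above. The C language itself guarantees nothing here.

```c
#include <assert.h>

/* Hypothetical names throughout; this is not GHC-generated code. */
typedef void (*code_ptr)(void);

static long counter;        /* stands in for machine state kept in globals */
static code_ptr next_code;  /* where to go once the loop is finished */

static void finish(void)
{
    /* End of the chain: nothing left to jump to. */
}

static void step(void)
{
    code_ptr p;

    counter--;
    p = (counter > 0) ? step : next_code;

    /* Indirect call in tail position, same (void) return type, no
       arguments: a candidate for sibling-call optimisation, in which
       case the compiler emits a jump instead of a call. */
    p();
}

/* Runs n steps and returns the remaining count (0 when all steps ran). */
long run_chain(long n)
{
    assert(n > 0);
    counter = n;
    next_code = finish;
    step();
    return counter;
}
```

If the call is not turned into a jump, every step consumes a C stack frame, so `run_chain` with a large `n` can overflow the stack. Inspecting the output of `gcc -O2 -S` for a `jmp` in `step` is one way to see whether the optimisation kicked in on a given target.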
RE: jhc vs ghc and the surprising result involving ghc generated assembly.

On 02 November 2005 13:59, Florian Weimer wrote:

> However, beginning with GCC 3.4, you can use:
>
> extern void bar();
>
> void foo()
> {
>   void (*p)(void) = bar;
>   p();
> }

Interesting.. though I'm not sure I'm comfortable with relying on gcc's
tail call optimisation to do the right thing. Aren't there side
conditions that might prevent it from kicking in?

> And the indirect call is turned into a direct jump. Tail recursive
> calls and really indirect tail calls are also optimized. Together with
> -fomit-frame-pointer, this could give you what you need, without
> post-processing the generated assembler code (which is desirable
> because the asm volatile statements inhibit further optimization).
>
> Is it correct that you use indirect gotos across functions? Such
> gotos aren't supported by GCC and work only by accident.

Yes, but cross-function gotos are always to the beginning of a function.
Also, our post-processor removes the function prologue from the asm.

GHC via C has always worked "by accident" :-) But it has worked for a
long time with careful tweaking of the post-processor (known as the
mangler) for each new version of gcc. Yes, we're living dangerously,
and it's getting harder, but we're still alive (just).

Cheers,
    Simon
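For comparison, the control flow that the cross-function gotos provide can also be written in standard C with a trampoline: each code block returns the address of the next block, and a small driver loop makes the call. The sketch below illustrates that alternative only; it is not what the mangler does, and `struct cont`, `loop` and `fac_trampoline` are hypothetical names. The struct wrapper exists because a C function type cannot refer to itself directly.

```c
#include <assert.h>
#include <stddef.h>

/* A code block returns the next block to run, or NULL to stop. */
struct cont { struct cont (*fn)(void); };

/* "Registers" kept in globals, so that blocks take no arguments. */
static long acc;
static long n;

static struct cont loop(void)
{
    struct cont next;

    if (n <= 1) {
        next.fn = NULL;      /* finished: tell the driver to stop */
        return next;
    }
    acc *= n;
    n--;
    next.fn = loop;          /* "jump" back to the start of this block */
    return next;
}

/* Driver: every transfer of control is a return followed by a call
   from this loop, so the C stack never grows. */
long fac_trampoline(long k)
{
    struct cont c;

    assert(k >= 1);
    n = k;
    acc = 1;
    c.fn = loop;
    while (c.fn != NULL)
        c = c.fn();
    return acc;
}
```

The price is a return plus an indirect call for every transfer, which is the overhead that jumping directly between functions avoids.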
RE: jhc vs ghc and the surprising result involving ghc generated assembly.

On 27 October 2005 12:12, John Meacham wrote:

>> Note that GHC's back end is really aimed at producing good code when
>> there are registers available for passing arguments - this isn't
>> true on x86 or x86_64 at the moment, though.
>
> Hrm? why are registers not available on x86_64? I thought it had a
> plethora. (compared to the i386)

mutter mutter... a bunch of the registers are reserved for argument
passing in the C calling convention, and when I tried to steal them I
ran into trouble around foreign calls. It should/might be possible to
work around this, I need to have another go. It works fine with the
NCG, of course.

> I was thinking something like the worker/wrapper split, ghc would
> recognize when a function takes only unboxed arguments and returns an
> unboxed result (these can probably be relaxed, no evals is the key
> thing)
>
> so in the case of fac, it would create
>
> int fac(int n, int r) {
>     if (n == 1) return r;
>     return fac (n - 1,n*r);
> }
>
> and (something like)
>
> void fac_wrapper(void) {
>     continuation = pop() // I might be mixing up the order of these
>     n = pop()
>     r = pop()
>
>     x = fac(n,r)
>
>     push(x)
>     jump(continuation)
>
> }

Well yes, but if the worker needs to return to the scheduler (i.e. if it
does a heap check or stack check) then the C stack is all messed up and
we need a setjmp/longjmp to get back to the scheduler. You can do it in
the case where there are no heap/stack checks, but I think that's very
rare.

> I am not sure how much sense this makes though. I am no expert on the
> spineless tagless G machine (which would make an excellent name for a
> band BTW)

:-D

> fortunately, modern CPUs anticipate this conundrum and provide
> 'write-combining' forms of their memory access functions, these will
> write a value directly to RAM without touching the cache at all. This
> will always be a win when updating thunks due to the reasons mentioned
> above and is potentially a big benefit. selective write-combining is
> in the top 3 performance enhancing things according to the cpu
> optimization manuals.
>
> I think the easiest way to do this would be to have a MACRO defined to
> an appropriate bit of assembly or a simple C assignment if the
> write-combining mov's aren't available.

very good idea, I must try that.

Any more progress on why our x86_64 code is slow?

Cheers,
    Simon
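The macro idea quoted above might look roughly like this. It is a sketch under stated assumptions, not code from GHC's runtime: `UPDATE_WORD`, `UPDATE_FENCE`, `word_t`, `update_thunk` and the two-word thunk layout are all hypothetical. On x86_64 with SSE2 it uses the `_mm_stream_si64` intrinsic (the MOVNTI non-temporal store); everywhere else it falls back to a plain C assignment. Whether this actually helps thunk updates is something to measure, not something the sketch demonstrates.

```c
#include <assert.h>

typedef long long word_t;   /* hypothetical stand-in for a closure word */

#if defined(__SSE2__) && defined(__x86_64__)
#include <xmmintrin.h>      /* _mm_sfence */
#include <emmintrin.h>      /* _mm_stream_si64 */
/* Non-temporal store: write the word without pulling its line into cache. */
#define UPDATE_WORD(p, v) _mm_stream_si64((p), (v))
/* Non-temporal stores are weakly ordered, so fence after the last one. */
#define UPDATE_FENCE()    _mm_sfence()
#else
/* Fallback when the streaming store is not available. */
#define UPDATE_WORD(p, v) (*(p) = (v))
#define UPDATE_FENCE()    ((void)0)
#endif

/* Overwrite a two-word "thunk" with an indirection tag and its value. */
void update_thunk(word_t *thunk, word_t ind_tag, word_t value)
{
    assert(thunk != 0);
    UPDATE_WORD(&thunk[1], value);
    UPDATE_WORD(&thunk[0], ind_tag);
    UPDATE_FENCE();
}
```

Keeping the choice behind a macro means the call sites stay the same on every platform, and the streaming variant can be switched off again if measurements show no benefit.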