inside the GHC code generator
The following link gives reasons for not generating via C http://groups.google.com/groups?hl=enlr=ie=UTF-8selm=4zp8kn7xe.fsf_-_%40beta.franz.com Naturally a number of these are common lisp specific, however I think that Haskell and GCC are quite semantically different, so using GCC might prevent a lot of optimizations, that aren't possible in C, but would be in GHC. The OCAML-OPT backend is supposed to produce i386 assembly that is competitive with GCC. Maybe this could be ported to GHC? Rene. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: inside the GHC code generator
Hello Rene, Thursday, February 23, 2006, 4:19:15 PM, you wrote: RdV The following link gives reasons for not generating via C RdV http://groups.google.com/groups?hl=enlr=ie=UTF-8selm=4zp8kn7xe.fsf_-_%40beta.franz.com i read it now RdV Naturally a number of these are common lisp specific, however I think that RdV Haskell and GCC are quite semantically different, so using GCC might prevent RdV a lot of optimizations, that aren't possible in C, but would be in GHC. seems that you don;t understand the situation. ghc compiles Haskell to language called core, do almost all optimizations at level of this language, then translates final result to the STG language from that the C-- code is generated. changing the translation of STG can't prevent ANY ghc optimization. although iy is not so easy because ghc code generation and RTS closely tied together RdV The OCAML-OPT backend is supposed to produce i386 assembly that is RdV competitive with GCC. Maybe this could be ported to GHC? can you please show the code for the factorial accumulating function: factorial n r = if n=1 then r else (factorial (n - 1) (n*r)) i think that ocaml can't generate code better than gcc and especially icc (intel C/C++ compiler), but may be i'm wrong? ;) -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: inside the GHC code generator
seems that you don;t understand the situation. ghc compiles Haskell to language called core, do almost all optimizations at level of this language, then translates final result to the STG language from that the C-- code is generated. changing the translation of STG can't prevent ANY ghc optimization. although iy is not so easy because ghc code generation and RTS closely tied together I should have been a bit clearer here. I meant that optimizations that are available from STG - Assembler, are better than STG - C - Assembler. GHC currently doesn't do most of the optimizations I am thinking of. -- Bit tagging to reduce pointer chasing, speed up pattern matching. Due to memory latency and speed it is quicker to do bit masking rather than memory reads -- Parameter passing and regisgter usage opimizations that rely on the structure of the RTS. -- Multiple stacks with custom frame layout. -- dynamic code optimization etc. -- Taking advantage of special assember instructions and flags. Though I have also seen comments that you can do a lot of these with GCC if you do your own stack and parameter management. i.e. don't use the C stack at all. Though your suggestions are probably better than nothing, which is probably what the alternative is (for instance I have not sufficient time to work on these things). Note that I didn't say that the assembly generation of OCAML was better than GCC, just that it was comparable. Rene. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: inside the GHC code generator
Bulat Ziganshin wrote: i think that ocaml can't generate code better than gcc and especially icc (intel C/C++ compiler), but may be i'm wrong? ;) didn't try factorial, but exponential fib in ocaml is *FASTER* than both gcc and intel c/c++ with highest optimization levels ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: inside the GHC code generator
the long answer is: are you ever heard promises that gcc is best cpu-independent assembler? no? and you know why? because gcc is not cpu-independent assembler. gcc was strongly optimized to make efficient asm from the code usually written by the C programmers. but code generated by ghc has nothing common with it. so we are stay with all these register-memory moves, non-unrolled loops and all other results of naive compilation. gcc is just don't know how to efficiently manage such code! would there be any advantage to targetting gcc's backend directly? I notice that Mercury seems to support this: http://www.cs.mu.oz.au/research/mercury/download/gcc-backend.html http://gcc.gnu.org/frontends.html that is, does using C as assembler disable possible optimizations, or is going through the C frontend enabling more optimizations than going to the backend directly? just curious, claus ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re[2]: inside the GHC code generator
Hello kyra, Thursday, February 23, 2006, 5:38:54 PM, you wrote: k Bulat Ziganshin wrote: i think that ocaml can't generate code better than gcc and especially icc (intel C/C++ compiler), but may be i'm wrong? ;) k didn't try factorial, but exponential fib in ocaml is *FASTER* than both k gcc and intel c/c++ with highest optimization levels i prefer to see the asm code. this may be because of better high-level optimization strategies (reusing fib values). the scheme about i say will combine advantages of both worlds -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re[2]: inside the GHC code generator
Hello Rene, Thursday, February 23, 2006, 5:32:21 PM, you wrote: seems that you don;t understand the situation. ghc compiles Haskell to language called core, do almost all optimizations at level of this language, then translates final result to the STG language from that the C-- code is generated. changing the translation of STG can't prevent ANY ghc optimization. although iy is not so easy because ghc code generation and RTS closely tied together RdV I should have been a bit clearer here. I meant that optimizations that are RdV available from STG - Assembler, are better than STG - C - Assembler. theoretically - yes. in practice, it is hard to compete with gcc. ghc wait for such code generation 15 years (wait for the mountain to go in our direction :) and i think that the REAL way to reach gcc level optimization is to compile to the idiomatic C instead of continue waiting while this theoretically possible optimization will arise RdV GHC currently doesn't do most of the optimizations I am thinking of. RdV -- Bit tagging to reduce pointer chasing, speed up pattern matching. Due to RdV memory latency and speed it is quicker to do bit masking rather than memory RdV reads RdV -- Parameter passing and regisgter usage opimizations that rely on the RdV structure of the RTS. RdV -- Multiple stacks with custom frame layout. RdV -- dynamic code optimization etc. RdV -- Taking advantage of special assember instructions and flags. i think that all these things can be done with idiomatic C way RdV Though I have also seen comments that you can do a lot of these with GCC if RdV you do your own stack and parameter management. i.e. don't use the C stack RdV at all. as i see in the current code generation, attaempts to do our own stack management lead to that gcc just don't understand such code and don't optimize it as good as possible. please compare code generated for fac() function with all other variants RdV Though your suggestions are probably better than nothing, which is probably RdV what the alternative is (for instance I have not sufficient time to work on RdV these things). moreover, i sure that you can't compete with gcc :) RdV Note that I didn't say that the assembly generation of OCAML was better than RdV GCC, just that it was comparable. what mean comparable? and what we should do to reuse this code generation in ghc? at least i know what to do to reuse gcc code generation. can you propose similar plan to reuse ocaml code generation? -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: inside the GHC code generator
Hello Rene, Thursday, February 23, 2006, 4:19:15 PM, you wrote: RdV The following link gives reasons for not generating via C RdV http://groups.google.com/groups?hl=enlr=ie=UTF-8selm=4zp8kn7xe.fsf_-_%40beta.franz.com i done reading. my question - is YOU read this? the lisp problems have almost nothing in common with haskell -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: inside the GHC code generator
Claus Reinke writes: the long answer is: are you ever heard promises that gcc is best cpu-independent assembler? no? and you know why? because gcc is not cpu-independent assembler. gcc was strongly optimized to make efficient asm from the code usually written by the C programmers. but code generated by ghc has nothing common with it. so we are stay with all these register-memory moves, non-unrolled loops and all other results of naive compilation. gcc is just don't know how to efficiently manage such code! would there be any advantage to targetting gcc's backend directly? I notice that Mercury seems to support this: http://www.cs.mu.oz.au/research/mercury/download/gcc-backend.html http://gcc.gnu.org/frontends.html that is, does using C as assembler disable possible optimizations, or is going through the C frontend enabling more optimizations than going to the backend directly? On a related point, Mercury has two C backends a low level one at the level of GHC's and a high level one. Bulat might want to read this for a description of the high level C implementation: http://www.cs.mu.oz.au/research/mercury/information/papers.html#hlc_cc Also, ghc used to be faster than gcc for a naive, recursive factorial function (once the cpr analysis and optimisation was added). From what Bulat wrote it seems that gcc got better ... k ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re[2]: inside the GHC code generator
Hello Claus, Thursday, February 23, 2006, 8:56:57 PM, you wrote: the long answer is: are you ever heard promises that gcc is best cpu-independent assembler? no? and you know why? because gcc is not cpu-independent assembler. gcc was strongly optimized to make efficient asm from the code usually written by the C programmers. but code generated by ghc has nothing common with it. so we are stay with all these register-memory moves, non-unrolled loops and all other results of naive compilation. gcc is just don't know how to efficiently manage such code! CR would there be any advantage to targetting gcc's backend directly? CR I notice that Mercury seems to support this: CR http://www.cs.mu.oz.au/research/mercury/download/gcc-backend.html CR http://gcc.gnu.org/frontends.html CR that is, does using C as assembler disable possible optimizations, CR or is going through the C frontend enabling more optimizations than CR going to the backend directly? i will read this. but one disadvantage is obvious - we can't use other C compilers, including more efficient intel c/c++ (for my code it was constantly 10% faster) -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re[2]: inside the GHC code generator
Hello Kevin, Thursday, February 23, 2006, 9:06:25 PM, you wrote: KG On a related point, Mercury has two C backends a low level one at the KG level of GHC's and a high level one. Bulat might want to read this for KG a description of the high level C implementation: KG KG http://www.cs.mu.oz.au/research/mercury/information/papers.html#hlc_cc citating from this paper's annotation: Many logic programming implementations compile to C, but they compile to very low-level C, and thus discard many of the advantages of compiling to a high-level language. it's the same as i think KG Also, ghc used to be faster than gcc for a naive, recursive factorial KG function (once the cpr analysis and optimisation was added). From KG what Bulat wrote it seems that gcc got better ... i don't say that we must compile recursive Haskell/STG functions naively to recursive C ones (as jhc does). no - i say that we should translate recursive Haskell definitions to explicit C loops, what is NATURAL C PROGRAMMING STYLE and therefore optimized much better as you can see in the files i attached -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re[2]: inside the GHC code generator
Bulat Ziganshin writes: Hello Kevin, KG Also, ghc used to be faster than gcc for a naive, recursive factorial KG function (once the cpr analysis and optimisation was added). From KG what Bulat wrote it seems that gcc got better ... i don't say that we must compile recursive Haskell/STG functions naively to recursive C ones (as jhc does). no - i say that we should translate recursive Haskell definitions to explicit C loops, what is NATURAL C PROGRAMMING STYLE and therefore optimized much better as you can see in the files i attached Ahhh, OK. I misunderstood. I don't claim ghc beat a C loop. ghc was just more optimised at making function calls :-). k ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: inside the GHC code generator
From: Bulat Ziganshin [EMAIL PROTECTED] Hello Rene, i done reading. my question - is YOU read this? the lisp problems have almost nothing in common with haskell It is a long time since i read this, but some things come to mind. Listed below. Maybe GHC should generate better C. I am just not sure whether this will bring the best global (as opposed to micro-optimizations) performance. However doing anything else might be difficult. Generating better C might also be much more difficult for idioms that don't exist in C, and might encourage people to write Haskell that looks like C (in order to get C like performance). I prefer idiomatic Haskell. It would be nice if this could be fast. But can idiomatic Haskell be translated to efficient C? It might not have many direct loops for example. -- Other notes Integer is about 30 times slower than it needs to be on GHC if you have over half the values between 2^-31 and 2^31. On i386 can you basically can test the low bit for free (it runs in parallel to the integer add instruction). This would allow only values outside this range to required calling the long integer code. Such an optimization is not easily done in C. This encourages Haskell programmers to always use Int, even if the value might get too big, because Integer is too slow. Also Tail recursion is more general than looping. A general tail call optimization will bring better returns than a loop optimization in GHC (though I could be wrong here). This requires special stack handling. Also not so easy in C. If only simple loops are optimized it will encourage people to always code loops in their haskell rather than perhaps using more appropriate constructs. Also take the Maybe data type with Nothing and Just ... or any other datatypes with 0 and 1 variable constructors. Here these could be represent by special values for the 0 variable case and bit marking on the single constructor values. This could lead to good optimizations on case expressions. Also not so easy in C. The lack of this feature encourages people to encode their datatypes as Int's to get speed. Also not good. Whether we can communicate the non aliasing and aliasing properties of GHC to the C compiler, I am also not so sure. Also in GHC, I am not sure whether stack base locals are the best move. It might be best to have a fixed page as register spill in some cases. If you wish to pass and return unboxed tuples without reboxing you will probably required a special function interface with double entry points (I think this can be done in C, but it is a bit tricky). Summary: If only C like Haskell is fast, we end up with Haskell that looks like C. Yuck. Rene. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re[2]: inside the GHC code generator
Hello Rene, Thursday, February 23, 2006, 10:17:40 PM, you wrote: RdV Maybe GHC should generate better C. I am just not sure whether this will RdV bring the best global (as opposed to micro-optimizations) performance. i answered in the original letter (search for Cray :) RdV Generating better C might also be much more difficult for idioms that don't RdV exist in C, i will be glad to see concrete examples RdV and might encourage people to write Haskell that looks like C RdV (in order to get C like performance). yes, they do just it now (see shootout entries or my Streams library) but still don't get C performance. so i think that we will go to something better position :) RdV I prefer idiomatic Haskell. It would be nice if this could be fast. But can RdV idiomatic Haskell be translated to efficient C? It might not have many RdV direct loops for example. sorry, i don't understand that you mean? in general, idiomatic Haskell has big efficiency problem and this problem is laziness. the strict Haskell code is, like ML code, can be compiled efficient RdV -- Other notes RdV Integer is about 30 times slower than it needs to be on GHC if you have over RdV half the values between 2^-31 and 2^31. On i386 can you basically can test RdV the low bit for free (it runs in parallel to the integer add instruction). RdV This would allow only values outside this range to required calling the long RdV integer code. Such an optimization is not easily done in C. RdV This encourages Haskell programmers to always use Int, even if the value RdV might get too big, because Integer is too slow. this optimization is not implemented in ghc until now and i think that amount of work required to implement it is bigger than requirements to do faster Integer processing. moreover, i hope that good optimizing C compiler can avoid additional testing RdV Also Tail recursion is more general than looping. A general tail call RdV optimization will bring better returns than a loop optimization in GHC RdV (though I could be wrong here). This requires special stack handling. Also RdV not so easy in C. seems that you don't seen the attached files. tail calls are optimized in gcc RdV If only simple loops are optimized it will encourage people to always code RdV loops in their haskell rather than perhaps using more appropriate RdV constructs. you are prefer that people will not have any option to make fast code? :) also i want to emphasize that C loops is the Haskell recursion. we always said that these two constructs are equivalent, now it's the time to generate code with the same speed RdV Also take the Maybe data type with Nothing and Just ... or any other RdV datatypes with 0 and 1 variable constructors. Here these could be represent RdV by special values for the 0 variable case and bit marking on the single RdV constructor values. This could lead to good optimizations on case RdV expressions. RdV Also not so easy in C. 1) it is far from current ghc optimization facilities 2) i don't see problems with using if() RdV The lack of this feature encourages people to encode their datatypes as RdV Int's to get speed. Also not good. RdV Whether we can communicate the non aliasing and aliasing properties of GHC RdV to the C compiler, I am also not so sure. concrete examples? RdV Also in GHC, I am not sure whether stack base locals are the best move. It RdV might be best to have a fixed page as register spill in some cases. it's the optimization that gcc can handle much better :) just see at the code generated for the fac() function RdV If you wish to pass and return unboxed tuples without reboxing you will RdV probably required a special function interface with double entry points (I RdV think this can be done in C, but it is a bit tricky). as i said, probably paira,b can be used, but i don't tested this yet -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
RE: Re[2]: inside the GHC code generator
From: Bulat Ziganshin [EMAIL PROTECTED] i answered in the original letter (search for Cray :) Re-reading this, I see that you have a well defined goals that cover most of my points. seems that you don't seen the attached files. tail calls are optimized in gcc No I don't see any attached files (on any of your emails). Best of luck with your optimisations. I will look forward to using them. (Although I don't like the style of the shootout code, I find it very useful in helping me speed up my own code, and yes I find it better in write Haskell in such a style, than to use FFI and call C). Rene. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: inside the GHC code generator
Bulat Ziganshin wrote: i prefer to see the asm code. this may be because of better high-level optimization strategies (reusing fib values). the scheme about i say will combine advantages of both worlds no strategies, plain exponential algorithm, ocaml: excerpt _camlFibo__fib_57: subesp, 8 L101: cmpeax, 5 jgeL100 moveax, 3 addesp, 8 ret L100: movDWORD PTR 0[esp], eax addeax, -4 call_camlFibo__fib_57 L102: movDWORD PTR 4[esp], eax moveax, DWORD PTR 0[esp] addeax, -2 call_camlFibo__fib_57 L103: movebx, DWORD PTR 4[esp] addeax, ebx deceax addesp, 8 ret /excerpt visual C++ 7.1 (next fastest): excerpt _fibPROC NEAR pushesi movesi, DWORD PTR 8[esp] cmpesi, 2 jgeSHORT $L945 moveax, 1 popesi ret0 $L945: leaeax, DWORD PTR [esi-2] pushedi pusheax call_fib decesi pushesi movedi, eax call_fib addesp, 8 addeax, edi popedi popesi ret0 _fibENDP /excerpt also, Clean is *EXACTLY* in line with ocaml. This is interesting, because Clean is so much similar to Haskell. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: inside the GHC code generator
[EMAIL PROTECTED] wrote: * multiple results can be returned via C++ paira,b template, if this is efficiently implemented in gcc There's a -freg-struct-return option in 2.x, 3.x and 4.x. I think it's off by default on many architectures. * recursive definitions translated into the for/while loops if possible I think recent versions of GCC do a good job of this. See http://home.in.tum.de/~baueran/thesis/ All of this efficiency stuff aside, there's a big problem you're neglecting: GARBAGE COLLECTION. For a language that allocates as much as Haskell I think a conservative collector is absolutely out of the question, because they can't compact the heap and so allocation becomes very expensive. A copying collector has to be able to walk the stack and find all the roots, and that interacts very badly with optimization. All the languages I've seen that compile via C either aren't garbage collected or use a conservative or reference-count collector. The closest thing I've seen to a solution is the technique used in Urban Boquist's thesis, which is to put a static table in the executable with information about register and stack usage indexed by procedure return point, and use that to unwind the stack and find roots. Some C compilers (including gcc) support debugging of optimized code, so it might be possible to parse the debugging tables and extract this information. As far as I know, no one has ever looked into the feasibility of doing this, which is kind of surprising since root-finding is probably the single biggest obstacle to compiling functional languages via C. -- Ben ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Re[2]: inside the GHC code generator
Hello Bulat, From my (limited) knowledge of GHC backend, the difficult part of your plan is that STG is not suited to compilation to native C at all. You might need to do quite advanced translation from STG to another intemediate language, (as GRIN for example), and then some more advanced analysis/optimization before C can be generated. iirc, the tricky part is the handling of lazyness. At any point you may end up with a thunk (closure), which ghc handles easily by evaluating it: it's always a function pointer. (That's the tagless part) When in WHNF, it just calls a very simple function that fills registers with the evaluated data. Otherwise, the function call performs the reduction to WHNF. If you want to use the same trick, you'll end up with the same problems (bad C). So, you'll want to eliminate thunks statically, finally requiring a 'high level' analysis as I suggested above. Also, allocation + garbage collection is extremely efficient in current GHC... C/C++ alloc doesn't even come close. It's entirely possible that with even a very fast backend, the better ghc allocator will be enough to swing the balance. (see jhc) It might be possible to re-use most of it however. I certainly don't want to discourage you (we all want faster code :), but there is no easy path to climb the mountain ;) Cheers, JP. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users