Re[2]: inside the GHC code generator
Hello Simon, Friday, February 24, 2006, 12:52:36 PM, you wrote: SM Wow! You make a lot of good suggestions. I think some of them are SM unfortunately unworkable, some others have been tried before, but there SM are definitely some to try. I'll suggest a few more in this email. can you comment about unworkable/hard to implement ones? may be, ghc users can propose some solutions that you are skipped SM First of all I should say that we don't *want* to use gcc as a code SM generator. Everything we've been doing in the back end over the last SM few years has been aiming towards dropping the dependency on gcc and SM removing the Evil Mangler. Now is not the time to be spending a lot of SM effort in beefing up the C back end :-) Our preference is to work on SM both the native and C-- back ends; the C-- back end has the potential to SM produce the best code and be the most portable. We also plan to keep SM the unregisterised C back end for the purposes of porting to a new SM platform, it's only the registerised path we want to lose. i know that is the hardest part of discussion. but sometimes we need to change our decisions. i think that it's better to compare efforts/results of going toward gcc/c-- way starting from current state of ghc, avoiding counting the previous efforts :( on the c-- side: 1) time/efforts required to implement many low-level optimizations 2) more or less efficient code on the gcc side: 1) time required to implement generation of native C code instead of current portable asm 2) one of the best optimization possible gcc way already exists and works, while C-- way is still unfinished even in its basic, unoptimized state. next, unregisterized C way provides the maximum possible level of portability SM Having said all that, if we can get better code out of the C back end SM without too much effort, then that's certainly worthwhile. moreover, many changes, proposed on this thread, perfectly implementable even on C-- way translated to something like this factorial: _s1BD = *(Sp + 0); if (_s1BD != 1) goto c1C4; // n!=1 ? R1 = *(Sp + 4); Sp = Sp + 8; return (*(Sp + 0))(); // return from factorial c1C4: _s1BI = _s1BD * (*(Sp + 4)); // n*r _s1BF = _s1BD - 1; // n-1 *(Sp + 4) = _s1BI; *(Sp + 0) = _s1BF; goto factorial; SM To be fair, this is only the translation on an architecture that has no SM argument registers (x86, and currently x86_64 but hopefully not for long). this just emphasizes main problem of portable asm code generated by ghc - you can't optimize it as good as C compiler optimizes idiomatic C code. x86 has 7 registers, after all, and gcc generates great code for this procedure - but only if it is written in idiomatic manner. you will need to develop your own optimization techniques, and it will be hard and long way comparing to just generation of more high-level C code just to compare - i've rewritten my binary serialization library two times. the first version just used string concatenation (!), the second was the carefully optimized using my C experience. nevertheless, it was slow enough. and the final 3rd was optimized according to ghc features. and the result was astonishing portable asm looking great in theory, and C-- as portable low-level language is great theoretical idea. but reality has its own laws. want to know why my second version of serialization library, highly optimized according to C experience, was so slow? it just returns tuples and constructing/analyzing these lazy tuples used 80% of the time ghc generates not so fast code and that is known for many years. you have developed the proper theoretical way to solve this problem, but there is also much easier backdoor :) at least, it seems like this SM The simple optimisation of changing the final tail call to factorial SM into a goto to a label at the beginning of the function (as suggested by SM John Meacham I think) should improve the code generated by gcc for SM functions like this, and is easy to do. the main detail is to work with local variables instead of Sp[xx]. gcc can't optimize access to Sp[xx] 1) because it just not asked. you can enable gcc optimization by adding -optc-O6 to the cmdline, but this leads to compilation errors on part of modules. it seems that gcc optimization is not compatible with evil mangler that is the ghc's own optimization trick. SM -O3 works, I haven't tried -O6. i've attached optc-O2-bug.hs that don't work both with -optc-O2 and -optc-O3 * C++ calling stack should be used as much as possible * parameters are passed in C++ way: factorial(int n, int r) SM This is unworkable, I'm afraid. Go and read Simon's paper on the STG: SMhttp://citeseer.ist.psu.edu/peytonjones92implementing.html SM there are two main reasons we don't use the C stack: garbage collection SM and tail calls. What do you plan to do about GC?
Re: inside the GHC code generator
Bulat Ziganshin wrote: * C++ calling stack should be used as much as possible * parameters are passed in C++ way: factorial(int n, int r) SM This is unworkable, I'm afraid. Go and read Simon's paper on the STG: SMhttp://citeseer.ist.psu.edu/peytonjones92implementing.html SM there are two main reasons we don't use the C stack: garbage collection SM and tail calls. What do you plan to do about GC? minimalistic solution i see - just use two stacks. unboxed values are passed using the C conventions and C compiler will be able to optimize using of this stack. boxed values are passed using the current Sp[] machinery and therefore GC will not be changed How do you propose to do thread switching? Incedentally, GHC had two stacks a long time ago (before 4.00) I rewrote the runtime and code generator to replace it with one stack. It was about 6 months work, if I remember correctly. I think you misused the words just and minimalistic :-) I know gcc does (some) tail-call optimisation these days. You don't get any *guarantee* that a call will be implemented as a tail-call, though, and the conditions are quite complex. The gcc folks treat it as an optimisation, rather than a fundamental part of the operational semantics, which is what you need for Haskell. SM This is called semi-tagging, it was tried a long time ago. Certainly SM worth trying again, but I very much doubt the gains would be anything SM like you claim, and it increases code size. we will replace 2 unpredictable jumps with one that will not be even executed if value is in WHNF. at least, i'm very displeased by slowness of lazy values evaluation in GHC. for example, my binary serialization package writes 60 mb/s working with unboxed arrays and only 20 mb/s working with the lists You need to *try* these things and measure the difference. Quite often I'm surprised with the actual results I get from an optimisation that looks on paper to be worthwhile. Today's processors with 3-level caches are complicated beasts. Semi tagging increases code size, so unless you can make some significant gains to make up for it, you've lost already. Let me give you another example: GHC puts info tables next to the entry code for a closure. On paper this looks like it might be a bad idea: it pollutes the instruction cache with data that is rarely accessed during execution. It's hard to arrange (this is one thing the mangler does), so we'd like to avoid it if possible. However, doing this eliminates an indirection when entering a closure, reduces the size of info tables by one word, and reduces code size. These effects together are more beneficial than the code-cache-pollution effect - last time I measured it the difference was ~10%. (turn off TABLES_NEXT_TO_CODE and try it yourself). SM There are other schemes, including this one that we particularly like: SM use a spare bit in a pointer to indicate points to an evaluated value. SM Since there are two spare bits, you can go further and use the 4 SM values to mean: SM0: possibly unevaluated SM1: evaluated, tag 0 SM2: evaluated, tag 1 SM3: evaluated, tag 2 SM I believe Robert Ennals tried this as part of his spec eval SM implementation, but IIRC he didn't get separate measurements for it (or SM it didn't improve things much). Note that this increases code size too. to be exact, we had only 2^30-1 valid pointers, all other values are ready to use! :) True, but I'm assuming you often want to store a pointer in addition to the tag (eg. for a cons cell). For enumerations, you're quite right that the whole word can be used for the tag as long as it is distinguished from a pointer. Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: GHC 6.4.1 crash on Windows XP
Cyril Schmidt wrote: A freshly installed GHC 6.4.1 on my colleague's PC crashes when I try to build a package: runhaskell Setup.hs build The effect is easily reproduceable (it shows up on *any* package that I try to build). Does anyone have any idea of what might be wrong here? Does it always crash in the same way? Can you fire up GHCi and perform a small evaluation or two? Can you compile small programs, do they run? Please send us the complete output from commands that crash: eg. add -v to the build command above. Cheers, Simon ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Missing Folder in ghc?
On 2/28/06, Ashley Yakeley [EMAIL PROTECTED] wrote: I'm trying to build GHC from source. But the ghc repository at http://darcs.haskell.org/ghc seems to be missing ghc/lib/compat/Cabal? Did you run 'sh darcs-all get'? -- Friendly, Lemmih ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Missing Folder in ghc?
Lemmih wrote: Did you run 'sh darcs-all get'? Oh, that wasn't in the README. Thanks. But now I get this: /usr/bin/ghc -H16m -O -I. -Iinclude -Rghc-timing -I../../../libraries -fglasgow-exts -no-recomp-c System/Directory/Internals.hs -o System/Directory/Internals.o -ohi System/Directory/Internals.hi System/Directory/Internals.hs:1:0: Module `System.Directory.Internals' is a member of package base-1.0. To compile this module, please use -ignore-package base-1.0. I'm using GHC 6.4. ___ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users