Hi Karel, could you try adding `-j8` to `SRC_HC_OPTS` for the build flavor you're using in `mk/build.mk`, and running `gmake -j8` instead of `gmake -j64`? A graph like the one you attached will likely look even worse, but the walltime of your build should hopefully improve.
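Roughly, the change to `mk/build.mk` could look like the fragment below. This is only a sketch: I'm assuming you append the line after the block that sets the options for your flavor, so that `+=` extends whatever flags are already there.

```make
# mk/build.mk (sketch; assumes this line comes after the settings
# for the build flavor you are using)
# Let each `ghc --make` invocation compile up to 8 modules in parallel.
SRC_HC_OPTS += -j8
```

Then run the build with `gmake -j8` and compare the walltime against your `gmake -j64` run.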
The build system currently seems to rely entirely on `make` for parallelism. It doesn't exploit ghc's own parallel `--make` at all, unless you explicitly add `-jn` to `SRC_HC_OPTS`, with n>1 (which also sets the number of capabilities for the runtime system, so adding `+RTS -Nn` as well is not needed).

Case study: One of the first things the build system does is build ghc-cabal and Cabal using the stage 0 compiler, through a single invocation of `ghc --make`. All the later make targets depend on that step completing first. Because `ghc --make` is not instructed to build in parallel, using `make -j1` or `make -j100000` doesn't make any difference (for that step).

I think your graph shows that there are many more such bottlenecks.

You would have to find out empirically how best to divide your number of threads (32) between `make` and `ghc --make`. From reading this comment <https://ghc.haskell.org/trac/ghc/ticket/9221#comment:12> by Simon in #9221 I understand it's better not to call `ghc --make -jn` with `n` higher than the number of physical cores of your machine (8 in your case).

Once you get some better parallelism, other flags like `-A` might also have an effect on walltime (see that ticket).

-Thomas

On Sat, Mar 7, 2015 at 11:49 AM, Karel Gardas <[email protected]> wrote:

> Folks,
>
> first of all, I remember someone recently mentioned an issue with decreased parallelism of the GHC build somewhere, but I can't find it now. Sorry for that, since otherwise I would use that thread if it was on this mailing list.
>
> Anyway, while working on the SPARC NCG I'm using a T2000, which provides a 32 threads/8 core UltraSPARC T1 CPU. The property of this machine is that it's really slow on single-threaded work. To squeeze some perf from it one really needs to push 32 threads of work on it. Now, it really hurts my nerves to see it lazily building/running just one or two ghc processes. To verify the fact I've created a simple script to collect the number of ghc processes over time and plot it as a graph. The result is in the attached picture. The graph is the result of running:
>
>     gmake -j64
>
> Anyway, the average number of running ghc processes is 4.4 and the median value is 2. IMHO such a low number not only hurts build times on something like a CMT SPARC machine, but also on, let's say, a cluster of ARM machines using NFS, and also on common engineering workstations, which these days provide (IMHO!) around 8-16 cores (and double the number of threads).
>
> My naive idea(s) for fixing this issue are (I'm assuming no Haskell file imports unused imports here, but perhaps this may be also investigated):
>
> 1) provide explicit dependencies which guide make to build in a more optimal way
>
> 2) hack GHC's make depend to kind of compute the explicit dependencies from (1) in an optimal way automatically
>
> 3) someone already mentioned using shake for building ghc. I don't know shake but perhaps this is the right direction?
>
> 4) hack GHC to compile a needed hi file directly in its memory if the hi file is not (yet!) available (the issue here is how to get the compiling options right). Also I don't know hi file semantics yet so bear with me on this.
>
> Is there anything else which may be done to fix that issue? Is someone already working on some of those? (I mean the reasonable ones from the list.)
>
> Thanks!
> Karel
>
> _______________________________________________
> ghc-devs mailing list
> [email protected]
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
