Hi Thomas,
thanks for your suggestion, and thanks for the PR number. I've tried
the quick way (mk/build.mk) and benchmarked GHC while compiling
ghc-cabal manually; here are the results:
-j1: 45s
-j2: 28s
-j3: 26s
-j4: 24s
-j5: 24s
-j6: 25s
-j6 -A32m: 23s
-j6 -A64m: 21s
-j6 -A128m: 23s
Real (wall-clock) time is reported; GHC compiles to i386 code on Solaris 11.
The GHC tree is located in /tmp, hence basically in RAM. The CPU is a
6-core/12-thread E5-2620.
So the result is not that bad, but on the other hand not that good either.
Anyway, unfortunately this will probably not help me on my Niagara,
since I guess `--make -jX` is a recent addition that is probably not
present in 7.6.x, right? If so, then I'm afraid it won't help me, since
on the Niagara I'm using a patched 7.6.x with a fixed SPARC NCG, and
that single-threaded compiler will probably still be faster than a
multithreaded 7.10.1 building unregisterised (hence via the C
compiler...). Anyway, I'll try to benchmark this tomorrow and will keep
you posted.
Thanks!
Karel
On 04/ 1/15 12:34 PM, Thomas Miedema wrote:
Hi Karel,
could you try adding `-j8` to `SRC_HC_OPTS` for the build flavor you're
using in `mk/build.mk`, and running `gmake -j8` instead of
`gmake -j64`? A graph like the one you attached will likely look even
worse, but the walltime of your build should hopefully improve.
The build system currently seems to rely entirely on `make` for
parallelism. It doesn't exploit GHC's own parallel `--make` at all,
unless you explicitly add `-jn` to `SRC_HC_OPTS`, with n > 1 (which
also sets the number of capabilities for the runtime system, so adding
`+RTS -Nn` as well is not needed).
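Concretely, that would mean something like this in `mk/build.mk` (a
sketch; pick `n` to suit your machine):

```make
# mk/build.mk (sketch): make every ghc --make invocation 8-way parallel.
# -j8 also sets the number of RTS capabilities, so no extra +RTS -N8
# is needed.
SRC_HC_OPTS += -j8
```

and then build with `gmake -j8`.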
Case study: one of the first things the build system does is build
ghc-cabal and Cabal using the stage 0 compiler, through a single
invocation of `ghc --make`. All the later make targets depend on that
step completing first. Because that `ghc --make` is not instructed to
build in parallel, using `make -j1` or `make -j100000` makes no
difference for that step. I think your graph shows that there are many
more such bottlenecks.
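For illustration, the parallelism is per invocation; with a
hypothetical source file name:

```shell
# One ghc invocation, compiling modules sequentially: no amount of
# make-level -j can speed up this single step.
ghc --make ghc-cabal.hs

# Same invocation, but ghc itself compiles up to 4 modules at once:
ghc --make -j4 ghc-cabal.hs
```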
You would have to find out empirically how to best divide your number of
threads (32) between `make` and `ghc --make`. From reading this comment
<https://ghc.haskell.org/trac/ghc/ticket/9221#comment:12> by Simon in
#9221 I understand it's better not to call `ghc --make -jn` with `n`
higher than the number of physical cores of your machine (8 in your
case). Once you get some better parallelism, other flags like `-A` might
also have an effect on walltime (see that ticket).
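As a hypothetical starting point on your 32-thread T2000 (something to
benchmark, not a recommendation):

```make
# mk/build.mk (sketch): 8 make jobs x 4-way ghc --make gives up to 32
# concurrent compilations, while keeping each ghc's -j at or below the
# 8 physical cores, per Simon's comment in #9221.
SRC_HC_OPTS += -j4
```

then run `gmake -j8`.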
-Thomas
On Sat, Mar 7, 2015 at 11:49 AM, Karel Gardas <karel.gar...@centrum.cz> wrote:
Folks,
first of all, I remember someone already mentioned the issue of
decreased parallelism of the GHC build recently somewhere, but I
can't find it now. Sorry for that; if it was on this mailing list, I
would otherwise have used that thread.
Anyway, while working on the SPARC NCG I'm using a T2000, which
provides an 8-core/32-thread UltraSPARC T1 CPU. The property of this
machine is that it's really slow on single-threaded work; to squeeze
some performance from it, one really needs to push 32 threads of work
onto it. Now, it really hurts my nerves to see it lazily
building/running just one or two ghc processes. To verify this, I've
created a simple script that collects the number of ghc processes over
time and plots it as a graph. The result is in the attached picture.
The graph is the result of running:
gmake -j64
Anyway, the average number of running ghc processes is 4.4 and the
median is 2. IMHO such a low number not only hurts build times on
something like a CMT SPARC machine, but also on, let's say, a cluster
of ARM machines using NFS, and also on common engineering
workstations, which these days provide (IMHO!) around 8-16 cores (and
twice that many threads).
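The sampling script mentioned above might look roughly like this (a
hypothetical reconstruction; the original script wasn't posted):

```shell
#!/bin/sh
# Sketch of a sampler: print "second count" pairs once per second,
# where count is the number of running ghc processes, for plotting.

# count_ghc: print the current number of running ghc processes.
count_ghc() {
    ps -e -o comm= | awk '$1 == "ghc"' | wc -l | tr -d ' '
}

# sample_loop SECONDS: emit one "t count" line per second.
sample_loop() {
    t=0
    while [ "$t" -lt "$1" ]; do
        printf '%s %s\n' "$t" "$(count_ghc)"
        t=$((t + 1))
        sleep 1
    done
}

# Example: sample for the duration of the build, then plot the file:
# sample_loop 3600 > ghc-procs.dat
```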
My naive ideas for fixing this issue are (I'm assuming here that no
Haskell file has unused imports, though perhaps this should also be
investigated):
1) provide explicit dependencies which guide make to build in a more
optimal way
2) hack GHC's `make depend` to compute the explicit dependencies
from (1) in an optimal way automatically
3) someone already mentioned using Shake for building GHC. I don't
know Shake, but perhaps this is the right direction?
4) hack GHC to compile the needed .hi file directly in memory if the
file is not (yet!) available (the issue here is how to get the
compilation options right). Also, I don't know .hi file semantics yet,
so bear with me on this.
Is there anything else which may be done to fix this issue? Is
someone already working on any of these (I mean the reasonable ones
from the list)?
Thanks!
Karel
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs