On 17/02/2012 01:59, Evan Laforge wrote:
>> However, the GHC API doesn't provide a way to do this directly (I hadn't
>> really thought about this when I suggested the idea before, sorry).  The GHC
>> API provides support for compiling multiple modules in the way that GHCi and
>> --make work; each module is added to the HPT as it is compiled.  But when
>> compiling single modules, GHC doesn't normally use the HPT - interfaces for
>> modules in the home package are normally demand-loaded in the same way as
>> interfaces for package modules, and added to the PIT.  The crucial difference
>> between the HPT and the PIT is that the PIT supports demand-loading of
>> interfaces, whereas the HPT is supposed to be populated in the right order by
>> the compilation manager - home package modules are assumed to be present in
>> the HPT when they are required.
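
For concreteness, the two tables look roughly like this (field names as in GHC 7.x's HscTypes; a from-memory sketch with most fields omitted, not the full definitions):

```haskell
data HscEnv = HscEnv
  { hsc_HPT :: HomePackageTable           -- filled in dependency order
                                          -- by the compilation manager
  , hsc_EPS :: IORef ExternalPackageState
  , ... }

data ExternalPackageState = EPS
  { eps_PIT :: PackageIfaceTable          -- demand-loaded interfaces
  , ... }

type HomePackageTable  = ModuleNameEnv HomeModInfo  -- keyed by module name only
type PackageIfaceTable = ModuleEnv ModIface         -- keyed by (package, name)
```

The different keys are the point: the HPT can assume one home package, while the PIT must distinguish modules from different packages.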

> Yah, that's what I don't understand about HscEnv.  The HPT doc says
> that in one-shot mode, the HPT is empty and even local modules are
> demand-cached in the ExternalPackageState (which the PIT belongs to).
> And the EPS doc itself reinforces that, where it says that in one-shot mode
> "home-package modules accumulate in the external package state".
>
> So why not just ignore the HPT, run multiple "one-shot" compiles,
> and let all the info accumulate in the PIT?

Sure, except that if the server is to be used by multiple clients, you will get clashes in the PIT when, say, two clients both try to compile a module with the same name.

The PIT is indexed by Module, which is basically the pair (package,modulename), and the package for the main program is always the same: "main".
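
For concreteness (names as in GHC 7.x's Module module; an untested sketch): two clients each compiling their own A.hs produce interfaces under the same key, because home modules always belong to package "main".

```haskell
import Module (Module, mkModule, mainPackageId, mkModuleName)

-- Both clients' A.hs map to this same PIT key:
aKey :: Module
aKey = mkModule mainPackageId (mkModuleName "A")
```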

This will work fine if you spin up a new server for each program you want to build - maybe that's fine for your use case?

Don't forget to make sure the GhcMode is set to OneShot, not CompManager, BTW.
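
A very rough sketch of what the server's compile loop might look like, assuming `compileFile` from DriverPipeline has roughly its GHC 7.x signature (`HscEnv -> Phase -> (FilePath, Maybe Phase) -> IO FilePath`) - check the details against your GHC version:

```haskell
import GHC
import MonadUtils (liftIO)
import DriverPipeline (compileFile)
import DriverPhases (Phase (StopLn))

-- Compile each file in one-shot mode within a single session, letting
-- home-package interfaces accumulate in the PIT between compiles.
compileMany :: [FilePath] -> Ghc ()
compileMany files = do
  dflags <- getSessionDynFlags
  _ <- setSessionDynFlags dflags { ghcMode = OneShot }  -- not CompManager
  hsc_env <- getSession
  mapM_ (\f -> liftIO (compileFile hsc_env StopLn (f, Nothing))) files
```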

> A fair amount of work in GhcMake is concerned with trimming old data
> out of the HPT; I assume this is for GHCi, which wants to reload changed
> modules but keep unchanged ones.  I don't actually care about that,
> since I can assume the modules are unchanged over one run.
>
> So I tried just calling compileFile multiple times in the same
> GhcMonad, assuming the mutable bits of the HscEnv get updated
> appropriately.  Here are the results for a build of about 200 modules:
>
> with persistent server:
> no link:
> 3.30s user 1.60s system 12% cpu 38.323 total
> 3.50s user 1.66s system 13% cpu 38.368 total
> link:
> 21.66s user 4.13s system 35% cpu 1:11.62 total
> 21.59s user 4.54s system 38% cpu 1:08.13 total
> 21.82s user 4.70s system 35% cpu 1:14.56 total
>
> without server (ghc -c):
> no link:
> 109.25s user 19.90s system 240% cpu 53.750 total
> 109.11s user 19.23s system 243% cpu 52.794 total
> link:
> 128.10s user 21.66s system 201% cpu 1:14.29 total
>
> ghc --make (with linking, since I can't turn that off):
> 42.57s user 5.83s system 74% cpu 1:05.15 total

Yep, it seems to be doing the right thing.

> The 'user' time is low for the server because it doesn't count time spent
> by the subprocesses on the other end of the socket, but excluding
> linking it looks like I can shave about 25% off compile time.
> Unfortunately that winds up being just about the same as ghc --make, so
> the gain seems smaller than it should be.

But that's what you expect, isn't it?

> Perhaps I should be using the HPT?  I'm also
> falling back to plain ghc for linking; maybe --make can link faster
> when it has everything cached?  I guess it shouldn't, because it
> presumably just dispatches to ld.

--make has a slight advantage for linking in that it knows which packages it needs to link against, whereas plain ghc will link against all the packages on the command line.

Cheers,
        Simon

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
