Re: parallelizing ghc

2012-02-27 Thread Simon Marlow

On 17/02/2012 18:12, Evan Laforge wrote:

Sure, except that if the server is to be used by multiple clients, you will
get clashes in the PIT when say two clients both try to compile a module
with the same name.

The PIT is indexed by Module, which is basically the pair
(package,modulename), and the package for the main program is always the
same: main.

This will work fine if you spin up a new server for each program you want to
build - maybe that's fine for your use case?


Yep, and I have a new compiler for each CPU.  So compiling one program
will start up (say) 4 compilers and one server.  Then shake will start
throwing source files at the server, in the proper dependency order,
and the server will distribute the input files among the 4 compilers.
Each compiler is single-threaded so I don't have to worry about calling
GHC functions reentrantly.

But --make is single-threaded as well, so why does it bother with all
that HPT stuff instead of just calling compileFile repeatedly?  Is it
just for ghci?


That might be true, but I'm not completely sure.  The HPT stuff was 
added with a continuous edit-recompile cycle in mind (i.e. for GHCi), 
and we added --make at the same time because it fitted nicely.  It might 
be that just calling compileFile repeatedly works, and it would end up 
storing the interfaces for the home-package modules in the 
PackageIfaceTable, but we never considered this use case.  One thing 
that worries me: will it be reading the .hi file for a module off the 
disk after compiling it?  I suspect it might, whereas the HPT method 
will be caching the iface in the HPT.
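
A rough sketch of that lookup order, with simplified stand-in types
(the real logic lives in something like HscTypes.lookupIfaceByModule,
modulo GHC version):

import qualified Data.Map as Map

type PackageName = String
type ModName     = String
data ModIface    = ModIface   -- interface contents elided

-- The HPT holds only home-package modules and is populated eagerly,
-- in dependency order, by --make/GHCi; the PIT is demand-loaded.
type HomePackageTable  = Map.Map ModName ModIface
type PackageIfaceTable = Map.Map (PackageName, ModName) ModIface

lookupIface :: HomePackageTable -> PackageIfaceTable
            -> (PackageName, ModName) -> Maybe ModIface
lookupIface hpt pit (pkg, m)
    -- a home module found in the HPT needs no disk access at all
    | pkg == "main", Just iface <- Map.lookup m hpt = Just iface
    -- otherwise fall back to the PIT (in one-shot mode the HPT is
    -- empty, so even home-package modules end up here)
    | otherwise = Map.lookup (pkg, m) pit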



The 'user' is low for the server because it doesn't count time spent
by the subprocesses on the other end of the socket, but excluding
linking it looks like I can shave about 25% off compile time.
Unfortunately that winds up being just about the same as ghc --make,
so the gain seems too low.


But that's what you expect, isn't it?


It's surprising to me that the serial --make is just about the same
speed as a parallelized one.  The whole point was to compile faster!


Ah, so maybe the problem is that the compileFile method is re-reading 
.hi files off the disk (and typechecking them), and that is making it 
slower.



Granted, each interface has to be loaded once per processor while
--make only needs to load it once, but once loaded they should stay
loaded, and I'd expect the benefit of two processors to win out
pretty quickly.


--make has a slight advantage for linking in that it knows which packages it
needs to link against, whereas plain ghc will link against all the packages
on the command line.


Ohh, so maybe with --make it can omit some packages and do less work.
Let me try minimizing the -packages and see if that helps.

As an aside, it would be handy to be able to ask ghc "given this main
module, which -package flags should the final program get?" without
actually compiling anything.  Is there a way to do that, short of
writing my own with the ghc api?  Would it be a reasonable ghc flag,
along the lines of -M but for packages?


I don't think we can calculate the package dependencies without knowing 
the ModIface, which is generated by compiling (or at least typechecking) 
each module.


Cheers,
Simon




BTW, in case anyone is interested, a darcs repo is at
http://ofb.net/~elaforge/ghc-server/





Re: parallelizing ghc

2012-02-17 Thread Simon Marlow

On 17/02/2012 01:59, Evan Laforge wrote:

However, the GHC API doesn't provide a way to do this directly (I hadn't
really thought about this when I suggested the idea before, sorry).  The GHC
API provides support for compiling multiple modules in the way that GHCi and
--make work; each module is added to the HPT as it is compiled.  But when
compiling single modules, GHC doesn't normally use the HPT - interfaces for
modules in the home package are normally demand-loaded in the same way as
interfaces for package modules, and added to the PIT. The crucial difference
between the HPT and the PIT is that the PIT supports demand-loading of
interfaces, but the HPT is supposed to be populated in the right order by
the compilation manager - home package modules are assumed to be present in
the HPT when they are required.


Yah, that's what I don't understand about HscEnv.  The HPT doc says
that in one-shot mode, the HPT is empty and even local modules are
demand-cached in the ExternalPackageState (which the PIT belongs to).
And the EPS doc itself reinforces that, saying that in one-shot mode
home-package modules accumulate in the external package state.

So why not just ignore the HPT, and run multiple one-shot compiles,
and let all the info accumulate in the PIT?


Sure, except that if the server is to be used by multiple clients, you 
will get clashes in the PIT when say two clients both try to compile a 
module with the same name.


The PIT is indexed by Module, which is basically the pair 
(package,modulename), and the package for the main program is always the 
same: main.


This will work fine if you spin up a new server for each program you 
want to build - maybe that's fine for your use case?


Don't forget to make sure the GhcMode is set to OneShot, not 
CompManager, BTW.
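
Something like this, with the 7.x-era GHC API (field and function
names may differ in other versions; libdir comes from the ghc-paths
package):

import DynFlags (DynFlags(..), GhcMode(OneShot))
import GHC (Ghc, getSessionDynFlags, runGhc, setSessionDynFlags)
import GHC.Paths (libdir)   -- from the ghc-paths package

-- Run a session in one-shot mode ('ghc -c' behaviour) rather than
-- CompManager mode (what --make and GHCi use).
withOneShot :: Ghc a -> IO a
withOneShot body = runGhc (Just libdir) $ do
    dflags <- getSessionDynFlags
    _ <- setSessionDynFlags dflags { ghcMode = OneShot }
    body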



A fair amount of work in GhcMake is concerned with trimming old data
out of the HPT; I assume this is for ghci, which wants to reload
changed modules but keep unchanged ones.  I don't actually care about
that since I can assume the modules will be unchanged over one run.

So I tried just calling compileFile multiple times in the same
GhcMonad, assuming the mutable bits of the HscEnv get updated
appropriately.  Here are the results for a build of about 200 modules:

with persistent server:
no link:
3.30s user 1.60s system 12% cpu 38.323 total
3.50s user 1.66s system 13% cpu 38.368 total
link:
21.66s user 4.13s system 35% cpu 1:11.62 total
21.59s user 4.54s system 38% cpu 1:08.13 total
21.82s user 4.70s system 35% cpu 1:14.56 total

without server (ghc -c):
no link:
109.25s user 19.90s system 240% cpu 53.750 total
109.11s user 19.23s system 243% cpu 52.794 total
link:
128.10s user 21.66s system 201% cpu 1:14.29 total

ghc --make (with linking since I can't turn that off):
42.57s user 5.83s system 74% cpu 1:05.15 total


Yep, it seems to be doing the right thing.


The 'user' is low for the server because it doesn't count time spent
by the subprocesses on the other end of the socket, but excluding
linking it looks like I can shave about 25% off compile time.
Unfortunately that winds up being just about the same as ghc --make,
so the gain seems too low.


But that's what you expect, isn't it?


Perhaps I should be using the HPT?  I'm also
falling back to plain ghc for linking; maybe --make can link faster
when it has everything cached?  I guess it shouldn't, because it
presumably just dispatches to ld.


--make has a slight advantage for linking in that it knows which 
packages it needs to link against, whereas plain ghc will link against 
all the packages on the command line.


Cheers,
Simon



Re: parallelizing ghc

2012-02-17 Thread Evan Laforge
 Sure, except that if the server is to be used by multiple clients, you will
 get clashes in the PIT when say two clients both try to compile a module
 with the same name.

 The PIT is indexed by Module, which is basically the pair
 (package,modulename), and the package for the main program is always the
 same: main.

 This will work fine if you spin up a new server for each program you want to
 build - maybe that's fine for your use case?

Yep, and I have a new compiler for each CPU.  So compiling one program
will start up (say) 4 compilers and one server.  Then shake will start
throwing source files at the server, in the proper dependency order,
and the server will distribute the input files among the 4 compilers.
Each compiler is single-threaded so I don't have to worry about calling
GHC functions reentrantly.

But --make is single-threaded as well, so why does it bother with all
that HPT stuff instead of just calling compileFile repeatedly?  Is it
just for ghci?

 The 'user' is low for the server because it doesn't count time spent
 by the subprocesses on the other end of the socket, but excluding
 linking it looks like I can shave about 25% off compile time.
 Unfortunately that winds up being just about the same as ghc --make,
 so the gain seems too low.

 But that's what you expect, isn't it?

It's surprising to me that the serial --make is just about the same
speed as a parallelized one.  The whole point was to compile faster!

Granted, each interface has to be loaded once per processor while
--make only needs to load it once, but once loaded they should stay
loaded, and I'd expect the benefit of two processors to win out
pretty quickly.

 --make has a slight advantage for linking in that it knows which packages it
 needs to link against, whereas plain ghc will link against all the packages
 on the command line.

Ohh, so maybe with --make it can omit some packages and do less work.
Let me try minimizing the -packages and see if that helps.

As an aside, it would be handy to be able to ask ghc "given this main
module, which -package flags should the final program get?" without
actually compiling anything.  Is there a way to do that, short of
writing my own with the ghc api?  Would it be a reasonable ghc flag,
along the lines of -M but for packages?


BTW, in case anyone is interested, a darcs repo is at
http://ofb.net/~elaforge/ghc-server/



Re: parallelizing ghc

2012-02-16 Thread Evan Laforge
 However, the GHC API doesn't provide a way to do this directly (I hadn't
 really thought about this when I suggested the idea before, sorry).  The GHC
 API provides support for compiling multiple modules in the way that GHCi and
 --make work; each module is added to the HPT as it is compiled.  But when
 compiling single modules, GHC doesn't normally use the HPT - interfaces for
 modules in the home package are normally demand-loaded in the same way as
 interfaces for package modules, and added to the PIT. The crucial difference
 between the HPT and the PIT is that the PIT supports demand-loading of
 interfaces, but the HPT is supposed to be populated in the right order by
 the compilation manager - home package modules are assumed to be present in
 the HPT when they are required.

Yah, that's what I don't understand about HscEnv.  The HPT doc says
that in one-shot mode, the HPT is empty and even local modules are
demand-cached in the ExternalPackageState (which the PIT belongs to).
And the EPS doc itself reinforces that, saying that in one-shot mode
home-package modules accumulate in the external package state.

So why not just ignore the HPT, and run multiple one-shot compiles,
and let all the info accumulate in the PIT?

A fair amount of work in GhcMake is concerned with trimming old data
out of the HPT; I assume this is for ghci, which wants to reload
changed modules but keep unchanged ones.  I don't actually care about
that since I can assume the modules will be unchanged over one run.

So I tried just calling compileFile multiple times in the same
GhcMonad, assuming the mutable bits of the HscEnv get updated
appropriately.  Here are the results for a build of about 200 modules:

with persistent server:
no link:
3.30s user 1.60s system 12% cpu 38.323 total
3.50s user 1.66s system 13% cpu 38.368 total
link:
21.66s user 4.13s system 35% cpu 1:11.62 total
21.59s user 4.54s system 38% cpu 1:08.13 total
21.82s user 4.70s system 35% cpu 1:14.56 total

without server (ghc -c):
no link:
109.25s user 19.90s system 240% cpu 53.750 total
109.11s user 19.23s system 243% cpu 52.794 total
link:
128.10s user 21.66s system 201% cpu 1:14.29 total

ghc --make (with linking since I can't turn that off):
42.57s user 5.83s system 74% cpu 1:05.15 total

The 'user' is low for the server because it doesn't count time spent
by the subprocesses on the other end of the socket, but excluding
linking it looks like I can shave about 25% off compile time.
Unfortunately that winds up being just about the same as ghc --make,
so the gain seems too low.  Perhaps I should be using the HPT?  I'm
also falling back to plain ghc for linking; maybe --make can link
faster when it has everything cached?  I guess it shouldn't, because
it presumably just dispatches to ld.
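
For reference, the loop itself is tiny - a sketch against the 7.x-era
internals (compileFile lives in DriverPipeline; exact signatures vary
by version):

import Control.Monad.IO.Class (liftIO)
import DriverPhases (Phase(StopLn))
import DriverPipeline (compileFile)
import GHC (getSession, getSessionDynFlags, runGhc, setSessionDynFlags)
import GHC.Paths (libdir)   -- from the ghc-paths package

-- One GhcMonad session, compileFile once per file, so the mutable
-- caches in the HscEnv (notably the EPS/PIT) survive between files.
-- The files must already be in dependency order.
compileMany :: [FilePath] -> IO ()
compileMany srcs = runGhc (Just libdir) $ do
    _ <- getSessionDynFlags >>= setSessionDynFlags  -- initialise packages
    hsc_env <- getSession
    liftIO $ mapM_ (\src -> compileFile hsc_env StopLn (src, Nothing)) srcs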



Re: parallelizing ghc

2012-02-13 Thread Simon Marlow

On 10/02/2012 08:01, Evan Laforge wrote:

I like the idea!  And it should be possible to build this without modifying
GHC at all, on top of the GHC API.  As you say, you'll need a server
process, which accepts command lines, executes them, and sends back the
results.  A local socket should be fine (and will work on both Unix and
Windows).


I took a whack at this, but I'm having to backtrack a bit now because
I don't fully understand the GHC API, so I thought I should explain my
understanding to make sure I'm on the right track.

It appears the cached information I want to preserve between compiles
is in HscEnv.  At first I thought I could just do what --make does,
but what it does is call 'GHC.load', which maintains the HscEnv (which
mostly means loading already compiled modules into the
HomePackageTable, since the other cache entries are apparently loaded
on demand by DriverPipeline.compileFile).  But actually it does a lot
of things, such as detecting that a module doesn't need recompilation
and directly loading the interface in that case.  So I thought it
would be quickest to just use it: add a new target to the set of
targets and call load again.
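
A rough sketch of that approach (7.x-era GHC API; exact signatures
vary by version):

import Control.Monad (forM_, void)
import GHC (LoadHowMuch(LoadAllTargets), addTarget, getSessionDynFlags,
            guessTarget, load, runGhc, setSessionDynFlags)
import GHC.Paths (libdir)   -- from the ghc-paths package

-- Add each file as a target and call 'load' again, letting GHC.load
-- maintain the HscEnv (and the HPT) across calls.
compileViaLoad :: [FilePath] -> IO ()
compileViaLoad files = runGhc (Just libdir) $ do
    _ <- getSessionDynFlags >>= setSessionDynFlags
    forM_ files $ \file -> do
        target <- guessTarget file Nothing
        addTarget target
        void (load LoadAllTargets)   -- re-examines the whole target set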

However, there are problems with that.  The first is that it doesn't
pay attention to DynFlags.outputFile, which makes sense because it's
expecting to compile multiple files.  The bigger problem is that it
apparently wants to reload the whole set each time, so it winds up
being slower rather than faster.  I guess 'load' is really set up to
figure out dependencies on its own and compile a set of modules, so
I'm talking at the wrong level.

So I think I need to rewrite the HPT-maintaining parts of GHC.load and
write my own compileFile that *does* maintain the HPT.  And also
figure out what other parts of the HscEnv should be updated, if any.
Sound about right?


What you're trying to do is mimic the operation of 'ghc -c Foo.hs ..' 
but cache any loaded interface files and re-use them.  This means you 
need to retain the contents of HscEnv (as you say), because that 
contains the cached information.


However, the GHC API doesn't provide a way to do this directly (I hadn't 
really thought about this when I suggested the idea before, sorry).  The 
GHC API provides support for compiling multiple modules in the way that 
GHCi and --make work; each module is added to the HPT as it is compiled. 
 But when compiling single modules, GHC doesn't normally use the HPT - 
interfaces for modules in the home package are normally demand-loaded in 
the same way as interfaces for package modules, and added to the PIT. 
The crucial difference between the HPT and the PIT is that the PIT 
supports demand-loading of interfaces, but the HPT is supposed to be 
populated in the right order by the compilation manager - home package 
modules are assumed to be present in the HPT when they are required.


For 'ghc -c Foo.hs' you want to demand-load interfaces for other modules 
in the same package (and cache them), but you want them to not get mixed 
up with interfaces from other packages that may be being compiled 
simultaneously by other clients.  There's no easy way to solve this. 
You could avoid the problem by not caching home-package interfaces, but 
that may throw away a lot of the benefit of doing this.  Or you could 
maintain some kind of session state with the client over multiple 
compilations, and only discard the home package interfaces if another 
client connects.


There are further complications in that certain flags can invalidate the 
information you have cached: changing the package flags, for instance.


So I think some additions to the API are almost certainly needed.  But 
this is as far as I have got in thinking about the problem...


Cheers,
Simon





Along the way I ran into the problem that it's impossible to re-parse
GHC flags to compare them to previous runs, because the static flags
only export a parsing function that mutates global variables and can
only be called once.  So I parse out the dynamic flags, strip out the
*.hs args, and assume the rest are static flags.  I noticed comments
about converting them all to dynamic; I guess that might make a nice
housekeeping project some day.
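
A sketch of that workaround (7.x-era API; parseDynamicFlags's exact
signature varies by version):

import Data.List (isSuffixOf)
import DynFlags (DynFlags)
import GHC (parseDynamicFlags)
import SrcLoc (noLoc, unLoc)

-- Parse out the dynamic flags, drop the *.hs arguments, and treat
-- whatever is left over as static flags.
splitFlags :: DynFlags -> [String] -> IO (DynFlags, [String])
splitFlags dflags args = do
    (dflags', leftover, _warns) <- parseDynamicFlags dflags (map noLoc args)
    let statics = filter (not . (".hs" `isSuffixOf`)) (map unLoc leftover)
    return (dflags', statics)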





Re: parallelizing ghc

2012-01-29 Thread Neil Mitchell
Hi Simon,

I have found that a factor of 2 parallelism is required on Linux to
draw level with ghc --make.  In particular (times in seconds):

GHC --make = 7.688
Shake -j1 = 11.828 (of which 11.702 is spent running system commands)
Shake full -j4 = 7.414 (of which 12.906 is spent running system commands)

This is for a Haskell program which has several bottlenecks; you can
see a graph of the spawned processes here:
http://community.haskell.org/~ndm/darcs/shake/academic/icfp2012/profile.eps
- everything above the 1 mark is more than one process in parallel, so
it gets to 4 processes, but not all the time - roughly an average of
~2x parallelism.

On Windows the story is much worse.  If you -j4 then the time spent
executing system commands shoots up from ~15s to around ~25s, since
even on a 4 core machine the contention between the processes is high.
I tried investigating this, checking for things like a locked file
(none that I could find) and disk/CPU/memory contention (it's
basically taking no system resources), but couldn't find anything.

If you specify -O2 then the parallel performance also goes down - I
suspect because each ghc process needs to read inlining information
for packages that are imported multiple times, whereas ghc --make gets
away with doing that once?

 This looks a bit suspicious.  The Shake build is doing nearly twice as much
 work as the --make build, in terms of CPU time, but because it is getting
 nearly 2x parallelism it comes in a close second.  How many processes is the
 Shake build using?

Shake uses at most the number of processes you specify - it never
exceeds the -j flag - so in the above example it caps out at 4.  It is
very good at getting parallelism (I believe it to be perfect, but the
code is 150 lines of IORef twiddling, so I wouldn't guarantee it), and
very safe about never exceeding the cap you specify (I think I can
even prove that, for some value of "proof").  The profiling makes it
easy to verify these claims after the fact.
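
Not Shake's actual scheduler, but a minimal sketch of the invariant it
maintains - at most n jobs in flight at once:

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)
import Control.Monad (forM)

-- Run the jobs with at most n running at any moment, then wait for
-- all of them to finish.  (Exception propagation omitted for brevity.)
runCapped :: Int -> [IO ()] -> IO ()
runCapped n jobs = do
    sem <- newQSem n
    dones <- forM jobs $ \job -> do
        done <- newEmptyMVar
        _ <- forkIO $ do
            bracket_ (waitQSem sem) (signalQSem sem) job
            putMVar done ()
        return done
    mapM_ takeMVar dones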

Thanks, Neil



Re: parallelizing ghc

2012-01-27 Thread Simon Marlow

On 26/01/2012 23:37, Evan Laforge wrote:

I'm slightly surprised by this - in my experience parallel builds beat
--make as long as the parallelism is a factor of 2 or more.  Is your
dependency graph very narrow, or do you have lots of very small modules?


I get full parallelism: 4 threads at once on an i5 with 2 cores and 2
hyperthreads per core, and an SSD.  Maybe I should try with just 2
threads.  I only ever get 200% CPU at most, so it seems like the
hyperthreads are not really much like a whole core.

The modules are usually around 150-250 lines.  Here are the timings
for an older run:

from scratch (191 modules):
runghc Shake/Shakefile.hs build/debug/seq  128.43s user 20.04s system 178% cpu 1:23.01 total
no link: runghc Shake/Shakefile.hs build/debug/seq  118.92s user 19.21s system 249% cpu 55.383 total
make -j3 build/seq  68.81s user 9.98s system 98% cpu 1:19.60 total


This looks a bit suspicious.  The Shake build is doing nearly twice as 
much work as the --make build, in terms of CPU time, but because it is 
getting nearly 2x parallelism it comes in a close second.  How many 
processes is the Shake build using?


I'd investigate this further.  Are you sure there's no swapping going 
on?  How many processes is the Shake build creating - perhaps too many?


Cheers,
Simon



Re: parallelizing ghc

2012-01-26 Thread Simon Marlow

On 24/01/2012 03:53, Evan Laforge wrote:
 I recently switched from ghc --make to a parallelized build system.  I
 was looking forward to faster builds, and while they are much faster
 at figuring out what has to be rebuilt (which is most of the time for
 a small rebuild, since ld dominates), compilation of the whole system
 is either the same or slightly slower than the single threaded ghc
 --make version.  My guess is that the overhead of starting up lots of
 individual ghcs, each of which has to read all the .hi files all over
 again, just about cancels out the parallelism gains.

I'm slightly surprised by this - in my experience parallel builds beat 
--make as long as the parallelism is a factor of 2 or more.  Is your 
dependency graph very narrow, or do you have lots of very small modules?



So I'm wondering, does this seem reasonable and feasible?  Is there a
better way to do it?  Even if it could be done, would it be worth it?
If the answers are "yes", "maybe not", and "maybe yes", then how hard
would this be to do, and where should I start looking?  I'm assuming I
should start at GhcMake.hs and work outwards from there...


I like the idea!  And it should be possible to build this without 
modifying GHC at all, on top of the GHC API.  As you say, you'll need a 
server process, which accepts command lines, executes them, and sends 
back the results.  A local socket should be fine (and will work on both 
Unix and Windows).


The server process can either do the compilation itself, or have several 
workers.  Unfortunately the workers would have to be separate processes, 
because the GHC API is single threaded.


When a worker gets too large, just kill it and start a new one.
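
A minimal sketch of such a server (old network package API, one
request per connection; the 'compile' stub stands in for running the
GHC API or handing off to a worker process):

import Control.Monad (forever)
import Network (PortID(PortNumber), accept, listenOn)
import System.IO (hClose, hGetLine, hPutStrLn)

main :: IO ()
main = do
    sock <- listenOn (PortNumber 7077)   -- port number is arbitrary
    forever $ do
        (h, _host, _port) <- accept sock
        cmdline <- hGetLine h     -- the client sends one command line
        result <- compile cmdline
        hPutStrLn h result        -- and reads back the result
        hClose h
  where
    compile :: String -> IO String
    compile cmd = return ("compiled: " ++ cmd)   -- placeholder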

Cheers,
Simon



Re: parallelizing ghc

2012-01-26 Thread John Lato
 From: Evan Laforge qdun...@gmail.com

 On Wed, Jan 25, 2012 at 11:42 AM, Ryan Newton rrnew...@gmail.com wrote:
 package list for me.  The time is going to be dominated by linking,
 which is single threaded anyway, so either way works.

 What is the state of incremental linkers?  I thought those existed now.

 I think in some specific cases.  I've heard there's a Microsoft one?
 It would be Windows-only, of course.  Is anyone using that with ghc?

 gold is supposed to be multi-threaded and fast (don't know about
 incremental), but once again it's ELF-only.  I've heard a few people
 talking about gold with ghc, but I don't know what the results were.

 Unfortunately I'm on OS X; I don't know about any incremental or
 multithreaded linking here.

Neither do I.  On my older machine with 2GB RAM, builds are often
dominated by ld because it starts thrashing.  And not many linkers
target Mach-O.

I've been toying with building my own ld replacement.  I don't know
anything about linkers, but I'd say at least even odds that I can do
better than this.

John L.



Re: parallelizing ghc

2012-01-26 Thread Evan Laforge
 I'm slightly surprised by this - in my experience parallel builds beat
 --make as long as the parallelism is a factor of 2 or more.  Is your
 dependency graph very narrow, or do you have lots of very small modules?

I get full parallelism: 4 threads at once on an i5 with 2 cores and 2
hyperthreads per core, and an SSD.  Maybe I should try with just 2
threads.  I only ever get 200% CPU at most, so it seems like the
hyperthreads are not really much like a whole core.

The modules are usually around 150-250 lines.  Here are the timings
for an older run:

from scratch (191 modules):
runghc Shake/Shakefile.hs build/debug/seq  128.43s user 20.04s system 178% cpu 1:23.01 total
no link: runghc Shake/Shakefile.hs build/debug/seq  118.92s user 19.21s system 249% cpu 55.383 total
make -j3 build/seq  68.81s user 9.98s system 98% cpu 1:19.60 total

modify nothing:
runghc Shake/Shakefile.hs build/debug/seq  0.65s user 0.10s system 96% cpu 0.780 total
make -j3 build/seq  6.05s user 1.21s system 85% cpu 8.492 total

modify one file:
runghc Shake/Shakefile.hs build/debug/seq  19.50s user 2.37s system 94% cpu 23.166 total
make -j3 build/seq  12.81s user 1.85s system 94% cpu 15.586 total

From scratch, --make (that's what 'make -j3' winds up calling) wins
slightly.  --make loses handily at detecting that nothing needs to be
done :)  And as expected, modifying one file is all about the linking,
though it's odd that --make was faster.

 I like the idea!  And it should be possible to build this without modifying
 GHC at all, on top of the GHC API.  As you say, you'll need a server
 process, which accepts command lines, executes them, and sends back the
 results.  A local socket should be fine (and will work on both Unix and
 Windows).

 The server process can either do the compilation itself, or have several
 workers.  Unfortunately the workers would have to be separate processes,
 because the GHC API is single threaded.

 When a worker gets too large, just kill it and start a new one.

A benefit of real processes: I'm pretty confident all the memory will
be GCed after the whole process is killed :)

I'll start looking into the ghc api.  I have no experience with it,
but I assume I can look at what GhcMake.hs is doing and learn from
that.



Re: parallelizing ghc

2012-01-26 Thread Nathan Howell
On Thu, Jan 26, 2012 at 3:44 PM, Evan Laforge qdun...@gmail.com wrote:

 I'd think Apple would care about linker performance... I'm even a
 little surprised Xcode doesn't have something better than a lightly
 hacked GNU ld.


Someone mentioned that it was on their wish-list at the LLVM 2010
conference... it's hinted at here too:
http://llvm.org/devmtg/2010-11/Spencer-ObjectFiles.pdf.  The author
might know if anyone is actually working on one.

-n


Re: parallelizing ghc

2012-01-25 Thread Evan Laforge
On Wed, Jan 25, 2012 at 11:42 AM, Ryan Newton rrnew...@gmail.com wrote:
 package list for me.  The time is going to be dominated by linking,
 which is single threaded anyway, so either way works.

 What is the state of incremental linkers?  I thought those existed now.

I think in some specific cases.  I've heard there's a Microsoft one?
It would be Windows-only, of course.  Is anyone using that with ghc?

gold is supposed to be multi-threaded and fast (don't know about
incremental), but once again it's ELF-only.  I've heard a few people
talking about gold with ghc, but I don't know what the results were.

Unfortunately I'm on OS X; I don't know about any incremental or
multithreaded linking here.



Re: parallelizing ghc

2012-01-24 Thread Mikhail Glushenkov
Hi,

On Tue, Jan 24, 2012 at 4:53 AM, Evan Laforge qdun...@gmail.com wrote:
 [...]

 So ghc --make provides two things: a dependency chaser and a way to
 keep the compiler resident as it compiles new files.  Since the
 dependency chaser will never be as powerful as a real build system, it
 occurs to me that the only reasonable way forward is to split out the
 second part, by adding an --interactive flag to ghc.  It would then
 read filenames on stdin, compiling each one in turn, only exiting when
 it sees EOF.

There is in fact an '--interactive' flag already; 'ghc --interactive'
is a synonym for 'ghci'.

 So I'm wondering, does this seem reasonable and feasible?  Is there a
 better way to do it?  Even if it could be done, would it be worth it?
 If the answers are "yes", "maybe not", and "maybe yes", then how hard
 would this be to do, and where should I start looking?  I'm assuming I
 should start at GhcMake.hs and work outwards from there...

I'm also interested in a build server mode for ghc. I have written a
parallel wrapper for 'ghc --make' [1], but the speed gains are not as
impressive [2] as I hoped because of the duplicated work.


[1] https://github.com/23Skidoo/ghc-parmake
[2] https://gist.github.com/1360470




Re: parallelizing ghc

2012-01-24 Thread Mikhail Glushenkov
Hi,

On Tue, Jan 24, 2012 at 4:53 AM, Evan Laforge qdun...@gmail.com wrote:
 So ghc --make provides two things: a dependency chaser and a way to
 keep the compiler resident as it compiles new files.  Since the
 dependency chaser will never be as powerful as a real build system, it
 occurs to me that the only reasonable way forward is to split out the
 second part, by adding an --interactive flag to ghc.  It would then
 read filenames on stdin, compiling each one in turn, only exiting when
 it sees EOF.

 Then a separate program, ghc-fe, can wrap ghc and act like a drop-in
 replacement for ghc.

One immediate problem I see with this is linking - 'ghc --make
Main.hs' is able to figure out what packages a program depends on,
while 'ghc Main.o ... -o Main' requires the user to specify them
manually with -package. So you'll either need to pass this information
back to the parent process, or use 'ghc --make' for linking (which
adds more overhead).




Re: parallelizing ghc

2012-01-24 Thread Evan Laforge
 One immediate problem I see with this is linking - 'ghc --make
 Main.hs' is able to figure out what packages a program depends on,
 while 'ghc Main.o ... -o Main' requires the user to specify them
 manually with -package. So you'll either need to pass this information
 back to the parent process, or use 'ghc --make' for linking (which
 adds more overhead).

Well, figuring out dependencies is the job of the build system.  I'd
be perfectly happy to just invoke ghc with a hardcoded package list as
I do currently, or, as you said, invoke --make just to figure out the
package list for me.  The time is going to be dominated by linking,
which is single threaded anyway, so either way works.

It would be a neat feature to be able to ask ghc to figure out the
packages needed for a particular file and emit them for the build
system (or is there already a way to do that?), but it's orthogonal, I
think.  Probably not hard, though: just stick a knob on --make that
prints the link line instead of running it.

 There is in fact an '--interactive' flag already; 'ghc --interactive'
 is a synonym for 'ghci'.

Oh right, well some other name then :)

 I'm also interested in a build server mode for ghc. I have written a
 parallel wrapper for 'ghc --make' [1], but the speed gains are not as
 impressive [2] as I hoped because of the duplicated work.

Was the duplicated work rereading .hi files, or was there something else?



Re: parallelizing ghc

2012-01-24 Thread Mikhail Glushenkov
Hi,

On Tue, Jan 24, 2012 at 7:04 PM, Evan Laforge qdun...@gmail.com wrote:
 I'm also interested in a build server mode for ghc. I have written a
 parallel wrapper for 'ghc --make' [1], but the speed gains are not as
 impressive [2] as I hoped because of the duplicated work.

 Was the duplicated work rereading .hi files, or was there something else?

I think so - according to the GHC manual, the main speed improvement
comes from caching the information between compilations.




parallelizing ghc

2012-01-23 Thread Evan Laforge
I recently switched from ghc --make to a parallelized build system.  I
was looking forward to faster builds, and while they are much faster
at figuring out what has to be rebuilt (which is most of the time for
a small rebuild, since ld dominates), compilation of the whole system
is either the same or slightly slower than the single threaded ghc
--make version.  My guess is that the overhead of starting up lots of
individual ghcs, each of which has to read all the .hi files all over
again, just about cancels out the parallelism gains.

So one way around that would be parallelizing --make, which has been a
TODO for a long time.  However, I believe that's never going to be
satisfactory for a project involving various different languages,
because ghc itself is never going to be a general purpose build
system.

So ghc --make provides two things: a dependency chaser and a way to
keep the compiler resident as it compiles new files.  Since the
dependency chaser will never be as powerful as a real build system, it
occurs to me that the only reasonable way forward is to split out the
second part, by adding an --interactive flag to ghc.  It would then
read filenames on stdin, compiling each one in turn, only exiting when
it sees EOF.
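
A sketch of that loop - one filename per line in, one status line per
module out; 'compileOne' is a placeholder for a GHC API call that
reuses a single live session:

import Control.Monad (unless)
import System.IO (BufferMode(LineBuffering), hSetBuffering, isEOF, stdout)

main :: IO ()
main = do
    hSetBuffering stdout LineBuffering
    loop
  where
    loop = do
        eof <- isEOF
        unless eof $ do
            file <- getLine
            ok <- compileOne file
            putStrLn ((if ok then "ok " else "fail ") ++ file)
            loop
    compileOne :: FilePath -> IO Bool
    compileOne _ = return True   -- placeholder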

Then a separate program, ghc-fe, can wrap ghc and act like a drop-in
replacement for ghc.

It would be nice if ghc could atomically read one line from the input;
then you could just start a bunch of ghcs behind a named pipe and each
would steal its own work.  But I don't think that's possible with unix
pipes, and of course there are still a few non-unix systems out there.
And I guess ghc-fe has to wait for the compilation to finish, so ghc
has to print a status line when it completes (or fails) a module.  But
it can still be done with an external distributor program that acts
like a server: it starts up n ghcs, distributes src files between
them, and shuts them down when given the command:

-- sketch only: startup, shutdown, findFreeAndMarkBusy, markFree,
-- tellGhc, readResult, readCommand, writeResult and whileM are left
-- undefined ('in' is a keyword, hence the field names below)

data Status = Free | Busy
data Ghc = Ghc { status :: IORef Status, inH :: Handle, outH :: Handle, pid :: Int }

main = do
    origFlags <- getArgs
    ghcs <- mapM (startup origFlags) [0 .. cpus]
    socket <- accept
    whileM $ readCommand socket >>= \cmd -> case cmd of
        Quit -> return False
        Compile ghcFlags src -> do
            _ <- forkIO $ do
                assert (ghcFlags == origFlags) (return ())
                result <- bracket (findFreeAndMarkBusy ghcs) markFree $ \ghc -> do
                    tellGhc ghc src
                    readResult ghc
                writeResult socket result
            return True
    mapM_ shutdown ghcs

The ghc-fe then starts a distributor if one is not running, sends a
src file and waits for the response, acting like a drop-in replacement
for the ghc cmdline.  Build systems just call ghc-fe and have an extra
responsibility to call ghc-fe --quit when they are done.  And I guess
if they know how many files they want to rebuild, it won't be worth it
below a certain threshold.


So I'm wondering, does this seem reasonable and feasible?  Is there a
better way to do it?  Even if it could be done, would it be worth it?
If the answers are "yes", "maybe not", and "maybe yes", then how hard
would this be to do, and where should I start looking?  I'm assuming I
should start at GhcMake.hs and work outwards from there...

I'm not entirely sure it would be worth it to me even if it did make
full builds, say, 1.5x faster on my dual core i5, but it's interesting
to think about all the same.
