Parallel --make (GHC build times on newer MacBook Pros?)

2011-08-26 Thread Brandon Moore
> From: Evan Laforge 

> Sent: Friday, August 26, 2011 6:35 PM
> Subject: Re: GHC build times on newer MacBook Pros?
> 
> On Tue, Aug 23, 2011 at 10:24 AM, David Terei  
> wrote:
>>  I have a 16 core machine at work (with 48GB of ram, a perk of the job
>>  :)). GHC can saturate them all. Can validate GHC in well under 10
>>  minutes on it.
> 
> To wander a bit from the topic, when I first saw this I thought "wow,
> ghc builds in parallel now, I want that" but then I realized it's
> because ghc itself uses make, not --make.  --make's automatic
> dependencies are convenient, but figuring out dependencies on every
> build and not being parallel means make should be a lot faster.  Also,
> --make doesn't understand the hsc->hs link, so in practice I have to
> do a fair amount of manual dependencies anyway.  So it inspired me to
> try to switch from --make to make for my own project.

I'm confused by this as well. Parallelizing --make was one of the
first case studies in the smp runtime paper: section 7 of
"Haskell on a Shared-Memory Multiprocessor".

There's also a trac ticket,
http://hackage.haskell.org/trac/ghc/ticket/910, with a vague comment that
the patch from the paper "almost certainly isn't ready for prime time",
but I haven't seen any description of specific problems.


Brandon


___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-08-27 Thread Johan Tibell
On Sat, Aug 27, 2011 at 5:25 AM, Brandon Moore
 wrote:
> I'm confused by this as well. Parallelizing --make was one of the
> first case studies in the smp runtime paper: section 7 of
> "Haskell on a Shared-Memory Multiprocessor".
>
> There's also a trac ticket,
> http://hackage.haskell.org/trac/ghc/ticket/910, with a vague comment that
> the patch from the paper "almost certainly isn't ready for prime time",
> but I haven't seen any description of specific problems.

From what I remember someone tried to parallelize GHC but it turned
out to be tricky in practice. At the moment we're trying to parallelize
Cabal, which would allow us to build packages/modules in parallel using
ghc -c and let Cabal handle dependency management (including
preprocessing of .hsc files).

Johan



Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-08-27 Thread Evan Laforge
> From what I remember someone tried to parallelize GHC but it turned
> out to be tricky in practice. At the moment we're trying to parallelize
> Cabal, which would allow us to build packages/modules in parallel using
> ghc -c and let Cabal handle dependency management (including
> preprocessing of .hsc files).

Right, that's probably the one I mentioned.  And I think he was trying
to parallelize ghc internally, so even compiling one file could
parallelize.  That would be cool and all, but seems like a lot of work
compared to just parallelizing at the file level, as make would do.

A parallel cabal build would be excellent, but AFAIK not much help for
mixed language projects, though I admit I haven't tried cabal for that
yet.  I'm sure it could launch make to build the C, but can it track
.h -> .hsc dependencies?  Parallel cabal build would tempt me to give
it a try.



Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-08-29 Thread Max Bolingbroke
On 27 August 2011 09:00, Evan Laforge  wrote:
> Right, that's probably the one I mentioned.  And I think he was trying
> to parallelize ghc internally, so even compiling one file could
> parallelize.  That would be cool and all, but seems like a lot of work
> compared to just parallelizing at the file level, as make would do.

It was Thomas Schilling, and he wasn't trying to parallelise the
compilation of a single file. He was just trying to make access to the
various bits of shared state GHC uses thread safe. This mostly worked
but caused an unacceptable performance penalty to single-threaded
compilation.

Max



Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-08-29 Thread Thomas Schilling
The performance problem was due to the use of unsafePerformIO and other
thunk-locking functions.  Such functions can cause severe slowdowns
when the stack is deep, because they need to traverse the stack to
atomically claim thunks that might be under evaluation by multiple
threads.

The latest version of GHC should no longer have this problem (or not
as severely) because the stack is now split into chunks (see [1] for
performance tuning options) only one of which needs to be scanned.
So, it might be worth a try to re-apply that thread-safety patch.

[1]: https://plus.google.com/107890464054636586545/posts/LqgXK77FgfV




-- 
Push the envelope. Watch it bend.



Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-08-29 Thread Evan Laforge
On Mon, Aug 29, 2011 at 1:50 PM, Max Bolingbroke
 wrote:
> On 27 August 2011 09:00, Evan Laforge  wrote:
>> Right, that's probably the one I mentioned.  And I think he was trying
>> to parallelize ghc internally, so even compiling one file could
>> parallelize.  That would be cool and all, but seems like a lot of work
>> compared to just parallelizing at the file level, as make would do.
>
> It was Thomas Schilling, and he wasn't trying to parallelise the
> compilation of a single file. He was just trying to make access to the
> various bits of shared state GHC uses thread safe. This mostly worked
> but caused an unacceptable performance penalty to single-threaded
> compilation.

Interesting, maybe I misremembered?  Or maybe there was some other guy
who was trying to parallelize?

Just out of curiosity, what benefit does a thread-safe ghc provide?  I
know ghc api users have to go to some bother to not call re-entrantly...
what neat stuff could we do with a re-entrant ghc?  Could it
eventually lead to an internally parallel ghc or are there deeper
reasons it's hard to parallelize compilation?  That would be really
cool, if possible.  In fact, I don't know of any parallel compilers.



Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-08-29 Thread Thomas Schilling
On 30 August 2011 01:16, Evan Laforge  wrote:
> Interesting, maybe I misremembered?  Or maybe there was some other guy
> who was trying to parallelize?
>
> Just out of curiosity, what benefit does a thread-safe ghc provide?  I
> know ghc api users have to go to some bother to not call re-entrantly...
> what neat stuff could we do with a re-entrant ghc?  Could it
> eventually lead to an internally parallel ghc or are there deeper
> reasons it's hard to parallelize compilation?  That would be really
> cool, if possible.  In fact, I don't know of any parallel compilers.

Yes, the plan was to eventually have a parallel --make mode.



Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-08-31 Thread Simon Marlow

On 30/08/2011 00:42, Thomas Schilling wrote:
> The performance problem was due to the use of unsafePerformIO or other
> thunk-locking functions.  The problem was that such functions can
> cause severe performance problems when using a deep stack.  The
> problem is that these functions need to traverse the stack to
> atomically claim thunks that might be under evaluation by multiple
> threads.
>
> The latest version of GHC should no longer have this problem (or not
> as severely) because the stack is now split into chunks (see [1] for
> performance tuning options) only one of which needs to be scanned.
> So, it might be worth a try to re-apply that thread-safety patch.
>
> [1]: https://plus.google.com/107890464054636586545/posts/LqgXK77FgfV


I think I would do it differently.  Rather than using unsafePerformIO, 
use unsafeDupablePerformIO with an atomic idempotent operation.  Looking 
up or adding an entry to the FastString table can be done using an 
atomicModifyIORef, so this should be fine.


The other place you have to look carefully at is the NameCache; again an 
atomicModifyIORef should do the trick there.  In GHC 7.2.1 we also have 
a casMutVar# primitive which can be used to build lower-level atomic 
operations, so that might come in handy too.
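
To illustrate the suggestion above, here is a minimal sketch of an interning table whose lookup-or-insert is a single atomicModifyIORef' call. The names Table, newTable, intern and internPure are hypothetical and this is not GHC's actual FastString or NameCache code; it only shows why the operation is safe to run under unsafeDupablePerformIO: running it twice for the same string returns the same id.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import qualified Data.Map.Strict as Map
import System.IO.Unsafe (unsafeDupablePerformIO)

-- | Hypothetical stand-in for an interning table: string -> unique id.
data Table = Table
  { nextId  :: !Int
  , entries :: !(Map.Map String Int)
  }

newTable :: IO (IORef Table)
newTable = newIORef (Table 0 Map.empty)

-- | Look up a string, adding it if absent, as one atomic update.
-- Repeating the call for the same string returns the existing id,
-- so the operation is idempotent.
intern :: IORef Table -> String -> IO Int
intern ref s = atomicModifyIORef' ref $ \t ->
  case Map.lookup s (entries t) of
    Just i  -> (t, i)
    Nothing ->
      let i = nextId t
      in (Table (i + 1) (Map.insert s i (entries t)), i)

-- | Pure wrapper (assumption: callers accept that the action may be
-- duplicated if two threads evaluate the same thunk, which is harmless
-- here because 'intern' is idempotent).
internPure :: IORef Table -> String -> Int
internPure ref s = unsafeDupablePerformIO (intern ref s)
```

The design point is that no stack traversal or thunk locking is needed: the only synchronisation is the atomic update of the IORef.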


Cheers,
Simon





Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-09-01 Thread Evan Laforge
> Yes, the plan was to eventually have a parallel --make mode.

If that's the goal, wouldn't it be easier to start many ghcs?



Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-09-01 Thread Thomas Schilling
On 1 September 2011 08:44, Evan Laforge  wrote:
>> Yes, the plan was to eventually have a parallel --make mode.
>
> If that's the goal, wouldn't it be easier to start many ghcs?

Yes.  With Scion I'm in the process of moving away from using GHC's
compilation manager (i.e., --make) towards a multi-process setup.
This has a number of advantages:

  - Less memory usage.  Loading lots of modules (e.g., GHC itself) can
take up to 1G of memory.  There are also a number of caches that can
only be flushed by restarting the session.

  - Sidestep a few bugs in the compilation manager, such as
non-flushable instance caches which lead to spurious instance
overlaps.  (Sorry, can't find the corresponding ticket, right now.)

  - An external compilation manager (e.g., Shake) can also handle
preprocessing of other extensions, such as .y, .chs, etc.

  - Support for different static flags (e.g., -prof).  Static flags
should eventually be removed from GHC, but it's low-priority and
difficult to do.

  - Uniform handling of compilation with multiple versions of GHC.

  - Parallel building, as you mentioned.

There may be more.  It also comes with disadvantages, such as the need
to serialise more data, but I think it's worth it.

This is the main reason why I stopped working on a thread-safe GHC.
Personally, I believe the GHC API should just include a simple API for
compiling a single module and return some binary value (i.e., don't
automatically write things to a file).  Everything else, including
GHCi, should be separate.

But that's a different matter...

-- 
Push the envelope. Watch it bend.



Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-09-01 Thread Simon Marlow

On 01/09/2011 08:44, Evan Laforge wrote:
>> Yes, the plan was to eventually have a parallel --make mode.
>
> If that's the goal, wouldn't it be easier to start many ghcs?

It's an interesting idea that I hadn't thought of.  There would have to 
be an atomic file system operation to "commit" a compiled module - 
getting that right could be a bit tricky (compilation isn't 
deterministic, so the commit has to be atomic).


Then you would probably want to randomise the build order of each --make 
run to maximise the chance that each GHC does something different.


Fun project for someone?

Cheers,
Simon



Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-09-01 Thread Edward Kmett
On Thu, Sep 1, 2011 at 8:49 AM, Simon Marlow  wrote:

> On 01/09/2011 08:44, Evan Laforge wrote:
>
>>> Yes, the plan was to eventually have a parallel --make mode.
>>
>> If that's the goal, wouldn't it be easier to start many ghcs?
>
> It's an interesting idea that I hadn't thought of.  There would have to be
> an atomic file system operation to "commit" a compiled module - getting that
> right could be a bit tricky (compilation isn't deterministic, so the commit
> has to be atomic).

I suppose you could just rename it into place when you're done.

-Edward


Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-09-01 Thread Evan Laforge
>> It's an interesting idea that I hadn't thought of.  There would have to be
>> an atomic file system operation to "commit" a compiled module - getting that
>> right could be a bit tricky (compilation isn't deterministic, so the commit
>> has to be atomic).
>
> I suppose you could just rename it into place when you're done.
> -Edward

I was imagining that it could create Module.o.compiling and then
rename into place when it's done.  Then each ghc would do a work
stealing thing where it tries to find output to produce that doesn't
have an accompanying .compiling, or sleeps for a bit if all work at
this stage is already taken, which is likely to happen since sometimes
the graph would go through a bottleneck.  Then it's easy to clean up
if work gets interrupted, just rm **/*.compiling
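
A rough sketch of the claim-then-rename idea, under some assumptions: the marker is a directory named Module.o.compiling rather than a file, because createDirectory fails if the path already exists and so gives an atomic claim using only the directory package; the output is written to a separate Module.o.tmp file first. The names claim and commit are made up for illustration, and nothing here is part of ghc.

```haskell
import Control.Exception (throwIO, try)
import System.Directory (createDirectory, removeDirectory, renameFile)
import System.IO.Error (isAlreadyExistsError)

-- | Try to claim a build target.  Returns True if this process now
-- owns it, False if some other process claimed it first.
claim :: FilePath -> IO Bool
claim target = do
  r <- try (createDirectory (target ++ ".compiling")) :: IO (Either IOError ())
  case r of
    Right () -> pure True
    Left e
      | isAlreadyExistsError e -> pure False
      | otherwise              -> throwIO e

-- | Write the output under a temporary name, rename it into place,
-- then release the claim.  Other processes see either no target or a
-- complete one, assuming rename is atomic on the file system in use.
commit :: FilePath -> String -> IO ()
commit target contents = do
  let tmp = target ++ ".tmp"
  writeFile tmp contents
  renameFile tmp target
  removeDirectory (target ++ ".compiling")
```

A worker would loop over the targets whose dependencies are built, call claim on each, and sleep for a bit when every available target is already claimed. Cleaning up after an interrupted build then means removing the leftover .compiling markers and .tmp files.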



Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-09-02 Thread Simon Marlow

On 01/09/2011 18:02, Evan Laforge wrote:
>>> It's an interesting idea that I hadn't thought of.  There would have to be
>>> an atomic file system operation to "commit" a compiled module - getting that
>>> right could be a bit tricky (compilation isn't deterministic, so the commit
>>> has to be atomic).
>>
>> I suppose you could just rename it into place when you're done.
>> -Edward
>
> I was imagining that it could create Module.o.compiling and then
> rename into place when it's done.  Then each ghc would do a work
> stealing thing where it tries to find output to produce that doesn't
> have an accompanying .compiling, or sleeps for a bit if all work at
> this stage is already taken, which is likely to happen since sometimes
> the graph would go through a bottleneck.  Then it's easy to clean up
> if work gets interrupted, just rm **/*.compiling

Right, using a Module.o.compiling file as a lock would work.

Another way to do this would be to have GHC --make invoke itself to 
compile each module separately.  Actually I think I prefer this method, 
although it might be a bit slower since each individual compilation has 
to read lots of interface files.  The main GHC --make process would do 
the final link only.  A fun hack for somebody?


Cheers,
Simon





Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-09-02 Thread Joachim Breitner
Hi,

On Friday, 02.09.2011 at 09:07 +0100, Simon Marlow wrote:
> On 01/09/2011 18:02, Evan Laforge wrote:
> >>> It's an interesting idea that I hadn't thought of.  There would have to be
> >>> an atomic file system operation to "commit" a compiled module - getting
> >>> that right could be a bit tricky (compilation isn't deterministic, so the
> >>> commit has to be atomic).
> >>
> >> I suppose you could just rename it into place when you're done.
> >> -Edward
> >
> > I was imagining that it could create Module.o.compiling and then
> > rename into place when it's done.  Then each ghc would do a work
> > stealing thing where it tries to find output to produce that doesn't
> > have an accompanying .compiling, or sleeps for a bit if all work at
> > this stage is already taken, which is likely to happen since sometimes
> > the graph would go through a bottleneck.  Then it's easy to clean up
> > if work gets interrupted, just rm **/*.compiling
> 
> Right, using a Module.o.compiling file as a lock would work.
> 
> Another way to do this would be to have GHC --make invoke itself to 
> compile each module separately.  Actually I think I prefer this method, 
> although it might be a bit slower since each individual compilation has 
> to read lots of interface files.  The main GHC --make process would do 
> the final link only.  A fun hack for somebody?

this would also help building large libraries on architectures with
little memory, as it seems to me that when one ghc instance is compiling
multiple modules in a row, some leaked memory/unevaluated thunks pile up
and eventually cause the compilation to abort. I suspect that building
each file on its own avoids this issue.

(But this is only based on observation, not on hard facts.)

Greetings,
Joachim

-- 
Joachim "nomeata" Breitner
  m...@joachim-breitner.de  |  nome...@debian.org  |  GPG: 0x4743206C
  xmpp: nome...@joachim-breitner.de | http://www.joachim-breitner.de/





Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-09-02 Thread Evan Laforge
>> Another way to do this would be to have GHC --make invoke itself to
>> compile each module separately.  Actually I think I prefer this method,
>> although it might be a bit slower since each individual compilation has
>> to read lots of interface files.  The main GHC --make process would do
>> the final link only.  A fun hack for somebody?
>
> this would also help building large libraries on architectures with
> little memory, as it seems to me that when one ghc instance is compiling
> multiple modules in a row, some leaked memory/unevaluated thunks pile up
> and eventually cause the compilation to abort. I suspect that building
> each file on its own avoids this issue.

In my experience, reading all those .hi files is not so quick: about
1.5s for around 200 modules, on an SSD.  It gets worse with a -pgmF
preprocessor.  Since ghc wants to preprocess each file, it's a minimum
of 5s even with 'cat' as the preprocessor.

Part of my wanting to use make instead of --make was to avoid this
re-preprocessing delay.  It's nice that it will automatically notice
which modules to recompile if a CPP define changes, but not so nice
that it has to take a lot of time to figure that out every single
compile, or for a preprocessor that doesn't have the power to change
whether the module should be recompiled or not.



Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-09-05 Thread Simon Marlow

On 03/09/2011 02:05, Evan Laforge wrote:
>>> Another way to do this would be to have GHC --make invoke itself to
>>> compile each module separately.  Actually I think I prefer this method,
>>> although it might be a bit slower since each individual compilation has
>>> to read lots of interface files.  The main GHC --make process would do
>>> the final link only.  A fun hack for somebody?
>>
>> this would also help building large libraries on architectures with
>> little memory, as it seems to me that when one ghc instance is compiling
>> multiple modules in a row, some leaked memory/unevaluated thunks pile up
>> and eventually cause the compilation to abort. I suspect that building
>> each file on its own avoids this issue.
>
> In my experience, reading all those .hi files is not so quick, about
> 1.5s for around 200 modules, on an SSD.  It gets worse with a pgmF, since ghc
> wants to preprocess each file, it's a minimum of 5s given 'cat' as a
> preprocessor.
>
> Part of my wanting to use make instead of --make was to avoid this
> re-preprocessing delay.  It's nice that it will automatically notice
> which modules to recompile if a CPP define changes, but not so nice
> that it has to take a lot of time to figure that out every single
> compile, or for a preprocessor that doesn't have the power to change
> whether the module should be recompiled or not.

Ah, but you're measuring the startup time of ghc --make, which is not 
the same as the work that each individual ghc would do if ghc were 
invoked separately on each module, for two reasons:


 - when used in one-shot mode (i.e. without --make), ghc only reads
   and processes the interface files it needs, lazily

 - the individual ghc's would not need to preprocess modules - that
   would only be done once, by the master process, before starting
   the subprocesses.  The preprocessed source would be cached,
   exactly as it is now by --make.
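
A sketch of what such a master process could look like, under stated assumptions: the dependency graph is already known and given as a Map, each module lives in a file named after it, and error handling is left out. buildParallel and compileWithGhc are hypothetical names, not part of ghc or any existing library; the only real external command used is ghc -c.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)
import qualified Data.Map as Map
import System.Process (callProcess)

type Module = String

-- | Run 'compile' once per module.  A module starts only after all of
-- its dependencies have finished, and at most 'jobs' compilations run
-- at the same time.  Simplification: if 'compile' throws, dependants
-- are never released; a real tool would propagate the failure.
buildParallel :: Int -> Map.Map Module [Module] -> (Module -> IO ()) -> IO ()
buildParallel jobs graph compile = do
  sem  <- newQSem jobs
  done <- traverse (const newEmptyMVar) graph   -- one "finished" flag per module
  let -- Dependencies outside the graph are treated as already built.
      waitFor dep = maybe (pure ()) readMVar (Map.lookup dep done)
      worker (m, deps) = do
        mapM_ waitFor deps
        bracket_ (waitQSem sem) (signalQSem sem) (compile m)
        putMVar (done Map.! m) ()
  mapM_ (forkIO . worker) (Map.toList graph)
  mapM_ readMVar (Map.elems done)               -- wait for every module

-- | One-shot compilation of a single module in a separate ghc process
-- (assumes the source file is "<module>.hs" in the current directory).
compileWithGhc :: Module -> IO ()
compileWithGhc m = callProcess "ghc" ["-c", m ++ ".hs"]
```

For example, buildParallel 4 graph compileWithGhc would keep up to four ghc processes busy; the final link would remain a separate step. The program should be built with -threaded, otherwise waiting on one subprocess blocks the other Haskell threads.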

Cheers,
Simon



Re: Parallel --make (GHC build times on newer MacBook Pros?)

2011-09-05 Thread Evan Laforge
> Ah, but you're measuring the startup time of ghc --make, which is not the
> same as the work that each individual ghc would do if ghc were invoked
> separately on each module, for two reasons:

Excellent, sign me up for this plan then :)  ghc on a single file is very
quick.