Parallel --make (GHC build times on newer MacBook Pros?)
> From: Evan Laforge
> Sent: Friday, August 26, 2011 6:35 PM
> Subject: Re: GHC build times on newer MacBook Pros?
>
> On Tue, Aug 23, 2011 at 10:24 AM, David Terei wrote:
>> I have a 16 core machine at work (with 48GB of ram, a perk of the job
>> :)). GHC can saturate them all. Can validate GHC in well under 10
>> minutes on it.
>
> To wander a bit from the topic, when I first saw this I thought "wow,
> ghc builds in parallel now, I want that" but then I realized it's
> because ghc itself uses make, not --make. --make's automatic
> dependencies are convenient, but figuring out dependencies on every
> build and not being parallel means make should be a lot faster. Also,
> --make doesn't understand the hsc->hs link, so in practice I have to
> do a fair amount of manual dependencies anyway. So it inspired me to
> try to switch from --make to make for my own project.

I'm confused by this as well. Parallelizing --make was one of the first case studies in the SMP runtime paper, section 7 of "Haskell on a Shared-Memory Multiprocessor".

There's also a trac ticket, http://hackage.haskell.org/trac/ghc/ticket/910, with a vague comment that the patch from the paper "almost certainly isn't ready for prime time", but I haven't seen any description of specific problems.

Brandon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Parallel --make (GHC build times on newer MacBook Pros?)
On Sat, Aug 27, 2011 at 5:25 AM, Brandon Moore wrote:
> I'm confused by this as well. Parallelizing --make was one of the
> first case studies in the smp runtime paper, section 7 in
> Haskell on a Shared-Memory Multiprocessor
>
> There's also a trac ticket
> http://hackage.haskell.org/trac/ghc/ticket/910 with a vague comment that the
> patch from the paper "almost certainly isn't ready for prime time",
> but I haven't seen any description of specific problems.

From what I remember, someone tried to parallelize GHC but it turned out to be tricky in practice. At the moment we're trying to parallelize Cabal, which would allow us to build packages/modules in parallel using ghc -c and let Cabal handle dependency management (including preprocessing of .hsc files).

Johan
Re: Parallel --make (GHC build times on newer MacBook Pros?)
> From what I remember someone tried to parallelize GHC but it turned
> out to be tricky in practice. At the moment we're trying to parallelize
> Cabal which would allow us to build packages/modules in parallel using
> ghc -c and let Cabal handle dependency management (including
> preprocessing of .hsc files).

Right, that's probably the one I mentioned. And I think he was trying to parallelize ghc internally, so even compiling one file could parallelize. That would be cool and all, but seems like a lot of work compared to just parallelizing at the file level, as make would do.

A parallel cabal build would be excellent, but AFAIK not much help for mixed-language projects, though I admit I haven't tried cabal for that yet. I'm sure it could launch make to build the C, but can it track .h -> .hsc dependencies? A parallel cabal build would tempt me to give it a try.
Re: Parallel --make (GHC build times on newer MacBook Pros?)
On 27 August 2011 09:00, Evan Laforge wrote:
> Right, that's probably the one I mentioned. And I think he was trying
> to parallelize ghc internally, so even compiling one file could
> parallelize. That would be cool and all, but seems like a lot of work
> compared to just parallelizing at the file level, as make would do.

It was Thomas Schilling, and he wasn't trying to parallelise the compilation of a single file. He was just trying to make access to the various bits of shared state GHC uses thread safe. This mostly worked but caused an unacceptable performance penalty to single-threaded compilation.

Max
Re: Parallel --make (GHC build times on newer MacBook Pros?)
The performance problem was due to the use of unsafePerformIO or other thunk-locking functions. Such functions can cause severe performance problems when using a deep stack, because they need to traverse the stack to atomically claim thunks that might be under evaluation by multiple threads. The latest version of GHC should no longer have this problem (or not as severely) because the stack is now split into chunks (see [1] for performance tuning options), only one of which needs to be scanned. So it might be worth a try to re-apply that thread-safety patch.

[1]: https://plus.google.com/107890464054636586545/posts/LqgXK77FgfV

--
Push the envelope. Watch it bend.
Re: Parallel --make (GHC build times on newer MacBook Pros?)
On Mon, Aug 29, 2011 at 1:50 PM, Max Bolingbroke wrote:
> On 27 August 2011 09:00, Evan Laforge wrote:
>> Right, that's probably the one I mentioned. And I think he was trying
>> to parallelize ghc internally, so even compiling one file could
>> parallelize. That would be cool and all, but seems like a lot of work
>> compared to just parallelizing at the file level, as make would do.
>
> It was Thomas Schilling, and he wasn't trying to parallelise the
> compilation of a single file. He was just trying to make access to the
> various bits of shared state GHC uses thread safe. This mostly worked
> but caused an unacceptable performance penalty to single-threaded
> compilation.

Interesting, maybe I misremembered? Or maybe there was some other guy who was trying to parallelize?

Just out of curiosity, what benefit does a thread-safe ghc provide? I know ghc api users have to go to some bother to not call re-entrantly... what neat stuff could we do with a re-entrant ghc? Could it eventually lead to an internally parallel ghc, or are there deeper reasons it's hard to parallelize compilation? That would be really cool, if possible. In fact, I don't know of any parallel compilers.
Re: Parallel --make (GHC build times on newer MacBook Pros?)
On 30 August 2011 01:16, Evan Laforge wrote:
> Interesting, maybe I misremembered? Or maybe there was some other guy
> who was trying to parallelize?
>
> Just out of curiosity, what benefit does a thread-safe ghc provide? I
> know ghc api users have to go to some bother to not call re-entrantly...
> what neat stuff could we do with a re-entrant ghc? Could it
> eventually lead to an internally parallel ghc or are there deeper
> reasons it's hard to parallelize compilation? That would be really
> cool, if possible. In fact, I don't know of any parallel compilers.

Yes, the plan was to eventually have a parallel --make mode.
Re: Parallel --make (GHC build times on newer MacBook Pros?)
On 30/08/2011 00:42, Thomas Schilling wrote:
> The performance problem was due to the use of unsafePerformIO or other
> thunk-locking functions. Such functions can cause severe performance
> problems when using a deep stack, because they need to traverse the
> stack to atomically claim thunks that might be under evaluation by
> multiple threads. The latest version of GHC should no longer have this
> problem (or not as severely) because the stack is now split into chunks
> (see [1] for performance tuning options), only one of which needs to be
> scanned. So it might be worth a try to re-apply that thread-safety patch.
>
> [1]: https://plus.google.com/107890464054636586545/posts/LqgXK77FgfV

I think I would do it differently. Rather than using unsafePerformIO, use unsafeDupablePerformIO with an atomic idempotent operation. Looking up or adding an entry to the FastString table can be done using an atomicModifyIORef, so this should be fine. The other place you have to look at carefully is the NameCache; again, an atomicModifyIORef should do the trick there.

In GHC 7.2.1 we also have a casMutVar# primitive which can be used to build lower-level atomic operations, so that might come in handy too.

Cheers,
Simon
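To make the suggestion above concrete, here is a minimal sketch of an interning table whose lookup-or-insert is a single atomicModifyIORef' step. It is not GHC's actual FastString code: the Table type, the id scheme, and the names intern, globalTable and internPure are all hypothetical and only illustrate the idea of an atomic, idempotent operation that stays correct if unsafeDupablePerformIO happens to run it twice.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import qualified Data.Map.Strict as M
import System.IO.Unsafe (unsafeDupablePerformIO, unsafePerformIO)

-- Hypothetical stand-in for a string table: each distinct string gets one id.
data Table = Table
  { nextId  :: !Int
  , entries :: !(M.Map String Int)
  }

newTable :: IO (IORef Table)
newTable = newIORef (Table 0 M.empty)

-- Look up or insert in one atomic step. Running it again for the same
-- string returns the same id, so the operation is idempotent.
intern :: IORef Table -> String -> IO Int
intern ref s = atomicModifyIORef' ref $ \t ->
  case M.lookup s (entries t) of
    Just i  -> (t, i)
    Nothing ->
      let i = nextId t
      in (Table (i + 1) (M.insert s i (entries t)), i)

-- Assumed global table, using the usual NOINLINE idiom for top-level state.
{-# NOINLINE globalTable #-}
globalTable :: IORef Table
globalTable = unsafePerformIO newTable

-- Pure-looking wrapper: duplicated evaluation is harmless because
-- intern is atomic and idempotent.
internPure :: String -> Int
internPure s = unsafeDupablePerformIO (intern globalTable s)

main :: IO ()
main = do
  ref <- newTable
  a <- intern ref "foo"
  b <- intern ref "bar"
  c <- intern ref "foo"
  print (a, b, c)  -- prints (0,1,0)
```

The point of the design is that no lock is held while a thunk is being evaluated: contention is resolved inside atomicModifyIORef', and two threads interning the same string both end up with the same id.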
Re: Parallel --make (GHC build times on newer MacBook Pros?)
> Yes, the plan was to eventually have a parallel --make mode.

If that's the goal, wouldn't it be easier to start many ghcs?
Re: Parallel --make (GHC build times on newer MacBook Pros?)
On 1 September 2011 08:44, Evan Laforge wrote:
>> Yes, the plan was to eventually have a parallel --make mode.
>
> If that's the goal, wouldn't it be easier to start many ghcs?

Yes. With Scion I'm in the process of moving away from using GHC's compilation manager (i.e., --make) towards a multi-process setup. This has a number of advantages:

- Less memory usage. Loading lots of modules (e.g., GHC itself) can take up to 1G of memory. There are also a number of caches that can only be flushed by restarting the session.
- Sidestep a few bugs in the compilation manager, such as non-flushable instance caches which lead to spurious instance overlaps. (Sorry, can't find the corresponding ticket right now.)
- An external compilation manager (e.g., Shake) can also handle preprocessing of other extensions, such as .y, .chs, etc.
- Support for different static flags (e.g., -prof). Static flags should eventually be removed from GHC, but it's low-priority and difficult to do.
- Uniform handling of compilation with multiple versions of GHC.
- Parallel building, as you mentioned.

There may be more. It also comes with disadvantages, such as the need to serialise more data, but I think it's worth it. This is the main reason why I stopped working on a thread-safe GHC.

Personally, I believe the GHC API should just include a simple API for compiling a single module and returning some binary value (i.e., don't automatically write things to a file). Everything else, including GHCi, should be separate. But that's a different matter...

--
Push the envelope. Watch it bend.
Re: Parallel --make (GHC build times on newer MacBook Pros?)
On 01/09/2011 08:44, Evan Laforge wrote:
>> Yes, the plan was to eventually have a parallel --make mode.
>
> If that's the goal, wouldn't it be easier to start many ghcs?

It's an interesting idea that I hadn't thought of. There would have to be an atomic file system operation to "commit" a compiled module - getting that right could be a bit tricky (compilation isn't deterministic, so the commit has to be atomic).

Then you would probably want to randomise the build order of each --make run to maximise the chance that each GHC does something different.

Fun project for someone?

Cheers,
Simon
Re: Parallel --make (GHC build times on newer MacBook Pros?)
On Thu, Sep 1, 2011 at 8:49 AM, Simon Marlow wrote:
> On 01/09/2011 08:44, Evan Laforge wrote:
>>> Yes, the plan was to eventually have a parallel --make mode.
>>
>> If that's the goal, wouldn't it be easier to start many ghcs?
>
> It's an interesting idea that I hadn't thought of. There would have to be
> an atomic file system operation to "commit" a compiled module - getting that
> right could be a bit tricky (compilation isn't deterministic, so the commit
> has to be atomic).

I suppose you could just rename it into place when you're done.

-Edward
Re: Parallel --make (GHC build times on newer MacBook Pros?)
>> It's an interesting idea that I hadn't thought of. There would have to be
>> an atomic file system operation to "commit" a compiled module - getting that
>> right could be a bit tricky (compilation isn't deterministic, so the commit
>> has to be atomic).
>
> I suppose you could just rename it into place when you're done.
> -Edward

I was imagining that it could create Module.o.compiling and then rename into place when it's done. Then each ghc would do a work-stealing thing where it tries to find output to produce that doesn't have an accompanying .compiling, or sleeps for a bit if all work at this stage is already taken, which is likely to happen since sometimes the graph would go through a bottleneck. Then it's easy to clean up if work gets interrupted, just rm **/*.compiling
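The "write to Module.o.compiling, then rename into place" step can be sketched as below. This is only an illustration under assumptions: commitAtomically, compilingPath and removeIfExists are hypothetical helper names, the producer is a stand-in for a real compiler invocation, and the sketch covers only the commit, not the race-free claiming of work (which would need an exclusive-create primitive). renameFile comes from the directory package that ships with GHC.

```haskell
import Control.Exception (onException)
import Control.Monad (when)
import System.Directory (doesFileExist, removeFile, renameFile)

-- Path of the in-progress marker for a given output file.
compilingPath :: FilePath -> FilePath
compilingPath out = out ++ ".compiling"

-- Delete a file if it is there; used to clean up after a failed producer.
removeIfExists :: FilePath -> IO ()
removeIfExists p = do
  exists <- doesFileExist p
  when exists (removeFile p)

-- Let the producer write to the in-progress path, then publish the result
-- with a single rename, so readers never see a half-written output file.
commitAtomically :: FilePath -> (FilePath -> IO ()) -> IO ()
commitAtomically out produce = do
  let tmp = compilingPath out
  produce tmp `onException` removeIfExists tmp
  renameFile tmp out

main :: IO ()
main = do
  -- The lambda stands in for "run the compiler with output path tmp".
  commitAtomically "Module.o" (\tmp -> writeFile tmp "fake object code")
  readFile "Module.o" >>= putStrLn
```

If the producer is interrupted, whatever is left behind still carries the .compiling suffix, which is what makes the cleanup described above a plain delete of those files.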
Re: Parallel --make (GHC build times on newer MacBook Pros?)
On 01/09/2011 18:02, Evan Laforge wrote:
>>> It's an interesting idea that I hadn't thought of. There would have to be
>>> an atomic file system operation to "commit" a compiled module - getting that
>>> right could be a bit tricky (compilation isn't deterministic, so the commit
>>> has to be atomic).
>>
>> I suppose you could just rename it into place when you're done.
>> -Edward
>
> I was imagining that it could create Module.o.compiling and then
> rename into place when it's done. Then each ghc would do a work
> stealing thing where it tries to find output to produce that doesn't
> have an accompanying .compiling, or sleeps for a bit if all work at
> this stage is already taken, which is likely to happen since sometimes
> the graph would go through a bottleneck. Then it's easy to clean up
> if work gets interrupted, just rm **/*.compiling

Right, using a Module.o.compiling file as a lock would work.

Another way to do this would be to have GHC --make invoke itself to compile each module separately. Actually I think I prefer this method, although it might be a bit slower since each individual compilation has to read lots of interface files. The main GHC --make process would do the final link only. A fun hack for somebody?

Cheers,
Simon
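One way to picture what a master process would have to schedule is to group modules into "waves", where every module in a wave imports only modules from earlier waves, so all modules in a wave could be handed to separate one-shot compiler processes at once. The sketch below is pure scheduling logic, not anything GHC does today: buildWaves and the example graph are hypothetical, and it assumes every imported module also appears as a key in the map.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Module = String

-- Group modules into waves: each module in a wave imports only modules
-- from earlier waves, so a whole wave can be compiled in parallel.
-- Assumes every import is itself a key of the map.
buildWaves :: M.Map Module [Module] -> [[Module]]
buildWaves deps = go S.empty (M.keys deps)
  where
    go _ [] = []
    go done pending =
      let importsDone m = all (`S.member` done) (M.findWithDefault [] m deps)
          ready = filter importsDone pending
      in if null ready
           then error "import cycle or unknown module"
           else ready : go (foldr S.insert done ready)
                           (filter (`notElem` ready) pending)

-- Hypothetical import graph: C imports A and B; D and E import C.
example :: M.Map Module [Module]
example = M.fromList
  [ ("A", []), ("B", []), ("C", ["A", "B"]), ("D", ["C"]), ("E", ["C"]) ]

main :: IO ()
main = mapM_ print (buildWaves example)
-- prints:
-- ["A","B"]
-- ["C"]
-- ["D","E"]
```

The middle wave holding only C is the kind of bottleneck mentioned earlier in the thread: however many processes are available, only one has work at that point. A real driver would more likely start a module as soon as its own imports finish rather than waiting for a whole wave.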
Re: Parallel --make (GHC build times on newer MacBook Pros?)
Hi,

On Friday, 02.09.2011, at 09:07 +0100, Simon Marlow wrote:
> On 01/09/2011 18:02, Evan Laforge wrote:
> >>> It's an interesting idea that I hadn't thought of. There would have to be
> >>> an atomic file system operation to "commit" a compiled module - getting
> >>> that right could be a bit tricky (compilation isn't deterministic, so the
> >>> commit has to be atomic).
> >>
> >> I suppose you could just rename it into place when you're done.
> >> -Edward
> >
> > I was imagining that it could create Module.o.compiling and then
> > rename into place when it's done. Then each ghc would do a work
> > stealing thing where it tries to find output to produce that doesn't
> > have an accompanying .compiling, or sleeps for a bit if all work at
> > this stage is already taken, which is likely to happen since sometimes
> > the graph would go through a bottleneck. Then it's easy to clean up
> > if work gets interrupted, just rm **/*.compiling
>
> Right, using a Module.o.compiling file as a lock would work.
>
> Another way to do this would be to have GHC --make invoke itself to
> compile each module separately. Actually I think I prefer this method,
> although it might be a bit slower since each individual compilation has
> to read lots of interface files. The main GHC --make process would do
> the final link only. A fun hack for somebody?

This would also help building large libraries on architectures with little memory, as it seems to me that when one ghc instance is compiling multiple modules in a row, some leaked memory/unevaluated thunks pile up and eventually cause the compilation to abort. I suspect that building each file on its own avoids this issue. (But this is only based on observation, not on hard facts.)

Greetings,
Joachim

--
Joachim "nomeata" Breitner
m...@joachim-breitner.de | nome...@debian.org | GPG: 0x4743206C
xmpp: nome...@joachim-breitner.de | http://www.joachim-breitner.de/
Re: Parallel --make (GHC build times on newer MacBook Pros?)
>> Another way to do this would be to have GHC --make invoke itself to
>> compile each module separately. Actually I think I prefer this method,
>> although it might be a bit slower since each individual compilation has
>> to read lots of interface files. The main GHC --make process would do
>> the final link only. A fun hack for somebody?
>
> this would also help building large libraries on architectures with
> little memory, as it seems to me that when one ghc instance is compiling
> multiple modules in a row, some leaked memory/unevaluated thunks pile up
> and eventually cause the compilation to abort. I suspect that building
> each file on its own avoids this issue.

In my experience, reading all those .hi files is not so quick: about 1.5s for around 200 modules, on an SSD. It gets worse with a pgmF. Since ghc wants to preprocess each file, it's a minimum of 5s given 'cat' as a preprocessor. Part of my wanting to use make instead of --make was to avoid this re-preprocessing delay. It's nice that it will automatically notice which modules to recompile if a CPP define changes, but not so nice that it has to take a lot of time to figure that out on every single compile, or for a preprocessor that doesn't have the power to change whether the module should be recompiled or not.
Re: Parallel --make (GHC build times on newer MacBook Pros?)
On 03/09/2011 02:05, Evan Laforge wrote:
>>> Another way to do this would be to have GHC --make invoke itself to
>>> compile each module separately. Actually I think I prefer this method,
>>> although it might be a bit slower since each individual compilation has
>>> to read lots of interface files. The main GHC --make process would do
>>> the final link only. A fun hack for somebody?
>>
>> this would also help building large libraries on architectures with
>> little memory, as it seems to me that when one ghc instance is compiling
>> multiple modules in a row, some leaked memory/unevaluated thunks pile up
>> and eventually cause the compilation to abort. I suspect that building
>> each file on its own avoids this issue.
>
> In my experience, reading all those .hi files is not so quick, about
> 1.5s for around 200 modules, on an SSD. It gets worse with a pgmF,
> since ghc wants to preprocess each file, it's a minimum of 5s given
> 'cat' as a preprocessor. Part of my wanting to use make instead of
> --make was to avoid this re-preprocessing delay. It's nice that it
> will automatically notice which modules to recompile if a CPP define
> changes, but not so nice that it has to take a lot of time to figure
> that out every single compile, or for a preprocessor that doesn't have
> the power to change whether the module should be recompiled or not.

Ah, but you're measuring the startup time of ghc --make, which is not the same as the work that each individual ghc would do if ghc were invoked separately on each module, for two reasons:

- when used in one-shot mode (i.e. without --make), ghc only reads and processes the interface files it needs, lazily
- the individual ghc's would not need to preprocess modules - that would only be done once, by the master process, before starting the subprocesses. The preprocessed source would be cached, exactly as it is now by --make.

Cheers,
Simon
Re: Parallel --make (GHC build times on newer MacBook Pros?)
> Ah, but you're measuring the startup time of ghc --make, which is not
> the same as the work that each individual ghc would do if ghc were
> invoked separately on each module, for two reasons:

Excellent, sign me up for this plan then :) ghc on a single file is very quick.