Hey all, thanks a ton for the invaluable pointers. I’m now in “I-dunno-what-I-am-doing” mode, banging in SCC annotations like there is no tomorrow and trying to spot any chance for some low-hanging-fruit algorithmic improvement (like using a sequence instead of a list, etc.). I will come back to your suggestions once I hit the inevitable dead-end wall :D
Alfredo

On 10 April 2017 at 01:54, Niklas Hambüchen wrote:
> I have some suggestions for low-hanging fruit in this effort.
>
> 1. Make ghc print more statistics on what it is spending time on
>
> When I did the linking investigation recently (https://www.reddit.com/r/haskell/comments/63y43y/liked_linking_3x_faster_with_gold_link_10x_faster/) I noticed (with strace) that there are lots of interesting syscalls being made that you might not expect. For example, each time TH is used, shared libraries are loaded, and to determine the shared library paths, ghc shells out to `gcc --print-file-name`. Each such invocation takes 20 ms on my system, and I have 1000 invocations in my build. That's 20 seconds (out of 2 minutes build time) just asking gcc for paths.
>
> I recommend that for every call to an external program, GHC measures how long that call took, so that it can be asked to print a summary when it's done.
>
> That might give us lots of interesting things to optimize. For example, this would have made the long linker times totally obvious.
>
> At the end, I would love to know for each compilation (both one-shot as used in ghc's build system, and `ghc --make`):
>
> * What programs did it invoke and how long did they take
> * What files did it read and how long did that take
> * How long did it take to read all the `.hi` files in `ghc --make`
> * High level time summary (parsing, typechecking, codegen, .hi files, etc.)
>
> That way we'll at least know what is slow, and won't have to resort to strace every time in order to obtain this basic answer.
>
> 2. Investigate if idiotic syscalls are being done and how much
>
> There's this concept I call "idiotic syscalls": syscalls that you know in advance won't contribute anything productive. For example, if you give a linker N `-L` flags (library dirs) and M `-l` flags (library names to link), it will try to `stat()` or `open()` N*M files, most of which are total rubbish, because we typically know which library is in which dir. Example: you pass `-L/usr/lib/opencv -L/usr/lib/imagemagick -L/usr/lib/blas -lopencv -limagemagick -lblas`. Then you will get things like `open("/usr/lib/opencv/libimagemagick.so") = ENOENT`, which makes no sense; obviously that file doesn't exist. This is a problem with the general "search path" concept; the same happens when running executables by searching through $PATH. Yes, opens of nonexistent files fail fast, but in my typical ghc invocation I get millions of them (and we should at least measure how much time is wasted on them), and they clutter the strace output and make the real problems harder to investigate. We should check whether we can create ways to pass only those files that do exist.
>
> 3. Add pure TemplateHaskell
>
> It is well known that TH is a problem for incremental compilation because it can have side effects, and we must therefore be more conservative about when to recompile; when you see a `[TH]` in your `ghc --make` output, it's likely that time again.
>
> I believe this could be avoided by adding a variant of TH that forbids the use of the `runIO` function, and thus cannot have side effects.
>
> Most TH does not need side effects, for example any form of code generation based on other data types (lenses, instances for whatever). If that was made "pure TH", we would not have to recompile when inputs to our TH functions change.
>
> Potentially this could even be determined automatically, instead of adding a new variant of TH as was done for typed TH (`$$()`), simply by inspecting what's in the TH: if we can determine there's no `runIO` in there, mark it as clean, otherwise as tainted.
>
> 4. Build ghc with `ghc --make` if possible
>
> This one might be controversial or impossible (others can likely tell us). Most Haskell code is built with `ghc --make`, not with the one-shot compilation system + make or Hadrian as is done in GHC's build system. Weirdly, `ghc --make` often scales much worse and has much worse incremental recompilation times than the one-shot mode, which doesn't make sense given that it has no process creation overhead, can do much better caching, etc. I believe that if ghc itself, or large parts of it (e.g. stage2), were built with `--make`, we would magically see `--make` become very good, simply because we'd make the right people (GHC devs) suffer through it daily :D. I would expect this to fix the `-j` slowness, reduce GHC overhead, speed up interface file loads, and so on.
>
> These are some ideas.
>
> Niklas
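Niklas's first suggestion (GHC timing every call to an external program and printing a summary at the end) can be sketched as a small wrapper. This is not GHC's code: `Stat`, `timed`, `timedProcess` and `printSummary` are hypothetical names, and the sketch only assumes `readProcessWithExitCode` from the `process` package, `Data.Map.Strict` from `containers`, and `getMonotonicTime` from `GHC.Clock` in `base`.

```haskell
import Control.Exception (IOException, try)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.List (sortOn)
import qualified Data.Map.Strict as Map
import Data.Ord (Down (..))
import GHC.Clock (getMonotonicTime)
import System.Exit (ExitCode)
import System.Process (readProcessWithExitCode)
import Text.Printf (printf)

-- Hypothetical per-program totals: number of calls and wall-clock seconds.
data Stat = Stat { calls :: !Int, seconds :: !Double }

instance Semigroup Stat where
  Stat n1 t1 <> Stat n2 t2 = Stat (n1 + n2) (t1 + t2)

type CallLog = Map.Map String Stat

-- Run an action and add its elapsed wall-clock time under `label`.
-- If the action throws, nothing is recorded.
timed :: IORef CallLog -> String -> IO a -> IO a
timed ref label action = do
  start <- getMonotonicTime
  result <- action
  end <- getMonotonicTime
  modifyIORef' ref (Map.insertWith (<>) label (Stat 1 (end - start)))
  pure result

-- Time one external program invocation, keyed by program name.
-- readProcessWithExitCode waits for the process, so the whole run is timed.
timedProcess :: IORef CallLog -> FilePath -> [String] -> IO (ExitCode, String, String)
timedProcess ref prog args = timed ref prog (readProcessWithExitCode prog args "")

-- Print one line per program, slowest total first.
printSummary :: IORef CallLog -> IO ()
printSummary ref = do
  m <- readIORef ref
  mapM_ (\(prog, Stat n secs) -> printf "%-24s %6d calls %9.3f s\n" prog n secs)
        (sortOn (Down . seconds . snd) (Map.toList m))

main :: IO ()
main = do
  ref <- newIORef Map.empty
  -- The kind of query described in the email; `try` keeps the demo
  -- running on machines without gcc installed.
  r <- try (timedProcess ref "gcc" ["--print-file-name=libc.so"])
  case r :: Either IOException (ExitCode, String, String) of
    Left err -> putStrLn ("gcc not available: " ++ show err)
    Right _  -> pure ()
  printSummary ref
```

Keying the totals by program name is what would surface something like 1000 × 20 ms of `gcc` calls as a single line. Per-file read times and per-phase times would need the same kind of bookkeeping in other places.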
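The N*M probing from Niklas's second point can be modelled with a pretend filesystem. This is a simplified model, not how any particular linker is implemented. It assumes only `lib<name>.so` candidates, a directory-by-directory search that stops at the first hit, and no default search directories. `candidate`, `searchPathProbes` and `directProbes` are hypothetical helpers.

```haskell
import qualified Data.Set as Set

-- A pretend filesystem: the set of paths that exist.
type FileSystem = Set.Set FilePath

-- The file a `-l<name>` flag maps to inside one `-L<dir>` (shared objects only).
candidate :: FilePath -> String -> FilePath
candidate dir name = dir ++ "/lib" ++ name ++ ".so"

-- Search-path lookup: for each library, probe the directories in order and
-- stop at the first hit. Returns every probed path, misses (ENOENT) included.
searchPathProbes :: FileSystem -> [FilePath] -> [String] -> [FilePath]
searchPathProbes fs dirs libs = concatMap probesFor libs
  where
    probesFor name =
      let (misses, rest) = break (`Set.member` fs) [candidate d name | d <- dirs]
      in misses ++ take 1 rest

-- Hypothetical alternative: the caller already knows each library's directory,
-- so there is exactly one probe per library and no ENOENT noise.
directProbes :: [(String, FilePath)] -> [FilePath]
directProbes known = [candidate dir name | (name, dir) <- known]

main :: IO ()
main = do
  let dirs = ["/usr/lib/opencv", "/usr/lib/imagemagick", "/usr/lib/blas"]
      libs = ["opencv", "imagemagick", "blas"]
      fs = Set.fromList (zipWith candidate dirs libs)
      probes = searchPathProbes fs dirs libs
      misses = filter (`Set.notMember` fs) probes
  mapM_ (\p -> putStrLn ("open(" ++ show p ++ ") = ENOENT")) misses
  putStrLn ("search path: " ++ show (length probes) ++ " probes, "
              ++ show (length misses) ++ " of them ENOENT")
  putStrLn ("direct:      " ++ show (length (directProbes (zip libs dirs))) ++ " probes")
```

With the three directories from the email, this reports 6 probes, 3 of them ENOENT (including `/usr/lib/opencv/libimagemagick.so`), versus 3 for the direct lookup. A library found only in the last directory, or in none, costs all N probes, which is where the N*M worst case comes from.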
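Niklas's third point, a TH variant that forbids `runIO`, could in principle be enforced by the type system with an abstract wrapper around `Q`. This is a hypothetical sketch: `PureQ`, `runPureQ`, `mkConstant` and `mkConstantFromFile` are made-up names, not part of `template-haskell`. It only illustrates the typing idea, not the recompilation-checker changes the suggestion would also need. In a real design the `PureQ` constructor would be hidden behind a module export list. Within a single module, as here, nothing stops you from wrapping `runIO` yourself.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Language.Haskell.TH

-- Hypothetical "pure TH": a wrapper over Q whose (ideally hidden) constructor
-- is the only way in, so code written against it cannot call runIO.
newtype PureQ a = PureQ (Q a)
  deriving (Functor, Applicative, Monad)

-- Splices built from PureQ are known to be free of runIO.
runPureQ :: PureQ a -> Q a
runPureQ (PureQ q) = q

-- Pure code generation: `<name> :: Integer` and `<name> = <n>`.
mkConstant :: String -> Integer -> PureQ [Dec]
mkConstant name n = do
  let nm = mkName name
  pure [ SigD nm (ConT (mkName "Integer"))
       , ValD (VarP nm) (NormalB (LitE (IntegerL n))) [] ]

-- Tainted: reads a file at compile time, so its output can change even when
-- no Haskell source changes. This needs plain Q, because it uses runIO.
mkConstantFromFile :: String -> FilePath -> Q [Dec]
mkConstantFromFile name path = do
  contents <- runIO (readFile path)
  runPureQ (mkConstant name (fromIntegral (length contents)))

main :: IO ()
main = do
  -- IO has a Quasi instance, so runQ can run this generator outside a splice.
  decs <- runQ (runPureQ (mkConstant "answer" 42))
  putStrLn (pprint decs)
```

Under this scheme, "clean vs. tainted" becomes a question of which type a splice has, rather than of inspecting its code for `runIO`.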
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
