Re: inter module optimizations

2007-03-28 Thread Donald Bruce Stewart
fmohamed:
> I had posted some data on inter-module optimizations that I had 
> calculated when splitting my program from one computational module to 
> many different ones.
> 
> Tim Chevalier suggested that my calculation could be interesting to the 
> people here.
> 
> So I made the effort of preparing the various versions of my code and 
> redoing the analysis more carefully.
> Unfortunately I had already begun renaming things without doing a darcs 
> record, so in the split version some function names are different.
> 
> I have a tar.bz archive of 21KB. I did not know whether it is considered 
> rude to send attachments, so if anyone is interested I can send them 
> the file.
> 
> Basically it boils down to some important functions on a newtype not 
> being inlined across modules:
>type LatLocI = Word32
>newtype LatLoc = LatLoc LatLocI deriving (Eq,Ord)
> Specialization should not be the issue, as I had already given 
> specific signatures to my functions.
> 
> Also worth noting is that profiling with -O2 compilation makes 
> one think that inlining (or using a single module) makes the program 
> slower, whereas the opposite is true. I think that the profiling 
> overheads are incorrectly evaluated.
> I know that with -O2 one cannot expect profiling to be good, but it 
> would be nice if it weren't so misleading.
> 
> Here is some data (obtained with a script that is also in the tar.bz archive):
> 
>  allInOne:
> original program, monolithic main computational module
> * timings of -O2 executable
> 7.67user 0.00system 0:07.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+894minor)pagefaults 0swaps
> * timings of the executable with profiling
>total time  =   15.25 secs   (305 ticks @ 50 ms)
>total alloc = 5,888,786,120 bytes  (excludes profiling overheads)
> 
>  splitModule NoReexport NoInline directives:
> split computational module, no export list for split modules
> * timings of -O2 executable
> 10.14user 0.01system 0:10.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+901minor)pagefaults 0swaps
> * timings of the executable with profiling
>total time  =   11.85 secs   (237 ticks @ 50 ms)
>total alloc = 5,888,780,912 bytes  (excludes profiling overheads)
> 
>  splitModule Reexport NoInline directives:
> split computational module, no export list for split modules, old module 
> reexports them using an export list
> * timings of -O2 executable
> 8.88user 0.00system 0:08.90elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+901minor)pagefaults 0swaps
> * timings of the executable with profiling
>total time  =   12.20 secs   (244 ticks @ 50 ms)
>total alloc = 5,888,780,912 bytes  (excludes profiling overheads)
> 
>  splitModule NoReexport Inline directives:
> split computational module, no export list for split modules, explicit 
> inline directives
> * timings of -O2 executable
> 6.44user 0.01system 0:06.46elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+895minor)pagefaults 0swaps
> * timings of the executable with profiling
>total time  =   18.80 secs   (376 ticks @ 50 ms)
>total alloc = 5,374,883,312 bytes  (excludes profiling overheads)
> 
> Fawzi

To really understand what is going on, I suggest looking at the
-ddump-simpl output as you change the inlining settings. Then you'll see
how GHC is moving code about.
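
As a rough illustration of the kind of experiment meant here, the sketch below puts INLINE pragmas on two small functions over the newtype from the quoted message. Only LatLocI and LatLoc come from the thread; unLatLoc, nextLoc and sumLocs are hypothetical names invented for this example, and in a split program the two helpers would sit in a different module from their caller.

```haskell
module Main (main) where

import Data.List (foldl')
import Data.Word (Word32)

-- These two declarations are taken from the quoted message.
type LatLocI = Word32
newtype LatLoc = LatLoc LatLocI deriving (Eq, Ord)

-- Hypothetical accessor (not from the original program).
-- The pragma asks GHC to inline it at call sites, including
-- call sites in other modules.
{-# INLINE unLatLoc #-}
unLatLoc :: LatLoc -> LatLocI
unLatLoc (LatLoc i) = i

-- Hypothetical step function: move to the next location index.
{-# INLINE nextLoc #-}
nextLoc :: LatLoc -> LatLoc
nextLoc (LatLoc i) = LatLoc (i + 1)

-- Sum the raw indices of the first n locations, starting at LatLoc 0.
sumLocs :: Int -> LatLocI
sumLocs n = foldl' (+) 0 (map unLatLoc (take n (iterate nextLoc (LatLoc 0))))

main :: IO ()
main = print (sumLocs 1000)
```

Compiling this with `ghc -O2 -ddump-simpl Main.hs`, once with the pragmas and once without, and comparing the Core that is printed is one way to check whether the wrappers around the newtype have disappeared from the inner loop.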

-- Don (who's spent the last 2 weeks playing the simplifier/inliner game)
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: inter module optimizations

2007-03-28 Thread Fawzi Mohamed

Donald Bruce Stewart wrote:

[..]
> To really understand what is going on, I suggest looking at the
> -ddump-simpl output as you change the inlining settings. Then you'll see
> how GHC is moving code about.
> 
> -- Don (who's spent the last 2 weeks playing the simplifier/inliner game)
  
Thanks, but with Jeremy's and your suggestions on haskell-cafe about the 
INLINE directive I have already got back the performance that I had 
(actually even better than before), so I don't want to *really* 
understand it ;-).
I was thinking that maybe someone else here would like to understand it.
What is especially annoying is that, when using -O2, profiling gives 
exactly the opposite trend to the executable without profiling: the 
fastest program becomes the slowest and vice versa.
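
The reversal can be checked directly from the numbers posted earlier in the thread. The short sketch below uses the user time of each -O2 executable and the "total time" of each profiled executable; the variant labels are shortened, and the names timings, rankBy, byOptimized and byProfiled are only for this illustration.

```haskell
module Main (main) where

import Data.List (sortOn)

-- (variant, -O2 user seconds, profiled total seconds),
-- copied from the figures posted earlier in this thread.
timings :: [(String, Double, Double)]
timings =
  [ ("allInOne",                  7.67, 15.25)
  , ("split NoReexport NoInline", 10.14, 11.85)
  , ("split Reexport NoInline",   8.88, 12.20)
  , ("split NoReexport Inline",   6.44, 18.80)
  ]

-- Variant names ordered from fastest to slowest under the given measure.
rankBy :: ((String, Double, Double) -> Double) -> [String]
rankBy key = [name | (name, _, _) <- sortOn key timings]

byOptimized, byProfiled :: [String]
byOptimized = rankBy (\(_, opt, _) -> opt)
byProfiled  = rankBy (\(_, _, prof) -> prof)

main :: IO ()
main = do
  putStrLn "Fastest to slowest, -O2 executable:"
  mapM_ (putStrLn . ("  " ++)) byOptimized
  putStrLn "Fastest to slowest, profiled executable:"
  mapM_ (putStrLn . ("  " ++)) byProfiled
  -- True when the profiled ranking is the exact reverse of the -O2 one.
  print (byOptimized == reverse byProfiled)
```

With these four data points the last line prints True: the two orderings are exact mirror images of each other.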


Fawzi
