I had posted some data on inter-module optimization that I collected when splitting the single computational module of my program into several modules.

Tim Chevalier suggested that my measurements might interest the people here.

So I made the effort of preparing the various versions of my code and redoing the analysis more carefully. Unfortunately, I had already begun renaming things before doing a darcs record, so some function names differ in the split versions.

I have a 21 KB tar.bz archive of the code. I was not sure whether sending attachments is considered rude here, so I have not attached it, but I can send the file to anyone who is interested.

Basically, it boils down to some important functions on a newtype not being inlined:

   type LatLocI = Word32
   newtype LatLoc = LatLoc LatLocI deriving (Eq,Ord)

Specialization should not be the cause, since I had already given my functions specific type signatures.
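To illustrate what explicit inline directives on such a newtype look like, here is a minimal sketch. The accessors `mkLatLoc`, `latX` and `latY` and the 16-bits-per-coordinate packing are hypothetical, not taken from my code. With an `INLINE` pragma, GHC records the function's original right-hand side in the interface file and inlines it at saturated call sites, including call sites in other modules, so the `LatLoc` wrapper can vanish after optimization. The signatures are monomorphic, so no specialization is involved.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word32)

-- Representation types as in the post (Show added only for convenience).
type LatLocI = Word32
newtype LatLoc = LatLoc LatLocI deriving (Eq, Ord, Show)

-- Hypothetical constructor: packs x into the high 16 bits and y into the
-- low 16 bits. Assumes 0 <= x, y < 65536.
mkLatLoc :: Int -> Int -> LatLoc
mkLatLoc x y = LatLoc ((fromIntegral x `shiftL` 16) .|. (fromIntegral y .&. 0xFFFF))
{-# INLINE mkLatLoc #-}

-- Hypothetical accessor for the high 16 bits.
latX :: LatLoc -> Int
latX (LatLoc w) = fromIntegral (w `shiftR` 16)
{-# INLINE latX #-}

-- Hypothetical accessor for the low 16 bits.
latY :: LatLoc -> Int
latY (LatLoc w) = fromIntegral (w .&. 0xFFFF)
{-# INLINE latY #-}
```

Loaded in GHCi, `latX (mkLatLoc 3 5)` gives `3`.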

Also worth noting: profiling an -O2 build makes one think that inlining (or using a single module) makes the program slower, whereas the opposite is true. I think the profiling overheads are incorrectly accounted for. I know that one cannot expect accurate profiles with -O2, but it would be nice if they were not so misleading.
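One way to reduce this kind of distortion is to place cost centres by hand. A minimal sketch, assuming a recent GHC: with `-fprof-auto` (or the older `-auto-all`), bindings not marked `INLINE` get automatic cost centres, and cost centres can block the very inlining being measured. Annotating only the hot spot with an `SCC` pragma, and compiling with `-prof` but without automatic cost centres, leaves small helpers free to inline. The functions `step` and `sumSquares` are hypothetical.

```haskell
import Data.List (foldl')

-- Hypothetical helper. Marked INLINE, so -fprof-auto gives it no
-- automatic cost centre and it can still be inlined in the profiled build.
step :: Int -> Int -> Int
step acc x = acc + x * x
{-# INLINE step #-}

-- Hypothetical hot function with a single, manually placed cost centre.
-- The SCC pragma is ignored when compiling without -prof.
sumSquares :: [Int] -> Int
sumSquares xs = {-# SCC "sumSquares" #-} foldl' step 0 xs
```

The profile then attributes all of the loop's cost to `sumSquares` instead of splitting it across helpers that would have been inlined in the optimized build.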

Here is some data (obtained with a script that is also in the tar.bz archive):

******** allInOne:
original program, monolithic main computational module
* timings of -O2 executable
7.67user 0.00system 0:07.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+894minor)pagefaults 0swaps
* timings of the executable with profiling
       total time  =       15.25 secs   (305 ticks @ 50 ms)
       total alloc = 5,888,786,120 bytes  (excludes profiling overheads)
******** splitModule NoReexport NoInline directives:
split computational module, no export list for split modules
* timings of -O2 executable
10.14user 0.01system 0:10.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+901minor)pagefaults 0swaps
* timings of the executable with profiling
       total time  =       11.85 secs   (237 ticks @ 50 ms)
       total alloc = 5,888,780,912 bytes  (excludes profiling overheads)
******** splitModule Reexport NoInline directives:
split computational module, no export list for split modules, old module re-exports them using an export list
* timings of -O2 executable
8.88user 0.00system 0:08.90elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+901minor)pagefaults 0swaps
* timings of the executable with profiling
       total time  =       12.20 secs   (244 ticks @ 50 ms)
       total alloc = 5,888,780,912 bytes  (excludes profiling overheads)
******** splitModule NoReexport Inline directives:
split computational module, no export list for split modules, explicit inline directives
* timings of -O2 executable
6.44user 0.01system 0:06.46elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+895minor)pagefaults 0swaps
* timings of the executable with profiling
       total time  =       18.80 secs   (376 ticks @ 50 ms)
       total alloc = 5,374,883,312 bytes  (excludes profiling overheads)
*************
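To make the inversion described above concrete, the ratios can be computed directly from these numbers. This is only arithmetic on the reported figures: the -O2 column uses the `user` time, the profiled column uses `total time`, and the type and function names are mine.

```haskell
import Text.Printf (printf)

-- One measured variant: user time of the -O2 executable and
-- "total time" reported by the profiled executable, in seconds.
data Run = Run
  { runName  :: String
  , optSecs  :: Double
  , profSecs :: Double
  }

-- Figures copied from the measurements above; the first entry is the baseline.
runs :: [Run]
runs =
  [ Run "allInOne"                   7.67 15.25
  , Run "split NoReexport NoInline" 10.14 11.85
  , Run "split Reexport NoInline"    8.88 12.20
  , Run "split NoReexport Inline"    6.44 18.80
  ]

-- Time of every variant divided by the baseline time for the same metric.
relative :: (Run -> Double) -> [(String, Double)]
relative metric = case runs of
  []       -> []
  (base:_) -> [ (runName r, metric r / metric base) | r <- runs ]

-- Print both ratios side by side.
report :: IO ()
report = mapM_ line (zip (relative optSecs) (relative profSecs))
  where
    line :: ((String, Double), (String, Double)) -> IO ()
    line ((name, o), (_, p)) = printf "%-26s -O2 %.2fx  profiled %.2fx\n" name o p
```

Running `report` shows the split variant with inline directives as the fastest in the -O2 column but the slowest in the profiled column, and the split variant without inlining the other way round.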

Fawzi
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
