Hi,
While experimenting with concurrency, I got a somewhat surprising result. I
have a program ('flower') that reads an input file and generates one or
more output files. As the different output files are independent, I
construct one IO action for each output requested on the command line,
and just forkIO one thread for each. This seems to work fairly well on one
CPU, so I decided to try on multiple CPUs, using +RTS -N. To my
surprise, this made the program take several times longer (wall clock).
So I tried -N2 and -N4 to see how that turned out. Results are, in
minutes:
format    def    -N2     -N4    -N
i           0      0       0     0
q           2      2       2    13
f           2      5       1    14
h           8     10       2    27
s          26     17      11     -
T          37     23      16     -
F          47     38      42     -
CPU     2543u  3027u  (lost)     -
          36s   593s
          92%   161%
I had to terminate the -N run after five hours of wall-clock time, at
158026.68s user, 60682.76s system, 1210% CPU.
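For reference, the forking scheme is roughly the following — a minimal sketch, not the actual flower code; 'writeOutput' and the hard-coded format list stand in for the real per-format actions and command-line parsing:

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- Hypothetical stand-in for the real per-format output action.
writeOutput :: String -> IO ()
writeOutput fmt = writeFile ("out." ++ fmt) ("output for " ++ fmt ++ "\n")

main :: IO ()
main = do
  let formats = ["i", "q", "f"]   -- really taken from the command line
  -- fork one independent thread per requested output format
  dones <- forM formats $ \fmt -> do
    done <- newEmptyMVar
    _ <- forkIO (writeOutput fmt >> putMVar done ())
    return done
  -- block until every writer thread has signalled completion
  mapM_ takeMVar dones
  putStrLn "all outputs written"
```

Since the outputs are independent, the only synchronization needed is the final wait, here done with one MVar per thread.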
So, well, it seems this scales okay as long as there are enough threads to
keep the capabilities busy, but scales *horribly* when you run more
capabilities than forked threads. Is this a correct assessment? Would it
make sense to simply cap -N to the number of forked threads? I guess I
should try this with GHC 7, but is there any reason to believe it will
perform better?
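The capping idea can even be done from inside the program rather than on the
command line, at least with a sufficiently recent GHC (7.4 and later expose
setNumCapabilities from Control.Concurrent). A sketch, with the thread count
hard-coded where the real program would use the number of requested outputs:

```haskell
import Control.Concurrent (setNumCapabilities, getNumCapabilities)
import GHC.Conc (getNumProcessors)

main :: IO ()
main = do
  let threadsWanted = 7            -- really: number of output formats requested
  procs <- getNumProcessors
  -- never run more capabilities than there are threads to fill them
  setNumCapabilities (min procs threadsWanted)
  caps <- getNumCapabilities
  putStrLn ("capabilities: " ++ show caps)
```

This keeps the RTS from spinning up idle capabilities on a many-core box when
only a handful of Haskell threads exist.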
-k
--
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe