Paul Lalonde wrote:
CSP doesn't scale very well to hundreds of simultaneously executing
threads (my claim, not, as far as I've found yet, anyone else's). It
is very well suited to a small number of threads that need to
communicate, and as a model of concurrency for tasks with few points
of contact. For performance, the channel locks become a bottleneck as
the number of cores scales up. As for expressiveness, there are
still issues with composability and correctness as the number of
threads interacting increases. Yes, you at least get local stacks,
but the work seems to get exponentially harder as the number of
systems in the simulation (um, game engine) increases.
Interesting.
I agree with you: taking care of the memory hierarchy is becoming very
important, especially if you think about the upcoming ccNUMA systems
(Opterons are already there, though).
But the fact that it doesn't scale well is not due to CSP itself, but to
the way it has been implemented.
If the CSP system itself takes care of the memory hierarchy and uses no
synchronisation (using inter-processor interrupts (IPIs) to send a message
to another core, for example), CSP scales very well.
Of course the IPI mechanism requires a switch to kernel mode, which costs
a lot. But this is necessary only if the destination thread is running on
another core, and I don't think latency is very important in algorithms
requiring a lot of CPUs.
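To make the "no locks on the channel" idea a bit more concrete, here is a
minimal sketch, assuming one channel per sender/receiver pair so that each
index has a single writer. It uses only atomic loads and stores, so strictly
speaking there is still memory ordering, just no lock. The names SpscChannel,
try_send, try_recv and demo_sum are made up for illustration, and the IPI
wake-up is not shown (here the receiver simply polls), since that part needs
kernel support.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical single-producer/single-consumer channel (illustration only).
// Exactly one thread may call try_send and exactly one may call try_recv.
template <typename T, std::size_t Capacity>
class SpscChannel {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer side. Returns false when the ring is full.
    bool try_send(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;
        slots_[tail & (Capacity - 1)] = value;
        // Publish the slot: the consumer's acquire load of tail_ sees it.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns false when the ring is empty.
    bool try_recv(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = slots_[head & (Capacity - 1)];
        // Hand the slot back to the producer.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

private:
    // Separate cache lines (64 bytes assumed) so the two cores do not
    // keep invalidating each other's index.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    std::array<T, Capacity> slots_{};
};

// Sends 1..count from one thread to another and returns the received sum.
long long demo_sum(int count) {
    SpscChannel<int, 64> channel;
    long long sum = 0;
    std::thread consumer([&] {
        int value = 0;
        for (int received = 0; received < count;) {
            if (channel.try_recv(value)) {
                assert(value == received + 1);  // messages arrive in FIFO order
                sum += value;
                ++received;
            } else {
                std::this_thread::yield();  // stand-in for "sleep until woken"
            }
        }
    });
    for (int i = 1; i <= count; ++i) {
        while (!channel.try_send(i)) std::this_thread::yield();
    }
    consumer.join();
    return sum;
}
```

In a real system the yield() in the receiver is where the IPI would come in:
the receiver would block, and the sender would trigger the wake-up only when
the receiver sits on another core and is actually asleep.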
What do you think?
Phil;