Re: [9fans] GCC/G++: some stress testing

Paul Lalonde Mon, 03 Mar 2008 18:31:19 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Mar 3, 2008, at 1:12 AM, Philippe Anel wrote:

> So, does this mean the latency is only required by the I/O system of your program? If so (maybe I'm wrong), what you need is to be able to interrupt working cores, and I'm afraid libthread doesn't help here. If not, and your algorithm requires (a lot of) fast IPC, maybe this is the reason why it doesn't scale well?

No, the whole simulation has to run in the low-latency space: it's a video game and its rendering engine, which are generally highly heterogeneous workloads. That heterogeneity means there are many points of contact between the various subsystems. And the (semi-)real-time constraint means that you can't just scale the problem up to cover overhead costs.

>> I don't know what you mean by "CSP system itself takes care about memory hierarchy". Do you mean that the CSP implementation does something about it, or do you mean that the code using the CSP approach takes care of it?
>
> Both :)
> I agree with you that programming for the memory hierarchy is far more important than optimizing CPU clocks. But I also think the synchronization primitives used in CSP systems are the main reason why CSP programs do not scale well (badly designed algorithms excepted, of course). I meant that a different CSP implementation, based on a different synchronisation primitive (IPI), could help here.
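A conventional buffered channel built on a mutex and condition variables shows where that synchronization cost comes from: every send and every receive acquires a shared lock, and a blocked side sleeps and must be woken by the other. The sketch below assumes POSIX threads rather than Plan 9's libthread (whose channels are implemented differently), and the names Chan, chan_send and chan_recv are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define CHAN_CAP 16
static_assert(CHAN_CAP > 0, "channel needs at least one slot");

/* Hypothetical buffered channel (not libthread's Channel): one mutex
   guards the ring buffer, two condition variables handle blocking. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t not_full, not_empty;
    void *buf[CHAN_CAP];
    int head;   /* index of the oldest element */
    int count;  /* number of buffered elements */
} Chan;

static void chan_init(Chan *c) {
    pthread_mutex_init(&c->mu, NULL);
    pthread_cond_init(&c->not_full, NULL);
    pthread_cond_init(&c->not_empty, NULL);
    c->head = c->count = 0;
}

static void chan_send(Chan *c, void *v) {
    pthread_mutex_lock(&c->mu);            /* contended cache line on every send */
    while (c->count == CHAN_CAP)
        pthread_cond_wait(&c->not_full, &c->mu);   /* sender sleeps while full */
    c->buf[(c->head + c->count) % CHAN_CAP] = v;
    c->count++;
    pthread_cond_signal(&c->not_empty);    /* may have to wake a sleeping receiver */
    pthread_mutex_unlock(&c->mu);
}

static void *chan_recv(Chan *c) {
    pthread_mutex_lock(&c->mu);            /* same lock as the sender */
    while (c->count == 0)
        pthread_cond_wait(&c->not_empty, &c->mu);  /* receiver sleeps while empty */
    void *v = c->buf[c->head];
    c->head = (c->head + 1) % CHAN_CAP;
    c->count--;
    pthread_cond_signal(&c->not_full);     /* may have to wake a sleeping sender */
    pthread_mutex_unlock(&c->mu);
    return v;
}
```

As the number of threads sharing one channel grows, the mutex's cache line bounces between cores and the wake-ups turn into scheduler round trips, which is one plausible reading of why channel-heavy code stops scaling.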

I'm more interested just now in working with lock-free algorithms; I haven't made any good measurements of how badly our kernels would hit channels as the number of threads increases. Perhaps some of that cost could be mitigated through a better channel implementation.
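One common lock-free building block is a single-producer/single-consumer ring buffer: each side writes only its own index, so no lock is needed, and acquire/release ordering is what publishes each slot's contents to the other core. This is a minimal sketch assuming C11 `<stdatomic.h>` (which Plan 9's own compilers do not provide); Ring, ring_push and ring_pop are hypothetical names, not anything from the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAP 1024
static_assert((RING_CAP & (RING_CAP - 1)) == 0, "RING_CAP must be a power of two");

/* Hypothetical SPSC queue: exactly one producer thread and one consumer thread. */
typedef struct {
    _Atomic size_t head;   /* next slot to read; written only by the consumer */
    _Atomic size_t tail;   /* next slot to write; written only by the producer */
    void *slot[RING_CAP];
} Ring;

static void ring_init(Ring *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns false instead of blocking when the ring is full. */
static bool ring_push(Ring *r, void *v) {
    size_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (t - h == RING_CAP)           /* unsigned wraparound keeps this correct */
        return false;
    r->slot[t & (RING_CAP - 1)] = v;
    /* release: the slot write is visible before the consumer sees the new tail */
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false instead of blocking when the ring is empty. */
static bool ring_pop(Ring *r, void **out) {
    size_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h == t)
        return false;
    *out = r->slot[h & (RING_CAP - 1)];
    /* release: the slot read completes before the producer may reuse it */
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return true;
}
```

Because a full or empty ring is reported rather than slept on, the caller chooses whether to spin, yield or do other work, which suits a frame-budgeted workload better than an unconditional block.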

>> IPI isn't free either: apart from the OS switch, it generates bus traffic that competes with the cache coherence protocols and memory traffic; in a well-designed compute kernel that saturates both compute and bandwidth, the latency hiccups so introduced can propagate really badly.
>
> This is very interesting. For sure, IPI is not free. But I thought the bus traffic generated by an IPI was less significant than that of cache coherence protocols such as MESI, mainly because it is a one-way message.

It depends immensely on the hardware implementation of your IPI. If you wind up having to pay for MESI traffic as well, then the advantage shrinks.

> I think IPIs are now sent over the system bus (the local APIC used to talk over a separate bus), so I agree with you that they can saturate the bandwidth. But I wonder whether locking primitives are not worse. It would be interesting to test this.
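A first step toward such a test could be a user-space microbenchmark of the locking side alone: several threads bump a shared counter either under a mutex or with a single atomic add. It cannot measure IPIs, which user code cannot send directly, so it covers only half the comparison. The sketch assumes POSIX threads and `clock_gettime`; run_bench, worker and NTHREADS are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define NTHREADS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long locked_counter;          /* protected by lock */
static _Atomic long atomic_counter;  /* updated with atomic adds only */

typedef struct { int use_mutex; long iters; } Job;

static void *worker(void *arg) {
    const Job *job = arg;
    for (long i = 0; i < job->iters; i++) {
        if (job->use_mutex) {
            pthread_mutex_lock(&lock);
            locked_counter++;
            pthread_mutex_unlock(&lock);
        } else {
            atomic_fetch_add_explicit(&atomic_counter, 1, memory_order_relaxed);
        }
    }
    return NULL;
}

/* Runs NTHREADS workers, stores elapsed wall time in *secs, returns the final count. */
static long run_bench(int use_mutex, long iters, double *secs) {
    pthread_t tid[NTHREADS];
    Job job = { use_mutex, iters };
    struct timespec t0, t1;

    locked_counter = 0;
    atomic_store(&atomic_counter, 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &job);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);   /* join also orders the counter reads below */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *secs = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return use_mutex ? locked_counter : atomic_load(&atomic_counter);
}
```

Calling `run_bench(1, n, &s)` and `run_bench(0, n, &s)` and comparing the two times gives a rough per-operation cost; compile with something like `cc -std=c11 -O2 -pthread`. Results will depend heavily on core count and cache topology, so they should be read per machine rather than generalized.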


Agreed!

Paul

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (Darwin)

iD8DBQFHzLSSpJeHo/Fbu1wRAkv/AKDKK4fuuWyYCqXv4JqbWWj+RXQd0wCfSFoS
b9E6X/a13bg6AzUGT5dLSqU=
=ppoF
-----END PGP SIGNATURE-----
