Folks
I have code in C which implements a form of discrete event simulator, but 
optimised for a clocked system in which most objects accept input on the 
positive edge of a simulated clock, do their appropriate internal 
computations, and store the result internally. On the negative edge of the 
clock, they each take their stored internal state and 'send' it on to the 
appropriate destination object.

This is modelled as a linked list of objects, each of which has a phase0 
and a phase1 function; the list is traversed twice per clock, calling the 
appropriate function on each pass.
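
To make the structure concrete, here is a minimal Go sketch of the 
two-phase traversal. It is an illustration, not a port of the actual C code: 
the Object fields, the "add one" computation, and the runPair helper are 
all made up, and I use methods where the C version has function pointers.

```go
package main

import "fmt"

// Object is a hypothetical simulated component (not the real C struct).
type Object struct {
	in      int     // input, written by the upstream object in phase 1
	latched int     // result computed and stored in phase 0
	dest    *Object // where phase 1 sends the latched value
	next    *Object // next object in the simulation list
}

// phase0 models the positive clock edge: read input, compute, store.
func (o *Object) phase0() { o.latched = o.in + 1 }

// phase1 models the negative clock edge: send stored state downstream.
func (o *Object) phase1() { o.dest.in = o.latched }

// tick simulates one clock: two full traversals of the list.
func tick(head *Object) {
	for o := head; o != nil; o = o.next {
		o.phase0()
	}
	for o := head; o != nil; o = o.next {
		o.phase1()
	}
}

// runPair wires two objects into a ring, runs n clocks, and
// returns the input values both objects end up with.
func runPair(n int) (int, int) {
	a, b := &Object{}, &Object{}
	a.dest, b.dest = b, a
	a.next = b
	for i := 0; i < n; i++ {
		tick(a)
	}
	return a.in, b.in
}

func main() {
	x, y := runPair(3)
	fmt.Println(x, y)
}
```

Because phase0 only touches an object's own state and phase1 only writes 
to its destination's input, the order of objects within one pass does not 
matter, which is what makes splitting a pass across processors plausible.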

It all works fine. On a uniprocessor. If we have one processor object and 
one memory object, with the processor implementing a standard 
fetch/decode/execute interpreter loop, and playing with simulated caches, 
reading or writing on cache misses, we can get 20-30 MIPS on a Mac Mini.

So since (much!) more performance is wanted, implementing this for a 
multiprocessor seems a good idea. Especially since every computer and its 
dog is multicore. Using Go rather than C also sounds like a good idea.

So the sketch of the Go implementation is that I would have three threads: 
main, t0, and t1 (more for a real system, but two suffice for 
explanation).
- main sets stuff up, and t0 and t1 do the simulation work
- main has to initialise, set up any needed synchronization mechanism, and 
start t0 and t1
- t0 and t1 wait until main says it's OK, then both traverse all objects in 
the list. t0 runs the function if it's an even-numbered object, and t1 if 
it's an odd-numbered one. No mutation of state by concurrent threads.
- main loops, as do t0 and t1; t0 and t1 signal that they've finished; when 
they have, main tells them to start the next traversal
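
As a baseline, the protocol above could be sketched with ordinary channels 
and a sync.WaitGroup, before worrying about busy-waiting. This is only an 
assumed shape: the Object type, newRing, and simulate are hypothetical, 
and for brevity the objects sit in a slice with each worker striding over 
it, rather than every worker walking a linked list and skipping objects 
that are not its own.

```go
package main

import (
	"fmt"
	"sync"
)

// Object is a hypothetical component: phase0 latches in+1,
// phase1 sends the latched value to dest.
type Object struct {
	in, latched int
	dest        *Object
}

func (o *Object) phase0() { o.latched = o.in + 1 }
func (o *Object) phase1() { o.dest.in = o.latched }

// newRing builds n objects, each feeding the next one.
func newRing(n int) []*Object {
	objs := make([]*Object, n)
	for i := range objs {
		objs[i] = &Object{}
	}
	for i, o := range objs {
		o.dest = objs[(i+1)%n]
	}
	return objs
}

// worker runs one phase per message on start, handling only the
// objects whose index is congruent to id modulo n.
func worker(id, n int, objs []*Object, start <-chan int, done *sync.WaitGroup) {
	for phase := range start {
		for i := id; i < len(objs); i += n {
			if phase == 0 {
				objs[i].phase0()
			} else {
				objs[i].phase1()
			}
		}
		done.Done() // tell main this traversal is finished
	}
}

// simulate plays the role of main: start workers, then drive
// two traversals (phase 0, phase 1) per simulated clock.
func simulate(objs []*Object, workers, clocks int) {
	starts := make([]chan int, workers)
	var done sync.WaitGroup
	for w := range starts {
		starts[w] = make(chan int)
		go worker(w, workers, objs, starts[w], &done)
	}
	for c := 0; c < clocks; c++ {
		for phase := 0; phase < 2; phase++ {
			done.Add(workers)
			for _, s := range starts {
				s <- phase // "go ahead" signal
			}
			done.Wait() // wait until every worker has finished
		}
	}
	for _, s := range starts {
		close(s) // lets the workers exit their range loops
	}
}

func main() {
	objs := newRing(4)
	simulate(objs, 2, 5)
	fmt.Println(objs[0].in, objs[3].in)
}
```

Each traversal here costs one channel send per worker plus a WaitGroup 
round trip, which is exactly the overhead I am worried about.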

So, after a long ramble, given that I am happy to waste CPU time in busy 
waits (rather than have the overhead of scheduling blocked goroutines), 
what is the recommendation for the signalling mechanism when all is done in 
Go and everything's a goroutine, not a thread?

My guess is that creating specialist busy-waiting 'barriers' using 
sync/atomic (an atomic operation seems to take around 4 ns on my Mac Mini) 
is the highest-performance mechanism. There's a dearth of performance 
information on channel communication, WaitGroup, Mutex etc. use, but the 
figures I have seen suggest that sending/receiving on a channel might be on 
the order of 100 ns or more; since in C we iterate twice through the list 
in 30-40 ns, this is a tad high (yes, fixable by modelling a bigger system, 
but still).
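
The kind of barrier I have in mind could look roughly like the sketch 
below: a generation-counting spin barrier built on the typed atomics in 
sync/atomic (Go 1.19 or later). SpinBarrier, NewSpinBarrier, newRing, and 
simulate are names I made up, and the 1000-spin threshold before yielding 
with runtime.Gosched is an arbitrary guess that only exists so the loop 
stays usable when there are fewer free cores than workers. Note that in 
this variant the workers synchronise with each other at the barrier, and 
main only starts them and waits for the end, which differs from the 
main-driven scheme described above.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// SpinBarrier is a hypothetical busy-waiting barrier for n goroutines.
type SpinBarrier struct {
	n     int32
	count atomic.Int32  // arrivals in the current generation
	gen   atomic.Uint32 // bumped each time the barrier opens
}

func NewSpinBarrier(n int) *SpinBarrier { return &SpinBarrier{n: int32(n)} }

// Wait spins until all n goroutines have called Wait.
func (b *SpinBarrier) Wait() {
	gen := b.gen.Load()
	if b.count.Add(1) == b.n {
		b.count.Store(0) // reset before releasing the others
		b.gen.Add(1)     // open the barrier
		return
	}
	for spins := 0; b.gen.Load() == gen; spins++ {
		if spins > 1000 { // arbitrary threshold, an assumption
			runtime.Gosched()
		}
	}
}

// Object is a hypothetical component, as in the description above.
type Object struct {
	in, latched int
	dest        *Object
}

func (o *Object) phase0() { o.latched = o.in + 1 }
func (o *Object) phase1() { o.dest.in = o.latched }

// newRing builds n objects, each feeding the next one.
func newRing(n int) []*Object {
	objs := make([]*Object, n)
	for i := range objs {
		objs[i] = &Object{}
	}
	for i, o := range objs {
		o.dest = objs[(i+1)%n]
	}
	return objs
}

// simulate runs the given number of clocks with one goroutine per
// worker; a barrier separates every phase from the next.
func simulate(objs []*Object, workers, clocks int) {
	bar := NewSpinBarrier(workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for c := 0; c < clocks; c++ {
				for i := id; i < len(objs); i += workers {
					objs[i].phase0()
				}
				bar.Wait() // all latching done before anyone sends
				for i := id; i < len(objs); i += workers {
					objs[i].phase1()
				}
				bar.Wait() // all sends done before the next clock
			}
		}(w)
	}
	wg.Wait()
}

func main() {
	objs := newRing(4)
	simulate(objs, 2, 5)
	fmt.Println(objs[0].in, objs[3].in)
}
```

Whether this actually beats channels presumably depends on how many cores 
are free to spin; I have not measured it.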

I know that premature optimisation is a bad thing, but I'd prefer to ask 
for advice than try everything.

Many thanks for any help.

-- P


-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/5a1d1ccb-26f4-4da0-94fb-679c201782dan%40googlegroups.com.