Folks,

I have code in C which implements a form of discrete event simulator, but optimised for a clocked system in which most objects accept input on the positive edge of a simulated clock, do their appropriate internal computations, and store the result internally. On the negative edge of the clock, they each take their stored internal state and 'send' it on to the appropriate destination object.

This is modelled using a linked list of objects, each of which has a phase0 and a phase1 function; the list is traversed twice per clock, calling the appropriate function each time.

It all works fine. On a uniprocessor. If we have one processor object and one memory object, with the processor implementing a standard instruction fetch/decode/implement interpreted loop and playing with simulated caches (reading or writing on cache misses), we can get 20-30 MIPS on a Mac Mini.

So, since (much!) more performance is wanted, implementing this for a multiprocessor seems a good idea, especially since every computer and its dog is multicore. Using Go rather than C also sounds like a good idea.

The sketch of the Go implementation is that I would have three threads: main, t0, and t1 (more for a real system, but two suffice for explanation).

- main sets stuff up, and t0 and t1 do the simulation work
- main has to initialise, set up any needed synchronization mechanism, and start t0 and t1
- t0 and t1 wait until main says it's ok, then both traverse all objects in the list. t0 runs the function if it's an even-numbered object, and t1 if it's an odd-numbered one. No mutation of state by concurrent threads.
- main loops, as do t0 and t1; t0 and t1 signal that they've finished; when they have, main tells them to start the next traversal

So, after a long ramble: given that I am happy to waste CPU time in busy waits (rather than have the overhead of scheduling blocked goroutines), what is the recommendation for the signalling mechanism when all is done in Go and everything's a goroutine, not a thread?

My guess is that creating specialist blocking 'barriers' using sync/atomic (an atomic operation seems to take around 4nsec on my Mac Mini) is the highest-performance mechanism.

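To make the scheme above concrete, here is a minimal sketch of the kind of sync/atomic spin barrier I have in mind. It is an illustration under assumptions, not a measured recommendation: Object, SpinBarrier, buildRing and simulate are hypothetical names, the "computation" is just in+1, and the workers meet at the barrier directly instead of being released by main each traversal. The runtime.Gosched() call in the spin loop is there only so the sketch still makes progress when there are fewer cores than workers; a pure busy wait would drop it.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// Object is a hypothetical simulated component. In phase0 it consumes its
// input and stores a result; in phase1 it sends that result to dest.
type Object struct {
	in, latched int
	dest        *Object // receiver of this object's output
	next        *Object // next object in the traversal list
}

func (o *Object) phase0() { o.latched = o.in + 1 }
func (o *Object) phase1() { o.dest.in = o.latched }

// SpinBarrier is a generation-counting busy-wait barrier for n goroutines.
type SpinBarrier struct {
	n     int32
	count int32
	gen   uint32
}

// Wait returns once all n goroutines have called Wait for this generation.
func (b *SpinBarrier) Wait() {
	gen := atomic.LoadUint32(&b.gen)
	if atomic.AddInt32(&b.count, 1) == b.n {
		atomic.StoreInt32(&b.count, 0) // reset before releasing the others
		atomic.AddUint32(&b.gen, 1)    // new generation releases the spinners
		return
	}
	for atomic.LoadUint32(&b.gen) == gen {
		runtime.Gosched() // assumption: yield so fewer cores than workers still works
	}
}

// buildRing links n objects into a list; object i sends to object (i+1) mod n.
func buildRing(n int) *Object {
	objs := make([]Object, n)
	for i := range objs {
		objs[i].dest = &objs[(i+1)%n]
		if i+1 < n {
			objs[i].next = &objs[i+1]
		}
	}
	return &objs[0]
}

// simulate runs `clocks` clock cycles. Every worker walks the whole list and
// handles only the objects whose index modulo `workers` equals its own id.
func simulate(head *Object, workers, clocks int) {
	bar := &SpinBarrier{n: int32(workers)}
	var wg sync.WaitGroup // used only for the final join, not on the hot path
	for id := 0; id < workers; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for c := 0; c < clocks; c++ {
				i := 0
				for o := head; o != nil; o = o.next {
					if i%workers == id {
						o.phase0()
					}
					i++
				}
				bar.Wait() // all results latched before anyone sends
				i = 0
				for o := head; o != nil; o = o.next {
					if i%workers == id {
						o.phase1()
					}
					i++
				}
				bar.Wait() // all sends done before the next positive edge
			}
		}(id)
	}
	wg.Wait()
}

func main() {
	head := buildRing(4)
	simulate(head, 2, 1000)
	fmt.Println(head.in) // prints 1000: each clock adds 1 around the ring
}
```

In this toy, phase0 touches only an object's own fields and phase1 writes only the destination's input, so two barrier waits per clock are enough to keep the workers from racing. Whether that also holds for a real model depends on what each phase function reads and writes.
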
There's a dearth of performance information on channel communication, WaitGroup, mutex etc. use, but the figures I have seen suggest that sending/receiving on a channel might be on the order of 100nsec; since in C we iterate twice through the list in 30-40nsec, this is a tad high (yes, fixable by modelling a bigger system, but...).

I know that premature optimisation is a bad thing, but I'd prefer to ask for advice than try everything.

Many thanks for any help.

-- P

To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/5a1d1ccb-26f4-4da0-94fb-679c201782dan%40googlegroups.com.