Any primitive in the `sync` package will do. I would go for two `RWMutex`es 
for each goroutine, or two unbuffered channels for each goroutine. However, 
AFAIK, in Go you can't force a goroutine to start executing. Go will try to 
wake up any unblocked goroutine as soon as possible, though.
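
For what it's worth, here is a rough sketch of the channel variant, not a tuned implementation. The `object` type, its `phase0`/`phase1` methods, and the `worker` and `simulate` functions are made up for illustration. Each worker gets its own unbuffered start channel; to keep it short I share a single done channel between the workers rather than giving each its own.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a simulated component.
type object struct{ in, out int }

func (o *object) phase0() { o.out = o.in + 1 } // positive edge: compute and store
func (o *object) phase1() { o.in = o.out }     // negative edge: pass the result on

// worker handles every object whose index i satisfies i%stride == id.
// It blocks on start until main sends the phase number to run.
func worker(id, stride int, objs []*object, start <-chan int, done chan<- struct{}) {
	for phase := range start {
		for i := id; i < len(objs); i += stride {
			if phase == 0 {
				objs[i].phase0()
			} else {
				objs[i].phase1()
			}
		}
		done <- struct{}{} // tell main this traversal is finished
	}
}

// simulate runs the given number of clocks and returns the objects.
func simulate(nObjs, nWorkers, clocks int) []*object {
	objs := make([]*object, nObjs)
	for i := range objs {
		objs[i] = &object{}
	}
	start := make([]chan int, nWorkers)
	done := make(chan struct{})
	for w := range start {
		start[w] = make(chan int)
		go worker(w, nWorkers, objs, start[w], done)
	}
	for c := 0; c < clocks; c++ {
		for phase := 0; phase < 2; phase++ {
			for _, ch := range start {
				ch <- phase // release each worker for this traversal
			}
			for range start {
				<-done // wait until every worker has finished
			}
		}
	}
	for _, ch := range start {
		close(ch) // lets the workers exit their range loops
	}
	return objs
}

func main() {
	objs := simulate(4, 2, 3)
	fmt.Println(objs[0].in, objs[3].in)
}
```

The channel operations give the ordering guarantees needed for main to read the objects after the last `<-done`, but every traversal costs a send and a receive per worker, so measure before relying on it for very short traversals.
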
On Thursday, January 14, 2021 at 8:57:56 AM UTC+8 Peter Wilson wrote:

> Folks
> I have code in C which implements a form of discrete event simulator, but 
> optimised for a clocked system in which most objects accept input on the 
> positive edge of a simulated clock, do their appropriate internal 
> computations, and store the result internally. On the negative edge of the 
> clock, they each take their stored internal state and 'send' it on to the 
> appropriate destination object
> This is modelled using a linked list of objects, each of which has a 
> phase0 and phase1 function, and traversing the list twice per clock, 
> calling the appropriate function.
> It all works fine. On a uniprocessor. If we have one processor object and 
> one memory object, with the processor implementing a standard instruction 
> fetch decode implement interpreted loop, and playing with simulated caches, 
> reading or writing on cache misses, we can get 20-30 MIPS on a Mac Mini. 
> So since (much!) more performance is wanted, implementing this for a 
> multiprocessor seems a good idea. Especially since every computer and its 
> dog is multicore. Using go rather than C also sounds like a good idea.
> So the sketch of the go implementation is that I would have three threads 
> - main, t0, and t1. (more for a real system, but two suffices for 
> explanation)
> - main sets stuff up, and t0 and t1 do the simulation work
> - main has to initialise, set up any needed synchronization mechanism, and 
> start t0 and t1
> - t0 and t1 wait until main says it's ok, then both traverse all objects in 
> the list. t0 runs the function if it's an even numbered object, and t1 if 
> it's an odd-numbered. No mutation of state by concurrent threads.
> - main loops, as do t0 and t1; t0 and t1 signal that they've finished; 
> when they have, main tells them to start the next traversal
> So, after a long ramble, given that I am happy to waste CPU time in busy 
> waits (rather than have the overhead of scheduling blocked goroutines), 
> what is the recommendation for the signalling mechanism when all is done in 
> go and everything's a goroutine, not a thread?
> My guess is that creating specialist blocking 'barriers' using sync/atomic 
> (atomic.Operation seems to be around 4nsec on my Mac Mini) is the highest 
> performance mechanism. There's a dearth of performance information on 
> channel communication, waitgroup, mutex etc use, but those I have seen seem 
> to suggest that sending/receiving on a channel might be over the order of 
> 100nsec; since in C we iterate twice through the list in 30-40nsec, this is 
> a tad high (yes, fixable by modeling a bigger system, but)
> I know that premature optimisation is a bad thing, but I'd prefer to ask 
> for advice than try everything..
> many thanks for any help
> -- P
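
The busy-waiting barrier built on `sync/atomic` that the question describes could look roughly like the sketch below. This is one possible shape under stated assumptions, not a benchmarked design: `spinBarrier`, `wait`, and `runClocks` are hypothetical names, and the per-worker `state`/`next` slots merely stand in for the simulated objects. The workers synchronise among themselves at the end of each phase, so main only waits for the whole run to finish. The `runtime.Gosched()` call in the spin loop is my addition so the loop still makes progress when there are fewer Ps than workers; a pure busy wait would omit it.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spinBarrier is a hypothetical reusable barrier for n goroutines.
type spinBarrier struct {
	n     int32  // number of participants
	count int32  // arrivals in the current round
	gen   uint32 // round number, bumped by the last arrival
}

// wait spins until all n participants have called wait for this round.
func (b *spinBarrier) wait() {
	gen := atomic.LoadUint32(&b.gen)
	if atomic.AddInt32(&b.count, 1) == b.n {
		atomic.StoreInt32(&b.count, 0) // reset before releasing the others
		atomic.AddUint32(&b.gen, 1)    // release everyone spinning on gen
		return
	}
	for atomic.LoadUint32(&b.gen) == gen {
		runtime.Gosched() // assumption: yield instead of a pure busy wait
	}
}

// runClocks runs a two-phase clock loop on nWorkers goroutines.
func runClocks(nWorkers, clocks int) []int64 {
	b := &spinBarrier{n: int32(nWorkers)}
	state := make([]int64, nWorkers) // visible state, written only in phase 1
	next := make([]int64, nWorkers)  // internal result, written only in phase 0
	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for c := 0; c < clocks; c++ {
				// Phase 0: read a neighbour's state, store the result internally.
				next[id] = state[(id+1)%nWorkers] + 1
				b.wait()
				// Phase 1: publish the stored result.
				state[id] = next[id]
				b.wait()
			}
		}(w)
	}
	wg.Wait()
	return state
}

func main() {
	fmt.Println(runClocks(3, 5)) // prints [5 5 5]
}
```

Because each phase only writes data that no other goroutine reads until after the next barrier, the atomics in `wait` are the only synchronisation needed. Whether this actually beats channels depends on the traversal length and core count, so it is worth benchmarking both.
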
