Any primitive in the `sync` package will do. I would go for two `RWMutex`es per goroutine, or two unbuffered channels per goroutine. However, AFAIK, in Go you can't force a goroutine to start executing. The Go scheduler will try to run any unblocked goroutine as soon as possible, though.

On Thursday, January 14, 2021 at 8:57:56 AM UTC+8 Peter Wilson wrote:
> Folks
> I have code in C which implements a form of discrete event simulator, but
> optimised for a clocked system in which most objects accept input on the
> positive edge of a simulated clock, do their appropriate internal
> computations, and store the result internally. On the negative edge of the
> clock, they each take their stored internal state and 'send' it on to the
> appropriate destination object.
>
> This is modelled using a linked list of objects, each of which has a
> phase0 and phase1 function, and traversing the list twice per clock,
> calling the appropriate function.
>
> It all works fine. On a uniprocessor. If we have one processor object and
> one memory object, with the processor implementing a standard instruction
> fetch decode implement interpreted loop, and playing with simulated caches,
> reading or writing on cache misses, we can get 20-30 MIPS on a Mac Mini.
>
> So since (much!) more performance is wanted, implementing this for a
> multiprocessor seems a good idea. Especially since every computer and its
> dog is multicore. Using Go rather than C also sounds like a good idea.
>
> So the sketch of the Go implementation is that I would have three threads
> - main, t0, and t1. (more for a real system, but two suffices for
> explanation)
> - main sets stuff up, and t0 and t1 do the simulation work
> - main has to initialise, set up any needed synchronization mechanism, and
> start t0 and t1
> - t0 and t1 wait until main says it's OK, then both traverse all objects in
> the list. t0 runs the function if it's an even-numbered object, and t1 if
> it's an odd-numbered one. No mutation of state by concurrent threads.
> - main loops, as do t0 and t1; t0 and t1 signal that they've finished;
> when they have, main tells them to start the next traversal
>
> So, after a long ramble, given that I am happy to waste CPU time in busy
> waits (rather than have the overhead of scheduling blocked goroutines),
> what is the recommendation for the signalling mechanism when all is done in
> Go and everything's a goroutine, not a thread?
>
> My guess is that creating specialist blocking 'barriers' using sync/atomic
> (atomic.Operation seems to be around 4nsec on my Mac Mini) is the highest
> performance mechanism. There's a dearth of performance information on
> channel communication, waitgroup, mutex etc use, but those I have seen seem
> to suggest that sending/receiving on a channel might be over the order of
> 100nsec; since in C we iterate twice through the list in 30-40nsec, this is
> a tad high (yes, fixable by modeling a bigger system, but)
>
> I know that premature optimisation is a bad thing, but I'd prefer to ask
> for advice than try everything..
>
> many thanks for any help
>
> -- P
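
For what it's worth, here is a rough sketch of the channel variant mentioned at the top of this reply, applied to the main/t0/t1 scheme described in the quoted message. It is only an illustration under assumptions: the `object` type, the ring wiring in `newRing`, and the `worker` and `simulate` helpers are all made up for the example and are not taken from the original C code. Each worker gets its own unbuffered start channel, and all workers share one done channel. I have not benchmarked it, so it says nothing about whether the per-phase cost is acceptable.

```go
package main

import "fmt"

// object is a hypothetical stand-in for one simulated component.
type object struct {
	input   int     // value received on the previous negative edge
	latched int     // result stored on the positive edge
	next    *object // destination object for phase1
}

// phase0 models the positive edge: read input, compute, store internally.
func (o *object) phase0() { o.latched = o.input + 1 }

// phase1 models the negative edge: send the stored state to the destination.
func (o *object) phase1() { o.next.input = o.latched }

// newRing builds n objects, each sending to its successor (an assumed topology).
func newRing(n int) []*object {
	objs := make([]*object, n)
	for i := range objs {
		objs[i] = &object{}
	}
	for i, o := range objs {
		o.next = objs[(i+1)%n]
	}
	return objs
}

// worker waits for a phase number on start, runs that phase on every
// object whose index is congruent to id modulo nWorkers, then reports done.
func worker(id, nWorkers int, objs []*object, start <-chan int, done chan<- struct{}) {
	for phase := range start {
		for i := id; i < len(objs); i += nWorkers {
			if phase == 0 {
				objs[i].phase0()
			} else {
				objs[i].phase1()
			}
		}
		done <- struct{}{}
	}
}

// simulate plays the role of main: it releases all workers for a phase,
// waits until every worker has finished, and only then starts the next phase.
func simulate(objs []*object, nWorkers, cycles int) {
	start := make([]chan int, nWorkers)
	done := make(chan struct{})
	for w := range start {
		start[w] = make(chan int) // unbuffered, one per worker
		go worker(w, nWorkers, objs, start[w], done)
	}
	for c := 0; c < cycles; c++ {
		for phase := 0; phase < 2; phase++ {
			for _, ch := range start {
				ch <- phase
			}
			for range start {
				<-done
			}
		}
	}
	for _, ch := range start {
		close(ch) // lets the workers return
	}
}

func main() {
	objs := newRing(8)
	simulate(objs, 2, 1000)
	fmt.Println(objs[0].input)
}
```

The channel sends and receives act as the barrier between phases, so no object is read and written concurrently within a phase. If the per-phase channel cost turns out to be too high, the same structure should carry over to a spin barrier built on `sync/atomic`, with the two channel operations replaced by an atomic counter and a phase flag.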