[go-nuts] Advice, please

Peter Wilson Wed, 13 Jan 2021 16:57:57 -0800

Folks
I have code in C which implements a form of discrete event simulator, but 
optimised for a clocked system in which most objects accept input on the 
positive edge of a simulated clock, do their appropriate internal 
computations, and store the result internally. On the negative edge of the 
clock, they each take their stored internal state and 'send' it on to the 
appropriate destination object.

This is modelled using a linked list of objects, each of which has a phase0
and phase1 function, and traversing the list twice per clock, calling the
appropriate function.
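
For concreteness, here is a minimal single-threaded sketch of that structure in Go. The names (Object, node, counter, tick, runRing) are made up for illustration and are not taken from the C code; the counter object is a stand-in for a real simulated component.

```go
package main

import "fmt"

// Object is one simulated component. Phase0 runs on the positive clock
// edge (accept input, compute, store internally); Phase1 runs on the
// negative edge (send the stored state on to the destination).
type Object interface {
	Phase0()
	Phase1()
}

// node is one link in the singly linked list of simulated objects.
type node struct {
	obj  Object
	next *node
}

// counter is a hypothetical object used only for illustration: it latches
// in+1 on phase 0 and forwards the latched value on phase 1.
type counter struct {
	in, latched int
	dst         *counter
}

func (c *counter) Phase0() { c.latched = c.in + 1 }

func (c *counter) Phase1() {
	if c.dst != nil {
		c.dst.in = c.latched
	}
}

// tick simulates one clock: two full traversals of the list.
func tick(head *node) {
	for n := head; n != nil; n = n.next {
		n.obj.Phase0()
	}
	for n := head; n != nil; n = n.next {
		n.obj.Phase1()
	}
}

// runRing wires two counters to each other, runs the given number of
// clocks, and returns the input value seen by the first counter.
func runRing(clocks int) int {
	a, b := &counter{}, &counter{}
	a.dst, b.dst = b, a
	head := &node{obj: a, next: &node{obj: b}}
	for i := 0; i < clocks; i++ {
		tick(head)
	}
	return a.in
}

func main() {
	fmt.Println(runRing(3))
}
```

Because every object only reads its own input in phase 0 and only writes its destination's input in phase 1, the order of objects within one traversal does not matter.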

It all works fine. On a uniprocessor. If we have one processor object and
one memory object, with the processor implementing a standard instruction
fetch-decode-execute interpreter loop, and playing with simulated caches,
reading or writing on cache misses, we can get 20-30 MIPS on a Mac Mini.

So since (much!) more performance is wanted, implementing this for a
multiprocessor seems a good idea. Especially since every computer and its
dog is multicore. Using Go rather than C also sounds like a good idea.

So the sketch of the Go implementation is that I would have three threads
- main, t0, and t1. (more for a real system, but two suffices for
explanation)
- main sets stuff up, and t0 and t1 do the simulation work
- main has to initialise, set up any needed synchronization mechanism, and
start t0 and t1
- t0 and t1 wait until main says it's OK, then both traverse all objects in
the list. t0 runs the function if it's an even numbered object, and t1 if
it's an odd-numbered. No mutation of state by concurrent threads.
- main loops, as do t0 and t1; t0 and t1 signal that they've finished; when
they have, main tells them to start the next traversal
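
One way the scheme above might look with goroutines and a busy-wait handshake built on sync/atomic is sketched below. This is only an illustration under assumptions: the cell type, the control struct, and the step/done counters are invented names, and the runtime.Gosched() calls in the wait loops are there so the sketch still makes progress when there are fewer cores than goroutines; with one core per goroutine they could be dropped for a pure spin.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// cell is a hypothetical simulated object: phase0 latches in+1,
// phase1 forwards the latched value to dst.
type cell struct {
	in, latched int
	dst, next   *cell
}

func (c *cell) phase0() { c.latched = c.in + 1 }
func (c *cell) phase1() { c.dst.in = c.latched }

// control holds the shared handshake state (names are assumptions).
type control struct {
	step int64 // written by main: odd = phase 0, even = phase 1, -1 = stop
	done int64 // incremented by each worker when its traversal is finished
}

// worker traverses the whole list on every step but only runs the objects
// whose index modulo nworkers equals its id (even/odd for two workers).
func worker(id, nworkers int, head *cell, c *control) {
	var seen int64
	for {
		s := atomic.LoadInt64(&c.step)
		for s == seen { // busy-wait until main publishes a new step
			runtime.Gosched()
			s = atomic.LoadInt64(&c.step)
		}
		if s < 0 {
			return
		}
		seen = s
		i := 0
		for n := head; n != nil; n = n.next {
			if i%nworkers == id {
				if s%2 == 1 {
					n.phase0()
				} else {
					n.phase1()
				}
			}
			i++
		}
		atomic.AddInt64(&c.done, 1) // signal main: traversal finished
	}
}

// run plays the role of main: start the workers, then release them once
// per phase and wait until all of them have reported back.
func run(clocks, nworkers int, head *cell) {
	c := &control{}
	for id := 0; id < nworkers; id++ {
		go worker(id, nworkers, head, c)
	}
	for s := int64(1); s <= int64(2*clocks); s++ {
		atomic.StoreInt64(&c.done, 0)
		atomic.StoreInt64(&c.step, s)
		for atomic.LoadInt64(&c.done) != int64(nworkers) {
			runtime.Gosched()
		}
	}
	atomic.StoreInt64(&c.step, -1) // tell the workers to exit
}

// buildRing links n cells into a list in which each cell sends to the next
// one and the last sends back to the first.
func buildRing(n int) *cell {
	cells := make([]*cell, n)
	for i := range cells {
		cells[i] = &cell{}
	}
	for i, c := range cells {
		c.dst = cells[(i+1)%n]
		if i+1 < n {
			c.next = cells[i+1]
		}
	}
	return cells[0]
}

func main() {
	head := buildRing(4)
	run(5, 2, head)
	fmt.Println(head.in)
}
```

The atomic loads and stores also order the ordinary reads and writes of object state around them, so a worker's phase 0 results are visible to the other worker before phase 1 starts.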

So, after a long ramble, given that I am happy to waste CPU time in busy
waits (rather than have the overhead of scheduling blocked goroutines),
what is the recommendation for the signalling mechanism when all is done in
Go and everything's a goroutine, not a thread?

My guess is that creating specialist blocking 'barriers' using sync/atomic
(an atomic operation seems to take around 4nsec on my Mac Mini) is the
highest-performance mechanism. There's a dearth of performance information
on channel communication, WaitGroup, mutex etc. use, but the figures I have
seen seem to suggest that sending/receiving on a channel might be on the
order of 100nsec; since in C we iterate twice through the list in
30-40nsec, this is a tad high (yes, fixable by modeling a bigger system,
but)
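
Rather than rely on second-hand figures, the two handoff styles could be timed directly with something like the rough sketch below. The function names pingPongAtomic and pingPongChan are invented here, and the numbers will depend heavily on the machine, the Go version, and GOMAXPROCS. The runtime.Gosched() call in the spin loops keeps the sketch usable with few cores but adds cost, so a pure spin on dedicated cores would need it removed.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// pingPongAtomic times n round trips between two goroutines that hand a
// shared counter back and forth using only atomic loads and stores.
func pingPongAtomic(n int) time.Duration {
	var flag int64
	done := make(chan struct{})
	go func() {
		for i := 0; i < n; i++ {
			for atomic.LoadInt64(&flag) != int64(2*i+1) {
				runtime.Gosched() // assumption: yield instead of a pure spin
			}
			atomic.StoreInt64(&flag, int64(2*i+2))
		}
		close(done)
	}()
	start := time.Now()
	for i := 0; i < n; i++ {
		atomic.StoreInt64(&flag, int64(2*i+1))
		for atomic.LoadInt64(&flag) != int64(2*i+2) {
			runtime.Gosched()
		}
	}
	elapsed := time.Since(start)
	<-done
	return elapsed
}

// pingPongChan times n round trips over a pair of unbuffered channels.
func pingPongChan(n int) time.Duration {
	ping, pong := make(chan struct{}), make(chan struct{})
	go func() {
		for i := 0; i < n; i++ {
			<-ping
			pong <- struct{}{}
		}
	}()
	start := time.Now()
	for i := 0; i < n; i++ {
		ping <- struct{}{}
		<-pong
	}
	return time.Since(start)
}

func main() {
	const n = 100000
	a, c := pingPongAtomic(n), pingPongChan(n)
	fmt.Printf("atomic:  %d ns per round trip\n", a.Nanoseconds()/n)
	fmt.Printf("channel: %d ns per round trip\n", c.Nanoseconds()/n)
}
```

A round trip here is two handoffs, so the per-handoff cost is roughly half of each printed figure.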

I know that premature optimisation is a bad thing, but I'd prefer to ask
for advice than try everything.

many thanks for any help

-- P
