Dave

Thanks for the thoughts.

My intent is to use non-custom barriers and measure scalability, customising if 
need be. Not busy-waiting the goroutines leaves processor time for other stuff 
(like visualization, looking for patterns, etc.).

There’s also the load-balancing effect; code pretending to be a core fetching 
an instruction from an icache and waiting for registers to be available has a 
longer path length than a simple memory access, so one has to avoid having the 
whole set waiting on the longest pole. Simulating vector operations brings with 
it similar problems.

The ‘batch the instructions’ thing is standard practice even in sequential 
simulators, where the overheads of walking the list, finding the data structure 
for the core, and fetching an instruction are sufficient that if you let the 
simulated core fetch-and-execute 10, 100, or 1000 instructions you’ll get 
usefully improving performance.

This has to be traded off against the fact that multicore computer systems have 
lots of contention, and if you batch stuff up you risk losing sight of the 
details of that contention (which means the utility of the model can be iffy). 
This can be attacked by ‘variable batches’ and letting other engines catch up, 
etc etc, but it all makes it more complicated (which is unwise). So assuring 
oneself of good scalability with instruction-at-a-time simulation is good due 
diligence.

If this works as desired, it should turn out to be open source. I’m a novice at 
writing go, so the first pass will be as simple as practical rather than 
polished, idiomatic go.

I’m also considering making a ‘pure go’ version - one goroutine per object, one 
to one. They’re clocked. They communicate by channels. This would be for folk 
who want to ‘model a system architecture’ more or less directly, dropping their 
vision into go rather than having to create weird data structures. Given such a 
thing, it would not be infeasible to create the faster (well, hopefully) model 
automagically. [Did this way back in the early ’90s, using a homebrew language 
which introduced the idea of ‘clocked channels’. Unfortunately I wasn’t very 
good at writing compilers and the tool was… unstable.]

— P

> On Jan 14, 2021, at 9:33 AM, David Riley <fraveyd...@gmail.com> wrote:
> 
> On Jan 13, 2021, at 7:21 PM, Peter Wilson <peter.wil...@bsc.es> wrote:
>> So, after a long ramble, given that I am happy to waste CPU time in busy 
>> waits (rather than have the overhead of scheduling blocked goroutines), what 
>> is the recommendation for the signalling mechanism when all is done in go 
>> and everything's a goroutine, not a thread?
> 
> This is similar to something I'm working on for logic simulation, and I'd 
> been thinking about the clocked simulation as well.  I'll be interested in 
> your results; since I'm also considering remote computation (and GPU 
> computation, which might as well be remote) I'm currently going with the idea 
> of futures driven by either channels or sync.Cond.  That may not be as 
> efficient for your use case.
> 
>> My guess is that creating specialist blocking 'barriers' using sync/atomic 
>> (atomic.Operation seems to be around 4nsec on my Mac Mini) is the highest 
>> performance mechanism. There's a dearth of performance information on 
>> channel communication, waitgroup, mutex etc use, but those I have seen seem 
>> to suggest that sending/receiving on a channel might be over the order of 
>> 100nsec; since in C we iterate twice through the list in 30-40nsec, this is 
>> a tad high (yes, fixeable by modeling a bigger system, but)
> 
> My advice would be to implement the easiest method possible that's not likely 
> to box you in and profile it and see where your bottlenecks are.  In my case, 
> so far, the delays introduced by IPC mechanisms (and also allocations) is 
> absolutely dwarfed by just the "business logic" crunching the logical 
> primitives.  So far it's not worth trying to improve the IPC on the order of 
> nanoseconds (would be a nice problem to have) because the work done in each 
> "chunk" is big enough that it's not worth worrying about.
> 
> This also leads me to the next part, which is that if you have lots of little 
> operations and you're worried about the time spent on IPC for each little 
> thing, you'll probably get the easiest and best performance gains by trying 
> to batch them so that you can burn through lots of similar operations at once 
> before trying to send a slice over a channel or something.
> 
> As always, do a POC implementation and then profile it. That's the only 
> productive way to optimize things at this scale, and Go has EXCELLENT 
> profiling capabilities built in.
> 
> 
> - Dave
> 



