@mratsim Oh, I forgot that Arraymancer always uses OpenMP, so you're talking about the threads it creates. I don't use OpenMP that often, which is probably why I forgot about it.
Oh, by the way: the elementary operations you mentioned (addition, sum, etc.) can easily be split into equal chunks. That's not so easy for map, where the cost of the mapped function can depend on each element's value. Would you use dynamic scheduling then?

To be honest, the idea of a ref-counted coarray is bizarre to me. Maybe that's partly because I'm used to per-process coarrays and yours are per-thread. Still, I don't really like the very idea of ref-counted garbage collection for tensors. Why? Because there is no reason I couldn't have two equally privileged references to the same array inside the same thread. But you have views for slicing anyway, so why not have one array and many views over it? That would probably help avoid some of the problems you mentioned. The only catch is that it would need proper destructors... OR a Python-like `with` context. We use that for files in Nim anyway, so I can't see why not.
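
To make the scheduling question concrete, here is a rough sketch in plain C with OpenMP. It is not Arraymancer code: `collatz_steps` and `map_collatz` are hypothetical names, and the chunk size of 64 is an arbitrary assumption. The point is only that the per-element cost varies with the value, so equal static chunks could leave some threads idle.

```c
#include <assert.h>

/* Hypothetical kernel whose cost depends on the element's value:
   number of Collatz steps needed for x to reach 1. */
static unsigned collatz_steps(unsigned long x) {
    unsigned steps = 0;
    while (x > 1) {
        x = (x % 2 == 0) ? x / 2 : 3 * x + 1;
        ++steps;
    }
    return steps;
}

/* A "map" over the input with dynamic scheduling: each idle thread
   grabs the next chunk of 64 iterations instead of getting a fixed
   equal share up front. The chunk size is an assumption, not a tuned value. */
void map_collatz(const unsigned long *in, unsigned *out, long n) {
    long i;
    assert(n >= 0);
    #pragma omp parallel for schedule(dynamic, 64)
    for (i = 0; i < n; ++i) {
        out[i] = collatz_steps(in[i]);
    }
}
```

Dynamic scheduling has some bookkeeping overhead per chunk, so for uniform operations like addition the default static split is probably still the better choice; `schedule(guided)` sits somewhere in between. Compiled without OpenMP enabled, the pragma is ignored and the loop simply runs serially.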