Hello Jürgen, I don't know if you have given this issue any thought, but it has certainly occupied my mind for the last few days.
It's clear that heavy array processing does far too much cloning than should be necessary. Especially in cases where you have lots of operations on smaller arrays (as opposed to few operations on large arrays). This is because the code always performs a clone prior to a destructive operation because at that point it can never be sure that the array will not be used again. The (very few) exceptions to this is handled by the "temp" system, which is really only used in the ravel function. The way I see it, there are two approaches to solving this: The first one being to go through the code with a fine-toothed comb and implement the temp system everywhere. This is lots of work and is error-prone. The other solution would be to re-engineer the Value class so that it can share underlying data with other Value instances. The Value class would then get a counter called share_count or something, indicating how many other references there are to the same data. When clone() is called, it will simply create a new Value instance, share the underlying data and increase share_count. Any destructive operations would only copy the content if it's not already shared. *Now, Value_P already implements some of these semantics. Could it be reused for this?* While appealing, this copy-on-write solution might not be perfect though. The assumption is that the caller would have decremented the share count (effectively "releasing" the Value) before the called function tries to modify it. Now, this "release" is similar to the the "temp" marking. Could there be a better way? I've been experimenting with this on and off, and my interim results (as I've mentioned before) show that the potential performance boosts are massive. We're talking about 3-4 orders of magnitude. Definitely worth quite a lot of effort, IMHO. I'm likely going to continue pursuing this, because I personally feel frustrated when I do something and it's not instantaneous, knowing that I'm waiting for the interpreter to perform unnecessary operations. :-) What are your thoughts on this? What would be the best approach? Regards, Elias