Turns out the single-ref-destructive-update optimizations are more
difficult to accomplish than I first thought, even with a micro-gc.
The problem is that refs from the stacks aren't counted.
To explain, let's start with the case where we want to keep the
optimization pike currently does:
array a = ({0});
for (int i = 1; i < 1000; i++)
a += ({i});
(To make the example simpler, I start out with the array ({0}) instead
of the empty array ({}), since the empty array is always shared.) Here
pike can update the array in a destructively, since there are no other
refs to it, and hence it can overallocate it and append new elements
to the end efficiently.
With the micro-gc, we can tell in `+= that there are no refs to the
array from anywhere but the stack. Now consider this case:
array a = ({0});
array b = a;
for (int i = 1; i < 1000; i++)
a += ({i});
After this, one of course expects a = ({0, 1, 2, ...}) and b = ({0}).
But since we don't count refs from the stack, we don't see the second
ref to the array and might therefore think it's safe to be
destructive, which would make b = ({0, 1, 2, ...}) too.
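(With the refcounting we have today this comes out right, which is
easy to verify after the loop:)

write("%d %d\n", sizeof (a), sizeof (b));  // prints "1000 1"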
So to use the single-ref-destructive-update optimizations we have to
continue to count refs from the stacks. Or at least refs from the pike
svalue stack, to cope with the situation above, but there might be
situations where the same thing occurs with refs from the C stack too.
This is a problem, because I'm convinced there is a _big_ performance
boost in not counting stack refs. So I wonder, is there a way to cope?
One alternative is to extend the array type to explicitly allow adding
elements to the end destructively. It could perhaps look like this:
a[sizeof (a)] = i;
I.e. an indexed assignment to one step past the end. The rationale is
that []= already is destructive, and this is the same sort of
assignment, even though a new slot has to be created in the process.
It doesn't imply that arbitrary elements past the end or before the
beginning can be created the same way.
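As a sketch (the variable names are just illustrative), this is how it
would read next to the []= assignments that already are destructive
today; the last loop uses the proposed syntax, which of course doesn't
exist yet:

array a = ({1, 2, 3});
array b = a;
a[0] = 17;              // existing []= semantics: destructive,
                        // so b[0] is 17 too

array c = ({0});
for (int i = 1; i < 1000; i++)
  c[sizeof (c)] = i;    // proposed: append to c in place, one step
                        // past the end, regardless of how many refs
                        // c has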
I'd like an extension like that anyway, because destructive appends
are useful regardless of the number of refs (and whether the array is
thread local or not). It's also nice to be explicit about it rather
than relying on an optimization.
The problem is of course that old pike code which relies on the
current optimization and previously was O(n) suddenly becomes O(n^2)
until it's updated to the new syntax. It doesn't misbehave, though.
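As a side note, code that knows its final size up front can stay
linear even without the optimization by preallocating with the
ordinary allocate() efun and filling in by index, but that only
covers some of the cases:

// Preallocate once; each a[i] = ... is a plain destructive []=
// assignment, so the loop is O(n) with no per-iteration copying.
array a = allocate(1000);
for (int i = 0; i < 1000; i++)
  a[i] = i;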
Of course, there are many more single-ref-destructive-update
optimizations, but this is by far the most common case for which there
currently is no way to be explicitly destructive.