On Tue, 24 Nov 2009 11:26:11 -0500, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:
Steven Schveighoffer wrote:
On Tue, 24 Nov 2009 11:01:10 -0500, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:
Steven Schveighoffer wrote:
[snip]
Andrei has mentioned that he thinks we can store the allocated length
in the GC block, which I think would also work. You also wouldn't
need an MRU cache in that case, but he says it's in *addition* to the
MRU cache, so I'm not sure what he means.
[snip]
Reaching the GC block is relatively expensive, so the MRU still helps.
In essence it's like this. When appending:
a) look up the cache, if there, you have O(1) amortized append that's
really fast
b) if not in the cache, talk to the GC, still O(1) amortized append
that's not as fast
Both help providing an important performance guarantee. I was a bit
worried about guaranteeing "O(1) amortized append for up to 8 arrays
at a time."
Have you considered the performance impact on allocating non-array
types? That is, are you intending on all allocations storing the
allocated length, even class or struct allocations who will likely
never append? Or will there be a "non-appendable" flag?
Also, the part without the MRU cache was my original proposal from
last year, I had some ideas on how length could be stored. For
example, in a page of up to 128 byte blocks, you only need 8 bits to
store the length (alas, you cannot store with 4 bits for 16-byte blocks
because you need to cover both 0 and 16). This reduces the overhead
for those blocks. For 256 byte to 1-page blocks, 16 bits is acceptable
multi-page blocks, the cost of storing a 32-bit integer is negligible.
Having access to the requested length is important at larger lengths, so
probably we could be fine by not actually storing it for allocations up
to 128 bytes.
So in other words, appending any array under 128 bytes which does not
still sit in the MRU cache causes a reallocation? I'm not sure if it's
the right move, but it is an optimization, so maybe it works well. It
also handily fixes the problem of requiring allocated lengths for
non-array allocations since most would be under 128 bytes.
It is true the lookup of the MRU cache will not involve dissecting the
address of the block to find it's container block, but you still will
need a lock, or are you planning on doing something clever? I think
the locking will be the bottleneck, and if you don't make it the same
as the global lock, add the cost of both locks when you actually need
to append.
The cache is a thread-local map from pointers to size_t. Using it does
not require any locking I think.
When reallocating, do you not also need to update the allocated length in
the heap, even if the allocation fits into the same block to prevent other
threads from stomping? Doesn't that require a lock? Somewhere you have
to share the allocated length between threads sharing the same array. I
assumed it was going to be in the MRU cache...
-Steve