Bill Baxter wrote:
On Tue, Nov 24, 2009 at 8:26 AM, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:
Steven Schveighoffer wrote:
On Tue, 24 Nov 2009 11:01:10 -0500, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:

Steven Schveighoffer wrote:
[snip]
Andrei has mentioned that he thinks we can store the allocated length in
the GC block, which I think would also work.  You also wouldn't need an MRU
cache in that case, but he says it's in *addition* to the MRU cache, so I'm
not sure what he means.
[snip]

Reaching the GC block is relatively expensive, so the MRU still helps. In
essence it's like this. When appending:

a) look up the cache, if there, you have O(1) amortized append that's
really fast

b) if not in the cache, talk to the GC, still O(1) amortized append
that's not as fast

Both help provide an important performance guarantee. I was a bit
worried about guaranteeing "O(1) amortized append for up to 8 arrays at a
time."
Have you considered the performance impact on allocating non-array types?
That is, do you intend for all allocations to store the allocated length,
even class or struct allocations that will likely never be appended to? Or
will there be a "non-appendable" flag?

Also, the part without the MRU cache was my original proposal from last
year; I had some ideas on how length could be stored.  For example, in a
page of blocks up to 128 bytes each, you only need 8 bits to store the
length (alas, you cannot use 4 bits for 16-byte blocks because you need to
cover both 0 and 16).  This reduces the overhead for those blocks.  For
256-byte to 1-page blocks, 16 bits is acceptable; for multi-page blocks,
the cost of storing a 32-bit integer is negligible.
Having access to the requested length matters mostly at larger lengths, so
we could probably get away with not actually storing it for allocations up
to 128 bytes.
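
To put rough numbers on that scheme, a hypothetical helper (not the
actual runtime layout):

enum PAGESIZE = 4096;  // assumed page size

size_t lengthFieldBytes(size_t blockSize)
{
    if (blockSize <= 128)      return 1;  // 8 bits covers lengths 0..128
    if (blockSize <= PAGESIZE) return 2;  // 16 bits for 256-byte..1-page blocks
    return 4;                             // 32 bits; negligible for multi-page blocks
}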

It is true that looking up the MRU cache will not involve dissecting the
address of the block to find its container block, but you will still need a
lock, or are you planning on doing something clever?   I think the locking
will be the bottleneck, and if you don't make it the same as the global
lock, you add the cost of both locks when you actually need to append.
The cache is a thread-local map from pointers to size_t. Using it does not
require any locking, I think.
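
In a sketch, the declarations would differ like this (illustrative only):

size_t[void*] mruCache;               // module-level = thread-local in D2;
                                      // each thread gets its own map, no lock
__gshared size_t[void*] globalCache;  // by contrast, this one would need a lock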

Wait, what happens if you're sharing an array between two different threads?

The cache does not work with shared arrays, and threads cannot share non-shared arrays.
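
Illustratively (types only; the runtime logic is elided):

int[] perThread;          // non-shared, hence thread-local; may use the cache
shared(int[]) crossed;    // visible to all threads; never goes through the cache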


Andrei
