On Tuesday, 30 September 2014 at 15:46:54 UTC, Steven Schveighoffer wrote:
On 9/30/14 10:24 AM, Dicebot wrote:
On Tuesday, 30 September 2014 at 14:01:17 UTC, Steven Schveighoffer wrote:
The assertion passes with the D1/Tango runtime but fails with the current D2 runtime. This happens because `result.ptr` is not actually the pointer returned by gc_qalloc from array reallocation, but an interior pointer 16 bytes from the start of that block. Druntime stores some metadata
(length/capacity, I presume) at the very beginning.

This is accurate, it stores the "used" size of the array. But it's
only the case for arrays, not general GC.malloc blocks.

An alternative is to use result.capacity, which essentially looks up the same thing (and should be more accurate). But it doesn't cover the
same inputs.
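
A minimal sketch of that difference in inputs, not taken from the original posts. It assumes 8192 bytes is large enough to land in a page-sized block, and `startsAtBlockBase` is a hypothetical helper name. It only prints what the runtime reports, since the exact values depend on the druntime version.

```d
import core.memory : GC;
import std.stdio : writefln;

// Hypothetical helper: true if the slice begins exactly at the start of
// the GC block that owns it (i.e. it is not an interior pointer).
bool startsAtBlockBase(const(ubyte)[] data)
{
    const(void)* p = data.ptr;
    return GC.addrOf(p) is p;
}

void main()
{
    // Array allocated by the runtime: the block carries array metadata.
    auto result = new ubyte[](8192);

    // Raw GC block sliced by hand: allocated as a general GC.malloc block.
    auto raw = (cast(ubyte*) GC.malloc(8192))[0 .. 8192];

    writefln("new ubyte[] : capacity=%s, starts at block base: %s",
             result.capacity, startsAtBlockBase(result));
    writefln("GC.malloc   : capacity=%s, starts at block base: %s",
             raw.capacity, startsAtBlockBase(raw));
}
```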

Why is it stored at the beginning and not at the end of the block (like capacity)? I'd like to explore options for removing the interior pointer completely before proceeding with adding more special cases to GC
functions.

First, it is the capacity. It's just that the capacity lives at the beginning of larger blocks.

The reason is due to the ability to extend pages.

With smaller blocks (2048 bytes or less), the page is divided into equal portions, and those can NEVER be extended. Any attempt to extend results in a realloc into another block. Putting the capacity at the end makes sense for 2 reasons: 1. 1 byte is already reserved to prevent cross-block pointers, 2. It doesn't cause alignment issues. We can't very well offset a 16 byte block by 16 bytes. But importantly, the capacity field does not move.

However, for blocks of page size and above (4096+ bytes), the original (D1 and early D2) runtime would attempt to extend into the next page, without moving the data. Thus we avoid copying the data into a new block, and just set some bits and we're done.
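
The in-place extension can be observed from user code. This is only a sketch: `grewInPlace` is a hypothetical helper, the sizes are arbitrary, and whether a particular growth succeeds in place depends on what happens to follow the block in memory.

```d
import std.stdio : writeln;

// Hypothetical helper: grows `arr` by `extra` elements and reports whether
// the data stayed at the same address (extended) or was moved (reallocated).
bool grewInPlace(ref ubyte[] arr, size_t extra)
{
    auto before = arr.ptr;
    arr.length = arr.length + extra;
    return arr.ptr is before;
}

void main()
{
    auto big = new ubyte[](4 * 4096);   // spans several pages
    writeln("large block grew in place: ", grewInPlace(big, 4096));

    auto small = new ubyte[](16);       // lives in a fixed-size bin
    writeln("small block grew in place: ", grewInPlace(small, 64));
}
```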

Ah that must be what confused me - I looked at small block offset calculation originally and blindly assumed same logic for other sizes. Sorry, my fault!

But this poses a problem for when the capacity field is stored at the end -- especially since we are caching the block info. The block info can change with a call to GC.extend (whereas for a fixed-size block, the block info CANNOT change). Depending on what "version" of the block info you have, the "end" can be different, and you may end up corrupting data. This is especially important for shared or immutable array blocks, where multiple threads could be appending at the same time.

So I made the call to put it at the beginning of the block, which obviously doesn't change, and offset everything by 16 bytes to maintain alignment.
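
A rough way to see the resulting layout for different sizes, as a sketch only. `dataOffset` is a hypothetical helper and the lengths are arbitrary; the numbers printed are whatever the installed druntime produces.

```d
import core.memory : GC;
import std.stdio : writefln;

// Hypothetical helper: byte distance from the start of the owning GC block
// to the first array element.
size_t dataOffset(const(ubyte)[] data)
{
    const(void)* p = data.ptr;
    return cast(size_t) p - cast(size_t) GC.addrOf(p);
}

void main()
{
    immutable size_t[] lengths = [16, 256, 1024, 4096, 65_536];
    foreach (n; lengths)
    {
        auto arr = new ubyte[](n);
        writefln("length %6s -> data starts %s bytes into its block",
                 n, dataOffset(arr));
    }
}
```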

It may very well be that we can put it at the end of the block instead, and you can probably do so without much effort in the runtime (everything uses CTFE functions to calculate padding and location of the capacity). It has been such a long time since I did that, I'm not very sure of all the reasons not to do it. A look through the mailing list archives might be useful.

I think it should be possible. That way the actual block size will simply be considered a bit smaller, and extending will happen before the reserved space is hit. But of course I have only a very vague knowledge of druntime, acquired while porting cdgc, so I may need to think about it a bit more and probably chat with Leandro too :)

I have created a Bugzilla issue for now: https://issues.dlang.org/show_bug.cgi?id=13558
