Re: In-place extension of arrays only for certain alignment?

Steven Schveighoffer via Digitalmars-d-learn Wed, 17 Aug 2022 18:37:13 -0700

On 8/17/22 2:40 PM, Ali Çehreli wrote:

On 8/16/22 19:33, Steven Schveighoffer wrote:
Using a 16-byte block sounds like a good strategy at first because nobody knows whether an array will get more than one element.
However, if my guess is correct (i.e. the first element of size 16 bytes is placed on a 16-byte block), then the next allocation will always allocate memory for the second element.

A 16-byte element must be put in a 32-byte block, because you still need one byte for the metadata.

One might argue that dynamic arrays are likely to have more than a single element, so the initial block should be at least twice the element size. This would save one allocation for all arrays. And in my case of 1-element arrays, the allocation count would be halved. (Because I pay for every single append right now.)

So yes, if you have a 32-byte block for 16-byte elements, it means you can only fit one element in the block. If you are using a sliding window approach, where you remove the first element and then append another, you will in effect reallocate on every append.

Using the append/popFront mechanism to implement your sliding window is going to perform badly. Appending is not designed to make this situation perform well.
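The effect described above can be observed with a small experiment. This is only a sketch: `Elem` and `countMoves` are hypothetical names made up for illustration, and the numbers printed depend on the druntime version and GC configuration, so no particular output is promised.

```d
import std.stdio : writeln;

// Hypothetical 16-byte element type, used only for illustration.
struct Elem
{
    ulong a, b;
}

static assert(Elem.sizeof == 16);

// Slide a window of length 1 forward `n` times and count how often the
// append did NOT land directly after the previous element.
size_t countMoves(size_t n)
{
    Elem[] window;
    window ~= Elem(0, 0);

    size_t moves;
    foreach (ulong i; 1 .. n + 1)
    {
        const before = window.ptr;
        window = window[1 .. $]; // drop the oldest element (what popFront does)
        window ~= Elem(i, i);    // append the newest one
        if (window.ptr !is before + 1)
            ++moves;             // the data ended up somewhere else
    }
    return moves;
}

void main()
{
    Elem[] first;
    first ~= Elem(0, 0);
    writeln("capacity after the first append: ", first.capacity);
    writeln("appends that moved the window: ", countMoves(1000), " of 1000");
}
```

If the capacity printed is 1, every append in the loop has to find a new block, which is the pay-per-append behaviour discussed in this thread.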

That all makes sense. I didn't think the metadata would be at the end, but I sense it's related to the "end slice", so it's a better place there. (?)

It's for alignment. If I put 1 byte at the front, this means I have to always skip 7 or 15 more bytes (depending on alignment requirements).

BUT, I put the metadata at the front on big (page+ size) blocks, because I can both afford to skip 16 bytes in a block of 4096, and if you extend the block, there is no need to move the metadata later. Consider that the metadata lookup cache could be out of date if it had to move.

 > What is your focus? Why do you really want this "optimization" of gluing
 > together items to happen?
This is about what you and I talked about in the past and something I mentioned in my DConf lightning talk this year. I am imagining a FIFO-like cache where elements of a source range are stored. There is a sliding window involved. I want to drop the unused front elements because they are expensive in 2 ways:
1) If the elements are not needed anymore, I have to move my slice forward so that the GC collects their pages.
2) If they are not needed anymore, I don't want to even copy them to a new block because this would be expensive, and in the case of an infinite source range, impossible.

Ah! I actually have a solution for this in iopipe -- a ring buffer. Basically, you map the same physical pages of memory sequentially. It allows you to simply change the pointer, and never need to copy anything.

See this code for an example (I only have it for Posix, but Windows has similar features, I have to add them):
https://github.com/schveiguy/iopipe/blob/6a8c10d2858f92978d72c55eecc7ad55fcc207e2/source/iopipe/buffer.d#L306
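For readers who do not want to open the link, the general trick of mapping the same pages twice in a row can be sketched as follows. This is not the iopipe code; it is a minimal Posix-only illustration under several assumptions: the function name `mirrored` and the shared-memory name are made up, `size` must be a multiple of the page size, the fixed name means two processes running this at once would collide, the mapping is never released, and older glibc versions may need `-L-lrt` to link `shm_open`.

```d
import core.sys.posix.fcntl : O_CREAT, O_EXCL, O_RDWR;
import core.sys.posix.sys.mman;
import core.sys.posix.unistd : close, ftruncate, sysconf, _SC_PAGESIZE;
import std.conv : octal;
import std.stdio : writeln;

// Returns a slice of 2 * size bytes whose second half aliases the first
// half, or null on failure. `size` must be a multiple of the page size.
ubyte[] mirrored(size_t size)
{
    enum name = "/ringbuf-sketch"; // hypothetical name, illustration only
    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, octal!600);
    if (fd < 0)
        return null;
    shm_unlink(name);       // the object lives on until fd and mappings go away
    scope (exit) close(fd); // the mappings keep the memory alive after close

    if (ftruncate(fd, size) != 0)
        return null;

    // Reserve 2 * size bytes of contiguous address space.
    void* base = mmap(null, 2 * size, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (base is MAP_FAILED)
        return null;

    // Map the same shared object over both halves of the reservation.
    foreach (half; 0 .. 2)
    {
        void* want = cast(ubyte*) base + half * size;
        void* got = mmap(want, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_FIXED, fd, 0);
        if (got !is want)
        {
            munmap(base, 2 * size);
            return null;
        }
    }
    return (cast(ubyte*) base)[0 .. 2 * size];
}

void main()
{
    const page = cast(size_t) sysconf(_SC_PAGESIZE);
    auto buf = mirrored(page);
    if (buf is null)
    {
        writeln("mapping failed");
        return;
    }
    buf[0] = 42;
    assert(buf[page] == 42); // same physical byte, seen through the second half

    // A window that straddles the seam is still one contiguous slice.
    auto window = buf[page - 2 .. page + 2];
    writeln("window length across the seam: ", window.length);
}
```

With such a mapping, sliding the window is only pointer arithmetic: once the start passes `size`, subtract `size` from it and the same data is visible again.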

The question is when to apply this dropping of old front elements. When I need to add one more element to the array, I can detect whether this *may* allocate by the expression 'arr.length == arr.capacity' but not really though, because the runtime may give me adjacent room without allocation. So I can delay the "drop the front elements" computation because there will be no actual copying at this time.

Even if you could do this, it doesn't help, because at the end of the pool you need to reallocate into another pool (or back to the beginning of the pool): there can be no free pages after the last page in the pool (you can't merge pools together).

 > https://dlang.org/phobos/core_memory.html#.GC.extend
Ok, that sounds very useful. In addition to "I can detect when it *may* allocate", I can detect whether there is adjacent free room. (I can ask for just 1 element extension; I tested; and it works.) (I guess this GC.extend has to grab a GC lock.)
However, for that to work, I seem to need the initial block pointer that the GC knows about. (My sliding array's .ptr does not work, so I have to save the initial arr.ptr.)


You can get this via `GC.query`, but that means 2 calls into the GC.
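A rough sketch of that two-call sequence, using `GC.query` and `GC.extend` from core.memory. The helper name `tryExtend` is hypothetical, and whether the extension succeeds depends on what happens to sit after the block, so the result may well be 0. The sketch also makes no claim about how the array runtime's own bookkeeping reacts to a block that was extended behind its back.

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical helper: try to grow, in place, the GC block that `slice`
// lives in. Returns the new block size in bytes, or 0 if nothing happened.
size_t tryExtend(T)(T[] slice, size_t minElems, size_t wantElems)
{
    // Call 1: GC.query accepts an interior pointer and reports the block.
    auto info = GC.query(slice.ptr);
    if (info.base is null)
        return 0; // not GC memory

    // Call 2: GC.extend wants the block's base pointer, not the interior one.
    return GC.extend(info.base, minElems * T.sizeof, wantElems * T.sizeof);
}

void main()
{
    // Large enough to live in a page-sized (or bigger) block.
    auto arr = new int[](10_000);
    auto window = arr[100 .. $]; // slid forward: window.ptr is interior

    auto info = GC.query(window.ptr);
    writeln("block base: ", info.base, ", block size: ", info.size);

    const newSize = tryExtend(window, 1, 1024);
    writeln(newSize ? "extended in place" : "no adjacent room");
}
```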

Conclusion:
1) Although understanding the inner workings of the runtime is very useful and core.memory has interesting functions, it feels like too much fragile work to get exactly what I want. I should manage my own memory (likely still backed by the GC).
2) I argue that the initial allocation should be more than 1 element so that we skip 1 allocation for most arrays and 50% for my window-of-1 sliding window case.


So 2 things here:

1. I highly recommend trying out the ring buffer solution to see if it helps. The disadvantage here is that you need to tie up at least a page of memory.
2. All my tests using the ring buffer showed little to no performance improvement over just copying back to the front of the buffer. So consider just copying the data back to the front of an already allocated block.
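The second option, copying the live data back to the front of one preallocated block, might look like the sketch below. `SlidingWindow` and its members are hypothetical names; this is one possible shape under the assumption that the window never needs more room than the block provides, not code from iopipe.

```d
import std.stdio : writeln;

// Hypothetical sliding window over a block that is allocated exactly once.
struct SlidingWindow(T)
{
    private T[] store;     // fixed block
    private size_t lo, hi; // live elements are store[lo .. hi]

    this(size_t blockLen)
    {
        store = new T[](blockLen);
    }

    void put(T value)
    {
        if (hi == store.length)
        {
            assert(lo > 0, "block is full; drop from the front first");
            // Copy the live elements back to the front. Copying forward is
            // safe here because the destination starts before the source.
            foreach (i; 0 .. hi - lo)
                store[i] = store[lo + i];
            hi -= lo;
            lo = 0;
        }
        store[hi++] = value;
    }

    void popFront()
    {
        assert(lo < hi, "window is empty");
        ++lo;
    }

    inout(T)[] window() inout
    {
        return store[lo .. hi];
    }
}

void main()
{
    auto w = SlidingWindow!int(8);
    foreach (i; 0 .. 20)
    {
        if (w.window.length == 3)
            w.popFront(); // keep at most 3 elements
        w.put(i);
    }
    assert(w.window == [17, 18, 19]);
    writeln(w.window);
}
```

The copy only happens when the window reaches the end of the block, so with a block much larger than the window the cost per append stays small.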

IIRC, your data does not need to be sequential in *physical memory*, which means you can use a ring buffer that is segmented instead of virtually mapped, and that can be of any size.
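A segmented ring buffer in this sense can be sketched as a plain array with a wrapping index, where the window is exposed as up to two slices instead of one contiguous slice. The type `Ring` and its members are hypothetical names for illustration only.

```d
import std.algorithm.comparison : min;
import std.stdio : writeln;

// Hypothetical fixed-capacity ring buffer; no memory mapping involved.
struct Ring(T)
{
    private T[] store;
    private size_t head, count; // oldest element is store[head]

    this(size_t capacity)
    {
        store = new T[](capacity);
    }

    // Returns false when the ring is full instead of allocating.
    bool put(T value)
    {
        if (count == store.length)
            return false;
        store[(head + count) % store.length] = value;
        ++count;
        return true;
    }

    void popFront()
    {
        assert(count > 0, "ring is empty");
        head = (head + 1) % store.length;
        --count;
    }

    // The live elements in order: the part up to the end of the block,
    // then the part that wrapped around to the front (possibly empty).
    T[][2] segments()
    {
        const firstLen = min(count, store.length - head);
        return [store[head .. head + firstLen], store[0 .. count - firstLen]];
    }
}

void main()
{
    auto r = Ring!int(4);
    foreach (i; 0 .. 4)
        r.put(i);
    r.popFront();
    r.popFront();
    r.put(4);
    r.put(5);

    auto segs = r.segments();
    assert(segs[0] == [2, 3] && segs[1] == [4, 5]);
    writeln(segs[0], " ", segs[1]);
}
```

The trade-off against the mapped version is that consumers must handle two segments, but the capacity is not tied to the page size.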


-Steve
