Re: Formal Review of region allocator begins

dsimcha Fri, 09 Sep 2011 05:56:10 -0700

Thanks for the review.  Detailed comments below.

On 9/9/2011 2:38 AM, Martin Nowak wrote:

So I have used RegionAllocator.
First of all, thanks for getting things like these into the std library.
I didn't even know segmented stacks before and they were the perfect
solution to my use case.
Since first reading about this I've used a TLS Appender which I cleared
occasionally.
Performance is top. For me it's on par with statically allocating a
huge chunk and draw from this with memset initalization.


The Use Case:

I have implemented a wavelet rasterizer. It uses a sparse quadtree.
Bezier curves are recursively sliced into each child quadrant.
I lazily allocate 4 child nodes when recursing into a quadrant.
After all bezier curves are added the coverage for pixels can be
calculated and used for blitting.
Afterward the whole tree is disposed.

Details for this pretty neat and simple rasterizer at
http://josiahmanson.com/research/wavelet_rasterization/.

Criticism of regionallocator:

1. Please M-x delete-trailing-whitespace

I am surprised there is any trailing whitespace. I use CodeBlocks whichusually gets rid of it. I think I disabled this because it was showingup in diffs when I edited Phobos modules.

2. Using RAII for the allocator also pushes RAII onto the user.
To avoid having to pass the allocator around I had to add another static
variable holding the allocator.
This doesn't work well, because there is no explicit freeAll function
for the allocator.
Furthermore assigning a newRegionAllocator to an initialized
allocator is not possible as it breaks the LIFO order. So I had
to resort to sAlloc = RegionAllocator.init; sAlloc = newRegionAllocator();.

But RAII makes things simple and relatively safe. What would yousuggest as the alternative?


3. All initialization (newArray, newUnitializedArray, newT) should
not be part of an allocator interface.
We should instead create a templated Initializer or free functions
that will do this using one of the new allocators providing void*
memory.

Unfortunately, this leads to a fundamental disagreement that I have witha large portion of this review. I want RegionAllocator to be high-levelenough to be easy to use right out of the box. You seem to be askingfor something lower level but more composable. The problem is thatlow-level but composable often makes simple things complicated andmessy. Furthermore, if the allocator knows the type being allocated, itcan sometimes take that into account and do "special" things like GCscanning, etc. Similarly, array() takes advantage of knowing theimplementation details of the allocator.


4. I don't appreciate all the bookkeeping involved with every
allocation, mainly just to provide freeLast.
RegionAllocator should be such lightweight that you can use one
for each allocation that you want individual freeing for.

This is also necessary to allow proper freeing of huge allocations(bigger than a segment size). I really don't see much downside giventhat the overhead is small. Furthermore, having tons ofRegionAllocators local to one function would get really messy really fast.


5. Separating the stack from the allocator creates mental/code bloat.
It suggest the wrong association with allocating from the
allocator while in reality your allocating off the stack using
kind of a cookie.
I had a simpler interface in mind https://gist.github.com/1204402.
Can you elaborate a little why/what additional complexity is needed.

Creating a brand-new stack is expensive because it heap allocates andincurs synchronization overhead. The stack is designed to make creatinga new RegionAllocator cheap (i.e. no synchronization overhead or heapallocation in most cases). RegionAllocator mainly provides RAII freeing.


6. Allocating more bytes than the segment size could be an error.
This would reduce the complexity for handling big objects.

I really like the transparent "just works" design for big objects.Again, this goes back to the high-level vs. simple point. You seem tobe calling for the Unix way (everything is simple, stupid andcomposable, implementation simplicity is most important). I'd muchrather make the implementation more complicated but the interfacesimpler/less error prone.


Allocator interface:

7. Instead of using malloc/free you should use an malloc/free version of
the allocator interface.
This way RegionAllocator can be used for other memory (e.g. shared memory).

This might not be a bad idea, especially if I make the malloc/freeallocator the default so most users don't need to think about this.


8. The allocator interface should provide means to do aligned allocations.
As this would arguably complicate implementations alternatively requiring
all allocators to return 16-Byte aligned memory seems reasonable.
Still another possibility is having an enum flag for alignment and
provide an
allocator wrapper that does aligned allocations.


Hmm, I guess my alignedMalloc/alignedFree could be made into LibcAllocator.


9. The allocator interface should have a flag to advice GC range adding.
Could be:
alloc(size_t nbytes, GCScan scan = GCScan.no)

RegionAllocator could deduce it's scan flag from the first use and
enforce it never changes afterwards. I'm not to sure about this,
but requiring the user to add memory to the GC seems error prone and
reduce the design space for allocators.

I don't understand. If they have to set a flag to get it added, then Ifail to see how adding it manually is any more difficult.


10. Using a free list for the segments should be left to a
FreeListAllocator (see 7).

Maybe in the long run once we have a FreeListAllocator, but IMHO this isan implementation detail that can be changed later. I definitely don'twant to expose the free list stuff in the API because I'd rather notcomplicate the interface with these details.


11. I'm really much in favor of using classes (final) for such long
living objects (the stack).
This allow simpler lazy static initialization and has scope new for
stack variables.
But most important you don't need to have unitialized structs. Which at the
end of the day use a ref counted impl (see 2). OTOH classes are more
complicated
for templated decoration.

Hmm, I really like the idea of the stack memory getting freedautomatically and deterministically by ref counting.


12. Ideally RegionAllocator would be AlignedAllocator!(16,
RegionAllocator!(FreeListAllocator!(LibcAllocator))).

The alignment is taken care of internally to RegionAllocator in a waythat a generic AlignedAllocator wouldn't be able to handle asefficiently. FreeListAllocator and the whole free list concept is animplementation detail and shouldn't appear in the API at all. I thinkyou may have a good point about LibcAllocator being a templateparameter, though.



Smaller issues:

- In the constructor of RegionAllocatorStack you should enforce that
segmentSize is bigger than alignBytes instead of only catching 0.


Good point.  Will fix.


- At a quick glance in alignedMalloc you only need to store an offset
smaller than alignBytes instead of a pointer (ubyte will suffice).

Good catch. I'll do this if I make a generic LibcAllocator with thesefunctions, but otherwise the savings are so small (a few bytes on themulti-megabyte allocations that RegionAllocator does) that I'd rathernot change it just due to inertia and risk aversion.


- regionallocator.d(594): idempotent assertion
while (nLargeObjects > 0 && bookkeepIndex > regionIndex) {
assert(bookkeepIndex > regionIndex);
freeLast();
}

Another good catch. This is just cruft from when the second check inthe while statement wasn't there.


- regionallocator.d(939): something is missing/too much
static size_t alignBytes(size_t nBytes) {
return .alignBytes;
}

alignBytes() is a function that is part of the allocator interface..alignBytes is an enum that needs to be visible at the module level foralignedMalloc/alignedFree.

Re: Formal Review of region allocator begins

Reply via email to