On Tuesday, 24 September 2013 at 08:46:36 UTC, Dmitry Olshansky wrote:
23-Sep-2013 03:49, Andrei Alexandrescu wrote:
Hello,


I am making good progress on the design of std.allocator, and I am optimistic about the way it turns out. D's introspection capabilities
really shine through, and in places the design does feel really
archetypal - e.g. "this is the essence of a freelist allocator". It's a
very good feeling. The overall inspiration comes from Berger's
HeapLayers, but D's introspection takes that pattern to a whole new level.

Several details are still in flux, but at the top level it seems most
natural to divide things in two tiers:

1. Typed allocator, i.e. every request for allocation comes with the
exact type requested;

2. Untyped allocator - traffics exclusively in ubyte[].


Looks good (s/ubyte[]/void[] per current discussion).

Do you imagine Typed allocators as something more than adapters that simplify a common pattern of allocate + emplace / destroy + deallocate? (create!T/destroy!T)
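I mean something like this minimal sketch (not the proposed API, just how I imagine the adapter; it assumes allocate returns a raw array per the signatures below, and ignores alignment for brevity):

import std.conv : emplace;

// Sketch only: "create" as a thin typed adapter over an untyped allocator.
T* create(T, Allocator, Args...)(ref Allocator a, auto ref Args args)
{
    // Grab raw bytes from the untyped allocator...
    auto chunk = a.allocate(T.sizeof);
    if (chunk is null) return null;
    // ...and construct the object in place.
    return emplace!T(cast(T*) chunk.ptr, args);
}

// Counterpart: run the destructor, then hand the bytes back.
void destroy(T, Allocator)(ref Allocator a, T* p)
{
    if (p is null) return;
    typeid(T).destroy(p);
    a.deallocate((cast(ubyte*) p)[0 .. T.sizeof]);
}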

struct NullAllocator
{
    enum alignment = real.alignof;
    enum size_t available = 0;
    ubyte[] allocate(size_t s) shared { return null; }
    bool expand(ref ubyte[] b, size_t minDelta, size_t maxDelta) shared
    { assert(b is null); return false; }
    bool reallocate(ref ubyte[] b, size_t) shared
    { assert(b is null); return false; }
    void deallocate(ubyte[] b) shared { assert(b is null); }
    void collect() shared { }
    void deallocateAll() shared { }
    static shared NullAllocator it;
}

Primitives:


First things first - can't the allocator return alignment as a run-time value - a property (just like 'available' does)? The implementation contract is that it must be an O(1), vanilla, syscall-free function. (Read this as: consult system info exactly once, at construction.)

Thinking more about it - it may also be an immutable size_t? Then it gets a proper value at construction and is never changed afterwards.
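Something like this (a rough sketch; PageAllocator is hypothetical and POSIX-only):

import core.sys.posix.unistd : sysconf, _SC_PAGESIZE;

// Hypothetical allocator whose alignment is a run-time value:
// consult system info exactly once, at construction, never change it.
struct PageAllocator
{
    immutable size_t alignment;

    this(int) // dummy parameter; structs can't have no-arg constructors
    {
        alignment = cast(size_t) sysconf(_SC_PAGESIZE);
    }

    // allocate/deallocate etc. omitted
}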

* expand(b, minDelta, maxDelta) grows b's length by at least minDelta (and on a best-effort basis by at least maxDelta) and returns true, or does nothing and returns false. In most allocators this should be @safe. (One key insight is that expand() can be made @safe whereas shrink() or realloc() are most often not; such mini-epiphanies are very exciting because they put the design on a beam guide with few principles and many consequences.) The precondition is that b is null or has been previously
returned by allocate(). This method is optional.

Great to see this primitive. Classic malloc-ators are so lame...
(+ WinAPI Heaps fit here)

* deallocate(b) deallocates the memory allocated for b. b must have been previously allocated by the same allocator. This method is usually unsafe (but there are notable implementations that may offer safety,
such as unbounded freelists.) This method is optional.

Does the following implication hold: "has a deallocate" --> must be manually managed? Otherwise how would one reliably work with it and not leak? This brings us to traits that allocators may (should) have, à la: automatic? free-all on termination? zeroes on allocate (more common than one would think)? etc.
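With D's introspection such traits could even be derived rather than declared - a rough sketch (the trait names are mine, and it assumes the NullAllocator above is in scope):

import std.traits : hasMember;

// Sketch: derive allocator traits from the primitives it defines.
enum hasDeallocate(A) = hasMember!(A, "deallocate");
enum hasCollect(A)    = hasMember!(A, "collect");

// "Has a deallocate and no collect" --> presumably manually managed.
enum isManuallyManaged(A) = hasDeallocate!A && !hasCollect!A;

static assert(!isManuallyManaged!NullAllocator); // it defines collect()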

* deallocateAll() deallocates in one shot all memory previously
allocated by this allocator. This method is optional, and when present is almost always unsafe (I see no meaningful @safe implementation.) Region allocators are notable examples of allocators that define this
method.

Okay... I presume a region's "mark" works by splitting off a sub-instance of the allocator, and "release" by calling deallocateAll() on it.
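I.e. usage along these lines (purely hypothetical - mark() is my name for splitting off the sub-instance):

// Hypothetical: a child region over the unused tail of the parent.
void scratchWork(ref Region parent)
{
    auto scratch = parent.mark();     // "mark": sub-instance, same memory
    auto buf = scratch.allocate(4096);
    // ... temporary work in buf ...
    scratch.deallocateAll();          // "release": parent stays untouched
}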


* collect() frees unused memory that had been allocated with this allocator. This optional method is implemented by tracing collectors and
is usually @safe.


This seems hard and/or suboptimal w/o typeinfo --> typed allocators? I guess they could be more than a simple helper.

There are quite a few more things to define more precisely, but this part of the design has become quite stable. To validate the design, I've defined some simple allocators (Mallocator, GCAllocator, Freelist, StackRegion, Region etc) but not the more involved ones, such as
coalescing heaps, buddy system etc.

The problem I see is that allocators are rooted in a poor foundation - malloc is so out of fashion (its interface is simply too thin on guarantees), and sbrk is frankly a stone-age syscall.

I would love to make the root allocator one built on top of mmap/VirtualAlloc. This leads me to my biggest problem with classical memory management - ~20 years of PCs having virtual memory support directly in the CPU, and yet hardly anything outside of the OS takes advantage of it. Allocators seem simply unaware of it - nothing in their interfaces implies anything that would help the user take advantage of it. I'm talking about (potentially) large allocations that may often be resized.

(Large allocations are actually almost always a blind spot/fallback in allocators.)

To the point - we have address space and, optionally, memory committed within it, leading us to 3 states of an _address range_: available, reserved, committed. Ranges in the 2nd state are dirt cheap (abundant on 64-bit) and can easily be flipped to committed and back.
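A quick illustration with POSIX mmap (VirtualAlloc expresses the same states with MEM_RESERVE/MEM_COMMIT); just a sketch:

import core.sys.posix.sys.mman;

void threeStates()
{
    enum size_t size = 1024 * 1024 * 1024;   // 1G of address space
    // available -> reserved: an address range with nothing usable behind it
    void* p = mmap(null, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    // reserved -> committed: make the first 64K readable/writable
    mprotect(p, 64 * 1024, PROT_READ | PROT_WRITE);
    // committed -> reserved: discard the pages by mapping PROT_NONE over
    // them again (madvise(MADV_DONTNEED) is the other usual way)
    mmap(p, 64 * 1024, PROT_NONE,
         MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
    // reserved -> available
    munmap(p, size);
}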

So I have a pattern I want to try to fit into your design. At present it is a data structure + allocation logic. What I would want is clean and nice composition for ultimate reuse.

An example is a heavy-duty array (could be used for a potentially large stack). The requirements are:
a) Never reallocate on resize - in certain cases it may even be too large to reallocate in RAM (!)
b) Never waste RAM beyond some limit (a few pages or a tiny fraction of the size is fine)
c) O(1) amortized appending as in a plain array, plus the other properties of an array -> it's a dynamic array after all

Currently I use mmap/madvise and related entities directly. It goes as follows.

Allocate a sufficiently large _address range_ - as large as "it" may ever get (1Mb, 1G, 10G, you name it). Call this size the 'capacity'. Commit some memory (optionally on first resize) - call this the 'committed'. Appending uses up committed space just as a typical array would. Whenever it needs more than that, it applies the same growth algorithm as usual, but to the (committed, capacity) pair. Ditto on resizing back - (with some amortization) it asks the OS to decommit pages, decreasing the committed amount.
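A simplified sketch of the pattern (POSIX-only, growth path only; ReservedArray and its members are my names, not part of any proposal):

import core.sys.posix.sys.mman;

// Reserve the whole address range up front, commit pages lazily as the
// array grows. Shrinking/decommit is left out for brevity.
struct ReservedArray(T)
{
    private ubyte* base;       // start of the reserved address range
    private size_t capacity;   // bytes reserved - the absolute maximum
    private size_t committed;  // bytes currently committed (usable)
    size_t length;             // elements in use

    enum pageSize = 4096;      // real code would query this once

    @disable this(this);       // no copying - owns the mapping

    this(size_t maxBytes)
    {
        capacity = (maxBytes + pageSize - 1) / pageSize * pageSize;
        // Reserve address space only; nothing is committed yet.
        base = cast(ubyte*) mmap(null, capacity, PROT_NONE,
                                 MAP_PRIVATE | MAP_ANON, -1, 0);
        assert(base !is cast(ubyte*) MAP_FAILED);
    }

    ~this() { if (base) munmap(base, capacity); }

    void opOpAssign(string op : "~")(T value)
    {
        immutable needed = (length + 1) * T.sizeof;
        if (needed > committed)
        {
            // Same growth policy as a plain dynamic array, but applied to
            // the (committed, capacity) pair - the data never moves.
            size_t newCommitted = committed ? committed * 2 : pageSize;
            if (newCommitted < needed)
                newCommitted = (needed + pageSize - 1) / pageSize * pageSize;
            assert(newCommitted <= capacity, "hit the absolute maximum");
            // Flip the new pages from reserved to committed.
            mprotect(base + committed, newCommitted - committed,
                     PROT_READ | PROT_WRITE);
            committed = newCommitted;
        }
        (cast(T*) base)[length++] = value;
    }
}

Usage is plain-array-like: construct it with, say, a gigabyte of capacity, then append with ~= - only the committed pages ever cost physical memory.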

In short - a C++ "vector" with 2 capacities (current and absolute maximum), with the plus that it never reallocates. What to do if/when it hits the upper bound is up to the specific application (terminate, fall back to something else). It has the danger of sucking up address space on 32 bits though... but who cares :)



Somewhat related: http://probablydance.com/2013/05/13/4gb-per-vector/
