On Tuesday, January 20, 2004, at 07:56, [EMAIL PROTECTED] wrote:

I believe that parrot already has the concept of memory pools in its memory management. The idea is that by allocating similarly sized objects from separate (large) allocations, you can reduce the fragmentation of chunks and reduce the incidence of cases where memory needs to be GC'd and compacted.

Don't presume that. General-purpose memory allocators tend to handle this scenario... generically. Use of custom allocators is usually a premature optimization; the best generic memory allocators are very, very good.


But the ultimate effect of the overuse of custom allocators or pools is to move memory fragmentation to a higher level, preventing the generic allocator from addressing it even when the memory allocation patterns would otherwise allow it.

If the allocation of pools, and the allocation of bits-of-a-pool, are macroized, it becomes possible for OSes that have multiple APIs for memory allocation to bypass the CRT memory allocation routines and use whichever native APIs are best suited to the type of allocation.

I would much rather see them abstracted, as they are now, beneath an API. This reduces the number of ways in which compiled parrot extensions can be incompatible with the parrot core.


Personally, I would like to see memory allocation for each class type be managed by the class constructor itself. This would theoretically allow each class that has a fixed instance size to manage its own pool on OSes where that makes sense. The class would allocate a pool for itself when loaded, then allocate instances from that pool on new() and deallocate upon DESTROY. If its memory pool was exhausted when new() was called, it would invoke the GC on *its pool only*.

A PMC class is free to do precisely that. Only the PMC header cannot be thus managed, and that's already pooled.


This separation would mean that each run of the GC would have a much smaller pool of memory to compact and garbage collect when it was invoked.

This is false. The mark phase will still need to run over the entire process, else it cannot detect all references into the pool. (Unless you can provide a type-safety guarantee that your type can only be referenced by other instances of itself. In which case, your type is necessarily "garbage" and the best optimization strategy would be dead-code elimination. :)


I predict the best overall throughput would be with the sweep or copy phase run immediately after the mark phase, across the entire process: It would be wasteful to do all that marking and yet leave known garbage uncollected.

It would also be less likely to be called, as each allocation from a pool of fixed-size sub-allocations will only ever need to call the GC when its pool is entirely exhausted.

Statistically, multiple pools will individually exhaust themselves *MORE* frequently than a global pool. The ideal case for collection frequency is that there is only one pool, or that all pools rise to capacity concurrently. In these ideal cases, the segregated pools will exhaust themselves precisely *AS* frequently as a global pool. In any case, there is no possibility for a decrease in collection events through the use of pooled memory. As per above, collection events will also not become less expensive. Thus, expect garbage collection to have a negative net impact on performance if pooled memory is used.


But that is a radical departure :), so I would just like to see separate calls for pool allocation/reallocation and element allocation/reallocation, rather than having calls to malloc() scattered throughout the codebase.

Uh. See memory.c and smallobject.c...



Digression:


Most copying collectors avoid free-space fragmentation by design. (That's one of their benefits.) An allocator such as might be employed by a standard compacting, generational system (Java, the MS CLR) might resemble the following:

#define SYS_ALIGN sizeof(double) /* for instance */

struct generation {
    volatile char *next; /* bump pointer                  */
    char *top;           /* one past the end of the arena */
    char *base;          /* start of the arena            */
};

struct generation *generation_init(size_t capacity) {
    struct generation *g = (struct generation *) malloc(sizeof(struct generation));
    g->next = g->base = (char *) malloc(capacity);
    g->top = g->base + capacity;
    return g;
}

void *generation_alloc(struct generation *p, size_t len) {
    len = (len + SYS_ALIGN - 1) & ~(SYS_ALIGN - 1); /* round up */
    for (;;) {
        /* ATOMIC_ADD returns the post-increment value of p->next. */
        char *ptr = ATOMIC_ADD(&p->next, len);
        if (ptr <= p->top) {
            return ptr - len; /* break the loop */
        }
        gc(); /* exhausted: collect, then try again */
    }
}

+ No free space fragmentation at all.
+ Super-duper fast.
+ Threadsafe.
+ Lock-free unless ATOMIC_ADD cannot be implemented without a mutex.
- There is copying overhead when the generation is exhausted.
- One could say that the generation is fragmented by garbage.
+ It is no more fragmented by garbage than a GC system which uses a freelist allocator.




Gordon Henriksen
[EMAIL PROTECTED]
