On Mon, 03 Oct 2011 14:10:52 -0400, Jacob Carlborg <d...@me.com> wrote:
On 2011-10-03 15:57, Robert Jacques wrote:
So, in essence, you are saying that by the time archiving occurs,
isSliceOf will always return false? Then why is it part of the public API?

No, I'm not saying that. Example:

struct Array
{
        void* ptr;
        size_t length;
        size_t elementSize;
        
        bool isSliceOf (Array b)
        {
                return ptr >= b.ptr &&
                        ptr + length * elementSize <= b.ptr + b.length * b.elementSize;
        }
}

void main ()
{
        auto a = [0, 1, 2, 3, 4];
        auto b = a[2 .. 4];
        
        auto aArr = Array(a.ptr, a.length, 4);
        auto bArr = Array(b.ptr, b.length, 4);
        
        assert(bArr.isSliceOf(aArr));
        assert(!aArr.isSliceOf(bArr));
}

Both asserts in the above code pass as expected. See, no
serialization or archiving in sight. Is there something I'm missing?

That putting isSliceOf in the public API implies its usage by the archiver.

Actually, it does not need to be part of the public API now that I think
about it. I can move it into Serializer. Array would still need to be
public, since both Serializer and Archive need access to it and the
package attribute doesn't work very well.

Regarding design, I agree, although I'd go one further and define Array as a 
public type inside the Serializer class. However, this concept of an 'Array' is 
fundamentally flawed. Consider:

auto c = a[1..3];

auto cArr = Array(c.ptr,c.length,4);

assert(!cArr.isSliceOf(bArr));
assert(!bArr.isSliceOf(cArr));

// and

b ~= 5;

bArr = Array(b.ptr, b.length, 4);

assert(!bArr.isSliceOf(aArr));

In short, a serializer must be capable of handling overlapping arrays, not just 
strict slices. The array representation therefore needs to parameterize both 
the underlying array common to all slices and the actual slice that was 
serialized for that key.
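To make that concrete, here is a minimal sketch of such a representation, using a hypothetical `Slice` struct (not part of Orange's actual API) that carries both the underlying base array and the slice taken from it, and detects aliasing by range overlap rather than strict containment:

```d
struct Slice
{
    void* basePtr;      // start of the underlying array common to all slices
    size_t offset;      // offset of this slice into the base, in elements
    size_t length;      // length of this slice, in elements
    size_t elementSize; // size of one element, in bytes

    // Two slices alias iff they share a base and their element ranges overlap.
    bool overlaps (Slice b)
    {
        return basePtr == b.basePtr &&
            offset < b.offset + b.length &&
            b.offset < offset + length;
    }
}
```

With something like this, the b and c slices from the example above would report overlap even though neither is a slice of the other.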

The solution, in my mind, is to think in terms of memory blocks/chunks.
Every reference can be thought of as pointing to a memory chunk defined by
two pointers and a flag:

{
void* head; // Start of the memory chunk
void* tail; // End of the memory chunk
bool hasAliases; // True if there is more than one reference to this chunk
}

For alias detection / resolution, you build a balanced tree of memory
chunks, widening the chunk and flagging hasAliases as appropriate. Which
should give you O(N log(N)) performance.

I'm not sure I understand. That would require that the arrays are stored
in a contiguous block of memory? Won't "head" and "tail" always point to
the start of the array and the end of the array?

Most of the time, yes, but not all of the time. It would be fair to say
that 'head' and 'tail' would be inside the GC memory region of the array
and are <= or >= the start/end of the array, respectively. More
importantly, both objects and pointers would resolve to memory chunks as
well and be included in the alias resolution algorithm.
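For instance, each reference kind could be mapped to a chunk along these lines (a sketch with hypothetical names; in practice the bounds would come from querying the GC):

```d
struct Chunk
{
    void* head;      // Start of the memory chunk
    void* tail;      // End of the memory chunk
    bool hasAliases; // True if there is more than one reference to this chunk
}

// An array covers [ptr, ptr + length) of its element type.
Chunk chunkOf (T) (T[] arr)
{
    return Chunk(arr.ptr, arr.ptr + arr.length, false);
}

// A pointer covers the single element it points at.
Chunk chunkOf (T) (T* p)
{
    return Chunk(p, p + 1, false);
}
```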

Now I think I start to understand.

I'm assuming you
currently separate object and array alias resolution and don't handle
pointers at all.

Yes. Pointers are handled as well, similarly to
arrays/slices. First it always serializes what the pointer points to.
Then in a post-process step, after all serialization is done, it
replaces all serialized pointers that point to a value that has been
serialized with a reference. If a given pointer doesn't point to a
value that has been serialized, it's left as is.
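That post-process step might look roughly like this sketch (hypothetical names, not Orange's actual code): each serialized pointer is checked against the addresses of the values that were serialized, and rewritten as a reference on a match:

```d
struct Serialized
{
    size_t id;  // id assigned to the value during serialization
    void* addr; // address of the serialized value
}

// Returns the id of the serialized value p points to, or size_t.max
// if p's target was never serialized (the pointer is then left as is).
size_t resolvePointer (void* p, Serialized[] values)
{
    foreach (v; values)
        if (v.addr == p)
            return v.id;
    return size_t.max;
}
```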

Can a pointer point to the interior of an object? To an element of an array?

I can see if this memory chunk approach can be used instead. How will
this be used with the balanced tree? Could you give a simple example?

Well, balanced trees need a comparison function so:

struct Node {
    void* head; // Start of the memory chunk
    void* tail; // End of the memory chunk
    bool hasAliases; // True if there are more than one reference to this chunk
    //... other meta-data, e.g. an ID

    int opCmp(const ref Node b) {
        if(  tail < b.head) return -1;
        if(b.tail <   head) return 1;
        return 0;
    }
}

On equality / assignment, one just has to combine the heads and tails with 
min/max, respectively, and update hasAliases, etc. The difficulty is when a new 
node 'bridges the gap' between two existing nodes. This has to be handled 
explicitly as part of the tree re-balancing, but you may want to consider 
making the merging of nodes part of the comparison operator for 
simplicity/efficiency:

head = min(head,b.head);
tail = max(tail,b.tail);
hasAliases = true;

After pruning, updating meta-data, etc, the aliased memory chunk for any given 
pointer can be found using a separate comparison operator:

    int opCmp(const ref void* b) {
        if(tail < b   ) return -1;
        if(b    < head) return 1;
        return 0;
    }

// Which you'd probably use like:

if( auto node = arr.ptr in setOfAliases ) {
    auto offset = arr.ptr - node.head;
    //...
} else { //...
}


So at the pseudo-code level, it would be something like:

foreach(obj; [objects,pointers,arrays]) {
   auto node = Node(obj);
   setOfAliases.insert(node);
   // Convert obj into an intermediate form
}

setOfAliases.pruneUnaliasedNodes;
setOfAliases.postProcessMetadata; // i.e. assign unique alias ID, etc

foreach(obj; [objects,pointers,arrays]) {
    if( auto node = obj.ptr in setOfAliases ) {
        auto offset = obj.ptr - node.head;
        // Perform an aliased archival
    } else {
        // Perform an unaliased archival
    }
}

I hope that helped, though I'm not sure if I really answered your question or 
not.
