often i try to implement an append-only random-access data structure. it's a fun puzzle for my confused mind. i never finish it, but today i thought about it a little bit and i think over the years i might be able to put together a simplified form and maybe make it exist some day.
here is where i wrote right now. i started spinning out when I renamed Simple.Write to Simple.RangedData . I was in the middle of implementing a leaf iterator, so as to make a simple way to count the number of data leaves and associate them with a depth. the approach, as written in the file, is to simply relink to leaves with excessive depth each flush, so as to maintain O(log n) random seek time. i vaguely recall that approach has a major issue, but i might be remembering wrongly, and it's still quite exciting to be able to spend some minutes trying.
# block tree structure # blocks written to are leaves # depth is log2 leaves # when a flush is made, all blocks are written, and also enough nodes such that every leaf can be accessed within depth lookups. # consider we have an existing tree # with say m flushes, containing n leaves (or m leaves). we'll likely call it n. # each flush shows which leaves it has # additionally, with the final flush, each leaf has an existing depth. # when we reflush, we need to provide a new index for any leaves that become too deep. # which leaves are too deep? # we could basically walk them all to find out. this would be a consistent first approach. class Simple: class RangedData: def __init__(self, offset, data): self.start = offset self.data = data self.end = self.start + len(self.data) class Flush: # flush has a list of new leaves, and a list of indexes to old leaves with ranges def __init__(self, *writes, prev_flush=None): self.prev_flush = prev_flush self.data = writes # find extents start = min((write.start for write in self.data)) end = max((write.end for write in self.data)) if prev_flush is None: self.start = start self.end = end self.index = [] return self.start = min(start, prev_flush.start) self.end = max(end, prev_flush.end) self.index = [(prev_flush.start, prev_flush.end, prev_flush)] # find leaf count and leaf depths offset = start while offset < end: #def lookup(self, offset def leaves(self, start = None, end = None): offset = self.start data_iter = iter(self.data) index_iter = iter(self.index) next_write = next(data_iter, None) next_index = next(index_iter, None) while offset < self.end: if next_write is not None: assert offset <= next_write.start if offset == next_write.start: yield (0, next_write) continue def __init__(self, latest = None): self.latest = latest self.pending = [] def write(self, offset, data): self.pending.append((offset, data)) def flush(self): self.latest = self.Flush(*self.pending, prev_flush=self.latest) self.pending = []