On Monday, 11 June 2012 at 14:48:27 UTC, Dmitry Olshansky wrote:
> On 11.06.2012 16:11, Roman D. Boiko wrote:
>> On Tuesday, 5 June 2012 at 13:27:12 UTC, Roman D. Boiko wrote:
>>> maybe it is better to avoid immutability... or do bulk ins/del before
>>> copy.
>> Would it be difficult to introduce two optimizations:
>> * have strings of all lengths above a configurable threshold go to the same bucket (is it reasonable?)

> The sample I listed did just that: for length >= 64 it goes to the bucket of 0-length strings. After all, it's your prefix function, with any distribution you like ;)
Great
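
Just to check that I read the prefix-function idea correctly, a trivial length-based bucketing function along these lines is what I have in mind (the 64 threshold and the names are purely illustrative, not your actual code):

import std.array : replicate;
import std.stdio : writeln;

// Hypothetical length-based prefix function: every length below the
// threshold gets its own bucket; anything longer falls into the bucket
// used for 0-length strings.
size_t lengthBucket(string s, size_t threshold = 64)
{
    return s.length >= threshold ? 0 : s.length;
}

void main()
{
    writeln(lengthBucket("if"));                // 2
    writeln(lengthBucket("foreach"));           // 7
    writeln(lengthBucket(replicate("a", 128))); // 0 -- shares the 0-length bucket
}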

>> * have the ability to store a checkpoint, then do a series of modifications, and store another checkpoint but reuse the data that has not been affected.

> yea, that's a rough sketch of what I want to support.
That's wonderful :)

>> For the second optimization, a possible design would be to store the internal state as a snapshot plus diffs, and apply the diffs when creating another snapshot. The diff format should allow performing the trie interface operations efficiently.

> It may be. Diff could be a bunch of pages that are XOR-ed on top of the snapshot. Dunno if it's a worthwhile trick yet.
It should be, provided that a single insert/delete doesn't affect many pages and the page size is reasonably small. We only need to create new copies of the pages that are actually affected; and there is no need to track separate diffs between snapshots, so they can be combined.
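
To show what I mean by creating only the affected pages, here is a rough copy-on-write sketch in D (the paged store, the names and the page size are all made up for illustration; it is not your actual trie layout):

enum pageSize = 16;

struct PagedStore
{
    int[][] pages;  // page table: each entry is one page of `pageSize` ints

    // Snapshot is O(number of pages): duplicate the page table, share the pages.
    PagedStore snapshot()
    {
        return PagedStore(pages.dup);
    }

    void opIndexAssign(int value, size_t i)
    {
        auto p = i / pageSize;
        if (p >= pages.length)
            pages.length = p + 1;
        // Copy only the affected page (a real implementation would skip
        // the copy for pages it already owns exclusively).
        pages[p] = pages[p].length ? pages[p].dup : new int[pageSize];
        pages[p][i % pageSize] = value;
    }

    int opIndex(size_t i)
    {
        return pages[i / pageSize][i % pageSize];
    }
}

void main()
{
    PagedStore a;
    a[0] = 1;
    a[100] = 2;
    auto snap = a.snapshot();
    a[100] = 3;                         // duplicates only the page holding index 100
    assert(snap[100] == 2 && a[100] == 3);
    assert(snap[0] == 1 && a[0] == 1);  // untouched pages stay shared
}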

Another option might be creating a separate trie for insertions and tracking deletions somehow, provided that tries can be merged efficiently.

Actually, in my case, deletions could be deferred and performed in bulk. OTOH, I will need to count how many times a string has been inserted minus the number of times it has been deleted. Alternatively, I could just check from time to time which strings are not needed any more (micro-GC :) ).
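
Roughly this kind of counting is what I have in mind (a minimal sketch, all names made up):

struct StringCounter
{
    int[string] counts;  // inserts minus deletes per string

    void insert(string s)
    {
        if (auto p = s in counts)
            ++*p;
        else
            counts[s] = 1;
    }

    void remove(string s)
    {
        if (auto p = s in counts)
            --*p;  // may reach zero; the actual removal is deferred
    }

    // The "micro-GC": bulk-remove everything that is no longer referenced.
    void sweep()
    {
        foreach (key; counts.keys)  // .keys makes a copy, so removal is safe
            if (counts[key] <= 0)
                counts.remove(key);
    }
}

void main()
{
    StringCounter c;
    c.insert("foo");
    c.insert("foo");
    c.remove("foo");
    c.sweep();
    assert("foo" in c.counts);   // still referenced once, survives the sweep
    c.remove("foo");
    c.sweep();
    assert("foo" !in c.counts);  // swept in bulk once the count reached zero
}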

There are many possible data structures, but the one you mentioned seems to be the most reasonable.
