On Monday, 11 June 2012 at 14:48:27 UTC, Dmitry Olshansky wrote:
On 11.06.2012 16:11, Roman D. Boiko wrote:
On Tuesday, 5 June 2012 at 13:27:12 UTC, Roman D. Boiko wrote:
maybe it is better to avoid immutability... or do bulk ins/del before copy.
Would it be difficult to introduce two optimizations:
* have strings of all lengths above a configurable threshold go
to the same bucket (is that reasonable?)
The sample I listed did just that: for length >= 64 it goes to
the bucket of 0-length strings. After all, it's your prefix
function, with any distribution you like ;)
Great
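Just to make sure I understand the idea, here is roughly how I picture such a prefix function. A minimal sketch in D; the threshold and the names are mine, not your actual code:

// Sketch only: everything at or above some threshold shares a bucket
// with the zero-length strings, as in your sample.
enum size_t lengthThreshold = 64;

size_t lengthBucket(const(char)[] s) pure nothrow @nogc
{
    return s.length >= lengthThreshold ? 0 : s.length;
}

unittest
{
    assert(lengthBucket("") == 0);
    assert(lengthBucket("foo") == 3);
    assert(lengthBucket(new char[100]) == 0); // long strings collapse into bucket 0
}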
* have the ability to store a checkpoint, then do a series of
modifications, and store another checkpoint, reusing data which
has not been affected.
yea, that's a rough sketch of what I want to support.
That's wonderful :)
For the second optimization, a possible design would be to
store the internal state as a snapshot + diffs, and apply the
diffs when creating another snapshot. The diff format should
allow trie interface operations to be performed efficiently.
It may be. The diff could be a bunch of pages that are XOR-ed on
top of the snapshot. Dunno if it's a worthwhile trick yet.
It should be, provided that a single insert/delete doesn't
affect many pages and the page size is reasonably small. Only
the affected pages need to be created, and there is no need to
track separate diffs between snapshots, so they can be combined.
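To sketch what I mean by reusing untouched pages (my own toy illustration, not the actual implementation; XOR-ing the changed pages, as you suggested, would have the same structure):

enum size_t pageSize = 256; // "reasonably small" page, value picked arbitrarily

struct Snapshot
{
    immutable(int)[][] pages; // pages are shared between snapshots by reference
}

struct Diff
{
    immutable(int)[][size_t] changedPages; // only the pages that were touched
}

// Building the next snapshot: shallow-copy the page table and swap in
// the modified pages; every untouched page is reused as-is.
Snapshot apply(Snapshot base, Diff diff)
{
    auto pages = base.pages.dup;
    foreach (idx, page; diff.changedPages)
        pages[idx] = page;
    return Snapshot(pages);
}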
Another option might be creating a separate trie for insertions
and tracking deletions somehow, provided that tries can be merged
efficiently.
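Something along these lines, using plain sets instead of real tries just to show the shape of the idea (all names are made up):

// Hypothetical overlay: new strings go into `inserted`, removed ones are
// only marked in `deleted`; a lookup consults the overlay before the base.
struct Overlay
{
    bool[string] inserted;
    bool[string] deleted;

    bool contains(const bool[string] base, string s) const
    {
        if (s in deleted) return false;
        return (s in inserted) !is null || (s in base) !is null;
    }

    // Merging folds the overlay back into the base set in one bulk pass.
    void mergeInto(ref bool[string] base)
    {
        foreach (s, _; inserted) base[s] = true;
        foreach (s, _; deleted) base.remove(s);
        inserted = null;
        deleted = null;
    }
}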
Actually, in my case, deletions could be deferred and performed
in bulk. OTOH, I will need to count how many times a string has
been inserted minus the number of times it has been deleted.
Alternatively, I could just check from time to time which strings
are not needed any more (micro-GC :) ).
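In code, the counting plus occasional sweep could look roughly like this (again just my own sketch):

// Per-string counter: insertions minus deletions. Strings whose counter
// reaches zero are only reclaimed during an explicit sweep ("micro-GC").
struct StringCounts
{
    int[string] counts;

    void add(string s)
    {
        if (auto p = s in counts) ++*p;
        else counts[s] = 1;
    }

    void release(string s)
    {
        if (auto p = s in counts) --*p;
    }

    void sweep()
    {
        string[] dead;
        foreach (s, n; counts)
            if (n <= 0) dead ~= s;        // collect first: don't mutate while iterating
        foreach (s; dead)
            counts.remove(s);
    }
}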
There are many possible data structures, but the one you
mentioned seems to be the most reasonable.