Re: Making generalized Trie type in D

Roman D. Boiko Mon, 04 Jun 2012 11:43:49 -0700

On Monday, 4 June 2012 at 15:35:31 UTC, Dmitry Olshansky wrote:

> On 04.06.2012 18:19, Andrei Alexandrescu wrote:
>> On 6/4/12 4:46 AM, Dmitry Olshansky wrote:
>>> Cross-posting from my GSOC list as I think we had a lack of Drox posts
>>> lately :)
>>> [snip]
>> I think it would be great if you converted this post into an article.
>> Andrei
> Sounds good, will do once I fix a few issues that were mentioned (bit-packing, GC types etc.)

It would be interesting to see some more examples, along with rationale/motivation for various aspects of your API, and possible usage scenarios.

Tries are fundamental (both the data structures and the respective algorithms) for lookup problems, in the same way as arrays are fundamental for indexed access.

For example, they have several advantages over hash tables. Hash calculation requires const * key.length operations, which is equivalent to the number of comparisons needed for a trie lookup. But hash tables may be less space efficient, or lead to hash collisions which increase lookup time. Also, it is possible to create persistent (immutable) tries with efficient O(log N) inserting / deleting (this scenario is very important for my DCT project). Immutable hash tables would require O(N) copying for each insert / delete.
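
To make the persistent-trie point concrete, here is a minimal sketch of insertion by path copying. It is written in Python rather than D purely for brevity, and the names `Node`, `insert` and `contains` are hypothetical; they are not taken from DCT or from the API discussed in this thread. Each insert copies only the nodes along the key's path and shares every other subtree with the previous version.

```python
class Node:
    """Trie node treated as immutable: a child map plus an end-of-key flag."""
    __slots__ = ("children", "terminal")

    def __init__(self, children=None, terminal=False):
        # Shallow copy: the map is new, the child nodes themselves are shared.
        self.children = dict(children or {})
        self.terminal = terminal


def insert(root, key):
    """Return a new root that contains `key`; `root` itself is not modified."""
    if not key:
        return Node(root.children, True)
    head, tail = key[0], key[1:]
    child = root.children.get(head, Node())
    new_children = dict(root.children)
    new_children[head] = insert(child, tail)  # only this branch is rebuilt
    return Node(new_children, root.terminal)


def contains(root, key):
    """Walk the trie one character at a time; True if `key` was inserted."""
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.terminal


v1 = insert(insert(Node(), "if"), "int")
v2 = insert(v1, "in")  # v1 stays valid and unchanged
```

After this runs, `v1` still reports that "in" is absent while `v2` contains it, and the subtree for "if" is the same object in both versions. The number of copied nodes equals the key length, independent of how many keys are stored.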

It is difficult to create a good API for fundamental data structures, because various use cases would motivate different trade-offs. The same is true for implementation. This is why I like your decision to introduce policies for configuration. Rationale and use cases should help to analyze the design of your API and implementation, thus you will get better community feedback :)


Below are some notes related to my DCT use cases.

Your examples deal with lookup by the whole word (first/last characters and length are needed). Are your API and implementation adaptable for character-by-character trie lookup?
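
As an illustration of what character-by-character lookup means here, the following sketch feeds input to a trie one character at a time and remembers the longest stored word seen so far, the way a lexer would when matching keywords. It is Python rather than D, and `build`, `longest_match` and the `END` sentinel are hypothetical names chosen for this example only.

```python
END = object()  # sentinel key marking "a stored word ends at this node"


def build(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def longest_match(root, text, start=0):
    """Advance through `text` one character at a time from `start`.

    Returns the longest stored word that begins at `start`, or None.
    The word length does not need to be known in advance.
    """
    node, best = root, None
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:      # no stored word continues with this character
            break
        if END in node:       # a complete word ends here; keep going for longer ones
            best = node[END]
    return best


keywords = build(["if", "in", "int", "interface"])
```

With this trie, `longest_match(keywords, "integer")` stops at the character `g` and returns "int", without the caller ever computing the length or last character of a candidate word.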

Will compile-time generation of lookup code based on tries be supported? The example which is currently in DCT (first implemented by Brian Schott in his Dscanner project) uses switch statements (which means lookup is linear in the number of possible characters at each position). A trivial improvement might be using if statements and binary lookup. (E.g., if there are 26 possible characters used at some position, use only 5 comparisons, not 26.)
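
The binary-lookup idea can be sketched as a small code generator that emits nested if statements in binary-search order. This is a Python stand-in for what would be compile-time generation in D; `gen_dispatch` and `dispatch` are hypothetical names, and the generated function simply returns the index of the character among the sorted candidates (or -1), where real lookup code would branch to the next trie level instead.

```python
import string


def gen_dispatch(chars, indent="    "):
    """Emit source for `dispatch(c)`: nested ifs that binary-search `chars`."""
    chars = sorted(set(chars))
    lines = ["def dispatch(c):"]

    def emit(lo, hi, depth):
        pad = indent * depth
        if lo == hi:
            # One candidate left: a single equality check decides.
            lines.append(f"{pad}return {lo} if c == {chars[lo]!r} else -1")
            return
        mid = (lo + hi) // 2
        lines.append(f"{pad}if c <= {chars[mid]!r}:")
        emit(lo, mid, depth + 1)
        lines.append(f"{pad}else:")
        emit(mid + 1, hi, depth + 1)

    if chars:
        emit(0, len(chars) - 1, 1)
    else:
        lines.append(f"{indent}return -1")
    return "\n".join(lines)


source = gen_dispatch(string.ascii_lowercase)
namespace = {}
exec(source, namespace)          # stands in for compile-time code generation
dispatch = namespace["dispatch"]
```

For the 26 lowercase letters, the generated code nests at most five `<=` tests before reaching a leaf, followed by one equality check to reject characters outside the set. Whether this beats a switch in practice depends on how the compiler lowers the switch, so it would need measuring.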

I wanted to analyse your regex implementation, but that's not an easy task and requires a lot of effort... It looks like the most promising alternative to the binary trie lookup which I described in the previous paragraph. Similarities and differences with your regex design might also help us understand tries better.
