On Monday, 4 June 2012 at 15:35:31 UTC, Dmitry Olshansky wrote:
On 04.06.2012 18:19, Andrei Alexandrescu wrote:
On 6/4/12 4:46 AM, Dmitry Olshansky wrote:
Cross-posting from my GSOC list as I think we had a lack of D rox posts
lately :)
[snip]

I think it would be great if you converted this post into an article.

Andrei


Sounds good, will do once I fix few issues that were mentioned (bit-packing, GC types etc.)

Would be interesting to see some more examples, along with rationale/motivation for various aspects of your API, and possible usage scenarios.

Tries are fundamental (both data structures and respective algorithms) for lookup problems, in the same way as arrays are fundamental for indexed access.

For example, they have several advantages over hash tables. Hash calculation requires const * key.length operations, which is equivalent to the number of comparisons needed for trie lookup. But hash tables may be less space efficient, or lead to hash collisions which increase lookup time. Also, it is possible to create persistent (immutable) tries with efficient (log N) inserting / deleting (this scenario is very important for my DCT project). Immutable hash tables would require 0(N) copying for each insert / delete.

It is difficult to create a good API for fundamental data structures, because various use cases would motivate different trade-offs. The same is true for implementation. This is why I like your decision to introduce policies for configuration. Rationale and use cases should help to analyze design of your API and implementation, thus you will get better community feedback :)

Below are some notes related to my DCT use cases.

Your examples deal with lookup by the whole word (first/last characters and length are needed). Are your API and implementation adaptable for character-by-character trie lookup?

Will compile-time generation of lookup code based on tries be supported? Example which is currently in DCT (first implemented by Brian Schott in his Dscanner project) uses switch statements (which means lookup linear in number of possible characters at each position). A trivial improvement might be using if statements and binary lookup. (E.g., if there are 26 possible characters used at some position, use only 5 comparisons, not 26).

I wanted to analyse your regex implementation, but that's not an easy task and requires a lot of effort... It looks like the most promising alternative to binary trie lookup which I described in previous paragraph. Similarities and differences with your regex design might also help us understand tries better.

Reply via email to