Yes, I thought of that, but it always felt like a weird idea to me. I can't really explain why... Clemens, what do you think about this? I was imagining something like skipping the link parts that are the same as in the previous link... and now I know where I got that :)

This seems dangerous to me, since Lucene is free to take liberties with tokens, such as stemming and filtering out stop words. So a URL like
/path/to/foo
might get mapped to
/path/foo
if you used a stopword analyzer.
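To make the risk concrete, here is a minimal sketch of what a stop-word analyzer does to a path. It is not Lucene's API: `naive_analyze`, `rebuild_path` and the tiny `STOP_WORDS` set are hypothetical stand-ins that only mimic the general behaviour (lowercase, split on punctuation, drop stop words).

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption, not Lucene's set).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in"}


def naive_analyze(text: str) -> list:
    """Lowercase, split on non-alphanumeric runs, and drop stop words.

    Illustrates the idea of a stop-word analyzer; not a real Lucene analyzer.
    """
    tokens = re.split(r"[^0-9a-zA-Z]+", text.lower())
    return [t for t in tokens if t and t not in STOP_WORDS]


def rebuild_path(tokens: list) -> str:
    # Reassemble the surviving tokens as if they were path segments.
    return "/" + "/".join(tokens)


print(rebuild_path(naive_analyze("/path/to/foo")))  # /path/foo
```

The "to" segment is silently discarded, so two different URLs can collapse into the same token sequence; indexing the path as a single untokenized value avoids this.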
A very common trick for compressing paths is this: give each known URL prefix a code. Example:
/foo -> 1 = ("foo")
/foo/bar -> 2 = (1, "bar")
/foo/blah -> 3 = (1, "blah")
/foo/bar/moo -> 4 = (2, "moo")
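A minimal sketch of the prefix-code table from the example, assuming codes are handed out in first-seen order starting at 1 and that a top-level prefix has no parent (stored as `None`). `PrefixCodec`, `encode` and `decode` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Each code maps to (parent_code, last_segment); top-level prefixes use None.
Entry = Tuple[Optional[int], str]


class PrefixCodec:
    def __init__(self) -> None:
        self._code_of: Dict[Entry, int] = {}   # (parent, segment) -> code
        self._entries: Dict[int, Entry] = {}   # code -> (parent, segment)
        self._next_code = 1

    def encode(self, path: str) -> int:
        """Return the code for path, assigning codes to any unseen prefixes."""
        parent: Optional[int] = None
        for segment in path.strip("/").split("/"):
            key = (parent, segment)
            code = self._code_of.get(key)
            if code is None:
                code = self._next_code
                self._next_code += 1
                self._code_of[key] = code
                self._entries[code] = key
            parent = code
        assert parent is not None  # split() always yields at least one segment
        return parent

    def decode(self, code: int) -> str:
        """Walk parent links back to the root and rebuild the full path."""
        segments: List[str] = []
        current: Optional[int] = code
        while current is not None:
            current, segment = self._entries[current]
            segments.append(segment)
        return "/" + "/".join(reversed(segments))


codec = PrefixCodec()
for p in ["/foo", "/foo/bar", "/foo/blah", "/foo/bar/moo"]:
    print(p, codec.encode(p))  # codes 1, 2, 3, 4, as in the table
print(codec.decode(4))  # /foo/bar/moo
```

Because each entry stores only a parent code plus one segment, a shared prefix such as `/foo` is kept once no matter how many paths sit under it.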
This trick is often used in caching, to reduce the number of lookups needed to find an element in a hierarchical cache.
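The idea of skipping the link parts that are the same as in the previous link is commonly called front coding (incremental encoding): over a sorted list, each entry is stored as the length of the prefix it shares with its predecessor plus the remaining suffix. A minimal character-level sketch, with hypothetical function names; a real index might share prefixes per path segment instead.

```python
import os.path
from typing import List, Tuple


def front_encode(sorted_paths: List[str]) -> List[Tuple[int, str]]:
    """Store each path as (chars shared with the previous path, remaining suffix)."""
    encoded = []
    previous = ""
    for path in sorted_paths:
        shared = len(os.path.commonprefix([previous, path]))
        encoded.append((shared, path[shared:]))
        previous = path
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild each path from the previous path's prefix plus the stored suffix."""
    paths = []
    previous = ""
    for shared, suffix in encoded:
        path = previous[:shared] + suffix
        paths.append(path)
        previous = path
    return paths


paths = sorted(["/foo", "/foo/bar", "/foo/blah", "/foo/bar/moo"])
print(front_encode(paths))  # [(0, '/foo'), (4, '/bar'), (8, '/moo'), (6, 'lah')]
```

Unlike the prefix-code table, front coding has no random access: decoding an entry requires decoding its predecessors, which is why it suits sequentially scanned, sorted data.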
--
Brian Goetz
Quiotix Corporation
[EMAIL PROTECTED] Tel: 650-843-1300 Fax: 650-324-8032
http://www.quiotix.com
