Yes, I thought of that, but it always felt like a weird idea to me. I
can't really explain why... Clemens, what do you think about this? I
was imagining something like skipping the parts of a link that are the same
as in the previous link... and now I know where I got that :)
This seems dangerous to me, since Lucene's analyzers are free to take liberties with tokens, such as stemming them and filtering out stop words. So a URL like
/path/to/foo
might get mapped to
/path/foo
if you used a stopword analyzer, since "to" is a typical stop word.
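
To make the hazard concrete, here is a toy sketch in plain Java. It does not call Lucene at all: the StopWordPathDemo class and its three-word stop list are made up for illustration, and only stand in for what a stop-word filter might do to the segments of a path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in for a stop-word filter; this is NOT Lucene's analyzer API.
class StopWordPathDemo {
    // Hypothetical stop list, chosen only for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("to", "the", "a"));

    // Splits a path on '/', drops stop words, and joins what is left.
    static String filter(String path) {
        List<String> kept = new ArrayList<>();
        for (String segment : path.split("/")) {
            if (!segment.isEmpty() && !STOP_WORDS.contains(segment)) {
                kept.add(segment);
            }
        }
        return "/" + String.join("/", kept);
    }
}
```

With this sketch, StopWordPathDemo.filter("/path/to/foo") and StopWordPathDemo.filter("/path/foo") both come out as /path/foo, so two different URLs would become indistinguishable.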

A very common trick for compressing paths is this: give each known URL prefix a code, and store each longer path as the code of its parent prefix plus its last segment. Example:

/foo -> 1 = ("foo")
/foo/bar -> 2 = (1, "bar")
/foo/blah -> 3 = (1, "blah")
/foo/bar/moo -> 4 = (2, "moo")

This trick is used often in caching, to reduce the number of lookups required to find an element in a hierarchical cache.
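
A minimal sketch of the scheme above, in Java. The PrefixCoder class, its method names, and the choice of code 0 for the empty root are my own assumptions for illustration, not anything from Lucene or from a particular cache implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical prefix coder: each known path prefix gets an integer code,
// stored as (code of parent prefix, last segment). Code 0 is the empty root.
class PrefixCoder {
    // Lookup key is parentCode + "/" + segment; segments never contain '/'.
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<Integer> parents = new ArrayList<>();
    private final List<String> segments = new ArrayList<>();

    PrefixCoder() {
        parents.add(-1);   // the root has no parent
        segments.add("");
    }

    // Returns the code for a path, assigning new codes to unseen prefixes.
    int encode(String path) {
        int code = 0;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) continue;
            String key = code + "/" + segment;
            Integer next = codes.get(key);
            if (next == null) {
                next = parents.size();
                parents.add(code);
                segments.add(segment);
                codes.put(key, next);
            }
            code = next;
        }
        return code;
    }

    // Rebuilds the full path by walking parent links back to the root.
    String decode(int code) {
        StringBuilder sb = new StringBuilder();
        while (code != 0) {
            sb.insert(0, "/" + segments.get(code));
            code = parents.get(code);
        }
        return sb.toString();
    }
}
```

Encoding /foo, /foo/bar, /foo/blah and /foo/bar/moo in that order on a fresh PrefixCoder yields the codes 1, 2, 3 and 4 from the example, and decode(4) gives back /foo/bar/moo. Each path costs one map lookup per segment, and shared prefixes are stored only once.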



--
Brian Goetz
Quiotix Corporation
[EMAIL PROTECTED] Tel: 650-843-1300 Fax: 650-324-8032

http://www.quiotix.com

