That could have its own problems.  If they are labelled -1, -2, ...
then phrase searching would have to match *backwards* for negative
numbers.  Then if true positions overflowed into negative numbers,
...very negative number, then it is essentially starting from a very
large (unsigned) location.  Thoughts?

It's pretty easy to come up with an n-bit integer that should be long enough for practical purposes. 2^16 = 65,536, which is probably still a bit too small for the maximum number of words in a document. But 2^24 gives us about 16.7 million words (16,777,216), which is good enough for War and Peace. (I'm checking at the moment.)
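
As a rough check on those sizes, here is a small Python sketch. It is not ht://Dig code, and the helper names are made up for illustration. It prints how many locations an n-bit unsigned field can hold, and shows how a location past the signed range reads back as a negative number, which is the overflow case raised above.

```python
def max_locations(bits):
    """Number of distinct word locations an unsigned field of `bits` bits can hold."""
    return 1 << bits


def as_signed(value, bits):
    """Reinterpret an unsigned `bits`-wide value as a two's-complement signed value."""
    value &= (1 << bits) - 1          # keep only the low `bits` bits
    if value >= 1 << (bits - 1):      # top bit set: the value reads as negative
        return value - (1 << bits)
    return value


if __name__ == "__main__":
    for bits in (16, 24, 32):
        print(bits, "bits:", max_locations(bits), "locations")
    # The first location past the signed 32-bit range wraps to a very negative number.
    print(as_signed(2**31, 32))
```

So if negative numbers were reserved for special labels, a true location that overflowed would be indistinguishable from one of those labels unless the field is wide enough that overflow never happens in practice.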


Regarding flexibility, we could make  htsearch  treat words separated
by "invalid" punctuation (but no spaces) as a phrase, and make the
default  valid_punctuation  empty.  That way people who want the
current functionality can have it (except queries where words are not
separated by spaces but *should* match those words separately?) but
the default would be less buggy for phrase searches.

Sounds sensible to me--but I think we need more than one or two voices on this. But just to make sure I'm clear on what you want to do...


status-quo -> status (location 0) + quo (location 1)

And there's no entry for "statusquo"
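
To make the example concrete, here is a minimal Python sketch of that splitting rule. It is a simplification for discussion, not the actual ht://Dig word parser: the function name `index_words` is hypothetical, and minimum word length is ignored. Characters listed in `valid_punctuation` are dropped and the pieces are joined into one word; any other punctuation acts as a word break, so the pieces get consecutive locations.

```python
def index_words(text, valid_punctuation=""):
    """Split `text` into (word, location) pairs.

    Punctuation in `valid_punctuation` is removed and the surrounding pieces
    are joined into one word. Any other non-alphanumeric character ends the
    current word, so the pieces are stored at consecutive locations.
    """
    words = []
    current = []
    for ch in text:
        if ch.isalnum():
            current.append(ch.lower())
        elif ch in valid_punctuation:
            continue                  # drop it and keep building the same word
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return [(word, location) for location, word in enumerate(words)]


if __name__ == "__main__":
    # Proposed default: empty valid_punctuation, so the hyphen splits the word.
    print(index_words("status-quo"))
    # With "-" listed as valid punctuation, only the joined word is produced.
    print(index_words("status-quo", valid_punctuation="-"))
```

With the empty default, a query for "status-quo" could then be run as a phrase search for "status" at location n followed by "quo" at location n+1.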

For some people, punctuation has meaning. Let's say we have part
numbers or dates. "3/24/03" isn't really the same as "32403" and
I'm not sure the phrase search works well either.

Ah, yes. All three would be too short to be indexed... But isn't that what extra_word_characters is for?

Yes. But my point is that we should eventually work out a WordToken class or something that wraps up all these attributes and can be generalized for Unicode-type issues.
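
Purely as a strawman for what such a wrapper might look like, here is a Python sketch. The class name `WordTokenRules` and its layout are assumptions for illustration, not an existing ht://Dig class. The three fields mirror the valid_punctuation, extra_word_characters and minimum_word_length attributes, and the default length of 3 is an assumption of this sketch. The character test uses `str.isalnum()`, which is Unicode-aware in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordTokenRules:
    """Hypothetical bundle of the attributes that decide what counts as a word."""
    valid_punctuation: str = ""
    extra_word_characters: str = ""
    minimum_word_length: int = 3      # assumed default for this sketch

    def is_word_char(self, ch):
        # Letters and digits in any script, plus explicitly allowed extras.
        return ch.isalnum() or ch in self.extra_word_characters

    def tokens(self, text):
        """Return the indexable words in `text`, in order."""
        words, current = [], []
        for ch in text:
            if self.is_word_char(ch):
                current.append(ch.lower())
            elif ch in self.valid_punctuation:
                continue              # removed, but does not break the word
            elif current:
                words.append("".join(current))
                current = []
        if current:
            words.append("".join(current))
        # Words shorter than the minimum length are not indexed.
        return [w for w in words if len(w) >= self.minimum_word_length]


if __name__ == "__main__":
    print(WordTokenRules().tokens("3/24/03"))
    print(WordTokenRules(extra_word_characters="/").tokens("3/24/03"))
```

With the defaults, "3/24/03" splits into three pieces that are all too short to index; with "/" added to extra_word_characters it survives as a single word.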


-Geoff



_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
