On Wednesday 15 December 2004 19:29, Mike Snare wrote:

> In my case, the words are keywords that must remain as is, searchable
> with the hyphen in place. It was easy enough to modify the tokenizer
> to do what I need, so I'm not really asking for help there. I'm
> really just curious as to why it is that "a-1" is considered a single
> token, but "a-b" is split.
a-1 is treated as a typical product name that needs to stay unchanged
(there's a comment in the source that mentions this). Indexing
"hyphen-word" as two tokens has the advantage that it can then be found
with the following queries:

hyphen-word (will be turned into a phrase query internally)
"hyphen word" (phrase query)

(It cannot be found by searching for hyphenword, however.)

Regards
 Daniel

--
http://www.danielnaber.de
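To illustrate the behaviour described above, here is a toy sketch in Python. It is not Lucene's tokenizer or query parser: the names toy_tokenize and phrase_match are hypothetical, and the rule "keep a hyphenated chunk whole if it contains a digit" is an assumed simplification of what the real grammar does.

```python
import re


def toy_tokenize(text):
    """Toy stand-in for the tokenizer (assumption, not Lucene's grammar).

    A hyphenated chunk that contains a digit (e.g. "a-1") is kept as one
    token, like a product name; any other hyphenated chunk is split.
    """
    tokens = []
    for chunk in re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text.lower()):
        if any(ch.isdigit() for ch in chunk):
            tokens.append(chunk)             # "a-1" stays intact
        else:
            tokens.extend(chunk.split("-"))  # "hyphen-word" -> two tokens
    return tokens


def phrase_match(doc_tokens, query_tokens):
    """True if query_tokens occur as consecutive tokens in doc_tokens."""
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


doc = toy_tokenize("a hyphen-word and part a-1")
found_hyphenated = phrase_match(doc, toy_tokenize("hyphen-word"))
found_phrase = phrase_match(doc, toy_tokenize("hyphen word"))
found_joined = phrase_match(doc, toy_tokenize("hyphenword"))
found_product = phrase_match(doc, toy_tokenize("a-1"))
```

Because the query text goes through the same tokenizer as the indexed text, hyphen-word and "hyphen word" both become the two-token phrase and match, while hyphenword stays a single token that was never indexed.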