: book Managing Gigabytes, making "*string*" queries drastically more
: efficient for searching (though also impacting index size). Take the
: term "cat". It would be indexed with all rotated variations with an
: end of word marker added:
...
: The query for "*at*" would be preprocessed and rotated such that the
: wildcards are collapsed at the end to search for "at*" as a
: PrefixQuery. A wildcard in the middle of a string like "c*t" would
: become a prefix query for "t$c*".
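Here is a minimal sketch of the rotation scheme quoted above. It uses plain Java with a sorted TreeMap standing in for the term dictionary. That is an assumption: this is not Lucene's API, and the class and method names are hypothetical. `rotations` generates the indexed variations of a term, and `toPrefix` performs the query rewrite. `search` then scans the sorted keys for that prefix, which is roughly what a PrefixQuery does when it enumerates matching terms. Only "*X*" patterns and patterns with exactly one '*' are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class PermutermSketch {
    // End-of-word marker, as in the quoted description.
    static final char EOW = '$';

    // All rotations of term + "$": "cat" -> cat$, at$c, t$ca, $cat
    static List<String> rotations(String term) {
        String s = term + EOW;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            out.add(s.substring(i) + s.substring(0, i));
        }
        return out;
    }

    // Rewrites a wildcard pattern into a prefix over the rotated terms.
    static String toPrefix(String pattern) {
        int first = pattern.indexOf('*');
        int last = pattern.lastIndexOf('*');
        if (first < 0) {
            throw new IllegalArgumentException("no wildcard: " + pattern);
        }
        if (first == 0 && last == pattern.length() - 1 && first != last
                && pattern.indexOf('*', 1) == last) {
            // "*at*" -> prefix "at"
            return pattern.substring(1, last);
        }
        if (first != last) {
            throw new IllegalArgumentException("unsupported pattern: " + pattern);
        }
        // "A*B" -> prefix "B$A", e.g. "c*t" -> "t$c"
        return pattern.substring(first + 1) + EOW + pattern.substring(0, first);
    }

    // Prefix scan over the sorted rotation -> original-term map.
    static Set<String> search(NavigableMap<String, String> index, String pattern) {
        String prefix = toPrefix(pattern);
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, String> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // past the prefix range
            hits.add(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> index = new TreeMap<>();
        for (String term : new String[] {"cat", "bat", "cot", "dog"}) {
            for (String r : rotations(term)) index.put(r, term);
        }
        System.out.println(search(index, "*at*")); // [bat, cat]
        System.out.println(search(index, "c*t"));  // [cat, cot]
        System.out.println(search(index, "*og"));  // [dog]
    }
}
```

Each term contributes length + 1 dictionary entries, which is the index-size cost mentioned in the quote.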
That's a pretty slick trick.

Considering how many Terms the index would wind up containing in order to denormalize the data in that way, I wonder if it would be more practical to index each of the characters as a separate term, with the word repeated after the "end of word" character, making wildcard searches into "phrase" searches (after doing the preprocessing and rotating as you described). I.e., index "cat" as:

   c a t $ c a t

search for "*at*" as a phrase search for "a t"
search for "*at" as a phrase search for "a t $"
search for "c*t" as a phrase search for "t $ c"

...I'm fairly certain that would keep the index size much smaller (the number of terms would be much smaller, while the average term frequency wouldn't really increase), but I'm not sure if it would actually be any faster. It depends on the algorithm/performance of PhraseQuery, which is something I haven't really looked into. It could very well be significantly slower.

-Hoss
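Here is a rough sketch of the character-per-term alternative. It is again plain Java and does not use Lucene. A single word stands in for a field's token stream, and `Collections.indexOfSubList` stands in for an exact (zero-slop) PhraseQuery over token positions. Both substitutions are assumptions, the class and method names are hypothetical, and the sketch says nothing about the speed question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CharPhraseSketch {
    static final String EOW = "$";

    // "cat" -> [c, a, t, $, c, a, t]: one token per character,
    // then the end-of-word marker, then the word repeated.
    static List<String> tokens(String word) {
        List<String> out = new ArrayList<>();
        for (char c : word.toCharArray()) out.add(String.valueOf(c));
        out.add(EOW);
        for (char c : word.toCharArray()) out.add(String.valueOf(c));
        return out;
    }

    // Rewrites a wildcard pattern into the phrase to search for:
    // "*at*" -> [a, t]; "A*B" (one '*') -> B $ A, e.g. "c*t" -> [t, $, c]
    static List<String> toPhrase(String pattern) {
        int first = pattern.indexOf('*');
        int last = pattern.lastIndexOf('*');
        String rotated;
        if (first == 0 && last == pattern.length() - 1 && first != last
                && pattern.indexOf('*', 1) == last) {
            rotated = pattern.substring(1, last);
        } else if (first >= 0 && first == last) {
            rotated = pattern.substring(first + 1) + EOW + pattern.substring(0, first);
        } else {
            throw new IllegalArgumentException("unsupported pattern: " + pattern);
        }
        List<String> phrase = new ArrayList<>();
        for (char c : rotated.toCharArray()) phrase.add(String.valueOf(c));
        return phrase;
    }

    // Naive exact-phrase check: do the phrase tokens occur at consecutive positions?
    static boolean matches(String word, String pattern) {
        return Collections.indexOfSubList(tokens(word), toPhrase(pattern)) >= 0;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"cat", "bat", "cot", "dog"}) {
            System.out.println(w + " " + matches(w, "*at*") + " " + matches(w, "c*t"));
        }
    }
}
```

Running `main` prints `cat true true`, `bat true false`, `cot false true` and `dog false false`. The dictionary only ever holds the alphabet plus "$", so the cost moves into the position lists. There is one caveat the sketch does not handle. Because the word is repeated, the phrase can match where the prefix and suffix overlap: "ab*ba" becomes "b a $ a b", which also matches "aba". The rotated-term version does not have this problem, because each rotation is only one word long. A real implementation would need an extra length check.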