I'd say support them everywhere, and slip LengthFilter into all the standard Analyzers, so people won't hit empty terms unless they opt in. This is the most consistent approach.
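To make the idea concrete, here is a minimal sketch of what filtering by length does to a token stream. It uses plain Java collections rather than Lucene's TokenStream API, and the class and method names (MinLengthFilterSketch, filter) are hypothetical, chosen only for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a length-based token filter.
// This is NOT Lucene's LengthFilter; it only mimics the idea on a list of strings.
final class MinLengthFilterSketch {

    // Keeps only the tokens whose length lies in [min, max].
    static List<String> filter(List<String> tokens, int min, int max) {
        return tokens.stream()
                .filter(t -> t.length() >= min && t.length() <= max)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("foo", "", "bar", "");
        // With min = 1 the empty tokens are dropped before they reach the index.
        System.out.println(filter(tokens, 1, Integer.MAX_VALUE)); // prints [foo, bar]
    }
}
```

With a minimum length of 1 as the default, empty terms never get indexed; someone who really wants them would opt in by lowering the minimum to 0.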
On Sat, Oct 30, 2010 at 15:06, Robert Muir <rcm...@gmail.com> wrote:
> On Sat, Oct 30, 2010 at 7:01 AM, Earwin Burrfoot <ear...@gmail.com> wrote:
>> Mathematically an inverted index is keyed by strings. Any strings.
>> Empty term is just a case of a string of length 0.
>> So, for consistency, Lucene should support them. TermsEnum.seek("")
>> should position you into very beginning of terms list, etc.
>> If you drop the support, you have to check zero length damn
>> eeeeverywhere in the API where you accept terms. Or, thoroughly
>> document unpredictable erratic behaviour :)
>
> well, we are checking this already, in a lot of the analyzers.
>
> as i said originally, the biggest problems that we *must* solve are:
> 1. try to prevent the performance trap i mentioned, where people
> create the empty term as a mega-stopword without realizing it.
> 2. fix the analyzers to be consistent with regards to the empty
> term... for example, if we decide the empty term is supported, then
> they shouldnt be arbitrarily removing empty-term tokens.
>
> as far as TermsEnum, i myself have already had to special-case the
> empty term in TermsEnum implementations before... and I'm pretty
> fucking sure that we have long-standing bugs if you have an empty-term
> anywhere in your index (e.g. FuzzyQuery will divide by 0 to scale the
> boost, and you will get a strange exception from your collector
> because it will then have NaN/Inf/some sentinel value).
>
> just saying, its problematic today, doing nothing and leaving it the
> messy unambiguous situation it is now is no option.
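
Two of the points quoted above can be illustrated with a rough sketch, assuming a sorted in-memory term set in place of a real terms dictionary. SortedTermsSketch, seekCeil and scaleByLength are hypothetical names; the scaling function is not FuzzyQuery's actual formula, it only shows how a float division by a zero length produces Inf or NaN instead of throwing:

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical sketch: a sorted term set standing in for a terms dictionary.
final class SortedTermsSketch {
    private final TreeSet<String> terms = new TreeSet<>();

    SortedTermsSketch(String... ts) {
        terms.addAll(Arrays.asList(ts));
    }

    // Returns the smallest term >= target, or null if there is none.
    // The empty string sorts before every other string, so seekCeil("")
    // always lands on the very first term.
    String seekCeil(String target) {
        return terms.ceiling(target);
    }

    // Assumed, simplified boost scaling by term length (illustration only).
    // Float division by zero does not throw in Java: it yields Infinity or NaN.
    static float scaleByLength(float boost, int termLength) {
        return boost / termLength;
    }

    public static void main(String[] args) {
        System.out.println(new SortedTermsSketch("b", "a").seekCeil(""));   // prints a
        System.out.println(new SortedTermsSketch("b", "", "a").seekCeil("").isEmpty()); // prints true
        System.out.println(scaleByLength(1.0f, 0)); // prints Infinity
        System.out.println(scaleByLength(0.0f, 0)); // prints NaN
    }
}
```

The seek case needs no special handling when terms are ordinary sorted strings, whereas any arithmetic on the term length has to guard against zero explicitly.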

--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org