Re: inconsistency/performance trap of empty terms

Chris Hostetter Fri, 29 Oct 2010 10:45:13 -0700

: why not just discard them completely in say, indexer/queryparser ?

In QueryParser: maybe, that's a high level API with assumptions about 
"human" interaction and text.


In the IndexWriter: it seems like a bad idea.

Low level Lucene really shouldn't be making any assumptions about *how* 
the client code is using the library -- you and i may not have any good 
reasons for wanting an empty term, but we shouldn't put that as a 
hardcoded assumption in the low level code.

It's essentially the converse issue of IndexWriter.maxFieldLength -- 
which was deliberately changed to default to Integer.MAX_VALUE precisely 
because of this "don't assume we know how people are using the library" 
issue -- but we could certainly make it configurable in the same way.

(I see now that IndexWriter.maxFieldLength got deprecated in favor of 
IndexWriterConfig.maxFieldLength ... i thought i remembered that it had 
been deprecated in favor of a TokenFilter that did the limiting, hence my 
suggestion that we use the same pattern for "min term length". It could 
easily be an IndexWriterConfig option as well, but the TokenFilter 
approach seems more useful since it can be applied per field.)
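
To make the per-field point concrete, here is a minimal sketch of a
"min term length" filter in plain Java. It deliberately does not use
Lucene's TokenFilter API: the class name MinTermLengthSketch and its
methods are hypothetical, and a list of strings stands in for a token
stream. The only thing it is meant to show is that a filter-style
approach lets each field opt in separately, while fields that are not
configured keep every term, empty ones included.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Lucene.
class MinTermLengthSketch {
    // Per-field minimum term length. Unlisted fields default to 0,
    // so nothing is dropped unless the client code asks for it.
    private final Map<String, Integer> minLengthByField = new HashMap<>();

    MinTermLengthSketch setMinLength(String field, int minLength) {
        minLengthByField.put(field, minLength);
        return this;
    }

    // Returns the terms of the given field that meet that field's minimum.
    List<String> filter(String field, List<String> terms) {
        int min = minLengthByField.getOrDefault(field, 0);
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (term.length() >= min) {
                kept.add(term);
            }
        }
        return kept;
    }
}
```

With setMinLength("body", 1), empty terms are dropped from "body" only;
a field such as "id" that was never configured passes its terms through
unchanged. A single global IndexWriterConfig-style setting could not
make that distinction.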


-Hoss
