We don't even need a new TokenFilter; LengthFilter does the job very easily: just set minTermLength to 1 and maxTermLength to Integer.MAX_VALUE.
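
As a rough illustration of what that length range does (plain Java, not the Lucene API: the class LengthRangeSketch, its inner iterator and the filter method are hypothetical stand-ins, and a real setup would wrap the analyzer's TokenStream in LengthFilter instead), a lower bound of 1 drops empty terms while an upper bound of Integer.MAX_VALUE lets everything else through:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: models a token stream as an Iterator<String>.
// This is NOT Lucene's LengthFilter, only the same idea in plain Java.
class LengthRangeSketch {

    static final class LengthRangeIterator implements Iterator<String> {
        private final Iterator<String> input;
        private final int min;
        private final int max;
        private String pending; // next term inside [min, max], or null if exhausted

        LengthRangeIterator(Iterator<String> input, int min, int max) {
            this.input = input;
            this.min = min;
            this.max = max;
            advance();
        }

        // Keep pulling from the input until it is exhausted or a term of
        // acceptable length turns up (the "loop until termLength>0" idea).
        private void advance() {
            pending = null;
            while (input.hasNext()) {
                String term = input.next();
                if (term.length() >= min && term.length() <= max) {
                    pending = term;
                    return;
                }
            }
        }

        @Override
        public boolean hasNext() {
            return pending != null;
        }

        @Override
        public String next() {
            if (pending == null) {
                throw new NoSuchElementException();
            }
            String result = pending;
            advance();
            return result;
        }
    }

    // Convenience helper: collect every term whose length lies in [min, max].
    static List<String> filter(List<String> terms, int min, int max) {
        List<String> kept = new ArrayList<>();
        Iterator<String> it = new LengthRangeIterator(terms.iterator(), min, max);
        while (it.hasNext()) {
            kept.add(it.next());
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("foo", "", "bar", "", "");
        // min = 1 discards the empty terms; max = Integer.MAX_VALUE imposes no upper limit
        System.out.println(filter(terms, 1, Integer.MAX_VALUE)); // prints [foo, bar]
    }
}
```

Because the filtering sits in the analysis chain rather than in the writer, the same range could be applied to some fields and not others.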

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Friday, October 29, 2010 7:49 PM
> To: dev@lucene.apache.org
> Subject: RE: inconsistency/performance trap of empty terms
>
> I am for the tokenfilter approach. Max Field Length is still to be
> deprecated in favour of the TokenFilter.
>
> TF is very easy, just loop over incrementToken() until it returns false
> or a termLength>0
>
> > -----Original Message-----
> > From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
> > Sent: Friday, October 29, 2010 7:45 PM
> > To: Lucene Dev
> > Subject: Re: inconsistency/performance trap of empty terms
> >
> > : why not just discard them completely in say, indexer/queryparser ?
> >
> > In QueryParser: maybe, that's a high level API with assumptions about
> > "human" interaction and text.
> >
> > In the IndexWriter: it seems like a bad idea.
> >
> > Low level Lucene really shouldn't be making any assumptions about
> > *how* the client code is using the library -- you and i may not have
> > any good reasons for wanting an empty term, but we shouldn't put that
> > as a hardcoded assumption in the low level code.
> >
> > It's essentially the converse issue of IndexWriter.maxFieldLength --
> > which was deliberately changed to default to Integer.MAX_VALUE
> > precisely because of this "don't assume we know how people are using
> > the library" issue -- but we could certainly make it configurable in
> > the same way.
> >
> > (I see now that IndexWriter.maxFieldLength got deprecated in favor of
> > IndexWriterConfig.maxFieldLength ... i thought i remembered that had
> > been deprecated in favor of a TokenFilter that did the limiting, hence
> > my suggestion that we use the same pattern for "min term length" -- it
> > could easily be an IndexWriterConfig option as well, but using the
> > TokenFilter approach seems more useful since it can be per field)
> >
> > -Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org