On Tue, Jan 3, 2012 at 10:10 AM, Paul Libbrecht <[email protected]> wrote:
> I think the idf is also about terms and not about tokens.
> Maybe an expert can confirm my belief or we have to invent a test.
>
idf is computed from docFreq and maxDoc. docFreq is per-field; maxDoc is not.
This might not even matter, though.

If you are concerned about it in a situation where you have multiple
languages in different fields and some are sparse, you can look at Lucene's
trunk, which has a "per-field maxDoc" (Terms.docCount): the count of all
documents that have at least one indexed term for the field.

-- 
lucidimagination.com
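To make the difference concrete, here is a rough sketch in plain Python (not Lucene code). It assumes the classic idf formula 1 + ln(numDocs / (docFreq + 1)); the index sizes and the sparse field are made-up numbers for illustration only.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Assumed classic idf: 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical index: one million documents overall, but only one thousand
# of them have any indexed term in the sparse field (e.g. one language).
max_doc = 1_000_000   # index-wide document count (not per-field)
doc_count = 1_000     # documents with at least one term in the field
doc_freq = 500        # documents containing the term in that field

idf_index_wide = idf(doc_freq, max_doc)   # what you get with maxDoc
idf_per_field = idf(doc_freq, doc_count)  # what a per-field count would give

print(f"idf with maxDoc:   {idf_index_wide:.3f}")
print(f"idf with docCount: {idf_per_field:.3f}")
```

With this formula the gap between the two values is ln(max_doc / doc_count) for every term in the field, whatever its docFreq, so the sparser the field, the more the index-wide maxDoc inflates its idf values.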
