On Tue, Jan 3, 2012 at 10:10 AM, Paul Libbrecht <[email protected]> wrote:
> I think the idf is also about terms and not about tokens.
> Maybe an expert can confirm my belief or we have to invent a test.
>

idf is computed from two statistics: docFreq and maxDoc.

docFreq is per-field (it is counted per term within a field), but maxDoc
is index-wide. This might not even matter in practice, though.
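
To make the role of the two statistics concrete, here is a minimal sketch,
assuming the classic DefaultSimilarity-style formula
idf = 1 + ln(maxDoc / (docFreq + 1)). The function name and the numbers are
illustrative only, and other Similarity implementations may use a different
formula.

```python
import math

def idf(doc_freq: int, max_doc: int) -> float:
    """Assumed classic formula: 1 + ln(maxDoc / (docFreq + 1))."""
    return 1.0 + math.log(max_doc / (doc_freq + 1))

# Same docFreq for a term, but two different index sizes:
# only maxDoc changes, and the idf moves with it.
small_index = idf(doc_freq=9, max_doc=1000)
large_index = idf(doc_freq=9, max_doc=100000)
print(small_index, large_index)
```

Because maxDoc is shared by every field, it shifts the idf of all terms in
the same direction, which is one reason it often does not change rankings
within a single field.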

If you are concerned about it, for example because you have multiple
languages in different fields and some of those fields are sparse, you
can look at Lucene's trunk, which has a "per-field maxDoc"
(Terms.docCount): the count of all documents that have at least one
indexed term for the field.
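
A rough illustration of the difference, not Lucene code: the toy index,
field names and helper functions below are all hypothetical, and the idf
formula is assumed to be the classic 1 + ln(N / (docFreq + 1)). It only
shows how a per-field document count changes N for a sparse field.

```python
import math

# Hypothetical toy index: doc id -> {field name: list of indexed terms}.
docs = {
    0: {"text_en": ["cat", "dog"]},
    1: {"text_en": ["cat"]},
    2: {"text_en": ["bird"]},
    3: {"text_fr": ["chat"]},  # the only document with a French field
}

def idf(doc_freq: int, num_docs: int) -> float:
    """Assumed classic formula: 1 + ln(N / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def doc_freq(field: str, term: str) -> int:
    """Number of documents whose `field` contains `term`."""
    return sum(1 for fields in docs.values() if term in fields.get(field, []))

def doc_count(field: str) -> int:
    """Number of documents with at least one indexed term for `field`."""
    return sum(1 for fields in docs.values() if fields.get(field))

max_doc = len(docs)  # index-wide count, shared by every field

# idf of "chat" in the sparse field, with the index-wide and per-field N.
idf_global = idf(doc_freq("text_fr", "chat"), max_doc)
idf_per_field = idf(doc_freq("text_fr", "chat"), doc_count("text_fr"))
print(idf_global, idf_per_field)
```

With the index-wide count, a term in the sparse field looks rarer than it
really is within that field; the per-field count removes that inflation.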

-- 
lucidimagination.com

