On May 17, 2005, at 5:33 AM, Pablo Gomes Ludermir wrote:

Dear all,

I would like to know about maxFieldLength. The Javadocs say that it
limits "The maximum number of terms that will be indexed for a
single field in a document." So, for instance, my "contents" field
would be limited by default to 10,000 terms. But which terms
are those? The first 10,000 to be indexed?
Or is there some feature selection approach, for example indexing the
most frequent 10,000 terms and discarding the rest? Does anyone know?
If that is not the case, is it possible to implement?

It's the first 10,000 terms. One way to pick which terms are indexed would be to implement an analyzer that buffers tokens and emits only the most frequent ones. There may be other ways to accomplish this by hacking Lucene itself.
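
As a rough illustration of the buffering idea, here is a minimal sketch in plain Java. It deliberately does not use the Lucene API: the class name TopTermsFilter and the method keepMostFrequent are hypothetical, and the token list stands in for whatever a real analyzer would produce. It also assumes "most frequent" means the top N distinct terms within a single field, with ties broken alphabetically so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java sketch, not Lucene API: class and method names are hypothetical.
class TopTermsFilter {

    // Returns the tokens whose term is among the maxTerms most frequent
    // distinct terms of the field, keeping the original token order.
    static List<String> keepMostFrequent(List<String> tokens, int maxTerms) {
        // Pass 1: buffer the whole field and count each distinct term.
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }

        // Rank distinct terms by descending frequency.
        // Assumption: ties are broken alphabetically.
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });

        // Keep at most maxTerms distinct terms (none if maxTerms <= 0).
        int limit = Math.min(Math.max(maxTerms, 0), ranked.size());
        Set<String> kept = new HashSet<>(ranked.subList(0, limit));

        // Pass 2: emit only the tokens whose term survived the cut.
        List<String> emitted = new ArrayList<>();
        for (String token : tokens) {
            if (kept.contains(token)) {
                emitted.add(token);
            }
        }
        return emitted;
    }
}
```

In a real analyzer the same two passes would have to happen inside the token stream, which means holding every token of the field in memory before emitting any of them.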


    Erik

