On Monday 21 November 2005 14:28, [EMAIL PROTECTED] wrote:
> Just to make sure that I understand this correctly,
> the docs say:
>
> "By default, no more than 10,000 terms will be
> indexed for a field."
>
> Given your note, then the docs do not mean that no
> more than 10,000 terms will be indexed, but that some
> smaller number of terms will be indexed and only the
> first 10,000 occurrences will be tallied.

I'm sorry, but I don't know a good meaning for tally here.

Kind regards,
Paul Elschot

>
> Is that correct?
>
> Thanks
> -MG
>
> ------ Original Message ------
> Received: Mon, 21 Nov 2005 03:04:42 AM EST
> From: Paul Elschot <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Subject: Re: TermFrequencies vector limits?
>
> > On Monday 21 November 2005 02:16, [EMAIL PROTECTED] wrote:
> > > Hi. I was wondering if anyone else has seen this
> > > before. I'm using lucene 1.4.3 and have indexed
> > > about 3000 text documents using the statement:
> > >
> > > doc.add(Field.Text("contents", new FileReader(f), true));
> > >
> > > When I go and retrieve the term frequency vectors, for
> > > any document under about 90k, everything looks as
> > > expected. However for larger documents (I haven't
> > > found the exact point, but I know that those over 128k
> > > qualify) the sum of the term frequencies in the vector
> > > seems to max out at 10001.
> > ..
> >
> > That's correct, have a look here for IndexWriter.maxFieldLength :
> >
> > http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71
> >
> > Regards,
> > Paul Elschot
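
To make the question in this thread concrete, here is a minimal sketch in plain Java (no Lucene dependency) of a per-field limit that caps token occurrences rather than distinct terms. The class FieldLimitSketch, its methods, and the whitespace tokenizer are hypothetical and exist only for illustration. The placement of the limit check after the token has been counted is an assumption, chosen because it would account for the reported sum of 10001 rather than 10000; it is not taken from the Lucene source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldLimitSketch {

    /**
     * Counts term frequencies for whitespace-separated tokens, but stops
     * reading once more than maxFieldLength tokens have been consumed.
     * Assumption: the check runs after the token is counted, so up to
     * maxFieldLength + 1 occurrences can be recorded.
     */
    public static Map<String, Integer> termFrequencies(String text, int maxFieldLength) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        int length = 0;
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            freqs.merge(token.toLowerCase(), 1, Integer::sum);
            if (++length > maxFieldLength) {
                break; // everything after this point is ignored
            }
        }
        return freqs;
    }

    /** Sums all frequencies, i.e. the number of occurrences recorded. */
    public static int sum(Map<String, Integer> freqs) {
        int total = 0;
        for (int f : freqs.values()) {
            total += f;
        }
        return total;
    }

    public static void main(String[] args) {
        // 20,000 tokens drawn from two terms, then one term that only
        // appears at the very end of the "document".
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append(i % 2 == 0 ? "alpha " : "beta ");
        }
        sb.append("gamma");

        Map<String, Integer> freqs = termFrequencies(sb.toString(), 10000);
        System.out.println(freqs.size() + " distinct terms, "
                + sum(freqs) + " occurrences recorded, gamma seen: "
                + freqs.containsKey("gamma"));
    }
}
```

Under these assumptions the number of distinct terms can be far below the limit while the sum of the frequencies is what hits the cap, and any term that first appears after the cutoff never reaches the vector. The FAQ entry linked above covers IndexWriter.maxFieldLength for documents that need more.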