Sure there are other options. You could decide to index in chunks rather then entire documents. You could decide many things. None of which we can recommend unless we have a clue what you're really trying to accomplish or whether you're encountering a specific problem.
I can say that we've indexed 7,000 *page* documents by bumping the MaxFieldLength. The performance is fine. I didn't measure indexing performance, but it ran acceptably quickly. Search performance seems unaffected, it's mostly dependent upon the overall index size and number of unique tokens as far as I can tell. I suggest you just try it and measure, that's the only way to determine whether *your* situation is adversely affected, since nobody can answer such a general question without considerably more specifics, and even then the answer is a qualified guess. But if you're *really* asking whether bumping MaxFieldLength does something like reserve that much space for every document whether or not it needs to, the answer is "no". A MaxFieldLength of 1,000,000,000 won't use noticeably more resources for a file with 10 tokens than if the MaxFieldLength were 100. As far as I know. Best Erick On Tue, Mar 10, 2009 at 10:47 AM, Amy Zhou <amy.z...@systemware.com> wrote: > My issue here is that large file is truncated with default MaxFieldLength > 10,000 during indexing. The file size I index could be 10mb or larger. > > My questions are: > > 1) If I chose MaxFieldLength as UNLIMITED instead of 100,000, what the > performance could be? > 2) Any other options? > > > -----Original Message----- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: Tuesday, March 10, 2009 9:37 AM > To: java-user@lucene.apache.org > Subject: Re: index large size file > > Amy Zhou wrote: > > Hi, > > > > I'm having a couple of questions about indexing large size file. As my > understanding, the default MaxFieldLength 100,000. In Lucene 2.4, we can set > the MaxFieldLength during constructor. My questions are: > > > The default is 10,000. > > 1) How's the performance if MaxFieldLength is set to UNLIMITED? > > > It depends on how long your documents are. Its simply a cutoff - > documents longer than n (10,000 by default) terms will be truncated. > > 2) Any other options for indexing large size file? > > > What is the problem you are trying to address? Are you having trouble > indexing a very large file? Can you share more details? > > > -- > - Mark > > http://www.lucidimagination.com > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >