Thanks Eric for your quick response and useful information. I'll give a try to 
bump up the MaxFieldLength and check the performance. It seems the quickest way 
to handle the issue. 

Amy

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, March 10, 2009 9:56 AM
To: java-user@lucene.apache.org
Subject: Re: index large size file

Sure there are other options. You could decide to index in chunks
rather then entire  documents. You could decide many things.
None of which we can recommend unless we have a clue what
you're really trying to accomplish or whether you're encountering
a specific problem.

I can say that we've indexed 7,000 *page* documents by bumping the
MaxFieldLength. The performance is fine. I didn't measure indexing
performance, but it ran acceptably quickly. Search performance seems
unaffected, it's mostly dependent upon the overall index size and
number of unique tokens as far as I can tell.

I suggest you just try it and measure, that's the only way to determine
whether *your* situation is adversely affected, since nobody can answer
such a general question without considerably more specifics, and even
then the answer is a qualified guess.

But if you're *really* asking whether bumping MaxFieldLength does
something like reserve that much space for every document whether
or not it needs to, the answer is "no". A MaxFieldLength of 1,000,000,000
won't use noticeably more resources for a file with 10 tokens than if the
MaxFieldLength were 100. As far as I know.

Best
Erick

On Tue, Mar 10, 2009 at 10:47 AM, Amy Zhou <amy.z...@systemware.com> wrote:

> My issue here is that large file is truncated with default MaxFieldLength
> 10,000 during indexing. The file size I index could be 10mb or larger.
>
> My questions are:
>
> 1) If I chose MaxFieldLength as UNLIMITED instead of 100,000, what the
> performance could be?
> 2) Any other options?
>
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Tuesday, March 10, 2009 9:37 AM
> To: java-user@lucene.apache.org
> Subject: Re: index large size file
>
> Amy Zhou wrote:
> > Hi,
> >
> > I'm having a couple of questions about indexing large size file. As my
> understanding, the default MaxFieldLength 100,000. In Lucene 2.4, we can set
> the MaxFieldLength during constructor. My questions are:
> >
> The default is 10,000.
> > 1) How's the performance if MaxFieldLength is set to UNLIMITED?
> >
> It depends on how long your documents are. Its simply a cutoff -
> documents longer than n (10,000 by default) terms will be truncated.
> > 2) Any other options for indexing large size file?
> >
> What is the problem you are trying to address? Are you having trouble
> indexing a very large file? Can you share more details?
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to