Re: OutOfMemory errors while indexing large documents

Harini Raghavan Mon, 25 Jul 2005 07:30:20 -0700

I am using org.htmlparser.parserapplications.StringExtractor to parse the HTML pages. I guess the OutOfMemory error occurs while parsing the large HTML pages and not while indexing. Sorry about the confusion.

----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Monday, July 25, 2005 6:43 PM
Subject: Re: OutOfMemory errors while indexing large documents

Could you be more specific about where the OutOfMemory error is happening? Do you have a complete stack trace?
As for maxFieldLength - in my use of Lucene, it is necessary to index the entire document and not just the first 10,000 or so terms - I set maxFieldLength to Integer.MAX_VALUE.
    Erik
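
For illustration only, here is a minimal sketch of what a term cap such as maxFieldLength does to a large document. It does not use Lucene itself: the class FieldLengthSketch and the helper termsToIndex are hypothetical names, and the whitespace tokenization is an assumption standing in for a real analyzer. How the cap is actually set on IndexWriter depends on the Lucene version, so check the API documentation for the release in use.

```java
import java.util.ArrayList;
import java.util.List;

class FieldLengthSketch {
    // Hypothetical helper, not part of Lucene: keeps at most maxFieldLength
    // whitespace-separated terms, mimicking how a term cap truncates a field.
    static List<String> termsToIndex(String text, int maxFieldLength) {
        List<String> kept = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty input yields one empty token; skip it
            }
            if (kept.size() >= maxFieldLength) {
                break; // everything past the cap is silently dropped
            }
            kept.add(term);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build a synthetic document of 25,000 terms.
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 25000; i++) {
            doc.append("term").append(i).append(' ');
        }
        String text = doc.toString();
        System.out.println("capped:   " + termsToIndex(text, 10000).size());
        System.out.println("uncapped: " + termsToIndex(text, Integer.MAX_VALUE).size());
    }
}
```

With a cap of 10,000 only the first 10,000 terms survive, so anything later in the document would not be searchable. With Integer.MAX_VALUE every term is kept, at the cost of more memory per document.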


On Jul 25, 2005, at 7:30 AM, Harini Raghavan wrote:
Hi All,
I am using Lucene to index large documents (HTML pages). The application is running on JBoss and MySQL on UNIX. The indexing throws OutOfMemory errors beyond a certain point, and I am not sure why this is happening. I am using the default IndexWriter properties, but the Lucene documentation mentions setting the max field length on the IndexWriter to some optimum value for large documents. Is anyone aware of any optimum settings for maxFieldLength, mergeFactor, minMergeDoc and maxMergeDoc?
Thanks,
Harini

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
