I am using org.htmlparser.parserapplications.StringExtractor to parse the
HTML pages. I guess the OutOfMemory error occurs while parsing the large
HTML pages, not while indexing. Sorry about the confusion.
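One way to confirm which stage actually runs out of memory is to run parsing and indexing as separate, named steps and report where the error surfaces. The following is only a minimal sketch using the standard library: the class name StageMemoryProbe, the runStage helper and the placeholder lambdas are hypothetical stand-ins, not part of htmlparser or Lucene, and the heap figures are approximate because they depend on garbage-collection timing.

```java
class StageMemoryProbe {

    /** Heap currently in use, in bytes (approximate; depends on GC timing). */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Runs one named stage. Returns true if it completed, false if it
     * threw OutOfMemoryError, in which case the stage name is reported.
     */
    static boolean runStage(String name, Runnable stage) {
        long before = usedHeap();
        try {
            stage.run();
        } catch (OutOfMemoryError e) {
            // Reporting here is best effort: the JVM may still be short of heap.
            System.err.println("OutOfMemoryError in stage: " + name);
            e.printStackTrace();
            return false;
        }
        long deltaKb = (usedHeap() - before) / 1024;
        System.out.println(name + ": heap delta ~" + deltaKb + " KB");
        return true;
    }

    public static void main(String[] args) {
        final String[] text = new String[1];
        // Hypothetical placeholders: substitute the real parse and index calls.
        boolean parsed = runStage("parse", () -> text[0] = "extracted page text");
        boolean indexed = parsed && runStage("index", () -> { /* add document here */ });
        System.out.println("parsed=" + parsed + ", indexed=" + indexed);
    }
}
```

If the "parse" stage is the one that fails, raising the JVM heap with -Xmx or extracting text from very large pages in smaller pieces would be the things to try before touching any IndexWriter settings.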
----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: <java-user@lucene.apache.org>
Sent: Monday, July 25, 2005 6:43 PM
Subject: Re: OutOfMemory errors while indexing large documents
Could you be more specific about where the OutOfMemory error is
happening? Do you have a complete stack trace?
As for maxFieldLength: in my use of Lucene it is necessary to index the
entire document, not just the first 10,000 or so terms, so I set
maxFieldLength to Integer.MAX_VALUE.
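To illustrate what the term cap means in practice, here is a simplified model in plain Java. It is not Lucene's code: the class FieldLengthCap and the method indexedTerms are hypothetical, and splitting on whitespace stands in for a real analyzer. It only shows that terms beyond the limit are dropped and never reach the index, which is why a low limit can hide the tail of a long document from searches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class FieldLengthCap {

    /** Keeps at most maxFieldLength whitespace-separated terms and drops the rest. */
    static List<String> indexedTerms(String fieldText, int maxFieldLength) {
        List<String> terms = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(fieldText);
        while (tokens.hasMoreTokens() && terms.size() < maxFieldLength) {
            terms.add(tokens.nextToken());
        }
        return terms;
    }

    public static void main(String[] args) {
        String body = "alpha beta gamma delta epsilon";
        // With a cap of 3, "delta" and "epsilon" would not be searchable.
        System.out.println(indexedTerms(body, 3));                 // [alpha, beta, gamma]
        // With Integer.MAX_VALUE, every term is kept.
        System.out.println(indexedTerms(body, Integer.MAX_VALUE)); // [alpha, beta, gamma, delta, epsilon]
    }
}
```

The trade-off is that lifting the cap lets a very large document contribute all of its terms, so the memory needed per document during indexing is likely to grow with document size.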
Erik
On Jul 25, 2005, at 7:30 AM, Harini Raghavan wrote:
Hi All,
I am using Lucene to index large documents (HTML pages). The application
is running on JBoss and MySQL on UNIX. Indexing throws OutOfMemory errors
beyond a certain point, and I am not sure why this is happening. I am
using the default IndexWriter properties, but the Lucene documentation
mentions setting the max field length on the IndexWriter to some optimum
value for large documents. Is anyone aware of any optimum settings for
maxFieldLength, mergeFactor, minMergeDocs and maxMergeDocs?
Thanks,
Harini
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]