Sorry for my late response. It took us some time to run it again. We increased the heap to 1G as you suggested, and it works: the indexer no longer crashes. (We are running into a different problem with a PowerPoint file, but that is for another email.)

The code change to use PDFTextStripper.writeText(org.pdfbox.pdmodel.PDDocument, java.io.Writer) did not work for us.


Thanks for all the help.

suba suresh.

Rob Staveley (Tom) wrote:
Let us know how you get on. There are a lot of people fighting very similar
battles on this list.
-----Original Message-----
From: Suba Suresh [mailto:[EMAIL PROTECTED]
Sent: 13 July 2006 15:30
To: java-user@lucene.apache.org
Subject: Re: Out of memory error

Thanks.

I am using the getText(PDDocument) method of the PDFTextStripper. I will try
the other suggestion.

suba suresh.

Rob Staveley (Tom) wrote:

If you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument),
you are going to get a large String and may need a 1G heap.
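A minimal sketch for checking that a larger heap actually took effect. `-Xmx1024m` is the standard HotSpot option for a 1G maximum heap; the class name `HeapCheck` and the 1000 MiB threshold are illustrative choices, not part of PDFBox or Lucene. `Runtime.maxMemory()` can report slightly less than the `-Xmx` value, which is why the threshold sits a little below 1024.

```java
final class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB.
    // This roughly reflects the -Xmx setting (e.g. java -Xmx1024m ...).
    // If the JVM has no limit, maxMemory() returns Long.MAX_VALUE.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long mib = maxHeapMiB();
        System.out.println("max heap: " + mib + " MiB");
        // Illustrative threshold: a bit under 1024 because maxMemory()
        // may exclude some heap regions from the figure it reports.
        if (mib < 1000) {
            System.out.println("warning: max heap is below ~1G");
        }
    }
}
```

Run it as `java -Xmx1024m HeapCheck`, or add the same flag to the indexer's own launch command, to confirm the setting is picked up.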

If, however, you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#writeText(org.pdfbox.pdmodel.PDDocument,%20java.io.Writer)
to go via a temporary file, you will not need so much RAM, but you need to use
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.io.Reader)
to construct your Lucene field (rather than
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.lang.String,%20org.apache.lucene.document.Field.Store,%20org.apache.lucene.document.Field.Index)).
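A minimal, stdlib-only sketch of the temporary-file route, assuming Java 8 or later. `TextSource` is a hypothetical stand-in for `PDFTextStripper.writeText(PDDocument, Writer)`, and the temp-file naming is illustrative; the Lucene step appears only as a comment because Lucene and PDFBox are not on the classpath here. Keep in mind that a field built with `Field(String, Reader)` is tokenized and indexed but not stored.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class SpoolToTempFile {

    // Hypothetical stand-in for PDFTextStripper.writeText(PDDocument, Writer):
    // any extractor that streams its text into a Writer instead of returning one String.
    interface TextSource {
        void writeText(Writer out) throws IOException;
    }

    // Step 1: stream the extracted text into a temporary file on disk.
    static Path spool(TextSource source) throws IOException {
        Path tmp = Files.createTempFile("extracted-", ".txt");
        try (Writer out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            source.writeText(out);
        }
        return tmp;
    }

    // Step 2: reopen the file as a Reader. In the Lucene code this Reader would be
    // passed to new Field("contents", reader) and must stay open until
    // IndexWriter.addDocument(...) has returned. Here it is consumed in fixed-size
    // chunks, so the whole text is never held as a single String.
    static long consume(Path file) throws IOException {
        long total = 0;
        char[] buf = new char[8192];
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a large document by writing many lines without building a String.
        Path tmp = spool(out -> {
            for (int i = 0; i < 100_000; i++) {
                out.write("line of extracted text\n");
            }
        });
        try {
            System.out.println("characters streamed: " + consume(tmp));
        } finally {
            Files.deleteIfExists(tmp); // clean up once indexing is done
        }
    }
}
```

The design point is that the text only ever exists as a stream: the extractor writes it out in pieces and the indexer reads it back in pieces, so no String the size of the whole document is built. Delete the temporary file once `addDocument` has returned.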

-----Original Message-----
From: Suba Suresh [mailto:[EMAIL PROTECTED]
Sent: 13 July 2006 14:55
To: java-user@lucene.apache.org
Subject: Out of memory error

I am indexing different document formats with Lucene 1.9. One of the PDF files I am indexing is 300 MB. Whenever the index writer hits that file, indexing stops with an "Out of Memory" exception. I am using the PDFBox library to index the PDFs. I have configured the IndexWriter as follows in my code:

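writer.setMergeFactor(1000);                 // how many segments are merged at once
writer.setMaxMergeDocs(9999999);             // largest number of docs ever merged into one segment
writer.setMaxBufferedDocs(1000);             // docs buffered in RAM before a new segment is written
writer.setMaxFieldLength(Integer.MAX_VALUE); // max terms indexed per field (no truncation)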

Any help and suggestions would be appreciated.

thanks,
suba suresh.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
