Sorry for my late response. It took us some time to run it again. We
increased the heap to 1 GB as you suggested, and it works: the
indexer no longer crashes. (We are running into another problem with a
PowerPoint file, but that is for another email.)
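To confirm from inside the indexer that a larger heap setting actually took effect, you can check the JVM's reported ceiling. The heap is normally raised with the standard JVM option -Xmx (for example -Xmx1024m). This is a minimal sketch that uses only Runtime.maxMemory(). The class and method names are illustrative. The reported figure can be somewhat lower than the -Xmx value, because some garbage collectors exclude part of the heap from it.

```java
public class HeapCheck {
    /** Maximum heap the JVM will try to use, in megabytes (illustrative helper). */
    public static long maxHeapMegabytes() {
        // maxMemory() returns Long.MAX_VALUE when there is no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Print the effective heap ceiling so it can be compared with the -Xmx setting.
        System.out.println("max heap: " + maxHeapMegabytes() + " MB");
    }
}
```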
The code change using
PDFTextStripper.writeText(org.pdfbox.pdmodel.PDDocument, java.io.Writer)
did not work for us.
Thanks for all the help.
suba suresh.
Rob Staveley (Tom) wrote:
Let us know how you get on. There are a lot of people fighting very similar
battles on this list.
-----Original Message-----
From: Suba Suresh [mailto:[EMAIL PROTECTED]
Sent: 13 July 2006 15:30
To: java-user@lucene.apache.org
Subject: Re: Out of memory error
Thanks.
I am using the getText(PDDocument) method of the PDFTextStripper. I will try
the other suggestion.
suba suresh.
Rob Staveley (Tom) wrote:
If you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getText(org.pdfbox.pdmodel.PDDocument),
you are going to get a large String and may need a 1 GB heap.
If, however, you are using
http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#writeText(org.pdfbox.pdmodel.PDDocument,%20java.io.Writer)
to go via a temporary file, you will not need so much RAM, but you need to use
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.io.Reader)
to construct your Lucene field (rather than
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html#Field(java.lang.String,%20java.lang.String,%20org.apache.lucene.document.Field.Store,%20org.apache.lucene.document.Field.Index)).
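The temporary-file route can be sketched in plain Java. This is a minimal sketch, not PDFBox or Lucene code. The TextExtractor interface is a hypothetical stand-in for PDFTextStripper.writeText(PDDocument, Writer). The countChars method stands in for a tokenizer consuming the Reader that you would pass to Field(String, Reader). It assumes Java 8 or later (lambdas, try-with-resources), which is newer than the Lucene 1.9 era.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FilterReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class TempFileTextSpool {

    /** Hypothetical stand-in for PDFTextStripper.writeText(PDDocument, Writer). */
    public interface TextExtractor {
        void writeText(Writer out) throws IOException;
    }

    /**
     * Streams the extractor's output to a temporary file and returns a Reader
     * over it; closing the Reader deletes the file. In a real indexer this
     * Reader is what you would hand to Lucene's Field(String, Reader).
     */
    public static Reader spoolToTempFile(TextExtractor extractor) throws IOException {
        final File tmp = File.createTempFile("extracted-text", ".txt");
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(tmp), StandardCharsets.UTF_8))) {
            extractor.writeText(out); // text goes to disk, not into one big String
        } catch (IOException | RuntimeException e) {
            tmp.delete(); // don't leave a half-written file behind
            throw e;
        }
        Reader fileReader = new BufferedReader(new InputStreamReader(
                new FileInputStream(tmp), StandardCharsets.UTF_8));
        return new FilterReader(fileReader) {
            @Override
            public void close() throws IOException {
                try {
                    super.close();
                } finally {
                    tmp.delete(); // clean up once the consumer is done
                }
            }
        };
    }

    /** Reads the text in fixed-size chunks, as a tokenizer would, and counts characters. */
    public static long countChars(Reader reader) throws IOException {
        char[] buf = new char[8192];
        long total = 0;
        int n;
        while ((n = reader.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Simulated large document, written one line at a time.
        TextExtractor fake = out -> {
            for (int i = 0; i < 100_000; i++) {
                out.write("page text line " + i + "\n");
            }
        };
        try (Reader r = spoolToTempFile(fake)) {
            System.out.println("characters streamed: " + countChars(r));
        }
    }
}
```

This only avoids holding the extracted text as one large String. On the read side, just one 8 KB buffer of text is in memory at a time. It does not reduce the memory PDFBox itself needs to parse the document.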
-----Original Message-----
From: Suba Suresh [mailto:[EMAIL PROTECTED]
Sent: 13 July 2006 14:55
To: java-user@lucene.apache.org
Subject: Out of memory error
I am indexing different document formats with Lucene 1.9. One of the
PDF files I am indexing is 300 MB. Whenever the index writer hits that
file, indexing stops with an "Out of Memory" exception. I am using
the PDFBox library to extract the PDF text. I have set the following
IndexWriter settings in my code:
writer.setMergeFactor(1000);
writer.setMaxMergeDocs(9999999);
writer.setMaxBufferedDocs(1000);
writer.setMaxFieldLength(Integer.MAX_VALUE);
Any help or suggestions would be appreciated.
thanks,
suba suresh.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]