By 300MG I assume you mean 300MB. You can also try extracting the text outside of lucene by using a PDFBox command line app.
java org.pdfbox.ExtractText <pdffile> you may need to increase the JRE memory like this java -Xmx512m .pdfbox.ExtractText <pdffile> OR java -Xmx1024m .pdfbox.ExtractText <pdffile> If this is still giving you an out of memory error then it is possibly an issue with PDFBox, if that is the case then please create an issue and attach/upload the PDF on the PDFBox site. Ben > Thanks. > > I am using the getText(PDDocument) method of the PDFTextStripper. I will > try the other suggestion. > > suba suresh. > > Rob Staveley (Tom) wrote: > > If you are using > > http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#getTe xt(o > > rg.pdfbox.pdmodel.PDDocument), you are going to get a large String and may > > need a 1G heap. > > > > If, however, you are using > > http://www.pdfbox.org/javadoc/org/pdfbox/util/PDFTextStripper.html#write Text > > (org.pdfbox.pdmodel.PDDocument,%20java.io.Writer) to go via a temporary > > file, you will not need so much RAM, but you need to use > > http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field. html > > #Field(java.lang.String,%20java.io.Reader) to construct your Lucene field > > (rather than > > http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field. html > > #Field(java.lang.String,%20java.lang.String,% 20org.apache.lucene.document.Fi > > eld.Store,%20org.apache.lucene.document.Field.Index)). > > > > -----Original Message----- > > From: Suba Suresh [mailto:[EMAIL PROTECTED] > > Sent: 13 July 2006 14:55 > > To: [email protected] > > Subject: Out of memory error > > > > I am indexing different document formats with lucene 1.9. One of the pdf > > file I am indexing is 300MG. Whenever the index writer hits that file it > > stops the indexing with "Out of Memory" exception. I am using the pdf box > > library to index. I have set the following merge factors in my code. > > > > writer.setMergeFactor(1000); > > writer.setMaxMergeDocs(9999999); > > writer.setMaxBufferedDocs(1000); > > writer.setMaxFieldLength(Integer.MAX_VALUE); > > > > I would like any help and suggestions. > > > > thanks, > > suba suresh. > > > > -------------------------------------------------------------------- - > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
