[jira] [Issue Comment Edited] (PDFBOX-1226) Counting pages of a PDF gives OutOfMemoryError

Adam Nichols (Issue Comment Edited) (JIRA) Thu, 09 Feb 2012 16:54:26 -0800

    [ https://issues.apache.org/jira/browse/PDFBOX-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205123#comment-13205123 ]


Adam Nichols edited comment on PDFBOX-1226 at 2/10/12 12:53 AM:
----------------------------------------------------------------

There are a few different options here.  The easiest and fastest is to 
increase the amount of memory available to your JVM.  If you need code 
that works straight away, you should go this route.
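
To make the memory option concrete, here is a minimal sketch that prints the 
maximum heap the JVM was started with and compares it to the size of the file 
you are about to load.  The class name HeapCheck and the factor of 2.0 are 
assumptions made for illustration only; PDFBox does not document a required 
heap-to-file ratio, so treat the warning as a rough hint.

```java
import java.io.File;

public class HeapCheck {
    /** Maximum heap the JVM will try to use, in bytes (raised with -Xmx). */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /**
     * Rough guess only: true if the heap is at least `factor` times the file size.
     * The factor is an illustrative assumption, not a PDFBox requirement.
     */
    public static boolean heapLooksSufficient(long fileBytes, long heapBytes, double factor) {
        return heapBytes >= fileBytes * factor;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java HeapCheck <file>");
            return;
        }
        File pdf = new File(args[0]);
        long heap = maxHeapBytes();
        System.out.println("Max heap:  " + heap / (1024 * 1024) + " MB");
        System.out.println("File size: " + pdf.length() / (1024 * 1024) + " MB");
        if (!heapLooksSufficient(pdf.length(), heap, 2.0)) {
            System.out.println("Heap may be too small; try a larger -Xmx value.");
        }
    }
}
```

The heap limit itself is set on the command line, for example 
"java -Xmx1g HeapCheck big.pdf" (the file name is a placeholder).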

However, counting pages shouldn't require a lot of memory (or even reading the 
entire file, for that matter).  PDFBOX-1000 is tracking a new parser that isn't 
done yet, but it might be far enough along to count pages (it's been a while 
since I had the code open).  That parser will probably take quite a while to 
complete, so there's also PDFBOX-1199, which is a shorter-term solution.  
Sorry I can't dig into the code right now and give you a more certain answer.  
Hopefully the references to the other JIRA issues will be enough to help you 
out.
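
To illustrate why page counting need not be memory-hungry, here is a crude 
sketch that streams a file one byte at a time and counts "/Type /Page" 
markers, skipping "/Type /Pages".  This is not how PDFBox or the parser in 
PDFBOX-1000 works, and the class and method names are made up for the 
example.  It is only a heuristic: it still reads the whole file, it misses 
pages stored in compressed object streams, and it can miscount if the marker 
happens to appear inside stream data.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PageMarkerCount {
    /** Counts "/Type" + optional whitespace + "/Page" where the name does not continue. */
    public static int countPageMarkers(InputStream in) throws IOException {
        final String type = "/Type";
        final String page = "/Page";
        int count = 0;
        int t = 0;   // characters of "/Type" matched so far
        int p = -1;  // -1: "/Type" not yet seen; otherwise characters of "/Page" matched
        int b;
        while ((b = in.read()) != -1) {
            if (p == page.length()) {
                // Saw "/Type /Page"; count it unless the name continues (e.g. "/Pages").
                if (!Character.isLetterOrDigit(b)) {
                    count++;
                }
                p = -1;
                t = 0;
            }
            if (p >= 0) {
                if (p == 0 && Character.isWhitespace(b)) {
                    continue; // skip whitespace between the key and its value
                }
                if (b == page.charAt(p)) {
                    p++;
                    continue;
                }
                // Mismatch: a lone "/" may still be the start of a new "/Type".
                t = (p == 1) ? 1 : 0;
                p = -1;
            }
            if (b == type.charAt(t)) {
                t++;
                if (t == type.length()) {
                    p = 0;
                    t = 0;
                }
            } else {
                t = (b == '/') ? 1 : 0;
            }
        }
        if (p == page.length()) {
            count++; // marker ended exactly at end of input
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PageMarkerCount <file.pdf>");
            return;
        }
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println("Page markers found: " + countPageMarkers(in));
        }
    }
}
```

Memory use stays constant regardless of file size because only a couple of 
counters are kept between bytes.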

Also, if you're not using the getNumberOfPages() method, take a look at 
PDFBOX-911 for some very simple sample code.
                
      was (Author: adamnichols):
    There are a few different options here, the easiest and fastest would be to 
increase the amount of memory available to your JVM.  If you need to get code 
that works straight away, you should go this route.

However, counting pages shouldn't require a lot of memory (nor even reading the 
entire file for that matter).  PDFBOX-1000 is tracking a new parser which isn't 
done yet, but it might actually be far enough along to count pages (it's been a 
while since I had the code open).  This parser will probably take quite a while 
to complete, so there's also PDFBOX-1199 which is a shorter term solution.  
Sorry I can't dig into the code right now and give you a more certain answer.  
Hopefully 

Also, if you're not using the getNumberOfPages() method, take a look at 
PDFBOX-911 for some very simple sample code.
                  
> Counting  pages of a PDF gives OutOfMemoryError
> -----------------------------------------------
>
>                 Key: PDFBOX-1226
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1226
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDFReader
>    Affects Versions: 1.6.0
>         Environment: Windows 7 / Windows XP
>            Reporter: Anca Zapuc
>         Attachments: Big_no_pages.7z
>
>
> I have a PDF (397 MB) and I am trying to count its pages.
> I am able to open the PDF with Adobe Reader 9, but not with Foxit Reader.
> Code:
>   PDDocument doc = null;
>   File temp = null;
>   RandomAccessFile rand = null;
>   int nr = 0;
>   try {
>       // create the temporary file PDFBox needs when dealing with really large PDFs
>       temp = new File("e:/temp.tmp");
>       // use a random access file as scratch space for really large PDFs
>       rand = new RandomAccessFile(temp, "rw");
>       doc = PDDocument.load(file, rand);
>       nr = doc.getNumberOfPages();
>   } catch (Exception e) {
>       e.printStackTrace();
>   }
> Got the following exception:
> org.apache.pdfbox.exceptions.WrappedIOException
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:240)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1022)
>       at PDFBoxExample.getHugeNrOfFiles(PDFBoxExample.java:36)
>       at PDFBoxExample.main(PDFBoxExample.java:258)
> Caused by: java.lang.OutOfMemoryError: Java heap space
>       at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
>       at java.lang.StringBuffer.<init>(StringBuffer.java:79)
>       at org.apache.pdfbox.pdfparser.BaseParser.readString(BaseParser.java:1121)
>       at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:402)
>       at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
>       ... 4 more
> I attached the PDF.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
