[jira] [Comment Edited] (PDFBOX-1715) java.lang.OutOfMemoryError when extracting images

sarathy (JIRA) Tue, 10 Sep 2013 08:30:01 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763096#comment-13763096
 ]


sarathy edited comment on PDFBOX-1715 at 9/10/13 3:02 PM:
----------------------------------------------------------

Thanks for your prompt reply. 
* We are extracting the images programmatically (using the same steps), which is 
why we are not using ExtractImages; it is essentially a command-line utility.
* No, we have not. We will give it a go and let you know.
* Only this PDF. Generally it works fine.
* In this case, they look like TIFF files.
* We will check with the management team and let you know. The issue occurred 
in production, and we are not sure whether our SLA allows us to share the actual PDF.
                
      was (Author: sarathy.thothathri):
    * We are extracting the images programmatically (using the same steps), which is 
why we are not using ExtractImages; it is essentially a command-line utility.
* No, we have not. We will give it a go and let you know.
* Only this PDF. Generally it works fine.
* In this case, they look like TIFF files.
* We will check with the management team and let you know. The issue occurred 
in production, and we are not sure whether our SLA allows us to share the actual PDF.
                  
> java.lang.OutOfMemoryError when extracting images
> -------------------------------------------------
>
>                 Key: PDFBOX-1715
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1715
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.1
>         Environment: LSB Version:    
> :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
> Distributor ID: CentOS
> Description:    CentOS release 4.7 (Final)
> Release:        4.7
> Codename:       Final
> Java 1.6.0
>            Reporter: sarathy
>
> We are trying to extract images from a PDF file. As part of that, we are 
> converting each PDPage into an image using the PDPage.convertToImage method. 
> It is a 52-page document.
> While doing so, we see the trace shown further down.
> Here are the steps:
> PDDocument document = PDDocument.load(inputStream);
> List<PDPage> pages = document.getDocumentCatalog().getAllPages();
> for (PDPage pdPage : pages) {
>    // only handle pages that carry image resources
>    if (pdPage.getResources() != null && pdPage.getResources().getImages() != null) {
>      PageInfo  page = new PageInfo(pdPage, true, rotation);
>      ...
>    }
> }
> In PageInfo, we are doing:
> BufferedImage bimage = page.convertToImage();
> After about 12 pages have been processed, it fails as follows:
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:263)
>         at org.apache.pdfbox.io.RandomAccessBuffer.write(RandomAccessBuffer.java:222)
>         at org.apache.pdfbox.io.RandomAccessFileOutputStream.write(RandomAccessFileOutputStream.java:108)
>         at java.io.OutputStream.write(OutputStream.java:75)
>         at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:102)
>         at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:295)
>         at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:237)
>         at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:172)
>         at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:231)
>         at org.apache.pdfbox.pdmodel.common.PDStream.getByteArray(PDStream.java:509)
>         at org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap.getRGBImage(PDPixelMap.java:185)
>         at org.apache.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:83)
>         at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>         at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>         at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>         at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>         at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:125)
>         at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:781)
>         at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:712)
>         at oss.rcpt.PageInfo.<init>(PageInfo.java:328)
>         at oss.utl.PDFImageSplitter.execute(PDFImageSplitter.java:217)
>         at oss.utl.PDFUtilities.getImageCount(PDFUtilities.java:165)
>         at cms.utl.PDFImageOperations.main(PDFImageOperations.java:157)
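The trace above ends in FlateFilter.decode writing into an in-memory 
RandomAccessBuffer, so the heap runs out while an image stream is being 
inflated, not while the page raster is allocated. As a rough cross-check, the 
sketch below estimates how large the raster itself would be. The page size (US 
Letter, 612 x 792 points), the 144 DPI resolution and the 2 bytes per pixel are 
all assumptions chosen for illustration; none of them comes from the report, 
and the class and method names are hypothetical.

```java
// Back-of-the-envelope size of one rendered page raster.
// All inputs used in main() are illustrative assumptions, not values from the report.
final class RasterSizeEstimate {

    /** Bytes needed for an uncompressed raster of a page rendered at the given DPI. */
    static long rasterBytes(double widthPt, double heightPt, int dpi, int bytesPerPixel) {
        // PDF user space has 72 points per inch.
        long widthPx = Math.round(widthPt * dpi / 72.0);
        long heightPx = Math.round(heightPt * dpi / 72.0);
        return widthPx * heightPx * bytesPerPixel;
    }

    public static void main(String[] args) {
        // Assumed: US Letter page, 144 DPI, 16-bit pixels.
        long bytes = rasterBytes(612, 792, 144, 2);
        System.out.println(String.format("%d bytes (%.1f MiB)", bytes, bytes / (1024.0 * 1024.0)));
    }
}
```

Under these assumptions a single raster is only a few MiB, which is small next 
to a 512 MB heap. That would be consistent with the decoded image streams, not 
the rendered pages, accounting for most of the memory.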
> When we run this from the command line:
> * with -Xms512m and -Xmx512m, it fails after 12 pages.
> * with -Xms1024m and -Xmx1024m, it fails after 27 pages.
> In addition, we get a "Colour key masking isn't supported" message 
> for each image in the file.
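The two runs reported above (12 pages with a 512 MB heap, 27 pages with a 
1024 MB heap) allow a rough estimate of how much heap each additional page 
costs. The sketch below does that arithmetic; the class and method names are 
hypothetical, and the extrapolation to all 52 pages assumes the growth stays 
linear, which the report does not confirm.

```java
// Rough estimate of heap consumed per additional page, from the two reported runs.
final class HeapGrowthEstimate {

    /** Extra heap (in MB) per additional page processed between two runs. */
    static double mbPerExtraPage(int heapMbA, int pagesA, int heapMbB, int pagesB) {
        if (pagesA == pagesB) {
            throw new IllegalArgumentException("the two runs must differ in page count");
        }
        return (double) (heapMbB - heapMbA) / (pagesB - pagesA);
    }

    public static void main(String[] args) {
        // Figures from the report: 512 MB -> 12 pages, 1024 MB -> 27 pages.
        double perPage = mbPerExtraPage(512, 12, 1024, 27);
        System.out.println(String.format("~%.1f MB of heap per additional page", perPage));

        // Linear extrapolation to the full 52-page document (an assumption).
        double neededMb = 1024 + perPage * (52 - 27);
        System.out.println(String.format("~%.0f MB to reach page 52 at this rate", neededMb));
    }
}
```

The estimate comes to roughly 34 MB per page. Heap use that scales with the 
number of pages already processed would be consistent with decoded data being 
retained across pages, though these two data points alone cannot prove it.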

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
