[jira] [Comment Edited] (PDFBOX-4058) High memory consumption when extracting image from PDF file

Bjorn Misseghers (JIRA) Thu, 11 Jan 2018 02:54:27 -0800

    [ https://issues.apache.org/jira/browse/PDFBOX-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322037#comment-16322037 ]


Bjorn Misseghers edited comment on PDFBOX-4058 at 1/11/18 10:53 AM:
--------------------------------------------------------------------

First of all, many thanks for the quick response! It is highly appreciated.

Do you have any idea when 2.0.9 will be released? And what about 3.0?
For now, I have applied your fix to the 2.0.8 sources and made my own build, 
which works.
However, I still can't get my head around the fact that we need close to 3 GB 
of heap to extract what seems to be a very basic image. Is there a way to use 
PDFBox to detect PDF pages with memory-hungry objects (objects with multiple 
layers?) without actually having to render them?
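
For a rough sense of scale, the size of one uncompressed page raster can be 
estimated from the page dimensions and the DPI alone, without rendering 
anything. The sketch below is plain Java arithmetic, not a PDFBox API: the 
class and method names are hypothetical, the A4 page size (595 x 842 points) 
is an assumption because the page dimensions of HighMemoryFootprint.pdf are 
not stated here, and 4 bytes per pixel assumes an ARGB buffer. It does not 
detect expensive objects; it only shows how far a single page image is from 
the 3G limit.

```java
/** Hypothetical helper: back-of-the-envelope raster size, not part of PDFBox. */
final class RasterEstimate {
    private RasterEstimate() {}

    /** Converts a length in PDF points (1/72 inch) to pixels at the given DPI, rounding up. */
    static long pixels(double points, int dpi) {
        return (long) Math.ceil(points / 72.0 * dpi);
    }

    /** Bytes needed for one uncompressed raster of the given page size. */
    static long rasterBytes(double widthPt, double heightPt, int dpi, int bytesPerPixel) {
        return pixels(widthPt, dpi) * pixels(heightPt, dpi) * bytesPerPixel;
    }

    /** How many rasters of that size fit into a heap of heapBytes (ignoring all other usage). */
    static long buffersThatFit(long heapBytes, long bytesPerRaster) {
        return heapBytes / bytesPerRaster;
    }

    public static void main(String[] args) {
        // Assumed values: A4 page, 300 dpi as in the report, 4 bytes per pixel (ARGB).
        double widthPt = 595;
        double heightPt = 842;
        int dpi = 300;
        long bytes = rasterBytes(widthPt, heightPt, dpi, 4);
        long heap = 3L * 1024 * 1024 * 1024; // 3 GiB, matching -Xmx3G

        System.out.println("pixels: " + pixels(widthPt, dpi) + " x " + pixels(heightPt, dpi));
        System.out.println("bytes per raster: " + bytes);
        System.out.println("rasters that fit in 3 GiB: " + buffersThatFit(heap, bytes));
    }
}
```

Under these assumptions a single page is 2480 x 3509 pixels, or about 33 MiB, 
so roughly 90 such buffers would fit into 3G. That would be consistent with 
many full-size intermediate buffers being alive at once, but the estimate by 
itself does not establish that this is what happens for this file.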



> High memory consumption when extracting image from PDF file
> -----------------------------------------------------------
>
>                 Key: PDFBOX-4058
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4058
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.5, 2.0.6, 2.0.7, 2.0.8
>         Environment: windows 10 / Linux
>            Reporter: Bjorn Misseghers
>            Assignee: Tilman Hausherr
>              Labels: regression
>             Fix For: 2.0.9, 3.0.0 PDFBox
>
>         Attachments: HighMemoryFootprint.pdf
>
>
> When rendering an image at 300 dpi from the attached PDF, my Java process 
> uses a huge amount of memory.
> The document is only 45 KB in size and contains 2 pages, yet my JVM is unable 
> to extract even 1 page with 3 GB of memory. Setting Xmx to 4G works but is 
> not the solution I want.
> The error occurs when calling PDFRenderer.renderImageWithDPI().
> I already tried reducing the memory usage in my application by using a 
> scratch file while loading the document and by avoiding caching of XObjects, 
> as described here: https://pdfbox.apache.org/2.0/faq.html#outofmemoryerror
> Neither helped.
> The issue can be reproduced using the pdfbox-app utility:
> java -Xmx3G -jar pdfbox-app-2.0.8.jar PDFToImage HighMemoryFootprint.pdf -dpi 300 -color RGB -page 1
> What cannot be changed?
> * 300 dpi will not be decreased.
> * Max Java memory will not be increased: 3 GB is ridiculous for a 45 KB PDF 
> file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

