[ 
https://issues.apache.org/jira/browse/PDFBOX-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123219#comment-14123219
 ] 

ASF subversion and git services commented on PDFBOX-2313:
---------------------------------------------------------

Commit 1622746 from [~jahewson] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1622746 ]

PDFBOX-2313: Only extract images which are used in the content stream

> ExtractImages finds never-rendered images
> -----------------------------------------
>
>                 Key: PDFBOX-2313
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2313
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: John Hewson
>
> The file from PDFBOX-2101 is still causing unexpectedly high memory use with 
> ExtractImages when compared to PDFToImage. Given that PDFToImage uses the 
> same caching strategy, it's not really a caching issue, though we might still 
> want to think about that.
> The PDF contains 55 images on the first page which are never rendered and 
> ExtractImages runs out of memory trying to extract them all. Given that PDFs 
> often contain junk like this, I suggest that ExtractImages only extract 
> images which are actually drawn to the page at some point.
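
For illustration only, here is a minimal sketch of the idea in the description: extract only the images that the page's content stream actually draws. It is not the code from r1622746 and does not use the PDFBox API. The class `UsedImageFilter` and its methods are hypothetical names, the content stream is modelled as a plain whitespace-separated string, and the page's resources as a set of image names. It relies only on the fact that an XObject is painted by a name operand followed by the `Do` operator (for example `/Im1 Do`). A real implementation would need a proper content stream parser and would also have to look inside form XObjects, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of PDFBox: a simplified model of
// "only extract images which are used in the content stream".
final class UsedImageFilter {

    // Collects the names that appear as the operand of a `Do` operator.
    // Assumption: tokens are separated by whitespace only (no strings,
    // inline images, or comments, which a real parser must handle).
    static Set<String> namesDrawn(String contentStream) {
        String[] tokens = contentStream.trim().split("\\s+");
        Set<String> drawn = new LinkedHashSet<>();
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].equals("Do") && tokens[i - 1].startsWith("/")) {
                drawn.add(tokens[i - 1].substring(1)); // drop the leading '/'
            }
        }
        return drawn;
    }

    // Keeps only those resource images that the content stream draws,
    // preserving the order in which the resources were listed.
    static List<String> imagesToExtract(Set<String> resourceImages, String contentStream) {
        Set<String> drawn = namesDrawn(contentStream);
        List<String> result = new ArrayList<>();
        for (String name : resourceImages) {
            if (drawn.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three images are listed in the resources, but Im2 is never drawn.
        Set<String> resources = new LinkedHashSet<>(Arrays.asList("Im1", "Im2", "Im3"));
        String content = "q 100 0 0 100 0 0 cm /Im1 Do Q q 50 0 0 50 0 0 cm /Im3 Do Q";
        System.out.println(imagesToExtract(resources, content));
    }
}
```

With this filter, images such as the 55 never-rendered ones mentioned above would simply never be decoded, so they would not contribute to memory use during extraction.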



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
