[ https://issues.apache.org/jira/browse/PDFBOX-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123219#comment-14123219 ]
ASF subversion and git services commented on PDFBOX-2313:
---------------------------------------------------------
Commit 1622746 from [~jahewson] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1622746 ]
PDFBOX-2313: Only extract images which are used in the content stream
> ExtractImages finds never-rendered images
> -----------------------------------------
>
> Key: PDFBOX-2313
> URL: https://issues.apache.org/jira/browse/PDFBOX-2313
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 2.0.0
> Reporter: John Hewson
> Assignee: John Hewson
>
> The file from PDFBOX-2101 is still causing unexpectedly high memory use with
> ExtractImages when compared to PDFToImage. Given that PDFToImage uses the
> same caching strategy, it's not really a caching issue, though we might still
> want to think about that.
> The PDF contains 55 images on the first page which are never rendered and
> ExtractImages runs out of memory trying to extract them all. Given that PDFs
> often contain junk like this, I suggest that ExtractImages only extract
> images which are actually drawn to the page at some point.
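
A minimal sketch of the idea suggested above, assuming a simplified view of a page: only images whose resource names are operands of a `Do` operator in the content stream are kept for extraction. This is not the code from the commit and does not use the PDFBox API. The class `DrawnImageFilter` and its methods are hypothetical, and the whitespace tokenizer is a stand-in for a real content stream parser (it ignores string literals, inline images, comments, and images drawn from nested form XObjects).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of PDFBox.
public class DrawnImageFilter {

    /** Collects the XObject names used as the operand of a Do operator. */
    public static Set<String> namesDrawn(String contentStream) {
        Set<String> drawn = new LinkedHashSet<>();
        // Simplification: split on whitespace instead of using a real PDF tokenizer.
        String[] tokens = contentStream.trim().split("\\s+");
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].equals("Do") && tokens[i - 1].startsWith("/")) {
                drawn.add(tokens[i - 1].substring(1)); // strip the leading '/'
            }
        }
        return drawn;
    }

    /** Keeps only the resource image names that the content stream actually draws. */
    public static List<String> imagesToExtract(List<String> resourceImageNames,
                                               String contentStream) {
        Set<String> drawn = namesDrawn(contentStream);
        List<String> result = new ArrayList<>();
        for (String name : resourceImageNames) {
            if (drawn.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three images are listed in the page resources, but only Im1 is drawn.
        List<String> resources = Arrays.asList("Im1", "Im2", "Im3");
        String content = "q 100 0 0 100 0 0 cm /Im1 Do Q";
        System.out.println(imagesToExtract(resources, content));
    }
}
```

With this filter, images that sit in the page resources but are never referenced by the content stream are skipped and never decoded, which is where the memory saving would come from.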