[jira] [Commented] (PDFBOX-2313) ExtractImages finds never-rendered images

John Hewson (JIRA) Fri, 05 Sep 2014 10:43:06 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123225#comment-14123225
 ]


John Hewson commented on PDFBOX-2313:
-------------------------------------

{code}
we'd need to create something like a PageDrawer that doesn't draw
{code}

Now that we have PDFGraphicsStreamEngine this was actually easy, and it has the 
added benefit that inline images will finally be supported by ExtractImages.
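
To illustrate the idea of an engine that walks the content stream without drawing, here is a minimal toy sketch. It is not PDFBox code: the names Op, allResourceImages and drawnImages are made up for this example, and a real implementation would presumably hook into PDFGraphicsStreamEngine's drawImage callback instead. It only assumes that XObject images are painted by the Do operator and that inline images begin with BI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: Op, allResourceImages and drawnImages are hypothetical
// names for illustration and are not part of the PDFBox API.
final class DrawnImageSketch {

    /** One content-stream instruction: an operator and an optional operand. */
    static final class Op {
        final String operator;
        final String operand; // null for operators without a name operand

        Op(String operator, String operand) {
            this.operator = operator;
            this.operand = operand;
        }
    }

    /** Resource-driven extraction: every image listed in the page resources. */
    static List<String> allResourceImages(Set<String> resourceImages) {
        return new ArrayList<>(resourceImages);
    }

    /** Stream-driven extraction: only images the content stream actually paints. */
    static List<String> drawnImages(Set<String> resourceImages, List<Op> contentStream) {
        Set<String> drawn = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        int inlineCount = 0;
        for (Op op : contentStream) {
            if ("Do".equals(op.operator) && resourceImages.contains(op.operand)) {
                // XObject image painted by name
                drawn.add(op.operand);
            } else if ("BI".equals(op.operator)) {
                // Inline image: it has no resource name, so label it by position
                inlineCount++;
                drawn.add("inline-" + inlineCount);
            }
        }
        return new ArrayList<>(drawn);
    }

    public static void main(String[] args) {
        Set<String> resources = new LinkedHashSet<>(Arrays.asList("Im1", "Im2", "Im3"));
        List<Op> stream = Arrays.asList(
                new Op("q", null), new Op("Do", "Im2"), new Op("BI", null), new Op("Q", null));
        System.out.println("resources: " + allResourceImages(resources));
        System.out.println("drawn:     " + drawnImages(resources, stream));
    }
}
```

In this model, images that sit in the resources but are never referenced by the content stream are simply never visited, and inline images come along for free because they only exist in the stream.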

I'm still going to look into removing/improving image caching, with a view to 
having consumers perform their own caching and implementing something smart for 
us in PDFRenderer.

> ExtractImages finds never-rendered images
> -----------------------------------------
>
>                 Key: PDFBOX-2313
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2313
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: John Hewson
>
> The file from PDFBOX-2101 is still causing unexpectedly high memory use with 
> ExtractImages when compared to PDFToImage. Given that PDFToImage uses the 
> same caching strategy, it's not really a caching issue, though we might still 
> want to think about that.
> The PDF contains 55 images on the first page which are never rendered and 
> ExtractImages runs out of memory trying to extract them all. Given that PDFs 
> often contain junk like this, I suggest that ExtractImages only extract 
> images which are actually drawn to the page at some point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
