[
https://issues.apache.org/jira/browse/PDFBOX-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849519#comment-17849519
]
Andreas Lehmkühler commented on PDFBOX-5675:
--------------------------------------------
I removed caching from the read process to keep things simple and to see if the
caching is needed at all. I came to the conclusion it isn't, at least in the
given case. The content stream parser doesn't necessarily need full random
access to the stream. It is sufficient to provide a source with limited peek
and rewind capabilities, so that it won't be necessary to decode the whole data
at once.
I've started with a stream based decoder for the flate filter, followed by a
new implementation of the RandomAccessRead interface using that stream as input
and providing limited peek/rewind capabilities using a couple of buffers but
without seek support. I already have a working prototype which works like a
charm with a low(er) memory footprint. On my machine it took approx. 170
seconds to render page 6 using the debugger.
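
The idea described above could be sketched roughly as follows. This is only a
minimal illustration and not the actual prototype: the class name
LimitedRewindSource, its methods and the history size are assumptions made up
for this example, and it does not implement the RandomAccessRead interface.
java.util.zip.InflaterInputStream stands in for the stream based flate decoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical forward-only source with limited peek/rewind and no seek. */
class LimitedRewindSource implements AutoCloseable {
    private final InputStream in;   // decoded data is pulled on demand
    private final byte[] history;   // ring buffer of the most recently fetched bytes
    private long fetched = 0;       // total bytes pulled from the underlying stream
    private long position = 0;      // logical read position, always <= fetched

    LimitedRewindSource(InputStream in, int historySize) {
        if (historySize < 1) {
            throw new IllegalArgumentException("historySize must be >= 1");
        }
        this.in = in;
        this.history = new byte[historySize];
    }

    /** Returns the next byte (0..255) or -1 at the end of the stream. */
    int read() throws IOException {
        if (position == fetched) {
            int b = in.read();
            if (b < 0) {
                return -1;
            }
            history[(int) (fetched % history.length)] = (byte) b;
            fetched++;
        }
        return history[(int) (position++ % history.length)] & 0xFF;
    }

    /** Returns the next byte without consuming it. */
    int peek() throws IOException {
        int b = read();
        if (b >= 0) {
            position--;
        }
        return b;
    }

    /** Steps back n bytes; fails if they are no longer held in the ring buffer. */
    void rewind(int n) throws IOException {
        if (n < 0 || n > position || fetched - (position - n) > history.length) {
            throw new IOException("cannot rewind " + n + " bytes");
        }
        position -= n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

class LimitedRewindDemo {
    /** Helper for the demo only: flate-compresses a byte array. */
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "BT (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        InputStream decoded = new InflaterInputStream(new ByteArrayInputStream(deflate(raw)));
        try (LimitedRewindSource src = new LimitedRewindSource(decoded, 16)) {
            int first = src.read();   // consumes 'B'
            int second = src.read();  // consumes 'T'
            src.rewind(2);            // step back to the start of the operator
            System.out.println((char) first + "" + (char) second + " -> " + (char) src.peek());
        }
    }
}
```

Only the last historySize bytes stay in memory, so the decoded stream never has
to be held as a whole; the price is that a rewind beyond that window fails.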
> org.apache.pdfbox.filter.Filter#decode() Java heap space
> --------------------------------------------------------
>
> Key: PDFBOX-5675
> URL: https://issues.apache.org/jira/browse/PDFBOX-5675
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 3.0.0 PDFBox
> Reporter: liu
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: 2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf,
> PDFBOX-5675-v2.patch, PDFBOX-5675.patch, image-2023-09-05-15-05-50-168.png,
> image-2024-04-24-16-50-38-925.png, image-2024-04-24-18-33-17-524.png,
> image-2024-04-24-18-35-43-792.png, image-2024-04-24-19-25-22-904.png,
> image.png, screenshot-1.png, screenshot-2.png
>
>
> !image-2023-09-05-15-05-50-168.png!
> When converting the sixth page of this PDF file
> (2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf) to an image, an
> out-of-memory error (Java heap space) occurs. Can you provide a way to store
> the output in a temporary file?
> {code:java}
> // JVM option: -Xmx2000m
> public static void main(String[] args) throws IOException, InterruptedException {
>     File file = new File("D:\\2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf");
>     PDDocument pdf = Loader.loadPDF(file, IOUtils.createTempFileOnlyStreamCache());
>     pdf.setResourceCache(new PdfboxResourceCache());
>     PDFRenderer renderer = new PDFRenderer(pdf);
>     renderer.setSubsamplingAllowed(true);
>     // Page index 5 is the sixth page; 0.125f is the scale factor.
>     BufferedImage bi = renderer.renderImage(5, 0.125f);
>     Thread.sleep(3600000);
>     pdf.close();
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)