[ https://issues.apache.org/jira/browse/PDFBOX-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849519#comment-17849519 ]
Andreas Lehmkühler commented on PDFBOX-5675:
--------------------------------------------

I removed caching from the read process to keep things simple and to see whether caching is needed at all. I came to the conclusion that it isn't, at least in the given case.

The content stream parser doesn't necessarily need full random access to the stream. It is sufficient to provide a source with limited peek and rewind capabilities, so that it won't be necessary to decode the whole data at once.

I've started with a stream-based decoder for the flate filter, followed by a new implementation of the RandomAccessRead interface that uses that stream as input and provides limited peek/rewind capabilities using a couple of buffers, but without seek support.

I already have a working prototype which works like a charm with a low(er) memory footprint. On my machine it took approx. 170 seconds to render page 6 using the debugger.

> org.apache.pdfbox.filter.Filter#decode() Java heap space
> --------------------------------------------------------
>
>                 Key: PDFBOX-5675
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5675
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: liu
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 3.0.3 PDFBox, 4.0.0
>
>         Attachments: 2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf,
> PDFBOX-5675-v2.patch, PDFBOX-5675.patch, image-2023-09-05-15-05-50-168.png,
> image-2024-04-24-16-50-38-925.png, image-2024-04-24-18-33-17-524.png,
> image-2024-04-24-18-35-43-792.png, image-2024-04-24-19-25-22-904.png,
> image.png, screenshot-1.png, screenshot-2.png
>
>
> !image-2023-09-05-15-05-50-168.png!
> When converting the sixth page of this PDF file
> (2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf) to an image, a memory
> overflow occurs. Can you provide a way to store the output in a temporary
> file?
> {code:java}
> -Xmx2000m
> public static void main(String[] args) throws IOException, InterruptedException {
>     File file = new File("D:\\2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf");
>     PDDocument pdf = Loader.loadPDF(file, IOUtils.createTempFileOnlyStreamCache());
>     pdf.setResourceCache(new PdfboxResourceCache());
>     PDFRenderer renderer = new PDFRenderer(pdf);
>     renderer.setSubsamplingAllowed(true);
>     BufferedImage bi = renderer.renderImage(5, 0.125f);
>     Thread.sleep(3600000);
>     pdf.close();
> }
> {code}
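
The idea in the comment above (decode the flate stream incrementally and give the parser only limited peek/rewind, no seek) could be sketched roughly as follows. This is an illustrative sketch only, not the actual prototype and not PDFBox code: the class names `LimitedRewindSource` and `FlateStreamDemo` are hypothetical, the real implementation targets the RandomAccessRead interface, and a single ring buffer stands in here for the "couple of buffers" mentioned. Only `java.util.zip` and `java.io` are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch: a sequential source with bounded peek/rewind and no
// seek, fed by a stream that decodes on demand instead of all at once.
final class LimitedRewindSource implements AutoCloseable {
    private final InputStream in;
    private final byte[] history; // ring buffer holding the most recently decoded bytes
    private long fetched = 0;     // total bytes pulled from the underlying stream
    private long position = 0;    // logical read position, always <= fetched

    LimitedRewindSource(InputStream in, int historySize) {
        if (historySize < 1) {
            throw new IllegalArgumentException("historySize must be at least 1");
        }
        this.in = in;
        this.history = new byte[historySize];
    }

    int read() throws IOException {
        if (position < fetched) {
            // Replay a byte that was decoded earlier and then rewound over.
            return history[(int) (position++ % history.length)] & 0xFF;
        }
        int b = in.read();
        if (b == -1) {
            return -1;
        }
        history[(int) (fetched % history.length)] = (byte) b;
        fetched++;
        position++;
        return b;
    }

    int peek() throws IOException {
        int b = read();
        if (b != -1) {
            rewind(1);
        }
        return b;
    }

    void rewind(int bytes) throws IOException {
        long target = position - bytes;
        // Only bytes still held in the ring buffer can be revisited.
        if (bytes < 0 || target < 0 || fetched - target > history.length) {
            throw new IOException("cannot rewind " + bytes + " bytes");
        }
        position = target;
    }

    long getPosition() {
        return position;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

final class FlateStreamDemo {
    // Deflates the given bytes, then returns a source that inflates them lazily.
    static LimitedRewindSource open(byte[] raw, int historySize) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(compressed)) {
            out.write(raw);
        }
        InputStream inflating =
                new InflaterInputStream(new ByteArrayInputStream(compressed.toByteArray()));
        return new LimitedRewindSource(inflating, historySize);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "BT /F1 12 Tf ET".getBytes(StandardCharsets.US_ASCII);
        try (LimitedRewindSource src = open(raw, 4)) {
            System.out.println((char) src.peek()); // look ahead without consuming
            src.read();
            src.read();
            src.rewind(2);                         // step back within the history window
            System.out.println(src.getPosition());
        }
    }
}
```

In this sketch memory use is bounded by the history size plus the inflater's internal buffers rather than by the size of the decoded stream. The trade-off is that a rewind beyond the history window fails instead of seeking.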