[ https://issues.apache.org/jira/browse/PDFBOX-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840563#comment-17840563 ]
Dieter von Holten commented on PDFBOX-5675:
-------------------------------------------

There is another problem with this file, which may be more or less connected to the problem you are investigating.

On page 6 the file contains a stream of length 45,953,744 bytes, roughly 45 MB, which accounts for most of the file size. _This_ size in itself should be no problem. The stream is Flate-encoded, that is, compressed.

However, when I open the file in PDFDebugger and click on page 6, I get an exception "Required array size too large" from java.io.InputStream.readNBytes(), line 417 (in JDK 17). It is called from InputStream.readAllBytes(), which passes Integer.MAX_VALUE, and that in turn is called from StreamPane.requestStreamText().

However, the internal buffer used in readNBytes() is limited to Integer.MAX_VALUE-8 bytes. This method cannot read a byte[] from a stream larger than Integer.MAX_VALUE-8 (which usually is not a problem). The subclasses of InputStream seem to be able to handle larger streams, but the call to InputStream.readNBytes() must be avoided. The subclasses are a little questionable in this respect: they know about 'long' positions and offsets, but in some places only 'int' is used. Everything works well as long as sizes stay well below 2 GB.

HTH

> org.apache.pdfbox.filter.Filter#decode() Java heap space
> --------------------------------------------------------
>
>                 Key: PDFBOX-5675
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5675
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: liu
>            Priority: Major
>         Attachments: 2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf,
> image-2023-09-05-15-05-50-168.png, image-2024-04-24-16-50-38-925.png,
> image-2024-04-24-18-33-17-524.png, image-2024-04-24-18-35-43-792.png,
> image-2024-04-24-19-25-22-904.png, image.png, screenshot-1.png,
> screenshot-2.png
>
>
> !image-2023-09-05-15-05-50-168.png!
> When converting the sixth page of this PDF file
> (2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf) to an image, an
> out-of-memory error occurs. Can you provide a way to store the output
> in a temporary file?
> {code:java}
> -Xmx2000m
> public static void main(String[] args) throws IOException, InterruptedException {
>     File file = new File("D:\\2095e3df01fc32e0bff982a1e79600d5bcf10b81.pdf");
>     PDDocument pdf = Loader.loadPDF(file, IOUtils.createTempFileOnlyStreamCache());
>     pdf.setResourceCache(new PdfboxResourceCache());
>     PDFRenderer renderer = new PDFRenderer(pdf);
>     renderer.setSubsamplingAllowed(true);
>     BufferedImage bi = renderer.renderImage(5, 0.125f);
>     Thread.sleep(3600000);
>     pdf.close();
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
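
A note on the readNBytes() limit described in the comment: one way to avoid materializing a whole decoded stream as a single byte[] is to copy it in fixed-size chunks, for example into a temporary file, and to read only a bounded prefix when a preview is needed. The following is a minimal sketch of that idea using plain JDK classes. It is not code from PDFBox. The class ChunkedStreamCopy and its methods spoolToTempFile and readPreview are hypothetical names used only for illustration, and a ByteArrayInputStream stands in for the decoded PDF stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of PDFBox or the JDK.
final class ChunkedStreamCopy {

    // Copies the whole stream into a new temp file through one small buffer,
    // so memory use does not grow with the size of the stream.
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("decoded-stream-", ".bin");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return tmp;
    }

    // Reads at most 'limit' bytes, e.g. for a text preview. The returned
    // array can never be larger than 'limit'.
    static byte[] readPreview(InputStream in, int limit) throws IOException {
        return in.readNBytes(limit);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; a real caller would pass the decoded stream instead.
        byte[] data = new byte[100_000];
        Path tmp = spoolToTempFile(new ByteArrayInputStream(data));
        System.out.println("spooled bytes: " + Files.size(tmp));
        byte[] preview = readPreview(new ByteArrayInputStream(data), 16);
        System.out.println("preview bytes: " + preview.length);
        Files.delete(tmp);
    }
}
```

The copy loop holds only one 8 KiB buffer at a time, and the size of the result is available as a long via Files.size(). Whether such an approach fits StreamPane.requestStreamText() is a separate question that this sketch does not answer.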