[jira] [Commented] (PDFBOX-5590) Java heap space

Tilman Hausherr (Jira) Fri, 28 Apr 2023 08:21:05 -0700


    [ https://issues.apache.org/jira/browse/PDFBOX-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717721#comment-17717721 ]


Tilman Hausherr commented on PDFBOX-5590:
-----------------------------------------

If this code works for you, then use it yourself (I had my own fork for 
years before joining the project team, and my own modification never made it 
into the code because another user had a better solution). We can't use it 
because it isn't a general solution. It would perform terribly for ordinary 
inline images (usually tiny ones), and not all users have access to temp 
space. So there would be much more to do.
We have a scratch buffer file concept; however, it isn't currently available 
for PDInlineImage. We would have to change many classes.
But you did find an interesting problem: should the temporary 
ByteArrayOutputStream objects in PDInlineImage, PDStream and COSInputStream use 
space from the scratch buffer instead of from memory?
It would make sense for terrible files like yours.
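To illustrate the trade-off, here is a minimal, hypothetical sketch (not a PDFBox class, and it does not use PDFBox's ScratchFile) of an output stream that could stand in for such a temporary ByteArrayOutputStream. It assumes a caller-chosen threshold: small payloads, such as the usual tiny inline image, stay in memory, and only larger ones are copied to a temp file. It assumes `java.io.tmpdir` is writable, which, as noted above, is not true for every user.

{code:java}
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;

// Hypothetical sketch, not a PDFBox class: buffers in memory up to a
// threshold, then moves everything written so far to a temp file.
class SpillingOutputStream extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private File tempFile;        // null while the data is still in memory
    private OutputStream fileOut; // non-null once the data has spilled to disk
    private long size;

    SpillingOutputStream(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Tiny payloads (the usual inline image) never touch the disk.
        if (fileOut == null && size + len > threshold) {
            spill();
        }
        if (fileOut != null) {
            fileOut.write(b, off, len);
        } else {
            memory.write(b, off, len);
        }
        size += len;
    }

    private void spill() throws IOException {
        tempFile = File.createTempFile("spill", ".bin"); // uses java.io.tmpdir
        tempFile.deleteOnExit();
        fileOut = new BufferedOutputStream(new FileOutputStream(tempFile));
        memory.writeTo(fileOut); // copy what was buffered so far
        memory = null;           // release the heap copy
    }

    boolean isInMemory() {
        return fileOut == null;
    }

    long size() {
        return size;
    }

    // Returns the written bytes; closes the writing side first.
    InputStream toInputStream() throws IOException {
        close();
        if (fileOut == null) {
            return new ByteArrayInputStream(memory.toByteArray());
        }
        return Files.newInputStream(tempFile.toPath());
    }

    @Override
    public void close() throws IOException {
        if (fileOut != null) {
            fileOut.close();
        }
    }
}
{code}

The threshold check is what would keep ordinary inline images fast; a real change would also need to clean up temp files promptly and fall back to memory when no temp space is available.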

Re "Subsequent operations are sub-sampled": no, they're not, even though the 
subsampling parameter is passed to the filter. If you look at the 
RunLengthDecodeFilter class, you'll notice that none of its methods has the 
"options" parameter, i.e. it doesn't override the options-aware decode method. 
Because of that, this default implementation in the base Filter class is what 
gets called for RLE, and it drops the options:
{code}
public DecodeResult decode(InputStream encoded, OutputStream decoded, COSDictionary parameters,
                           int index, DecodeOptions options) throws IOException
{
    // The options (including subsampling) are not passed on.
    return decode(encoded, decoded, parameters, index);
}
{code}
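The dispatch can be reproduced with a simplified, hypothetical model. The class names below are made up and this is not the PDFBox code; the run-length decoding follows the PDF specification's RunLengthDecode rules (length 0 to 127: copy the next length + 1 bytes; 129 to 255: repeat the next byte 257 - length times; 128: end of data). Because the filter only implements the options-unaware method, a subsampling request produces exactly the same full-size output.

{code:java}
import java.io.ByteArrayOutputStream;

// Hypothetical stand-ins, NOT the real PDFBox classes.
class ToyDecodeOptions {
    final int subsampling;

    ToyDecodeOptions(int subsampling) {
        this.subsampling = subsampling;
    }
}

abstract class ToyFilter {
    // The options-unaware method every filter implements.
    abstract byte[] decode(byte[] encoded);

    // Options-aware entry point. The default just delegates, so any
    // subsampling request is silently dropped unless a subclass overrides it.
    byte[] decode(byte[] encoded, ToyDecodeOptions options) {
        return decode(encoded);
    }
}

// Mirrors the situation described above: only the options-unaware method exists.
class ToyRunLengthFilter extends ToyFilter {
    @Override
    byte[] decode(byte[] encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length) {
            int length = encoded[i++] & 0xFF;
            if (length == 128) {
                break; // EOD marker
            }
            if (length < 128) {
                // copy the next length + 1 bytes literally
                out.write(encoded, i, length + 1);
                i += length + 1;
            } else {
                // repeat the next byte 257 - length times
                byte b = encoded[i++];
                for (int k = 0; k < 257 - length; k++) {
                    out.write(b);
                }
            }
        }
        return out.toByteArray();
    }
}
{code}

For subsampling to take effect, a filter would have to override the options-aware method itself and skip rows/columns while decoding.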
Your file is rather recent; it was created on 14.4.2023 by CATIA V6. You should 
file a bug report with them and point them to the PDF specification: "Because 
the inline format gives the reader less flexibility in managing the image data, 
it shall be used only for small images (4 KB or less)."
It's also a terrible idea to store a huge vector graphic as a raster image.
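A rough size check makes the gap concrete. The sketch below is a hypothetical helper, not part of PDFBox. It assumes "4 KB" means 4096 bytes of image data, and it estimates the decoded raster size from width, height, bits per component and number of components, with each row padded to a byte boundary as the PDF specification requires for sample data.

{code:java}
// Hypothetical helper, not part of PDFBox.
final class InlineImageSizeCheck {
    // PDF spec: inline images "shall be used only for small images (4 KB or less)".
    // Assumption: 4 KB = 4096 bytes, compared against the image data length.
    static final long INLINE_LIMIT_BYTES = 4096;

    private InlineImageSizeCheck() {}

    // Decoded raster size: each row starts on a byte boundary.
    static long rasterBytes(long width, long height, int bitsPerComponent, int components) {
        long bytesPerRow = (width * components * bitsPerComponent + 7) / 8;
        return bytesPerRow * height;
    }

    static boolean exceedsInlineLimit(long dataLength) {
        return dataLength > INLINE_LIMIT_BYTES;
    }
}
{code}

For example, a hypothetical 10,000 × 10,000 pixel RGB image at 8 bits per component decodes to 300,000,000 bytes (about 286 MiB) before any intermediate copies, roughly 73,000 times the recommended inline limit.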


> Java heap space
> ---------------
>
>                 Key: PDFBOX-5590
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5590
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.28
>            Reporter: liu
>            Priority: Major
>         Attachments: 1-1.jpg, 1.jpg, 2.jpg, MyPDInlineImage.java, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, 
> screenshot-5.png, screenshot-6.png, test.pdf
>
>
> JVM:
> -Xmx1000M
> -Xms1000M
> -XX:-PrintGCDetails
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=C:\Users\LYCIT\Desktop
> Demo:
> public static void main(String[] args) throws IOException {
>     File file = new File("C:\\Users\\LYCIT\\Desktop\\test.pdf");
>     final PDDocument load = PDDocument.load(file,
>             MemoryUsageSetting.setupTempFileOnly(-1).setTempDir(new File("D:\\fcs\\test")));
>     PDFRenderer renderer = new PDFRenderer(load);
>     renderer.setSubsamplingAllowed(true);
>     float scale = 1.2f;
>     try {
>         BufferedImage bufferedImage = renderer.renderImage(0, scale, ImageType.RGB);
>     } catch (Exception e) {
>         System.out.println(e);
>     }
> }



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
