[ 
https://issues.apache.org/jira/browse/PDFBOX-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621283#comment-17621283
 ] 

Michael Klink commented on PDFBOX-5530:
---------------------------------------

Thousands and thousands of tiny bitmap images. Gigantic content streams.

Internally, this is a very unusual PDF. Optimizing resource usage for it while 
still remaining performant would be quite a challenge.

{quote}Can this hashmap be changed to use soft references or weak references, 
like WeakHashMap or ConcurrentReferenceHashMap?{quote}

PDFBox 2.x is based on an architecture that requires all objects in the PDF to 
be parsed and represented in memory, so "no".
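
As a general Java illustration (not PDFBox code): a weak map only drops an 
entry once its key is no longer strongly reachable from anywhere else. If the 
whole object graph has to stay in memory anyway, swapping the map type frees 
nothing. The class name and the "parsed object" value below are hypothetical, 
chosen only for this sketch.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo class; it does not exist in PDFBox.
final class WeakMapRetention {

    // Puts every key into a WeakHashMap, requests a GC, and reports how many
    // entries are still present. The caller keeps strong references to the
    // keys, so the collector is not allowed to clear any of them.
    static int entriesAfterGc(Object[] strongKeys) {
        Map<Object, String> map = new WeakHashMap<>();
        for (Object key : strongKeys) {
            map.put(key, "parsed object");
        }
        System.gc(); // only a hint, but strongly reachable keys survive regardless
        return map.size();
    }
}
```

While the caller still holds the keys, every entry remains in the map, 
whether or not a collection actually runs.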

You can try PDFBox 3, which offers just-in-time loading. Unfortunately, it also 
requires all loaded objects to remain in memory, so if your processing 
eventually touches most of the PDF, the resource requirements will end up being 
the same. So even there, "no".

A mode that allows loaded but currently unused objects to be freed again (which 
would allow for a "yes") is not yet implemented in the mainstream PDFBox.
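
To make the idea of such a mode concrete, here is a minimal sketch of a cache 
that holds its values only through soft references and reloads them on demand. 
It is not PDFBox code and not a proposal for how PDFBox should implement it. 
The class SoftObjectCache, its loader function, and the evict method are all 
hypothetical, and the sketch assumes that a value can be rebuilt from its key 
at any time and has not been modified since it was loaded.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; this is not a PDFBox class.
final class SoftObjectCache<K, V> {

    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader; // assumed to rebuild a value from its key
    private int loadCount = 0;

    SoftObjectCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, or reloads it if it was never loaded or the
    // garbage collector has cleared the soft reference in the meantime.
    synchronized V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            loadCount++;
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    // Drops an entry explicitly; stands in for the collector clearing it.
    synchronized void evict(K key) {
        entries.remove(key);
    }

    // Number of times the loader has been invoked so far.
    synchronized int loadCount() {
        return loadCount;
    }
}
```

A value that the caller still references strongly is never cleared, so 
repeated get calls return the same instance. Only values nobody else holds 
become eligible for collection under memory pressure, and the next get pays 
the cost of loading them again.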

> Java heap space
> ---------------
>
>                 Key: PDFBOX-5530
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5530
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.25
>            Reporter: liu
>            Priority: Blocker
>         Attachments: image-2022-10-20-14-30-19-790.png, 
> image-2022-10-20-14-30-57-332.png, image-2022-10-20-14-32-10-258.png, 
> image-2022-10-20-15-01-06-688.png, image-2022-10-20-19-07-42-632.png, 
> image-2022-10-20-19-08-23-932.png, screenshot-1.png, 引起宕机-1.pdf, 引起宕机.pdf
>
>
> code (only this part of the code):
> PDDocument load = PDDocument.load(file, 
> MemoryUsageSetting.setupTempFileOnly(-1));
>  
> Hi. Why does it still take up so much memory when I configure it like this? 
> What is the effect of using setupTempFileOnly?
> !image-2022-10-20-14-30-19-790.png!
> !image-2022-10-20-14-30-57-332.png!
> !image-2022-10-20-14-32-10-258.png!
> [^引起宕机.pdf]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

