[ https://issues.apache.org/jira/browse/PDFBOX-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711074#comment-16711074 ]
Tilman Hausherr commented on PDFBOX-4396:
-----------------------------------------
The resource cache is not to be shared across documents. The key is COSObject,
i.e. an indirect object number. In a PDF file you see these as "10 0 R", and
the objects as "10 0 obj".
I don't know which internal comment you mean (you didn't quote it), but there
is a weakness somewhere: scratch file buffers are not closed properly, and
closing them is left to finalization. There are JIRA issues on this, e.g.
PDFBOX-3388 and PDFBOX-3359.
If your problem gets solved by calling gc yourself, then Java is to blame,
because it should run a gc by itself when memory is too low to allocate new
objects.
If you can reproduce a scenario that eats up available memory then please share
the PDF and the code.
> Memory leak due to soft reference caching
> -----------------------------------------
>
> Key: PDFBOX-4396
> URL: https://issues.apache.org/jira/browse/PDFBOX-4396
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.12
> Environment: JDK10; G1
> Reporter: Ben Manes
> Priority: Major
> Attachments: #2 - memory leak 2.png, #2 - memory leak.png, memory
> leak 2.png, memory leak.png
>
>
> In a heap dump, it appears that DefaultResourceCache is retaining 5.3 GB of
> memory due to buffered images (via PDImageXObject). I suspect that G1 is not
> collecting soft references across all regions before it throws an
> out-of-memory error.
> In PDFBOX-4389, I discovered very slow PDDocument#load times due to a JDK10
> I/O bug. Previously I was loading the document to render each page, but this
> took 1.5 minutes. To work around that bug I reused the document instance
> across pages. This seems to have failed because the pages were cached and not
> cleared by the GC.
> The DefaultResourceCache does not prune its cache entries when the soft
> references are collected. Like WeakHashMap, it should use a ReferenceQueue,
> poll it on every access, and prune accordingly.
> Thankfully PDDocument#setResourceCache exists. For now I am going to reset
> the cache to a new instance after a page has been rendered. The entries
> should no longer be reachable and be GC'd more aggressively. If that doesn't
> work, I'll either replace the cache (e.g. with Caffeine) or disable it by
> setting the instance to null.
> I think the desired fix is to prune the DefaultResourceCache and, ideally,
> reconsider usage of soft references (as they tend to be poor in practice).
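The pruning approach suggested in the description (a ReferenceQueue polled on
every access, as WeakHashMap does) could look roughly like the sketch below.
This is not PDFBox code: SoftValueCache, its Entry class, and the method names
are hypothetical, and it uses only java.lang.ref and java.util. Each soft
reference remembers its key, so once the GC clears the referent and enqueues
the reference, the stale map entry can be located and removed.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical soft-value cache that prunes cleared entries; not the PDFBox implementation. */
final class SoftValueCache<K, V> {

    /** Soft reference that remembers its key so the owning map entry can be removed later. */
    private static final class Entry<K, V> extends SoftReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, Entry<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    /** Returns the cached value, or null if it was never cached or has been collected. */
    synchronized V get(K key) {
        prune();
        Entry<K, V> entry = map.get(key);
        return entry == null ? null : entry.get();
    }

    synchronized void put(K key, V value) {
        prune();
        map.put(key, new Entry<>(key, value, queue));
    }

    synchronized int size() {
        prune();
        return map.size();
    }

    /** Drains the queue and removes every map entry whose referent the GC has cleared. */
    private void prune() {
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            @SuppressWarnings("unchecked")
            Entry<K, V> stale = (Entry<K, V>) ref;
            // Two-argument remove: a newer entry stored under the same key is left alone.
            map.remove(stale.key, stale);
        }
    }
}
```

Pruning only bounds the number of dead map entries. It does not change when the
collector decides to clear soft references, so the G1 behaviour described above
would have to be addressed separately, for example by not using soft references
at all.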