[ 
https://issues.apache.org/jira/browse/PDFBOX-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209247#comment-14209247
 ] 

John Hewson edited comment on PDFBOX-2200 at 11/13/14 4:30 AM:
---------------------------------------------------------------

In 1.8 {{cmapObjects}} is a {{Collections.synchronizedMap}} and only {{put}} 
and {{clear}} are ever called on it, so {{PDFont.clearResources()}} is 
perfectly safe to call in a multithreaded environment. There's nothing wrong 
with, say, calling it after each PDF is processed.
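
To illustrate the point above, here is a minimal sketch of the pattern, not 
the actual PDFBox source. {{CmapCacheSketch}}, {{processDocument}} and {{run}} 
are hypothetical names made up for this example; only 
{{Collections.synchronizedMap}} and the {{put}}/{{clear}} calls are taken 
from the description of 1.8.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the static cache in 1.8's PDFont; not the real class.
class CmapCacheSketch {
    // Every call on the wrapper synchronizes on the wrapper itself,
    // so individual put() and clear() calls cannot interleave.
    static final Map<String, Object> CMAP_OBJECTS =
            Collections.synchronizedMap(new HashMap<String, Object>());

    // Plays the role of PDFont.clearResources() in this sketch.
    static void clearResources() {
        CMAP_OBJECTS.clear();
    }

    // Simulates processing one PDF: cache a few entries, then always clear.
    static void processDocument(int docId) {
        try {
            for (int i = 0; i < 10; i++) {
                CMAP_OBJECTS.put("doc" + docId + "-cmap" + i, new Object());
            }
        } finally {
            clearResources();
        }
    }

    // Processes `documents` simulated PDFs on `threads` worker threads
    // and returns the number of entries left in the cache afterwards.
    static int run(int threads, int documents) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int d = 0; d < documents; d++) {
            final int docId = d;
            pool.submit(() -> processDocument(docId));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return CMAP_OBJECTS.size();
    }

    public static void main(String[] args) {
        System.out.println("entries left: " + run(4, 1000));
    }
}
```

Because every worker finishes with a {{clear}}, the map does not grow without 
bound. A {{clear}} from one thread can discard entries another thread has 
just added, which is harmless for correctness in this sketch but does limit 
how much the cache can help.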

In 2.0 the entire mechanism has been removed because it wasn't effective, 
though it was safe.

Actually, in 2.0 we're now caching fonts statically, which has meant that we 
had to make them thread safe.
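
As a rough illustration of what a thread-safe static cache can look like, 
here is a sketch under stated assumptions. It is not how 2.0 actually 
implements font caching: {{StaticFontCacheSketch}}, {{CachedFont}}, {{get}} 
and {{LOADS}} are all hypothetical names, and the use of 
{{ConcurrentHashMap}} is my choice for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names are PDFBox classes.
class StaticFontCacheSketch {
    // Immutable after construction, so one instance can be shared by many threads.
    static final class CachedFont {
        final String name;

        CachedFont(String name) {
            this.name = name;
        }
    }

    // Static cache shared by every thread in the JVM.
    static final ConcurrentHashMap<String, CachedFont> CACHE =
            new ConcurrentHashMap<String, CachedFont>();

    // Counts how many times a font was actually constructed.
    static final AtomicInteger LOADS = new AtomicInteger();

    // ConcurrentHashMap.computeIfAbsent runs the loader at most once per key,
    // even when several threads ask for the same font at the same time.
    static CachedFont get(String name) {
        return CACHE.computeIfAbsent(name, n -> {
            LOADS.incrementAndGet();
            return new CachedFont(n);
        });
    }
}
```

The cached object has to be safe to share as well as the map, which is why 
{{CachedFont}} is immutable here. A static cache like this also lives as 
long as the class does, so anything placed in it stays on the heap unless it 
is explicitly evicted.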


> Memory leak with org.apache.pdfbox.pdmodel.font.PDFont#cmapObjects
> ------------------------------------------------------------------
>
>                 Key: PDFBOX-2200
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2200
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.6, 2.0.0
>            Reporter: Matthew Buckett
>             Fix For: 2.0.0
>
>
> We use Tika to extract text from a large number (10,000+) of PDFs in a 
> long-running JVM. After doing this for a while we started running short of 
> heap space. A heap dump shows that about 717MB of heap is retained through 
> org.apache.pdfbox.pdmodel.font.PDFont#cmapObjects and the hashmap has 18001 
> entries.
> PDFBOX-1009 looked to partially address this but it appears the symptoms are 
> still present. As a workaround I'm going to manually call 
> PDFont.clearResources() after indexing each document to prevent this 
> happening, but it would be better if I didn't have to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
