[ 
https://issues.apache.org/jira/browse/PDFBOX-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated PDFBOX-2862:
--------------------------------
    Attachment: batch-process-warn-first10klines.log.bz2

These are the first 10k lines of the caught-exceptions log file.  The run was
against the full govdocs1 corpus, so the log also contains exceptions from
non-PDF files.

I stopped the batch run after ~200k files. Roughly a quarter of the files are 
PDFs, so I'd estimate ~50k PDFs were processed.

There were 22 of these ConcurrentModificationExceptions during that run.

> GlyphList doesn't appear to be thread safe in trunk...or user error?
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-2862
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2862
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Tim Allison
>         Attachments: batch-process-warn-first10klines.log.bz2
>
>
> This could be user error, but I'm getting the following when running trunk in 
> a multithreaded environment.
> {noformat}
> Caused by: java.util.ConcurrentModificationException
>         at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
>         at java.util.HashMap$EntryIterator.next(HashMap.java:962)
>         at java.util.HashMap$EntryIterator.next(HashMap.java:960)
>         at java.util.HashMap.putAllForCreate(HashMap.java:554)
>         at java.util.HashMap.<init>(HashMap.java:298)
>         at org.apache.pdfbox.pdmodel.font.encoding.GlyphList.<init>(GlyphList.java:114)
>         at org.apache.pdfbox.text.PDFTextStreamEngine.<init>(PDFTextStreamEngine.java:103)
>         at org.apache.pdfbox.text.PDFTextStripper.<init>(PDFTextStripper.java:196)
>         at org.apache.tika.parser.pdf.PDF2XHTML.<init>(PDF2XHTML.java:106)
>         at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:133)
>         at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:132)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
>         ... 16 more
> {noformat}
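
For context on the trace above: the exception is raised inside the HashMap
copy constructor (HashMap.<init> calling putAllForCreate), which iterates the
source map. That pattern suggests the source map was modified by another
thread while it was being copied. The sketch below is a generic illustration
of one way to guard such a copy. It is not PDFBox code and not the fix applied
for this issue; the class GuardedTable and its methods are hypothetical names
invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a shared name-to-unicode table; NOT PDFBox's GlyphList.
final class GuardedTable {
    private final Map<String, String> entries = new HashMap<>();

    // Writers take the same lock as snapshot(), so a copy never overlaps an update.
    synchronized void put(String name, String unicode) {
        entries.put(name, unicode);
    }

    // new HashMap<>(entries) iterates the source map; the lock keeps writers out
    // for the duration of that iteration.
    synchronized Map<String, String> snapshot() {
        return new HashMap<>(entries);
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedTable table = new GuardedTable();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                table.put("glyph" + i, "U+" + i);
            }
        });
        writer.start();
        for (int i = 0; i < 100; i++) {
            table.snapshot(); // copying an unguarded HashMap here would be a data race
        }
        writer.join();
        System.out.println(table.snapshot().size());
    }
}
```

Each caller gets its own copy from snapshot(), so later changes to the shared
table do not affect a copy that is already in use. If the table is never
modified after it is loaded, an unmodifiable map built once would avoid the
lock entirely.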


