Florent Guillaume created PDFBOX-1622:
-----------------------------------------

             Summary: TextNormalize init not thread-safe, may lead to infinite loop
                 Key: PDFBOX-1622
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1622
             Project: PDFBox
          Issue Type: Bug
          Components: Utilities
    Affects Versions: 1.0.0
            Reporter: Florent Guillaume
             Fix For: 1.8.3, 2.0.0


TextNormalize fills a static HashMap (DIACHASH) from a method 
(populateDiacHash) called by the TextNormalize constructor.

If the constructor is called from two threads at the same time, the HashMap 
may be written concurrently. HashMap is not synchronized, and concurrent 
writes can corrupt its internal structure and cause infinite loops.

CPU usage is at 100%, and jstack shows 4 threads all stuck at:

"Thread-2" prio=10 tid=0x00007f6e94499000 nid=0x347 runnable [0x00007f6e925d6000]
   java.lang.Thread.State: RUNNABLE
        at java.util.HashMap.put(HashMap.java:391)
        at org.apache.pdfbox.util.TextNormalize.populateDiacHash(TextNormalize.java:82)
        at org.apache.pdfbox.util.TextNormalize.<init>(TextNormalize.java:41)
        at org.apache.pdfbox.util.PDFTextStripper.<init>(PDFTextStripper.java:193)

A patch to fix this is attached; it simply moves the initialization to a 
static block.
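
For illustration, the static-block approach could look roughly like the 
sketch below. This is not the attached patch: the class name DiacriticTable, 
the method combiningFor and the three sample entries are hypothetical 
stand-ins. It relies only on the Java guarantee that a class's static 
initializer runs exactly once, under the class initialization lock, before 
any thread can use the class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TextNormalize; names and entries are illustrative.
final class DiacriticTable {

    // Populated once during class initialization and never modified afterwards.
    private static final Map<Integer, String> DIAC_HASH;

    static {
        // The JVM runs this block exactly once, so no two threads can ever
        // call put() on the map at the same time.
        Map<Integer, String> map = new HashMap<Integer, String>();
        map.put(0x0060, "\u0300"); // grave accent -> combining grave accent
        map.put(0x00B4, "\u0301"); // acute accent -> combining acute accent
        map.put(0x00A8, "\u0308"); // diaeresis -> combining diaeresis
        DIAC_HASH = Collections.unmodifiableMap(map);
    }

    // The constructor no longer writes to shared static state.
    DiacriticTable() {
    }

    // Read-only lookup; returns null when the code point has no entry.
    String combiningFor(int codePoint) {
        return DIAC_HASH.get(codePoint);
    }
}
```

With this shape, any number of threads can construct instances at the same 
time, because the only writes to the map happen during class initialization. 
Wrapping the map in Collections.unmodifiableMap is an extra precaution in 
this sketch, not something the report asks for.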

Please apply to the 1.8.3 and 2.0.0 branches.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira