PDFBox performance issue: Encoding.java getCharacter() method tweak
---------------------------------------------------------------------
Key: PDFBOX-603
URL: https://issues.apache.org/jira/browse/PDFBOX-603
Project: PDFBox
Issue Type: Improvement
Components: Text extraction
Affects Versions: 0.8.0-incubator
Environment: All
Reporter: Mel Martinez
Attachments: Encoding.java
During parsing and text extraction, the Encoding.getCharacter(COSName) method is
invoked repeatedly.
It performs a string test up front on every call, even though that test is
needed only rarely. The method should be restructured slightly so that the test
runs later, only when the fast path fails. That is, it should succeed fast and
fail slow.
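A minimal sketch of the "succeed fast, fail slow" ordering follows. It is not the attached Encoding.java. The class GlyphLookupSketch, its small table, and the two slow-path rules (stripping a ".suffix" and decoding "uniXXXX" names) are hypothetical illustrations, and a plain String stands in for COSName. The point is only the ordering: one map lookup handles the common case, and the string inspection runs only after that lookup has missed.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration of "succeed fast, fail slow" for glyph-name lookup.
 * Names and rules here are assumptions, not the PDFBox API.
 */
class GlyphLookupSketch {
    // Hypothetical stand-in for an encoding's name-to-character table.
    private final Map<String, String> nameToCharacter = new HashMap<String, String>();

    GlyphLookupSketch() {
        nameToCharacter.put("A", "A");
        nameToCharacter.put("space", " ");
        nameToCharacter.put("eacute", "\u00E9");
    }

    /** Fast path: a single map lookup, with no string tests for known names. */
    String getCharacter(String name) {
        String character = nameToCharacter.get(name);
        if (character != null) {
            return character; // common case succeeds immediately
        }
        return getCharacterSlow(name); // string tests only after a miss
    }

    /** Slow path: string inspection that the fast path skips entirely. */
    private String getCharacterSlow(String name) {
        // Assumed rule for illustration: "A.swash" falls back to base name "A".
        int dot = name.indexOf('.');
        if (dot > 0) {
            String base = nameToCharacter.get(name.substring(0, dot));
            if (base != null) {
                return base;
            }
        }
        // Assumed rule for illustration: "uniXXXX" carries a hex UTF-16 code unit.
        if (name.startsWith("uni") && name.length() == 7) {
            try {
                return String.valueOf((char) Integer.parseInt(name.substring(3), 16));
            } catch (NumberFormatException e) {
                // not valid hex; fall through to "not found"
            }
        }
        return null; // assumption for this sketch: null means no mapping
    }

    public static void main(String[] args) {
        GlyphLookupSketch lookup = new GlyphLookupSketch();
        System.out.println("[" + lookup.getCharacter("space") + "]"); // fast path
        System.out.println(lookup.getCharacter("A.swash"));           // suffix rule
        System.out.println(lookup.getCharacter("uni0041"));           // uni rule
        System.out.println(lookup.getCharacter("notAGlyph"));         // miss
    }
}
```

Because most names in a typical document hit the table, moving the string work behind the lookup removes it from the hot path without changing what the method returns for any input.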
I'll post an attachment that rewrites the method slightly. The performance
gain is fairly significant.