Misplaced text
--------------
Key: PDFBOX-624
URL: https://issues.apache.org/jira/browse/PDFBOX-624
Project: PDFBox
Issue Type: Bug
Components: FontBox, Text extraction, Utilities
Affects Versions: 1.0.1
Reporter: Villu Ruusmann
Priority: Critical
Thomas Fischer reported to [email protected] that
org.apache.pdfbox.ExtractText interchanges the typographic ligatures "fi" and "fl".
The sample document "documenta_math.pdf" was created using TeX and AFPL
Ghostscript 6.50.
I used PDFBox 1.0.1-SNAPSHOT to verify this problem. The "fi" ligature behaves
correctly (i.e. text extraction yields "finite" and "infinite", not "flnite" and
"inflnite"), but the overall text layout is badly broken, with text misplaced
horizontally. Please see the PDF text extraction result "documenta_math.txt" and
the PDF rendering result "documenta_math_page4.png".
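
To check an extraction result for the reported ligature swap, a small sketch
along the following lines may help. It is not part of PDFBox and uses only the
Java standard library. The class and method names (LigatureCheck,
expandLigatures, hasSwappedLigature) are hypothetical, and it assumes the
extracted text has already been read into a String. It only looks for the
misspelling "flnite" quoted above, so it is a spot check and not a general
detector.

```java
import java.text.Normalizer;

public class LigatureCheck {

    // Expand compatibility ligatures such as U+FB01 (fi) and U+FB02 (fl)
    // into plain letters, so both encodings of the text compare equal.
    static String expandLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    // Returns true if the text contains "flnite", the misspelling that
    // results when an "fi" ligature is extracted as "fl" in "finite".
    static boolean hasSwappedLigature(String extracted) {
        return expandLigatures(extracted).contains("flnite");
    }

    public static void main(String[] args) {
        // Correct extraction: the fi ligature expands to "fi".
        System.out.println(hasSwappedLigature("in\uFB01nite"));
        // Swapped extraction: the fl ligature appears where fi was expected.
        System.out.println(hasSwappedLigature("in\uFB02nite"));
    }
}
```

Normalizing first means the check works whether the extractor emits the
ligature code points or the already expanded letters.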
The cause of the horizontal text misplacement is not yet known. It could
affect all PDF documents that were created using TeX.