Misplaced text
--------------

                 Key: PDFBOX-624
                 URL: https://issues.apache.org/jira/browse/PDFBOX-624
             Project: PDFBox
          Issue Type: Bug
          Components: FontBox, Text extraction, Utilities
    Affects Versions: 1.0.1
            Reporter: Villu Ruusmann
            Priority: Critical


Thomas Fischer reported to [email protected] that 
org.apache.pdfbox.ExtractText interchanges typographic ligatures "fi" and "fl". 
The sample document "documenta_math.pdf" was created using TeX and AFPL 
Ghostscript 6.50.

I tried to reproduce this problem with PDFBox 1.0.1-SNAPSHOT. The "fi" ligature 
is extracted correctly (i.e. text extraction yields "finite" and "infinite", not 
"flnite" and "inflnite"), but the horizontal placement of the extracted text is 
badly off. Please see the PDF text extraction result "documenta_math.txt" and 
the PDF rendering result "documenta_math_page4.png".
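
For anyone who wants to check other extraction results for the reported 
ligature swap, here is a minimal sketch of such a check. It is not part of 
PDFBox: the class name LigatureCheck and both method names are hypothetical, 
and the word list is just the two words quoted above. It assumes the extracted 
text may contain either plain ASCII letters or the Unicode ligature characters 
U+FB01 and U+FB02, so it normalizes with NFKC before searching.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a PDFBox class.
public class LigatureCheck {

    // NFKC expands compatibility characters such as U+FB01 and U+FB02
    // into the plain letter pairs "fi" and "fl".
    static String expandLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    // Returns the misspelled forms found in the extracted text, i.e. each
    // expected word with "fi" replaced by "fl" (for example "flnite").
    static List<String> findSwapped(String extracted, List<String> expectedWords) {
        String normalized = expandLigatures(extracted);
        List<String> hits = new ArrayList<>();
        for (String word : expectedWords) {
            String swapped = word.replace("fi", "fl");
            if (!swapped.equals(word) && normalized.contains(swapped)) {
                hits.add(swapped);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("finite", "infinite");
        String good = "a \uFB01nite set and an in\uFB01nite set";
        String bad = "a \uFB02nite set and an in\uFB02nite set";
        System.out.println(findSwapped(good, expected)); // prints []
        System.out.println(findSwapped(bad, expected));  // prints [flnite, inflnite]
    }
}
```

In practice the first argument would be the contents of a file such as 
"documenta_math.txt". The check only covers the ligature swap. It says nothing 
about the horizontal misplacement.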

The cause of the horizontal text misplacement is not yet known. It could 
affect any PDF document that has been created using TeX.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.