[ https://issues.apache.org/jira/browse/PDFBOX-970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Hewson updated PDFBOX-970:
-------------------------------
    Component/s:     (was: FontBox)
                 Text extraction

> TeX-created ligatures and umlauts are not recognised
> ----------------------------------------------------
>
>                 Key: PDFBOX-970
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-970
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.5.0
>         Environment: Mac OS X 10.6.6, Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
>            Reporter: Thomas Fischer
>              Labels: textExtraction
>         Attachments: A Python Library for Provenance Recording and Querying.txt, A Python Library for Provenance Recording and Querying.txt, Test.pdf, Test.pdf, Test2-1.6.txt, Test2.1.4.txt, Test2.pdf
>
>
> Ligatures in a TeX-created document are lost in v. 1.5, although they were recognised by v. 1.4, e.g.
>
> 1.4         1.5
> official    ocial
> effort      e ort
> fields      elds
> first       rst
>
> In addition, German umlauts (ä, ö, ü) are represented as ( a, o, u),
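
For readers hitting similar symptoms, the following is a minimal sketch of a post-processing step on already-extracted text. It is not the fix for this issue and it does not use any PDFBox API. It assumes (and this is only an assumption, the report does not confirm it) that the extractor emits Unicode ligature code points such as U+FB01 and combining diaereses such as U+0308 rather than dropping them. The class name `LigatureNormalizer` and method `normalizeExtractedText` are hypothetical. If the glyphs are dropped during extraction, as in the table above, no post-processing can recover them.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of PDFBox.
public class LigatureNormalizer {

    // NFKC expands compatibility ligatures (U+FB00..U+FB04 become "ff", "fi", ...)
    // and composes a base letter followed by a combining mark (a + U+0308 becomes U+00E4).
    public static String normalizeExtractedText(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Assumed extractor output: ligature code points still present in the text.
        System.out.println(normalizeExtractedText("o\uFB03cial"));   // ffi ligature
        System.out.println(normalizeExtractedText("\uFB01elds"));    // fi ligature
        System.out.println(normalizeExtractedText("fu\u0308r"));     // u + combining diaeresis
    }
}
```

This only helps when the ligature or accent survives extraction as a code point. A spacing diaeresis (U+00A8) placed before the letter, which TeX-generated fonts may produce, is not composed by this step.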