Hello,

I am dealing with PDF files that were created using TeX, which seems to cause some specific problems. These are older papers from the 1990s; newer ones may be more standardised and present fewer problems.

1. German umlauts may or may not be recognised.
For "Hölder" I get "Ho¨lder" in one place and "H¨older" in another, within the same document. "Ho¨lder" would be correct Unicode if the diaeresis were the combining character (U+0308), but it is the non-combining variety (U+00A8). The same appears in the HTML version (here: ¨). The non-combining character is not a real problem in itself, but placing it after the letter in one case and before it in the other is. PDFBox 0.7.3 seems to produce "H¨older" consistently.

2. Some ligatures are lost: I get "de nition" for "definition" (the ligature "ﬁ" that stands for "fi" is replaced with a space). The same holds, for example, for all the words in "fix first satisfies defined finite" and many others. On the other hand, "reflecting", which contains the "ﬂ" ligature, is correctly resolved to "reflecting".
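
As a stopgap for item 1, the extracted text could probably be post-processed along the following lines. This is only a sketch of what I mean, not a fix for the extraction itself: the function name `repair_umlauts` is made up for illustration, and it assumes that a stray U+00A8 next to one of the vowels a, o, u (either case) always belongs to that vowel, checking the following letter first. It does nothing for item 2, since the ligature character is already gone from the output.

```python
import re
import unicodedata

# Assumption: only the German umlaut vowels are candidates for repair.
UMLAUT_BASES = "aouAOU"
SPACING_DIAERESIS = "\u00a8"    # non-combining diaeresis, as found in the output
COMBINING_DIAERESIS = "\u0308"  # combining diaeresis, placed after its base letter

_BEFORE = re.compile(SPACING_DIAERESIS + "([" + UMLAUT_BASES + "])")
_AFTER = re.compile("([" + UMLAUT_BASES + "])" + SPACING_DIAERESIS)


def repair_umlauts(text: str) -> str:
    """Attach a stray spacing diaeresis to the neighbouring vowel (heuristic)."""
    # Case "H¨older": the diaeresis comes before its base letter.
    text = _BEFORE.sub(lambda m: m.group(1) + COMBINING_DIAERESIS, text)
    # Case "Ho¨lder": the diaeresis comes after its base letter.
    text = _AFTER.sub(lambda m: m.group(1) + COMBINING_DIAERESIS, text)
    # Compose base letter + combining mark into the precomposed character.
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    for sample in ("Ho\u00a8lder", "H\u00a8older"):
        print(repair_umlauts(sample))
```

The heuristic can go wrong when the diaeresis sits between two vowels (it then always picks the following one), so it is no substitute for a consistent order in the extracted text.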

Is there any chance this can be fixed?

All the best
Thomas 
