[ https://issues.apache.org/jira/browse/PDFBOX-5808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841169#comment-17841169 ]
Tilman Hausherr commented on PDFBOX-5808:
-----------------------------------------

I redownloaded the font from its creator and committed that one:
https://www.glukfonts.pl/font.php?l=de&font=FoglihtenNo07
He also has a TTF version, but that file is much larger, so we'll stay with the OTF font.

I looked into what went wrong, and it's a positioning problem: we're searching for "_nnnn", but after a "hit" the position is after the "_". I prefer to fix that rather than use the new code, because the fix is shorter. However, I'm using your tests to make sure it is correct. I'm also using the string code, so that this tokenizer is used only for the core GSUB work and not for splitting on spaces. The returned strings sometimes have a "_" and sometimes not, but the existing code takes care of that.

While building I found a rendering difference. It appears in the build output but doesn't fail the build, because some differences are due to different JDK versions. The files are pdfbox/target/test-output/Devanagari.pdf-1.png and Devanagari.pdf-1.png-diff.png. However, this turns out to be an improvement.
> Add support for GSUB Lookup Type 3
> ----------------------------------
>
>                 Key: PDFBOX-5808
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5808
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: FontBox
>    Affects Versions: 3.0.2 PDFBox
>            Reporter: Fabrice Calafat
>            Priority: Major
>
> Add support for the lookup type 3, Alternate Substitution when handling GSUB:
> [https://learn.microsoft.com/en-us/typography/opentype/spec/gsub#AS]
> The first available substitution glyph can be used (as done in other
> libraries)
>
> Also, the current implementation of CompoundCharacterTokenizer doesn't
> account for collision in ligatures
> For example, if a font supports ligatures for _att_ and _en_, the current
> implementation will not tokenize properly for the word _attention._ This is
> because the regex implementation doesn't allow for a proper split
>
> I'll open a proposed implementation for the above
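
To make the "attention" case from the issue description concrete, here is a minimal sketch of a scanning tokenizer that resumes exactly at the end of each hit, so two adjacent ligature sequences are both found. It is not the PDFBox code: the class name LigatureTokenizerSketch and the method tokenize are hypothetical, and it works on plain letters instead of the "_nnnn" glyph id strings that the real tokenizer handles.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not CompoundCharacterTokenizer.
public class LigatureTokenizerSketch {

    // Splits text into tokens, emitting each known ligature sequence as its own token.
    static List<String> tokenize(String text, List<String> ligatures) {
        // Try longer sequences first so the longest candidate wins at a given position.
        List<String> sorted = new ArrayList<>(ligatures);
        sorted.sort(Comparator.comparingInt(String::length).reversed());

        List<String> tokens = new ArrayList<>();
        StringBuilder pending = new StringBuilder(); // characters outside any ligature
        int pos = 0;
        while (pos < text.length()) {
            String hit = null;
            for (String candidate : sorted) {
                // Empty candidates are skipped so the scan always advances.
                if (!candidate.isEmpty() && text.startsWith(candidate, pos)) {
                    hit = candidate;
                    break;
                }
            }
            if (hit == null) {
                pending.append(text.charAt(pos));
                pos++;
            } else {
                if (pending.length() > 0) {
                    tokens.add(pending.toString());
                    pending.setLength(0);
                }
                tokens.add(hit);
                // Resume exactly at the end of the hit, so an adjacent match is not skipped.
                pos += hit.length();
            }
        }
        if (pending.length() > 0) {
            tokens.add(pending.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "att" and "en" are adjacent in "attention"; both should become tokens.
        System.out.println(tokenize("attention", List.of("att", "en")));
    }
}
```

With the ligature sequences "att" and "en", the word "attention" is split into "att", "en" and "tion". The point of the sketch is only the position handling after a hit; how the real tokenizer delimits glyph ids is left out.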