[ 
https://issues.apache.org/jira/browse/PDFBOX-5808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841169#comment-17841169
 ] 

Tilman Hausherr commented on PDFBOX-5808:
-----------------------------------------

I re-downloaded the font from its creator and committed that version:
https://www.glukfonts.pl/font.php?l=de&font=FoglihtenNo07
He also offers a TTF version, but that file is much larger, so we'll stay with
the OTF font.

I looked into what went wrong, and it's a positioning problem: we're searching
for "_nnnn", but after a hit the position is after the "_". I prefer to fix
that rather than use the new code, because the fix is shorter. However, I'm
using your tests to make sure it is correct, and I'm also using the string
code, so that this tokenizer is used only for its core GSUB business and not
for splitting on spaces.
The returned strings sometimes contain "_" and sometimes don't, but the
existing code takes care of that.
While building I found a rendering difference (it appears in the build output
but doesn't fail the build, because some differences are caused by different
JDK versions); see pdfbox/target/test-output/Devanagari.pdf-1.png and
Devanagari.pdf-1.png-diff.png. However, this turns out to be an improvement.
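
To illustrate the positioning point, here is a minimal sketch. It is not the
committed fix and not PDFBox API: the class name GlyphTokenizerSketch, the
method tokenize and the glyph ids are all made up for the example. It assumes
glyph ids are encoded as an underscore-prefixed string such as "_1_2_2" and
that compounds are given in the same form. After a hit, the scan advances by
the token length only, so it lands on the next "_" rather than after it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GlyphTokenizerSketch
{
    // Hypothetical helper: splits an underscore-prefixed glyph id string
    // into known compounds and single ids, scanning left to right.
    public static List<String> tokenize(String text, List<String> compounds)
    {
        // Try longer compounds first so the longest match wins.
        List<String> sorted = new ArrayList<>(compounds);
        sorted.sort(Comparator.comparingInt(String::length).reversed());

        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length())
        {
            String hit = null;
            for (String c : sorted)
            {
                int end = pos + c.length();
                // A hit must end on an id boundary: end of text or the next "_".
                if (!c.isEmpty() && text.startsWith(c, pos)
                        && (end == text.length() || text.charAt(end) == '_'))
                {
                    hit = c;
                    break;
                }
            }
            if (hit == null)
            {
                // No compound starts here: emit the single id up to the next "_".
                int next = text.indexOf('_', pos + 1);
                hit = text.substring(pos, next < 0 ? text.length() : next);
            }
            tokens.add(hit);
            // Advance by the token length only, so pos lands ON the next "_".
            pos += hit.length();
        }
        return tokens;
    }
}
```

With the made-up ids a=1, t=2, e=3, n=4, i=5, o=6, the word "attention" is
"_1_2_2_3_4_2_5_6_4". Given the compounds "_1_2_2" (att) and "_3_4" (en), the
sketch yields both compounds followed by the single ids "_2", "_5", "_6", "_4".
The boundary check keeps "_1_2" from matching inside "_1_23".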

> Add support for GSUB Lookup Type 3
> ----------------------------------
>
>                 Key: PDFBOX-5808
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5808
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: FontBox
>    Affects Versions: 3.0.2 PDFBox
>            Reporter: Fabrice Calafat
>            Priority: Major
>
> Add support for lookup type 3, Alternate Substitution, when handling GSUB:
> [https://learn.microsoft.com/en-us/typography/opentype/spec/gsub#AS]
> The first available substitution glyph can be used (as done in other
> libraries).
>  
> Also, the current implementation of CompoundCharacterTokenizer doesn't
> account for collisions between ligatures.
> For example, if a font supports ligatures for _att_ and _en_, the current
> implementation will not tokenize the word _attention_ properly. This is
> because the regex implementation doesn't allow for a proper split.
>  
> I'll open a proposed implementation for the above
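
As a rough illustration of the "first available substitution glyph" rule
mentioned in the quoted issue, here is a minimal sketch. The class
AlternateSubstitutionSketch, its methods and the glyph ids are hypothetical
and are not FontBox API; the sketch assumes the alternate sets have already
been read from the font into a plain map.

```java
import java.util.HashMap;
import java.util.Map;

public class AlternateSubstitutionSketch
{
    // Hypothetical data model: input glyph id -> its alternate glyph ids, in font order.
    private final Map<Integer, int[]> alternateSets = new HashMap<>();

    public void addAlternateSet(int glyphId, int... alternates)
    {
        alternateSets.put(glyphId, alternates.clone());
    }

    // Returns the first alternate when one exists, otherwise the glyph unchanged.
    public int substitute(int glyphId)
    {
        int[] alternates = alternateSets.get(glyphId);
        return (alternates == null || alternates.length == 0) ? glyphId : alternates[0];
    }
}
```

Glyphs without an alternate set pass through unchanged, which mirrors how a
lookup leaves uncovered glyphs alone.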


