[jira] [Commented] (PDFBOX-4951) Sequences of DIN SPEC 91379 with combining letters are rendered incorrectly

Tilman Hausherr (Jira) Sat, 30 May 2026 06:01:07 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084568#comment-18084568
 ]


Tilman Hausherr commented on PDFBOX-4951:
-----------------------------------------

Thoughts / Questions / Comments:
- Would it be possible to just pass the processor and hide this double loading 
of the font inside the PDFBox library code? I'm trying to keep it simple for 
users. Ideally things should "just work". If the font has to be loaded twice, 
people may end up loading two different fonts.
- Let's say people want a different processor. Could you create a 
FontProcessorInterface that has the minimal features?
- Would this extra loading of the font work with TTC files?
- Is it correct that Arabic works nicely by default but Bengali needs 
activation of ligatures?
- Please replace <p/> with <p>.
- showGlyphsWithPositioning() is package local, isn't it? Or do we want to make 
this accessible for people who make their own?
- "greek letters extended" looks like this: Ά Έ Ή Ί Ό
- Can we get rid of "showTextPDType0Font(GlyphVector glyphVector, ..." and keep 
this to GlyphLayoutProcessor? This way we won't have AWT in 
PDAbstractContentStream.
- I looked at my old comments here, and it seems I have changed my mind since 
2020. What I remember is that it was too much work and I didn't have enough 
free time. However, I see that I had already noticed the double loading of the 
font.
- Would this work with Thai?
- ifMixedThenDivideTextAndShow() should be split so that it either does 
something, or returns something. Currently it does both and this is confusing.
- Assuming you know a lot about this part of AWT, does it also contain 
something for vertical fonts? We use GSUB for replacement of vertical glyphs.
- I'll run another Copilot review later, which is likely to be uncomfortable.
- If we take this (I like it, and we have often been asked about complex 
scripts), we may need an ICLA.
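
To illustrate the point about ifMixedThenDivideTextAndShow(), here is a minimal sketch of separating the "returns something" part from the "does something" part. It is not the code from the patch. It assumes that "mixed" refers to mixed text direction, which the comment above does not state, and it uses the JDK's java.text.Bidi to find the runs. The names DirectionalRuns, Run, split() and show() are hypothetical, and System.out stands in for writing to a content stream.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

class DirectionalRuns {
    /** Hypothetical value type: one slice of the text with a single direction. */
    static final class Run {
        final String text;
        final boolean rightToLeft;

        Run(String text, boolean rightToLeft) {
            this.text = text;
            this.rightToLeft = rightToLeft;
        }
    }

    /** Query only: returns the runs in logical order and has no side effects. */
    static List<Run> split(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        List<Run> runs = new ArrayList<>();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            String part = text.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // An odd embedding level means the run is right-to-left.
            runs.add(new Run(part, (bidi.getRunLevel(i) & 1) == 1));
        }
        return runs;
    }

    /** Command only: outputs each run and returns nothing. */
    static void show(List<Run> runs) {
        for (Run run : runs) {
            System.out.println((run.rightToLeft ? "RTL: " : "LTR: ") + run.text);
        }
    }

    public static void main(String[] args) {
        // Latin text followed by two Hebrew letters (alef, bet).
        show(split("abc \u05D0\u05D1"));
    }
}
```

With this shape the caller decides what to do with the runs, and the splitting can be tested without a content stream.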


> Sequences of DIN SPEC 91379 with combining letters are rendered incorrectly
> ---------------------------------------------------------------------------
>
>                 Key: PDFBOX-4951
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4951
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.21
>            Reporter: Volker Kunert
>            Priority: Major
>         Attachments: DIN_SPEC_91379_Sequences-aa.pdf, 
> DIN_SPEC_91379_Sequences-ab.pdf, DIN_SPEC_91379_Sequences-ac.pdf, 
> DIN_SPEC_91379_Sequences.txt, DefaultScriptProcessor.java, DejaVuSans.ttf, 
> DoGlyphLayoutBidi.pdf, DoGlyphLayoutDinSpec91379.pdf, 
> DoGlyphLayoutDinSpec91379Form.pdf, DoGlyphPositionBengali.pdf, 
> ExamplePdfboxFopPos-By-Tilman.pdf, ExamplePdfboxFopPos.java, 
> ExamplePdfboxFopPos.pdf, ExamplePdfboxFopPosForm.java, 
> ExamplePdfboxFopPosForm.pdf, FiraCode-Regular.ttf, 
> FontForge-Lohit-Bengali.png, TestPdfbox.java, TestPdfboxFop2.java, 
> TestPdfboxFop2.pdf, TestPdfboxJava2D.java, TestPdfboxJava2D.pdf, bidi-1.png, 
> bidi-2.png, bidi.png, image-2026-05-23-16-16-53-442.png, 
> image-2026-05-23-16-17-28-172.png, image-2026-05-26-16-49-45-529.png, 
> ligatures-kerning.png, patch-2020-10-02.txt, pdfbox.patch, pdfbox.pdf, 
> screenshot-1.png
>
>
> Accented letters composed of a Unicode base letter and a combining accent 
> are rendered incorrectly. E.g. with U+0041 U+030B (LATIN CAPITAL LETTER A 
> followed by COMBINING DOUBLE ACUTE ACCENT) the accent appears at the 
> right-hand side of the letter A, not above it.
> The position is wrong for most of the sequences defined in the following spec:
> DIN SPEC 91379: Characters in Unicode for the electronic processing of names 
> and data exchange in Europe; with digital attachment
>  [https://www.xoev.de/downloads-2316#StringLatin]
>  [https://www.din.de/de/wdc-beuth:din21:301228458]
>  
> The correct rendering should look like the output of hb-view 2.6.8, see files 
> DIN_SPEC_91379_Sequences*.pdf.
> The output of PDFBox is appended in pdfbox.pdf, which is created by running 
> TestPdfbox.java. The sequences are read from file 
> DIN_SPEC_91379_Sequences.txt.
>  
> Font used for testing: NotoSansMono-Regular.ttf, see 
> [https://www.google.com/get/noto/] 
> download: 
> [https://noto-website-2.storage.googleapis.com/pkgs/NotoSansMono-hinted.zip]
>  See also FOP-2969
>  
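
A small sketch of why sequences like the one in the description depend on glyph positioning rather than on a precomposed character: Unicode has no precomposed "A with double acute", so NFC normalization leaves U+0041 U+030B as two code points, whereas O followed by U+030B composes to U+0150. The class and method names are hypothetical; only java.text.Normalizer from the JDK is used.

```java
import java.text.Normalizer;

class CombiningSequenceDemo {
    /** Returns the number of code points that remain after NFC normalization. */
    static int nfcCodePointCount(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length());
    }

    public static void main(String[] args) {
        // A + COMBINING DOUBLE ACUTE ACCENT: no precomposed form exists.
        System.out.println(nfcCodePointCount("A\u030B")); // 2
        // O + COMBINING DOUBLE ACUTE ACCENT: composes to U+0150.
        System.out.println(nfcCodePointCount("O\u030B")); // 1
    }
}
```

For the first case the renderer has to place the accent glyph itself, typically from the font's mark positioning data, because normalizing the text cannot remove the combining mark.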



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

