[
https://issues.apache.org/jira/browse/PDFBOX-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090222#comment-18090222
]
Maruan Sahyoun commented on PDFBOX-4951:
----------------------------------------
I have been following the discussion of the PR and some of the changes. I'd
support adding this as I see the benefit of it when it comes to handling
combining letters, which was the original intent, but also for handling
foreign scripts.
It looks like the groundwork for a clean module split is already 90% complete
because {{PDAbstractContentStream}}, {{PDAcroForm}}, and
{{AppearanceGeneratorHelper}} only interact with the interfaces rather than
concrete implementations.
However, placing {{GlyphLayoutProcessorAwt}} directly into the core module
introduces a hard dependency on {{java.desktop}} (via {{java.awt.*}}). For
users building minimal runtimes via {{jlink}}, running in restricted
headless server environments, or working on Android variants, keeping core free
of desktop dependencies is highly beneficial.
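To illustrate the concern, here is a minimal sketch of how code could detect at run time whether {{java.desktop}} is part of the runtime image at all. It assumes Java 9 or later ({{ModuleLayer}} is a JDK API); the class name {{DesktopModuleCheck}} and the method {{chooseLayoutBackend}} are hypothetical and not part of PDFBox or the PR.

```java
public class DesktopModuleCheck {

    /** True if the java.desktop module was resolved into the boot layer. */
    static boolean hasJavaDesktop() {
        return ModuleLayer.boot().findModule("java.desktop").isPresent();
    }

    /**
     * Hypothetical selection logic: only pick an AWT-based layout backend
     * when the desktop module is actually available.
     */
    static String chooseLayoutBackend() {
        return hasJavaDesktop() ? "awt" : "none";
    }

    public static void main(String[] args) {
        System.out.println("java.desktop present: " + hasJavaDesktop());
        System.out.println("layout backend: " + chooseLayoutBackend());
    }
}
```

On a {{jlink}} image built without {{java.desktop}} the check would report the module as absent, which is the situation in which a hard dependency from core would fail.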
Additionally, as discussed, there is a strong interest in potentially
supporting alternative text-shaping engines (like *Apache FOP*'s complex
script layout) down the line. Keeping the SPI decoupled from the implementation
allows us to easily introduce a {{pdfbox-layout-fop}} submodule later on
without altering the core engine.
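As a rough sketch of what such decoupling could look like, the following assumes that implementations are discovered through {{java.util.ServiceLoader}}; the PR may wire implementations differently. The interface {{GlyphLayoutProcessor}} is a simplified, hypothetical stand-in for the real SPI, and {{PassThroughProcessor}} and {{findProcessor}} are illustrative names only.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

public class LayoutSpiSketch {

    /** Hypothetical, simplified stand-in for the SPI that would stay in core. */
    public interface GlyphLayoutProcessor {
        String layout(String text);
    }

    /** Fallback used when no shaping module is on the classpath. */
    static final class PassThroughProcessor implements GlyphLayoutProcessor {
        @Override
        public String layout(String text) {
            return text; // no shaping: text is returned unchanged
        }
    }

    /**
     * Returns the first implementation registered via
     * META-INF/services (or a module's "provides" clause),
     * or the pass-through fallback if none is registered.
     */
    static GlyphLayoutProcessor findProcessor() {
        Iterator<GlyphLayoutProcessor> providers =
                ServiceLoader.load(GlyphLayoutProcessor.class).iterator();
        return providers.hasNext() ? providers.next() : new PassThroughProcessor();
    }

    public static void main(String[] args) {
        GlyphLayoutProcessor processor = findProcessor();
        System.out.println("Using: " + processor.getClass().getSimpleName());
    }
}
```

With this shape, an AWT-based or FOP-based submodule would only need to register its own implementation; core itself would never reference {{java.awt.*}}.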
*Proposed Adjustment*
Since the SPI pattern is already beautifully established here, could we split
the AWT-specific implementation into its own submodule?
Keep in pdfbox (Core):
- GlyphLayoutProcessorInterface
- ContentStreamForGlyphLayoutInterface
- GlyphsAndPositions
- The wiring hooks in PDAbstractContentStream / PDAcroForm
Move to a new {{pdfbox-layout-awt}} submodule:
- GlyphLayoutProcessorAwt
- GlyphLayoutFontLoaderAwt
This keeps the core module lightweight and headless-friendly, while allowing
users who need advanced shaping to simply pull in the {{pdfbox-layout-awt}}
dependency.
Other than that, +1 from my side.
> Sequences of DIN SPEC 91379 with combining letters are rendered incorrectly
> ---------------------------------------------------------------------------
>
> Key: PDFBOX-4951
> URL: https://issues.apache.org/jira/browse/PDFBOX-4951
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.21
> Reporter: Volker Kunert
> Priority: Major
> Attachments: DIN_SPEC_91379_Sequences-aa.pdf,
> DIN_SPEC_91379_Sequences-ab.pdf, DIN_SPEC_91379_Sequences-ac.pdf,
> DIN_SPEC_91379_Sequences.txt, DefaultScriptProcessor.java, DejaVuSans.ttf,
> DoGlyphLayoutBidi.pdf, DoGlyphLayoutDinSpec91379.pdf,
> DoGlyphLayoutDinSpec91379Form.pdf, DoGlyphPositionBengali.pdf,
> ExamplePdfboxFopPos-By-Tilman.pdf, ExamplePdfboxFopPos.java,
> ExamplePdfboxFopPos.pdf, ExamplePdfboxFopPosForm.java,
> ExamplePdfboxFopPosForm.pdf, FiraCode-Regular.ttf,
> FontForge-Lohit-Bengali.png, TestPdfbox.java, TestPdfboxFop2.java,
> TestPdfboxFop2.pdf, TestPdfboxJava2D.java, TestPdfboxJava2D.pdf, bidi-1.png,
> bidi-2.png, bidi.png, example-PDFBOX-3147-NotoSansThaiLooped-Regular.png,
> image-2026-05-23-16-16-53-442.png, image-2026-05-23-16-17-28-172.png,
> image-2026-05-26-16-49-45-529.png, ligatures-kerning.png,
> patch-2020-10-02.txt, pdfbox.patch, pdfbox.pdf, screenshot-1.png
>
>
> Accented Letters composed of Unicode base letter and combining accent are
> rendered wrong. E.g. with 0041 030B LATIN CAPITAL LETTER A WITH COMBINING
> DOUBLE ACUTE ACCENT the accent appears at the right hand side of the letter
> A, not above the letter A.
> The position is wrong for most of the sequences defined in the following spec:
> DIN SPEC 91379: Characters in Unicode for the electronic processing of names
> and data exchange in Europe; with digital attachment
> [https://www.xoev.de/downloads-2316#StringLatin]
> [https://www.din.de/de/wdc-beuth:din21:301228458]
>
> The correct rendering should look like the output of hb-view 2.6.8, see files
> DIN_SPEC_91379_Sequences*.pdf.
> The output of PDFBox is appended in pdfbox.pdf, which is created by running
> TestPdfbox.java. The sequences are read from file
> DIN_SPEC_91379_Sequences.txt.
>
> Font used for testing: NotoSansMono-Regular.ttf, see
> [https://www.google.com/get/noto/]
> download:
> [https://noto-website-2.storage.googleapis.com/pkgs/NotoSansMono-hinted.zip]
> See also FOP-2969
>