[
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446682#comment-16446682
]
Palash Ray commented on PDFBOX-4189:
------------------------------------
I would like to share some thoughts on the approach I took. I read all the
GSUB tables. Then, based on the language from the ScriptTable, I first
determine whether to support GSUB at all. Right now, the supported languages
(only Bengali for now) are hard-coded in the
*GlyphSubstitutionDataExtractor* class. Later, we will need to figure out a
better way. I did this to be safe and to avoid breaking any existing
features, for example *vert*.
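As a rough illustration of the language check described above, here is a minimal sketch. The class name, method name and the idea of passing script tags as plain strings are hypothetical and are not taken from the patch. The only fixed facts are the OpenType script tags registered for Bengali, "beng" and "bng2".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, NOT the actual GlyphSubstitutionDataExtractor code.
final class GsubScriptGate {

    // OpenType script tags for Bengali: "beng" (older shaping model)
    // and "bng2" (newer shaping model). Only these are allowed for now.
    private static final Set<String> SUPPORTED_SCRIPT_TAGS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("beng", "bng2")));

    private GsubScriptGate() {
    }

    // Returns true if at least one script tag found in the font's GSUB
    // ScriptList is on the allow-list. Fonts without such a tag keep the
    // existing behaviour, so features like "vert" are not affected.
    static boolean isGsubSupported(List<String> scriptTagsInFont) {
        for (String tag : scriptTagsInFont) {
            if (SUPPORTED_SCRIPT_TAGS.contains(tag)) {
                return true;
            }
        }
        return false;
    }
}
```

With such a gate, a font that only declares "latn" would skip the new GSUB processing entirely, which is the safety property mentioned above.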
Next, Microsoft has language-specific guidelines on how to handle the
various features that are defined in the FeatureTable in GSUB. For Bengali,
they are here:
[https://docs.microsoft.com/en-us/typography/script-development/bengali#reor]
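To make the idea of applying features in a prescribed order more concrete, here is a minimal sketch. The feature order below is my reading of the Microsoft Bengali guidelines and should be verified against the linked page. The class name, the map-based data layout and the greedy longest-match strategy are assumptions for illustration only, and the sketch models only one-to-one and many-to-one (ligature-style) substitutions, not the other GSUB lookup types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the code from the patch.
final class FeatureOrderSketch {

    // Assumed order: localized forms, basic shaping forms, presentation forms.
    static final List<String> BENGALI_FEATURE_ORDER = Arrays.asList(
            "locl", "nukt", "akhn", "rphf", "blwf", "half", "pstf", "vatu", "cjct",
            "init", "pres", "abvs", "blws", "psts", "haln", "calt");

    private FeatureOrderSketch() {
    }

    // gsubData maps a feature tag to its rules: glyph-ID sequence -> replacement glyph ID.
    // Features are applied one after another, each on the output of the previous one.
    static List<Integer> applyFeatures(List<Integer> glyphs,
            Map<String, Map<List<Integer>, Integer>> gsubData) {
        List<Integer> current = glyphs;
        for (String feature : BENGALI_FEATURE_ORDER) {
            Map<List<Integer>, Integer> rules = gsubData.get(feature);
            if (rules != null) {
                current = substitute(current, rules);
            }
        }
        return current;
    }

    // At each position, try the longest candidate sequence first (greedy match).
    private static List<Integer> substitute(List<Integer> glyphs,
            Map<List<Integer>, Integer> rules) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < glyphs.size()) {
            boolean matched = false;
            for (int end = glyphs.size(); end > i; end--) {
                Integer replacement = rules.get(glyphs.subList(i, end));
                if (replacement != null) {
                    out.add(replacement);
                    i = end;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // No rule starts here: keep the glyph unchanged.
                out.add(glyphs.get(i));
                i++;
            }
        }
        return out;
    }
}
```

Because each feature runs on the output of the previous one, a glyph produced by an earlier feature (for example "akhn") can take part in a substitution of a later feature (for example "half").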
In the *PDPageContentStream*, right now, I am just hard-coding the Bengali
implementation of GSUB. Again, here we need to figure out a way to handle this
gracefully.
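One possible direction for removing the hard-coding, offered only as a sketch: look up a script-specific shaper by script tag and fall back to the unmodified glyphs when none is registered. The interface and class names below are hypothetical. Nothing like this exists in the patch, and it is not an existing PDFBox API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction: one implementation per script (Bengali, later others).
interface ScriptShaper {
    List<Integer> shape(List<Integer> glyphIds);
}

// Hypothetical registry keyed by OpenType script tag, e.g. "bng2".
final class ShaperRegistry {

    private final Map<String, ScriptShaper> shapers = new HashMap<>();

    void register(String scriptTag, ScriptShaper shaper) {
        shapers.put(scriptTag, shaper);
    }

    // Scripts without a registered shaper are passed through unchanged,
    // so existing behaviour stays as it is.
    List<Integer> shape(String scriptTag, List<Integer> glyphIds) {
        ScriptShaper shaper = shapers.get(scriptTag);
        return shaper == null ? glyphIds : shaper.shape(glyphIds);
    }
}
```

The content-stream code would then only talk to the registry, and supporting another script would mean registering one more shaper.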
The following features are not yet supported:
# Copy-pasting PDF text works only partially, namely for characters that have
not been GSUB-processed. Imho, this feature is not that important.
# Right now, certain characters are still positioned incorrectly. I hope to
fix this soon. This is a very important feature.
To show how well GSUB is currently being applied, in the example
*BengaliPdfGenerationHelloWorld*, I have added some difficult text on the 1st
page. On the 2nd page, I have embedded an image showing how this text should
actually look. As I add the missing features, the 2 pages should come to look
alike.
Hope this helps in the review process.
Thanks,
Palash.
> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -----------------------------------------------------------------------------
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
> Issue Type: New Feature
> Components: FontBox, PDModel
> Reporter: Palash Ray
> Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph
> substitution. The GSUB table has been read and used effectively to replace
> some compound words with their respective Glyphs. All tests are passing. I
> have tested this for the Bengali font. Please review these changes and let me
> know if it makes sense to incorporate these.