[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446682#comment-16446682 ]
Palash Ray commented on PDFBOX-4189:
------------------------------------

I would like to share some thoughts on the approach I took.

I am reading all the GSUB tables. Then, based on the Language from the ScriptTable, I first determine whether GSUB will be supported at all. Right now, the supported languages (only Bengali so far) are hard-coded in the *GlyphSubstitutionDataExtractor* class. Later, we will need to figure out a better way. I did this to be safe and to avoid breaking any existing features, for example *vert*.

Next, Microsoft has language-specific guidelines on how to handle the various features that are defined in the FeatureTable in GSUB. For Bengali, it is here: [https://docs.microsoft.com/en-us/typography/script-development/bengali#reor]

In *PDPageContentStream*, right now, I am just hard-coding the Bengali implementation of GSUB. Again, we need to figure out a way to handle this gracefully.

The following are still not supported:

# Copy-pasting PDF text works only partially: it works for the characters which have not been GSUB-processed. IMHO, this feature is not that important.
# Certain characters are still placed wrongly. I hope to implement this soon. This is a very important feature.

To show how well GSUB is being applied, in the example *BengaliPdfGenerationHelloWorld* I have added some difficult text on the 1st page. On the 2nd page, I have embedded an image of how this text should actually look. As I add the missing features, these 2 pages should come to look alike.

Hope this helps in the review process.

Thanks,
Palash.
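For reviewers, the flow described above (check the script against a hard-coded list, then apply the GSUB features in a fixed order) can be sketched roughly as follows. This is only a simplified illustration, not the code in the patch: the class *GsubSketch*, its methods, and the rule maps are all hypothetical and are not PDFBox or FontBox API. The script tags "beng" and "bng2" are the OpenType tags for Bengali, and the feature order shown is assumed to be an illustrative subset of the one in the Microsoft guideline. Real GSUB lookups have several types; here each feature is reduced to a plain "glyph sequence to single glyph" map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names exist in PDFBox or FontBox. */
final class GsubSketch {

    // Assumption: only Bengali is enabled, identified by its OpenType script tags.
    static final Set<String> SUPPORTED_SCRIPT_TAGS =
            new HashSet<>(Arrays.asList("beng", "bng2"));

    // Assumption: illustrative subset of feature tags, in application order.
    static final List<String> FEATURE_ORDER = Arrays.asList(
            "locl", "nukt", "akhn", "rphf", "blwf", "half", "pstf", "vatu", "cjct");

    /** True if the font declares at least one script that is on the hard-coded list. */
    static boolean isSupported(Set<String> scriptTagsInFont) {
        for (String tag : scriptTagsInFont) {
            if (SUPPORTED_SCRIPT_TAGS.contains(tag)) {
                return true;
            }
        }
        return false;
    }

    /** Applies one feature's rules left to right, preferring the longest matching sequence. */
    static List<Integer> applyFeature(List<Integer> glyphs,
                                      Map<List<Integer>, Integer> rules) {
        int maxLen = 0;
        for (List<Integer> key : rules.keySet()) {
            maxLen = Math.max(maxLen, key.size());
        }
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < glyphs.size()) {
            boolean matched = false;
            for (int len = Math.min(maxLen, glyphs.size() - i); len >= 1; len--) {
                Integer substitute = rules.get(glyphs.subList(i, i + len));
                if (substitute != null) {
                    out.add(substitute);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.add(glyphs.get(i));
                i++;
            }
        }
        return out;
    }

    /** Returns the glyphs unchanged for unsupported scripts, so other features are not affected. */
    static List<Integer> substitute(List<Integer> glyphs,
                                    Set<String> scriptTagsInFont,
                                    Map<String, Map<List<Integer>, Integer>> rulesByFeature) {
        if (!isSupported(scriptTagsInFont)) {
            return glyphs;
        }
        List<Integer> current = glyphs;
        for (String feature : FEATURE_ORDER) {
            Map<List<Integer>, Integer> rules = rulesByFeature.get(feature);
            if (rules != null) {
                current = applyFeature(current, rules);
            }
        }
        return current;
    }
}
```

Because the features run in sequence, the output glyph of an earlier feature (say *akhn*) can be part of the input sequence of a later one (say *half*). Keeping the order and the supported-script list as data rather than code is one possible way to move away from the hard-coding later.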
> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -----------------------------------------------------------------------------
>
>                 Key: PDFBOX-4189
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4189
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: FontBox, PDModel
>            Reporter: Palash Ray
>            Priority: Major
>         Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph
> substitution. The GSUB table has been read and used effectively to replace
> some compound words with their respective Glyphs. All tests are passing. I
> have tested this for the Bengali font. Please review these changes and let me
> know if it makes sense to incorporate these.