[ 
https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446682#comment-16446682
 ] 

Palash Ray commented on PDFBOX-4189:
------------------------------------

I would like to explain the approach I took. I read all the GSUB tables. 
Then, based on the language in the ScriptTable, I first determine whether to 
support GSUB at all. Right now, the supported languages (only Bengali for 
now) are hard-coded in the *GlyphSubstitutionDataExtractor* class. Later, we 
will need to find a better way. I did this to be safe and to avoid breaking 
any existing features, for example *vert*.
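
As a rough illustration of the gating described above (not the code in the 
patch): the sketch below assumes the script tags found in the font's GSUB table 
are available as plain strings. The class and method names are hypothetical. 
The tags "beng" and "bng2" are the OpenType script tags for Bengali.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the actual GlyphSubstitutionDataExtractor code.
class GsubScriptGate
{
    // OpenType script tags for Bengali (older and newer shaping model).
    private static final Set<String> SUPPORTED_SCRIPT_TAGS =
            new HashSet<>(Arrays.asList("beng", "bng2"));

    // Returns true only if at least one script tag in the font is supported.
    // Fonts without such a tag keep the existing, non-GSUB behaviour.
    static boolean isGsubSupported(Collection<String> scriptTagsInFont)
    {
        for (String tag : scriptTagsInFont)
        {
            if (SUPPORTED_SCRIPT_TAGS.contains(tag))
            {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        System.out.println(isGsubSupported(Arrays.asList("DFLT", "bng2")));
        System.out.println(isGsubSupported(Arrays.asList("DFLT", "latn")));
    }
}
```

Keeping the set of supported tags in one place means that adding another 
language later would only touch this set, at least in this simplified form.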

Next, Microsoft publishes language-specific guidelines on how to handle the 
various features that are defined in the FeatureTable in GSUB. For Bengali, 
they are here: 
[https://docs.microsoft.com/en-us/typography/script-development/bengali#reor]
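
To make "handling the features" more concrete, here is a minimal sketch of 
applying substitutions one feature at a time, in a caller-supplied order. It is 
an assumption-laden simplification: it only covers many-to-one (ligature-style) 
substitution, the class and method names are hypothetical, and the glyph IDs 
are made up. "akhn" and "half" are real OpenType feature tags, but the 
authoritative list and order of features should be taken from the guideline 
linked above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: applies ligature-style substitutions feature by feature.
class OrderedFeatureSubstitution
{
    // Replaces each matching glyph sequence with its substitute glyph,
    // preferring the longest matching sequence at every position.
    static List<Integer> applyFeature(List<Integer> glyphs,
            Map<List<Integer>, Integer> table)
    {
        int longest = 0;
        for (List<Integer> key : table.keySet())
        {
            longest = Math.max(longest, key.size());
        }
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < glyphs.size())
        {
            boolean replaced = false;
            for (int len = Math.min(longest, glyphs.size() - i); len >= 1; len--)
            {
                Integer substitute = table.get(glyphs.subList(i, i + len));
                if (substitute != null)
                {
                    out.add(substitute);
                    i += len;
                    replaced = true;
                    break;
                }
            }
            if (!replaced)
            {
                // No sequence starts here: keep the glyph unchanged.
                out.add(glyphs.get(i));
                i++;
            }
        }
        return out;
    }

    // Runs the features in the given order; the output of one feature is the
    // input of the next. Features missing from the font are skipped.
    static List<Integer> applyInOrder(List<Integer> glyphs,
            List<String> featureOrder,
            Map<String, Map<List<Integer>, Integer>> gsub)
    {
        List<Integer> current = glyphs;
        for (String feature : featureOrder)
        {
            Map<List<Integer>, Integer> table = gsub.get(feature);
            if (table != null)
            {
                current = applyFeature(current, table);
            }
        }
        return current;
    }

    public static void main(String[] args)
    {
        // Made-up glyph IDs, for illustration only.
        Map<String, Map<List<Integer>, Integer>> gsub = new HashMap<>();
        Map<List<Integer>, Integer> akhn = new HashMap<>();
        akhn.put(Arrays.asList(10, 11, 12), 100);
        gsub.put("akhn", akhn);
        Map<List<Integer>, Integer> half = new HashMap<>();
        half.put(Arrays.asList(13, 11), 200);
        gsub.put("half", half);

        List<Integer> input = Arrays.asList(10, 11, 12, 13, 11, 14);
        // prints [100, 200, 14]
        System.out.println(applyInOrder(input, Arrays.asList("akhn", "half"), gsub));
    }
}
```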

In *PDPageContentStream*, I am currently just hard-coding the Bengali 
implementation of GSUB. Here, too, we need to find a way to handle this 
gracefully.
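
One possible direction, sketched under assumptions and not part of the patch: 
look up a per-script substitution strategy by script tag, and fall back to a 
no-op for every other script. All names below (ScriptShaperRegistry, 
ScriptShaper, forScript) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one substitution strategy per OpenType script tag.
class ScriptShaperRegistry
{
    interface ScriptShaper
    {
        List<Integer> substitute(List<Integer> glyphIds);
    }

    // Fallback that returns the glyph IDs untouched, so scripts without a
    // registered shaper behave exactly as before.
    static final ScriptShaper NO_OP = new ScriptShaper()
    {
        @Override
        public List<Integer> substitute(List<Integer> glyphIds)
        {
            return glyphIds;
        }
    };

    private final Map<String, ScriptShaper> shapers = new HashMap<>();

    void register(String scriptTag, ScriptShaper shaper)
    {
        shapers.put(scriptTag, shaper);
    }

    ScriptShaper forScript(String scriptTag)
    {
        ScriptShaper shaper = shapers.get(scriptTag);
        return shaper != null ? shaper : NO_OP;
    }

    public static void main(String[] args)
    {
        ScriptShaperRegistry registry = new ScriptShaperRegistry();
        List<Integer> glyphs = Arrays.asList(1, 2, 3);
        // No shaper is registered for "latn", so the input comes back unchanged.
        System.out.println(registry.forScript("latn").substitute(glyphs));
    }
}
```

With something like this, the content stream would only ask the registry for 
a shaper and would not need to know about Bengali specifically.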

 

The following is not yet fully supported:
 # Copy-pasting PDF text works only partially, namely for the characters that 
have not been GSUB-processed. In my opinion, this feature is not that 
important.
 # Certain characters are still positioned incorrectly. I hope to fix this 
soon. This is a very important feature.

To show how well GSUB is being applied, I have added some difficult text to 
the 1st page of the *BengaliPdfGenerationHelloWorld* example. On the 2nd page, 
I have embedded an image showing how this text should actually look. As I add 
the missing features, the 2 pages should come to look alike.

 

Hope this helps in the review process.

 

Thanks,

Palash.

 

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
> -----------------------------------------------------------------------------
>
>                 Key: PDFBOX-4189
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4189
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: FontBox, PDModel
>            Reporter: Palash Ray
>            Priority: Major
>         Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph 
> substitution. The GSUB table has been read and used effectively to replace 
> some compound words with their respective Glyphs. All tests are passing. I 
> have tested this for the Bengali font. Please review these changes and let me 
> know if it makes sense to incorporate these.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
