[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446840#comment-16446840 ]

Tilman Hausherr commented on PDFBOX-4189:

Thank you [~paawak], I committed your code with slight modifications. I removed most formatting changes (they draw attention away from the actual changes) and changed the example code as mentioned before.

todo next:
- [~amake] please use the trunk to check if your vertical texts are still ok (likely yes, the tests pass and the PDF generated by the sample works too)
- [~paawak] the example output now looks even more different than before - or is the text from the screenshot wrong? Is this related to your latest change, or could it be I messed up something?
- run sonar tool
- port to 2.0 after a few days

> Enable rendering of Indian languages, by reading and utilizing the GSUB table
>
> Key: PDFBOX-4189
> URL: https://issues.apache.org/jira/browse/PDFBOX-4189
> Project: PDFBox
> Issue Type: New Feature
> Components: FontBox, PDModel
> Reporter: Palash Ray
> Priority: Major
> Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf, BengaliPdfGenerationHelloWorld.java, bengali-example.pdf, bengali-example2.pdf, committed.patch, screenshot.png
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive Glyph substitution. The GSUB table has been read and used effectively to replace some compound words with their respective Glyphs. All tests are passing. I have tested this for the Bengali font. Please review these changes and let me know if it makes sense to incorporate these.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446818#comment-16446818 ]

ASF subversion and git services commented on PDFBOX-4189:

Commit 1829710 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1829710 ]

PDFBOX-4189: Enable rendering of Indian languages by reading and utilizing the GSUB table, by Palash Ray
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446728#comment-16446728 ]

Tilman Hausherr commented on PDFBOX-4189:

I modified your example so that it is on one page and so that the line breaks are the same, please take that one - but I see that there are differences: see the last word (ব্যাস নির্ভয় ). However, these differences do not show up when I look at the source code. There it looks the same.
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446711#comment-16446711 ]

Tilman Hausherr commented on PDFBOX-4189:

that's just a warm-up and to get rid of binaries in the patch.
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446709#comment-16446709 ]

ASF subversion and git services commented on PDFBOX-4189:

Commit 1829697 from [~tilman] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1829697 ]

PDFBOX-4189: add lohit-bengali font for upcoming tests and example
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446710#comment-16446710 ]

ASF subversion and git services commented on PDFBOX-4189:

Commit 1829698 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1829698 ]

PDFBOX-4189: add lohit-bengali font for upcoming tests and example
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446685#comment-16446685 ]

Palash Ray commented on PDFBOX-4189:

I know. If you ask me, it's a real shame. The reason we have abstractions and specifications is that we are supposed to be able to figure out pretty much all the rules from them, without having to write language-specific handlers. But I think even the font developers are to blame. They should push the big companies that write these specifications to do a better job.

Anyway, sorry for the rant :)
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446684#comment-16446684 ]

Tilman Hausherr commented on PDFBOX-4189:

I had a look at Apache FOP a year or two ago and I remember that they also have specific code for the different languages.
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446682#comment-16446682 ]

Palash Ray commented on PDFBOX-4189:

I would like to share some thoughts on the approach I took.

I am reading all the GSUB tables. Then, based on the language from the ScriptTable, I first determine whether I will support GSUB at all. Right now, I have these languages (only Bengali for now) hard-coded in the *GlyphSubstitutionDataExtractor* class. Later, we would need to figure out a better way. The reason I did this is that I wanted to be safe and not break any existing features, for example *vert*.

Next, Microsoft has language-specific guidelines about how to handle the various features that are defined in the FeatureTable in GSUB. For Bengali, it's here: [https://docs.microsoft.com/en-us/typography/script-development/bengali#reor]

In the *PDPageContentStream*, right now, I am just hard-coding the Bengali implementation of GSUB. Again, here we need to figure out a way to handle this gracefully.

The following are still not fully supported:
# Copy-pasting PDF text works only partially: it works for the characters which have not been GSUB-processed. Imho, this feature is not that important.
# Right now, certain characters are still placed wrongly. I hope to implement this soon. This is a very important feature.

In order to see how well I am using GSUB, in the example *BengaliPdfGenerationHelloWorld*, I have added some difficult text on the 1st page. On the 2nd page, I have embedded an image showing how this text should actually look. As and when I add these missing features, these 2 pages would look similar.

Hope this helps in the review process.

Thanks,
Palash.
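The language gate described in this comment can be pictured roughly as follows. This is only a minimal sketch, not the committed code: the class and method names are hypothetical, and the real *GlyphSubstitutionDataExtractor* may be structured quite differently. The one fact it relies on is that "beng" and "bng2" are the OpenType script tags for Bengali.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: decide from a font's script tags whether GSUB is applied at all. */
class GsubLanguageGate {
    // OpenType script tags for Bengali: "beng" is the older tag, "bng2" the newer one.
    // Hard-coded on purpose, mirroring the "only Bengali for now" approach in the comment.
    private static final Set<String> SUPPORTED_SCRIPT_TAGS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("beng", "bng2")));

    /** Returns true if at least one script tag found in the font's GSUB ScriptList is supported. */
    static boolean isGsubEnabled(List<String> fontScriptTags) {
        for (String tag : fontScriptTags) {
            if (SUPPORTED_SCRIPT_TAGS.contains(tag)) {
                return true;
            }
        }
        // Unknown scripts fall through untouched, so existing behaviour (e.g. vert) is not affected.
        return false;
    }
}
```

With such a gate, a font that only lists Latin or Kana scripts would skip the new GSUB path entirely, which matches the stated goal of not breaking existing features.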
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444713#comment-16444713 ]

Tilman Hausherr commented on PDFBOX-4189:

Wow... I'll try to review all this over the weekend. I made a short test and subsetting works, and so does copy & paste.

What does "breaking change" mean in your commits? I looked at one of these and it didn't look like it broke the API.

Feel free to add your name (without mail) as "@author" in the classes that you introduced, and in those where you made major changes / improvements. But it's not required, i.e. some of us do it and some don't.
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444671#comment-16444671 ]

Palash Ray commented on PDFBOX-4189:

Hi All,

Most of my changes are done. I have taken care of subsetting as well. It's working fine. Apart from some minor issues and a few hard-coded values for now, everything is almost there.

Please take a look and let me know what else I should do.

Thanks,
Palash.
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441424#comment-16441424 ]

Palash Ray commented on PDFBOX-4189:

I have reinstated PDFont::encode() as final and moved the GSUB logic inside PDPageContentStream#showTextInternal, as suggested by John.

I am enabling GSUB only for Bengali as of now: GlyphSubstitutionDataExtractor#SUPPORTED_LANGUAGES

I am still working on subsetting.

Thanks,
Palash.
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438914#comment-16438914 ]

John Hewson commented on PDFBOX-4189:

> For correct text positioning using mixed language information from the following tables might be useful:
> - GPOS: to adjust the glyph position
> - BASE: baseline offsets on a script-by-script basis.
> - JSTF: justification information, including whitespace and Kashida adjustments.
> - BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

Here BASE, JSTF and BiDi are concerned with _paragraph-level_ layout, which happens at a higher level than the proposed layout() - which would be concerned with only a single script in a single direction (i.e. only OpenType _shaping_). BASE and BiDi are related to changes between different scripts, while JSTF is to aid in making good line break choices. So all of that functionality will happen somewhere else (this fits very closely with the layout code for forms, for example).

So in layout we're really only going to be concerned with GPOS and GSUB features. That way the only option that one might want to pass to layout would be the list of which feature flags (https://docs.microsoft.com/en-us/typography/opentype/spec/featurelist) to apply.
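To make the "list of feature flags" idea concrete, here is a small sketch of how a layout step might narrow down which features it applies. It is an assumption for illustration only: the class and method names are hypothetical, no such option exists in the thread's code, and the tags used ("akhn", "half", "liga", "vert") are simply registered OpenType feature tags picked as examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: pick the feature tags a layout step would apply. */
class FeatureSelection {
    /**
     * Keeps the caller's requested tags, in the caller's order, but only those
     * that the font actually provides in its GSUB/GPOS feature list.
     */
    static List<String> select(List<String> requested, Set<String> availableInFont) {
        List<String> result = new ArrayList<>();
        for (String tag : requested) {
            if (availableInFont.contains(tag)) {
                result.add(tag);
            }
        }
        return result;
    }
}
```

The order is taken from the request rather than from the font because script guidelines prescribe an application order; whether a real layout() would expose this to callers is left open in the discussion.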
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438640#comment-16438640 ]

Palash Ray commented on PDFBOX-4189:

Thanks a lot guys, for the detailed comments. It seems that there is some more work for me to do to ensure that this patch fits into the broader scheme of things. I am ready to work with you to make this happen. I think PDFBox is a great piece of software, and I am committed to making it more feature-rich. This particular feature is important for supporting any Indian or South East Asian language. So, from my perspective, I would like to make it happen.

John, special thanks for taking the time to explain the architecture. Let me do a bit of refactoring and incorporate your suggestions. I will let you know how that goes.

I plan to handle subsetting.

Thanks,
Palash.
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438609#comment-16438609 ]

Maruan Sahyoun commented on PDFBOX-4189:

The patch is a great and - given several questions we had in the past - important addition to PDFBox.

In the longer run I'd see some additions we might conceptually already think about and/or start introducing in the public API. As I haven't reviewed the patch, the list below is meant as a hint for possible additions. They may already be included.

For correct text positioning using mixed language, information from the following tables might be useful:
- GPOS: to adjust the glyph position
- BASE: baseline offsets on a script-by-script basis.
- JSTF: justification information, including whitespace and Kashida adjustments.
- BIDI Mirroring: https://www.unicode.org/Public/10.0.0/ucd/BidiMirroring.txt

To allow the user to override the language system identified by the script being used, we might want to add {{setLanguage/getLanguage}} so that it can be called prior to {{showText}} if an override needs to be done.

Putting that into an internal {{layout}} method as John suggested would also allow us to put it behind a feature flag where one could enable/disable the processing. We might also mark that feature as **experimental** and specify which languages it has been tested with (to some extent).
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438540#comment-16438540 ]

John Hewson commented on PDFBOX-4189:

Hi guys, this is a really welcome contribution, thank you.

With regards to PDFont#encode(String text) being non-final, I can add some insight as I was the original designer of our current PDFont#encode mechanism. Basically, the PDFont classes are designed to represent fonts identically to how they are represented when embedded in PDF files. So there's no support for OpenType, by design. A Type0 font knows nothing about OpenType (by design).

So how can we use OpenType in PDFBox? The answer is that we do it one layer of abstraction up, during text _layout_ instead of text _encoding_. So you want to put your glyph substitution code inside PDPageContentStream#showText, actually you want PDPageContentStream#showTextInternal (https://github.com/apache/pdfbox/blob/7e721643c0b1fca9fdc349f78431f36e68abc097/pdfbox/src/main/java/org/apache/pdfbox/contentstream/PDAbstractContentStream.java#L256). That way PDFont#encode(String text) can stay final :)
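The separation between layout and encoding described above can be sketched as two independent steps. This is a toy model under stated assumptions, not PDFBox code: the class, the ligature map and the glyph IDs are all made up, and the encoding step simply writes two big-endian bytes per glyph ID, in the style of an Identity-H encoded Type0 font.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: glyph substitution happens in layout, encoding stays unaware of it. */
class LayoutThenEncode {
    /** Layout step: greedy longest-match replacement of glyph ID runs using a toy ligature table. */
    static List<Integer> substitute(List<Integer> glyphs, Map<List<Integer>, Integer> ligatures, int maxRun) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < glyphs.size()) {
            boolean replaced = false;
            // Try the longest candidate run first, then shorter ones down to length 2.
            for (int len = Math.min(maxRun, glyphs.size() - i); len >= 2 && !replaced; len--) {
                Integer ligature = ligatures.get(glyphs.subList(i, i + len));
                if (ligature != null) {
                    out.add(ligature);
                    i += len;
                    replaced = true;
                }
            }
            if (!replaced) {
                out.add(glyphs.get(i));
                i++;
            }
        }
        return out;
    }

    /** Encoding step: knows nothing about GSUB, writes each glyph ID as two big-endian bytes. */
    static byte[] encode(List<Integer> glyphs) {
        byte[] bytes = new byte[glyphs.size() * 2];
        for (int k = 0; k < glyphs.size(); k++) {
            int gid = glyphs.get(k);
            bytes[2 * k] = (byte) (gid >> 8);
            bytes[2 * k + 1] = (byte) gid;
        }
        return bytes;
    }
}
```

Because encode() only ever sees the final glyph sequence, it needs no override hook, which is the design point of keeping the substitution one layer up.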
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438487#comment-16438487 ]

Palash Ray commented on PDFBOX-4189:

I have pushed some changes which take care of most of the issues that you have pointed out, except:
# subsetting
# BengaliPdfGenerationHelloWorld should be integrated into the EmbeddedFonts.java example

I will take care of these as well. Meanwhile, please let me know if any other changes are needed.

Thanks,
Palash.
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438411#comment-16438411 ]

Tilman Hausherr commented on PDFBOX-4189:

Re subsetting: in your call of {{PDType0Font.load}}, set the last parameter to true or remove it, and see what happens. Subsetting means PDFBox creates a new font with only the glyphs that are really used, so generated files get smaller (for example, the Arial Uni font has a size of 23MB!). Please have a look at {{PDAbstractContentStream.showTextInternal}}. This takes all codepoints and remembers which will be in the subset. I suspect that you'd need to know what actual codepoints are used after the substitutions.

Re GlyphsubstitutionTable, yeah, just move it back, thanks.
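The point about subsetting after substitution can be illustrated with a small bookkeeping sketch. It is an assumption-laden toy, not how PDFBox tracks its subset: the class and method names are hypothetical, and it records glyph IDs where the comment speaks of codepoints. The idea it shows is that the set must be filled from what is actually drawn after GSUB has run, otherwise substituted glyphs would be missing from the embedded font.

```java
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch: remember which glyphs end up on the page so the subset contains them. */
class SubsetTracker {
    private final SortedSet<Integer> usedGlyphIds = new TreeSet<>();

    SubsetTracker() {
        // Glyph 0 (.notdef) is kept in every subset.
        usedGlyphIds.add(0);
    }

    /** Call this with the glyph sequence produced after substitution, not with the input text. */
    void recordShownGlyphs(List<Integer> glyphIdsAfterSubstitution) {
        usedGlyphIds.addAll(glyphIdsAfterSubstitution);
    }

    /** Read-only view of the glyph IDs that the subset font needs to contain. */
    SortedSet<Integer> glyphsToEmbed() {
        return Collections.unmodifiableSortedSet(usedGlyphIds);
    }
}
```

If three input glyphs are replaced by one ligature glyph, only the ligature is recorded here; the three originals are left out unless they are also shown somewhere unsubstituted.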
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438401#comment-16438401 ] Palash Ray commented on PDFBOX-4189:

* +What is the story of having different data for jdk7 and jdk8+ Out of 323 entries in the GSUB table for the Bengali-Lohit.ttf font, I am getting a single entry which differs between jdk1.7 and jdk1.8. That's the reason I had to create the 2 files. I am still investigating this, so maybe I will come up with a better solution when I get to the bottom of it.
* +I'd also need to know where this file came from, or whether you created it yourself from other data+ Those .txt files are simple reference data used for testing the correctness of the GSUB tables. I created them myself with some logic that transforms Unicode characters into base-10 numbers.
* +BengaliPdfGenerationHelloWorld should be integrated into the EmbeddedFonts.java example+ Will do.
* +why a log4j2.xml ? We don't use log4j2 except in preflight where log4j is used in Tests+ Agreed, I will remove the log4j2.xml.
* +You disabled subsetting+ I don't understand that yet. Please bear with me, I will make it work even with that. Let me take a look.
* +The move of GlyphsubstitutionTable+ I can move it back if it simplifies things. Should I?
* +There is a lot of logging done+ Will do.
* +Loosening scope restrictions is a bit of a no-no+ Agreed. I did this as a part of the move of GlyphsubstitutionTable; if I undo the move, this will be taken care of.
* +Public methods should have a javadoc+ Will do.

Thanks, Palash.
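For readers wondering what "transforming Unicode characters into base-10 numbers" could look like, a minimal sketch follows. The actual generator and the exact layout of the .txt reference files are not shown in this thread, so the class {{CodePointDumpSketch}}, the method name and the space-separated output format are assumptions made for illustration.

```java
// Hypothetical illustration only; not the code that produced the reference files.
class CodePointDumpSketch
{
    // Returns the Unicode code points of the text as decimal numbers,
    // separated by single spaces (the separator is an assumption).
    static String toDecimalCodePoints(String text)
    {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < text.length())
        {
            int codePoint = text.codePointAt(i);
            if (sb.length() > 0)
            {
                sb.append(' ');
            }
            sb.append(codePoint);
            // Advance by one or two chars so that surrogate pairs are not split.
            i += Character.charCount(codePoint);
        }
        return sb.toString();
    }
}
```

Iterating by code point rather than by {{char}} keeps the output stable for characters outside the Basic Multilingual Plane, although Bengali itself lies inside it.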
[jira] [Commented] (PDFBOX-4189) Enable rendering of Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438383#comment-16438383 ] Tilman Hausherr commented on PDFBOX-4189:

Your patch is very much appreciated of course, thank you. It will probably result in thousands of new users / usages. This is a complex patch, so expect it to take some time before it is committed. See PDFBOX-4106 for an example of a complex patch and the discussion.

Can you please also add an apache header to the .txt files? See the file "pdfbox\src\main\resources\org\apache\pdfbox\resources\glyphlist\additional.txt". I'd also need to know where this file came from, or whether you created it yourself from other data; if yes, please include a comment on how, and/or the code that created the file.

About the commits:
- What is the story of having different data for jdk7 and jdk8?
- BengaliPdfGenerationHelloWorld should be integrated into the EmbeddedFonts.java example
- why a log4j2.xml ? We don't use log4j2 except in preflight where log4j is used in Tests
- I think I understand why my example didn't work. You disabled subsetting. But with subsetting the subsetter should "know" which glyphs are used. And we do need subsetting because otherwise files might get huge
- The generated PDF file has trouble with text extraction: "আমি কোন পথƶ §ীরƶর ল©ী ষĞ পুতুল Šপো গÄা ঋষি" i.e. there are some unknown glyphs.
- The move of GlyphsubstitutionTable breaks the API. Like I said in the PR, if you keep the API as it is (only expand, not change existing methods) then your change could be used for 2.0 too. The release of 3.0 could take years. The release of 2.0.10 only a few months.
- There is a lot of logging done ("WARNUNG: oldValue: [52, 114] will be overridden with newValue: [114, 52]"). This is scary and should be changed or removed. It scares users and they create issues, thinking that something went wrong.
If you change it to debug, please include a comment explaining what this is about. See also the discussion in PDFBOX-4106 about "Trying to un-substitute a never-before-seen gid".
- Loosening scope restrictions is a bit of a no-no, as done in [TTFDataStream.java|https://github.com/apache/pdfbox/pull/46/files#diff-894ae790d373c62634ceed941b264dc3], [TTFTable.java|https://github.com/apache/pdfbox/pull/46/files#diff-355fd8e3330f392bdae0778f942dc124], and maybe elsewhere. As preached by "Effective Java", item 15: "make each class or member as inaccessible as possible".
- Public methods should have a javadoc, same for classes. It doesn't have to be big, just make it good enough for other people to understand what is done. See also [https://pdfbox.apache.org/codingconventions.html]; I think most conventions are already respected.

I have not yet done a review of the code (looking side-by-side), so more questions may be coming.
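As a rough illustration of the logging request, the sketch below reports an overridden map entry at debug level and carries a comment saying why the message exists. It uses {{java.util.logging}} only to stay free of dependencies; the logging API of the project itself may differ, and the class {{QuietOverrideSketch}} and its method are hypothetical. Why the override happens in the real GSUB data is not stated in this thread, so the comment in the code is a placeholder.

```java
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only; none of these names exist in PDFBox.
class QuietOverrideSketch
{
    private static final Logger LOG = Logger.getLogger(QuietOverrideSketch.class.getName());

    // Stores the value and returns the previous one (null if there was none).
    static <K, V> V putAndReport(Map<K, V> map, K key, V newValue)
    {
        V oldValue = map.put(key, newValue);
        if (oldValue != null && !oldValue.equals(newValue) && LOG.isLoggable(Level.FINE))
        {
            // Placeholder: explain here why an override is expected and harmless.
            // FINE is the java.util.logging counterpart of "debug", so users
            // running with default settings never see this message.
            LOG.fine("oldValue: " + oldValue + " was overridden with newValue: " + newValue);
        }
        return oldValue;
    }
}
```

The behaviour of the map is unchanged; only the visibility of the message drops, which is what the comment above asks for.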