[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407489#comment-17407489 ] Tilman Hausherr edited comment on PDFBOX-4189 at 8/31/21, 3:52 PM: --- This would have to be implemented in the source code. See Language.java in fontbox; it describes what needs to be done to implement a new language (implement a new GsubWorker). Currently there's only GsubWorkerForBengali.java. You would need to understand what Palash Ray has done and why. I assume you'd need to know about Bengali and Malayalam glyphs, i.e. how the substitutions are done. Maybe it's a similar principle, maybe it isn't. Nobody in the team knows, AFAIK. And you need to be able to build from source. The current implementation is incomplete: the visual output is fine, but the text extraction is wrong. You're welcome to try! was (Author: tilman): This would have to be implemented in the source code. See Language.java in fontbox; it describes what needs to be done to implement a new language (implement a new GsubWorker). Currently there's only GsubWorkerForBengali.java. You would need to understand what Palash Ray has done and why. I assume you'd need to know about Bengali and Malayalam glyphs, i.e. how the substitutions are done. Maybe it's a similar principle, maybe it isn't. Nobody in the team knows, AFAIK. And you need to be able to build from source. The current implementation is incomplete: the visual output is fine, but the text extraction is wrong.
You're welcome to try.

> Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
> --
>
>                 Key: PDFBOX-4189
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4189
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: FontBox, PDModel
>            Reporter: Palash Ray
>            Priority: Major
>         Attachments: Bengali-text-after.pdf, Bengali-text-before.pdf, BengaliPdfGenerationHelloWorld.java, bengali-example.pdf, bengali-example2.pdf, bengali-example3.pdf, bengali-word-lohit-bad.pdf, bengali-word-lohit-good.pdf, committed.patch, pdf-output.png, screenshot.png
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Implemented proper rendering of Indian languages, which need extensive glyph substitution. The GSUB table has been read and used effectively to replace some compound words with their respective glyphs. All tests are passing. I have tested this for the Bengali font. Please review these changes and let me know if it makes sense to incorporate these.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
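Language.java and GsubWorkerForBengali.java are the real starting points. The sketch below only illustrates the general shape of a worker: glyph IDs go in and substituted glyph IDs come out. It assumes this mirrors how fontbox's GsubWorker is meant to be used, but that is an assumption. `SimpleGsubWorker` is a local, hypothetical stand-in and not the fontbox interface. The glyph IDs and the ligature table are also invented. A real Malayalam worker would read its substitutions from the font's GSUB table and apply the features in the order the script requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a GSUB worker: glyph IDs in, substituted glyph IDs out.
interface SimpleGsubWorker {
    List<Integer> applyTransforms(List<Integer> originalGlyphIds);
}

// Toy worker applying ligature substitutions (in the style of GSUB lookup type 4).
// At each position it tries the longest matching sequence first.
class ToyLigatureWorker implements SimpleGsubWorker {
    private final Map<List<Integer>, Integer> ligatures;
    private final int maxLength;

    ToyLigatureWorker(Map<List<Integer>, Integer> ligatures) {
        this.ligatures = ligatures;
        int max = 1;
        for (List<Integer> key : ligatures.keySet()) {
            max = Math.max(max, key.size());
        }
        this.maxLength = max;
    }

    @Override
    public List<Integer> applyTransforms(List<Integer> glyphs) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < glyphs.size()) {
            int consumed = 0;
            // Longest candidate first, so a 3-glyph conjunct wins over a 2-glyph one.
            for (int len = Math.min(maxLength, glyphs.size() - i); len >= 2; len--) {
                Integer ligature = ligatures.get(glyphs.subList(i, i + len));
                if (ligature != null) {
                    out.add(ligature);
                    consumed = len;
                    break;
                }
            }
            if (consumed == 0) {
                out.add(glyphs.get(i)); // no substitution at this position
                consumed = 1;
            }
            i += consumed;
        }
        return Collections.unmodifiableList(out);
    }
}
```

Longest-match-first is a simplification. In a real font, the order of the ligature entries in each set and the lookup order declared by the font decide which substitution wins.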
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407073#comment-17407073 ] Kishore Kumar edited comment on PDFBOX-4189 at 8/31/21, 5:45 AM: - Team, I am not getting PDFBox to render Malayalam (one of the Indic scripts) text properly. If ligature substitution works, this should work too. I am using version 3.0.0-RC1. String text = "വകുപ്പ് 1 മനുഷ്യരെല്ലാവരും തുല്യാവകാശങ്ങളോടും"; PDDocument doc = new PDDocument(); PDFont font = PDType0Font.load(doc, new File("/Users/kishore/Downloads/ML-NILA01_NewLipi.ttf")); PDPage page = new PDPage(); doc.addPage(page); PDPageContentStream contentStream = new PDPageContentStream(doc, page); contentStream.beginText(); contentStream.newLineAtOffset(25, 700); contentStream.setFont(font, 12); contentStream.showText(text); contentStream.endText(); contentStream.close(); Do we have support for GSUB tables now? Am I doing anything wrong here? Please suggest. The output I get is !pdf-output.png! versus the input text വകുപ്പ് 1 മനുഷ്യരെല്ലാവരും തുല്യാവകാശങ്ങളോടും Here GSUB substitution is not happening. was (Author: kishorekollam): Team, I am not getting PDFBox to render Malayalam (one of the Indic scripts) text properly. If ligature substitution works, this should work too. I am using version 3.0.0-RC1. String text = "വകുപ്പ് 1 മനുഷ്യരെല്ലാവരും തുല്യാവകാശങ്ങളോടും"; PDDocument doc = new PDDocument(); PDFont font = PDType0Font.load(doc, new File("/Users/kishore/Downloads/ML-NILA01_NewLipi.ttf")); PDPage page = new PDPage(); doc.addPage(page); PDPageContentStream contentStream = new PDPageContentStream(doc, page); contentStream.beginText(); contentStream.newLineAtOffset(25, 700); contentStream.setFont(font, 12); contentStream.showText(text); contentStream.endText(); contentStream.close(); Do we have support for GSUB tables now? Am I doing anything wrong here? Please suggest.
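When substitution does not happen, a quick first check is whether the font file contains a GSUB table at all. The sketch below reads only the sfnt table directory, which is an offset table followed by 16-byte table records, as laid out in the OpenType specification. It uses plain JDK I/O. The class name `SfntTables` is made up, and TrueType collections (.ttc) are not handled. A GSUB table being present only means the data is there: as Tilman explains in this thread, PDFBox also needs a script-specific worker to apply it.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Reads the table directory of an sfnt (TrueType/OpenType) font and lists its table tags.
// Only the first 12 + 16 * numTables bytes are read; table contents are not parsed.
class SfntTables {
    static List<String> tableTags(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            int sfntVersion = data.readInt();
            if (sfntVersion == 0x74746366) { // "ttcf": a font collection, not handled here
                throw new IllegalArgumentException("TrueType collections are not supported by this sketch");
            }
            int numTables = data.readUnsignedShort();
            data.readFully(new byte[6]); // searchRange, entrySelector, rangeShift
            List<String> tags = new ArrayList<>();
            byte[] tag = new byte[4];
            for (int i = 0; i < numTables; i++) {
                data.readFully(tag);
                tags.add(new String(tag, StandardCharsets.US_ASCII));
                data.readFully(new byte[12]); // checksum, offset, length
            }
            return tags;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean hasGsub(InputStream in) {
        return tableTags(in).contains("GSUB");
    }
}
```

For the font in the comment above, passing a `BufferedInputStream` over the .ttf file to `SfntTables.hasGsub` would answer the first question. It says nothing about whether the substitutions are then applied.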
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407073#comment-17407073 ] Kishore Kumar edited comment on PDFBOX-4189 at 8/31/21, 5:36 AM: - Team, I am not getting PDFBox to render Malayalam (one of the Indic scripts) text properly. If ligature substitution works, this should work too. I am using version 3.0.0-RC1. String text = "വകുപ്പ് 1 മനുഷ്യരെല്ലാവരും തുല്യാവകാശങ്ങളോടും"; PDDocument doc = new PDDocument(); PDFont font = PDType0Font.load(doc, new File("/Users/kishore/Downloads/ML-NILA01_NewLipi.ttf")); PDPage page = new PDPage(); doc.addPage(page); PDPageContentStream contentStream = new PDPageContentStream(doc, page); contentStream.beginText(); contentStream.newLineAtOffset(25, 700); contentStream.setFont(font, 12); contentStream.showText(text); contentStream.endText(); contentStream.close(); Do we have support for GSUB tables now? Am I doing anything wrong here? Please suggest. was (Author: kishorekollam): Team, I am not getting PDFBox to render Malayalam (one of the Indic scripts) text properly. If ligature substitution works, this should work too. I am using version 3.0.0-RC1. String text = "വകുപ്പ് 1 മനുഷ്യരെല്ലാവരും തുല്യാവകാശങ്ങളോടും"; PDDocument doc = new PDDocument(); PDFont font = PDType0Font.load(doc, new File("/Users/kishore/Downloads/ML-NILA01_NewLipi.ttf")); PDPage page = new PDPage(); doc.addPage(page); PDPageContentStream contentStream = new PDPageContentStream(doc, page); contentStream.beginText(); contentStream.newLineAtOffset(25, 700); contentStream.setFont(font, 12); contentStream.showText(text); contentStream.endText(); contentStream.close(); Am I doing anything wrong here? Please help.
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407073#comment-17407073 ] Kishore Kumar edited comment on PDFBOX-4189 at 8/31/21, 5:35 AM: - Team, I am not getting PDFBox to render Malayalam (one of the Indic scripts) text properly. If ligature substitution works, this should work too. I am using version 3.0.0-RC1. String text = "വകുപ്പ് 1 മനുഷ്യരെല്ലാവരും തുല്യാവകാശങ്ങളോടും"; PDDocument doc = new PDDocument(); PDFont font = PDType0Font.load(doc, new File("/Users/kishore/Downloads/ML-NILA01_NewLipi.ttf")); PDPage page = new PDPage(); doc.addPage(page); PDPageContentStream contentStream = new PDPageContentStream(doc, page); contentStream.beginText(); contentStream.newLineAtOffset(25, 700); contentStream.setFont(font, 12); contentStream.showText(text); contentStream.endText(); contentStream.close(); Am I doing anything wrong here? Please help. was (Author: kishorekollam): Team, I am not getting PDFBox to render Malayalam (one of the Indic scripts) text properly. If ligature substitution works, this should work too. I am using version 3.0.0-RC1. PDDocument doc = new PDDocument(); PDFont font = PDType0Font.load(doc, new File("/Users/kishore/Downloads/ML-NILA01_NewLipi.ttf")); PDPage page = new PDPage(); doc.addPage(page); PDPageContentStream contentStream = new PDPageContentStream(doc, page); contentStream.beginText(); contentStream.newLineAtOffset(25, 700); contentStream.setFont(font, 12); contentStream.showText(text); contentStream.endText(); contentStream.close(); Am I doing anything wrong here? Please help.
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827647#comment-16827647 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/27/19 4:02 PM: -- It has been a year, so I wanted to look at what's going on, and concentrated on the first word ( আমি ). example2 has an incorrect visual glyph sequence but correct text extraction; example3 has a correct visual glyph sequence but incorrect text extraction. The "scythe" ি (= "BENGALI VOWEL SIGN I") is painted to the left of the consonant it is "influencing", but when typed in an editor, it comes after it. Word solves this by mapping the "scythe" glyph to the consonant in the ToUnicode table: [^bengali-word-lohit-good.pdf] This somehow looked suspicious, and I wondered what would happen if I used the "scythe" glyph with two different consonants: ( আিমি ). The result was [^bengali-word-lohit-bad.pdf]: the glyphs look good, but the text extraction is wrong: আিআি . So that is really funny 🤣, but the downside is that for now, we have no "gold standard" to look to for guidance and inspiration. was (Author: tilman): It has been a year, so I wanted to look at what's going on, and concentrated on the first word ( আমি ). example2 has an incorrect visual glyph sequence but correct text extraction; example3 has a correct visual glyph sequence but incorrect text extraction. The "scythe" ি (= "BENGALI VOWEL SIGN I") is painted to the left of the consonant it is "influencing", but when typed in an editor, it comes after it. Word solves this by mapping the "scythe" glyph to the consonant in the ToUnicode table: [^bengali-word-lohit-good.pdf] This somehow looked suspicious, and I wondered what would happen if I used the "scythe" glyph with two different consonants: ( আিমি ). The result was [^bengali-word-lohit-bad.pdf]: the glyphs look good, but the text extraction is wrong: আিআি .
So that is really funny 🤣 but the downside is that for now, we have no "gold standard" to look up to.
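The reordering problem can be made concrete. ি (U+09BF) follows its consonant in the stored (logical) text but is drawn before it. The sketch below is a deliberately naive reorderer: it moves each U+09BF in front of the character immediately before it, using chars as stand-ins for glyphs. The class name is made up. It handles only this one vowel sign and ignores conjuncts and the other Bengali shaping rules.

```java
// Naive logical-to-visual reordering for the Bengali pre-base vowel sign I (U+09BF).
// Characters stand in for glyphs; conjuncts and other shaping rules are ignored.
class BengaliReorder {
    static final char VOWEL_SIGN_I = '\u09BF';

    static String logicalToVisual(String logical) {
        StringBuilder visual = new StringBuilder(logical);
        for (int i = 1; i < visual.length(); i++) {
            if (visual.charAt(i) == VOWEL_SIGN_I) {
                // Swap the vowel sign with the character before it, so it is drawn first.
                char consonant = visual.charAt(i - 1);
                visual.setCharAt(i - 1, VOWEL_SIGN_I);
                visual.setCharAt(i, consonant);
            }
        }
        return visual.toString();
    }
}
```

For আমি this yields আ, ি, ম. If each glyph then maps only to its own character in ToUnicode, reading the glyphs back in drawing order gives আিম instead of আমি. That is plausibly the same kind of extraction error described for example3.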
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827647#comment-16827647 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/27/19 4:02 PM: -- It has been a year, so I wanted to look at what's going on, and concentrated on the first word ( আমি ). example2 has an incorrect visual glyph sequence but correct text extraction; example3 has a correct visual glyph sequence but incorrect text extraction. The "scythe" ি (= "BENGALI VOWEL SIGN I") is painted to the left of the consonant it is "influencing", but when typed in an editor, it comes after it. Word solves this by mapping the "scythe" glyph to the consonant in the ToUnicode table: [^bengali-word-lohit-good.pdf] This somehow looked suspicious, and I wondered what would happen if I used the "scythe" glyph with two different consonants: ( আিমি ). The result was [^bengali-word-lohit-bad.pdf]: the glyphs look good, but the text extraction is wrong: আিআি . So that is really funny 🤣, but the downside is that for now, we have no "gold standard" to look up to. was (Author: tilman): It has been a year, so I wanted to look at what's going on, and concentrated on the first word ( আমি ). example2 has an incorrect visual glyph sequence but correct text extraction; example3 has a correct visual glyph sequence but incorrect text extraction. The "scythe" ি (= "BENGALI VOWEL SIGN I") is painted to the left of the consonant it is "influencing", but when typed in an editor, it comes after it. Word solves this by mapping the "scythe" glyph to the consonant in the ToUnicode table: [^bengali-word-lohit-good.pdf] This somehow looked suspicious, and I wondered what would happen if I used the "scythe" glyph with two different consonants: ( আিমি ). The result was [^bengali-word-lohit-bad.pdf]: the glyphs look good, but the text extraction is wrong 🤣.
So that is really funny, but the downside is that for now, we have no "gold standard" to look up to.
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827647#comment-16827647 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/27/19 4:01 PM: -- It has been a year, so I wanted to look at what's going on, and concentrated on the first word ( আমি ). example2 has an incorrect visual glyph sequence but correct text extraction; example3 has a correct visual glyph sequence but incorrect text extraction. The "scythe" ি (= "BENGALI VOWEL SIGN I") is painted to the left of the consonant it is "influencing", but when typed in an editor, it comes after it. Word solves this by mapping the "scythe" glyph to the consonant in the ToUnicode table: [^bengali-word-lohit-good.pdf] This somehow looked suspicious, and I wondered what would happen if I used the "scythe" glyph with two different consonants: ( আিমি ). The result was [^bengali-word-lohit-bad.pdf]: the glyphs look good, but the text extraction is wrong 🤣. So that is really funny, but the downside is that for now, we have no "gold standard" to look up to. was (Author: tilman): It has been a year, so I wanted to look at what's going on, and concentrated on the first word ( আমি ). example2 has an incorrect visual glyph sequence but correct text extraction; example3 has a correct visual glyph sequence but incorrect text extraction. The "scythe" ি (= "BENGALI VOWEL SIGN I") is painted to the left of the consonant it is "influencing", but when typed in an editor, it comes after it. Word solves this by mapping the "scythe" glyph to the consonant in the ToUnicode table: [^bengali-word-lohit-good.pdf] This somehow looked suspicious, and I wondered what would happen if I used the "scythe" glyph with two different consonants. The result was [^bengali-word-lohit-bad.pdf]: the glyphs look good, but the text extraction is wrong 🤣. So that is really funny, but the downside is that for now, we have no "gold standard" to look up to.
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827647#comment-16827647 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/27/19 3:51 PM: -- It has been a year, so I wanted to look at what's going on, and concentrated on the first word ( আমি ). example2 has an incorrect visual glyph sequence but correct text extraction; example3 has a correct visual glyph sequence but incorrect text extraction. The "scythe" ি (= "BENGALI VOWEL SIGN I") is painted to the left of the consonant it is "influencing", but when typed in an editor, it comes after it. Word solves this by mapping the "scythe" glyph to the consonant in the ToUnicode table: [^bengali-word-lohit-good.pdf] This somehow looked suspicious, and I wondered what would happen if I used the "scythe" glyph with two different consonants. The result was [^bengali-word-lohit-bad.pdf]: the glyphs look good, but the text extraction is wrong 🤣. So that is really funny, but the downside is that for now, we have no "gold standard" to look up to. was (Author: tilman): It has been a year, so I wanted to look at what's going on, and concentrated on the first word ( আমি ). example2 has an incorrect visual glyph sequence but correct extraction; example3 has a correct visual glyph sequence but incorrect extraction. The "scythe" ি (= "BENGALI VOWEL SIGN I") is painted to the left of the consonant it is "influencing", but when typed in an editor, it comes after it. Word solves this by mapping the "scythe" glyph to the consonant in the ToUnicode table: [^bengali-word-lohit-good.pdf] This somehow looked suspicious, and I wondered what would happen if I used the "scythe" glyph with two different consonants. The result was [^bengali-word-lohit-bad.pdf]: the glyphs look good, but the text extraction is wrong 🤣. So that is really funny, but the downside is that for now, we have no "gold standard" to look up to.
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471296#comment-16471296 ] John Hewson edited comment on PDFBOX-4189 at 5/10/18 11:56 PM: --- I'm trying to get ToUnicodeMap generation working properly with GSUB, but have hit problems introduced by PDFBOX-4106. We'll have to resolve that before I can proceed here. was (Author: jahewson): I'm trying to get ToUnicodeMap generation working properly, but have hit problems introduced by PDFBOX-4106. We'll have to resolve that before I can proceed here.
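One reason ToUnicode generation gets harder once GSUB is involved is that a ToUnicode CMap maps each glyph code to one fixed string. A substituted or reordered glyph may stand for different text in different places. The sketch below records the text each output glyph represents and reports when the same glyph would need two different mappings. That is the situation in the আিমি test described elsewhere in this thread. The class name `ToUnicodeBuilder` and the glyph IDs are hypothetical. Here the first mapping wins, and conflicting glyphs would need some other mechanism, for example /ActualText marked content.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Collects glyph-ID -> Unicode text entries for a ToUnicode CMap and flags conflicts,
// i.e. glyphs that would need different text in different contexts.
class ToUnicodeBuilder {
    private final Map<Integer, String> map = new LinkedHashMap<>();
    private final List<String> conflicts = new ArrayList<>();

    void addEntry(int glyphId, String text) {
        String previous = map.putIfAbsent(glyphId, text); // first mapping wins
        if (previous != null && !previous.equals(text)) {
            conflicts.add("glyph " + glyphId + ": \"" + previous + "\" vs \"" + text + "\"");
        }
    }

    Map<Integer, String> entries() {
        return map;
    }

    List<String> conflicts() {
        return conflicts;
    }
}
```

A ligature glyph that always stands for the same sequence (for example ক্ক, three code points) is harmless. A glyph that must map to a different consonant depending on its neighbour is not.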
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467728#comment-16467728 ] John Hewson edited comment on PDFBOX-4189 at 5/8/18 5:51 PM: - {quote} Maruan Sahyoun added a comment - 29/Apr/18 08:36 Tilman Hausherr, Palash Ray, could we get a method which returns the glyphs/ids/code to use together with the metrics information for a string? {quote} That's exactly what a GlyphVector is. Might be what we need here... was (Author: jahewson): {quote} Maruan Sahyoun added a comment - 29/Apr/18 08:36 Tilman Hausherr, Palash Ray, could we get a method which returns the glyphs/ids/code to use together with the metrics information for a string? {quote} That's exactly what a GlyphVector is.
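Maruan's request (glyph IDs plus metrics for a string) can be pictured as a small glyph-run structure. Its widths must be taken after substitution, because a ligature's advance is not the sum of its parts. In Java2D, `java.awt.font.GlyphVector` plays this role. The sketch below is not that API, just a self-contained illustration. The `GlyphRun` class, the cmap, the advance widths and the "fi" ligature are all invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Glyph IDs plus their advance widths (in font units), as laid out for one string.
final class GlyphRun {
    final List<Integer> glyphIds;
    final List<Integer> advances;

    private GlyphRun(List<Integer> glyphIds, List<Integer> advances) {
        this.glyphIds = glyphIds;
        this.advances = advances;
    }

    int totalAdvance() {
        int sum = 0;
        for (int a : advances) {
            sum += a;
        }
        return sum;
    }

    // Maps code points to glyphs, applies the substitution step, then looks up
    // the widths of the glyphs that will actually be drawn.
    static GlyphRun layout(String text, Map<Integer, Integer> cmap,
                           UnaryOperator<List<Integer>> gsub, Map<Integer, Integer> widths) {
        List<Integer> glyphs = new ArrayList<>();
        text.codePoints().forEach(cp -> glyphs.add(cmap.getOrDefault(cp, 0))); // 0 = .notdef
        List<Integer> substituted = gsub.apply(glyphs);
        List<Integer> advances = new ArrayList<>();
        for (int gid : substituted) {
            advances.add(widths.getOrDefault(gid, 0));
        }
        return new GlyphRun(substituted, advances);
    }
}
```

With a made-up "fi" ligature whose width is 480, the run for "fi" contains one glyph with a total advance of 480, not the 550 the two separate glyphs would give.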
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458001#comment-16458001 ] Palash Ray edited comment on PDFBOX-4189 at 4/29/18 12:29 PM: -- I am unable to reproduce this error. It runs fine for me. I confirm that I do not have any local changes. Can you please tell me which JRE you are running this on? I tested with both 7 and 8, and it works. Actually, let me do a fresh checkout and re-test. Thanks, Palash. was (Author: paawak): I am unable to reproduce this error. It runs fine for me. I confirm that I do not have any local changes. Can you please tell me which JRE you are running this on? I tested with both 7 and 8, and it works. Thanks, Palash.
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457993#comment-16457993 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/29/18 11:51 AM: --- I'm getting "Exception in thread "main" java.lang.IllegalArgumentException: No glyph for U+00E0 in font Lohit-Bengali" when running the example. Could you check whether the committed sample text file is the one you used, i.e. that no byte was changed? What encoding is used for the text? Could it be that we have to pass the encoding to InputStreamReader? was (Author: tilman): I'm getting "Exception in thread "main" java.lang.IllegalArgumentException: No glyph for U+00E0 in font Lohit-Bengali" when running the example. Could you check whether the committed sample text file is the one you used? What encoding is used for the text? Could it be that we have to pass the encoding to InputStreamReader?
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456325#comment-16456325 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/27/18 12:53 PM: --- Thank you, I committed your change with one minor difference: I used the font instead of the font name. The reason is that I don't trust font names to be unique.
was (Author: tilman): I committed your change with one minor difference: I used the font instead of the font name. The reason is that I don't trust font names to be unique.
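Why the font object is a safer key than its name: two different font objects (for example, two separately loaded font files) can report the same name, and a map keyed by name then silently merges them. A minimal sketch; LoadedFont is a hypothetical stand-in, not a PDFBox class:

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class FontKeyDemo {

    // Hypothetical stand-in for a loaded font; equals/hashCode are not
    // overridden, so two instances are distinct even if their names match.
    static final class LoadedFont {
        final String name;

        LoadedFont(String name) {
            this.name = name;
        }
    }

    // Counts cache entries when keyed by font name: same-named fonts collide.
    static int entriesKeyedByName(LoadedFont... fonts) {
        Map<String, LoadedFont> byName = new HashMap<>();
        for (LoadedFont font : fonts) {
            byName.put(font.name, font);
        }
        return byName.size();
    }

    // Counts cache entries when keyed by the font object itself.
    static int entriesKeyedByFont(LoadedFont... fonts) {
        Map<LoadedFont, Boolean> byFont = new IdentityHashMap<>();
        for (LoadedFont font : fonts) {
            byFont.put(font, Boolean.TRUE);
        }
        return byFont.size();
    }
}
```

With two instances both named "Lohit-Bengali", the name-keyed map ends up with one entry and the font-keyed map with two.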
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454699#comment-16454699 ] Palash Ray edited comment on PDFBOX-4189 at 4/26/18 7:03 PM: - Well, I share your concern about not impacting others with these Gsub changes. I have a safety feature here: unless your font supports the specific script name mentioned in the Language enum, this Gsub system will not kick in, and right now Bengali is the only language in the Language enum. Because of this safety feature, I think the Gsub feature should be pretty safe to have. However, if you find some other vulnerability that I might have overlooked, please do let me know; I am more than happy to fix it. As for the Gsub workers, Tilman, I have taken your advice and created a Map of GsubWorkers. Please take a look and see if that works for you: [https://github.com/apache/pdfbox/pull/49] Thanks, Palash.
was (Author: paawak): Well, I share your concern about not impacting others with these Gsub changes. I have a safety feature here: unless your font supports the specific script name mentioned in the Language enum, this Gsub system will not kick in, and right now Bengali is the only language in the Language enum. Because of this safety feature, I think this Gsub feature should be pretty safe to have. However, if you find some other vulnerability that I might have overlooked, please do let me know; I am more than happy to fix them. As for the Gsub workers, Tilman, I have taken your advice and created a Map of GsubWorkers. Please take a look and see if that works for you: [https://github.com/apache/pdfbox/pull/49] Thanks, Palash.
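The safety gate described in that comment can be sketched roughly as follows, assuming the font's GSUB table exposes its script tags as a set of strings. The enum, the worker interface and the method names are illustrative guesses, not the code in the pull request; "beng" and "bng2" are the OpenType script tags for Bengali.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GsubGate {

    // Illustrative Language enum: each entry lists the OpenType script tags it handles.
    enum Language {
        BENGALI("bng2", "beng");

        final List<String> scriptTags;

        Language(String... scriptTags) {
            this.scriptTags = Arrays.asList(scriptTags);
        }
    }

    // Illustrative worker contract: transforms a run of glyph IDs.
    interface GsubWorker {
        List<Integer> applyTransforms(List<Integer> glyphIds);
    }

    // Returns the first Language whose script tag the font declares, or null.
    static Language findSupportedLanguage(Set<String> fontScriptTags) {
        for (Language language : Language.values()) {
            for (String tag : language.scriptTags) {
                if (fontScriptTags.contains(tag)) {
                    return language;
                }
            }
        }
        return null;
    }

    // Applies the matching worker from the map; with no supported script
    // (or no registered worker) the glyphs are returned unchanged.
    static List<Integer> process(Set<String> fontScriptTags,
                                 Map<Language, GsubWorker> workers,
                                 List<Integer> glyphIds) {
        Language language = findSupportedLanguage(fontScriptTags);
        GsubWorker worker = language == null ? null : workers.get(language);
        return worker == null ? glyphIds : worker.applyTransforms(glyphIds);
    }
}
```

A font that only declares "latn" never reaches a worker, which is the property the comment relies on for leaving other users unaffected.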
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453551#comment-16453551 ] Maruan Sahyoun edited comment on PDFBOX-4189 at 4/26/18 6:10 AM: - I haven't had the time to look at the details but am following the discussion. What about
- detecting the script -> {{Character.UnicodeScript}} in {{java.lang}}
- providing a language setting on top to override/specify further, as a script might cover several languages which may have different needs
- putting the GSUB processing behind a flag/configuration similar to Apache FOP (https://xmlgraphics.apache.org/fop/trunk/complexscripts.html#Disabling-complex-scripts) so users can decide whether they want this, for performance and compatibility reasons. Maybe similar to what was done in {{PDFMergerUtility}}
was (Author: msahyoun): I haven't had the time to look at the details but am following the discussion. What about
- detecting the script -> {{Character.UnicodeScript}} in {{java.lang}}
- providing a language setting on top to override/specify further, as a script might cover several languages which may have different needs
- putting the GSUB processing behind a flag/configuration similar to Apache FOP so users can decide whether they want this, for performance and compatibility reasons. Maybe similar to what was done in {{PDFMergerUtility}}
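{{Character.UnicodeScript}} (available since Java 7) makes the script-detection idea easy to try. A minimal sketch that picks the most frequent script in a string, skipping COMMON (spaces, digits, punctuation) and INHERITED (combining marks); the class name and the "most frequent script wins" rule are assumptions, and a language setting on top would still be needed to separate languages that share a script:

```java
import java.util.EnumMap;
import java.util.Map;

final class ScriptDetector {

    // Returns the most frequent Unicode script in the text, ignoring COMMON
    // and INHERITED code points; returns COMMON if nothing else is found.
    static Character.UnicodeScript dominantScript(String text) {
        Map<Character.UnicodeScript, Integer> counts =
                new EnumMap<>(Character.UnicodeScript.class);
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED) {
                Integer count = counts.get(script);
                counts.put(script, count == null ? 1 : count + 1);
            }
            // Advance by 1 or 2 chars depending on whether this is a surrogate pair.
            i += Character.charCount(codePoint);
        }
        Character.UnicodeScript best = Character.UnicodeScript.COMMON;
        int bestCount = 0;
        for (Map.Entry<Character.UnicodeScript, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }
}
```

For a string of Bengali letters and vowel signs mixed with spaces and digits, this returns UnicodeScript.BENGALI, which could then select the matching worker.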
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453042#comment-16453042 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/25/18 8:33 PM: -- By "same as 2.0.9, i.e. no rearrangement" I mean that the new feature should not be activated by default, so that people who use 2.0.10 (assuming I commit your changes there too) get the same output as before. The reason is that not everybody wants this; for example, people who do pixel comparisons of their output don't want their tests to fail after a version change because of the rearrangement / replacement. About activating the script: I think it should be independent of the font, because some fonts may support several scripts. (I just saw that your gsubdata code returns one single language, so maybe I'm wrong there; I was thinking of Arial Unicode, which has a lot of different alphabets.) How about caching the workers in the content stream, with the font as the key?
was (Author: tilman): By "same as 2.0.9, i.e. no rearrangement" I mean that the new feature should not be activated by default, so that people who use 2.0.10 (assuming I commit your changes there too) get the same output as before. The reason is that not everybody wants this; for example, people who do pixel comparisons of their output don't want their tests to fail after a version change because of the rearrangement / replacement. About activating the script: I think it should be independent of the font, because some fonts may support several scripts. (I just saw that your gsubdata code returns one single language, so maybe I'm wrong there; I was thinking of Arial Unicode, which has a lot of different languages.) How about caching the workers in the content stream, with the font as the key?
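Caching the workers with the font as the key could look roughly like this: the factory runs once per font, and later calls for the same font reuse the stored worker. A generic sketch, assuming the worker is produced by a factory function; WorkerCache is a hypothetical name, not a PDFBox class:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-content-stream cache: F is the font type, W the worker type.
final class WorkerCache<F, W> {

    private final Map<F, W> workersByFont = new HashMap<>();
    private final Function<F, W> factory;
    private int factoryCalls;

    WorkerCache(Function<F, W> factory) {
        this.factory = factory;
    }

    // Creates the worker on first use of a font and reuses it afterwards.
    // containsKey() means a null result ("no worker for this font") is cached too.
    W get(F font) {
        if (!workersByFont.containsKey(font)) {
            workersByFont.put(font, factory.apply(font));
            factoryCalls++;
        }
        return workersByFont.get(font);
    }

    // Number of times the (potentially expensive) factory actually ran.
    int factoryCalls() {
        return factoryCalls;
    }
}
```

Inside a content stream, a call like get(font) would stand in for the direct factory call in the text-showing method, so the lookup happens once per font rather than once per string.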
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453042#comment-16453042 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/25/18 8:32 PM: -- By "same as 2.0.9, i.e. no rearrangement" I mean that the new feature should not be activated by default, so that people who use 2.0.10 (assuming I commit your changes there too) get the same output as before. The reason is that not everybody wants this; for example, people who do pixel comparisons of their output don't want their tests to fail after a version change because of the rearrangement / replacement. About activating the script: I think it should be independent of the font, because some fonts may support several scripts. (I just saw that your gsubdata code returns one single language, so maybe I'm wrong there; I was thinking of Arial Unicode, which has a lot of different languages.) How about caching the workers in the content stream, with the font as the key?
was (Author: tilman): By "same as 2.0.9, i.e. no rearrangement" I mean that the new feature should not be activated by default, so that people who use 2.0.10 (assuming I commit your changes there too) get the same output as before. The reason is that not everybody wants this; for example, people who do pixel comparisons of their output don't want their tests to fail after a version change because of the rearrangement / replacement. About activating the script: I think it should be independent of the font, because some fonts may support several scripts. How about caching the workers in the content stream, with the font as the key?
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453042#comment-16453042 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/25/18 8:29 PM: -- By "same as 2.0.9, i.e. no rearrangement" I mean that the new feature should not be activated by default, so that people who use 2.0.10 (assuming I commit your changes there too) get the same output as before. The reason is that not everybody wants this; for example, people who do pixel comparisons of their output don't want their tests to fail after a version change because of the rearrangement / replacement. About activating the script: I think it should be independent of the font, because some fonts may support several scripts. How about caching the workers in the content stream, with the font as the key?
was (Author: tilman): By "same as 2.0.9, i.e. no rearrangement" I mean that the new feature should not be activated by default, so that people who use 2.0.10 (assuming I commit your changes there too) get the same output as before. The reason is that not everybody wants this; for example, people who do pixel comparisons of their output don't want their tests to fail after a version change. About activating the script: I think it should be independent of the font, because some fonts may support several scripts. How about caching the workers in the content stream, with the font as the key?
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452887#comment-16452887 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/25/18 6:57 PM: -- I wonder if I'm missing something... {{gsubWorkerFactory.getGsubWorker(pdType0Font.getCmapLookup(), gsubData);}} is still hit for every call of showTextInternal(). Another thing to do: the default behaviour should be the same as in 2.0.9, i.e. no rearrangement. What I have in mind is a setter in PDPageContentStream that activates the GSUB worker for that script, e.g. setScript("Bengali").
was (Author: tilman): I wonder if I'm missing something... {{gsubWorkerFactory.getGsubWorker(pdType0Font.getCmapLookup(), gsubData);}} is still hit for every call of showTextInternal(). Another thing to do: the default behaviour should be the same as in 2.0.9, i.e. no rearrangement. What I have in mind is some language setting that activates the GSUB worker for that script.
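An opt-in switch of that kind might look like the sketch below: nothing is rearranged until a script is selected, so the default output stays as in 2.0.9. setScript mirrors the suggestion in the comment but is hypothetical here, as are OptInGsubStream and its other methods; workers are modelled as plain functions over glyph IDs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of opt-in GSUB handling in a content stream.
final class OptInGsubStream {

    private final Map<String, UnaryOperator<List<Integer>>> workersByScript = new HashMap<>();
    // null = no script selected: glyphs are written unchanged (old behaviour).
    private String activeScript;

    void registerWorker(String script, UnaryOperator<List<Integer>> worker) {
        workersByScript.put(script, worker);
    }

    // Along the lines of setScript("Bengali"); passing null switches GSUB off again.
    void setScript(String script) {
        activeScript = script;
    }

    // Glyph IDs that would actually be written for one showText call.
    List<Integer> glyphsToShow(List<Integer> glyphIds) {
        UnaryOperator<List<Integer>> worker =
                activeScript == null ? null : workersByScript.get(activeScript);
        return worker == null ? glyphIds : worker.apply(glyphIds);
    }
}
```

Unlike a font-based gate, this makes activation an explicit caller decision, which matches the concern about pixel-comparison tests changing after an upgrade.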
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16447116#comment-16447116 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/22/18 10:13 AM: --- What needs to be done if we want to activate ligatures for Latin languages? Write another ccmp and put in our own FEATURES_IN_ORDER, here with ccmp, liga and clig, and add "latn" to GlyphSubstitutionDataExtractor.SUPPORTED_LANGUAGES? https://docs.microsoft.com/en-us/typography/script-development/standard
was (Author: tilman): What needs to be done if we want to activate ligatures for Latin languages? Write another ccmp and put in our own FEATURES_IN_ORDER, here with ccmp, liga and clig? https://docs.microsoft.com/en-us/typography/script-development/standard
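A rough sketch of the idea in that question, assuming features are applied in a fixed per-script order and only when the font actually contains them. The map name echoes FEATURES_IN_ORDER but is hypothetical here, and only the Latin entry named in the comment (ccmp, liga, clig) is filled in:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FeatureOrder {

    // Hypothetical per-script feature order, keyed by OpenType script tag.
    static final Map<String, List<String>> FEATURES_IN_ORDER =
            Collections.singletonMap("latn", Arrays.asList("ccmp", "liga", "clig"));

    // Features to apply for a script, in the configured order,
    // skipping any that the font's GSUB table does not provide.
    static List<String> featuresToApply(String scriptTag, Set<String> fontFeatures) {
        List<String> order = FEATURES_IN_ORDER.get(scriptTag);
        if (order == null) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        for (String feature : order) {
            if (fontFeatures.contains(feature)) {
                result.add(feature);
            }
        }
        return result;
    }
}
```

For a font offering kern, liga and ccmp, the Latin result is [ccmp, liga]: the configured order wins over the font's order, and features outside the list (kern) are ignored.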
[jira] [Comment Edited] (PDFBOX-4189) Enable PDF creation with Indian languages, by reading and utilizing the GSUB table
[ https://issues.apache.org/jira/browse/PDFBOX-4189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446840#comment-16446840 ] Tilman Hausherr edited comment on PDFBOX-4189 at 4/22/18 6:35 AM: -- Thank you [~paawak], I committed your code with slight modifications. I removed most formatting changes (they draw attention away from the actual changes) and changed the example code as mentioned before. To do next:
- [~amake] please use the trunk to check whether your vertical texts are still OK (likely yes: the tests pass and the PDF generated by the sample works too)
- [~paawak] the example output now looks even more different than before; or is the text from the screenshot wrong? Is this related to your latest change, or could it be that I messed something up?
- run sonar tool (done)
- implement something to activate language-specific handling
- port to 2.0 after a few days
was (Author: tilman): Thank you [~paawak], I committed your code with slight modifications. I removed most formatting changes (they draw attention away from the actual changes) and changed the example code as mentioned before. To do next:
- [~amake] please use the trunk to check whether your vertical texts are still OK (likely yes: the tests pass and the PDF generated by the sample works too)
- [~paawak] the example output now looks even more different than before; or is the text from the screenshot wrong? Is this related to your latest change, or could it be that I messed something up?
- run sonar tool
- port to 2.0 after a few days