[iText-questions] ligature implementation for Indian languages / Devanagari script
I have a web application that needs to generate PDF files on a Linux server. I need to support a few Indian languages along with English. I started with Arial Unicode (ARIALUNI.TTF) and tested PDF generation with the application installed on my Windows machine. iText generates PDF files containing the Gujarati characters, but it doesn't seem to implement any ligatures. Does iText have a ligature implementation for Indian languages such as Hindi, Gujarati, Marathi, etc.?

Here is the relevant part of the PDF generation code:

    // create the PDF document
    Document document = new Document();
    // get a PdfWriter instance and, in the process, link it to the output file
    PdfWriter.getInstance(document, new FileOutputStream(uniqueFilePath)); // uniqueFilePath defined earlier in the program
    // open the document
    document.open();
    // specify the font location
    String fontFile = "C:/WINDOWS/Fonts/ARIALUNI.TTF";
    // create a base font for the specified font (Identity-H encoding, embedded)
    BaseFont baseFont = BaseFont.createFont(fontFile, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
    Font font = new Font(baseFont, 12);
    document.add(new Paragraph(text, font)); // Unicode text defined earlier in the program
    // close the document so the PDF is actually written out
    document.close();

Thanks in advance.
Dilip

--
View this message in context: http://itext-general.2136553.n4.nabble.com/ligature-implementation-for-Indian-languages-Devanagari-script-tp3423055p3423055.html
Sent from the iText - General mailing list archive at Nabble.com.

___
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference to the iText book: http://www.itextpdf.com/book/ Please check the keywords list before you ask for examples: http://itextpdf.com/themes/keywords.php
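A quick way to tell whether a given string will run into this limitation is to check which Unicode blocks its code points fall in. The sketch below is plain Java with no iText calls; the class and method names (`IndicTextCheck`, `needsIndicShaping`) are hypothetical, and it only checks the Devanagari and Gujarati blocks mentioned in this thread. A `true` result means the text contains characters that would need contextual shaping, which, according to the replies below, iText does not perform.

```java
import java.lang.Character.UnicodeBlock;

public class IndicTextCheck {

    // Hypothetical helper: true if any code point in the string lies in the
    // Devanagari or Gujarati Unicode block. Such text generally needs contextual
    // shaping (conjuncts, vowel-sign reordering) before glyphs can be placed.
    static boolean needsIndicShaping(String text) {
        return text.codePoints().anyMatch(cp -> {
            UnicodeBlock block = UnicodeBlock.of(cp);
            return block == UnicodeBlock.DEVANAGARI || block == UnicodeBlock.GUJARATI;
        });
    }

    public static void main(String[] args) {
        System.out.println(needsIndicShaping("Hello"));                // false
        System.out.println(needsIndicShaping("\u0915\u094D\u0937"));   // true (Devanagari KA + VIRAMA + SSA)
        System.out.println(needsIndicShaping("\u0A95\u0ACD\u0AB7"));   // true (Gujarati KA + VIRAMA + SSA)
    }
}
```

Such a check could, for example, be used to route Indic text to a different rendering path instead of silently producing unshaped output.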
Re: [iText-questions] ligature implementation for Indian languages / Devanagari script
Would that be a common dictionary for all the Indian languages that use Devanagari script, or a separate one for each language such as Hindi, Marathi, Gujarati, etc.? I believe you'll need a separate one for each, but I could be wrong. I know these 3 Indian languages well enough to help out. What's the format of this dictionary? Could you point me to the Arabic dictionary?

Thanks for taking so much interest in this subject. Can we use this interest and momentum to get this done?

Dilip

From: Leonard Rosenthol-3 [via iText - General]
Sent: Wednesday, April 06, 2011 1:48 PM
To: [email protected]
Subject: Re: ligature implementation for Indian languages / Devanagari script

That's EXACTLY what is needed - the "dictionary" that tells iText that when it sees a specific combination of codepoints to use a different glyph than normal. iText has one for Arabic text, but not for Devanagari (and other Indics). If you can build such a table/dictionary, that would go a LONG WAY to getting support into iText.

Leonard

-----Original Message-----
From: John Kilbourne [mailto:[hidden email]]
Sent: Wednesday, April 06, 2011 1:15 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] ligature implementation for Indian languages / Devanagari script

Thank you for your clarification. I understand from your last paragraph that iText would need to determine whether any contextual glyph shaping needs to be performed and then find the relevant glyphs in the font file. iText would not need to read the glyphs 'live' (as they are being typed in) and change the rendering as subsequent characters are typed; it just sees a finished sequence of Unicode (often multi-byte) characters. Is it difficult to have a 'dictionary' of character combinations within iText that relates combinations of Unicode characters ('codepoints', I think, is the correct term) to the appropriate glyphs (e.g. क + ् + ष = क्ष)? I would like to help (because I would really like to use iText), or at least understand this problem better.

----- Original Message -----
From: "Leonard Rosenthol" <[hidden email]>
To: "Post all your questions about iText here" <[hidden email]>
Sent: Wednesday, April 6, 2011 3:45:24 PM GMT -05:00 US/Canada Eastern
Subject: Re: [iText-questions] ligature implementation for Indian languages / Devanagari script

Roman (also sometimes called Latin) is a script used by many languages, including English, French, German, etc. This is codified in the encoding ISO 8859-1 (also called ISO Latin 1 - <http://en.wikipedia.org/wiki/ISO/IEC_8859-1>). Devanagari is a script used for Hindi (and other Indic languages - see <http://en.wikipedia.org/wiki/Devanagari>).

Fonts are simply a way to provide a set of glyphs (visual representations of "letters" and "symbols"). They may or may not correspond to a specific script or language. In most cases today, fonts include glyphs for MANY languages & scripts (e.g. Unicode fonts).

A font CAN NOT automatically do anything! The software that lays out the characters/code points MUST determine whether any contextual glyph shaping needs to be performed and then find the relevant glyphs in the font file. See <http://people.w3.org/rishida/docs/unicode-tutorial/part3#context-sensitive>, which is just part of a full presentation on Unicode.

Hope that helps clarify things for you.

-----Original Message-----
From: John Kilbourne [mailto:[hidden email]]
Sent: Wednesday, April 06, 2011 12:34 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] ligature implementation for Indian languages / Devanagari script

I wonder if the distinction between font and language, and something in between, isn't involved here. English is a language, Roman is not quite a font (I think), but Times-Roman would be a font. Hindi is a language, Devanagari is not quite a font, and Sanskrit 2003 (the font I use for Devanagari) is a font.

Anyway, here (http://www.wazu.jp/gallery/Fonts_Devanagari.html) is a list of Devanagari fonts showing the ligatures they naturally produce. Sanskrit 2003 (the font) automatically renders Devanagari ligatures like क्ष, त्म, प्र for क् + ष, त् + म, and प् + र.

----- Original Message -----
From: "Leonard Rosenthol" <[hidden email]>
To: "Post all your questions about iText here" <[hidden email]>
Sent: Wednesday, April 6, 2011 2:49:09 PM GMT -05:00 US/Canada Eastern
Subject: Re: [iText-questions] ligature implementation for Indian languages / Devanagari script

The information about what two character codes/code points make up a given ligature isn't encoded into a font. For example, there is nothing that tells me that when I find 'f' and 'i' next to each other in R
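To make the "dictionary" idea concrete: in Unicode a conjunct such as क्ष is stored as three code points, consonant + virama (U+094D) + consonant, i.e. U+0915 U+094D U+0937. A minimal sketch, assuming a hand-made table, maps such sequences to a single glyph ID by greedy longest match and falls back to a per-character lookup otherwise. The class name, the glyph IDs 501-503 and the `cmap` map are hypothetical placeholders; in a real font the ligature glyph IDs would have to come from the font itself (through its GSUB table, as Paulo explains later in the thread). Real Indic shaping also involves vowel-sign reordering and half forms, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConjunctDictionary {

    // HYPOTHETICAL glyph IDs: placeholders for values a real font would supply.
    static final Map<String, Integer> LIGATURES = new HashMap<>();
    static {
        LIGATURES.put("\u0915\u094D\u0937", 501); // KA + VIRAMA + SSA -> conjunct KSSA
        LIGATURES.put("\u0924\u094D\u092E", 502); // TA + VIRAMA + MA  -> conjunct TMA
        LIGATURES.put("\u092A\u094D\u0930", 503); // PA + VIRAMA + RA  -> conjunct PRA
    }
    static final int MAX_KEY_LENGTH = 3; // longest dictionary key, in chars

    // Greedy longest match: a dictionary hit becomes one ligature glyph ID;
    // anything else falls back to a per-code-point lookup in a (hypothetical) cmap.
    static List<Integer> toGlyphs(String text, Map<Integer, Integer> cmap) {
        List<Integer> glyphs = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            Integer lig = null;
            int matched = 0;
            for (int len = Math.min(MAX_KEY_LENGTH, text.length() - i); len >= 2; len--) {
                lig = LIGATURES.get(text.substring(i, i + len));
                if (lig != null) {
                    matched = len;
                    break;
                }
            }
            if (lig != null) {
                glyphs.add(lig);
                i += matched;
            } else {
                int cp = text.codePointAt(i);
                glyphs.add(cmap.getOrDefault(cp, 0)); // 0 = .notdef when unmapped
                i += Character.charCount(cp);
            }
        }
        return glyphs;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cmap = new HashMap<>();
        cmap.put(0x0915, 10);
        cmap.put(0x094D, 11);
        cmap.put(0x0937, 12);
        System.out.println(toGlyphs("\u0915\u094D\u0937\u0915", cmap)); // [501, 10]
    }
}
```

Longest match matters once the table contains overlapping entries (for example a three-consonant conjunct whose first part is itself a two-consonant conjunct); trying the longest key first keeps the larger ligature.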
Re: [iText-questions] ligature implementation for Indian languages / Devanagari script
I'm working on a self-funded 'hobby' project. I guess we'll have to wait for a party with funds who badly needs this done.

Dilip

From: Paulo Soares-3 [via iText - General]
Sent: Wednesday, April 06, 2011 4:05 PM
To: [email protected]
Subject: Re: ligature implementation for Indian languages / Devanagari script

Indic ligatures are a lot more complex, not only because of the possible combinations but also, and probably more important, because the ligaturized representation has no corresponding Unicode code point. This requires a GSUB table to provide the glyph id for the ligature. iText has no capability to read this table (GPOS would also be nice to have). The process to implement support for Indic scripts would be:

- have the rules for Indic ligatures
- decode the GSUB table in the font to get the glyph id of the ligature
- add the ligature, as a glyph id, to the output text

None of this is supported in iText for the moment, and it would take several weeks to implement if we knew how (we can learn, no big deal) and if someone was willing to pay for the development.

Paulo

----- Original Message -----
From: [hidden email]
To: [hidden email]
Sent: Wednesday, April 06, 2011 10:55 PM
Subject: Re: [iText-questions] ligature implementation for Indian languages / Devanagari script

I’ll let Paulo comment since he wrote the Arabic shaper and knows what’s involved…
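Paulo's second step begins with locating the GSUB table inside the font file. The sketch below covers only that first part: it reads the TrueType/OpenType table directory (a 12-byte header followed by 16-byte records of tag, checksum, offset and length) and reports where the `GSUB` table lives. It is plain Java, not an iText API; the class and method names are hypothetical, it does not handle `.ttc` font collections, and it does not decode the lookups inside GSUB (script list, feature list, lookup list, ligature substitution subtables), which is where most of the work described above would be.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FindGsub {

    // Returns {offset, length} of the table with the given 4-byte tag, or null if
    // the font has no such table. Assumes a single TrueType/OpenType font file.
    // Font data is big-endian, which is ByteBuffer's default byte order.
    static long[] findTable(byte[] font, String tag) {
        ByteBuffer buf = ByteBuffer.wrap(font);
        int numTables = buf.getShort(4) & 0xFFFF;       // follows the 4-byte sfntVersion
        for (int i = 0; i < numTables; i++) {
            int record = 12 + 16 * i;                   // records follow the 12-byte header
            String recordTag = new String(font, record, 4, StandardCharsets.US_ASCII);
            if (recordTag.equals(tag)) {
                long offset = buf.getInt(record + 8) & 0xFFFFFFFFL;  // skip tag and checksum
                long length = buf.getInt(record + 12) & 0xFFFFFFFFL;
                return new long[] {offset, length};
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        byte[] font = Files.readAllBytes(Paths.get(args[0]));
        long[] gsub = findTable(font, "GSUB");
        System.out.println(gsub == null
                ? "no GSUB table"
                : "GSUB at offset " + gsub[0] + ", length " + gsub[1]);
    }
}
```

For example, running `java FindGsub C:/WINDOWS/Fonts/ARIALUNI.TTF` would show whether the font from the original question carries a GSUB table at all; without one there are no ligature glyph IDs to look up.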
Re: [iText-questions] ligature implementation for Indian languages / Devanagari script
I would like to pick up this thread... Is there any way I could help with implementing Indian languages in iText? Is there any documentation / code that I can refer to and attempt to implement one Indian language to start with? I'll be more than happy to contribute my work to the community.

Dilip

From: Dilip Shah
Sent: Thursday, April 07, 2011 6:40 AM
To: Paulo Soares-3 [via iText - General]
Subject: Re: ligature implementation for Indian languages / Devanagari script

I'm working on a self-funded 'hobby' project. I guess we'll have to wait for a party with funds who badly needs this done.

Dilip
Re: [iText-questions] ligature implementation for Indian languages / Devanagari script
Hi Paulo,

What can be done to use iText for Indian languages? As I've mentioned in my earlier emails, I'm willing to put in time to implement one Indian language to start with and contribute my discoveries as well as code to the community. Any direction in this matter would be highly appreciated.

Dilip

From: Dilip Shah
Sent: Thursday, October 13, 2011 2:17 PM
To: Paulo Soares-3 [via iText - General]
Subject: Re: ligature implementation for Indian languages / Devanagari script

I would like to pick up this thread... Is there any way I could help with implementing Indian languages in iText? Is there any documentation / code that I can refer to and attempt to implement one Indian language to start with? I'll be more than happy to contribute my work to the community.

Dilip
