[iText-questions] ligature implementation for Indian languages / Devanagari script

2011-04-02 Thread dilipvs...@hotmail.com
I have a web application that needs to generate PDF files on a Linux
server. I need to support a few Indian languages along with English. I
started by using Arial Unicode (ARIALUNI.TTF) and tested PDF generation with
the application installed on my Windows machine. iText generates PDF
files with Gujarati text but doesn't seem to apply any ligatures.

Does iText implement ligatures for Indian languages such as Hindi,
Gujarati, Marathi, etc.?

Here is the part of the PDF generation code:

// create the PDF document
Document document = new Document();

// get an instance of PdfWriter and, in the process, link it with the output file
// (uniqueFilePath is defined earlier in the program)
PdfWriter.getInstance(document, new FileOutputStream(uniqueFilePath));

// open the document
document.open();

// specify the font location
String fontFile = "C:/WINDOWS/Fonts/ARIALUNI.TTF";

// create a base font for the specified font file
BaseFont baseFont = BaseFont.createFont(fontFile, BaseFont.IDENTITY_H,
        BaseFont.EMBEDDED);
Font font = new Font(baseFont, 12);

// text is a Unicode string defined earlier in the program
document.add(new Paragraph(text, font));
===
 
Thanks in advance.
 
Dilip


--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/ligature-implementation-for-Indian-languages-Devanagari-script-tp3423055p3423055.html
Sent from the iText - General mailing list archive at Nabble.com.

___
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php


Re: [iText-questions] ligature implementation for Indian languages / Devanagari script

2011-04-06 Thread dilipvs...@hotmail.com
Would that be a common dictionary for all the Indian languages that use 
Devanagari script, or a separate one for each language such as Hindi, Marathi, 
Gujarati, etc.? I believe you'll need a separate one for each, but I could be 
wrong. I know these 3 Indian languages well enough to help out.

What's the format of this dictionary? Could you point me to the Arabic dictionary?

Thanks for so much interest in this subject. Can we make use of the interest 
and momentum to get this done?

Dilip



From: Leonard Rosenthol-3 [via iText - General] 
Sent: Wednesday, April 06, 2011 1:48 PM
To: [email protected] 
Subject: Re: ligature implementation for Indian languages / Devanagari script


That's EXACTLY what is needed - the "dictionary" that tells iText to use a 
different glyph than normal when it sees a specific combination of codepoints. 
iText has one for Arabic text, but not for Devanagari (and other Indics). If 
you can build such a table/dictionary, that would go a LONG WAY to getting 
support into iText. 

Leonard 

-Original Message- 
From: John Kilbourne [mailto:[hidden email]] 
Sent: Wednesday, April 06, 2011 1:15 PM 
To: Post all your questions about iText here 
Subject: Re: [iText-questions] ligature implementation for Indian languages / 
Devanagari script 

Thank you for your clarification. 

I understand from your last paragraph that iText would need to determine 
whether any contextual glyph shaping needs to be performed and then find the 
relevant glyphs in the font file. iText would not need to read the glyphs 
'live' (as they are being typed in) and change the rendering as subsequent 
characters are typed; it just sees a finished sequence of Unicode (often 
multi-)byte characters. Is it difficult to have a 'dictionary' of character 
combinations within iText that relate the combinations of Unicode characters 
('codepoints' I think is the correct term) to the appropriate glyphs (e.g. क + 
ष = क्ष)? I would like to help (because I would really like to use iText), or 
at least understand this problem better. 
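
The 'dictionary' idea can be pictured with a small Java sketch. It is only an illustration, not iText code: the class name and the glyph ids (1001 to 1003) are made up, since real glyph ids depend on the font. Note that the conjunct क्ष is encoded as three code points, KA (U+0915), VIRAMA (U+094D) and SSA (U+0937), so the key includes the virama.

```java
import java.util.HashMap;
import java.util.Map;

public class LigatureDictionarySketch {
    // Maps a code point sequence to a ligature glyph id.
    // The glyph ids are hypothetical placeholders; a real font assigns its own.
    static final Map<String, Integer> LIGATURES = new HashMap<>();
    static {
        LIGATURES.put("\u0915\u094D\u0937", 1001); // KA + VIRAMA + SSA
        LIGATURES.put("\u0924\u094D\u092E", 1002); // TA + VIRAMA + MA
        LIGATURES.put("\u092A\u094D\u0930", 1003); // PA + VIRAMA + RA
    }

    /** Returns the ligature glyph id for the sequence, or -1 if there is no entry. */
    static int lookup(String sequence) {
        Integer id = LIGATURES.get(sequence);
        return id == null ? -1 : id;
    }

    public static void main(String[] args) {
        System.out.println(lookup("\u0915\u094D\u0937")); // prints 1001
        System.out.println(lookup("\u0915\u0937"));       // prints -1 (no virama, no conjunct)
    }
}
```

A fixed table like this would have to be built per font, because the glyph id of a given conjunct differs from one font file to another.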

  
- Original Message - 
From: "Leonard Rosenthol" <[hidden email]> 
To: "Post all your questions about iText here" <[hidden email]> 
Sent: Wednesday, April 6, 2011 3:45:24 PM GMT -05:00 US/Canada Eastern 
Subject: Re: [iText-questions] ligature implementation for Indian languages / 
Devanagari script 

Roman (also sometimes called Latin) is a script used to write many languages, 
including English, French, German, etc.  Its basic Western European character 
set is codified in the encoding ISO 8859-1 (also called ISO Latin 1 - 
<http://en.wikipedia.org/wiki/ISO/IEC_8859-1>) 

Devanagari is a script used for Hindi (and other Indic languages - see 
<http://en.wikipedia.org/wiki/Devanagari>). 

Fonts are simply a way to provide a set of glyphs (visual representations of 
"letters" and "symbols").   They may or may not have a correlation to a 
specific script or language.  In most cases today, fonts include glyphs for 
MANY languages & scripts (eg. Unicode fonts). 

A font CAN NOT automatically do anything!  The software that lays out the 
characters/code points MUST determine whether any contextual glyph shaping 
needs to be performed and then find the relevant glyphs in the font file.  See 
<http://people.w3.org/rishida/docs/unicode-tutorial/part3#context-sensitive> 
which is just part of a full presentation on Unicode. 
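
To see what the layout software actually receives, it can help to print the code points of a string. The following Java sketch (the class name is made up for illustration) shows that the conjunct क्ष arrives as three separate code points, and it is up to the software, not the font, to decide that they should be drawn as one glyph.

```java
public class CodePointDump {
    /** Formats each code point of the text as U+XXXX, separated by spaces. */
    static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // KA, VIRAMA, SSA: the sequence that a shaper turns into one conjunct glyph.
        System.out.println(dump("\u0915\u094D\u0937")); // prints U+0915 U+094D U+0937
    }
}
```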

Hope that helps clarify things for you. 

-Original Message- 
From: John Kilbourne [mailto:[hidden email]] 
Sent: Wednesday, April 06, 2011 12:34 PM 
To: Post all your questions about iText here 
Subject: Re: [iText-questions] ligature implementation for Indian languages / 
Devanagari script 

I wonder if the distinction between font and language and something in between 
aren't involved here. English is a language, Roman is not quite a font (I 
think), but Times-Roman would be a font. Hindi is a language, devanagari is not 
quite a font, and Sanskrit 2003 (the font I use for devanagari) is a font. 

Anyway, here (http://www.wazu.jp/gallery/Fonts_Devanagari.html)  is a list of 
devanagari fonts showing the ligatures they naturally produce. Sanskrit 2003 
(the font) automatically renders devanagari ligatures like 
क्ष, त्म, प्र  for 
क्‌+ष्‌, त्‌+म्‌, and प्‌ + र्‌. 


- Original Message - 
From: "Leonard Rosenthol" <[hidden email]> 
To: "Post all your questions about iText here" <[hidden email]> 
Sent: Wednesday, April 6, 2011 2:49:09 PM GMT -05:00 US/Canada Eastern 
Subject: Re: [iText-questions] ligature implementation for Indian languages / 
Devanagari script 

The information about what two character codes/code points make up a given 
ligature isn't encoded into a font.  For example, there is nothing that tells 
me that when I find 'f' and 'i' next to each other in R

Re: [iText-questions] ligature implementation for Indian languages / Devanagari script

2011-04-07 Thread dilipvs...@hotmail.com
I'm working on a self-funded 'hobby' project. I guess we'll have to wait for a 
party with funds who badly needs this done. 

Dilip


From: Paulo Soares-3 [via iText - General] 
Sent: Wednesday, April 06, 2011 4:05 PM
To: [email protected] 
Subject: Re: ligature implementation for Indian languages / Devanagari script


 
Indic ligatures are a lot more complex, not only because of the possible combinations 
but also, and probably more importantly, because the ligaturized representation 
has no corresponding Unicode code point. This requires a GSUB table to provide 
the glyph id for the ligature. iText has no capability to read this table (GPOS 
would also be nice to have). The process to implement support for Indic scripts 
would be:

- have the rules for Indic ligatures
- decode the GSUB table in the font to get the glyph id of the ligature
- add the ligature, as a glyph id, to the output text
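
The last step can be sketched in Java as a left-to-right replacement over glyph ids. This is a simplified illustration, not iText code: the class, the Rule type and all glyph ids are hypothetical, the rules are assumed to have already been extracted from the font, and real Indic shaping also involves reordering and several substitution passes that are left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LigatureSubstitutionSketch {
    /** One rule: a run of component glyph ids is replaced by a single ligature glyph id. */
    static final class Rule {
        final int ligatureGlyph;
        final int[] components;

        Rule(int ligatureGlyph, int... components) {
            if (components.length == 0) {
                throw new IllegalArgumentException("a rule needs at least one component");
            }
            this.ligatureGlyph = ligatureGlyph;
            this.components = components;
        }
    }

    /** Applies the first matching rule at each position; callers should list longer rules first. */
    static int[] substitute(int[] glyphs, List<Rule> rules) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < glyphs.length) {
            Rule matched = null;
            for (Rule rule : rules) {
                if (matches(glyphs, i, rule.components)) {
                    matched = rule;
                    break;
                }
            }
            if (matched != null) {
                out.add(matched.ligatureGlyph);   // emit the ligature glyph id
                i += matched.components.length;   // skip the glyphs it replaces
            } else {
                out.add(glyphs[i]);               // no rule applies, keep the glyph
                i++;
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    static boolean matches(int[] glyphs, int start, int[] components) {
        if (start + components.length > glyphs.length) {
            return false;
        }
        for (int k = 0; k < components.length; k++) {
            if (glyphs[start + k] != components[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical glyph ids: 10 = KA, 20 = VIRAMA, 30 = SSA, 500 = the KSSA ligature.
        List<Rule> rules = Arrays.asList(new Rule(500, 10, 20, 30));
        int[] result = substitute(new int[] {10, 20, 30, 10}, rules);
        System.out.println(Arrays.toString(result)); // prints [500, 10]
    }
}
```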

None of this is supported in iText for the moment and would take several weeks 
to implement if we knew how (we can learn, no big deal) and if someone was 
willing to pay for the development.
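
As a first, very small step toward decoding GSUB, one could check whether a font contains the table and where it starts. The Java sketch below (class and method names are made up) reads the table directory of a single-font TrueType/OpenType file: a 12-byte header with the table count at byte offset 4, followed by 16-byte records holding tag, checksum, offset and length. It does not handle TrueType Collection (.ttc) files and does not parse the GSUB contents themselves.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TableDirectorySketch {
    /** Returns the byte offset of the table with the given 4-character tag, or -1 if absent. */
    static long findTable(byte[] font, String tag) {
        if (font.length < 12) {
            return -1; // too short to hold the 12-byte header
        }
        ByteBuffer buf = ByteBuffer.wrap(font); // ByteBuffer is big-endian by default, like the font file
        int numTables = buf.getShort(4) & 0xFFFF;
        for (int i = 0; i < numTables; i++) {
            int record = 12 + 16 * i;
            if (record + 16 > font.length) {
                break; // truncated directory
            }
            String recordTag = new String(font, record, 4, StandardCharsets.US_ASCII);
            if (recordTag.equals(tag)) {
                return buf.getInt(record + 8) & 0xFFFFFFFFL; // offset field of the record
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TableDirectorySketch path/to/font.ttf
        if (args.length == 1) {
            byte[] font = Files.readAllBytes(Paths.get(args[0]));
            System.out.println("GSUB offset: " + findTable(font, "GSUB"));
        }
    }
}
```

A result of -1 would mean the font carries no GSUB table at all, in which case no ligature glyph ids can be looked up from it this way.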

Paulo
  - Original Message - 
  From: [hidden email] 
  To: [hidden email] 
  Sent: Wednesday, April 06, 2011 10:55 PM
  Subject: Re: [iText-questions] ligature implementation for Indian languages / 
Devanagari script


  I’ll let Paulo comment since he wrote the Arabic shaper and knows what’s 
involved…

   


Re: [iText-questions] ligature implementation for Indian languages / Devanagari script

2011-10-13 Thread dilipvs...@hotmail.com
I would like to pick up this thread...

Is there any way I could help with implementing Indian languages in iText? Is 
there any documentation / code that I can refer to and attempt to implement one 
Indian language to start with? I'll be more than happy to contribute my work to 
the community.

Dilip



Re: [iText-questions] ligature implementation for Indian languages / Devanagari script

2011-10-14 Thread dilipvs...@hotmail.com
Hi Paulo,

What can be done to use iText for Indian languages? As I've mentioned in my 
earlier emails, I'm willing to put in time to implement one Indian language to 
start with and contribute my discoveries as well as code to the community.

Any direction in this matter is highly appreciated.

Dilip


