Okay, thanks both of you for your replies.
Makes me feel a bit safer about using IDENTITY_H for everything.
Cheers!
Date: Thu, 8 Oct 2009 09:15:04 -0700
From: [email protected]
To: [email protected]
Subject: Re: [iText-questions] Some questions about double byte characters and asian text
The only exception I can think of is Form Fields... but Acrobat will
automagically (automatic + magically, an English joke) change the
font/encoding as needed.

To further encourage Identity_H: if you have the same font in a PDF with two
different encodings, then you have two different copies of that font in your
PDF. Inefficient, and worth avoiding.

Another problem I have with non-Identity encodings is that they are
language-specific. There are GB encodings and JP encodings and Hebrew, and
Thai, and so on... but if you try to use a Thai character in a GB-based
encoding (even one of the Unicode versions), you simply won't get the
character[s] you want. They don't exist in the underlying encoding.

As far as I can tell, this behavior is a vestigial organ (like your appendix)
left over from The Days Before Unicode (also known as: The Dark Ages).

I suppose we could generate a Unicode->Identity cmap for each font... a bit of
extra work (both CPU and dev), but the foundations are already laid in iText
hither and yon. It would certainly simplify the "toUnicode" map.
--Mark Storer
Senior Software Engineer
Cardiff.com
#include <disclaimer>
typedef std::Disclaimer<Cardiff> DisCard;
-----Original Message-----
From: Leonard Rosenthol [mailto:[email protected]]
Sent: Thursday, October 08, 2009 7:16 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] Some questions about double byte characters and asian text
Yes, use Identity_H for everything.
Leonard
From: Y Fang [mailto:[email protected]]
Sent: Thursday, October 08, 2009 7:01 AM
To: [email protected]
Subject: [iText-questions] Some questions about double byte characters and asian text
I've been looking at some of the font pages in the iText Tutorial here:
http://itextdocs.lowagie.com/tutorial/ but there are two things which are
confusing me regarding the writing of Asian characters.

Firstly there is the explanation of the IDENTITY_H and IDENTITY_V encodings:

"In the next example, we are passing the value IDENTITY_H as encoding.
BaseFont.IDENTITY_H and BaseFont.IDENTITY_V are not really encodings. They
indicate that the unicode character will be looked up in the font and stored
as-is, taking two bytes of space. It's the only way to have Asian fonts and
some encodings left out by Adobe such as Thai. For Europe or the
Middle-East, it is better to use an available encoding that will store a
single byte per character. Fonts with BaseFont.IDENTITY_H or
BaseFont.IDENTITY_V will always be embedded no matter what you enter as third
parameter."

So working through the examples there, it seems I'd be using BaseFont.CP1252
as the encoding for regular English text, and BaseFont.IDENTITY_H when I need
to use double byte characters. So I tried to make a test PDF with some text,
using the font "Microsoft YaHei" (which according to the Windows character map
contains Chinese characters).
First try:
---------------
Font font = new Font(
    BaseFont.CreateFont(@"C:\Windows\Fonts\msyh.ttf", BaseFont.CP1252, BaseFont.EMBEDDED),
    12);
document.Add(new Paragraph("Hello", font));
document.Add(new Paragraph("你好", font));
---------------
This gave a 133 KB PDF file where only the English "Hello" displayed (no
Chinese text showed up under it).
Second try:
---------------
Font font = new Font(
    BaseFont.CreateFont(@"C:\Windows\Fonts\msyh.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED),
    12);
document.Add(new Paragraph("Hello", font));
document.Add(new Paragraph("你好", font));
---------------
This gave a 34 KB PDF file where the English "Hello" displayed, and below it
were two correct Chinese characters.

In both cases I've asked for the font to be embedded, so why is the PDF
created by the first try larger?

In the second try, using IDENTITY_H caused both the English and Chinese text
to show up. So is it fine to specify IDENTITY_H as the encoding even for
normal English text? i.e. as opposed to something like this:
---------------
Font font1 = new Font(
    BaseFont.CreateFont(@"C:\Windows\Fonts\msyh.ttf", BaseFont.CP1252, BaseFont.EMBEDDED),
    12);
document.Add(new Paragraph("Hello", font1));
Font font2 = new Font(
    BaseFont.CreateFont(@"C:\Windows\Fonts\msyh.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED),
    12);
document.Add(new Paragraph("你好", font2));
---------------
The problem I'm trying to solve here is that I need to create something which
can accept text input from a user. The input can be either English characters,
or it could be in an Asian script, such as Chinese writing. And that text
needs to be written to a PDF. Is it okay to always use BaseFont.IDENTITY_H? If
not, and I should use BaseFont.CP1252 for English text, is there any way to
tell what kind of text input I'm receiving?

For example, in the first try above, the Chinese text simply did not show up.
Is there any way to check whether printing out a certain string with a certain
font (in this case the two Chinese characters with msyh.ttf using CP1252) is
going to work, and if not redo it using IDENTITY_H instead?

It seems to me I should just use IDENTITY_H regardless of whether the input
text I'm receiving is English writing or something like Chinese.
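
A check of the kind asked about above could look roughly like the sketch below. It is plain Java rather than the C# used in the snippets, and it makes no iText calls at all: it only asks the JDK's windows-1252 charset encoder whether every character of the input can be encoded. The names `EncodingCheck` and `fitsCp1252` are made up for illustration, and it assumes that iText's CP1252 covers the same characters as the JDK's windows-1252 charset. It says nothing about whether the chosen font actually contains glyphs for the text.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingCheck {
    // Hypothetical helper (not part of iText): returns true only if every
    // character of the text has a code in the windows-1252 charset.
    public static boolean fitsCp1252(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // Plain ASCII text is representable in windows-1252.
        System.out.println(fitsCp1252("Hello"));        // true
        // U+4F60 U+597D are the two Chinese characters used in the tests above.
        System.out.println(fitsCp1252("\u4F60\u597D")); // false
    }
}
```

If the check returned false, the caller would switch to IDENTITY_H for that string; always using IDENTITY_H makes the check unnecessary.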
The second thing is that this tutorial page:
http://itextdocs.lowagie.com/tutorial/fonts/getting/index.php makes mention of
using iTextAsian for CJK writing. When and why would you use that, as opposed
to simply writing Asian text using IDENTITY_H and a font which contains
Chinese (or Japanese, Korean, etc.) characters like Microsoft YaHei?

If anyone could give some insight here, or just point me to some relevant
documentation/information, it would be much appreciated.

Thanks in advance.