The only exception I can think of is Form Fields... but Acrobat will
automagically (automatic + magically, an English joke) change the font/encoding
as needed.
To further encourage Identity_H: if you have the same font in a PDF with two
different encodings, then you have two different copies of that font in your
PDF. Inefficient, and worth avoiding.
Another problem I have with non-Identity encodings is that they are
language-specific. There are GB encodings and JP encodings and Hebrew, and
Thai, and so on... but if you try to use a Thai character in a GB-based
encoding (even one of the Unicode versions), you simply won't get the
character[s] you want. They don't exist in the underlying encoding.
As far as I can tell, this behavior is a vestigial organ (like your appendix)
left over from The Days Before Unicode (also known as: The Dark Ages).
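The "characters don't exist in the underlying encoding" point can be checked
outside iText. Below is a minimal sketch in plain Java (the snippets quoted
further down are iTextSharp/C#). It assumes the JDK's "windows-1252" charset is
a close enough stand-in for what iText calls CP1252; EncodingCheck and
canEncode are hypothetical names, not part of iText. It only says whether the
encoding can represent the text, not whether the font has glyphs for it.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of iText: asks the JDK whether every
// character of a string exists in a given legacy (non-Identity) charset.
class EncodingCheck {

    static boolean canEncode(String charsetName, String text) {
        // Create a fresh encoder per call; encoders are stateful objects.
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // Plain English text fits in the single-byte Western encoding.
        System.out.println(canEncode("windows-1252", "Hello"));        // true
        // U+4F60 U+597D is the two-character Chinese greeting from the thread.
        System.out.println(canEncode("windows-1252", "\u4F60\u597D")); // false
    }
}
```

The same call with a GB or Thai charset name should show the cross-language
gaps described above, provided the JDK in use ships those charsets.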
I suppose we could generate a Unicode->Identity cmap for each font... a bit of
extra work (both CPU and dev), but the foundations are already laid in iText
hither and yon. It would certainly simplify the "toUnicode" map.
--Mark Storer
Senior Software Engineer
Cardiff.com
#include <disclaimer>
typedef std::Disclaimer<Cardiff> DisCard;
-----Original Message-----
From: Leonard Rosenthol [mailto:[email protected]]
Sent: Thursday, October 08, 2009 7:16 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] Some questions about double byte characters and
asian text
Yes, use Identity_H for everything.
Leonard
From: Y Fang [mailto:[email protected]]
Sent: Thursday, October 08, 2009 7:01 AM
To: [email protected]
Subject: [iText-questions] Some questions about double byte characters and
asian text
I've been looking at some of the font pages in the iText Tutorial here:
http://itextdocs.lowagie.com/tutorial/ but there are two things which are
confusing me regarding the writing of Asian characters.
Firstly there is the explanation of the IDENTITY_H and IDENTITY_V encodings:
"In the next example, we are passing the value
<http://www.1t3xt.info/api/com/lowagie/text/pdf/BaseFont.html#IDENTITY_H>
IDENTITY_H as encoding. BaseFont.IDENTITY_H and BaseFont.IDENTITY_V are not
really encodings. They indicate that the unicode character will be looked up in
the font and stored as-is, taking two bytes of space. It's the only way to have
Asian fonts and some encodings left out by Adobe such as Thai. For Europe or
the Middle-East, it is better to use an available encoding that will store a
single byte per character. Fonts with BaseFont.IDENTITY_H or
BaseFont.IDENTITY_V will always be embedded no matter what you enter as third
parameter."
So working through the examples there, it seems I'd be using BaseFont.CP1252 as
the encoding for regular English text, and BaseFont.IDENTITY_H when I need to
use double byte characters. So I tried to make a test pdf with some text, using
the font "Microsoft YaHei" (which according to the Windows character map
contains Chinese characters).
First try:
---------------
Font font = new
Font(BaseFont.CreateFont(@"C:\Windows\Fonts\msyh.ttf", BaseFont.CP1252,
BaseFont.EMBEDDED), 12);
document.Add(new Paragraph("Hello", font));
document.Add(new Paragraph("你好", font));
---------------
This gave a 133kb pdf file where only the English "Hello" displayed (no Chinese
text showed up under it)
Second try:
---------------
Font font = new
Font(BaseFont.CreateFont(@"C:\Windows\Fonts\msyh.ttf", BaseFont.IDENTITY_H,
BaseFont.EMBEDDED), 12);
document.Add(new Paragraph("Hello", font));
document.Add(new Paragraph("你好", font));
---------------
This gave a 34kb pdf file where the English "Hello" displayed, and below it
were two correct Chinese characters.
In both cases I've asked for the font to be embedded, why is the pdf created by
the first try larger?
In the second try, using IDENTITY_H caused both the English and Chinese text to
show up. So is it fine to specify IDENTITY_H as the encoding even for normal
English text? i.e. as opposed to something like this:
---------------
Font font1 = new
Font(BaseFont.CreateFont(@"C:\Windows\Fonts\msyh.ttf", BaseFont.CP1252,
BaseFont.EMBEDDED), 12);
document.Add(new Paragraph("Hello", font1));
Font font2 = new
Font(BaseFont.CreateFont(@"C:\Windows\Fonts\msyh.ttf", BaseFont.IDENTITY_H,
BaseFont.EMBEDDED), 12);
document.Add(new Paragraph("你好", font2));
---------------
The problem I'm trying to solve here, is that I need to create something which
can accept text input from a user. The input can be either English characters,
or it could be in Asian fonts...such as Chinese writing. And that text needs to
be written to a PDF. Is it okay to always use BaseFont.IDENTITY_H? If not, and
I should use BaseFont.CP1252 for English text, is there any way to tell what
kind of text input I'm receiving?
For example, in the first try above, the Chinese font simply did not show up.
Is there any way to check whether printing out a certain string with a certain
font (in this case the two Chinese characters with msyh.ttf using CP1252) is
going to work, and if not redo it using IDENTITY_H instead?
It seems to me I should just use IDENTITY_H regardless of whether the input
text I'm receiving is English writing or something like Chinese.
The second thing is that this tutorial page:
http://itextdocs.lowagie.com/tutorial/fonts/getting/index.php makes mention of
using iTextAsian for CJK writing. When and why would you use that, as opposed
to simply writing Asian text using IDENTITY_H and a font which contains Chinese
(or Japanese, Korean, etc.) characters like Microsoft YaHei?
If anyone could give some insight here, or just point me to some relevant
documentation/information, it would be much appreciated.
Thanks in advance.