Re: [iText-questions] Outputting texts in PDF including arbitrary UTF-8 characters (including CJK)

Paulo Soares Mon, 14 Sep 2009 17:48:14 -0700

FreeSerif.ttf only supports hiragana and katakana. If you're writing kanji 
it will show nothing.
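
A quick way to see whether a given string would run into this limitation is to test it for Han (kanji) code points before choosing a font. The sketch below uses only the Java standard library; the class and method names are hypothetical and not part of iText or TextCite.

```java
final class KanjiCheck {
    /** Returns true if the text contains at least one Han (kanji/hanzi) code point. */
    static boolean containsHan(String text) {
        return text.codePoints()
                   .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // Hiragana only: no Han code points.
        System.out.println(containsHan("\u3072\u3089\u304C\u306A"));
        // Two kanji: Han code points are present.
        System.out.println(containsHan("\u6F22\u5B57"));
    }
}
```

Text for which containsHan returns true needs a font that actually carries kanji glyphs; embedding the font correctly does not help if the glyphs are missing from it.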


Paulo

----- Original Message ----- 
From: "Erik Norvelle" <[email protected]>
To: "Post all your questions about iText here" 
<[email protected]>
Sent: Monday, September 14, 2009 10:46 PM
Subject: Re: [iText-questions] Outputting texts in PDF including 
arbitrary UTF-8 characters (including CJK)


Greetings Paulo,

Thanks for your response to my issue.  Unfortunately, I am still
unable to get PDF output with CJK characters.

I put the fonts into a subpackage of my project
(org.norvelle.textcite.gui.fonts).  I then used the form
BaseFont.createFont("org/norvelle/textcite/gui/fonts/FreeSerif.ttf",
BaseFont.IDENTITY_H, true) to create a BaseFont object for
FreeSerif, which I then pass to the Font constructor.  The output,
unfortunately, still lacks any CJK characters.

I've checked the PDF file to make sure it has FreeSerif embedded, and it
comes up as "Embedded subset" with encoding Identity-H.  So the font
is being embedded, but the CJK characters still aren't showing up.

I'm baffled at this point... any help will be greatly appreciated.

Cheers,
Erik

On Sep 11, 2009, at 6:32 PM, Paulo Soares wrote:

>
>
>> -----Original Message-----
>> From: Erik Norvelle [mailto:[email protected]]
>> Sent: Friday, September 11, 2009 5:06 PM
>> To: [email protected]
>> Subject: [iText-questions] Outputting texts in PDF including
>> arbitrary UTF-8 characters (including CJK)
>>
>> Greetings to Bruno and the iText community,
>>
>> My question is a bit complicated, so please bear with me as I
>> explain what my issue is.  I don't think my particular
>> problem has been answered on the mailing lists, due to the
>> fact that my situation is rather special.
>>
>>
>> I've been using iText in my program (TextCite,
>> http://textcite.sourceforge.net) for several years now, in
>> order to export text in PDF and RTF formats.  My program, as
>> designed, is able to output PDFs with text from nearly any
>> UTF-8 block, except CJK.  My program currently uses iText
>> 1.4.5, although I am in the process of updating it to use the
>> 2.1.7 version.
>>
>> Because my program is intended for use on any platform with
>> Java installed (including Linux of various flavors), I cannot
>> count on the user having Unicode fonts with all the necessary
>> character blocks.  So, I have embedded the GNU free fonts
>> (FreeSerif, FreeSans and FreeMono) in my program (i.e. inside
>> the Jar file).  When I create a PDF, I read the font file and
>> create a new embedded font, calling the routine
>> BaseFont.createFont(String name, String encoding, boolean
>> embedded, boolean cached, byte ttfAfm[], byte pfb[]).  Note
>> that I have to create and pass the ttfAfm byte array because
>> the fonts are stored inside my program's Jar file, and not in
>> the filesystem.
>>
>
> No need to do it, BaseFont.createFont() will also look for the fonts
> as resources.
>
>> Unfortunately, despite the fact that the GNU fonts contain
>> the necessary CJK characters, my PDFs come out with blank
>> spaces where the CJK characters ought to be.
>>
>> I have included the iTextAsian.jar and the
>> iTextAsianCmaps.jar libraries in my classpath, along with
>> iText.jar, as suggested in the Examples online.
>>
>
> If you have a TTF or CFF font with the characters then
> iTextAsian.jar and iTextAsianCmaps.jar are not needed.
>
>> I know that iText isn't choking on the CJK characters,
>> because I can export to RTF and all the characters appear in
>> the resulting file.  So, it's *just* PDF output that is doing
>> something funny with CJK.
>>
>> So, the basic question is: am I doing something wrong?  What
>> do I need to change?  I include a snippet of sample code,
>> that shows how I create the Chunks which I then add to my PDF file.
>>
>
> The font must be created with BaseFont.createFont("font.ttf",
> BaseFont.IDENTITY_H, true).
>
> Paulo
>
>> Thanks for any help,
>> Cheers,
>> Erik Norvelle
>>
>> =============== snip ===================
>> /**
>>  * Recursively parse a given text into Chunks, with each Chunk given the
>>  * font styling appropriate to it, based on the HTML tags in the text.
>>  * @param text The text to parse into Chunks
>>  * @param font The current font to apply
>>  * @param phrase The currently existing Phrase to be added to.
>>  * @return The completed Phrase, made up of formatted Chunks.
>>  */
>> protected Phrase chunkText(String text, Font font, Phrase phrase) {
>>     Matcher m = tagPattern.matcher(text);
>>     if (m.matches()) {
>>         // Separate out the text and the tag
>>         String currFontText = m.group(1);
>>         String tag = m.group(2);
>>         String remainderText = m.group(3);
>>         phrase.add(new Chunk(currFontText, font));
>>
>>         int family;
>>         try {
>>             String fontFileName = font.getBaseFont().getTTFileName();
>>             if (fontFileName.contains("Mono")) family = Font.COURIER;
>>             else if (fontFileName.contains("Sans")) family = Font.HELVETICA;
>>             else family = Font.TIMES_ROMAN;
>>         } catch (Exception e) {
>>             family = Font.TIMES_ROMAN;
>>         }
>>
>>         // Depending on the tag, adjust the font styling.
>>         // UnicodeFontFactory is my custom code for getting a Font object
>>         // corresponding to the GNU Free Fonts, which are embedded in the PDF.
>>         if (tag.equals("<i>"))
>>             font = UnicodeFontFactory.getFont(family, font.getSize(),
>>                     font.getStyle() | Font.ITALIC);
>>         else if (tag.equals("<b>"))
>>             font = UnicodeFontFactory.getFont(family, font.getSize(),
>>                     font.getStyle() | Font.BOLD);
>>         else if (tag.equals("<u>"))
>>             font = UnicodeFontFactory.getFont(family, font.getSize(),
>>                     font.getStyle() | Font.UNDERLINE);
>>         else if (tag.equals("</i>"))
>>             font = UnicodeFontFactory.getFont(family, font.getSize(),
>>                     font.getStyle() & ~Font.ITALIC);
>>         else if (tag.equals("</b>"))
>>             font = UnicodeFontFactory.getFont(family, font.getSize(),
>>                     font.getStyle() & ~Font.BOLD);
>>         else
>>             font = UnicodeFontFactory.getFont(family, font.getSize(),
>>                     font.getStyle() & ~Font.UNDERLINE);
>>
>>         // Recurse to handle the rest of the text remaining
>>         phrase = chunkText(remainderText, font, phrase);
>>     }
>>     else
>>         // If we're at the end of the text, and there's no more tags,
>>         // just add a final Chunk to the phrase.
>>         phrase.add(new Chunk(text, font));
>>     return phrase;
>> }
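
The style bookkeeping in the quoted method (OR-ing a flag in on an opening tag, masking it out on a closing tag) can be shown without iText. The following is a minimal sketch, not the TextCite code: the constants are hypothetical stand-ins for Font.BOLD, Font.ITALIC and Font.UNDERLINE, the tag pattern is an assumption (the original tagPattern is not shown), and it uses a loop instead of recursion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StyledSegments {
    // Hypothetical stand-ins for iText's Font style flags.
    static final int BOLD = 1, ITALIC = 2, UNDERLINE = 4;

    // Assumed tag syntax: <i>, <b>, <u> and their closing forms.
    private static final Pattern TAG = Pattern.compile("</?[ibu]>");

    /** One run of text together with the style flags in effect for it. */
    static final class Segment {
        final String text;
        final int style;
        Segment(String text, int style) { this.text = text; this.style = style; }
    }

    /** Splits tagged text into runs, tracking the style as a bit mask. */
    static List<Segment> split(String text) {
        List<Segment> out = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        int style = 0;
        int pos = 0;
        while (m.find()) {
            // Emit the text before this tag with the style currently in effect.
            if (m.start() > pos) out.add(new Segment(text.substring(pos, m.start()), style));
            String tag = m.group();
            int flag = flagFor(tag.charAt(tag.length() - 2));
            // Opening tag sets the flag, closing tag clears it.
            style = tag.startsWith("</") ? (style & ~flag) : (style | flag);
            pos = m.end();
        }
        // Whatever follows the last tag keeps the final style.
        if (pos < text.length()) out.add(new Segment(text.substring(pos), style));
        return out;
    }

    private static int flagFor(char c) {
        switch (c) {
            case 'b': return BOLD;
            case 'i': return ITALIC;
            default:  return UNDERLINE;
        }
    }
}
```

Unlike the quoted method, this sketch skips empty runs between adjacent tags. In a real exporter each Segment would be turned into a Chunk with a Font built from its style value.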

