Re: [iText-questions] UTF 8 SUPPORT

1T3XT info Wed, 07 Apr 2010 23:58:46 -0700

J sundi wrote:
> Hi
>   I am using iText to generate PDF documents with data fetched
> from a DB, but I am having a problem adding data with UTF-8 chars to
> the PDF. Everything comes out as junk. May I please know whether UTF-8
> is supported by iText and, if it is, whether there is any package I need
> to install to generate PDFs containing UTF-8 text without junk
> characters? Which TTF files do I need to use for this?


This is NOT an iText question.

Nevertheless, the answer is in the book:
http://www.manning.com/lowagie2/

If you download the free chapter:
http://www.manning.com/lowagie2/iText2E_MEAP_CH02.pdf
you'll read on pages 18-19:

<quote>
In listing 2.3, some Strings were created using the encoding UTF-8 
explicitly:

new String(rs.getBytes("given_name"), "UTF-8")

That's because the database contains different names with special 
characters. If you look at the HSQL script filmfestival.script, you'll 
find INSERT statements like this:

INSERT INTO FILM_DIRECTOR VALUES(
41,'I\u00c3\u00b1\u00c3\u00a1rritu','Alejandro Gonz\u00c3\u00a1lez')

That's the record for the director Alejandro González Iñárritu. The 
characters á (char 225) and ñ (char 241) can each be stored as one byte 
using the ANSI character encoding, which is a superset of ISO 8859-1, 
a.k.a. Latin-1. HSQL stores them in Unicode using multiple bytes per 
character. To make sure that the String is created correctly, I've used
ResultSet.getBytes() instead of ResultSet.getString().

This isn't always necessary. In most database systems, you can define 
the encoding for each table or for the whole database. The JVM uses the 
platform's default charset, for instance in the constructor new 
String(byte[] bytes).

FAQ:
Why is the data I retrieve from my database rendered as gibberish?
This can be caused by an encoding mismatch. The records in your database 
are encoded using encoding X, but the String objects obtained from your 
ResultSet are decoded as if they used your platform's charset Y. 
For instance: the name González could be rendered as GonzÃ¡lez if the 
UTF-8 bytes are interpreted as ANSI characters.

These encoding problems disappear as soon as you've created the PDF 
document correctly. One of the main reasons why people prefer PDF over 
any other document format is that PDF, as the first letter in the 
abbreviation tells us, is a portable document format. A PDF document can 
be viewed and printed on any platform: UNIX, Macintosh, Windows, Linux, 
or Palm OS, regardless of the encoding or character set that is used.
</quote>

The documentation is there, even in a free chapter, and your question is 
literally an FAQ. Please read the documentation BEFORE posting a 
question to the list.
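
For anyone who wants to see the mismatch from the quoted FAQ in isolation, 
here is a minimal sketch that needs neither a database nor iText. The class 
name EncodingDemo and the helper decode() are made up for illustration, and 
ISO 8859-1 stands in for the "ANSI" charset mentioned in the quote (an 
assumption; the two agree for the characters involved). The byte array 
plays the role of what ResultSet.getBytes() would return for a column that 
holds UTF-8 data.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Decode raw bytes with an explicitly named charset
    // instead of relying on the platform default.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The name from the FAQ, written with a Unicode escape for the
        // accented a so this source file does not depend on its own encoding.
        String name = "Gonz\u00e1lez";

        // Stand-in for the bytes a UTF-8 database column would hold.
        byte[] raw = name.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the data was written in gives the name back.
        String correct = decode(raw, StandardCharsets.UTF_8);

        // Decoding the same bytes as Latin-1 turns the two-byte sequence for
        // the accented a into two separate characters (the gibberish case).
        String garbled = decode(raw, StandardCharsets.ISO_8859_1);

        // A string garbled this way can be repaired by reversing the wrong
        // decoding step and then decoding with the right charset.
        String repaired = decode(garbled.getBytes(StandardCharsets.ISO_8859_1),
                                 StandardCharsets.UTF_8);

        System.out.println(correct.equals(name));                   // true
        System.out.println(garbled.equals("Gonz\u00c3\u00a1lez"));  // true
        System.out.println(repaired.equals(name));                  // true
    }
}
```

The point of the sketch is only that the charset has to be named at the 
moment bytes become a String; whether you need getBytes() at all depends on 
how your database and JDBC driver are configured.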

