Hello,
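I am not sure whether this helps, but code page 1252 is not Unicode: it is the single-byte Windows Western European encoding (Windows-1252).
On Windows and in .NET, UTF-8 encoded Unicode data uses code page 65001 (1208 is IBM's CCSID for UTF-8).
UTF-16 little-endian is code page 1200, and UTF-16 big-endian is code page 1201.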
Sorry I can't help much more than that, but perhaps your data is either
not Unicode or not in code page 1252, and it becomes scrambled when it is
interpreted and transformed?
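A minimal C# sketch of the point above, assuming .NET Core or .NET 5+ (the class and method names are made up for illustration). It prints the bytes of the same string under code pages 1200, 1201 and 65001, then reads UTF-8 bytes back with the wrong decoder. ISO-8859-1 (code page 28591) stands in for Windows-1252 here because on .NET Core code page 1252 is only available after registering `CodePagesEncodingProvider`; the two encodings agree on every byte outside 0x80 to 0x9F, so the scrambled result is the same.

```csharp
using System;
using System.Text;

static class CodePageDemo
{
    // Encodes text with the given code page and returns the bytes as hex, e.g. "63-61".
    public static string HexBytes(string text, int codePage) =>
        BitConverter.ToString(Encoding.GetEncoding(codePage).GetBytes(text));

    // Encodes text with one encoding and decodes the bytes with another (possibly wrong) one.
    public static string Misdecode(string text, Encoding writer, Encoding reader) =>
        reader.GetString(writer.GetBytes(text));

    static void Main()
    {
        const string text = "caf\u00E9"; // "café"

        // 1200 = UTF-16 LE, 1201 = UTF-16 BE, 65001 = UTF-8 (Windows/.NET identifiers).
        foreach (int cp in new[] { 1200, 1201, 65001 })
            Console.WriteLine($"{cp}: {HexBytes(text, cp)}");

        // UTF-8 bytes read back as Latin-1: the two-byte "é" turns into two characters.
        Console.WriteLine(Misdecode(text, Encoding.UTF8, Encoding.GetEncoding(28591)));
    }
}
```

Running it shows three different byte sequences for the same four characters, and the UTF-8 `é` read back as Latin-1 comes out as the two characters `Ã©`.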
________________________________
From: "[email protected]"
<[email protected]>
To: [email protected]
Sent: Monday, January 9, 2012 5:10:22 PM
Subject: iText-questions Digest, Vol 68, Issue 13
Today's Topics:
1. Reading annotations containing Unicode characters (William Bell)
----------------------------------------------------------------------
Message: 1
Date: Mon, 9 Jan 2012 22:10:09 -0000
From: "William Bell" <[email protected]>
Subject: [iText-questions] Reading annotations containing Unicode
characters
To: <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset="us-ascii"
Good evening,
I am trying to extract the annotations in a PDF file. This is straightforward:
using iTextSharp.text.pdf;

PdfReader reader = new PdfReader(pdfFile.fullname);
for (int n = 1; n <= reader.NumberOfPages; n++) {
    PdfDictionary page = reader.GetPageN(n);
    // The /Annots array holds (references to) the page's annotation dictionaries.
    PdfArray annotsArray = page.GetAsArray(PdfName.ANNOTS);
    if (annotsArray != null) {
        for (int k = 0; k < annotsArray.Size; k++) {
            PdfDictionary annot =
                (PdfDictionary)PdfReader.GetPdfObject(annotsArray[k]);
            // /Contents holds the annotation's text.
            PdfString content =
                (PdfString)PdfReader.GetPdfObject(annot.Get(PdfName.CONTENTS));
            if (content != null) {
                System.Windows.Forms.MessageBox.Show(content.ToString());
            }
        }
    }
}
However, if the annotation contains Unicode characters (more specifically,
characters from code page 1252), the annotation text is not read correctly.
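Background that may explain the symptom: the PDF specification (ISO 32000-1, section 7.9.2.2) stores a text string such as an annotation's `/Contents` either in PDFDocEncoding or as UTF-16BE prefixed with the byte-order mark FE FF. iTextSharp's `PdfString` also offers a `ToUnicodeString()` method that appears to apply this rule, whereas `ToString()` appears to return the raw bytes one character each, so calling `content.ToUnicodeString()` in the loop above may be worth trying. The self-contained sketch below uses a hypothetical helper, `PdfTextString.Decode`, which is not part of iTextSharp, to illustrate the rule. As a simplification it decodes non-UTF-16 strings as Latin-1, which agrees with PDFDocEncoding for ASCII and most of 0xA1 to 0xFF but not for every byte value (notably 0x80 to 0xA0).

```csharp
using System;
using System.Text;

static class PdfTextString
{
    // Hypothetical helper: decodes the raw bytes of a PDF text string.
    public static string Decode(byte[] raw)
    {
        // A leading FE FF byte-order mark means UTF-16BE; skip the two marker bytes.
        if (raw.Length >= 2 && raw[0] == 0xFE && raw[1] == 0xFF)
            return Encoding.BigEndianUnicode.GetString(raw, 2, raw.Length - 2);

        // Otherwise the bytes are PDFDocEncoding. Simplification: Latin-1 is used here,
        // which matches PDFDocEncoding for ASCII and most, but not all, higher bytes.
        return Encoding.GetEncoding(28591).GetString(raw);
    }

    static void Main()
    {
        // "café" stored as UTF-16BE with the FE FF marker, as PDF writers do for non-ASCII text.
        byte[] utf16 = { 0xFE, 0xFF, 0x00, 0x63, 0x00, 0x61, 0x00, 0x66, 0x00, 0xE9 };
        // The same text stored as single bytes (PDFDocEncoding).
        byte[] pdfDoc = { 0x63, 0x61, 0x66, 0xE9 };

        Console.WriteLine(Decode(utf16));
        Console.WriteLine(Decode(pdfDoc));
    }
}
```

Both calls yield the same string; reading the first array one byte per character instead would give `þÿ` followed by the letters interleaved with NUL characters.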
I tried modifying the above code as follows:
if (content != null) {
    byte[] byteArray = Encoding.Unicode.GetBytes(
        ((PdfString)PdfReader.GetPdfObject(annot.Get(PdfName.CONTENTS))).ToString());
    string s = Encoding.Unicode.GetString(byteArray);
    System.Windows.Forms.MessageBox.Show(s);
}
Unfortunately, this does not resolve the issue.
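A likely reason the attempted fix above changes nothing: `Encoding.Unicode.GetBytes` followed by `Encoding.Unicode.GetString` is a lossless round trip for well-formed strings, so it returns the same (already scrambled) string it was given. Repairing the text means re-decoding the original bytes. The sketch below assumes the scrambled string holds one character per original byte, as a byte-to-character conversion produces; `RoundTripDemo`, `RoundTrip` and `Redecode` are hypothetical names.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // What the attempted fix does: encode as UTF-16 LE, then decode as UTF-16 LE again.
    public static string RoundTrip(string s) =>
        Encoding.Unicode.GetString(Encoding.Unicode.GetBytes(s));

    // Hypothetical repair: recover the original bytes (one byte per char via Latin-1),
    // then decode them as UTF-16BE if they start with the FE FF marker.
    public static string Redecode(string oneCharPerByte)
    {
        byte[] raw = Encoding.GetEncoding(28591).GetBytes(oneCharPerByte);
        if (raw.Length >= 2 && raw[0] == 0xFE && raw[1] == 0xFF)
            return Encoding.BigEndianUnicode.GetString(raw, 2, raw.Length - 2);
        return oneCharPerByte; // no marker: leave the text unchanged
    }

    static void Main()
    {
        // "é" stored as UTF-16BE with the marker, then read byte-by-byte into chars.
        string scrambled = "\u00FE\u00FF\u0000\u00E9";
        Console.WriteLine(RoundTrip(scrambled) == scrambled); // True: the round trip changes nothing
        Console.WriteLine(Redecode(scrambled) == "\u00E9");   // True: re-decoding the bytes works
    }
}
```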
I have attached a sample file with the troublesome annotation.
I was wondering if someone could point me in the right direction.
Thanks.
William Bell
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mypd1f.pdf
Type: application/pdf
Size: 30244 bytes
Desc: not available
------------------------------
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions
iText(R) is a registered trademark of 1T3XT BVBA
End of iText-questions Digest, Vol 68, Issue 13
Many questions posted to this list can (and will) be answered with a reference
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples:
http://itextpdf.com/themes/keywords.php