Hello,
I am not sure if this helps, but code page 1252 is not Unicode; it is a single-byte Western European encoding.
I believe 1208 is IBM's identifier (CCSID) for UTF-8; on Windows and in .NET, UTF-8 is code page 65001.
UTF-16 Little Endian is code page 1200, and UTF-16 Big Endian is code page 1201.

Sorry I can't help much more than that, but perhaps your data is either
not Unicode or not code page 1252, and it becomes scrambled when it is
interpreted and converted with the wrong encoding?
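
In case it is useful, here is a minimal sketch of how one might decide which encoding a PDF text string (such as an annotation's /Contents value) uses before decoding it. It relies on the PDF convention that a text string starting with the bytes FE FF is UTF-16 Big Endian, and is otherwise in PDFDocEncoding. The class and method names are hypothetical and not part of iTextSharp, how to get the raw bytes out of the PdfString is left out, and ISO-8859-1 is only an approximation of PDFDocEncoding.

```csharp
using System;
using System.Text;

static class PdfTextStringSketch
{
    // Hypothetical helper, not an iTextSharp API: decodes the raw bytes
    // of a PDF text string.
    public static string Decode(byte[] raw)
    {
        // A leading byte order mark FE FF marks the string as UTF-16 Big Endian.
        if (raw.Length >= 2 && raw[0] == 0xFE && raw[1] == 0xFF)
        {
            // Encoding.BigEndianUnicode is code page 1201; skip the two BOM bytes.
            return Encoding.BigEndianUnicode.GetString(raw, 2, raw.Length - 2);
        }

        // Assumption: ISO-8859-1 stands in for PDFDocEncoding. The two agree
        // for ASCII and for most accented Latin letters, but not everywhere.
        return Encoding.GetEncoding("iso-8859-1").GetString(raw);
    }
}
```

Note that encoding a string to bytes and decoding those bytes with the same encoding just gives back the same string, so a round trip through Encoding.Unicode cannot repair text that was already decoded incorrectly. I believe iTextSharp's PdfString also has a ToUnicodeString() method that may handle this for you, but I have not checked it against your version.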
 
 

________________________________
 From: "[email protected]" 
<[email protected]>
To: [email protected] 
Sent: Monday, January 9, 2012 5:10:22 PM
Subject: iText-questions Digest, Vol 68, Issue 13
  
Today's Topics:

   1. Reading annotations containing Unicode characters (William Bell)


----------------------------------------------------------------------

Message: 1
Date: Mon, 9 Jan 2012 22:10:09 -0000
From: "William Bell" <[email protected]>
Subject: [iText-questions] Reading annotations containing Unicode
    characters
To: <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset="us-ascii"

Good evening,

I am trying to extract the annotations in a PDF file.  This is
straightforward:

reader = new PdfReader(pdfFile.fullname);
for (int n = 1; n <= reader.NumberOfPages; n++) {
  PdfDictionary page = reader.GetPageN(n);
  PdfArray annotsArray = page.GetAsArray(PdfName.ANNOTS);
  if (annotsArray != null) {
    for (int k = 0; k < annotsArray.Size; k++) {
      PdfDictionary annot = (PdfDictionary)PdfReader.GetPdfObject(annotsArray[k]);
      PdfString content = (PdfString)PdfReader.GetPdfObject(annot.Get(PdfName.CONTENTS));
      if (content != null) {
        System.Windows.Forms.MessageBox.Show(content.ToString());
      }
    }
  }
}

However, if the annotation contains Unicode (more specifically, code page
1252) characters, the annotation is not read correctly.

I tried modifying the above code as follows:

if (content != null) {
  byte[] byteArray = Encoding.Unicode.GetBytes(((PdfString)PdfReader.GetPdfObject(annot.Get(PdfName.CONTENTS))).ToString());
  string s = Encoding.Unicode.GetString(byteArray);
  System.Windows.Forms.MessageBox.Show(s);
}

Unfortunately, this does not resolve the issue.

I have attached a sample file with the troublesome annotation.

I was wondering if someone could point me in the right direction.

Thanks.

William Bell

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mypd1f.pdf
Type: application/pdf
Size: 30244 bytes
Desc: not available

------------------------------


_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA

End of iText-questions Digest, Vol 68, Issue 13
***********************************************