Re: [Podofo-users] Printing PdfStrings with escape sequences

A. Massad Fri, 18 Dec 2009 07:01:02 -0800

Hi Dominik,

your latest changes in SVN (rev 1173) introduced a serious bug into the 
PoDoFo code.
The access m_escMap[static_cast<int>(*pBuf)] is wrong because, wherever plain 
char is signed, this cast yields negative indices for chars > 0x7f. In that 
case cEsc is read from outside the table, is nonzero more or less at random, 
and leads to unexpected escape sequences such as "\1".
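
A minimal standalone sketch of the failure mode follows. It is not PoDoFo 
code: the helper names indexViaInt, indexViaUnsignedChar and lookup are made 
up for illustration, and signed char is used explicitly so that the result 
does not depend on whether plain char is signed on the build platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the index as computed in rev 1173.
// For a byte above 0x7f stored in a signed char this is negative.
inline int indexViaInt(signed char c)
{
    return static_cast<int>(c);
}

// Hypothetical helper: the index as computed by the proposed fix.
// Conversion to unsigned char is modular, so the result is always 0..255.
inline std::size_t indexViaUnsignedChar(signed char c)
{
    return static_cast<unsigned char>(c);
}

// Looks up a 256-entry table without ever leaving its bounds.
inline char lookup(const char (&table)[256], signed char c)
{
    const std::size_t idx = indexViaUnsignedChar(c);
    assert(idx < 256);
    return table[idx];
}
```

For the byte 0xE4, which is -28 as a signed char, indexViaInt gives -28 while 
indexViaUnsignedChar gives 228.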


Could you please check my suggested diffs (against rev 1174) below and, if 
they are correct, commit these changes to SVN? 

Thank you!
Amin

PS: In my opinion, sizeof(char) should also appear in the malloc() size, 
because it is used in the memset() call, too. 
PPS: Please note that a cast unsigned(*pBuf) does not fix the problem: a 
"negative" char is sign-extended first and yields a very large number. Use 
"unsigned(*pBuf)&0xff" to prevent this. 
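
The difference between the two conversions can be checked with a small 
sketch. The function names viaUnsigned and viaMask are hypothetical and 
exist only for this illustration; signed char stands in for a platform where 
plain char is signed, and the mask 0xff assumes 8-bit bytes.

```cpp
#include <cassert>
#include <climits>

static_assert(CHAR_BIT == 8, "the 0xff mask below assumes 8-bit bytes");

// Hypothetical helper: plain conversion to unsigned.
// A negative value wraps around to a number close to UINT_MAX.
inline unsigned viaUnsigned(signed char c)
{
    return unsigned(c);
}

// Hypothetical helper: conversion followed by the mask from the PPS.
// Only the low eight bits survive, so the result is a valid table index.
inline unsigned viaMask(signed char c)
{
    const unsigned idx = unsigned(c) & 0xffu;
    assert(idx <= 0xffu);
    return idx;
}
```

For the value -28 (byte 0xE4), viaUnsigned returns UINT_MAX - 27 and viaMask 
returns 228, the same value that static_cast<unsigned char> produces.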


Index: src/PdfString.cpp
===================================================================
--- src/PdfString.cpp   (revision 1174)
+++ src/PdfString.cpp   (working copy)
@@ -48,7 +48,7 @@
 static const char* genEscMap()
 {
     const long lAllocLen = 256;
-    char* map = static_cast<char*>(malloc(lAllocLen));
+    char* map = static_cast<char*>(malloc(sizeof(char) * lAllocLen));
     memset( map, 0, sizeof(char) * lAllocLen );
 
     map['\n'] = 'n'; // Line feed (LF)
@@ -376,7 +376,7 @@
 
             while( lLen-- ) 
             {
-                const char & cEsc = m_escMap[static_cast<int>(*pBuf)];
+                const char & cEsc = m_escMap[static_cast<unsigned char>(*pBuf)];
                 if( cEsc != 0 ) 
                 {
                     pDevice->Write( "\\", 1 );
Index: src/PdfTokenizer.cpp
===================================================================
--- src/PdfTokenizer.cpp        (revision 1174)
+++ src/PdfTokenizer.cpp        (working copy)
@@ -53,7 +53,7 @@
 {
     int        i;
     const long lAllocLen = 256;
-    char* map = static_cast<char*>(malloc(lAllocLen));
+    char* map = static_cast<char*>(malloc(sizeof(char) * lAllocLen));
     memset( map, 0, sizeof(char) * lAllocLen );
     for (i = 0; i < PoDoFo::s_nNumDelimiters; ++i)
         map[static_cast<int>(PoDoFo::s_cDelimiters[i])] = 1;
@@ -67,7 +67,7 @@
 {
     int   i;
     const long lAllocLen = 256;
-    char* map = static_cast<char*>(malloc(lAllocLen));
+    char* map = static_cast<char*>(malloc(sizeof(char) * lAllocLen));
     memset( map, 0, sizeof(char) * lAllocLen );
     for (i = 0; i < PoDoFo::s_nNumWhiteSpaces; ++i)
         map[static_cast<int>(PoDoFo::s_cWhiteSpaces[i])] = 1;
@@ -78,7 +78,7 @@
 const char* genEscMap()
 {
     const long lAllocLen = 256;
-    char* map = static_cast<char*>(malloc(lAllocLen));
+    char* map = static_cast<char*>(malloc(sizeof(char) * lAllocLen));
     memset( map, 0, sizeof(char) * lAllocLen );
 
     map['n'] = '\n'; // Line feed (LF)
@@ -646,7 +646,7 @@
             else
             {
                 // Handle plain escape sequences
-                const char & code = m_escMap[m_device.Device()->GetChar()];
+                const char & code = m_escMap[(unsigned char)(m_device.Device()->GetChar())];
                 if( code )
                     m_vecBuffer.push_back( code );
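
Separately from the patch above, the writer-side lookup can be sketched as a 
standalone function. This is not the PoDoFo implementation: makeEscMap, 
escapeLiteral and checkEscapeLiteral are hypothetical names, and the table is 
assumed to hold the escape sequences that the PDF spec defines for literal 
strings (only the '\n' entry is visible in the diff).

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical stand-in for the table built by genEscMap() in PdfString.cpp.
// Entry i holds the letter to write after a backslash, or 0 for "no escape".
inline std::array<char, 256> makeEscMap()
{
    std::array<char, 256> map{}; // value-initialised, so all entries are 0
    map[static_cast<unsigned char>('\n')] = 'n';  // line feed
    map[static_cast<unsigned char>('\r')] = 'r';  // carriage return
    map[static_cast<unsigned char>('\t')] = 't';  // horizontal tab
    map[static_cast<unsigned char>('\b')] = 'b';  // backspace
    map[static_cast<unsigned char>('\f')] = 'f';  // form feed
    map[static_cast<unsigned char>('(')]  = '(';
    map[static_cast<unsigned char>(')')]  = ')';
    map[static_cast<unsigned char>('\\')] = '\\';
    return map;
}

// Escapes the raw bytes of a literal string. Every byte is converted to
// unsigned char before it is used as an index, so bytes above 0x7f stay
// inside the table and are copied through unchanged.
inline std::string escapeLiteral(const std::string& raw)
{
    static const std::array<char, 256> escMap = makeEscMap();
    std::string out;
    out.reserve(raw.size());
    for (char c : raw)
    {
        const char esc = escMap[static_cast<unsigned char>(c)];
        if (esc != 0)
        {
            out += '\\';
            out += esc;
        }
        else
        {
            out += c;
        }
    }
    return out;
}

// Small self-check: CR keeps its backslash, a high byte is left alone.
inline void checkEscapeLiteral()
{
    assert(escapeLiteral("a\rb") == "a\\rb");
    assert(escapeLiteral("\xE4") == "\xE4");
}
```

With such a table, a CR inside a string is written as the two characters 
"\r" instead of a raw 0Dh byte, which is the distinction discussed in the 
quoted messages below.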

On 14.12.2009 at 15:55, Dominik Seichter wrote:

> Thanks for the clarifications. I committed a fix to SVN, could you please 
> check that the behaviour is now correct for you?
> 
> Best regards,
>       Dom
> 
> On Saturday, 12 December 2009, A. Massad wrote:
>> On 29.11.2009 at 19:21, Dominik Seichter wrote:
>>> Hi,
>>> 
>>> I do not see how this is a problem. It is true that PoDoFo writes
>>> (Hello\nWorld) as (Hello
>>> World) into the PDF. But the PDF is read as sequence of bytes and the
>>> byte for the linebreak is still there. If I understand the PDF reference
>>> correctly, the behaviour of PoDoFo is correct. Escaping is optional and
>>> not required.
>>> 
>>> Please correct me if I am wrong here!
>> 
>> Sorry for the late reply, it took me some time to investigate this issue: I
>> came to the conclusion that your statement does not agree with the PDF
>> spec. The following is a quotation from section "7.4.3.2 Literal Strings":
>> 
>> "An end-of-line marker appearing within a literal string without a
>> preceding REVERSE SOLIDUS shall be treated as a byte value of (0Ah),
>> irrespective of whether the end-of-line marker was a CARRIAGE RETURN
>> (0Dh), a LINE FEED (0Ah), or both."
>> 
>> That means: if PoDoFo expands \n to the single byte 0Ah and \r to 0Dh,
>> they lose the "REVERSE SOLIDUS" ("\") and become end-of-line markers. If
>> you then read such a PDF with the Adobe tools, they treat each
>> end-of-line marker as 0Ah. This is exactly the behaviour I have observed.
>> 
>> I am pretty sure that the output of PoDoFo is wrong: because the escape
>> sequences \r and \n are expanded, the hex codes 0Ah and 0Dh become
>> indistinguishable for PDF readers. This might be OK if they just represent
>> end-of-lines. However, with Character Encodings that use
>> "Differences" mappings, the hex codes 0Dh and 0Ah might be mapped to
>> different printable characters. In that case, PoDoFo's output leads to
>> serious errors!
>> 
>> Best regards,
>> Amin
>> 
>>> On Monday, 23 November 2009, A. Massad wrote:
>>>> Hello,
>>>> 
>>>> Maybe this is a bug in PoDoFo - or just wrong usage of the library
>>>> functions:
>>>> 
>>>> Reading/parsing PDF files which contain strings with escape sequences,
>>>> e.g. (\r) or (\b), causes problems when writing these strings: the
>>>> functions PdfVariant::Write() and PdfString::Write() yield a strange
>>>> output - that is: (\r)
>>>> becomes
>>>> (
>>>> )
>>>> and
>>>> (\b)
>>>> becomes
>>>> )
>>>> respectively.
>>>> 
>>>> That means, the escape sequences are resolved in the output to a CR or
>>>> BACKSPACE instead of maintaining the escaping with the Backslash "\".
>>>> 
>>>> Due to this behavior, the written PDFs are corrupt (esp. due to
>>>> malformed syntax produced by the \b).
>>>> 
>>>> Is there a special function or flag that works around this
>>>> behavior? I could not fix the problem with PoDoFo functions but rather
>>>> had to use PdfVariant::ToString() and rewrite the std::string manually
>>>> to hex codes...
>>>> 
>>>> Thank you for your help!
>>>> 
>>>> Greetings,
>>>> Amin
>>>> 
>>>> PS: The strange encodings with low code numbers occur in a PDF where a
>>>> Font Encoding remaps all present characters by a "Differences"-Mapping
>>>> to codes starting at 1 - i.e. the non-printable chars will be mapped to
>>>> printable characters by this encoding (cf. Pdf Spec Sect 9.6.6 Character
>>>> Encoding).
>> 
>> _______________________________________________
>> Podofo-users mailing list
>> Podofo-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/podofo-users
>> 
> 
> 
> -- 
> **********************************************************************
> Dominik Seichter - domseich...@web.de
> KRename  - http://www.krename.net  - Powerful batch renamer for KDE
> KBarcode - http://www.kbarcode.net - Barcode and label printing
> PoDoFo - http://podofo.sf.net - PDF generation and parsing library
> SchafKopf - http://schafkopf.berlios.de - Schafkopf, a card game,  for KDE
> Alan - http://alan.sf.net - A Turing Machine in Java
> **********************************************************************
