Re: [Podofo-users] Printing PdfStrings with escape sequences

A. Massad Sat, 12 Dec 2009 06:31:59 -0800

On 29.11.2009 at 19:21, Dominik Seichter wrote:

> Hi,
> 
> I do not see how this is a problem. It is true that PoDoFo writes 
> (Hello\nWorld) as (Hello
> World) into the PDF. But the PDF is read as a sequence of bytes and the byte 
> for the linebreak is still there. If I understand the PDF reference correctly, 
> the behaviour of PoDoFo is correct. Escaping is optional and not required. 
> 
> Please correct me if I am wrong here!


Sorry for the late reply, it took me some time to investigate this issue: I 
came to the conclusion that your statement does not agree with the PDF spec. 
The following is a quotation from section "7.4.3.2 Literal Strings":

"An end-of-line marker appearing within a literal string without a preceding 
REVERSE SOLIDUS shall be treated as a byte value of (0Ah), irrespective of 
whether the end-of-line marker was a CARRIAGE RETURN (0Dh), a LINE FEED (0Ah), 
or both."

That means: if PoDoFo expands \n to the single byte 0Ah and \r to 0Dh, they lose 
the "REVERSE SOLIDUS" ("\") and become unescaped end-of-line markers. When such 
a PDF is read with the Adobe tools, each of these end-of-line markers is treated 
as 0Ah. This is exactly the behaviour I have observed.

I am pretty sure that the output of PoDoFo is wrong: because the escape 
sequences \r and \n are expanded, the byte values 0Ah and 0Dh become 
indistinguishable for PDF readers. This might be OK if they just represent 
end-of-lines. However, with character encodings that use "Differences" 
mappings, the codes 0Dh and 0Ah might be mapped to different printable 
characters. In that case, the PoDoFo output leads to serious errors!
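
As a sketch of the behaviour I would expect on the writing side, the following 
escapes control bytes and delimiters before the string body is placed between 
parentheses. The name escapeLiteralBody is hypothetical and this is not how 
PoDoFo is implemented; it only shows one way to keep 0Dh and 0Ah 
distinguishable:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not part of PoDoFo: escapes raw string bytes so that
// they survive a round trip through a PDF literal string.
std::string escapeLiteralBody(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        switch (c) {
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            case '\b': out += "\\b";  break;
            case '\f': out += "\\f";  break;
            case '(':  out += "\\(";  break;
            case ')':  out += "\\)";  break;
            case '\\': out += "\\\\"; break;
            default:
                if (c < 0x20 || c == 0x7F) {
                    // Remaining control bytes: three-digit octal escape.
                    char buf[5];
                    std::snprintf(buf, sizeof buf, "\\%03o",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}
```

Balanced parentheses would not strictly need escaping, but always escaping 
them keeps the sketch simple and is still valid syntax.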

Best regards,
Amin

> On Monday, 23 November 2009, A. Massad wrote:
>> Hello,
>> 
>> Maybe this is a bug in PoDoFo - or just wrong usage of the library
>> functions:
>> 
>> Reading/parsing PDF files which contain strings with escape sequences, e.g.
>> (\r) or (\b), causes problems when writing these strings: the functions
>> PdfVariant::Write() and PdfString::Write() yield a strange output - that
>> is: (\r)
>> becomes
>> (
>> )
>> and
>> (\b)
>> becomes
>> )
>> respectively.
>> 
>> That means, the escape sequences are resolved in the output to a CR or
>> BACKSPACE instead of maintaining the escaping with the Backslash "\".
>> 
>> Due to this behavior, the written PDFs are corrupt (esp. due to malformed
>> syntax produced by the \b).
>> 
>> Is there a special function or flag that provides a work-around for this
>> behavior? I could not fix the problem with PoDoFo functions but rather had
>> to use PdfVariant::ToString() and rewrite the std::string manually to hex
>> codes...
>> 
>> Thank you for your help!
>> 
>> Greetings,
>> Amin
>> 
>> PS: The strange encodings with low code numbers occur in a PDF where a Font
>> Encoding remaps all present characters by a "Differences"-Mapping to codes
>> starting at 1 - i.e. the non-printable chars will be mapped to printable
>> characters by this encoding (cf. Pdf Spec Sect 9.6.6 Character Encoding).

