Re: [Podofo-users] Test EncodingTest::testDifferencesEncoding() now properly fixed

Michal Sudolsky Tue, 26 Apr 2022 10:20:27 -0700

>
> I tried the code you supplied in pdfmm: if the found font has all the
> required GIDs, and the standard14 Helvetica actually doesn't have all of
> them so I used Arial as a fallback, I can already handle the text
> correctly, see the attachment.



As I remember, you did some unification of PdfString and UTF-8 in pdfmm,
so if the source file is interpreted as UTF-8 I can see why all the texts
in your pdf are correct. And standard14 Helvetica actually has all the
required glyphs: you can see this in the pdf posted by zyx, where the
third text is displayed correctly. You just need an encoding which
contains them all, like Win1250 (I hope that standard14 fonts are not
broken in pdfmm).

>         str = PdfString("ěščřABCĚŠČŘ");


As your string does not have a unicode marker, it is just copied into the
internal buffer as is. As your file is interpreted as UTF-8, str now
contains a UTF-8 encoded string but treats its bytes as if they were
encoded in PDFDocEncoding.
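
To make the effect visible without PoDoFo, here is a minimal sketch of what
happens when UTF-8 bytes are read one byte per character. It is only an
illustration: the function name is made up, and ISO-8859-1 is used as a
stand-in for PDFDocEncoding (an assumption; the two differ for some byte
values, but the doubling of multi-byte characters is the same).

```cpp
#include <string>

// Hypothetical helper, not part of PoDoFo or pdfmm.
// Treats every input byte as one character of a single-byte encoding
// (ISO-8859-1 as a stand-in for PDFDocEncoding) and re-encodes it as UTF-8.
std::string reinterpret_single_byte_as_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            // ASCII stays a single byte.
            out.push_back(static_cast<char>(b));
        } else {
            // Code points 0x80..0xFF need two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}
```

Feeding it the two UTF-8 bytes of "ě" (0xC4 0x9B) gives four bytes back, which
is the kind of garbage described here, while plain ASCII such as "ABC" passes
through unchanged.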

>         ustr = str.GetStringUtf8();
>         printf ("1) '%s'\n", ustr.c_str());
>

And in ustr you get UTF-8 garbage.


>         painter.DrawText(10, 780, str);
>

And it will not be correct in the pdf.

>         painter.DrawText(10, 740, ustr);
>
>
Now this takes the UTF-8 garbage, treats it as PDFDocEncoding, and you get
more garbage.



>         str = PdfString((const pdf_utf8 *) "ěščřABCĚŠČŘ");
>

As 2) is working, I would suppose your file is interpreted as UTF-8
encoded. If I am not wrong, str now contains a correct UTF-16BE encoded
string.
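
For reference, a small sketch of the UTF-8 to UTF-16BE conversion itself,
just to show which byte values such a buffer would hold. This is not
PoDoFo's code: the function name is made up, the input is assumed to be
well-formed UTF-8 (no validation), and no byte order mark is written.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of PoDoFo or pdfmm.
// Decodes well-formed UTF-8 and emits big-endian UTF-16 code units.
std::vector<std::uint8_t> utf8_to_utf16be(const std::string& s) {
    std::vector<std::uint8_t> out;
    // Append one 16-bit code unit, high byte first.
    auto put16 = [&out](std::uint32_t unit) {
        out.push_back(static_cast<std::uint8_t>(unit >> 8));
        out.push_back(static_cast<std::uint8_t>(unit & 0xFF));
    };
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        for (std::size_t k = 0; k < extra && i < s.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        }
        if (cp >= 0x10000) {
            // Outside the BMP: split into a surrogate pair.
            cp -= 0x10000;
            put16(0xD800 + (cp >> 10));
            put16(0xDC00 + (cp & 0x3FF));
        } else {
            put16(cp);
        }
    }
    return out;
}
```

With this, the UTF-8 bytes 0xC4 0x9B for "ě" (U+011B) become 0x01 0x1B, and
"A" becomes 0x00 0x41.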

>         ustr = str.GetStringUtf8();
>         printf ("2) '%s'\n", ustr.c_str());
>

Here printf in your environment treats ustr as a UTF-8 string and prints
it correctly.

>         painter.DrawText(10, 700, str);
>

Now it is not surprising that this is working.



>         painter.DrawText(10, 660, ustr);
>
>
But this one is not working, because you again used the same PdfString
constructor as in "str = PdfString("ěščřABCĚŠČŘ");".


>         printf ("%s: wrote %d bytes: '%.*s'\n", __FUNCTION__, (int)
> output.GetLength(), (int) output.GetLength(), output.Get());
>

I suppose the output contains the same hex sequences as in your pdf file?


> Why do I include it here when it does not touch the r1967 change? I
> think the change in the r1967 can be correct, the problem is in the
> litePDF, not using proper PdfString constructors, similarly to the
> above test program. It can be the litePDF "counted" (even
> unintentionally) with the previous behavior, without using correct
> functions for the PdfString; or, taken it the other way around, the way
> litePDF has it done was the right way to do it before the r1967 change.
>

I now really cannot see how this all relates to r1967. The only right way,
regardless of r1967, is to always pass strings in the correct encoding to
the PdfString constructors.
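
If it is unclear what encoding a buffer really has, a cheap sanity check
before choosing the constructor can help. The sketch below is only that: a
hypothetical helper, not a PoDoFo function, which checks the structure of
the byte sequence (lead bytes and continuation bytes) and does not reject
overlong forms or surrogate code points.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of PoDoFo or pdfmm.
// Structural check: every lead byte must be followed by the right number
// of continuation bytes (10xxxxxx). This is a sanity check, not a full
// UTF-8 validator.
bool looks_like_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (lead < 0x80)                extra = 0;
        else if ((lead & 0xE0) == 0xC0) extra = 1;
        else if ((lead & 0xF0) == 0xE0) extra = 2;
        else if ((lead & 0xF8) == 0xF0) extra = 3;
        else return false;  // stray continuation byte or invalid lead byte
        if (s.size() - i - 1 < extra) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) {
                return false;
            }
        }
        i += extra + 1;
    }
    return true;
}
```

The UTF-8 bytes of "ě" (0xC4 0x9B) pass this check, while "ěščř" saved as
Win1250 (0xEC 0x9A 0xE8 0xF8) does not, so such a buffer should not be
handed to the UTF-8 constructor.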


> I mean, I consider this solved. I'll find a way to properly adapt the
> litePDF code to work as expected with the fixed PoDoFo. Maybe the above
> will help someone else when dealing with the lost UTF-8/Unicode
> letters.
>
>         Bye,
>         zyx
> _______________________________________________
> Podofo-users mailing list
> Podofo-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/podofo-users
>
