> This though is not totally clear to me. On the major architectures,
> char is signed, so I would assume that a chksum error in this area
> should have hit a lot of people already? Given that int is signed by
> default I wonder if this is the proper approach and it shouldn't rather
> be cast to signed char (signedness of char varies across the different
> architectures).
The error only occurs when file names contain characters with codes of 128 or above. All ASCII characters have codes of 127 or below, so for ASCII-only names there is no difference. UTF-8 uses the most significant bit as a flag, so every byte of a multi-byte sequence has a code of 128 or above. I'll explain with an example (see also the sketch at the end of this message):

    int check = 32;
    check += buffer[j];

Assume buffer[0] == 128, i.e. 0x80. When a signed char holding 0x80 is added to an integer, it is sign-extended to a signed integer and becomes 0xFFFFFF80 (-128). It is not 0x80, as one may expect. But if all file names are plain ASCII (e.g. English), no one ever hits the bug.

> Out of curiosity, you filed this from an i386 system. Did you maybe
> copy around the backup from/to any architecture including arm, armel,
> powerpc or s390? Were they somehow involved in your presumed checksum
> error? The thing behind the question is: If we "fix" the
> calculation in the direction that you propose, this would break backups
> done now on the architectures that do have char signed by default
> because it would result in a different checksum.

No, unfortunately I don't have access to architectures other than amd64 and i386.

BTW, I filed a bug upstream: https://bugs.kde.org/show_bug.cgi?id=266141
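For reference, here is a minimal, self-contained sketch of the sign-extension behavior described above. The variable names mirror the snippet, but the program itself is only illustrative; it is not the actual checksum code:

    #include <stdio.h>

    int main(void)
    {
        /* A byte from a UTF-8 file name: any byte of a multi-byte
           sequence has the high bit set, i.e. a value of 0x80 or above. */
        char buffer[1] = { (char)0x80 };

        int check = 32;

        /* On platforms where plain char is signed, the char is
           sign-extended: 0x80 becomes -128 (0xFFFFFF80 as an int). */
        check += buffer[0];
        printf("plain char:    check = %d\n", check); /* 32 - 128 = -96 on i386/amd64 */

        /* Casting to unsigned char first keeps the byte value at 128,
           giving the same result on every architecture. */
        check = 32;
        check += (unsigned char)buffer[0];
        printf("unsigned char: check = %d\n", check); /* 32 + 128 = 160 everywhere */

        return 0;
    }

On architectures where plain char is unsigned (arm, powerpc, s390), both additions already yield 160, which is why the discrepancy only shows up when backups move between platforms with differing char signedness.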