Stas,

Sorry to insist.
But here I am again...

Stas wrote:
>Actually I haven't looked, I have tested with your code.
Thanks a lot for going through the effort...

>Before setting the
>header I wasn't getting the unicode chars you put in the form back in the
>dump. After setting the header it did print out exactly the same unicode
>character.

Well, that is strange. I just changed my code and am still getting the endash
back as code 150 and not as code 8212 (the way it went in).
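
One way to put the two numbers side by side is the sketch below. It assumes
(and this is only an assumption, I cannot confirm it from the server side) that
the browser submits the form as Windows-1252. The helper names
cp1252_code_point and utf8_hex are made up for this illustration; only the
Encode module calls are real.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: code point of a single byte when read as Windows-1252.
sub cp1252_code_point {
    my ($byte) = @_;
    return ord(decode('cp1252', chr($byte)));
}

# Hypothetical helper: hex bytes of a character once encoded as UTF-8.
sub utf8_hex {
    my ($char) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split(//, encode('UTF-8', $char));
}

printf "byte %d as cp1252 -> code %d\n", $_, cp1252_code_point($_) for (150, 151);
printf "U+2014 as UTF-8   -> %s\n", utf8_hex("\x{2014}");
```

Under that assumption, byte 150 maps to code 8211 and byte 151 to code 8212,
while a form that really arrives as utf-8 would carry the three bytes e2 80 94
for code 8212.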

Are you sure that you have the 2 lines in the test program that change the
multibyte utf-8 encoded characters into their values?
(the famous lines 11 and 12)

Because if not, then I can understand that you have to put the changed
header in as you would be sending utf-8 encoded data to the client.
And it would also explain why you would 'see' the same character after
putting the utf-8 header in.

>I didn't have a chance to mess with the hex representations yet.

That makes me wonder even more about the thing above.

[...]

>I think this is where the weak point is. You need to compare characters on the
>server side, not trying to rely on the browser, which as you have seen will
>render them improperly if you didn't set the right header.

Again that is the purpose of the dreaded lines 11 and 12 of my test program.
I don't want to render the character, I just want to display the actual
(utf-8 encoded) code that I read back from the form.
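
To show what those two substitutions do with different kinds of input, here is
a small sketch. The sub name entity_escape is hypothetical; the two s///
statements are copied from the test program. It assumes the input is either a
decoded character string or a string of raw, undecoded bytes.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions from the test program.
sub entity_escape {
    my ($content) = @_;
    $content =~ s/([\x{0800}-\x{FFFF}])/sprintf('+entity:%d+',ord($1))/ge;
    $content =~ s/([\x{0080}-\x{07FF}])/sprintf('+entity: %d+',ord($1))/ge;
    return $content;
}

# A decoded character string: one character with code 8212.
print entity_escape("\x{2014}"), "\n";

# Raw bytes that were never decoded: each byte is treated as its own character.
print entity_escape("\x96"), "\n";
print entity_escape("\xe2\x80\x94"), "\n";
```

A decoded U+2014 is reported as a single code 8212, whereas undecoded bytes are
reported one by one (a lone byte 0x96 shows up as 150, and the utf-8 bytes
e2 80 94 as 226, 128 and 148). So the numbers only mean "characters" if the
form data has been decoded before these lines run.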

>You have two things happening: read input, send output. The problem can be in
>any of the two and worse, it can be in both and the error can fix itself when
>doubled. You need to verify first that the input is read properly, then the
>same for the output.

Believe me.
I also ran tests that write the data out to disk and then used a hex dump of
that file to verify what is actually in there. I got the same results. But
that got a bit tedious, hence my little test program that does more or less
the same thing.
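
The same check can be done in memory, roughly like the sketch below. The
helper names hex_dump and utf8_code_points are made up for this example, and it
assumes the value under test is the raw byte string as received, before any
decoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: hex dump of the raw bytes in a string.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02x', $_) } unpack('C*', $bytes);
}

# Hypothetical helper: array ref of code points if $bytes is well-formed
# utf-8, undef otherwise. LEAVE_SRC keeps decode() from modifying $bytes.
sub utf8_code_points {
    my ($bytes) = @_;
    my $text = eval { decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC()) };
    return undef unless defined $text;
    return [ map { ord($_) } split(//, $text) ];
}

for my $input ("\xe2\x80\x94", "\x96") {
    my $points = utf8_code_points($input);
    printf "%-10s %s\n", hex_dump($input),
        defined $points ? "valid utf-8: @$points" : "not valid utf-8";
}
```

If the bytes read from the form fail the strict decode, the input side is
already not utf-8, whatever header is sent back afterwards.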

For your convenience, here is the test program again.
You will note that I changed the $q->header print statement, but as said
before the outcome is still wrong.

Could you confirm that you indeed used this script unmodified and are still
receiving correct output?

As said, the important part is in lines 11 and 12.
You will need perl 5.8 in order to make those 2 lines work properly
(5.6 does not handle unicode correctly).

#!/perl/bin/perl.exe
use strict;
use CGI;
use CGI::Carp qw(fatalsToBrowser);
use CGI::Cookie;

my $q = CGI->new;
my $content = $q->param("utf8-test");
$content .= "verify with \x{2014}";
my @content = unpack('U*', $content);
$content =~ s/([\x{0800}-\x{FFFF}])/sprintf('+entity:%d+',ord($1))/ge;
$content =~ s/([\x{0080}-\x{07FF}])/sprintf('+entity: %d+',ord($1))/ge;
print $q->header("text/html; charset=utf-8");
print $q->p($content);
print $q->p('hex');
foreach (@content) {printf "%x ", $_}


>I have started writing the test for mp2 to verify utf8 input, hopefully I'll
>finish it soon.

Thanks a lot for your support...

Bart
