FW: porting from mod_perl1 to mod_perl2
Randy,

Did that (made sure to uninstall first, and made sure to replace the mod_perl.so as well). But no cure. I'm still getting the dreaded '8211=>entity: 150'. But it was worth a try.

Bart

PS: Oh Randy, and a big thanks of course for maintaining the ppms. It makes life so much easier for the rest of us (mere mortals who dislike compiling).

-----Original Message-----
From: Randy Kobes [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 10, 2003 7:00 AM
To: Bart Terryn
Cc: Stas Bekman; [EMAIL PROTECTED]
Subject: RE: porting from mod_perl1 to mod_perl2

On Tue, 9 Sep 2003, Bart Terryn wrote:

> Stas,
>
> Sorry to insist.
> But here I am again...
>
> Stas wrote:
> >Actually I haven't looked, I have tested with your code.
> Thanks a lot for going through the effort...
>
> >Before setting the header I wasn't getting the unicode
> >chars you put in the form back in the dump. After setting
> >the header it did print out exactly the same unicode
> >character.
>
> Well that is strange. I just changed my code and still am
> getting the endash back as code 150 and not as the 8211
> code (the way it went in).

If you're using ppm to install mod_perl, could you try the latest version at http://theoryx5.uwinnipeg.ca/ppms/? There were some changes made recently that may affect the above problem. Note that the version in the mod_perl.ppd hasn't changed, so you may have to uninstall mod_perl and then install it to force ppm to upgrade.

--
best regards,
randy kobes
RE: porting from mod_perl1 to mod_perl2
On Tue, 9 Sep 2003, Bart Terryn wrote:

> Stas,
>
> Sorry to insist.
> But here I am again...
>
> Stas wrote:
> >Actually I haven't looked, I have tested with your code.
> Thanks a lot for going through the effort...
>
> >Before setting the header I wasn't getting the unicode
> >chars you put in the form back in the dump. After setting
> >the header it did print out exactly the same unicode
> >character.
>
> Well that is strange. I just changed my code and still am
> getting the endash back as code 150 and not as the 8211
> code (the way it went in).

If you're using ppm to install mod_perl, could you try the latest version at http://theoryx5.uwinnipeg.ca/ppms/? There were some changes made recently that may affect the above problem. Note that the version in the mod_perl.ppd hasn't changed, so you may have to uninstall mod_perl and then install it to force ppm to upgrade.

--
best regards,
randy kobes
RE: porting from mod_perl1 to mod_perl2
Stas,

Sorry to insist.
But here I am again...

Stas wrote:
>Actually I haven't looked, I have tested with your code.

Thanks a lot for going through the effort...

>Before setting the header I wasn't getting the unicode chars you put in the
>form back in the dump. After setting the header it did print out exactly the
>same unicode character.

Well that is strange. I just changed my code and still am getting the endash back as code 150 and not as the 8211 code (the way it went in). Are you sure that you have the 2 lines in the test program that change the multibyte utf-8 encoded characters into their values (the famous lines 11 and 12)? Because if not, then I can understand that you have to put the changed header in, as you would be sending utf-8 encoded data to the client. It would also explain why you would 'see' the same character after putting the utf-8 header in.

>I didn't have a chance to mess with the hex representations yet.

That makes me wonder even more about the thing above.

[...]

>I think this is where the weak point is. You need to compare characters on the
>server side, not trying to rely on the browser, which as you have seen will
>render them improperly if you didn't set the right header.

Again, that is the purpose of the dreaded lines 11 and 12 of my test program. I don't want to render the character, I just want to display the actual (utf-8 encoded) code that I read back from the form.

>You have two things happening: read input, send output. The problem can be in
>any of the two and worse, it can be in both and the error can fix itself when
>doubled. You need to verify first that the input is read properly, then the
>same for the output.

Believe me, I also ran tests that write the data out to disk and then used a hex dump of that file to verify what is actually in there. I got the same results. But that got a bit tedious, hence my little test program that does more or less the same thing.
For your convenience here is the test program again. You will note that I changed the $q->header print statement, but as said before the outcome is still wrong. Could you confirm that you indeed used this script unmodified and are still receiving correct output? As said, the important part is in lines 11 and 12. You will need perl 5.8 in order to make those 2 lines work properly (5.6 does not understand unicode correctly).

#!/perl/bin/perl.exe
use strict;
use CGI;
use CGI::Carp qw(fatalsToBrowser);
use CGI::Cookie;

my $q = CGI->new;
my $content = $q->param("utf8-test");
$content .= "verify with \x{2014}";
my @content = unpack('U*', $content);
$content =~ s/([\x{0800}-\x{}])/sprintf('+entity:%d+',ord($1))/ge;
$content =~ s/([\x{0080}-\x{07FF}])/sprintf('+entity: %d+',ord($1))/ge;
print $q->header("text/html; charset=utf-8");
print $q->p($content);
print $q->p('hex');
foreach (@content) {printf "%x ", $_}

>I have started writing the test for mp2 to verify utf8 input, hopefully I'll
>finish it soon.

Thanks a lot for your support...

Bart
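A minimal sketch of the server-side check discussed above, kept independent of CGI.pm and of the browser. It assumes the raw form value is available as a string of octets; the helper name `mark_non_ascii` and the two sample values are hypothetical, and a single character class stands in for the two substitutions of the test program. Decoding explicitly with the Encode module (shipped with perl 5.8) shows which encoding the octets were really in.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: replace every character above 0x7F with a visible
# "+entity: N+" marker, so the result is plain ASCII and does not depend
# on how a browser renders it.
sub mark_non_ascii {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('+entity: %d+', ord($1))/ge;
    return $text;
}

# Hypothetical raw form values (octets) for the letter "a" plus an em dash.
my $as_utf8   = "a\xE2\x80\x94";   # em dash sent UTF-8 encoded
my $as_cp1252 = "a\x97";           # em dash sent as a Windows-1252 byte

print mark_non_ascii(decode('UTF-8',  $as_utf8)),   "\n";  # a+entity: 8212+
print mark_non_ascii(decode('cp1252', $as_cp1252)), "\n";  # a+entity: 8212+
print mark_non_ascii($as_cp1252),                   "\n";  # a+entity: 151+
```

The third line reproduces the symptom reported in this thread: an undecoded Windows-1252 byte is counted as character 151, not 8212.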
Re: porting from mod_perl1 to mod_perl2
Bart Terryn wrote:
> Stas and all of the others,
>
> Stas said:
>> I think I got your problem solved, you need to:
>> - print $q->header();
>> + print $q->header("text/html; charset=utf-8");
>
> Well actually you did not. Probably you looked a bit too fast.
> (forgivable in view of the number of mails you reply to :-)

Actually I haven't looked, I have tested with your code. Before setting the header I wasn't getting the unicode chars you put in the form back in the dump. After setting the header it did print out exactly the same unicode character. I didn't have a chance to mess with the hex representations yet.

[...]

> (Oh did I mention already that I have tested only against IE6, because the
> browser could be the cause as well of this odd(?) behaviour.)

I think this is where the weak point is. You need to compare characters on the server side, not rely on the browser, which as you have seen will render them improperly if you didn't set the right header.

You have two things happening: read input, send output. The problem can be in either of the two and, worse, it can be in both, and the error can fix itself when doubled. You need to verify first that the input is read properly, then the same for the output.

I have started writing the test for mp2 to verify utf8 input, hopefully I'll finish it soon.

__
Stas Bekman JAm_pH --> Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide ---> http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org http://ticketmaster.com
RE: porting from mod_perl1 to mod_perl2
Stas and all of the others,

Stas said:
>I think I got your problem solved, you need to:
>- print $q->header();
>+ print $q->header("text/html; charset=utf-8");

Well actually you did not. Probably you looked a bit too fast (forgivable in view of the number of mails you reply to :-)

The utf8-test.pl code reads what comes out of the form (which has a charset=utf-8 meta tag, so that is OK, see my previous mail). utf8-test.pl then replaces the characters higher than 7F with char. ref. entities, but with the string '+entity: ' in front of the value (see lines 11 and 12 of utf8-test.pl below). To double-check, the information read back from the form is also unpacked from unicode values into their hex counterparts. Both strings are then printed out as normal low ascii characters (<7f), so there is no need to set the utf-8 flag here.

From further testing I have seen that only unicode characters that actually have a representation in the win1252 character set come back under their corresponding win1252 position. So the form would for example contain an ndash character (unicode position dec 8211 or U+2013), but that is read back as character dec 150 or hex 96. And if the form contains a right single quotation mark (unicode position dec 8217 or U+2019), it comes back under its win1252 position of dec 146 or hex 92.

I would have expected that if I send something in under its unicode position, it would come back to me under its unicode position. But then again I may be wrong. And the utf8 flag in the header only means that it will be utf8 encoded and should not be confused with the character set used. I am under the impression that I am confusing myself more and more here. So if somebody has been on this path before and knows the truth, let him speak up!

(Oh did I mention already that I have tested only against IE6, because the browser could be the cause as well of this odd(?) behaviour.)

Thanks all for your patience. I would really like to get to the bottom of this.
Bart

Here is utf8-test.pl again, this time with line numbers:

1:#!/perl/bin/perl.exe
2:use strict;
3:use CGI;
4:use CGI::Carp qw(fatalsToBrowser);
5:
6:my $q = CGI->new;
7:my $content = $q->param("utf8-test");
8:$content .= "verify with \x{2014}";
9:my @content = unpack('U*', $content);
10:$content =~ s/([\x{0800}-\x{}])/sprintf('+entity:%d+',ord($1))/ge;
11:$content =~ s/([\x{0080}-\x{07FF}])/sprintf('+entity: %d+',ord($1))/ge;
12:print $q->header();
13:print $q->p($content);
14:print $q->p('hex');
15:foreach (@content) {printf "%x ", $_}

and here is the html form that triggers utf8-test.pl:

http://www.w3.org/1999/xhtml"; lang="en"> test: ë —

and here is the result this all produces:

test: +entity: 235+ +entity: 151+verify with +entity:8212+
hex
74 65 73 74 3a 20 eb 20 97 76 65 72 69 66 79 20 77 69 74 68 20 2014
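The byte values reported above (hex 92, 96 and 97) can be checked against the Windows-1252 table directly. The following is a small sketch using the Encode module; the helper name `cp1252_to_codepoint` is hypothetical and not part of utf8-test.pl.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the Unicode code point that a single
# Windows-1252 byte stands for.
sub cp1252_to_codepoint {
    my ($byte) = @_;
    return ord(decode('cp1252', chr($byte)));
}

# The three byte values mentioned in this thread.
for my $byte (0x92, 0x96, 0x97) {
    my $cp = cp1252_to_codepoint($byte);
    printf "0x%02X (%d) -> U+%04X (%d)\n", $byte, $byte, $cp, $cp;
}
# Prints:
# 0x92 (146) -> U+2019 (8217)
# 0x96 (150) -> U+2013 (8211)
# 0x97 (151) -> U+2014 (8212)
```

This matches the observation that only characters with a Windows-1252 slot come back changed: each reported value is the Windows-1252 position of the character that went in.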
Re: porting from mod_perl1 to mod_perl2
I think I got your problem solved, you need to:

- print $q->header();
+ print $q->header("text/html; charset=utf-8");
RE: porting from mod_perl1 to mod_perl2
I had version CGI 3.00 installed. Downgraded it to CGI 2.93, but I still have the same result.

The problem as I see it is that I have a form with character — in it. But it is returned as character — from the Windows-1252 character set. Does everybody agree that it should be returned as — (the utf-8 representation I mean)? See my previous mail for the test I used.

Bart

-----Original Message-----
From: Stas Bekman [mailto:[EMAIL PROTECTED]
Sent: Saturday, September 06, 2003 8:35 AM
To: Philip M. Gollucci
Cc: Perrin Harkins; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: porting from mod_perl1 to mod_perl2

Philip M. Gollucci wrote:
> If you check out the changes to CGI.pm on Lincoln Stein's web site, utf8
> was added via a patch by someone else
> 2.99 - 3.00 likely this is the cause.

Bart, can you try then with an earlier version? e.g. 2.93 was good for me. You can get it from here: http://www.cpan.org/authors/id/L/LD/LDS/

--
Reporting bugs: http://perl.apache.org/bugs/
Mail list info: http://perl.apache.org/maillist/modperl.html
RE: porting from mod_perl1 to mod_perl2
Stas wrote:
>Bart, can you test whether you have the same problem when I run the same code
>under mod_cgi in Apache2 (with perl5.8 of course)? If not, that will point the
>blaming finger towards mod_perl 2.0.

Well, I did that and guess what? mod_cgi fails as well. So it is not a mod_perl problem. But for me it is still uncertain who to blame (cgi.pm? apache2? perl5.8?).

I made a small test for this, just in case somebody wants to give it a try.

Here is my sample page:
-----
http://www.w3.org/1999/xhtml"; lang="en"> test: ë —
-----

Here is the utf8-test.pl:
-----
#!/perl/bin/perl.exe
use strict;
use CGI;
use CGI::Carp qw(fatalsToBrowser);

my $q = CGI->new;
my $content = $q->param("utf8-test");
$content .= "verify with \x{2014}";
my @content = unpack('U*', $content);
$content =~ s/([\x{0800}-\x{}])/sprintf('+entity:%d+',ord($1))/ge;
$content =~ s/([\x{0080}-\x{07FF}])/sprintf('+entity: %d+',ord($1))/ge;
print $q->header();
print $q->p($content);
print $q->p('hex');
foreach (@content) {printf "%x ", $_}
-----

and here is the output I get:
-----
test: +entity: 235+ +entity: 151+verify with +entity:8212+
hex
74 65 73 74 3a 20 eb 20 97 76 65 72 69 66 79 20 77 69 74 68 20 2014
-----

From which I understand that the original character — is returned as hex 97 or dec 151. That would be correct if the character set were Windows-1252, but that is not what I expected. I wanted utf-8 to be returned.

If mod_perl is not the correct forum for this (which I agree it isn't), can somebody point me in the direction of a correct forum? But as said before, the difficulty is that I don't know who to blame.

Kind Regards,
Bart
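For comparison with the hex dump above, here is a short sketch of what the same characters look like when they really are UTF-8 encoded. The helper name `utf8_hex` is hypothetical. The byte counts also illustrate why the test script splits its substitutions at 0x07FF: code points from 0x0080 to 0x07FF take two bytes in UTF-8, and code points from 0x0800 take three.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 bytes of one code point as a
# space-separated hex string.
sub utf8_hex {
    my ($codepoint) = @_;
    my $octets = encode('UTF-8', chr($codepoint));
    return join ' ', map { sprintf '%02x', $_ } unpack('C*', $octets);
}

# e with diaeresis, en dash, em dash
printf "U+%04X -> %s\n", $_, utf8_hex($_) for 0x00EB, 0x2013, 0x2014;
# Prints:
# U+00EB -> c3 ab
# U+2013 -> e2 80 93
# U+2014 -> e2 80 94
```

Had the form value arrived UTF-8 encoded, the raw request would carry the three bytes e2 80 94 for the em dash, not the single byte 97 seen in the dump.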
Re: porting from mod_perl1 to mod_perl2
Philip M. Gollucci wrote:
> If you check out the changes to CGI.pm on Lincoln Stein's web site, utf8
> was added via a patch by someone else
> 2.99 - 3.00 likely this is the cause.

Bart, can you try then with an earlier version? e.g. 2.93 was good for me. You can get it from here: http://www.cpan.org/authors/id/L/LD/LDS/
Re: porting from mod_perl1 to mod_perl2
If you check out the changes to CGI.pm on Lincoln Stein's web site, utf8 was added via a patch by someone else
2.99 - 3.00 likely this is the cause.

Stas Bekman wrote:
> Perrin Harkins wrote:
>>> I am fairly sure it is not perl5.8.
>>
>> I'm fairly sure it is. What is your locale set to? Are you on Red Hat?
>> See previous discussions of locale issues on Red Hat 8 and 9 in the list
>> archives.
>
> Bart is on win32, AS Perl 5.8. I doubt it's a locale issue, since it's the
> client who decides what encoding the data is in. It's either CGI.pm (guessing
> that's what he was using to parse the forms) or more low-level (io) issues.
>
> Bart, can you test whether you have the same problem when I run the same code
> under mod_cgi in Apache2 (with perl5.8 of course)? If not, that will point the
> blaming finger towards mod_perl 2.0.
>
> Would someone volunteer to add a new test? See
>
> t/modperl/print_utf8.t
> t/response/TestModperl/print_utf8.pm
>
> for an example of testing the responding with utf8 data. You can probably
> adapt one of these pairs for testing the posting of utf8 data:
>
> t/apache/cgihandler.t
> t/response/TestApache/cgihandler.pm
>
> t/modules/cgi.t
> t/response/TestModules/cgi.pm
>
> t/modules/cgiupload.t
> t/response/TestModules/cgiupload.pm
>
> Of course you will want to create a new pair of files for this test.
Re: porting from mod_perl1 to mod_perl2
On Fri, 2003-09-05 at 21:36, Stas Bekman wrote:
> Bart is on win32, AS Perl 5.8.

Oops, sorry Bart, I missed that. Even so, I'm suspicious that 5.8 and all of its unicode changes are involved somehow.

- Perrin
Re: porting from mod_perl1 to mod_perl2
Perrin Harkins wrote:
>> I am fairly sure it is not perl5.8.
>
> I'm fairly sure it is. What is your locale set to? Are you on Red Hat?
> See previous discussions of locale issues on Red Hat 8 and 9 in the list
> archives.

Bart is on win32, AS Perl 5.8. I doubt it's a locale issue, since it's the client who decides what encoding the data is in. It's either CGI.pm (guessing that's what he was using to parse the forms) or more low-level (io) issues.

Bart, can you test whether you have the same problem when I run the same code under mod_cgi in Apache2 (with perl5.8 of course)? If not, that will point the blaming finger towards mod_perl 2.0.

Would someone volunteer to add a new test? See

t/modperl/print_utf8.t
t/response/TestModperl/print_utf8.pm

for an example of testing the responding with utf8 data. You can probably adapt one of these pairs for testing the posting of utf8 data:

t/apache/cgihandler.t
t/response/TestApache/cgihandler.pm

t/modules/cgi.t
t/response/TestModules/cgi.pm

t/modules/cgiupload.t
t/response/TestModules/cgiupload.pm

Of course you will want to create a new pair of files for this test.
Re: porting from mod_perl1 to mod_perl2
Bart Terryn wrote:
> Hi,
>
> I have an application running under apache
> 1.37(win32)/mod_perl1.27_01-dev/perl5.6 build 633
>
> I am trying to move this application to apache
> 2.0.47(win32)/mod_perl1.99_10-dev/perl 5.8
>
> However I run into a problem with character encoding.
> Somewhere in this app I put up a form that contains text.
> The encoding of the html page that contains this form is set to 'utf-8' by
> the following:
>
> That form displays OK in both mod_perl1.0 and mod_perl2.0
> When I read the form back under the apache1, everything is OK.
> When I do the same using the apache 2 combination I run into trouble with the
> char ref entities which are high in the unicode set like: — or –.
> These characters are returned as unicode characters hex 97 and hex 96.

Returned from where? CGI.pm?

Does your 'perl -V:useperlio' report:

useperlio='define';

If so, can you give the latest mp2 cvs a try? However, I think it won't change anything, since the only change is that perlio is now used, so you can binmode it to 'utf8'.

I have just added tests for sending utf8 data, but we probably need to add tests for receiving utf8 data as well.
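A rough sketch of what the 'utf8' binmode mentioned above does on the output side, assuming a perlio-enabled perl 5.8 or later. It writes to an in-memory filehandle (itself a perlio feature) rather than to the mod_perl response stream, and the helper name `written_bytes` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text to an in-memory handle, optionally with
# a PerlIO layer applied via binmode, and return the bytes that were
# actually written as a space-separated hex string.
sub written_bytes {
    my ($text, $layer) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
    binmode($fh, $layer) if defined $layer;
    print {$fh} $text;
    close $fh;
    return join ' ', map { sprintf '%02x', $_ } unpack('C*', $buffer);
}

print written_bytes("\x{EB}", undef),   "\n";   # eb     (no layer: one raw byte)
print written_bytes("\x{EB}", ':utf8'), "\n";   # c3 ab  (UTF-8 encoded on output)
```

The layer only affects how characters are written out; it says nothing about how the form input was read, which is the half of the problem that still has to be verified separately.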
RE: porting from mod_perl1 to mod_perl2
On Fri, 2003-09-05 at 19:14, Bart Terryn wrote:
> PS: some might say that this has nothing to do with mod_perl

I would say that, but it's okay, you didn't know.

> I am fairly sure it is not perl5.8.

I'm fairly sure it is. What is your locale set to? Are you on Red Hat? See previous discussions of locale issues on Red Hat 8 and 9 in the list archives.

- Perrin
RE: porting from mod_perl1 to mod_perl2
Hi there,

On Sat, 6 Sep 2003, Bart Terryn wrote:

> Hi,
>
> I have an application running under apache
> 1.37(win32)/mod_perl1.27_01-dev/perl5.6 build 633
>
> I am trying to move this application to apache
> 2.0.47(win32)/mod_perl1.99_10-dev/perl 5.8
>
> However I run into a problem with character encoding.

Have you checked

perldoc perllocale

?

73,
Ged.
RE: porting from mod_perl1 to mod_perl2
Hi,

I have an application running under apache 1.37(win32)/mod_perl1.27_01-dev/perl5.6 build 633

I am trying to move this application to apache 2.0.47(win32)/mod_perl1.99_10-dev/perl 5.8

However I run into a problem with character encoding. Somewhere in this app I put up a form that contains text. The encoding of the html page that contains this form is set to 'utf-8' by the following:

That form displays OK in both mod_perl1.0 and mod_perl2.0. When I read the form back under apache1, everything is OK. When I do the same using the apache 2 combination, I run into trouble with the char ref entities which are high in the unicode set, like — or –. These characters are returned as unicode characters hex 97 and hex 96. Other character ref entities, like the one for e umlaut (ë), are returned correctly. So I assume that only characters above 07FF are returned wrong.

Anybody any idea?

Thanks in advance
Bart

PS: some might say that this has nothing to do with mod_perl. And maybe you are right, but I have no clue which part might be causing this. I am fairly sure it is not perl5.8, although in order to make the apache2/mod_perl2 combination work I had to upgrade CGI.pm to version 3.0.