FW: porting from mod_perl1 to mod_perl2

2003-09-10 Thread Bart Terryn
Randy,

Did that (made sure to uninstall first).
(made sure to replace the mod_perl.so as well)

But no cure.
I'm still getting the dreaded '8211=entity: 150'.

But it was worth a try.

Bart

PS: Oh Randy, and a big thanks of course for maintaining the ppms.
It makes life so much easier for the rest of us (mere mortals who dislike
compiling).


-----Original Message-----
From: Randy Kobes [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 10, 2003 7:00 AM
To: Bart Terryn
Cc: Stas Bekman; [EMAIL PROTECTED]
Subject: RE: porting from mod_perl1 to mod_perl2


On Tue, 9 Sep 2003, Bart Terryn wrote:

> Stas,
>
> Sorry to insist.
> But here I am again...
>
> Stas wrote:
> > Actually I haven't looked, I have tested with your code.
> Thanks a lot for going through the effort...
>
> > Before setting the header I wasn't getting the unicode
> > chars you put in the form back in the dump. After setting
> > the header it did print out exactly the same unicode
> > character.
>
> Well that is strange. I just changed my code and still am
> getting the en dash back as code 150 and not as the 8211
> code (the way it went in).

If you're using ppm to install mod_perl, could you try the
latest version at http://theoryx5.uwinnipeg.ca/ppms/? There
were some changes made recently that may affect the above
problem. Note that the version in the mod_perl.ppd hasn't
changed, so you may have to uninstall mod_perl and then
install it to force ppm to upgrade.

--
best regards,
randy kobes



Re: porting from mod_perl1 to mod_perl2

2003-09-09 Thread Stas Bekman
I think I got your problem solved, you need to:

- print $q->header();
+ print $q->header("text/html; charset=utf-8");
__
Stas Bekman, JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide --- http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


RE: porting from mod_perl1 to mod_perl2

2003-09-09 Thread Bart Terryn
Stas and all of the others,

Stas said:
> I think I got your problem solved, you need to:
>
> - print $q->header();
> + print $q->header("text/html; charset=utf-8");

Well actually you did not.
Probably you looked a bit too fast.
(forgivable in view of the numbers of mails you reply to:-)

The utf8-test.pl code reads what comes out of the form (which has a
charset=utf-8 meta tag, so that is OK, see my previous mail).
It then replaces every character higher than 7F with its char. ref value,
with the string '+entity: ' in front of the value (see lines 10 and 11 of
utf8-test.pl below).
And to double verify, the information read back from the form is also
unpacked from unicode values into their hex counterparts.
Both strings are then printed out as normal low ascii characters (below 7F),
so there is no need to set the utf-8 flag here.

From further testing I have seen that only unicode characters that actually
have a representation in the win1252 character set come back under their
corresponding win1252 position.
So the form would for example contain an en dash character (unicode position
dec 8211 or U+2013).
But that is read back as character dec 150 or hex 96.
And if the form contains a right single quotation mark (unicode position dec
8217 or U+2019), it comes back under its win1252 position of dec 146 or hex 92.
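
For anyone who wants to check the mapping described above, here is a minimal sketch using Perl's Encode module. The module is a tool chosen for illustration and the helper name cp1252_to_codepoint is hypothetical; the test script in this thread does not use either. The sketch treats each byte value mentioned above as a windows-1252 character and prints the Unicode code point it stands for.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: interpret one byte value as a windows-1252
# character and return its Unicode code point.
sub cp1252_to_codepoint {
    my ($byte) = @_;
    return ord(decode('cp1252', chr($byte)));
}

# Byte values observed in this thread.
for my $byte (0x92, 0x96, 0x97) {
    my $cp = cp1252_to_codepoint($byte);
    printf "cp1252 byte 0x%02X (dec %d) is U+%04X (dec %d)\n",
        $byte, $byte, $cp, $cp;
}
```

If the values read back from the form match the left-hand column, the data most likely arrived as windows-1252 rather than utf-8, wherever that conversion took place.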

I would have expected if I send something in under its unicode position, it
would come back to me under its unicode position.
But then again I may be wrong.
And the utf8 flag in the header only means that it will be utf8 encoded and
should not be confused with the characterset used.

I am under the impression that I am confusing myself more and more here.
So if somebody has been on this path before and knows the truth, let him
speak up!

(Oh did I mention already that I have tested only against IE6, because the
browser could be the cause as well of this odd(?) behaviour.)

Thanks all for your patience.
I would really like to get to the bottom of this.

Bart

Here is utf8-test.pl again, this time with line numbers:
1:#!/perl/bin/perl.exe
2:use strict;
3:use CGI;
4:use CGI::Carp qw(fatalsToBrowser);
5:
6:my $q = CGI->new;
7:my $content = $q->param("utf8-test");
8:$content .= "verify with \x{2014}";
9:my @content = unpack('U*', $content);
10:$content =~ s/([\x{0800}-\x{}])/sprintf('+entity:%d+',ord($1))/ge;
11:$content =~ s/([\x{0080}-\x{07FF}])/sprintf('+entity: %d+',ord($1))/ge;
12:print $q->header();
13:print $q->p($content);
14:print $q->p('hex');
15:foreach (@content) {printf "%x ", $_}

and here is the html form that triggers the utf8-test.pl:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
</head>

<body>
<form method="post" action="/mod_perl/utf8-test.pl"
enctype="multipart/form-data">
<textarea name ='utf8-test' cols='60'>test: &#235; &#8212;</textarea>
&nbsp;&nbsp;<input type="submit" value="publish new content"/></h4>
</form>
</body></html>

and here is the result this all produces:
test: +entity: 235+ +entity: 151+verify with +entity:8212+

hex

74 65 73 74 3a 20 eb 20 97 76 65 72 69 66 79 20 77 69 74 68 20 2014



Re: porting from mod_perl1 to mod_perl2

2003-09-09 Thread Stas Bekman
Bart Terryn wrote:
> Stas and all of the others,
>
> Stas said:
> > I think I got your problem solved, you need to:
> >
> > - print $q->header();
> > + print $q->header("text/html; charset=utf-8");
>
> Well actually you did not.
> Probably you looked a bit too fast.
> (forgivable in view of the numbers of mails you reply to:-)

Actually I haven't looked, I have tested with your code. Before setting the
header I wasn't getting the unicode chars you put in the form back in the
dump. After setting the header it did print out exactly the same unicode
character.

I didn't have a chance to mess with the hex representations yet.

[...]
> (Oh did I mention already that I have tested only against IE6, because the
> browser could be the cause as well of this odd(?) behaviour.)

I think this is where the weak point is. You need to compare characters on the
server side rather than rely on the browser, which as you have seen will
render them improperly if you didn't set the right header.

You have two things happening: read input, send output. The problem can be in
either of the two and, worse, it can be in both, and the error can fix itself
when doubled. You need to verify first that the input is read properly, then
the same for the output.
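
To illustrate how an error on both sides can hide itself, here is a small sketch using the Encode module. The choice of ISO-8859-1 as the wrong encoding is an assumption made for the example, not a claim about what happens in Bart's setup. The input bytes are decoded with the wrong encoding and then encoded again with the same wrong encoding, so the bytes that leave the server match the bytes that came in even though the string in between was wrong.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes for U+2014 (em dash), as a client posting UTF-8 would send them.
my $posted = "\xE2\x80\x94";

# Input error: the bytes are read as ISO-8859-1 instead of UTF-8.
my $misread = decode('iso-8859-1', $posted);

# Output error: the string is written as ISO-8859-1 instead of UTF-8.
my $written = encode('iso-8859-1', $misread);

# The two errors cancel, so the output bytes match the input bytes ...
print "output bytes match input: ", ($written eq $posted ? "yes" : "no"), "\n";

# ... although on the server the value was three characters, not one.
printf "characters after wrong decode:   %d\n", length($misread);
printf "characters after correct decode: %d\n", length(decode('UTF-8', $posted));
```

This is why checking the character count or the code points on the server, before anything is printed, says more than looking at what the browser renders.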

I have started writing the test for mp2 to verify utf8 input, hopefully I'll 
finish it soon.



RE: porting from mod_perl1 to mod_perl2

2003-09-09 Thread Bart Terryn
Stas,

Sorry to insist.
But here I am again...

Stas wrote:
> Actually I haven't looked, I have tested with your code.

Thanks a lot for going through the effort...

> Before setting the
> header I wasn't getting the unicode chars you put in the form back in the
> dump. After setting the header it did print out exactly the same unicode
> character.

Well that is strange. I just changed my code and still am getting the en dash
back as code 150 and not as the 8211 code (the way it went in).

Are you sure that you have the 2 lines in the test program that change the
multibyte utf-8 encoded characters into their values?
(the famous lines 11 and 12)

Because if not, then I can understand that you have to put the changed
header in as you would be sending utf-8 encoded data to the client.
And it would also explain why you would 'see' the same character after
putting the utf-8 header in.

> I didn't have a chance to mess with the hex representations yet.

That makes me wonder even more about the thing above.

[...]

> I think this is where the weak point is. You need to compare characters on
> the server side, not trying to rely on the browser, which as you have seen
> will render them improperly if you didn't set the right header.

Again that is the purpose of the dreaded lines 11 and 12 of my test program.
I don't want to render the character, I just want to display the actual
(utf-8 encoded) code that I read back from the form.

> You have two things happening: read input, send output. The problem can be
> in any of the two and worse, it can be in both and the error can fix itself
> when doubled. You need to verify first that the input is read properly, then
> the same for the output.

Believe me.
I also ran tests that write out the data to disk and then used a hex dump of
that file to actually verify what is in there. I got the same results. But
that got a bit tedious, hence my little test program that does more or less
the same thing.
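
For comparison with such a hex dump, here is a minimal sketch of what an em dash should look like in each encoding. It uses the Encode module, and the helper name hex_bytes is hypothetical; neither is part of the test program.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: space-separated hex dump of a byte string.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', $_ } unpack('C*', $bytes);
}

my $em_dash = "\x{2014}";

# What U+2014 looks like as raw bytes in each encoding.
print "utf-8:        ", hex_bytes(encode('UTF-8',  $em_dash)), "\n";
print "windows-1252: ", hex_bytes(encode('cp1252', $em_dash)), "\n";
```

A dump showing three bytes for the dash would point to utf-8 input, while a single byte would point to windows-1252.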

For your convenience here is the test program again.
You will note that I changed the $q->header print statement, but as said
before the outcome is still wrong.

Could you confirm that you indeed used this script unmodified and still are
receiving correct output?

As said the important part is in line 11 and 12.
You will need perl 5.8 in order to make those 2 lines work properly
(5.6 does not understand unicode correctly)

#!/perl/bin/perl.exe
use strict;
use CGI;
use CGI::Carp qw(fatalsToBrowser);
use CGI::Cookie;

my $q = CGI->new;
my $content = $q->param("utf8-test");
$content .= "verify with \x{2014}";
my @content = unpack('U*', $content);
$content =~ s/([\x{0800}-\x{}])/sprintf('+entity:%d+',ord($1))/ge;
$content =~ s/([\x{0080}-\x{07FF}])/sprintf('+entity: %d+',ord($1))/ge;
print $q->header("text/html; charset=utf-8");
print $q->p($content);
print $q->p('hex');
foreach (@content) {printf "%x ", $_}


> I have started writing the test for mp2 to verify utf8 input, hopefully
> I'll finish it soon.

Thanks a lot for your support...

Bart



RE: porting from mod_perl1 to mod_perl2

2003-09-09 Thread Randy Kobes
On Tue, 9 Sep 2003, Bart Terryn wrote:

> Stas,
>
> Sorry to insist.
> But here I am again...
>
> Stas wrote:
> > Actually I haven't looked, I have tested with your code.
> Thanks a lot for going through the effort...
>
> > Before setting the header I wasn't getting the unicode
> > chars you put in the form back in the dump. After setting
> > the header it did print out exactly the same unicode
> > character.
>
> Well that is strange. I just changed my code and still am
> getting the en dash back as code 150 and not as the 8211
> code (the way it went in).

If you're using ppm to install mod_perl, could you try the
latest version at http://theoryx5.uwinnipeg.ca/ppms/? There
were some changes made recently that may affect the above
problem. Note that the version in the mod_perl.ppd hasn't
changed, so you may have to uninstall mod_perl and then
install it to force ppm to upgrade.

-- 
best regards,
randy kobes


Re: porting from mod_perl1 to mod_perl2

2003-09-06 Thread Philip M. Gollucci
If you check out the changes to CGI.pm on Lincoln Stein's web site, utf8
was added via a patch by someone else between
2.99 and 3.00; likely this is the cause.

Stas Bekman wrote:

> Perrin Harkins wrote:
>
> > > I am fairly sure it is not perl5.8.
> >
> > I'm fairly sure it is.  What is your locale set to?  Are you on Red
> > Hat?  See previous discussions of locale issues on Red Hat 8 and 9 in
> > the list archives.
>
> Bart is on win32, AS Perl 5.8. I doubt it's a locale issue, since it's
> the client who decides what encoding the data is in. It's either
> CGI.pm (guessing that's what he was using to parse the forms) or more
> low-level (I/O) issues.
>
> Bart, can you test whether you have the same problem when you run the
> same code under mod_cgi in Apache2 (with perl5.8 of course)? If not,
> that will point the blaming finger towards mod_perl 2.0. Would someone
> volunteer to add a new test? See
>
> t/modperl/print_utf8.t
> t/response/TestModperl/print_utf8.pm
>
> for an example of testing a response with utf8 data. You can
> probably adapt one of these pairs for testing the posting of utf8 data:
>
> t/apache/cgihandler.t
> t/response/TestApache/cgihandler.pm
> t/modules/cgi.t
> t/response/TestModules/cgi.pm
> t/modules/cgiupload.t
> t/response/TestModules/cgiupload.pm
>
> Of course you will want to create a new pair of files for this test.







--
Reporting bugs: http://perl.apache.org/bugs/
Mail list info: http://perl.apache.org/maillist/modperl.html


Re: porting from mod_perl1 to mod_perl2

2003-09-06 Thread Stas Bekman
Philip M. Gollucci wrote:
> If you check out the changes to CGI.pm on Lincoln Stein's web site, utf8
> was added via a patch by someone else between
> 2.99 and 3.00; likely this is the cause.

Bart, can you try then with an earlier version? e.g. 2.93 was good for me. You
can get it from here: http://www.cpan.org/authors/id/L/LD/LDS/



RE: porting from mod_perl1 to mod_perl2

2003-09-06 Thread Bart Terryn
Stas wrote:

> Bart, can you test whether you have the same problem when you run the same
> code under mod_cgi in Apache2 (with perl5.8 of course)? If not, that will
> point the blaming finger towards mod_perl 2.0.

Well I did that and guess what? mod_cgi fails as well.
So it is not a mod_perl problem.
But for me it is still uncertain who to blame. (cgi.pm? apache2? perl5.8?)

I made a small test for this.
Just in case somebody wants to give it a try
Here is my sample page:
-
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
</head>

<body>
<form method="post" action="/mod_perl/utf8-test.pl"
enctype="multipart/form-data">
<textarea name ='utf8-test' cols='60'>test: &#235; &#8212;</textarea>
&nbsp;&nbsp;<input type="submit" value="publish new content"/></h4>
</form>
</body></html>
--
Here is the utf8-test.pl:
---
#!/perl/bin/perl.exe
use strict;
use CGI;
use CGI::Carp qw(fatalsToBrowser);

my $q = CGI->new;
my $content = $q->param("utf8-test");
$content .= "verify with \x{2014}";
my @content = unpack('U*', $content);
$content =~ s/([\x{0800}-\x{}])/sprintf('+entity:%d+',ord($1))/ge;
$content =~ s/([\x{0080}-\x{07FF}])/sprintf('+entity: %d+',ord($1))/ge;
print $q->header();
print $q->p($content);
print $q->p('hex');
foreach (@content) {printf "%x ", $_}
---
and here is the output I get:

test: +entity: 235+ +entity: 151+verify with +entity:8212+

hex

74 65 73 74 3a 20 eb 20 97 76 65 72 69 66 79 20 77 69 74 68 20 2014
--

From which I understand that the original character &#8212; is returned as
hex 97 or dec 151.
That would be correct if the character set were windows-1252, but that is
not what I expected.
I wanted utf-8 to be returned.

If mod_perl is not the correct forum for this (which I agree it isn't), can
somebody point me in the direction of a correct forum? But as said before,
the difficulty is that I don't know who to blame.

Kind Regards,

Bart






RE: porting from mod_perl1 to mod_perl2

2003-09-06 Thread Bart Terryn
I had version CGI 3.00 installed.
Downgraded it to CGI 2.93, but I still have the same result.

The problem as I see it is that I have a form with character &#8212; in it.
But it is returned as character #151 from the Windows-1252 character set.
Does everybody agree that it should be returned as &#8212; (the utf-8
representation I mean)?

See my previous mail for the test I used.

Bart

-----Original Message-----
From: Stas Bekman [mailto:[EMAIL PROTECTED]
Sent: Saturday, September 06, 2003 8:35 AM
To: Philip M. Gollucci
Cc: Perrin Harkins; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: porting from mod_perl1 to mod_perl2


Philip M. Gollucci wrote:
> If you check out the changes to CGI.pm on Lincoln Stein's web site, utf8
> was added via a patch by someone else between
> 2.99 and 3.00; likely this is the cause.

Bart, can you try then with an earlier version? e.g. 2.93 was good for me.
You can get it from here: http://www.cpan.org/authors/id/L/LD/LDS/




RE: porting from mod_perl1 to mod_perl2

2003-09-05 Thread Bart Terryn
Hi,

I have an application running under apache
1.37(win32)/mod_perl1.27_01-dev/perl5.6 build 633

I am trying to move this application to apache
2.0.47(win32)/mod_perl1.99_10-dev/perl 5.8

However I run into a problem with character encoding.
Somewhere in this app I put up a form that contains text.
The encoding of the html page that contains this form is set to 'utf-8' by
the following:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
That form displays OK in both mod_perl1.0 and mod_perl2.0.

When I read the form back under the apache1, everything is OK.
When I do the same using the apache 2 combination I run into trouble with
the char ref entities which are high in the unicode set, like
&#8212; or &#8211;. These characters are returned as unicode characters hex
97 and hex 96.

Other character ref entities like the one for ë (e umlaut = &#235;) are
returned correctly.

So I assume that only characters above 07FF are returned wrong.
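
One way to probe that assumption, sketched here with Perl's Encode module (the helper name cp1252_byte is hypothetical, and windows-1252 is brought in only as a candidate explanation for the values hex 97 and hex 96): compare each code point with the byte that windows-1252 uses for the same character. Where the two numbers coincide, the character would look correct whichever encoding was actually in play.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the windows-1252 byte value for a Unicode code point.
sub cp1252_byte {
    my ($codepoint) = @_;
    return ord(encode('cp1252', chr($codepoint)));
}

# e umlaut, en dash, em dash: the characters from the form.
for my $cp (0xEB, 0x2013, 0x2014) {
    my $byte = cp1252_byte($cp);
    printf "U+%04X -> cp1252 byte 0x%02X (%s)\n",
        $cp, $byte, $byte == $cp ? 'same number' : 'different number';
}
```

If this reading holds, the dividing line is not a code point threshold but whether the windows-1252 position happens to equal the unicode position.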

Anybody any idea?

Thanks in advance

Bart

PS: some might say that this has nothing to do with mod_perl.
And maybe you are right, but I have no clue which part might be causing
this.
I am fairly sure it is not perl5.8.
Although in order to make the apache2/mod_perl2 combination work I had to
upgrade the CGI.pm to version 3.0.






RE: porting from mod_perl1 to mod_perl2

2003-09-05 Thread Ged Haywood
Hi there,

On Sat, 6 Sep 2003, Bart Terryn wrote:

> Hi,
>
> I have an application running under apache
> 1.37(win32)/mod_perl1.27_01-dev/perl5.6 build 633
>
> I am trying to move this application to apache
> 2.0.47(win32)/mod_perl1.99_10-dev/perl 5.8
>
> However I run into a problem with character encoding.

Have you checked

perldoc perllocale

?

73,
Ged.
 






RE: porting from mod_perl1 to mod_perl2

2003-09-05 Thread Perrin Harkins
On Fri, 2003-09-05 at 19:14, Bart Terryn wrote:
> PS: some might say that this has nothing to do with mod_perl

I would say that, but it's okay, you didn't know.

> I am fairly sure it is not perl5.8.

I'm fairly sure it is.  What is your locale set to?  Are you on Red
Hat?  See previous discussions of locale issues on Red Hat 8 and 9 in
the list archives.

- Perrin






Re: porting from mod_perl1 to mod_perl2

2003-09-05 Thread Stas Bekman
Perrin Harkins wrote:

> > I am fairly sure it is not perl5.8.
>
> I'm fairly sure it is.  What is your locale set to?  Are you on Red
> Hat?  See previous discussions of locale issues on Red Hat 8 and 9 in
> the list archives.

Bart is on win32, AS Perl 5.8. I doubt it's a locale issue, since it's the
client who decides what encoding the data is in. It's either CGI.pm (guessing
that's what he was using to parse the forms) or more low-level (I/O) issues.

Bart, can you test whether you have the same problem when you run the same code
under mod_cgi in Apache2 (with perl5.8 of course)? If not, that will point the
blaming finger towards mod_perl 2.0. Would someone volunteer to add a new test? See

t/modperl/print_utf8.t
t/response/TestModperl/print_utf8.pm

for an example of testing a response with utf8 data. You can probably
adapt one of these pairs for testing the posting of utf8 data:

t/apache/cgihandler.t
t/response/TestApache/cgihandler.pm
t/modules/cgi.t
t/response/TestModules/cgi.pm
t/modules/cgiupload.t
t/response/TestModules/cgiupload.pm

Of course you will want to create a new pair of files for this test.



Re: porting from mod_perl1 to mod_perl2

2003-09-05 Thread Perrin Harkins
On Fri, 2003-09-05 at 21:36, Stas Bekman wrote:
> Bart is on win32, AS Perl 5.8.

Oops, sorry Bart, I missed that.  Even so, I'm suspicious that 5.8 and
all of its unicode changes are involved somehow.

- Perrin


