Stas Bekman wrote:

Steve Hay wrote:


It's only Perl 5.8 that has the special "UTF-8 flag" which the functions above all operate with respect to. If a Perl variable contains a sequence of bytes that make up a valid UTF-8 character, but the string is not flagged with Perl's special flag, then Perl's built-in print() doesn't do this automatic conversion anyway.


Yes.

Apps wanting to handle utf will need to 'require 5.008;' as in your example.

IOW,

   print "Content-type: text/plain\n\n";
   $a = "\xC3\xBC";
   print $a;

retrieved from a mod_cgi server produces (via od -b / od -c):

   0000000 303 274
   0000002


yup, because you need to add utf8::decode($a); before printing $a. Which your version does as well.

(Indeed. I meant it as example of how Perl's (5.8's) print() doesn't do the conversion on strings that are not *flagged* as UTF-8, even when they make valid UTF-8.)




Perl 5.6 and older don't have the UTF-8 flag and hence don't do any automatic conversion via print(). Therefore, mod_perl's print() should not have the difference from Perl's print() that exists in 5.8, so no change should be required.

Sure enough, looking at the "doio.c" source file in Perl 5.6.1, the entire chunk of code that I half-inched above is not present.


So you suggest that we copy this functionality from Perl. So if need to #ifdef it for 5.8.0.

So I'll add


#if PERL_VERSION >= 8
...
#endif

around the code that I've added.



> I have attempted to shoe-horn this into mod_perl's print() method (in
> "src/modules/perl/Apache.xs"). Here's the diff against mod_perl 1.28:
> [Unfortunately, I've had to comment-out the first part of that "if"
> block, because I got an unresolved external symbol error relating to the
> PerlIO_isutf8() function otherwise (which may be because that function
> isn't documented in the perlapio manpage).]
>
> --- Apache.xs.orig 2003-06-06 12:31:10.000000000 +0100
> +++ Apache.xs 2003-07-15 12:20:42.000000000 +0100
> @@ -1119,12 +1119,25 @@
> SV *sv = sv_newmortal();
> SV *rp = ST(0);
> SV *sendh = perl_get_sv("Apache::__SendHeader", TRUE);
> + /*PerlIO *fp = PerlIO_stdout();*/
>
> if(items > 2)
> do_join(sv, &sv_no, MARK+1, SP); /* $sv = join '', @_[1..$#_] */
> else
> sv_setsv(sv, ST(1));
>
> + /*if (PerlIO_isutf8(fp)) {
> + if (!SvUTF8(sv))
> + sv_utf8_upgrade(sv = sv_mortalcopy(sv));
> + }
> + else*/ if (DO_UTF8(sv)) {
> + if (!sv_utf8_downgrade((sv = sv_mortalcopy(sv)), TRUE)
> + && ckWARN_d(WARN_UTF8))
> + {
> + Perl_warner(aTHX_ packWARN(WARN_UTF8), "Wide character in print");
> + }
> + }
> +
> PUSHMARK(sp);
> XPUSHs(rp);
> XPUSHs(sv);
>
> Besides the problem with PerlIO_isutf8(),


mod_perl 1.x doesn't use perlio, hence you have this problem. adding:

#include "perlio.h"

should resolve it I think.

No. The error was "unresolved external symbol", which means that the compiler is happy (it evidently has pulled in perlio.h, or something else that declares PerlIO_isutf8() as "extern ..."), but that the linker couldn't find the definition of that function.


(Check: If I change "PerlIO_isutf8" to "PerlIO_isutf" (deliberate typo) then I get a different error - "undefined; assuming extern returning int" - because now no declaration has been supplied.)

Listing the symbols exported from perl58.lib shows that PerlIO_isutf8 is *not* one of them. So where's the definition supposed to come from?

I'll ask about this on the perlxs mailing list, I think.



> there are other problems that
> spring to my mind straight away with this:
> - is getting the PerlIO * for STDOUT to right thing to be doing anyway?

PerlIO *fp = IoOFP(GvIOp(defoutgv))

Seems to work OK for me. What's defoutgv?




> - if "items > 2", do we need to handle the UTF8-ness of each of those
> items individually before we join them?

I'm not sure, how perl handles this?

Struggling as best as I can to read pp_print() in Perl's "pp_hot.c", it looks like Perl calls do_print() (which contains the UTF-8 handling that I've stolen) for each item in the list that is passed to it.


Considering this more, I think that it probably isn't an issue: if you have two variables in Perl, one of which is flagged UTF-8 and the other of which isn't, then when you concatenate them, the whole is "upgraded" to flagged UTF-8 anyway.

However, it has occurred to me that I've missed out adding the UTF-8 handling to half of mod_perl's print() method!

The method is split into two halves:

   if (!mod_perl_sent_header(r, 0)) {
   ...
   } else {
   ...
   }

and I've only handled the first half!

The first half joins all of the items together and then calls send_cgi_header(). That outputs everything down to the first blank line (i.e. all the headers), then sets the "sent headers" flag and recurses on $r->print(). Next time around, we'll enter the second half, which simply calls write_client().

If we've already been through the first half then the UTF-8 conversion will have been applied already, but if we come straight into the second half (i.e. by printing the headers and the body separately) then the UTF-8 conversion will not yet have been applied. So as my patch stands,

   use utf8;
   $a = "\xC3\xBC";
   utf8::decode($a);
   print "Content-type: text/plain\n\n", $a;

will have the UTF-8 data in $a handled, but

   use utf8;
   $a = "\xC3\xBC";
   utf8::decode($a);
   print "Content-type: text/plain\n\n";
   print $a;

will not!

The write_client() method appears to call rwrite() (Apache's ap_rwrite()?) for each item in the list that is passed to it, so I suppose I should also add the UTF-8 handling code to each of those items too. (This means that if the headers and body *are* printed together then the body will be UTF-8-handled twice -- once in the first half of print(), and then again in write_client(). However, that's "safe": the handling just ensures that the data is in the appropriate format. It knows not to do anything if it is already in the correct format.)

I've attached a patch that incorporates these changes (with the PerlIO_isutf8() stuff still commented out until I figure out what to do about it).

Steve
--- Apache.xs.orig      2003-06-06 12:31:10.000000000 +0100
+++ Apache.xs   2003-07-16 10:48:35.000000000 +0100
@@ -1119,11 +1119,27 @@
        SV *sv = sv_newmortal();
        SV *rp = ST(0);
        SV *sendh = perl_get_sv("Apache::__SendHeader", TRUE);
+#if PERL_VERSION >= 8
+       /*PerlIO *fp = IoOFP(GvIOp(defoutgv));*/
+#endif
 
        if(items > 2)
            do_join(sv, &sv_no, MARK+1, SP); /* $sv = join '', @_[1..$#_] */
         else
            sv_setsv(sv, ST(1));
+#if PERL_VERSION >= 8
+       /*if (PerlIO_isutf(fp)) {
+           if (!SvUTF8(sv))
+               sv_utf8_upgrade(sv = sv_mortalcopy(sv));
+       }
+       else*/ if (DO_UTF8(sv)) {
+           if (!sv_utf8_downgrade((sv = sv_mortalcopy(sv)), TRUE)
+               && ckWARN_d(WARN_UTF8))
+           {
+               Perl_warner(aTHX_ packWARN(WARN_UTF8), "Wide character in print");
+           }
+       }
+#endif
 
        PUSHMARK(sp);
        XPUSHs(rp);
@@ -1168,6 +1184,9 @@
 
     CODE:
     RETVAL = 0;
+#if PERL_VERSION >= 8
+    /*PerlIO *fp = IoOFP(GvIOp(defoutgv));*/
+#endif
 
     if (r->connection->aborted)
         XSRETURN_IV(0);
@@ -1176,6 +1195,19 @@
        int sent = 0;
         SV *sv = SvROK(ST(i)) && (SvTYPE(SvRV(ST(i))) == SVt_PV) ?
                  (SV*)SvRV(ST(i)) : ST(i);
+#if PERL_VERSION >= 8
+       /*if (PerlIO_isutf(fp)) {
+           if (!SvUTF8(sv))
+               sv_utf8_upgrade(sv = sv_mortalcopy(sv));
+       }
+       else*/ if (DO_UTF8(sv)) {
+           if (!sv_utf8_downgrade((sv = sv_mortalcopy(sv)), TRUE)
+               && ckWARN_d(WARN_UTF8))
+           {
+               Perl_warner(aTHX_ packWARN(WARN_UTF8), "Wide character in print");
+           }
+       }
+#endif
        buffer = SvPV(sv, len);
 #ifdef APACHE_SSL
         while(len > 0) {

Reply via email to