Re: mod_perl and utf8 and CGI->param

Torsten Förtsch Thu, 04 Sep 2014 01:22:00 -0700

On 03/09/14 21:38, Randal L. Schwartz wrote:
> What I need to know is what is mod_perl doing differently?  Does it not
> respect binmode STDIN, ":utf8"?  Apparently not.  So if you know of a
> way to get mod_perl to "fix" reading from the browser properly, I'm
> interested in that.


Something along these lines:

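use Apache2::RequestIO ();
use Encode ();
BEGIN {
    no warnings 'redefine';    # we replace the method on purpose
    my $orig=\&Apache2::RequestRec::read;
    *Apache2::RequestRec::read=sub {
        my ($r, undef, $len, $offset)=@_;
        $offset //= 0;
        my $_buf;
        my $rc=$r->$orig($_buf, $len);
        # $_[1] aliases the caller's buffer; writing to a copy taken
        # from @_ would never reach the caller.
        $_[1]='' unless defined $_[1];
        substr($_[1], $offset)=Encode::decode_utf8($_buf // '');
        return $rc;
    };
}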

It's a bit more complicated than that because $_buf may end in the
middle of a character. But you can catch that and read a few more bytes.
Also, not sure if you expect the return value to be in octets or characters.
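
A minimal sketch of the "catch that" part, independent of mod_perl. It
relies on the documented Encode behaviour that with Encode::FB_QUIET
decode() returns what it could decode and leaves the unprocessed bytes
in the source buffer. make_decoder is a hypothetical helper name, not
part of any module:

```perl
use strict;
use warnings;
use Encode ();

# Returns a closure that decodes UTF-8 chunk by chunk. Bytes of a
# character that is split across two chunks are kept in $pending and
# completed by the next chunk.
sub make_decoder {
    my $pending = '';
    return sub {
        my ($chunk) = @_;
        $pending .= $chunk;
        # FB_QUIET: stop at the first sequence that cannot be decoded
        # and leave the unprocessed bytes in $pending.
        # Caveat: genuinely invalid input stays in $pending forever;
        # a real implementation has to detect and handle that case.
        return Encode::decode('UTF-8', $pending, Encode::FB_QUIET);
    };
}
```

Each call returns only complete characters, so the wrapper around
$r->read could feed it whatever the underlying read delivered.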

Though, I wouldn't go this way. I'd rather try to force CGI.pm to read
from STDIN and use the perl-script handler
(http://perl.apache.org/docs/2.0/user/config/config.html#C_perl_script_). This
pushes a PerlIO layer onto STDIN so that you can read from STDIN. On top
of that you can then push :utf8.
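
To illustrate what a decoding layer on top of a handle does to read(),
here is a small sketch. It uses an in-memory filehandle as a stand-in
for STDIN so that it runs outside mod_perl; that substitution and the
helper name read_chars are assumptions for the example only:

```perl
use strict;
use warnings;

# Read up to $n characters (not octets) from a byte string through a
# UTF-8 decoding layer, the way read(STDIN, ...) would behave after
# binmode(STDIN, ...) has pushed such a layer.
sub read_chars {
    my ($bytes, $n) = @_;
    open my $fh, '<', \$bytes or die "open: $!";
    binmode($fh, ':encoding(UTF-8)') or die "binmode: $!";
    my $buf = '';
    read($fh, $buf, $n);
    close $fh;
    return $buf;
}
```

With the layer in place the length argument of read() counts
characters, and a multi-byte character is never split by the caller.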

The other way I'd prefer over the hack above is to patch CGI.pm to
convert the data after it has read it. You can even do that in your
application. Many applications I have seen have a separate step to
sanitize the input. That would be the place to do that. However, then
you have to watch out for upload fields.
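
A rough sketch of such a conversion step in the application. It works
on a plain hash of field name => list of raw values, which is an
assumption made to keep the example self-contained; decode_fields is a
hypothetical name. Values that are references (as upload filehandles
are) get passed through untouched:

```perl
use strict;
use warnings;
use Encode ();

# Takes a hashref (name => arrayref of raw octet values) and returns a
# new hashref with names and plain values decoded from UTF-8.
sub decode_fields {
    my ($fields) = @_;
    my %out;
    for my $name (keys %$fields) {
        my $key = Encode::decode('UTF-8', $name);
        $out{$key} = [
            # Leave references (e.g. upload handles) alone; decoding
            # them would corrupt binary data.
            map { ref($_) ? $_ : Encode::decode('UTF-8', $_) }
                @{ $fields->{$name} }
        ];
    }
    return \%out;
}
```

How the raw values are collected from CGI.pm is left out here on
purpose; only the decode-and-skip-uploads step is shown.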

So, there is no really simple solution. And I don't think this will be
"fixed" in mod_perl because $r has no such concept as an IO layer. The
closest thing httpd/mod_perl has to offer is an input filter. But that
won't help you here because brigades are handled mainly by httpd which
knows only about octets. You don't want to change the data itself. You
want to change the data's metadata.

Torsten
