Re: mod_perl and utf8 and CGI->param

André Warnier Mon, 08 Sep 2014 12:17:50 -0700

Michael Schout wrote:

On 9/2/14, 4:19 PM, Randal L. Schwartz wrote:

  ## ensure utf8 CGI params:
  $CGI::PARAM_UTF8 = 1;


Sorry to chime in late on this, but part of the problem with CGI.pm and
UTF-8 is that $CGI::PARAM_UTF8 gets clobbered by a cleanup handler that
CGI.pm itself registers if it's running under mod_perl.

This caused major headaches for me at one time until I figured this out.

You have to make sure to set $CGI::PARAM_UTF8 early, and FOR EVERY
REQUEST, because if you just set it globally (e.g. in a startup Perl
script), it only works for the first request.
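For illustration, here is a minimal sketch of that advice. It assumes CGI.pm's object interface and a script run under ModPerl::Registry, which wraps the script body in a subroutine that runs on every request. The `handle_request` sub and the `name` parameter are hypothetical; the only point is *where* the flag is set.

```perl
use strict;
use warnings;
use CGI ();

# Hypothetical per-request entry point. Under ModPerl::Registry the
# script body itself plays this role and runs once per request.
sub handle_request {
    # Set the flag on EVERY request, before any parameter is read:
    # under mod_perl, CGI.pm registers a cleanup handler that resets
    # its globals (including $CGI::PARAM_UTF8) when each request ends.
    $CGI::PARAM_UTF8 = 1;

    my $q = CGI->new;
    # With the flag set, param() returns decoded character strings.
    return scalar $q->param('name');
}
```

Under ModPerl::Registry, a plain top-level `$CGI::PARAM_UTF8 = 1;` statement in the script body (not inside a `BEGIN` block or a `use` line) likewise runs on every request.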


Hi.
Just an addendum to the discussion:

There are really two distinct approaches to this issue, and they work at
different levels:

1) is to "fix" CGI.pm so that it delivers the parameters in the way you
expect.

As shown by the previous valuable and technical contributions, this generally works, but it requires a certain level of expertise, and it does not necessarily work across all versions of mod_perl and CGI.pm.

2) is to take whatever CGI.pm actually delivers to the calling script or module, and use a couple of tricks and some additional code in that script or module to ensure that, whatever CGI.pm delivers under whatever mod_perl version, the receiving script or module always knows in the end what it is dealing with.
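A minimal sketch of approach 2 follows. It is not necessarily the method presented earlier in the discussion; it simply normalizes each value to a Perl character string, whatever CGI.pm delivered. `normalize_param` is a hypothetical helper, and the ISO-8859-1 fallback is an assumption about what non-UTF-8 clients send.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# normalize_param() is a hypothetical helper, not part of CGI.pm.
# It returns a Perl character string, whatever CGI.pm delivered.
sub normalize_param {
    my ($value) = @_;
    return undef unless defined $value;

    # utf8 flag already on: assume CGI.pm (e.g. with $CGI::PARAM_UTF8)
    # has already decoded the value, and leave it alone.
    return $value if utf8::is_utf8($value);

    # Raw bytes: try strict UTF-8 first. FB_CROAK makes decode() die on
    # malformed input; LEAVE_SRC keeps $value intact for the fallback.
    my $chars = eval { decode('UTF-8', $value, FB_CROAK | LEAVE_SRC) };
    return $chars if defined $chars;

    # Not valid UTF-8: fall back to ISO-8859-1 (an assumption; use
    # whatever legacy encoding your clients actually send).
    return decode('ISO-8859-1', $value);
}

# Usage: my $name = normalize_param(scalar $q->param('name'));
```

Both a UTF-8 byte string and a Latin-1 byte string for "café" come out as the same four-character string. The sketch trusts any value whose utf8 flag is already on, so it does not detect double-encoded input.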

That is the method which I presented early in the discussion.

As stated in that contribution, it is not necessarily the most elegant or efficient way to deal with the issue, but it has the advantage of always working, no matter which versions of CGI.pm and/or mod_perl are in use.

The real crux of the matter, in my view, is this: as things stand today in terms of protocols and RFCs, there is no reliable way for CGI.pm (or any comparable framework) to be *sure* of the encoding of the data sent by a browser or another HTTP client agent. Even the RFCs do not really provide a way to enforce this. (*)

So if you are sure of what the client is sending, and the matter consists of *forcing* CGI.pm to always deliver POST (or GET) data to the calling script/module as UTF-8 encoded and utf8-marked (or the opposite), then method 1 will work, and it is more elegant and probably more efficient than method 2.

But if the matter consists of ensuring that the receiving code in the script/module which handles the data submitted by the HTTP client is resilient and "does the right thing" whatever the submitted data really was, then in my opinion method 2 is better.

(But that's only my opinion of the moment, and I stand ready to be corrected).

(*) And if you believe this not to be true, please send me some references about it, because I am really interested. It might save me some code in all my web-facing applications.
