Michael Schout wrote:
On 9/2/14, 4:19 PM, Randal L. Schwartz wrote:

  ## ensure utf8 CGI params:
  $CGI::PARAM_UTF8 = 1;

Sorry to chime in late on this, but part of the problem with CGI.pm and
UTF-8 is that PARAM_UTF8 gets clobbered by a cleanup handler that CGI.pm
itself registers if it's running under mod_perl.

This caused major headaches for me at one time until I figured this out.

You have to make sure to set $CGI::PARAM_UTF8 early, and FOR EVERY
REQUEST, because if you just set it globally (e.g. in a startup Perl
script), then it only works for the first request.
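
A minimal sketch of the per-request approach described above. The run_request wrapper is a hypothetical name, not part of CGI.pm or mod_perl; the only thing taken from CGI.pm is the $CGI::PARAM_UTF8 variable itself.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not a CGI.pm or mod_perl API): set the flag at the
# start of every request instead of once in a startup script, because the
# cleanup handler resets it after each request.
sub run_request {
    my ($handle_request) = @_;
    no warnings 'once';          # CGI.pm may not be loaded yet at this point
    $CGI::PARAM_UTF8 = 1;        # must happen before CGI->new is called
    return $handle_request->();
}
```

A handler would then build its query object inside the callback, for example run_request(sub { my $q = CGI->new; return $q->param('name') }).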


Hi.
Just an addendum to the discussion:

There are really two distinct approaches to this issue, and they work at
different levels:

1) "Fix" CGI.pm so that it delivers the parameters the way you expect.
As shown by the previous valuable and technical contributions, this generally works, but it requires a certain level of expertise, and it is not necessarily backward compatible with all versions of mod_perl and CGI.pm.

2) Take whatever CGI.pm does deliver to the calling script or module, and use a couple of tricks and some additional code in that script or module to ensure that, whatever CGI.pm delivers under whatever mod_perl version, the receiving code always knows in the end what it is dealing with.
That is the method which I presented early in the discussion.
As stated in that contribution, it is not necessarily the most elegant or efficient way to deal with the issue, but it has the advantage of always working, no matter which versions of CGI.pm and mod_perl are in use.
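
To make method 2 concrete, here is a rough sketch of the general idea. It is not the code from the earlier post: to_characters is a hypothetical helper, and the fallback to ISO-8859-1 for anything that is not valid UTF-8 is an assumption. It is a heuristic, since a byte sequence can be valid in both encodings.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of CGI.pm): return a character string,
# whatever form the parameter value arrived in.
sub to_characters {
    my ($value) = @_;
    return $value unless defined $value;

    # Already decoded and flagged, e.g. because PARAM_UTF8 was in effect.
    return $value if utf8::is_utf8($value);

    # Otherwise treat it as raw bytes and try strict UTF-8 first.
    # decode() with a CHECK argument may modify its input, so use a copy.
    my $copy    = $value;
    my $decoded = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return $decoded if defined $decoded;

    # Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    return decode('ISO-8859-1', $value);
}
```

The receiving script would pass every value obtained from CGI.pm through such a helper, so that the rest of the code only ever sees character strings.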

The real crux of the matter is this, in my view: as things stand today in terms of protocol and RFCs, there is no real way for CGI.pm (or any comparable framework) to be *sure* of the encoding of the data sent by a browser or another HTTP client agent. Even the RFCs do not really provide a way by which this can be enforced. (*)

So if you are sure of what the client is sending, and the matter consists of *forcing* CGI.pm to always communicate POST (or GET) data as UTF-8 encoded and utf8-marked (or the opposite) to the calling script/module, then method 1 will work, and it is more elegant and probably more efficient than method 2.

But if the matter consists of ensuring that the receiving code in the script/module which handles the data submitted by the HTTP client is resilient and "does the right thing" whatever the submitted data really was, then in my opinion method 2 is better.
(But that's only my opinion of the moment, and I stand ready to be corrected).

(*) And if you believe this not to be true, please send me some references, because I am really interested. It might save me some code in all my web-facing applications.
