On 6/5/2012 3:02 AM, Arnon Weinberg wrote:

How can I set the output character encoding of Apache::ASP output?

There are several places where you set this, not just one, and they all have to agree to guarantee correct output:

        DB -> back end -> Apache -> HTML -> Apache::ASP -> browser

If they do not all agree, you can get either mixed encodings or encoding ping-ponging.

Ping-ponging is less common these days, now that the world is settling on UTF-8. Back in the Perl 5.6/Apache 1.3/pre-Firefox days, I once chased data through a system that stored it in the DB as Latin-1. The back-end daemon translated it to UTF-8 and sent it on to Apache and mod_perl, one of which smashed it back to Latin-1 (I never did nail down which). The browser then interpreted the result as UTF-8, because Apache was configured to declare that by default!
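To make "mixed encodings" and "ping-ponging" concrete, here is a small sketch of what mismatched encode/decode steps do to a non-ASCII string at the byte level. It is written in Python purely for illustration; nothing here is Apache::ASP or Perl API. The function name show_mismatches and the sample string are hypothetical.

```python
def show_mismatches(text: str) -> dict:
    """Push `text` through mismatched encode/decode steps (illustration only)."""
    utf8 = text.encode("utf-8")
    latin1 = text.encode("latin-1")  # raises for characters outside Latin-1
    return {
        # UTF-8 bytes interpreted as Latin-1 by the next link: mojibake
        "utf8_read_as_latin1": utf8.decode("latin-1"),
        # ...which the link after that re-encodes as UTF-8: double encoding
        "double_encoded": utf8.decode("latin-1").encode("utf-8"),
        # Latin-1 bytes interpreted as UTF-8: invalid sequences become U+FFFD
        "latin1_read_as_utf8": latin1.decode("utf-8", errors="replace"),
    }


if __name__ == "__main__":
    for name, value in show_mismatches("café").items():
        print(f"{name}: {value!r}")
```

Each extra hop through a mismatched link makes the damage worse, which is why every link in the chain has to be checked.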

So, you have to check all the links in that chain:

- Your DB and any back-end daemon are up to you, since they're out of scope for this list.

- Apache has directives such as "AddDefaultCharset", which adds a default charset to the Content-Type header of text/plain and text/html responses that don't already specify one.

- For the Perl aspects, I recommend just reading the Perl manual chapter on it: perldoc perlunicode. Perl's Unicode support is deep, broad, and continually evolving[*]. You really must read your particular version's docs to know exactly how it's going to behave. There have been several breaking changes over the past decade or so.

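- There are at least three ways to declare the character encoding in your HTML, such as <meta charset="utf-8"> or the older <meta http-equiv="Content-Type" content="text/html; charset=utf-8">. RTFEE: https://en.wikipedia.org/wiki/Character_encodings_in_HTML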

- And finally, it's possible to set a browser to ignore whatever it's told by the HTTP server and the document, and force it to interpret the data using some other character set.
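One way to keep the HTTP header, the HTML declaration, and the actual bytes in agreement is to derive all three from a single setting. The sketch below shows that idea in Python, again only as an illustration. build_response, declared_charsets, and CHARSET are hypothetical names that are not part of Apache::ASP, and the regexes only recognize the simple <meta charset="..."> form.

```python
import re
from typing import Dict, Set, Tuple

CHARSET = "utf-8"  # hypothetical single source of truth for the encoding


def build_response(body_text: str, charset: str = CHARSET) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body bytes) that all declare the same charset."""
    html = (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="{charset}"><title>Demo</title></head>\n'
        f"<body><p>{body_text}</p></body></html>\n"
    )
    headers = {"Content-Type": f"text/html; charset={charset}"}
    # Encode with the same charset that the header and <meta> tag declare.
    return headers, html.encode(charset)


def declared_charsets(headers: Dict[str, str], body: bytes) -> Set[str]:
    """Collect charsets declared in the Content-Type header and <meta charset>."""
    found = set()
    m = re.search(r"charset=([\w-]+)", headers.get("Content-Type", ""), re.I)
    if m:
        found.add(m.group(1).lower())
    # The meta tag itself is ASCII, so a lenient ASCII decode is enough to find it.
    m = re.search(r'<meta\s+charset="([\w-]+)"', body.decode("ascii", "replace"), re.I)
    if m:
        found.add(m.group(1).lower())
    return found
```

If the returned set has more than one element, two links you control disagree. Note that this check cannot see the DB, the back end, or a browser override.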


[*] Literally continuously. I happened to read through the Perl release notes from 5.8 onward last week, and I saw Unicode-related changes in *every* major release, including the just-released 5.16!

Regular perl/CGI output defaults to ISO-8859-1 encoding,

Really? I'd expect it to follow the overall Perl default, which is UTF-8 on most Unix-type systems running Perl 5.6 or later on an OS of similar vintage. I'd have expected you to have to go out of your way to force a return to Latin-1.

Now, if you're on a system where the native character set is still Latin-1, I'd understand that, but then you'd be running a 10-year-old box, wouldn't you? :)
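Whatever the default turns out to be, the difference only shows up outside ASCII. Here is a quick Python sketch comparing the bytes each encoding produces for a few characters. The helper name encodings_of is hypothetical, and Python stands in for Perl purely for illustration.

```python
from typing import Optional, Tuple


def encodings_of(ch: str) -> Tuple[Optional[bytes], bytes]:
    """Return (Latin-1 bytes or None, UTF-8 bytes) for a string."""
    try:
        latin1 = ch.encode("latin-1")
    except UnicodeEncodeError:
        latin1 = None  # the character has no Latin-1 representation at all
    return latin1, ch.encode("utf-8")


if __name__ == "__main__":
    for ch in ["A", "é", "€"]:
        latin1, utf8 = encodings_of(ch)
        print(f"{ch!r}: latin-1={latin1!r} utf-8={utf8!r}")
```

ASCII comes out identical in both encodings, "é" is one byte in Latin-1 but two in UTF-8, and "€" cannot be represented in Latin-1 at all.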

How can I get the same results as the CGI script above?

It's 2012. Please, please, please abandon Latin-1. Everything speaks UTF-8 these days, at the borders at least, even systems like Windows and JavaScript where it isn't the native character set. It is safe to consider UTF-8 the standard Unicode encoding online.
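If legacy Latin-1 data is what stands in the way, converting it is mechanical. Below is a minimal Python sketch that assumes any input which is not valid UTF-8 must be Latin-1. That assumption is a heuristic, not a guarantee. The helper name to_utf8 is hypothetical. Already-valid UTF-8 is passed through unchanged to avoid the kind of ping-ponging described earlier.

```python
def to_utf8(data: bytes) -> bytes:
    """Return `data` as UTF-8, ASSUMING anything not valid UTF-8 is Latin-1.

    Heuristic: valid UTF-8 is returned unchanged so it is not double-encoded.
    Some Latin-1 byte strings happen to be valid UTF-8 too, so verify the
    results against real data before trusting them.
    """
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8 (this includes plain ASCII)
    except UnicodeDecodeError:
        # Every byte sequence is valid Latin-1, so this decode cannot fail.
        return data.decode("latin-1").encode("utf-8")
```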

---------------------------------------------------------------------
To unsubscribe, e-mail: asp-unsubscr...@perl.apache.org
For additional commands, e-mail: asp-h...@perl.apache.org
