On 6/5/2012 3:02 AM, Arnon Weinberg wrote:
> How can I set the output character encoding of Apache::ASP output?
There are several places where you set this, not just one, and they all
have to agree to guarantee correct output:
DB -> back end -> Apache -> HTML -> Apache::ASP -> browser
If they do not all agree, you can either get mixed encodings or encoding
ping-ponging.
Ping-ponging is less common these days now that the world is settling on
UTF-8. Back in the Perl 5.6/Apache 1.3/pre-Firefox days, I remember
once chasing data through a system that stored data in the DB in
Latin-1, which got translated to UTF-8 in the back-end daemon, which
then sent it on to Apache and mod_perl, one of which smashed the data
back to Latin-1 (never did nail that one down), before sending the data
out to the browser, which interpreted it as UTF-8 because Apache was
configured to declare that charset by default!
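
To make that kind of breakage concrete, here is a minimal sketch of one
mis-step in such a chain: correctly encoded UTF-8 gets mistaken for
Latin-1 by a later layer and encoded a second time. It uses the core
Encode module; the sample string and the hex_bytes helper are made up
for illustration and are not taken from the system described above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: show a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# "caf" plus U+00E9 (e with acute accent) as a Perl character string.
my $text = "caf\x{e9}";

# Step 1: the back end correctly encodes the text as UTF-8.
my $utf8_bytes = encode('UTF-8', $text);

# Step 2: a later layer wrongly assumes those bytes are Latin-1...
my $misread = decode('ISO-8859-1', $utf8_bytes);

# Step 3: ...and "helpfully" converts them to UTF-8 again.
my $double_encoded = encode('UTF-8', $misread);

print "correct UTF-8:  ", hex_bytes($utf8_bytes), "\n";
print "double-encoded: ", hex_bytes($double_encoded), "\n";
```

The first line prints 63 61 66 C3 A9 and the second prints
63 61 66 C3 83 C2 A9. A browser told to expect UTF-8 will show the
second form as the familiar two-character garbage in place of the
accented letter.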
So, you have to check all the links in that chain:
- Your DB and any back-end daemon are up to you, since they're out of
scope on this list.
- Apache has things like the "AddDefaultCharset" directive which play
into this.
- For the Perl aspects, I recommend just reading the Perl manual chapter
on it: perldoc perlunicode. Perl's Unicode support is deep, broad, and
continually evolving[*]. You really must read your particular version's
docs to know exactly how it's going to behave. There have been several
breaking changes over the past decade or so.
- There are at least three ways to set the character encoding in your
HTML. RTFEE: https://en.wikipedia.org/wiki/Character_encodings_in_HTML
- And finally, it's possible to set a browser to ignore whatever it's
told by the HTTP server and the document, and force it to interpret the
data using some other character set.
[*] Literally continuously. I happened to read through the Perl release
notes from 5.8 onward last week, and I saw Unicode-related changes in
*every* major release, including the just-released 5.16!
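
For the Apache link in the chain above, a config fragment along these
lines is one way to have the server declare UTF-8. Where it goes
(httpd.conf, a virtual host, a directory block) is an assumption here,
and it only helps if the bytes you send really are UTF-8:

```apache
# Assumption: placed in httpd.conf or a virtual host block.
# Adds "; charset=UTF-8" to text/plain and text/html responses
# that do not already declare a charset of their own.
AddDefaultCharset UTF-8
```

Whatever you set here should match what the HTML itself declares,
otherwise you are back to mixed signals.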
> Regular perl/CGI output defaults to ISO-8859-1 encoding,
Really? I'd expect it to take the overall Perl default, which is UTF-8
on most Unix-type systems with Perl 5.6 onward on OSes contemporary with
that version of Perl. I would have expected that you'd have to go out
of your way to force a return to Latin-1.
Now, if you're on a system where the native character set is still
Latin-1, I'd understand that, but then you'd be running a 10-year-old
box, wouldn't you? :)
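
Whatever the default turns out to be on a given box, a plain CGI script
does not have to rely on it. A rough sketch, assuming a bare CGI script
rather than Apache::ASP, that states the encoding both on the Perl
output handle and in the HTTP header (send_page is a made-up helper
name, not part of any module):

```perl
use strict;
use warnings;

# Hypothetical helper: write a complete CGI response to the given handle.
sub send_page {
    my ($out, $html) = @_;

    # Encode Perl character strings as UTF-8 bytes on the way out.
    binmode($out, ':encoding(UTF-8)');

    # Tell the browser the same thing the I/O layer is doing.
    print {$out} "Content-Type: text/html; charset=UTF-8\r\n\r\n";
    print {$out} $html;
    return;
}

send_page(\*STDOUT, "<p>caf\x{e9}</p>\n");
```

The point is that the header and the bytes agree, so nothing
downstream has to guess.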
> How can I get the same results as the CGI script above?
It's 2012. Please, please, please abandon Latin-1. Everything speaks
UTF-8 these days, at the borders at least, even systems like Windows and
JavaScript where it isn't the native character set. It is safe to
consider UTF-8 the standard Unicode encoding online.