Thanks for your help, Warren. I wrote my last message before seeing
yours. I can see now that it can be confusing to track all the text
encoding changes, but that it is only the last one that generally
matters (assuming lossless conversion).
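
A minimal Python sketch of the lossless case (not the Perl/Apache::ASP
code from this thread; the sample string and variable names are made up
for illustration). It shows text being misread as Latin-1 in the middle
of a chain and still coming out intact, as long as the final decoding
uses the right label:

```python
# Made-up sample text containing U+2019 (a typographic apostrophe).
original = "It\u2019s"
utf8_bytes = original.encode("utf-8")

# An intermediate layer wrongly assumes Latin-1. Python's latin-1 codec
# maps every byte value 0-255 to exactly one code point, so nothing is lost.
misread = utf8_bytes.decode("latin-1")

# Writing the misread text back out as Latin-1 restores the exact bytes.
restored = misread.encode("latin-1")

# Only the final interpretation decides what the reader sees.
final_text = restored.decode("utf-8")    # correct final label
wrong_text = restored.decode("latin-1")  # wrong final label, garbled text

print(restored == utf8_bytes)
print(final_text == original)
print(wrong_text == original)
```

The round trip only works because each intermediate step is reversible;
a step that substitutes "?" for unknown characters would break it.
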
Before I discovered that the AddDefaultCharset Apache directive
would solve my problem, I found a stopgap solution of setting
$Response->{Charset} in my script.
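
Both fixes appear to work the same way: they put a charset parameter on
the Content-Type header so the browser does not have to guess. The
Python sketch below only models the receiving side. The function name,
the sample bytes, and the cp1252 fallback are assumptions made for
illustration, not a description of what IE6 actually does:

```python
from email.message import Message


def decode_body(content_type: str, body: bytes, fallback: str = "cp1252") -> str:
    """Decode body with the charset named in Content-Type, else a fallback guess."""
    header = Message()
    header["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    charset = header.get_content_charset() or fallback
    return body.decode(charset, errors="replace")


# Made-up sample: UTF-8 bytes for text containing U+2019.
body = "It\u2019s".encode("utf-8")

labeled = decode_body("text/html; charset=UTF-8", body)
unlabeled = decode_body("text/html", body)  # falls back to the assumed default

print(labeled == "It\u2019s")
print(unlabeled == "It\u2019s")
```

With the label present the text decodes cleanly; without it the same
bytes are read under the fallback encoding and come out garbled.
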
Thanks again!
--- In [EMAIL PROTECTED], Warren Young <[EMAIL PROTECTED]> wrote:
> karl wrote:
> > I have text output coming from a database and ' (apostrophes) are
> > shown in the browser (IE6) as ? (question marks).
>
> There are apostrophes and there are apostrophes. There's ASCII code 39,
> there's Windows code page 1252 code 146, there's Unicode code
> <mumble>.... The question is, which of these codes are in your
> database? You must know the answer to that question before you can
> decide how to proceed.
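
One rough way to answer that question is to look at the raw bytes. The
Python sketch below is a heuristic, not a definitive detector; the
function name and labels are hypothetical. It relies on two facts:
Windows-1252 code 146 is the byte 0x92, and that character is U+2019 in
Unicode, which UTF-8 encodes as the bytes E2 80 99:

```python
def classify_apostrophes(raw: bytes) -> list:
    """Return labels for the apostrophe variants found in raw bytes (heuristic)."""
    labels = []
    if b"\x27" in raw:
        labels.append("ASCII 39")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so a 0x92 byte is most likely Windows-1252 code 146.
        if b"\x92" in raw:
            labels.append("Windows-1252 146")
    else:
        # Valid UTF-8: look for the typographic apostrophe code point.
        if "\u2019" in text:
            labels.append("UTF-8 U+2019")
    return labels


# Made-up sample byte strings, one per variant.
print(classify_apostrophes(b"It's"))
print(classify_apostrophes(b"It\x92s"))
print(classify_apostrophes("It\u2019s".encode("utf-8")))
```
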
>
> Character code handling in the
> database/Apache::ASP/Perl5/Apache/browser chain is stranger than you
> probably expect. Here's a post I wrote a few months back detailing
> two chains I've personally observed:
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg01952.html
>
> Notice that I saw two rather different translation chains on my two
> test systems! Your particular configuration is quite different from
> either of mine, so it could give yet a third path.
>
> > The only thing I can figure out is that
> > original output shows up as encoded Unicode (UTF-8) in the browser;
>
> Don't guess, find out.
>
> To do the analysis for that post I linked to, I dumped the text in
> question to a file at several places along the I/O chain, then
> examined each file. You should also use a network sniffer to see what
> the HTTP headers and HTML data are without the browser getting in the
> way. There's a good list of sniffers in the Winsock Programmer's FAQ,
> if you don't have one already:
>
> http://tangentsoft.net/wskfaq/
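
The dump-and-examine approach can be sketched in Python as follows. This
is an illustration only, not the procedure used for the linked post; the
helper names, the stage label, and the sample bytes are all made up:

```python
import os
import tempfile


def dump_stage(directory: str, stage: str, data: bytes) -> str:
    """Write the raw bytes seen at one point in the chain to <stage>.bin."""
    path = os.path.join(directory, stage + ".bin")
    with open(path, "wb") as handle:  # binary mode, so no newline translation
        handle.write(data)
    return path


def hex_view(path: str) -> str:
    """Return the file's bytes as space-separated hex pairs."""
    with open(path, "rb") as handle:
        return " ".join("{:02x}".format(byte) for byte in handle.read())


# Hypothetical stage: bytes as they came back from the database.
dump_dir = tempfile.mkdtemp()
after_db = dump_stage(dump_dir, "after_db", b"It\x92s")
print(hex_view(after_db))
```

Dumping the same text at each stage and comparing the hex views shows
where the bytes change along the chain.
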
>
> I think you'll find, as I did, that your characters are being
> translated back and forth between ISO 8859-x and Unicode multiple
> times, and that the last step isn't being done correctly.
>
> That last step is critical because of the high probability that the
> intermediate transformations are all lossless in your situation. All
> you have to do is communicate to the browser what the final character
> encoding is. In my particular situation, I had to change an Apache
> setting to make it send a header informing the browser that the
> character encoding was UTF-8. The browser was then able to display the
> web page correctly, never mind that the data was stored as ISO 8859-1
> (Latin-1) in the database, and translated back and forth several times
> along the path.
>
> > The only physical
> > difference I can find between the output generated by Apache::ASP
> > and IIS/ASP is that the Apache::ASP has Unix style LF line-endings
> > and the IIS/ASP has DOS/Windows style CRLF line-endings.
>
> I'll bet you didn't compare the HTTP headers. Different web servers,
> hence different headers, hence different browser interpretation.
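
Once the headers from both servers have been captured, comparing them
can be as simple as the Python sketch below. The function name is
hypothetical and the two header sets are invented for illustration; they
are not the headers either server in this thread actually sent:

```python
def diff_headers(a: dict, b: dict) -> dict:
    """Return {name: (value_a, value_b)} for headers that differ.

    Header names are compared case-insensitively; a missing header is None.
    """
    norm_a = {name.lower(): value for name, value in a.items()}
    norm_b = {name.lower(): value for name, value in b.items()}
    return {
        name: (norm_a.get(name), norm_b.get(name))
        for name in sorted(set(norm_a) | set(norm_b))
        if norm_a.get(name) != norm_b.get(name)
    }


# Invented example data: one response without a charset, one with.
server_a = {"Content-Type": "text/html", "Connection": "close"}
server_b = {"content-type": "text/html; charset=UTF-8", "Connection": "close"}

differences = diff_headers(server_a, server_b)
print(differences)
```
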
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]