Re: Output formatting problem (text encoding?)

karl Wed, 21 Jul 2004 02:23:12 -0700

Thanks for your help, Warren. I wrote my last message before seeing 
yours. I can see now that tracking all the text encoding changes can 
be confusing, but that generally only the last one matters (assuming 
the conversions are lossless).


Before I discovered that the AddDefaultCharset Apache directive 
would solve my problem, I found a stopgap solution of setting 
$Response->{Charset} in my script.

Thanks again!

--- In [EMAIL PROTECTED], Warren Young <[EMAIL PROTECTED]> wrote:
> karl wrote:
> > I have text output coming from a database and ' (apostrophes) are
> > shown in the browser (IE6) as ? (question marks).
> 
> There are apostrophes and there are apostrophes.  There's ASCII code
> 39, there's Windows code page 1252 code 146, there's Unicode code
> <mumble>....  The question is, which of these codes is in your
> database?  You must know the answer to that question before you can
> decide how to proceed.
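
A minimal Python sketch of the distinction drawn above, assuming the "Unicode code <mumble>" refers to U+2019 (RIGHT SINGLE QUOTATION MARK), which is the character Windows code page 1252 places at byte 146. The helper name byte_values is made up for illustration. The last line shows one way a question mark can arise (a lossy conversion to ISO 8859-1); it is not a diagnosis of this particular setup.

```python
# Two characters that can both be displayed as an "apostrophe".
ASCII_APOSTROPHE = "'"        # U+0027, code 39 in ASCII
CURLY_APOSTROPHE = "\u2019"   # RIGHT SINGLE QUOTATION MARK (assumed here)


def byte_values(text, encoding):
    """Return the byte values that `text` encodes to in `encoding`."""
    return list(text.encode(encoding))


print(byte_values(ASCII_APOSTROPHE, "ascii"))    # [39]
print(byte_values(CURLY_APOSTROPHE, "cp1252"))   # [146]
print(byte_values(CURLY_APOSTROPHE, "utf-8"))    # [226, 128, 153]

# ISO 8859-1 has no curly apostrophe, so a lossy conversion to it
# substitutes a question mark.
print(CURLY_APOSTROPHE.encode("latin-1", errors="replace"))  # b'?'
```

Comparing the byte values found in the database against these lists is one quick way to tell which apostrophe is actually stored.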
> 
> Character code handling in the
> database/Apache::ASP/Perl5/Apache/browser chain is stranger than you
> probably expect.  Here's a post I wrote a few months back detailing
> two chains I've personally observed:
> 
>       http://www.mail-archive.com/[EMAIL PROTECTED]/msg01952.html
> 
> Notice that I saw two rather different translation chains on my two
> test systems!  Your particular configuration is quite different from
> either of mine, so it could give yet a third path.
> 
> > The only thing I can figure out is that original output shows up
> > as encoded Unicode (UTF-8) in the browser;
> 
> Don't guess, find out.
> 
> The way I did the analysis to make that post I linked to, I dumped
> the text in question to a file at several places along the I/O chain,
> then I examined each file.  You should also use a network sniffer to
> see what the HTTP headers and HTML data are without the browser
> getting in the way.  There's a good list of sniffers in the Winsock
> Programmer's FAQ, if you don't have one already:
> 
>       http://tangentsoft.net/wskfaq/
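
The dump-and-examine approach described above can be sketched in Python as follows. This is an illustration only: the stage labels, the sample text, and the helper names dump_stage and hex_dump are hypothetical, and the two encodings are stand-ins for whatever the real chain produces at each point.

```python
import os
import tempfile


def dump_stage(directory, label, data):
    """Write the raw bytes seen at one point in the chain to <label>.bin."""
    path = os.path.join(directory, label + ".bin")
    with open(path, "wb") as handle:
        handle.write(data)
    return path


def hex_dump(data):
    """Render bytes as space-separated hex pairs."""
    return " ".join(format(byte, "02x") for byte in data)


# Hypothetical sample: "it's" written with a curly apostrophe (U+2019),
# as it might look at two different stages.
text = "it\u2019s"
stages = {
    "from_database": text.encode("cp1252"),
    "sent_to_browser": text.encode("utf-8"),
}

with tempfile.TemporaryDirectory() as directory:
    for label, data in stages.items():
        path = dump_stage(directory, label, data)
        with open(path, "rb") as handle:
            print(label, hex_dump(handle.read()))
# from_database 69 74 92 73
# sent_to_browser 69 74 e2 80 99 73
```

Looking at the bytes rather than the rendered text removes the guesswork: a single 92 and the three-byte sequence e2 80 99 are the same character in two different encodings.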
> 
> I think you'll find, as I did, that your characters are being
> translated back and forth between ISO 8859-x and Unicode multiple
> times, and that the last step isn't being done correctly.
> 
> That last step is critical because of the high probability that the
> intermediate transformations are all lossless in your situation.
> All you have to do is communicate to the browser what the final
> character encoding is.  In my particular situation, I had to change
> an Apache setting to make it send a header informing the browser that
> the character encoding was UTF-8.  The browser was then able to
> display the web page correctly, never mind that the data was stored
> as ISO 8859-1 (Latin-1) in the database, and translated back and
> forth several times along the path.
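
A small Python sketch of why only the declared final encoding matters when the intermediate steps are lossless. The sample word and the function browser_view are hypothetical; the function merely mimics a client that trusts the charset named in the Content-Type header.

```python
original = "caf\u00e9"                    # sample text with one accented letter
stored = original.encode("latin-1")       # as stored: ISO 8859-1, 4 bytes
# Translated to UTF-8 somewhere along the chain: 5 bytes on the wire.
on_wire = stored.decode("latin-1").encode("utf-8")

# The translation is lossless: converting back recovers the stored bytes.
print(on_wire.decode("utf-8").encode("latin-1") == stored)   # True


def browser_view(body, declared_charset):
    """Decode a response body the way a client trusting the header would."""
    return body.decode(declared_charset, errors="replace")


# Correct declaration: the text comes out intact.
print(browser_view(on_wire, "utf-8") == original)            # True
# Wrong declaration: the two UTF-8 bytes show up as two Latin-1 characters.
print(ascii(browser_view(on_wire, "latin-1")))               # 'caf\xc3\xa9'
```

The bytes on the wire are identical in both calls; only the declared charset changes what the reader sees.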
> 
> > The only physical difference I can find between the output
> > generated by Apache::ASP and IIS/ASP is that the Apache::ASP has
> > Unix style LF line-endings and the IIS/ASP has DOS/Windows style
> > CRLF line-endings.
> 
> I'll bet you didn't compare the HTTP headers.  Different web
> servers, hence different headers, hence different browser
> interpretation.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

