On Sep 25, 2004, at 2:19 PM, Jim Witte wrote:
Well, my 2004082608 (admittedly old) version of Camino displays the above amazon.co.uk page (why use the co subdomain at all? In the name of simplicity! Grr..) fine, including pound signs, which it codes correctly as HTML entities. Camino seems to select *no* text encoding by default.

Now, why do you suppose UK stamps don't say "UK" on them but everyone else's stamps have the country on them?
Why do you think The Times is just The Times, no matter how many times Americans refer to it as "The London Times"?
Now, why does the USA use .com, .net, .gov and .edu whilst everyone else gets .co.uk, .gov.uk & .ac.uk etc?


Because nothing really changes in this world. Someone's on the top of the pile.


With regards to the Gameplay site, it is using incorrect HTML, but will display correctly in either ISO Latin 1 or MacOS Roman (but not Unicode. Why wasn't Unicode made compatible with the previous ISO standard? They have 65,536 characters at their disposal after all...). When loading Gameplay, Camino selects ISO Latin 1 initially, but as soon as it starts "transferring data", the encoding menu displays NO encoding.

That statement makes no sense. "Unicode" isn't a character encoding, it's a character set. It merely specifies numbers for characters, not computer representations. UTF-7, UTF-8, UTF-16, UTF-32, UCS-2 and UCS-4 are encodings.


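Unicode's character numbers are identical to ISO 8859-1 for 0..255: 0..127 decimal is ASCII, 160..255 decimal are the same printing characters, and everything from 128..159 decimal is an extremely arcane *non-printing* control character that you can't even see.
UTF-8 - the most commonly encountered Unicode encoding - produces the same bytes as ISO 8859-1 only in the range 0..127 decimal; each character from 128..255 becomes two bytes in UTF-8. So the character sets agree, even though the bytes on the wire differ above 127.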
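
A quick way to see how ISO 8859-1 and UTF-8 relate is to compare them in Python, both as character numbers and as bytes. This is only an illustrative sketch using Python's standard codecs; the variable names are made up for the example.

```python
# Compare ISO 8859-1 and UTF-8 for every character number 0..255.
same_bytes = []        # code points whose encoded bytes match in both
same_code_points = []  # Latin-1 byte values that map to the same Unicode number

for cp in range(256):
    ch = chr(cp)
    if ch.encode("iso-8859-1") == ch.encode("utf-8"):
        same_bytes.append(cp)
    # Decoding one Latin-1 byte yields the Unicode character with that number.
    if ord(bytes([cp]).decode("iso-8859-1")) == cp:
        same_code_points.append(cp)

print(len(same_code_points))                           # 256
print(same_bytes[0], same_bytes[-1], len(same_bytes))  # 0 127 128

# The pound sign is character 163 in both, but the bytes differ.
print("£".encode("iso-8859-1"))  # b'\xa3'
print("£".encode("utf-8"))       # b'\xc2\xa3'
```

So a page containing a raw 0xA3 byte reads as a pound sign in ISO 8859-1 but is not valid UTF-8 on its own.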


Now, Microsoft Windows codepage 1252 is almost the same as 8859-1, except they put a whole bunch of printing characters in the range 128..159, like the TM symbol and the Euro sign. Apple did something similar with Mac Roman, but this is ancient history and can't be changed now. Problems frequently occur when Windows & Mac users insert chars in this magic 128..159 range without realizing it. This is fine if the server actually serves the page as 1252 or MacRoman, but of course, if it says Unicode or 8859-1 then everything goes wrong.
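
To make the 128..159 problem concrete, here is a small Python sketch that decodes the same two bytes under each label a server might send. The byte values 0x80 and 0x99 are just chosen for illustration; the codec names are Python's standard ones.

```python
# Two bytes from the "magic" 128..159 range.
raw = bytes([0x80, 0x99])

as_cp1252 = raw.decode("cp1252")        # Euro sign and TM symbol
as_latin1 = raw.decode("iso-8859-1")    # invisible C1 control characters
as_macroman = raw.decode("mac_roman")   # accented letters instead

print(repr(as_cp1252))    # '€™'
print(repr(as_latin1))    # '\x80\x99'
print(repr(as_macroman))  # 'Äô'

# Labelled as UTF-8, the same bytes are simply invalid.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)  # False
```

The bytes never change; only the label does, which is why a mislabelled page shows the wrong characters or nothing at all.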

So there is a problem, but it's not between Unicode and 8859-1. It's the vendor-specific stuff that breaks everything.

(Or, as a sort of Rube-Goldberg solution, could a simple text-analyzer analyze a page and compare it to letter/bigraph/top-word frequency tables for European languages to make an educated guess as to the correct encoding?)

There is an encoding guessing algorithm in Mozilla. Try View->Character Encoding->Auto detect->Universal.


It's reasonably good, but it's not perfect.
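
For a feel of why guessing can only ever be "reasonably good", here is a deliberately crude sketch in Python. It is NOT Mozilla's detector and uses no frequency tables; `guess_encoding` is a hypothetical helper that only checks whether the bytes are valid UTF-8 and whether any fall in the 128..159 range.

```python
def guess_encoding(data: bytes) -> str:
    """Crude guess among UTF-8, codepage 1252 and ISO 8859-1 (illustrative only)."""
    try:
        data.decode("utf-8")
        return "utf-8"  # valid UTF-8 (pure ASCII also lands here)
    except UnicodeDecodeError:
        pass
    # Bytes in 128..159 are printable in cp1252 but control codes in 8859-1.
    if any(0x80 <= b <= 0x9F for b in data):
        return "cp1252"
    return "iso-8859-1"


print(guess_encoding("Price: £5".encode("utf-8")))  # utf-8
print(guess_encoding(b"Price: \xa35"))              # iso-8859-1
print(guess_encoding(b"Price: \x805"))              # cp1252
```

Note that this sketch can never answer Mac Roman: every byte is a valid Mac Roman character, so telling it apart from 1252 or 8859-1 needs language statistics of the kind suggested above.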

AndyT (lordpixel - the cat who walks through walls)
A little bigger on the inside

        (see you later space cowboy ...)

_______________________________________________
Camino mailing list
[EMAIL PROTECTED]
http://mozdev.org/mailman/listinfo/camino
