On Mar 08 2007, Florian Kulzer wrote:
> On Thu, Mar 08, 2007 at 10:47:10 -0500, H.S. wrote:
> > I am not sure I understand this comment. I am not very familiar with
> > encoding. I was assuming the web pages which have international
> > characters are better off by using UTF-8 encoding.
>
> What I meant was this: Your utf-8 setup (combined with using the proper
> fonts) is able to encode and display umlauts, accented characters,
> characters for Slavic languages, Scandinavian, Russian, Greek, (some)
> Asian characters, etc. This is in contrast to, say, someone using an
> iso-8859-1 locale who cannot display many of these "foreign" characters.
> (Unless s/he uses an application which can work around the limitations
> of the system's encoding, for example LaTeX.)
>
> The problem is that a webpage has to tell your browser which encoding it
> uses to transmit the characters. If the browser has to guess things can
> go wrong. In your case iceape guessed the page was encoded in utf-8
> which goes wrong for many characters outside the standard us-ascii set.
> Once you told your browser that the page was in iso-8859-1 it could
> transcode properly. The root of the problem is that the character "é"
> (the accented e) exists in both utf-8 and iso-8859-1 but it has a
> different code in the two encodings.

Ok, dumb question time.

I have hell's own mess with emails, basically amounting to inability to
read non-US characters in text emails, but I was under the impression
that there was a simple solution for web pages.

HTML includes its _own_ encoding for accented, umlauted and otherwise
non-US characters, and conformant web pages are supposed to use it - not
rely on the lucky browser switching their browser preferences from UTF-6
to ISO-988956-whatever to some-other-bloody-encoding depending on the
whim of the web page author.

People reading this mail in HTML may have difficulty if I try to give
examples, but I type them into web page source all the time, to get the
non-US characters I want - and they work.
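
For what it's worth, the "different code" point quoted above is easy to check. What follows is only a rough Python sketch, not anything from the thread itself: it takes é as the sample character, and the variable names and the "caf&eacute;" string are made up for illustration. It shows the two byte encodings, what a wrong guess does in each direction, and why a character entity should survive either guess.

```python
import html

E_ACUTE = "\u00e9"  # é, used here as the sample accented character

# Same character, different bytes on the wire.
latin1_bytes = E_ACUTE.encode("iso-8859-1")  # one byte: 0xE9
utf8_bytes = E_ACUTE.encode("utf-8")         # two bytes: 0xC3 0xA9

# Guessing utf-8 for an iso-8859-1 page: the lone 0xE9 is not valid
# utf-8, so a lenient decoder substitutes the replacement character.
wrong_guess = latin1_bytes.decode("utf-8", errors="replace")

# Guessing iso-8859-1 for a utf-8 page: each byte becomes its own
# character, which is the familiar two-character garbage.
mojibake = utf8_bytes.decode("iso-8859-1")

# A character entity is spelled in plain us-ascii, so the page bytes
# decode identically under either guess before the entity is resolved.
entity_bytes = "caf&eacute;".encode("ascii")  # hypothetical page fragment
via_latin1 = html.unescape(entity_bytes.decode("iso-8859-1"))
via_utf8 = html.unescape(entity_bytes.decode("utf-8"))

print(latin1_bytes, utf8_bytes)
print(repr(wrong_guess), repr(mojibake))
print(via_latin1 == via_utf8 == "caf" + E_ACUTE)
```

Since the entity itself never leaves us-ascii, it should not matter which of the two encodings the browser ends up guessing for the rest of the page.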
Perhaps things are different if the web page creator uses GUI-based
"authoring" tools, and can't tell that the tool is making stupid
decisions under the hood ;-)

Anyway, example time - &aacute; gives you an a with an acute accent.
(That's an ampersand symbol followed by the letters "aacute" followed by
a semi-colon.)

> > I was assuming they should have used UTF-8 along with the language tags
> > around that word. I might be mistaken though.
>
> This would maybe work if they would encode that word in utf-8. Since
> they decided to use iso-8859-1 throughout the document they could simply
> have included
>
> <meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1">
>
> in the HTML header.

I see. Since I'm lazy - and unsure precisely what query to feed to a
search engine - could you possibly point me at a list of these tags?

--
Arlie
(Arlie Stephens [EMAIL PROTECTED])