The following reply was made to PR general/4492; it has been noted by GNATS.
From: Dirk-Willem van Gulik <[EMAIL PROTECTED]>
To: Ralf Weinand <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: general/4492: UTF-8 encoding URL's at IE5 won't work with special directory-names and apache
Date: Sun, 30 May 1999 14:00:27 +0200 (CEST)

On 29 May 1999, Ralf Weinand wrote:

> After the Installation of IE5 (german version) i got Problems with my
> websites
> Some links, that i created via javascript won't work.
> But javascript isn't the problem.
> non english URL's are standardly encoded in the UTF-8 mode, so i can't
> reach the sites with special words. when i disable the utf-8 in the
> IE5-properties (deep inside), all will work.
> i searched a while about the UTF-8 Meaning, but i do nor Know, whether
> UTF-8 is a standard real planned for the Internet.
> this .txt file will not be reached with IE5 and the standard-installation

This may be too technical, but this is not really a server problem. It has to do with the way IE5 implements some of its internationalization and localization. And some of that is plain wrong, wrong and wrong. Sorry. But there is a way round it; see the end of this longish msg.

As for Apache: Apache can deal with UTF-8 files just fine. They are sent out exactly as they are, but you should of course make sure that the charset is set right. See www.w3.org/International for more information.

As for UTF-8 inside a URI: there are some rules all URIs are to adhere to, including which characters they may contain. Unfortunately your ringel-ss or sz is not one of them, nor are, say, Chinese characters. This page explains it in detail:

   http://www.w3.org/International/O-URL-and-ident.html

In short the rules are

0. the URI is an octet stream with no real meaning, i.e. just a sequence of numbers.

1. any special character (i.e. not a-z, 0-9 and a few more) is to be encoded as a '%xy' where x and y are hex digits 0..9a..f.

To confuse matters, that sequence of numbers just _HAPPENS_ (but this is entirely coincidental and of no substance) to look like a human readable string when you look up the numbers in an ASCII table. But you should completely forget this :-)

What now follows is an incredible simplification of the real story. But it might help. The 'solution' for your problem is at the end. I hope.

What generally happens is that a user enters a URL in the location bar of the browser. The browser, together with the OS, then translates this into a valid octet string, as per RFC 2396, according to localization rules. I.e. the user can actually type in strange chars, such as the sz, the ae, ij and many others needed in Dutch, Danish, Chinese, German and so on, but the browser, helped by the OS (which has details on what the user meant when he typed in the string), is to translate those to a simple octet string.

This string then goes to the server. The Apache server decodes part of this string, but basically passes it on to the OS, which then tries to work out which file you want. If the OS understands UTF-8 coded file names you are usually all right. But obviously there is a big i18n problem here.

But... in an HTML page, regardless of the charset it is written in, whether it is in Chinese, German or Greek, the URIs, i.e. the bits between the href="...." quotes, are _NOT_ in the charset of that page; they are to be treated as an octet stream and sent on the wire exactly like that.

So even though one would type in the browser window's location bar

   http://www.teddy-online.de/Teddys/Gro_/Teddy-schwarz.txt

(where the '_' is the beta-shaped German 'sz' char), you would code it in the HTML as

   <a href="/images/Teddys/Gro%df/Teddy-schwarz.txt">

i.e. use a 'hex' escape instead of the ringel-ess/sz. The same applies to JavaScript _AND_ to Java: despite the fact that all code, comments and displayable strings in Java are Unicode, you are to treat the URIs strictly as octet strings if you encode them directly.

Hope this helps,

Dw.
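
PS: the escaping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name escape_segment is made up for this example, and it assumes the directory name on the server's disk is stored as ISO-8859-1, which is what the %df escape implies. The UTF-8 form shown is what a browser that encodes URLs as UTF-8 would be expected to produce; that is an assumption here, not something checked against IE5.

```python
from urllib.parse import quote, unquote_to_bytes


def escape_segment(segment: str, charset: str) -> str:
    """Percent-encode one path segment after converting it to octets.

    Hypothetical helper for illustration: `charset` decides which octets
    the non-ASCII characters become before they are written as %xy.
    """
    return quote(segment, safe="", encoding=charset)


# "Gro" followed by the German sharp s (U+00DF), written with an escape
# so the example does not depend on the encoding of this file.
name = "Gro\u00df"

latin1_form = escape_segment(name, "iso-8859-1")  # one octet for the sz
utf8_form = escape_segment(name, "utf-8")         # two octets for the sz

# What the server gets back after undoing the %xy escapes: raw octets,
# with no charset attached.
octets_on_server = unquote_to_bytes("Gro%df")

print(latin1_form)
print(utf8_form)
print(octets_on_server)
```

The two escaped forms name different octet strings, so a file stored under the one-octet name is not found when the request carries the two-octet form. Note that %df and %DF denote the same octet; the case of the hex digits does not matter.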