On Saturday 07 June 2003 03:17 am, Alexander Schatten wrote:
> Joerg Heinicke wrote:
> > Alexander Schatten wrote:
> >> (1) UTF-8 practically only works for english texts, and does not work
> >> with ae oe ue and so on
> >
> > That's wrong. UTF-8 works for *every* character. You only must use it
> > correctly - and that's not so easy :-)
> > By default giving a browser an UTF-8 document, it will send forms
> > encoded in UTF-8 too, but Cocoon expects ISO-8859-1. You can change
> > this by setting the form encoding correctly.
>
> Well, I have mentioned it, I am definitely no encoding expert, but my
> practical know-how shows me with different tools(!) not only with
> cocoon, that UTF-8 does in practice not work with, e.g., german umlauts.
> ISO-8859-1 does. That's fact. Maybe, there are problems in
> implementations, I don't know, but this is what I experienced.

Could it be that your understanding of encodings is a bit flawed?

Unicode code points are set in stone, and Java uses Unicode internally
for every String and char. However, there is ALWAYS a conversion from
Unicode to some encoding when the characters are written to another
data medium (a file, a socket, a database). It is necessary by
definition.

ISO-8859-1 defines only 256 characters. Characters outside that set,
such as Chinese characters, cannot be stored in it directly; in XML or
HTML output the serializer writes them as "numeric text", i.e. numeric
character references of the form &#NNNN;. That escape comes from the
markup, not from the encoding itself.

To make matters even more confusing, the characters are eventually
displayed or printed for a human, and for that a graphical
representation of each character must be available, i.e. a glyph in a
font. Sad to say, few tools and few fonts today cover all characters.

I believe that in this jungle of confusion you have misunderstood how
to use character encodings. It is easy to do; I have done it myself
many times. For instance, MySQL has to be set up to support UTF
encodings (it doesn't do that by default), and the JDBC connect string
must tell the driver to use UTF as well. Forget that, and it looks as
if nothing works at all.

I suggest that you slowly go through each part of your system and
verify which character encoding it uses. There should be no problem
mixing them, e.g. keeping ISO-8859-1 documents, which are easier to
type, and serving UTF-8 to the web browsers.

Niclas
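
P.S. A minimal Java sketch of the point above, in case it helps. It is
not taken from Cocoon; the class name EncodingRoundTrip and the
transcode helper are made up for illustration, and it assumes Java 7 or
later for StandardCharsets. It shows that umlauts survive UTF-8 fine as
long as the bytes are read back with the same encoding, and that the
garbage appears only when UTF-8 bytes are read as ISO-8859-1 (which is
what happens when a browser posts a form in UTF-8 and the receiving
side decodes it as ISO-8859-1).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Encode text to bytes with one charset, then decode those bytes with another.
    static String transcode(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // a-umlaut, o-umlaut, u-umlaut, written as Unicode escapes so the
        // encoding of this source file cannot interfere with the demo.
        String umlauts = "\u00e4\u00f6\u00fc";

        // Same characters, different byte sequences: 6 bytes vs. 3 bytes.
        System.out.println("UTF-8 bytes:      " + umlauts.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("ISO-8859-1 bytes: " + umlauts.getBytes(StandardCharsets.ISO_8859_1).length);

        // Written and read with the same encoding: nothing is lost.
        String ok = transcode(umlauts, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip intact: " + ok.equals(umlauts));

        // Written as UTF-8 but read as ISO-8859-1: each umlaut turns into two characters.
        String garbled = transcode(umlauts, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("Mismatch intact:         " + garbled.equals(umlauts));
        System.out.println("Mismatch length:         " + garbled.length());

        // Because ISO-8859-1 maps every byte value to a character, the wrong
        // decode can be undone by re-encoding and decoding as UTF-8.
        String repaired = transcode(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("Repaired intact:         " + repaired.equals(umlauts));
    }
}
```

The repair step at the end is only a workaround; the cleaner fix is to
make every part of the chain agree on the encoding in the first place.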