"Maxim Shafirov" <[EMAIL PROTECTED]> a �crit dans le message news:
ah3ueq$ncj$[EMAIL PROTECTED]
> It is not as easy, unfortunately. Characters over 127 do not imply
> ISO-8859-1 at all: another national encoding, such as KOI8-R (the Russian
> one), could be installed instead. There is physically no way to determine
> which single-byte encoding is used other than statistical analysis, which
> does not seem accurate on .java files - too few national
> characters...
>

Okay, so I've done some research on charsets, and so on.
There are two aspects to the problem:
- the easy one is XML files: the encoding is declared in the file itself, so
it's very easy to use the right encoding for them.
- the harder one is Java classes and JSP files: we can easily guess whether a
file is UTF-x or not, but the problem arises when it's not UTF. In that case,
since it's difficult to find rules for guessing the right 8-bit charset, the
default system encoding should be used (see the sketch below).

Does it make sense?
At least my UTF-8 files will be opened with the right encoding, and my
ISO-8859-1 files will be opened with my default system charset - in both
cases, whether they are XML or Java files.
By the way, do you use the Charset classes from JDK 1.4? If so, there may
be some things to explore, such as not modifying characters that are not
part of the currently chosen or guessed encoding.
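For instance, here is a small sketch (my own, just to illustrate the JDK 1.4
API) where CodingErrorAction.REPORT makes the decoder complain about bytes
that are not valid in the guessed charset, instead of silently replacing
them:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.*;

// With CodingErrorAction.REPORT, the decoder throws on bytes that are not
// valid in the charset rather than substituting replacement characters,
// so a tool could fall back to another guess instead of corrupting a file.
public class StrictDecode {
    public static void main(String[] args) throws Exception {
        byte[] bytes = { (byte) 0xC3, (byte) 0xA9 };  // e-acute (U+00E9) as UTF-8
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            CharBuffer chars = decoder.decode(ByteBuffer.wrap(bytes));
            System.out.println("Decoded cleanly: " + chars);
        } catch (CharacterCodingException e) {
            // Not valid in this charset: try another guess here.
            System.out.println("Not valid in this charset: " + e);
        }
    }
}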

Guillaume


