On Thu, 8 Jul 2021 21:23:00 GMT, Naoto Sato <na...@openjdk.org> wrote:
> This is an implementation for the `JEP 400: UTF-8 by Default`. The gist of
> the changes is `Charset.defaultCharset()` returning `UTF-8` and
> `file.encoding` system property being added in the spec, but another notable
> modification is in `java.io.PrintStream` where it continues to use the
> `Console` encoding as the default charset instead of `UTF-8`. Other changes
> are mostly clarification of the term "default charset" and their links.
> Corresponding CSR has also been drafted.
>
> JEP 400: https://bugs.openjdk.java.net/browse/JDK-8187041
> CSR: https://bugs.openjdk.java.net/browse/JDK-8260266

> Consider an application that creates a java.io.FileWriter with its
> one-argument constructor and then uses it to write some text to a file. The
> resulting file will contain a sequence of bytes encoded using the default
> charset of the JDK running the application. A second application, run on a
> different machine or by a different user on the same machine, creates a
> java.io.FileReader with its one-argument constructor and uses it to read the
> bytes in that file. The resulting text contains a sequence of characters
> decoded using the default charset of the JDK running the second application.
> If the default charset differs between the JDK of the first application and
> the JDK of the second application, then the resulting text may be silently
> corrupted or incomplete, since these APIs replace erroneous input rather than
> fail.

It's even worse than that, because many OpenSSH installs are configured by default to [forward](https://man.openbsd.org/ssh_config.5#SendEnv) and [accept](https://man.openbsd.org/sshd_config.5#AcceptEnv) the user locale (see e.g. for [RHEL 7](https://access.redhat.com/solutions/974273)). So a single application, on a single remote machine, can be unknowingly started by a single user with different locales, and therefore different encodings, depending on how the user connected to the remote machine.
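
To make the failure mode concrete, here is a minimal sketch of what a charset mismatch between two sessions can do. It stands in for the `FileWriter`/`FileReader` pair with in-memory bytes, and it assumes one session ended up with UTF-8 and the other with US-ASCII. The class name `CharsetMismatchDemo` and the helper `roundTrip` are made up for illustration; only standard `java.nio.charset` and `String` APIs are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {

    // Hypothetical helper: encode with one charset and decode with another,
    // the way a writer and a reader relying on different default charsets would.
    static String roundTrip(String text, Charset writerCharset, Charset readerCharset) {
        byte[] bytes = text.getBytes(writerCharset);
        // new String(byte[], Charset) replaces malformed input with the
        // charset's replacement string instead of throwing.
        return new String(bytes, readerCharset);
    }

    public static void main(String[] args) {
        // What this particular session ended up with; values depend on the environment.
        System.out.println("LANG            = " + System.getenv("LANG"));
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // "café", written with a Unicode escape so the source file encoding does not matter.
        String original = "caf\u00e9";
        // Assumption: the writer used UTF-8 and the reader uses US-ASCII.
        String decoded = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.US_ASCII);
        System.out.println("round trip intact: " + original.equals(decoded));
    }
}
```

The last line prints `round trip intact: false`: the two UTF-8 bytes of `é` are each decoded as U+FFFD, and no exception is thrown. Running the same class from differently established SSH sessions is a quick way to see whether the first three lines change.
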
For example, connecting from Windows via PowerShell results in `LANG=en_US.UTF-8`, while connecting from WSL2 results in `LANG=C.UTF-8`. With Java 11 on a RHEL 7 machine, `file.encoding` is `UTF-8` in the first case but `ANSI_X3.4-1968` in the second, which makes the default charset `US-ASCII`.

It is also worth mentioning that `Charset.forName("default")` is just an alias for `US-ASCII`, per `sun.nio.cs.StandardCharsets$Aliases`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4733