On Wed, 14 Jul 2021 12:39:46 GMT, Giacomo Baso <github.com+12575901+gb...@openjdk.org> wrote:
> > Consider an application that creates a java.io.FileWriter with its
> > one-argument constructor and then uses it to write some text to a file. The
> > resulting file will contain a sequence of bytes encoded using the default
> > charset of the JDK running the application. A second application, run on a
> > different machine or by a different user on the same machine, creates a
> > java.io.FileReader with its one-argument constructor and uses it to read
> > the bytes in that file. The resulting text contains a sequence of
> > characters decoded using the default charset of the JDK running the second
> > application. If the default charset differs between the JDK of the first
> > application and the JDK of the second application, then the resulting text
> > may be silently corrupted or incomplete, since these APIs replace erroneous
> > input rather than fail.
>
> It's even worse than that, because many OpenSSH installs are configured by
> default to [forward](https://man.openbsd.org/ssh_config.5#SendEnv) and
> [accept](https://man.openbsd.org/sshd_config.5#AcceptEnv) the user locale
> (see e.g. for [RHEL 7](https://access.redhat.com/solutions/974273)).
>
> So a single application, on a single remote machine, can be unknowingly
> started by a single user with different locales, and therefore different
> encodings, depending on how the user connected to the remote machine. For
> example, on Windows connecting via powershell results in `LANG=en_US.UTF-8`,
> while using WSL2 results in `LANG=C.UTF-8`. On Java 11 in a RHEL7 machine,
> `file.encoding` results in `UTF-8` in the first case, but `ANSI_X3.4-1968` in
> the second, leading to a default charset `ASCII`.
>
> Worth mentioning is also that `Charset.forName("default")` is just an alias
> to `ASCII`, per `sun.nio.cs.StandardCharsets$Aliases`.

Thanks. Updated the JEP.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4733
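
For readers of the archive, the mismatch described in the quoted text can be reproduced with a minimal sketch like the one below. It is an illustration only, not code from the JEP or the PR: the class name `CharsetRoundTrip` and the helper `roundTrip` are hypothetical, and it assumes Java 11 or later, where `FileWriter` and `FileReader` have constructors that take an explicit `Charset`. The two explicit charsets stand in for the differing default charsets of the writing and reading JDKs.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: writes text with one charset and reads it back
// with another, standing in for two JDKs with different default charsets.
public class CharsetRoundTrip {

    static String roundTrip(String text, Charset writeCharset, Charset readCharset)
            throws IOException {
        File file = File.createTempFile("charset-demo", ".txt");
        try {
            // Passing the charset explicitly (Java 11+) instead of relying on the default.
            try (FileWriter writer = new FileWriter(file, writeCharset)) {
                writer.write(text);
            }
            StringBuilder result = new StringBuilder();
            try (FileReader reader = new FileReader(file, readCharset)) {
                int c;
                while ((c = reader.read()) != -1) {
                    result.append((char) c);
                }
            }
            return result.toString();
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo"; // contains one non-ASCII character

        // Same charset on both sides: the text survives the round trip.
        String same = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 -> UTF-8 intact: " + text.equals(same));

        // Reader decodes as US-ASCII: no exception is thrown, but the
        // non-ASCII bytes are replaced, so the text is silently corrupted.
        String mixed = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.US_ASCII);
        System.out.println("UTF-8 -> US-ASCII intact: " + text.equals(mixed));
    }
}
```

Passing the charset explicitly on both sides, as in the first call, keeps the result independent of `file.encoding` and of whatever locale the session happened to inherit.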