On Sun, 6 Jun 2021 22:25:44 GMT, David Holmes <dhol...@openjdk.org> wrote:
>> I think we need to establish some common ground before proceeding further >> with this fix. It's a bit of a long read; please, bear with me. >> >> The path name starts its life as a `jstring` in >> `Java_jdk_internal_loader_NativeLibraries_load()`, its encoding is >> irrelevant at this point. >> >> Next, the name has to be passed down to `JVM_LoadLibrary()` that takes >> `char*`. So we need to convert form `jstring` to `char*` (point (a)). >> Following that, `os::dll_load()` that actually performs loading in a >> platform-specific manner also receives `char*`. All platform implementations >> of `os::dll_load()` pass the path name down to their respective platform's >> APIs unmodified, but I think that's just incidental and here we have another >> possible point of conversion (point (b)). Other consumers of the path name >> are exception(c) and logging(d) messages; they also take `char*`, but >> potentially of a different encoding. >> >> Let me try to enumerate all conceivably valid conversions for >> `JVM_LoadLibrary()` consumption (point (a)): >> 1. jstring -> platform-specific encoding (status quo meaning possibly lossy >> encoding on Windows and UTF-8 elsewhere AFAICT), >> 2. jstring -> modified UTF-8, >> 3. jstring -> UTF-8. >> >> This bug [8195129](https://bugs.openjdk.java.net/browse/JDK-8195129) occurs >> because conversion (1) may loose information on Windows if the platform >> encoding happens to be NOT UTF-8 (which it often - or even always - is). So >> that's a no-go and we are left with either (2) or (3). >> >> On MacOS and Linux, "platform" encoding already is UTF-8 and since all the >> platform APIs happily consume UTF-8, no further conversion is necessary >> (neither for actual library loading, nor for log or exception messages; the >> latter have to convert to UTF-16, but do that under the hood). >> >> On Windows, we require at least these variants of the path name: >> 1. UTF16 for library loading (Unicode Windows API), >> 2. "platform" encoding for logging (yes, loosing information here, but >> that's tolerable), >> 3. "platform" (lossy) or UTF8 (lossless) encoding for exception messages >> (prefer lossless). >> >> This is what's behind my choice of UTF-8 for the path name encoding as it >> gets passed down to `JVM_LoadLibrary()`. We can go with modified UTF-8, of >> course, in which case all platforms - not just Windows - will have to do the >> conversion on their own, loosing the benefit of the knowledge about the >> original string encoding (the String.coder field of jstring). > > @mkartashev thank you for the detailed explanation. > > It is not clear to me that the JDK's conformance to being a Unicode > application has significantly changed since the evaluation of JDK-8017274 - > @naotoj can you comment on that and related discussion from the CCC for > JDK-4958170 ? In particular I'm not sure that using the platform encoding is > wrong, nor how we can have a path that cannot be represented by the platform > encoding? > > Not being an expert in this area I cannot evaluate the affects of these > shared code changes on other platforms, and so am reluctant to introduce any > change that affects any non-Windows platforms. Also the JVM and JNI work with > modified-UTF8 so I do not think we should diverge from that. > I would hate to see windows specific code introduced into the JDK or JVM's > shared code for these APIs, but that may be the only choice to avoid > potential disruption to other platforms. Though perhaps we could push the > initial conversion down into the JVM? @dholmes-ora Sorry, I don't think anything has changed as to the encoding as of JDK-8017274. For some reason, I had the impression that JVM_LoadLibrary() accepts UTF-8 (either modified or standard), but that was not correct. It is using the platform encoded string for the pathname. @mkartashev As you mentioned in another comment, the only way to fix this issue is to pass UTF-8 down to JVM_LoadLibray, but I don't think it is feasible. One reason is the effort is too great, and the other is that all VM implementations would need to be modified. ------------- PR: https://git.openjdk.java.net/jdk/pull/4169