On Wed, 12 Oct 2022 17:00:15 GMT, Andrew Haley <a...@openjdk.org> wrote:

>> A bug in GCC causes shared libraries linked with -ffast-math to disable 
>> denormal arithmetic. This breaks Java's floating-point semantics.
>> 
>> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522
>> 
>> One solution is to save and restore the floating-point control word around 
>> System.loadLibrary(). This isn't perfect, because some shared library might 
>> load another shared library at runtime, but it's a lot better than what we 
>> do now. 
>> 
>> However, this fix is not complete. `dlopen()` is called from many places in 
>> the JDK. I guess the best thing to do is find and wrap them all. I'd like to 
>> hear people's opinions.
>
> Andrew Haley has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8295159: DSO created with -ffast-math breaks Java floating-point arithmetic

So, IMO the discussion boils down to how we want the JVM to handle a 
misbehaving native library.

The ABI lists MXCSR as a callee-saved register, so there's nothing wrong on the 
JVM side from that perspective.

From a quality-of-implementation perspective, though, the JVM could do a better 
job of catching broken libraries. Of course, there are numerous ways for native 
code to break the JVM, but in this particular case the problem looks trivial to 
catch. The question is how much overhead we can afford to introduce for that, 
and whether the fix should be opt-in (e.g., `-Xcheck:jni` or 
`-XX:+AlwaysRestoreFPU`/`-XX:+RestoreMXCSROnJNICalls`), opt-out 
(unconditionally recover or report an error when the FP environment is 
corrupted, optionally providing a way to turn that off), or a band-aid that 
only fixes the immediate problem with GCC's fast-math mode.

I'd like to dissuade us from going with just a band-aid fix (we have already 
been through that multiple times, with varying degrees of success) and instead 
try to improve the overall experience the JVM provides. A band-aid feels like 
just pushing the problem further away, and it would be very unfortunate to 
repeat the very same exercise in the future.

My preferred solution would be to automatically detect the corruption and 
restore the MXCSR register across a JNI call. If that turns out to be too 
expensive, the JVM could instead check for MXCSR corruption after every JNI 
call and crash with a message giving diagnostic details about where the 
corruption happened (the library and entry point), suggesting 
`-XX:+AlwaysRestoreFPU`/`-XX:+RestoreMXCSROnJNICalls` as a stopgap. That would 
send users a clear signal that something is wrong with their code or 
environment, while still giving them a way to work around the problem while 
they fix the underlying issue.
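
To make the detect-and-restore idea concrete, here is a minimal C sketch of 
such a guard around a native call. It is only an illustration, not what 
HotSpot does: it assumes x86 with SSE, and `call_with_mxcsr_guard`, 
`native_fn`, and the two sample callees are hypothetical names made up for 
this example. Only the control bits of MXCSR are compared and restored; the 
status flags are left as the callee set them.

```c
#include <stdbool.h>
#include <stdio.h>
#include <xmmintrin.h>  /* _mm_getcsr / _mm_setcsr, x86 SSE only */

/* MXCSR layout: bits 0-5 are sticky status flags; bit 6 is DAZ,
 * bits 7-12 are exception masks, bits 13-14 are rounding control,
 * bit 15 is FTZ. Only the control bits (6-15) are checked here. */
#define MXCSR_CONTROL_MASK 0xFFC0u
#define MXCSR_DAZ          0x0040u
#define MXCSR_FTZ          0x8000u

/* Hypothetical stand-in for a JNI entry point. */
typedef void (*native_fn)(void);

/* Calls fn, then compares the MXCSR control bits with their values
 * before the call. If they differ, reports the change and restores
 * the original control bits. Returns true if fn modified them. */
static bool call_with_mxcsr_guard(native_fn fn, const char *name)
{
    unsigned int before = _mm_getcsr();
    fn();
    unsigned int after = _mm_getcsr();

    if (((before ^ after) & MXCSR_CONTROL_MASK) == 0)
        return false;  /* control bits intact, nothing to do */

    fprintf(stderr, "%s changed MXCSR control bits: 0x%04x -> 0x%04x\n",
            name, before & MXCSR_CONTROL_MASK, after & MXCSR_CONTROL_MASK);

    /* Keep the callee's status flags, restore the caller's control bits. */
    _mm_setcsr((after & ~MXCSR_CONTROL_MASK) | (before & MXCSR_CONTROL_MASK));
    return true;
}

/* Simulates the effect described above: a callee that leaves
 * flush-to-zero and denormals-are-zero switched on. */
static void misbehaving_native(void)
{
    _mm_setcsr(_mm_getcsr() | MXCSR_FTZ | MXCSR_DAZ);
}

/* A callee that leaves the FP environment alone. */
static void well_behaved_native(void)
{
}
```

Calling `call_with_mxcsr_guard(misbehaving_native, "libfoo")` would report the 
change and put the control bits back, while the well-behaved callee passes 
through silently. A check-and-crash variant would replace the `_mm_setcsr` 
call with an abort plus the diagnostic message.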

That said, I'd like to stress that I'm perfectly fine with addressing the 
general issue of misbehaving native libraries separately (if we agree it's 
worth it), and I trust @dholmes-ora and @theRealAph to choose the most 
appropriate fix for this particular bug.

-------------

PR: https://git.openjdk.org/jdk/pull/10661
