On 2021-10-26 16:20, Vitaly Davidovich wrote:
Hi Magnus,

On Tue, Oct 26, 2021 at 6:44 AM Magnus Ihse Bursie <magnus.ihse.bur...@oracle.com> wrote:

    On 2021-10-26 01:43, Vitaly Davidovich wrote:
    > Hi all,
    >
    > We're testing some of our code on Java 17 (17.0.1)/linux and hit an issue
    > related to libsvml.so.  It seems this library is now part of 17 to support
    > the (incubating) Vector API.  We have a java library backed (via JNI) by
    > NAG, which itself links against libsvml.so.  The issue arises due to
    > java.lang.UnsatisfiedLinkError when our java library is trying to call into
    > a NAG function which in turn is looking for a certain symbol from libsvml
    > (__svml_exp2_ha_mask in particular).
    >
    > It looks like the JDK is eagerly loading symbols from its packaged libsvml
    > (is there a way to disable that for now?).  That version of the library is
    > also in conflict with the one we want to load, as witnessed by the missing
    > symbol (there're probably others but we stopped testing at this point).
    >
    > Is this a known issue/compatibility hazard? Happy to hear thoughts/opinions
    > on this and provide further info, if needed.
    It sounds like there are two completely different libsvml.so around,
    muddying the waters. The libsvml.so shipped with the JDK is a core part
    of the vector functionality in the JDK. This is highly unlikely to be
    installed as a system library (basically, if anyone has ever managed to
    do that, they've worked hard to do the wrong thing).

    A quick googling indicates that there is a separate libsvml.so shipped
    by Intel as part of their compiler. My guess is that your JNI library is
    using this.

It's using libsvml from a NAG (vendor, https://www.nag.com/content/nag-library) library; the NAG library itself ships a version of Intel's MKL library, which is where libsvml comes from.  It looks like the JDK has its own subset of libsvml: https://github.com/openjdk/jdk/tree/master/src/jdk.incubator.vector/linux/native/libsvml. These all have Intel copyrights, so I assume they're based on the same product.  However, JDK's libsvml is a subset (in terms of total # of symbols) of MKL's libsvml.
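
One way to check which copy of libsvml.so actually ended up mapped into the JVM process on Linux is to look at /proc/<pid>/maps. The following is only a minimal Python sketch under that assumption; the function name and the sample paths in the comments are hypothetical and not taken from the JDK or from NAG.

```python
import os


def mapped_libraries(maps_text, name_fragment="libsvml"):
    """Return the distinct file paths listed in /proc/<pid>/maps content
    whose file name contains `name_fragment`, in order of first appearance."""
    paths = []
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if name_fragment in os.path.basename(path) and path not in paths:
            paths.append(path)
    return paths


# Hypothetical usage against a running JVM with process id `pid`:
#   with open(f"/proc/{pid}/maps") as f:
#       print(mapped_libraries(f.read()))
```

If only the JDK's path shows up, the NAG code is presumably being resolved against the JDK copy rather than the MKL one.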

Okay, maybe I'm mistaken here. I helped to create libsvml in the JDK by moving files out from src/hotspot, so I just assumed that these were independent products just happening to share the same name.

I've cc:ed Sandhya, who was involved in getting libsvml into the JDK, to shed some light on this.


    We've run into similar issues in the past. Unfortunately, there is no
    really good way of resolving this. :(

Sigh :(


    One long-term solution to minimize this kind of problem would be to
    rename our library "libjsvml.so", in the hope that this is less likely
    to clash with another library. This is similar to the approach we've
    taken in the past with other name-clashing JDK libraries.

This would need to be coupled with JDK dlopen()'ing this "private" lib with RTLD_LOCAL, right? Otherwise, it seems like one may end up with a hybrid situation, where some symbols would be resolved from libjsvml.so and others from libsvml.so.
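
To illustrate the mechanism being asked about: with RTLD_LOCAL, a library's symbols are not made available for resolving references in libraries loaded later, and the loader's caller looks symbols up through the returned handle instead. The sketch below uses Python's ctypes, which wraps dlopen()/dlsym(); it is not the JDK's loading code, and the helper names are made up for illustration.

```python
import ctypes
import os


def load_private(path):
    """dlopen() `path` with RTLD_LOCAL so its symbols are not added to the
    global scope used to resolve later-loaded libraries."""
    return ctypes.CDLL(path, mode=os.RTLD_NOW | os.RTLD_LOCAL)


def lookup(handle, symbol):
    """Resolve `symbol` through this handle (like dlsym(handle, symbol));
    return None if the symbol is not found."""
    try:
        return getattr(handle, symbol)
    except AttributeError:
        return None


# Hypothetical usage; the path is a placeholder, not a real JDK layout:
#   jsvml = load_private("/path/to/jdk/lib/libjsvml.so")
#   fn = lookup(jsvml, "__svml_exp2_ha_mask")  # None if this copy lacks it
```

Whether this alone avoids the hybrid situation also depends on how the other library's own references get bound, which this sketch does not address.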

If this is the case, then we might need to rename the symbols in libjsvml as well. Normally, if we export symbols, they are given a unique (or hopefully unique...) prefix, like JVM_ or Java_<class>_<method>. This was apparently not considered for the SVML lib, and if the code was copied with function names unchanged, this went from "unlikely to collide" to "destined to collide". :-(


In the short term (say for a 17.0.2 release), could there be a JVM flag added that could skip loading libsvml.so? I understand that effectively would disable the Vector API, but that's ok for us - we're not currently experimenting with that API.  Just trying to get 17 working with existing "stock" code :).

Hm...

Actually, from looking at the code, I think you can just delete libsvml.so from the JDK installation. The library is loaded dynamically at runtime, and if that load fails, I think the failure is just silently ignored.
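
The "try to load, carry on if it fails" pattern described here can be sketched as follows. This is a generic Python illustration of optional library loading, not the actual HotSpot code; the function name and path are hypothetical.

```python
import ctypes


def try_load_optional(path):
    """Attempt to load an optional native library.

    Returns the library handle, or None if the file is missing or cannot
    be loaded, so the caller can fall back to a non-accelerated path."""
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


# Hypothetical usage; the path is a placeholder:
#   svml = try_load_optional("/path/to/jdk/lib/libsvml.so")
#   use_vector_stubs = svml is not None
```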

/Magnus
