luhenry commented on a change in pull request #32253:
URL: https://github.com/apache/spark/pull/32253#discussion_r619069785



##########
File path: mllib-local/pom.xml
##########
@@ -75,48 +75,12 @@
       <type>test-jar</type>
       <scope>test</scope>
     </dependency>
+
+    <dependency>
+      <groupId>dev.ludovic.netlib</groupId>
+      <artifactId>blas</artifactId>
+    </dependency>
   </dependencies>
-  <profiles>
-    <profile>
-      <id>netlib-lgpl</id>

Review comment:
       > First some background - I think the issue with licensing comes from 
the submodules that are pulled in by all, like: 
https://github.com/fommil/netlib-java/blob/master/native_ref/xbuilds/linux-x86_64/pom.xml
 netlib-java generates and compiles native code. So things like 
https://mvnrepository.com/artifact/com.github.fommil.netlib/netlib-native_system-linux-x86_64/1.1
 ship a .so and I believe it's that code or what it links to that is LGPL. I've 
honestly forgotten the details but seemed pretty clear (it may be, at least, 
libgfortran). At least, I recall going over this with a fine-toothed comb with 
Cloudera's legal department and they agreed, FWIW.
   
   I can confirm that both `netlib_ref` and `netlib_system` do link to 
`libgfortran` and which is licensed with LGPL.  See 
[[1]](https://github.com/fommil/netlib-java/blob/master/native_ref/xbuilds/linux-x86_64/pom.xml#L66)
 and 
[[2]](https://github.com/fommil/netlib-java/blob/master/native_system/xbuilds/linux-x86_64/pom.xml#L77).
   
   > Anyway, that's the reason behind all this bother with profiles. The 
wrapper in netlib-java is OK to include in distributions and compile against. 
But adding in the dependencies that bring in native code is not. Hence it's 
under a profile that downstream users can turn on for their own builds if 
desired.
   
   So if I understand correctly, the limitation is that any code that 
statically or dynamically links to any LGPL library is out of the picture for 
_distribution_ with Spark. And if any downstream user wants to do it, they can 
but the Open Source Spark project won't. Is that correct?
   
   > So, my question is, where is the native code coming in here to do the 
acceleration?
   
   The main difference that `dev.ludovic.netlib` brings compared to 
`com.github.fommil.netlib:native_ref` or even 
`com.github.fommil.netlib:native_system` is that it doesn't link explicitely to 
any native library (OpenBLAS, MKL, libgfortran) at build-time (there is no 
`-l...` linker flag). Instead, it probes at run-time (via `dlopen`/`dlsym`) for 
a library which name is `libblas.so` (or anything else specified by the user) 
and which exposes certain symbols (`cblas_dgemm`, `cblas_ddot`, for example). 
It doesn't distribute any such library and it doesn't link to any such library 
(statically or dynamically) or any of their dependencies.
   
   If the user wants to use OpenBLAS or Intel MKL, or any other BLAS-like 
library for that matter, it's up to them to install it on their system. Then, 
either the library is named `libblas.so` (or there is such a symbolic 
filesystem link) and is accessible via LD_LIBRARY_PATH, or the user specifies 
the name with `-Ddev.ludovic.netlib.blas.nativeLib` or full path with 
`-Ddev.ludovic.netlib.blas.nativeLibPath`.
   
   Overall `dev.ludovic.netlib` doesn't link to, distribute, nor even favor any 
specific native BLAS implementation. So, by extension, Spark doesn't link to, 
distribute, nor favor any specific native BLAS implementation or external 
library (like libgfortran) with this patch.
   
   > I understand Java 16+ has it built in, that's the whole point of your 
change, but we still want to let people flip on dependencies that will build in 
bindings to OpenBLAS, etc. Where would that be now?
   
   `dev.ludovic.netlib.blas.ForeignBLAS` does load an arbitrary `libblas.so` 
library and probes for symbols like `cblas_dgemm`, `cblas_ddot`, etc. It's 
using `dlopen`/`dlsym` in the JVM directly.
   
   > because I see you're just depending on netlib-java core, which doesn't 
have that. I'm asking if that's what blas is, etc, but not sure that's it 
either.
   
   `blas` contains two things:
   1. the java implementations of the BLAS libraries with fallbacks to F2jBLAS 
(that's VectorizedBLAS for example)
   2. the bindings for native BLAS libraries (with the Foreign Linker API for 
now, and JNI-based bindings in the future).
   
   The contention point is in `2. the binding for native BLAS libraries` AFAIU. 
If it's a blocker, I should be able to break it down into `blas` and 
`blas-native` packages for example.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to