Cai-Yao opened a new issue, #12103:
URL: https://github.com/apache/gluten/issues/12103
### Backend
VL (Velox)
### Bug description
When running Spark SQL with Gluten enabled on Spark 4.0.1, the JVM crashes
with a fatal error (`SIGSEGV`) shortly after loading `libgluten.so` and
`libvelox.so`.
### Expected behavior
The query should run successfully (or fall back gracefully) without crashing the JVM.
### Actual behavior
The driver JVM crashes with `SIGSEGV` in `libjvm.so` on the JNI method lookup
path (`jni_GetMethodID`). The `hs_err_pid*.log` shows:
- `NoClassDefFoundError: Lorg/apache/gluten/memory/listener/ReservationListener;`
- native libraries loaded from the Gluten bundle (`libgluten.so`, `libvelox.so`)
- the process exiting due to a fatal JVM error
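One quick check that may help triage is whether the bundle jar actually contains the class named in the `NoClassDefFoundError`. The following is a minimal Python sketch, not part of Gluten; the helper names `descriptor_to_entry` and `jar_has_class` are hypothetical, and the jar path must be supplied by whoever runs it. The class name in the log is in descriptor form (`L...;`), so the sketch normalizes it before looking up the jar entry.

```python
import sys
import zipfile


def descriptor_to_entry(name: str) -> str:
    """Map 'Lorg/x/Y;', 'org/x/Y' or 'org.x.Y' to the jar entry 'org/x/Y.class'."""
    # Strip the JVM descriptor wrapper (leading 'L', trailing ';') if present.
    if name.startswith("L") and name.endswith(";"):
        name = name[1:-1]
    return name.replace(".", "/") + ".class"


def jar_has_class(jar_path: str, name: str) -> bool:
    """Return True if the jar (a zip archive) has an entry for the class."""
    with zipfile.ZipFile(jar_path) as jar:
        return descriptor_to_entry(name) in jar.namelist()


if __name__ == "__main__":
    # Usage: python check_jar.py <path-to-bundle.jar>
    # The class name below is copied from the hs_err log in this report.
    cls = "Lorg/apache/gluten/memory/listener/ReservationListener;"
    print(jar_has_class(sys.argv[1], cls))
```

If this prints `True`, the class is packaged in the jar and the failure is more likely about how or from which class loader the lookup happens; if it prints `False`, the bundle itself is missing the class. Either result only narrows the search and does not by itself identify the root cause.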
This issue report was drafted with assistance from AI.
### Gluten version
main branch
### Spark version
spark-4.0.x
### Spark configurations
Sanitized key configs used in reproduction:
--master local[80]
--class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
--jars
/<REDACTED_PATH>/gluten-velox-bundle-spark4.0_2.13-linux_amd64-1.6.0.jar
--conf spark.plugins=org.apache.gluten.GlutenPlugin
--conf spark.memory.offHeap.enabled=true
--conf spark.memory.offHeap.size=20g
--conf spark.driver.memory=24g
--conf spark.sql.shuffle.partitions=80
--conf spark.sql.crossJoin.enabled=true
--conf spark.sql.legacy.timeParserPolicy=LEGACY
--conf spark.sql.ansi.enabled=false
--conf spark.gluten.sql.columnar.forceShuffledHashJoin=true
--conf spark.sql.warehouse.dir=/<REDACTED_PATH>/warehouse
spark.driver.extraJavaOptions:
-Dderby.system.home=/<REDACTED_PATH>
-XX:+UseG1GC
### System information
JVM args (sanitized, key ones):
JDK: Temurin 17.0.19+10
--add-modules=jdk.incubator.vector
multiple --add-opens=...=ALL-UNNAMED
-Djdk.reflect.useDirectMethodHandle=false
-Dio.netty.tryReflectionSetAccessible=true
Sanitized environment details (from hs_err):
OS: Alibaba Cloud Linux 3
Kernel: 5.10.134-18.al8.x86_64
Arch: x86_64
CPU: Intel(R) Xeon(R) Platinum 8369B, 80 cores
Memory: ~114 GB
glibc: 2.32
Java: OpenJDK Temurin 17.0.19+10 (linux-amd64)
(If needed, I can run dev/info.sh and provide a redacted output.)
### Relevant logs
```bash
# A fatal error has been detected by the Java Runtime Environment:
# SIGSEGV (0xb)
# JRE version: OpenJDK Runtime Environment Temurin-17.0.19+10
# Java VM: OpenJDK 64-Bit Server VM ... linux-amd64
# Problematic frame:
# V [libjvm.so+0x28cda0] ... oop_access_barrier(void*)+0x0
Internal exceptions (20 events):
Event: 14.409 Thread ... Exception <a 'java/lang/NoClassDefFoundError' ...:
Lorg/apache/gluten/memory/listener/ReservationListener;>
thrown [src/hotspot/share/classfile/systemDictionary.cpp, line 245]
Event: 3.960 Loaded shared library .../linux/amd64/libgluten.so
Event: 4.996 Loaded shared library .../linux/amd64/libvelox.so
java_command: org.apache.spark.deploy.SparkSubmit ...
--jars
/<REDACTED_PATH>/gluten-velox-bundle-spark4.0_2.13-linux_amd64-1.6.0.jar ...
```
Reproduction notes:
The same machine runs the older Spark 3.3 + Gluten bundle without this crash.
The crash is observed with Spark 4.0.1 + Gluten 1.6.0 (Spark 4.0 / Scala 2.13
bundle).
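For anyone comparing several `hs_err_pid*.log` files, the "Internal exceptions" events can be pulled out mechanically. This is a rough Python sketch with a hypothetical helper name (`internal_exceptions`); the regular expression is an assumption based on the shape of the event line in the excerpt above, where some fields are elided, so it may need adjusting against a full log.

```python
import re
import sys

# Matches: Exception <a 'java/lang/SomeError'{0x...}: message>
# Assumption: the message contains no '>' character; events that carry
# no message at all are skipped.
_EVENT = re.compile(r"Exception <a '([^']+)'[^:>]*:\s*([^>]+)>")


def internal_exceptions(log_text: str) -> list[tuple[str, str]]:
    """Return (exception class, message) pairs found in hs_err log text."""
    return [(cls, msg.strip()) for cls, msg in _EVENT.findall(log_text)]


if __name__ == "__main__":
    # Usage: python hs_err_events.py <hs_err_pid*.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for cls, msg in internal_exceptions(fh.read()):
            print(f"{cls}: {msg}")
```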