sagarlakshmipathy opened a new issue, #4924:
URL: https://github.com/apache/incubator-gluten/issues/4924
### Backend

VL (Velox)

### Bug description

[Expected behavior]: spark-shell starts successfully.

[Actual behavior]: java.lang.ClassNotFoundException: org.apache.spark.shuffle.sort.ColumnarShuffleManager

### Spark version

None

### Spark configurations

Spark 3.4.1

```
./spark-3.4.1-bin-hadoop3/bin/spark-shell --master yarn --deploy-mode client --jars https://github.com/apache/incubator-gluten/releases/download/v1.1.1/gluten-velox-bundle-spark3.4_2.12-1.1.1.jar --conf spark.plugins=io.glutenproject.GlutenPlugin --conf spark.memory.offHeap.enabled=true --conf spark.memory.offHeap.size=10g --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
```

### System information

EMR Amazon Linux 2

### Relevant logs

Spark 3.4.1 succeeds when you run

```bash
./spark-3.4.1-bin-hadoop3/bin/spark-shell --jars https://github.com/apache/incubator-gluten/releases/download/v1.1.1/gluten-velox-bundle-spark3.4_2.12-1.1.1.jar --conf spark.plugins=io.glutenproject.GlutenPlugin --conf spark.memory.offHeap.enabled=true --conf spark.memory.offHeap.size=10g --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
```

But fails with

```
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:62)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:428)
	at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend$.main(YarnCoarseGrainedExecutorBackend.scala:83)
	at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main(YarnCoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.shuffle.sort.ColumnarShuffleManager
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
	at org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2715)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:323)
	at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:212)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:477)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:63)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
	... 4 more

[2024-03-12 00:14:24.409]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
24/03/12 00:14:22 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 9293@ip-10-0-113-51
24/03/12 00:14:22 INFO SignalUtils: Registering signal handler for TERM
24/03/12 00:14:22 INFO SignalUtils: Registering signal handler for HUP
24/03/12 00:14:22 INFO SignalUtils: Registering signal handler for INT
24/03/12 00:14:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/03/12 00:14:23 INFO SecurityManager: Changing view acls to: yarn,hadoop
24/03/12 00:14:23 INFO SecurityManager: Changing modify acls to: yarn,hadoop
24/03/12 00:14:23 INFO SecurityManager: Changing view acls groups to:
24/03/12 00:14:23 INFO SecurityManager: Changing modify acls groups to:
24/03/12 00:14:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: yarn, hadoop; groups with view permissions: EMPTY; users with modify permissions: yarn, hadoop; groups with modify permissions: EMPTY
24/03/12 00:14:23 INFO TransportClientFactory: Successfully created connection to ip-10-0-97-161.us-west-2.compute.internal/10.0.97.161:46097 after 67 ms (0 ms spent in bootstraps)
24/03/12 00:14:24 INFO SecurityManager: Changing view acls to: yarn,hadoop
24/03/12 00:14:24 INFO SecurityManager: Changing modify acls to: yarn,hadoop
24/03/12 00:14:24 INFO SecurityManager: Changing view acls groups to:
24/03/12 00:14:24 INFO SecurityManager: Changing modify acls groups to:
24/03/12 00:14:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: yarn, hadoop; groups with view permissions: EMPTY; users with modify permissions: yarn, hadoop; groups with modify permissions: EMPTY
24/03/12 00:14:24 INFO TransportClientFactory: Successfully created connection to ip-10-0-97-161.us-west-2.compute.internal/10.0.97.161:46097 after 1 ms (0 ms spent in bootstraps)
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1894)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:62)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:428)
	at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend$.main(YarnCoarseGrainedExecutorBackend.scala:83)
	at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main(YarnCoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.shuffle.sort.ColumnarShuffleManager
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
	at org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2715)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:323)
	at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:212)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:477)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:63)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
	... 4 more
.
24/03/12 00:14:25 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 2 for reason Container from a bad node: container_1710201709010_0005_01_000004 on host: ip-10-0-77-29.us-west-2.compute.internal. Exit status: 1. Diagnostics: [2024-03-12 00:14:24.266]Exception from container-launch.
Container id: container_1710201709010_0005_01_000004
Exit code: 1
```

when you run

```
./spark-3.4.1-bin-hadoop3/bin/spark-shell --master yarn --deploy-mode client --jars https://github.com/apache/incubator-gluten/releases/download/v1.1.1/gluten-velox-bundle-spark3.4_2.12-1.1.1.jar --conf spark.plugins=io.glutenproject.GlutenPlugin --conf spark.memory.offHeap.enabled=true --conf spark.memory.offHeap.size=10g --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
```

This only happens with the Gluten project, because

```
./spark-3.4.1-bin-hadoop3/bin/spark-shell --master yarn --deploy-mode client
```

succeeds.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gluten.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org