[ https://issues.apache.org/jira/browse/SPARK-41013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642001#comment-17642001 ]
Sean R. Owen commented on SPARK-41013:
--------------------------------------

Can you clarify the issue? This doesn't look like it relates directly to Spark, but the error message is truncated. We need to see the underlying cause.

> spark-3.1.2 job submitted in cluster mode fails with Could not initialize class com.github.luben.zstd.ZstdOutputStream
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-41013
>                 URL: https://issues.apache.org/jira/browse/SPARK-41013
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.1.2
>            Reporter: yutiantian
>            Priority: Major
>              Labels: libzstd-jni, spark.shuffle.mapStatus.compression.codec, zstd
>
> When a job is submitted in cluster mode with spark-3.1.2, it fails with
> Could not initialize class com.github.luben.zstd.ZstdOutputStream. The detailed log is as follows:
>
> Exception in thread "map-output-dispatcher-0" Exception in thread "map-output-dispatcher-2" java.lang.ExceptionInInitializerError: Cannot unpack libzstd-jni: No such file or directory
>     at java.io.UnixFileSystem.createFileExclusively(Native Method)
>     at java.io.File.createTempFile(File.java:2024)
>     at com.github.luben.zstd.util.Native.load(Native.java:97)
>     at com.github.luben.zstd.util.Native.load(Native.java:55)
>     at com.github.luben.zstd.ZstdOutputStream.<clinit>(ZstdOutputStream.java:16)
>     at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:223)
>     at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:910)
>     at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:233)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
>     at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:230)
>     at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:466)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Exception in thread "map-output-dispatcher-7" Exception in thread "map-output-dispatcher-5" java.lang.NoClassDefFoundError: Could not initialize class com.github.luben.zstd.ZstdOutputStream
>     at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:223)
>     at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:910)
>     at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:233)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
>     at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:230)
>     at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:466)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Exception in thread "map-output-dispatcher-4" Exception in thread "map-output-dispatcher-3" java.lang.NoClassDefFoundError: Could not initialize class com.github.luben.zstd.ZstdOutputStream
>     at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:223)
>     at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:910)
>     at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:233)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
>     at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:230)
>     at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:466)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> java.lang.NoClassDefFoundError: Could not initialize class com.github.luben.zstd.ZstdOutputStream
>     at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:223)
>     at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:910)
>     at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:233)
>     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
>     at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:230)
>     at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:466)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> However, the same code runs normally when submitted in client mode.
> A temporary workaround for cluster mode is to set
> spark.shuffle.mapStatus.compression.codec lz4 in spark-defaults.conf, after which the job submits normally.
> I would like to ask why zstd compression cannot be used during shuffle in cluster mode.
> Any ideas would be greatly appreciated.
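
One way to narrow down the underlying cause: the first trace fails inside java.io.File.createTempFile with "No such file or directory" while zstd-jni is unpacking its native library. A possible reading, not confirmed in this thread, is that the temp directory seen by the driver JVM in cluster mode is missing or not writable. The sketch below is a hypothetical standalone check (the class name TmpDirCheck and its method are illustrative, and it assumes the library unpacks into java.io.tmpdir); it only exercises the same JDK call that appears in the trace.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical diagnostic, not part of Spark or zstd-jni.
public class TmpDirCheck {

    // Returns true if a temp file can be created in dir, using the same
    // JDK call (File.createTempFile) that fails in the reported trace.
    static boolean canCreateTempFile(File dir) {
        try {
            File probe = File.createTempFile("probe", ".tmp", dir);
            probe.delete(); // clean up the probe file
            return true;
        } catch (IOException | SecurityException e) {
            System.err.println("Cannot create temp file in " + dir + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the native library is unpacked under java.io.tmpdir.
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("java.io.tmpdir = " + tmp.getAbsolutePath());
        System.out.println("exists = " + tmp.exists() + ", writable = " + tmp.canWrite());
        System.out.println("createTempFile works = " + canCreateTempFile(tmp));
    }
}
```

Running this on the node that hosts the driver in cluster mode, under the same user and JVM options as the job, would show whether the temp directory differs from the one used in client mode.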