[ https://issues.apache.org/jira/browse/SPARK-26400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcelo Vanzin resolved SPARK-26400.
------------------------------------
    Resolution: Won't Fix

Init containers were removed in 2.4, so standard Spark error handling for these errors applies.

> [k8s] Init container silently swallows errors when fetching jars from remote url
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-26400
>                 URL: https://issues.apache.org/jira/browse/SPARK-26400
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 2.3.2
>            Reporter: Stanis Shkel
>            Priority: Minor
>
> I run the following command:
> {code:bash}
> spark-2.3.2-bin-hadoop2.7/bin/spark-submit --name client \
>   --master "k8s://cluster" \
>   --deploy-mode cluster \
>   --conf spark.executor.instances=2 \
>   --conf spark.executor.memory=5G \
>   --conf spark.driver.memory=8G \
>   --conf spark.kubernetes.container.image=rego.azurecr.io/spark:spark-2.3.2-hadoop2.7 \
>   --class au.com.random.DoesntMatter \
>   "https://fake-link.com/jars/my.jar"
> {code}
> I expect the init container to fail to download the jar, so that the pod fails at the init stage. Instead I get a driver failure with the following message:
> {code:bash}
> ++ id -u
> + myuid=0
> ++ id -g
> + mygid=0
> ++ getent passwd 0
> + uidentry=root:x:0:0:root:/root:/bin/ash
> + '[' -z root:x:0:0:root:/root:/bin/ash ']'
> + SPARK_K8S_CMD=driver
> + '[' -z driver ']'
> + shift 1
> + SPARK_CLASSPATH=':/opt/spark/jars/*'
> + env
> + grep SPARK_JAVA_OPT_
> + sed 's/[^=]*=\(.*\)/\1/g'
> + sort -t_ -k4 -n
> + readarray -t SPARK_JAVA_OPTS
> + '[' -n /var/spark-data/spark-jars/my.jar:/var/spark-data/spark-jars/my.jar ']'
> + SPARK_CLASSPATH=':/opt/spark/jars/*:/var/spark-data/spark-jars/my.jar:/var/spark-data/spark-jars/my.jar'
> + '[' -n /var/spark-data/spark-files ']'
> + cp -R /var/spark-data/spark-files/. .
> + case "$SPARK_K8S_CMD" in
> + CMD=(${JAVA_HOME}/bin/java "${SPARK_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMORY -Xmx$SPARK_DRIVER_MEMORY -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS $SPARK_DRIVER_ARGS)
> + exec /sbin/tini -s -- /usr/lib/jvm/java-1.8-openjdk/bin/java -Dspark.master=k8s://kubernetes:443 -Dspark.app.id=spark-2f340a028a314e9cb0df8165d887bfb7 -Dspark.kubernetes.container.image=azure.azurecr.io/spark:spark-2.3.2-hadoop2.7 -Dspark.submit.deployMode=cluster -Dspark.driver.blockManager.port=7079 -Dspark.executor.memory=5G -Dspark.kubernetes.executor.podNamePrefix=client-f20f30e154a13624a728d6f56d45da3e -Dspark.jars=https://fake-link.com/jars/my.jar,https://fake-link.com/jars/my.jar -Dspark.driver.memory=8G -Dspark.driver.port=7078 -Dspark.kubernetes.driver.pod.name=client-f20f30e154a13624a728d6f56d45da3e-driver -Dspark.app.name=client -Dspark.kubernetes.initContainer.configMapKey=spark-init.properties -Dspark.executor.instances=2 -Dspark.driver.host=client-f20f30e154a13624a728d6f56d45da3e-driver-svc.default.svc -Dspark.kubernetes.initContainer.configMapName=client-f20f30e154a13624a728d6f56d45da3e-init-config -cp ':/opt/spark/jars/*:/var/spark-data/spark-jars/my.jar:/var/spark-data/spark-jars/my.jar' -Xms8G -Xmx8G -Dspark.driver.bindAddress=10.1.0.101 au.com.random.DoesntMatter
> Error: Could not find or load main class au.com.random.DoesntMatter
> {code}
> This happens because the spark-init container failed to download the dependencies but misreported its status.
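> To illustrate why the status gets misreported, here is a minimal, self-contained Scala sketch of the silently-failing-Future pattern (illustrative code only, not the actual Spark sources): an exception thrown inside a Future is captured in the future's result, and since nothing ever inspects that result, the main thread never sees the failure and the process exits 0.
> {code:scala}
> import java.util.concurrent.{Executors, TimeUnit}
>
> import scala.concurrent.{ExecutionContext, Future}
>
> object SwallowedError {
>   def main(args: Array[String]): Unit = {
>     val pool = Executors.newFixedThreadPool(4)
>     implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
>
>     // The exception is captured inside the Future's result; nothing ever
>     // inspects that result, so it is never re-thrown on the main thread.
>     Future[Unit] {
>       throw new RuntimeException("download failed")
>     }
>
>     // Waiting for the pool to drain does NOT surface the failure.
>     pool.shutdown()
>     pool.awaitTermination(1, TimeUnit.MINUTES)
>     println("Finished downloading application dependencies.") // still prints; JVM exits 0
>   }
> }
> {code}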
> Here is a log snippet from the spark-init container:
> {code:bash}
> ++ id -u
> + myuid=0
> ++ id -g
> + mygid=0
> ++ getent passwd 0
> + uidentry=root:x:0:0:root:/root:/bin/ash
> + '[' -z root:x:0:0:root:/root:/bin/ash ']'
> + SPARK_K8S_CMD=init
> + '[' -z init ']'
> + shift 1
> + SPARK_CLASSPATH=':/opt/spark/jars/*'
> + env
> + grep SPARK_JAVA_OPT_
> + sed 's/[^=]*=\(.*\)/\1/g'
> + sort -t_ -k4 -n
> + readarray -t SPARK_JAVA_OPTS
> + '[' -n '' ']'
> + '[' -n '' ']'
> + case "$SPARK_K8S_CMD" in
> + CMD=("$SPARK_HOME/bin/spark-class" "org.apache.spark.deploy.k8s.SparkPodInitContainer" "$@")
> + exec /sbin/tini -s -- /opt/spark/bin/spark-class org.apache.spark.deploy.k8s.SparkPodInitContainer /etc/spark-init/spark-init.properties
> 2018-12-18 21:15:41 INFO SparkPodInitContainer:54 - Starting init-container to download Spark application dependencies.
> 2018-12-18 21:15:41 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2018-12-18 21:15:41 INFO SecurityManager:54 - Changing view acls to: root
> 2018-12-18 21:15:41 INFO SecurityManager:54 - Changing modify acls to: root
> 2018-12-18 21:15:41 INFO SecurityManager:54 - Changing view acls groups to:
> 2018-12-18 21:15:41 INFO SecurityManager:54 - Changing modify acls groups to:
> 2018-12-18 21:15:41 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
> 2018-12-18 21:15:41 INFO SparkPodInitContainer:54 - Downloading remote jars: Some(https://fake-link.com/jars/my.jar,https://fake-link.com/jars/my.jar)
> 2018-12-18 21:15:41 INFO SparkPodInitContainer:54 - Downloading remote files: None
> 2018-12-18 21:15:42 INFO SparkPodInitContainer:54 - Finished downloading application dependencies.
> {code}
> I think the problem resides somewhere around here (I have little experience with Scala, so I might be wrong):
> [https://github.com/apache/spark/blob/branch-2.3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkPodInitContainer.scala#L80]
> If I remove the Future wrapper around fileFetcher I get the expected behaviour: my driver pod fails with Init:Error and the error is thrown properly. A sketch of what surfacing the failure could look like follows after the stack trace below.
> P.S. The error below is from a different command, where I was trying to pull from a blob, but it is a similar issue.
> {code:bash}
> Exception in thread "main" org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: Container qrefinery in account test.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
> at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:938)
> at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:438)
> at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1048)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
> at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1910)
> at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:700)
> at org.apache.spark.util.Utils$.fetchFile(Utils.scala:492)
> at org.apache.spark.deploy.k8s.FileFetcher.fetchFile(SparkPodInitContainer.scala:91)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1$$anonfun$apply$2.apply(SparkPodInitContainer.scala:81)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1$$anonfun$apply$2.apply(SparkPodInitContainer.scala:79)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1.apply(SparkPodInitContainer.scala:79)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1.apply(SparkPodInitContainer.scala:77)
> at scala.Option.foreach(Option.scala:257)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer.downloadFiles(SparkPodInitContainer.scala:77)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer.run(SparkPodInitContainer.scala:56)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer$.main(SparkPodInitContainer.scala:113)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer.main(SparkPodInitContainer.scala)
> Caused by: org.apache.hadoop.fs.azure.AzureException: Container qrefinery in account jr3e3d.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
> at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:730)
> at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:933)
> ... 22 more
> {code}
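> As promised above, here is a minimal sketch of what surfacing the failure could look like while keeping the downloads parallel (illustrative code only, not a patch against the actual SparkPodInitContainer; fetchFile here is a hypothetical stand-in for fileFetcher.fetchFile):
> {code:scala}
> import java.util.concurrent.Executors
>
> import scala.concurrent.duration._
> import scala.concurrent.{Await, ExecutionContext, Future}
>
> object SurfacedError {
>   // Hypothetical stand-in for the real fileFetcher.fetchFile call.
>   def fetchFile(url: String): Unit =
>     throw new RuntimeException(s"failed to fetch $url")
>
>   def main(args: Array[String]): Unit = {
>     val pool = Executors.newFixedThreadPool(4)
>     implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
>     try {
>       val files = Seq("https://fake-link.com/jars/my.jar")
>
>       // Keep a handle on every download future instead of discarding it...
>       val downloads: Seq[Future[Unit]] = files.map(f => Future(fetchFile(f)))
>
>       // ...and await each one: a failed download re-throws here, so main
>       // dies with a non-zero exit code and Kubernetes reports Init:Error.
>       downloads.foreach(f => Await.result(f, 5.minutes))
>     } finally {
>       pool.shutdown() // let the JVM exit either way
>     }
>   }
> }
> {code}
> Awaiting the futures preserves the concurrency but makes the init container's exit code reflect the first failed download, which is what the Init:Error behaviour described above relies on.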