Niels Pardon created SPARK-45557:
------------------------------------

             Summary: Spark Connect can not be started because of missing user home dir in Docker container
                 Key: SPARK-45557
                 URL: https://issues.apache.org/jira/browse/SPARK-45557
             Project: Spark
          Issue Type: Bug
          Components: Spark Docker
    Affects Versions: 3.5.0, 3.4.1, 3.4.0
            Reporter: Niels Pardon

I was trying to start Spark Connect within a container using the Spark Docker container images and ran into an issue where Ivy could not pull the Spark Connect JAR because the user home directory /home/spark does not exist.

Steps to reproduce:

1. Start the Spark container with `/bin/bash` as the command:
{code:java}
docker run -it --rm apache/spark:3.5.0 /bin/bash
{code}

2. Try to start Spark Connect within the container:
{code:java}
/opt/spark/sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.0
{code}

This leads to the following output:
{code:java}
starting org.apache.spark.sql.connect.service.SparkConnectServer, logging to /opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-d8470a71dbd7.out
failed to launch: nice -n 0 bash /opt/spark/bin/spark-submit --class org.apache.spark.sql.connect.service.SparkConnectServer --name Spark Connect server --packages org.apache.spark:spark-connect_2.12:3.5.0
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1535)
    at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
full log in /opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-d8470a71dbd7.out
{code}

The full log file looks like this:
{code:java}
Spark Command: /opt/java/openjdk/bin/java -cp /opt/spark/conf:/opt/spark/jars/* -Xmx1g -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.connect.service.SparkConnectServer --name Spark Connect server --packages org.apache.spark:spark-connect_2.12:3.5.0 spark-internal
========================================
:: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/spark/.ivy2/cache
The jars for the packages stored in: /home/spark/.ivy2/jars
org.apache.spark#spark-connect_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-f8a04936-e8af-4f37-bdb0-e4026a8a3be5;1.0
    confs: [default]
Exception in thread "main" java.io.FileNotFoundException: /home/spark/.ivy2/cache/resolved-org.apache.spark-spark-submit-parent-f8a04936-e8af-4f37-bdb0-e4026a8a3be5-1.0.xml (No such file or directory)
    at java.base/java.io.FileOutputStream.open0(Native Method)
    at java.base/java.io.FileOutputStream.open(Unknown Source)
    at java.base/java.io.FileOutputStream.<init>(Unknown Source)
    at java.base/java.io.FileOutputStream.<init>(Unknown Source)
    at org.apache.ivy.plugins.parser.xml.XmlModuleDescriptorWriter.write(XmlModuleDescriptorWriter.java:71)
    at org.apache.ivy.plugins.parser.xml.XmlModuleDescriptorWriter.write(XmlModuleDescriptorWriter.java:63)
    at org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.toIvyFile(DefaultModuleDescriptor.java:553)
    at org.apache.ivy.core.cache.DefaultResolutionCacheManager.saveResolvedModuleDescriptor(DefaultResolutionCacheManager.java:184)
    at org.apache.ivy.core.resolve.ResolveEngine.resolve(ResolveEngine.java:259)
    at org.apache.ivy.Ivy.resolve(Ivy.java:522)
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1535)
    at org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}

The issue is that the user home directory /home/spark does not exist:
{code:java}
$ ls -l /home
total 0
$
{code}

It seems there is an easy fix: switching from useradd to adduser in the Dockerfile should get the user home directory created.
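
Until the image is fixed, one possible workaround is to point Ivy at a directory that does exist. The sketch below is only an illustration under assumptions: `pick_ivy_dir` is a hypothetical helper that is not part of Spark or the Docker image, and `/tmp` is an arbitrary choice of fallback location. It relies on the `spark.jars.ivy` setting, which sets the Ivy user directory used for `--packages` resolution, and on `start-connect-server.sh` passing its arguments through to `spark-submit`, as the command line in the log above shows for `--packages`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): choose an Ivy directory.
# Prints "<home>/.ivy2" if the home directory $1 exists and is writable,
# otherwise creates and prints "<fallback>/.ivy2" under the directory $2.
pick_ivy_dir() {
    home_dir="$1"
    fallback_root="$2"
    if [ -d "$home_dir" ] && [ -w "$home_dir" ]; then
        printf '%s\n' "$home_dir/.ivy2"
    else
        mkdir -p "$fallback_root/.ivy2"
        printf '%s\n' "$fallback_root/.ivy2"
    fi
}

# Intended use inside the container (shown as comments, not executed here):
#   IVY_DIR="$(pick_ivy_dir /home/spark /tmp)"
#   /opt/spark/sbin/start-connect-server.sh \
#       --conf spark.jars.ivy="$IVY_DIR" \
#       --packages org.apache.spark:spark-connect_2.12:3.5.0
```

This only sidesteps the missing directory at launch time. The fix proposed above, creating the home directory when the image is built, remains the cleaner solution.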