[jira] [Updated] (SPARK-45557) Spark Connect can not be started because of missing user home dir in Docker container

2023-10-16 Thread Niels Pardon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niels Pardon updated SPARK-45557:
---------------------------------
Description: 
I was trying to start Spark Connect within a container using the Spark Docker 
container images and ran into an issue where Ivy could not pull the Spark 
Connect JAR because the user home directory /home/spark does not exist.
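
The failure mechanism: Ivy derives its default cache location from the JVM's 
{{user.home}}, which resolves to /home/spark (see "Ivy Default Cache set to: 
/home/spark/.ivy2/cache" in the log below), so Ivy cannot write its resolution 
report there. An illustrative check inside the container (assuming the image's 
passwd entry points at /home/spark, which is consistent with that log line):
{code:java}
# the spark user's home directory is defined but never created in the image
getent passwd spark   # expected to list /home/spark as the home directory
ls -ld /home/spark    # expected: No such file or directory
{code}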

Steps to reproduce:

1. Start the Spark container with `/bin/bash` as the command:
{code:java}
docker run -it --rm apache/spark:3.5.0 /bin/bash {code}
2. Try to start Spark Connect within the container:

{code:java}
/opt/spark/sbin/start-connect-server.sh --packages 
org.apache.spark:spark-connect_2.12:3.5.0 {code}
which led to the following output:

{code:java}
starting org.apache.spark.sql.connect.service.SparkConnectServer, logging to 
/opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-d8470a71dbd7.out
failed to launch: nice -n 0 bash /opt/spark/bin/spark-submit --class 
org.apache.spark.sql.connect.service.SparkConnectServer --name Spark Connect 
server --packages org.apache.spark:spark-connect_2.12:3.5.0
    at 
org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1535)
    at 
org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185)
    at 
org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334)
    at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964)
    at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
full log in 
/opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-d8470a71dbd7.out
 {code}
The full log file then looks like this:
{code:java}
Spark Command: /opt/java/openjdk/bin/java -cp /opt/spark/conf:/opt/spark/jars/* 
-Xmx1g -XX:+IgnoreUnrecognizedVMOptions 
--add-opens=java.base/java.lang=ALL-UNNAMED 
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED 
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED 
--add-opens=java.base/java.io=ALL-UNNAMED 
--add-opens=java.base/java.net=ALL-UNNAMED 
--add-opens=java.base/java.nio=ALL-UNNAMED 
--add-opens=java.base/java.util=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED 
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED 
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED 
--add-opens=java.base/sun.security.action=ALL-UNNAMED 
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED 
--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED 
-Djdk.reflect.useDirectMethodHandle=false org.apache.spark.deploy.SparkSubmit 
--class org.apache.spark.sql.connect.service.SparkConnectServer --name Spark 
Connect server --packages org.apache.spark:spark-connect_2.12:3.5.0 
spark-internal

:: loading settings :: url = 
jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/spark/.ivy2/cache
The jars for the packages stored in: /home/spark/.ivy2/jars
org.apache.spark#spark-connect_2.12 added as a dependency
:: resolving dependencies :: 
org.apache.spark#spark-submit-parent-f8a04936-e8af-4f37-bdb0-e4026a8a3be5;1.0
confs: [default]
Exception in thread "main" java.io.FileNotFoundException: 
/home/spark/.ivy2/cache/resolved-org.apache.spark-spark-submit-parent-f8a04936-e8af-4f37-bdb0-e4026a8a3be5-1.0.xml
 (No such file or directory)
at java.base/java.io.FileOutputStream.open0(Native Method)
at java.base/java.io.FileOutputStream.open(Unknown Source)
at java.base/java.io.FileOutputStream.<init>(Unknown Source)
at java.base/java.io.FileOutputStream.<init>(Unknown Source)
at 
org.apache.ivy.plugins.parser.xml.XmlModuleDescriptorWriter.write(XmlModuleDescriptorWriter.java:71)
at 
org.apache.ivy.plugins.parser.xml.XmlModuleDescriptorWriter.write(XmlModuleDescriptorWriter.java:63)
at 
org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.toIvyFile(DefaultModuleDescriptor.java:553)
at 
org.apache.ivy.core.cache.DefaultResolutionCacheManager.saveResolvedModuleDescriptor(DefaultResolutionCacheManager.java:184)
at 
org.apache.ivy.core.resolve.ResolveEngine.resolve(ResolveEngine.java:259)
at org.apache.ivy.Ivy.resolve(Ivy.java:522)
at 
org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1535)
at ...
 {code}
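
A possible workaround, rather than a fix: Spark's documented {{spark.jars.ivy}} 
property lets you point Ivy at a writable directory instead of the missing home 
directory. A sketch (the /tmp/.ivy2 path is just an example):
{code:java}
docker run -it --rm apache/spark:3.5.0 /bin/bash
# inside the container: redirect the Ivy cache to a writable location
/opt/spark/sbin/start-connect-server.sh \
  --packages org.apache.spark:spark-connect_2.12:3.5.0 \
  --conf spark.jars.ivy=/tmp/.ivy2
{code}
With that setting Ivy should write its cache and resolution reports under 
/tmp/.ivy2, avoiding the FileNotFoundException above; the underlying issue 
remains that the image defines /home/spark as the spark user's home without 
creating it.
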
[jira] [Created] (SPARK-45557) Spark Connect can not be started because of missing user home dir in Docker container

2023-10-16 Thread Niels Pardon (Jira)
Niels Pardon created SPARK-45557:


 Summary: Spark Connect can not be started because of missing user 
home dir in Docker container
 Key: SPARK-45557
 URL: https://issues.apache.org/jira/browse/SPARK-45557
 Project: Spark
  Issue Type: Bug
  Components: Spark Docker
Affects Versions: 3.5.0, 3.4.1, 3.4.0
Reporter: Niels Pardon

