[ https://issues.apache.org/jira/browse/SPARK-24113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Parente updated SPARK-24113:
----------------------------------
    Description:
In Spark < 2.3.0, using #NAME as part of an archive name results in a symlink named NAME in executor YARN containers pointing to the extracted archive. In Spark 2.3.0, the #NAME is no longer honored and the symlink is named after the basename of the archive file instead.

For instance:
{code:java}
org.apache.spark.deploy.SparkSubmit \
  --master yarn \
  --deploy-mode client \
  --conf spark.executor.memory=8G \
  --conf spark.driver.memory=4g \
  --conf spark.driver.maxResultSize=2g \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.executorEnv.PYSPARK_PYTHON=./CONDA/my-custom-env/bin/python \
  --conf spark.driver.extraClassPath=./resources/conf \
  --conf spark.executor.cores=5 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --conf spark.sql.shuffle.partitions=2000 \
  --conf spark.dynamicAllocation.cachedExecutorIdleTimeout=30m \
  --conf spark.shuffle.service.enabled=True \
  --conf spark.executor.instances=1 \
  --conf spark.yarn.queue=notebook \
  --conf spark.dynamicAllocation.enabled=True \
  --conf spark.driver.extraJavaOptions=-Dmpi.conf.dir=./resources/conf \
  --jars ./resources/PlatformToolkit.jar \
  --keytab /home/p-pparente/.keytab \
  --principal pare...@somewhere.com \
  --archives hdfs:///some-path/my-custom-env.zip#CONDA \
  --executor-memory 8G \
  --executor-cores 5 \
  pyspark-shell
{code}
results in the following in executor containers in Spark 2.2.1 (which is correct)
{code:java}
lrwxrwxrwx 1 parente yarn 65 Apr 27 11:44 CONDA -> /mnt/disk1/yarn/local/filecache/6013/my-custom-env.zip
{code}
and results in the following in executor containers in Spark 2.3.0 (which appears to be a regression)
{code:java}
lrwxrwxrwx 1 parente yarn 65 Apr 27 11:51 my-custom-env.zip -> /mnt/disk4/yarn/local/filecache/1272/my-custom-env.zip
{code}

  was:
In Spark < 2.3.0, using #NAME as part of an archive name results in a symlink named NAME in executor YARN containers pointing to the extracted archive.
In Spark 2.3.0, the #NAME is no longer honored and the symlink is named after the basename of the archive file instead.

For instance:
{code:java}
org.apache.spark.deploy.SparkSubmit \
  --master yarn \
  --deploy-mode client \
  --conf spark.executor.memory=8G \
  --conf spark.driver.memory=4g \
  --conf spark.driver.maxResultSize=2g \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.executorEnv.PYSPARK_PYTHON=./CONDA/my-custom-env/bin/python \
  --conf spark.driver.extraClassPath=./resources/conf \
  --conf spark.executor.cores=5 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --conf spark.sql.shuffle.partitions=2000 \
  --conf spark.dynamicAllocation.cachedExecutorIdleTimeout=30m \
  --conf spark.shuffle.service.enabled=True \
  --conf spark.executor.instances=1 \
  --conf spark.yarn.queue=notebook \
  --conf spark.dynamicAllocation.enabled=True \
  --conf spark.driver.extraJavaOptions=-Dmpi.conf.dir=./resources/conf \
  --jars ./resources/PlatformToolkit.jar \
  --keytab /home/p-pparente/.keytab \
  --principal p-ppare...@prod.maxpoint.mgt \
  --archives hdfs:///some-path/my-custom-env.zip#CONDA \
  --executor-memory 8G \
  --executor-cores 5 \
  pyspark-shell
{code}
results in the following in executor containers in Spark 2.2.1 (which is correct)
{code:java}
lrwxrwxrwx 1 parente yarn 65 Apr 27 11:44 CONDA -> /mnt/disk1/yarn/local/filecache/6013/my-custom-env.zip
{code}
and results in the following in executor containers in Spark 2.3.0 (which appears to be a regression)
{code:java}
lrwxrwxrwx 1 parente yarn 65 Apr 27 11:51 my-custom-env.zip -> /mnt/disk4/yarn/local/filecache/1272/my-custom-env.zip
{code}


> --archives hdfs://some/path.zip#newname renaming no longer works
> ----------------------------------------------------------------
>
>                 Key: SPARK-24113
>                 URL: https://issues.apache.org/jira/browse/SPARK-24113
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.3.0
>            Reporter: Peter Parente
>            Priority: Major
>
> In Spark < 2.3.0, using #NAME as part of an archive name results in a symlink
> named NAME in executor YARN containers pointing to the extracted archive. In
> Spark 2.3.0, the #NAME is no longer honored and the symlink is named after
> the basename of the archive file instead.
> For instance:
> {code:java}
> org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf
> spark.executor.memory=8G --conf spark.driver.memory=4g --conf
> spark.driver.maxResultSize=2g --conf spark.sql.catalogImplementation=hive
> --conf spark.executorEnv.PYSPARK_PYTHON=./CONDA/my-custom-env/bin/python
> --conf spark.driver.extraClassPath=./resources/conf --conf
> spark.executor.cores=5 --conf spark.dynamicAllocation.maxExecutors=10 --conf
> spark.sql.shuffle.partitions=2000 --conf
> spark.dynamicAllocation.cachedExecutorIdleTimeout=30m --conf
> spark.shuffle.service.enabled=True --conf spark.executor.instances=1 --conf
> spark.yarn.queue=notebook --conf spark.dynamicAllocation.enabled=True --conf
> spark.driver.extraJavaOptions=-Dmpi.conf.dir=./resources/conf --jars
> ./resources/PlatformToolkit.jar --keytab /home/p-pparente/.keytab --principal
> pare...@somewhere.com --archives hdfs:///some-path/my-custom-env.zip#CONDA
> --executor-memory 8G --executor-cores 5 pyspark-shell
> {code}
> results in the following in executor containers in Spark 2.2.1 (which is
> correct)
> {code:java}
> lrwxrwxrwx 1 parente yarn 65 Apr 27 11:44 CONDA ->
> /mnt/disk1/yarn/local/filecache/6013/my-custom-env.zip
> {code}
> and results in the following in executor containers in Spark 2.3.0 (which
> appears to be a regression)
> {code:java}
> lrwxrwxrwx 1 parente yarn 65 Apr 27 11:51 my-custom-env.zip ->
> /mnt/disk4/yarn/local/filecache/1272/my-custom-env.zip
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
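
The naming rule described in this report can be illustrated with a small Python sketch. It is not Spark's or YARN's implementation; both function names are hypothetical and exist only to contrast the link name the reporter expects (the URI fragment after `#`) with the one reported for Spark 2.3.0 (the archive's basename).

```python
from posixpath import basename
from urllib.parse import urlsplit


def expected_link_name(archive_uri: str) -> str:
    """Link name a user would expect for an --archives entry.

    Hypothetical helper, not part of Spark or YARN: the URI fragment
    (the text after '#') is used when present, otherwise the basename
    of the archive path.
    """
    parts = urlsplit(archive_uri)
    return parts.fragment or basename(parts.path)


def reported_link_name_2_3_0(archive_uri: str) -> str:
    """Mimic the behaviour reported for Spark 2.3.0: the fragment is ignored."""
    return basename(urlsplit(archive_uri).path)


if __name__ == "__main__":
    uri = "hdfs:///some-path/my-custom-env.zip#CONDA"
    print(expected_link_name(uri))        # name expected by the reporter
    print(reported_link_name_2_3_0(uri))  # name reported for Spark 2.3.0
```

With the archive URI from the report, the first function yields the `CONDA` name seen under Spark 2.2.1 and the second yields the `my-custom-env.zip` name seen under Spark 2.3.0, which is why a `PYSPARK_PYTHON` path starting with `./CONDA/` stops resolving.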