[jira] [Updated] (SPARK-24113) --archives hdfs://some/path.zip#newname renaming no longer works

Peter Parente (JIRA) Fri, 27 Apr 2018 11:27:17 -0700

     [ https://issues.apache.org/jira/browse/SPARK-24113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Peter Parente updated SPARK-24113:
----------------------------------
    Description: 
In Spark versions before 2.3.0, appending #NAME to an archive path creates a 
symlink called NAME in executor YARN containers that points to the extracted 
archive. In Spark 2.3.0, the #NAME suffix is no longer honored and the symlink 
is named after the basename of the archive file instead.

For instance:
{code:java}
org.apache.spark.deploy.SparkSubmit
  --master yarn
  --deploy-mode client
  --conf spark.executor.memory=8G
  --conf spark.driver.memory=4g
  --conf spark.driver.maxResultSize=2g
  --conf spark.sql.catalogImplementation=hive
  --conf spark.executorEnv.PYSPARK_PYTHON=./CONDA/my-custom-env/bin/python
  --conf spark.driver.extraClassPath=./resources/conf
  --conf spark.executor.cores=5
  --conf spark.dynamicAllocation.maxExecutors=10
  --conf spark.sql.shuffle.partitions=2000
  --conf spark.dynamicAllocation.cachedExecutorIdleTimeout=30m
  --conf spark.shuffle.service.enabled=True
  --conf spark.executor.instances=1
  --conf spark.yarn.queue=notebook
  --conf spark.dynamicAllocation.enabled=True
  --conf spark.driver.extraJavaOptions=-Dmpi.conf.dir=./resources/conf
  --jars ./resources/PlatformToolkit.jar
  --keytab /home/p-pparente/.keytab
  --principal pare...@somewhere.com
  --archives hdfs:///some-path/my-custom-env.zip#CONDA
  --executor-memory 8G
  --executor-cores 5
  pyspark-shell
{code}
results in the following in executor containers in Spark 2.2.1 (which is 
correct):
{code:java}
lrwxrwxrwx 1 parente yarn   65 Apr 27 11:44 CONDA -> /mnt/disk1/yarn/local/filecache/6013/my-custom-env.zip
{code}
and results in the following in executor containers in Spark 2.3.0 (which 
appears to be a regression)
{code:java}
lrwxrwxrwx 1 parente yarn 65 Apr 27 11:51 my-custom-env.zip -> /mnt/disk4/yarn/local/filecache/1272/my-custom-env.zip
{code}
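The naming rule described above can be sketched in a few lines of Python. This is only an illustration of the behavior observed in Spark 2.2.1, not Spark's or YARN's actual implementation, and the helper name expected_link_name is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def expected_link_name(archive_uri: str) -> str:
    """Return the symlink name a container is expected to use for an archive.

    Hypothetical helper for illustration only; it is not part of Spark or YARN.
    """
    parts = urlsplit(archive_uri)
    # A non-empty fragment (the text after '#') is the requested alias.
    if parts.fragment:
        return parts.fragment
    # Without a fragment, fall back to the basename of the archive path.
    return posixpath.basename(parts.path)


# Expected (2.2.1) behavior: the fragment names the link.
print(expected_link_name("hdfs:///some-path/my-custom-env.zip#CONDA"))
# Name used when no fragment is given; 2.3.0 uses this even when one is given.
print(expected_link_name("hdfs:///some-path/my-custom-env.zip"))
```

With the archive from the report, the first call yields CONDA (the link name seen in 2.2.1) and the second yields my-custom-env.zip (the link name seen in 2.3.0 despite the #CONDA suffix).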

  was:
In spark < 2.3.0, using #NAME as part of an archive name results in a symlink 
of NAME in executor yarn containers pointing to the extracted archive. In spark 
2.3.0, the #NAME is no longer honored and the symlink is named after basename 
of the archive file instead.

For instance:
{code:java}
org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf 
spark.executor.memory=8G --conf spark.driver.memory=4g --conf 
spark.driver.maxResultSize=2g --conf spark.sql.catalogImplementation=hive 
--conf spark.executorEnv.PYSPARK_PYTHON=./CONDA/my-custom-env/bin/python --conf 
spark.driver.extraClassPath=./resources/conf --conf spark.executor.cores=5 
--conf spark.dynamicAllocation.maxExecutors=10 --conf 
spark.sql.shuffle.partitions=2000 --conf 
spark.dynamicAllocation.cachedExecutorIdleTimeout=30m --conf 
spark.shuffle.service.enabled=True --conf spark.executor.instances=1 --conf 
spark.yarn.queue=notebook --conf spark.dynamicAllocation.enabled=True --conf 
spark.driver.extraJavaOptions=-Dmpi.conf.dir=./resources/conf --jars 
./resources/PlatformToolkit.jar --keytab /home/p-pparente/.keytab --principal 
p-ppare...@prod.maxpoint.mgt --archives 
hdfs:///some-path/my-custom-env.zip#CONDA --executor-memory 8G --executor-cores 
5 pyspark-shell{code}
results in the following in executors containers in Spark 2.2.1 (which is 
correct)
{code:java}
lrwxrwxrwx 1 parente yarn   65 Apr 27 11:44 CONDA -> 
/mnt/disk1/yarn/local/filecache/6013/my-custom-env.zip{code}
and results in the following in executor containers in Spark 2.3.0 (which 
appears to be a regression)
{code:java}
lrwxrwxrwx 1 parente yarn 65 Apr 27 11:51 my-custom-env.zip -> 
/mnt/disk4/yarn/local/filecache/1272/my-custom-env.zip
{code}


> --archives hdfs://some/path.zip#newname renaming no longer works
> ----------------------------------------------------------------
>
>                 Key: SPARK-24113
>                 URL: https://issues.apache.org/jira/browse/SPARK-24113
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.3.0
>            Reporter: Peter Parente
>            Priority: Major



