This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 2b9902d  [SPARK-35831][YARN][TEST-MAVEN] Handle PathOperationException in copyFileToRemote on the same src and dest
2b9902d is described below

commit 2b9902d26a5b7e3aeecfed3aa21744d1d2016d26
Author: Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Mon Jun 21 23:28:27 2021 +0800

    [SPARK-35831][YARN][TEST-MAVEN] Handle PathOperationException in copyFileToRemote on the same src and dest
    
    ### What changes were proposed in this pull request?
    
    This PR aims to make Spark more robust against changes in the underlying Hadoop library. Apache Spark's `copyFileToRemote` has a `force` option that always performs the copy, and in some Hadoop versions this can hit `org.apache.hadoop.fs.PathOperationException` when the source and destination are the same file.
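
    For illustration, here is a minimal, self-contained sketch (not part of this patch; the file path is hypothetical) of how copying a file onto itself surfaces this exception with a Hadoop version that includes HADOOP-16878:

    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path, PathOperationException}

    val conf = new Configuration()
    val fs = FileSystem.getLocal(conf)      // local file system, scheme "file"
    val jar = new Path("/tmp/testJar.jar")  // hypothetical existing local file
    try {
      // With HADOOP-16878, copying a file onto itself throws instead of succeeding.
      FileUtil.copy(fs, jar, fs, jar, false, conf)
    } catch {
      case e: PathOperationException =>
        println(s"Benign self-copy rejected: ${e.getMessage}")
    }
    ```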
    
    In Apache Hadoop 3.3.1, [HADOOP-16878](https://issues.apache.org/jira/browse/HADOOP-16878) was reverted as the last revert commit on `branch-3.3.1`, but it is still present in Apache Hadoop 3.4.0:
    - https://github.com/apache/hadoop/commit/a3b9c37a397ad4188041dd80621bdeefc46885f2
    
    ### Why are the changes needed?
    
    Currently, Apache Spark's Jenkins builds hit a flaky test failure:
    - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2/lastCompletedBuild/testReport/org.apache.spark.deploy.yarn/ClientSuite/distribute_jars_archive/history/
    - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/2459/testReport/junit/org.apache.spark.deploy.yarn/ClientSuite/distribute_jars_archive/
    
    ```
    org.apache.hadoop.fs.PathOperationException:
    `Source (file:/home/jenkins/workspace/spark-master-test-maven-hadoop-3.2/resource-managers/yarn/target/tmp/spark-703b8e99-63cc-4ba6-a9bc-25c7cae8f5f9/testJar9120517778809167117.jar) and destination (/home/jenkins/workspace/spark-master-test-maven-hadoop-3.2/resource-managers/yarn/target/tmp/spark-703b8e99-63cc-4ba6-a9bc-25c7cae8f5f9/testJar9120517778809167117.jar) are equal in the copy command.': Operation not supported
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:403)
    ```
    
    Apache Spark's `copyFileToRemote` has three cases to consider:
    - `!compareFs(srcFs, destFs)`: This is safe because the source and destination file systems differ, so this exception cannot occur.
    - `"file".equals(srcFs.getScheme)`: This is safe because this cannot be a false alarm.
    - `force=true`:
        - For a genuine failure, Spark behaves the same way as before.
        - For a false alarm (source equal to destination), Spark is safe because `force = true` is used only for copying `localConfArchive`, not for a general copy between two arbitrary clusters, as shown below.
    
    ```scala
    val localConfArchive = new Path(createConfArchive(confsToOverride).toURI())
    copyFileToRemote(destDir, localConfArchive, replication, symlinkCache, force = true,
      destName = Some(LOCALIZED_CONF_ARCHIVE))
    ```
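
    The patch below distinguishes this benign case by comparing fully qualified paths. A rough standalone sketch of why that check works (not from the patch; the paths are hypothetical): `makeQualified` resolves the scheme and authority, so differently written references to the same file compare equal.

    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.getLocal(new Configuration())
    // Both forms qualify to file:/tmp/app.jar, so a same-src-and-dest copy
    // can be detected even when the two paths are written differently.
    val a = fs.makeQualified(new Path("/tmp/app.jar"))
    val b = fs.makeQualified(new Path("file:///tmp/app.jar"))
    assert(a.equals(b))
    ```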
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. This preserves the previous Apache Spark behavior.
    
    ### How was this patch tested?
    
    Pass the Jenkins tests with Maven.
    
    Closes #32983 from dongjoon-hyun/SPARK-35831.
    
    Authored-by: Dongjoon Hyun <dongj...@apache.org>
    Signed-off-by: Gengliang Wang <gengli...@apache.org>
---
 .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 427202f..364bc3b 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -401,7 +401,13 @@ private[spark] class Client(
     if (force || !compareFs(srcFs, destFs) || "file".equals(srcFs.getScheme)) {
       destPath = new Path(destDir, destName.getOrElse(srcPath.getName()))
       logInfo(s"Uploading resource $srcPath -> $destPath")
-      FileUtil.copy(srcFs, srcPath, destFs, destPath, false, hadoopConf)
+      try {
+        FileUtil.copy(srcFs, srcPath, destFs, destPath, false, hadoopConf)
+      } catch {
+        // HADOOP-16878 changes the behavior to throw exceptions when src equals dest
+        case e: PathOperationException
+            if srcFs.makeQualified(srcPath).equals(destFs.makeQualified(destPath)) =>
+      }
       destFs.setReplication(destPath, replication)
       destFs.setPermission(destPath, new FsPermission(APP_FILE_PERMISSION))
     } else {
