Mark Grover created SPARK-10965:
-----------------------------------

             Summary: Optimize filesEqualRecursive
                 Key: SPARK-10965
                 URL: https://issues.apache.org/jira/browse/SPARK-10965
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.5.2
            Reporter: Mark Grover
            Priority: Minor


When we download dependencies and a file already exists at the destination,
we check whether the two are equal (recursively, if they are directories). For
regular files, we compare their contents byte by byte. These dependencies can
be jars, which may be really large, so byte-by-byte comparisons can be super slow.

I think it'd be better to compare checksums instead.
Here's the code in question:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L500
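A minimal sketch of the checksum idea in Scala follows. It assumes SHA-256 from java.security.MessageDigest as the checksum. The object name ChecksumCompare and its methods are hypothetical and are not Spark's actual Utils code. The sketch compares file lengths first and only hashes the files when the lengths match.

```scala
import java.io.{File, FileInputStream}
import java.security.MessageDigest

// Hypothetical helper, not Spark's Utils API: checksum-based file comparison.
object ChecksumCompare {

  // Streams the file through SHA-256 in 64 KiB chunks and returns the digest.
  def sha256(file: File): Array[Byte] = {
    val digest = MessageDigest.getInstance("SHA-256")
    val in = new FileInputStream(file)
    try {
      val buffer = new Array[Byte](64 * 1024)
      var n = in.read(buffer)
      while (n != -1) {
        digest.update(buffer, 0, n)
        n = in.read(buffer)
      }
    } finally {
      in.close()
    }
    digest.digest()
  }

  // Recursively compares two files or directories.
  def filesEqualRecursive(file1: File, file2: File): Boolean = {
    if (file1.isDirectory && file2.isDirectory) {
      // Sort children by name so the two listings line up pairwise.
      val subfiles1 = file1.listFiles().sortBy(_.getName)
      val subfiles2 = file2.listFiles().sortBy(_.getName)
      subfiles1.length == subfiles2.length &&
        subfiles1.zip(subfiles2).forall { case (f1, f2) =>
          f1.getName == f2.getName && filesEqualRecursive(f1, f2)
        }
    } else if (file1.isFile && file2.isFile) {
      // Cheap check first: files of different length cannot be equal.
      file1.length() == file2.length() &&
        MessageDigest.isEqual(sha256(file1), sha256(file2))
    } else {
      false
    }
  }
}
```

Hashing both files still reads every byte of each, just as a byte-wise comparison does. Most of the savings would therefore come from the length short-circuit, or from caching the digest of the file already at the destination so that only the downloaded copy needs hashing.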



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
