Mark Grover created SPARK-10965:
-----------------------------------

             Summary: Optimize filesEqualRecursive
                 Key: SPARK-10965
                 URL: https://issues.apache.org/jira/browse/SPARK-10965
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.5.2
            Reporter: Mark Grover
            Priority: Minor

When we try to download dependencies and a file already exists at the destination, we check whether the two files are equal (recursively, if they are directories). For regular files, we compare their bytes. These dependencies can be jars, which can be really large, so byte-by-byte comparisons can be super slow. I think it'd be better to compare checksums instead.

Here's the code in question:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L500
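
To make the proposal concrete, here is a minimal sketch of a checksum-based comparison for two regular files. It is written in Java for illustration (the linked Utils code is Scala), and it is not Spark's implementation: the class name ChecksumCompare, the helper names sha256 and filesEqualByChecksum, the choice of SHA-256, and the 8 KiB buffer are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of Spark's Utils.
final class ChecksumCompare {

    // Streams the file through SHA-256 in 8 KiB chunks so that a large jar
    // is never held in memory all at once. The algorithm is an assumption.
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return digest.digest();
    }

    // Treats two regular files as equal when their sizes and digests match.
    static boolean filesEqualByChecksum(Path a, Path b)
            throws IOException, NoSuchAlgorithmException {
        // Cheap early exit: files of different length cannot be equal.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        return MessageDigest.isEqual(sha256(a), sha256(b));
    }
}
```

Note that hashing still reads every byte of both files, so most of the benefit would come from the size check and from being able to reuse a digest that was already computed (for example, one kept alongside the downloaded file) rather than rehashing each time.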