[ 
https://issues.apache.org/jira/browse/SPARK-10965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946469#comment-14946469
 ] 

Sean Owen commented on SPARK-10965:
-----------------------------------

You don't need it to be assigned to you; just go ahead. I will add you as a 
"Contributor" here, which should grant that permission anyway.
Would you store the checksum? Computing it still entails reading the whole 
file. The way this method is written, a checksum wouldn't help, as the method 
only looks at each file once.
Files.equal at least compares by blocks of bytes, not byte by byte.
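
To make the trade-off concrete, here is a minimal sketch in Python (not the 
Spark or Guava code; the function names and the 8192-byte block size are 
assumptions chosen for illustration). A block-wise comparison can stop at the 
first differing block, while a checksum comparison has to read both files to 
the end every time unless the digest is stored somewhere.

```python
import hashlib
import os

BLOCK_SIZE = 8192  # assumed block size, for illustration only


def files_equal_by_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Compare two files block by block, stopping at the first difference."""
    # Files of different sizes cannot be equal, so no content is read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(block_size)
            block_b = fb.read(block_size)
            if block_a != block_b:
                return False  # early exit: the rest is never read
            if not block_a:
                return True  # both files exhausted without a mismatch


def sha256_of(path, block_size=BLOCK_SIZE):
    """Checksum a file; every byte is read regardless of the outcome."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def files_equal_by_checksum(path_a, path_b):
    """Compare two files by digest; both files are read in full."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For a one-off comparison the block-wise version never does more I/O than the 
checksum version, and it does less whenever the files differ early.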

> Optimize filesEqualRecursive
> ----------------------------
>
>                 Key: SPARK-10965
>                 URL: https://issues.apache.org/jira/browse/SPARK-10965
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.2
>            Reporter: Mark Grover
>            Priority: Minor
>
> When we try to download dependencies, if there is a file at the destination 
> already, we compare if the files are equal (recursively, if they are 
> directories). For files, we compare their bytes. Now, these dependencies can 
> be jars and be really large, and byte-by-byte comparisons can be super slow.
> I think it'd be better to do a checksum.
> Here's the code in question:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L500


