[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-18 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-67601524
  
This looks good to me, so I'm ready to merge it.  One final little thing, 
though: since we have two separate PRs that address SPARK-3967, I'd like to 
create a new JIRA for this fix just in case this doesn't fix the YARN issue in 
the original ticket.

Therefore, I've opened https://issues.apache.org/jira/browse/SPARK-4896.  
Do you mind editing this PR's title to reference that JIRA instead?  Once you 
do that, I'll merge this into all of the maintenance branches and resolve the 
original JIRA.

Thanks for your patience during this long review and with the build 
difficulties.  I think that the end result here is a significant improvement 
over the old `fetchFile` code, so I'm looking forward to merging this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-16 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-67253690
  
Thanks a lot for the build-process brain-dump, @JoshRosen; some of that is 
way better than processes I've wasted a lot of time on. I'm interested in 
getting all of these tips and tricks pushed to Spark docs, but even after all 
of the emailing your response includes tricks I've not seen documented anywhere 
(e.g. `SPARK_PREPEND_CLASSES`), so I'm leery of spending time trying to 
document what I've learned for fear that I'm still not at the bottom of the 
rabbit hole.

In any case, I was able to debug and (I believe) fix the test failure, 
which was a result of having inadvertently changed a file copy to a move, 
so it's a good thing the test caught it.

This should be ready to re-test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-16 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-67257189
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-67257529
  
  [Test build #24518 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24518/consoleFull)
 for   PR 2848 at commit 
[`c14daff`](https://github.com/apache/spark/commit/c14daffb3051ec53921cc9e5532d95c89a433c3b).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-67264665
  
  [Test build #24518 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24518/consoleFull)
 for   PR 2848 at commit 
[`c14daff`](https://github.com/apache/spark/commit/c14daffb3051ec53921cc9e5532d95c89a433c3b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-67264671
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24518/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-15 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-67106447
  
Disclaimer: for iterative debugging, I use `sbt` to build Spark, not Maven. 
 Spark 1.2.0 has docs on building with SBT.  If possible, I'd switch to using 
that workflow.

The issue here is probably that you're running a full `mvn clean` and 
starting over from scratch after each change.  I'd like to help move this PR 
along, so the following is going to be some interactive logs to see whether I 
can quickly iterate on this using Maven.

Let's say that I'm starting from a completely cold build (but with Maven 
dependencies already downloaded):

```bash
# Since I have zinc installed, I'll use it:
zinc -start
git checkout /a/branch/with/your/pr/code
# Here, the -T C1 says build in parallel with one thread per core:
time mvn -T C1 clean package -DskipTests
```

This didn't take _super_ long, but it was a few minutes:

```
real4m19.537s
user3m14.634s
sys 0m16.882s
```

Let's run just the test suite that we're interested in ([instructions from 
here](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools):

```
time mvn -T C1 test -DwildcardSuites=org.apache.spark.FileServerSuite
```

This took a little while because it had to build a bunch of test sources, 
but it was only a few seconds before the tests started running (and failing):

```
real0m33.968s
user0m36.544s
sys 0m3.032s

FileServerSuite:
- Distributing files locally
- Distributing files locally security On *** FAILED ***
  java.io.FileNotFoundException: 
/var/folders/0k/2qp2p2vs7bv033vljnb8nk1cgn/T/spark-7206dcdd-9db6-4d78-ac9e-2de69619dc67/test/FileServerSuite.txt
 (No such file or directory)
  at java.io.FileInputStream.open(Native Method)
  at java.io.FileInputStream.init(FileInputStream.java:146)
  at com.google.common.io.Files$FileByteSource.openStream(Files.java:124)
  at com.google.common.io.Files$FileByteSource.openStream(Files.java:114)
  at com.google.common.io.ByteSource.copyTo(ByteSource.java:202)
  at com.google.common.io.Files.copy(Files.java:436)
  at org.apache.spark.HttpFileServer.addFileToDir(HttpFileServer.scala:72)
  at org.apache.spark.HttpFileServer.addFile(HttpFileServer.scala:55)
  at org.apache.spark.SparkContext.addFile(SparkContext.scala:965)
  at 
org.apache.spark.FileServerSuite$$anonfun$3.apply$mcV$sp(FileServerSuite.scala:96)
  ...
- Distributing files locally using URL as input *** FAILED ***
  java.io.FileNotFoundException: 
/var/folders/0k/2qp2p2vs7bv033vljnb8nk1cgn/T/spark-7206dcdd-9db6-4d78-ac9e-2de69619dc67/test/FileServerSuite.txt
 (No such file or directory)
  at java.io.FileInputStream.open(Native Method)
  at java.io.FileInputStream.init(FileInputStream.java:146)
  at com.google.common.io.Files$FileByteSource.openStream(Files.java:124)
  at com.google.common.io.Files$FileByteSource.openStream(Files.java:114)
  at com.google.common.io.ByteSource.copyTo(ByteSource.java:202)
  at com.google.common.io.Files.copy(Files.java:436)
  at org.apache.spark.HttpFileServer.addFileToDir(HttpFileServer.scala:72)
  at org.apache.spark.HttpFileServer.addFile(HttpFileServer.scala:55)
  at org.apache.spark.SparkContext.addFile(SparkContext.scala:965)
  at 
org.apache.spark.FileServerSuite$$anonfun$5.apply$mcV$sp(FileServerSuite.scala:112)
  ...
 - Dynamically adding JARS locally
- Distributing files on a standalone cluster *** FAILED ***
  java.io.FileNotFoundException: 
/var/folders/0k/2qp2p2vs7bv033vljnb8nk1cgn/T/spark-7206dcdd-9db6-4d78-ac9e-2de69619dc67/test/FileServerSuite.txt
 (No such file or directory)
  at java.io.FileInputStream.open(Native Method)
  at java.io.FileInputStream.init(FileInputStream.java:146)
  at com.google.common.io.Files$FileByteSource.openStream(Files.java:124)
  at com.google.common.io.Files$FileByteSource.openStream(Files.java:114)
  at com.google.common.io.ByteSource.copyTo(ByteSource.java:202)
  at com.google.common.io.Files.copy(Files.java:436)
  at org.apache.spark.HttpFileServer.addFileToDir(HttpFileServer.scala:72)
  at org.apache.spark.HttpFileServer.addFile(HttpFileServer.scala:55)
  at org.apache.spark.SparkContext.addFile(SparkContext.scala:965)
  at 
org.apache.spark.FileServerSuite$$anonfun$8.apply$mcV$sp(FileServerSuite.scala:137)
  ...
- Dynamically adding JARS on a standalone cluster
- Dynamically adding JARS on a standalone cluster using local: URL
```

Let's try adding a print statement to `SparkContext.addFile`, then 
re-running the tests.  We could do this by re-packaging:

```
mvn -T C1 package -DskipTests
```

```
[INFO] BUILD SUCCESS
[INFO] 

[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-13 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-66904556
  
interesting. this may be more appropriate on my [gigantic thread about 
testing/development best 
practices](http://apache-spark-developers-list.1001551.n3.nabble.com/Spurious-test-failures-testing-best-practices-td9560.html),
 but do you have any recommendations for debugging this?

AFAICT, this seems to be a test that requires a full assembly-build to run; 
I've tried running `mvn -Dsuites=FileServerSuite test` on a branch cut from the 
same `master` that this PR is cut from 
(b004150adb503ddbb54d5cd544e39ad974497c41) and I get strange failure modes, 
usually just that the test seems to hang for hours. I ran a `mvn package 
-DskipTests` to build a fresh assembly, and tried again, and it still hung. I'm 
currently waiting on a `mvn clean package -DskipTests`, at which point I will 
try again. Debugging / iterating on a particular (integration) test, like I 
need to do here, remains a prohibitively slow process for me.

I'm guessing this failure must have something to do with having moved / 
removed (temp?) files that this test was relying on; my naive approach will be 
to comment out some of the move/removal lines and see if that fixes the test, 
but on pain of running a `mvn clean package -DskipTests` every time.

Is there a better way?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-12 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-66865120
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-66865189
  
  [Test build #24431 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24431/consoleFull)
 for   PR 2848 at commit 
[`bcf2543`](https://github.com/apache/spark/commit/bcf2543de693ce6bf8a9294e0af028d2e1be3ccc).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-66866426
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24431/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-66866422
  
  [Test build #24431 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24431/consoleFull)
 for   PR 2848 at commit 
[`bcf2543`](https://github.com/apache/spark/commit/bcf2543de693ce6bf8a9294e0af028d2e1be3ccc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-12 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-66868021
  
Hmm, maybe a legitimate test failure?

```
[info] - Distributing files locally security On *** FAILED *** (82 
milliseconds)
[info]   java.io.FileNotFoundException: 
/tmp/spark-27c67e3e-d5f5-4d86-b9fa-948319ea6b7a/test/FileServerSuite.txt (No 
such file or directory)
[info]   at java.io.FileInputStream.open(Native Method)
[info]   at java.io.FileInputStream.init(FileInputStream.java:146)
[info]   at 
com.google.common.io.Files$FileByteSource.openStream(Files.java:127)
[info]   at 
com.google.common.io.Files$FileByteSource.openStream(Files.java:117)
[info]   at com.google.common.io.ByteSource.copyTo(ByteSource.java:247)
[info]   at com.google.common.io.Files.copy(Files.java:458)
[info]   at 
org.apache.spark.HttpFileServer.addFileToDir(HttpFileServer.scala:72)
[info]   at org.apache.spark.HttpFileServer.addFile(HttpFileServer.scala:55)
[info]   at org.apache.spark.SparkContext.addFile(SparkContext.scala:965)
[info]   at 
org.apache.spark.FileServerSuite$$anonfun$3.apply$mcV$sp(FileServerSuite.scala:96)
[info]   at 
org.apache.spark.FileServerSuite$$anonfun$3.apply(FileServerSuite.scala:90)
[info]   at 
org.apache.spark.FileServerSuite$$anonfun$3.apply(FileServerSuite.scala:90)
[info]   at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
[info]   at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
[info]   at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[...]
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-11 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21710708
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-11 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21710662
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
--- End diff --

done. I can't get intellij to do this kind of spacing by default, have you 
guys succeeded in making it do so?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-11 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21710743
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
+  }
+
+  /**
+   * Copy file from `sourceFile` to `destFile`, checking whether 
`destFile` already exists, has
+   * the same contents as the downloaded file, and can be overwritten. 
Optionally removes
+   * `sourceFile` by moving instead of copying.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param sourceFile File path to copy/move from.
+   * @param destFile File path to copy/move to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   * @param removeSourceFile Whether to remove `sourceFile` after / as 
part of moving/copying it to
+   * `destFile`.
+   */
+  private def copyFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean,
+removeSourceFile: Boolean = false): Unit = {
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-11 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21710876
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
+  }
+
+  /**
+   * Copy file from `sourceFile` to `destFile`, checking whether 
`destFile` already exists, has
+   * the same contents as the downloaded file, and can be overwritten. 
Optionally removes
+   * `sourceFile` by moving instead of copying.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param sourceFile File path to copy/move from.
+   * @param destFile File path to copy/move to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   * @param removeSourceFile Whether to remove `sourceFile` after / as 
part of moving/copying it to
+   * `destFile`.
+   */
+  private def copyFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean,
+removeSourceFile: Boolean = false): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
+  logInfo(
+sFile $destFile exists and does not match contents of $url, 
replacing it with $url
+  )
+} else {
+  throw new SparkException(
+sFile $destFile exists and does not match contents of $url)
+}
+  } else {
+// Do nothing if the file contents are the same, i.e. this file 
has been copied
+// previously.
+logInfo(
+  s${sourceFile.getAbsolutePath} has been previously copied to  +
+destFile.getAbsolutePath
--- End diff --

`  s${sourceFile.getAbsolutePath} has been previously copied to 
${destFile.getAbsolutePath}` is 101 chars long. I don't see a way to use 
interpolation and not concatenation while staying = 100 chars. please advise 
@andrewor14 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-11 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21711105
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
+  }
+
+  /**
+   * Copy file from `sourceFile` to `destFile`, checking whether 
`destFile` already exists, has
+   * the same contents as the downloaded file, and can be overwritten. 
Optionally removes
+   * `sourceFile` by moving instead of copying.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param sourceFile File path to copy/move from.
+   * @param destFile File path to copy/move to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   * @param removeSourceFile Whether to remove `sourceFile` after / as 
part of moving/copying it to
+   * `destFile`.
+   */
+  private def copyFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean,
+removeSourceFile: Boolean = false): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
+  logInfo(
+sFile $destFile exists and does not match contents of $url, 
replacing it with $url
+  )
+} else {
+  throw new SparkException(
+sFile $destFile exists and does not match contents of $url)
+}
+  } else {
+// Do nothing if the file contents are the same, i.e. this file 
has been copied
+// previously.
+logInfo(
+  s${sourceFile.getAbsolutePath} has been previously copied to  +
+destFile.getAbsolutePath
--- End diff --

see also: [previous diff that used only 
concatenation](https://github.com/apache/spark/pull/2848#discussion-diff-21350437R429)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-11 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21711212
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
+  }
+
+  /**
+   * Copy file from `sourceFile` to `destFile`, checking whether 
`destFile` already exists, has
+   * the same contents as the downloaded file, and can be overwritten. 
Optionally removes
+   * `sourceFile` by moving instead of copying.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param sourceFile File path to copy/move from.
+   * @param destFile File path to copy/move to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   * @param removeSourceFile Whether to remove `sourceFile` after / as 
part of moving/copying it to
+   * `destFile`.
+   */
+  private def copyFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean,
+removeSourceFile: Boolean = false): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
+  logInfo(
+sFile $destFile exists and does not match contents of $url, 
replacing it with $url
+  )
+} else {
+  throw new SparkException(
+sFile $destFile exists and does not match contents of $url)
+}
+  } else {
+// Do nothing if the file contents are the same, i.e. this file 
has been copied
+// previously.
+logInfo(
+  s${sourceFile.getAbsolutePath} has been previously copied to  +
+destFile.getAbsolutePath
--- End diff --

I've changed it to a version that uses `String.format`:
```
logInfo(
  %s has been previously copied to %s.format(
sourceFile.getAbsolutePath,
destFile.getAbsolutePath
  )
)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-11 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-66693008
  
still adding the requested changes, stay tuned


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-11 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21712018
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
+  }
+
+  /**
+   * Copy file from `sourceFile` to `destFile`, checking whether 
`destFile` already exists, has
+   * the same contents as the downloaded file, and can be overwritten. 
Optionally removes
+   * `sourceFile` by moving instead of copying.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param sourceFile File path to copy/move from.
+   * @param destFile File path to copy/move to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   * @param removeSourceFile Whether to remove `sourceFile` after / as 
part of moving/copying it to
+   * `destFile`.
+   */
+  private def copyFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean,
+removeSourceFile: Boolean = false): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
+  logInfo(
+sFile $destFile exists and does not match contents of $url, 
replacing it with $url
+  )
+} else {
+  throw new SparkException(
+sFile $destFile exists and does not match contents of $url)
+}
+  } else {
+// Do nothing if the file contents are the same, i.e. this file 
has been copied
+// previously.
+logInfo(
+  s${sourceFile.getAbsolutePath} has been previously copied to  +
+destFile.getAbsolutePath
--- End diff --

ok by the way this is not super important so it's ok if you don't fix it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-11 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21720826
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
+  }
+
+  /**
+   * Copy file from `sourceFile` to `destFile`, checking whether 
`destFile` already exists, has
+   * the same contents as the downloaded file, and can be overwritten. 
Optionally removes
+   * `sourceFile` by moving instead of copying.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param sourceFile File path to copy/move from.
+   * @param destFile File path to copy/move to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   * @param removeSourceFile Whether to remove `sourceFile` after / as 
part of moving/copying it to
+   * `destFile`.
+   */
+  private def copyFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean,
+removeSourceFile: Boolean = false): Unit = {
+
+var shouldCopy = true
--- End diff --

cool, I've moved to that. we rearranged the logic deck-chairs of this block 
during an earlier review pass and settled on the `var shouldCopy` approach, but 
I can accept the `return` as being marginally less aesthetically unappealing :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-11 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-66716295
  
OK, I added logic for making sure we always delete `tempFile` and throw an 
exception if deletion of an existing `sourceFile` fails; I think I hit all the 
other feedback as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21556089
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
--- End diff --

(but keep the Unit return type!)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21556631
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
+  }
+
+  /**
+   * Copy file from `sourceFile` to `destFile`, checking whether 
`destFile` already exists, has
+   * the same contents as the downloaded file, and can be overwritten. 
Optionally removes
+   * `sourceFile` by moving instead of copying.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param sourceFile File path to copy/move from.
+   * @param destFile File path to copy/move to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   * @param removeSourceFile Whether to remove `sourceFile` after / as 
part of moving/copying it to
+   * `destFile`.
+   */
+  private def copyFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean,
+removeSourceFile: Boolean = false): Unit = {
--- End diff --

indent these by 2 spaces


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21556581
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
+  }
+
+  /**
+   * Copy file from `sourceFile` to `destFile`, checking whether 
`destFile` already exists, has
+   * the same contents as the downloaded file, and can be overwritten. 
Optionally removes
+   * `sourceFile` by moving instead of copying.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param sourceFile File path to copy/move from.
+   * @param destFile File path to copy/move to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   * @param removeSourceFile Whether to remove `sourceFile` after / as 
part of moving/copying it to
+   * `destFile`.
+   */
+  private def copyFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean,
+removeSourceFile: Boolean = false): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
--- End diff --

what if `delete` fails? We'll still log the message it already exists, 
replacing. We should throw an exception or something if `delete` returns false.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21556716
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
+  }
+
+  /**
+   * Copy file from `sourceFile` to `destFile`, checking whether 
`destFile` already exists, has
+   * the same contents as the downloaded file, and can be overwritten. 
Optionally removes
+   * `sourceFile` by moving instead of copying.
--- End diff --

Can you elaborate on what happens when `destFile` already exists, when it 
has the same contents as the downloaded file, and when it cannot be 
overwritten? Also, if you followed my suggestion to throw an exception if 
delete fails, we should describe that behavior here as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21556751
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
--- End diff --

Same here, this java doc should be more elaborate.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21556861
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
--- End diff --

Hm, but you don't always move the file. Also, from the name 
`downloadStreamAndMove` it's not clear what you're moving. I think it actually 
just makes sense to call this `downloadFile`, and describe in great detail when 
the file will actually be moved.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21557057
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
+  }
+
+  /**
+   * Copy file from `sourceFile` to `destFile`, checking whether 
`destFile` already exists, has
+   * the same contents as the downloaded file, and can be overwritten. 
Optionally removes
+   * `sourceFile` by moving instead of copying.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param sourceFile File path to copy/move from.
+   * @param destFile File path to copy/move to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   * @param removeSourceFile Whether to remove `sourceFile` after / as 
part of moving/copying it to
+   * `destFile`.
+   */
+  private def copyFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean,
+removeSourceFile: Boolean = false): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
+  logInfo(
+sFile $destFile exists and does not match contents of $url, 
replacing it with $url
+  )
+} else {
+  throw new SparkException(
+sFile $destFile exists and does not match contents of $url)
+}
+  } else {
+// Do nothing if the file contents are the same, i.e. this file 
has been copied
+// previously.
+logInfo(
+  s${sourceFile.getAbsolutePath} has been previously copied to  +
+destFile.getAbsolutePath
--- End diff --

weird mix of interpolation + concatenation here. Should just use 
interpolation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-09 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21557323
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
+  }
+
+  /**
+   * Copy file from `sourceFile` to `destFile`, checking whether 
`destFile` already exists, has
+   * the same contents as the downloaded file, and can be overwritten. 
Optionally removes
+   * `sourceFile` by moving instead of copying.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param sourceFile File path to copy/move from.
+   * @param destFile File path to copy/move to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   * @param removeSourceFile Whether to remove `sourceFile` after / as 
part of moving/copying it to
+   * `destFile`.
+   */
+  private def copyFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean,
+removeSourceFile: Boolean = false): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
--- End diff --

This is a good point; the old code (which was a mess) didn't handle this 
error case, but we might as well fix it here.  I'm in favor of throwing an 
exception.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21501389
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
--- End diff --

Mind removing this extra blank line?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21501367
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
--- End diff --

Fairly minor style nit, but do you mind indenting these method parameters 
by two more spaces, like `fetchFile` above:

```
   def fetchFile(
   url: String,
   targetDir: File,
   conf: SparkConf,
   securityMgr: SecurityManager,
   hadoopConf: Configuration,
   timestamp: Long,
   useCache: Boolean) {
 val fileName = url.split(/).last
```

This is a really minor style point that I've been guilty of overlooking 
myself: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21501514
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
+
+  }
+
+  /**
+   * Copy file from `sourceFile` to `destFile`, checking whether 
`destFile` already exists, has
+   * the same contents as the downloaded file, and can be overwritten. 
Optionally removes
+   * `sourceFile` by moving instead of copying.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param sourceFile File path to copy/move from.
+   * @param destFile File path to copy/move to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   * @param removeSourceFile Whether to remove `sourceFile` after / as 
part of moving/copying it to
+   * `destFile`.
+   */
+  private def copyFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean,
+removeSourceFile: Boolean = false): Unit = {
+
+var shouldCopy = true
--- End diff --

This is super-nitpicky of me, but I love to avoid mutability whenever 
possible, so it would be nice to see if there was a clean way to remove this 
variable.  It looks like `shouldCopy=false` only in the case where the file 
contents are the same, so maybe we could just add a `return` on that branch and 
can remove the `shouldCopy` variable entirely.  This would let us remove the 
`if` on line 478,


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21501964
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
--- End diff --

This should probably be surrounded in a `try-catch` block so that we delete 
`tempFile` if `copyFile` throws an exception.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21502123
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +408,85 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from `in` to `tempFile`, then move it to `destFile`, 
checking whether
+   * `destFile` already exists, has the same contents as the downloaded 
file, and can be
+   * overwritten.
+   *
+   * @param url URL that `sourceFile` originated from, for logging 
purposes.
+   * @param in InputStream to download.
+   * @param tempFile File path to download `in` to.
+   * @param destFile File path to move `tempFile` to.
+   * @param fileOverwrite Whether to delete/overwrite an existing 
`destFile` that does not match
+   *  `sourceFile`
+   */
+  private def downloadStreamAndMove(
+url: String,
+in: InputStream,
+tempFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+copyFile(url, tempFile, destFile, fileOverwrite, removeSourceFile = 
true)
--- End diff --

I noticed this because there was a `tempFile.delete()` call in the old code 
that wasn't preserved here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-08 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-66222053
  
Left a couple more comments.  This is looking _much_ better than the code 
that was there before; thanks for your patience with the really late review.  I 
have to run now, but I'll loop back later to take a more detailed look later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-05 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21386174
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
--- End diff --

sure, will do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-05 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21386186
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
--- End diff --

ditto


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-05 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21386332
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
--- End diff --

Yes. The idea was to abstract out this duplicated code from ~4 other places 
in this file, all of which has a log message that includes the url, without 
changing any functionality.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-05 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21386398
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+var shouldCopy = true
--- End diff --

geez, seems like I had a bad merge at some point and dropped an important 
code path in this function; e.g. the `Files.move` call is gone. will audit and 
fix


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-05 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21386447
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
--- End diff --

yea, merge bug I think; looking


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-05 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21386501
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
+  logInfo((File %s exists and does not match contents of %s,  +
--- End diff --

sure, done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-05 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21386652
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
+  logInfo((File %s exists and does not match contents of %s,  +
+replacing it with %s).format(destFile, url, url))
--- End diff --

one would think so; this is how it was at 
[L459](https://github.com/apache/spark/pull/2848/files#diff-d239aee594001f8391676e1047a0381eL459),
 
[L477](https://github.com/apache/spark/pull/2848/files#diff-d239aee594001f8391676e1047a0381eL477),
 and 
[L506](https://github.com/apache/spark/pull/2848/files#diff-d239aee594001f8391676e1047a0381eL506).
 Not obvious to me what the original intent was but I'll try to figure it out 
and do something more sensible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-05 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21386945
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
+  logInfo((File %s exists and does not match contents of %s,  +
+replacing it with %s).format(destFile, url, url))
+} else {
+  throw new SparkException(
+File  + destFile +  exists and does not match contents of 
+   + url)
+}
+  } else {
+// Do nothing if the file contents are the same, i.e. this file 
has been copied
+// previously.
+logInfo(sourceFile.getAbsolutePath +  has been previously copied 
to 
--- End diff --

done, though it went to 101 chars and I had to wrap it, which is not the 
clearest, but whatevs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-05 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-65827581
  
Thanks for the review pass, @JoshRosen.

As I mentioned in some of the comments, this was attempting to shoehorn 4 
basically-identical blocks of code from elsewhere in this file into one common 
code path (it started with 1 or 2 and then grew to encompass the others as 
previous reviewers suggested that that was desirable).

The PR has been mostly sitting around for many weeks and apparently one of 
the maintenance merges I did removed some key bits; I'll fix those and bump 
this again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-05 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21389032
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
+  logInfo((File %s exists and does not match contents of %s,  +
+replacing it with %s).format(destFile, url, url))
--- End diff --

on reading it again, I think the original code (that predates me) made 
sense: it asserting that the file exists and does not match `url`, therefore we 
are replacing the thing that already exists with what is at `url`, because that 
is what we want it to match.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-05 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-65834877
  
OK @JoshRosen I fixed and cleaned things up.

* the two overloaded `maybeMoveFile` signatures are more distinctly named 
(`downloadStreamAndMove` and `copyFile`)
* each is called twice, and the former actually uses the latter, so I think 
there are some overall code-reuse wins, in addition to the don't move/copy if 
the thing is already there bug fix happening in one place.

I'm open to writing some tests but it's not that obvious how to do so in a 
way that is meaningful / doesn't just duplicate the logic in the file itself. 
The fetch file code path is not tested in the existing `UtilsSuite` afaict, 
and it doesn't feel worth adding an integration test around, imho. lmk what 
else you'd like.

Thanks.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21350361
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
--- End diff --

Style issue: I think it's clearer to use backticks, e.g.

```
download a file from `in` to `tempFile` ...
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21350382
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
--- End diff --

is equal to is potentially ambiguous; do you think it would be clearer to 
say has the same contents as?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21350418
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
+  logInfo((File %s exists and does not match contents of %s,  +
+replacing it with %s).format(destFile, url, url))
--- End diff --

`url, url` here is a bug?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21350409
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
+  logInfo((File %s exists and does not match contents of %s,  +
--- End diff --

As long as you're modifying this code, it might be nice to use string 
formatters, e.g. `sFile $destFile exists and does not match the contents of 
$url ...`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21350464
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+var shouldCopy = true
--- End diff --

This variable isn't read anywhere; I'd remove it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21350437
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
+  if (!Files.equal(sourceFile, destFile)) {
+if (fileOverwrite) {
+  destFile.delete()
+  logInfo((File %s exists and does not match contents of %s,  +
+replacing it with %s).format(destFile, url, url))
+} else {
+  throw new SparkException(
+File  + destFile +  exists and does not match contents of 
+   + url)
+}
+  } else {
+// Do nothing if the file contents are the same, i.e. this file 
has been copied
+// previously.
+logInfo(sourceFile.getAbsolutePath +  has been previously copied 
to 
--- End diff --

Same here; might be nice to use string formatters.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21350546
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
+sourceFile: File,
+destFile: File,
+fileOverwrite: Boolean): Unit = {
+
+var shouldCopy = true
+if (destFile.exists) {
--- End diff --

There's no `else` branch for this `if`, which seems like an error.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-04 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r21350677
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +403,48 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def maybeMoveFile(
+url: String,
--- End diff --

I think that `url` is only used in log messages; is this intentional?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-12-04 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-65737090
  
Hi @ryan-williams,

I took a quick glance through the code and find the new `maybeMoveFile 
`method to be a bit confusing:

- It takes a `url` parameter that it doesn't use for anything other than 
logging.
- The name `maybe` implies an operation that can succeed or fail; it would 
be nice if the method's documentation clarified how success / failure is 
reported, maybe as a Scaladoc `@throws` annotation of a `Try` or something.  
This is important since the caller may need to know that the source file wasn't 
moved so that it can perform proper cleanup (e.g. delete the file).
- The different cases don't seem to be exhaustively handled; what happens 
if the destination file _doesn't_ already exist?
- One of the overloaded versions seems like it has two responsibilities: 
download a file then move it; it would be nice to separate these concerns to 
keep the individual methods simple and easy to understand.

Do you know if we have a test suite for this file copying code?  If not, we 
should create one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-30 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-65020585
  
@pwendell @andrewor14 want to give this another look?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-07 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-62165292
  
OK, I refactored a little further. Updates:

* renamed the helper function `maybeMoveFile` (instead of `moveFile`)
* introduced a second signature for `maybeMoveFile` that just takes two 
`File`s
* this allowed me to bring the 3rd instance of this repeated logic in 
`Utils.doFetchFile` into the fold, which helps the overall consistency / 
cleanliness a lot, I think.
* incidentally, that last code path handled the `exists` vs. `delete()` 
trickery differently than I was doing before; it used a boolean `var` that 
recorded explicitly whether we `shouldCopy` (`true` to start, set to `false` 
iff we found an identical file to exist). I decided that this way was cleaner, 
per @andrewor14's and @pwendell's (earlier in this thread) suggestions, and 
structured `maybeMoveFile` that way.
* folded the code path around L397 into `maybeMoveFile` as well, per 
@andrewor14's last suggestion.

lmk how it looks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-06 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r19985577
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +412,36 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def moveFile(url: String,
+   in: InputStream,
+   tempFile: File,
+   destFile: File,
+   fileOverwrite: Boolean): Unit = {
--- End diff --

style should be
```
private def moveFile(
url: String,
...
fileOverwrite: Boolean): Unit = {
  ...
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-06 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-62073167
  
retest this please, just because Jenkins hasn't tested this one in a while.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-06 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-62073442
  
Hey @ryan-williams +1 on abstracting the duplicate code. Just one 
clarification: the only change in functionality here is it saves overwriting an 
existing jar unnecessarily, is that correct? How does this solve SPARK-3967?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-62073441
  
  [Test build #23029 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23029/consoleFull)
 for   PR 2848 at commit 
[`80a20bd`](https://github.com/apache/spark/commit/80a20bd8422ebd78b89bca6f1655959383b0c701).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-06 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-62074970
  
Thanks for having a look, I'll get to that style nit momentarily.

The tie-in to SPARK-3967 is that, on my cluster, when attempting to 
redundantly copy the JAR deps, I was getting Permission Denied errors because 
not only were the JARs already there, they were read-only and could not be 
overwritten.

I did not get to the bottom of who was putting them there; it seemed like 
every executor would try to copy its JARs, find them to be already present, try 
to copy anyway, and fail due to permission denied.

Simply skipping the redundant copy attempt (as this PR does) resulted in my 
jobs passing, and it seemed like the original authors intended to not attempt 
to redundantly copy but had a bug, so I settled for this fix and didn't get 
fully to the bottom of this.

For example, suppose an executor E hit the code path where a given JAR is 
present but its contents are not identical to the one E wants. In this case, E 
would attempt to overwrite the existing JAR and fail due to the same 
Permission denied, so there could be some larger way in which it needs to be 
made possible for executors to overwrite their JAR deps (maybe that is just a 
cluster config issue I and @preaudc are hitting?), but if so I think that's 
orthogonal to this change.

Let me know if I should just take the SPARK-3967 label out of this PR, 
since it might not fully solve the issue.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-06 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-62075551
  
fixed the style nit, also added a comment nit I found in CoGroupedRDD, lmk 
if you want that in a separate PR or something


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-62081923
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23029/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-62081916
  
  [Test build #23029 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23029/consoleFull)
 for   PR 2848 at commit 
[`80a20bd`](https://github.com/apache/spark/commit/80a20bd8422ebd78b89bca6f1655959383b0c701).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-06 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r19990853
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +412,37 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def moveFile(
+  url: String,
+  in: InputStream,
+  tempFile: File,
+  destFile: File,
+  fileOverwrite: Boolean): Unit = {
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+if (destFile.exists  !Files.equal(tempFile, destFile)) {
+  if (fileOverwrite) {
+destFile.delete()
+logInfo((File %s exists and does not match contents of %s,  +
+  replacing it with %s).format(destFile, url, url))
--- End diff --

Hey @ryan-williams my concern is that if this fails (e.g. because of some 
permissions thing), then we will always have an old version of the file because 
`destFile.exists` will always be true down there in L440. It might make sense 
to throw an exception if delete is unsuccessful instead of logging an incorrect 
message here (I realize you just moved this code, but it would still be good to 
fix this).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-06 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2848#discussion_r19990952
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -412,6 +412,37 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Download a file from @in to @tempFile, then move it to @destFile, 
checking whether @destFile
+   * already exists, is equal to the downloaded file, and can be 
overwritten.
+   */
+  private def moveFile(
+  url: String,
+  in: InputStream,
+  tempFile: File,
+  destFile: File,
+  fileOverwrite: Boolean): Unit = {
+val out = new FileOutputStream(tempFile)
+Utils.copyStream(in, out, closeStreams = true)
+if (destFile.exists  !Files.equal(tempFile, destFile)) {
+  if (fileOverwrite) {
+destFile.delete()
+logInfo((File %s exists and does not match contents of %s,  +
+  replacing it with %s).format(destFile, url, url))
+  } else {
+tempFile.delete()
+throw new SparkException(
+  File  + destFile +  exists and does not match contents of + 
  + url)
+  }
+}
+// If the destFile exists at this point, it is equal to tempFile and 
we can skip moving
+// it; the code above deletes it or throws an Exception, as 
appropriate, if it exists and
+// is not equal to tempFile.
--- End diff --

Hey @ryan-williams my concern is that if the delete fails (e.g. because of 
some permissions thing), then we will always have an old version of the file 
because `destFile.exists` will always be true. It might make sense to throw an 
exception if delete is unsuccessful instead of logging a message up there in 
L429 that is technically not true (I realize you just moved the code, but it 
would still be good to fix this).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-11-06 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-62087106
  
Hey @ryan-williams thanks for your explanation. Looks like there could be 
another potential source of the same issue in L397, when we use the executor 
cache.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-26 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-60550074
  
bump. @JoshRosen @pwendell lmk what the process is please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-26 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-60550655
  
At this point its waiting on review from us. No action needed on your side.
I need to look this one over again since it was refactored but I'm sure we
can get it in soon.

---
Sent from my phone
On Oct 26, 2014 9:49 PM, Ryan Williams notificati...@github.com wrote:

 bump. @JoshRosen https://github.com/JoshRosen @pwendell
 https://github.com/pwendell lmk what the process is please?

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/2848#issuecomment-60550074.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-26 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-60550725
  
cool, thanks @pwendell. I am new here so don't know if there are other 
things I'm supposed to be doing!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-22 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-60172174
  
bump; what's the process for getting things merged in? any more comments 
here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-60014039
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-20 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59785816
  
@preaudc see last commit, I applied this change to the `case _` as well, 
per your suggestion!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-20 Thread ryan-williams
Github user ryan-williams commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59807140
  
Jenkins, test this please. Does that work if I am not an admin?

@pwendell agreed, the logic is a little tricky but I couldn't find a 
simpler way to express it; in the meantime, I factored it out since it was 
repeated in two `case`s


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-20 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59807368
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59808502
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21924/consoleFull)
 for   PR 2848 at commit 
[`5f1a6f1`](https://github.com/apache/spark/commit/5f1a6f184dae3950f9b5348f8d1d997be1b6e7d4).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59820947
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21924/consoleFull)
 for   PR 2848 at commit 
[`5f1a6f1`](https://github.com/apache/spark/commit/5f1a6f184dae3950f9b5348f8d1d997be1b6e7d4).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59820962
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21924/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-19 Thread ryan-williams
GitHub user ryan-williams opened a pull request:

https://github.com/apache/spark/pull/2848

[SPARK-3967] don’t redundantly overwrite executor JAR deps



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ryan-williams/spark fetch-file

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2848.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2848


commit f3c80ae80be474ed9928b319f2b0d7808b028915
Author: Ryan Williams ryan.blake.willi...@gmail.com
Date:   2014-10-17T22:21:23Z

don’t redundantly overwrite executor JAR deps

see SPARK-3967




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59669916
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-19 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59677498
  
Jenkins, test this please. This LGTM pending tests - I do wonder whether 
there is a simple way to structure the logic such that we don't need a big 
comment explaining the safety there, but I couldn't think of one that didn't 
involve a lot of nesting.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59677821
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21901/consoleFull)
 for   PR 2848 at commit 
[`f3c80ae`](https://github.com/apache/spark/commit/f3c80ae80be474ed9928b319f2b0d7808b028915).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59681083
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21901/consoleFull)
 for   PR 2848 at commit 
[`f3c80ae`](https://github.com/apache/spark/commit/f3c80ae80be474ed9928b319f2b0d7808b028915).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3967] don’t redundantly overwrite exe...

2014-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2848#issuecomment-59681088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21901/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org