[GitHub] spark pull request #19851: [SPARK-22654][TESTS] Retry Spark tarball download...

2017-11-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19851


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19851: [SPARK-22654][TESTS] Retry Spark tarball download...

2017-11-30 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19851#discussion_r154029237
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -50,14 +52,24 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
     super.afterAll()
   }
 
-  private def downloadSpark(version: String): Unit = {
-    import scala.sys.process._
+  private def tryDownloadSpark(version: String, path: String): Unit = {
+    // Try mirrors a few times until one succeeds
+    for (i <- 0 until 3) {
+      val preferredMirror =
+        Seq("wget", "https://www.apache.org/dyn/closer.lua?preferred=true", "-q", "-O", "-").!!.trim
+      val url = s"$preferredMirror/spark/spark-$version/spark-$version-bin-hadoop2.7.tgz"
+      logInfo(s"Downloading Spark $version from $url")
+      if (Seq("wget", url, "-q", "-P", path).! == 0) {
+        return
+      }
+      logWarning(s"Failed to download Spark $version from $url")
+    }
+    fail(s"Unable to download Spark $version")
--- End diff --

btw, I've also seen a mirror abruptly end a download without reporting an error, resulting in an incomplete/corrupted tgz.

it's possible the mirror misreports the response byte size in that case.
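A checksum comparison would catch such truncated transfers even when wget exits 0. Below is a minimal Scala sketch (hypothetical helpers, not part of this PR), assuming the expected SHA-512 is fetched separately from a trusted host such as archive.apache.org:

```scala
import java.nio.file.{Files, Paths}
import java.security.MessageDigest

// Compute the SHA-512 of a downloaded file so a truncated transfer
// can be detected even when the download tool reported success.
def sha512Hex(path: String): String = {
  val md = MessageDigest.getInstance("SHA-512")
  md.digest(Files.readAllBytes(Paths.get(path)))
    .map(b => "%02x".format(b & 0xff))
    .mkString
}

// Compare against a checksum obtained out-of-band from a trusted host.
def verifyDownload(tarball: String, expectedSha512: String): Boolean =
  sha512Hex(tarball).equalsIgnoreCase(expectedSha512)
```

A corrupted tarball would then fail verification and could trigger another retry iteration rather than a confusing extraction error later in the suite.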



---




[GitHub] spark pull request #19851: [SPARK-22654][TESTS] Retry Spark tarball download...

2017-11-30 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19851#discussion_r154027519
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -50,14 +52,24 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
     super.afterAll()
   }
 
-  private def downloadSpark(version: String): Unit = {
-    import scala.sys.process._
+  private def tryDownloadSpark(version: String, path: String): Unit = {
+    // Try mirrors a few times until one succeeds
+    for (i <- 0 until 3) {
+      val preferredMirror =
+        Seq("wget", "https://www.apache.org/dyn/closer.lua?preferred=true", "-q", "-O", "-").!!.trim
+      val url = s"$preferredMirror/spark/spark-$version/spark-$version-bin-hadoop2.7.tgz"
+      logInfo(s"Downloading Spark $version from $url")
--- End diff --

FYI not sure which is the better way; in SparkR we get a mirror list with `http://www.apache.org/dyn/closer.cgi?as_json=1`

```
{
  "backup": [ "http://www-eu.apache.org/dist/", "http://www-us.apache.org/dist/" ],
  "cca2": "us",
  "ftp": [ "ftp://apache.cs.utah.edu/apache.org/", "ftp://apache.mirrors.tds.net/pub/apache.org/",
    "ftp://ftp.osuosl.org/pub/apache/", "ftp://mirror.reverse.net/pub/apache/" ],
  "http": [ "http://apache.claz.org/", "http://apache.cs.utah.edu/", "http://apache.mesi.com.ar/",
    "http://apache.mirrors.hoobly.com/", "http://apache.mirrors.ionfish.org/",
    "http://apache.mirrors.lucidnetworks.net/", "http://apache.mirrors.pair.com/",
    "http://apache.mirrors.tds.net/", "http://apache.osuosl.org/",
    "http://download.nextag.com/apache/", "http://ftp.wayne.edu/apache/",
    "http://mirror.cc.columbia.edu/pub/software/apache/", "http://mirror.cogentco.com/pub/apache/",
    "http://mirror.jax.hugeserver.com/apache/", "http://mirror.metrocast.net/apache/",
    "http://mirror.olnevhost.net/pub/apache/", "http://mirror.reverse.net/pub/apache/",
    "http://mirror.stjschools.org/public/apache/", "http://mirrors.advancedhosters.com/apache/",
    "http://mirrors.gigenet.com/apache/", "http://mirrors.koehn.com/apache/",
    "http://mirrors.ocf.berkeley.edu/apache/", "http://mirrors.sonic.net/apache/",
    "http://supergsego.com/apache/", "http://www.gtlib.gatech.edu/pub/apache/",
    "http://www.namesdir.com/mirrors/apache/", "http://www.trieuvan.com/apache/" ],
  "path_info": "",
  "preferred": "http://mirror.cc.columbia.edu/pub/software/apache/"
}
```
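For pulling a mirror out of a response like the one above, here is a minimal Scala sketch (hypothetical, not SparkR's actual parser) that extracts the `"preferred"` field with a regex to avoid adding a JSON library dependency to a test; production code should use a real JSON parser:

```scala
// Extract the "preferred" mirror URL from the closer.cgi JSON response.
// Regex-based on purpose: a quick, dependency-free sketch for a test helper.
def preferredMirror(json: String): Option[String] =
  """"preferred"\s*:\s*"([^"]+)"""".r.findFirstMatchIn(json).map(_.group(1))
```

If `"preferred"` is absent or the response is malformed, this returns `None`, at which point the caller could fall back to one of the `"backup"` hosts.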
 


---




[GitHub] spark pull request #19851: [SPARK-22654][TESTS] Retry Spark tarball download...

2017-11-30 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19851#discussion_r154027755
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala ---
@@ -50,14 +52,24 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
     super.afterAll()
   }
 
-  private def downloadSpark(version: String): Unit = {
-    import scala.sys.process._
+  private def tryDownloadSpark(version: String, path: String): Unit = {
+    // Try mirrors a few times until one succeeds
+    for (i <- 0 until 3) {
+      val preferredMirror =
+        Seq("wget", "https://www.apache.org/dyn/closer.lua?preferred=true", "-q", "-O", "-").!!.trim
+      val url = s"$preferredMirror/spark/spark-$version/spark-$version-bin-hadoop2.7.tgz"
+      logInfo(s"Downloading Spark $version from $url")
--- End diff --

it's also possible to pass `path`, but from testing, it doesn't seem like the path/path_info value is validated in any way.
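As a hypothetical sketch of that alternative (not part of the PR as merged), the suite could build a closer.lua query that carries the artifact path alongside `preferred=true`; the `path` parameter name follows the mirror service's convention, and as noted above the value does not appear to be validated:

```scala
// Hypothetical: include the artifact path in the closer.lua query so the
// mirror service could, in principle, resolve a full download URL.
def mirrorQuery(version: String): String = {
  val artifact = s"spark/spark-$version/spark-$version-bin-hadoop2.7.tgz"
  s"https://www.apache.org/dyn/closer.lua?path=$artifact&preferred=true"
}
```

Since the value isn't validated server-side, this offers no correctness benefit over the merged approach of appending the path to the returned mirror base URL.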


---




[GitHub] spark pull request #19851: [SPARK-22654][TESTS] Retry Spark tarball download...

2017-11-29 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/19851

[SPARK-22654][TESTS] Retry Spark tarball download if failed in 
HiveExternalCatalogVersionsSuite

## What changes were proposed in this pull request?

Adds a simple loop to retry download of Spark tarballs from different 
mirrors if the download fails.

## How was this patch tested?

Existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-22654

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19851.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19851


commit 375a8de32dd553100d1b0fd66a7081400a1c7cbd
Author: Sean Owen 
Date:   2017-11-29T19:05:32Z

Retry Spark tarball download if failed in HiveExternalCatalogVersionsSuite




---
