GitHub user dongjoon-hyun opened a pull request:
https://github.com/apache/spark/pull/21210
[SPARK-23489][SQL][TEST] HiveExternalCatalogVersionsSuite should verify the downloaded file
## What changes were proposed in this pull request?
Although `HiveExternalCatalogVersionsSuite` is designed to attempt the download from Apache mirrors three times, it has been flaky because it does not verify the downloaded file. Some Apache mirrors terminate downloads abnormally, and the resulting *corrupted* file causes the following errors.
```
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
22:46:32.700 WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite:
= POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.hive.HiveExternalCatalogVersionsSuite, thread names: Keep-Alive-Timer =
*** RUN ABORTED ***
java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "/tmp/test-spark/spark-2.2.0"): error=2, No such file or directory
```
This failure has been reported inconsistently, in two different ways. For example, the case above is reported as Case 2, `no failures`.
- Case 1. [Test Result (1 failure / +1)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/4389/)
- Case 2. [Test Result (no failures)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4811/)
This PR aims to make `HiveExternalCatalogVersionsSuite` more robust by verifying the downloaded `tgz` file: it extracts the archive and checks that `bin/spark-submit` exists. If the file turns out to be empty or corrupted, `HiveExternalCatalogVersionsSuite` retries, just as it does when the download itself fails.
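The verify-and-retry idea above can be sketched in Scala as follows. This is a minimal illustration, not the code from this PR: `downloadVerified`, `extractAndVerify`, `deleteRecursively`, and the injected `download` function are hypothetical names, and the sketch assumes a system `tar` that supports `-xzf` and `--strip-components`.

```scala
import java.io.File
import scala.sys.process._
import scala.util.Try

// Hypothetical helper: recursively delete a file or directory (best effort).
def deleteRecursively(f: File): Unit = {
  if (f.isDirectory) Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
  f.delete()
}

// Hypothetical helper: extract `tgz` into `targetDir` with the system `tar`,
// then check that the result looks like a Spark distribution. The exit code
// alone is not trusted: bin/spark-submit must actually exist afterwards.
def extractAndVerify(tgz: File, targetDir: File): Boolean = {
  targetDir.mkdirs()
  val exitCode =
    Seq("tar", "-xzf", tgz.getPath, "-C", targetDir.getPath, "--strip-components=1").!
  exitCode == 0 && new File(targetDir, "bin/spark-submit").isFile
}

// Try each URL in order. A download exception and a failed verification are
// handled the same way: clean up, then move on to the next URL.
// `download` is an assumed stand-in for the suite's own download helper.
def downloadVerified(
    urls: Seq[String],
    version: String,
    workDir: File,
    download: (String, File) => Unit,
    verify: (File, File) => Boolean = (tgz, dir) => extractAndVerify(tgz, dir)
): Option[File] = {
  urls.iterator.map { url =>
    val tgz = new File(workDir, s"spark-$version.tgz")
    val targetDir = new File(workDir, s"spark-$version")
    val ok = Try { download(url, tgz); verify(tgz, targetDir) }.getOrElse(false)
    tgz.delete()                          // the archive is no longer needed
    if (!ok) deleteRecursively(targetDir) // drop any partial extraction
    if (ok) Some(targetDir) else None
  }.collectFirst { case Some(dir) => dir } // lazy: stops at the first success
}
```

Injecting `download` and `verify` keeps the retry policy testable without network access; in the suite, the candidate URLs would presumably be the Apache mirror URLs it already tries.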
## How was this patch tested?
Passes the Jenkins tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dongjoon-hyun/spark SPARK-23489
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21210.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21210
commit 51d4c0ed72c15893a112c39d9e360e4cfabe6a62
Author: Dongjoon Hyun
Date: 2018-05-02T04:48:21Z
[SPARK-23489][SQL][TEST] HiveExternalCatalogVersionsSuite should verify the downloaded file