GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/21210

    [SPARK-23489][SQL][TEST] HiveExternalCatalogVersionsSuite should verify the 
downloaded file

    ## What changes were proposed in this pull request?
    
    Although `HiveExternalCatalogVersionsSuite` is designed to try downloading 
from Apache mirrors three times, it has been flaky because it does not verify 
the downloaded file. Some Apache mirrors terminate the download abnormally, and 
the resulting *corrupted* file produces the following errors.
    
    ```
    gzip: stdin: not in gzip format
    tar: Child returned status 1
    tar: Error is not recoverable: exiting now
    22:46:32.700 WARN 
org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite: 
    
    ===== POSSIBLE THREAD LEAK IN SUITE 
o.a.s.sql.hive.HiveExternalCatalogVersionsSuite, thread names: Keep-Alive-Timer 
=====
    
    *** RUN ABORTED ***
      java.io.IOException: Cannot run program "./bin/spark-submit" (in 
directory "/tmp/test-spark/spark-2.2.0"): error=2, No such file or directory
    ```
    
    Jenkins has reported this failure inconsistently, in two ways. For example, 
the above case is reported as Case 2, `no failures`, even though the run was 
aborted.
    
    - Case 1. [Test Result (1 failure / 
+1)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/4389/)
    - Case 2. [Test Result (no 
failures)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4811/)
    
    This PR aims to make `HiveExternalCatalogVersionsSuite` more robust by 
verifying the downloaded `tgz` file: it extracts the archive and checks that 
`bin/spark-submit` exists. If the file turns out to be empty or corrupted, 
`HiveExternalCatalogVersionsSuite` retries, just as it does after a download 
failure.
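
The verify-then-retry idea described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Scala code in this PR: the names `verify_archive`, `download_verified`, `top_dir`, and the injectable `fetch` parameter are hypothetical, and the default of three attempts simply mirrors the behaviour described in the text.

```python
import os
import shutil
import tarfile
import urllib.request
import zlib


def verify_archive(tgz_path, dest_dir, top_dir):
    """Extract tgz_path into dest_dir; return True only if
    dest_dir/top_dir/bin/spark-submit exists afterwards."""
    os.makedirs(dest_dir, exist_ok=True)
    extracted = os.path.join(dest_dir, top_dir)
    try:
        with tarfile.open(tgz_path, "r:gz") as tar:
            tar.extractall(dest_dir)
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        # Empty, truncated, or non-gzip downloads end up here.
        shutil.rmtree(extracted, ignore_errors=True)
        return False
    return os.path.isfile(os.path.join(extracted, "bin", "spark-submit"))


def download_verified(url, tgz_path, dest_dir, top_dir,
                      max_attempts=3, fetch=urllib.request.urlretrieve):
    """Download url to tgz_path, treating a corrupted archive like a
    failed download: discard it and try again, up to max_attempts times."""
    for _ in range(max_attempts):
        try:
            fetch(url, tgz_path)
        except OSError:
            continue  # download failure: retry
        if verify_archive(tgz_path, dest_dir, top_dir):
            return True
        # Verification failed: remove the bad file before retrying.
        if os.path.exists(tgz_path):
            os.remove(tgz_path)
    return False
```

The point of the design is that an incomplete download is detected right after the download step, so the suite never reaches the `./bin/spark-submit` call with a missing directory.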
    
    ## How was this patch tested?
    
    Pass the Jenkins tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-23489

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21210.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21210
    
----
commit 51d4c0ed72c15893a112c39d9e360e4cfabe6a62
Author: Dongjoon Hyun <dongjoon@...>
Date:   2018-05-02T04:48:21Z

    [SPARK-23489][SQL][TEST] HiveExternalCatalogVersionsSuite should verify the 
downloaded file

----


