[ https://issues.apache.org/jira/browse/SPARK-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shane knapp updated SPARK-3745:
-------------------------------
    Description: 
In spark/dev/check-license, there are four commands that try to download the Apache RAT jar from Maven (two URLs, each fetched with either curl or wget):

{noformat}
  URL1="http://search.maven.org/remotecontent?filepath=org/apache/rat/apache-rat/${RAT_VERSION}/apache-rat-${RAT_VERSION}.jar";
  URL2="http://repo1.maven.org/maven2/org/apache/rat/apache-rat/${RAT_VERSION}/apache-rat-${RAT_VERSION}.jar";

*snip*

    if hash curl 2>/dev/null; then
      (curl --silent ${URL1} > "$JAR_DL" || curl --silent ${URL2} > "$JAR_DL") && mv "$JAR_DL" "$JAR"
    elif hash wget 2>/dev/null; then
      (wget --quiet ${URL1} -O "$JAR_DL" || wget --quiet ${URL2} -O "$JAR_DL") && mv "$JAR_DL" "$JAR"
{noformat}

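The first attempt uses curl against the search URL, which answers with an HTTP 302 redirect, i.e. a "YEP!  WE FOUND IT!" HTML blob, instead of the jar. curl does not follow redirects unless given --location (-L), and a 302 response is not an error as far as its exit status is concerned, so the first curl exits 0, the fallback to URL2 never runs, and the HTML page is moved into place as the jar: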

{noformat}
[root@test01 sknapp]# curl --silent http://search.maven.org/remotecontent?filepath=org/apache/rat/apache-rat/0.10/apache-rat-0.10.jar > test.part
[root@test01 sknapp]# cat test.part
<html>
<head><title>302 Found</title></head>
<body bgcolor="white">
<center><h1>302 Found</h1></center>
<hr><center>nginx/0.8.55</center>
</body>
</html>
{noformat}

This fails to download the jar EVERY time the test is run. Running curl on the 2nd URL, which points at the repo itself, downloads the jar successfully. wget does the correct thing for both URLs, since it follows redirects by default.
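For comparison, a minimal sketch of a curl call that would not silently save the redirect page, assuming the script keeps using curl: --location follows the 302 from search.maven.org to the actual jar, and --fail makes curl exit non-zero when the server answers with an HTTP error (400 or above) instead of writing the error page to the output file, so the || fallback to the second URL actually gets a chance to run. The function name fetch_with_curl is hypothetical and not part of check-license.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of spark/dev/check-license.
# Downloads URL ($1) to OUT ($2) with curl:
#   --location  follow redirects such as the 302 from search.maven.org
#   --fail      exit non-zero on HTTP errors instead of saving the error page
#   --show-error still report errors on stderr despite --silent
fetch_with_curl() {
  local url="$1" out="$2"
  # Quote the URL: it contains '?', which the shell could otherwise glob.
  curl --silent --show-error --fail --location "$url" -o "$out"
}

# Usage, mirroring the URL1 || URL2 fallback in check-license:
#   (fetch_with_curl "${URL1}" "$JAR_DL" || fetch_with_curl "${URL2}" "$JAR_DL") && mv "$JAR_DL" "$JAR"
```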

There is also no error checking on the downloaded file beyond checking that it exists.

Potential fixes, in no particular order:
1) run unzip -tq ${JAR} and check for a 0 exit status to ensure the file is a valid zip (jar) archive
2) run wget before curl
3) only run curl on the 2nd URL (pointing directly to the repo)
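A rough sketch of how fixes 1) and 3) could fit together, under stated assumptions: download_rat_jar, its argument order, and the DEST.part temporary name are hypothetical, not the actual check-license code. It tries each URL in order (pass the direct repo URL first, per fix 3), uses curl with --location --fail when available and wget otherwise, and only moves the file into place if unzip -tq reports a readable archive (fix 1), since a jar is a zip file.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual spark/dev/check-license code.
# download_rat_jar DEST URL [URL...]
# Tries each URL in order; moves the download to DEST only if it is a valid zip/jar.
download_rat_jar() {
  local dest="$1"; shift
  local part="${dest}.part"   # temporary download target (assumed name)
  local url
  for url in "$@"; do
    rm -f "$part"
    if hash curl 2>/dev/null; then
      # --location follows the 302 from search.maven.org; --fail rejects HTTP errors.
      curl --silent --fail --location "$url" -o "$part" || continue
    elif hash wget 2>/dev/null; then
      wget --quiet "$url" -O "$part" || continue
    else
      echo "neither curl nor wget is available" >&2
      return 1
    fi
    # Fix 1): unzip -tq exits 0 only if the archive can be read without errors.
    if unzip -tq "$part" >/dev/null 2>&1; then
      mv "$part" "$dest"
      return 0
    fi
    echo "not a valid jar: $url" >&2
  done
  rm -f "$part"   # don't leave a bogus partial file behind
  return 1
}

# Fix 3): put the direct repo URL first.
#   download_rat_jar "$JAR" "${URL2}" "${URL1}"
```

Failing the whole function when every URL yields a non-archive keeps an HTML page like the 302 blob above from ever being treated as the RAT jar.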



> curl on maven search repo apache rat url returns search status, not jar file
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-3745
>                 URL: https://issues.apache.org/jira/browse/SPARK-3745
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>         Environment: centos 6.5
>            Reporter: shane knapp
>              Labels: build-failure, easyfix, test
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
