Re: Downloading Hadoop from s3://spark-related-packages/

2015-12-24 Thread Nicholas Chammas
not that likely to get an answer as it’s really a support call, not a bug/task. The first question is about proper documentation of all the stuff we’ve been discussing in this thread, so one would think that’s a valid task. It doesn’t seem right that closer.lua, for example, is undocumented.

Re: Downloading Hadoop from s3://spark-related-packages/

2015-12-24 Thread Steve Loughran
On 24 Dec 2015, at 05:59, Nicholas Chammas wrote: FYI: I opened an INFRA ticket with questions about how best to use the Apache mirror network. https://issues.apache.org/jira/browse/INFRA-10999 Nick not that likely to get an

Re: Downloading Hadoop from s3://spark-related-packages/

2015-12-23 Thread Nicholas Chammas
FYI: I opened an INFRA ticket with questions about how best to use the Apache mirror network. https://issues.apache.org/jira/browse/INFRA-10999 Nick On Mon, Nov 2, 2015 at 8:00 AM Luciano Resende wrote: > I am getting the same results using closer.lua versus closer.cgi,

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-02 Thread Luciano Resende
I am getting the same results using closer.lua versus closer.cgi, which seems to be downloading a page where the user can choose the closest mirror. I tried to add parameters to follow the redirect without much success. There already seems to be a JIRA for a similar request with infra:

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Steve Loughran
On 1 Nov 2015, at 03:17, Nicholas Chammas wrote: https://s3.amazonaws.com/spark-related-packages/ spark-ec2 uses this bucket to download and install HDFS on clusters. Is it owned by the Spark project or by the AMPLab? Anyway, it

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Shivaram Venkataraman
I think that getting them from the ASF mirrors is a better strategy in general as it'll remove the overhead of keeping the S3 bucket up to date. It works in the spark-ec2 case because we only support a limited number of Hadoop versions from the tool. FWIW I don't have write access to the bucket

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Nicholas Chammas
Oh, sweet! For example: http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz?asjson=1 Thanks for sharing that tip. Looks like you can also use as_json (vs. asjson). Nick ​
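[Editor's note: the asjson response is a small JSON document naming a preferred mirror. A minimal sketch of how one might assemble a direct download URL from it; the `preferred` and `path_info` field names reflect closer.cgi responses observed at the time and are an assumption, not documented API.]

```python
import json

# Sample of the kind of JSON that closer.cgi?asjson=1 returns (field names
# assumed from responses observed at the time; the real payload also lists
# http/ftp/backup mirrors).
sample = json.loads("""
{
  "preferred": "http://mirror.example.org/apache/",
  "path_info": "hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz"
}
""")

def download_url(mirror_info):
    """Join the preferred mirror base with the requested file path."""
    base = mirror_info["preferred"].rstrip("/")
    return base + "/" + mirror_info["path_info"]

print(download_url(sample))
# -> http://mirror.example.org/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
```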

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Shivaram Venkataraman
I think the lua one at https://svn.apache.org/repos/asf/infrastructure/site/trunk/content/dyn/closer.lua has replaced the cgi one from before. It also looks like the lua one supports `action=download` with a filename argument. So you could just do something like wget
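[Editor's note: following that tip, the single "magic" URL would look something like the sketch below. The query parameters come from reading closer.lua, not from official documentation, so treat the exact format as an assumption.]

```python
CLOSER = "https://www.apache.org/dyn/closer.lua"

def action_download_url(filename):
    """Build a closer.lua URL that redirects straight to a mirror's copy
    of the given file (path relative to the mirror's dist/ root)."""
    return f"{CLOSER}?action=download&filename={filename}"

url = action_download_url("hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz")
print(url)
# A client must follow the redirect, e.g.: wget "<url>"  or  curl -L -O "<url>"
```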

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Nicholas Chammas
Hmm, yeah, some Googling confirms this, though there isn't any clear documentation of it. Strangely, if I click on the link from your email the download works, but curl and wget somehow don't get redirected correctly... Nick On Sun, Nov 1, 2015 at 6:40 PM Shivaram Venkataraman <

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Nicholas Chammas
OK, I’ll focus on the Apache mirrors going forward. The problem with the Apache mirrors, if I am not mistaken, is that you cannot use a single URL that automatically redirects you to a working mirror to download Hadoop. You have to pick a specific mirror and pray it doesn’t disappear tomorrow.
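[Editor's note: one way around a vanishing mirror, sketched below with a hypothetical `fetch` callable: try the dynamic closer.lua URL first and fall back to archive.apache.org, which permanently hosts all ASF releases. The archive host is a real ASF service, but using it as a fallback here is a suggestion, not something settled in this thread.]

```python
def fetch_with_fallback(path, fetch):
    """Try each candidate URL in order with the supplied fetch callable
    (e.g. a wrapper around urllib or requests); return the first success."""
    candidates = [
        # dynamic redirect to a currently-healthy mirror
        "https://www.apache.org/dyn/closer.lua?action=download&filename=" + path,
        # permanent archive of every ASF release, as a last resort
        "https://archive.apache.org/dist/" + path,
    ]
    last_error = None
    for url in candidates:
        try:
            return fetch(url)
        except OSError as err:
            last_error = err
    raise RuntimeError(f"all mirrors failed for {path}") from last_error

# Stub fetcher simulating a dead dynamic mirror, so the sketch runs offline.
def flaky_fetch(url):
    if "closer.lua" in url:
        raise OSError("mirror down")
    return url  # pretend this is the downloaded content

result = fetch_with_fallback("hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz", flaky_fetch)
print(result)
# -> https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
```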

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Shivaram Venkataraman
On Sun, Nov 1, 2015 at 2:16 PM, Nicholas Chammas wrote: > OK, I’ll focus on the Apache mirrors going forward. > > The problem with the Apache mirrors, if I am not mistaken, is that you > cannot use a single URL that automatically redirects you to a working mirror > to

Downloading Hadoop from s3://spark-related-packages/

2015-10-31 Thread Nicholas Chammas
https://s3.amazonaws.com/spark-related-packages/ spark-ec2 uses this bucket to download and install HDFS on clusters. Is it owned by the Spark project or by the AMPLab? Anyway, it looks like the latest Hadoop install available on there is Hadoop 2.4.0. Are there plans to add newer versions of