I think that getting them from the ASF mirrors is a better strategy in
general, as it removes the overhead of keeping the S3 bucket up to
date. The bucket works in the spark-ec2 case only because the tool
supports a limited number of Hadoop versions. FWIW, I don't have write
access to the bucket, and I haven't heard of any plans to support
newer versions in spark-ec2.
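
For what it's worth, here is a minimal sketch of that workflow in
plain Python (standard library only). It asks the ASF's closer.cgi
mirror-resolution service for a currently working mirror, which
addresses the "pick a specific mirror and pray" concern below, and
then checks the download against a SHA-1 digest taken from the ASF
HTTPS site, as Steve suggests. The artifact path and expected digest
are placeholders, and I'm going from memory on the closer.cgi JSON
fields, so treat it as illustrative rather than definitive:

import hashlib
import json
import urllib.request

ARTIFACT = "hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz"

# Ask closer.cgi for a working mirror rather than hard-coding one.
meta_url = ("https://www.apache.org/dyn/closer.cgi?path="
            + ARTIFACT + "&as_json=1")
meta = json.loads(urllib.request.urlopen(meta_url).read().decode("utf-8"))
tarball_url = meta["preferred"] + meta["path_info"]

# The expected digest must come from the ASF HTTPS site
# (https://www.apache.org/dist/...), never from a mirror.
# This value is a placeholder, not a real digest.
EXPECTED_SHA1 = "0000000000000000000000000000000000000000"

sha1 = hashlib.sha1()
with open("hadoop-2.7.1.tar.gz", "wb") as out:
    resp = urllib.request.urlopen(tarball_url)
    for chunk in iter(lambda: resp.read(1 << 20), b""):
        sha1.update(chunk)
        out.write(chunk)

if sha1.hexdigest() != EXPECTED_SHA1:
    raise SystemExit("SHA-1 mismatch -- refusing to use the download")
print("verified", tarball_url)

Something like that at launch time would let spark-ec2 pull any Hadoop
version without anyone having to keep the bucket current.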

Thanks
Shivaram

On Sun, Nov 1, 2015 at 2:30 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> On 1 Nov 2015, at 03:17, Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
> https://s3.amazonaws.com/spark-related-packages/
>
> spark-ec2 uses this bucket to download and install HDFS on clusters. Is it
> owned by the Spark project or by the AMPLab?
>
> Anyway, it looks like the latest Hadoop release available there is
> Hadoop 2.4.0.
>
> Are there plans to add newer versions of Hadoop for use by spark-ec2 and
> similar tools, or should we just be getting that stuff via an Apache mirror?
> The latest version is 2.7.1, by the way.
>
>
> you should be grabbing the artifacts off the ASF and then verifying
> their SHA-1 checksums as published on the ASF HTTPS web site.
>
>
> The problem with the Apache mirrors, if I am not mistaken, is that you
> cannot use a single URL that automatically redirects you to a working mirror
> to download Hadoop. You have to pick a specific mirror and pray it doesn't
> disappear tomorrow.
>
>
> They don't go away, especially http://mirror.ox.ac.uk, and in the US
> apache.osuosl.org; OSU is where a lot of the ASF servers are kept.
>
> The full list, with availability stats, is at:
>
> http://www.apache.org/mirrors/
>
>

