On 1 Nov 2015, at 03:17, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> https://s3.amazonaws.com/spark-related-packages/
>
> spark-ec2 uses this bucket to download and install HDFS on clusters. Is it owned by the Spark project or by the AMPLab?
>
> Anyway, it looks like the latest Hadoop install available on there is Hadoop 2.4.0. Are there plans to add newer versions of Hadoop for use by spark-ec2 and similar tools, or should we just be getting that stuff via an Apache mirror <http://hadoop.apache.org/releases.html>? The latest version is 2.7.1, by the way.

You should be grabbing the artifacts off the ASF and then verifying their SHA1 checksums as published on the ASF HTTPS web site.

> The problem with the Apache mirrors, if I am not mistaken, is that you cannot use a single URL that automatically redirects you to a working mirror to download Hadoop. You have to pick a specific mirror and pray it doesn't disappear tomorrow.

They don't go away, especially http://mirror.ox.ac.uk, and in the US, http://apache.osuosl.org, OSU being where a lot of the ASF servers are kept.

Full list with availability stats: http://www.apache.org/mirrors/
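
For the checksum step, here is a minimal sketch in Python of what the verification could look like once the tarball is on disk. The function names are made up for illustration, and it assumes you have already copied the SHA1 value by hand from the ASF HTTPS site; it does not download or parse anything itself. Published digests are sometimes uppercase or split into groups with spaces, so the sketch normalizes that before comparing.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def normalize_checksum(published):
    """Drop whitespace and lowercase a published hex digest (hypothetical helper)."""
    return "".join(published.split()).lower()


def verify_sha1(path, published):
    """True if the file's SHA-1 matches the published digest string."""
    return sha1_of_file(path) == normalize_checksum(published)
```

Something like `verify_sha1("hadoop-2.7.1.tar.gz", published_value)` would then gate the install: fetch from whichever mirror you like, but take `published_value` only from the ASF site over HTTPS, and refuse to proceed if the call returns False.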