I do not have any legacy Hadoop code. My goal is to run Spark on top of
HDFS.

Recently I have been having HDFS corruption problems. I was also never able
to access S3, even though I used --copy-aws-credentials. I noticed that by
default the spark-ec2 script uses Hadoop 1.0.4. I ran the script's help and
discovered you can pass --hadoop-major-version, but even then it still
installs an old version of Hadoop. I assume there are a lot of bug fixes
between Hadoop 2.0.0-cdh4.2 and Apache Hadoop 2.6.x or 2.7.x.
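For reference, this is roughly how I am launching the cluster (the key pair,
identity file, and cluster name below are placeholders, not my real values):

  ./spark-ec2 --key-pair=my-keypair --identity-file=~/my-keypair.pem \
      --copy-aws-credentials --hadoop-major-version=2 \
      launch my-spark-cluster

Even with --hadoop-major-version=2, the cluster ends up with the CDH4 build
shown in the version output below.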

Any idea what I would need to do to move to a newer version of Hadoop HDFS?

Kind regards

Andy

[ec2-user@ip-172-31-18-23 ~]$  /root/ephemeral-hdfs/bin/hadoop version

Hadoop 2.0.0-cdh4.2.0

Subversion file:///var/lib/jenkins/workspace/CDH4.2.0-Packaging-Hadoop/build/cdh4/hadoop/2.0.0-cdh4.2.0/source/hadoop-common-project/hadoop-common -r 8bce4bd28a464e0a92950c50ba01a9deb1d85686

Compiled by jenkins on Fri Feb 15 10:42:32 PST 2013

From source with checksum 3eefc211a14ac7b6e764d6ded2eeeb26

[ec2-user@ip-172-31-19-24 ~]$



