Hi,

I used the spark-ec2 script to create an EC2 cluster. Now I am trying to copy data from S3 into HDFS. I am running:

[root@ip-172-31-21-160 ephemeral-hdfs]$ bin/hadoop distcp s3://<xxx>/home/mydata/small.sam hdfs://ec2-52-11-148-31.us-west-2.compute.amazonaws.com:9010/data1
and I get the following error:

2015-03-06 01:39:27,299 INFO tools.DistCp (DistCp.java:run(109)) - Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[s3://<xxx>/home/mydata/small.sam], targetPath=hdfs://ec2-52-11-148-31.us-west-2.compute.amazonaws.com:9010/data1}
2015-03-06 01:39:27,585 INFO mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "ec2-52-11-148-31.us-west-2.compute.amazonaws.com:9001"
2015-03-06 01:39:27,585 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
    at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)

I have already tried start-all.sh, start-dfs.sh, and start-yarn.sh. What should I do?

Thanks,
roni
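Since the exception asks me to check mapreduce.framework.name, I wonder if I need to set it explicitly in conf/mapred-site.xml so DistCp submits to YARN instead of trying the LocalJobRunner. Something like the sketch below is what I had in mind (the yarn value and the conf file location are my guesses; I have not confirmed this is how spark-ec2 lays things out):

```
<!-- conf/mapred-site.xml: guessed fragment, not verified on spark-ec2 -->
<configuration>
  <!-- Tell the MapReduce client (and thus DistCp) to use YARN
       rather than the LocalJobRunner. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

Would that be the right direction, or is there something else in the spark-ec2 setup I should change?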