Tomer,

To use distcp, you need a running Hadoop MapReduce cluster; start-dfs.sh only restarts HDFS. I don't have a Spark 1.0.2 cluster up right now, but there should be a start-mapred*.sh or start-all.sh script that launches the Hadoop MapReduce daemons that distcp needs.
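A minimal sketch of that, assuming the standard spark-ec2 ephemeral-hdfs layout under ~/ephemeral-hdfs (adjust the path if your cluster differs):

```shell
# Hedged sketch: assumes the spark-ec2 ephemeral-hdfs layout.
SBIN="$HOME/ephemeral-hdfs/sbin"

# Prefer start-mapred.sh if present; fall back to start-all.sh,
# which starts both HDFS and MapReduce daemons.
if [ -x "$SBIN/start-mapred.sh" ]; then
  "$SBIN/start-mapred.sh"
elif [ -x "$SBIN/start-all.sh" ]; then
  "$SBIN/start-all.sh"
else
  echo "no start-mapred.sh or start-all.sh under $SBIN" >&2
fi
```

Once the jobtracker and tasktrackers are up, distcp should be able to initialize its Cluster object.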
Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

On Sep 8, 2014, at 12:28 AM, Tomer Benyamini <tomer....@gmail.com> wrote:

> ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;
>
> I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
> ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
> when trying to run distcp:
>
> ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
>
> java.io.IOException: Cannot initialize Cluster. Please check your
> configuration for mapreduce.framework.name and the correspond server
> addresses.
>
>     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
>     at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
>
> Any idea?
>
> Thanks!
> Tomer
>
> On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen <rosenvi...@gmail.com> wrote:
>> If I recall, you should be able to start Hadoop MapReduce using
>> ~/ephemeral-hdfs/sbin/start-mapred.sh.
>>
>> On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini <tomer....@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I would like to copy log files from s3 to the cluster's
>>> ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
>>> running on the cluster - I'm getting the exception below.
>>>
>>> Is there a way to activate it, or is there a spark alternative to distcp?
>>>
>>> Thanks,
>>> Tomer
>>>
>>> mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
>>> org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
>>> Invalid "mapreduce.jobtracker.address" configuration value for
>>> LocalJobRunner : "XXX:9001"
>>>
>>> ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
>>>
>>> java.io.IOException: Cannot initialize Cluster. Please check your
>>> configuration for mapreduce.framework.name and the correspond server
>>> addresses.
>>>
>>>     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
>>>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
>>>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
>>>     at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
>>>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
>>>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
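One more option, in case bringing up MapReduce is a hassle: `hadoop fs -cp` runs entirely in the client JVM, so it needs no jobtracker at all. It is slower than distcp (no cluster parallelism), but fine for a moderate amount of log data. A hedged sketch, assuming the spark-ec2 layout; the bucket name and paths below are hypothetical placeholders:

```shell
# Hedged sketch: single-process S3-to-HDFS copy, no MapReduce required.
# "my-bucket" and both paths are hypothetical; substitute your own.
HADOOP="$HOME/ephemeral-hdfs/bin/hadoop"
SRC="s3n://my-bucket/logs/"       # hypothetical source bucket/prefix
DST="hdfs:///user/root/logs/"     # hypothetical destination in ephemeral-hdfs

if [ -x "$HADOOP" ]; then
  # fs -cp copies bytes through the client JVM only.
  "$HADOOP" fs -cp "$SRC" "$DST"
else
  echo "hadoop not found at $HADOOP; adjust the path for your cluster" >&2
fi
```

As for a Spark alternative: you can read with sc.textFile and write with saveAsTextFile, but that re-serializes records and changes file boundaries, so a filesystem-level copy is usually the better fit for raw logs.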