Tomer,

To use distcp, you need a running Hadoop MapReduce cluster; start-dfs.sh only restarts HDFS. I don't have a Spark 1.0.2 cluster up right now, but there should be a start-mapred*.sh or start-all.sh script that launches the Hadoop MapReduce daemons that distcp needs.
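A minimal sketch of that, assuming the standard spark-ec2 ephemeral-hdfs layout under ~/ephemeral-hdfs (adjust the path if your cluster differs):

```shell
# Hedged sketch: assumes the spark-ec2 ephemeral-hdfs layout.
SBIN="$HOME/ephemeral-hdfs/sbin"

# Prefer start-mapred.sh if present; fall back to start-all.sh,
# which starts both HDFS and MapReduce daemons.
if [ -x "$SBIN/start-mapred.sh" ]; then
  "$SBIN/start-mapred.sh"
elif [ -x "$SBIN/start-all.sh" ]; then
  "$SBIN/start-all.sh"
else
  echo "no start-mapred.sh or start-all.sh under $SBIN" >&2
fi
```

Once the jobtracker and tasktrackers are up, distcp should be able to initialize its Cluster object.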
Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

On Sep 8, 2014, at 12:28 AM, Tomer Benyamini <tomer....@gmail.com> wrote:

> ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;
>
> I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
> ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
> when trying to run distcp:
>
> ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
>
> java.io.IOException: Cannot initialize Cluster. Please check your
> configuration for mapreduce.framework.name and the correspond server
> addresses.
>
>     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
>     at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
>
> Any idea?
>
> Thanks!
> Tomer
>
> On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen <rosenvi...@gmail.com> wrote:
>> If I recall, you should be able to start Hadoop MapReduce using
>> ~/ephemeral-hdfs/sbin/start-mapred.sh.
>>
>> On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini <tomer....@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I would like to copy log files from s3 to the cluster's
>>> ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
>>> running on the cluster - I'm getting the exception below.
>>>
>>> Is there a way to activate it, or is there a spark alternative to distcp?
>>>
>>> Thanks,
>>> Tomer
>>>
>>> mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
>>> org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
>>> Invalid "mapreduce.jobtracker.address" configuration value for
>>> LocalJobRunner : "XXX:9001"
>>>
>>> ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
>>>
>>> java.io.IOException: Cannot initialize Cluster. Please check your
>>> configuration for mapreduce.framework.name and the correspond server
>>> addresses.
>>>
>>>     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
>>>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
>>>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
>>>     at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
>>>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
>>>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
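One more option, in case bringing up MapReduce is a hassle: `hadoop fs -cp` runs entirely in the client JVM, so it needs no jobtracker at all. It is slower than distcp (no cluster parallelism), but fine for a moderate amount of log data. A hedged sketch, assuming the spark-ec2 layout; the bucket name and paths below are hypothetical placeholders:

```shell
# Hedged sketch: single-process S3-to-HDFS copy, no MapReduce required.
# "my-bucket" and both paths are hypothetical; substitute your own.
HADOOP="$HOME/ephemeral-hdfs/bin/hadoop"
SRC="s3n://my-bucket/logs/"       # hypothetical source bucket/prefix
DST="hdfs:///user/root/logs/"     # hypothetical destination in ephemeral-hdfs

if [ -x "$HADOOP" ]; then
  # fs -cp copies bytes through the client JVM only.
  "$HADOOP" fs -cp "$SRC" "$DST"
else
  echo "hadoop not found at $HADOOP; adjust the path for your cluster" >&2
fi
```

As for a Spark alternative: you can read with sc.textFile and write with saveAsTextFile, but that re-serializes records and changes file boundaries, so a filesystem-level copy is usually the better fit for raw logs.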