Re: distcp on ec2 standalone spark cluster

Nicholas Chammas Sun, 07 Sep 2014 11:21:07 -0700

I think you need to run start-all.sh or something similar on the EC2
cluster. MapReduce is installed but is not running by default on EC2
clusters spun up by spark-ec2.
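
On the other half of the question (a Spark alternative to distcp), here is a
minimal sketch rather than a tested recipe. It assumes the logs are plain
text, that you don't need to preserve the original file names (Spark writes
part-NNNNN files), and that S3 credentials are already configured for Hadoop.
The function name copy_text_logs and the example paths are made up for
illustration; textFile and saveAsTextFile are the standard SparkContext / RDD
calls.

```python
def copy_text_logs(sc, src, dst):
    """Copy line-oriented text files from src to dst using Spark only.

    sc  : an existing SparkContext (e.g. the one the pyspark shell provides)
    src : input path or glob, e.g. an s3n:// URI (placeholder, not from the thread)
    dst : output directory, e.g. on ephemeral-hdfs; it must not exist yet
    """
    # Each input line becomes one RDD element; the data is read in
    # parallel across the workers, so no MapReduce cluster is needed.
    lines = sc.textFile(src)
    # Writes one part file per partition under dst; the original
    # file names and file boundaries are not preserved.
    lines.saveAsTextFile(dst)
    return dst

# Hypothetical usage from a pyspark shell, where `sc` already exists:
#   copy_text_logs(sc, "s3n://my-bucket/logs/*", "hdfs:///logs")
```

This is not a byte-for-byte copy like distcp, so it only fits text logs that
you plan to process with Spark afterwards anyway.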


On Sun, Sep 7, 2014 at 12:33 PM, Tomer Benyamini <tomer....@gmail.com>
wrote:

> I've installed a Spark standalone cluster on EC2 as described here -
> https://spark.apache.org/docs/latest/ec2-scripts.html. I'm not sure if
> MR1/MR2 is part of this installation.
>
>
> On Sun, Sep 7, 2014 at 7:25 PM, Ye Xianjin <advance...@gmail.com> wrote:
> > DistCp requires an MR1 (or MR2) cluster to run. Do you have a MapReduce
> > cluster running on your HDFS?
> > Also, from the error message, it seems that you didn't specify your
> > jobtracker address.
> >
> > --
> > Ye Xianjin
> > Sent with Sparrow
> >
> > On Sunday, September 7, 2014 at 9:42 PM, Tomer Benyamini wrote:
> >
> > Hi,
> >
> > I would like to copy log files from s3 to the cluster's
> > ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
> > running on the cluster - I'm getting the exception below.
> >
> > Is there a way to activate it, or is there a spark alternative to distcp?
> >
> > Thanks,
> > Tomer
> >
> > mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
> > org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
> > Invalid "mapreduce.jobtracker.address" configuration value for
> > LocalJobRunner : "XXX:9001"
> >
> > ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
> >
> > java.io.IOException: Cannot initialize Cluster. Please check your
> > configuration for mapreduce.framework.name and the correspond server
> > addresses.
> >
> >     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
> >     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:83)
> >     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:76)
> >     at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
> >     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
> >     at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >     at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> >
> >
