Re: distcp on ec2 standalone spark cluster

2015-03-08 Thread Akhil Das
Did you follow these steps? https://wiki.apache.org/hadoop/AmazonS3  Also
make sure your jobtracker/mapreduce processes are running fine.
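A quick way to act on this advice from the master node (a sketch; the ephemeral-hdfs path follows the spark-ec2 layout, and the bucket name and embedded credentials are placeholders, not from this thread):

```shell
# Sanity-check S3 access before trying distcp: if a plain listing of the
# bucket hangs or errors, fix the S3 configuration first.
~/ephemeral-hdfs/bin/hadoop fs -ls s3n://AWS_KEY:AWS_SECRET@my-bucket/

# Then confirm the MapReduce daemons are up: look for JobTracker/TaskTracker
# (MRv1) or ResourceManager/NodeManager (YARN) in the output.
jps
```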

Thanks
Best Regards

On Sun, Mar 8, 2015 at 7:32 AM, roni roni.epi...@gmail.com wrote:

 Did you get this to work?
 I got past the issue with the cluster not starting.
 I am having a problem where distcp with an s3:// URI says incorrect folder path, and
 s3n:// hangs.
 stuck for 2 days :(
 Thanks
 -R



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/distcp-on-ec2-standalone-spark-cluster-tp13652p21957.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: distcp on ec2 standalone spark cluster

2015-03-07 Thread roni
Did you get this to work?
I got past the issue with the cluster not starting.
I am having a problem where distcp with an s3:// URI says incorrect folder path, and
s3n:// hangs.
stuck for 2 days :(
Thanks
-R
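For context on the two schemes: in Hadoop of this era, s3:// is the block filesystem (a bucket written by Hadoop in its own block format), while s3n:// reads ordinary S3 objects, which is what log files usually are. A hedged sketch of a distcp invocation over s3n (bucket, credentials, and paths are placeholders):

```shell
# Copy plain S3 objects into the cluster's HDFS via the s3n scheme.
# Credentials may be embedded in the URI (as here) or set in core-site.xml
# via fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey.
~/ephemeral-hdfs/bin/hadoop distcp \
  s3n://AWS_KEY:AWS_SECRET@my-bucket/logs/ \
  hdfs:///user/root/logs/
```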







Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Tomer Benyamini
~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;

I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
when trying to run distcp:

ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered

java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server
addresses.

at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)

at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)

at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)

at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)

at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)

at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)

Any idea?

Thanks!
Tomer

On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen rosenvi...@gmail.com wrote:
 If I recall, you should be able to start Hadoop MapReduce using
 ~/ephemeral-hdfs/sbin/start-mapred.sh.

 On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini tomer@gmail.com wrote:

 Hi,

 I would like to copy log files from s3 to the cluster's
 ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
 running on the cluster - I'm getting the exception below.

 Is there a way to activate it, or is there a spark alternative to distcp?

 Thanks,
 Tomer

 mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
 org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
 Invalid mapreduce.jobtracker.address configuration value for
 LocalJobRunner : XXX:9001

 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered

 java.io.IOException: Cannot initialize Cluster. Please check your
 configuration for mapreduce.framework.name and the correspond server
 addresses.

 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)

 at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)

 at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)

 at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)

 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

 at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)




Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Frank Austin Nothaft
Tomer,

To use distcp, you need to have a Hadoop compute cluster up. start-dfs just 
restarts HDFS. I don’t have a Spark 1.0.2 cluster up right now, but there 
should be a start-mapred*.sh or start-all.sh script that will launch the Hadoop 
MapReduce cluster that you will need for distcp.
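A sketch of how to find and run whichever start script this build ships (the script names are assumptions and vary across Hadoop and spark-ec2 versions):

```shell
# List the candidate start scripts, then launch the compute side.
ls ~/ephemeral-hdfs/sbin/ | grep -E 'start-(mapred|yarn|all)'
~/ephemeral-hdfs/sbin/start-all.sh   # starts HDFS and MapReduce daemons together
```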

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

On Sep 8, 2014, at 12:28 AM, Tomer Benyamini tomer@gmail.com wrote:

 ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;
 
 I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
 ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
 when trying to run distcp:
 
 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
 
 java.io.IOException: Cannot initialize Cluster. Please check your
 configuration for mapreduce.framework.name and the correspond server
 addresses.
 
 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
 
 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)
 
 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)
 
 at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
 
 at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
 
 at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
 
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 
 at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
 
 Any idea?
 
 Thanks!
 Tomer
 
 On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen rosenvi...@gmail.com wrote:
 If I recall, you should be able to start Hadoop MapReduce using
 ~/ephemeral-hdfs/sbin/start-mapred.sh.
 
 On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini tomer@gmail.com wrote:
 
 Hi,
 
 I would like to copy log files from s3 to the cluster's
 ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
 running on the cluster - I'm getting the exception below.
 
 Is there a way to activate it, or is there a spark alternative to distcp?
 
 Thanks,
 Tomer
 
 mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
 org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
 Invalid mapreduce.jobtracker.address configuration value for
 LocalJobRunner : XXX:9001
 
 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
 
 java.io.IOException: Cannot initialize Cluster. Please check your
 configuration for mapreduce.framework.name and the correspond server
 addresses.
 
 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
 
 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)
 
 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)
 
 at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
 
 at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
 
 at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
 
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 
 at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
 
 



Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Nicholas Chammas
Tomer,

Did you try start-all.sh? It worked for me the last time I tried using
distcp, and it worked for this guy too
http://stackoverflow.com/a/18083790/877069.

Nick

On Mon, Sep 8, 2014 at 3:28 AM, Tomer Benyamini tomer@gmail.com wrote:

 ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;

 I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
 ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
 when trying to run distcp:

 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered

 java.io.IOException: Cannot initialize Cluster. Please check your
 configuration for mapreduce.framework.name and the correspond server
 addresses.

 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)

 at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)

 at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)

 at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)

 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

 at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)

 Any idea?

 Thanks!
 Tomer

 On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen rosenvi...@gmail.com wrote:
  If I recall, you should be able to start Hadoop MapReduce using
  ~/ephemeral-hdfs/sbin/start-mapred.sh.
 
  On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini tomer@gmail.com
 wrote:
 
  Hi,
 
  I would like to copy log files from s3 to the cluster's
  ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
  running on the cluster - I'm getting the exception below.
 
  Is there a way to activate it, or is there a spark alternative to
 distcp?
 
  Thanks,
  Tomer
 
  mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
  org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
  Invalid mapreduce.jobtracker.address configuration value for
  LocalJobRunner : XXX:9001
 
  ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
 
  java.io.IOException: Cannot initialize Cluster. Please check your
  configuration for mapreduce.framework.name and the correspond server
  addresses.
 
  at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
 
  at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)
 
  at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)
 
  at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
 
  at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
 
  at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
 
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 
  at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
 




Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Tomer Benyamini
Still no luck, even when running stop-all.sh followed by start-all.sh.

On Mon, Sep 8, 2014 at 5:57 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 Tomer,

 Did you try start-all.sh? It worked for me the last time I tried using
 distcp, and it worked for this guy too.

 Nick


 On Mon, Sep 8, 2014 at 3:28 AM, Tomer Benyamini tomer@gmail.com wrote:

 ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;

 I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
 ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
 when trying to run distcp:

 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered

 java.io.IOException: Cannot initialize Cluster. Please check your
 configuration for mapreduce.framework.name and the correspond server
 addresses.

 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)

 at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)

 at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)

 at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)

 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

 at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)

 Any idea?

 Thanks!
 Tomer

 On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen rosenvi...@gmail.com wrote:
  If I recall, you should be able to start Hadoop MapReduce using
  ~/ephemeral-hdfs/sbin/start-mapred.sh.
 
  On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini tomer@gmail.com
  wrote:
 
  Hi,
 
  I would like to copy log files from s3 to the cluster's
  ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
  running on the cluster - I'm getting the exception below.
 
  Is there a way to activate it, or is there a spark alternative to
  distcp?
 
  Thanks,
  Tomer
 
  mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
  org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
  Invalid mapreduce.jobtracker.address configuration value for
  LocalJobRunner : XXX:9001
 
  ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
 
  java.io.IOException: Cannot initialize Cluster. Please check your
  configuration for mapreduce.framework.name and the correspond server
  addresses.
 
  at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
 
  at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)
 
  at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)
 
  at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
 
  at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
 
  at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
 
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 
  at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
 



Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Ye Xianjin
What did you see in the log? Was there anything related to MapReduce?
Can you log into your HDFS (data) node, use jps to list all Java processes, and
confirm whether there is a TaskTracker (or NodeManager) process running alongside
the DataNode process?
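Concretely, the check might look like this on a slave node (a sketch; jps ships with the JDK):

```shell
# List JVM processes with their full class names. A healthy compute node
# shows a TaskTracker (MRv1) or NodeManager (YARN) next to the DataNode;
# a DataNode alone means the compute side never started.
jps -l
```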


-- 
Ye Xianjin
Sent with Sparrow


On Monday, September 8, 2014 at 11:13 PM, Tomer Benyamini wrote:

 Still no luck, even when running stop-all.sh followed by start-all.sh.
 
 On Mon, Sep 8, 2014 at 5:57 PM, Nicholas Chammas
 nicholas.cham...@gmail.com (mailto:nicholas.cham...@gmail.com) wrote:
  Tomer,
  
  Did you try start-all.sh? It worked for me the last time I tried using
  distcp, and it worked for this guy too.
  
  Nick
  
  
  On Mon, Sep 8, 2014 at 3:28 AM, Tomer Benyamini tomer@gmail.com wrote:
   
    ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;
    
    I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
    ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
   when trying to run distcp:
   
   ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
   
   java.io.IOException: Cannot initialize Cluster. Please check your
    configuration for mapreduce.framework.name and the correspond server
   addresses.
   
   at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
   
   at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)
   
   at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)
   
   at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
   
   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
   
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
   
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   
   at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
   
   Any idea?
   
   Thanks!
   Tomer
   
    On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen rosenvi...@gmail.com wrote:
If I recall, you should be able to start Hadoop MapReduce using
~/ephemeral-hdfs/sbin/start-mapred.sh.

On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini tomer@gmail.com wrote:
 
 Hi,
 
 I would like to copy log files from s3 to the cluster's
 ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
 running on the cluster - I'm getting the exception below.
 
 Is there a way to activate it, or is there a spark alternative to
 distcp?
 
 Thanks,
 Tomer
 
 mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
 org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
 Invalid mapreduce.jobtracker.address configuration value for
 LocalJobRunner : XXX:9001
 
 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
 
 java.io.IOException: Cannot initialize Cluster. Please check your
  configuration for mapreduce.framework.name and the correspond server
 addresses.
 
 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
 
 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)
 
 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)
 
  at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
 
 at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
 
 at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
 
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 
 at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
 
 
 




Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Tomer Benyamini
No tasktracker or nodemanager. This is what I see:

On the master:

org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
org.apache.hadoop.hdfs.server.namenode.NameNode

On the data node (slave):

org.apache.hadoop.hdfs.server.datanode.DataNode



On Mon, Sep 8, 2014 at 6:39 PM, Ye Xianjin advance...@gmail.com wrote:
 What did you see in the log? Was there anything related to MapReduce?
 Can you log into your HDFS (data) node, use jps to list all Java processes, and
 confirm whether there is a TaskTracker (or NodeManager) process running alongside
 the DataNode process?

 --
 Ye Xianjin
 Sent with Sparrow

 On Monday, September 8, 2014 at 11:13 PM, Tomer Benyamini wrote:

 Still no luck, even when running stop-all.sh followed by start-all.sh.

 On Mon, Sep 8, 2014 at 5:57 PM, Nicholas Chammas
 nicholas.cham...@gmail.com wrote:

 Tomer,

 Did you try start-all.sh? It worked for me the last time I tried using
 distcp, and it worked for this guy too.

 Nick


 On Mon, Sep 8, 2014 at 3:28 AM, Tomer Benyamini tomer@gmail.com wrote:


 ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;

 I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
 ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
 when trying to run distcp:

 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered

 java.io.IOException: Cannot initialize Cluster. Please check your
 configuration for mapreduce.framework.name and the correspond server
 addresses.

 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)

 at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)

 at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)

 at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)

 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

 at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)

 Any idea?

 Thanks!
 Tomer

 On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen rosenvi...@gmail.com wrote:

 If I recall, you should be able to start Hadoop MapReduce using
 ~/ephemeral-hdfs/sbin/start-mapred.sh.

 On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini tomer@gmail.com
 wrote:


 Hi,

 I would like to copy log files from s3 to the cluster's
 ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
 running on the cluster - I'm getting the exception below.

 Is there a way to activate it, or is there a spark alternative to
 distcp?

 Thanks,
 Tomer

 mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
 org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
 Invalid mapreduce.jobtracker.address configuration value for
 LocalJobRunner : XXX:9001

 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered

 java.io.IOException: Cannot initialize Cluster. Please check your
 configuration for mapreduce.framework.name and the correspond server
 addresses.

 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)

 at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)

 at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)

 at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)

 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

 at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)




Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Ye Xianjin
Well, this means you didn't start a compute cluster, most likely because the wrong
value of mapreduce.jobtracker.address prevented the slave node from starting the
node manager. (I am not familiar with the ec2 script, so I don't know whether the
slave node has a node manager installed or not.)
Can you check the Hadoop daemon log on the slave node to see whether the
nodemanager started but failed, or whether there was no nodemanager to start? The
log file location defaults to /var/log/hadoop-xxx if my memory is correct.
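A sketch of that check on the slave (the log directories are assumptions; adjust to wherever this AMI actually writes its Hadoop logs):

```shell
# Look for daemon logs in the usual places, then scan for NodeManager /
# TaskTracker startup lines or failures.
ls -d /var/log/hadoop* ~/ephemeral-hdfs/logs 2>/dev/null
grep -ril 'nodemanager\|tasktracker' ~/ephemeral-hdfs/logs 2>/dev/null
```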

Sent from my iPhone

 On September 9, 2014, at 0:08, Tomer Benyamini tomer@gmail.com wrote:
 
 No tasktracker or nodemanager. This is what I see:
 
 On the master:
 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
 org.apache.hadoop.hdfs.server.namenode.NameNode
 
 On the data node (slave):
 
 org.apache.hadoop.hdfs.server.datanode.DataNode
 
 
 
 On Mon, Sep 8, 2014 at 6:39 PM, Ye Xianjin advance...@gmail.com wrote:
 What did you see in the log? Was there anything related to MapReduce?
 Can you log into your HDFS (data) node, use jps to list all Java processes, and
 confirm whether there is a TaskTracker (or NodeManager) process running alongside
 the DataNode process?
 
 --
 Ye Xianjin
 Sent with Sparrow
 
 On Monday, September 8, 2014 at 11:13 PM, Tomer Benyamini wrote:
 
 Still no luck, even when running stop-all.sh followed by start-all.sh.
 
 On Mon, Sep 8, 2014 at 5:57 PM, Nicholas Chammas
 nicholas.cham...@gmail.com wrote:
 
 Tomer,
 
 Did you try start-all.sh? It worked for me the last time I tried using
 distcp, and it worked for this guy too.
 
 Nick
 
 
 On Mon, Sep 8, 2014 at 3:28 AM, Tomer Benyamini tomer@gmail.com wrote:
 
 
 ~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2;
 
 I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and
 ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error
 when trying to run distcp:
 
 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
 
 java.io.IOException: Cannot initialize Cluster. Please check your
 configuration for mapreduce.framework.name and the correspond server
 addresses.
 
 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
 
 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)
 
 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)
 
 at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
 
 at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
 
 at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
 
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 
 at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
 
 Any idea?
 
 Thanks!
 Tomer
 
 On Sun, Sep 7, 2014 at 9:27 PM, Josh Rosen rosenvi...@gmail.com wrote:
 
 If I recall, you should be able to start Hadoop MapReduce using
 ~/ephemeral-hdfs/sbin/start-mapred.sh.
 
 On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini tomer@gmail.com
 wrote:
 
 
 Hi,
 
 I would like to copy log files from s3 to the cluster's
 ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
 running on the cluster - I'm getting the exception below.
 
 Is there a way to activate it, or is there a spark alternative to
 distcp?
 
 Thanks,
 Tomer
 
 mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
 org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
 Invalid mapreduce.jobtracker.address configuration value for
 LocalJobRunner : XXX:9001
 
 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
 
 java.io.IOException: Cannot initialize Cluster. Please check your
 configuration for mapreduce.framework.name and the correspond server
 addresses.
 
 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
 
 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)
 
 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)
 
 at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
 
 at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
 
 at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
 
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 
 at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
 

distcp on ec2 standalone spark cluster

2014-09-07 Thread Tomer Benyamini
Hi,

I would like to copy log files from s3 to the cluster's
ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
running on the cluster - I'm getting the exception below.

Is there a way to activate it, or is there a spark alternative to distcp?

Thanks,
Tomer

mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
Invalid mapreduce.jobtracker.address configuration value for
LocalJobRunner : XXX:9001

ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered

java.io.IOException: Cannot initialize Cluster. Please check your
configuration for mapreduce.framework.name and the correspond server
addresses.

at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)

at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)

at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)

at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)

at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)

at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
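One Spark-only workaround, if MapReduce refuses to come up: read the objects with Spark and write them back to HDFS. A minimal sketch (paths and credentials are placeholders; note it re-encodes text files line by line rather than doing a byte-for-byte copy):

```shell
# Hypothetical alternative to distcp, piping a snippet into spark-shell
# on the master.
cat <<'EOF' | ~/spark/bin/spark-shell
val logs = sc.textFile("s3n://AWS_KEY:AWS_SECRET@my-bucket/logs/")
logs.saveAsTextFile("hdfs:///user/root/logs-copy/")
EOF
```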




Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Tomer Benyamini
I've installed a spark standalone cluster on ec2 as defined here -
https://spark.apache.org/docs/latest/ec2-scripts.html. I'm not sure if
mr1/2 is part of this installation.


On Sun, Sep 7, 2014 at 7:25 PM, Ye Xianjin advance...@gmail.com wrote:
 Distcp requires an MR1 (or MR2) cluster to start. Do you have a MapReduce
 cluster on your HDFS?
 And from the error message, it seems that you didn't specify your jobtracker
 address.

 --
 Ye Xianjin
 Sent with Sparrow

 On Sunday, September 7, 2014 at 9:42 PM, Tomer Benyamini wrote:

 Hi,

 I would like to copy log files from s3 to the cluster's
 ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
 running on the cluster - I'm getting the exception below.

 Is there a way to activate it, or is there a spark alternative to distcp?

 Thanks,
 Tomer

 mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
 org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
 Invalid mapreduce.jobtracker.address configuration value for
 LocalJobRunner : XXX:9001

 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered

 java.io.IOException: Cannot initialize Cluster. Please check your
 configuration for mapreduce.framework.name and the correspond server
 addresses.

 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)

 at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)

 at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)

 at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)

 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

 at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)




Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Nicholas Chammas
I think you need to run start-all.sh or something similar on the EC2
cluster. MR is installed but is not running by default on EC2 clusters spun
up by spark-ec2.

On Sun, Sep 7, 2014 at 12:33 PM, Tomer Benyamini tomer@gmail.com
wrote:

 I've installed a spark standalone cluster on ec2 as defined here -
 https://spark.apache.org/docs/latest/ec2-scripts.html. I'm not sure if
 mr1/2 is part of this installation.


 On Sun, Sep 7, 2014 at 7:25 PM, Ye Xianjin advance...@gmail.com wrote:
  Distcp requires an MR1 (or MR2) cluster to start. Do you have a MapReduce
  cluster on your HDFS?
  And from the error message, it seems that you didn't specify your
 jobtracker
  address.
 
  --
  Ye Xianjin
  Sent with Sparrow
 
  On Sunday, September 7, 2014 at 9:42 PM, Tomer Benyamini wrote:
 
  Hi,
 
  I would like to copy log files from s3 to the cluster's
  ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
  running on the cluster - I'm getting the exception below.
 
  Is there a way to activate it, or is there a spark alternative to distcp?
 
  Thanks,
  Tomer
 
  mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
  org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
  Invalid mapreduce.jobtracker.address configuration value for
  LocalJobRunner : XXX:9001
 
  ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
 
  java.io.IOException: Cannot initialize Cluster. Please check your
  configuration for mapreduce.framework.name and the correspond server
  addresses.
 
  at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)
 
  at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)
 
  at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)
 
  at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)
 
  at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)
 
  at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)
 
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 
  at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
 




Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Josh Rosen
If I recall, you should be able to start Hadoop MapReduce using
~/ephemeral-hdfs/sbin/start-mapred.sh.

On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini tomer@gmail.com wrote:

 Hi,

 I would like to copy log files from s3 to the cluster's
 ephemeral-hdfs. I tried to use distcp, but I guess mapred is not
 running on the cluster - I'm getting the exception below.

 Is there a way to activate it, or is there a spark alternative to distcp?

 Thanks,
 Tomer

 mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use
 org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
 Invalid mapreduce.jobtracker.address configuration value for
 LocalJobRunner : XXX:9001

 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered

 java.io.IOException: Cannot initialize Cluster. Please check your
 configuration for mapreduce.framework.name and the correspond server
 addresses.

 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:121)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:83)

 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:76)

 at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:352)

 at org.apache.hadoop.tools.DistCp.execute(DistCp.java:146)

 at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)

 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

 at org.apache.hadoop.tools.DistCp.main(DistCp.java:374)
