CDH5 MRV1 HA / YARN HA port assignment

2014-09-05 Thread Hansi Klose
Hi,

I am playing with CDH5 jobtracker HA and YARN HA configuration.

I am wondering about the configuration example on their web site:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-High-Availability-Guide/cdh5hag_jt_ha_config.html

For each service, they use different ports on the different nodes.

Like 

mapred.jobtracker.rpc-address.logicaljt.jt1 = myjt1.myco.com:8021
mapred.jobtracker.rpc-address.logicaljt.jt2 = myjt2.myco.com:8022

mapred.job.tracker.http.address.logicaljt.jt1 = 0.0.0.0:50030
mapred.job.tracker.http.address.logicaljt.jt2 = 0.0.0.0:50031

mapred.ha.jobtracker.rpc-address.logicaljt.jt1 = myjt1.myco.com:8023
mapred.ha.jobtracker.rpc-address.logicaljt.jt2 = myjt2.myco.com:8024

mapred.ha.jobtracker.http-redirect-address.logicaljt.jt1 = myjt1.myco.com:50030
mapred.ha.jobtracker.http-redirect-address.logicaljt.jt2 = myjt2.myco.com:50031

Why do I need different ports when I use different nodes?


On their web site about YARN HA they write
 In an HA setting, you should configure two RMs to use different ports (for 
 example, ports on different hosts).

But in the example they use the same ports for the two resource managers.

So do I need to use different ports on different nodes?

Regards Hansi
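
To make the question concrete, here is a minimal sketch that checks only for
socket-level collisions, assuming that two listeners clash only when they
claim the same host and the same port. It uses the host-qualified RPC
addresses from the example; the helper name find_collisions and the
SAME_PORTS variant are hypothetical, and the 0.0.0.0 HTTP entries are left
out because a wildcard bind applies to whichever node reads the setting.
The sketch says nothing about whether CDH itself needs distinct ports for
some other reason.

```python
from collections import defaultdict

# Host-qualified RPC addresses copied from the example configuration.
EXAMPLE = {
    "mapred.jobtracker.rpc-address.logicaljt.jt1": "myjt1.myco.com:8021",
    "mapred.jobtracker.rpc-address.logicaljt.jt2": "myjt2.myco.com:8022",
    "mapred.ha.jobtracker.rpc-address.logicaljt.jt1": "myjt1.myco.com:8023",
    "mapred.ha.jobtracker.rpc-address.logicaljt.jt2": "myjt2.myco.com:8024",
}

# Hypothetical variant: jt2 reuses the port numbers that jt1 uses.
SAME_PORTS = {
    "mapred.jobtracker.rpc-address.logicaljt.jt1": "myjt1.myco.com:8021",
    "mapred.jobtracker.rpc-address.logicaljt.jt2": "myjt2.myco.com:8021",
    "mapred.ha.jobtracker.rpc-address.logicaljt.jt1": "myjt1.myco.com:8023",
    "mapred.ha.jobtracker.rpc-address.logicaljt.jt2": "myjt2.myco.com:8023",
}


def find_collisions(props):
    """Return {(host, port): [property names]} for endpoints claimed twice."""
    owners = defaultdict(list)
    for name, address in props.items():
        # Split on the last colon so the host part stays intact.
        host, _, port = address.rpartition(":")
        owners[(host, int(port))].append(name)
    return {ep: names for ep, names in owners.items() if len(names) > 1}


print(find_collisions(EXAMPLE))
print(find_collisions(SAME_PORTS))
```

Under that assumption neither layout reports a collision, because no
(host, port) pair is claimed by more than one property.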


Map job not finishing

2014-09-05 Thread Charles Robertson
Hi all,

I'm using Oozie to run a Hive script, but the map job is not completing.
The tracking page shows its progress as 100%, and there are no warnings or
errors in the logs; it's just sitting there with a state of 'RUNNING'.

As best I can make out from the logs, the last statement in the hive script
has been successfully parsed and it tries to start the command, saying
launching job 1 of 3. That job is sitting there in the ACCEPTED state,
but doing nothing.

This is on a single-node cluster running Hortonworks Data Platform 2.1. Can
anyone suggest what might be the cause, or where else to look for
diagnostic information?

Thanks,
Charles


Re: Map job not finishing

2014-09-05 Thread Rich Haase
How many tasktrackers do you have set up for your single-node cluster?
Oozie runs each action as a Java program on an arbitrary cluster node, so
running a workflow requires a minimum of two tasktrackers.


-- 
*Kernighan's Law*
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.


Re: Need some tutorials for Mapreduce written in Python

2014-09-05 Thread Tsuyoshi OZAWA
Hi,

The latest version of the document Sebastiano mentioned is available here:
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopStreaming.html
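
For a first feel of what a streaming job in Python looks like, here is a
minimal word-count sketch. It assumes the usual streaming convention of one
record per line with the key and value separated by a tab, and it assumes
the reducer receives its input sorted by key. The function names map_lines,
reduce_lines and run_locally are hypothetical; run_locally only imitates the
sort between the two phases so the logic can be tried without a cluster.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive records that share a key.

    Relies on the input already being sorted by key.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Imitate map -> sort -> reduce in a single process for testing."""
    return list(reduce_lines(sorted(map_lines(lines))))


for record in run_locally(["to be or not to be", "be quick"]):
    print(record)
```

In a real streaming job each function would live in its own script that
reads sys.stdin and prints its records, and the two scripts would be passed
to the streaming jar through its -mapper and -reducer options.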

Thanks,
- Tsuyoshi

On Fri, Sep 5, 2014 at 12:39 PM, Andrew Ehrlich and...@aehrlich.com wrote:
 Also, when you look at examples, pay attention to the Hadoop version. The Java
 API has changed a bit, which can be confusing.

 On Aug 28, 2014, at 10:10 AM, Amar Singh amarsingh...@gmail.com wrote:

 Thank you to everyone who responded to this thread. I got a couple of good
 pointers and some good online courses to explore to get a fundamental
 understanding of things.

 Thanks
 Amar


 On Thu, Aug 28, 2014 at 10:15 AM, Sriram Balachander
 sriram.balachan...@gmail.com wrote:

 Hadoop: The Definitive Guide and Hadoop in Action are good books, and the
 course on edureka is also good.

 Regards
 Sriram


 On Wed, Aug 27, 2014 at 9:25 PM, thejas prasad thejch...@gmail.com
 wrote:

 Are there any books for this as well?



 On Wed, Aug 27, 2014 at 8:30 PM, Marco Shaw marco.s...@gmail.com wrote:

 You might want to consider the Hadoop course on udacity.com.  I think it
 provides a decent foundation to Hadoop/MapReduce with a focus on Python
 (using the streaming API like Sebastiano mentions).

 Marco


 On Wed, Aug 27, 2014 at 3:13 PM, Amar Singh amarsingh...@gmail.com
 wrote:

 Hi Users,
 I am new to the big data world and am in the process of reading some material
 on writing MapReduce jobs in Python.

 Any links or pointers in that direction will be really helpful.








