CDH5 MRV1 HA / YARN HA port assignment
Hi,

I am experimenting with CDH5 JobTracker HA and YARN HA configuration, and I have a question about the configuration example on the Cloudera web site:

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-High-Availability-Guide/cdh5hag_jt_ha_config.html

The example uses different ports for each service, even though the services run on different nodes:

mapred.jobtracker.rpc-address.logicaljt.jt1 = myjt1.myco.com:8021
mapred.jobtracker.rpc-address.logicaljt.jt2 = myjt2.myco.com:8022
mapred.job.tracker.http.address.logicaljt.jt1 = 0.0.0.0:50030
mapred.job.tracker.http.address.logicaljt.jt2 = 0.0.0.0:50031
mapred.ha.jobtracker.rpc-address.logicaljt.jt1 = myjt1.myco.com:8023
mapred.ha.jobtracker.rpc-address.logicaljt.jt2 = myjt2.myco.com:8024
mapred.ha.jobtracker.http-redirect-address.logicaljt.jt1 = myjt1.myco.com:50030
mapred.ha.jobtracker.http-redirect-address.logicaljt.jt2 = myjt2.myco.com:50031

Why do I need different ports when I use different nodes?

The YARN HA page on the same web site says:

"In an HA setting, you should configure two RMs to use different ports (for example, ports on different hosts)."

But the example there uses the same ports for the two resource managers. So do I need to use different ports on different nodes?

Regards
Hansi
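One way to reason about the question above: a TCP listener is identified by host and port together, so two daemons can only collide when they bind the same port on the same machine. The Python sketch below checks a set of listener addresses for such collisions. The helper name `find_port_conflicts`, the short keys, and the two layouts are hypothetical, made up for illustration; the sketch says nothing about whether CDH itself imposes further requirements.

```python
from collections import defaultdict


def find_port_conflicts(listeners):
    """Return {(host, port): [names]} for endpoints claimed more than once."""
    claimed = defaultdict(list)
    for name, address in listeners.items():
        host, _, port = address.rpartition(":")
        claimed[(host.lower(), int(port))].append(name)
    return {key: names for key, names in claimed.items() if len(names) > 1}


# Hypothetical layout: the same port numbers reused on two different hosts.
same_ports_two_hosts = {
    "rpc.jt1": "myjt1.myco.com:8021",
    "rpc.jt2": "myjt2.myco.com:8021",
    "ha-rpc.jt1": "myjt1.myco.com:8023",
    "ha-rpc.jt2": "myjt2.myco.com:8023",
}

# Hypothetical layout: both JobTrackers on one host, still reusing one port.
same_ports_one_host = {
    "rpc.jt1": "myjt1.myco.com:8021",
    "rpc.jt2": "myjt1.myco.com:8021",
}

print(find_port_conflicts(same_ports_two_hosts))
print(find_port_conflicts(same_ports_one_host))
```

Under these assumptions only the single-host layout reports a conflict, which is consistent with reading the distinct ports in the guide as a way to keep the example valid even when both daemons share a machine.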
Map job not finishing
Hi all,

I'm using Oozie to run a Hive script, but the map job is not completing. The tracking page shows its progress as 100%, and there are no warnings or errors in the logs; it just sits there with a state of 'RUNNING'.

As best I can make out from the logs, the last statement in the Hive script has been parsed successfully and it tries to start the command, saying "launching job 1 of 3". That job is sitting in the ACCEPTED state, but doing nothing.

This is on a single-node cluster running Hortonworks Data Platform 2.1. Can anyone suggest what might be the cause, or where else to look for diagnostic information?

Thanks,
Charles
Re: Map job not finishing
How many tasktrackers do you have set up for your single-node cluster? Oozie runs each action as a Java program on an arbitrary cluster node, so running a workflow requires a minimum of two tasktrackers.

On Fri, Sep 5, 2014 at 7:33 AM, Charles Robertson charles.robert...@gmail.com wrote:

Hi all,

I'm using Oozie to run a Hive script, but the map job is not completing. The tracking page shows its progress as 100%, and there are no warnings or errors in the logs; it just sits there with a state of 'RUNNING'.

As best I can make out from the logs, the last statement in the Hive script has been parsed successfully and it tries to start the command, saying "launching job 1 of 3". That job is sitting in the ACCEPTED state, but doing nothing.

This is on a single-node cluster running Hortonworks Data Platform 2.1. Can anyone suggest what might be the cause, or where else to look for diagnostic information?

Thanks,
Charles

--
*Kernighan's Law*
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
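For the "where else to look" part of the question: ACCEPTED is a YARN application state, so the ResourceManager REST API is one place to check whether the child job is waiting for capacity that the Oozie launcher is holding. The sketch below is a rough illustration, not a diagnosis. It assumes the ResourceManager web service is reachable at `http://localhost:8088` (an assumed address; adjust for your cluster), and the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed ResourceManager web address; adjust for your cluster.
RM_URL = "http://localhost:8088"


def fetch_json(path, rm_url=RM_URL):
    """GET a ResourceManager REST resource and decode its JSON body."""
    with urlopen(rm_url + path) as response:
        return json.load(response)


def waiting_apps(apps_payload):
    """Return (id, name) for every application still in the ACCEPTED state."""
    apps = (apps_payload.get("apps") or {}).get("app", [])
    return [(app["id"], app["name"]) for app in apps if app["state"] == "ACCEPTED"]


def free_memory_mb(metrics_payload):
    """Return the memory the scheduler still has available, in MB."""
    return metrics_payload["clusterMetrics"]["availableMB"]


def report(rm_url=RM_URL):
    """Print applications stuck in ACCEPTED and the remaining free memory."""
    apps = fetch_json("/ws/v1/cluster/apps", rm_url)
    metrics = fetch_json("/ws/v1/cluster/metrics", rm_url)
    for app_id, name in waiting_apps(apps):
        print("waiting:", app_id, name)
    print("available MB:", free_memory_mb(metrics))
```

Calling `report()` against a live cluster would list the waiting applications. If the launcher is RUNNING, the Hive job is ACCEPTED, and little or no memory is available, that would fit the capacity explanation given in the reply above.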
Re: Need some tutorials for Mapreduce written in Python
Hi,

The latest version of the document Sebastiano mentioned is available here:

http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopStreaming.html

Thanks,
- Tsuyoshi

On Fri, Sep 5, 2014 at 12:39 PM, Andrew Ehrlich and...@aehrlich.com wrote:

Also, when you look at examples, pay attention to the Hadoop version. The Java API has changed a bit, which can be confusing.

On Aug 28, 2014, at 10:10 AM, Amar Singh amarsingh...@gmail.com wrote:

Thank you to everyone who responded to this thread. I got a couple of good pointers and some good online courses to explore to build a fundamental understanding of things.

Thanks
Amar

On Thu, Aug 28, 2014 at 10:15 AM, Sriram Balachander sriram.balachan...@gmail.com wrote:

Hadoop: The Definitive Guide and Hadoop in Action are good books, and the course on edureka is also good.

Regards
Sriram

On Wed, Aug 27, 2014 at 9:25 PM, thejas prasad thejch...@gmail.com wrote:

Are there any books for this as well?

On Wed, Aug 27, 2014 at 8:30 PM, Marco Shaw marco.s...@gmail.com wrote:

You might want to consider the Hadoop course on udacity.com. I think it provides a decent foundation in Hadoop/MapReduce with a focus on Python (using the streaming API, as Sebastiano mentions).

Marco

On Wed, Aug 27, 2014 at 3:13 PM, Amar Singh amarsingh...@gmail.com wrote:

Hi Users,

I am new to the big data world and am in the process of reading some material on writing MapReduce in Python. Any links or pointers in that direction would be really helpful.

--
- Tsuyoshi
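As a concrete starting point for the streaming approach discussed in this thread, here is a minimal word-count sketch in Python. Hadoop Streaming passes records to the mapper and reducer on standard input as text lines and, by default, separates key and value with a tab. The file name `wordcount.py`, the function names, and the `map`/`reduce` command-line argument are assumptions made for this illustration.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Run as "python wordcount.py map" or "python wordcount.py reduce".
if __name__ == "__main__" and len(sys.argv) == 2:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

The script can be tried locally without a cluster by piping a text file through `python wordcount.py map`, then `sort`, then `python wordcount.py reduce`. On a cluster, the same two commands would be given as the mapper and reducer of the Hadoop Streaming jar, as described in the document linked above.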