Re: Installing both hadoop1 and hadoop2 on single node
Thanks.. I set it accordingly..

On Monday, 5 May 2014, Shengjun Xin s...@gopivotal.com wrote: I think you need to set two different sets of environment variables for hadoop1 and hadoop2, such as HADOOP_HOME and HADOOP_CONF_DIR, and before you run a hadoop command, you need to make sure the correct environment variables are enabled.

On Mon, May 5, 2014 at 12:36 PM, chandra kant chandralakshmikan...@gmail.com wrote: Hi, Is it possible to install both hadoop1 and hadoop2 side by side on a single node, for development purposes, so that I can choose either one by just shutting down one and starting the other? I installed hadoop 1.2.1 and ran it successfully. Next, when I try to do hdfs namenode -format from hadoop 2.2.0, it tries to format the hadoop.tmp.dir set up by hadoop 1.2.1, which is clearly not desirable. I set dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml to different locations, but again the same problem... Any suggestions? -- Chandra

-- Regards Shengjun
[Blog] Map-only jobs in Hadoop for beginners
Hi, http://www.unmeshasreeveni.blogspot.in/2014/05/map-only-jobs-in-hadoop.html This is a post on map-only jobs in Hadoop for beginners. -- Thanks & Regards, Unmesha Sreeveni U.B, Hadoop, Bigdata Developer, Center for Cyber Security | Amrita Vishwa Vidyapeetham, http://www.unmeshasreeveni.blogspot.in/
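For readers of the post: a map-only job boils down to setting the number of reducers to zero, so map output is written directly to HDFS with no shuffle or sort. A minimal driver sketch (class names and paths are placeholders, not the blog's code; needs the Hadoop 2.x client jars on the classpath):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {

    // Trivial mapper: passes each input line through unchanged.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only example");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        job.setNumReduceTasks(0); // the key line: zero reducers makes the job map-only
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The same effect can also be had without code changes by passing -D mapreduce.job.reduces=0 on the command line.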
Regarding a shell for hadoop with command completion and easier directory listing
Hi, Does hadoop provide a shell with command completion and state awareness, i.e. something that keeps track of the current working directory in the hadoop file system, the way a unix shell does? If not, are any open source tools available? Regards, James Arivazhagan Ponnusamy
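The stock hdfs dfs shell has no notion of a current working directory, but a small wrapper can fake one. A sketch (function names are made up for this example; the hdfs binary must be on PATH for hls to actually list anything):

```shell
# Keep a "current HDFS directory" in a shell variable and resolve paths against it.
HDFS_PWD="/"

hcd() {
  case "$1" in
    /*) HDFS_PWD="$1" ;;                       # absolute path
    ..) HDFS_PWD="$(dirname "$HDFS_PWD")" ;;   # go up one level
    *)  HDFS_PWD="${HDFS_PWD%/}/$1" ;;         # relative path
  esac
}

hpwd() { echo "$HDFS_PWD"; }

# List the tracked directory (or an explicit argument) on HDFS.
hls() { hdfs dfs -ls "${1:-$HDFS_PWD}"; }
```

Source these functions in .bashrc and hcd/hls behave roughly like cd/ls against HDFS; combined with bash programmable completion this gets close to the unix-shell feel the question asks about.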
Re: Installing both hadoop1 and hadoop2 on single node
I mean, I tried that already. In my .bash_profile, I set HADOOP_HOME, HADOOP_MAPRED_HOME, HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_YARN_HOME, and HADOOP_CONF_DIR to point to the hadoop2 directory, and similarly I have HADOOP_PREFIX and HADOOP_HOME for hadoop-1. So I comment out the hadoop-2 specific conf lines in .bash_profile while running hadoop-1, and vice versa. And here the problem comes: when trying to run hadoop-2, it asks to format the hadoop.tmp.dir set up by hadoop-1. Just for context: I have done a fair amount of work on hadoop-1 and this is my first attempt at hadoop-2. -- Chandra

On Mon, May 5, 2014 at 11:31 AM, chandra kant chandralakshmikan...@gmail.com wrote: Thanks.. I set it accordingly..
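One way to avoid commenting lines in and out of .bash_profile is to wrap each version's variables in a function and call whichever one is needed. A sketch, with placeholder install directories:

```shell
# Switch between two side-by-side Hadoop installs; /opt paths are placeholders.
use_hadoop1() {
  export HADOOP_PREFIX=/opt/hadoop-1.2.1
  export HADOOP_HOME="$HADOOP_PREFIX"
  export HADOOP_CONF_DIR="$HADOOP_HOME/conf"
  export PATH="$HADOOP_HOME/bin:$PATH"
}

use_hadoop2() {
  export HADOOP_HOME=/opt/hadoop-2.2.0
  export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
  export HADOOP_MAPRED_HOME="$HADOOP_HOME"
  export HADOOP_COMMON_HOME="$HADOOP_HOME"
  export HADOOP_HDFS_HOME="$HADOOP_HOME"
  export HADOOP_YARN_HOME="$HADOOP_HOME"
  export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
}
```

Note that switching the environment only selects which binaries and configs are used; the two installs still need disjoint on-disk HDFS directories in their configuration, or formatting one will clobber the other.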
Reading from jar file in local filesystem using hadoop FileSystem
Hey guys, I'm new here. I asked the following question on SO: http://stackoverflow.com/questions/23478746/reading-from-jar-file-in-local-filesystem-using-hadoop-filesystem but I figured someone here might have a better idea. Any help would be greatly appreciated. -- Diego Fernandez - 爱国
Are mapper classes re-instantiated for each record?
Let's say I have a TaskTracker that receives 5 records to process for a single job. When the TaskTracker processes the first record, it will instantiate my Mapper class and execute my setup() function. It will then run the map() method on that record. My question is this: what happens when the map() method has finished processing the first record? I'm guessing it will do one of two things:

1) My cleanup() function will execute. After the cleanup() method has finished, this instance of the Mapper object will be destroyed. When it is time to process the next record, a new Mapper object will be instantiated, my setup() method will execute, the map() method will execute, the cleanup() method will execute, and then that Mapper instance will be destroyed in turn. This process will repeat itself until all 5 records have been processed. In other words, my setup() and cleanup() methods will have been executed 5 times each.

or

2) When the map() method has finished processing my first record, the Mapper instance will NOT be destroyed. It will be reused for all 5 records. When the map() method has finished processing the last record, my cleanup() method will execute. In other words, my setup() and cleanup() methods will only execute 1 time each.

Thanks for the help!
issue about cluster balance
hi, maillist: I have a 5-node hadoop cluster, and yesterday I added 5 new boxes to the cluster. After that I started a balance task, but it moved only 7% of the data to the new nodes in 20 hours, and I have already set dfs.datanode.balance.bandwidthPerSec to 10M, and the threshold is 10%. Why does the balance task take so long?
Re: Installing both hadoop1 and hadoop2 on single node
According to your description, I think it is still a configuration problem. Before you run the hadoop command, did you check the hadoop version and the hadoop environment variables? Are they what you want?

On Mon, May 5, 2014 at 11:42 PM, chandra kant chandralakshmikan...@gmail.com wrote: I mean, I tried that already. In my .bash_profile, I set HADOOP_HOME, HADOOP_MAPRED_HOME, HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_YARN_HOME, and HADOOP_CONF_DIR to point to the hadoop2 directory, and similarly I have HADOOP_PREFIX and HADOOP_HOME for hadoop-1. So I comment out the hadoop-2 specific conf lines in .bash_profile while running hadoop-1, and vice versa. And here the problem comes: when trying to run hadoop-2, it asks to format the hadoop.tmp.dir set up by hadoop-1. Just for context: I have done a fair amount of work on hadoop-1 and this is my first attempt at hadoop-2. -- Chandra

-- Regards Shengjun
Yarn HA - ZooKeeper ACL for fencing
Hi, For yarn.resourcemanager.zk-state-store.root-node.acl, yarn-default.xml says: "For fencing to work, the ACLs should be carefully set differently on each ResourceManager such that all the ResourceManagers have shared admin access and the Active ResourceManager takes over (exclusively) the create-delete access." Can someone give an actual example of such permissions? Thanks,
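A sketch of what this could look like with ZooKeeper digest ACLs (the usernames and BASE64HASH digests below are placeholders, not values from any real deployment). The idea from the quoted description: each RM grants the other read/write/admin ("rwa") but keeps create/delete to itself ("cdrwa" on its own identity), so an RM that becomes active can delete the old active's fencing node while the standby cannot interfere:

```xml
<!-- yarn-site.xml on rm1 (placeholder digest credentials) -->
<property>
  <name>yarn.resourcemanager.zk-state-store.root-node.acl</name>
  <value>digest:rm1:BASE64HASH1:cdrwa,digest:rm2:BASE64HASH2:rwa</value>
</property>

<!-- yarn-site.xml on rm2 (placeholder digest credentials) -->
<property>
  <name>yarn.resourcemanager.zk-state-store.root-node.acl</name>
  <value>digest:rm2:BASE64HASH2:cdrwa,digest:rm1:BASE64HASH1:rwa</value>
</property>
```

The digests would be generated with ZooKeeper's DigestAuthenticationProvider from each RM's username:password pair; in a Kerberized cluster sasl ACLs can play the same role. Treat this as an illustration of the ACL shape, not a drop-in config.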
Re: Installing both hadoop1 and hadoop2 on single node
Please be sure to use a different HADOOP_CONF_DIR for each of the two versions; and also, in the configuration, be sure to use different folders to store the HDFS related files.

Regards, Stanley Shi

On Tue, May 6, 2014 at 8:41 AM, Shengjun Xin s...@gopivotal.com wrote: According to your description, I think it is still a configuration problem. Before you run the hadoop command, did you check the hadoop version and the hadoop environment variables? Are they what you want? -- Regards Shengjun
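The collision described in this thread usually traces back to hadoop.tmp.dir: when neither install overrides it, both default to the same location under /tmp, and several dfs.* directories default to paths beneath it. A sketch of keeping the installs apart in each core-site.xml (the /data paths are placeholders):

```xml
<!-- core-site.xml of the hadoop-1.2.1 install (placeholder path) -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop1/tmp</value>
</property>

<!-- core-site.xml of the hadoop-2.2.0 install (placeholder path) -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop2/tmp</value>
</property>
```

With distinct hadoop.tmp.dir values (plus distinct name/data dirs in each hdfs-site.xml, as already attempted above), formatting one version's namenode should no longer touch the other's storage.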
RE: issue about cluster balance
Could you give more details, like:
- Could you convert the 7% to the total amount of moved data in MB?
- Also, could you tell me the data movement per DN?
- What values are shown for the 'over-utilized', 'above-average', 'below-average', and 'under-utilized' nodes? The balancer will do the pairing based on these values.
- Please tell me the cluster topology - SAME_NODE_GROUP, SAME_RACK. Basically this matters when choosing the sourceNode vs balancerNode pairs, as well as the proxy source. Did you see all the DNs being utilized for the block movement?
- Were there any exceptions during block movement?
- How many iterations ran in these hours?
-Rakesh

From: ch huang [mailto:justlo...@gmail.com] Sent: 06 May 2014 06:10 To: user@hadoop.apache.org Subject: issue about cluster balance
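For reference, the two knobs discussed in this thread as commands. Note that dfs.datanode.balance.bandwidthPerSec is specified in bytes per second, so "10M" may not be parsed as 10 MB/s on older releases; the explicit byte value below is 10 MB/s and is only an example figure:

```shell
# Raise the per-datanode balancing bandwidth at runtime
# (takes effect on live datanodes without a restart):
hdfs dfsadmin -setBalancerBandwidth 10485760

# Run the balancer with a 10% utilization threshold:
hdfs balancer -threshold 10
```

If the balancer still crawls, the bandwidth cap and the number of concurrent moves per iteration are the usual suspects; the balancer's per-iteration log lines show how much it planned versus actually moved.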
Specifying a node/host name on which the Application Master should run
Hi, Is there a way to specify a host name on which we want to run our ApplicationMaster? Can we do this when it is being launched from the YarnClient? Thanks, Kishore
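One possible route, on Hadoop releases that expose ApplicationSubmissionContext#setAMContainerResourceRequest: attach a ResourceRequest for the AM container that names the desired host and disables locality relaxation. A sketch under those assumptions (the host name, memory, and vcore figures are placeholders, and whether the scheduler honors strict AM locality depends on the scheduler configuration):

```java
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class AmPlacement {
    // Ask the RM to place the AM container on one specific node.
    static void pinAmToHost(ApplicationSubmissionContext appContext) {
        Resource capability = Resource.newInstance(1024, 1); // 1 GB, 1 vcore for the AM
        ResourceRequest amRequest = ResourceRequest.newInstance(
                Priority.newInstance(0),   // AM request priority
                "node17.example.com",      // placeholder: the host we want the AM on
                capability,
                1);                         // exactly one AM container
        amRequest.setRelaxLocality(false);  // do not fall back to other hosts
        appContext.setAMContainerResourceRequest(amRequest);
    }
}
```

The populated context would then be passed to YarnClient#submitApplication as usual. Treat this as a sketch to verify against the API of the Hadoop version in use, not a confirmed recipe.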
Re: Are mapper classes re-instantiated for each record?
Hi Jeremy, According to the official documentation http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html the setup and cleanup calls are performed once per InputSplit, so your variant 2 is the correct one. Keep in mind that a separate map task (with its own Mapper instance) handles each InputSplit: if your 5 records are spread over 5 files with 1 record each, setup/cleanup can be called 5 times; but if all your records are in a single file, setup/cleanup should be called only once. -- Thanks, Sergey
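This contract is explicit in Hadoop's Mapper.run() source: setup() once, then map() for each key/value pair the task's RecordReader yields, then cleanup() once. A simplified, self-contained sketch of that loop (it mirrors the shape of the real Mapper.run(Context) but uses a plain list of strings instead of a Context and RecordReader):

```java
import java.util.Arrays;
import java.util.List;

// Simplified mirror of org.apache.hadoop.mapreduce.Mapper.run(): one map task
// serves one InputSplit with one Mapper instance, so setup() and cleanup()
// run once per task while map() runs once per record.
class SketchMapper {
    int setupCalls, mapCalls, cleanupCalls;

    void setup()         { setupCalls++; }
    void map(String rec) { mapCalls++; }
    void cleanup()       { cleanupCalls++; }

    // Same shape as the real Mapper.run(Context) loop.
    void run(List<String> split) {
        setup();
        try {
            for (String record : split) {
                map(record);
            }
        } finally {
            cleanup();
        }
    }
}

public class MapperLifecycle {
    public static void main(String[] args) {
        SketchMapper m = new SketchMapper();
        m.run(Arrays.asList("r1", "r2", "r3", "r4", "r5")); // one split, 5 records
        System.out.println("setup=" + m.setupCalls
                + " map=" + m.mapCalls
                + " cleanup=" + m.cleanupCalls); // prints setup=1 map=5 cleanup=1
    }
}
```

So for 5 records in one split, the counts come out setup=1, map=5, cleanup=1, matching variant 2 of the question.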