Add keys to column family in HBase using Python
Dear Hadoop experts,

I have a Hadoop cluster with Hive, HBase and other Hadoop components installed. I am currently exploring ways to automate a data migration process from Hive to HBase, in which new columns of data are added fairly often. I was able to create an HBase table from Hive and load data into it. Along the same lines I tried to add columns to the HBase table (from Hive) using the ALTER TABLE syntax, and I got the error message "ALTER TABLE cannot be used for a non-native table temp_testing". As an alternative I am trying to do this programmatically in Python, and I have explored the libraries HappyBase (https://happybase.readthedocs.org/en/latest/index.html) and starbase (http://pythonhosted.org//starbase/). These libraries provide functionality for creating and deleting tables, among other features, but none of them provides an option to add a key to a column family.

Does anybody know of a better way of achieving this with Python, either through libraries or other means?

Thanks in advance,
Manoj
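P.S. To make the question concrete, here is a minimal sketch of what I have in mind with HappyBase, assuming an HBase Thrift server on localhost:9090 and that 'temp_testing' already has a column family named 'cf' (the row key and column names below are placeholders). My understanding is that only column families are fixed at table-creation time; individual column qualifiers ("keys") inside a family are created implicitly by the first put that references them, so no schema change should be needed for that case.

    import happybase

    # Connect to the HBase Thrift server (host and port are assumptions for this sketch).
    connection = happybase.Connection('localhost', port=9090)
    table = connection.table('temp_testing')

    # Column families are declared when the table is created, but individual
    # column qualifiers are not declared anywhere: writing a value under
    # 'cf:new_column' effectively adds that key to the family.
    table.put(b'row-1', {b'cf:new_column': b'some value'})

    # Read the row back to confirm the new column is there.
    print(table.row(b'row-1'))
    connection.close()

Adding an entirely new column family, on the other hand, does require altering the table (for example with the 'alter' command in the HBase shell), and as far as I can tell HappyBase does not expose that.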
NodeManager was not connected to its ResourceManager
Dear Hadoop experts,

Firstly, thank you all for answering my previous question(s). Now, I have a Hadoop cluster of 8 nodes, and we use YARN
Re: hadoop cluster with non-uniform disk spec
I had a similar question recently. Please check out the balancer (http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer); it will balance the data across the nodes. See also the note after the quoted message below.

- Manoj

From: Chen Song <chen.song...@gmail.com>
Reply-To: user@hadoop.apache.org
Date: Wednesday, February 11, 2015 at 7:44 AM
To: user@hadoop.apache.org
Subject: hadoop cluster with non-uniform disk spec

We have a hadoop cluster consisting of 500 nodes, but the nodes are not uniform in terms of disk space. Half of the racks are newer, with 11 volumes of 1.1T on each node, while the other half have 5 volumes of 900GB on each node. dfs.datanode.fsdataset.volume.choosing.policy is set to org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy. It winds up in a state where half of the nodes are full while the other half are underutilized. I am wondering if there is a known solution for this problem. Thank you for any suggestions.

-- Chen Song
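A typical invocation, assuming the default settings, is simply 'hdfs balancer -threshold 10', where the threshold is the maximum allowed difference, in percent of capacity, between each DataNode's utilization and the overall cluster utilization (10 is the default, if I remember correctly). One caveat: dfs.datanode.fsdataset.volume.choosing.policy only controls how a single DataNode spreads new blocks across its own disks; moving existing blocks between nodes is the balancer's job.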
Re: Adding datanodes to Hadoop cluster - Will data redistribute?
Thank you all for answering; the hdfs balancer worked. Now the datanode capacity is more or less equally balanced. See the configuration note after the quoted thread below.

Regards,
Manoj

From: Arpit Agarwal <aagar...@hortonworks.com>
Reply-To: user@hadoop.apache.org
Date: Friday, February 6, 2015 at 3:07 PM
To: user@hadoop.apache.org
Subject: Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Hi Manoj,

Existing data is not automatically redistributed when you add new DataNodes. Take a look at the 'hdfs balancer' command, which can be run as a separate administrative tool to rebalance data distribution across DataNodes.

From: Manoj Venkatesh <manove...@gmail.com>
Reply-To: user@hadoop.apache.org
Date: Friday, February 6, 2015 at 11:34 AM
To: user@hadoop.apache.org
Subject: Adding datanodes to Hadoop cluster - Will data redistribute?

Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes; 6 were added during cluster creation and 2 additional nodes were added later to increase disk and CPU capacity. What I see is that processing is shared amongst all the nodes, whereas storage is reaching capacity on the original 6 nodes while the newly added machines still have a relatively large amount of unoccupied storage. I was wondering if there is an automated way, or any way at all, of redistributing data so that all the nodes are equally utilized. I have checked the configuration parameter dfs.datanode.fsdataset.volume.choosing.policy, which has the options 'Round Robin' and 'Available Space'; are there any other configurations which need to be reviewed?

Thanks,
Manoj
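For anyone who finds this thread later: if you want to tune the 'Available Space' policy itself, the related properties are, as far as I know, dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold and dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction. Note, however, that this policy only governs volume selection within a single DataNode, so it does not by itself redistribute data between old and new nodes; that is what the balancer does.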
Adding datanodes to Hadoop cluster - Will data redistribute?
Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes; 6 were added during cluster creation and 2 additional nodes were added later to increase disk and CPU capacity. What I see is that processing is shared amongst all the nodes, whereas storage is reaching capacity on the original 6 nodes while the newly added machines still have a relatively large amount of unoccupied storage. I was wondering if there is an automated way, or any way at all, of redistributing data so that all the nodes are equally utilized. I have checked the configuration parameter dfs.datanode.fsdataset.volume.choosing.policy, which has the options 'Round Robin' and 'Available Space'; are there any other configurations which need to be reviewed?

Thanks,
Manoj