Add keys to column family in HBase using Python

2015-04-15 Thread Manoj Venkatesh
Dear Hadoop experts,

I have a Hadoop cluster which has Hive and HBase installed along with other Hadoop 
components.  I am currently exploring ways to automate a data migration process 
from Hive to HBase which involves new columns of data being added every so often.  I 
was successful in creating an HBase table using Hive and loading data into the 
HBase table. Along the same lines, I tried to add columns to the HBase table (from 
Hive) using the ALTER TABLE syntax, and I got the error message: ALTER TABLE 
cannot be used for a non-native table temp_testing.

As an alternative, I am also trying to do this programmatically using 
Python. I have explored the libraries 
HappyBase (https://happybase.readthedocs.org/en/latest/index.html) and 
starbase (http://pythonhosted.org//starbase/). These libraries provide 
functionality for creating, deleting, and other operations, but none of them 
provides an option to add a key to a column family. Does anybody know of a 
better way of achieving this with Python, say via libraries or through other means?
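One possibly relevant point: HBase column families are schemaless at the qualifier level, so a new column key inside an existing family is created implicitly the first time you write to it; there is no separate "add key" DDL step. A minimal HappyBase sketch of that idea (the Thrift host, table name 'temp_testing', and family 'cf' here are assumptions for illustration):

```python
def qualified_column(family: bytes, qualifier: bytes) -> bytes:
    """Build the b'family:qualifier' column key that HBase expects."""
    return family + b":" + qualifier

def add_column_value(table, row, family, qualifier, value):
    """Write a value under a (possibly new) qualifier; HBase creates
    the column implicitly on the first put -- no schema change needed."""
    table.put(row, {qualified_column(family, qualifier): value})

if __name__ == "__main__":
    import happybase  # third-party: pip install happybase

    # Assumes an HBase Thrift server on localhost and an existing
    # table 'temp_testing' with column family 'cf'.
    connection = happybase.Connection("localhost")
    table = connection.table("temp_testing")
    add_column_value(table, b"row1", b"cf", b"new_key", b"some value")
    connection.close()
```

Adding a whole new column *family*, by contrast, is a schema change (disable table, alter, enable), which HappyBase does not do beyond table creation.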

Thanks in advance,
Manoj

The information transmitted in this email is intended only for the person or 
entity to which it is addressed, and may contain material confidential to Xoom 
Corporation, and/or its subsidiary, buyindiaonline.com Inc. Any review, 
retransmission, dissemination or other use of, or taking of any action in 
reliance upon, this information by persons or entities other than the intended 
recipient(s) is prohibited. If you received this email in error, please contact 
the sender and delete the material from your files.


NodeManager was not connected to its ResourceManager

2015-02-25 Thread Manoj Venkatesh
Dear Hadoop experts,

Firstly, thank you all for answering my previous question(s).

Now, I have a Hadoop cluster of 8 nodes; we use YARN




Re: hadoop cluster with non-uniform disk spec

2015-02-11 Thread Manoj Venkatesh
I had a similar question recently.
Please check out the balancer 
(http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer); 
it will balance the data across the nodes.
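Roughly speaking, the balancer moves blocks until every DataNode's utilization is within a threshold (10% by default, settable via `hdfs balancer -threshold N`) of the cluster-wide average. A simplified sketch of that criterion, not the balancer's actual implementation:

```python
def over_threshold_nodes(used, capacity, threshold=10.0):
    """Return indices of DataNodes whose utilization (percent of capacity)
    deviates from the cluster-wide average by more than `threshold`
    percentage points -- the nodes the balancer would move blocks to/from."""
    cluster_avg = 100.0 * sum(used) / sum(capacity)
    out = []
    for i, (u, c) in enumerate(zip(used, capacity)):
        node_util = 100.0 * u / c
        if abs(node_util - cluster_avg) > threshold:
            out.append(i)
    return out
```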

- Manoj

From: Chen Song <chen.song...@gmail.com>
Reply-To: user@hadoop.apache.org <user@hadoop.apache.org>
Date: Wednesday, February 11, 2015 at 7:44 AM
To: user@hadoop.apache.org <user@hadoop.apache.org>
Subject: hadoop cluster with non-uniform disk spec

We have a Hadoop cluster consisting of 500 nodes. But the nodes are not uniform 
in terms of disk space. Half of the racks are newer, with 11 volumes of 1.1T on 
each node, while the other half have 5 volumes of 900GB on each node.

dfs.datanode.fsdataset.volume.choosing.policy is set to 
org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy.

It winds up in a state where half of the nodes are full while the other half 
are underutilized. I am wondering if there is a known solution to this problem.
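Worth noting: AvailableSpaceVolumeChoosingPolicy only chooses among the volumes of a single DataNode; it does not influence which *node* receives a block, so it cannot fix this kind of inter-node imbalance (the balancer addresses that). A simplified sketch of the per-node idea — the real policy uses a balanced-space threshold and preference fraction, reduced here to weighting volume choice by free space:

```python
import random

def choose_volume(free_bytes, rng=random):
    """Pick a volume index with probability proportional to its free space.
    A simplification of HDFS's AvailableSpaceVolumeChoosingPolicy, which
    steers new block replicas toward emptier volumes on the same DataNode."""
    total = sum(free_bytes)
    r = rng.uniform(0, total)
    acc = 0.0
    for i, free in enumerate(free_bytes):
        acc += free
        if r <= acc:
            return i
    return len(free_bytes) - 1  # guard against floating-point edge cases
```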

Thank you for any suggestions.

--
Chen Song




Re: Adding datanodes to Hadoop cluster - Will data redistribute?

2015-02-09 Thread Manoj Venkatesh
Thank you all for answering, the hdfs balancer worked. Now the datanodes 
capacity is more or less equally balanced.

Regards,
Manoj

From: Arpit Agarwal <aagar...@hortonworks.com>
Reply-To: user@hadoop.apache.org <user@hadoop.apache.org>
Date: Friday, February 6, 2015 at 3:07 PM
To: user@hadoop.apache.org <user@hadoop.apache.org>
Subject: Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Hi Manoj,

Existing data is not automatically redistributed when you add new DataNodes. 
Take a look at the 'hdfs balancer' command which can be run as a separate 
administrative tool to rebalance data distribution across DataNodes.


From: Manoj Venkatesh <manove...@gmail.com>
Reply-To: user@hadoop.apache.org <user@hadoop.apache.org>
Date: Friday, February 6, 2015 at 11:34 AM
To: user@hadoop.apache.org <user@hadoop.apache.org>
Subject: Adding datanodes to Hadoop cluster - Will data redistribute?

Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes; 6 were added during cluster creation and 2 
additional nodes were added later to increase disk and CPU capacity. What I see 
is that processing is shared amongst all the nodes, whereas the storage is 
reaching capacity on the original 6 nodes while the newly added machines have 
a relatively large amount of storage still unoccupied.

I was wondering if there is an automated way, or any way, of redistributing data so 
that all the nodes are equally utilized. I have checked the configuration 
parameter dfs.datanode.fsdataset.volume.choosing.policy, which has the options 'Round 
Robin' and 'Available Space'; are there any other configurations which need to 
be reviewed?
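For reference, the 'Available Space' option mentioned above is selected in hdfs-site.xml like this (note it only balances the volumes within each DataNode; redistributing existing data across nodes is the hdfs balancer's job):

```xml
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
```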

Thanks,
Manoj



Adding datanodes to Hadoop cluster - Will data redistribute?

2015-02-06 Thread Manoj Venkatesh
Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes; 6 were added during cluster creation
and 2 additional nodes were added later to increase disk and CPU capacity.
What I see is that processing is shared amongst all the nodes, whereas the
storage is reaching capacity on the original 6 nodes while the newly
added machines have a relatively large amount of storage still unoccupied.

I was wondering if there is an automated way, or any way, of redistributing data
so that all the nodes are equally utilized. I have checked the
configuration parameter *dfs.datanode.fsdataset.volume.choosing.policy*,
which has the options 'Round Robin' and 'Available Space'; are there any other
configurations which need to be reviewed?

Thanks,
Manoj