Add keys to column family in HBase using Python
Dear Hadoop experts,

I have a Hadoop cluster with Hive and HBase installed alongside other Hadoop components. I am currently exploring ways to automate a data-migration process from Hive to HBase, in which new columns of data are added every so often. I was able to create an HBase table from Hive and load data into it. Along the same lines, I tried to add columns to the HBase table (from Hive) using the ALTER TABLE syntax, and I got the error message: "ALTER TABLE cannot be used for a non-native table temp_testing".

As an alternative, I am also trying to do this programmatically in Python. I have explored the libraries HappyBase (https://happybase.readthedocs.org/en/latest/index.html) and starbase (http://pythonhosted.org//starbase/). These libraries provide functionality for creating and deleting tables, among other features, but neither provides an option to add a key to a column family. Does anybody know of a better way of achieving this with Python, whether through other libraries or by other means?

Thanks in advance,
Manoj
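For what it's worth, in HBase a "key" inside a column family (a column qualifier) requires no schema change at all: you simply write to it and HBase creates it on the fly. Only a new column *family* is a schema change (done with `alter` in the HBase shell, which HappyBase does not expose). A minimal HappyBase sketch, assuming a running HBase Thrift server on a host `hbase-host` and an existing table `temp_testing` with a column family `cf` (all names here are illustrative, not from the original post):

```python
# Sketch only: requires `pip install happybase` and the HBase Thrift service.
try:
    import happybase
except ImportError:
    happybase = None  # allows this module to load without the dependency


def add_qualifier(host, table_name, row_key, qualifier, value):
    """Write a value under a brand-new qualifier; HBase creates it on the fly."""
    connection = happybase.Connection(host)
    try:
        table = connection.table(table_name)
        # The qualifier (e.g. b'cf:new_key') never needs to be declared
        # anywhere beforehand -- only the column family 'cf' must exist.
        table.put(row_key, {qualifier: value})
    finally:
        connection.close()


# Usage against a live cluster (hypothetical host and table):
# add_qualifier('hbase-host', 'temp_testing', b'row1', b'cf:new_key', b'42')
```

So for the migration, adding "new columns of data" to an existing family is just a matter of issuing puts with new qualifiers; no ALTER is involved.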
cluster utilization when using fair scheduler
Hi,

I'm using the fair scheduler for YARN. I have not specified any pools, so fair-scheduler.xml is basically empty. However, only one third of the cluster is utilized. On the scheduler page I see a single queue, root, which is reported as "33.3% used". This 33.3% is independent of the number of jobs. Currently I have a single job running, and for this application pipeline the page claims:

Used Resources: memory:196608, vCores:48
Num Active Applications: 21
Num Pending Applications: 0
Min Resources: memory:0, vCores:0
Max Resources: memory:589824, vCores:48
Fair Share: memory:45372, vCores:0

The number of vCores appears to be too low as well; the number of cores should be 6x16. Any suggestions what to check?

Cheers
Ingo
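A quick sanity check on the numbers above suggests where the one-third comes from: memory is exactly one third used, while all 48 available vCores are already allocated. 48 also happens to be 6 nodes times 8, and 8 is the default value of `yarn.nodemanager.resource.cpu-vcores`, so that per-node setting (rather than anything in the fair-scheduler config) may be what caps the cluster on 16-core machines. That is an educated guess from the figures, not a confirmed diagnosis:

```python
# Values copied from the scheduler page quoted above.
used_mb, max_mb = 196608, 589824            # Used vs Max Resources, memory (MB)
used_vcores, max_vcores = 48, 48            # Used vs Max Resources, vCores

memory_pct = 100 * used_mb / max_mb
vcore_pct = 100 * used_vcores / max_vcores
print(f"memory: {memory_pct:.1f}% used")    # 33.3% -- matches the scheduler page
print(f"vcores: {vcore_pct:.1f}% used")     # 100.0% -- vCores are the bottleneck
```

If that guess is right, setting `yarn.nodemanager.resource.cpu-vcores` to 16 on each NodeManager would raise the cluster total from 48 to 96 vCores.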
Restriction of disk space on HDFS
Hi guys,

Quick question: using the fair scheduler we can restrict access to map tasks, reduce tasks, and overall system resources for each queue. With the same mechanism, we don't see any parameter to allocate disk usage per queue. Can you please let us know whether there is a way to do this in CDH5?

Thanks and Regards,
Vijayadarshan Reddy
TO-Core BI Technology
DBS Bank Ltd
Email: vijayadars...@dbs.com
Mobile: +65 83157090
RE: Mapreduce job got stuck
Hi Vandana,

From the configurations, it looks like none of the NodeManagers are registered with the RM because of an issue with "yarn.resourcemanager.resource- tracker.address". Maybe you can confirm whether any NMs are registered with the RM. In the configuration below there is a space after "resource-", but "resource-tracker" should be a single token without any space. Check after removing the space:

<name>yarn.resourcemanager.resource- tracker.address</name>

Similarly, I see the same issue in "yarn.nodemanager.aux- services.mapreduce.shuffle.class", where there is a space after "aux-". Hope this helps you resolve the issue.

Thanks & Regards
Rohith Sharma K S

From: Vandana kumari [mailto:kvandana1...@gmail.com]
Sent: 15 April 2015 15:33
To: user@hadoop.apache.org
Subject: Mapreduce job got stuck

I had set up a 3-node hadoop cluster on CentOS 6.5, but the nodemanager is not running on the master while it is running on the slave nodes. Also, when I submit a job, the job gets stuck; the same job runs well on a single-node setup. I am unable to figure out the problem. Attaching all the configuration files. Any help will be highly appreciated.

--
Thanks and regards
Vandana kumari
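To make the whitespace problem concrete: Hadoop looks up configuration properties by exact name, so a stray space inside `<name>` attaches the value to a different key than the one the ResourceManager reads, and the daemon silently falls back to its defaults. A small illustration with Python's stdlib XML parser (the snippet is illustrative; the property and value are the ones from the attached yarn-site.xml):

```python
import xml.etree.ElementTree as ET

# The <name> element as it appears in the attached yarn-site.xml,
# with the stray space after "resource-".
broken = ("<property><name>yarn.resourcemanager.resource- tracker.address</name>"
          "<value>kirti:8025</value></property>")

name = ET.fromstring(broken).find("name").text
# The key the ResourceManager actually looks up:
expected = "yarn.resourcemanager.resource-tracker.address"
print(name == expected)   # False: the configured value is never picked up
```

Leading and trailing whitespace around a name may get trimmed, but internal whitespace like this survives, so the two keys never match.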
RE: Change in fair-scheduler.xml
Hi,

1 - Is there a document on what the default settings in the XML file should be for, say, a 96 GB / 48-core system with say 4 queues?
You can refer to the doc below for configuring the fair scheduler:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html

2 - When we change the file, does the yarn service need to be bounced for the changed values to get reflected?
Yarn admin supports refreshing queues at runtime without restarting the ResourceManager. It can be achieved with the "$HADOOP_HOME/bin/yarn rmadmin -refreshQueues" CLI command.

Thanks & Regards
Rohith Sharma K S

From: Manish Maheshwari [mailto:mylogi...@gmail.com]
Sent: 15 April 2015 15:43
To: user@hadoop.apache.org
Subject: Change in fair-scheduler.xml

Hi,

We are trying to change properties of the fair scheduler settings.
1 - Is there a document on what the default settings in the XML file should be for, say, a 96 GB / 48-core system with say 4 queues?
2 - When we change the file, does the yarn service need to be bounced for the changed values to get reflected?

Thanks
Manish
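As a starting point for question 1, here is a sketch of a fair-scheduler.xml allocation file for a setup like the one described. The queue names, minResources/maxResources and weights are purely illustrative (not recommended defaults) and should be sized against your actual `yarn.nodemanager.resource.memory-mb` and `cpu-vcores` settings; 96 GB is 98304 MB:

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: queue names and shares are made up. -->
<allocations>
  <queue name="prod">
    <minResources>49152 mb,24vcores</minResources>
    <maxResources>98304 mb,48vcores</maxResources>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
  <queue name="adhoc">
    <minResources>12288 mb,6vcores</minResources>
    <weight>1.0</weight>
  </queue>
  <!-- the remaining queues would follow the same pattern -->
  <queueMaxAppsDefault>20</queueMaxAppsDefault>
</allocations>
```

The FairScheduler page linked above documents the full set of per-queue elements and the "X mb,Y vcores" resource syntax.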
Re: Restriction of disk space on HDFS
HDFS and job-scheduling queues are entirely different systems. HDFS disk quotas are set at the directory level; you can then limit the permissions of that directory to a group, which indirectly means that group has that much disk quota.

On Wed, Apr 15, 2015 at 3:55 PM, Vijayadarshan REDDY vijayadars...@dbs.com wrote:

Hi guys,

Quick question: using the fair scheduler we can restrict access to map tasks, reduce tasks, and overall system resources for each queue. With the same mechanism, we don't see any parameter to allocate disk usage per queue. Can you please let us know whether there is a way to do this in CDH5?

Thanks and Regards,
Vijayadarshan Reddy
TO-Core BI Technology
DBS Bank Ltd
Email: vijayadars...@dbs.com
Mobile: +65 83157090

--
Nitin Pawar
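Concretely, the per-directory space quota described above is managed with the dfsadmin CLI; the path and size here are illustrative. Note that a space quota counts raw bytes, so it includes the replication factor:

```shell
hdfs dfsadmin -setSpaceQuota 500g /user/etl   # cap the tree at 500 GB of raw (replicated) bytes
hdfs dfs -count -q -h /user/etl               # show quota, remaining quota, and current usage
hdfs dfsadmin -clrSpaceQuota /user/etl        # remove the cap again
```

Combined with group permissions on the directory, this approximates a per-team disk limit; the scheduler itself still has no per-queue disk parameter.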
Re: Mapreduce job got stuck
I have attached the nodemanager log from the master and the modified yarn-site.xml file.

On Wed, Apr 15, 2015 at 6:21 AM, Rohith Sharma K S rohithsharm...@huawei.com wrote:

Hi Vandana,

From the configurations, it looks like none of the NodeManagers are registered with the RM because of an issue with "yarn.resourcemanager.resource- tracker.address". Maybe you can confirm whether any NMs are registered with the RM. In the configuration below there is a space after "resource-", but "resource-tracker" should be a single token without any space. Check after removing the space:

<name>yarn.resourcemanager.resource- tracker.address</name>

Similarly, I see the same issue in "yarn.nodemanager.aux- services.mapreduce.shuffle.class", where there is a space after "aux-". Hope this helps you resolve the issue.

Thanks & Regards
Rohith Sharma K S

From: Vandana kumari [mailto:kvandana1...@gmail.com]
Sent: 15 April 2015 15:33
To: user@hadoop.apache.org
Subject: Mapreduce job got stuck

I had set up a 3-node hadoop cluster on CentOS 6.5, but the nodemanager is not running on the master while it is running on the slave nodes. Also, when I submit a job, the job gets stuck; the same job runs well on a single-node setup. I am unable to figure out the problem. Attaching all the configuration files. Any help will be highly appreciated.

--
Thanks and regards
Vandana kumari

2015-04-15 17:15:52,022 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: STARTUP_MSG:
/
STARTUP_MSG: Starting NodeManager
STARTUP_MSG: host = kirti/172.17.14.22
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.2.0
STARTUP_MSG: classpath =
RE: Restriction of disk space on HDFS
Please refer to
https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#resourcemanager

Best regards,
Nair

From: Vijayadarshan REDDY [mailto:vijayadars...@dbs.com]
Sent: Wednesday, April 15, 2015 6:25 AM
To: user@hadoop.apache.org
Subject: Restriction of disk space on HDFS

Hi guys,

Quick question: using the fair scheduler we can restrict access to map tasks, reduce tasks, and overall system resources for each queue. With the same mechanism, we don't see any parameter to allocate disk usage per queue. Can you please let us know whether there is a way to do this in CDH5?

Thanks and Regards,
Vijayadarshan Reddy
TO-Core BI Technology
DBS Bank Ltd
Email: vijayadars...@dbs.com
Mobile: +65 83157090
Re: Mapreduce job got stuck
Please check the error logs and send the logs.

On Wed, Apr 15, 2015 at 3:33 PM, Vandana kumari kvandana1...@gmail.com wrote:

nodemanager

Warm Regards,
Shashwat Shriparv
Mapreduce job got stuck
I had set up a 3-node hadoop cluster on CentOS 6.5, but the nodemanager is not running on the master while it is running on the slave nodes. Also, when I submit a job, the job gets stuck; the same job runs well on a single-node setup. I am unable to figure out the problem. Attaching all the configuration files. Any help will be highly appreciated.

--
Thanks and regards
Vandana kumari

core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://kirti:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-hadoopuser</value>
  </property>
</configuration>

hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- License header as in core-site.xml -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoopuser/hadoopspace/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoopuser/hadoopspace/hdfs/datanode</value>
  </property>
</configuration>

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- License header as in core-site.xml -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>kirti:54311</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>0.0.0.0:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
  </property>
</configuration>

yarn-site.xml:

<?xml version="1.0"?>
<!-- License header as in core-site.xml -->
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux- services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource- tracker.address</name>
    <value>kirti:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>kirti:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>kirti:8040</value>
  </property>
</configuration>
Change in fair-scheduler.xml
Hi,

We are trying to change properties of the fair scheduler settings.
1 - Is there a document on what the default settings in the XML file should be for, say, a 96 GB / 48-core system with say 4 queues?
2 - When we change the file, does the yarn service need to be bounced for the changed values to get reflected?

Thanks
Manish
Re: Mapreduce job got stuck
What is your yarn.nodemanager.address?

Warm Regards,
Shashwat Shriparv

On Wed, Apr 15, 2015 at 3:42 PM, shashwat shriparv dwivedishash...@gmail.com wrote:

Please check the error logs and send the logs.

On Wed, Apr 15, 2015 at 3:33 PM, Vandana kumari kvandana1...@gmail.com wrote:

nodemanager
RE: Mapreduce job got stuck
Hi,

On the master machine, the NodeManager is not running because of "Caused by: java.net.BindException: Problem binding to [kirti:8040]", taken from the logs. The port 8040 is in use! Configure an available port number.

Thanks & Regards
Rohith Sharma K S

From: Vandana kumari [mailto:kvandana1...@gmail.com]
Sent: 15 April 2015 16:29
To: user@hadoop.apache.org; Rohith Sharma K S
Subject: Re: Mapreduce job got stuck

When I made the changes as specified by Rohith, my job is running, but it runs only on the slave nodes (amit, yashbir), not on the master node (kirti), and still no nodemanager is running on the master node.

On Wed, Apr 15, 2015 at 6:39 AM, Vandana kumari kvandana1...@gmail.com wrote:

I have attached the nodemanager log from the master and the modified yarn-site.xml file.

On Wed, Apr 15, 2015 at 6:21 AM, Rohith Sharma K S rohithsharm...@huawei.com wrote:

Hi Vandana,

From the configurations, it looks like none of the NodeManagers are registered with the RM because of an issue with "yarn.resourcemanager.resource- tracker.address". Maybe you can confirm whether any NMs are registered with the RM. In the configuration below there is a space after "resource-", but "resource-tracker" should be a single token without any space. Check after removing the space:

<name>yarn.resourcemanager.resource- tracker.address</name>

Similarly, I see the same issue in "yarn.nodemanager.aux- services.mapreduce.shuffle.class", where there is a space after "aux-". Hope this helps you resolve the issue.

Thanks & Regards
Rohith Sharma K S

From: Vandana kumari [mailto:kvandana1...@gmail.com]
Sent: 15 April 2015 15:33
To: user@hadoop.apache.org
Subject: Mapreduce job got stuck

I had set up a 3-node hadoop cluster on CentOS 6.5, but the nodemanager is not running on the master while it is running on the slave nodes. Also, when I submit a job, the job gets stuck; the same job runs well on a single-node setup. I am unable to figure out the problem. Attaching all the configuration files.
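For anyone hitting the same BindException: port 8040 is, if I remember the defaults correctly, also the default port of the NodeManager's localizer (yarn.nodemanager.localizer.address), which would explain why setting yarn.resourcemanager.address to kirti:8040 collides only on the master, where both daemons run. A small stdlib-only sketch for checking whether a port is already taken before picking a new one (the host and port in the usage line are just examples):

```python
import socket
from contextlib import closing


def port_in_use(host, port):
    """Return True if something already accepts TCP connections on host:port."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success, i.e. a listener is present.
        return s.connect_ex((host, port)) == 0


# Usage on the master before choosing a new RM port:
# port_in_use('kirti', 8040)
```

Alternatively, `netstat -tlnp` on the master shows which process holds 8040.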
Any help will be highly appreciated.

--
Thanks and regards
Vandana kumari