Re: REST Vs RPC
Almost always RPC. It's more optimized for this use case. Regards, Dhaval From: Jignesh Patel jigneshmpa...@gmail.com To: user@hbase.apache.org Sent: Monday, 17 November 2014 12:05 PM Subject: REST Vs RPC Which one is faster and better: 1. REST 2. RPC? I am not looking at this in the context of technology independence, but if we are using Java as a client, which is more robust?
Re: Duplicate Value Inserts in HBase
You can achieve what you want using versions and some hackery with timestamps. Sent from my T-Mobile 4G LTE Device Original message From: Jean-Marc Spaggiari jean-m...@spaggiari.org Date: 10/21/2014 9:02 AM (GMT-05:00) To: user user@hbase.apache.org Cc: Subject: Re: Duplicate Value Inserts in HBase You can do check-and-puts to validate whether the value is already there, but it's slower... 2014-10-21 8:50 GMT-04:00 Krishna Kalyan krishnakaly...@gmail.com: Thanks Jean, If I put the same value in my table for a particular column for a rowkey, I want HBase to reject this value and retain the old value with the old timestamp. In other words, update only when the value changes. Regards, Krishna On Tue, Oct 21, 2014 at 6:02 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Krishna, HBase will store them in the same row, same cell, but you will have 2 versions. If you want to keep just one, set versions=1 on the table side and only one will be stored. Is that what you mean? JM 2014-10-21 8:29 GMT-04:00 Krishna Kalyan krishnakaly...@gmail.com: Hi, I have an HBase table which is populated from Pig using PigStorage. While inserting, suppose for a rowkey I have a duplicate value. Is there a way to prevent an update? I want to maintain the version history for my values, which are unique. Regards, Krishna
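A minimal sketch of the check-and-put approach mentioned above, assuming hypothetical table and column names: checkAndPut with a null expected value writes only when the cell does not already exist, so a duplicate insert leaves the old value and timestamp untouched. If you instead want to skip writes only when the value is unchanged, a plain Get-then-compare before the Put is the simpler route.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ConditionalInsert {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table"); // hypothetical table name
        byte[] row = Bytes.toBytes("rowkey-1");
        byte[] cf = Bytes.toBytes("cf");
        byte[] qual = Bytes.toBytes("col");

        Put put = new Put(row);
        put.add(cf, qual, Bytes.toBytes("value_1"));

        // Applies the Put only if the cell is currently absent (expected value = null),
        // so re-inserting the same row/column does not create a new version.
        boolean applied = table.checkAndPut(row, cf, qual, null, put);
        System.out.println("Put applied: " + applied);
        table.close();
    }
}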
Re: Connection pool Concurrency in HBase
HConnection connection = HConnectionManager.createConnection(config); will give you the shared HConnection. Do not close the connection object until all your threads are done using it. In your use case you should not close it when you close the table, since other threads may be using it or may need to use it in the future. Regards, Dhaval From: Serega Sheypak serega.shey...@gmail.com To: user@hbase.apache.org Sent: Monday, 4 August 2014 1:44 PM Subject: Connection pool Concurrency in HBase Hi, I'm trying to understand how connection pooling works in HBase. I've seen that https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html is recommended. I have a servlet, and its instance is shared among many threads. What is a good way to use connection pooling in this case? Is this HConnection connection = HConnectionManager.createConnection(config); HTableInterface table = connection.getTable(table1); try { // Use the table as needed, for a single operation and a single thread } finally { table.close(); connection.close(); } 1. enough to reuse the connection so it wouldn't be opened each time? 2. Why do I have to close BOTH the table and the connection? Is that by design?
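A minimal sketch of this pattern, assuming a hypothetical table name: the single HConnection is created once (for example in the servlet's init), every request/thread gets and closes its own lightweight HTableInterface, and the connection itself is only closed at shutdown once all threads are done.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SharedConnectionExample {
    // One HConnection shared by all servlet threads.
    private static HConnection connection;

    public static synchronized void init() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        connection = HConnectionManager.createConnection(conf);
    }

    // Called per request/thread: tables are lightweight, the connection is not.
    public static Result fetch(String rowKey) throws Exception {
        HTableInterface table = connection.getTable("table1"); // hypothetical table
        try {
            return table.get(new Get(Bytes.toBytes(rowKey)));
        } finally {
            table.close(); // close the table, NOT the shared connection
        }
    }

    // Call once on servlet shutdown, after all threads are done.
    public static synchronized void shutdown() throws Exception {
        if (connection != null) {
            connection.close();
        }
    }
}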
Re: hbase cluster working bad
We just solved a very similar issue with our cluster (yesterday!). I would suggest you look at 2 things in particular: - Is the network on your region server saturated? That would prevent connections from being made - See if the region server has any RPC handlers available when you get this error. Its possible that all RPC handlers are busy servicing other requests (or stuck due to a combination of load and bad configs). Regards, Dhaval From: Павел Мезенцев pa...@mezentsev.org To: user@hbase.apache.org Sent: Tuesday, 22 July 2014 7:46 AM Subject: Re: hbase cluster working bad Jobs, running on this cluster, print exceptions: java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to ds-hadoop-wk01p.tcsbank.ru/10.218.64.11:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.218.64.14:38621 remote= ds-hadoop-wk01p.tcsbank.ru/10.218.64.11:60020] at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1569) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1421) at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:739) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:708) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:367) at ru.tcsbank.hbase.HBasePersonDao.getUsersBatch(HBasePersonDao.java:306) at ru.tcsbank.matching.PersonMatcher.performSolrRequest(PersonMatcher.java:153) at ru.tcsbank.matching.PersonMatcher.search(PersonMatcher.java:135) at ru.tcsbank.personmatcher.mr.PersonMatcherJob$MapClass.map(PersonMatcherJob.java:80) at ru.tcsbank.personmatcher.mr.PersonMatcherJob$MapClass.map(PersonMatcherJob.java:65) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) С уважением, Мезенцев Павел 2014-07-22 14:59 GMT+04:00 Павел Мезенцев pa...@mezentsev.org: Hello all! We have a trouble with hbase Our hadoop cluster has 4 nodes (plus 1 client node). There are CHD 4.6 + CM 4.7 hadoop installed Hadoop versions are: - hadoop-hdfs : 2.0.0+1475 - hadoop-0.20-mapreduce : 2.0.0+1475 - hbase : 0.94.6+132 Hadoop and hBase configs are in attachment We have several tables in hbase with total volume of 2 Tb. We run mapReduce ETL jobs and analytics queries over them. There are a lot of warnings like - *The health test result for REGION_SERVER_READ_LATENCY has become bad: The moving average of HDFS read latency is 162 millisecond(s) over the previous 5 minute(s). Critical threshold: 100*. - *The health test result for REGION_SERVER_SYNC_LATENCY has become bad: The moving average of HDFS sync latency is 8.2 second(s) over the previous 5 minute(s). Critical threshold: 5,000*. 
*- HBase region health: 442 unhealthy regions * *- HDFS_DATA_NODES_HEALTHY has become bad* *- HBase Region Health Canary is running slowly **on the cluster* MapReduce jobs over HBase with random queries to HBase are running very slowly (a job is 20% complete after 18 hours versus 100% after 12 hours on a comparable cluster). Please help us find the reasons for these alerts and speed up the cluster. Could you give us some good advice on what we should do? Cheers, Mezentsev Pavel
Re: multiple region servers at one machine
It's certainly possible (at least with the command line) but probably very messy. You will need different ports, different log files, different pid files, possibly even different configs on the same machine. Regards, Dhaval From: Jane Tao jiao@oracle.com To: user@hbase.apache.org Sent: Wednesday, 16 July 2014 6:06 PM Subject: multiple region servers at one machine Hi there, Is it possible to run multiple region servers on one machine/node? If this is possible, how do I start multiple region servers with the command line or Cloudera Manager? Thanks, Jane --
Re: HBase cluster design
A few things pop out to me on cursory glance: - You are using CMSIncrementalMode which after a long chain of events has a tendency to result in the famous Juliet pause of death. Can you try Par New GC instead and see if that helps? - You should try to reduce the CMSInitiatingOccupancyFraction to avoid a full GC - Your hbase-env.sh is not setting the Xmx at all. Do you know how much RAM you are giving to your region servers? It may be too small or too large given your use case and machines size - Your client scanner caching is 1 which may be too small depending on your row sizes. You can also override that setting in your scan for the MR job - You only have 2 zookeeper instances which is not at all recommended. Zookeeper needs a quorum to operate and generally works best with an odd number of zookeeper servers. This probably isn't related to your crashes but it would help stability if you had 1 or 3 zookeepers - I am not 100% sure if the version of hbase you are using has mslab enabled. If not you should enable it. - You can try increasing/decreasing the amount of RAM you provide to block caches and memstores to suit your use case. I see that you are using the defaults here On top of these, when you kick off your MR job to scan HBase you should setCacheBlocks to false Regards, Dhaval From: Flavio Pompermaier pomperma...@okkam.it To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Sent: Friday, 23 May 2014 3:16 AM Subject: Re: HBase cluster design The hardware specs are: 4 nodes with 48g RAM, 24 cores and 1 TB disk each server Attached my hbase config files. Thanks, Flavio On Fri, May 23, 2014 at 3:33 AM, Dhaval Shah prince_mithi...@yahoo.co.in wrote: Can you share your hbase-env.sh and hbase-site.xml? And hardware specs of your cluster? Regards, Dhaval From: Flavio Pompermaier pomperma...@okkam.it To: user@hbase.apache.org Sent: Saturday, 17 May 2014 2:49 AM Subject: Re: HBase cluster design Could you tell me please in detail the parameters you'd like to see so i can look for them and learn the important ones?i'm using cloudera, cdh4 in one cluster and cdh5 in the other. Best, Flavio On May 17, 2014 2:48 AM, prince_mithi...@yahoo.co.in prince_mithi...@yahoo.co.in wrote: Can you describe your setup in more detail? Specifically the amount of heap hbase region servers have and your GC settings. Is your server swapping when your MR obs are running? Also do your regions go down or your region servers? We run many MR jobs simultaneously on our hbase tables (size is in TBs) along with serving real time requests at the same time. So I can vouch for the fact that a well tuned hbase cluster definitely supports this use case (well-tuned is the key word here) Sent from Yahoo Mail on Android
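As a rough sketch of the scanner-caching and block-cache points above (the table name, caching value, and pass-through mapper are placeholders), the MR job can override both settings on its own Scan instead of relying on the cluster-wide client scanner caching of 1:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class ScanJobSetup {
    // Pass-through mapper, only here to make the job definition complete.
    static class PassThroughMapper extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "full-table-scan");
        job.setJarByClass(ScanJobSetup.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // override the cluster-wide caching of 1 for this job
        scan.setCacheBlocks(false);  // a full scan should not churn the block cache
        TableMapReduceUtil.initTableMapperJob("my_table", scan, PassThroughMapper.class,
                ImmutableBytesWritable.class, Result.class, job);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}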
Re: HBase cluster design
Can you share your hbase-env.sh and hbase-site.xml? And the hardware specs of your cluster? Regards, Dhaval From: Flavio Pompermaier pomperma...@okkam.it To: user@hbase.apache.org Sent: Saturday, 17 May 2014 2:49 AM Subject: Re: HBase cluster design Could you please tell me in detail the parameters you'd like to see, so I can look for them and learn the important ones? I'm using Cloudera, CDH4 in one cluster and CDH5 in the other. Best, Flavio On May 17, 2014 2:48 AM, prince_mithi...@yahoo.co.in prince_mithi...@yahoo.co.in wrote: Can you describe your setup in more detail? Specifically the amount of heap the HBase region servers have and your GC settings. Is your server swapping when your MR jobs are running? Also, do your regions go down, or your region servers? We run many MR jobs simultaneously on our HBase tables (size is in TBs) along with serving real-time requests at the same time. So I can vouch for the fact that a well-tuned HBase cluster definitely supports this use case (well-tuned is the key word here). Sent from Yahoo Mail on Android
Re: SCDynamicStore
I don't think it's an error. It's an annoying warning message but does not affect functionality. Regards, Dhaval From: Fabrice fchap...@ip-worldcom.ch To: user@hbase.apache.org user@hbase.apache.org Sent: Wednesday, 19 March 2014 10:51 AM Subject: SCDynamicStore Good afternoon, I'm new to HBase. I have installed the product and started the HBase daemon; however, when I create a table, the system sends me this error message: Unable to load realm info from SCDynamicStore Does somebody know what this message means and how it could be solved? Thanks for your help Fabrice Chapuis
Re: Need some information over WAL
Inline Regards, Dhaval From: Upendra Yadav upendra1...@gmail.com To: user@hbase.apache.org Sent: Tuesday, 25 February 2014 1:00 PM Subject: Need some information over WAL I also have a doubt about the WAL (write-ahead log). In HDFS we can write a new file or we can append to an old file. Is this correct: 1. The WAL only logs operations, not the data... just like disk journaling (ext4) - No, the WAL is a log of all new data, not just the operations 2. In case of region server failure... WAL replay depends on the client to provide each operation's data that has not yet been committed/flushed to HDFS. - No, the client will not generally know of a region server failure. HBase gets the data from the WAL and replays it. The client may not even exist when a region server crashes 3. HBase uses the append operation in HDFS to store/log the WAL. - Yes Thanks...
Re: Need some information over WAL
yes WAL is a single and common file for all regions on a region server. Yes to 1. Re 2: HBase will roll over WAL files and eventually delete them when they are no longer needed. Regards, Dhaval From: Upendra Yadav upendra1...@gmail.com To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Sent: Tuesday, 25 February 2014 1:18 PM Subject: Re: Need some information over WAL Thanks Dhawal Oh... whatever i assumed with reading documents (partially) that was wrong... With ur answer... i have another questions... WAL is a single and common file for all region of a Region server. Is this correct: 1. So all operation(including data) will go to WAL... operation by operation 2. Then on HDFS, HBase will have to perform delete operation on some set of WAL files always... On Tue, Feb 25, 2014 at 11:35 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote: Inline Regards, Dhaval From: Upendra Yadav upendra1...@gmail.com To: user@hbase.apache.org Sent: Tuesday, 25 February 2014 1:00 PM Subject: Need some information over WAL I have also doubt over WAL(write ahead log). In hdfs we can write a new file or we can append to old file. Is that correct : 1. WAL only logs operations not its data... just like disk journaling(ext4) - No WAL is a log of all new data, not just the operations 2. In case of Region Server failure... WAL replay will depends on Client to get each operation's data that yet not committed/flushed to hdfs. - No client will not generally know of a region server failure. It gets the data from the WAL and replays it. The client may not even exist when a region server crashes 3. Hbase uses append operation in hdfs to store/log WAL. - Yes Thanks...
Re: RegionServer unable to connect to master
Do you have a firewall between the master and the slaves? Regards, Dhaval From: Fernando Iwamoto - Plannej fernando.iwam...@plannej.com.br To: user@hbase.apache.org Sent: Wednesday, 29 January 2014 3:11 PM Subject: Re: RegionServer unable to connect to master Iam new to HBASE too, but I had same problem long time ago and I dont remember how i fixed, I will keep troubleshooting you... How about zookeeper? have you uncommented the HBASE_MANAGE_ZK(something like that) in hbase-env.sh and set to TRUE? 2014-01-29 Guang Gao birdeey...@gmail.com You mean the SSH key? Yes, any two nodes can ssh each other without password. On Wed, Jan 29, 2014 at 2:10 PM, Fernando Iwamoto - Plannej fernando.iwam...@plannej.com.br wrote: Did you tried to pass the key to the machines? 2014-01-29 birdeeyore birdeey...@gmail.com Thanks for your reply. Here's some additional info. Thanks. $ cat hbase-site.xml configuration property namehbase.cluster.distributed/name valuetrue/value /property property namehbase.rootdir/name valuehdfs://obelix8.local:9001/hbase/value /property property namehbase.zookeeper.quorum/name valueobelix105.local,obelix106.local,obelix107.local/value /property property namehbase.zookeeper.property.clientPort/name value2183/value /property property namehbase.zookeeper.peerport/name value2890/value /property property namehbase.zookeeper.leaderport/name value3890/value /property property namehbase.zookeeper.property.dataDir/name value/ssd/hbase/hbase-0.94.16/zookeeper/value /property property namehbase.master/name valueobelix8.local:6/value /property property namehbase.master.info.port/name value50070/value /property property namehbase.client.scanner.caching/name value200/value /property /configuration == $ cat regionservers obelix105.local obelix106.local obelix107.local obelix108.local obelix109.local obelix110.local obelix111.local obelix112.local obelix113.local obelix114.local = On my master node: $ cat /etc/hosts 127.0.0.1 localhost 192.168.245.8 obelix8.local xx.yy.net obelix8 # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters 192.168.245.1 obelix.local === On one of my slave nodes: $ cat /etc/hosts 127.0.0.1 localhost 127.0.1.1 obelix105.local xx.yy.net obelix105 # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters 192.168.245.1 obelix.local == The error of HBase 0.94.16+Hadoop 1.2.1: 2014-01-29 12:58:30,922 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at obelix8.local,6,1391018303918 2014-01-29 12:58:40,960 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. 
Error was: java.net.SocketException: Invalid argument at sun.nio.ch.Net.connect(Native Method) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:532) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:392) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:438) at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1141) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:988) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87) at $Proxy9.getProtocolVersion(Unknown Source) at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:141) at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208) at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2043) at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2089) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:747) at java.lang.Thread.run(Thread.java:662) Best, Boduo On Wed, Jan 29, 2014 at 8:21 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi,
Re: HBase Design : Column name v/s Version
Versions in HBase are timestamps by default. If you intend to continue using the timestamps, what will happen when someone writes value_1 and value_2 at the exact same time? Regards, Dhaval - Original Message - From: Sagar Naik sn...@splunk.com To: user@hbase.apache.org user@hbase.apache.org Cc: Sent: Friday, 24 January 2014 12:27 PM Subject: HBase Design : Column name v/s Version Hi, I have a choice to maintain the data either in column values or as versioned data. This data is not a versioned copy per se. The access pattern on this is to get all the data every time. So the schema choices are : Schema 1: 1. column_name/qualifier = data_1. column_value = value_1 1.a. column_name/qualifier = data_2. column_value = value_2,value_2.a 1.b. column_name/qualifier = data_3. column_value = value_3 To get all the values for data, I will have to use ColumnPrefixFilter with the prefix set to data Schema 2: 2. column_name/qualifier = data. version= 1, column_value = value_1 2.a. column_name/qualifier = data. version= 2, column_value = value_2,value_2.a 2.b. column_name/qualifier = data. version= 3, column_value = value_3 To get all the values for data, I will do a simple get operation to get all the versions. Number of versions can go from: 10 to 100K Get operation perf should beat the Filter perf. Comparing 100K values will be costly as the # of versions increases. I would like to know if there are drawbacks in going the version route. -Sagar
Re: HBase Design : Column name v/s Version
Theoretically that could work. However, it does seem like a weird way of doing what you want to do and you might run into unforeseen issues. One issue I see is that 100k versions sounds a bit scary. You can paginate through columns but not through versions on the same column for example. Regards, Dhaval - Original Message - From: Sagar Naik sn...@splunk.com To: user@hbase.apache.org user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Cc: Sent: Friday, 24 January 2014 1:46 PM Subject: Re: HBase Design : Column name v/s Version Thanks for clarifying, I will be using custom version numbers (auto incrementing on the client side) and not timestamps. Two clients do not update the same row -Sagar On 1/24/14 10:33 AM, Dhaval Shah prince_mithi...@yahoo.co.in wrote: I am talking about schema 2. Schema 1 would definitely work. Schema 2 can have the version collisions if you decide to use timestamps as versions Regards, Dhaval - Original Message - From: Sagar Naik sn...@splunk.com To: user@hbase.apache.org user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Cc: Sent: Friday, 24 January 2014 1:07 PM Subject: Re: HBase Design : Column name v/s Version I am not sure I understand you correctly. I assume you are talking abt schema 1. In this case I m appending the version number to the column name. The column_names are different (data_1/data_2) for value_1 and value_2 respectively. -Sagar On 1/24/14 9:47 AM, Dhaval Shah prince_mithi...@yahoo.co.in wrote: Versions in HBase are timestamps by default. If you intend to continue using the timestamps, what will happen when someone writes value_1 and value_2 at the exact same time? Regards, Dhaval - Original Message - From: Sagar Naik sn...@splunk.com To: user@hbase.apache.org user@hbase.apache.org Cc: Sent: Friday, 24 January 2014 12:27 PM Subject: HBase Design : Column name v/s Version Hi, I have a choice to maintain to data either in column values or as versioned data. This data is not a versioned copy per se. The access pattern on this get all the data every time So the schema choices are : Schema 1: 1. column_name/qualifier = data_1. column_value = value_1 1.a. column_name/qualifier = data_2. column_value = value_2,value_2.a 1.b. column_name/qualifier = data_3. column_value = value_3 To get all the values for data, I will have to use ColumnPrefixFilter with prefix set data Schema 2: 2. column_name/qualifier = data. version= 1, column_value = value_1 2.a. column_name/qualifier = data. version= 2, column_value = value_2,value_2.a 2.b. column_name/qualifier = data. version= 3, column_value = value_3 To get all the values for data , I will do a simple get operation to get all the versions. Number of versions can go from: 10 to 100K Get operation perf should beat the Filter perf. Comparing 100K values will be costly as the # versions increase. I would like to know if there are drawbacks in going the version route. -Sagar
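A small sketch of schema 2 with client-supplied version numbers, assuming a hypothetical table whose column family keeps enough versions (the family's VERSIONS setting must cover however many you store): the client-side counter is written in place of the timestamp, and a single Get with setMaxVersions() returns every stored value.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class CustomVersionExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table"); // hypothetical table
        byte[] row = Bytes.toBytes("row1");
        byte[] cf = Bytes.toBytes("cf");
        byte[] data = Bytes.toBytes("data");

        // Write three cells under the same qualifier, using a client-side
        // counter (1, 2, 3) in place of the timestamp as the version.
        for (long version = 1; version <= 3; version++) {
            Put put = new Put(row);
            put.add(cf, data, version, Bytes.toBytes("value_" + version));
            table.put(put);
        }

        // Read back every version of the single "data" column with one Get.
        Get get = new Get(row);
        get.addColumn(cf, data);
        get.setMaxVersions(); // all stored versions
        Result result = table.get(get);
        for (KeyValue kv : result.raw()) {
            System.out.println(kv.getTimestamp() + " -> " + Bytes.toString(kv.getValue()));
        }
        table.close();
    }
}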
Re: Easiest way to get a random sample of keys
The HBase shell is a JRuby shell and wraps all Java classes in a Ruby interface. You can actually use a RandomRowFilter with a 5% configuration to achieve what you need. Regards, Dhaval From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) skada...@bloomberg.net To: user@hbase.apache.org Sent: Friday, 24 January 2014 6:15 PM Subject: Easiest way to get a random sample of keys Something like count 't1', {INTERVAL => 20} should give me every 20th row in table 't1'. Is there an easy way to get a random sample via the shell using filters?
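Since the shell simply wraps the Java classes, the equivalent scan can be built either there or in Java; a rough Java sketch (table name is hypothetical) samples roughly 5% of rows and adds a FirstKeyOnlyFilter so that only one KeyValue per row, effectively just the keys, comes back.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.RandomRowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomKeySample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "t1");

        // ~5% of rows pass the random filter; FirstKeyOnlyFilter keeps only
        // the first KeyValue of each row so we effectively fetch just keys.
        FilterList filters = new FilterList();
        filters.addFilter(new RandomRowFilter(0.05f));
        filters.addFilter(new FirstKeyOnlyFilter());

        Scan scan = new Scan();
        scan.setFilter(filters);
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toStringBinary(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}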
Re: Rebuild HBASE Table to reduce Regions per RS
If you can afford downtime for your table, there are ways to do it. You can: - Merge regions (requires the table to be disabled, at least in some older versions and probably in newer ones too) - Go brute force by doing an export, truncate, import (this is a little more manageable when you have a large number of regions). This however is way more resource intensive and will take longer. If you can't afford downtime, I would suggest creating another table which mirrors this one and then switching to the new one. Regards, Dhaval From: Upender Nimbekar nimbekar.upen...@gmail.com To: d...@hbase.apache.org; user@hbase.apache.org Sent: Tuesday, 14 January 2014 10:21 AM Subject: Rebuild HBASE Table to reduce Regions per RS Hi, Does anyone have any experience rebuilding an HBASE table to reduce the number of regions? I am currently dealing with a situation where the no. of regions per RS has gone up quite significantly (500 per RS), thereby causing some performance issues. This is how I am thinking of bringing it down: increase hbase.hregion.max.filesize from 500 MB to 2 GB and then rebuild the HBASE table. I am assuming that after the table rebuild, I should see the no. of regions come down by more than half. I would basically like to stay within the HBASE-suggested no. of regions per RS, which is about 50-200. Please suggest if someone has any experience doing it. Thanks Upen
Re: Large Puts in MR job
If you are creating 1 big put object, how would auto flush help you? In theory you would run out of memory before you do a table.put() anyways. Am I missing something? Why don't you split your put into smaller puts and let the deferred flush do its job? Do you need all the kv's to be flushed at the same time? Technically you can create your own hbase client in setup() but I don't know if that's going to solve your issue Sent from Yahoo Mail on Android
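A hedged sketch of the "many smaller puts plus deferred flush" idea, with hypothetical table/column names and sizes: the client-side write buffer batches the small Puts, so no single Put object ever has to hold the whole row in memory.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ChunkedPuts {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_table"); // hypothetical table
        table.setAutoFlush(false);                   // buffer puts client-side
        table.setWriteBufferSize(8 * 1024 * 1024);   // flush roughly every 8 MB

        byte[] row = Bytes.toBytes("wide-row");
        byte[] cf = Bytes.toBytes("cf");

        // Instead of one giant Put holding every KeyValue, emit many small
        // Puts for the same row; the write buffer flushes them in batches.
        int kvsPerPut = 1000;
        Put put = new Put(row);
        for (int i = 0; i < 100000; i++) {
            put.add(cf, Bytes.toBytes("col-" + i), Bytes.toBytes("v" + i));
            if (put.size() >= kvsPerPut) {
                table.put(put);
                put = new Put(row);
            }
        }
        if (put.size() > 0) {
            table.put(put);
        }
        table.flushCommits();
        table.close();
    }
}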
Re: Schema Design Newbie Question
1000 CFs with HBase does not sound like a good idea. category + timestamp sounds like the better of the 2 options you have thought of. Can you tell us a little more about your data? Regards, Dhaval From: Kamal Bahadur mailtoka...@gmail.com To: user@hbase.apache.org Sent: Monday, 23 December 2013 6:01 PM Subject: Schema Design Newbie Question Hello, I am just starting to use HBase and I am coming from the Cassandra world. Here is a quick background regarding my data: My system will be storing data that belongs to a certain category. Currently I have around 1000 categories. Also note that some categories produce a lot more data than others. To be precise, 10% of the categories provide more than 65% of the total data in the system. Data access queries always contain this category in the query. I have listed 2 options for designing the schema: 1. Add category as the first component of the row key [category + timestamp] so that my data is sorted based on category for fast retrieval. 2. Add category as a column family so that I can just use timestamp as the rowkey. This option will however create more HFiles since I have more categories. I am leaning towards option 2. I like the idea that HBase separates data for each CF into its own HFiles. However I am still worried about the number of HFiles that will be created on the server. Will it cause any other side effects? I would like to hear from the user community as to which option will be the best in my case. Kamal
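A small sketch of the category + timestamp row key from option 1, under the assumption that a category can be encoded as a fixed-width id; the fixed-width prefix keeps a category's rows contiguous so category-scoped range scans stay cheap.

import org.apache.hadoop.hbase.util.Bytes;

public class CategoryRowKey {
    // Hypothetical layout: fixed-width (4-byte) category id followed by an
    // 8-byte timestamp. The fixed-width prefix makes startRow/stopRow scans
    // over one category straightforward.
    public static byte[] rowKey(int categoryId, long timestamp) {
        return Bytes.add(Bytes.toBytes(categoryId), Bytes.toBytes(timestamp));
    }

    public static void main(String[] args) {
        byte[] key = rowKey(42, System.currentTimeMillis());
        System.out.println(Bytes.toStringBinary(key));
    }
}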
Re: One Region Server fails - all M/R jobs crash.
Hmm ok. You are right. The principle of Hadoop/HBase is do do big data on commodity hardware but that's not to say you can do it without enough hardware. Get 8 commodity disks and see your performance/throughput numbers improve suibstantially. Before jumping into buying anything though, I would suggest you look at hardware utilization when the problem happens. That would tell you what your most pressing need is Regards, Dhaval From: David Koch ogd...@googlemail.com To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Sent: Monday, 25 November 2013 3:36 AM Subject: Re: One Region Server fails - all M/R jobs crash. Hi Dhaval, Yes, rows can get very big, that's why we filter them. The filter lets KVs pass as long as the KV count is MAX_LIMIT and skips the row entirely once the count exceeds this limit. KV size is about constant. Alternatively, we could use batching, you are right. Also, with regard to the Java version used. Cloudera 4 installs its own JVM which happens to be Java 7 so it's not a choice we made. I always thought the principle of Hadoop/HBase was to do big data on commodity hardware. You suggest we get 1 disk per CPU? I am by no means an expert in setting up this kind of system. Thanks again for your response, /David On Fri, Nov 22, 2013 at 9:06 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote: How big can your rows get? If you have a million columns on a row, you might run your region server out of memory. Can you try setBatch to a smaller number and test if that works? 10k regions is too many Can you try and increase your max file size and see if that helps. 8 cores / 1 disk is a bad combination. Can you look at disk IO during the time of crash and see if you find anything there. You might also be swapping. Can you look at your GC logs? You are running dangerously close to the fence with the kind of hardware you have. Regards, Dhaval From: David Koch ogd...@googlemail.com To: user@hbase.apache.org Sent: Friday, 22 November 2013 2:43 PM Subject: Re: One Region Server fails - all M/R jobs crash. Hello, Thank you for your replies. Not that it matters but, cache is 1, batch is -1 on the scan i.e each RPC call returns one row. The jobs don't write any data back to HBase, compaction is de-activated and done manually. At the time of the crash all datanodes were fine, hbchk showed no inconsistencies. Table size is about 10k regions/3 billion records on the largest tables and we do a lot of server side filtering to limit what's sent across the network. Our machines may not be the most powerful, 32GB RAM, 8 cores, 1 disk. It's also true that when we took a closer look in the past it turned out that most of the issues we had were somehow rooted in the fact that CPUs were overloaded, not enough memory available - hardware stuff. What I don't get is why HBase always crashes. I mean if it's slow ok - the hardware is a bottleneck but at least you'd expect it to pull through eventually. Some days all jobs work fine, some days they don't and there is no telling why. HBase's erratic behavior has been causing us a lot of headache and we have been spending way too much time fiddling with HBase configuration settings over the past 18 months. /David On Fri, Nov 22, 2013 at 7:05 PM, Ted Yu yuzhih...@gmail.com wrote: Thanks Dhaval for the analysis. bq. The HBase version is 0.94.6 David: Please upgrade to 0.94.13, if possible. There have been several JIRAs backporting patches from trunk where jdk 1.7 is supported. 
Please also check your DataNode log to see whether there was problem there (likely there was). Cheers On Sat, Nov 23, 2013 at 2:00 AM, Dhaval Shah prince_mithi...@yahoo.co.in wrote: You logs suggest that you are overloading resources (servers/network/memory). How much data are you scanning with your MR job, how much are you writing back to HBase? What values are you setting for setBatch, setCaching, setCacheBlocks? How much memory do you have on your region servers? 1 server crashing should not cause a job to fail because it will move on to the next one (given the right parmas for retries and retry interval are set). Your region server logs suggest that its way more complicated than that. 2013-11-17 09:58:37,513 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Received close for region we are already opening or closing; e54b8e16ffbe2187b9017fef596c62aa looks like some state inconsistency issue I also see that you are using Java 7. Though some people have had success using it, I am not sure if Java 7 is currently the recommended version (most people use Java 6!) 2013-11-18 18:01:47,959 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x342654dfdd30017, likely server has closed socket
Re: RegionServer crash without any errors (compaction?)
Did you look at your GC logs? Probably the compaction process is running your region server out of memory. Can you provide more details on your setup? Max heap size? Max Region HFile size? Regards, Dhaval From: John johnnyenglish...@gmail.com To: user@hbase.apache.org Sent: Thursday, 7 November 2013 10:51 AM Subject: RegionServer crash without any errors (compaction?) Hi, I have a cluster with 7 regionserver. Some of them are crashing from time to time wihtout any error message in the hbase log. If I take a look at the log at the time I found this: 2013-11-07 15:29:02,511 INFO org.apache.hadoop.hbase.regionserver.Store: Starting compaction of 2 file(s) in 1 of P_SO, http://xmlns.com/foaf/0.1/homepage,1383188177383.59d0259c87c07dc666a5600ba4d6c916. i$ 2013-11-07 15:29:10,471 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs:// pc08.pool.ifis.uni-luebeck.de:8020/hbase/P_SO/59d0259c87c07dc666a5600ba4d6c916/.tmp/f$ 2013-11-07 15:31:05,944 INFO org.apache.hadoop.hbase.util.VersionInfo: HBase 0.94.6-cdh4.4.0 restart At this time 2 of the 7 RS crashed, both has this compaction message before they crashed. I don't know exactly what compaction is, but it seems that this compaction has to do with the crash. What can I do to avoid this restart/crash? best regards
Re: RegionServer crash without any errors (compaction?)
Operation too slow is generally in the .log file while the GC logs (if you enabled GC logging) is in the .out file. You have a very small heap for a 1GB HFIle size. You are probably running your region server out of memory. Try increasing the heap size and see if that helps Regards, Dhaval From: John johnnyenglish...@gmail.com To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Sent: Thursday, 7 November 2013 11:09 AM Subject: Re: RegionServer crash without any errors (compaction?) there are no really other logs before. There are a operationTooSlow message before, but that log is ~50 mins bofre the other: http://pastebin.com/EAAubqGB 2013/11/7 John johnnyenglish...@gmail.com Hi, thanks for your fast answer. If I take a look at the cloudera manager at this time the %-time of using the GC increase at this time, so I think you are right. The max heap size is 1GB for this node. The hbase.hregion.max.filesize is also 1GB. regards 2013/11/7 Dhaval Shah prince_mithi...@yahoo.co.in Did you look at your GC logs? Probably the compaction process is running your region server out of memory. Can you provide more details on your setup? Max heap size? Max Region HFile size? Regards, Dhaval From: John johnnyenglish...@gmail.com To: user@hbase.apache.org Sent: Thursday, 7 November 2013 10:51 AM Subject: RegionServer crash without any errors (compaction?) Hi, I have a cluster with 7 regionserver. Some of them are crashing from time to time wihtout any error message in the hbase log. If I take a look at the log at the time I found this: 2013-11-07 15:29:02,511 INFO org.apache.hadoop.hbase.regionserver.Store: Starting compaction of 2 file(s) in 1 of P_SO, http://xmlns.com/foaf/0.1/homepage,1383188177383.59d0259c87c07dc666a5600ba4d6c916. i$ 2013-11-07 15:29:10,471 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs:// pc08.pool.ifis.uni-luebeck.de:8020/hbase/P_SO/59d0259c87c07dc666a5600ba4d6c916/.tmp/f$ 2013-11-07 15:31:05,944 INFO org.apache.hadoop.hbase.util.VersionInfo: HBase 0.94.6-cdh4.4.0 restart At this time 2 of the 7 RS crashed, both has this compaction message before they crashed. I don't know exactly what compaction is, but it seems that this compaction has to do with the crash. What can I do to avoid this restart/crash? best regards
Re: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes
You need to add the Hadoop and HBase libraries to the Hadoop classpath. You successfully added them to the classpath of your mainproject, but when it submits the job to Hadoop, the classpath is lost. The easiest way is to modify hadoop-env.sh. Another way would be to submit the jars for HBase and Hadoop with your job submission. Regards, Dhaval From: fateme Abiri fateme.ab...@yahoo.com To: user@hbase.apache.org user@hbase.apache.org Sent: Monday, 4 November 2013 11:32 AM Subject: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes Hi all, I'm running a MapReduce job in my HBase project. My Hadoop and HBase are remote and I run my code with this command in my terminal: $ java -cp myproject.jar:/user/HadoopAndHBaseLibrary/* mainproject but I get this error: attempt_201207261322_0002_m_00_0, Status : FAILED Error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes My other project, which doesn't use import org.apache.hadoop.hbase.util.Bytes, has run successfully!!! But when I use this class in my MapReduce job I get this error... What could I do? Can anyone help me? My HBase version is 0.94.11 and the jar files for it and Hadoop are all in HadoopAndHBaseLibrary --tanx
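A hedged sketch of the "ship the jars with the job" option: TableMapReduceUtil.addDependencyJars(job) adds the HBase client jars (and their dependencies) to the job's distributed cache, so the task JVMs can resolve classes like org.apache.hadoop.hbase.util.Bytes without editing hadoop-env.sh on every node. The job name is a placeholder and the remaining mapper/output setup is omitted here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class JobSubmitter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "my-hbase-job");
        job.setJarByClass(JobSubmitter.class);

        // Ships the HBase jars with the job so task JVMs can load them.
        TableMapReduceUtil.addDependencyJars(job);

        // ... set mapper/reducer/input/output here as your job requires ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}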
Re: RE: Add Columnsize Filter for Scan Operation
Mapper.cleanup is always called after all map calls are over Sent from Yahoo Mail on Android
Re: RE: Add Columnsize Filter for Scan Operation
John, an important point to note here is that even though rows will get split over multiple calls to scanner.next(), all batches of 1 row will always reach 1 mapper. Another important point to note is that these batches will appear in consecutive calls to mapper.map() What this means is that you don't need to send your data to the reducer (and be more efficient by not writing to disk, no shuffle/sort phases and so on). You can just keep the state in memory for a particular row being processed (effectively a running count on the number of columns) and make the final decision when the row ends (effectively you encounter a different row or all rows are exhausted and you reach the cleanup function). The way I would do it is a map only MR job which keeps the state in memory as described above and uses the KeyOnlyFilter to reduce the amount of data flowing to the mapper Regards, Dhaval From: John johnnyenglish...@gmail.com To: user@hbase.apache.org; lars hofhansl la...@apache.org Sent: Friday, 25 October 2013 8:02 AM Subject: Re: RE: Add Columnsize Filter for Scan Operation One thing I could do is to drop every batch-row where the column-size is smaller than the batch size. Something like if(rowsize batchsize-1) drop row. The problem with this version is that the last row of a big row is also droped. Here a little example: There is one row: row1: 3500 columns If I set the batch to 1000. the mapper function got for the first row 1. Iteration: map function got 1000 columns - write to disk for the reducer 2. Iteration map function got 1000 columns - write to disk for the reducer 3. Iteration map function got 1000 columns - write to disk for the reducer 4. Iteration map function got 500 columns - drop, because it's smaller than the batch size Is there a way to count the columns over different map-functions? regards 2013/10/25 John johnnyenglish...@gmail.com I try to build a MR-Job, but in my case that doesn't work. Because if I set for example the batch to 1000 and there are 5000 columns in row. Now i found to generate something for rows where are the column size is bigger than 2500. BUT since the map function is executed for every batch-row i can't say if the row has a size bigger than 2500. any ideas? 2013/10/25 lars hofhansl la...@apache.org We need to finish up HBASE-8369 From: Dhaval Shah prince_mithi...@yahoo.co.in To: user@hbase.apache.org user@hbase.apache.org Sent: Thursday, October 24, 2013 4:38 PM Subject: Re: RE: Add Columnsize Filter for Scan Operation Well that depends on your use case ;) There are many nuances/code complexities to keep in mind: - merging results of various HFiles (each region can have.more than one) - merging results of WAL - applying delete markers - how about data which is only in memory of region servers and no where else - applying bloom filters for efficiency - what about hbase filters? At some point you would basically start rewriting an hbase region server on you map reduce job which is not ideal for maintainability. Do we ever read MySQL data files directly or issue a SQL query? Kind of goes back to the same argument ;) Sent from Yahoo Mail on Android
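A rough map-only sketch of that approach, with hypothetical names and threshold: the job's Scan would be configured with setBatch(...) plus a KeyOnlyFilter, and the mapper keeps a running column count that is finalized whenever the row key changes or in cleanup().

import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Map-only column counter: the batches of one row arrive in consecutive
// map() calls on the same mapper, so a running count per row key suffices.
public class ColumnCountMapper extends TableMapper<Text, LongWritable> {
    private static final long THRESHOLD = 25000; // hypothetical cut-off
    private byte[] currentRow = null;
    private long columnCount = 0;

    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        byte[] row = value.getRow();
        if (currentRow == null || !Arrays.equals(currentRow, row)) {
            emitIfOverThreshold(context);   // previous row is complete
            currentRow = row;
            columnCount = 0;
        }
        columnCount += value.size();        // KeyValues in this batch
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        emitIfOverThreshold(context);       // flush the last row
    }

    private void emitIfOverThreshold(Context context) throws IOException, InterruptedException {
        if (currentRow != null && columnCount > THRESHOLD) {
            context.write(new Text(Bytes.toStringBinary(currentRow)), new LongWritable(columnCount));
        }
    }
}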
Re: RE: Add Columnsize Filter for Scan Operation
Cool Sent from Yahoo Mail on Android
Re: Add Columnsize Filter for Scan Operation
Jean, if we don't add setBatch to the scan, MR job does cause HBase to crash due to OOME. We have run into this in the past as well. Basically the problem is - Say I have a region server with 12GB of RAM and a row of size 20GB (an extreme example, in practice, HBase runs out of memory way before 20GB). If I query the entire row, HBase does not have enough memory to hold/process it for the response. In practice, if your setCaching 1, then the aggregate of all rows growing too big can also cause the same issue. I think 1 way we can solve this issue is making the HBase server serve responses in a streaming fashion somehow (not exactly sure about the details on how this can work but if it has to hold the entire row in memory, its going to be bound by HBase heap size) Regards, Dhaval From: Jean-Marc Spaggiari jean-m...@spaggiari.org To: user user@hbase.apache.org Sent: Thursday, 24 October 2013 12:37 PM Subject: Re: Add Columnsize Filter for Scan Operation If the MR crash because of the number of columns, then we have an issue that we need to fix ;) Please open a JIRA provide details if you are facing that. Thanks, JM 2013/10/24 John johnnyenglish...@gmail.com @Jean-Marc: Sure, I can do that, but thats a little bit complicated because the the rows has sometimes Millions of Columns and I have to handle them into different batches because otherwise hbase crashs. Maybe I will try it later, but first I want to try the API version. It works okay so far, but I want to improve it a little bit. @Ted: I try to modify it, but I have no idea how exactly do this. I've to count the number of columns in that filter (that works obviously with the count field). But there is no Method that is caleld after iterating over all elements, so I can not return the Drop ReturnCode in the filterKeyValue Method because I did'nt know when it was the last one. Any ideas? regards 2013/10/24 Ted Yu yuzhih...@gmail.com Please take a look at src/main/java/org/apache/hadoop/hbase/filter/ColumnCountGetFilter.java : * Simple filter that returns first N columns on row only. You can modify the filter to suit your needs. Cheers On Thu, Oct 24, 2013 at 7:52 AM, John johnnyenglish...@gmail.com wrote: Hi, I'm write currently a HBase Java programm which iterates over every row in a table. I have to modiy some rows if the column size (the amount of columns in this row) is bigger than 25000. Here is my sourcode: http://pastebin.com/njqG6ry6 Is there any way to add a Filter to the scan Operation and load only rows where the size is bigger than 25k? Currently I check the size at the client, but therefore I have to load every row to the client site. It would be better if the wrong rows already filtered at the server site. thanks John
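For reference, a small sketch of the kind of scan settings being discussed here; setBatch caps how many KeyValues of a wide row come back per Result, so neither the region server nor the client has to materialize a multi-GB row at once. The concrete numbers are placeholders to tune.

import org.apache.hadoop.hbase.client.Scan;

public class BoundedScan {
    public static Scan build() {
        Scan scan = new Scan();
        // Each Result holds at most 1000 KeyValues of a row, so very wide
        // rows are returned in several consecutive partial Results.
        scan.setBatch(1000);
        // Keep caching modest; roughly caching * batch KeyValues are
        // buffered per RPC, so large values of both multiply memory use.
        scan.setCaching(10);
        return scan;
    }
}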
Re: Add Columnsize Filter for Scan Operation
Interesting!! Can't wait to see this in action. I am already imagining huge performance gains Regards, Dhaval From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Sent: Thursday, 24 October 2013 1:06 PM Subject: Re: Add Columnsize Filter for Scan Operation For streaming responses, there is this JIRA: HBASE-8691 High-Throughput Streaming Scan API On Thu, Oct 24, 2013 at 9:53 AM, Dhaval Shah prince_mithi...@yahoo.co.inwrote: Jean, if we don't add setBatch to the scan, MR job does cause HBase to crash due to OOME. We have run into this in the past as well. Basically the problem is - Say I have a region server with 12GB of RAM and a row of size 20GB (an extreme example, in practice, HBase runs out of memory way before 20GB). If I query the entire row, HBase does not have enough memory to hold/process it for the response. In practice, if your setCaching 1, then the aggregate of all rows growing too big can also cause the same issue. I think 1 way we can solve this issue is making the HBase server serve responses in a streaming fashion somehow (not exactly sure about the details on how this can work but if it has to hold the entire row in memory, its going to be bound by HBase heap size) Regards, Dhaval From: Jean-Marc Spaggiari jean-m...@spaggiari.org To: user user@hbase.apache.org Sent: Thursday, 24 October 2013 12:37 PM Subject: Re: Add Columnsize Filter for Scan Operation If the MR crash because of the number of columns, then we have an issue that we need to fix ;) Please open a JIRA provide details if you are facing that. Thanks, JM 2013/10/24 John johnnyenglish...@gmail.com @Jean-Marc: Sure, I can do that, but thats a little bit complicated because the the rows has sometimes Millions of Columns and I have to handle them into different batches because otherwise hbase crashs. Maybe I will try it later, but first I want to try the API version. It works okay so far, but I want to improve it a little bit. @Ted: I try to modify it, but I have no idea how exactly do this. I've to count the number of columns in that filter (that works obviously with the count field). But there is no Method that is caleld after iterating over all elements, so I can not return the Drop ReturnCode in the filterKeyValue Method because I did'nt know when it was the last one. Any ideas? regards 2013/10/24 Ted Yu yuzhih...@gmail.com Please take a look at src/main/java/org/apache/hadoop/hbase/filter/ColumnCountGetFilter.java : * Simple filter that returns first N columns on row only. You can modify the filter to suit your needs. Cheers On Thu, Oct 24, 2013 at 7:52 AM, John johnnyenglish...@gmail.com wrote: Hi, I'm write currently a HBase Java programm which iterates over every row in a table. I have to modiy some rows if the column size (the amount of columns in this row) is bigger than 25000. Here is my sourcode: http://pastebin.com/njqG6ry6 Is there any way to add a Filter to the scan Operation and load only rows where the size is bigger than 25k? Currently I check the size at the client, but therefore I have to load every row to the client site. It would be better if the wrong rows already filtered at the server site. thanks John
Re: RE: Add Columnsize Filter for Scan Operation
Well that depends on your use case ;) There are many nuances/code complexities to keep in mind: - merging results of various HFiles (each region can have.more than one) - merging results of WAL - applying delete markers - how about data which is only in memory of region servers and no where else - applying bloom filters for efficiency - what about hbase filters? At some point you would basically start rewriting an hbase region server on you map reduce job which is not ideal for maintainability. Do we ever read MySQL data files directly or issue a SQL query? Kind of goes back to the same argument ;) Sent from Yahoo Mail on Android
Re: How can I export HBase table using start and stop row key
Hi Karunakar. Unfortunately due to organizational restrictions I am not allowed to share my code. However, its a very simple modification. Basically look at Export.java within the hbase mapreduce package. Look for the function getConfiguredScanForJob (might be named differently based on your version) and add the required hbase Filter to your scan or you can also add a start row/stop row to your scan. Should not be more than 3 lines of code to do what you need Regards, Dhaval From: karunakar lkarunaka...@gmail.com To: user@hbase.apache.org Sent: Monday, 21 October 2013 7:36 PM Subject: Re: How can I export HBase table using start and stop row key Hi Dhaval, Can you please share your code if possible ? it would benefit others as well. Thanks, karunakar. -- View this message in context: http://apache-hbase.679495.n3.nabble.com/How-can-I-export-HBase-table-using-start-and-stop-row-key-tp4051961p4051972.html Sent from the HBase User mailing list archive at Nabble.com.
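A hedged sketch of what such a modification could look like (the configuration property names are made up for illustration): the Scan built for the export job simply gains a start and stop row read from the job configuration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ExportScanBuilder {
    // Mirrors what a modified Export's scan-building method could read
    // from the job configuration; property names are hypothetical.
    public static Scan buildScan(Configuration conf) {
        Scan scan = new Scan();
        String startRow = conf.get("export.scan.startrow");
        String stopRow = conf.get("export.scan.stoprow");
        if (startRow != null) {
            scan.setStartRow(Bytes.toBytes(startRow)); // inclusive
        }
        if (stopRow != null) {
            scan.setStopRow(Bytes.toBytes(stopRow));   // exclusive
        }
        return scan;
    }
}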
Re: How can I export HBase table using start and stop row key
The version you are using only supports PrefixFilter and RegexFilter for scans. Unless your start and stop row have the same prefix (or you can somehow get it into a regex), you won't be able to do it as is. You can always write your own export (we did that to support some more functionality like batching, etc., and it's very easy to do). Regards, Dhaval From: karunakar lkarunaka...@gmail.com To: user@hbase.apache.org Sent: Monday, 21 October 2013 5:40 PM Subject: How can I export HBase table using start and stop row key Hi, I would like to fetch data from an HBase table using the MapReduce Export API. I see that I can fetch data using a start and stop time, but I don't see any information regarding start and stop row keys. Can any expert guide me or give me an example in order to fetch the first 1000 rows (or a start and stop row key) using the Export API? Hadoop 2.0.0-cdh4.1.2 HBase 0.92.1-cdh4.1.2 Please let me know if you need more information. Thank you. -- View this message in context: http://apache-hbase.679495.n3.nabble.com/How-can-I-export-HBase-table-using-start-and-stop-row-key-tp4051960.html Sent from the HBase User mailing list archive at Nabble.com.
Re: Multi-master info missing from book?
Yes. Just start HMaster on 2 different servers and they will fight it out Regards, Dhaval From: Otis Gospodnetic otis.gospodne...@gmail.com To: user@hbase.apache.org Sent: Wednesday, 25 September 2013 1:53 PM Subject: Re: Multi-master info missing from book? Thanks Ted. That's good, but shouldn't there be some info about how to run multiple masters? Is it as simple as starting hmaster on 2 different servers and have them fight it out? Otis -- HBASE Performance Monitoring http://sematext.com/spm/hbase-performance-monitoring/ On Wed, Sep 25, 2013 at 1:11 PM, Ted Yu yuzhih...@gmail.com wrote: I found: 2.5.1.2. If a backup Master, making primary Master fail fast Cheers On Wed, Sep 25, 2013 at 10:08 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I was looking for info about running multiple HBase masters on http://hbase.apache.org/book.html and wasn't able to find any references to it. I think I spotted one mention of active in the context of master, but nothing else. Either I'm not seeing it there, or I'm not looking at the right place, or the info about this is lacking? Thanks, Otis -- HBASE Performance Monitoring http://sematext.com/spm/hbase-performance-monitoring/
Re: HBase Region Server crash if column size become to big
John, can you check the .out file as well? We used to have a similar issue, and it turned out that the query for such a large row ran the region server out of memory, causing the crash; the OOME does not show up in the .log files but rather in the .out files. In such a situation setBatch for scans or a column pagination filter for gets can help your case. Sent from Yahoo! Mail on Android
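A brief sketch of the Get-side mitigation mentioned here, with illustrative limit/offset values: ColumnPaginationFilter bounds how many columns of a huge row the region server must return per call, and the caller pages through by advancing the offset.

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;

public class PagedGet {
    // Fetch columns [offset, offset + limit) of a very wide row instead of
    // asking the region server to materialize the whole row at once.
    public static Get page(byte[] row, int limit, int offset) {
        Get get = new Get(row);
        get.setFilter(new ColumnPaginationFilter(limit, offset));
        return get;
    }
}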
Re: HBase Region Server crash if column size become to big
John, OOME is an out-of-memory error. Your log file structure is a bit different from ours. We see the kind of messages you get in .log files and GC/JVM-related logs in .out files, but everything is in /var/log/hbase. Sent from Yahoo! Mail on Android
Re: HBase Region Server crash if column size become to big
@Mike rows can't span multiple regions, but that does not cause crashes. HBase simply won't allow the region to split, and the region continues to function as one huge region. We had a similar situation long back (when we were on 256mb region sizes) and it worked (it just didn't split the region). Sent from Yahoo! Mail on Android
Re: How is pig so much faster than my java MR job?
Java MR code is not optimized/efficiently written while Pig is highly optimized? Can you give us more details on what exactly you are trying to do and how your Java MR code is written, how many MR jobs for Java vs Pig and so on Sent from Yahoo! Mail on Android
Re: Lease Exception Errors When Running Heavy Map Reduce Job
Couple of things: - Can you check the resources on the region server for which you get the lease exception? It seems like the server is heavily thrashed - What are your values for scan.setCaching and scan.setBatch? The lease does not exist exception generally happens when the client goes back to the region server after the lease expires (in your case 90). If you setCaching is really high for example, the client gets enough data in one call to scanner.next and keeps processing it for 90 ms and when it eventually goes back to the region server, the lease on the region server has already expired. Setting your setCaching value lower might help in this case Regards, Dhaval From: Ameya Kanitkar am...@groupon.com To: user@hbase.apache.org Sent: Wednesday, 28 August 2013 11:00 AM Subject: Lease Exception Errors When Running Heavy Map Reduce Job HI All, We have a very heavy map reduce job that goes over entire table with over 1TB+ data in HBase and exports all data (Similar to Export job but with some additional custom code built in) to HDFS. However this job is not very stable, and often times we get following error and job fails: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4456594242606811626' does not exist at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429) at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor. Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb We have changed following settings in HBase to counter this problem but issue persists: property !-- Loaded from hbase-site.xml -- namehbase.regionserver.lease.period/name value90/value /property property !-- Loaded from hbase-site.xml -- namehbase.rpc.timeout/name value90/value /property We also reduced number of mappers per RS less than available CPU's on the box. We also observed that problem once happens, happens multiple times on the same RS. All other regions are unaffected. But different RS observes this problem on different days. There is no particular region causing this either. We are running: 0.94.2 with cdh4.2.0 Any ideas? Ameya
Re: Will hbase automatically distribute the data across region servers or NOT..??
Vamshi, max value for hbase.hregion.max.filesize to 10MB seems too small. Did you mean 10GB? Regards, Dhaval From: Vamshi Krishna vamshi2...@gmail.com To: user@hbase.apache.org; zhoushuaifeng zhoushuaif...@gmail.com Sent: Friday, 23 August 2013 9:38 AM Subject: Re: Will hbase automatically distribute the data across region servers or NOT..?? Thanks for the clarifications. I am using hbase-0.94.10 and zookeepr-3.4.5 But I am running into different issues . I set hbase.hregion.max.filesize to 10Mb and i am inserting 10 million rows in to hbase table. During the insertion after some time, suddenly master is going down. I don't know what is the reason for such peculiar behavior. I found in master log below content and not able to make out what exactly the mistake. Please somebody help. master-log: 2013-08-23 18:56:36,865 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2013-08-23 18:56:36,866 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : scores,\x00\x00\x00\x00\x00\x02\xC8t,1377264003140.a564f31795091b6513880c5db49ec90f. state=PENDING_OPEN, ts=1377264396861, server=vamshi,60020,1377263789273 .. Cannot transit it to OFFLINE. java.lang.IllegalStateException: Unexpected state : scores,\x00\x00\x00\x00\x00\x02\xC8t,1377264003140.a564f31795091b6513880c5db49ec90f. state=PENDING_OPEN, ts=1377264396861, server=vamshi,60020,1377263789273 .. Cannot transit it to OFFLINE. at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) 2013-08-23 18:56:36,867 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2013-08-23 18:56:36,867 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads 2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 6 2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 6: exiting 2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 6: exiting 2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 6: exiting 2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 6 2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2 on 6: exiting 2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1 on 6: exiting 2013-08-23 18:56:36,873 INFO org.apache.hadoop.hbase.master.HMaster$2: vamshi,6,1377263788019-BalancerChore exiting 2013-08-23 18:56:36,873 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer 2013-08-23 18:56:36,873 INFO org.apache.hadoop.hbase.master.cleaner.HFileCleaner: master-vamshi,6,1377263788019.archivedHFileCleaner exiting 2013-08-23 18:56:36,873 INFO 
org.apache.hadoop.hbase.master.CatalogJanitor: vamshi,6,1377263788019-CatalogJanitor exiting 2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 6: exiting 2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 6: exiting 2013-08-23 18:56:36,874 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010 2013-08-23 18:56:36,874 INFO org.apache.hadoop.hbase.master.cleaner.LogCleaner: master-vamshi,6,1377263788019.oldLogCleaner exiting 2013-08-23 18:56:36,874 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 6: exiting 2013-08-23 18:56:36,874 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 6: exiting 2013-08-23 18:56:36,874 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 6: exiting 2013-08-23 18:56:36,874 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 6: exiting 2013-08-23 18:56:36,874 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder 2013-08-23 18:56:36,876 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder 2013-08-23 18:56:36,874 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 6: exiting 2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 6: exiting 2013-08-23
Re: Will hbase automatically distribute the data across region servers or NOT..??
Ok. The balancer runs as a separate thread (there is a config to set how often the thread wakes up but can't remember off the top of my head). Maybe if you wait long enough, it will balance eventually. Another thing you can try is run the balancer from hbase shell and see what you get back. If you get back a true, it means it should balance. If you get back a false, look at hbase master logs to see whats happening. I once had a scenario where my Unix accounts were messed up (2 users - hbase and another user mapped to the same unix ID and HDFS thought the user did not have the permissions to write to the HBase files on HDFS) and balancer did not run due to this exception. Another thing is (I think!) balancer generally does not run when regions are splitting. So its possible in your case that your regions are splitting so often (due to 10MB limit) that the balancer cannot be run since your regions are not stationary Regards, Dhaval From: Vamshi Krishna vamshi2...@gmail.com To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Sent: Friday, 23 August 2013 10:21 AM Subject: Re: Will hbase automatically distribute the data across region servers or NOT..?? No that is 10MB itself. Just to observe the region splitting with respect to the amount of data i am inserting in to hbase. So, here i am inserting 40-50mb data and fixing that property to 10mb and checking the region splitting happening. But the intersting thing is regions got split BUT they are not being distributed across other servers. Whatever regions formed from the created tables on machine-1, all of them are residing on the same machine-1 not being moved to other machine. On Fri, Aug 23, 2013 at 7:40 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote: Vamshi, max value for hbase.hregion.max.filesize to 10MB seems too small. Did you mean 10GB? Regards, Dhaval From: Vamshi Krishna vamshi2...@gmail.com To: user@hbase.apache.org; zhoushuaifeng zhoushuaif...@gmail.com Sent: Friday, 23 August 2013 9:38 AM Subject: Re: Will hbase automatically distribute the data across region servers or NOT..?? Thanks for the clarifications. I am using hbase-0.94.10 and zookeepr-3.4.5 But I am running into different issues . I set hbase.hregion.max.filesize to 10Mb and i am inserting 10 million rows in to hbase table. During the insertion after some time, suddenly master is going down. I don't know what is the reason for such peculiar behavior. I found in master log below content and not able to make out what exactly the mistake. Please somebody help. master-log: 2013-08-23 18:56:36,865 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: [] 2013-08-23 18:56:36,866 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : scores,\x00\x00\x00\x00\x00\x02\xC8t,1377264003140.a564f31795091b6513880c5db49ec90f. state=PENDING_OPEN, ts=1377264396861, server=vamshi,60020,1377263789273 .. Cannot transit it to OFFLINE. java.lang.IllegalStateException: Unexpected state : scores,\x00\x00\x00\x00\x00\x02\xC8t,1377264003140.a564f31795091b6513880c5db49ec90f. state=PENDING_OPEN, ts=1377264396861, server=vamshi,60020,1377263789273 .. Cannot transit it to OFFLINE. 
at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394) at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) 2013-08-23 18:56:36,867 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2013-08-23 18:56:36,867 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads 2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 6 2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 6: exiting 2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 6: exiting 2013-08-23 18:56:36,867 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 6: exiting 2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 6 2013-08-23 18:56:36,873 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC
Re: about rowkey prefix search
Did you try setting start and end rows on your scan? Sent from Yahoo! Mail on Android
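Roughly, the start/stop-row approach suggested here looks like the following in the Java client. This is only a sketch: the table name and the key prefix are made-up placeholders, and it assumes the last byte of the prefix is not 0xFF.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixScanSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");    // placeholder table name
        byte[] prefix = Bytes.toBytes("user123|");     // placeholder row key prefix

        Scan scan = new Scan();
        scan.setStartRow(prefix);
        // Stop row = the prefix with its last byte incremented, so the scan ends
        // right after the last row that still carries the prefix.
        byte[] stopRow = prefix.clone();
        stopRow[stopRow.length - 1]++;
        scan.setStopRow(stopRow);

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}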
Re: Memory distribution for Hadoop/Hbase processes
You are way underpowered. I don't think you are going to get reasonable performance out of this hardware with so many processes running on it (specially memory heavy processes like HBase), obviously severity depends on your use case I would say you can decrease memory allocation to namenode/datanodes/secondary namenode/hbase master/zookeeper and increase allocation to region servers Regards, Dhaval From: Vimal Jain vkj...@gmail.com To: user@hbase.apache.org Sent: Wednesday, 7 August 2013 12:47 PM Subject: Re: Memory distribution for Hadoop/Hbase processes Hi Ted, I am using centOS. I could not get output of ps aux | grep pid as currently the hbase/hadoop is down in production due to some internal reasons. Can you please help me in figuring out memory distribution for my single node cluster ( pseudo-distributed mode) ? Currently its just 4GB RAM .Also i can try and make it up to 6 GB. So i have come up with following distribution :- Name node - 512 MB Data node - 1024MB Secondary Name node - 512 MB HMaster - 512 MB HRegion - 2048 MB Zookeeper - 512 MB So total memory allocation is 5 GB and i still have 1 GB left for OS. 1) So is it fine to go ahead with this configuration in production ? ( I am asking this because i had long GC pause problems in past when i did not change JVM memory allocation configuration in hbase-env.sh and hadoop-env.sh so it was taking default values . i.e. 1 GB for each of the 6 process so total allocation was 6 GB and i had only 4 GB of RAM. After this i just assigned 1.5 GB to HRegion and 512 MB each to HMaster and Zookeeper . I forgot to change it for Hadoop processes.Also i changed kernel parameter vm.swappiness to 0. After this , it was working fine). 2) Currently i am running pseudo-distributed mode as my data size is at max 10-15GB at present.How easy it is to migrate from pseudo-distributed mode to Fully distributed mode in future if my data size increases ? ( which will be the case for sure ) . Thanks for your help . Really appreciate it . On Sun, Aug 4, 2013 at 8:12 PM, Kevin O'dell kevin.od...@cloudera.comwrote: My questions are : 1) How this thing is working ? It is working because java can over allocate memory. You will know you are using too much memory when the kernel starts killing processes. 2) I just have one table whose size at present is about 10-15 GB , so what should be ideal memory distribution ? Really you should get a box with more memory. You can currently only hold about ~400 MB in memory. On Aug 4, 2013 9:58 AM, Ted Yu yuzhih...@gmail.com wrote: What OS are you using ? What is the output from the following command ? ps aux | grep pid where pid is the process Id for Namenode, Datanode, etc. Cheers On Sun, Aug 4, 2013 at 6:33 AM, Vimal Jain vkj...@gmail.com wrote: Hi, I have configured Hbase in pseudo distributed mode with HDFS as underlying storage.I am not using map reduce framework as of now I have 4GB RAM. Currently i have following distribution of memory Data Node,Name Node,Secondary Name Node each :1000MB(default HADOOP_HEAPSIZE property) Hmaster - 512 MB HRegion - 1536 MB Zookeeper - 512 MB So total heap allocation becomes - 5.5 GB which is absurd as my total RAM is only 4 GB , but still the setup is working fine on production. :-0 My questions are : 1) How this thing is working ? 2) I just have one table whose size at present is about 10-15 GB , so what should be ideal memory distribution ? -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain
NoRouteToHostException when zookeeper crashes
I have a weird (and pretty serious) issue on my HBase cluster. Whenever one of my zookeeper servers goes down, already running services work fine for a few hours, but when I try to restart any service (be it region servers or clients), they fail with a NoRouteToHostException while trying to connect to zookeeper, and I cannot restart any service successfully at all. I do realize that No Route to Host is coming from my network infrastructure (ping gives the same error), but why would 1 zookeeper server going down bring down the entire HBase cluster? Why doesn't HBase ride over the exception and try some other zookeeper server? Is this an issue other people face, or is it just me? We are running these on DHCP (but the IPs don't change because we have long leases). Do you guys think it's a DHCP-specific issue? Do you have pointers to avoid this issue with DHCP, or do I have to move to static IPs? Regards, Dhaval
Re: NoRouteToHostException when zookeeper crashes
HBase - 0.92.1 Zookeeper - 3.4.3 Regards, Dhaval - Original Message - From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Cc: Sent: Tuesday, 6 August 2013 11:08 AM Subject: Re: NoRouteToHostException when zookeeper crashes What HBase / zookeeper versions are you using ? On Tue, Aug 6, 2013 at 7:48 AM, Dhaval Shah prince_mithi...@yahoo.co.inwrote: I have a weird (and a pretty serious) issue on my HBase cluster. Whenever one of my zookeeper server goes down, already running services work fine for a few hours but when I try to restart any service (be it region servers or clients), they fail with a NoRouteToHostException while trying to connect to zookeeper and I cannot restart any service successfully at all. I do realize that No Route to host is coming from my network infrastructure (ping gives the same error) but why would 1 zookeeper server going down bring down the entire HBase cluster. Why doesn't HBase ride over the exception and try some other zookeeper server? Is this an issue other people face or its just me? We are running these on DHCP (but the IPs don't change because we have long leases). Do you guys think its a DHCP specific issue? Do you have pointers to avoid this issue with DHCP or do I have to move to static IPs? Regards, Dhaval
Re: NoRouteToHostException when zookeeper crashes
Thanks Stack. Do you have any specific pointers as to what configs would help mitigate this issue with a DHCP setup (I am not a networking expert, other teams manage the network and if I have specific pointers that would help guide the discussion) Regards, Dhaval From: Stack st...@duboce.net To: Hbase-User user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Sent: Tuesday, 6 August 2013 1:29 PM Subject: Re: NoRouteToHostException when zookeeper crashes On Tue, Aug 6, 2013 at 7:48 AM, Dhaval Shah prince_mithi...@yahoo.co.inwrote: I have a weird (and a pretty serious) issue on my HBase cluster. Whenever one of my zookeeper server goes down, already running services work fine for a few hours but when I try to restart any service (be it region servers or clients), they fail with a NoRouteToHostException while trying to connect to zookeeper and I cannot restart any service successfully at all. I do realize that No Route to host is coming from my network infrastructure (ping gives the same error) but why would 1 zookeeper server going down bring down the entire HBase cluster. Why doesn't HBase ride over the exception and try some other zookeeper server? Is this an issue other people face or its just me? We are running these on DHCP (but the IPs don't change because we have long leases). Do you guys think its a DHCP specific issue? Do you have pointers to avoid this issue with DHCP or do I have to move to static IPs? All bets are off in the face of NoRouteToHost. Please fixup your networking (My guess is first lookup works and gets cached. On restart, we run into your network issue). St.Ack
Re: help on key design
Looking at https://issues.apache.org/jira/browse/HBASE-6136 it seems like the 500 Gets are executed sequentially on the region server. Also 3k requests per minute = 50 requests per second. Assuming your requests take 1 sec (which seems really long but who knows) then you need atleast 50 threads/region server handlers to handle these. Defaults for that number on some older versions of hbase is 10 which means you are running out of threads. Which brings up the following questions - What version of HBase are you running? How many region server handlers do you have? Regards, Dhaval - Original Message - From: Demian Berjman dberj...@despegar.com To: user@hbase.apache.org Cc: Sent: Wednesday, 31 July 2013 11:12 AM Subject: Re: help on key design Thanks for the responses! why don't you use a scan I'll try that and compare it. How much memory do you have for your region servers? Have you enabled block caching? Is your CPU spiking on your region servers? Block caching is enabled. Cpu and memory dont seem to be a problem. We think we are saturating a region because the quantity of keys requested. In that case my question will be if asking 500+ keys per request is a normal scenario? Cheers, On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina pablomedin...@gmail.comwrote: The scan can be an option if the cost of scanning undesired cells and discarding them trough filters is better than accessing those keys individually. I would say that as the number of 'undesired' cells decreases the scan overall performance/efficiency gets increased. It all depends on how the keys are designed to be grouped together. 2013/7/30 Ted Yu yuzhih...@gmail.com Please also go over http://hbase.apache.org/book.html#perf.reading Cheers On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah prince_mithi...@yahoo.co.in wrote: If all your keys are grouped together, why don't you use a scan with start/end key specified? A sequential scan can theoretically be faster than MultiGet lookups (assuming your grouping is tight, you can also use filters with the scan to give better performance) How much memory do you have for your region servers? Have you enabled block caching? Is your CPU spiking on your region servers? If you are saturating the resources on your *hot* region server then yes having more region servers will help. If no, then something else is the bottleneck and you probably need to dig further Regards, Dhaval From: Demian Berjman dberj...@despegar.com To: user@hbase.apache.org Sent: Tuesday, 30 July 2013 4:37 PM Subject: help on key design Hi, I would like to explain our use case of HBase, the row key design and the problems we are having so anyone can give us a help: The first thing we noticed is that our data set is too small compared to other cases we read in the list and forums. We have a table containing 20 million keys splitted automatically by HBase in 4 regions and balanced in 3 region servers. We have designed our key to keep together the set of keys requested by our app. That is, when we request a set of keys we expect them to be grouped together to improve data locality and block cache efficiency. The second thing we noticed, compared to other cases, is that we retrieve a bunch keys per request (500 aprox). Thus, during our peaks (3k requests per minute), we have a lot of requests going to a particular region servers and asking a lot of keys. That results in poor response times (in the order of seconds). Currently we are using multi gets. 
We think an improvement would be to spread the keys (introducing a randomized component on it) in more region servers, so each rs will have to handle less keys and probably less requests. Doing that way the multi gets will be spread over the region servers. Our questions: 1. Is it correct this design of asking so many keys on each request? (if you need high performance) 2. What about splitting in more region servers? It's a good idea? How we could accomplish this? We thought in apply some hashing... Thanks in advance!
Re: help on key design
Yup that issue definitely seems relevant. Unfortunately you might have to wait till you can upgrade or patch your version. In the time being depending on how well your rows are grouped (and if you are using Bloomfilters) the scan might give you a short term solution Regards, Dhaval - Original Message - From: Demian Berjman dberj...@despegar.com To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Cc: Sent: Wednesday, 31 July 2013 2:41 PM Subject: Re: help on key design Dhaval, What version of HBase are you running? 0.94.7 How many region server handlers do you have? 100 We are following this issue: https://issues.apache.org/jira/browse/HBASE-9087 Ted, we think too that splitting may incur in a better performance. But like you said, it must be done manually. Thanks! On Wed, Jul 31, 2013 at 2:14 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote: Looking at https://issues.apache.org/jira/browse/HBASE-6136 it seems like the 500 Gets are executed sequentially on the region server. Also 3k requests per minute = 50 requests per second. Assuming your requests take 1 sec (which seems really long but who knows) then you need atleast 50 threads/region server handlers to handle these. Defaults for that number on some older versions of hbase is 10 which means you are running out of threads. Which brings up the following questions - What version of HBase are you running? How many region server handlers do you have? Regards, Dhaval - Original Message - From: Demian Berjman dberj...@despegar.com To: user@hbase.apache.org Cc: Sent: Wednesday, 31 July 2013 11:12 AM Subject: Re: help on key design Thanks for the responses! why don't you use a scan I'll try that and compare it. How much memory do you have for your region servers? Have you enabled block caching? Is your CPU spiking on your region servers? Block caching is enabled. Cpu and memory dont seem to be a problem. We think we are saturating a region because the quantity of keys requested. In that case my question will be if asking 500+ keys per request is a normal scenario? Cheers, On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina pablomedin...@gmail.com wrote: The scan can be an option if the cost of scanning undesired cells and discarding them trough filters is better than accessing those keys individually. I would say that as the number of 'undesired' cells decreases the scan overall performance/efficiency gets increased. It all depends on how the keys are designed to be grouped together. 2013/7/30 Ted Yu yuzhih...@gmail.com Please also go over http://hbase.apache.org/book.html#perf.reading Cheers On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah prince_mithi...@yahoo.co.in wrote: If all your keys are grouped together, why don't you use a scan with start/end key specified? A sequential scan can theoretically be faster than MultiGet lookups (assuming your grouping is tight, you can also use filters with the scan to give better performance) How much memory do you have for your region servers? Have you enabled block caching? Is your CPU spiking on your region servers? If you are saturating the resources on your *hot* region server then yes having more region servers will help. 
If no, then something else is the bottleneck and you probably need to dig further Regards, Dhaval From: Demian Berjman dberj...@despegar.com To: user@hbase.apache.org Sent: Tuesday, 30 July 2013 4:37 PM Subject: help on key design Hi, I would like to explain our use case of HBase, the row key design and the problems we are having so anyone can give us a help: The first thing we noticed is that our data set is too small compared to other cases we read in the list and forums. We have a table containing 20 million keys splitted automatically by HBase in 4 regions and balanced in 3 region servers. We have designed our key to keep together the set of keys requested by our app. That is, when we request a set of keys we expect them to be grouped together to improve data locality and block cache efficiency. The second thing we noticed, compared to other cases, is that we retrieve a bunch keys per request (500 aprox). Thus, during our peaks (3k requests per minute), we have a lot of requests going to a particular region servers and asking a lot of keys. That results in poor response times (in the order of seconds). Currently we are using multi gets. We think an improvement would be to spread the keys (introducing a randomized component on it) in more region servers, so each rs will have to handle less keys and probably less requests. Doing that way the multi gets will be spread over the region servers
Re: help on key design
If all your keys are grouped together, why don't you use a scan with start/end key specified? A sequential scan can theoretically be faster than MultiGet lookups (assuming your grouping is tight, you can also use filters with the scan to give better performance) How much memory do you have for your region servers? Have you enabled block caching? Is your CPU spiking on your region servers? If you are saturating the resources on your *hot* region server then yes having more region servers will help. If no, then something else is the bottleneck and you probably need to dig further Regards, Dhaval From: Demian Berjman dberj...@despegar.com To: user@hbase.apache.org Sent: Tuesday, 30 July 2013 4:37 PM Subject: help on key design Hi, I would like to explain our use case of HBase, the row key design and the problems we are having so anyone can give us a help: The first thing we noticed is that our data set is too small compared to other cases we read in the list and forums. We have a table containing 20 million keys splitted automatically by HBase in 4 regions and balanced in 3 region servers. We have designed our key to keep together the set of keys requested by our app. That is, when we request a set of keys we expect them to be grouped together to improve data locality and block cache efficiency. The second thing we noticed, compared to other cases, is that we retrieve a bunch keys per request (500 aprox). Thus, during our peaks (3k requests per minute), we have a lot of requests going to a particular region servers and asking a lot of keys. That results in poor response times (in the order of seconds). Currently we are using multi gets. We think an improvement would be to spread the keys (introducing a randomized component on it) in more region servers, so each rs will have to handle less keys and probably less requests. Doing that way the multi gets will be spread over the region servers. Our questions: 1. Is it correct this design of asking so many keys on each request? (if you need high performance) 2. What about splitting in more region servers? It's a good idea? How we could accomplish this? We thought in apply some hashing... Thanks in advance!
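A rough sketch of the scan-with-start/end-key alternative suggested here, assuming the ~500 keys requested together really do fall in one contiguous range. The method name, table handle, and key bounds are placeholders, not anything from the original thread.

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class GroupedKeyScanSketch {
    // Scans the contiguous key range covering one logical group instead of
    // issuing hundreds of individual Gets against the same region.
    public static void readGroup(HTable table, byte[] firstKey, byte[] stopKeyExclusive) throws Exception {
        Scan scan = new Scan(firstKey, stopKeyExclusive); // stop row is exclusive
        scan.setCaching(500);                             // pull many rows back per RPC
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process r
            }
        } finally {
            scanner.close();
        }
    }
}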
Re: Writing unit tests against HBase
Why don't you spin up a mini cluster for your tests (there is a MiniHBaseCluster which brings up an in-memory cluster for testing and you can tear it down at the end of your test)? The benefit you get is that you no longer need to mock HBase responses and you will be talking to an actual cluster running similar code to the one you will have running in prod, so will be more reliable. Obviously the downside is that instead of mocking responses, you will have to populate data in HBase tables but I still feel this is more intuitive and reliable. Regards, Dhaval From: Adam Phelps a...@opendns.com To: user@hbase.apache.org Sent: Monday, 24 June 2013 5:14 PM Subject: Re: Writing unit tests against HBase On 6/18/13 4:22 PM, Stack wrote: On Tue, Jun 18, 2013 at 4:17 PM, Varun Sharma va...@pinterest.com wrote: Hi, If I wanted to write to write a unit test against HTable/HBase, is there an already available utility to that for unit testing my application logic. I don't want to write code that either touches production or requires me to mock an HTable. I am looking for a test htable object which behaves pretty close to a real HTable. Would this help if we included it? https://github.com/kijiproject/fake-hbase/ I figured I'd take a look as I was about to try using Mockito (https://code.google.com/p/mockito/) to try to implement unit testing of some of our code that accesses HBase. The example tests in there are all Scala, and I'm not having much success using them in Java. Do you know if there's any example Java tests that make use of fake-hbase? - Adam
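For what it's worth, a bare-bones sketch of the mini cluster approach usually looks something like this, using HBaseTestingUtility (the helper that wraps MiniHBaseCluster, shipped in the HBase tests jar). The table, family, qualifier and value names below are made up.

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class MyHBaseTest {
    private static final HBaseTestingUtility UTIL = new HBaseTestingUtility();

    public static void main(String[] args) throws Exception {
        UTIL.startMiniCluster();   // in-memory HDFS + ZooKeeper + HBase
        try {
            HTable table = UTIL.createTable(Bytes.toBytes("test"), Bytes.toBytes("cf"));
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
            table.put(put);
            byte[] value = table.get(new Get(Bytes.toBytes("row1")))
                                .getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));
            System.out.println(Bytes.toString(value)); // prints "value"
        } finally {
            UTIL.shutdownMiniCluster();
        }
    }
}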
Re: Writing unit tests against HBase
Yup I hear ya. MiniHBaseCluster adds an extra minute to the tests which kind of sucks. It gives me peace of mind though Regards, Dhaval - Original Message - From: Adam Phelps a...@opendns.com To: user@hbase.apache.org Cc: Sent: Monday, 24 June 2013 6:00 PM Subject: Re: Writing unit tests against HBase What I'm currently looking for is a method of adding quick unit tests (ie preferably run time of a few seconds) to test some algorithms that read hbase data and perform some operations on it. Mocking seems a much better way to handle this, though I'm open to other suggestions. I'll try out MiniHBaseCluster anyway since I can't seem to get FakeHBase to work. - Adam On 6/24/13 2:39 PM, Dhaval Shah wrote: Why don't you spin up a mini cluster for your tests (there is a MiniHBaseCluster which brings up an in-memory cluster for testing and you can tear it down at the end of your test)? The benefit you get is that you no longer need to mock HBase responses and you will be talking to an actual cluster running similar code to the one you will have running in prod, so will be more reliable. Obviously the downside is that instead of mocking responses, you will have to populate data in HBase tables but I still feel this is more intuitive and reliable. Regards, Dhaval From: Adam Phelps a...@opendns.com To: user@hbase.apache.org Sent: Monday, 24 June 2013 5:14 PM Subject: Re: Writing unit tests against HBase On 6/18/13 4:22 PM, Stack wrote: On Tue, Jun 18, 2013 at 4:17 PM, Varun Sharma va...@pinterest.com wrote: Hi, If I wanted to write to write a unit test against HTable/HBase, is there an already available utility to that for unit testing my application logic. I don't want to write code that either touches production or requires me to mock an HTable. I am looking for a test htable object which behaves pretty close to a real HTable. Would this help if we included it? https://github.com/kijiproject/fake-hbase/ I figured I'd take a look as I was about to try using Mockito (https://code.google.com/p/mockito/) to try to implement unit testing of some of our code that accesses HBase. The example tests in there are all Scala, and I'm not having much success using them in Java. Do you know if there's any example Java tests that make use of fake-hbase? - Adam
Re: Is there a way to view multiple versions of a cell in the HBase shell?
I think you can. Try specifying the following: VERSIONS => 4. It's also documented in the HBase shell documentation for get (and I am assuming the same would apply for scans): get Get row or cell contents; pass table name, row, and optionally a dictionary of column(s), timestamp and versions. Examples: hbase> get 't1', 'r1' hbase> get 't1', 'r1', {COLUMN => 'c1'} hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']} hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1} hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4} Regards, Dhaval From: Jonathan Natkins na...@wibidata.com To: user@hbase.apache.org Sent: Thursday, 7 March 2013 12:08 PM Subject: Is there a way to view multiple versions of a cell in the HBase shell? It seems that the answer is no, but I just wanted to make sure I didn't miss something. As far as I can tell, scanning a column on a time range returns just the most recent value within that time range, rather than all the values in the range. Thanks, Natty -- http://www.wibidata.com office: 1.415.496.9424 x208 cell: 1.609.577.1600 twitter: @nattyice http://www.twitter.com/nattyice
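The equivalent from the Java client, for reference, is Get.setMaxVersions(). A small sketch; the table, family and qualifier names are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiVersionGet {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "t1");                  // placeholder table name
        Get get = new Get(Bytes.toBytes("r1"));
        get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("c1"));
        get.setMaxVersions(4);                                   // ask for up to 4 versions of the cell
        Result result = table.get(get);
        for (KeyValue kv : result.raw()) {                       // one KeyValue per returned version, newest first
            System.out.println(kv.getTimestamp() + " -> " + Bytes.toString(kv.getValue()));
        }
        table.close();
    }
}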
Re: Json+hbase
A JSON object is nothing but a String representation. You can serialize it to its String form, convert that to bytes (for example with Bytes.toBytes()), and put the result into HBase. Regards, Dhaval From: ranjin...@polarisft.com ranjin...@polarisft.com To: user@hbase.apache.org Sent: Monday, 4 February 2013 5:14 AM Subject: Json+hbase Hi, Need to create a JSON object and put the data into an HBase table. How to put a JSON object in an HBase table? Please guide me to complete the task. Thanks in advance. Regards, Ranjini R BSC-1,Nxt-Lvl 4th Floor East Wing Polaris.FT-Navallur-Chennai Mobile No:9003048194 This e-Mail may contain proprietary and confidential information and is sent for the intended recipient(s) only. If by an addressing or transmission error this mail has been misdirected to you, you are requested to delete this mail immediately. You are also hereby notified that any use, any form of reproduction, dissemination, copying, disclosure, modification, distribution and/or publication of this e-mail message, contents or its attachment other than by its intended recipient/s is strictly prohibited. Visit us at http://www.polarisFT.com
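A compile-ready sketch of that idea, assuming the org.json library for building the JSON (any JSON library with a toString() works the same way). The table, family, qualifier and row key names are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.json.JSONObject;

public class JsonPutSketch {
    public static void main(String[] args) throws Exception {
        JSONObject json = new JSONObject();
        json.put("name", "Ranjini");
        json.put("city", "Chennai");

        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // placeholder table name
        Put put = new Put(Bytes.toBytes("row1"));
        // Serialize the JSON object to its String form, then to bytes.
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("json"), Bytes.toBytes(json.toString()));
        table.put(put);
        table.close();
    }
}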
Re: Controlling TableMapReduceUtil table split points
Hi David.. We successfully use the logical schema approach and have not seen issues yet.. Ofcourse it all depends on the use case and saying it would work for you because it works for us would be naive.. However, if it does work, it will make your life much easier because with a logical schema other problems become simpler (like you can be sure that 1 map function will process an entire row rather than a row going to multiple mappers, or if you are using filters that restrict queries to only a small subset of the data, even setBatch won't be needed for those use cases).. I did run into issues where I did not use setBatch and my mappers ran out of memory but that was a simpler one to solve (and by the way if you are on CDH4, the HBase export utility also does not use setBatch and your mapper will run out of memory if you have a large row.. Its easy to put that line in though as a config param and this feature is available in future releases of HBase trunk) Regards, Dhaval From: David Koch ogd...@googlemail.com To: user@hbase.apache.org Sent: Sunday, 6 January 2013 12:53 PM Subject: Re: Controlling TableMapReduceUtil table split points Hi Dhaval, Good call on the setBatch. I had forgotten about it. Just like changing the schema it would involve changing the map(...) to reflect the fact that only part of the user's data is returned in each call but I would not have to manipulate table splits. The HBase book does suggest that it's bad practice to use the logical schema of lumping all user data into a single row(*) but I'll do some testing to see what works. Thank you, /David (*) Chapter 9, section Tall-Narrow Versus Flat-Wide Tables, 3rd ed., page 359) On Sun, Jan 6, 2013 at 6:29 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote: Another option to avoid the timeout/oome issues is to use scan.setBatch() so that the scanner would function normally for small rows but would break up large rows in multiple Result objects which you can now use in conjunction with scan.setCaching() to control how much data you get back.. This approach would not need a change in your schema design and would ensure that only 1 mapper processes the entire row (but in multiple calls to the map function)
Re: HDFS disk space requirements
Also depending on compression type chosen it might take less disk space -- On Fri 11 Jan, 2013 3:53 PM IST Mesika, Asaf wrote: 130 GB raw data will take in HBase since it adds the family name, qualifier and timestamp to each value, so it can even be 150GB. You can check it exactly, by loading only one row with one column and see how much it takes on the HDFS file system (run compaction first). Next, you 5 times that since you have 5 times replication, so 5x150=750GB On Jan 11, 2013, at 5:07 AM, Panshul Whisper wrote: Hello, I have a 5 node hadoop cluster and a fully distributed Hbase setup on the cluster with 130 GB of HDFS space avaialble. HDFS replication is set to 5. I have a total of 115 GB of JSON files that need to be loaded into the Hbase database and then they have to processed. So is the available HDFS space sufficient for the operations?? considering the replication and all factors? or should I increase the space and by how much? Thanking You, -- Regards, Ouch Whisper 010101010101
Re: Controlling TableMapReduceUtil table split points
Another option to avoid the timeout/oome issues is to use scan.setBatch() so that the scanner would function normally for small rows but would break up large rows in multiple Result objects which you can now use in conjunction with scan.setCaching() to control how much data you get back.. This approach would not need a change in your schema design and would ensure that only 1 mapper processes the entire row (but in multiple calls to the map function) -- On Sun 6 Jan, 2013 10:07 PM IST David Koch wrote: Hi Ted, Thank you for your response. I will take a look. With regards to the timeouts: I think changing the key design as outlined above would ameliorate the situation since each map call only requests a small amount of data as opposed to what could be a large chunk. I remember that simply doing a get on one of the large outlier rows (~500mb) brought down the region server involved. /David On Sun, Jan 6, 2013 at 5:11 PM, Ted Yu yuzhih...@gmail.com wrote: If events for one user are processed by a single mapper, I think you would
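A rough sketch of what that looks like when feeding the scan into TableMapReduceUtil. The table name "events" and the batch/caching numbers are illustrative only; the point is that setBatch() caps how many columns arrive per Result, so a very wide row simply shows up as several consecutive calls to map() for the same row key.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class WideRowJob {
    // Each call to map() sees at most setBatch() columns of a row; a huge row
    // arrives as multiple Result objects carrying the same row key.
    static class WideRowMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result columns, Context context) {
            // process this batch of columns for rowKey
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "wide-row-scan");
        job.setJarByClass(WideRowJob.class);

        Scan scan = new Scan();
        scan.setCaching(100);   // rows (or partial rows) fetched per RPC
        scan.setBatch(1000);    // max columns per Result, keeps huge rows from blowing the heap

        TableMapReduceUtil.initTableMapperJob("events", scan, WideRowMapper.class,
                NullWritable.class, NullWritable.class, job);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}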
Re: disable table
I have had similar problems and it seems like zookeeper and hbase master have different notions of whether the table is enabled or not.. Stopping the cluster, deleting zookeeper data and then starting it worked for me in this scenario Regards, Dhaval From: Mohit Anchlia mohitanch...@gmail.com To: user@hbase.apache.org Sent: Wednesday, 26 September 2012 4:54 PM Subject: disable table When I try to disable table I get: hbase(main):011:0 disable 'SESSIONID_TIMELINE' ERROR: org.apache.hadoop.hbase.TableNotEnabledException: org.apache.hadoop.hbase.TableNotEnabledException: SESSIONID_TIMELINE Here is some help for this command: Start disable of named table: e.g. hbase disable 't1' But then I try to enable I get: hbase(main):012:0 enable 'SESSIONID_TIMELINE' ERROR: org.apache.hadoop.hbase.TableNotDisabledException: org.apache.hadoop.hbase.TableNotDisabledException: SESSIONID_TIMELINE Here is some help for this command: Start enable of named table: e.g. hbase enable 't1' I've tried flush, major_compaction also. I tseems it's stuck in inconsistent state. Could someone point me to correct direction? I am using 92.1
Re:: Hregionserver instance runs endlessly
Try killing the old process manually (find its PID with ps -ef). -- On Tue 25 Sep, 2012 11:28 AM IST iwannaplay games wrote: Hi, My hbase was working properly. But now it shows two instances of hregionserver; the starting time of one is 4 days back. If I try stopping hbase it doesn't stop via stop-hbase.sh. If I go to the slave and stop it, it stops the new instance. The old instance is not getting deleted and it's not showing up when I do jps on that slave. What to do? Please help me. I restarted the cluster but the problem remains the same. Here is the snapshot
Re: Required a sample Java program to delete a row from Hbase
Delete d = new Delete(rowKey); HTable t = new HTable(tableName); t.delete(d); Regards, Dhaval From: Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com To: user@hbase.apache.org Sent: Saturday, 22 September 2012 10:15 AM Subject: Required a sample Java program to delete a row from Hbase Hi, Can someone send a sample Java program to delete a row from Hbase. Regards, Rams
Re: Required a sample Java program to delete a row from Hbase
HTable and Delete are the only 2 I remember Regards, Dhaval - Original Message - From: Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com To: user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Cc: Sent: Saturday, 22 September 2012 10:47 AM Subject: Re: Required a sample Java program to delete a row from Hbase Dhaval, Thanks!! What are all the classes that we need to import? My requirement is to use java script in Pentaho. So I could not write a full Java program there... I may require to import required classes before using these functions. regards, Rams On Sat, Sep 22, 2012 at 7:48 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote: Delete d = new Delete(rowKey); HTable t = new HTable(tableName); t.delete(d); Regards, Dhaval From: Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com To: user@hbase.apache.org Sent: Saturday, 22 September 2012 10:15 AM Subject: Required a sample Java program to delete a row from Hbase Hi, Can someone send a sample Java program to delete a row from Hbase. Regards, Rams
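Pulling the snippet above together into something compile-ready, with the imports in question (HBaseConfiguration and Bytes are the other classes typically needed); the table name and row key below are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteRowExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();    // reads hbase-site.xml from the classpath
        HTable table = new HTable(conf, "mytable");           // placeholder table name
        Delete delete = new Delete(Bytes.toBytes("row1"));    // deletes every column of this row
        table.delete(delete);
        table.close();
    }
}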
Re:: HMaster construction failing over SASL authentication issue.
Looking at your log, it seems like SASL is just a warning/info message. Your real issue is invalid zookeeper sessions. Can you try stopping everything, deleting the zookeeper data dir and data log dir, and starting again? Also, are you running a version of zookeeper compatible with your hbase version? -- On Fri 14 Sep, 2012 11:48 AM IST Arati ! wrote: Hi, I was running my HBase-Hadoop setup just fine with HBase 0.92.0 and Hadoop 1.0.1 on 3 nodes. Recently upgraded to HBase-0.94.1 and Hadoop-1.0.3. Since I was running a trial environment I hadn't added my machines to a DNS. Now after the version change, the HRegionServer processes began failing over not being able to reverse look up the IPs of my nodes. So I got that worked out and added the servers onto the DNS. After which I'm getting the following SASL exceptions. Am I missing something? Upon start-up all the processes are up but the logs indicate failed construction of HMaster. Can I disable SASL authentication? 2012-09-13 19:27:12,720 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /10.12.19.110:2181 2012-09-13 19:27:12,722 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 17868@test110 2012-09-13 19:27:12,731 WARN org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration. 2012-09-13 19:27:12,731 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration. 2012-09-13 19:27:12,737 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035) Thanks, Arati Patro
Re: Enabling compression
I bet that your compression libraries are not available to HBase.. Run the compression test utility and see if it can find LZO Regards, Dhaval - Original Message - From: Mohit Anchlia mohitanch...@gmail.com To: user@hbase.apache.org Cc: Sent: Tuesday, 24 July 2012 4:39 PM Subject: Re: Enabling compression Thanks! I was trying it out and I see this message when I use COMPRESSION, but it works when I don't use it. Am I doing something wrong? hbase(main):012:0 create 't2', {NAME = 'f1', VERSIONS = 1, COMPRESSION = 'LZO'} ERROR: org.apache.hadoop.hbase.client.RegionOfflineException: Only 0 of 1 regions are online; retries exhausted. hbase(main):014:0 create 't3', {NAME = 'f1', VERSIONS = 1} 0 row(s) in 1.1260 seconds On Tue, Jul 24, 2012 at 1:37 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: On Tue, Jul 24, 2012 at 1:34 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Also, if I understand it correctly, this will enable the compression for the new put but will not compresse the actual cells already stored right? For that, we need to run a major compaction of the table which will rewrite all the cells and so compact them? Yeah, although you may not want to recompact everything all at once in a live system. You can just let it happen naturally through cycles of flushes and compactions, it's all fine. J-D
Re: Enabling compression
Yes you need to add the snappy libraries to hbase path (i think the variable to set is called HBASE_LIBRARY_PATH) -- On Wed 25 Jul, 2012 3:46 AM IST Mohit Anchlia wrote: On Tue, Jul 24, 2012 at 2:04 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote: I bet that your compression libraries are not available to HBase.. Run the compression test utility and see if it can find LZO That seems to be the case for SNAPPY. However, I do have snappy installed and it works with hadoop just fine and HBase is running on the same cluster. Is there something special I need to do for HBase? Regards, Dhaval - Original Message - From: Mohit Anchlia mohitanch...@gmail.com To: user@hbase.apache.org Cc: Sent: Tuesday, 24 July 2012 4:39 PM Subject: Re: Enabling compression Thanks! I was trying it out and I see this message when I use COMPRESSION, but it works when I don't use it. Am I doing something wrong? hbase(main):012:0 create 't2', {NAME = 'f1', VERSIONS = 1, COMPRESSION = 'LZO'} ERROR: org.apache.hadoop.hbase.client.RegionOfflineException: Only 0 of 1 regions are online; retries exhausted. hbase(main):014:0 create 't3', {NAME = 'f1', VERSIONS = 1} 0 row(s) in 1.1260 seconds On Tue, Jul 24, 2012 at 1:37 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Tue, Jul 24, 2012 at 1:34 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Also, if I understand it correctly, this will enable the compression for the new put but will not compresse the actual cells already stored right? For that, we need to run a major compaction of the table which will rewrite all the cells and so compact them? Yeah, although you may not want to recompact everything all at once in a live system. You can just let it happen naturally through cycles of flushes and compactions, it's all fine. J-D
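For reference, once the codec is visible to HBase, the shell create from this thread has a Java equivalent along these lines. This is a sketch on the 0.92/0.94-era admin API only; the table and family names are placeholders, and SNAPPY (or LZO) still requires the native libraries to be on HBase's library path, otherwise region opening fails just like the shell create did.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateCompressedTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HColumnDescriptor family = new HColumnDescriptor("f1");   // placeholder family name
        family.setMaxVersions(1);
        family.setCompressionType(Compression.Algorithm.SNAPPY);  // fails at region open time if the codec is missing
        HTableDescriptor desc = new HTableDescriptor("t2");       // placeholder table name
        desc.addFamily(family);
        admin.createTable(desc);
        admin.close();
    }
}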
RE: Applying QualifierFilter to one column family only.
Alternately you can use a filter list and say first column family and qualifier filter or second column family.. -- On Fri 20 Jul, 2012 8:40 AM IST Anoop Sam John wrote: Yes I was having this doubt. So if you know exactly the qualifier names in advance you can use this scan way. Else filter only u can use. QualifierFilter just checks the qualifier name only which CF it is part of is not checked. So the similar qualifier names in both T and S will get filtered out. You can create a simple filter of your own and plugin the same into HBase and use? Here you can pass the CF name also and in the filterKeyValue() u can consider the CF name too. I think it should be an easy job :) -Anoop- _ From: David Koch [ogd...@googlemail.com] Sent: Thursday, July 19, 2012 1:57 PM To: user@hbase.apache.org Subject: Re: Applying QualifierFilter to one column family only. Hello Anoop, Thank you for your answer. The QualifierFilter on T specifies a minimum value not one that has to be matched exactly, so merely adding a specific qualifier value directly to the scan does not work if I understand correctly :-/ /David On Thu, Jul 19, 2012 at 7:05 AM, Anoop Sam John anoo...@huawei.com wrote: Hi David, You want the below use case in scan Table :T1 -- CF : T CF: S q1 q2..q1 q2 .. Now in Scan u want to scan all the qualifiers under S and one qualifier under T. (I think I got ur use case correctly) Well this use case u can achieve with out using any filter also. Scan s = new Scan() s.addFamily(S); // Tells to add all the qualifier(KVs) under this CF in the result s.addColumn(T,q1) Use this scan object for your getScanner. Using the addColumn you can add more than one qualifier under one CF too. Hope this helps u. -Anoop- From: David Koch [ogd...@googlemail.com] Sent: Thursday, July 19, 2012 3:36 AM To: user@hbase.apache.org Subject: Applying QualifierFilter to one column family only. Hello, When scanning a table with 2 column families, is it possible to apply a QualifierFilter selectively to one family but still include the other family in the scan? The layout of my table is as follows: rowkeyT:timestamp -- data,S:summary_item -- value For each rowkey family T contains timestamp/data key/value pairs. Column S contains summary information about this row key. I want to apply a QualifierFilter to column T only - i.e filter by timestamp but return also all of S whenever the set of key/values matched in T is not empty. Is this doable using standard HBase filters? If so, how? If not could I implement such a filter myself using FilterBase? Thank you, /David
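Sketched out, the filter-list idea is: (family T AND the qualifier condition) OR (family S). Roughly, with the thread's family names T and S and a made-up minimum qualifier value:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FamilyFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class SelectiveQualifierScan {
    public static Scan buildScan() {
        // Branch 1: cells from family T whose qualifier clears the minimum value.
        FilterList tBranch = new FilterList(FilterList.Operator.MUST_PASS_ALL);
        tBranch.addFilter(new FamilyFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("T"))));
        tBranch.addFilter(new QualifierFilter(CompareOp.GREATER_OR_EQUAL,
                new BinaryComparator(Bytes.toBytes("20120701"))));   // placeholder minimum qualifier

        // Branch 2: every cell from family S, untouched by the qualifier condition.
        FamilyFilter sBranch = new FamilyFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("S")));

        // Keep a cell if it satisfies either branch.
        FilterList either = new FilterList(FilterList.Operator.MUST_PASS_ONE);
        either.addFilter(tBranch);
        either.addFilter(sBranch);

        Scan scan = new Scan();
        scan.setFilter(either);
        return scan;
    }
}

One caveat from the thread still applies: this returns the S family for every row, whether or not anything in T matched, so conditioning S on T being non-empty would need a custom filter along the lines Anoop suggests.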
Re: HBase shell
Mohit, HBase shell is a JRuby wrapper and as such has all functions available which are available using Java API.. So you can import the Bytes class and the do a Bytes.toString() similar to what you'd do in Java Regards, Dhaval From: Mohit Anchlia mohitanch...@gmail.com To: user@hbase.apache.org Sent: Friday, 20 July 2012 8:39 PM Subject: HBase shell Is there a command on the shell that convert byte into char array when using HBase shell command line? It's all in hex format hbase(main):004:0 scan 'SESSION_TIMELINE' ROW COLUMN+CELL \x00\x00\x00\x01\x7F\xFF\xE8\x034\x04\xCF\xFF column=S_T_MTX:\x07A\xB8\xB1, timestamp=1342826789668, value=Hello \x00\x00\x00\x01\x7F\xFF\xE81\xDC\xE4\x07\xFF column=S_T_MTX:\x04@\xBB\x94, timestamp=1342826589226, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x09\xA2\x7F column=S_T_MTX:\x00\x00O?, timestamp=1342830980018, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x1B\xF1\xFF column=S_T_MTX:\x00\x00\x82\x19, timestamp=1342829793047, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x1C\xDC_ column=S_T_MTX:\x00\x00S, timestamp=1342829721025, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x1D\xC6\xBF column=S_T_MTX:\x00\x00\x8Az, timestamp=1342829675205, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y \x85\xDF column=S_T_MTX:\x00\x00\x89\xDE, timestamp=1342829495072, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y!p? column=S_T_MTX:\x00\x00b\xEA, timestamp=1342829425086, value=Hello
Re: Improvement: Provide better feedback on Put to unknown CF
+1 a proper error message always helps IMHO -- On Tue 10 Jul, 2012 5:58 PM IST Jean-Marc Spaggiari wrote: Hi Michael, I agree that in the code we have access to all the information to access the right column. However, let's imagine the column family name is dynamically retrieved from a property file, and there is a typo. Or, another process removed the column family. Or there is a bug in the code, and so on. There is many possibilities why an application might try to access a CF which, at the end, doesn't exist in the table. I agree it should have been checked from the meta before, but skeeping that step might be required to improve performances. Adding such exception will not have any negative impact on perfs, readability, etc. It will simply help a lot the defect tracking when someone will face the issue and see the stack trace. JM 2012/7/9, Michael Segel michael_se...@hotmail.com: Jean-Marc, I think you mis understood. At run time, you can query HBase to find out the table schema and its column families. While I agree that you are seeing poorly written exceptions, IMHO its easier to avoid the problem in the first place. In a Map/Reduce in side the mapper class, you have everything you need to get the table's schema. From that you can see the column families. HTH -Mike On Jul 9, 2012, at 8:42 AM, Jean-Marc Spaggiari wrote: In my case it was a codding issue. Used the wrong final byte array to access the CF. So I agree, the CF is well known since you create the table based on them. But maybe you have added some other CFs later and something went wrong? It's just that based on the exception received, there is no indication that there might be some issues with the CF. So you might end trying to figure what the issue is far from where it's really. 2012/7/9, Michael Segel michael_se...@hotmail.com: This may beg the question ... Why do you not know the CF? Your table schemas only consist of tables and CFs. So you should know them at the start of your job or m/r Mapper.setup(); On Jul 9, 2012, at 7:25 AM, Jean-Marc Spaggiari wrote: Hi, When we try to add a value to a CF which does not exist on a table, we are getting the error below. I think this is not really giving the right information about the issue. Should it not be better to provide an exception like UnknownColumnFamillyException? JM org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: DoNotRetryIOException: 1 time, servers with issues: phenom:60020, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1591) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:945) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:801) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:776) at org.myapp.app.Integrator.main(Integrator.java:162)
Re: HBASE -- YCSB ?
This exception is generally caused when one of your server names returned does not map to a valid IP address on that host.. The services being up or not does not matter but the hostname should resolve to a valid IP Regards, Dhaval From: registrat...@circle-cross-jn.com registrat...@circle-cross-jn.com To: user@hbase.apache.org Sent: Monday, 9 July 2012 5:30 PM Subject: Re: HBASE -- YCSB ? Thank you Amandeep for your input. I go into hbase shell to create a table from my HMaster, which isn't running a DN process and I get the following. Could this be caused by a number of my DNs being offline, by the fact that the node isn't running a DN process, or something else? hbase(main):013:0 create 'usertable', 'testcol' ERROR: java.net.NoRouteToHostException: java.netNoRouteToHostException: No route to host Here is some help for this command: Create table; pass table name, a dictionary of specifications per column family, and optionally a dictionary of table configuration. Dictionaries are described below in the GENERAL NOTES section. Examples: hbase create 't1', {NAME = 'f1', VERSIONS = 5} hbase create 't1', {NAME = 'f1'}, {NAME = 'f2'}, {NAME = 'f3'} hbase # The above in shorthand would be the following: hbase create 't1', 'f1', 'f2', 'f3' hbase create 't1', {NAME = 'f1', VERSIONS = 1, TTL = 2592000, BLOCKCACHE = true} hbase create 't1', 'f1', {SPLITS = ['10', '20', '30', '40']} hbase create 't1', 'f1', {SPLITS_FILE = 'splits.txt'} I can see in the ZK logs and the RS logs that they talk to the shell, so I know that communication is good and I find no errors or exceptions in them. Also I can do a hbase shell status, hbase shell zk_dump, and hadoop dfsadmin -report all from the node I am trying to create the table from with no issue. If I get on a node with the DataNode process running on it and try, I get the following: [hadoop@srack0-11 ~]$ hbase shell HBase Shell; enter 'help' for list of supported commands. Type exit to leave the HBase Shell Version 0.90.6-cdh3u4, r, Mon May 7 13:14:00 PDT 2012 hbase(main):001:0 status 3 servers, 0 dead, 0.6667 average load hbase(main):002:0 create 'usertable', 'tempcol' ERROR: java.io.IOException: java.io.IOException: Bad connect ack with firstBadLink as 172.18.0.9:50010 I assume this means it is trying to talk to a DN process on a node that I know is down. --- Jay Wilson - Original Message - From: user@hbase.apache.org To:, Cc: Sent:Mon, 9 Jul 2012 12:21:22 -0700 Subject:Re: HBASE -- YCSB ? Inline. On Monday, July 9, 2012 at 12:17 PM, registrat...@circle-cross-jn.com [1] wrote: Now that I have a stable cluster, I would like to use YCSB to test its performance; however, I am a bit confused after reading several different website posting about YCSB. 1) Be default will YCSB read my hbase-site.xml [2] file or do I have to copy it into the YCSB conf directory? I plan on using on of my nodes with no Hadoop/HBASE processes running on it, but it has all the environmental stuff in place. You have to put the hbase-site.xml [3] in YCSB/hbase/src/main/conf/. 2) Does the hbase.master [4] property have to be site in the hbase-site.xml [5] file for YCSB to work? The only property that has to be there is the zookeeper quorum list. That's what the HBase client needs to talk to the cluster. 3) After working through all the workloads is there a script/tool that will clean up my HBase? Nope. You'll need to go in and disable, drop the table you wrote too. You can do that from the shell. disable 'mytable' drop 'mytable' That's all you'll need to do to clean it up. 
Thank You --- Jay Wilson
Re: HBASE -- YCSB ?
There is definitely a debug flag on hbase.. You can find out details on http://hbase.apache.org/shell.html.. I am not sure how much detail it would log though.. I have never used it personally Regards, Dhaval - Original Message - From: registrat...@circle-cross-jn.com To: 'user@hbase.apache.org' Sent: Monday, 9 July 2012 5:56 PM Subject: Re: HBASE -- YCSB ? Is there a debug flag I can use with hbase shell that will tell me the name it's trying to resolve? Thank you --- Jay Wilson - Original Message - To: user@hbase.apache.org, registrat...@circle-cross-jn.com Sent: Tue, 10 Jul 2012 05:36:44 +0800 (SGT) Subject: Re: HBASE -- YCSB ? This exception is generally caused when one of your server names returned does not map to a valid IP address on that host.. The services being up or not does not matter, but the hostname should resolve to a valid IP Regards, Dhaval From: registrat...@circle-cross-jn.com To: user@hbase.apache.org Sent: Monday, 9 July 2012 5:30 PM Subject: Re: HBASE -- YCSB ? Thank you Amandeep for your input. I go into hbase shell to create a table from my HMaster, which isn't running a DN process, and I get the following. Could this be caused by a number of my DNs being offline, by the fact that the node isn't running a DN process, or something else?
hbase(main):013:0> create 'usertable', 'testcol'
ERROR: java.net.NoRouteToHostException: java.net.NoRouteToHostException: No route to host
Here is some help for this command: Create table; pass table name, a dictionary of specifications per column family, and optionally a dictionary of table configuration. Dictionaries are described below in the GENERAL NOTES section. Examples:
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
hbase> create 't1', 'f1', {SPLITS => ['10', '20', '30', '40']}
hbase> create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}
I can see in the ZK logs and the RS logs that they talk to the shell, so I know that communication is good, and I find no errors or exceptions in them. Also I can do a hbase shell status, hbase shell zk_dump, and hadoop dfsadmin -report all from the node I am trying to create the table from with no issue. If I get on a node with the DataNode process running on it and try, I get the following:
[hadoop@srack0-11 ~]$ hbase shell
HBase Shell; enter 'help' for list of supported commands. Type exit to leave the HBase Shell
Version 0.90.6-cdh3u4, r, Mon May 7 13:14:00 PDT 2012
hbase(main):001:0> status
3 servers, 0 dead, 0.6667 average load
hbase(main):002:0> create 'usertable', 'tempcol'
ERROR: java.io.IOException: java.io.IOException: Bad connect ack with firstBadLink as 172.18.0.9:50010
I assume this means it is trying to talk to a DN process on a node that I know is down. --- Jay Wilson - Original Message - From: user@hbase.apache.org Sent: Mon, 9 Jul 2012 12:21:22 -0700 Subject: Re: HBASE -- YCSB ? Inline. On Monday, July 9, 2012 at 12:17 PM, registrat...@circle-cross-jn.com wrote: Now that I have a stable cluster, I would like to use YCSB to test its performance; however, I am a bit confused after reading several different website postings about YCSB.
1) By default will YCSB read my hbase-site.xml file or do I have to copy it into the YCSB conf directory? I plan on using one of my nodes with no Hadoop/HBASE processes running on it, but it has all the environmental stuff in place. You have to put the hbase-site.xml in YCSB/hbase/src/main/conf/. 2) Does the hbase.master property have to be set in the hbase-site.xml file for YCSB to work? The only property that has to be there is the zookeeper quorum list. That's what the HBase client needs to talk to the cluster. 3) After working through all the workloads is there a script/tool that will clean up my HBase? Nope. You'll need to go in and disable, drop the table you wrote to. You can do that from the shell. disable 'mytable' drop 'mytable' That's all you'll need to do to clean it up. Thank You --- Jay Wilson
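For anyone scripting that cleanup instead of typing it into the shell, the same disable/drop can be done from the Java client. This is a minimal sketch against the 0.90-era HBaseAdmin API; the quorum hosts are placeholders, and 'usertable' is just the default YCSB table name. As noted above, the zookeeper quorum is the only property the client really needs.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CleanupUsertable {
    public static void main(String[] args) throws Exception {
        // Loads hbase-default.xml / hbase-site.xml from the classpath; the
        // quorum can also be set explicitly as shown here (hosts are placeholders).
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk-host1,zk-host2,zk-host3");

        HBaseAdmin admin = new HBaseAdmin(conf);
        if (admin.tableExists("usertable")) {
            admin.disableTable("usertable"); // same as shell: disable 'usertable'
            admin.deleteTable("usertable");  // same as shell: drop 'usertable'
        }
    }
}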
Re: Hmaster and HRegionServer disappearance reason to ask
Pablo, instead of CMSIncrementalMode try UseParNewGC.. That seemed to be the silver bullet when I was dealing with HBase region server crashes Regards, Dhaval From: Pablo Musa pa...@psafe.com To: user@hbase.apache.org user@hbase.apache.org Sent: Thursday, 5 July 2012 5:37 PM Subject: RE: Hmaster and HRegionServer disappearance reason to ask I am having the same problem. I tried N different things but I cannot solve the problem. hadoop-0.20.noarch 0.20.2+923.256-1 hadoop-hbase.noarch 0.90.6+84.29-1 hadoop-zookeeper.noarch 3.3.5+19.1-1 I already set:
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>20</value>
</property>
But it does not seem to work. How can I check if these variables are really set in the HRegionServer? I am starting the server with the following: -Xmx8192m -XX:NewSize=64m -XX:MaxNewSize=64m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps I am also having trouble reading regionserver.out:
[GC 72004.406: [ParNew: 55830K->2763K(59008K), 0.0043820 secs] 886340K->835446K(1408788K) icms_dc=0 , 0.0044900 secs] [Times: user=0.04 sys=0.00, real=0.00 secs]
[GC 72166.759: [ParNew: 55192K->6528K(59008K), 135.1102750 secs] 887876K->839688K(1408788K) icms_dc=0 , 135.1103920 secs] [Times: user=1045.58 sys=138.11, real=135.09 secs]
[GC 72552.616: [ParNew: 58977K->6528K(59008K), 0.0083040 secs] 892138K->847415K(1408788K) icms_dc=0 , 0.0084060 secs] [Times: user=0.05 sys=0.01, real=0.01 secs]
[GC 72882.991: [ParNew: 58979K->6528K(59008K), 151.4924490 secs] 899866K->853931K(1408788K) icms_dc=0 , 151.4925690 secs] [Times: user=0.07 sys=151.48, real=151.47 secs]
What does each part mean? Is each line a GC cycle? Thanks, Pablo -Original Message- From: Lars George [mailto:lars.geo...@gmail.com] Sent: Monday, 2 July 2012 06:43 To: user@hbase.apache.org Subject: Re: Hmaster and HRegionServer disappearance reason to ask Hi lztaomin, org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired indicates that you have experienced the Juliet Pause issue, which means you ran into a JVM garbage collection that lasted longer than the configured ZooKeeper timeout threshold. If you search for it on Google http://www.google.com/search?q=juliet+pause+hbase you will find quite a few pages explaining the problem, and what you can do to avoid this. Lars On Jul 2, 2012, at 10:30 AM, lztaomin wrote: HI ALL, My HBase cluster has a total of 3 machines, with Hadoop and HBase on the same machines and ZooKeeper managed by HBase itself. After 3 months of operation it reported the abnormality below, causing the HMaster and HRegionServer processes to die. Please help me.
Thanks. The following is a log:
ABORTING region server serverName=datanode1,60020,1325326435553, load=(requests=332, regions=188, usedHeap=2741, maxHeap=8165): regionserver:60020-0x3488dec38a02b1 regionserver:60020-0x3488dec38a02b1 received expired from ZooKeeper, aborting
Cause: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
2012-07-01 13:45:38,707 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for datanode1,60020,1325326435553
2012-07-01 13:45:38,756 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 32 hlog(s) in hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553
2012-07-01 13:45:38,764 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 1 of 32: hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352, length=5671397
2012-07-01 13:45:38,764 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering file hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352
2012-07-01 13:45:39,766 INFO org.apache.hadoop.hbase.util.FSUtils: Finished lease recover attempt for hdfs://namenode:9000/hbase/.logs/datanode1,60020,1325326435553/datanode1%3A60020.1341006689352
2012-07-01 13:45:39,880 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
2012-07-01 13:45:39,925 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
ABORTING region server serverName=datanode2,60020,1325146199444, load=(requests=614,
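On Pablo's question earlier in this thread about verifying that properties such as hbase.hregion.memstore.mslab.enabled and the handler count are actually picked up: one crude check is to load the configuration the same way the HBase client does and print the values. This only confirms what the hbase-site.xml on the local classpath resolves to (run it on the region server host with the same HBASE_CONF_DIR), not what an already-running process loaded; a minimal sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CheckHBaseConf {
    public static void main(String[] args) {
        // Loads hbase-default.xml and hbase-site.xml from the classpath,
        // i.e. whatever $HBASE_CONF_DIR points at when run on the RS host.
        Configuration conf = HBaseConfiguration.create();
        System.out.println("mslab enabled = "
            + conf.getBoolean("hbase.hregion.memstore.mslab.enabled", false));
        System.out.println("handler count = "
            + conf.getInt("hbase.regionserver.handler.count", 10));
    }
}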
Re: HBase dies shortly after starting.
Try cleaning up your zookeeper data.. I have had similar issues before due to corrupt zookeeper data/bad zookeeper state -- On Sat 30 Jun, 2012 4:12 AM IST Jay Wilson wrote: I somewhat have HBase up and running in a distributed mode. It starts fine, and I can use hbase shell to create, disable, and drop tables; however, after a short period of time HMaster and the HRegionServers terminate. Decoding the error messages is a bit bewildering and the O'Reilly HBase book hasn't helped much with message decoding. Here is a snippet of the messages from a regionserver log:
~~~
2012-06-27 12:36:47,103 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.68 MB, free=807.12 MB, max=813.8 MB, blocks=2, accesses=19, hits=17, hitRatio=89.47%%, cachingAccesses=17, cachingHits=15, cachingHitsRatio=88.23%%, evictions=0, evicted=0, evictedPerRun=NaN
2012-06-27 12:40:02,106 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x382f6861690003, likely server has closed socket, closing socket connection and attempting reconnect
2012-06-27 12:40:02,112 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x382f6861690004, likely server has closed socket, closing socket connection and attempting reconnect
2012-06-27 12:40:02,245 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server devrackA-01/172.18.0.2:2181
2012-06-27 12:40:02,247 WARN org.apache.zookeeper.ClientCnxn: Session 0x382f6861690003 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
~~~
No route to host would imply it can't reach one of my HQuorumPeers, but it talks to them when I first run start-hbase.sh. Also there is no DNS involved, the /etc/hosts files are identical on all nodes, and it's currently a closed cluster. All nodes are on the same subnet 172.18/16. Do I have something wrong in one of my xml files?
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hbase-hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://devrackA-00:8020</value>
    <final>true</final>
  </property>
</configuration>
hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hbase-hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hbase-hadoop/data</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/var/hbase-hadoop/namesecondary</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>
hbase-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 * Copyright 2010 The Apache Software Foundation
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://devrackA-00:8020/var/hbase-hadoop/hbase</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>20</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>devrackA-00,devrackA-01,devrackA-25</value>
  </property>
  <property>
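Since the regionserver log above ends in a NoRouteToHostException while reconnecting to ZooKeeper, a quick sanity check is to resolve and connect to each quorum member on the client port from the affected node. A small sketch; the host names are taken from the hbase.zookeeper.quorum above, while the port (2181) and the 3-second timeout are assumptions:

import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class CheckQuorumReachability {
    public static void main(String[] args) throws Exception {
        String[] quorum = {"devrackA-00", "devrackA-01", "devrackA-25"};
        for (String host : quorum) {
            try {
                InetAddress addr = InetAddress.getByName(host);     // /etc/hosts or DNS lookup
                Socket s = new Socket();
                s.connect(new InetSocketAddress(addr, 2181), 3000); // ZK client port, 3s timeout
                System.out.println(host + " -> " + addr.getHostAddress() + " reachable");
                s.close();
            } catch (Exception e) {
                System.out.println(host + " NOT reachable: " + e);
            }
        }
    }
}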
Re: Rows count
Instead of the shell rowcount you can use the MR job for rowcount.. something like hadoop jar path_to_hbase.jar rowcount your_table The MR job is much faster than the shell -- On Mon 25 Jun, 2012 4:52 AM IST Jean-Marc Spaggiari wrote: Hi, In HBASE-1512 (https://issues.apache.org/jira/browse/HBASE-1512) there is the implementation of co-processor for count and others. Is there anywhere an example of the way to use them? Because the shell count is very slow when there are too many rows. Thanks, JM
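For the HBASE-1512 question: the coprocessor-based count is exposed through AggregationClient in 0.92+ clients. A rough sketch of that usage, assuming the AggregateImplementation coprocessor has been enabled for the table, with 'f1' and 'your_table' as placeholder names; exact signatures vary between versions, so treat this as an outline rather than the definitive API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class CoprocessorRowCount {
    public static void main(String[] args) throws Throwable {
        Configuration conf = HBaseConfiguration.create();
        AggregationClient aggClient = new AggregationClient(conf);

        // Restrict the scan to one family; the counting runs server-side per
        // region and only the per-region counts travel back to the client.
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("f1")); // placeholder family name

        long count = aggClient.rowCount(Bytes.toBytes("your_table"),
                                        new LongColumnInterpreter(), scan);
        System.out.println("rows: " + count);
    }
}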
Re: Pseudo Distributed: ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain
Have you restarted zookeeper? Also clearing the zookeeper data dir and data log dir might help.. it seems that localhost.localdomain is being cached somewhere -- On Thu 7 Jun, 2012 2:48 PM IST Manu S wrote: Hi All, In a pseudo distributed node HBaseMaster is stopping automatically when we start HBaseRegion. I have changed all the configuration files of Hadoop, HBase and Zookeeper to set the exact hostname of the machine. Also commented out the localhost entry in /etc/hosts and cleared the cache as well. There is no localhost.localdomain entry in these configurations, but it is still resolving to localhost.localdomain. Please find the error:
2012-06-07 12:13:11,995 INFO org.apache.hadoop.hbase.master.MasterFileSystem: No logs to split
2012-06-07 12:13:12,103 ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain
2012-06-07 12:13:12,104 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.lang.IllegalArgumentException: hostname can't be null
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:121)
at org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:108)
at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:64)
at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82)
at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73)
at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:222)
at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:240)
at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:487)
at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:455)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:406)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:293)
2012-06-07 12:13:12,106 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2012-06-07 12:13:12,106 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
Thanks, Manu S
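Because the master builds its own address from whatever the local hostname resolves to, it can also help to see exactly what the JVM on that box resolves, independent of HBase. A minimal sketch; pass the hostname you configured as the first argument:

import java.net.InetAddress;

public class CheckHostnameResolution {
    public static void main(String[] args) throws Exception {
        // What the JVM thinks this machine is called, and how that resolves back.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("local hostname : " + local.getHostName());
        System.out.println("canonical name : " + local.getCanonicalHostName());
        System.out.println("address        : " + local.getHostAddress());

        // Forward lookup of the hostname you actually put in the configs, if given.
        if (args.length > 0) {
            InetAddress configured = InetAddress.getByName(args[0]);
            System.out.println(args[0] + " -> " + configured.getHostAddress());
        }
    }
}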
Re: HBase server treating my emails as spam (FW: Failure Notice)
Got ya.. will try that next time.. thanks -- On Sun 3 Jun, 2012 3:59 PM IST Harsh J wrote: Hey Dhaval, Sending plaintext email replies helps in such cases. Rich formatting may have caused this (With too many HTML links, etc.). On Sat, Jun 2, 2012 at 5:22 AM, Dhaval Shah prince_mithi...@yahoo.co.in wrote: Hi guys. When I send an email from my yahoo account (from a PC/laptop), the hbase mail servers are treating it as spam.. if I send it from my cell using the same yahoo account it goes through (like this one).. my last email got marked as spam by hbase servers as you can read below.. Forwarded Message From: mailer-dae...@yahoo.com To: prince_mithi...@yahoo.co.in Sent: Fri 1 Jun, 2012 8:18 PM IST Subject: Failure Notice Sorry, we were unable to deliver your message to the following address. user@hbase.apache.org: Remote host said: 552 spam score (5.4) exceeded threshold (FREEMAIL_FORGED_REPLYTO,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_NEUTRAL ) [BODY] --- Below this line is a copy of the message. Received: from [106.10.166.116] by nm2.bullet.mail.sg3.yahoo.com with NNFMP; 01 Jun 2012 14:48:07 - Received: from [106.10.151.251] by tm5.bullet.mail.sg3.yahoo.com with NNFMP; 01 Jun 2012 14:48:07 - Received: from [127.0.0.1] by omp1022.mail.sg3.yahoo.com with NNFMP; 01 Jun 2012 14:48:07 - X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 151978.45173...@omp1022.mail.sg3.yahoo.com Received: (qmail 69489 invoked by uid 60001); 1 Jun 2012 14:48:07 - DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.co.in; s=s1024; t=1338562087; bh=4UyEMs/mYUWEloKWPm8TNHpf0XUq1hS6shr2jtC2png=; h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=Idtey84AMgbZ1B0L9sZFUeXVnhL4qrmUCjIVia5g6jKmTDtZ3QS5Qg8VbHpWAVzIORmWqOx1ia3u9WqCIwuMhHNDI9fXcHjJ0V6E5GX0dCXXarubdolBxCWAvodQr9ZXrj8JO+zsSDMFCDSW83ZmpPoFptt5NQWdNE1wNVUEKN8= DomainKey-Signature:a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.co.in; h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=yn52KmZA19Tre13iYbmt0H4NRfudP7x7xrlGehPDMUU7OXCOWfKtfyaNZ5e7x0lI1A4mjdEmeEwaNkEFV4MYDcBl8LmuV3HQsX4NZl15VuPYS7GbDGxDdeoR9CZinzJhWHlzPuhPi+g4MGDWbTev7FtayVKJrLQSCrQCmw5WAFI=; X-YMail-OSG: O1MD734VM1nUIwivrAyqVmVyGupyuKq4og1.Ui6RaHpyrCi oqTdLss0wdr4WW5pfNFAxC4oK36jwewpOMhWLixlxazByEySyIIPKzBdakrK z0IlUE4QclLH6g3LjSc_JYtH9M4tHgSCN4WTuBS0s34F7xcTAFPdXhm2v.xu dv.7wJ1N_3QHrLvZh9XeEQxm721CMic72Yk.PtcEg_aSljOiZVd_MdLbQkyq 80l2H98OCHLqfvgbs2qO40x5_RwJ.pzUtRmx_gs.GxsfCuIvQMiA7XkXbKAs ZFQSJR0EDVRFCc5QSjCKPkK25hgjzkAzQ8MqPpc7o44O1az8bQzWPQbqVHaS 89m8QcqJ.R43KIRDRrdausENT199M0HvqygTrcPhkzUhSW73RXiOQyxJP_BP lRx2245t8bkU4Rm34LqkkyKTtPnhK7VWHCi8V..yq0qUKMoN1_KN6Y1XhVID dEucmLKc3PRe2z_BEAEr_hsh7HB.xmkXNM6JCFE3hXh1k9NCtnNm.7EhdS8S 9mw-- Received: from [199.172.169.86] by web192504.mail.sg3.yahoo.com via HTTP; Fri, 01 Jun 2012 22:48:06 SGT X-Mailer: YahooMailWebService/0.8.118.349524 References: 4fc4e89f.3090...@free.fr 4fc7c20c.4040...@free.fr CAGpTDNcDBGF=v+fx7vssvpdx91qg_4_vq0a8fxzee+70gue...@mail.gmail.com 4fc7ddf1.8080...@free.fr CAGpTDNe94LkkvLcHP1nCucXzb6Dapb-X+tfQEBw+55Tnx=5e=a...@mail.gmail.com 4fc8c2c6.7060...@free.fr 4fc8d456.6060...@free.fr Message-ID: 1338562086.60488.yahoomail...@web192504.mail.sg3.yahoo.com Date: Fri, 1 Jun 2012 22:48:06 +0800 (SGT) From: Dhaval Shah prince_mithi...@yahoo.co.in Reply-To: Dhaval Shah prince_mithi...@yahoo.co.in Subject: Re: hosts unreachables To: user@hbase.apache.org user@hbase.apache.org In-Reply-To: 
4fc8d456.6060...@free.fr MIME-Version: 1.0 Content-Type: multipart/alternative; boundary=1766475574-631287525-1338562086=:60488 --1766475574-631287525-1338562086=:60488 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable
Can you try removing -XX:+CMSIncrementalMode from your GC settings.. That had caused a lot of pain a few months back
Regards, Dhaval
From: Cyril Scetbon cyril.scetbon@free.fr To: user@hbase.apache.org Sent: Friday, 1 June 2012 10:40 AM Subject: Re: hosts unreachables
I've another regionserver (hb-d2) that crashed (I can easily reproduce the issue by continuing injections), and as I see in master log, it gets information about hb-d2 every 5 minutes. I suppose it's what helps him to note if a node is dead or not. However it adds hb-d2 to the dead node list at 13:32:20, so before 5 minutes since the last time it got the server information. Is it normal?
2012-06-01 13:02:36,309 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,1338553124588
HBase server treating my emails as spam (FW: Failure Notice)
Hi guys. When I send an email from my yahoo account (from a PC/laptop), the hbase mail servers are treating it as spam.. if I send it from my cell using the same yahoo account it goes through (like this one).. my last email got marked as spam by hbase servers as you can read below.. Forwarded Message From: mailer-dae...@yahoo.com To: prince_mithi...@yahoo.co.in Sent: Fri 1 Jun, 2012 8:18 PM IST Subject: Failure Notice Sorry, we were unable to deliver your message to the following address. user@hbase.apache.org: Remote host said: 552 spam score (5.4) exceeded threshold (FREEMAIL_FORGED_REPLYTO,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_NEUTRAL ) [BODY] --- Below this line is a copy of the message. Received: from [106.10.166.116] by nm2.bullet.mail.sg3.yahoo.com with NNFMP; 01 Jun 2012 14:48:07 - Received: from [106.10.151.251] by tm5.bullet.mail.sg3.yahoo.com with NNFMP; 01 Jun 2012 14:48:07 - Received: from [127.0.0.1] by omp1022.mail.sg3.yahoo.com with NNFMP; 01 Jun 2012 14:48:07 - X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 151978.45173...@omp1022.mail.sg3.yahoo.com Received: (qmail 69489 invoked by uid 60001); 1 Jun 2012 14:48:07 - DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.co.in; s=s1024; t=1338562087; bh=4UyEMs/mYUWEloKWPm8TNHpf0XUq1hS6shr2jtC2png=; h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=Idtey84AMgbZ1B0L9sZFUeXVnhL4qrmUCjIVia5g6jKmTDtZ3QS5Qg8VbHpWAVzIORmWqOx1ia3u9WqCIwuMhHNDI9fXcHjJ0V6E5GX0dCXXarubdolBxCWAvodQr9ZXrj8JO+zsSDMFCDSW83ZmpPoFptt5NQWdNE1wNVUEKN8= DomainKey-Signature:a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.co.in; h=X-YMail-OSG:Received:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=yn52KmZA19Tre13iYbmt0H4NRfudP7x7xrlGehPDMUU7OXCOWfKtfyaNZ5e7x0lI1A4mjdEmeEwaNkEFV4MYDcBl8LmuV3HQsX4NZl15VuPYS7GbDGxDdeoR9CZinzJhWHlzPuhPi+g4MGDWbTev7FtayVKJrLQSCrQCmw5WAFI=; X-YMail-OSG: O1MD734VM1nUIwivrAyqVmVyGupyuKq4og1.Ui6RaHpyrCi oqTdLss0wdr4WW5pfNFAxC4oK36jwewpOMhWLixlxazByEySyIIPKzBdakrK z0IlUE4QclLH6g3LjSc_JYtH9M4tHgSCN4WTuBS0s34F7xcTAFPdXhm2v.xu dv.7wJ1N_3QHrLvZh9XeEQxm721CMic72Yk.PtcEg_aSljOiZVd_MdLbQkyq 80l2H98OCHLqfvgbs2qO40x5_RwJ.pzUtRmx_gs.GxsfCuIvQMiA7XkXbKAs ZFQSJR0EDVRFCc5QSjCKPkK25hgjzkAzQ8MqPpc7o44O1az8bQzWPQbqVHaS 89m8QcqJ.R43KIRDRrdausENT199M0HvqygTrcPhkzUhSW73RXiOQyxJP_BP lRx2245t8bkU4Rm34LqkkyKTtPnhK7VWHCi8V..yq0qUKMoN1_KN6Y1XhVID dEucmLKc3PRe2z_BEAEr_hsh7HB.xmkXNM6JCFE3hXh1k9NCtnNm.7EhdS8S 9mw-- Received: from [199.172.169.86] by web192504.mail.sg3.yahoo.com via HTTP; Fri, 01 Jun 2012 22:48:06 SGT X-Mailer: YahooMailWebService/0.8.118.349524 References: 4fc4e89f.3090...@free.fr 4fc7c20c.4040...@free.fr CAGpTDNcDBGF=v+fx7vssvpdx91qg_4_vq0a8fxzee+70gue...@mail.gmail.com 4fc7ddf1.8080...@free.fr CAGpTDNe94LkkvLcHP1nCucXzb6Dapb-X+tfQEBw+55Tnx=5e=a...@mail.gmail.com 4fc8c2c6.7060...@free.fr 4fc8d456.6060...@free.fr Message-ID: 1338562086.60488.yahoomail...@web192504.mail.sg3.yahoo.com Date: Fri, 1 Jun 2012 22:48:06 +0800 (SGT) From: Dhaval Shah prince_mithi...@yahoo.co.in Reply-To: Dhaval Shah prince_mithi...@yahoo.co.in Subject: Re: hosts unreachables To: user@hbase.apache.org user@hbase.apache.org In-Reply-To: 4fc8d456.6060...@free.fr MIME-Version: 1.0 Content-Type: multipart/alternative; boundary=1766475574-631287525-1338562086=:60488 --1766475574-631287525-1338562086=:60488 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Can you try removing=A0-XX:+CMSIncrementalMode from 
your GC settings.. That had caused a lot of pain a few months back
Regards, Dhaval
From: Cyril Scetbon cyril.scetbon@free.fr To: user@hbase.apache.org Sent: Friday, 1 June 2012 10:40 AM Subject: Re: hosts unreachables
I've another regionserver (hb-d2) that crashed (I can easily reproduce the issue by continuing injections), and as I see in master log, it gets information about hb-d2 every 5 minutes. I suppose it's what helps him to note if a node is dead or not. However it adds hb-d2 to the dead node list at 13:32:20, so before 5 minutes since the last time it got the server information. Is it normal?
2012-06-01 13:02:36,309 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb-d10,60020,1338553126695=47, hb-d6,60020,1338553124588=47, hb-d8,60020,1338553124113=47, hb-d2,60020,1338553126560=47, hb-d11,60020,1338553124329=47, hb-d12,60020,1338553126567=47, hb-d1,60020,1338553126474=47, hb-d9,60020,1338553124179=47
..
2012-06-01 13:07:36,319 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Server information: hb-d5,60020,1338553124247=47, hb-d4,60020,1338553126577=47, hb-d7,60020,1338553124279=46, hb
Re: Cannot post or reply to hbase mailing list anymore - please unblock
I am seeing a very similar behavior. The funny part is that if I reply from my android it goes through (like right now) but if I send it from my browser it's classified as spam (for the exact same email account) -- On Sat 26 May, 2012 2:00 PM IST Christian Schäfer wrote: Hi again, I cannot post to the hbase mailing list anymore. Like today, it seems to happen when I answer mails, as I just tried to do. My mail provider (yahoo) wrote me that the mail is not being classified as spam on their side but on the hbase mailing server. Could someone unblock me or something? regards Chris
Re: question on filters
Yes, instead of a single Get you can supply a list of Gets to the same htable.get call. It will sort and partition the list on a per-region basis, make requests in parallel, aggregate the responses and return an array of Result. Make sure you apply your filter to each Get -- On Fri 25 May, 2012 11:18 PM IST jack chrispoo wrote: Thanks Dhaval, and is there a way to get multiple rows (their keys not contiguous) from the HBase server with only one request? It seems to me it's expensive to send one get request for each row. jack On Thu, May 24, 2012 at 5:40 PM, Dhaval Shah prince_mithi...@yahoo.co.in wrote: Jack, you can use filters on Gets too.. -- On Fri 25 May, 2012 5:36 AM IST jack chrispoo wrote: Hi, I'm new to HBase and I have a question about using filters. I know that I can use filters with scan, say scan start-key=key1 end-key=key2 and with a SingleColumnValueFilter: columnA=valueA. But in my java program I need to do filtering on a set of rows which are not contiguous; my client needs to get all rows with rowid in a Set<String> and with columnA=valueA. I don't know how this can be done efficiently. I can imagine that I can do a scan of the entire table and set filters rowid=... and columnA=valueA; or I can use the get function to fetch the rows with rowid in my set all to my client and do the filtering on my client side. But I think neither way is efficient. Can anyone give me a hint on this? Thanks Yixiao
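A minimal sketch of the multi-get approach described above, written against the HTable client API of that era; the table name, column family, qualifier and row keys are placeholders:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiGetWithFilter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // placeholder table name

        // One Get per non-contiguous row key, each carrying the column-value filter.
        List<Get> gets = new ArrayList<Get>();
        for (String rowid : new String[] {"row1", "row7", "row42"}) { // placeholder keys
            Get get = new Get(Bytes.toBytes(rowid));
            get.setFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("cf"), Bytes.toBytes("columnA"),    // placeholder family/qualifier
                    CompareOp.EQUAL, Bytes.toBytes("valueA")));
            gets.add(get);
        }

        // Single client call; HBase groups the Gets per region server internally.
        Result[] results = table.get(gets);
        for (Result r : results) {
            if (r != null && !r.isEmpty()) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        }
        table.close();
    }
}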
Re: question on filters
Jack, you can use filters on Gets too.. -- On Fri 25 May, 2012 5:36 AM IST jack chrispoo wrote: Hi, I'm new to HBase and I have a question about using filters. I know that I can use filters with scan, say scan start-key=key1 end-key=key2 and with a SingleColumnValueFilter: columnA=valueA. But in my java program I need to do filtering on a set of rows which are not contiguous; my client needs to get all rows with rowid in a Set<String> and with columnA=valueA. I don't know how this can be done efficiently. I can imagine that I can do a scan of the entire table and set filters rowid=... and columnA=valueA; or I can use the get function to fetch the rows with rowid in my set all to my client and do the filtering on my client side. But I think neither way is efficient. Can anyone give me a hint on this? Thanks Yixiao
Re: RegionServer silently stops (only issue: CMS-concurrent-mark ~80sec)
Not sure if its related (or even helpful) but we were using cdh3b4 (which is 0.90.1) and we saw similar issues with region servers going down.. we didn't look at GC logs but we had very high zookeeper leases so its unlikely that the GC could have caused the issue.. this problem went away when we upgraded to cdh3u3 which is rock steady in terms of region servers.. (havent had a single region server crash in a month where on the older version I used to have 1 crash every couple of days).. the only other difference between the two is that we use snappy on the newer one and gz on the old We also noticed that having replication enabled also contributed to the issues.. -- On Tue 1 May, 2012 3:15 PM IST N Keywal wrote: Hi Alex, On the same idea, note that hbase is launched with -XX:OnOutOfMemoryError=kill -9 %p. N. On Tue, May 1, 2012 at 10:41 AM, Igal Shilman ig...@wix.com wrote: Hi Alex, just to rule out, oom killer, Try this: http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer On Mon, Apr 30, 2012 at 10:48 PM, Alex Baranau alex.barano...@gmail.com wrote: Hello, During recent weeks I constantly see some RSs *silently* dying on our HBase cluster. By silently I mean that process stops, but no errors in logs [1]. The only thing I can relate to it is long CMS-concurrent-mark: almost 80 seconds. But this should not cause issues as it is not a stop-the-world process. Any advice? HBase: hbase-0.90.4-cdh3u3 Hadoop: 0.20.2-cdh3u3 Thank you, Alex Baranau [1] last lines from RS log (no errors before too, and nothing written in *.out file): 2012-04-30 18:52:11,806 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for agg-sa-1.3,0011| te|dtc|\x00\x00\x00\x00\x00\x00\x1E\x002\x00\x00\x00\x015\x9C_n\x00\x00\x00\x00\x00\x00\x00\x00\x00,1334852280902.4285f9339b520ee617c087c0fd0dbf65. because regionserver60020.cacheFlusher; priority=-1, compaction queue size=0 2012-04-30 18:54:58,779 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: using new createWriter -- HADOOP-6840 2012-04-30 18:54:58,779 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Path=hdfs://xxx.ec2.internal/hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335812098651, syncFs=true, hflush=false 2012-04-30 18:54:58,874 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335811856672, entries=73789, filesize=63773934. New hlog /hbase/.logs/xxx.ec2.internal,60020,1335706613397/xxx.ec2.internal%3A60020.1335812098651 2012-04-30 18:56:31,867 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up with memory above low water. 2012-04-30 18:56:31,867 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region agg-sa-1.3,s_00I4| tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805. 
due to global heap pressure 2012-04-30 18:56:31,867 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for agg-sa-1.3,s_00I4| tdqc\x00docs|mrtdocs|\x00\x00\x00\x00\x00\x03\x11\xF4\x00none\x00|1334692562\x00\x0D\xE0\xB6\xB3\xA7c\xFF\xBC|26837373\x00\x00\x00\x016\xC1\xE0D\xBE\x00\x00\x00\x00\x00\x00\x00\x00,1335761291026.30b127193485342359eadf1586819805., current region memstore size 138.1m 2012-04-30 18:56:31,867 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores 2012-04-30 18:56:56,303 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=322.84 MB, free=476.34 MB, max=799.17 MB, blocks=5024, accesses=12189396, hits=127592, hitRatio=1.04%%, cachingAccesses=132480, cachingHits=126949, cachingHitsRatio=95.82%%, evictions=0, evicted=0, evictedPerRun=NaN 2012-04-30 18:56:59,026 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/.tmp/391890051647401997 to hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/a/1139737908876846168 2012-04-30 18:56:59,034 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://zzz.ec2.internal/hbase/agg-sa-1.3/30b127193485342359eadf1586819805/a/1139737908876846168, entries=476418, sequenceid=880198761, memsize=138.1m, filesize=5.7m 2012-04-30 18:56:59,097 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~138.1m for region agg-sa-1.3,s_00I4|
HBase Thrift for CDH3U3 leaking file descriptors/socket connections to Zookeeper
We have an app written in Ruby which uses HBase as the backing store.. It uses Thrift to connect to it.. We were using HBase from Cloudera's CDH3B4 distro until now and it worked fine.. I just upgraded our Hadoop install to CDH3U3 (which is the latest stable CDH release at this point) and in a matter of hours all Thrift servers went down.. Upon further investigation I realized that it was hitting the limit on the number of allowed file descriptors (which is pretty high at 32k).. This problem occurs if I use thrift in any configuration (hsha, framed transport, threadpool) except the nonblocking mode.. Digging further I realized a couple of things:
1. Even with light load (1-2 processes hitting the thrift server in quick succession), thrift is spinning up new threads and each of the threads is maintaining a socket connection to zookeeper.. In a matter of minutes (with this load test), thrift has 32k open connections, with 8k threads holding connections to zookeeper which do not seem to die even after a day..
2. The logs show approx 3-4 open connections (presumably for each thread):
java 53588 hbase 4135r FIFO 0,6 177426 pipe
java 53588 hbase 4136w FIFO 0,6 177426 pipe
java 53588 hbase 4137r 0,11 0 177427 eventpoll
java 53588 hbase 4138u IPv4 177428 TCP njhaddev05:49729->njhaddev01:2181 (ESTABLISHED)
CDH3B4 with the exact same configurations and the exact same setup works fine but CDH3U3 does not.. Using Thrift in nonblocking mode isn't really an option because of the low throughput and single-threaded nature.. Can someone help please?