Re: Simple user/pass authentication
Thanks for the info. I did look at the "simple" authentication; however, I couldn't see how it works with the clients. Do clients pass a user/pass? How do you define the user/pass? Is the password stored in clear text? On Fri, May 6, 2016 at 10:57 AM, Ted Yu <yuzhih...@gmail.com> wrote: > Please take a look at: > > > http://hbase.apache.org/book.html#_server_side_configuration_for_simple_user_access_operation > > On Fri, May 6, 2016 at 10:51 AM, Mohit Anchlia <mohitanch...@gmail.com> > wrote: > > > Is there a way to implement simple user/pass authentication in HBase > > instead of using Kerberos? Are coprocessors the right way of > > implementing such authentication? > > >
Simple user/pass authentication
Is there a way to implement simple user/pass authentication in HBase instead of using Kerberos? Are coprocessors the right way of implementing such authentication?
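For context, "simple" authentication in HBase does not involve a password at all: the server trusts the user name supplied by the client's Hadoop/OS login, so nothing is passed or stored in clear text. A minimal sketch of the client-side configuration, assuming only the standard hbase.security.* properties (everything else here is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class SimpleAuthConfig {
      public static Configuration create() {
        Configuration conf = HBaseConfiguration.create();
        // "simple" trusts the client-supplied Hadoop user name as-is;
        // no credentials are exchanged, so there is no password to define or store.
        conf.set("hbase.security.authentication", "simple");
        // Authorization (ACLs via the AccessController coprocessor) is a
        // separate switch from authentication.
        conf.setBoolean("hbase.security.authorization", true);
        return conf;
      }
    }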
Re: HTable and streaming
A better approach would be to break the data into chunks and create behaviour similar to indirect blocks. On Mon, Jun 3, 2013 at 9:12 PM, Asaf Mesika asaf.mes...@gmail.com wrote: I guess one can hack opening a socket from a Coprocessor Endpoint and push its scanned data, thus achieving a stream. On Sun, Jun 2, 2013 at 12:42 AM, Stack st...@duboce.net wrote: Yeah, no streaming API in our current client (nor does our thrift client give you a streaming API). St.Ack On Sat, Jun 1, 2013 at 8:21 AM, Simon Majou si...@majou.org wrote: No, I don't want to scan a table, I want a stream of one result; in the case of big records, for example. With thrift it is up to the client to deal with the response, so in theory you can build a client which returns streams. But when I look at the current implementation of thrift for node, for instance (https://github.com/apache/thrift/tree/trunk/lib/nodejs), it sends back only fully formed results (as the Java client does). Simon On Sat, Jun 1, 2013 at 5:14 PM, Ted Yu yuzhih...@gmail.com wrote: I assume you want to scan the table: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getScanner(byte[]) Can you point out which thrift client methods provide streaming capability? Thanks On Sat, Jun 1, 2013 at 8:09 AM, Simon Majou si...@majou.org wrote: Hi, I don't see any methods returning streams in the Java client HTable: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html Does that mean we need to use the thrift client to get streams? Simon
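A rough sketch of the chunking idea above, assuming a 0.94-era Java client; the table name "blobs", column family "d", and 1 MB chunk size are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ChunkedBlobWriter {
      private static final int CHUNK_SIZE = 1024 * 1024; // 1 MB per cell

      public static void write(Configuration conf, byte[] rowKey, byte[] data) throws IOException {
        HTable table = new HTable(conf, "blobs"); // hypothetical table
        Put put = new Put(rowKey);
        // One qualifier per chunk; the sequence number acts like an
        // indirect-block pointer and preserves read order.
        for (int off = 0, seq = 0; off < data.length; off += CHUNK_SIZE, seq++) {
          int len = Math.min(CHUNK_SIZE, data.length - off);
          byte[] chunk = new byte[len];
          System.arraycopy(data, off, chunk, 0, len);
          put.add(Bytes.toBytes("d"), Bytes.toBytes(seq), chunk);
        }
        table.put(put);
        table.close();
      }
    }

A reader can then stream the value back by fetching the chunk qualifiers in sequence order instead of materializing one huge cell.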
Re: Understanding scan behaviour
Thanks, that's a good point about the last byte being max :) When I query 1234555..1234556, do I also get the row for 1234556 if one exists? On Sat, Mar 30, 2013 at 6:55 AM, Asaf Mesika asaf.mes...@gmail.com wrote: Yes. Watch out for the last byte being max. On Fri, Mar 29, 2013 at 7:31 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Thanks everyone, it's really helpful. I'll change my prefix filter to an end row. Is it necessary to increment the last byte? So if I have a hash of 1234555, my end key should be 1234556? On Thu, Mar 28, 2013 at 11:20 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Mohith, It is always better to go with a start row and end row if you know what they are. Just add one byte more to the actual end row (inclusive row) and form the end key. This will narrow down the search. Remember, byte comparison is the way that HBase scans. Regards Ram On Fri, Mar 29, 2013 at 11:18 AM, Li, Min m...@microstrategy.com wrote: Hi, Mohit, Try using ENDROW. STARTROW+ENDROW is much faster than PrefixFilter. '+' ascii code is 43, ',' ascii code is 44. scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '+++', ENDROW => '+++,'} Min -----Original Message----- From: Mohit Anchlia [mailto:mohitanch...@gmail.com] Sent: Friday, March 29, 2013 1:18 AM To: user@hbase.apache.org Subject: Re: Understanding scan behaviour Could the prefix filter lead to a full table scan? In other words, is PrefixFilter applied after fetching the rows? Another question I have: say I have row keys abc and abd and I search for row abc; is it always guaranteed to be the first key returned from the scanned results? If so, I can always put a condition in the client app. On Thu, Mar 28, 2013 at 9:15 AM, Ted Yu yuzhih...@gmail.com wrote: Take a look at the following in hbase-server/src/main/ruby/shell/commands/scan.rb (trunk): hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"} Cheers On Thu, Mar 28, 2013 at 9:02 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I see, then I misunderstood the behaviour. My keys are id + timestamp so that I can do a range-type search. So what I really want is to return a row where the id matches the prefix. Is there a way to do this without having to scan large amounts of data? On Thu, Mar 28, 2013 at 8:26 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Mohit, '+' ascii code is 43, '9' ascii code is 57. So '+9' comes after '++'. If you don't have any row with the exact key '+++', HBase will look for the first one after it. And in your case, it's +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF. JM 2013/3/28 Mohit Anchlia mohitanch...@gmail.com: My understanding is that the row key would start with '+', for instance. On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Mohit, I see nothing wrong with the results below. What would you have expected? JM 2013/3/28 Mohit Anchlia mohitanch...@gmail.com: I am running version 0.92.1 and this is what happens.
hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => 'sdw0'} ROW COLUMN+CELL s\xC1\xEAR\xDF\xEA\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106, value=PAGE\x09\x091363056252990\x09\x09/ 7F\xFF\xFE\xC2\xA3\x84Z\x7F 1 row(s) in 0.0450 seconds hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '--'} ROW COLUMN+CELL -\xA1\xAFr\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\ column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714, value=PAGE\x09239923973\x091363384698919\x09/ xFF\xFE\xC2\x8F\xF0\xC1\xBF 1 row(s) in 0.0500 seconds hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '+++'} ROW COLUMN+CELL +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF column=SID_T_MTX:\x00\x002, timestamp=1364404155426, value=PAGE\x09\x091364404145275\x09 \x09/ E\xC2S-\x08\x1F 1 row(s) in 0.0640 seconds hbase(main):006:0> On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Same question, same time :) Regards Ram On Thu
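To make the increment-the-last-byte advice concrete, a small sketch; note that Scan's stop row is exclusive (so '1234556' itself would not be returned) and, as warned above, a trailing 0xFF byte would overflow this naive increment:

    import java.util.Arrays;
    import org.apache.hadoop.hbase.client.Scan;

    public class RangeScan {
      public static Scan forRange(byte[] startRow) {
        byte[] stopRow = Arrays.copyOf(startRow, startRow.length);
        stopRow[stopRow.length - 1]++; // naive: breaks if the last byte is already 0xFF
        return new Scan(startRow, stopRow); // [startRow, stopRow) -- stop row is exclusive
      }
    }

Usage (illustrative): table.getScanner(RangeScan.forRange(Bytes.toBytes("1234555"))).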
Re: Understanding scan behaviour
Thanks everyone, it's really helpful. I'll change my prefix filter to an end row. Is it necessary to increment the last byte? So if I have a hash of 1234555, my end key should be 1234556? On Thu, Mar 28, 2013 at 11:20 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Mohith, It is always better to go with a start row and end row if you know what they are. Just add one byte more to the actual end row (inclusive row) and form the end key. This will narrow down the search. Remember, byte comparison is the way that HBase scans. Regards Ram On Fri, Mar 29, 2013 at 11:18 AM, Li, Min m...@microstrategy.com wrote: Hi, Mohit, Try using ENDROW. STARTROW+ENDROW is much faster than PrefixFilter. '+' ascii code is 43, ',' ascii code is 44. scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '+++', ENDROW => '+++,'} Min -----Original Message----- From: Mohit Anchlia [mailto:mohitanch...@gmail.com] Sent: Friday, March 29, 2013 1:18 AM To: user@hbase.apache.org Subject: Re: Understanding scan behaviour Could the prefix filter lead to a full table scan? In other words, is PrefixFilter applied after fetching the rows? Another question I have: say I have row keys abc and abd and I search for row abc; is it always guaranteed to be the first key returned from the scanned results? If so, I can always put a condition in the client app. On Thu, Mar 28, 2013 at 9:15 AM, Ted Yu yuzhih...@gmail.com wrote: Take a look at the following in hbase-server/src/main/ruby/shell/commands/scan.rb (trunk): hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"} Cheers On Thu, Mar 28, 2013 at 9:02 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I see, then I misunderstood the behaviour. My keys are id + timestamp so that I can do a range-type search. So what I really want is to return a row where the id matches the prefix. Is there a way to do this without having to scan large amounts of data? On Thu, Mar 28, 2013 at 8:26 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Mohit, '+' ascii code is 43, '9' ascii code is 57. So '+9' comes after '++'. If you don't have any row with the exact key '+++', HBase will look for the first one after it. And in your case, it's +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF. JM 2013/3/28 Mohit Anchlia mohitanch...@gmail.com: My understanding is that the row key would start with '+', for instance. On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Mohit, I see nothing wrong with the results below. What would you have expected? JM 2013/3/28 Mohit Anchlia mohitanch...@gmail.com: I am running version 0.92.1 and this is what happens.
hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => 'sdw0'} ROW COLUMN+CELL s\xC1\xEAR\xDF\xEA\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106, value=PAGE\x09\x091363056252990\x09\x09/ 7F\xFF\xFE\xC2\xA3\x84Z\x7F 1 row(s) in 0.0450 seconds hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '--'} ROW COLUMN+CELL -\xA1\xAFr\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\ column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714, value=PAGE\x09239923973\x091363384698919\x09/ xFF\xFE\xC2\x8F\xF0\xC1\xBF 1 row(s) in 0.0500 seconds hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '+++'} ROW COLUMN+CELL +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF column=SID_T_MTX:\x00\x002, timestamp=1364404155426, value=PAGE\x09\x091364404145275\x09 \x09/ E\xC2S-\x08\x1F 1 row(s) in 0.0640 seconds hbase(main):006:0> On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Same question, same time :) Regards Ram On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Could you give us some more insights on this? So you mean that when you set the row key as 'azzzaaa', though this row does not exist, the scanner returns some other row? Or is it giving you a row that does not exist? Or do you mean it is doing a full table scan? Which version of HBase and what type of filters are you using? Regards Ram On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia mohitanch...@gmail.com wrote
Re: Understanding scan behaviour
I am running version 0.92.1 and this is what happens. hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => 'sdw0'} ROW COLUMN+CELL s\xC1\xEAR\xDF\xEA\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106, value=PAGE\x09\x091363056252990\x09\x09/ 7F\xFF\xFE\xC2\xA3\x84Z\x7F 1 row(s) in 0.0450 seconds hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '--'} ROW COLUMN+CELL -\xA1\xAFr\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\ column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714, value=PAGE\x09239923973\x091363384698919\x09/ xFF\xFE\xC2\x8F\xF0\xC1\xBF 1 row(s) in 0.0500 seconds hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '+++'} ROW COLUMN+CELL +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF column=SID_T_MTX:\x00\x002, timestamp=1364404155426, value=PAGE\x09\x091364404145275\x09 \x09/ E\xC2S-\x08\x1F 1 row(s) in 0.0640 seconds hbase(main):006:0> On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Same question, same time :) Regards Ram On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Could you give us some more insights on this? So you mean that when you set the row key as 'azzzaaa', though this row does not exist, the scanner returns some other row? Or is it giving you a row that does not exist? Or do you mean it is doing a full table scan? Which version of HBase and what type of filters are you using? Regards Ram On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I have keys in the form of hashed id + timestamp, but when I run a scan I get rows for almost every value. For instance, if I run a scan for 'azzzaaa', which doesn't even exist, even then I get results. Could someone help me understand what might be going on here?
Re: Understanding scan behaviour
My understanding is that the row key would start with '+', for instance. On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Mohit, I see nothing wrong with the results below. What would you have expected? JM 2013/3/28 Mohit Anchlia mohitanch...@gmail.com: I am running version 0.92.1 and this is what happens. hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => 'sdw0'} ROW COLUMN+CELL s\xC1\xEAR\xDF\xEA\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106, value=PAGE\x09\x091363056252990\x09\x09/ 7F\xFF\xFE\xC2\xA3\x84Z\x7F 1 row(s) in 0.0450 seconds hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '--'} ROW COLUMN+CELL -\xA1\xAFr\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\ column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714, value=PAGE\x09239923973\x091363384698919\x09/ xFF\xFE\xC2\x8F\xF0\xC1\xBF 1 row(s) in 0.0500 seconds hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '+++'} ROW COLUMN+CELL +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF column=SID_T_MTX:\x00\x002, timestamp=1364404155426, value=PAGE\x09\x091364404145275\x09 \x09/ E\xC2S-\x08\x1F 1 row(s) in 0.0640 seconds hbase(main):006:0> On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Same question, same time :) Regards Ram On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Could you give us some more insights on this? So you mean that when you set the row key as 'azzzaaa', though this row does not exist, the scanner returns some other row? Or is it giving you a row that does not exist? Or do you mean it is doing a full table scan? Which version of HBase and what type of filters are you using? Regards Ram On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I have keys in the form of hashed id + timestamp, but when I run a scan I get rows for almost every value. For instance, if I run a scan for 'azzzaaa', which doesn't even exist, even then I get results. Could someone help me understand what might be going on here?
Re: Understanding scan behaviour
I see, then I misunderstood the behaviour. My keys are id + timestamp so that I can do a range-type search. So what I really want is to return a row where the id matches the prefix. Is there a way to do this without having to scan large amounts of data? On Thu, Mar 28, 2013 at 8:26 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Mohit, '+' ascii code is 43, '9' ascii code is 57. So '+9' comes after '++'. If you don't have any row with the exact key '+++', HBase will look for the first one after it. And in your case, it's +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF. JM 2013/3/28 Mohit Anchlia mohitanch...@gmail.com: My understanding is that the row key would start with '+', for instance. On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Mohit, I see nothing wrong with the results below. What would you have expected? JM 2013/3/28 Mohit Anchlia mohitanch...@gmail.com: I am running version 0.92.1 and this is what happens. hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => 'sdw0'} ROW COLUMN+CELL s\xC1\xEAR\xDF\xEA\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106, value=PAGE\x09\x091363056252990\x09\x09/ 7F\xFF\xFE\xC2\xA3\x84Z\x7F 1 row(s) in 0.0450 seconds hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '--'} ROW COLUMN+CELL -\xA1\xAFr\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\ column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714, value=PAGE\x09239923973\x091363384698919\x09/ xFF\xFE\xC2\x8F\xF0\xC1\xBF 1 row(s) in 0.0500 seconds hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '+++'} ROW COLUMN+CELL +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF column=SID_T_MTX:\x00\x002, timestamp=1364404155426, value=PAGE\x09\x091364404145275\x09 \x09/ E\xC2S-\x08\x1F 1 row(s) in 0.0640 seconds hbase(main):006:0> On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Same question, same time :) Regards Ram On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Could you give us some more insights on this? So you mean that when you set the row key as 'azzzaaa', though this row does not exist, the scanner returns some other row? Or is it giving you a row that does not exist? Or do you mean it is doing a full table scan? Which version of HBase and what type of filters are you using? Regards Ram On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I have keys in the form of hashed id + timestamp, but when I run a scan I get rows for almost every value. For instance, if I run a scan for 'azzzaaa', which doesn't even exist, even then I get results. Could someone help me understand what might be going on here?
Re: Understanding scan behaviour
Could the prefix filter lead to a full table scan? In other words, is PrefixFilter applied after fetching the rows? Another question I have: say I have row keys abc and abd and I search for row abc; is it always guaranteed to be the first key returned from the scanned results? If so, I can always put a condition in the client app. On Thu, Mar 28, 2013 at 9:15 AM, Ted Yu yuzhih...@gmail.com wrote: Take a look at the following in hbase-server/src/main/ruby/shell/commands/scan.rb (trunk): hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"} Cheers On Thu, Mar 28, 2013 at 9:02 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I see, then I misunderstood the behaviour. My keys are id + timestamp so that I can do a range-type search. So what I really want is to return a row where the id matches the prefix. Is there a way to do this without having to scan large amounts of data? On Thu, Mar 28, 2013 at 8:26 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Mohit, '+' ascii code is 43, '9' ascii code is 57. So '+9' comes after '++'. If you don't have any row with the exact key '+++', HBase will look for the first one after it. And in your case, it's +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF. JM 2013/3/28 Mohit Anchlia mohitanch...@gmail.com: My understanding is that the row key would start with '+', for instance. On Thu, Mar 28, 2013 at 7:53 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Mohit, I see nothing wrong with the results below. What would you have expected? JM 2013/3/28 Mohit Anchlia mohitanch...@gmail.com: I am running version 0.92.1 and this is what happens. hbase(main):003:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => 'sdw0'} ROW COLUMN+CELL s\xC1\xEAR\xDF\xEA\x89\x91\xFF\x1A^\xB6d\xF0\xEC\x column=SID_T_MTX:\x00\x00Rc, timestamp=1363056261106, value=PAGE\x09\x091363056252990\x09\x09/ 7F\xFF\xFE\xC2\xA3\x84Z\x7F 1 row(s) in 0.0450 seconds hbase(main):004:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '--'} ROW COLUMN+CELL -\xA1\xAFr\xBD\xE2L\x00\xCD*\xD7\xE8\xD6\x1Dk\x7F\ column=SID_T_MTX:\x00\x00hF, timestamp=1363384706714, value=PAGE\x09239923973\x091363384698919\x09/ xFF\xFE\xC2\x8F\xF0\xC1\xBF 1 row(s) in 0.0500 seconds hbase(main):005:0> scan 'SESSIONID_TIMELINE', {LIMIT => 1, STARTROW => '+++'} ROW COLUMN+CELL +9hC\xFC\x82s\xABL3\xB3B\xC0\xF9\x87\x03\x7F\xFF\xF column=SID_T_MTX:\x00\x002, timestamp=1364404155426, value=PAGE\x09\x091364404145275\x09 \x09/ E\xC2S-\x08\x1F 1 row(s) in 0.0640 seconds hbase(main):006:0> On Wed, Mar 27, 2013 at 9:23 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Same question, same time :) Regards Ram On Thu, Mar 28, 2013 at 9:53 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Could you give us some more insights on this? So you mean that when you set the row key as 'azzzaaa', though this row does not exist, the scanner returns some other row? Or is it giving you a row that does not exist? Or do you mean it is doing a full table scan? Which version of HBase and what type of filters are you using? Regards Ram On Thu, Mar 28, 2013 at 9:45 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I have keys in the form of hashed id + timestamp, but when I run a scan I get rows for almost every value. For instance, if I run a scan for 'azzzaaa', which doesn't even exist, even then I get results. Could someone help me understand what might be going on here?
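A sketch of combining the two approaches discussed above: a PrefixFilter on its own is evaluated row by row, so without a start row the scan begins at the start of the table; seeding the scan with the prefix as start row lets it seek straight to the first candidate, and the filter then ends the scan once the prefix has been passed:

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PrefixScan {
      public static Scan forPrefix(String prefixStr) {
        byte[] prefix = Bytes.toBytes(prefixStr); // e.g. the hashed id; illustrative
        Scan scan = new Scan(prefix);             // seek to the first row >= prefix
        scan.setFilter(new PrefixFilter(prefix)); // stop once rows no longer match the prefix
        return scan;
      }
    }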
Incorrect Root region server
I am seeing a weird issue where ZK is pointing to primarymaster (hostname) as the ROOT region server. This host doesn't exist. Everything was working OK until I ran truncate on a few tables. Does anyone know what might be the issue?
Re: persistence in Hbase
This is a broad topic by itself. In short, people often use battery-backed cache, or leave the write cache disabled, for exactly this concern. There are various factors involved when deciding whether to leave caches enabled or not. Caches are often good for OLTP-type applications or even light OLAP workloads. But for huge workloads you might saturate the caches and are better off leaving them disabled, assuming these are not supercomputers with TBs of cache :). There are other factors in play, like the number of replica copies you have and what guarantees or SLAs you have put in place around data availability and loss in the event of a site disaster. Most people are OK with some data loss on site failure and leave caches enabled with multiple replica copies enabled. On Mon, Jan 14, 2013 at 7:19 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Just for my own edification - isn't data loss always going to be possible due to caches present in HDDs and the inability(?) to force them to flush? I believe I've read even fsync lies... Thanks, Otis -- HBASE Performance Monitoring - http://sematext.com/spm/index.html On Thu, Jan 10, 2013 at 10:54 PM, lars hofhansl la...@apache.org wrote: Not entirely true, though. Data is not sync'ed to disk, but only distributed to all HDFS replicas. During a power outage event across all HDFS failure zones (such as a data center) you can lose data. -- Lars - Original Message - From: anil gupta anilgupt...@gmail.com To: user@hbase.apache.org Cc: Sent: Thursday, January 10, 2013 2:38 PM Subject: Re: persistence in Hbase Hi Mohammad, If the Write Ahead Log (WAL) is turned on, then in **NO** case should data be lost. HBase is strongly consistent. If you know of any case where the WAL is turned on and data is lost, then IMO that's a critical bug in HBase. Thanks, Anil Gupta On Thu, Jan 10, 2013 at 7:37 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Data also gets written to the WAL. See: http://hbase.apache.org/book/perf.writing.html On Thu, Jan 10, 2013 at 7:36 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Yes, definitely you will get back the data. Please read the HBase Book, which explains things in detail: http://hbase.apache.org/book.html. Regards Ram On Thu, Jan 10, 2013 at 8:48 PM, Panshul Gupta panshul...@gmail.com wrote: Hello, I was wondering: if I have data stored in HBase tables on my 10-node cluster and I switch off (power down) my cluster, when I power up the cluster again and run the HDFS and Hadoop daemons, will HBase have my old data persisted in the form I left it? Or will I have to re-import all the data? Thank you for the help. -- Regards, Panshul. http://about.me/panshulgupta -- Thanks & Regards, Anil Gupta
Re: Storing images in Hbase
Thanks Jack for sharing this information. This definitely makes sense when using that type of caching layer. You mentioned increasing the write cache; I am assuming you had to increase the following parameters in addition to increasing the memstore size: hbase.hregion.max.filesize hbase.hregion.memstore.flush.size On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin magn...@gmail.com wrote: We buffer all accesses to HBASE with a Varnish SSD-based caching layer. So the impact for reads is negligible. We have a 70-node cluster, 8 GB of RAM per node, relatively weak nodes (Intel Core 2 Duo), with 10-12TB of disk per server. Inserting 600,000 images per day. We have relatively little compaction activity, as we made our write cache much larger than our read cache - so we don't experience region file fragmentation as much. -Jack On Fri, Jan 11, 2013 at 9:40 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I think it really depends on the volume of the traffic, the data distribution per region, how and when file compactions occur, and the number of nodes in the cluster. In my experience, when it comes to blob data where you are serving tens of thousands of requests/sec for writes and reads, it's very difficult to manage HBase without very hard operations and maintenance in play. Jack earlier mentioned they have 1 billion images; it would be interesting to know what they see in terms of compactions and number of requests per sec. I'd be surprised if in a high-volume site it can be done without any caching layer on top to alleviate the IO spikes that occur because of GC and compactions. On Fri, Jan 11, 2013 at 7:27 AM, Mohammad Tariq donta...@gmail.com wrote: IMHO, if the image files are not too huge, HBase can efficiently serve the purpose. You can store some additional info along with the file, depending upon your search criteria, to make the search faster. Say you want to fetch images by type: you can store the image in one column and its extension in another column (jpg, tiff etc). BTW, what exactly is the problem which you are facing? You have written "But I still can't do it"? Warm Regards, Tariq https://mtariq.jux.com/ On Fri, Jan 11, 2013 at 8:30 PM, Michael Segel michael_se...@hotmail.com wrote: That's a viable option. HDFS reads are faster than HBase, but it would require first hitting the index in HBase, which points to the file, and then fetching the file. It could be faster... we found storing binary data in a sequence file indexed in HBase to be faster than HBase alone; however, YMMV, and HBase has been improved since we did that project On Jan 10, 2013, at 10:56 PM, shashwat shriparv dwivedishash...@gmail.com wrote: Hi Kavish, I have a better idea for you: copy your image files into a single file on HDFS, and if a new image comes, append it to the existing file, and keep and update the metadata and the offset in HBase. Because if you put bigger images in HBase it will lead to some issues. ∞ Shashwat Shriparv On Fri, Jan 11, 2013 at 9:21 AM, lars hofhansl la...@apache.org wrote: Interesting. That's close to a PB if my math is correct. Is there a write-up about this somewhere? Something that we could link from the HBase homepage? -- Lars - Original Message - From: Jack Levin magn...@gmail.com To: user@hbase.apache.org Cc: Andrew Purtell apurt...@apache.org Sent: Thursday, January 10, 2013 9:24 AM Subject: Re: Storing images in Hbase We stored about 1 billion images in HBase, with file sizes up to 10MB. It's been running for close to 2 years without issues and serves delivery of images for Yfrog and ImageShack.
If you have any questions about the setup, I would be glad to answer them. -Jack On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I have done extensive testing and have found that blobs don't belong in databases but are rather best left out on the file system. Andrew outlined the issues that you'll face, not to mention the IO issues when compaction occurs over large files. On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell apurt...@apache.org wrote: I meant this to say "a few really large values" On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell apurt...@apache.org wrote: Consider if the split threshold is 2 GB but your one row contains 10 GB as a really large value. -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
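If those two settings are tuned per table rather than cluster-wide, the 0.92/0.94-era API exposes both on the table descriptor; the table name and values below are placeholders, not Jack's actual numbers:

    import org.apache.hadoop.hbase.HTableDescriptor;

    public class ImageTableSizing {
      public static HTableDescriptor descriptor() {
        HTableDescriptor desc = new HTableDescriptor("images"); // illustrative table
        desc.setMaxFileSize(10L * 1024 * 1024 * 1024);  // per-table hbase.hregion.max.filesize (10 GB here)
        desc.setMemStoreFlushSize(512L * 1024 * 1024);  // per-table hbase.hregion.memstore.flush.size (512 MB here)
        return desc;
      }
    }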
Re: Storing images in Hbase
I think it really depends on the volume of the traffic, the data distribution per region, how and when file compactions occur, and the number of nodes in the cluster. In my experience, when it comes to blob data where you are serving tens of thousands of requests/sec for writes and reads, it's very difficult to manage HBase without very hard operations and maintenance in play. Jack earlier mentioned they have 1 billion images; it would be interesting to know what they see in terms of compactions and number of requests per sec. I'd be surprised if in a high-volume site it can be done without any caching layer on top to alleviate the IO spikes that occur because of GC and compactions. On Fri, Jan 11, 2013 at 7:27 AM, Mohammad Tariq donta...@gmail.com wrote: IMHO, if the image files are not too huge, HBase can efficiently serve the purpose. You can store some additional info along with the file, depending upon your search criteria, to make the search faster. Say you want to fetch images by type: you can store the image in one column and its extension in another column (jpg, tiff etc). BTW, what exactly is the problem which you are facing? You have written "But I still can't do it"? Warm Regards, Tariq https://mtariq.jux.com/ On Fri, Jan 11, 2013 at 8:30 PM, Michael Segel michael_se...@hotmail.com wrote: That's a viable option. HDFS reads are faster than HBase, but it would require first hitting the index in HBase, which points to the file, and then fetching the file. It could be faster... we found storing binary data in a sequence file indexed in HBase to be faster than HBase alone; however, YMMV, and HBase has been improved since we did that project On Jan 10, 2013, at 10:56 PM, shashwat shriparv dwivedishash...@gmail.com wrote: Hi Kavish, I have a better idea for you: copy your image files into a single file on HDFS, and if a new image comes, append it to the existing file, and keep and update the metadata and the offset in HBase. Because if you put bigger images in HBase it will lead to some issues. ∞ Shashwat Shriparv On Fri, Jan 11, 2013 at 9:21 AM, lars hofhansl la...@apache.org wrote: Interesting. That's close to a PB if my math is correct. Is there a write-up about this somewhere? Something that we could link from the HBase homepage? -- Lars - Original Message - From: Jack Levin magn...@gmail.com To: user@hbase.apache.org Cc: Andrew Purtell apurt...@apache.org Sent: Thursday, January 10, 2013 9:24 AM Subject: Re: Storing images in Hbase We stored about 1 billion images in HBase, with file sizes up to 10MB. It's been running for close to 2 years without issues and serves delivery of images for Yfrog and ImageShack. If you have any questions about the setup, I would be glad to answer them. -Jack On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I have done extensive testing and have found that blobs don't belong in databases but are rather best left out on the file system. Andrew outlined the issues that you'll face, not to mention the IO issues when compaction occurs over large files. On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell apurt...@apache.org wrote: I meant this to say "a few really large values" On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell apurt...@apache.org wrote: Consider if the split threshold is 2 GB but your one row contains 10 GB as a really large value. -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
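A hedged sketch of the "blob on HDFS, pointer in HBase" pattern described in the quoted messages; the path layout, table, and column names are assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ImageStore {
      public static void store(Configuration conf, String imageId, byte[] imageBytes) throws Exception {
        // 1. The blob itself goes to HDFS, where large sequential reads are cheap.
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/images/" + imageId + ".jpg"); // hypothetical layout
        FSDataOutputStream out = fs.create(path);
        out.write(imageBytes);
        out.close();
        // 2. Only the pointer and metadata go to HBase, keeping rows and compactions small.
        HTable index = new HTable(conf, "image_index"); // hypothetical table
        Put put = new Put(Bytes.toBytes(imageId));
        put.add(Bytes.toBytes("meta"), Bytes.toBytes("path"), Bytes.toBytes(path.toString()));
        put.add(Bytes.toBytes("meta"), Bytes.toBytes("type"), Bytes.toBytes("jpg"));
        index.put(put);
        index.close();
      }
    }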
Re: persistence in Hbase
Data also gets written to the WAL. See: http://hbase.apache.org/book/perf.writing.html On Thu, Jan 10, 2013 at 7:36 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Yes, definitely you will get back the data. Please read the HBase Book, which explains things in detail: http://hbase.apache.org/book.html. Regards Ram On Thu, Jan 10, 2013 at 8:48 PM, Panshul Gupta panshul...@gmail.com wrote: Hello, I was wondering: if I have data stored in HBase tables on my 10-node cluster and I switch off (power down) my cluster, when I power up the cluster again and run the HDFS and Hadoop daemons, will HBase have my old data persisted in the form I left it? Or will I have to re-import all the data? Thank you for the help. -- Regards, Panshul. http://about.me/panshulgupta
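For what it's worth, in the 0.92/0.94-era client the WAL is on by default and can be toggled per mutation; a minimal sketch (row, family, and qualifier names are illustrative):

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DurablePut {
      public static Put durable(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
        Put put = new Put(row);
        put.add(family, qualifier, value);
        put.setWriteToWAL(true); // the default; false trades durability of recent writes for speed
        return put;
      }
    }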
Re: HBase - Secondary Index
It makes sense to use inverted indexes when you have unique index columns. But if you have columns that are evenly distributed, then a parallel search makes more sense. It just depends on the cardinality of your indexed columns. On Tue, Jan 8, 2013 at 5:28 PM, anil gupta anilgupt...@gmail.com wrote: +1 on Lars' comment. Either the client gets the rowkey from the secondary table and then gets the real data from the primary table, ** OR ** it sends the request to all the RSs (or regions) hosting a region of the primary table. Anoop is using the latter mechanism. Both mechanisms have their pros and cons. IMO, there is no outright winner. ~Anil Gupta On Tue, Jan 8, 2013 at 4:30 PM, lars hofhansl la...@apache.org wrote: Different use cases. For global point queries you want exactly what you said below. For range scans across many rows you want Anoop's design. As usual, it depends. The tradeoff is bringing a lot of unnecessary data to the client vs having to contact each region (or at least each region server). -- Lars From: Michael Segel michael_se...@hotmail.com To: user@hbase.apache.org Sent: Tuesday, January 8, 2013 6:33 AM Subject: Re: HBase - Secondary Index So if you're using an inverted table / index, why on earth are you doing it at the region level? I tried to explain this to others over 6 months ago and it's not really a good idea. You're overcomplicating this and you will end up creating performance bottlenecks when your secondary index is completely orthogonal to your row key. To give you an example... Suppose you're CCCIS and you have a large database of auto insurance claims that you've acquired over the years from your Pathways product. Your primary key would be a combination of the insurance company's ID and their internal claim ID for the individual claim. Your row would be all of the data associated with that claim. So now let's say you want to find the average cost to repair a front-end collision of an S80 Volvo. The make and model of the car would be orthogonal to the initial key. This means that the result set containing insurance records for front-end collisions of S80 Volvos would most likely be evenly distributed across the cluster's regions. If you used a series of inverted tables, you would be able to use a series of get()s to get the result set from each index and then find their intersections. (Note that you could also put them in sort order so that the intersections would be fairly straightforward to find.) Doing this at the region level isn't so simple. So I have to ask again: why go through this and overcomplicate things? Just saying... On Jan 7, 2013, at 7:49 AM, Anoop Sam John anoo...@huawei.com wrote: Hi, It is an inverted index based on column(s) value(s). It will be region-wise indexing. It can work whether or not someone knows the rowkey range. -Anoop- From: Mohit Anchlia [mohitanch...@gmail.com] Sent: Monday, January 07, 2013 9:47 AM To: user@hbase.apache.org Subject: Re: HBase - Secondary Index Hi Anoop, Am I correct in understanding that this indexing mechanism is only applicable when you know the row key? It's not an inverted index truly based on the column value. Mohit On Sun, Jan 6, 2013 at 7:48 PM, Anoop Sam John anoo...@huawei.com wrote: Hi Adrien We are making the consistency between the main table and index table, and the rollback mentioned below etc., using the CP hooks. The current hooks were not enough for those though.. I am in the process of trying to contribute those new hooks, core changes etc. now... Once all are done I will be able to explain in detail..
-Anoop- From: Adrien Mogenet [adrien.moge...@gmail.com] Sent: Monday, January 07, 2013 2:00 AM To: user@hbase.apache.org Subject: Re: HBase - Secondary Index Nice topic, perhaps one of the most important for 2013 :-) I still don't get how you're ensuring consistency between the index table and the main table without an external component (such as bookkeeper/zookeeper). What's the exact write path in your situation when inserting data? (WAL/RegionObserver, pre/post put/WALedit...) The underlying question is how you're ensuring that the WALEdits in the index and main tables are perfectly sync'ed, and how you're able to roll back in case of an issue in both WALs. On Fri, Dec 28, 2012 at 11:55 AM, Shengjie Min kelvin@gmail.com wrote: Yes, as you say, when the number of rows to be returned becomes more and more, the latency will become greater. Seeks within an HFile block are a somewhat expensive op now. (Not much, but still.) The new prefix-trie encoding will be a huge bonus here. There the seeks will be flying
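A sketch of the inverted-table lookup Michael describes, assuming a hypothetical index layout in which the index row key is the indexed value and every qualifier in family "ref" is a main-table row key; intersecting two sorted posting lists is then a client-side set operation:

    import java.io.IOException;
    import java.util.NavigableMap;
    import java.util.Set;
    import java.util.TreeSet;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class InvertedIndexLookup {
      static Set<String> postingList(HTable index, String value) throws IOException {
        Result r = index.get(new Get(Bytes.toBytes(value)));
        Set<String> keys = new TreeSet<String>(); // sorted, so intersections line up cheaply
        NavigableMap<byte[], byte[]> fam =
            r.isEmpty() ? null : r.getFamilyMap(Bytes.toBytes("ref"));
        if (fam != null) {
          for (byte[] q : fam.keySet()) {
            keys.add(Bytes.toString(q)); // qualifier = main-table row key (assumed layout)
          }
        }
        return keys;
      }

      static Set<String> intersect(HTable index, String a, String b) throws IOException {
        Set<String> result = postingList(index, a); // e.g. "make=Volvo"
        result.retainAll(postingList(index, b));    // e.g. "model=S80"
        return result;                              // row keys to fetch from the main table
      }
    }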
Re: HBase - Secondary Index
Does anyone have any links to or information on the new prefix encoding feature in HBase that's being referred to in this mail? On Sun, Jan 6, 2013 at 12:30 PM, Adrien Mogenet adrien.moge...@gmail.com wrote: Nice topic, perhaps one of the most important for 2013 :-) I still don't get how you're ensuring consistency between the index table and the main table without an external component (such as bookkeeper/zookeeper). What's the exact write path in your situation when inserting data? (WAL/RegionObserver, pre/post put/WALedit...) The underlying question is how you're ensuring that the WALEdits in the index and main tables are perfectly sync'ed, and how you're able to roll back in case of an issue in both WALs. On Fri, Dec 28, 2012 at 11:55 AM, Shengjie Min kelvin@gmail.com wrote: Yes, as you say, when the number of rows to be returned becomes more and more, the latency will become greater. Seeks within an HFile block are a somewhat expensive op now. (Not much, but still.) The new prefix-trie encoding will be a huge bonus here. There the seeks will be flying.. [Ted also presented this at Hadoop China] Thanks to Matt... :) I am trying to measure the scan performance with this new encoding. Trying to back-port a simple patch to the 0.94 version just for testing... Yes, when the number of results to be returned grows, any index will become less performant, as per my study :) Yes, you are right; I guess it's just a drawback of any index approach. Thanks for the explanation. Shengjie On 28 December 2012 04:14, Anoop Sam John anoo...@huawei.com wrote: Do you have a link to that presentation? http://hbtc2012.hadooper.cn/subject/track4TedYu4.pdf -Anoop- From: Mohit Anchlia [mohitanch...@gmail.com] Sent: Friday, December 28, 2012 9:12 AM To: user@hbase.apache.org Subject: Re: HBase - Secondary Index On Thu, Dec 27, 2012 at 7:33 PM, Anoop Sam John anoo...@huawei.com wrote: Yes, as you say, when the number of rows to be returned becomes more and more, the latency will become greater. Seeks within an HFile block are a somewhat expensive op now. (Not much, but still.) The new prefix-trie encoding will be a huge bonus here. There the seeks will be flying.. [Ted also presented this at Hadoop China] Thanks to Matt... :) I am trying to measure the scan performance with this new encoding. Trying to back-port a simple patch to the 0.94 version just for testing... Yes, when the number of results to be returned grows, any index will become less performant, as per my study :) Do you have a link to that presentation? btw, quick question - in your presentation, is the scale there seconds or milliseconds? :) It is seconds. Don't consider the exact values. What is important is the % increase in latency :) Those were not high-end machines. -Anoop- From: Shengjie Min [kelvin@gmail.com] Sent: Thursday, December 27, 2012 9:59 PM To: user@hbase.apache.org Subject: Re: HBase - Secondary Index Didn't follow you completely here. There won't be any get() happening.. As the exact rowkey in a region is what we get from the index table, we can seek to the exact position and return that row. Sorry, when I misused get() here, I meant seeking. Yes, if it's just a small number of rows returned, this works perfectly. As you said, you will get the exact rowkey positions per region, and simply seek them. I was trying to work out the case where the number of result rows increases massively. Like in Anil's case, he wants to do a scan query against the secondary index (timestamp): select all rows from timestamp1 to timestamp2, given no customerId provided.
During that time period, he might have a big chunk of rows from different customerIds. The index table returns a lot of rowkey positions for different customerIds (I believe they are scattered across different regions), then you end up seeking all the different positions in different regions and returning all the rows needed. According to your presentation, page 14 - Performance Test Results (Scan): without the index, it's a linear increase as the number of result rows increases; on the other hand, with the index, time spent climbs up much more quickly than in the case without the index. btw, quick question - in your presentation, is the scale there seconds or milliseconds? :) - Shengjie On 27 December 2012 15:54, Anoop John anoop.hb...@gmail.com wrote: how is the massive number of get()s going to perform against the main table? Didn't follow you completely here. There won't be any get() happening.. As the exact rowkey in a region is what we get from the index table, we can seek to the exact position
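The prefix-trie work being referenced appears to be HBASE-4676 ("prefix-tree" data block encoding, contributed by Matt Corgan); it shipped after 0.94, but data block encodings in general are switched on per column family the same way, so a hedged sketch:

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

    public class EncodedFamily {
      public static HColumnDescriptor create() {
        HColumnDescriptor cf = new HColumnDescriptor("d"); // illustrative family
        cf.setDataBlockEncoding(DataBlockEncoding.PREFIX_TREE); // or PREFIX / DIFF / FAST_DIFF
        return cf;
      }
    }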
Re: Storing images in Hbase
I have done extensive testing and have found that blobs don't belong in databases but are rather best left out on the file system. Andrew outlined the issues that you'll face, not to mention the IO issues when compaction occurs over large files. On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell apurt...@apache.org wrote: I meant this to say "a few really large values" On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell apurt...@apache.org wrote: Consider if the split threshold is 2 GB but your one row contains 10 GB as a really large value. -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: HBase - Secondary Index
Hi Anoop, Am I correct in understanding that this indexing mechanism is only applicable when you know the row key? It's not an inverted index truly based on the column value. Mohit On Sun, Jan 6, 2013 at 7:48 PM, Anoop Sam John anoo...@huawei.com wrote: Hi Adrien We are making the consistency between the main table and index table, and the rollback mentioned below etc., using the CP hooks. The current hooks were not enough for those though.. I am in the process of trying to contribute those new hooks, core changes etc. now... Once all are done I will be able to explain in detail.. -Anoop- From: Adrien Mogenet [adrien.moge...@gmail.com] Sent: Monday, January 07, 2013 2:00 AM To: user@hbase.apache.org Subject: Re: HBase - Secondary Index Nice topic, perhaps one of the most important for 2013 :-) I still don't get how you're ensuring consistency between the index table and the main table without an external component (such as bookkeeper/zookeeper). What's the exact write path in your situation when inserting data? (WAL/RegionObserver, pre/post put/WALedit...) The underlying question is how you're ensuring that the WALEdits in the index and main tables are perfectly sync'ed, and how you're able to roll back in case of an issue in both WALs. On Fri, Dec 28, 2012 at 11:55 AM, Shengjie Min kelvin@gmail.com wrote: Yes, as you say, when the number of rows to be returned becomes more and more, the latency will become greater. Seeks within an HFile block are a somewhat expensive op now. (Not much, but still.) The new prefix-trie encoding will be a huge bonus here. There the seeks will be flying.. [Ted also presented this at Hadoop China] Thanks to Matt... :) I am trying to measure the scan performance with this new encoding. Trying to back-port a simple patch to the 0.94 version just for testing... Yes, when the number of results to be returned grows, any index will become less performant, as per my study :) Yes, you are right; I guess it's just a drawback of any index approach. Thanks for the explanation. Shengjie On 28 December 2012 04:14, Anoop Sam John anoo...@huawei.com wrote: Do you have a link to that presentation? http://hbtc2012.hadooper.cn/subject/track4TedYu4.pdf -Anoop- From: Mohit Anchlia [mohitanch...@gmail.com] Sent: Friday, December 28, 2012 9:12 AM To: user@hbase.apache.org Subject: Re: HBase - Secondary Index On Thu, Dec 27, 2012 at 7:33 PM, Anoop Sam John anoo...@huawei.com wrote: Yes, as you say, when the number of rows to be returned becomes more and more, the latency will become greater. Seeks within an HFile block are a somewhat expensive op now. (Not much, but still.) The new prefix-trie encoding will be a huge bonus here. There the seeks will be flying.. [Ted also presented this at Hadoop China] Thanks to Matt... :) I am trying to measure the scan performance with this new encoding. Trying to back-port a simple patch to the 0.94 version just for testing... Yes, when the number of results to be returned grows, any index will become less performant, as per my study :) Do you have a link to that presentation? btw, quick question - in your presentation, is the scale there seconds or milliseconds? :) It is seconds. Don't consider the exact values. What is important is the % increase in latency :) Those were not high-end machines. -Anoop- From: Shengjie Min [kelvin@gmail.com] Sent: Thursday, December 27, 2012 9:59 PM To: user@hbase.apache.org Subject: Re: HBase - Secondary Index Didn't follow you completely here. There won't be any get() happening..
As the exact rowkey in a region is what we get from the index table, we can seek to the exact position and return that row. Sorry, when I misused get() here, I meant seeking. Yes, if it's just a small number of rows returned, this works perfectly. As you said, you will get the exact rowkey positions per region, and simply seek them. I was trying to work out the case where the number of result rows increases massively. Like in Anil's case, he wants to do a scan query against the secondary index (timestamp): select all rows from timestamp1 to timestamp2, given no customerId provided. During that time period, he might have a big chunk of rows from different customerIds. The index table returns a lot of rowkey positions for different customerIds (I believe they are scattered across different regions), then you end up seeking all the different positions in different regions and returning all the rows needed. According to your presentation, page 14 - Performance Test Results (Scan): without the index, it's a linear
Re: Re: Storing images in Hbase
IMHO use DFS instead for blobs, and use HBase for the metadata. Sent from my iPhone On Jan 5, 2013, at 7:58 PM, 谢良 xieli...@xiaomi.com wrote: Just out of curiosity, why not consider a blob storage system? Best Regards, Liang From: kavishahuja [kavishah...@yahoo.com] Sent: January 5, 2013 18:11 To: user@hbase.apache.org Subject: Storing images in Hbase *Hello EVERYBODY, first of all, a happy new year to everyone!! I need a small bit of help regarding pushing images into Apache HBase (DB)... I know it's about converting objects into bytes and then saving those bytes into HBase rows. But I still can't do it. Kindly help!!* Regards, Kavish -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Storing-images-in-Hbase-tp4036184.html Sent from the HBase User mailing list archive at Nabble.com.
Re: HBase - Secondary Index
On Thu, Dec 27, 2012 at 7:33 PM, Anoop Sam John anoo...@huawei.com wrote: Yes, as you say, when the number of rows to be returned becomes more and more, the latency will become greater. Seeks within an HFile block are a somewhat expensive op now. (Not much, but still.) The new prefix-trie encoding will be a huge bonus here. There the seeks will be flying.. [Ted also presented this at Hadoop China] Thanks to Matt... :) I am trying to measure the scan performance with this new encoding. Trying to back-port a simple patch to the 0.94 version just for testing... Yes, when the number of results to be returned grows, any index will become less performant, as per my study :) Do you have a link to that presentation? btw, quick question - in your presentation, is the scale there seconds or milliseconds? :) It is seconds. Don't consider the exact values. What is important is the % increase in latency :) Those were not high-end machines. -Anoop- From: Shengjie Min [kelvin@gmail.com] Sent: Thursday, December 27, 2012 9:59 PM To: user@hbase.apache.org Subject: Re: HBase - Secondary Index Didn't follow you completely here. There won't be any get() happening.. As the exact rowkey in a region is what we get from the index table, we can seek to the exact position and return that row. Sorry, when I misused get() here, I meant seeking. Yes, if it's just a small number of rows returned, this works perfectly. As you said, you will get the exact rowkey positions per region, and simply seek them. I was trying to work out the case where the number of result rows increases massively. Like in Anil's case, he wants to do a scan query against the secondary index (timestamp): select all rows from timestamp1 to timestamp2, given no customerId provided. During that time period, he might have a big chunk of rows from different customerIds. The index table returns a lot of rowkey positions for different customerIds (I believe they are scattered across different regions), then you end up seeking all the different positions in different regions and returning all the rows needed. According to your presentation, page 14 - Performance Test Results (Scan): without the index, it's a linear increase as the number of result rows increases; on the other hand, with the index, time spent climbs up much more quickly than in the case without the index. btw, quick question - in your presentation, is the scale there seconds or milliseconds? :) - Shengjie On 27 December 2012 15:54, Anoop John anoop.hb...@gmail.com wrote: how is the massive number of get()s going to perform against the main table? Didn't follow you completely here. There won't be any get() happening.. As the exact rowkey in a region is what we get from the index table, we can seek to the exact position and return that row. -Anoop- On Thu, Dec 27, 2012 at 6:37 PM, Shengjie Min kelvin@gmail.com wrote: how is the massive number of get()s going to perform against the main table? -- All the best, Shengjie Min
Re: Fixing badly distributed table manually.
On Mon, Dec 24, 2012 at 8:27 AM, Ivan Balashov ibalas...@gmail.com wrote: Vincent Barat vbarat@... writes: Hi, Balancing regions between RSs is correctly handled by HBase: I mean that your RSs always manage the same number of regions (the balancer takes care of it). Unfortunately, balancing all the regions of one particular table between the RSs of your cluster is not always easy, since HBase (as of 0.90.3), when it comes to splitting a region, always creates the new one on the same RS. This means that if you start with a 1-region-only table and then insert lots of data into it, new regions will always be created on the same RS (if your insert is an M/R job, you saturate this RS). Eventually, the balancer will at some point decide to move one of these regions to another RS, limiting the issue, but it is not controllable. Here at Capptain, we solved this problem by developing a special Python script, based on the HBase shell, to entirely balance all the regions of all tables across all RSs. It ensures that the regions of tables are uniformly deployed on all RSs of the cluster, with a minimum of region transitions. Is it possible to describe at a high level the logic of what you did? It is fast, and even if it can trigger a lot of region transitions, there is very little impact at runtime and it can be run safely. If you are interested, just let me know, I can share it. Regards, Vincent, I would very much like to see and possibly use the script that you mentioned. We've just run into the same issue (after the table was truncated, it was re-created with only 1 region, and after data loading and manual splits we ended up having all regions on the same RS). If you could share the script, it would be really appreciated, I believe not only by me. Thanks, Ivan
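At its core such a script just issues explicit region moves; the Java equivalent of the shell's move command looks roughly like this (the region and server names are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MoveRegion {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Encoded region name as shown in the web UI; destination is "host,port,startcode".
        // Passing a null destination lets the master pick a target server itself.
        admin.move(Bytes.toBytes("2f9d64b6e87b0f4a8270c15e53a2b231"),
                   Bytes.toBytes("rs3.example.com,60020,1356115926326"));
        admin.close();
      }
    }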
Re: Hbase scalability performance
Also, check how balanced your region servers are across all the nodes. On Sat, Dec 22, 2012 at 8:50 AM, Varun Sharma va...@pinterest.com wrote: Note that adding nodes will improve throughput and not latency. So, if your client application for benchmarking is single-threaded, do not expect an improvement in the number of reads per second just by adding nodes. On Sat, Dec 22, 2012 at 8:23 AM, Michael Segel michael_se...@hotmail.com wrote: I thought it was Doug Meil who said that HBase doesn't start to shine until you have at least 5 nodes. (Apologies if I misspelled Doug's name.) I happen to concur, and if you want to start testing scalability, you will want to build a bigger test rig. Just saying! Oh, and you're going to have a hot spot on that row key. Maybe use a hashed UUID? I would suggest that you consider the following: Create N rows, where N is a very large number. Then, to generate your random access, do a full table scan to get the N row keys into memory. Using a random number generator, generate a random number and pop that row off the stack so that the next iteration is between 1 and (N-1). Do this 200K times. Now time your 200K random fetches. It would be interesting to see how it performs, taking the average of a 'couple' of runs... then increase the key space by an order of magnitude. (Start with 1 million rows, 10 million rows, 100 million rows.) In theory, if properly tuned, one should expect near-linear results. That is to say, the time it takes to get() a row across the data space should be consistent. Although I wonder if you would have to somehow clear the cache? Sorry, just a random thought... -Mike On Dec 22, 2012, at 10:06 AM, Ted Yu yuzhih...@gmail.com wrote: By '3 datanodes', did you mean that you also increased the number of region servers to 3? When your test was running, did you look at the Web UI to see whether the load was balanced? You can also use Ganglia for that purpose. What version of HBase are you using? Thanks On Sat, Dec 22, 2012 at 7:43 AM, Dalia Sobhy dalia.mohso...@hotmail.com wrote: Dear all, I am testing a simple HBase application on a cluster of multiple nodes. I am especially testing scalability performance, by measuring the time taken for random reads. Data size: 200,000 rows. Row key: 0, 1, 2, ... a very simple incremental row key. But I don't know why, by increasing the cluster size, I see the same time. For example: 2 datanodes: 1000 random reads: 1.757 sec; 3 datanodes: 1000 random reads: 1.7 sec. So any help please?
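A rough single-client timing loop along the lines suggested above (the table name and key layout follow the poster's 0..199,999 scheme); per Varun's point, one thread measures latency only, so run several copies in parallel before judging throughput:

    import java.util.Random;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RandomReadBench {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test"); // table name assumed
        Random rnd = new Random();
        long start = System.currentTimeMillis();
        for (int i = 0; i < 1000; i++) {
          // Keys are the plain string forms of 0..199999, as in the poster's setup.
          table.get(new Get(Bytes.toBytes(Integer.toString(rnd.nextInt(200000)))));
        }
        System.out.println((System.currentTimeMillis() - start) + " ms for 1000 random gets");
        table.close();
      }
    }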
Re: responseTooSlow
I looked at that link, but couldn't find anything useful. How do I check whether it was the client that didn't write data within that time, or the region server that didn't finish the operation in time? On Fri, Dec 21, 2012 at 2:54 PM, Mohammad Tariq donta...@gmail.com wrote: The socket through which your client is communicating is getting closed before the operation could finish. Maybe it is taking longer than usual or something. Best Regards, Tariq +91-9741563634 https://mtariq.jux.com/ On Sat, Dec 22, 2012 at 4:08 AM, Mohammad Tariq donta...@gmail.com wrote: Hello Mohit, You might find this link useful: http://hbase.apache.org/book/ops.monitoring.html Best Regards, Tariq +91-9741563634 https://mtariq.jux.com/ On Sat, Dec 22, 2012 at 2:09 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Could someone help me understand what this really means? Is this the network transfer taking long from client to server, or the region server taking a long time writing to memory? 2012-12-21 10:54:21,980 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":135652,"call":"multi( org.apache.hadoop.hbase.client.MultiAction@28338472), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.18.3.80:48218","starttimems":1356115926326,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"} 2012-12-21 10:54:21,985 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 26 on 60020 caught: java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324) at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653) at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:924) at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1003) at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:409) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1346)
Re: responseTooSlow
I am just doing a put. This operation generally takes 10ms but in this case it took more than 10sec. Nothing out of the ordinary in the logs. On Fri, Dec 21, 2012 at 3:26 PM, Mohammad Tariq donta...@gmail.com wrote: what exactly is the operation you're trying to do? how is your network's health? is swapping too high on the RS side? anything odd in your RS logs? Best Regards, Tariq +91-9741563634 https://mtariq.jux.com/
Re: recommended nodes
On Nov 28, 2012, at 9:07 AM, Adrien Mogenet adrien.moge...@gmail.com wrote: Does HBase really benefit from 64 GB of RAM, since allocating too large a heap might increase GC time? The benefit you get is from the OS cache. Another question: why not RAID 0, in order to aggregate disk bandwidth? (and thus keep the 3x replication factor) On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel michael_se...@hotmail.com wrote: Sorry, I need to clarify. 4GB per physical core is a good starting point. So with 2 quad core chips, that is going to be 32GB. IMHO that's a minimum. If you go with HBase, you will want more. (Actually you will need more.) The next logical jump would be to 48 or 64GB. If we start to price out memory, depending on vendor and your company's procurement, there really isn't much of a price difference in terms of 32, 48, or 64 GB. Note that it also depends on the chips themselves. Also you need to see how many memory channels exist on the motherboard. You may need to buy in pairs or triplets. Your hardware vendor can help you. (Also you need to keep an eye on your hardware vendor. Sometimes they will give you higher density chips that are going to be more expensive...) ;-) I tend to like having extra memory from the start. It gives you a bit more freedom and also protects you from 'fat' code. Looking at YARN... you will need more memory too. With respect to the hard drives... The best recommendation is to keep the drives as JBOD and then use 3x replication. In this case, make sure that the disk controller cards can handle JBOD. (Some don't support JBOD out of the box.) With respect to RAID... If you are running MapR, no need for RAID. If you are running an Apache derivative, you could use RAID 1 and then cut your replication to 2x. This makes it easier to manage drive failures. (It's not the norm, but it works...) In some clusters, they are using appliances like NetApp's E-Series where the machines see the drives as local attached storage, and I think the appliances themselves are using RAID. I haven't played with this configuration; however, it could make sense and it's a valid design. HTH -Mike On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Mike, Thanks for all those details! So to simplify the equation, for 16 virtual cores we need 48 to 64GB, which means 3 to 4GB per core. So with quad cores, 12GB to 16GB are a good start? Or did I simplify it too much? Regarding the hard drives: if you add more than one drive, do you need to build them on RAID or similar systems? Or can Hadoop/HBase be configured to use more than one drive? Thanks, JM 2012/11/27, Michael Segel michael_se...@hotmail.com: OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's an inside joke...] So here's the problem... By default, your child processes in a map/reduce job get 512MB. The majority of the time, this gets raised to 1GB. 8 cores (dual quad cores) show up as 16 virtual processors in Linux. (Note: This is why when people talk about the number of cores, you have to specify physical cores or logical cores.) So if you were to oversubscribe and have, let's say, 12 mappers and 12 reducers, that's 24 slots. Which means that you would need 24GB of memory reserved just for the child processes. This would leave 8GB for the DN, TT and the rest of the Linux OS processes. Can you live with that? Sure. Now add in R, HBase, Impala, or some other set of tools on top of the cluster. Ooops! Now you are in trouble because you will swap.
Also, adding in R, you may want to bump up those child procs from 1GB to 2GB. That means the 24 slots would now require 48GB. Now you will swap, and if that happens you will see HBase in a cascading failure. So while you can do a rolling restart with the changed configuration (reducing the number of mappers and reducers), you end up with fewer slots, which will mean longer run times for your jobs. (Fewer slots == less parallelism.) Looking at the price of memory... you can get 48GB or even 64GB for around the same price point. (8GB chips) And I didn't even talk about adding SOLR, again a memory hog... ;-) Note that I matched the number of mappers with reducers. You could go with fewer reducers if you want. I tend to recommend a ratio of 2:1 mappers to reducers, depending on the workflow. As to the disks... no, 7200 SATA III drives are fine. The SATA III interface is pretty much standard in the new kit being shipped. It's just that you don't have enough drives. 8 cores should be 8 spindles if available. Otherwise you end up seeing your CPU load climb on wait states as the processes wait for the disk I/O to catch up. I mean, you could build out a cluster with 4 x 3.5" 2TB drives in a 1U chassis based on price. You're making a trade-off and you should be aware of the performance hit you will take.
Re: recommended nodes
MapR has its own concept of storage pools and stripe width Sent from my iPhone On Nov 28, 2012, at 5:28 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Mike, Why not use LVM with MapR? Since LVM is reading from 2 drives almost at the same time, it should be better than RAID0 or a single drive, no? 2012/11/28, Michael Segel michael_se...@hotmail.com: Just a couple of things. I'm neutral on the use of LVMs. Some would point out that there's some overhead, but on the flip side, it can make managing the machines easier. If you're using MapR, you don't want to use LVMs but raw devices. In terms of GC, it's going to depend on the heap size and not the total memory. With respect to HBase... MSLABS is the way to go. On Nov 28, 2012, at 12:05 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Gregory, I found this about LVM: - http://blog.andrew.net.au/2006/08/09 - http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2 Seems that performance is still correct with it. I will most probably give it a try and bench that too... I have one new hard drive which should arrive tomorrow. Perfect timing ;) JM
Re: Configuration setup
Thanks! This is the client code I was referring to. The below code doesn't seem to work. Also I tried HBaseConfiguration.addHbaseResources and that didn't work either. Is there any other way to make it configurable outside the resource? On Mon, Nov 26, 2012 at 2:39 PM, Stack st...@duboce.net wrote: On Mon, Nov 26, 2012 at 2:16 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I have a need to move hbase-site.xml to an external location. So in order to do that I changed my configuration as shown below. But this doesn't seem to be working. It picks up the file but I get an error; it seems like it's going to localhost. I checked hbase-site.xml in the directory and the zookeeper nodes are correctly listed. [11/26/2012 14:09:31,480] INFO apache.zookeeper.ClientCnxn [[web-analytics-ci-1.0.0-SNAPSHOT].AsyncFlow.async2.02-SendThread(localhost.localdomain:2181)](): Opening socket connection to server localhost.localdomain/127.0.0.1:2181 - changed from HBaseConfiguration.create() to config = new Configuration(); config.addResource(new Path(CONF_FILE_PROP_NAME)); log.info("Config location picked from: " + prop); The above looks basically right, but IIRC this stuff can be tricky, adding in new resources and making sure stuff is applied in order -- and then there are 'final' configs that are applied after yours. You could try copying the hbase conf dir to wherever, amending it to suit your needs and then, when starting hbase, adding '--config ALTERNATE_CONF_DIR'. St.Ack
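If memory serves, HBaseConfiguration.create(Configuration) layers the HBase defaults and then merges the supplied entries on top, so a minimal sketch of loading an external hbase-site.xml might look like this (the path shown is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration external = new Configuration();
// must be a real local file path, not a classpath resource name
external.addResource(new Path("/etc/myapp/hbase-site.xml"));
// builds a config from hbase-default.xml/hbase-site.xml on the classpath,
// then merges our external entries over it, so the external values win
Configuration conf = HBaseConfiguration.create(external);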
Re: What could cause HBase's writes slower than reads?
Some of the things to look at are I/O on the disk and CPU on the server. Look at CPU and java thread dumps on the client. I use Ganglia to look at server stats and it is often very helpful. In my opinion the best thing would be to add some code around the various HBase calls to see where time is being spent on the client side and then go from there. Do your reads read the same data set as your writes? On Sat, Nov 3, 2012 at 7:09 AM, yun peng pengyunm...@gmail.com wrote: Hi, the throughput for the write-only workload is 450 ops/sec and for read-only 900 ops/sec. I am using the same machine (1-core CPU, 2G mem) for the client to drive the workload into hbase/hdfs... one thread is used on the client side. For this workload, it looks like the client should not be the bottleneck... Btw, is there any way to verify this. Thanks, Yun
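A minimal sketch of that kind of client-side instrumentation (timing each call and logging the slow ones; the threshold and names are arbitrary, and an existing HTable 'table' and Put 'put' are assumed):

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

long t0 = System.nanoTime();
table.put(put);
long micros = (System.nanoTime() - t0) / 1000L;
if (micros > 5000) {   // arbitrary threshold: flag anything over 5 ms
  System.err.println("slow put: " + micros + " us");
}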
Re: What could cause HBase's writes slower than reads?
What load do you see on the system? I am wondering if the bottleneck is on the client side. On Fri, Nov 2, 2012 at 9:07 PM, yun peng pengyunm...@gmail.com wrote: Hi, All, In my HBase cluster, I observed that Put() executes slower than Get(). Since HBase is optimized towards writes, I wonder what may affect Put performance in the distributed setting below. HBase setup: My HBase cluster is of three nodes, in which one hosts zookeeper and the HMaster, and two are slaves. The HBase cluster is attached to HDFS which resides on a separate cluster. The machines are fairly commodity or lower end, with 2G memory and 1-core CPU. Observed results: I tested the Put and Get latency on this setup, and found that Put runs slower than Get (which is a bit surprising to me). In case anyone is interested, in my results Put() takes around 3000us and Get only 1000us (so I think it does not touch disk). What could possibly slow down Put() and speed up Get() performance in HBase? Does this possibly have to do with the distributed setting, like Put needing to update multiple (replicated) copies while Get reads only one? I am quite a newbie to HBase internals and not familiar with the HBase Put/Get code path; has anyone here had similar experiences? Thanks, Yun
Re: HBase Tuning
What's the best way to see if all handlers are occupied? I am probably running into a similar issue but would like to check. On Wed, Oct 10, 2012 at 8:24 PM, Stack st...@duboce.net wrote: On Wed, Oct 10, 2012 at 5:51 AM, Ricardo Vilaça rmvil...@di.uminho.pt wrote: However, when adding an additional client node, also with 400 clients, the latency increases 3 times, but the RegionServers remain idle more than 80%. I had tried different values for hbase.regionserver.handler.count and also for the hbase.client.ipc.pool size and type, but without any improvement. I was going to suggest that it sounded like all handlers are occupied... but it sounds like you tried upping them. Is this going from one client node (serving 400 clients) to two client nodes (serving 800 clients)? Where are you measuring from? Application side? Can you figure out if we are binding up in HBase or in the client node? What does a client node look like? Is it something hosting an hbase client? A webserver or something? Is there any configuration parameter that can improve the latency with several concurrent threads and more than one HBase client node, and/or which JMX parameters should I monitor on RegionServers to check what may be causing this and how could I achieve better utilization of CPU at the RegionServers? It sounds like all your data is memory resident given its size and the lack of iowait. Is that so? Studying the regionserver metrics, are they fairly constant across the addition of the new client node? St.Ack
Re: HBase client slows down
I am using HTableInterface as a pool but I don't see any setAutoFlush method. I am using the 0.92.1 jar. Also, how can I see if the RS is getting overloaded? I looked at the UI and I don't see anything obvious: requestsPerSecond=0, numberOfOnlineRegions=1, numberOfStores=1, numberOfStorefiles=1, storefileIndexSizeMB=0, rootIndexSizeKB=1, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, memstoreSizeMB=27, readRequestsCount=126, writeRequestsCount=96157, compactionQueueSize=0, flushQueueSize=0, usedHeapMB=44, maxHeapMB=3976, blockCacheSizeMB=8.79, blockCacheFreeMB=985.34, blockCacheCount=11, blockCacheHitCount=23, blockCacheMissCount=28, blockCacheEvictedCount=0, blockCacheHitRatio=45%, blockCacheHitCachingRatio=67%, hdfsBlocksLocalityIndex=100 On Tue, Oct 9, 2012 at 10:32 AM, Doug Meil doug.m...@explorysmedical.com wrote: It's one of those it depends answers. See this first... http://hbase.apache.org/book.html#perf.writing ... Additionally, one thing to understand is where you are writing data. Either keep track of the requests per RS over the period (e.g., the web interface), or you can also track it on the client side with... http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[],%20boolean%29 ... to know if you are continually hitting the same RS or spreading the load. On 10/9/12 1:27 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I just have 5 stress client threads writing timeseries data. What I see is after a few minutes HBaseClient slows down and starts to take 4 secs. Once I kill the client and restart, it stays at a sustainable rate for about 2 minutes and then again it slows down. I am wondering if there is something I should be doing on the HBase client side? All the requests are similar in terms of data.
Re: HBase client slows down
There are 2 CFs on 2 separate region servers. And yes, I have not pre-split the regions as I was told that we should let HBase handle that automatically. Is there a way to set autoflush when using HTableDescriptor? On Tue, Oct 9, 2012 at 10:50 AM, Doug Meil doug.m...@explorysmedical.com wrote: So you're running on a single regionserver?
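For what it's worth, in this API generation setAutoFlush lives on HTable rather than HTableInterface, so one hedged workaround is to construct the table directly (table name and buffer size are arbitrary):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "t1");        // assumed table name
table.setAutoFlush(false);                    // buffer puts client-side
table.setWriteBufferSize(2 * 1024 * 1024);    // 2 MB buffer; tune to taste
// ... many table.put(...) calls ...
table.flushCommits();                         // send anything still buffered
table.close();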
Re: HBase tunning
I have timeseries data and each row has up to 1000 cols. I just started with defaults and I have not tuned any parameters on client or server. My reads read all the cols in a row. But requests for a given row are completely random. On Fri, Oct 5, 2012 at 6:05 PM, Kevin O'dell kevin.od...@cloudera.com wrote: Mohit, Michael is right; most parameters usually go one way or the other depending on what you are trying to accomplish. Memstore - raise for high writes; Blockcache - raise for high reads; hbase blocksize - higher for sequential workloads, lower for random; client caching - lower for really wide rows/large cells and higher for tall tables/small cells; etc. On Fri, Oct 5, 2012 at 8:54 PM, Michael Segel michael_se...@hotmail.com wrote: Depends. What sort of system are you tuning? Sorry, but we have to start somewhere, and if we don't know what you have in terms of hardware, we don't have a good starting point. On Oct 5, 2012, at 7:47 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Do most people start out with default values and then tune HBase? Or are there some important configuration parameters that should always be changed on the client and the server? -- Kevin O'Dell Customer Operations Engineer, Cloudera
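As a concrete illustration of the table-level knob Kevin mentions, the block size can be set per column family at creation time; a sketch with made-up table and family names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration conf = HBaseConfiguration.create();
HTableDescriptor desc = new HTableDescriptor("t1");
HColumnDescriptor fam = new HColumnDescriptor("f");
fam.setBlocksize(16 * 1024);   // smaller than the 64KB default, for random reads
desc.addFamily(fam);
new HBaseAdmin(conf).createTable(desc);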
Re: HBase vs. HDFS
Are these 120K rows from a single region server? On Mon, Oct 1, 2012 at 4:01 PM, Juan P. gordoslo...@gmail.com wrote: Hi guys, I'm trying to get familiarized with HBase and one thing I noticed is that reads seem to be very slow. I just tried doing a scan 'my_table' to get 120K records and it took about 50 seconds to print it all out. In contrast, hadoop fs -cat my_file.csv where my_file.csv has 120K lines completed in under a second. Is that possible? Am I missing something about HBase reads? Thanks, Joni
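One common reason a scan like this feels slow is the default scanner caching of 1 row per RPC; a hedged Java sketch of the same scan with caching turned up (the table name comes from the thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "my_table");
Scan scan = new Scan();
scan.setCaching(1000);   // fetch 1000 rows per RPC instead of 1
ResultScanner scanner = table.getScanner(scan);
for (Result r : scanner) {
  // process r
}
scanner.close();
table.close();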
Re: disable table
Thanks everyone for the input, it's helpful. I did remove the znode from /hbase/table/SESSIONID_TIMELINE and after that I was able to list the table. At that point I tried to do a put, but when I did a put I got a message NoRegionServer online. I looked in the logs and it says Failed to open region server at nodexxx. When I went to nodexxx it complains about being unable to run testcompression. I set up SNAPPY compression on my table and I also ran the SNAPPY compression test, which was successful. Not sure what's going on in the cluster. On Thu, Sep 27, 2012 at 1:10 AM, Mohammad Tariq donta...@gmail.com wrote: Hello Mohit, It should be /hbase/hbase/table/SESSIONID_TIMELINE. Apologies for the typo. For the rest of the things, I feel Ramkrishna sir has provided a good and proper explanation. Please let us know if you still have any doubt or question. Ramkrishna.S.Vasudevan: You are welcome sir. It's my pleasure to share space with you people. Regards, Mohammad Tariq On Thu, Sep 27, 2012 at 9:59 AM, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com wrote: Hi Mohit, First of all thanks to Tariq for his replies. Just to add on: basically HBase uses Zookeeper to know the status of the cluster, like the number of tables enabled, disabled and deleted. Enabled and deleted states are handled a bit differently in the 0.94 version. ZK is used for various region assignments. Also ZK is used to track the active master and the standby master. As you correctly understand, the master is responsible for the overall maintenance of the tables and their respective states; it seeks the help of ZK to do it and that is where the states are persisted. Also there are a few cases where enable and disable table have some issues due to some race conditions in the 0.92 versions; in the latest version we are trying to resolve them. You can attach the master and RS logs to identify exactly what caused this problem in your case, which will be really helpful so that it can be fixed in the kernel. Regards Ram
Re: disable table
I did restart the entire cluster and still that didn't help. Looks like once I get into this race condition there is no way to come out of it? On Thu, Sep 27, 2012 at 8:00 AM, rajesh babu chintaguntla chrajeshbab...@gmail.com wrote: Hi Mohit, We should not delete znodes manually; that will cause inconsistencies, like a region being shown as online on the master when it isn't on any region server. That's why the put is failing in your case. A master restart will bring your cluster back to a normal state (recovering any failures in enable/disable). Even hbck won't solve this problem. FYI, a discussion is presently going on about this issue. You can follow the jira associated with it at https://issues.apache.org/jira/browse/HBASE-6469
Re: disable table
Which node should I look at for logs? Is this the master node? I'll try hbck. On Wed, Sep 26, 2012 at 2:19 PM, Mohammad Tariq donta...@gmail.com wrote: Hello Mohit, Try hbck once and see if it shows any inconsistency. Also, you can try restarting your cluster and deleting the table again. Having a look at the logs could also be useful. Regards, Mohammad Tariq On Thu, Sep 27, 2012 at 2:24 AM, Mohit Anchlia mohitanch...@gmail.com wrote: When I try to disable the table I get: hbase(main):011:0> disable 'SESSIONID_TIMELINE' ERROR: org.apache.hadoop.hbase.TableNotEnabledException: org.apache.hadoop.hbase.TableNotEnabledException: SESSIONID_TIMELINE Here is some help for this command: Start disable of named table: e.g. hbase> disable 't1' But then when I try to enable I get: hbase(main):012:0> enable 'SESSIONID_TIMELINE' ERROR: org.apache.hadoop.hbase.TableNotDisabledException: org.apache.hadoop.hbase.TableNotDisabledException: SESSIONID_TIMELINE Here is some help for this command: Start enable of named table: e.g. hbase> enable 't1' I've tried flush and major_compact also. It seems it's stuck in an inconsistent state. Could someone point me in the correct direction? I am using 92.1
Re: disable table
Thanks! I do see inconsistency. How do I remove the znode? And also could you please help me understand how this might have happened? ERROR: Region SESSIONID_TIMELINE,,1348689726526.0e200aace5e81cead8d8714ed8076050. not deployed on any region server. On Wed, Sep 26, 2012 at 2:36 PM, Mohammad Tariq donta...@gmail.com wrote: A possible reason could be that the znode associated with this particular table is not behaving properly. In such a case, you can try the following: Stop HBase. Stop ZK. Take a backup of the ZK data. Restart ZK. Remove the znode. Start HBase again. After this hopefully your table would be enabled. Regards, Mohammad Tariq
Re: disable table
I don't see path like /hbase/SESSIONID_TIMELINE This is what I see [zk: pprfdaaha303:5181(CONNECTED) 5] ls /hbase/table [SESSIONID_TIMELINE] [zk: pprfdaaha303:5181(CONNECTED) 6] get /hbase/table cZxid = 0x100fe ctime = Mon Sep 10 15:31:45 PDT 2012 mZxid = 0x100fe mtime = Mon Sep 10 15:31:45 PDT 2012 pZxid = 0x508f1 cversion = 3 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 0 numChildren = 1 On Wed, Sep 26, 2012 at 3:57 PM, Mohammad Tariq donta...@gmail.com wrote: In order to delete a znode you have to go to the ZK shell and issue the delete command along with the required path. For example: delete /hbase/SESSIONID_TIMELINE. For detailed info you can visit the ZK homepage at: zookeeper.apache.org Actually when we try to fetch data from an HBase table, the client or app first contacts ZK to get the location of the server holding the -ROOT- table. From this we come to know about the server hosting the .META. table. This tells us the location of the server which actually holds the rows of interest. For some reason the znode which was holding this info has either faced some catastrophe or lost the info associated with this particular table. Or sometimes the znode remains unable to keep itself updated with the latest changes. That could also be a probable reason. We should always keep in mind that ZK is the centralized service that is actually coordinating everything behind the scenes. As a result, any problem with the ZK quorum means a problem with the HBase cluster. Regards, Mohammad Tariq
Re: disable table
I did /hbase/table/SESSIONID_TIMELINE and that seemed to work. I'll restart hbase and see if it works. One thing I don't understand is why zookeeper is holding information about whether this table is enabled or disabled. Wouldn't this information be with the master?
Re: No of rows
But when the ResultScanner executes, wouldn't it already query the servers for all the rows matching the start key? I am trying to avoid reading all the blocks from the file system that match the keys. On Wed, Sep 12, 2012 at 3:59 PM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there, If you're talking about stopping a scan after X rows (as opposed to the batching), then break out of the ResultScanner loop after X rows. http://hbase.apache.org/book.html#data_model_operations You can either add a ColumnFamily to a scan, or add specific attributes (i.e., cf:column) to a scan. On 9/12/12 6:50 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am using the client 0.90.5 jar. Is there a way to limit how many rows can be fetched in one scan call? Similarly, is there something for columns?
Re: No of rows
On Wed, Sep 12, 2012 at 4:48 PM, lars hofhansl lhofha...@yahoo.com wrote: No. By default each call to ClientScanner.next(...) incurs an RPC call to the HBase server, which is why it is important to enable scanner caching (as opposed to batching) if you expect to scan many rows. By default scanner caching is set to 1. Thanks! If caching is set to 1, then is there a way to limit the number of rows that are fetched from the server?
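There's no explicit row-limit setter on Scan in this API generation, but a hedged sketch of the two usual tricks, a PageFilter plus breaking out of the loop (the numbers are arbitrary, and PageFilter's cap is applied per region server, so the client-side break is still needed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "t1");   // assumed table name
Scan scan = new Scan();
scan.setCaching(100);                    // rows per RPC
scan.setFilter(new PageFilter(100));     // server-side cap, per region server
ResultScanner scanner = table.getScanner(scan);
int count = 0;
for (Result r : scanner) {
  // process r
  if (++count >= 100) break;             // hard limit on the client
}
scanner.close();
table.close();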
Re: More rows or less rows and more columns
On Mon, Sep 10, 2012 at 10:30 AM, Harsh J ha...@cloudera.com wrote: Hey Mohit, See http://hbase.apache.org/book.html#schema.smackdown.rowscols Thanks! Is there a way in HBase to get the most recently inserted column? Or a way to sort columns such that I can manage how many columns I want to read? In timeseries we might be interested in only the most recent data point. On Mon, Sep 10, 2012 at 10:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Is there any recommendation on how many columns one should have per row? My columns are 200 bytes. This will help me decide if I should shard my rows with id + some date/time value. -- Harsh J
Re: More rows or less rows and more columns
On Mon, Sep 10, 2012 at 10:59 AM, Harsh J ha...@cloudera.com wrote: Versions is what you're talking about, and by default all queries return the latest version of updated values. No, actually I was asking: if I have columns with qualifiers d, b, c, e, can I store them sorted such that it is e, d, c, b? This way I can just get the most recent qualifier, i.e. for timeseries the most recent data point.
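A hedged sketch of that reverse-sort trick, assuming the qualifier is a timestamp stored as 8 raw bytes (the table, family, and row names are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "t1");
byte[] row = Bytes.toBytes("id1");
byte[] fam = Bytes.toBytes("f");

// write: invert the timestamp so the newest event sorts first among qualifiers
long ts = System.currentTimeMillis();
Put put = new Put(row);
put.add(fam, Bytes.toBytes(Long.MAX_VALUE - ts), Bytes.toBytes("value"));
table.put(put);

// read: the first column in the row is now the most recent data point
Get get = new Get(row);
get.setFilter(new ColumnPaginationFilter(1, 0)); // limit 1 column, offset 0
Result r = table.get(get);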
Re: Key formats and very low cardinality leading fields
You can also look at pre-splitting the regions for timeseries type data. On Mon, Sep 3, 2012 at 1:11 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Initially your table will contain only one region. When you will reach its maximum size, it will split into 2 regions will are going to be distributed over the cluster. The 2 regions are going to be ordered by keys.So all entries starting with 1 will be on the first region. And the middle key (let's say 25..) will start the 2nd region. So region 1 will contain 1 to 24999. and the 2nd region will contain keys from 25 And so on. Since keys are ordered, all keys starting with a 1 are going to be closeby on the same region, expect if the region is big enought to be splitted and the servers by more region servers. So when you will load all your entries starting with 1, or 3, they will go on one uniq region. Only entries starting with 2 are going to be sometime on region 1, sometime on region 2. Of course, the more data you will load, the more regions you will have, the less hotspoting you will have. But at the beginning, it might be difficult for some of your servers. 2012/9/3, Eric Czech e...@nextbigsound.com: With regards to: If you have 3 region servers and your data is evenly distributed, that mean all the data starting with a 1 will be on server 1, and so on. Assuming there are multiple regions in existence for each prefix, why would they not be distributed across all the machines? In other words, if there are many regions with keys that generally start with 1, why would they ALL be on server 1 like you said? It's my understanding that the regions aren't placed around the cluster according to the range of information they contain so I'm not quite following that explanation. Putting the higher cardinality values in front of the key isn't entirely out of the question, but I'd like to use the low cardinality key out front for the sake of selecting rows for MapReduce jobs. Otherwise, I always have to scan the full table for each job. On Mon, Sep 3, 2012 at 3:20 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Yes, you're right, but again, it will depend on the number of regionservers and the distribution of your data. If you have 3 region servers and your data is evenly distributed, that mean all the data starting with a 1 will be on server 1, and so on. So if you write a million of lines starting with a 1, they will all land on the same server. Of course, you can pre-split your table. Like 1a to 1z and assign each region to one of you 3 servers. That way you will avoir hotspotting even if you write million of lines starting with a 1. If you have une hundred regions, you will face the same issue at the beginning, but the more data your will add, the more your table will be split across all the servers and the less hotspottig you will have. Can't you just revert your fields and put the 1 to 30 at the end of the key? 2012/9/3, Eric Czech e...@nextbigsound.com: Thanks for the response Jean-Marc! I understand what you're saying but in a more extreme case, let's say I'm choosing the leading number on the range 1 - 3 instead of 1 - 30. In that case, it seems like all of the data for any one prefix would already be split well across the cluster and as long as the second value isn't written sequentially, there wouldn't be an issue. Is my reasoning there flawed at all? On Mon, Sep 3, 2012 at 2:31 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Eric, In HBase, data is stored sequentially based on the key alphabetical order. 
It will depend on the number of regions and region servers you have, but if you write data from 23AA to 23ZZ it will most probably go to the same region even if the cardinality of the 2nd part of the key is high. If the first number is always changing between 1 and 30 for each write, then you will reach multiple regions/servers if you have enough of them; else, you might have some hotspotting. JM 2012/9/3, Eric Czech e...@nextbigsound.com: Hi everyone, I was curious whether or not I should expect any write hot spots if I structured my composite keys in a way such that the first field is a low cardinality (maybe 30 distinct values) value and the next field contains a very high cardinality value that would not be written sequentially. More concisely, I want to do this: Given one number between 1 and 30, write many millions of rows with keys like number chosen : some generally distinct, non-sequential value Would there be any problem with the millions of writes happening with the same first field key prefix even if the second field is largely unique? Thank you!
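For illustration, a hedged sketch of creating such a table pre-split on the 30 leading prefixes (the table and family names are made up; note the sorting step, which echoes the byte-comparison point above):

import java.util.TreeSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTableDescriptor desc = new HTableDescriptor("events");   // assumed table name
desc.addFamily(new HColumnDescriptor("f"));               // assumed family
// One split point per prefix boundary; sort them the way HBase compares keys,
// since with string-encoded numbers "10:" sorts before "2:".
TreeSet<byte[]> splitSet = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
for (int i = 2; i <= 30; i++) {
  splitSet.add(Bytes.toBytes(i + ":"));   // boundaries "2:" .. "30:" for 30 regions
}
byte[][] splits = splitSet.toArray(new byte[0][]);
HBaseAdmin admin = new HBaseAdmin(conf);
admin.createTable(desc, splits);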
Re: md5 hash key and splits
On Thu, Aug 30, 2012 at 11:52 PM, Stack st...@duboce.net wrote: On Thu, Aug 30, 2012 at 5:04 PM, Mohit Anchlia mohitanch...@gmail.com wrote: In general isn't it better to split the regions so that the load can be spread across the cluster to avoid HotSpots? Time series data is a particular case [1] and the sematextians have tools to help w/ that particular loading pattern. Is time series your loading pattern? If so, yes, you need to employ some smarts (tsdb schema and write tricks or hbasewd tool) to avoid hotspotting. But hotspotting is an issue apart from splits; you can split all you want and if your row keys are time series, splitting won't undo them. My data is timeseries and to get random distribution and still have the keys in the same region for a user I am thinking of using md5(userid)+reversetimestamp as a row key. But with this type of key how can one do pre-splits? I have 30 nodes. You would split to distribute load over the cluster and HBase should be doing this for you w/o need of human intervention (caveat the reasons you might want to manually split as listed above by AK and Ian). St.Ack 1. http://hbase.apache.org/book.html#rowkey.design
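Since md5 output is close to uniform, one hedged way to pre-split such a table is to cut the first key byte into even ranges, assuming the md5 is stored as raw bytes at the front of the row key (the names and region count are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration conf = HBaseConfiguration.create();
HTableDescriptor desc = new HTableDescriptor("timeseries");  // assumed table name
desc.addFamily(new HColumnDescriptor("f"));                  // assumed family
int numRegions = 30;                                         // one per node, say
byte[][] splits = new byte[numRegions - 1][];
for (int i = 1; i < numRegions; i++) {
  // evenly spaced first-byte boundaries over 0x00..0xFF; HBase compares
  // row keys as unsigned bytes, so these land in ascending order
  splits[i - 1] = new byte[] { (byte) ((i * 256) / numRegions) };
}
new HBaseAdmin(conf).createTable(desc, splits);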
Re: md5 hash key and splits
On Wed, Aug 29, 2012 at 10:50 PM, Stack st...@duboce.net wrote: On Wed, Aug 29, 2012 at 9:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Wed, Aug 29, 2012 at 9:19 PM, Stack st...@duboce.net wrote: On Wed, Aug 29, 2012 at 3:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote: If I use an md5 hash + timestamp rowkey would hbase automatically detect the difference in ranges and perform splits? How does split work in such cases, or is it still advisable to manually split the regions? What logic would you recommend to split the table into multiple regions when using an md5 hash? It's hard to know how well your inserts will spread over the md5 namespace ahead of time. You could try sampling or just let HBase take care of the splits for you (Is there a problem w/ your letting HBase do the splits?) From what I've read it's advisable to do manual splits since you are able to spread the load in a more predictable way. If I am missing something please let me know. St.Ack
Re: md5 hash key and splits
In general isn't it better to split the regions so that the load can be spread across the cluster to avoid HotSpots? I read about pre-splitting here: http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ On Thu, Aug 30, 2012 at 4:30 PM, Amandeep Khurana ama...@gmail.com wrote: Also, you might have read that an initial loading of data can be better distributed across the cluster if the table is pre-split rather than starting with a single region and splitting (possibly aggressively, depending on the throughput) as the data loads in. Once you are in a stable state with regions distributed across the cluster, there is really no benefit in terms of spreading load by managing splitting manually v/s letting HBase do it for you. At that point it's about what Ian mentioned - predictability of latencies by avoiding splits happening at a busy time. On Thu, Aug 30, 2012 at 4:26 PM, Ian Varley ivar...@salesforce.com wrote: The Facebook devs have mentioned in public talks that they pre-split their tables and don't use automated region splitting. But as far as I remember, the reason for that isn't predictability of spreading load, so much as predictability of uptime latency (they don't want an automated split to happen at a random busy time). Maybe that's what you mean, Mohit? Ian On Aug 30, 2012, at 5:45 PM, Stack wrote: On Thu, Aug 30, 2012 at 7:35 AM, Mohit Anchlia mohitanch...@gmail.com wrote: From what I've read it's advisable to do manual splits since you are able to spread the load in a more predictable way. If I am missing something please let me know. Where did you read that? St.Ack
Re: md5 hash key and splits
On Wed, Aug 29, 2012 at 9:19 PM, Stack st...@duboce.net wrote: On Wed, Aug 29, 2012 at 3:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote: If I use md5 hash + timestamp rowkey would hbase automatically detect the difference in ranges and perform splits? How does split work in such cases or is it still advisable to manually split the regions. What logic would you recommend to split the table into multiple regions when using md5 hash? Yes. On how split works, when a region hits the maximum configured size, it splits in two. Manual splitting can be useful when you know your distribution and you'd save on hbase doing it for you. It can speed up bulk loads for instance. St.Ack
Re: Timeseries data
How does it deal with multiple writes in the same millisecond for the same rowkey/column? I can't see that info. On Tue, Aug 28, 2012 at 5:33 PM, Marcos Ortiz mlor...@uci.cu wrote: Study OpenTSDB at StumbleUpon, described by Benoit tsuna Sigoure (ts...@stumbleupon.com) in the HBaseCon talk called Lessons Learned from OpenTSDB. His team has done a great job working with time-series data, and he gave a lot of great advice for working with this kind of data in HBase: - Wider rows to seek faster - Use asynchbase + Netty or Finagle (great tool created by Twitter engineers to work with HBase) => performance++ - Make writes idempotent and independent (before: start rows at arbitrary points in time; after: align rows on 10m, then 1h, boundaries) - Store more data per Key/Value - Compact your data - Use short family names Best wishes On 08/28/2012 20:21, Mohit Anchlia wrote: In timeseries type data how do people deal with scenarios where one might get multiple events in a millisecond? Using the nanosecond approach seems tricky. The other option is to take advantage of versions or counters.
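One hedged workaround sketch (my illustration, not OpenTSDB's actual scheme): fold the event timestamp plus a small in-process sequence number into the column qualifier, so two events in the same millisecond land in different cells:

import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.hbase.util.Bytes;

public class EventQualifiers {
  private static final AtomicLong SEQ = new AtomicLong();

  // qualifier = 8-byte event timestamp + 2-byte rolling sequence number;
  // the sequence breaks ties between events in the same millisecond
  public static byte[] qualifier(long eventTsMillis) {
    short seq = (short) (SEQ.getAndIncrement() & 0x7FFF);
    return Bytes.add(Bytes.toBytes(eventTsMillis), Bytes.toBytes(seq));
  }
}

The sequence only needs to be unique within one writer process and one millisecond, so a small rolling counter is enough.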
Re: Retrieving 2 separate timestamps' values
Your timestamp, as in version? Can you describe your scenario with a more concrete example? On Mon, Aug 27, 2012 at 5:01 PM, Ioakim Perros imper...@gmail.com wrote: Hi, Is there any way of retrieving two values with totally different timestamps from a table? I am using timestamps as iteration counts, and I would like to be able to get at each iteration (besides the previous iteration's results from the table) some pre-computed amounts I save at some columns with timestamp 0, avoiding the cost of retrieving all the table's versions. The only way I have come up with is to save the pre-computed amounts redundantly at all timestamps up to the maximum possible. Does anyone have an idea on a more efficient way of dealing with this? Thanks and regards, IP
Re: Retrieving 2 separate timestamps' values
Have you thought of making your row key key+timestamp? Then you can do a scan on the columns themselves. On Mon, Aug 27, 2012 at 5:53 PM, Ioakim Perros imper...@gmail.com wrote: Of course, thank you for responding. I have an iterative procedure where I get and put data from/to an HBase table, and I am setting at each Put the timestamp equal to each iteration's number, as it is efficient to check for convergence in this way (by just retrieving the 2 last versions of my columns). Some amounts of my equations are the same through iterations, and I save them (serialized) at two specific columns of my table with timestamp equal to zero. The rest of my table's columns contain the (serialized) alternating results of my iterations. The thing is that the cached amounts need to be read at each and every iteration, but it would not be efficient to scan all versions of all columns of my table just to retrieve the previous iteration's results plus the initially saved cached amounts. For example, being at iteration 30 I would like to retrieve only columns 3 and 4 with timestamp 29 and columns 0 and 1 with timestamp 0. With the current HBase API, I am not sure if this is possible, and the solution I described in my previous message (storing columns 0 and 1 at all timestamps up to 40, for example) seems inefficient. Any ideas? Thanks and regards, IP On 08/28/2012 03:33 AM, Mohit Anchlia wrote: Your timestamp, as in version? Can you describe your scenario with a more concrete example? On Mon, Aug 27, 2012 at 5:01 PM, Ioakim Perros imper...@gmail.com wrote: Hi, Is there any way of retrieving two values with totally different timestamps from a table? I am using timestamps as iteration counts, and I would like to be able to get at each iteration (besides the previous iteration's results from the table) some pre-computed amounts I save at some columns with timestamp 0, avoiding the cost of retrieving all the table's versions. The only way I have come up with is to save the pre-computed amounts redundantly at all timestamps up to the maximum possible. Does anyone have an idea on a more efficient way of dealing with this? Thanks and regards, IP
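For what it's worth, a sketch of how this could be done without the redundant storage, using the client API of that era; the table, family, and column names are placeholders. Two Gets pinned to different timestamps are batched into one round trip:

import java.util.Arrays;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class TwoTimestampFetch {
  static final byte[] CF = Bytes.toBytes("cf");

  // Fetch columns 3/4 at timestamp (iteration - 1) and columns 0/1 at
  // timestamp 0 in a single batched call.
  public static Result[] fetch(HTable table, byte[] row, long iteration) throws Exception {
    Get previous = new Get(row);
    previous.addColumn(CF, Bytes.toBytes("c3"));
    previous.addColumn(CF, Bytes.toBytes("c4"));
    previous.setTimeStamp(iteration - 1); // only cells written at the last iteration

    Get cached = new Get(row);
    cached.addColumn(CF, Bytes.toBytes("c0"));
    cached.addColumn(CF, Bytes.toBytes("c1"));
    cached.setTimeStamp(0L); // the pre-computed amounts stored at timestamp 0

    return table.get(Arrays.asList(previous, cached));
  }
}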
Re: HBase Put
On Wed, Aug 22, 2012 at 10:20 AM, Pamecha, Abhishek apame...@x.com wrote: So then a GET query means one needs to look in every HFile where the key falls within the min/max range of the file. From another parallel thread, I gather that HFiles comprise blocks, which, I think, are the atomic unit of persisted data in HDFS (please correct if not). And that each block of an HFile has a range of keys. My key can satisfy the range for a block and yet may not be present. So, all the blocks that match the range for my key will need to be scanned. There is one block index per HFile which sorts blocks by key ranges. This index helps reduce the number of blocks to scan by extracting only those blocks whose ranges satisfy the key. In this case, if puts are random wrt order, each block may have a similar range and it may turn out that HBase needs to scan every block of the file. This may not be good for performance. I just want to validate my understanding. If you have such a use case I think the best practice is to use bloom filters. I think in general it's a good idea to at least enable the bloom filter at row level. Thanks, Abhishek -Original Message- From: lars hofhansl [mailto:lhofha...@yahoo.com] Sent: Tuesday, August 21, 2012 5:55 PM To: user@hbase.apache.org Subject: Re: HBase Put That is correct. From: Pamecha, Abhishek apame...@x.com To: user@hbase.apache.org user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com Sent: Tuesday, August 21, 2012 4:45 PM Subject: RE: HBase Put Hi Lars, Thanks for the explanation. I still have a little doubt: Based on your description, given gets do a merge sort, the data on disk is not kept sorted across files, but just sorted within a file. So, basically if on two separate days, say these keys get inserted: Day1: File1: A B J M Day2: File2: C D K P Then each file is sorted within itself, but scanning both files will require HBase to use merge sort to produce a sorted result. Right? Also, File1 and File2 are immutable, and during compactions, File1 and File2 are compacted and sorted using merge sort into a bigger File3. Is that correct too? Thanks, Abhishek -Original Message- From: lars hofhansl [mailto:lhofha...@yahoo.com] Sent: Tuesday, August 21, 2012 4:07 PM To: user@hbase.apache.org Subject: Re: HBase Put In a nutshell: - Puts are collected in memory (in a sorted data structure) - When the collected data reaches a certain size it is flushed to a new file (which is sorted) - Gets do a merge sort between the various files that have been created - to contain the number of files they are periodically compacted into fewer, larger files So the data files (HFiles) are immutable once written, changes are batched in memory first. -- Lars From: Pamecha, Abhishek apame...@x.com To: user@hbase.apache.org user@hbase.apache.org Sent: Tuesday, August 21, 2012 4:00 PM Subject: HBase Put Hi I had a question on the HBase Put call. In the scenario where data is inserted without any order to column qualifiers, how does HBase maintain sortedness wrt column qualifiers in its store files/blocks? I checked the code base and I can see checks https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java#L319 being made for lexicographic insertions for Key value pairs. But I can't seem to find out how the key-offset is calculated in the first place? Also, given HDFS is, by nature, append-only, how do randomly ordered keys make their way to sorted order.
Is it only during minor/major compactions that this sortedness gets applied, and is there a small window during which data is not sorted? Thanks, Abhishek
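To make the Day1/Day2 example above concrete, here is a toy illustration (mine, not HBase code) of the merge sort a read performs over already-sorted immutable files; a compaction does essentially the same thing and writes the merged output to one bigger sorted file:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeSortedFiles {
  // Each "file" is already sorted; merging preserves global order without
  // ever rewriting the inputs.
  public static List<String> merge(List<String> f1, List<String> f2) {
    List<String> out = new ArrayList<String>();
    int i = 0, j = 0;
    while (i < f1.size() && j < f2.size()) {
      out.add(f1.get(i).compareTo(f2.get(j)) <= 0 ? f1.get(i++) : f2.get(j++));
    }
    out.addAll(f1.subList(i, f1.size()));
    out.addAll(f2.subList(j, f2.size()));
    return out;
  }

  public static void main(String[] args) {
    // The Day1 and Day2 flush files from the example above
    List<String> day1 = Arrays.asList("A", "B", "J", "M");
    List<String> day2 = Arrays.asList("C", "D", "K", "P");
    System.out.println(merge(day1, day2)); // [A, B, C, D, J, K, M, P]
  }
}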
Re: Slow full-table scans
It's possible that there is a bad or slower disk on Gurjeet's machine. I think details of iostat and cpu would clear things up. On Tue, Aug 21, 2012 at 4:33 PM, lars hofhansl lhofha...@yahoo.com wrote: I get roughly the same (~1.8s) - 100 rows, 200.000 columns, segment size 100 From: Gurjeet Singh gurj...@gmail.com To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com Sent: Tuesday, August 21, 2012 11:31 AM Subject: Re: Slow full-table scans How does that compare with the newScanTable on your build ? Gurjeet On Tue, Aug 21, 2012 at 11:18 AM, lars hofhansl lhofha...@yahoo.com wrote: Hmm... So I tried in HBase (current trunk). I created 100 rows with 200.000 columns each (using your oldMakeTable). The creation took a bit, but scanning finished in 1.8s. (HBase in pseudo distributed mode - with your oldScanTable). -- Lars - Original Message - From: lars hofhansl lhofha...@yahoo.com To: user@hbase.apache.org user@hbase.apache.org Cc: Sent: Monday, August 20, 2012 7:50 PM Subject: Re: Slow full-table scans Thanks Gurjeet, I'll (hopefully) have a look tomorrow. -- Lars - Original Message - From: Gurjeet Singh gurj...@gmail.com To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com Cc: Sent: Monday, August 20, 2012 7:42 PM Subject: Re: Slow full-table scans Hi Lars, Here is a testcase: https://gist.github.com/3410948 Benchmarking code: https://gist.github.com/3410952 Try running it with numRows = 100, numCols = 20, segmentSize = 1000 Gurjeet On Thu, Aug 16, 2012 at 11:40 AM, Gurjeet Singh gurj...@gmail.com wrote: Sure - I can create a minimal testcase and send it along. Gurjeet On Thu, Aug 16, 2012 at 11:36 AM, lars hofhansl lhofha...@yahoo.com wrote: That's interesting. Could you share your old and new schema. I would like to track down the performance problems you saw. (If you had a demo program that populates your rows with 200.000 columns in a way where you saw the performance issues, that'd be even better, but not necessary). -- Lars From: Gurjeet Singh gurj...@gmail.com To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com Sent: Thursday, August 16, 2012 11:26 AM Subject: Re: Slow full-table scans Sorry for the delay guys. Here are a few results: 1. Regions in the table = 11 2. The region servers don't appear to be very busy with the query, ~5% CPU (but with parallelization, they are all busy) Finally, I changed the format of my data, such that each cell in HBase contains a chunk of a row instead of the single value it had. So, stuffing each HBase cell with 500 columns of a row gave me a performance boost of 1000x. It seems that the underlying issue was IO overhead per byte of actual data stored. On Wed, Aug 15, 2012 at 5:16 PM, lars hofhansl lhofha...@yahoo.com wrote: Yeah... It looks OK. Maybe 2G of heap is a bit low when dealing with 200.000 column rows. If you can I'd like to know how busy your regionservers are during these operations. That would be an indication on whether the parallelization is good or not. -- Lars - Original Message - From: Stack st...@duboce.net To: user@hbase.apache.org Cc: Sent: Wednesday, August 15, 2012 3:13 PM Subject: Re: Slow full-table scans On Mon, Aug 13, 2012 at 6:10 PM, Gurjeet Singh gurj...@gmail.com wrote: I am beginning to think that this is a configuration issue on my cluster. Do the following configuration files seem sane ? hbase-env.sh: https://gist.github.com/3345338 Nothing wrong w/ this (Remove the -ea, you don't want asserts in production, and the -XX:+CMSIncrementalMode flag if >= 2 cores).
hbase-site.xml: https://gist.github.com/3345356 This is all defaults effectively. I don't see any of the configs recommended by the performance section of the reference guide and/or those suggested by the GBIF blog. You don't answer LarsH's query about where you see the 4% difference. How many regions in your table? What does the HBase Master UI look like when this scan is running? St.Ack
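On the client side, the usual first knobs for slow full scans over very wide rows are scanner caching and batching. A minimal sketch with the API of that era, assuming an already-open HTable:

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class FullScanExample {
  public static void scanAll(HTable table) throws Exception {
    Scan scan = new Scan();
    scan.setCaching(100);       // rows fetched per RPC, instead of the old default of 1
    scan.setBatch(1000);        // cap cells per Result for 200,000-column rows
    scan.setCacheBlocks(false); // don't churn the block cache on a full scan
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // each Result here holds at most 1000 cells of one row
      }
    } finally {
      scanner.close();
    }
  }
}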
Re: HBase replication
On Mon, Aug 20, 2012 at 3:06 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Sat, Aug 18, 2012 at 1:30 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Is it also possible to set up bi-directional replication? In other words is it possible to write to the same table in both HBase instances locally in each DC and still be able to replicate between them? Only starting with 0.92.0 Thanks! Could you please point me to the relevant docs? On the replication page all I see is the Master-Slave configuration; I don't see a Master-Master configuration. J-D
Re: HBase replication
On Sat, Aug 18, 2012 at 12:35 PM, Stack st...@duboce.net wrote: On Fri, Aug 17, 2012 at 5:36 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Are clients local to slave DC able to read data from HBase slave when replicating data from one DC to remote DC? Yes. Is it also possible to setup bi-directional replication? In other words is it possible to write to the same table to both HBase instances locally in each DC and still be able to replicate between them? If not then is there a way to design such a thing where clients are able to actively read/write from both DCs? You could do this too. Depends on your zk config (ensemble and where you home the client in the ensemble). St.Ack
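For reference, a hedged sketch of the master-master (cyclic) setup with the 0.92-era ReplicationAdmin API; the peer id and cluster key below are placeholders. The same call has to be run on the other cluster pointing back at this one, hbase.replication must be true on both sides, and the column families to ship need REPLICATION_SCOPE set to 1:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;

public class AddPeerExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // this DC's config
    ReplicationAdmin repAdmin = new ReplicationAdmin(conf);
    // cluster key = zookeeper.quorum:port:znode.parent of the remote DC
    repAdmin.addPeer("1", "zk1.remote-dc.example.com:2181:/hbase");
  }
}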
Re: consistency, availability and partition pattern of HBase
I think availability is sacrificed in the sense that if a region server fails, clients will have data inaccessible for the time it takes the region to come up on some other server; not to be confused with data loss. Sent from my iPad On Aug 7, 2012, at 11:56 PM, Lin Ma lin...@gmail.com wrote: Thank you Wei! Two more comments, 1. What do you think about Hadoop's CAP characteristics? 2. For your comments, if HBase implements per-key sequential consistency, what are the missing characteristics for consistency? Cross-key update sequences? Could you show me an example of what you think is missing? thanks. regards, Lin On Wed, Aug 8, 2012 at 12:18 PM, Wei Tan w...@us.ibm.com wrote: Hi Lin, In the CAP theorem Consistency stands for atomic consistency, i.e., each CRUD operation occurs sequentially in a global, real-time clock Availability means each server, if not partitioned, can accept requests Partition means network partition As far as I understand (although I do not see any official documentation), HBase achieves per-key sequential consistency, i.e., for a specific key, there is an agreed sequence for all operations on it. This is weaker than strong or sequential consistency, but stronger than eventual consistency. BTW: CAP was proposed by Prof. Eric Brewer... http://en.wikipedia.org/wiki/Eric_Brewer_%28scientist%29 Best Regards, Wei Wei Tan Research Staff Member IBM T. J. Watson Research Center 19 Skyline Dr, Hawthorne, NY 10532 w...@us.ibm.com; 914-784-6752 From: Lin Ma lin...@gmail.com To: user@hbase.apache.org, Date: 08/07/2012 09:30 PM Subject: consistency, availability and partition pattern of HBase Hello guys, According to the notes by Werner, he presented the CAP theorem, which states that of three properties of shared-data systems—data consistency, system availability, and tolerance to network partition—only two can be achieved at any given time. (http://www.allthingsdistributed.com/2008/12/eventually_consistent.html) But it seems HBase could achieve all of the 3 features at the same time. Does it mean HBase breaks the rule by Werner? :-) If not, which one is sacrificed -- consistency (by using HDFS), availability (by using Zookeeper) or partition (by using region / column family)? And why? regards, Lin
Re: consistency, availability and partition pattern of HBase
On Wed, Aug 8, 2012 at 7:32 PM, Lin Ma lin...@gmail.com wrote: Thank you Lars. Is the same data stored as duplicated copies across region servers? If so, if the primary server for the region dies, the client just needs to read from the secondary server for the same region. Why is there a window when data is unavailable? To get a better understanding of this I suggest looking at how the WAL logs are stored. The WAL stores multiple regions in one log. Before a region is alive on another region server, the master needs to split the logs so that they can be replayed by the region server. This process causes downtime with respect to the region which is being replayed using the edit logs. BTW: please feel free to correct me for any wrong knowledge about HBase. regards, Lin On Thu, Aug 9, 2012 at 9:31 AM, lars hofhansl lhofha...@yahoo.com wrote: After a write completes the next read (regardless of the location it is issued from) will see the latest value. This is because at any given time exactly one RegionServer is responsible for a specific Key (through assignment of key ranges to regions and regions to RegionServers). As Mohit said, the trade off is that data is unavailable if a RegionServer dies until another RegionServer picks up the regions (and by extension the key range) -- Lars - Original Message - From: Lin Ma lin...@gmail.com To: user@hbase.apache.org Cc: Sent: Wednesday, August 8, 2012 8:47 AM Subject: Re: consistency, availability and partition pattern of HBase And consistency is not sacrificed? i.e. all distributed clients' updates will result in sequential / real time updates? Once an update is done by one client, all other clients see the result immediately? regards, Lin On Wed, Aug 8, 2012 at 11:17 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I think availability is sacrificed in the sense that if a region server fails, clients will have data inaccessible for the time it takes the region to come up on some other server; not to be confused with data loss. Sent from my iPad On Aug 7, 2012, at 11:56 PM, Lin Ma lin...@gmail.com wrote: Thank you Wei! Two more comments, 1. What do you think about Hadoop's CAP characteristics? 2. For your comments, if HBase implements per-key sequential consistency, what are the missing characteristics for consistency? Cross-key update sequences? Could you show me an example of what you think is missing? thanks. regards, Lin On Wed, Aug 8, 2012 at 12:18 PM, Wei Tan w...@us.ibm.com wrote: Hi Lin, In the CAP theorem Consistency stands for atomic consistency, i.e., each CRUD operation occurs sequentially in a global, real-time clock Availability means each server, if not partitioned, can accept requests Partition means network partition As far as I understand (although I do not see any official documentation), HBase achieves per-key sequential consistency, i.e., for a specific key, there is an agreed sequence for all operations on it. This is weaker than strong or sequential consistency, but stronger than eventual consistency. BTW: CAP was proposed by Prof. Eric Brewer... http://en.wikipedia.org/wiki/Eric_Brewer_%28scientist%29 Best Regards, Wei Wei Tan Research Staff Member IBM T. J.
Watson Research Center, 19 Skyline Dr, Hawthorne, NY 10532, w...@us.ibm.com; 914-784-6752
Re: column based or row based storage for HBase?
On Sun, Aug 5, 2012 at 8:03 PM, Lin Ma lin...@gmail.com wrote: Thank you for the informative reply, Mohit! Some more comments, 1. actually my confusion about column based storage is from the book HBase: The Definitive Guide, chapter 1, section The Dawn of Big Data, which draws a picture showing HBase storing the same column of all different rows continuously, physically, in storage. Any comments? 2. I want to confirm my understanding is correct -- supposing I have only one column family with 10 columns, the physical storage is row (with all related columns) after row, rather than storing the 1st column of all rows, then the 2nd column of all rows, etc.? 3. It seems when we say column based storage, there are two meanings, (1) column oriented database (see en.wikipedia.org/wiki/Column-oriented_DBMS), where the same column of different rows is stored together, (2) and column oriented architecture, e.g. how HBase is designed, which is used to describe the pattern of storing sparse, large numbers of columns (with NULLs for free). Any comments? In simple terms, HBase is not a column-oriented store. All the data for a row is stored together but the store file is created only per column family. regards, Lin On Mon, Aug 6, 2012 at 12:08 AM, Mohit Anchlia mohitanch...@gmail.com wrote: On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma lin...@gmail.com wrote: Hi guys, I am wondering whether HBase is using column based storage or row based storage? - I read some technical documents which mentioned that an advantage of HBase is using column based storage to store similar data together to foster compression. So it means same columns of different rows are stored together; Probably what you read was in the context of Column Families. HBase has the concept of a column family similar to Google's Bigtable. And the store files on disk are per column family. All columns of a given column family are in one store file and columns of a different column family are in a different file. - But I also learned HBase is a sorted key-value map in the underlying HFile. It uses the key to address all related columns for that key (row), so it seems to be row based storage? HBase stores the entire row together, with columns represented as KeyValues. This is also called a cell in HBase. It is appreciated if anyone could clarify my confusions. Any related documents or code for more details are welcome. thanks in advance, Lin
Re: column based or row based storage for HBase?
On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma lin...@gmail.com wrote: Hi guys, I am wondering whether HBase is using column based storage or row based storage? - I read some technical documents which mentioned that an advantage of HBase is using column based storage to store similar data together to foster compression. So it means same columns of different rows are stored together; Probably what you read was in the context of Column Families. HBase has the concept of a column family similar to Google's Bigtable. And the store files on disk are per column family. All columns of a given column family are in one store file and columns of a different column family are in a different file. - But I also learned HBase is a sorted key-value map in the underlying HFile. It uses the key to address all related columns for that key (row), so it seems to be row based storage? HBase stores the entire row together, with columns represented as KeyValues. This is also called a cell in HBase. It is appreciated if anyone could clarify my confusions. Any related documents or code for more details are welcome. thanks in advance, Lin
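A small sketch of the "one set of store files per column family" point, using the admin API of that era; the table and family names are placeholders:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class TwoFamilyTable {
  public static void main(String[] args) throws Exception {
    HTableDescriptor desc = new HTableDescriptor("metrics"); // placeholder name
    // Short family names: the family is stored with every KeyValue, so
    // single-letter names keep per-cell overhead down.
    desc.addFamily(new HColumnDescriptor("d")); // "data" family
    desc.addFamily(new HColumnDescriptor("m")); // "meta" family
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    admin.createTable(desc);
    // On HDFS each region now gets separate store files under .../d/ and
    // .../m/; a row's cells are split between the two files by family.
  }
}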
Re: sync on writes
On Wed, Aug 1, 2012 at 9:29 AM, lars hofhansl lhofha...@yahoo.com wrote: sync is a fluffy term in HDFS. HDFS has hsync and hflush. hflush forces all current changes at a DFSClient to all replica nodes (but not to disk). Until HDFS-744 hsync was identical to hflush. After HDFS-744 hsync can be used to force data to disk at the replicas. When HBase refers to sync the hflush semantics are meant (at least until HBASE-5954 is finished). I.e. a sync here ensures that the replica nodes have seen the changes, which is what you want. So when you say since another copy is always there on the replica nodes, that is only guaranteed after an hflush (again, which HBase calls sync). I've also written about this here: http://hadoop-hbase.blogspot.com/2012/05/hbase-hdfs-and-durable-sync.html -- Lars Thanks, this post is very helpful From: Mohit Anchlia mohitanch...@gmail.com To: user@hbase.apache.org Sent: Tuesday, July 31, 2012 6:09 PM Subject: sync on writes In the HBase book it is mentioned that the default behaviour of a write is to call sync on each node before sending replica copies to the nodes in the pipeline. Is there a reason this was kept as the default? Because if data is getting written on multiple nodes then the likelihood of losing data is really low, since another copy is always there on the replica nodes. Is it ok to make this sync async, and is it advisable?
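For reference, the write-durability knobs of that era look roughly like this; a sketch, not a recommendation, since both widen the window of data loss on a crash (table, family, and qualifier names are placeholders):

import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DurabilityKnobs {
  public static void main(String[] args) {
    // Table-level: group WAL hflush()es instead of flushing on every write.
    HTableDescriptor desc = new HTableDescriptor("t");
    desc.setDeferredLogFlush(true);

    // Per-write: skip the WAL entirely (fastest, least durable).
    Put p = new Put(Bytes.toBytes("row1"));
    p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    p.setWriteToWAL(false);
  }
}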
Re: Region server failure question
On Wed, Aug 1, 2012 at 12:52 PM, Mohammad Tariq donta...@gmail.com wrote: Hello Mohit, If the replication factor is set to some value > 1, then the data is still present on some other node (perhaps within the same rack or a different one). And, as far as this post is concerned, it tells us about Write Ahead Logs, i.e. data that is still not written onto the disk. This is different from the data written in HFiles, i.e. the persistent data. If the regionserver fails while the data is still being written, the data can be recovered by replaying the edits from the WAL file. Please let me know if you disagree. I understand that there is no data loss. However, it looks like all the regions on the specific region server are unavailable until they come up elsewhere? It looks like all client read and write calls for those key ranges would fail until the new region server splits the logs and brings the regions up. Regards, Mohammad Tariq On Thu, Aug 2, 2012 at 12:11 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I was reading the blog http://www.cloudera.com/blog/2012/07/hbase-log-splitting/ and it looks like if a region server fails then all the regions on that region server are unavailable until the regions are assigned to a different region server. Does it mean all the key ranges for the failed region server are unavailable for reads and writes until the regions are available on some other region server? If so then how to deal with failures while real time data might be flowing into HBase.
Re: Null row key
HBase 0.90.4 On Tue, Jul 31, 2012 at 4:18 PM, Michael Segel michael_se...@hotmail.com wrote: Which release? On Jul 31, 2012, at 5:13 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am seeing a null row key and I am wondering how I got the nulls in there. Is it possible when using HBaseClient that a null row might have got inserted?
Re: Null row key
Not sure how, but I am getting one null row per 9 writes when I do a GET and call result.getRow(). Is it even possible to write null rows? On Tue, Jul 31, 2012 at 4:49 PM, Mohit Anchlia mohitanch...@gmail.com wrote: HBase 0.90.4 On Tue, Jul 31, 2012 at 4:18 PM, Michael Segel michael_se...@hotmail.com wrote: Which release? On Jul 31, 2012, at 5:13 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am seeing a null row key and I am wondering how I got the nulls in there. Is it possible when using HBaseClient that a null row might have got inserted?
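One plausible explanation, though the thread never confirms it: a Get that matches nothing returns an empty Result, and Result.getRow() is null on an empty Result, so the "null rows" may be read misses rather than stored null keys. A guard would look like:

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class NullRowCheck {
  public static byte[] safeGetRow(HTable table, byte[] rowKey) throws Exception {
    Result r = table.get(new Get(rowKey));
    if (r.isEmpty()) {
      return null; // no cells matched: getRow() would be null here anyway
    }
    return r.getRow(); // safe to use, the row exists
  }
}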
Re: Cluster load
On Fri, Jul 27, 2012 at 6:03 PM, Alex Baranau alex.barano...@gmail.com wrote: Yeah, your row keys start with \x00, which is (byte) 0. This is not the same as '0' (which is (byte) 48). You know what to fix now ;) I made the required changes and it seems to be load balancing pretty well. I do have a follow up question around how to interpret the output of hbase shell. If I want to visually calculate the length of the row key, can I assume that \x00\x00 is equal to 2 bytes? I am just trying to get my head around understanding the hex format displayed in the shell. \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7'\x05\x11 column=S_T_MTX:\x00\x00?\xB8, timestamp=1343670017892, value=1343670136312 \xBF Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Jul 27, 2012 at 8:43 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Fri, Jul 27, 2012 at 4:51 PM, Alex Baranau alex.barano...@gmail.com wrote: Can you scan your table and show one record? I guess you might be confusing Bytes.toBytes("0") vs new byte[] {(byte) 0} that I mentioned in the other thread. I.e. looks like the first region holds records whose key starts with any byte up to '0', which is (byte) 48. Hence, if you set the first byte of your key to anything from (byte) 0 - (byte) 9, all of them will fall into the first region, which holds records with prefixes (byte) 0 - (byte) 48. Could you check that? I thought that if I give Bytes.toBytes("0") it really means that the row keys starting with 0 will go in that region. Here is my code that creates a row key and splits using the admin util. I also am including the output of hbase shell scan after the code. public static byte[][] splitRegionsSessionTimeline(int start, int end) { byte[][] splitKeys = new byte[end][]; // the first region starting with empty key will be created // automatically for (int i = 0; i < splitKeys.length; i++) { splitKeys[i] = Bytes.toBytes(String.valueOf(i)); } return splitKeys; } public static byte [] getRowKey(MetricType metricName, Long timestamp, Short bucketNo, char rowDelim){ byte [] result = null; int rowSize = getRowSize(); ByteBuffer b = ByteBuffer.allocate(rowSize); // Bucket No 0-9, randomly b.putShort(bucketNo); // Row Delimiter b.putChar(rowDelim); b.putShort(metricName.getId()); long reverseOrderEpoch = getReverseBaseTimeStamp(metricName,timestamp); b.putLong(reverseOrderEpoch); result = b.array(); return result; } from hbase shell scan table: \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gK, timestamp=1343350528865, value=1343350646443 F \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gL, timestamp=1343350528866, value=1343350646444 F \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gU, timestamp=1343350528874, value=1343350646453 F \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gZ, timestamp=1343350528880, value=1343350646458 F Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Jul 27, 2012 at 7:24 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Fri, Jul 27, 2012 at 11:48 AM, Alex Baranau alex.barano...@gmail.com wrote: You can read metrics [0] from JMX directly [1] or use Ganglia [2] or other third-party tools like [3] (I'm a little biased here;)).
[0] http://hbase.apache.org/book.html#hbase_metrics [1] http://hbase.apache.org/metrics.html [2] http://wiki.apache.org/hadoop/GangliaMetrics [3] http://sematext.com/spm/hbase-performance-monitoring/index.html Note that metrics values may seem a bit ugly/weird: you may have to refer to Lars' book (HBase: The Definitive Guide) to understand how some of them are calculated. There's ongoing work towards revising metrics; they should look much better in the next releases. I did flush; still, what I am seeing is that all my keys are going to the first region even though my keys have 0-9 as the first character. Is there an easy way to see why that might be? hbase shell scan only shows values in hex. SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435 column=info:regioninfo, timestamp=1343334723073, value=REGION = {NAME = 'SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435898c6f4cf11daa989 898c6f4cf11daa9895a. 5a.', STARTKEY = '', ENDKEY = '0', ENCODED = 0a5f6fadd0435898c6f4cf11daa9895a, TABLE = {{NAME = 'SESSION_TIMELINE1', FAMILIES = [{NA ME = 'S_T_MTX', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'GZ', VERSIONS = '1', TTL = '2147483647', BLOCKSIZE
Re: Cluster load
On Mon, Jul 30, 2012 at 11:58 AM, Alex Baranau alex.barano...@gmail.com wrote: Glad to hear that the answers/suggestions helped you! The format you are seeing is the output of the org.apache.hadoop.hbase.util.Bytes.toStringBinary(..) method [1]. As you can see below, for printable characters it outputs the character itself, while for non-printable characters it outputs data in the format \xNN (e.g. \x00). I.e. in your case \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7'\x05\x11\xBF - \x00\x00\x00 + ":" + \x00\x01\x7F\xFF\xFE\xC7 + "'" + \x05\x11 + \xBF, which is 3+1+6+1+2+1=14 bytes. I'd better use the Bytes.toBytesBinary(String) method, which converts back to a byte array. Or, if you are using the ResultScanner API for fetching data, just invoke Result.getRow().length. Thanks! Really appreciate your help. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] /** * Write a printable representation of a byte array. Non-printable * characters are hex escaped in the format \\x%02X, eg: * \x00 \x05 etc * * @param b array to write out * @param off offset to start at * @param len length to write * @return string output */ public static String toStringBinary(final byte [] b, int off, int len) { StringBuilder result = new StringBuilder(); try { String first = new String(b, off, len, "ISO-8859-1"); for (int i = 0; i < first.length(); ++i) { int ch = first.charAt(i) & 0xFF; if ( (ch >= '0' && ch <= '9') || (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') || "`~!@#$%^&*()-_=+[]{}\\|;:'\",./<>?".indexOf(ch) >= 0 ) { result.append(first.charAt(i)); } else { result.append(String.format("\\x%02X", ch)); } } } catch (UnsupportedEncodingException e) { LOG.error("ISO-8859-1 not supported?", e); } return result.toString(); } On Mon, Jul 30, 2012 at 1:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Fri, Jul 27, 2012 at 6:03 PM, Alex Baranau alex.barano...@gmail.com wrote: Yeah, your row keys start with \x00, which is (byte) 0. This is not the same as '0' (which is (byte) 48). You know what to fix now ;) I made the required changes and it seems to be load balancing pretty well. I do have a follow up question around how to interpret the output of hbase shell. If I want to visually calculate the length of the row key, can I assume that \x00\x00 is equal to 2 bytes? I am just trying to get my head around understanding the hex format displayed in the shell. \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7'\x05\x11 column=S_T_MTX:\x00\x00?\xB8, timestamp=1343670017892, value=1343670136312 \xBF Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Jul 27, 2012 at 8:43 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Fri, Jul 27, 2012 at 4:51 PM, Alex Baranau alex.barano...@gmail.com wrote: Can you scan your table and show one record? I guess you might be confusing Bytes.toBytes("0") vs new byte[] {(byte) 0} that I mentioned in the other thread. I.e. looks like the first region holds records whose key starts with any byte up to '0', which is (byte) 48. Hence, if you set the first byte of your key to anything from (byte) 0 - (byte) 9, all of them will fall into the first region, which holds records with prefixes (byte) 0 - (byte) 48. Could you check that? I thought that if I give Bytes.toBytes("0") it really means that the row keys starting with 0 will go in that region. Here is my code that creates a row key and splits using the admin util. I also am including the output of hbase shell scan after the code.
public static byte[][] splitRegionsSessionTimeline(int start, int end) { byte[][] splitKeys = new byte[end][]; // the first region starting with empty key will be created // automatically for (int i = 0; i < splitKeys.length; i++) { splitKeys[i] = Bytes.toBytes(String.valueOf(i)); } return splitKeys; } public static byte [] getRowKey(MetricType metricName, Long timestamp, Short bucketNo, char rowDelim){ byte [] result = null; int rowSize = getRowSize(); ByteBuffer b = ByteBuffer.allocate(rowSize); // Bucket No 0-9, randomly b.putShort(bucketNo); // Row Delimiter b.putChar(rowDelim); b.putShort(metricName.getId()); long reverseOrderEpoch = getReverseBaseTimeStamp(metricName,timestamp); b.putLong(reverseOrderEpoch); result = b.array(); return result; } from hbase shell scan table: \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gK, timestamp=1343350528865, value=1343350646443 F \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9
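Following the Bytes.toBytesBinary suggestion above, a tiny sketch of round-tripping a shell-printed key back into bytes to check its real length; the escaped string is the key from this thread:

import org.apache.hadoop.hbase.util.Bytes;

public class KeyLengthCheck {
  public static void main(String[] args) {
    // The key as printed by toStringBinary in the hbase shell; note the
    // doubled backslashes required in Java source.
    byte[] key = Bytes.toBytesBinary(
        "\\x00\\x00\\x00:\\x00\\x01\\x7F\\xFF\\xFE\\xC7'\\x05\\x11\\xBF");
    System.out.println(key.length); // prints 14
  }
}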
Re: Cluster load
On Fri, Jul 27, 2012 at 6:03 PM, Alex Baranau alex.barano...@gmail.com wrote: Yeah, your row keys start with \x00, which is (byte) 0. This is not the same as '0' (which is (byte) 48). You know what to fix now ;) Thanks for checking! I'll make the required changes to my split. Is it possible to alter splits, or is the only way to re-create the table? Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Jul 27, 2012 at 8:43 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Fri, Jul 27, 2012 at 4:51 PM, Alex Baranau alex.barano...@gmail.com wrote: Can you scan your table and show one record? I guess you might be confusing Bytes.toBytes("0") vs new byte[] {(byte) 0} that I mentioned in the other thread. I.e. looks like the first region holds records whose key starts with any byte up to '0', which is (byte) 48. Hence, if you set the first byte of your key to anything from (byte) 0 - (byte) 9, all of them will fall into the first region, which holds records with prefixes (byte) 0 - (byte) 48. Could you check that? I thought that if I give Bytes.toBytes("0") it really means that the row keys starting with 0 will go in that region. Here is my code that creates a row key and splits using the admin util. I also am including the output of hbase shell scan after the code. public static byte[][] splitRegionsSessionTimeline(int start, int end) { byte[][] splitKeys = new byte[end][]; // the first region starting with empty key will be created // automatically for (int i = 0; i < splitKeys.length; i++) { splitKeys[i] = Bytes.toBytes(String.valueOf(i)); } return splitKeys; } public static byte [] getRowKey(MetricType metricName, Long timestamp, Short bucketNo, char rowDelim){ byte [] result = null; int rowSize = getRowSize(); ByteBuffer b = ByteBuffer.allocate(rowSize); // Bucket No 0-9, randomly b.putShort(bucketNo); // Row Delimiter b.putChar(rowDelim); b.putShort(metricName.getId()); long reverseOrderEpoch = getReverseBaseTimeStamp(metricName,timestamp); b.putLong(reverseOrderEpoch); result = b.array(); return result; } from hbase shell scan table: \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gK, timestamp=1343350528865, value=1343350646443 F \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gL, timestamp=1343350528866, value=1343350646444 F \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gU, timestamp=1343350528874, value=1343350646453 F \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gZ, timestamp=1343350528880, value=1343350646458 F Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Jul 27, 2012 at 7:24 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Fri, Jul 27, 2012 at 11:48 AM, Alex Baranau alex.barano...@gmail.com wrote: You can read metrics [0] from JMX directly [1] or use Ganglia [2] or other third-party tools like [3] (I'm a little biased here;)). [0] http://hbase.apache.org/book.html#hbase_metrics [1] http://hbase.apache.org/metrics.html [2] http://wiki.apache.org/hadoop/GangliaMetrics [3] http://sematext.com/spm/hbase-performance-monitoring/index.html Note that metrics values may seem a bit ugly/weird: you may have to refer to Lars' book (HBase: The Definitive Guide) to understand how some of them are calculated. There's ongoing work towards revising metrics; they should look much better in the next releases.
I did flush; still, what I am seeing is that all my keys are going to the first region even though my keys have 0-9 as the first character. Is there an easy way to see why that might be? hbase shell scan only shows values in hex. SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435 column=info:regioninfo, timestamp=1343334723073, value=REGION = {NAME = 'SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435898c6f4cf11daa989 898c6f4cf11daa9895a. 5a.', STARTKEY = '', ENDKEY = '0', ENCODED = 0a5f6fadd0435898c6f4cf11daa9895a, TABLE = {{NAME = 'SESSION_TIMELINE1', FAMILIES = [{NA ME = 'S_T_MTX', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'GZ', VERSIONS = '1', TTL = '2147483647', BLOCKSIZE = '65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}} SESSION_TIMELINE1,0,1343334722986.79e03d78a784 column=info:regioninfo, timestamp=1343334723116, value=REGION = {NAME = 'SESSION_TIMELINE1,0,1343334722986.79e03d78a784601e8daa88aa85c39 601e8daa88aa85c39854. 854.', STARTKEY = '0
Re: Bloom Filter
On Fri, Jul 27, 2012 at 7:25 AM, Alex Baranau alex.barano...@gmail.com wrote: Very good explanation (and food for thinking) about using bloom filters in HBase in the answers here: http://www.quora.com/How-are-bloom-filters-used-in-HBase. Should we put the link to it from the Apache HBase book (ref guide)? Thanks, this is helpful Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Thu, Jul 26, 2012 at 8:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Thu, Jul 26, 2012 at 1:52 PM, Minh Duc Nguyen mdngu...@gmail.com wrote: Mohit, According to HBase: The Definitive Guide, The row+column Bloom filter is useful when you cannot batch updates for a specific row, and end up with store files which all contain parts of the row. The more specific row+column filter can then identify which of the files contain the data you are requesting. Obviously, if you always load the entire row, this filter is once again hardly useful, as the region server will need to load the matching block out of each file anyway. Since the row+column filter will require more storage, you need to do the math to determine whether it is worth the extra resources. Thanks! I have timeseries data so I am thinking I should enable bloom filters for rows only ~ Minh On Thu, Jul 26, 2012 at 4:30 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Is it advisable to enable bloom filters on the column family? Also, why is it called global kill switch? Bloom Filter Configuration 2.9.1. io.hfile.bloom.enabled global kill switch io.hfile.bloom.enabled in Configuration serves as the kill switch in case something goes wrong. Default = true. -- Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
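A sketch of actually enabling it with the 0.92-era API (the BloomType enum lived under StoreFile at the time and moved in later versions; the family name is taken from the thread above):

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class BloomConfig {
  public static HColumnDescriptor rowBloomFamily() {
    HColumnDescriptor cf = new HColumnDescriptor("S_T_MTX");
    // ROW suits row-only lookups (e.g. timeseries keys); ROWCOL is the
    // row+column variant discussed above and costs more storage.
    cf.setBloomFilterType(StoreFile.BloomType.ROW);
    return cf;
  }
}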
Re: Cluster load
: 0a5f6fadd0435898c6f4cf11daa9895a, columnFamily: S_T_MTX, hfile (created by memstore flush): 1566523617482885717, size: 1993369 bytes. btw, 2MB looks weird: a very small flush size (in this case; in other cases this may happen - long story). Maybe compression does very well :) Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Jul 27, 2012 at 10:52 AM, syed kather in.ab...@gmail.com wrote: Alex Baranau, Can you please tell how you found it has 2MB of data from 0a5f6fadd0435898c6f4cf11daa9895a? I am pretty much interested to know. Thanks and Regards, S SYED ABDUL KATHER On Fri, Jul 27, 2012 at 7:51 PM, Alex Baranau alex.barano...@gmail.com wrote: From what you posted above, I guess one of the regions (0a5f6fadd0435898c6f4cf11daa9895a, note that it has 2 files ~2MB each [1], while other regions are empty) is getting hit with writes. You may want to run the flush 'mytable' command from hbase shell before looking at hdfs - this way you make sure your data is flushed to hdfs (and not hanging in Memstores). You may want to check the START/END keys of this region (via the master web ui or in .META.). Then you can compare with the keys generated by your app. This should give you some info about what's going on. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] -rwxr-xr-x 3 root root 1993369 2012-07-26 13:59 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/1566523617482885717 -rwxr-xr-x 3 root root 2003372 2012-07-26 13:57 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/7665015246030620502 On Fri, Jul 27, 2012 at 3:16 AM, Khang Pham khang...@gmail.com wrote: Hi, by node do you mean regionserver node? If you are referring to a RegionServer node: you can go to the hbase master web interface master:65510/master.jsp to see the load for each regionserver. That's the overall load. If you want to see load per node per table, you will need to query the .META. table (column: info:server) --K On Fri, Jul 27, 2012 at 9:07 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Is there a way to see how much data each node has per HBase table? On Thu, Jul 26, 2012 at 5:53 PM, syed kather in.ab...@gmail.com wrote: First check whether the data in hbase is consistent... check this by running hbck (bin/hbase hbck). If all the regions are consistent, now check the number of splits in localhost:60010 for the table mentioned. On Jul 27, 2012 4:02 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I added new regions and the performance didn't improve. I think it still is the load balancing issue. I want to ensure that my rows are getting distributed across the cluster. What I see is this: Could you please tell me what's the best way to see the load?
[root@dsdb4 ~]# hadoop fs -lsr /hbase/SESSION_TIMELINE1/ drwxr-xr-x - root root 3 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641 drwxr-xr-x - root root 1 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.oldlogs -rwxr-xr-x 3 root root 124 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.oldlogs/hlog.1343334723359 drwxr-xr-x - root root 0 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/S_T_MTX -rwxr-xr-x 3 root root 764 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.regioninfo drwxr-xr-x - root root 3 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854 drwxr-xr-x - root root 1 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.oldlogs -rwxr-xr-x 3 root root 124 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.oldlogs/hlog.1343334723093 drwxr-xr-x - root root 0 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/S_T_MTX -rwxr-xr-x 3 root root 764 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.regioninfo drwxr-xr-x
Re: Cluster load
On Fri, Jul 27, 2012 at 4:51 PM, Alex Baranau alex.barano...@gmail.com wrote: Can you scan your table and show one record? I guess you might be confusing Bytes.toBytes("0") vs new byte[] {(byte) 0} that I mentioned in the other thread. I.e. looks like the first region holds records whose key starts with any byte up to '0', which is (byte) 48. Hence, if you set the first byte of your key to anything from (byte) 0 - (byte) 9, all of them will fall into the first region, which holds records with prefixes (byte) 0 - (byte) 48. Could you check that? I thought that if I give Bytes.toBytes("0") it really means that the row keys starting with 0 will go in that region. Here is my code that creates a row key and splits using the admin util. I also am including the output of hbase shell scan after the code. public static byte[][] splitRegionsSessionTimeline(int start, int end) { byte[][] splitKeys = new byte[end][]; // the first region starting with empty key will be created // automatically for (int i = 0; i < splitKeys.length; i++) { splitKeys[i] = Bytes.toBytes(String.valueOf(i)); } return splitKeys; } public static byte [] getRowKey(MetricType metricName, Long timestamp, Short bucketNo, char rowDelim){ byte [] result = null; int rowSize = getRowSize(); ByteBuffer b = ByteBuffer.allocate(rowSize); // Bucket No 0-9, randomly b.putShort(bucketNo); // Row Delimiter b.putChar(rowDelim); b.putShort(metricName.getId()); long reverseOrderEpoch = getReverseBaseTimeStamp(metricName,timestamp); b.putLong(reverseOrderEpoch); result = b.array(); return result; } from hbase shell scan table: \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gK, timestamp=1343350528865, value=1343350646443 F \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gL, timestamp=1343350528866, value=1343350646444 F \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gU, timestamp=1343350528874, value=1343350646453 F \x00\x00\x00:\x00\x01\x7F\xFF\xFE\xC7:\x10@\x9 column=S_T_MTX:\x00\x00gZ, timestamp=1343350528880, value=1343350646458 F Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Jul 27, 2012 at 7:24 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Fri, Jul 27, 2012 at 11:48 AM, Alex Baranau alex.barano...@gmail.com wrote: You can read metrics [0] from JMX directly [1] or use Ganglia [2] or other third-party tools like [3] (I'm a little biased here;)). [0] http://hbase.apache.org/book.html#hbase_metrics [1] http://hbase.apache.org/metrics.html [2] http://wiki.apache.org/hadoop/GangliaMetrics [3] http://sematext.com/spm/hbase-performance-monitoring/index.html Note that metrics values may seem a bit ugly/weird: you may have to refer to Lars' book (HBase: The Definitive Guide) to understand how some of them are calculated. There's ongoing work towards revising metrics; they should look much better in the next releases. I did flush; still, what I am seeing is that all my keys are going to the first region even though my keys have 0-9 as the first character. Is there an easy way to see why that might be? hbase shell scan only shows values in hex. SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435 column=info:regioninfo, timestamp=1343334723073, value=REGION = {NAME = 'SESSION_TIMELINE1,,1343334722986.0a5f6fadd0435898c6f4cf11daa989 898c6f4cf11daa9895a.
5a.', STARTKEY = '', ENDKEY = '0', ENCODED = 0a5f6fadd0435898c6f4cf11daa9895a, TABLE = {{NAME = 'SESSION_TIMELINE1', FAMILIES = [{NA ME = 'S_T_MTX', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'GZ', VERSIONS = '1', TTL = '2147483647', BLOCKSIZE = '65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}} SESSION_TIMELINE1,0,1343334722986.79e03d78a784 column=info:regioninfo, timestamp=1343334723116, value=REGION = {NAME = 'SESSION_TIMELINE1,0,1343334722986.79e03d78a784601e8daa88aa85c39 601e8daa88aa85c39854. 854.', STARTKEY = '0', ENDKEY = '1', ENCODED = 79e03d78a784601e8daa88aa85c39854, TABLE = {{NAME = 'SESSION_TIMELINE1', FAMILIES = [{ NAME = 'S_T_MTX', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'GZ', VERSIONS = '1', TTL = '2147483647', BLOCKSIZE = '65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}} SESSION_TIMELINE1,1,1343334722987.1f0735a7e085 column=info:regioninfo, timestamp=1343334723154, value=REGION = {NAME = 'SESSION_TIMELINE1,1,1343334722987.1f0735a7e08504357d0bca07e6772 04357d0bca07e6772a75. a75.', STARTKEY = '1', ENDKEY = '2', ENCODED = 1f0735a7e08504357d0bca07e6772a75, TABLE = {{NAME = 'SESSION_TIMELINE1', FAMILIES
Re: Row distribution
On Thu, Jul 26, 2012 at 7:16 AM, Alex Baranau alex.barano...@gmail.com wrote: Looks like you have only one region in your table. Right? If you want your writes to be distributed from the start (without waiting for HBase to fill the table enough to split it into many regions), you should pre-split your table. In your case you can pre-split the table into 10 regions (just an example, you can define more), with start keys: '' (empty), 1, 2, ..., 9 [1]. Thanks a lot! Is there any specific best practice on how many regions one should split a table into? Btw, since you are salting your keys to achieve distribution, you might also find this small lib helpful, which implements most of the stuff for you [2]. I'll take a look Hope this helps. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] byte[][] splitKeys = new byte[9][]; // the first region starting with empty key will be created automatically for (int i = 0; i < splitKeys.length; i++) { splitKeys[i] = Bytes.toBytes(String.valueOf(i + 1)); } HBaseAdmin admin = new HBaseAdmin(conf); admin.createTable(tableDescriptor, splitKeys); [2] https://github.com/sematext/HBaseWD http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ On Wed, Jul 25, 2012 at 7:54 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Wed, Jul 25, 2012 at 6:53 AM, Alex Baranau alex.barano...@gmail.com wrote: Hi Mohit, 1. When talking about a particular table: For viewing rows distribution you can check out how regions are distributed. And each region is defined by the start/stop key, so depending on your key format, etc. you can see which records go into each region. You can see the regions distribution in the web ui as Adrien mentioned. It may also be handy for you to query the .META. table [1] which holds regions info. In cases when you use random keys or when you're just not sure how data is distributed in key buckets (which are regions), you may also want to look at the HBase data on HDFS [2]. Since data is stored for each region separately, you can see the size on HDFS each one occupies. I did a scan and the data looks as pasted below. It appears all my writes are going to just one server. My keys are of this type [0-9]:[current timestamp]. The number between 0-9 is generated randomly. I thought by having this random number I'd be able to place my keys on multiple nodes. How should I approach this such that I am able to use other nodes as well? SESSION_TIMELINE1,,1343074465420.5831bbac53e59 column=info:regioninfo, timestamp=1343170773523, value=REGION = {NAME = 'SESSION_TIMELINE1,,1343074465420.5831bbac53e591c609918c0e2d7da7 1c609918c0e2d7da7bf. bf.', STARTKEY = '', ENDKEY = '', ENCODED = 5831bbac53e591c609918c0e2d7da7bf, TABLE = {{NAME = 'SESSION_TIMELINE1', FAMILIES = [{NAM E = 'S_T_MTX', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'GZ', VERSIONS = '1', TTL = '2147483647', BLOCKSIZE = ' 65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}} SESSION_TIMELINE1,,1343074465420.5831bbac53e59 column=info:server, timestamp=1343178912655, value=dsdb3.:60020 1c609918c0e2d7da7bf. 2. When talking about the whole cluster, it makes sense to use a cluster monitoring tool [3], to find out more about overall load distribution, regions of multiple tables distribution, request counts, and many more such things. And of course, you can use the HBase Java API to fetch some data about the cluster state as well. I guess you should start looking at it from the HBaseAdmin class.
Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] hbase(main):001:0 scan '.META.', {LIMIT=1, STARTROW=mytable,,} ROW COLUMN+CELL mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845. column=info:regioninfo, timestamp=1341279432625, value=REGION = {NAME = 'mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.', STARTKEY = 'chicago', ENDKEY = 'new_york', ENCODED = fd61cd7ef426d2f233a4cd7e8b73845, TABLE = {{NAME = 'mytable', FAMILIES = [{NAME = 'job', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'NONE', VERSIONS = '1', TTL = '2147483647', BLOCKSIZE = '65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}} mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845. column=info:server, timestamp=1341279432673, value=myserver:60020 mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845. column=info:serverstartcode, timestamp=1341279432673, value=1341267474257 1 row(s) in 0.1980 seconds [2] ubuntu@ip-10-80-47-73:~$ sudo -u hdfs hadoop fs
Re: Row distribution
On Thu, Jul 26, 2012 at 7:16 AM, Alex Baranau alex.barano...@gmail.com wrote: Looks like you have only one region in your table. Right? If you want your writes to be distributed from the start (without waiting for HBase to fill the table enough to split it into many regions), you should pre-split your table. In your case you can pre-split the table into 10 regions (just an example, you can define more), with start keys: '' (empty), 1, 2, ..., 9 [1]. Just one more question: in the split keys that you described below, is it based on the first byte value of the key? Btw, since you are salting your keys to achieve distribution, you might also find this small lib helpful, which implements most of the stuff for you [2]. Hope this helps. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] byte[][] splitKeys = new byte[9][]; // the first region starting with empty key will be created automatically for (int i = 0; i < splitKeys.length; i++) { splitKeys[i] = Bytes.toBytes(String.valueOf(i + 1)); } HBaseAdmin admin = new HBaseAdmin(conf); admin.createTable(tableDescriptor, splitKeys); [2] https://github.com/sematext/HBaseWD http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ On Wed, Jul 25, 2012 at 7:54 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Wed, Jul 25, 2012 at 6:53 AM, Alex Baranau alex.barano...@gmail.com wrote: Hi Mohit, 1. When talking about a particular table: For viewing rows distribution you can check out how regions are distributed. And each region is defined by the start/stop key, so depending on your key format, etc. you can see which records go into each region. You can see the regions distribution in the web ui as Adrien mentioned. It may also be handy for you to query the .META. table [1] which holds regions info. In cases when you use random keys or when you're just not sure how data is distributed in key buckets (which are regions), you may also want to look at the HBase data on HDFS [2]. Since data is stored for each region separately, you can see the size on HDFS each one occupies. I did a scan and the data looks as pasted below. It appears all my writes are going to just one server. My keys are of this type [0-9]:[current timestamp]. The number between 0-9 is generated randomly. I thought by having this random number I'd be able to place my keys on multiple nodes. How should I approach this such that I am able to use other nodes as well? SESSION_TIMELINE1,,1343074465420.5831bbac53e59 column=info:regioninfo, timestamp=1343170773523, value=REGION = {NAME = 'SESSION_TIMELINE1,,1343074465420.5831bbac53e591c609918c0e2d7da7 1c609918c0e2d7da7bf. bf.', STARTKEY = '', ENDKEY = '', ENCODED = 5831bbac53e591c609918c0e2d7da7bf, TABLE = {{NAME = 'SESSION_TIMELINE1', FAMILIES = [{NAM E = 'S_T_MTX', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'GZ', VERSIONS = '1', TTL = '2147483647', BLOCKSIZE = ' 65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}} SESSION_TIMELINE1,,1343074465420.5831bbac53e59 column=info:server, timestamp=1343178912655, value=dsdb3.:60020 1c609918c0e2d7da7bf. 2. When talking about the whole cluster, it makes sense to use a cluster monitoring tool [3], to find out more about overall load distribution, regions of multiple tables distribution, request counts, and many more such things. And of course, you can use the HBase Java API to fetch some data about the cluster state as well. I guess you should start looking at it from the HBaseAdmin class.
Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] hbase(main):001:0 scan '.META.', {LIMIT=1, STARTROW=mytable,,} ROW COLUMN+CELL mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845. column=info:regioninfo, timestamp=1341279432625, value=REGION = {NAME = 'mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.', STARTKEY = 'chicago', ENDKEY = 'new_york', ENCODED = fd61cd7ef426d2f233a4cd7e8b73845, TABLE = {{NAME = 'mytable', FAMILIES = [{NAME = 'job', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'NONE', VERSIONS = '1', TTL = '2147483647', BLOCKSIZE = '65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}} mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845. column=info:server, timestamp=1341279432673, value=myserver:60020 mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845. column=info:serverstartcode, timestamp=1341279432673, value=1341267474257 1 row(s) in 0.1980 seconds [2] ubuntu@ip-10-80-47-73:~$ sudo -u hdfs hadoop fs -du
Re: Row distribution
On Thu, Jul 26, 2012 at 10:34 AM, Alex Baranau alex.barano...@gmail.comwrote: Is there any specific best practice on how many regions one should split a table into? As always, it depends. Usually you don't want your RegionServers to serve more than 50 regions or so. The fewer the better. But at the same time you usually want your regions to be distributed over the whole cluster (so that you use all power). So, it might make sense to start with one region per RS (if your writes are more or less evenly distributed across pre-splitted regions) if you don't know about you data size. If you know that you'll need to have more regions because of how big is your data, then you might create more regions at the start (with pre-splitting), so that you avoid region splits operations (you really want to avoid them if you can). Of course, you need to take into account other tables in your cluster as well. I.e. usually not more than 50 regions total per regionserver. Thanks for the detailed explanation. I understand the regions per regionserver, which is essentially range of rows distributed accross the cluster for a given table. But who decides on how many regionservers to have in the cluster? Just one more question, in the split keys that you described below, is it based on the first byte value of the Key? yes. And the first byte contains readable char, because of Bytes.ToBytes(String.valueOf(i)). If you want to prefix with (byte) 0, ..., (byte) 9 (i.e. with 0x00, 0x01, ..., 0x09) then no need to convert to String. How different is this mechanism as compared to regionsplitter that uses default string md5 split. Just trying to understand the difference in how different the key range is. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Thu, Jul 26, 2012 at 11:43 AM, Mohit Anchlia mohitanch...@gmail.com wrote: On Thu, Jul 26, 2012 at 7:16 AM, Alex Baranau alex.barano...@gmail.com wrote: Looks like you have only one region in your table. Right? If you want your writes to be distributed from the start (without waiting for HBase to fill table enough to split it in many regions), you should pre-split your table. In your case you can pre-split table with 10 regions (just an example, you can define more), with start keys: , 1, 2, ..., 9 [1]. Just one more question, in the split keys that you described below, is it based on the first byte value of the Key? Btw, since you are salting your keys to achieve distribution, you might also find this small lib helpful which implements most of the stuff for you [2]. Hope this helps. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] byte[][] splitKeys = new byte[9][]; // the first region starting with empty key will be created automatically for (int i = 1; i splitKeys.length; i++) { splitKeys[i] = Bytes.toBytes(String.valueOf(i)); } HBaseAdmin admin = new HBaseAdmin(conf); admin.createTable(tableDescriptor, splitKeys); [2] https://github.com/sematext/HBaseWD http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ On Wed, Jul 25, 2012 at 7:54 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Wed, Jul 25, 2012 at 6:53 AM, Alex Baranau alex.barano...@gmail.com wrote: Hi Mohit, 1. When talking about particular table: For viewing rows distribution you can check out how regions are distributed. And each region defined by the start/stop key, so depending on your key format, etc. 
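A minimal sketch of the raw-byte variant Alex describes: if the salt is the single byte 0x00..0x09 rather than the characters '0'..'9', the split points are built directly, with no String conversion:

// Split points 0x01..0x09; rows salted with 0x00 land in the
// automatically created first region.
byte[][] splitKeys = new byte[9][];
for (int i = 0; i < splitKeys.length; i++) {
  splitKeys[i] = new byte[] { (byte) (i + 1) };
}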
you can see which records go into each region. You can see the regions distribution in web ui as Adrien mentioned. It may also be handy for you to query .META. table [1] which holds regions info. In cases when you use random keys or when you just not sure how data is distributed in key buckets (which are regions), you may also want to look at HBase data on HDFS [2]. Since data is stored for each region separately, you can see the size on the HDFS each one occupies. I did a scan and the data looks like as pasted below. It appears all my writes are going to just one server. My keys are of this type [0-9]:[current timestamp]. Number between 0-9 is generated randomly. I thought by having this random number I'll be able to place my keys on multiple nodes. How should I approach this such that I am able to use other nodes as well? SESSION_TIMELINE1,,1343074465420.5831bbac53e59 column=info:regioninfo, timestamp=1343170773523, value=REGION = {NAME = 'SESSION_TIMELINE1,,1343074465420.5831bbac53e591c609918c0e2d7da7 1c609918c0e2d7da7bf. bf
Re: Row distribution
On Thu, Jul 26, 2012 at 1:29 PM, Alex Baranau alex.barano...@gmail.comwrote: But who decides on how many regionservers to have in the cluster? RegionServer is a process started on each slave in your cluster. So the number of RS is the same as the number of slaves. You might want to take a look at one of Intro to HBase presentations (which have pictures!) [1] How different is this mechanism as compared to regionsplitter that uses default string md5 split. Just trying to understand the difference in how different the key range is. You can use any of the splitter algorithm, but note that it probably will not take into account the row keys you are going to use. E.g.: * if your row keys have format countrystatecompany... and * you know that you will have most of the data about US companies (if e.g. this is your target audience) then * based on the example I gave, you can create regions defined by these start keys: US US_FL US_KN US_MS US_NC US_VM V so that data is more or less evenly distributed (note: there's no need to split other countries in regions as they they will have small amount of data). Thanks for great explanation!! No standard splitter will know what your data is (at the time of creation of the table). Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] http://blog.sematext.com/2012/07/09/introduction-to-hbase/ http://blog.sematext.com/2012/07/09/intro-to-hbase-internals-and-schema-desig/ or any other intro to hbase presentations over the web. On Thu, Jul 26, 2012 at 3:50 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Thu, Jul 26, 2012 at 10:34 AM, Alex Baranau alex.barano...@gmail.com wrote: Is there any specific best practice on how many regions one should split a table into? As always, it depends. Usually you don't want your RegionServers to serve more than 50 regions or so. The fewer the better. But at the same time you usually want your regions to be distributed over the whole cluster (so that you use all power). So, it might make sense to start with one region per RS (if your writes are more or less evenly distributed across pre-splitted regions) if you don't know about you data size. If you know that you'll need to have more regions because of how big is your data, then you might create more regions at the start (with pre-splitting), so that you avoid region splits operations (you really want to avoid them if you can). Of course, you need to take into account other tables in your cluster as well. I.e. usually not more than 50 regions total per regionserver. Thanks for the detailed explanation. I understand the regions per regionserver, which is essentially range of rows distributed accross the cluster for a given table. But who decides on how many regionservers to have in the cluster? Just one more question, in the split keys that you described below, is it based on the first byte value of the Key? yes. And the first byte contains readable char, because of Bytes.ToBytes(String.valueOf(i)). If you want to prefix with (byte) 0, ..., (byte) 9 (i.e. with 0x00, 0x01, ..., 0x09) then no need to convert to String. How different is this mechanism as compared to regionsplitter that uses default string md5 split. Just trying to understand the difference in how different the key range is. 
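A sketch of the country/state example as code, reusing the admin and table descriptor from the earlier pre-split sketch; the split points are hand-picked so that extra regions sit where the data (US rows) is expected to concentrate:

String[] points = { "US", "US_FL", "US_KN", "US_MS", "US_NC", "US_VM", "V" };
byte[][] splitKeys = new byte[points.length][];
for (int i = 0; i < points.length; i++) {
  splitKeys[i] = Bytes.toBytes(points[i]);
}
admin.createTable(desc, splitKeys); // seven split points => eight regions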
Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Thu, Jul 26, 2012 at 11:43 AM, Mohit Anchlia mohitanch...@gmail.com wrote: On Thu, Jul 26, 2012 at 7:16 AM, Alex Baranau alex.barano...@gmail.com wrote: Looks like you have only one region in your table. Right? If you want your writes to be distributed from the start (without waiting for HBase to fill table enough to split it in many regions), you should pre-split your table. In your case you can pre-split table with 10 regions (just an example, you can define more), with start keys: , 1, 2, ..., 9 [1]. Just one more question, in the split keys that you described below, is it based on the first byte value of the Key? Btw, since you are salting your keys to achieve distribution, you might also find this small lib helpful which implements most of the stuff for you [2]. Hope this helps. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] byte[][] splitKeys = new byte[9][]; // the first region starting with empty key will be created automatically for (int i = 1; i splitKeys.length; i++) { splitKeys[i] = Bytes.toBytes(String.valueOf(i)); } HBaseAdmin admin = new
Re: Bloom Filter
On Thu, Jul 26, 2012 at 1:52 PM, Minh Duc Nguyen mdngu...@gmail.com wrote: Mohit, According to HBase: The Definitive Guide, "The row+column Bloom filter is useful when you cannot batch updates for a specific row, and end up with store files which all contain parts of the row. The more specific row+column filter can then identify which of the files contain the data you are requesting. Obviously, if you always load the entire row, this filter is once again hardly useful, as the region server will need to load the matching block out of each file anyway. Since the row+column filter will require more storage, you need to do the math to determine whether it is worth the extra resources." Thanks! I have timeseries data, so I am thinking I should enable bloom filters for rows only. ~ Minh On Thu, Jul 26, 2012 at 4:30 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Is it advisable to enable bloom filters on the column family? Also, why is it called a global kill switch? Bloom Filter Configuration 2.9.1. io.hfile.bloom.enabled global kill switch io.hfile.bloom.enabled in Configuration serves as the kill switch in case something goes wrong. Default = true.
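For timeseries rows that are read back whole, the row-level (rather than row+column) bloom is indeed the cheaper choice. A minimal sketch of setting it through the client API, assuming a 0.92-era client and the S_T_MTX family used elsewhere in this digest:

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.regionserver.StoreFile;

// ROW blooms answer "is this row key in this store file at all?",
// which is enough when entire rows are fetched; ROWCOL would cost
// more space for no benefit here.
HColumnDescriptor family = new HColumnDescriptor("S_T_MTX");
family.setBloomFilterType(StoreFile.BloomType.ROW);
// add 'family' to the HTableDescriptor before createTable/modifyTable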
Re: Cluster load
Is there a way to see how much data does each node have per Hbase table? On Thu, Jul 26, 2012 at 5:53 PM, syed kather in.ab...@gmail.com wrote: First check whether the data in hbase is consistent ... check this by running hbck (bin/hbase hbck ) If all the region is consistent . Now check no of splits in localhost:60010 for the table mention .. On Jul 27, 2012 4:02 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I added new regions and the performance didn't improve. I think it still is the load balancing issue. I want to ensure that my rows are getting distrbuted accross cluster. What I see is this: Could you please tell me what's the best way to see the load? [root@dsdb4 ~]# hadoop fs -lsr /hbase/SESSION_TIMELINE1/ drwxr-xr-x - root root 3 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641 drwxr-xr-x - root root 1 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.oldlogs -rwxr-xr-x 3 root root 124 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.oldlogs/hlog.1343334723359 drwxr-xr-x - root root 0 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/S_T_MTX -rwxr-xr-x 3 root root 764 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/8c02c8ed87e1a023ece8d8090a364641/.regioninfo drwxr-xr-x - root root 3 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854 drwxr-xr-x - root root 1 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.oldlogs -rwxr-xr-x 3 root root 124 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.oldlogs/hlog.1343334723093 drwxr-xr-x - root root 0 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/S_T_MTX -rwxr-xr-x 3 root root 764 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/79e03d78a784601e8daa88aa85c39854/.regioninfo drwxr-xr-x - root root 3 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba drwxr-xr-x - root root 1 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba/.oldlogs -rwxr-xr-x 3 root root 124 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba/.oldlogs/hlog.1343334723240 drwxr-xr-x - root root 0 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba/S_T_MTX -rwxr-xr-x 3 root root 764 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/1054fe861f199a23ebef8f49f99c4aba/.regioninfo drwxr-xr-x - root root 3 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a drwxr-xr-x - root root 1 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a/.oldlogs -rwxr-xr-x 3 root root 124 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a/.oldlogs/hlog.1343334723171 drwxr-xr-x - root root 0 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a/S_T_MTX -rwxr-xr-x 3 root root 764 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/04295ef1d74887a107a013810f5aa26a/.regioninfo drwxr-xr-x - root root 3 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf drwxr-xr-x - root root 1 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/.oldlogs -rwxr-xr-x 3 root root 124 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/.oldlogs/hlog.1343334723397 drwxr-xr-x - root root 0 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/S_T_MTX -rwxr-xr-x 3 root root 762 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/486b20400be4a901d92ecded96d737cf/.regioninfo drwxr-xr-x - root root 
4 2012-07-26 13:57 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a drwxr-xr-x - root root 0 2012-07-26 13:59 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.tmp drwxr-xr-x - root root 1 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.oldlogs -rwxr-xr-x 3 root root 124 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.oldlogs/hlog.1343334723004 drwxr-xr-x - root root 2 2012-07-26 13:59 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX -rwxr-xr-x 3 root root 1993369 2012-07-26 13:59 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/1566523617482885717 -rwxr-xr-x 3 root root 2003372 2012-07-26 13:57 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/S_T_MTX/7665015246030620502 -rwxr-xr-x 3 root root 760 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/0a5f6fadd0435898c6f4cf11daa9895a/.regioninfo drwxr-xr-x - root root 3 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/5eb02340b918e80e5018c908845a8495 drwxr-xr-x - root root 1 2012-07-26 13:32 /hbase/SESSION_TIMELINE1/5eb02340b918e80e5018c908845a8495/.oldlogs -rwxr-xr-x 3 root root 124 2012-07-26 13:32 /hbase
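To get from a listing like this to "how much table data sits on each node": hadoop fs -du gives bytes per region directory, and .META. maps each region to the region server currently hosting it; joining the two by the region's encoded name gives per-node totals for the table. A sketch using the commands already shown in this thread:

$ hadoop fs -du /hbase/SESSION_TIMELINE1
hbase(main):001:0> scan '.META.', {COLUMNS => 'info:server', STARTROW => 'SESSION_TIMELINE1,,'}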
Re: Row distribution
On Wed, Jul 25, 2012 at 6:53 AM, Alex Baranau alex.barano...@gmail.comwrote: Hi Mohit, 1. When talking about particular table: For viewing rows distribution you can check out how regions are distributed. And each region defined by the start/stop key, so depending on your key format, etc. you can see which records go into each region. You can see the regions distribution in web ui as Adrien mentioned. It may also be handy for you to query .META. table [1] which holds regions info. In cases when you use random keys or when you just not sure how data is distributed in key buckets (which are regions), you may also want to look at HBase data on HDFS [2]. Since data is stored for each region separately, you can see the size on the HDFS each one occupies. I did a scan and the data looks like as pasted below. It appears all my writes are going to just one server. My keys are of this type [0-9]:[current timestamp]. Number between 0-9 is generated randomly. I thought by having this random number I'll be able to place my keys on multiple nodes. How should I approach this such that I am able to use other nodes as well? SESSION_TIMELINE1,,1343074465420.5831bbac53e59 column=info:regioninfo, timestamp=1343170773523, value=REGION = {NAME = 'SESSION_TIMELINE1,,1343074465420.5831bbac53e591c609918c0e2d7da7 1c609918c0e2d7da7bf. bf.', STARTKEY = '', ENDKEY = '', ENCODED = 5831bbac53e591c609918c0e2d7da7bf, TABLE = {{NAME = 'SESSION_TIMELINE1', FAMILIES = [{NAM E = 'S_T_MTX', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'GZ', VERSIONS = '1', TTL = '2147483647', BLOCKSIZE = ' 65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}} SESSION_TIMELINE1,,1343074465420.5831bbac53e59 column=info:server, timestamp=1343178912655, value=dsdb3.:60020 1c609918c0e2d7da7bf. 2. When talking about whole cluster, it makes sense to use cluster monitoring tool [3], to find out more about overall load distribution, regions of multiple tables distribution, requests amount, and many more such things. And of course, you can use HBase Java API to fetch some data of the cluster state as well. I guess you should start looking at it from HBaseAdmin class. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] hbase(main):001:0 scan '.META.', {LIMIT=1, STARTROW=mytable,,} ROW COLUMN+CELL mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845. column=info:regioninfo, timestamp=1341279432625, value=REGION = {NAME = 'mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.', STARTKEY = 'chicago', ENDKEY = 'new_york', ENCODED = fd61cd7ef426d2f233a4cd7e8b73845, TABLE = {{NAME = 'mytable', FAMILIES = [{NAME = 'job', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'NONE', VERSIONS = '1', TTL = '2147483647', BLOCKSIZE = '65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}} mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845. column=info:server, timestamp=1341279432673, value=myserver:60020 mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845. 
column=info:serverstartcode, timestamp=1341279432673, value=1341267474257 1 row(s) in 0.1980 seconds [2] ubuntu@ip-10-80-47-73:~$ sudo -u hdfs hadoop fs -du /hbase/mytable Found 130 items 3397hdfs://hbase.master/hbase/mytable /02925d3c335bff7e273f392324f16dca 2682163424 hdfs://hbase.master/hbase/mytable /03231b8ae2b73317c4858b1a85c09ad2 1038862956 hdfs://hbase.master/hbase/mytable /04f911571593e931a9a3d9e2a6616236 1039181555 hdfs://hbase.master/hbase/mytable /0a177633196cae7b158836181d69dc0f 107612 hdfs://hbase.master/hbase/mytable /0d52fc477c41a9a236803234d44c7c06 [3] You can get data from JMX directly using any tool you like or use: * Ganglia * SPM monitoring ( http://sematext.com/spm/hbase-performance-monitoring/index.html) * others On Wed, Jul 25, 2012 at 1:59 AM, Adrien Mogenet adrien.moge...@gmail.com wrote: From the web-interface, you can have such statistics when viewing the details of a table. You can also develop your own balance viewer through the HBase API (list of RS, regions, storeFiles, their size, etc.) On Wed, Jul 25, 2012 at 7:32 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Is there an easy way to tell how my nodes are balanced and how the rows are distributed in the cluster? -- Adrien Mogenet 06.59.16.64.22 http://www.mogenet.me -- Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
Re: Insert blocked
On Tue, Jul 24, 2012 at 3:09 AM, Lyska Anton ant...@wildec.com wrote: Hi, after the first insert you are closing your table in the finally block. That's why the thread hangs. I thought I need to close HTableInterface to return it back to the pool. Is that not the case? On 24.07.2012 3:41, Mohit Anchlia wrote: I am now using HTablePool but still the call hangs at put. My code is something like this:

hTablePool = new HTablePool(config, MAX_POOL_SIZE);
result = new SessionTimelineDAO(hTablePool.getTable(t.name()), ColumnFamily.S_T_MTX);

public SessionTimelineDAO(HTableInterface hTableInterface, ColumnFamily cf) {
  this.tableInt = hTableInterface;
  this.cf = cf.name().getBytes();
  log.info("Table " + hTableInterface + " " + cf);
}

@Override
public void create(DataStoreModel dm) throws DataStoreException {
  if (null == dm || null == dm.getKey()) {
    log.error("DataStoreModel is invalid");
    return;
  }
  Put p = new Put(dm.getKey().array());
  for (ByteBuffer bf : dm.getCols().keySet()) {
    p.add(cf, bf.array(), dm.getColumnValue(bf).array());
  }
  try {
    log.info("In create");
    tableInt.put(p);
  } catch (IOException e) {
    log.error("Error writing", e);
    throw new DataStoreException(e);
  } finally {
    cleanUp();
  }
}

private void cleanUp() {
  if (null != tableInt) {
    try {
      tableInt.close();
    } catch (IOException e) {
      log.error("Failed while closing table interface", e);
    }
  }
}

On Mon, Jul 23, 2012 at 4:15 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Mon, Jul 23, 2012 at 3:54 PM, Elliott Clark ecl...@stumbleupon.com wrote: HTable is not thread safe[1]. It's better to use HTablePool if you want to share things across multiple threads.[2] 1 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html 2 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html Thanks! I'll change my code to use HTablePool On Mon, Jul 23, 2012 at 3:48 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am writing a stress tool to test my specific use case. In my current implementation HTable is a global static variable that I initialize just once and use across multiple threads. Is this ok? My row key consists of (timestamp - (timestamp % 1000)) and cols are counters. What I am seeing is that when I run my test, after the first row is created the application just hangs. I just wanted to check if there are obvious things that I should watch out for. I am currently testing a few threads in Eclipse, but I'll still try and generate a stack trace
Re: Insert blocked
I removed the close call and it works. So it looks like the close call should happen only at the very end. But then how does the pool know that the object is available if it's not returned to the pool explicitly? On Tue, Jul 24, 2012 at 10:00 AM, Mohit Anchlia mohitanch...@gmail.com wrote: On Tue, Jul 24, 2012 at 3:09 AM, Lyska Anton ant...@wildec.com wrote: Hi, after the first insert you are closing your table in the finally block. That's why the thread hangs. I thought I need to close HTableInterface to return it back to the pool. Is that not the case? On 24.07.2012 3:41, Mohit Anchlia wrote: I am now using HTablePool but still the call hangs at put. My code is something like this:

hTablePool = new HTablePool(config, MAX_POOL_SIZE);
result = new SessionTimelineDAO(hTablePool.getTable(t.name()), ColumnFamily.S_T_MTX);

public SessionTimelineDAO(HTableInterface hTableInterface, ColumnFamily cf) {
  this.tableInt = hTableInterface;
  this.cf = cf.name().getBytes();
  log.info("Table " + hTableInterface + " " + cf);
}

@Override
public void create(DataStoreModel dm) throws DataStoreException {
  if (null == dm || null == dm.getKey()) {
    log.error("DataStoreModel is invalid");
    return;
  }
  Put p = new Put(dm.getKey().array());
  for (ByteBuffer bf : dm.getCols().keySet()) {
    p.add(cf, bf.array(), dm.getColumnValue(bf).array());
  }
  try {
    log.info("In create");
    tableInt.put(p);
  } catch (IOException e) {
    log.error("Error writing", e);
    throw new DataStoreException(e);
  } finally {
    cleanUp();
  }
}

private void cleanUp() {
  if (null != tableInt) {
    try {
      tableInt.close();
    } catch (IOException e) {
      log.error("Failed while closing table interface", e);
    }
  }
}

On Mon, Jul 23, 2012 at 4:15 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Mon, Jul 23, 2012 at 3:54 PM, Elliott Clark ecl...@stumbleupon.com wrote: HTable is not thread safe[1]. It's better to use HTablePool if you want to share things across multiple threads.[2] 1 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html 2 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html Thanks! I'll change my code to use HTablePool On Mon, Jul 23, 2012 at 3:48 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am writing a stress tool to test my specific use case. In my current implementation HTable is a global static variable that I initialize just once and use across multiple threads. Is this ok? My row key consists of (timestamp - (timestamp % 1000)) and cols are counters. What I am seeing is that when I run my test, after the first row is created the application just hangs. I just wanted to check if there are obvious things that I should watch out for. I am currently testing a few threads in Eclipse, but I'll still try and generate a stack trace
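For what it's worth, the answer appears to be version-dependent: on 0.92-era clients, close() on a pooled HTableInterface is exactly what returns it to the pool (HBASE-4054), while older 0.90.x clients return tables with pool.putTable(table) and close() really closes the table, which would match the hang described here. A sketch of the get-use-close pattern, reusing config, MAX_POOL_SIZE, and the Put p from the code above:

// Created once and shared across threads:
HTablePool pool = new HTablePool(config, MAX_POOL_SIZE);

// Per unit of work: borrow, use, return.
HTableInterface table = pool.getTable("SESSION_TIMELINE1");
try {
  table.put(p);
} finally {
  table.close();           // 0.92+: hands the table back to the pool
  // on 0.90.x clients use: pool.putTable(table);
}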
Re: Insert blocked
On Tue, Jul 24, 2012 at 12:55 PM, Elliott Clark ecl...@stumbleupon.com wrote: Thanks, I hadn't seen that before. Do you mean in your code you close the HTableInterface after each put/get/scan operation? On Mon, Jul 23, 2012 at 10:29 PM, lars hofhansl lhofha...@yahoo.com wrote: Or you can pre-create your HConnection and Threadpool and use the HTable constructor that takes these as arguments. That is faster and less byzantine compared to the HTablePool monster. Also see here (if you don't mind the plug): http://hadoop-hbase.blogspot.com/2011/12/long-running-hbase-clients.html -- Lars - Original Message - From: Elliott Clark ecl...@stumbleupon.com To: user@hbase.apache.org Cc: Sent: Monday, July 23, 2012 3:54 PM Subject: Re: Insert blocked HTable is not thread safe[1]. It's better to use HTablePool if you want to share things across multiple threads.[2] 1 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html 2 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html On Mon, Jul 23, 2012 at 3:48 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am writing a stress tool to test my specific use case. In my current implementation HTable is a global static variable that I initialize just once and use across multiple threads. Is this ok? My row key consists of (timestamp - (timestamp % 1000)) and cols are counters. What I am seeing is that when I run my test, after the first row is created the application just hangs. I just wanted to check if there are obvious things that I should watch out for. I am currently testing a few threads in Eclipse, but I'll still try and generate a stack trace
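A sketch of the pattern Lars describes; the HTable constructor and HConnectionManager.createConnection come from HBASE-4805, the change his blog post covers, so whether they exist depends on your client version and this should be treated as illustrative:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class LongLivedClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Shared, created once for the life of the client:
    HConnection connection = HConnectionManager.createConnection(conf);
    ExecutorService workers = Executors.newFixedThreadPool(10);
    // Cheap, short-lived, single-threaded handles on the shared pieces:
    HTable table = new HTable(Bytes.toBytes("SESSION_TIMELINE1"), connection, workers);
    try {
      Put p = new Put(Bytes.toBytes("somerow"));
      p.add(Bytes.toBytes("S_T_MTX"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      table.put(p);
    } finally {
      table.close();          // closes the handle, not the shared pieces
    }
    workers.shutdown();
    connection.close();
  }
}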
Re: Enabling compression
Thanks! I was trying it out and I see this message when I use COMPRESSION, but it works when I don't use it. Am I doing something wrong?

hbase(main):012:0> create 't2', {NAME => 'f1', VERSIONS => 1, COMPRESSION => 'LZO'}
ERROR: org.apache.hadoop.hbase.client.RegionOfflineException: Only 0 of 1 regions are online; retries exhausted.
hbase(main):014:0> create 't3', {NAME => 'f1', VERSIONS => 1}
0 row(s) in 1.1260 seconds

On Tue, Jul 24, 2012 at 1:37 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Tue, Jul 24, 2012 at 1:34 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Also, if I understand it correctly, this will enable the compression for new puts but will not compress the cells already stored, right? For that, we need to run a major compaction of the table, which will rewrite all the cells and so compact them? Yeah, although you may not want to recompact everything all at once in a live system. You can just let it happen naturally through cycles of flushes and compactions, it's all fine. J-D
Re: Enabling compression
On Tue, Jul 24, 2012 at 2:04 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote: I bet that your compression libraries are not available to HBase.. Run the compression test utility and see if it can find LZO That seems to be the case for SNAPPY. However, I do have snappy installed and it works with hadoop just fine and HBase is running on the same cluster. Is there something special I need to do for HBase? Regards, Dhaval - Original Message - From: Mohit Anchlia mohitanch...@gmail.com To: user@hbase.apache.org Cc: Sent: Tuesday, 24 July 2012 4:39 PM Subject: Re: Enabling compression Thanks! I was trying it out and I see this message when I use COMPRESSION, but it works when I don't use it. Am I doing something wrong? hbase(main):012:0 create 't2', {NAME = 'f1', VERSIONS = 1, COMPRESSION = 'LZO'} ERROR: org.apache.hadoop.hbase.client.RegionOfflineException: Only 0 of 1 regions are online; retries exhausted. hbase(main):014:0 create 't3', {NAME = 'f1', VERSIONS = 1} 0 row(s) in 1.1260 seconds On Tue, Jul 24, 2012 at 1:37 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Tue, Jul 24, 2012 at 1:34 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Also, if I understand it correctly, this will enable the compression for the new put but will not compresse the actual cells already stored right? For that, we need to run a major compaction of the table which will rewrite all the cells and so compact them? Yeah, although you may not want to recompact everything all at once in a live system. You can just let it happen naturally through cycles of flushes and compactions, it's all fine. J-D
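The utility Dhaval refers to ships with HBase and can be run per codec; the path is arbitrary (the tool writes and re-reads a small test file there). A sketch:

$ hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/testfile snappy
$ hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/testfile lzo

If it throws, the codec's native libraries are not visible to HBase on that node; they need to be on HBase's classpath/java.library.path, not just Hadoop's. Relatedly, setting hbase.regionserver.codecs in hbase-site.xml (e.g. to snappy,lzo) makes a region server refuse to start when a listed codec is broken, failing fast instead of leaving regions offline as in the create above.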
Row distribution
Is there an easy way to tell how my nodes are balanced and how the rows are distributed in the cluster?
drop table
I am trying to drop one of the tables, but the shell tells me to run major_compact. I have a couple of questions: 1. How to see if this table has more than one region? 2. And why do I need to run major compact?

hbase(main):010:0* drop 'SESSION_TIMELINE'
ERROR: Table SESSION_TIMELINE is enabled. Disable it first.'
Here is some help for this command: Drop the named table. Table must first be disabled. If table has more than one region, run a major compaction on .META.: hbase> major_compact ".META."
Re: drop table
Thanks! but I am still trying to understand these 2 questions: 1. How to see if this table has more than one region? 2. And why do I need to run major compact if I have more than one region? On Mon, Jul 23, 2012 at 1:14 PM, Mohammad Tariq donta...@gmail.com wrote: Hi Mohit, A table must be disabled first in order to get deleted. Regards, Mohammad Tariq On Tue, Jul 24, 2012 at 1:38 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I am trying to drop one of the tables but on the shell I get run major_compact. I have couple of questions: 1. How to see if this table has more than one region? 2. And why do I need to run major compact hbase(main):010:0* drop 'SESSION_TIMELINE' ERROR: Table SESSION_TIMELINE is enabled. Disable it first.' Here is some help for this command: Drop the named table. Table must first be disabled. If table has more than one region, run a major compaction on .META.: hbase major_compact .META.
Re: drop table
Thanks everyone for your help On Mon, Jul 23, 2012 at 1:40 PM, Mohammad Tariq donta...@gmail.com wrote: Also, we don't have to worry about compaction under normal conditions. When something is written to HBase, it is first written to an in-memory store (memstore); once this memstore reaches a certain size, it is flushed to disk into a store file (everything is also written immediately to a log file for durability). The store files created on disk are immutable. Sometimes the store files are merged together; this is done by a process called compaction. Regards, Mohammad Tariq On Tue, Jul 24, 2012 at 2:00 AM, Mohammad Tariq donta...@gmail.com wrote: The HBase processes expose a web-based user interface (in short, UI), which you can use to gain insight into the cluster's state, as well as the tables it hosts. Just point your web browser to http://hmaster:60010. Although the majority of the functionality is read-only, there are a few selected operations you can trigger through the UI (like splitting and compaction). Regards, Mohammad Tariq On Tue, Jul 24, 2012 at 1:56 AM, Rob Roland r...@simplymeasured.com wrote: You don't have to run the major compaction - the shell is doing that for you. You must disable the table first, like:

disable 'session_timeline'
drop 'session_timeline'

See the admin.rb file:

def drop(table_name)
  tableExists(table_name)
  raise ArgumentError, "Table #{table_name} is enabled. Disable it first.'" if enabled?(table_name)
  @admin.deleteTable(table_name)
  flush(org.apache.hadoop.hbase.HConstants::META_TABLE_NAME)
  major_compact(org.apache.hadoop.hbase.HConstants::META_TABLE_NAME)
end

On Mon, Jul 23, 2012 at 1:22 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Thanks! But I am still trying to understand these 2 questions: 1. How to see if this table has more than one region? 2. And why do I need to run major compact if I have more than one region? On Mon, Jul 23, 2012 at 1:14 PM, Mohammad Tariq donta...@gmail.com wrote: Hi Mohit, A table must be disabled first in order to get deleted. Regards, Mohammad Tariq On Tue, Jul 24, 2012 at 1:38 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I am trying to drop one of the tables, but the shell tells me to run major_compact. I have a couple of questions: 1. How to see if this table has more than one region? 2. And why do I need to run major compact? hbase(main):010:0* drop 'SESSION_TIMELINE' ERROR: Table SESSION_TIMELINE is enabled. Disable it first.' Here is some help for this command: Drop the named table. Table must first be disabled. If table has more than one region, run a major compaction on .META.: hbase> major_compact ".META."
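On the first question, two quick ways to see how many regions a table has: the table's page in the master UI at :60010 (mentioned above), or a .META. scan from the shell like the one shown later in this digest, where every returned row whose key starts with the table name is one region. A sketch:

hbase(main):001:0> scan '.META.', {COLUMNS => 'info:regioninfo', STARTROW => 'SESSION_TIMELINE,,'}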
Insert blocked
I am writing a stress tool to test my specific use case. In my current implementation HTable is a global static variable that I initialize just once and use across multiple threads. Is this ok? My row key consists of (timestamp - (timestamp % 1000)) and cols are counters. What I am seeing is that when I run my test, after the first row is created the application just hangs. I just wanted to check if there are obvious things that I should watch out for. I am currently testing a few threads in Eclipse, but I'll still try and generate a stack trace
Re: Insert blocked
On Mon, Jul 23, 2012 at 3:54 PM, Elliott Clark ecl...@stumbleupon.com wrote: HTable is not thread safe[1]. It's better to use HTablePool if you want to share things across multiple threads.[2] 1 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html 2 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html Thanks! I'll change my code to use HTablePool On Mon, Jul 23, 2012 at 3:48 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am writing a stress tool to test my specific use case. In my current implementation HTable is a global static variable that I initialize just once and use across multiple threads. Is this ok? My row key consists of (timestamp - (timestamp % 1000)) and cols are counters. What I am seeing is that when I run my test, after the first row is created the application just hangs. I just wanted to check if there are obvious things that I should watch out for. I am currently testing a few threads in Eclipse, but I'll still try and generate a stack trace
Re: Insert blocked
I am now using HTablePool but still the call hangs at put. My code is something like this:

hTablePool = new HTablePool(config, MAX_POOL_SIZE);
result = new SessionTimelineDAO(hTablePool.getTable(t.name()), ColumnFamily.S_T_MTX);

public SessionTimelineDAO(HTableInterface hTableInterface, ColumnFamily cf) {
  this.tableInt = hTableInterface;
  this.cf = cf.name().getBytes();
  log.info("Table " + hTableInterface + " " + cf);
}

@Override
public void create(DataStoreModel dm) throws DataStoreException {
  if (null == dm || null == dm.getKey()) {
    log.error("DataStoreModel is invalid");
    return;
  }
  Put p = new Put(dm.getKey().array());
  for (ByteBuffer bf : dm.getCols().keySet()) {
    p.add(cf, bf.array(), dm.getColumnValue(bf).array());
  }
  try {
    log.info("In create");
    tableInt.put(p);
  } catch (IOException e) {
    log.error("Error writing", e);
    throw new DataStoreException(e);
  } finally {
    cleanUp();
  }
}

private void cleanUp() {
  if (null != tableInt) {
    try {
      tableInt.close();
    } catch (IOException e) {
      log.error("Failed while closing table interface", e);
    }
  }
}

On Mon, Jul 23, 2012 at 4:15 PM, Mohit Anchlia mohitanch...@gmail.com wrote: On Mon, Jul 23, 2012 at 3:54 PM, Elliott Clark ecl...@stumbleupon.com wrote: HTable is not thread safe[1]. It's better to use HTablePool if you want to share things across multiple threads.[2] 1 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html 2 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html Thanks! I'll change my code to use HTablePool On Mon, Jul 23, 2012 at 3:48 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am writing a stress tool to test my specific use case. In my current implementation HTable is a global static variable that I initialize just once and use across multiple threads. Is this ok? My row key consists of (timestamp - (timestamp % 1000)) and cols are counters. What I am seeing is that when I run my test, after the first row is created the application just hangs. I just wanted to check if there are obvious things that I should watch out for. I am currently testing a few threads in Eclipse, but I'll still try and generate a stack trace
Re: HBase shell
On Fri, Jul 20, 2012 at 6:11 PM, Alex Kozlov ale...@cloudera.com wrote: Is it printable? You should be able to call any Java class from the HBase shell:

hbase(main):005:0> org.apache.hadoop.hbase.util.Bytes.toString("Hello HBase".to_java_bytes)
=> "Hello HBase"
hbase(main):006:0> org.apache.hadoop.hbase.util.Bytes.toString("\x48\x65\x6c\x6c\x6f\x20\x48\x42\x61\x73\x65".to_java_bytes)
=> "Hello HBase"

Thanks for the pointers. I'll try it out. -- Alex K On Fri, Jul 20, 2012 at 5:39 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Is there a command on the shell that converts bytes into a char array when using the HBase shell command line? It's all in hex format hbase(main):004:0> scan 'SESSION_TIMELINE' ROW COLUMN+CELL \x00\x00\x00\x01\x7F\xFF\xE8\x034\x04\xCF\xFF column=S_T_MTX:\x07A\xB8\xB1, timestamp=1342826789668, value=Hello \x00\x00\x00\x01\x7F\xFF\xE81\xDC\xE4\x07\xFF column=S_T_MTX:\x04@ \xBB\x94, timestamp=1342826589226, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x09\xA2\x7F column=S_T_MTX:\x00\x00O?, timestamp=1342830980018, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x1B\xF1\xFF column=S_T_MTX:\x00\x00\x82\x19, timestamp=1342829793047, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x1C\xDC_ column=S_T_MTX:\x00\x00S, timestamp=1342829721025, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x1D\xC6\xBF column=S_T_MTX:\x00\x00\x8Az, timestamp=1342829675205, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y \x85\xDF column=S_T_MTX:\x00\x00\x89\xDE, timestamp=1342829495072, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y!p? column=S_T_MTX:\x00\x00b\xEA, timestamp=1342829425086, value=Hello
Re: HBase shell
On Fri, Jul 20, 2012 at 6:18 PM, Dhaval Shah prince_mithi...@yahoo.co.inwrote: Mohit, HBase shell is a JRuby wrapper and as such has all functions available which are available using Java API.. So you can import the Bytes class and the do a Bytes.toString() similar to what you'd do in Java Ah I see, you mean I change the HBase shell code? Regards, Dhaval From: Mohit Anchlia mohitanch...@gmail.com To: user@hbase.apache.org Sent: Friday, 20 July 2012 8:39 PM Subject: HBase shell Is there a command on the shell that convert byte into char array when using HBase shell command line? It's all in hex format hbase(main):004:0 scan 'SESSION_TIMELINE' ROW COLUMN+CELL \x00\x00\x00\x01\x7F\xFF\xE8\x034\x04\xCF\xFF column=S_T_MTX:\x07A\xB8\xB1, timestamp=1342826789668, value=Hello \x00\x00\x00\x01\x7F\xFF\xE81\xDC\xE4\x07\xFF column=S_T_MTX:\x04@ \xBB\x94, timestamp=1342826589226, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x09\xA2\x7F column=S_T_MTX:\x00\x00O?, timestamp=1342830980018, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x1B\xF1\xFF column=S_T_MTX:\x00\x00\x82\x19, timestamp=1342829793047, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x1C\xDC_ column=S_T_MTX:\x00\x00S, timestamp=1342829721025, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y\x1D\xC6\xBF column=S_T_MTX:\x00\x00\x8Az, timestamp=1342829675205, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y \x85\xDF column=S_T_MTX:\x00\x00\x89\xDE, timestamp=1342829495072, value=Hello \x00\x00\x00\x01\x7F\xFF\xFE\xC7Y!p? column=S_T_MTX:\x00\x00b\xEA, timestamp=1342829425086, value=Hello
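No changes to the shell's source are needed; the shell is a live JRuby session, so any Java class is callable, and an import just makes it terser. A sketch (the byte values are illustrative):

hbase(main):001:0> import org.apache.hadoop.hbase.util.Bytes
hbase(main):002:0> Bytes.toString("\x48\x65\x6c\x6c\x6f".to_java_bytes)
=> "Hello"
hbase(main):003:0> Bytes.toLong("\x00\x00\x00\x00\x00\x00\x00\x2A".to_java_bytes)
=> 42

Bytes.toLong is the handy one here, since the row keys in this scan are packed longs rather than printable text.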
Scanning columns
I am designing an HBase schema as a timeseries model. Taking advice from the Definitive Guide and tsdb, I am planning to use my row key as metricname:Long.MAX_VALUE - basetimestamp. And the column names would be timestamp - basetimestamp. My col names would then look like 1,2,3,4,5 .. for instance. I am looking at the Java API to see if I can do a range scan of columns; can I say fetch me columns starting at 1 and stopping at 4? I see a scanner class for row scans, but I am wondering if columns are sorted before storing and if I can do a range scan on them too.
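Columns (qualifiers) within a family are stored sorted, and a qualifier range scan is expressible with ColumnRangeFilter in 0.92-era clients. A sketch, assuming the qualifiers were written with Bytes.toBytes(int) and that 'table' is an open HTable:

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Fetch only qualifiers in [1, 4], inclusive on both ends.
Scan scan = new Scan();
scan.setFilter(new ColumnRangeFilter(Bytes.toBytes(1), true,
                                     Bytes.toBytes(4), true));
ResultScanner scanner = table.getScanner(scan);
try {
  for (Result r : scanner) {
    // each Result holds only the columns inside the qualifier range
  }
} finally {
  scanner.close();
}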
Client side hbase-site.xml config
I just wanted to check if most people copy hbase-site.xml into the classpath, or use some properties file as a resource and then set it in the Configuration object returned by HBaseConfiguration.create();
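Both work; a sketch of the two options (the quorum hostnames are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Option 1: hbase-site.xml on the classpath; create() picks it up.
Configuration conf = HBaseConfiguration.create();

// Option 2: no client-side hbase-site.xml; set the few properties the
// client actually needs programmatically (values here are placeholders).
conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
conf.set("hbase.zookeeper.property.clientPort", "2181");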
Re: HBase Schema Design for clickstream data
Analysis includes:
- Visitor level
- Session level - visitors could have multiple levels
- Page hits, conversions - popular pages, sequence of pages hit in one session
- Orders purchased - mostly determined by URL and query parameters

How should I go about designing the schema? Thanks Sent from my iPad On Jun 27, 2012, at 2:01 PM, Amandeep Khurana ama...@gmail.com wrote: Mohit, What would be your read patterns later on? Are you going to read per session, or for a time period, or for a set of users, or process through the entire dataset every time? That would play an important role in defining your keys and columns. -Amandeep On Tue, Jun 26, 2012 at 1:34 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am starting out with a new application where I need to store users' clickstream data. I'll have Visitor Id, session id along with other page related data. I am wondering if I should just key off a randomly generated session id and store all the page related data as columns inside that row, assuming that this would also give good distribution across region servers. In a session a user could send 100s of HTML requests and get responses. If someone is already doing this in HBase I would like to learn more about how they have designed the schema.
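For illustration only, one hypothetical key layout that serves per-visitor and per-session reads (the two access paths implied above): a salt plus the visitor id keeps a visitor's data together while spreading visitors across regions, a reversed timestamp returns newest sessions first, and the session id disambiguates. Every name in this sketch is made up:

// Hypothetical composite row key: [salt][visitorId][Long.MAX_VALUE - ts][sessionId]
long visitorId = 12345L;          // placeholder
long sessionId = 67890L;          // placeholder
long ts = System.currentTimeMillis();
byte[] rowKey = Bytes.add(
    Bytes.add(new byte[] { (byte) (visitorId % 10) },   // salt bucket, as elsewhere in this digest
              Bytes.toBytes(visitorId)),
    Bytes.add(Bytes.toBytes(Long.MAX_VALUE - ts),       // reversed ts => newest first
              Bytes.toBytes(sessionId)));
// Page hits could then live in one family, keyed by hit sequence number.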
HBase Schema Design for clickstream data
I am starting out with a new application where I need to store users' clickstream data. I'll have Visitor Id, session id along with other page related data. I am wondering if I should just key off a randomly generated session id and store all the page related data as columns inside that row, assuming that this would also give good distribution across region servers. In a session a user could send 100s of HTML requests and get responses. If someone is already doing this in HBase I would like to learn more about how they have designed the schema.
HBase and Consistency in CAP
Why is HBase considered high in consistency, and why does it give up partition tolerance? My understanding is that the failure of one data node still doesn't impact clients, as they would re-adjust the list of available data nodes.
Re: HBase and Consistency in CAP
Where can I read more on this specific subject? Based on your answer I have more questions, but I want to read more specific information about how it works and why it's designed that way. On Fri, Dec 2, 2011 at 11:59 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: No, data is only served by one region server (even if it resides on multiple data nodes). If it dies, clients need to wait for the log replay and region reassignment. J-D On Fri, Dec 2, 2011 at 11:57 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Why is HBase consisdered high in consistency and that it gives up parition tolerance? My understanding is that failure of one data node still doesn't impact client as they would re-adjust the list of available data nodes.
Re: HBase and Consistency in CAP
Thanks for the overview. It's helpful. Can you also help me understand why 2 region servers for the same row keys can't be running on the nodes where blocks are being replicated? I am assuming all the logs/HFiles etc are already being replicated so if one region server fails other region server is still taking reads/writes. On Fri, Dec 2, 2011 at 12:15 PM, Ian Varley ivar...@salesforce.com wrote: Mohit, Yeah, those are great places to go and learn. To fill in a bit more on this topic: partition-tolerance usually refers to the idea that you could have a complete disconnection between N sets of machines in your data center, but still be taking writes and serving reads from all the servers. Some NoSQL databases can do this (to a degree), but HBase cannot; the master and ZK quorum must be accessible from any machine that's up and running the cluster. Individual machines can go down, as J-D said, and the master will reassign those regions to another region server. So, imagine you had a network switch fail that disconnected 10 machines in a 20-machine cluster; you wouldn't have 2 baby 10-machine clusters, like you might with some other software; you'd just have 10 machines down (and probably a significant interruption while the master replays logs on the remaining 10). That would also require that the underlying HDFS cluster (assuming it's on the same machines) was keeping replicas of the blocks on different racks (which it does by default), otherwise there's no hope. HBase makes this trade-off intentionally, because in real-world scenarios, there aren't too many cases where a true network partition would be survived by the rest of your stack, either (e.g. imagine a case where application servers can't access a relational database server because of a partition; you're just down). The focus of HBase fault tolerance is recovering from isolated machine failures, not the collapse of your infrastructure. Ian On Dec 2, 2011, at 2:03 PM, Jean-Daniel Cryans wrote: Get the HBase book: http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100 And/Or read the Bigtable paper. J-D On Fri, Dec 2, 2011 at 12:01 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Where can I read more on this specific subject? Based on your answer I have more questions, but I want to read more specific information about how it works and why it's designed that way. On Fri, Dec 2, 2011 at 11:59 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: No, data is only served by one region server (even if it resides on multiple data nodes). If it dies, clients need to wait for the log replay and region reassignment. J-D On Fri, Dec 2, 2011 at 11:57 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Why is HBase consisdered high in consistency and that it gives up parition tolerance? My understanding is that failure of one data node still doesn't impact client as they would re-adjust the list of available data nodes.
Re: HBase and Consistency in CAP
Thanks. I am having a bit of trouble understanding how a random node failure is different from a network partition. In both cases there is an impact clearly visible to the user (the time it takes to fail over and replay logs)? On Fri, Dec 2, 2011 at 1:42 PM, Ian Varley ivar...@salesforce.com wrote: The simple answer is that HBase isn't architected such that 2 region servers can simultaneously host the same region. In addition to being much simpler from an architecture point of view, that also allows for user-facing features that would be difficult or impossible to achieve otherwise: single-row put atomicity, atomic check-and-set operations, atomic increment operations, etc.--things that are only possible if you know for sure that exactly one machine is in control of the row. Ian On Dec 2, 2011, at 2:54 PM, Mohit Anchlia wrote: Thanks for the overview. It's helpful. Can you also help me understand why 2 region servers for the same row keys can't be running on the nodes where blocks are being replicated? I am assuming all the logs/HFiles etc are already being replicated, so if one region server fails the other region server is still taking reads/writes. On Fri, Dec 2, 2011 at 12:15 PM, Ian Varley ivar...@salesforce.com wrote: Mohit, Yeah, those are great places to go and learn. To fill in a bit more on this topic: partition-tolerance usually refers to the idea that you could have a complete disconnection between N sets of machines in your data center, but still be taking writes and serving reads from all the servers. Some NoSQL databases can do this (to a degree), but HBase cannot; the master and ZK quorum must be accessible from any machine that's up and running the cluster. Individual machines can go down, as J-D said, and the master will reassign those regions to another region server. So, imagine you had a network switch fail that disconnected 10 machines in a 20-machine cluster; you wouldn't have 2 baby 10-machine clusters, like you might with some other software; you'd just have 10 machines down (and probably a significant interruption while the master replays logs on the remaining 10). That would also require that the underlying HDFS cluster (assuming it's on the same machines) was keeping replicas of the blocks on different racks (which it does by default), otherwise there's no hope. HBase makes this trade-off intentionally, because in real-world scenarios, there aren't too many cases where a true network partition would be survived by the rest of your stack, either (e.g. imagine a case where application servers can't access a relational database server because of a partition; you're just down). The focus of HBase fault tolerance is recovering from isolated machine failures, not the collapse of your infrastructure. Ian On Dec 2, 2011, at 2:03 PM, Jean-Daniel Cryans wrote: Get the HBase book: http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100 And/Or read the Bigtable paper. J-D On Fri, Dec 2, 2011 at 12:01 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Where can I read more on this specific subject? Based on your answer I have more questions, but I want to read more specific information about how it works and why it's designed that way. On Fri, Dec 2, 2011 at 11:59 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: No, data is only served by one region server (even if it resides on multiple data nodes). If it dies, clients need to wait for the log replay and region reassignment.
J-D On Fri, Dec 2, 2011 at 11:57 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Why is HBase considered high in consistency, and why does it give up partition tolerance? My understanding is that the failure of one data node still doesn't impact clients, as they would re-adjust the list of available data nodes.
Atomicity questions
I have some questions about ACID after reading this page: http://hbase.apache.org/acid-semantics.html - Atomicity point 5: the row must either be a=1,b=1,c=1 or a=2,b=2,c=2 and must not be something like a=1,b=2,c=1. How is this handled internally in HBase such that the above is possible?
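For context: that guarantee applies to a single Put, which the region server applies under a per-row lock and (in 0.92-era internals) an MVCC read point, so readers see all of the mutation or none of it; the acid-semantics page remains the authority on the exact wording. A client-side sketch, with cf and table assumed to exist:

// All three columns ride in one Put against one row, so the row moves
// atomically from a=1,b=1,c=1 to a=2,b=2,c=2 and is never seen mixed.
Put p = new Put(Bytes.toBytes("row1"));
p.add(cf, Bytes.toBytes("a"), Bytes.toBytes("2"));
p.add(cf, Bytes.toBytes("b"), Bytes.toBytes("2"));
p.add(cf, Bytes.toBytes("c"), Bytes.toBytes("2"));
table.put(p);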