Re: Delete rowKey in hexadecimal array bytes.
I found the error in my code, thanks.

On 10/06/14 12:07, gortiz wrote: I think we are talking about different points :). The problem is in the comparison of the keys. When I use Hex.decode I emit KeyValues this way and it works, but it spends double the memory to store the keys:

rowKey = Bytes.toBytes(DigestUtils.shaHex(pOutput.getRow()));
KeyValue kv = new KeyValue(Bytes.toBytes(pOutput.getRow()), family, null, ts, type, null);

If I use for rowKey: rowKey = DigestUtils.sha(pOutput.getRow()); it doesn't work, and I don't know why, since it's a byte array. I coded some JUnit tests and it never deletes the keys.

On 09/06/14 13:43, Ted Yu wrote: The decoded rowkey has a timestamp. KeyValue has a timestamp field. Do these two timestamps carry the same value? Cheers

On Jun 9, 2014, at 2:02 AM, Guillermo Ortiz konstt2...@gmail.com wrote: Hi, I'm generating keys with SHA-1. As that produces a hex representation, after generating the keys I use Hex.decode to save memory, since I can store them in half the space. I have a MapReduce process which deletes some of these keys; the problem is that when I try to delete them, it doesn't work. If I don't do the parse to hex, it works. So, for example, if I put the key in SHA-1 form like b343664e210e7a7abff3625a005e65e2b0d4616 it works, but if I parse this key with Hex.decode to \xB3CfN!\x0Ezz\xBF\xF3bZ\x00^e\xE2\xB0\ column=l:dd, timestamp=1384317115000 it doesn't. I have checked the code a lot and I think it's right; plus, if I comment out the decode to hex, it works. Any clue about it? Is there any problem with what I am trying to do?

-- 
*Guillermo Ortiz*
/Big Data Developer/
Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
_http://www.bidoop.es_
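The mismatch discussed above is easy to reproduce with plain JDK classes. Below is a minimal sketch (using java.security.MessageDigest directly rather than commons-codec's DigestUtils, which wraps it): the hex-string form of a SHA-1 digest occupies 40 bytes once encoded, while the raw digest is 20 bytes, so a delete keyed on one form can never match a row written with the other.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class Sha1KeyForms {

    // Raw 20-byte SHA-1 digest, the analogue of DigestUtils.sha(row).
    static byte[] rawDigest(byte[] row) throws Exception {
        return MessageDigest.getInstance("SHA-1").digest(row);
    }

    // 40-character hex string encoded as bytes, the analogue of
    // Bytes.toBytes(DigestUtils.shaHex(row)).
    static byte[] hexDigestBytes(byte[] row) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (byte b : rawDigest(row)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        byte[] row = "some-row".getBytes(StandardCharsets.UTF_8);
        byte[] raw = rawDigest(row);
        byte[] hex = hexDigestBytes(row);
        System.out.println(raw.length); // 20
        System.out.println(hex.length); // 40
        // The two forms are never byte-equal, so a delete built from
        // one form cannot match a cell stored under the other.
        System.out.println(Arrays.equals(raw, hex)); // false
    }
}
```

So if the table was loaded with the shaHex form while the MapReduce job emits delete KeyValues keyed with the raw sha form (or vice versa), the delete markers land on row keys that don't exist. Both the writer and the deleter have to agree on one form, and, as Ted notes, the timestamp on the delete KeyValue must also cover the cell being deleted.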
Re: How to decide the next HMaster?
But is it the first one which responds to ZooKeeper, or the one which creates the znode? I don't know exactly how this process works; where could I read more about it?

On 08/04/14 18:57, Jean-Daniel Cryans wrote: It's a simple leader election via ZooKeeper. J-D

On Tue, Apr 8, 2014 at 7:18 AM, gortiz gor...@pragsis.com wrote: Could someone explain to me the process by which the next HMaster is selected when the current one goes down? I've been looking for information about it in the documentation, but I haven't found anything.
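The "simple leader election" can be pictured with a toy model (this is an illustrative sketch, not HBase's actual code): each master candidate races to create the same ephemeral master znode; the single candidate whose atomic create succeeds becomes active, and the losers wait and retry when the winner's znode disappears. Here the znode namespace is stood in for by a ConcurrentHashMap, whose putIfAbsent gives the same "exactly one creator wins" guarantee that ZooKeeper's create call provides.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ToyMasterElection {

    // Stand-in for the ZooKeeper namespace: znode path -> owner.
    static final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();
    static final String MASTER_ZNODE = "/hbase/master";

    // Atomic create: returns true only for the single candidate that
    // creates the znode first -- the analogue of a successful
    // ZooKeeper create() of an ephemeral node.
    static boolean tryBecomeMaster(String candidate) {
        return znodes.putIfAbsent(MASTER_ZNODE, candidate) == null;
    }

    // When the active master's session dies, its ephemeral znode is
    // removed by ZooKeeper and the backups race to create it again.
    static void masterDied() {
        znodes.remove(MASTER_ZNODE);
    }

    public static void main(String[] args) {
        System.out.println(tryBecomeMaster("master-1")); // true: wins the race
        System.out.println(tryBecomeMaster("master-2")); // false: stays backup
        masterDied();                                    // ephemeral node vanishes
        System.out.println(tryBecomeMaster("master-2")); // true: takes over
    }
}
```

So the answer to the question above is neither "the first to respond" in a fuzzy sense nor a negotiation: ZooKeeper serializes the create requests for the master znode, and whichever backup's create succeeds first becomes the new active master.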
Re: Lease exception when I execute large scan with filters.
Well, I guessed that, but it doesn't make too much sense because it's so slow. Right now I only have 100 rows with 1000 versions each. I have checked the size of the dataset and each row is about 700 KB (around 7 GB in total, 100 rows x 1000 versions). So it should only check 100 rows x 700 KB = 70 MB, since it just checks the newest version. How can it spend so much time checking this quantity of data? I'm generating the dataset again with a bigger blocksize (previously it was 64 KB; now it's going to be 1 MB). I could try tuning the scanning and batching parameters, but I don't think they're going to affect it too much. Another test I want to do is to generate the same dataset with just 100 versions; it should take around the same time, right? Or am I wrong?

On 10/04/14 18:08, Ted Yu wrote: It should be the newest version of each value. Cheers

On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote: Another little question: with the filter I'm using, does it check all the versions, or just the newest? I'm wondering whether, when I do a scan over the whole table, I'm looking for the value 5 in the whole dataset or just in the newest version of each value.

On 10/04/14 16:52, gortiz wrote: I was trying to check the behaviour of HBase. The cluster is a group of old computers: one master, five slaves, each one with 2 GB, so 12 GB in total. The table has a column family with 1000 columns and each column with 100 versions. There's another column family with four columns and one image of 100 KB. (I've tried without this column family as well.) The table is partitioned manually across all the slaves, so data is balanced in the cluster. I'm executing this sentence in HBase 0.94.6:

scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"}

My lease and RPC times are three minutes. Since it's a full scan of the table, I have been playing with the BLOCKCACHE as well (just disabling and enabling it, not changing its size); I thought that it was going to cause too many calls to the GC. I'm not sure about this point. I know that it's not the best way to use HBase; it's just a test. I think that it's not working because the hardware isn't enough, although I would like to try some kind of tuning to improve it.

On 10/04/14 14:21, Ted Yu wrote: Can you give us a bit more information: the HBase release you're running, and what filters are used for the scan. Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote: I got this error when I execute a full scan with filters over a table:

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)

I have read about increasing the lease time and the RPC time, but it's not working. What else could I try? The table isn't too big. I have been checking the logs from the GC, the HMaster and some RegionServers and I didn't see anything weird. I also tried a couple of caching values.
Re: Lease exception when I execute large scan with filters.
The last test I have done is to reduce the number of versions to 100, so right now I have 100 rows with 100 versions each. The times are (I got the same times for blocksizes of 64 KB and 1 MB):

100 rows, 1000 versions + blockcache: 80 s
100 rows, 1000 versions + no blockcache: 70 s
100 rows, *100* versions + blockcache: 7.3 s
100 rows, *100* versions + no blockcache: 6.1 s

What's the reason for this? I guessed HBase was smart enough not to consider old versions and just check the newest, but I reduced the size (in versions) by 10x and got a 10x improvement in performance. The filter is:

scan 'filters', {FILTER => "ValueFilter(=, 'binary:5')", STARTROW => '10100101', STOPROW => '60100201'}

On 11/04/14 09:04, gortiz wrote: Well, I guessed that, but it doesn't make too much sense because it's so slow. Right now I only have 100 rows with 1000 versions each. I have checked the size of the dataset and each row is about 700 KB (around 7 GB in total, 100 rows x 1000 versions). So it should only check 100 rows x 700 KB = 70 MB, since it just checks the newest version. How can it spend so much time checking this quantity of data? I'm generating the dataset again with a bigger blocksize (previously it was 64 KB; now it's going to be 1 MB). I could try tuning the scanning and batching parameters, but I don't think they're going to affect it too much. Another test I want to do is to generate the same dataset with just 100 versions; it should take around the same time, right? Or am I wrong?

On 10/04/14 18:08, Ted Yu wrote: It should be the newest version of each value. Cheers

On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote: Another little question: with the filter I'm using, does it check all the versions, or just the newest? I'm wondering whether, when I do a scan over the whole table, I'm looking for the value 5 in the whole dataset or just in the newest version of each value.

On 10/04/14 16:52, gortiz wrote: I was trying to check the behaviour of HBase. The cluster is a group of old computers: one master, five slaves, each one with 2 GB, so 12 GB in total. The table has a column family with 1000 columns and each column with 100 versions. There's another column family with four columns and one image of 100 KB. (I've tried without this column family as well.) The table is partitioned manually across all the slaves, so data is balanced in the cluster. I'm executing this sentence in HBase 0.94.6:

scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"}

My lease and RPC times are three minutes. Since it's a full scan of the table, I have been playing with the BLOCKCACHE as well (just disabling and enabling it, not changing its size); I thought that it was going to cause too many calls to the GC. I'm not sure about this point. I know that it's not the best way to use HBase; it's just a test. I think that it's not working because the hardware isn't enough, although I would like to try some kind of tuning to improve it.

On 10/04/14 14:21, Ted Yu wrote: Can you give us a bit more information: the HBase release you're running, and what filters are used for the scan. Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote: I got this error when I execute a full scan with filters over a table:

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)

I have read about increasing the lease time and the RPC time, but it's not working. What else could I try? The table isn't too big. I have been checking the logs from the GC, the HMaster and some RegionServers and I didn't see anything weird. I also tried a couple of caching values.
Re: Lease exception when I execute large scan with filters.
Yes, I have tried two different values for max versions: 1000 and the maximum integer value. But I want to keep those versions; I don't want to keep just 3. Imagine that I want to record a new version each minute and store a day: that's 1440 versions. Why is HBase going to read all the versions? I thought that if you don't indicate any versions it just reads the newest and skips the rest. It doesn't make too much sense to read all of them if the data is sorted and the newest version is stored at the top.

On 11/04/14 11:54, Anoop John wrote: What is the max versions setting you have set for your table's CF? When you set such a value, HBase has to keep all those versions, and during a scan it will read all of them. In 0.94 the default value for max versions is 3; I guess you have set some bigger value. If you have not, mind testing after a major compaction? -Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote: The last test I have done is to reduce the number of versions to 100, so right now I have 100 rows with 100 versions each. The times are (I got the same times for blocksizes of 64 KB and 1 MB):

100 rows, 1000 versions + blockcache: 80 s
100 rows, 1000 versions + no blockcache: 70 s
100 rows, *100* versions + blockcache: 7.3 s
100 rows, *100* versions + no blockcache: 6.1 s

What's the reason for this? I guessed HBase was smart enough not to consider old versions and just check the newest, but I reduced the size (in versions) by 10x and got a 10x improvement in performance. The filter is:

scan 'filters', {FILTER => "ValueFilter(=, 'binary:5')", STARTROW => '10100101', STOPROW => '60100201'}

On 11/04/14 09:04, gortiz wrote: Well, I guessed that, but it doesn't make too much sense because it's so slow. Right now I only have 100 rows with 1000 versions each. I have checked the size of the dataset and each row is about 700 KB (around 7 GB in total, 100 rows x 1000 versions). So it should only check 100 rows x 700 KB = 70 MB, since it just checks the newest version. How can it spend so much time checking this quantity of data? I'm generating the dataset again with a bigger blocksize (previously it was 64 KB; now it's going to be 1 MB). I could try tuning the scanning and batching parameters, but I don't think they're going to affect it too much. Another test I want to do is to generate the same dataset with just 100 versions; it should take around the same time, right? Or am I wrong?

On 10/04/14 18:08, Ted Yu wrote: It should be the newest version of each value. Cheers

On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote: Another little question: with the filter I'm using, does it check all the versions, or just the newest? I'm wondering whether, when I do a scan over the whole table, I'm looking for the value 5 in the whole dataset or just in the newest version of each value.

On 10/04/14 16:52, gortiz wrote: I was trying to check the behaviour of HBase. The cluster is a group of old computers: one master, five slaves, each one with 2 GB, so 12 GB in total. The table has a column family with 1000 columns and each column with 100 versions. There's another column family with four columns and one image of 100 KB. (I've tried without this column family as well.) The table is partitioned manually across all the slaves, so data is balanced in the cluster. I'm executing this sentence in HBase 0.94.6:

scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"}

My lease and RPC times are three minutes. Since it's a full scan of the table, I have been playing with the BLOCKCACHE as well (just disabling and enabling it, not changing its size); I thought that it was going to cause too many calls to the GC. I'm not sure about this point. I know that it's not the best way to use HBase; it's just a test. I think that it's not working because the hardware isn't enough, although I would like to try some kind of tuning to improve it.

On 10/04/14 14:21, Ted Yu wrote: Can you give us a bit more information: the HBase release you're running, and what filters are used for the scan. Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote: I got this error when I execute a full scan with filters over a table:

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597
Re: Lease exception when I execute large scan with filters.
Sorry, I didn't get why it should read all the timestamps and not just the newest if they're sorted and you didn't specify any timestamp in your filter.

On 11/04/14 12:13, Anoop John wrote: In the storage layer (HFiles in HDFS), all versions of a particular cell stay together (yes, as lexicographically ordered KVs), so during a scan we have to read all the version data. The storage layer doesn't know about the versions stuff etc. -Anoop-

On Fri, Apr 11, 2014 at 3:33 PM, gortiz gor...@pragsis.com wrote: Yes, I have tried two different values for max versions: 1000 and the maximum integer value. But I want to keep those versions; I don't want to keep just 3. Imagine that I want to record a new version each minute and store a day: that's 1440 versions. Why is HBase going to read all the versions? I thought that if you don't indicate any versions it just reads the newest and skips the rest. It doesn't make too much sense to read all of them if the data is sorted and the newest version is stored at the top.

On 11/04/14 11:54, Anoop John wrote: What is the max versions setting you have set for your table's CF? When you set such a value, HBase has to keep all those versions, and during a scan it will read all of them. In 0.94 the default value for max versions is 3; I guess you have set some bigger value. If you have not, mind testing after a major compaction? -Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote: The last test I have done is to reduce the number of versions to 100, so right now I have 100 rows with 100 versions each. The times are (I got the same times for blocksizes of 64 KB and 1 MB):

100 rows, 1000 versions + blockcache: 80 s
100 rows, 1000 versions + no blockcache: 70 s
100 rows, *100* versions + blockcache: 7.3 s
100 rows, *100* versions + no blockcache: 6.1 s

What's the reason for this? I guessed HBase was smart enough not to consider old versions and just check the newest, but I reduced the size (in versions) by 10x and got a 10x improvement in performance. The filter is:

scan 'filters', {FILTER => "ValueFilter(=, 'binary:5')", STARTROW => '10100101', STOPROW => '60100201'}

On 11/04/14 09:04, gortiz wrote: Well, I guessed that, but it doesn't make too much sense because it's so slow. Right now I only have 100 rows with 1000 versions each. I have checked the size of the dataset and each row is about 700 KB (around 7 GB in total, 100 rows x 1000 versions). So it should only check 100 rows x 700 KB = 70 MB, since it just checks the newest version. How can it spend so much time checking this quantity of data? I'm generating the dataset again with a bigger blocksize (previously it was 64 KB; now it's going to be 1 MB). I could try tuning the scanning and batching parameters, but I don't think they're going to affect it too much. Another test I want to do is to generate the same dataset with just 100 versions; it should take around the same time, right? Or am I wrong?

On 10/04/14 18:08, Ted Yu wrote: It should be the newest version of each value. Cheers

On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote: Another little question: with the filter I'm using, does it check all the versions, or just the newest? I'm wondering whether, when I do a scan over the whole table, I'm looking for the value 5 in the whole dataset or just in the newest version of each value.

On 10/04/14 16:52, gortiz wrote: I was trying to check the behaviour of HBase. The cluster is a group of old computers: one master, five slaves, each one with 2 GB, so 12 GB in total. The table has a column family with 1000 columns and each column with 100 versions. There's another column family with four columns and one image of 100 KB. (I've tried without this column family as well.) The table is partitioned manually across all the slaves, so data is balanced in the cluster. I'm executing this sentence in HBase 0.94.6:

scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"}

My lease and RPC times are three minutes. Since it's a full scan of the table, I have been playing with the BLOCKCACHE as well (just disabling and enabling it, not changing its size); I thought that it was going to cause too many calls to the GC. I'm not sure about this point. I know that it's not the best way to use HBase; it's just a test. I think that it's not working because the hardware isn't enough, although I would like to try some kind of tuning to improve it.

On 10/04/14 14:21, Ted Yu wrote: Can you give us a bit more information: the HBase release you're running, and what filters are used for the scan. Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote: I got this error when I execute a full scan with filters over a table: Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException
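Anoop's point explains the 10x numbers reported earlier in the thread. A toy model (illustrative only; the real HBase scanner has optimizations such as lazy seek, but the storage layout is the same) shows why: all versions of a cell sit adjacent in the file, so even a scan that returns only the newest version steps over every older version cell on its way to the next row, and the work grows linearly with the version count.

```java
import java.util.ArrayList;
import java.util.List;

public class VersionScanCost {

    // A toy cell: versions of the same row/column sit next to each
    // other, newest timestamp first -- as in an HFile.
    record Cell(String row, long ts, String value) {}

    // Build a flat, sorted store: `rows` rows, `versions` versions each.
    static List<Cell> buildStore(int rows, int versions) {
        List<Cell> store = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int v = versions; v >= 1; v--) {
                store.add(new Cell("row-" + r, v, "val-" + v));
            }
        }
        return store;
    }

    // "Scan" that emits only the newest version per row. The reader
    // still iterates past every older version to reach the next row,
    // so the number of cells touched grows with the version count.
    static long scanNewest(List<Cell> store) {
        long cellsTouched = 0;
        String lastRow = null;
        for (Cell c : store) {
            cellsTouched++;
            if (!c.row().equals(lastRow)) {
                lastRow = c.row(); // newest version of a new row: emit it
            }
        }
        return cellsTouched;
    }

    public static void main(String[] args) {
        System.out.println(scanNewest(buildStore(100, 100)));  // 10000 cells touched
        System.out.println(scanNewest(buildStore(100, 1000))); // 100000 cells touched
    }
}
```

100 rows x 100 versions touches 10,000 cells while 100 rows x 1000 versions touches 100,000, a 10x difference in I/O for the same 100 emitted rows, which matches the measured 7 s vs 70 s.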
Re: BlockCache for large scans.
But I think there's a direct relation between improving performance in large scans and the memory given to the memstore. As far as I understand, the memstore just works as a cache for write operations.

On 09/04/14 23:44, Ted Yu wrote: Didn't quite get what you mean, Asaf. If you're talking about HBASE-5349, please read the release note of HBASE-5349. By default, the memstore min/max range is initialized to the memstore percent:

globalMemStorePercentMinRange = conf.getFloat(MEMSTORE_SIZE_MIN_RANGE_KEY, globalMemStorePercent);
globalMemStorePercentMaxRange = conf.getFloat(MEMSTORE_SIZE_MAX_RANGE_KEY, globalMemStorePercent);

Cheers

On Wed, Apr 9, 2014 at 3:17 PM, Asaf Mesika asaf.mes...@gmail.com wrote: The Jira says it's enabled automatically. Is there an official document explaining this feature?

On Wednesday, April 9, 2014, Ted Yu yuzhih...@gmail.com wrote: Please take a look at http://www.n10k.com/blog/blockcache-101/ For D, hbase.regionserver.global.memstore.size is specified in terms of a percentage of the heap, unless you enable HBASE-5349 'Automagically tweak global memstore and block cache sizes based on workload'.

On Wed, Apr 9, 2014 at 12:24 AM, gortiz gor...@pragsis.com wrote: I've been reading the Definitive Guide and HBase in Action a little, and I found this question from Cloudera that I'm not sure about after looking at some benchmarks and documentation for HBase. Could someone explain it to me a little? I think that when you do a large scan you should disable the block cache, because the blocks are going to be swapped out a lot, so you don't get anything from the cache; I guess you're penalized anyway, since you're spending memory, GC calls and CPU on this task.

*You want to do a full table scan on your data. You decide to disable block caching to see if this improves scan performance. Will disabling block caching improve scan performance?*

A. No. Disabling block caching does not improve scan performance.
B. Yes. When you disable block caching, you free up that memory for other operations. With a full table scan, you cannot take advantage of block caching anyway because your entire table won't fit into the cache.
C. No. If you disable block caching, HBase must read each block index from disk for each scan, thereby decreasing scan performance.
D. Yes. When you disable block caching, you free up memory for the MemStore, which improves scan performance.
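The intuition in the question, that a full scan churns an LRU block cache without ever re-hitting it, can be sketched with a toy LRU cache (illustrative only; HBase's LruBlockCache is more elaborate, with single-access and multi-access priorities): scanning far more blocks than the cache holds evicts every block before it could be reused, so the hit rate stays at zero while the eviction work is still paid.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScanCacheChurn {

    // Minimal LRU cache of block-id -> block, capped at `capacity`.
    static class LruCache extends LinkedHashMap<Integer, byte[]> {
        final int capacity;
        LruCache(int capacity) {
            super(16, 0.75f, true); // access-order, like an LRU block cache
            this.capacity = capacity;
        }
        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> e) {
            return size() > capacity;
        }
    }

    // Full-table scan over `totalBlocks` blocks with caching enabled:
    // returns the number of cache hits observed.
    static int fullScanHits(int totalBlocks, int cacheCapacity) {
        LruCache cache = new LruCache(cacheCapacity);
        int hits = 0;
        for (int block = 0; block < totalBlocks; block++) {
            if (cache.containsKey(block)) {
                hits++;
            } else {
                cache.put(block, new byte[0]); // "read from disk", then cache it
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Table of 10,000 blocks, cache holds 100: a sequential full
        // scan never revisits a block while it is still cached.
        System.out.println(fullScanHits(10_000, 100)); // 0 hits
    }
}
```

Every block is seen exactly once, so every lookup misses; caching the scanned blocks only evicts data that other workloads might have wanted. That is the reasoning behind answer B, and it is why the client-side Scan.setCacheBlocks(false) knob exists, so one large scan doesn't trash the cache for everyone else.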
Lease exception when I execute large scan with filters.
I got this error when I execute a full scan with filters over a table:

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)

I have read about increasing the lease time and the RPC time, but it's not working. What else could I try? The table isn't too big. I have been checking the logs from the GC, the HMaster and some RegionServers and I didn't see anything weird. I also tried a couple of caching values.
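For reference, the two knobs mentioned above look like this in hbase-site.xml (property names as they existed in the 0.94 line; the values are illustrative, not recommendations). The scanner lease expires when the client takes too long between next() calls, so the lease period and the RPC timeout are usually raised together; the other common fix is lowering scanner caching so each next() returns to the client sooner.

```xml
<!-- hbase-site.xml: illustrative values only -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>180000</value> <!-- scanner lease: 3 minutes -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>180000</value> <!-- keep in step with the lease period -->
</property>
```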
Re: Lease exception when I execute large scan with filters.
I was trying to check the behaviour of HBase. The cluster is a group of old computers: one master, five slaves, each one with 2 GB, so 12 GB in total. The table has a column family with 1000 columns and each column with 100 versions. There's another column family with four columns and one image of 100 KB. (I've tried without this column family as well.) The table is partitioned manually across all the slaves, so data is balanced in the cluster. I'm executing this sentence in HBase 0.94.6:

scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"}

My lease and RPC times are three minutes. Since it's a full scan of the table, I have been playing with the BLOCKCACHE as well (just disabling and enabling it, not changing its size); I thought that it was going to cause too many calls to the GC. I'm not sure about this point. I know that it's not the best way to use HBase; it's just a test. I think that it's not working because the hardware isn't enough, although I would like to try some kind of tuning to improve it.

On 10/04/14 14:21, Ted Yu wrote: Can you give us a bit more information: the HBase release you're running, and what filters are used for the scan. Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote: I got this error when I execute a full scan with filters over a table:

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)

I have read about increasing the lease time and the RPC time, but it's not working. What else could I try? The table isn't too big. I have been checking the logs from the GC, the HMaster and some RegionServers and I didn't see anything weird. I also tried a couple of caching values.
Re: Lease exception when I execute large scan with filters.
Another little question: with the filter I'm using, does it check all the versions, or just the newest? I'm wondering whether, when I do a scan over the whole table, I'm looking for the value 5 in the whole dataset or just in the newest version of each value.

On 10/04/14 16:52, gortiz wrote: I was trying to check the behaviour of HBase. The cluster is a group of old computers: one master, five slaves, each one with 2 GB, so 12 GB in total. The table has a column family with 1000 columns and each column with 100 versions. There's another column family with four columns and one image of 100 KB. (I've tried without this column family as well.) The table is partitioned manually across all the slaves, so data is balanced in the cluster. I'm executing this sentence in HBase 0.94.6:

scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"}

My lease and RPC times are three minutes. Since it's a full scan of the table, I have been playing with the BLOCKCACHE as well (just disabling and enabling it, not changing its size); I thought that it was going to cause too many calls to the GC. I'm not sure about this point. I know that it's not the best way to use HBase; it's just a test. I think that it's not working because the hardware isn't enough, although I would like to try some kind of tuning to improve it.

On 10/04/14 14:21, Ted Yu wrote: Can you give us a bit more information: the HBase release you're running, and what filters are used for the scan. Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote: I got this error when I execute a full scan with filters over a table:

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)

I have read about increasing the lease time and the RPC time, but it's not working. What else could I try? The table isn't too big. I have been checking the logs from the GC, the HMaster and some RegionServers and I didn't see anything weird. I also tried a couple of caching values.
BlockCache for large scans.
I've been reading the Definitive Guide and HBase in Action a little, and I found this question from Cloudera that I'm not sure about after looking at some benchmarks and documentation for HBase. Could someone explain it to me a little? I think that when you do a large scan you should disable the block cache, because the blocks are going to be swapped out a lot, so you don't get anything from the cache; I guess you're penalized anyway, since you're spending memory, GC calls and CPU on this task.

*You want to do a full table scan on your data. You decide to disable block caching to see if this improves scan performance. Will disabling block caching improve scan performance?*

A. No. Disabling block caching does not improve scan performance.
B. Yes. When you disable block caching, you free up that memory for other operations. With a full table scan, you cannot take advantage of block caching anyway because your entire table won't fit into the cache.
C. No. If you disable block caching, HBase must read each block index from disk for each scan, thereby decreasing scan performance.
D. Yes. When you disable block caching, you free up memory for the MemStore, which improves scan performance.
Re: BlockCache for large scans.
Pretty interesting link; I'll keep it in my favorites.

On 09/04/14 16:07, Ted Yu wrote: Please take a look at http://www.n10k.com/blog/blockcache-101/ For D, hbase.regionserver.global.memstore.size is specified in terms of a percentage of the heap, unless you enable HBASE-5349 'Automagically tweak global memstore and block cache sizes based on workload'.

On Wed, Apr 9, 2014 at 12:24 AM, gortiz gor...@pragsis.com wrote: I've been reading the Definitive Guide and HBase in Action a little, and I found this question from Cloudera that I'm not sure about after looking at some benchmarks and documentation for HBase. Could someone explain it to me a little? I think that when you do a large scan you should disable the block cache, because the blocks are going to be swapped out a lot, so you don't get anything from the cache; I guess you're penalized anyway, since you're spending memory, GC calls and CPU on this task.

*You want to do a full table scan on your data. You decide to disable block caching to see if this improves scan performance. Will disabling block caching improve scan performance?*

A. No. Disabling block caching does not improve scan performance.
B. Yes. When you disable block caching, you free up that memory for other operations. With a full table scan, you cannot take advantage of block caching anyway because your entire table won't fit into the cache.
C. No. If you disable block caching, HBase must read each block index from disk for each scan, thereby decreasing scan performance.
D. Yes. When you disable block caching, you free up memory for the MemStore, which improves scan performance.
How to decide the next HMaster?
Could someone explain to me the process by which the next HMaster is selected when the current one goes down? I've been looking for information about it in the documentation, but I haven't found anything.