What is the max versions setting you have configured for your table's column
family?  When you set such a value, HBase has to keep all those versions, and
during a scan it will read all of them.  In 0.94 the default value for max
versions is 3.  I guess you have set some bigger value.  If you have not,
would you mind testing again after a major compaction?
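
In case it helps, here is a minimal sketch of how that could be done with the
0.94 Java client (the table name 'filters' and the family 'cf' are just
placeholders; the same thing can be done from the shell with alter and
major_compact):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ReduceVersions {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        String table = "filters";              // placeholder names
        byte[] family = Bytes.toBytes("cf");

        // Fetch the current schema and lower the number of versions kept.
        HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes(table));
        HColumnDescriptor hcd = htd.getFamily(family);
        hcd.setMaxVersions(3);

        admin.disableTable(table);             // safest on 0.94
        admin.modifyColumn(table, hcd);
        admin.enableTable(table);

        // Major compaction (asynchronous) rewrites the store files and drops
        // the versions that are now above the limit.
        admin.majorCompact(table);
        admin.close();
      }
    }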

-Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz <gor...@pragsis.com> wrote:

> The last test I have done is to reduce the number of versions to 100.
> So, right now, I have 100 rows with 100 versions each.
> The times are (I got the same times for a blocksize of 64Kb and of 1Mb):
> 100 rows, 1000 versions + blockcache -> 80s.
> 100 rows, 1000 versions + no blockcache -> 70s.
>
> 100 rows, *100* versions + blockcache -> 7.3s.
> 100 rows, *100* versions + no blockcache -> 6.1s.
>
> What's the reason for this? I guessed HBase was smart enough not to consider
> the old versions and would just check the newest one. But I reduced the size
> (in versions) by 10x and got a 10x improvement in performance.
>
> The scan I am running is: scan 'filters', {FILTER => "ValueFilter(=,
> 'binary:5')", STARTROW => '1010000000000000000000000000000000000101',
> STOPROW => '6010000000000000000000000000000000000201'}
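>
> For reference, a minimal Java sketch of roughly the same scan (the table
> name and row keys are taken from the shell command above, the rest is
> assumed). By default a Scan only returns the newest version of each cell,
> but the region server still has to read past all the versions stored in
> the HFiles:
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.client.Result;
>     import org.apache.hadoop.hbase.client.ResultScanner;
>     import org.apache.hadoop.hbase.client.Scan;
>     import org.apache.hadoop.hbase.filter.BinaryComparator;
>     import org.apache.hadoop.hbase.filter.CompareFilter;
>     import org.apache.hadoop.hbase.filter.ValueFilter;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class FilterScan {
>       public static void main(String[] args) throws Exception {
>         Configuration conf = HBaseConfiguration.create();
>         HTable table = new HTable(conf, "filters");
>
>         Scan scan = new Scan(
>             Bytes.toBytes("1010000000000000000000000000000000000101"),   // STARTROW
>             Bytes.toBytes("6010000000000000000000000000000000000201"));  // STOPROW
>         // Same predicate as ValueFilter(=, 'binary:5') in the shell.
>         scan.setFilter(new ValueFilter(CompareFilter.CompareOp.EQUAL,
>                                        new BinaryComparator(Bytes.toBytes("5"))));
>         scan.setMaxVersions(1);  // default: only the newest version per cell
>
>         ResultScanner scanner = table.getScanner(scan);
>         for (Result r : scanner) {
>           System.out.println(r);
>         }
>         scanner.close();
>         table.close();
>       }
>     }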
>
>
>
> On 11/04/14 09:04, gortiz wrote:
>
>> Well, I guessed that, but it doesn't make much sense because it's so
>> slow. Right now I only have 100 rows with 1000 versions per row.
>> I have checked the size of the dataset and each row is about 700 KB
>> (around 7 GB for 100 rows x 1000 versions). So, if it only checks the
>> newest version, it should only read 100 rows x 700 KB = 70 MB. How can it
>> spend so much time on that quantity of data?
>>
>> I'm generating the dataset again with a bigger blocksize (it was 64 KB
>> before, now it's going to be 1 MB). I could try tuning the scanner caching
>> and batching parameters, but I don't think they're going to make much
>> difference.
>>
>> Another test I want to do is to generate the same dataset with just
>> 100 versions. It should take around the same time, right? Or am I wrong?
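>>
>> Just to be explicit about what I mean by the scanner caching and batching
>> parameters, this is the kind of thing I would try (a rough sketch, the
>> values are made up):
>>
>>     Scan scan = new Scan();
>>     // Rows fetched per RPC from the region server: fewer round trips,
>>     // but more memory on both sides and longer gaps between next() calls.
>>     scan.setCaching(100);
>>     // Maximum number of columns per Result; mainly useful for very wide rows.
>>     scan.setBatch(1000);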
>>
>> On 10/04/14 18:08, Ted Yu wrote:
>>
>>> It should be newest version of each value.
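>>>
>>> Roughly, on the client side a Scan asks for one version per cell by
>>> default; if you want the filter evaluated against older versions as well,
>>> you have to request them explicitly. A sketch, reusing your filter:
>>>
>>>     Scan scan = new Scan();
>>>     scan.setMaxVersions();   // ask for all stored versions (default is 1)
>>>     // or scan.setMaxVersions(5) for up to 5 versions per cell
>>>     scan.setFilter(new ValueFilter(CompareFilter.CompareOp.EQUAL,
>>>                                    new BinaryComparator(Bytes.toBytes("5"))));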
>>>
>>> Cheers
>>>
>>>
>>> On Thu, Apr 10, 2014 at 9:55 AM, gortiz <gor...@pragsis.com> wrote:
>>>
>>>> Another little question: with the filter I'm using, do I check all the
>>>> versions or just the newest one? I'm wondering whether, when I scan the
>>>> whole table, I'm looking for the value "5" across the whole dataset or
>>>> only in the newest version of each value.
>>>>
>>>>
>>>> On 10/04/14 16:52, gortiz wrote:
>>>>
>>>>> I was trying to check the behaviour of HBase. The cluster is a group of
>>>>> old computers: one master and five slaves, each with 2 GB of RAM, so
>>>>> 12 GB in total.
>>>>> The table has a column family with 1000 columns, and each column has
>>>>> 100 versions.
>>>>> There's another column family with four columns and one image of 100 KB.
>>>>> (I've tried without this column family as well.)
>>>>> The table is partitioned manually across all the slaves, so data is
>>>>> balanced in the cluster.
>>>>>
>>>>> I'm executing this command, *scan 'table1', {FILTER => "ValueFilter(=,
>>>>> 'binary:5')"}*, in HBase 0.94.6.
>>>>> My timeout for the lease and the rpc is three minutes.
>>>>> Since it's a full scan of the table, I have been playing with the
>>>>> BLOCKCACHE as well (just disabling and enabling it, not changing its
>>>>> size). I thought that it was going to cause too many GC pauses, but I'm
>>>>> not sure about this point.
>>>>>
>>>>> I know that this isn't the best way to use HBase; it's just a test. I
>>>>> think it's not working well because the hardware isn't enough, although
>>>>> I would like to try some kind of tuning to improve it.
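>>>>>
>>>>> In the Java client, the toggle I mean is just this (a sketch;
>>>>> CACHE_BLOCKS would be the shell equivalent, if I'm not mistaken):
>>>>>
>>>>>     Scan scan = new Scan();
>>>>>     scan.setFilter(new ValueFilter(CompareFilter.CompareOp.EQUAL,
>>>>>                                    new BinaryComparator(Bytes.toBytes("5"))));
>>>>>     // Enable or disable use of the block cache for this scan; for a
>>>>>     // one-off full scan, filling the cache mostly just evicts useful blocks.
>>>>>     scan.setCacheBlocks(false);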
>>>>>
>>>>> On 10/04/14 14:21, Ted Yu wrote:
>>>>>
>>>>>> Can you give us a bit more information:
>>>>>>
>>>>>> HBase release you're running
>>>>>> What filters are used for the scan
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Apr 10, 2014, at 2:36 AM, gortiz <gor...@pragsis.com> wrote:
>>>>>>
>>>>>>> I got this error when I executed a full scan with filters on a table:
>>>>>>>
>>>>>>> Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
>>>>>>>     at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
>>>>>>>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
>>>>>>>
>>>>>>> I have read about increasing the lease time and the rpc time, but it's
>>>>>>> not working. What else could I try? The table isn't too big. I have
>>>>>>> been checking the logs from the GC, the HMaster and some RegionServers
>>>>>>> and I didn't see anything weird. I have also tried a couple of caching
>>>>>>> values.
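>>>>>>>
>>>>>>> For reference, this is roughly how I understand the knobs involved (a
>>>>>>> sketch, values made up): the lease seems to expire when the gap between
>>>>>>> two next() calls from the client is longer than the lease period
>>>>>>> configured on the region server.
>>>>>>>
>>>>>>>     Configuration conf = HBaseConfiguration.create();
>>>>>>>     // Client-side rpc timeout; the scanner lease itself is governed by
>>>>>>>     // hbase.regionserver.lease.period in hbase-site.xml on the servers.
>>>>>>>     conf.setInt("hbase.rpc.timeout", 180000);   // 3 minutes
>>>>>>>
>>>>>>>     Scan scan = new Scan();
>>>>>>>     // With an expensive filter, a smaller caching value makes each
>>>>>>>     // next() call do less work, so the lease is renewed more often.
>>>>>>>     scan.setCaching(10);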
>>>>>>>
>>>>>>>
>>>> --
>>>> *Guillermo Ortiz*
>>>> /Big Data Developer/
>>>>
>>>> Telf.: +34 917 680 490
>>>> Fax: +34 913 833 301
>>>> C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
>>>>
>>>> _http://www.bidoop.es_
>>>>
>>>>
>>
>>
>
> --
> *Guillermo Ortiz*
> /Big Data Developer/
>
> Telf.: +34 917 680 490
> Fax: +34 913 833 301
> C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
>
> _http://www.bidoop.es_
>
>
