Re: Delete rowKey in hexadecimal byte array.

2014-06-12 Thread gortiz

I found the error in my code, thanks.

On 10/06/14 12:07, gortiz wrote:

I think we are talking about different points :).

The problem is the comparison of the keys when I use Hex.decode.

I emit KeyValues this way and it works, but it spends double the memory to store the keys.


rowKey = Bytes.toBytes(DigestUtils.shaHex(pOutput.getRow()));
KeyValue kv = new KeyValue(rowKey, family, null, ts, type, null);


If I use for the rowKey:
rowKey = DigestUtils.sha(pOutput.getRow());
it doesn't work, and I don't know why, since it's also a byte array. I coded some JUnit tests and it never deletes the keys.
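For what it's worth, here is a minimal, self-contained sketch of the encoding mismatch (the class and the row value are made up for illustration): DigestUtils.shaHex() produces a 40-character hex string, while DigestUtils.sha() produces the raw 20 digest bytes, and a Delete only matches a row whose key is byte-for-byte identical to the stored one.

import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyEncodingCheck {
    public static void main(String[] args) {
        byte[] row = Bytes.toBytes("some-row"); // hypothetical row
        byte[] hexKey = Bytes.toBytes(DigestUtils.shaHex(row)); // 40 bytes
        byte[] rawKey = DigestUtils.sha(row);                   // 20 bytes
        System.out.println(hexKey.length + " vs " + rawKey.length); // 40 vs 20
        // Never true: a delete issued with one encoding cannot match rows
        // that were written with the other.
        System.out.println(Bytes.equals(hexKey, rawKey));
    }
}

So if the table was loaded with hex-string keys, deletes built from DigestUtils.sha() can never match, which would explain the JUnit observation above.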





On 09/06/14 13:43, Ted Yu wrote:

The decoded rowkey has a timestamp.
KeyValue has a timestamp field.

Do these two timestamps carry the same value ?

Cheers

On Jun 9, 2014, at 2:02 AM, Guillermo Ortiz konstt2...@gmail.com 
wrote:



Hi,

I'm generating keys with SHA-1; since the result is a hex representation, I use Hex.decode after generating the keys to save memory, as I can store them in half the space.

I have a MapReduce process which deletes some of these keys. The problem is that when I try to delete them, nothing gets deleted. If I don't do the parse to hex, it works.

So, for example, if I put the key as the SHA hex string b343664e210e7a7abff3625a005e65e2b0d4616, it works; but if I parse this key with Hex.decode to \xB3CfN!\x0Ezz\xBF\xF3bZ\x00^e\xE2\xB0\ (column=l:dd, timestamp=1384317115000), it doesn't.

I have checked the code a lot and I think it's right; plus, if I comment out the decode to hex, it works.

Any clue about it? Is there any problem with what I am trying to do?






--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: How to decide the next HMaster?

2014-04-23 Thread gortiz


But is it the first one which responds to ZooKeeper, or the one which creates the znode? I don't know exactly how this process works; where could I read more about it?


On 08/04/14 18:57, Jean-Daniel Cryans wrote:

It's a simple leader election via ZooKeeper.

J-D
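To make "simple leader election" concrete, here is a minimal sketch using the raw ZooKeeper API (the znode path and class are made up; HBase's real logic lives in its own master code, this is only the shape of it): every candidate races to create the same ephemeral znode, the one whose create() succeeds acts as master, and the others watch the znode so that its deletion triggers the next election.

import org.apache.zookeeper.*;

public class MasterElection implements Watcher {
    private static final String MASTER_ZNODE = "/demo/master"; // assumed path
    private final ZooKeeper zk;

    public MasterElection(String quorum) throws Exception {
        // Assumes the parent znode already exists on this quorum.
        zk = new ZooKeeper(quorum, 30000, this);
    }

    boolean tryToBecomeMaster(byte[] serverName) throws Exception {
        try {
            // Ephemeral: the znode vanishes when the owner's session dies,
            // which is what triggers the next election.
            zk.create(MASTER_ZNODE, serverName,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;  // won the race: act as the active master
        } catch (KeeperException.NodeExistsException e) {
            // Lost the race: watch the znode and wait for it to be deleted.
            zk.exists(MASTER_ZNODE, true);
            return false;
        }
    }

    @Override
    public void process(WatchedEvent event) {
        // NodeDeleted on MASTER_ZNODE => the active master is gone; retry.
    }
}

Because the znode is ephemeral, it disappears automatically when the active master's ZooKeeper session dies, which lets a backup master take over with no coordination beyond ZooKeeper itself.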


On Tue, Apr 8, 2014 at 7:18 AM, gortiz gor...@pragsis.com wrote:


Could someone explain to me the process of selecting the next HMaster when the current one goes down? I've been looking for information about it in the documentation, but I haven't found anything.






--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz
Well, I guessed that, but it doesn't make much sense because it's so slow. Right now I only have 100 rows with 1000 versions each.
I have checked the size of the dataset and each row is about 700 KB (around 7 GB in total, 100 rows x 1000 versions). So it should only check 100 rows x 700 KB = 70 MB, since it just checks the newest version. How can it spend so much time checking this quantity of data?

I'm generating the dataset again with a bigger blocksize (previously it was 64 KB; now it's going to be 1 MB). I could try tuning the scanner caching and batching parameters, but I don't think they're going to change much.

Another test I want to do is to generate the same dataset with just 100 versions. It should take around the same time, right? Or am I wrong?
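As a reference for the scanner tuning mentioned above, a minimal sketch with the 0.94 client API (the table name and the numbers are assumptions for illustration, not recommendations). Caching controls how many rows each next() RPC fetches; batching caps how many cells come back per Result, which matters for 1000-column rows. Each next() call has to finish within the scanner lease, which ties this directly to the LeaseException in this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanTuning {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "table1"); // assumed table name
        Scan scan = new Scan();
        scan.setCaching(10); // rows per RPC: keep low for 700 KB rows
        scan.setBatch(100);  // cells per Result: bounds memory for wide rows
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process r
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}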


On 10/04/14 18:08, Ted Yu wrote:

It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:


Another little question: with the filter I'm using, do I check all the versions, or just the newest? I'm wondering whether, when I do a scan over the whole table, I'm looking for the value 5 in the whole dataset or just in the newest version of each value.


On 10/04/14 16:52, gortiz wrote:


I was trying to check the behaviour of HBase. The cluster is a group of old computers: one master, five slaves, each one with 2 GB, so 12 GB in total.
The table has a column family with 1000 columns and each column with 100 versions.
There's another column family with four columns and one image of 100 KB. (I've tried without this column family as well.)
The table is partitioned manually across all the slaves, so data are balanced in the cluster.

I'm executing this sentence: scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"} in HBase 0.94.6.
My timeout for lease and RPC is three minutes.
Since it's a full scan of the table, I have been playing with the BLOCKCACHE as well (just disabling and enabling it, not changing its size). I thought that it was going to cause too many calls to the GC. I'm not sure about this point.

I know that it's not the best way to use HBase; it's just a test. I think that it's not working because the hardware isn't enough, although I would like to try some kind of tuning to improve it.








On 10/04/14 14:21, Ted Yu wrote:


Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:

I got this error when I execute a full scan with filters over a table.

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)

I have read about increasing the lease time and RPC time, but it's not working. What else could I try? The table isn't too big. I have been checking the logs from the GC, HMaster and some RegionServers, and I didn't see anything weird. I tried with a couple of caching values as well.




--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_




--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz

The last test I have done is to reduce the number of versions to 100. So, right now, I have 100 rows with 100 versions each.
Times are (I got the same times for blocksizes of 64 KB and 1 MB):
100 rows, 1000 versions + blockcache: 80 s.
100 rows, 1000 versions + no blockcache: 70 s.

100 rows, 100 versions + blockcache: 7.3 s.
100 rows, 100 versions + no blockcache: 6.1 s.

What's the reason for this? I guessed HBase was smart enough not to consider old versions, just checking the newest. But I reduced the size (in versions) by 10x and I got a 10x performance improvement.

The filter is: scan 'filters', {FILTER => "ValueFilter(=, 'binary:5')", STARTROW => '10100101', STOPROW => '60100201'}
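For anyone following along, the same scan expressed with the 0.94 Java client (a sketch; the table and row keys are taken from the shell command above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilteredScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "filters");
        Scan scan = new Scan(Bytes.toBytes("10100101"),
                             Bytes.toBytes("60100201"));
        // Matches cells whose value equals the byte sequence "5",
        // like 'binary:5' in the shell's filter language.
        scan.setFilter(new ValueFilter(CompareFilter.CompareOp.EQUAL,
                new BinaryComparator(Bytes.toBytes("5"))));
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(r);
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}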



On 11/04/14 09:04, gortiz wrote:
Well, I guessed that, but it doesn't make much sense because it's so slow. Right now I only have 100 rows with 1000 versions each.
I have checked the size of the dataset and each row is about 700 KB (around 7 GB in total, 100 rows x 1000 versions). So it should only check 100 rows x 700 KB = 70 MB, since it just checks the newest version. How can it spend so much time checking this quantity of data?

I'm generating the dataset again with a bigger blocksize (previously it was 64 KB; now it's going to be 1 MB). I could try tuning the scanner caching and batching parameters, but I don't think they're going to change much.

Another test I want to do is to generate the same dataset with just 100 versions. It should take around the same time, right? Or am I wrong?


On 10/04/14 18:08, Ted Yu wrote:

It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:

Another little question: with the filter I'm using, do I check all the versions, or just the newest? I'm wondering whether, when I do a scan over the whole table, I'm looking for the value 5 in the whole dataset or just in the newest version of each value.


On 10/04/14 16:52, gortiz wrote:

I was trying to check the behaviour of HBase. The cluster is a group of old computers: one master, five slaves, each one with 2 GB, so 12 GB in total.
The table has a column family with 1000 columns and each column with 100 versions.
There's another column family with four columns and one image of 100 KB. (I've tried without this column family as well.)
The table is partitioned manually across all the slaves, so data are balanced in the cluster.

I'm executing this sentence: scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"} in HBase 0.94.6.
My timeout for lease and RPC is three minutes.
Since it's a full scan of the table, I have been playing with the BLOCKCACHE as well (just disabling and enabling it, not changing its size). I thought that it was going to cause too many calls to the GC. I'm not sure about this point.

I know that it's not the best way to use HBase; it's just a test. I think that it's not working because the hardware isn't enough, although I would like to try some kind of tuning to improve it.








On 10/04/14 14:21, Ted Yu wrote:


Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:

I got this error when I execute a full scan with filters over a table.

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)

I have read about increasing the lease time and RPC time, but it's not working. What else could I try? The table isn't too big. I have been checking the logs from the GC, HMaster and some RegionServers, and I didn't see anything weird. I tried with a couple of caching values as well.





--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_







--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz
Yes, I have tried two different values for the max versions setting: 1000 and the maximum integer value.

But I want to keep those versions; I don't want to keep just 3. Imagine that I want to record a new version each minute and store a day's worth: that's 1440 versions.

Why is HBase going to read all the versions? I thought that if you don't indicate any versions it just reads the newest and skips the rest. It doesn't make much sense to read all of them if the data is sorted, especially since the newest version is stored at the top.
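For the one-version-per-minute-for-a-day use case, a sketch of the table-level settings involved (0.94 admin API; the table and family names are made up). TTL removes expired versions at compaction time, which also bounds how much old-version data scans have to wade through.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MinuteVersionsTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("metrics"); // assumed name
        HColumnDescriptor cf = new HColumnDescriptor("d");       // assumed CF
        cf.setMaxVersions(1440);  // one version per minute, for one day
        cf.setTimeToLive(86400);  // seconds; older versions expire at compaction
        desc.addFamily(cf);
        admin.createTable(desc);
        admin.close();
    }
}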



On 11/04/14 11:54, Anoop John wrote:

What is the max versions setting you have configured for your table's CF? When you set such a value, HBase has to keep all those versions, and during a scan it will read all of them. In 0.94 the default value for max versions is 3; I guess you have set some bigger value. If you have not, mind testing after a major compaction?

-Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:


The last test I have done is to reduce the number of versions to 100. So, right now, I have 100 rows with 100 versions each.
Times are (I got the same times for blocksizes of 64 KB and 1 MB):
100 rows, 1000 versions + blockcache: 80 s.
100 rows, 1000 versions + no blockcache: 70 s.

100 rows, 100 versions + blockcache: 7.3 s.
100 rows, 100 versions + no blockcache: 6.1 s.

What's the reason for this? I guessed HBase was smart enough not to consider old versions, just checking the newest. But I reduced the size (in versions) by 10x and I got a 10x performance improvement.

The filter is: scan 'filters', {FILTER => "ValueFilter(=, 'binary:5')", STARTROW => '10100101', STOPROW => '60100201'}



On 11/04/14 09:04, gortiz wrote:


Well, I guessed that, but it doesn't make much sense because it's so slow. Right now I only have 100 rows with 1000 versions each.
I have checked the size of the dataset and each row is about 700 KB (around 7 GB in total, 100 rows x 1000 versions). So it should only check 100 rows x 700 KB = 70 MB, since it just checks the newest version. How can it spend so much time checking this quantity of data?

I'm generating the dataset again with a bigger blocksize (previously it was 64 KB; now it's going to be 1 MB). I could try tuning the scanner caching and batching parameters, but I don't think they're going to change much.

Another test I want to do is to generate the same dataset with just 100 versions. It should take around the same time, right? Or am I wrong?

On 10/04/14 18:08, Ted Yu wrote:


It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:

Another little question: with the filter I'm using, do I check all the versions, or just the newest? I'm wondering whether, when I do a scan over the whole table, I'm looking for the value 5 in the whole dataset or just in the newest version of each value.


On 10/04/14 16:52, gortiz wrote:

I was trying to check the behaviour of HBase. The cluster is a group of old computers: one master, five slaves, each one with 2 GB, so 12 GB in total.
The table has a column family with 1000 columns and each column with 100 versions.
There's another column family with four columns and one image of 100 KB. (I've tried without this column family as well.)
The table is partitioned manually across all the slaves, so data are balanced in the cluster.

I'm executing this sentence: scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"} in HBase 0.94.6.
My timeout for lease and RPC is three minutes.
Since it's a full scan of the table, I have been playing with the BLOCKCACHE as well (just disabling and enabling it, not changing its size). I thought that it was going to cause too many calls to the GC. I'm not sure about this point.

I know that it's not the best way to use HBase; it's just a test. I think that it's not working because the hardware isn't enough, although I would like to try some kind of tuning to improve it.








On 10/04/14 14:21, Ted Yu wrote:

Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:

I got this error when I execute a full scan with filters over a table.

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz
Sorry, I didn't get why it should read all the timestamps and not just the newest if they're sorted and you didn't specify any timestamp in your filter.



On 11/04/14 12:13, Anoop John wrote:

In the storage layer (HFiles in HDFS) all versions of a particular cell stay together (yes, the KVs have to be lexicographically ordered). So during a scan we have to read all the version data; the storage layer doesn't know about the versions stuff etc.

-Anoop-
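A tiny self-contained illustration of what Anoop describes (hypothetical row and values; 0.94 API): all versions of a cell sort next to each other, newest timestamp first, so a scanner has to step over every stored version of a cell before reaching the next column, until a major compaction actually drops the excess ones.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionOrdering {
    public static void main(String[] args) {
        byte[] row = Bytes.toBytes("row1");
        byte[] cf = Bytes.toBytes("cf");
        byte[] col = Bytes.toBytes("c1");
        List<KeyValue> kvs = new ArrayList<KeyValue>();
        for (long ts = 1; ts <= 3; ts++) {
            kvs.add(new KeyValue(row, cf, col, ts, Bytes.toBytes("v" + ts)));
        }
        // KeyValue.COMPARATOR is the ordering used inside HFiles.
        Collections.sort(kvs, KeyValue.COMPARATOR);
        // Prints the same row/column three times with timestamps 3, 2, 1:
        // versions are adjacent, newest first.
        for (KeyValue kv : kvs) {
            System.out.println(kv);
        }
    }
}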

On Fri, Apr 11, 2014 at 3:33 PM, gortiz gor...@pragsis.com wrote:


Yes, I have tried two different values for the max versions setting: 1000 and the maximum integer value.

But I want to keep those versions; I don't want to keep just 3. Imagine that I want to record a new version each minute and store a day's worth: that's 1440 versions.

Why is HBase going to read all the versions? I thought that if you don't indicate any versions it just reads the newest and skips the rest. It doesn't make much sense to read all of them if the data is sorted, especially since the newest version is stored at the top.



On 11/04/14 11:54, Anoop John wrote:


What is the max versions setting you have configured for your table's CF? When you set such a value, HBase has to keep all those versions, and during a scan it will read all of them. In 0.94 the default value for max versions is 3; I guess you have set some bigger value. If you have not, mind testing after a major compaction?

-Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:

The last test I have done is to reduce the number of versions to 100. So, right now, I have 100 rows with 100 versions each.
Times are (I got the same times for blocksizes of 64 KB and 1 MB):
100 rows, 1000 versions + blockcache: 80 s.
100 rows, 1000 versions + no blockcache: 70 s.

100 rows, 100 versions + blockcache: 7.3 s.
100 rows, 100 versions + no blockcache: 6.1 s.

What's the reason for this? I guessed HBase was smart enough not to consider old versions, just checking the newest. But I reduced the size (in versions) by 10x and I got a 10x performance improvement.

The filter is: scan 'filters', {FILTER => "ValueFilter(=, 'binary:5')", STARTROW => '10100101', STOPROW => '60100201'}



On 11/04/14 09:04, gortiz wrote:

Well, I guessed that, but it doesn't make much sense because it's so slow. Right now I only have 100 rows with 1000 versions each.
I have checked the size of the dataset and each row is about 700 KB (around 7 GB in total, 100 rows x 1000 versions). So it should only check 100 rows x 700 KB = 70 MB, since it just checks the newest version. How can it spend so much time checking this quantity of data?

I'm generating the dataset again with a bigger blocksize (previously it was 64 KB; now it's going to be 1 MB). I could try tuning the scanner caching and batching parameters, but I don't think they're going to change much.

Another test I want to do is to generate the same dataset with just 100 versions. It should take around the same time, right? Or am I wrong?

On 10/04/14 18:08, Ted Yu wrote:

  It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:

Another little question: with the filter I'm using, do I check all the versions, or just the newest? I'm wondering whether, when I do a scan over the whole table, I'm looking for the value 5 in the whole dataset or just in the newest version of each value.


On 10/04/14 16:52, gortiz wrote:

I was trying to check the behaviour of HBase. The cluster is a group of old computers: one master, five slaves, each one with 2 GB, so 12 GB in total.
The table has a column family with 1000 columns and each column with 100 versions.
There's another column family with four columns and one image of 100 KB. (I've tried without this column family as well.)
The table is partitioned manually across all the slaves, so data are balanced in the cluster.

I'm executing this sentence: scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"} in HBase 0.94.6.
My timeout for lease and RPC is three minutes.
Since it's a full scan of the table, I have been playing with the BLOCKCACHE as well (just disabling and enabling it, not changing its size). I thought that it was going to cause too many calls to the GC. I'm not sure about this point.

I know that it's not the best way to use HBase; it's just a test. I think that it's not working because the hardware isn't enough, although I would like to try some kind of tuning to improve it.








On 10/04/14 14:21, Ted Yu wrote:

Can you give us a bit more information:


HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:

I got this error when I execute a full scan with filters over a table.

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException

Re: BlockCache for large scans.

2014-04-10 Thread gortiz
But is there really a direct relation between improving performance on large scans and memory for the memstore? As far as I understand, the memstore just works as a cache for write operations.


On 09/04/14 23:44, Ted Yu wrote:

Didn't quite get what you mean, Asaf.

If you're talking about HBASE-5349, please read release note of HBASE-5349.

By default, the memstore min/max range is initialized to the memstore percent:

globalMemStorePercentMinRange = conf.getFloat(MEMSTORE_SIZE_MIN_RANGE_KEY,
    globalMemStorePercent);
globalMemStorePercentMaxRange = conf.getFloat(MEMSTORE_SIZE_MAX_RANGE_KEY,
    globalMemStorePercent);

Cheers


On Wed, Apr 9, 2014 at 3:17 PM, Asaf Mesika asaf.mes...@gmail.com wrote:


The JIRA says it's enabled automatically. Is there an official write-up explaining this feature?

On Wednesday, April 9, 2014, Ted Yu yuzhih...@gmail.com wrote:


Please take a look at http://www.n10k.com/blog/blockcache-101/

For D, hbase.regionserver.global.memstore.size is specified in terms of
percentage of heap. Unless you enable HBASE-5349 'Automagically tweak
global memstore and block cache sizes based on workload'


On Wed, Apr 9, 2014 at 12:24 AM, gortiz gor...@pragsis.com wrote:

I've been reading the books HBase: The Definitive Guide and HBase in Action a little. I found this question from Cloudera that I'm not sure about after looking at some benchmarks and documentation for HBase. Could someone explain it to me a little? I think that when you do a large scan you should disable the block cache, because the blocks are going to be swapped a lot, so you wouldn't get anything from the cache; I guess you would be penalized, since you're spending memory, GC calls and CPU on this task.

*You want to do a full table scan on your data. You decide to disable block caching to see if this improves scan performance. Will disabling block caching improve scan performance?*

A.
No. Disabling block caching does not improve scan performance.

B.
Yes. When you disable block caching, you free up that memory for other operations. With a full table scan, you cannot take advantage of block caching anyway because your entire table won't fit into cache.

C.
No. If you disable block caching, HBase must read each block index from disk for each scan, thereby decreasing scan performance.

D.
Yes. When you disable block caching, you free up memory for MemStore, which improves scan performance.





--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Lease exception when I execute large scan with filters.

2014-04-10 Thread gortiz

I got this error when I execute a full scan with filters over a table.

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)

I have read about increasing the lease time and RPC time, but it's not working. What else could I try? The table isn't too big. I have been checking the logs from the GC, HMaster and some RegionServers, and I didn't see anything weird. I tried with a couple of caching values as well.
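For reference, these are the two properties usually raised together for this error in 0.94. Note that they are read by the region server, so they belong in the server-side hbase-site.xml; the sketch below only documents the property names, with assumed values, via the Configuration API.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LeaseSettings {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Scanner lease: how long the server keeps an open scanner alive
        // between next() calls before throwing LeaseException (ms).
        conf.setLong("hbase.regionserver.lease.period", 180000);
        // Raise the RPC timeout together with the lease: a next() call
        // that outlives the RPC timeout still fails on the client side.
        conf.setLong("hbase.rpc.timeout", 180000);
        System.out.println(conf.get("hbase.regionserver.lease.period"));
    }
}

A longer lease only buys time, though; if each next() call genuinely takes minutes, reducing the work per call (smaller caching/batch, fewer versions to wade through) attacks the cause rather than the symptom.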


Re: Lease exception when I execute large scan with filters.

2014-04-10 Thread gortiz
I was trying to check the behaviour of HBase. The cluster is a group of old computers: one master, five slaves, each one with 2 GB, so 12 GB in total.
The table has a column family with 1000 columns and each column with 100 versions.
There's another column family with four columns and one image of 100 KB. (I've tried without this column family as well.)
The table is partitioned manually across all the slaves, so data are balanced in the cluster.

I'm executing this sentence: scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"} in HBase 0.94.6.
My timeout for lease and RPC is three minutes.
Since it's a full scan of the table, I have been playing with the BLOCKCACHE as well (just disabling and enabling it, not changing its size). I thought that it was going to cause too many calls to the GC. I'm not sure about this point.

I know that it's not the best way to use HBase; it's just a test. I think that it's not working because the hardware isn't enough, although I would like to try some kind of tuning to improve it.









On 10/04/14 14:21, Ted Yu wrote:

Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:


I got this error when I execute a full scan with filters over a table.

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)

I have read about increasing the lease time and RPC time, but it's not working. What else could I try? The table isn't too big. I have been checking the logs from the GC, HMaster and some RegionServers, and I didn't see anything weird. I tried with a couple of caching values as well.



--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: Lease exception when I execute large scan with filters.

2014-04-10 Thread gortiz
Another little question: with the filter I'm using, do I check all the versions, or just the newest? I'm wondering whether, when I do a scan over the whole table, I'm looking for the value 5 in the whole dataset or just in the newest version of each value.


On 10/04/14 16:52, gortiz wrote:
I was trying to check the behaviour of HBase. The cluster is a group of old computers: one master, five slaves, each one with 2 GB, so 12 GB in total.
The table has a column family with 1000 columns and each column with 100 versions.
There's another column family with four columns and one image of 100 KB. (I've tried without this column family as well.)
The table is partitioned manually across all the slaves, so data are balanced in the cluster.

I'm executing this sentence: scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"} in HBase 0.94.6.
My timeout for lease and RPC is three minutes.
Since it's a full scan of the table, I have been playing with the BLOCKCACHE as well (just disabling and enabling it, not changing its size). I thought that it was going to cause too many calls to the GC. I'm not sure about this point.

I know that it's not the best way to use HBase; it's just a test. I think that it's not working because the hardware isn't enough, although I would like to try some kind of tuning to improve it.









On 10/04/14 14:21, Ted Yu wrote:

Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:


I got this error when I execute a full scan with filters over a table.

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4165751462641113359' does not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)

I have read about increasing the lease time and RPC time, but it's not working. What else could I try? The table isn't too big. I have been checking the logs from the GC, HMaster and some RegionServers, and I didn't see anything weird. I tried with a couple of caching values as well.






--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



BlockCache for large scans.

2014-04-09 Thread gortiz
I've been reading the books HBase: The Definitive Guide and HBase in Action a little. I found this question from Cloudera that I'm not sure about after looking at some benchmarks and documentation for HBase. Could someone explain it to me a little? I think that when you do a large scan you should disable the block cache, because the blocks are going to be swapped a lot, so you wouldn't get anything from the cache; I guess you would be penalized, since you're spending memory, GC calls and CPU on this task.

*You want to do a full table scan on your data. You decide to disable block caching to see if this improves scan performance. Will disabling block caching improve scan performance?*

A.
No. Disabling block caching does not improve scan performance.

B.
Yes. When you disable block caching, you free up that memory for other operations. With a full table scan, you cannot take advantage of block caching anyway because your entire table won't fit into cache.

C.
No. If you disable block caching, HBase must read each block index from disk for each scan, thereby decreasing scan performance.

D.
Yes. When you disable block caching, you free up memory for MemStore, which improves scan performance.
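Whichever answer the quiz intends, the per-scan switch being discussed looks like this in the 0.94 Java client (a sketch; the table name is assumed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class FullScanNoCache {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "table1"); // assumed table name
        Scan scan = new Scan();
        // Don't insert scanned blocks into the block cache, so a one-off
        // full scan doesn't evict blocks that random reads still need.
        scan.setCacheBlocks(false);
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process r
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}

Note that this only controls whether scanned blocks go into the cache; the scan reads the same amount of data from disk either way, so it is about cache churn, not raw scan speed.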



Re: BlockCache for large scans.

2014-04-09 Thread gortiz

Pretty interesting link; I'll keep it in my favorites.



On 09/04/14 16:07, Ted Yu wrote:

Please take a look at http://www.n10k.com/blog/blockcache-101/

For D, hbase.regionserver.global.memstore.size is specified in terms of
percentage of heap. Unless you enable HBASE-5349 'Automagically tweak
global memstore and block cache sizes based on workload'


On Wed, Apr 9, 2014 at 12:24 AM, gortiz gor...@pragsis.com wrote:


I've been reading the books HBase: The Definitive Guide and HBase in Action a little. I found this question from Cloudera that I'm not sure about after looking at some benchmarks and documentation for HBase. Could someone explain it to me a little? I think that when you do a large scan you should disable the block cache, because the blocks are going to be swapped a lot, so you wouldn't get anything from the cache; I guess you would be penalized, since you're spending memory, GC calls and CPU on this task.

*You want to do a full table scan on your data. You decide to disable block caching to see if this improves scan performance. Will disabling block caching improve scan performance?*

A.
No. Disabling block caching does not improve scan performance.

B.
Yes. When you disable block caching, you free up that memory for other operations. With a full table scan, you cannot take advantage of block caching anyway because your entire table won't fit into cache.

C.
No. If you disable block caching, HBase must read each block index from disk for each scan, thereby decreasing scan performance.

D.
Yes. When you disable block caching, you free up memory for MemStore, which improves scan performance.





--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



How to decide the next HMaster?

2014-04-08 Thread gortiz
Could someone explain to me the process of selecting the next HMaster when the current one goes down? I've been looking for information about it in the documentation, but I haven't found anything.