RE: endpoint coprocessor

2014-04-11 Thread Bogala, Chandra Reddy
Thanks, Yu. My understanding is that this coprocessor is available as part of the HBase
server components, so I should be able to attach it to any of my tables using the
alter table command.



alter 'demo-table','COPROCESSOR' =>'|class|priority|args'



Then, from the HBase shell, I should be able to call this coprocessor with a command,
the same way we do scan and filter. Is there a command, like the filter command below,
for calling a coprocessor so that it runs in the region server and returns the
results?



scan 'demo-table', {FILTER => 
org.apache.hadoop.hbase.filter.RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),SubstringComparator.new("10001|1395309600"))}



I am trying to figure out a simple client-side mechanism for calling the coprocessor. If
those classes and the calling mechanism are not available from the HBase shell by default,
then I plan to use Java client code to invoke the coprocessor.

Any pointers to a Java client for invoking the aggregation coprocessor will be helpful.
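For reference, this is the kind of Java client call I have in mind - a rough, untested
sketch that assumes the table already has
org.apache.hadoop.hbase.coprocessor.AggregateImplementation attached, and that the exact
rowCount signature (byte[] vs. TableName) may differ slightly between HBase versions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class AggregationClientSketch {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();

    // AggregationClient talks to the AggregateImplementation endpoint, which
    // must already be loaded on the table (via alter, or the region coprocessor config).
    AggregationClient aggregationClient = new AggregationClient(conf);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));   // hypothetical column family name

    // Row count is computed region-side by the endpoint and merged on the client.
    // Note: the exact rowCount signature (byte[] vs. TableName) has changed across
    // HBase versions, so this may need adjusting for 0.96.
    long rowCount = aggregationClient.rowCount(
        TableName.valueOf("demo-table"), new LongColumnInterpreter(), scan);

    System.out.println("row count = " + rowCount);
  }
}

If I understand the API correctly, the coprocessor could also be attached programmatically
with HTableDescriptor#addCoprocessor() instead of the shell alter command.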



Thanks,

Chandra



-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, April 11, 2014 10:42 PM
To: user@hbase.apache.org
Subject: Re: endpoint coprocessor



Please take a look at:

hbase-shell/src/main/ruby/hbase/security.rb

for an example of how a coprocessor is invoked from the shell.

Cheers

On Fri, Apr 11, 2014 at 11:06 AM, Bogala, Chandra Reddy <chandra.bog...@gs.com> wrote:

> Thank you. I am aware of this challenge. How to call below coprocessor
> from client. Can I call this coprocessor from hbase shell?.  I am new
> to Hbase. So may be asking very dumb questions.
>
> Thanks,
> Chandra
>
> -Original Message-
> From: Asaf Mesika [mailto:asaf.mes...@gmail.com]
> Sent: Friday, April 11, 2014 12:10 PM
> To: user@hbase.apache.org
> Subject: Re: endpoint coprocessor
>
> Bear in mind each region will return its top n, then you will have to
> run another top n in your client code. This introduce a numerical
> error : top on top.
>
> On Thursday, April 10, 2014, Bogala, Chandra Reddy
> mailto:chandra.bog...@gs.com>>
> wrote:
>
> > Hi,
> > I am planning to write endpoint coprocessor to calculate TOP N
> > results for my usecase.  I got confused with old apis and new apis.
> > I followed below links and try to implement. But looks like api's
> > changed a lot. I don't see many of these classes in hbase jars. We
> > are using Hbase 0.96.
> > Can anyone point to the latest document/apis?. And if possible
> > sample code to calculate top N.
> >
> > https://blogs.apache.org/hbase/entry/coprocessor_introduction
> > HBase Coprocessors - Deploy shared functionality directly on the
> > cluster
> >
> > Thanks,
> > Chandra
> >
>


Re: HFile size writeup in HBase Blog

2014-04-11 Thread Ted Yu
Nice writeup, Doug.

Do you have plans to profile Prefix Tree data block encoding?

Cheers


On Fri, Apr 11, 2014 at 3:14 PM, Doug Meil wrote:

> Hey folks,
>
> Stack published a writeup I did on the HBase blog on the effects of rowkey
> size, column-name size, CF compression, data block encoding and KV storage
> approach on HFile size.  For example, had large row keys vs. small row
> keys, used Snappy vs. LZO vs. etc., used prefix vs. fast-diff, used a KV
> per column vs. a single KV per row.  We tried 'em all... and wrote it up.
>
> http://blogs.apache.org/hbase/
>
>
> Doug Meil
> Chief Software Architect, Explorys
> doug.m...@explorysmedical.com
>
>
>


Re: BlockCache for large scans.

2014-04-11 Thread Jean-Marc Spaggiari
Ok. I see it in TableInputFormat:
// false by default, full table scans generate too much BC churn
scan.setCacheBlocks((conf.getBoolean(SCAN_CACHEBLOCKS, false)));

So no need to do it in initTableMapperJob too, I guess...
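
For anyone building the Scan themselves instead of relying on TableInputFormat's default,
a minimal sketch of doing it explicitly (the table name and MyMapper are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class FullScanJobSketch {

  // Placeholder mapper: does nothing with each row.
  static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context) {
      // process the row here
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "full-scan-sketch");
    job.setJarByClass(FullScanJobSketch.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // bigger RPC batches for a long scan
    scan.setCacheBlocks(false);  // avoid churning the block cache on a full-table scan

    TableMapReduceUtil.initTableMapperJob(
        "my-table",              // placeholder table name
        scan, MyMapper.class,
        ImmutableBytesWritable.class, Result.class, job);

    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}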

Thanks,

JM


2014-04-11 16:53 GMT-04:00 lars hofhansl :

> Yep. For all of our M/R jobs we do indeed disable the caching of blocks.
> In fact TableInputFormat sets cache blocks to false currently anyway.
>
> -- Lars
>
>   --
>  *From:* Jean-Marc Spaggiari 
> *To:* user ; lars hofhansl 
> *Sent:* Friday, April 11, 2014 6:54 AM
>
> *Subject:* Re: BlockCache for large scans.
>
> Hi Lars,
>
> So just to continue on that, when we are do MR jobs with HBase, this
> should be disable too since we will read the entire table, right? Is this
> done by default or it's something the client should setup manually? On my
> own code I setup this manually. I looked into
> TableMapReduceUtil.initTableMapperJob and there is nothing there. Should we
> not just set CacheBlocks to false in initTableMapperJob directly?
>
> JM
>
>
> 2014-04-10 14:50 GMT-04:00 lars hofhansl :
>
> Generally (and this is database lore not just HBase) if you use an LRU
> type cache, your working set does not fit into the cache, and you
> repeatedly scan this working set you have created the worst case scenario.
> The database does all the work caching the blocks, and subsequent scans
> will need block that were just evicted towards end of the previous scan.
>
> For large scans where it is likely that the entire scan does not fit into
> the block cache, you should absolutely disable caching the blocks traversed
> for this scan (i.e. scan.setCacheBlocks(false)). Index blocks are not
> affected, they are cached regardless.
>
> -- Lars
>
>
>
> 
>  From: gortiz 
> To: user@hbase.apache.org
> Sent: Wednesday, April 9, 2014 11:37 PM
> Subject: Re: BlockCache for large scans.
>
>
> But, I think there's a direct relation between improving performance in
> large scan and memory for memstore. Until I understand, memstore just
> work as cache to write operations.
>
>
> On 09/04/14 23:44, Ted Yu wrote:
> > Didn't quite get what you mean, Asaf.
> >
> > If you're talking about HBASE-5349, please read release note of
> HBASE-5349.
> >
> > By default, memstore min/max range is initialized to memstore percent:
> >
> >  globalMemStorePercentMinRange = conf.getFloat(
> > MEMSTORE_SIZE_MIN_RANGE_KEY,
> >
> >  globalMemStorePercent);
> >
> >  globalMemStorePercentMaxRange = conf.getFloat(
> > MEMSTORE_SIZE_MAX_RANGE_KEY,
> >
> >  globalMemStorePercent);
> >
> > Cheers
> >
> >
> > On Wed, Apr 9, 2014 at 3:17 PM, Asaf Mesika 
> wrote:
> >
> >> The Jira says it's enabled by auto. Is there an official explaining this
> >> feature?
> >>
> >> On Wednesday, April 9, 2014, Ted Yu  wrote:
> >>
> >>> Please take a look at http://www.n10k.com/blog/blockcache-101/
> >>>
> >>> For D, hbase.regionserver.global.memstore.size is specified in terms of
> >>> percentage of heap. Unless you enable HBASE-5349 'Automagically tweak
> >>> global memstore and block cache sizes based on workload'
> >>>
> >>>
> >>> On Wed, Apr 9, 2014 at 12:24 AM, gortiz  >
> >>> wrote:
> >>>
>  I've been reading the book definitive guide and hbase in action a
> >> little.
>  I found this question from Cloudera that I'm not sure after looking
> >> some
>  benchmarks and documentations from HBase. Could someone explain me a
> >>> little
>  about? . I think that when you do a large scan you should disable the
>  blockcache becuase the blocks are going to swat a lot, so you didn't
> >> get
>  anything from cache, I guess you should be penalized since you're
> >>> spending
>  memory, calling GC and CPU with this task.
> 
>  *You want to do a full table scan on your data. You decide to disable
>  block caching to see if this**
>  **improves scan performance. Will disabling block caching improve scan
>  performance?*
> 
>  A.
>  No. Disabling block caching does not improve scan performance.
> 
>  B.
>  Yes. When you disable block caching, you free up that memory for other
>  operations. With a full
>  table scan, you cannot take advantage of block caching anyway because
> >>> your
>  entire table won't fit
>  into cache.
> 
>  C.
>  No. If you disable block caching, HBase must read each block index
> from
>  disk for each scan,
>  thereby decreasing scan performance.
> 
>  D.
>  Yes. When you disable block caching, you free up memory for MemStore,
>  which improves,
>  scan performance.
> 
> 
>
>
> --
> *Guillermo Ortiz*
> /Big Data Developer/
>
> Telf.: +34 917 680 490
> Fax: +34 913 833 301
> C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
>
> _http://www.bidoop.es_
>
>
>
>
>


HFile size writeup in HBase Blog

2014-04-11 Thread Doug Meil
Hey folks,

Stack published a writeup I did on the HBase blog on the effects of rowkey 
size, column-name size, CF compression, data block encoding and KV storage 
approach on HFile size.  For example, had large row keys vs. small row keys, 
used Snappy vs. LZO vs. etc., used prefix vs. fast-diff, used a KV per column 
vs. a single KV per row.  We tried 'em all... and wrote it up.

http://blogs.apache.org/hbase/


Doug Meil
Chief Software Architect, Explorys
doug.m...@explorysmedical.com




Re: BlockCache for large scans.

2014-04-11 Thread lars hofhansl
Yep. For all of our M/R jobs we do indeed disable the caching of blocks.
In fact TableInputFormat sets cache blocks to false currently anyway.


-- Lars




 From: Jean-Marc Spaggiari 
To: user ; lars hofhansl  
Sent: Friday, April 11, 2014 6:54 AM
Subject: Re: BlockCache for large scans.
 


Hi Lars,

So just to continue on that, when we are do MR jobs with HBase, this should be 
disable too since we will read the entire table, right? Is this done by default 
or it's something the client should setup manually? On my own code I setup this 
manually. I looked into TableMapReduceUtil.initTableMapperJob and there is 
nothing there. Should we not just set CacheBlocks to false in 
initTableMapperJob directly?

JM




2014-04-10 14:50 GMT-04:00 lars hofhansl :

Generally (and this is database lore not just HBase) if you use an LRU type 
cache, your working set does not fit into the cache, and you repeatedly scan 
this working set you have created the worst case scenario. The database does 
all the work caching the blocks, and subsequent scans will need block that were 
just evicted towards end of the previous scan.
>
>For large scans where it is likely that the entire scan does not fit into the 
>block cache, you should absolutely disable caching the blocks traversed for 
>this scan (i.e. scan.setCacheBlocks(false)). Index blocks are not affected, 
>they are cached regardless.
>
>-- Lars
>
>
>
>
> From: gortiz 
>To: user@hbase.apache.org
>Sent: Wednesday, April 9, 2014 11:37 PM
>Subject: Re: BlockCache for large scans.
>
>
>
>But, I think there's a direct relation between improving performance in
>large scan and memory for memstore. Until I understand, memstore just
>work as cache to write operations.
>
>
>On 09/04/14 23:44, Ted Yu wrote:
>> Didn't quite get what you mean, Asaf.
>>
>> If you're talking about HBASE-5349, please read release note of HBASE-5349.
>>
>> By default, memstore min/max range is initialized to memstore percent:
>>
>>      globalMemStorePercentMinRange = conf.getFloat(
>> MEMSTORE_SIZE_MIN_RANGE_KEY,
>>
>>          globalMemStorePercent);
>>
>>      globalMemStorePercentMaxRange = conf.getFloat(
>> MEMSTORE_SIZE_MAX_RANGE_KEY,
>>
>>          globalMemStorePercent);
>>
>> Cheers
>>
>>
>> On Wed, Apr 9, 2014 at 3:17 PM, Asaf Mesika  wrote:
>>
>>> The Jira says it's enabled by auto. Is there an official explaining this
>>> feature?
>>>
>>> On Wednesday, April 9, 2014, Ted Yu  wrote:
>>>
 Please take a look at http://www.n10k.com/blog/blockcache-101/

 For D, hbase.regionserver.global.memstore.size is specified in terms of
 percentage of heap. Unless you enable HBASE-5349 'Automagically tweak
 global memstore and block cache sizes based on workload'


 On Wed, Apr 9, 2014 at 12:24 AM, gortiz >
 wrote:

> I've been reading the book definitive guide and hbase in action a
>>> little.
> I found this question from Cloudera that I'm not sure after looking
>>> some
> benchmarks and documentations from HBase. Could someone explain me a
 little
> about? . I think that when you do a large scan you should disable the
> blockcache becuase the blocks are going to swat a lot, so you didn't
>>> get
> anything from cache, I guess you should be penalized since you're
 spending
> memory, calling GC and CPU with this task.
>
> *You want to do a full table scan on your data. You decide to disable
> block caching to see if this**
> **improves scan performance. Will disabling block caching improve scan
> performance?*
>
> A.
> No. Disabling block caching does not improve scan performance.
>
> B.
> Yes. When you disable block caching, you free up that memory for other
> operations. With a full
> table scan, you cannot take advantage of block caching anyway because
 your
> entire table won't fit
> into cache.
>
> C.
> No. If you disable block caching, HBase must read each block index from
> disk for each scan,
> thereby decreasing scan performance.
>
> D.
> Yes. When you disable block caching, you free up memory for MemStore,
> which improves,
> scan performance.
>
>
>
>
>--
>*Guillermo Ortiz*
>/Big Data Developer/
>
>Telf.: +34 917 680 490
>Fax: +34 913 833 301
>C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
>
>_http://www.bidoop.es_

Re: BlockCache for large scans.

2014-04-11 Thread Stack
On Fri, Apr 11, 2014 at 6:54 AM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> Hi Lars,
>
> So just to continue on that, when we are do MR jobs with HBase, this should
> be disable too since we will read the entire table, right? Is this done by
> default or it's something the client should setup manually? On my own code
> I setup this manually. I looked into TableMapReduceUtil.initTableMapperJob
> and there is nothing there. Should we not just set CacheBlocks to false in
> initTableMapperJob directly?
>

Yes.  Sounds right.
St.Ack


Re: endpoint coprocessor

2014-04-11 Thread Ted Yu
Please take a look at:
hbase-shell/src/main/ruby/hbase/security.rb

for an example of how a coprocessor is invoked from the shell.

Cheers


On Fri, Apr 11, 2014 at 11:06 AM, Bogala, Chandra Reddy <
chandra.bog...@gs.com> wrote:

> Thank you. I am aware of this challenge. How to call below coprocessor
> from client. Can I call this coprocessor from hbase shell?.  I am new to
> Hbase. So may be asking very dumb questions.
>
> Thanks,
> Chandra
>
> -Original Message-
> From: Asaf Mesika [mailto:asaf.mes...@gmail.com]
> Sent: Friday, April 11, 2014 12:10 PM
> To: user@hbase.apache.org
> Subject: Re: endpoint coprocessor
>
> Bear in mind each region will return its top n, then you will have to run
> another top n in your client code. This introduce a numerical error : top
> on top.
>
> On Thursday, April 10, 2014, Bogala, Chandra Reddy 
> wrote:
>
> > Hi,
> > I am planning to write endpoint coprocessor to calculate TOP N results
> > for my usecase.  I got confused with old apis and new apis.
> > I followed below links and try to implement. But looks like api's
> > changed a lot. I don't see many of these classes in hbase jars. We are
> > using Hbase 0.96.
> > Can anyone point to the latest document/apis?. And if possible sample
> > code to calculate top N.
> >
> > https://blogs.apache.org/hbase/entry/coprocessor_introduction
> > HBase Coprocessors - Deploy shared functionality directly on the
> > cluster
> >
> > Thanks,
> > Chandra
> >
> >
> >
>


RE: endpoint coprocessor

2014-04-11 Thread Bogala, Chandra Reddy
Thank you. I am aware of this challenge. How do I call the below coprocessor from a
client? Can I call this coprocessor from the HBase shell?  I am new to HBase, so I
may be asking very dumb questions.

Thanks,
Chandra

-Original Message-
From: Asaf Mesika [mailto:asaf.mes...@gmail.com] 
Sent: Friday, April 11, 2014 12:10 PM
To: user@hbase.apache.org
Subject: Re: endpoint coprocessor

Bear in mind each region will return its top n, then you will have to run 
another top n in your client code. This introduce a numerical error : top on 
top.

On Thursday, April 10, 2014, Bogala, Chandra Reddy 
wrote:

> Hi,
> I am planning to write endpoint coprocessor to calculate TOP N results 
> for my usecase.  I got confused with old apis and new apis.
> I followed below links and try to implement. But looks like api's 
> changed a lot. I don't see many of these classes in hbase jars. We are 
> using Hbase 0.96.
> Can anyone point to the latest document/apis?. And if possible sample 
> code to calculate top N.
>
> https://blogs.apache.org/hbase/entry/coprocessor_introduction
> HBase Coprocessors - Deploy shared functionality directly on the 
> cluster
>
> Thanks,
> Chandra
>
>
>


Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Guillermo Ortiz
Okay, thank you, I'll check it this Monday. I didn't know that a Scan checks
all the versions.
So, I was checking each column and each version, although it only showed me
the newest version because I didn't indicate anything about the VERSIONS
attribute. It makes sense that it takes so long.


2014-04-11 16:57 GMT+02:00 Ted Yu :

> In your previous example:
> scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"}
>
> there was no expression w.r.t. timestamp. See the following javadoc from
> Scan.java:
>
>  * To only retrieve columns within a specific range of version timestamps,
>
>  * execute {@link #setTimeRange(long, long) setTimeRange}.
>
>  * 
>
>  * To only retrieve columns with a specific timestamp, execute
>
>  * {@link #setTimeStamp(long) setTimestamp}.
>
> You can use one of the above methods to make your scan more selective.
>
>
> ValueFilter#filterKeyValue(Cell) doesn't utilize advanced feature of
> ReturnCode. You can refer to:
>
>
> https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.ReturnCode.html
>
> You can take a look at SingleColumnValueFilter#filterKeyValue() for example
> of how various ReturnCode's are used to speed up scan.
>
> Cheers
>
>
> On Fri, Apr 11, 2014 at 8:40 AM, Guillermo Ortiz  >wrote:
>
> > I read something interesting about it in HBase TDG.
> >
> > Page 344:
> > The StoreScanner class combines the store files and memstore that the
> > Store instance
> > contains. It is also where the exclusion happens, based on the Bloom
> > filter, or the timestamp. If you are asking for versions that are not
> more
> > than 30 minutes old, for example, you can skip all storage files that are
> > older than one hour: they will not contain anything of interest. See "Key
> > Design" on page 357 for details on the exclusion, and how to make use of
> > it.
> >
> > So, I guess that it doesn't have to read all the HFiles?? But, I don't
> know
> > if HBase really uses the timestamp of each row or the date of the file. I
> > guess when I execute the scan, it reads everything, but, I don't know
> why.
> > I think there's something else that I don't see so that everything works
> to
> > me.
> >
> >
> > 2014-04-11 13:05 GMT+02:00 gortiz :
> >
> > > Sorry, I didn't get it why it should read all the timestamps and not
> just
> > > the newest it they're sorted and you didn't specific any timestamp in
> > your
> > > filter.
> > >
> > >
> > >
> > > On 11/04/14 12:13, Anoop John wrote:
> > >
> > >> In the storage layer (HFiles in HDFS) all versions of a particular
> cell
> > >> will be staying together.  (Yes it has to be lexicographically ordered
> > >> KVs). So during a scan we will have to read all the version data.  At
> > this
> > >> storage layer it doesn't know the versions stuff etc.
> > >>
> > >> -Anoop-
> > >>
> > >> On Fri, Apr 11, 2014 at 3:33 PM, gortiz  wrote:
> > >>
> > >>  Yes, I have tried with two different values for that value of
> versions,
> > >>> 1000 and maximum value for integers.
> > >>>
> > >>> But, I want to keep those versions. I don't want to keep just 3
> > versions.
> > >>> Imagine that I want to record a new version each minute and store a
> > day,
> > >>> those are 1440 versions.
> > >>>
> > >>> Why is HBase going to read all the versions?? , I thought, if you
> don't
> > >>> indicate any versions it's just read the newest and skip the rest. It
> > >>> doesn't make too much sense to read all of them if data is sorted,
> plus
> > >>> the
> > >>> newest version is stored in the top.
> > >>>
> > >>>
> > >>>
> > >>> On 11/04/14 11:54, Anoop John wrote:
> > >>>
> > >>>What is the max version setting u have done for ur table cf?
>  When u
> >  set
> >  some a value, HBase has to keep all those versions.  During a scan
> it
> >  will
> >  read all those versions. In 94 version the default value for the max
> >  versions is 3.  I guess you have set some bigger value.   If u have
> > not,
> >  mind testing after a major compaction?
> > 
> >  -Anoop-
> > 
> >  On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:
> > 
> >    Last test I have done it's to reduce the number of versions to
> 100.
> > 
> > > So, right now, I have 100 rows with 100 versions each one.
> > > Times are: (I got the same times for blocksize of 64Ks and 1Mb)
> > > 100row-1000versions + blockcache-> 80s.
> > > 100row-1000versions + No blockcache-> 70s.
> > >
> > > 100row-*100*versions + blockcache-> 7.3s.
> > > 100row-*100*versions + No blockcache-> 6.1s.
> > >
> > > What's the reasons of this? I guess HBase is enough smart for not
> > > consider
> > > old versions, so, it just checks the newest. But, I reduce 10 times
> > the
> > > size (in versions) and I got a 10x of performance.
> > >
> > > The filter is scan 'filters', {FILTER => "ValueFilter(=,
> > > 'binary:5')",STARTROW =>
> '10100101',
> > > STOPROW => '601000

Phoenix Testing HBASE-10850

2014-04-11 Thread Anoop John
Hi James
 Sorry for being late.
I have tested the same scenario. This works fine with Phoenix. :-)
Phoenix uses its own Filter, not SCVF. In the Phoenix Filter, hasFilterRow() is
not implemented, so by default it returns false. So the old 0.94 way of
filtering happens even with the 0.98.1 code, and things work perfectly.

I can see the issue PHOENIX-910. If that is implemented then we will end up
with issues, so take care while doing PHOENIX-910.
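
To illustrate the hasFilterRow() point, a hypothetical filter sketch (not the actual
Phoenix filter; the class name and logic are made up): extending FilterBase without
overriding hasFilterRow() keeps the default of false, and therefore the old filtering
path.

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.filter.FilterBase;

// Hypothetical filter sketch, not the real Phoenix filter: by *not* overriding
// hasFilterRow(), it inherits the default "return false" from FilterBase, so the
// region server keeps the pre-HBASE-10850 (0.94-style) filtering path.
public class MyEndpointFriendlyFilter extends FilterBase {
  @Override
  public ReturnCode filterKeyValue(Cell cell) {
    return ReturnCode.INCLUDE;  // trivial example: keep every cell
  }
  // hasFilterRow() deliberately not overridden -> defaults to false
}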

-Anoop-


On Thu, Apr 3, 2014 at 10:43 PM, James Taylor wrote:

> +1 to Andrew's suggestion. @Anoop - would you mind verifying whether or not
> the TestSCVFWithMiniCluster written as a Phoenix query returns the correct
> results?
>
>
> On Thu, Apr 3, 2014 at 9:34 AM, Andrew Purtell  >wrote:
>
> > This would be my preference also.
> >
> > Can someone provide a definitive statement on if a critical/blocker bug
> > exists for Phoenix or not? If not, we have sufficient votes at this point
> > to carry the RC and can go forward with the release at the end of the
> vote
> > period.
> >
> >
> > > On Apr 3, 2014, at 5:57 PM, James Taylor 
> wrote:
> > >
> > > I implore you to stick with releasing RC3. Phoenix 4.0 has no release
> it
> > > can currently run on. Phoenix doesn't use SingleColumnValueFilter, so
> it
> > > seems that HBASE-10850 has no impact wrt Phoenix. Can't we get these
> > > additional bugs in 0.98.2 - it's one month away [1]?
> > >
> > >James
> > >
> > > [1] http://en.wikipedia.org/wiki/The_Mythical_Man-Month
> > >
> > >
> > > On Thu, Apr 3, 2014 at 3:34 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasude...@gmail.com> wrote:
> > >
> > >> Will target HBASE-10899 also then by that time.
> > >>
> > >> Regards
> > >> Ram
> > >>
> > >>
> > >>> On Thu, Apr 3, 2014 at 3:47 PM, Ted Yu  wrote:
> > >>>
> > >>> Understood, Andy.
> > >>>
> > >>> I have integrated fix for HBASE-10850 to 0.98
> > >>>
> > >>> Cheers
> > >>>
> > >>>
> > >>> On Thu, Apr 3, 2014 at 3:00 AM, Andrew Purtell <
> > andrew.purt...@gmail.com
> >  wrote:
> > >>>
> >  I will sink this RC and roll a new one tomorrow.
> > 
> >  However, I may very well release the next RC even if I am the only
> +1
> > >>> vote
> >  and testing it causes your workstation to catch fire. So please take
> > >> the
> >  time to commit whatever you feel is needed to the 0.98 branch or
> file
> >  blockers against 0.98.1 in the next 24 hours. This is it for 0.98.1.
> >  0.98.2 will happen a mere 30 days from the 0.98.1 release.
> > 
> > > On Apr 3, 2014, at 11:21 AM, Ted Yu  wrote:
> > >
> > > I agree with Anoop's assessment.
> > >
> > > Cheers
> > >
> > >> On Apr 3, 2014, at 2:19 AM, Anoop John 
> > >> wrote:
> > >>
> > >> After analysing HBASE-10850  I think better we can fix this in
> 98.1
> >  release
> > >> itself.  Also Phoenix plan to use this 98.1 and Phoenix uses
> > >> essential
> >  CF
> > >> optimization.
> > >>
> > >> Also HBASE-10854 can be included in 98.1 in such a case,
> > >>
> > >> Considering those we need a new RC.
> > >>
> > >> -Anoop-
> > >>
> > >> On Tue, Apr 1, 2014 at 10:19 AM, ramkrishna vasudevan <
> > >> ramkrishna.s.vasude...@gmail.com> wrote:
> > >>
> > >>> +1 on the RC.
> > >>> Checked the signature.
> > >>> Downloaded the source, built and ran the testcases.
> > >>> Ran Integration Tests with ACL and Visibility labels.  Everything
> > >>> looks
> > >>> fine.
> > >>> Compaction, flushes etc too.
> > >>>
> > >>> Regards
> > >>> Ram
> > >>>
> > >>>
> > >>>
> >  On Tue, Apr 1, 2014 at 2:14 AM, Elliott Clark <
> ecl...@apache.org>
> >  wrote:
> > 
> >  +1
> > 
> >  Checked the hash
> >  Checked the tar layout.
> >  Played with a single node.  Everything seemed good after ITBLL
> > 
> > 
> > > On Mon, Mar 31, 2014 at 9:23 AM, Stack 
> wrote:
> > >
> > > +1
> > >
> > > The hash is good.  Doc. and layout looks good.  UI seems fine.
> > >
> > > Ran on small cluster w/ default hadoop 2.2 in hbase against a
> tip
> > >>> of
> > >>> the
> > > branch hadoop 2.4 cluster.  Seems to basically work (small big
> > >>> linked
> >  list
> > > test worked).
> > >
> > > TSDB seems to work fine against this RC.
> > >
> > > I don't mean to be stealing our Jon's thunder but in case he is
> > >> too
> > > occupied to vote here, I'll note that he has gotten our
> internal
> > >>> rig
> > > running against the tip of the 0.98 branch and it has been
> > >> passing
> > >>> green
> > > running IT tests on a small cluster over hours.
> > >
> > > St.Ack
> > >
> > >
> > >
> > >
> > > On Sun, Mar 30, 2014 at 12:49 AM, Andrew Purtell <
> >  apurt...@apache.org
> > 

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Ted Yu
In your previous example:
scan 'table1', {FILTER => "ValueFilter(=, 'binary:5')"}

there was no expression w.r.t. timestamp. See the following javadoc from
Scan.java:

 * To only retrieve columns within a specific range of version timestamps,

 * execute {@link #setTimeRange(long, long) setTimeRange}.

 * 

 * To only retrieve columns with a specific timestamp, execute

 * {@link #setTimeStamp(long) setTimestamp}.

You can use one of the above methods to make your scan more selective.
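
For instance, a more selective version of that scan might look like the following in the
Java API (just a sketch - the table name, value, and 30-minute window are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class SelectiveScanSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "table1");   // placeholder table name
    try {
      long now = System.currentTimeMillis();
      Scan scan = new Scan();
      scan.setMaxVersions(1);                          // only the newest version of each cell
      scan.setTimeRange(now - 30 * 60 * 1000L, now);   // only versions from the last 30 minutes
      scan.setFilter(new ValueFilter(CompareFilter.CompareOp.EQUAL,
          new BinaryComparator(Bytes.toBytes("5"))));
      ResultScanner scanner = table.getScanner(scan);
      for (Result r : scanner) {
        // process matching rows
      }
      scanner.close();
    } finally {
      table.close();
    }
  }
}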


ValueFilter#filterKeyValue(Cell) doesn't utilize the advanced features of
ReturnCode. You can refer to:

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.ReturnCode.html

You can take a look at SingleColumnValueFilter#filterKeyValue() for an example
of how the various ReturnCodes are used to speed up a scan.

Cheers


On Fri, Apr 11, 2014 at 8:40 AM, Guillermo Ortiz wrote:

> I read something interesting about it in HBase TDG.
>
> Page 344:
> The StoreScanner class combines the store files and memstore that the
> Store instance
> contains. It is also where the exclusion happens, based on the Bloom
> filter, or the timestamp. If you are asking for versions that are not more
> than 30 minutes old, for example, you can skip all storage files that are
> older than one hour: they will not contain anything of interest. See "Key
> Design" on page 357 for details on the exclusion, and how to make use of
> it.
>
> So, I guess that it doesn't have to read all the HFiles?? But, I don't know
> if HBase really uses the timestamp of each row or the date of the file. I
> guess when I execute the scan, it reads everything, but, I don't know why.
> I think there's something else that I don't see so that everything works to
> me.
>
>
> 2014-04-11 13:05 GMT+02:00 gortiz :
>
> > Sorry, I didn't get it why it should read all the timestamps and not just
> > the newest it they're sorted and you didn't specific any timestamp in
> your
> > filter.
> >
> >
> >
> > On 11/04/14 12:13, Anoop John wrote:
> >
> >> In the storage layer (HFiles in HDFS) all versions of a particular cell
> >> will be staying together.  (Yes it has to be lexicographically ordered
> >> KVs). So during a scan we will have to read all the version data.  At
> this
> >> storage layer it doesn't know the versions stuff etc.
> >>
> >> -Anoop-
> >>
> >> On Fri, Apr 11, 2014 at 3:33 PM, gortiz  wrote:
> >>
> >>  Yes, I have tried with two different values for that value of versions,
> >>> 1000 and maximum value for integers.
> >>>
> >>> But, I want to keep those versions. I don't want to keep just 3
> versions.
> >>> Imagine that I want to record a new version each minute and store a
> day,
> >>> those are 1440 versions.
> >>>
> >>> Why is HBase going to read all the versions?? , I thought, if you don't
> >>> indicate any versions it's just read the newest and skip the rest. It
> >>> doesn't make too much sense to read all of them if data is sorted, plus
> >>> the
> >>> newest version is stored in the top.
> >>>
> >>>
> >>>
> >>> On 11/04/14 11:54, Anoop John wrote:
> >>>
> >>>What is the max version setting u have done for ur table cf?  When u
>  set
>  some a value, HBase has to keep all those versions.  During a scan it
>  will
>  read all those versions. In 94 version the default value for the max
>  versions is 3.  I guess you have set some bigger value.   If u have
> not,
>  mind testing after a major compaction?
> 
>  -Anoop-
> 
>  On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:
> 
>    Last test I have done it's to reduce the number of versions to 100.
> 
> > So, right now, I have 100 rows with 100 versions each one.
> > Times are: (I got the same times for blocksize of 64Ks and 1Mb)
> > 100row-1000versions + blockcache-> 80s.
> > 100row-1000versions + No blockcache-> 70s.
> >
> > 100row-*100*versions + blockcache-> 7.3s.
> > 100row-*100*versions + No blockcache-> 6.1s.
> >
> > What's the reasons of this? I guess HBase is enough smart for not
> > consider
> > old versions, so, it just checks the newest. But, I reduce 10 times
> the
> > size (in versions) and I got a 10x of performance.
> >
> > The filter is scan 'filters', {FILTER => "ValueFilter(=,
> > 'binary:5')",STARTROW => '10100101',
> > STOPROW => '60100201'}
> >
> >
> >
> > On 11/04/14 09:04, gortiz wrote:
> >
> >   Well, I guessed that, what it doesn't make too much sense because
> > it's
> >
> >> so
> >> slow. I only have right now 100 rows with 1000 versions each row.
> >> I have checked the size of the dataset and each row is about
> 700Kbytes
> >> (around 7Gb, 100rowsx1000versions). So, it should only check 100
> rows
> >> x
> >> 700Kbytes = 70Mb, since it just check the newest version. How can it
> >> spend
> >> too many time checking this quantity of data?
> 

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Guillermo Ortiz
I read something interesting about it in HBase TDG.

Page 344:
The StoreScanner class combines the store files and memstore that the
Store instance
contains. It is also where the exclusion happens, based on the Bloom
filter, or the timestamp. If you are asking for versions that are not more
than 30 minutes old, for example, you can skip all storage files that are
older than one hour: they will not contain anything of interest. See "Key
Design" on page 357 for details on the exclusion, and how to make use of
it.

So, I guess that it doesn't have to read all the HFiles?? But, I don't know
if HBase really uses the timestamp of each row or the date of the file. I
guess when I execute the scan, it reads everything, but, I don't know why.
I think there's something else here that I'm not seeing, which would explain why it
works this way for me.


2014-04-11 13:05 GMT+02:00 gortiz :

> Sorry, I didn't get it why it should read all the timestamps and not just
> the newest it they're sorted and you didn't specific any timestamp in your
> filter.
>
>
>
> On 11/04/14 12:13, Anoop John wrote:
>
>> In the storage layer (HFiles in HDFS) all versions of a particular cell
>> will be staying together.  (Yes it has to be lexicographically ordered
>> KVs). So during a scan we will have to read all the version data.  At this
>> storage layer it doesn't know the versions stuff etc.
>>
>> -Anoop-
>>
>> On Fri, Apr 11, 2014 at 3:33 PM, gortiz  wrote:
>>
>>  Yes, I have tried with two different values for that value of versions,
>>> 1000 and maximum value for integers.
>>>
>>> But, I want to keep those versions. I don't want to keep just 3 versions.
>>> Imagine that I want to record a new version each minute and store a day,
>>> those are 1440 versions.
>>>
>>> Why is HBase going to read all the versions?? , I thought, if you don't
>>> indicate any versions it's just read the newest and skip the rest. It
>>> doesn't make too much sense to read all of them if data is sorted, plus
>>> the
>>> newest version is stored in the top.
>>>
>>>
>>>
>>> On 11/04/14 11:54, Anoop John wrote:
>>>
>>>What is the max version setting u have done for ur table cf?  When u
 set
 some a value, HBase has to keep all those versions.  During a scan it
 will
 read all those versions. In 94 version the default value for the max
 versions is 3.  I guess you have set some bigger value.   If u have not,
 mind testing after a major compaction?

 -Anoop-

 On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:

   Last test I have done it's to reduce the number of versions to 100.

> So, right now, I have 100 rows with 100 versions each one.
> Times are: (I got the same times for blocksize of 64Ks and 1Mb)
> 100row-1000versions + blockcache-> 80s.
> 100row-1000versions + No blockcache-> 70s.
>
> 100row-*100*versions + blockcache-> 7.3s.
> 100row-*100*versions + No blockcache-> 6.1s.
>
> What's the reasons of this? I guess HBase is enough smart for not
> consider
> old versions, so, it just checks the newest. But, I reduce 10 times the
> size (in versions) and I got a 10x of performance.
>
> The filter is scan 'filters', {FILTER => "ValueFilter(=,
> 'binary:5')",STARTROW => '10100101',
> STOPROW => '60100201'}
>
>
>
> On 11/04/14 09:04, gortiz wrote:
>
>   Well, I guessed that, what it doesn't make too much sense because
> it's
>
>> so
>> slow. I only have right now 100 rows with 1000 versions each row.
>> I have checked the size of the dataset and each row is about 700Kbytes
>> (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows
>> x
>> 700Kbytes = 70Mb, since it just check the newest version. How can it
>> spend
>> too many time checking this quantity of data?
>>
>> I'm generating again the dataset with a bigger blocksize (previously
>> was
>> 64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
>> baching parameters, but I don't think they're going to affect too
>> much.
>>
>> Another test I want to do, it's generate the same dataset with just
>> 100versions, It should spend around the same time, right? Or am I
>> wrong?
>>
>> On 10/04/14 18:08, Ted Yu wrote:
>>
>>   It should be newest version of each value.
>>
>>> Cheers
>>>
>>>
>>> On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:
>>>
>>> Another little question is, when the filter I'm using, Do I check all
>>> the
>>>
>>>versions? or just the newest? Because, I'm wondering if when I do
 a
 scan
 over all the table, I look for the value "5" in all the dataset or
 I'm
 just
 looking for in one newest version of each value.


 On 10/04/14 16:52, gortiz wrote:

 I was trying to che

Re: BlockCache for large scans.

2014-04-11 Thread Jean-Marc Spaggiari
Hi Lars,

So just to continue on that, when we do MR jobs with HBase, this should
be disabled too since we will read the entire table, right? Is this done by
default or is it something the client should set up manually? In my own code
I set this up manually. I looked into TableMapReduceUtil.initTableMapperJob
and there is nothing there. Should we not just set cacheBlocks to false in
initTableMapperJob directly?

JM


2014-04-10 14:50 GMT-04:00 lars hofhansl :

> Generally (and this is database lore not just HBase) if you use an LRU
> type cache, your working set does not fit into the cache, and you
> repeatedly scan this working set you have created the worst case scenario.
> The database does all the work caching the blocks, and subsequent scans
> will need block that were just evicted towards end of the previous scan.
>
> For large scans where it is likely that the entire scan does not fit into
> the block cache, you should absolutely disable caching the blocks traversed
> for this scan (i.e. scan.setCacheBlocks(false)). Index blocks are not
> affected, they are cached regardless.
>
> -- Lars
>
>
>
> 
>  From: gortiz 
> To: user@hbase.apache.org
> Sent: Wednesday, April 9, 2014 11:37 PM
> Subject: Re: BlockCache for large scans.
>
>
> But, I think there's a direct relation between improving performance in
> large scan and memory for memstore. Until I understand, memstore just
> work as cache to write operations.
>
>
> On 09/04/14 23:44, Ted Yu wrote:
> > Didn't quite get what you mean, Asaf.
> >
> > If you're talking about HBASE-5349, please read release note of
> HBASE-5349.
> >
> > By default, memstore min/max range is initialized to memstore percent:
> >
> >  globalMemStorePercentMinRange = conf.getFloat(
> > MEMSTORE_SIZE_MIN_RANGE_KEY,
> >
> >  globalMemStorePercent);
> >
> >  globalMemStorePercentMaxRange = conf.getFloat(
> > MEMSTORE_SIZE_MAX_RANGE_KEY,
> >
> >  globalMemStorePercent);
> >
> > Cheers
> >
> >
> > On Wed, Apr 9, 2014 at 3:17 PM, Asaf Mesika 
> wrote:
> >
> >> The Jira says it's enabled by auto. Is there an official explaining this
> >> feature?
> >>
> >> On Wednesday, April 9, 2014, Ted Yu  wrote:
> >>
> >>> Please take a look at http://www.n10k.com/blog/blockcache-101/
> >>>
> >>> For D, hbase.regionserver.global.memstore.size is specified in terms of
> >>> percentage of heap. Unless you enable HBASE-5349 'Automagically tweak
> >>> global memstore and block cache sizes based on workload'
> >>>
> >>>
> >>> On Wed, Apr 9, 2014 at 12:24 AM, gortiz  >
> >>> wrote:
> >>>
>  I've been reading the book definitive guide and hbase in action a
> >> little.
>  I found this question from Cloudera that I'm not sure after looking
> >> some
>  benchmarks and documentations from HBase. Could someone explain me a
> >>> little
>  about? . I think that when you do a large scan you should disable the
>  blockcache becuase the blocks are going to swat a lot, so you didn't
> >> get
>  anything from cache, I guess you should be penalized since you're
> >>> spending
>  memory, calling GC and CPU with this task.
> 
>  *You want to do a full table scan on your data. You decide to disable
>  block caching to see if this**
>  **improves scan performance. Will disabling block caching improve scan
>  performance?*
> 
>  A.
>  No. Disabling block caching does not improve scan performance.
> 
>  B.
>  Yes. When you disable block caching, you free up that memory for other
>  operations. With a full
>  table scan, you cannot take advantage of block caching anyway because
> >>> your
>  entire table won't fit
>  into cache.
> 
>  C.
>  No. If you disable block caching, HBase must read each block index
> from
>  disk for each scan,
>  thereby decreasing scan performance.
> 
>  D.
>  Yes. When you disable block caching, you free up memory for MemStore,
>  which improves,
>  scan performance.
> 
> 
>
>
> --
> *Guillermo Ortiz*
> /Big Data Developer/
>
> Telf.: +34 917 680 490
> Fax: +34 913 833 301
> C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
>
> _http://www.bidoop.es_
>


Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz
Sorry, I didn't get why it should read all the timestamps and not
just the newest if they're sorted and you didn't specify any timestamp
in your filter.



On 11/04/14 12:13, Anoop John wrote:

In the storage layer (HFiles in HDFS) all versions of a particular cell
will be staying together.  (Yes it has to be lexicographically ordered
KVs). So during a scan we will have to read all the version data.  At this
storage layer it doesn't know the versions stuff etc.

-Anoop-

On Fri, Apr 11, 2014 at 3:33 PM, gortiz  wrote:


Yes, I have tried with two different values for that value of versions,
1000 and maximum value for integers.

But, I want to keep those versions. I don't want to keep just 3 versions.
Imagine that I want to record a new version each minute and store a day,
those are 1440 versions.

Why is HBase going to read all the versions?? , I thought, if you don't
indicate any versions it's just read the newest and skip the rest. It
doesn't make too much sense to read all of them if data is sorted, plus the
newest version is stored in the top.



On 11/04/14 11:54, Anoop John wrote:


  What is the max version setting u have done for ur table cf?  When u set
some a value, HBase has to keep all those versions.  During a scan it will
read all those versions. In 94 version the default value for the max
versions is 3.  I guess you have set some bigger value.   If u have not,
mind testing after a major compaction?

-Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:

  Last test I have done it's to reduce the number of versions to 100.

So, right now, I have 100 rows with 100 versions each one.
Times are: (I got the same times for blocksize of 64Ks and 1Mb)
100row-1000versions + blockcache-> 80s.
100row-1000versions + No blockcache-> 70s.

100row-*100*versions + blockcache-> 7.3s.
100row-*100*versions + No blockcache-> 6.1s.

What's the reasons of this? I guess HBase is enough smart for not
consider
old versions, so, it just checks the newest. But, I reduce 10 times the
size (in versions) and I got a 10x of performance.

The filter is scan 'filters', {FILTER => "ValueFilter(=,
'binary:5')",STARTROW => '10100101',
STOPROW => '60100201'}



On 11/04/14 09:04, gortiz wrote:

  Well, I guessed that, what it doesn't make too much sense because it's

so
slow. I only have right now 100 rows with 1000 versions each row.
I have checked the size of the dataset and each row is about 700Kbytes
(around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
700Kbytes = 70Mb, since it just check the newest version. How can it
spend
too many time checking this quantity of data?

I'm generating again the dataset with a bigger blocksize (previously was
64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
baching parameters, but I don't think they're going to affect too much.

Another test I want to do, it's generate the same dataset with just
100versions, It should spend around the same time, right? Or am I wrong?

On 10/04/14 18:08, Ted Yu wrote:

  It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:

Another little question is, when the filter I'm using, Do I check all
the


  versions? or just the newest? Because, I'm wondering if when I do a
scan
over all the table, I look for the value "5" in all the dataset or I'm
just
looking for in one newest version of each value.


On 10/04/14 16:52, gortiz wrote:

I was trying to check the behaviour of HBase. The cluster is a group
of


old computers, one master, five slaves, each one with 2Gb, so, 12gb
in
total.
The table has a column family with 1000 columns and each column with
100
versions.
There's another column faimily with four columns an one image of
100kb.
(I've tried without this column family as well.)
The table is partitioned manually in all the slaves, so data are
balanced
in the cluster.

I'm executing this sentence *scan 'table1', {FILTER =>
"ValueFilter(=,
'binary:5')"* in HBase 0.94.6
My time for lease and rpc is three minutes.
Since, it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disable and enable, not about the size of
it). I
thought that it was going to have too much calls to the GC. I'm not
sure
about this point.

I know that it's not the best way to use HBase, it's just a test. I
think
that it's not working because the hardware isn't enough, although, I
would
like to try some kind of tunning to improve it.








On 10/04/14 14:21, Ted Yu wrote:

Can you give us a bit more information:


HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz  wrote:

I got this error when I execute a full scan with filters about a
table.

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.

regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4165751462641113359' does not exist
 

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Anoop John
In the storage layer (HFiles in HDFS) all versions of a particular cell
stay together.  (Yes, they have to be lexicographically ordered
KVs.) So during a scan we will have to read all the version data.  At this
storage layer it doesn't know about the versions stuff etc.

-Anoop-

On Fri, Apr 11, 2014 at 3:33 PM, gortiz  wrote:

> Yes, I have tried with two different values for that value of versions,
> 1000 and maximum value for integers.
>
> But, I want to keep those versions. I don't want to keep just 3 versions.
> Imagine that I want to record a new version each minute and store a day,
> those are 1440 versions.
>
> Why is HBase going to read all the versions?? , I thought, if you don't
> indicate any versions it's just read the newest and skip the rest. It
> doesn't make too much sense to read all of them if data is sorted, plus the
> newest version is stored in the top.
>
>
>
> On 11/04/14 11:54, Anoop John wrote:
>
>>  What is the max version setting u have done for ur table cf?  When u set
>> some a value, HBase has to keep all those versions.  During a scan it will
>> read all those versions. In 94 version the default value for the max
>> versions is 3.  I guess you have set some bigger value.   If u have not,
>> mind testing after a major compaction?
>>
>> -Anoop-
>>
>> On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:
>>
>>  Last test I have done it's to reduce the number of versions to 100.
>>> So, right now, I have 100 rows with 100 versions each one.
>>> Times are: (I got the same times for blocksize of 64Ks and 1Mb)
>>> 100row-1000versions + blockcache-> 80s.
>>> 100row-1000versions + No blockcache-> 70s.
>>>
>>> 100row-*100*versions + blockcache-> 7.3s.
>>> 100row-*100*versions + No blockcache-> 6.1s.
>>>
>>> What's the reasons of this? I guess HBase is enough smart for not
>>> consider
>>> old versions, so, it just checks the newest. But, I reduce 10 times the
>>> size (in versions) and I got a 10x of performance.
>>>
>>> The filter is scan 'filters', {FILTER => "ValueFilter(=,
>>> 'binary:5')",STARTROW => '10100101',
>>> STOPROW => '60100201'}
>>>
>>>
>>>
>>> On 11/04/14 09:04, gortiz wrote:
>>>
>>>  Well, I guessed that, what it doesn't make too much sense because it's
 so
 slow. I only have right now 100 rows with 1000 versions each row.
 I have checked the size of the dataset and each row is about 700Kbytes
 (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
 700Kbytes = 70Mb, since it just check the newest version. How can it
 spend
 too many time checking this quantity of data?

 I'm generating again the dataset with a bigger blocksize (previously was
 64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
 baching parameters, but I don't think they're going to affect too much.

 Another test I want to do, it's generate the same dataset with just
 100versions, It should spend around the same time, right? Or am I wrong?

 On 10/04/14 18:08, Ted Yu wrote:

  It should be newest version of each value.
>
> Cheers
>
>
> On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:
>
> Another little question is, when the filter I'm using, Do I check all
> the
>
>>  versions? or just the newest? Because, I'm wondering if when I do a
>> scan
>> over all the table, I look for the value "5" in all the dataset or I'm
>> just
>> looking for in one newest version of each value.
>>
>>
>> On 10/04/14 16:52, gortiz wrote:
>>
>> I was trying to check the behaviour of HBase. The cluster is a group
>> of
>>
>>> old computers, one master, five slaves, each one with 2Gb, so, 12gb
>>> in
>>> total.
>>> The table has a column family with 1000 columns and each column with
>>> 100
>>> versions.
>>> There's another column faimily with four columns an one image of
>>> 100kb.
>>>(I've tried without this column family as well.)
>>> The table is partitioned manually in all the slaves, so data are
>>> balanced
>>> in the cluster.
>>>
>>> I'm executing this sentence *scan 'table1', {FILTER =>
>>> "ValueFilter(=,
>>> 'binary:5')"* in HBase 0.94.6
>>> My time for lease and rpc is three minutes.
>>> Since, it's a full scan of the table, I have been playing with the
>>> BLOCKCACHE as well (just disable and enable, not about the size of
>>> it). I
>>> thought that it was going to have too much calls to the GC. I'm not
>>> sure
>>> about this point.
>>>
>>> I know that it's not the best way to use HBase, it's just a test. I
>>> think
>>> that it's not working because the hardware isn't enough, although, I
>>> would
>>> like to try some kind of tunning to improve it.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 10/04/14 14:21, Ted Yu wro

Re: HBase cluster design

2014-04-11 Thread Flavio Pompermaier
Today I was able to catch an error during a MapReduce job that actually
mimics the rowCount more or less.
The error I saw is:

Could not sync. Requesting close of hlog
java.io.IOException: Reflection
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:230)
at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1141)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1245)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1100)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:228)
... 4 more
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on 
/hbase/.logs/host4,60020,1395928532020/host4%2C60020%2C1395928532020.1397205288300
File does not exist. Holder DFSClient_NONMAPREDUCE_-1746149332_40 does
not have any open files.
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2308)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2299)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2095)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

at org.apache.hadoop.ipc.Client.call(Client.java:1160)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy14.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy14.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)


What can be the cause of this error?

On Sat, Apr 5, 2014 at 2:25 PM, Michael Segel wrote:

> You have one other thing to consider.
>
> Did you oversubscribe on the m/r tuning side of things.
>
> Many people want to segment their HBase to a portion of the cluster.
> This should be the exception to the design not the primary cluster design.
>
> If you over subscribe your cluster, you will run out of memory, then you
> need to swap, and boom bad things happen.
>
> Also, while many suggest not reserving room for swap... I suggest that you
> do leave some room.
>
> While this doesn't address the issues in your question directly, they are
> something that you need to consider.
>
> More to your point...
> Poorly tuned HBase clusters can fail easily under heavy load.
>
> While Ted doesn't address this... consideration, it can become an issue.
>
> YMMV of course.
>
>
>
> On Apr 4, 2014, at 9:43 AM, Ted Yu  wrote:
>
> > The 'Connection refused' message was logged at WARN level.
> >
> > If you can pastebin more of the region server log before its crash, I
> would
> > be take a deeper look.
> >
> > BTW I assume your zookeeper quorum was healthy during that period of
> time.

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz
Yes, I have tried two different values for the number of versions:
1000 and the maximum value for integers.


But I want to keep those versions. I don't want to keep just 3
versions. Imagine that I want to record a new version each minute and
store it for a day; those are 1440 versions.


Why is HBase going to read all the versions? I thought that if you don't
indicate any versions it just reads the newest and skips the rest. It
doesn't make much sense to read all of them if the data is sorted, plus
the newest version is stored at the top.
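
For reference, this is roughly how I am defining that column family (a sketch with
placeholder table and family names, using the 0.94-style HTableDescriptor constructor):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MaxVersionsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      HColumnDescriptor cf = new HColumnDescriptor("cf");  // placeholder family name
      cf.setMaxVersions(1440);  // e.g. one version per minute, kept for a day
      HTableDescriptor table = new HTableDescriptor("versions-test");  // placeholder table
      table.addFamily(cf);
      admin.createTable(table);
    } finally {
      admin.close();
    }
  }
}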



On 11/04/14 11:54, Anoop John wrote:

What is the max version setting u have done for ur table cf?  When u set
some a value, HBase has to keep all those versions.  During a scan it will
read all those versions. In 94 version the default value for the max
versions is 3.  I guess you have set some bigger value.   If u have not,
mind testing after a major compaction?

-Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:


Last test I have done it's to reduce the number of versions to 100.
So, right now, I have 100 rows with 100 versions each one.
Times are: (I got the same times for blocksize of 64Ks and 1Mb)
100row-1000versions + blockcache-> 80s.
100row-1000versions + No blockcache-> 70s.

100row-*100*versions + blockcache-> 7.3s.
100row-*100*versions + No blockcache-> 6.1s.

What's the reasons of this? I guess HBase is enough smart for not consider
old versions, so, it just checks the newest. But, I reduce 10 times the
size (in versions) and I got a 10x of performance.

The filter is scan 'filters', {FILTER => "ValueFilter(=,
'binary:5')",STARTROW => '10100101',
STOPROW => '60100201'}



On 11/04/14 09:04, gortiz wrote:


Well, I guessed that, what it doesn't make too much sense because it's so
slow. I only have right now 100 rows with 1000 versions each row.
I have checked the size of the dataset and each row is about 700Kbytes
(around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
700Kbytes = 70Mb, since it just check the newest version. How can it spend
too many time checking this quantity of data?

I'm generating again the dataset with a bigger blocksize (previously was
64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
baching parameters, but I don't think they're going to affect too much.

Another test I want to do, it's generate the same dataset with just
100versions, It should spend around the same time, right? Or am I wrong?

On 10/04/14 18:08, Ted Yu wrote:


It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:

Another little question is, when the filter I'm using, Do I check all the

versions? or just the newest? Because, I'm wondering if when I do a scan
over all the table, I look for the value "5" in all the dataset or I'm
just
looking for in one newest version of each value.


On 10/04/14 16:52, gortiz wrote:

I was trying to check the behaviour of HBase. The cluster is a group of

old computers, one master, five slaves, each one with 2Gb, so, 12gb in
total.
The table has a column family with 1000 columns and each column with
100
versions.
There's another column family with four columns and one image of 100KB.
   (I've tried without this column family as well.)
The table is partitioned manually in all the slaves, so data are
balanced
in the cluster.

I'm executing this sentence *scan 'table1', {FILTER => "ValueFilter(=,
'binary:5')"* in HBase 0.94.6
My time for lease and rpc is three minutes.
Since, it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disable and enable, not about the size of
it). I
thought that it was going to have too much calls to the GC. I'm not
sure
about this point.

I know that it's not the best way to use HBase, it's just a test. I
think
that it's not working because the hardware isn't enough, although, I
would
like to try some kind of tunning to improve it.








On 10/04/14 14:21, Ted Yu wrote:

Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz  wrote:

   I got this error when I execute a full scan with filters about a
table.


Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.
regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4165751462641113359' does not exist
  at 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)


  at org.apache.hadoop.hbase.regionserver.HRegionServer.
next(HRegionServer.java:2482)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(
NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
WritableRpcEngine.java:320)

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Anoop John
What is the max versions setting you have done for your table CF?  When you
set such a value, HBase has to keep all those versions.  During a scan it
will read all those versions. In the 0.94 version the default value for max
versions is 3.  I guess you have set some bigger value.  If you have not,
mind testing after a major compaction?
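
For example (a sketch only; 'table1' and the column family name 'cf' are
placeholders for the real names), the retained versions could be lowered
and a major compaction forced so the old cells are physically dropped
before re-running the scan:

# 0.94 requires the table to be disabled for schema changes by default
disable 'table1'
alter 'table1', {NAME => 'cf', VERSIONS => 3}
enable 'table1'
# rewrite the store files so cells beyond VERSIONS are actually removed
major_compact 'table1'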

-Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz  wrote:

> Last test I have done it's to reduce the number of versions to 100.
> So, right now, I have 100 rows with 100 versions each one.
> Times are: (I got the same times for blocksize of 64Ks and 1Mb)
> 100row-1000versions + blockcache-> 80s.
> 100row-1000versions + No blockcache-> 70s.
>
> 100row-*100*versions + blockcache-> 7.3s.
> 100row-*100*versions + No blockcache-> 6.1s.
>
> What's the reasons of this? I guess HBase is enough smart for not consider
> old versions, so, it just checks the newest. But, I reduce 10 times the
> size (in versions) and I got a 10x of performance.
>
> The filter is scan 'filters', {FILTER => "ValueFilter(=,
> 'binary:5')",STARTROW => '10100101',
> STOPROW => '60100201'}
>
>
>
> On 11/04/14 09:04, gortiz wrote:
>
>> Well, I guessed that, what it doesn't make too much sense because it's so
>> slow. I only have right now 100 rows with 1000 versions each row.
>> I have checked the size of the dataset and each row is about 700Kbytes
>> (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
>> 700Kbytes = 70Mb, since it just check the newest version. How can it spend
>> too many time checking this quantity of data?
>>
>> I'm generating again the dataset with a bigger blocksize (previously was
>> 64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
>> baching parameters, but I don't think they're going to affect too much.
>>
>> Another test I want to do, it's generate the same dataset with just
>> 100versions, It should spend around the same time, right? Or am I wrong?
>>
>> On 10/04/14 18:08, Ted Yu wrote:
>>
>>> It should be newest version of each value.
>>>
>>> Cheers
>>>
>>>
>>> On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:
>>>
>>> Another little question is, when the filter I'm using, Do I check all the
 versions? or just the newest? Because, I'm wondering if when I do a scan
 over all the table, I look for the value "5" in all the dataset or I'm
 just
 looking for in one newest version of each value.


 On 10/04/14 16:52, gortiz wrote:

 I was trying to check the behaviour of HBase. The cluster is a group of
> old computers, one master, five slaves, each one with 2Gb, so, 12gb in
> total.
> The table has a column family with 1000 columns and each column with
> 100
> versions.
> There's another column family with four columns and one image of 100KB.
>   (I've tried without this column family as well.)
> The table is partitioned manually in all the slaves, so data are
> balanced
> in the cluster.
>
> I'm executing this sentence *scan 'table1', {FILTER => "ValueFilter(=,
> 'binary:5')"* in HBase 0.94.6
> My time for lease and rpc is three minutes.
> Since, it's a full scan of the table, I have been playing with the
> BLOCKCACHE as well (just disable and enable, not about the size of
> it). I
> thought that it was going to have too much calls to the GC. I'm not
> sure
> about this point.
>
> I know that it's not the best way to use HBase, it's just a test. I
> think
> that it's not working because the hardware isn't enough, although, I
> would
> like to try some kind of tunning to improve it.
>
>
>
>
>
>
>
>
> On 10/04/14 14:21, Ted Yu wrote:
>
> Can you give us a bit more information:
>>
>> HBase release you're running
>> What filters are used for the scan
>>
>> Thanks
>>
>> On Apr 10, 2014, at 2:36 AM, gortiz  wrote:
>>
>>   I got this error when I execute a full scan with filters about a
>> table.
>>
>>> Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.
>>> regionserver.LeaseException:
>>> org.apache.hadoop.hbase.regionserver.LeaseException: lease
>>> '-4165751462641113359' does not exist
>>>  at 
>>> org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
>>>
>>>
>>>  at org.apache.hadoop.hbase.regionserver.HRegionServer.
>>> next(HRegionServer.java:2482)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke(
>>> NativeMethodAccessorImpl.java:39)
>>>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>>> DelegatingMethodAccessorImpl.java:25)
>>>  at java.lang.reflect.Method.invoke(Method.java:597)
>>>  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
>>> WritableRpcEngine.java:320)

Re: hbase region server reboot steps

2014-04-11 Thread Rural Hunter
Yes, I've already stopped the balancer and manually moved the regions to 
other servers. Now I'm decommissioning the dfs data node on the server. 
After that I will reboot the server.
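
For reference, the manual moves can be scripted from the shell along these
lines; a rough sketch where the encoded region name and the target server
(host,port,startcode) are placeholders you would read from the master UI or
the shell's status 'detailed' output:

# keep the balancer from moving regions back while you work
balance_switch false
# repeat for every region hosted on the server being rebooted
move 'ENCODED_REGION_NAME', 'target-host.example.com,60020,1396000000000'
# re-enable the balancer once the node is back up
balance_switch true

The bundled bin/graceful_stop.sh script automates the same region moves, if
your distribution ships it.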


On 2014/4/9 22:28, Jean-Marc Spaggiari wrote:

Hum.

Disable load balancer, and move all the regions manually to other hosts
using the shell? Then hard restart it?

JM




Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz

The last test I have done is to reduce the number of versions to 100.
So, right now, I have 100 rows with 100 versions each.
Times are as follows (I got the same times for block sizes of 64KB and 1MB):
100row-1000versions + blockcache-> 80s.
100row-1000versions + No blockcache-> 70s.

100row-*100*versions + blockcache-> 7.3s.
100row-*100*versions + No blockcache-> 6.1s.

What's the reason for this? I guessed HBase was smart enough not to 
consider old versions, so it would just check the newest. But I reduced 
the size (in versions) by 10x and got a 10x performance improvement.


The filter is scan 'filters', {FILTER => "ValueFilter(=, 
'binary:5')",STARTROW => '10100101', 
STOPROW => '60100201'}
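
One more knob that could be tried for these full scans, sketched below on
the assumption that the 0.94 shell accepts these options (CACHE_BLOCKS and
VERSIONS are standard; CACHE, which maps to scanner caching, may not exist
in every build): skip the block cache for a one-off scan and keep each
batch small so every next() call returns well inside the lease period:

# one-off full scan: don't pollute the block cache, keep batches small
scan 'filters', {FILTER => "ValueFilter(=, 'binary:5')", CACHE_BLOCKS => false, VERSIONS => 1, CACHE => 10}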



On 11/04/14 09:04, gortiz wrote:
Well, I guessed that, what it doesn't make too much sense because it's 
so slow. I only have right now 100 rows with 1000 versions each row.
I have checked the size of the dataset and each row is about 700Kbytes 
(around 7Gb, 100rowsx1000versions). So, it should only check 100 rows 
x 700Kbytes = 70Mb, since it just check the newest version. How can it 
spend too many time checking this quantity of data?


I'm generating again the dataset with a bigger blocksize (previously 
was 64Kb, now, it's going to be 1Mb). I could try tunning the scanning 
and baching parameters, but I don't think they're going to affect too 
much.


Another test I want to do, it's generate the same dataset with just 
100versions, It should spend around the same time, right? Or am I wrong?


On 10/04/14 18:08, Ted Yu wrote:

It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:

Another little question is, when the filter I'm using, Do I check 
all the
versions? or just the newest? Because, I'm wondering if when I do a 
scan
over all the table, I look for the value "5" in all the dataset or 
I'm just

looking for in one newest version of each value.


On 10/04/14 16:52, gortiz wrote:

I was trying to check the behaviour of HBase. The cluster is a 
group of

old computers, one master, five slaves, each one with 2Gb, so, 12gb in
total.
The table has a column family with 1000 columns and each column 
with 100

versions.
There's another column family with four columns and one image of 100KB.

  (I've tried without this column family as well.)
The table is partitioned manually in all the slaves, so data are 
balanced

in the cluster.

I'm executing this sentence *scan 'table1', {FILTER => "ValueFilter(=,
'binary:5')"* in HBase 0.94.6
My time for lease and rpc is three minutes.
Since, it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disable and enable, not about the size of 
it). I
thought that it was going to have too much calls to the GC. I'm not 
sure

about this point.

I know that it's not the best way to use HBase, it's just a test. I 
think
that it's not working because the hardware isn't enough, although, 
I would

like to try some kind of tunning to improve it.








On 10/04/14 14:21, Ted Yu wrote:


Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz  wrote:

  I got this error when I execute a full scan with filters about a 
table.
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hbase.regionserver.LeaseException:

org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4165751462641113359' does not exist
 at 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231) 



 at org.apache.hadoop.hbase.regionserver.HRegionServer.
next(HRegionServer.java:2482)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(
NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
WritableRpcEngine.java:320)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
HBaseServer.java:1428)

I have read about increase the lease time and rpc time, but it's not
working.. what else could I try?? The table isn't too big. I have 
been
checking the logs from GC, HMaster and some RegionServers and I 
didn't see
anything weird. I tried as well to try with a couple of caching 
values.





--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_







--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz
Well, I guessed that, but it doesn't make much sense because it's so 
slow. Right now I only have 100 rows with 1000 versions each.
I have checked the size of the dataset and each row is about 700KB 
(around 7GB, 100 rows x 1000 versions). So it should only check 100 rows x 
700KB = 70MB, since it just checks the newest version. How can it spend 
so much time checking this quantity of data?


I'm generating the dataset again with a bigger block size (previously it 
was 64KB, now it's going to be 1MB). I could try tuning the scanning and 
batching parameters, but I don't think they're going to have much effect.


Another test I want to do is to generate the same dataset with just 
100 versions. It should take around the same time, right? Or am I wrong?
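
If it helps to verify what is actually sitting in the store files, a raw
scan of a single row shows every stored version, including cells a normal
scan would hide; a sketch, assuming the shell build supports the RAW
option:

# dump all stored versions of the first row, delete markers included
scan 'table1', {LIMIT => 1, RAW => true, VERSIONS => 1000}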


On 10/04/14 18:08, Ted Yu wrote:

It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz  wrote:


Another little question is, when the filter I'm using, Do I check all the
versions? or just the newest? Because, I'm wondering if when I do a scan
over all the table, I look for the value "5" in all the dataset or I'm just
looking for in one newest version of each value.


On 10/04/14 16:52, gortiz wrote:


I was trying to check the behaviour of HBase. The cluster is a group of
old computers, one master, five slaves, each one with 2Gb, so, 12gb in
total.
The table has a column family with 1000 columns and each column with 100
versions.
There's another column family with four columns and one image of 100KB.
  (I've tried without this column family as well.)
The table is partitioned manually in all the slaves, so data are balanced
in the cluster.

I'm executing this sentence *scan 'table1', {FILTER => "ValueFilter(=,
'binary:5')"* in HBase 0.94.6
My time for lease and rpc is three minutes.
Since, it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disable and enable, not about the size of it). I
thought that it was going to have too much calls to the GC. I'm not sure
about this point.

I know that it's not the best way to use HBase, it's just a test. I think
that it's not working because the hardware isn't enough, although, I would
like to try some kind of tunning to improve it.








On 10/04/14 14:21, Ted Yu wrote:


Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz  wrote:

  I got this error when I execute a full scan with filters about a table.

Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4165751462641113359' does not exist
 at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)

 at org.apache.hadoop.hbase.regionserver.HRegionServer.
next(HRegionServer.java:2482)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(
NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
WritableRpcEngine.java:320)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
HBaseServer.java:1428)

I have read about increase the lease time and rpc time, but it's not
working.. what else could I try?? The table isn't too big. I have been
checking the logs from GC, HMaster and some RegionServers and I didn't see
anything weird. I tried as well to try with a couple of caching values.




--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_




--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_