St.Ack, 

I think you're sidestepping the issue concerning schema design. 

Since HBase isn't my core focus, I also have to ask: since when have heap sizes 
over 16GB been the norm? 
(Really, 8GB already seems like quite a large heap size...) 


On Oct 31, 2014, at 11:15 AM, Stack <st...@duboce.net> wrote:

> On Thu, Oct 30, 2014 at 8:20 AM, Andrejs Dubovskis <dubis...@gmail.com>
> wrote:
> 
>> Hi!
>> 
>> We have a bunch of rows in HBase which store varying sizes of data
>> (1-50MB). We use HBase versioning and keep up to 10000 column
>> versions. Typically each column has only a few versions, but in rare
>> cases it may have thousands of versions.
>> 
>> The MapReduce job does a full scan, and our algorithm requires all
>> versions to produce the result. So we call scan.setMaxVersions().
>> 
>> In the worst case the Region Server returns only one row, but a huge
>> one. Its size is unpredictable and cannot be controlled, because the
>> scan parameters only let us control the row count. The MR task can
>> throw an OOME even with a 50GB heap.
>> 
>> Is it possible to handle this situation? For example, the RS should
>> not send the row to the client if the latter has no memory to handle
>> it. In that case the client could handle the error and fetch each of
>> the row's versions in a separate Get request.
>> 
> 
> See HBASE-11544 "[Ergonomics] hbase.client.scanner.caching is dogged and
> will try to return batch even if it means OOME".
> St.Ack
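
For reference, here is a rough, untested sketch of the two ideas discussed
above, written against a 1.0-era Java client API. The table name, column
coordinates, and the batch/size limits below are placeholders, not
recommendations: the scan is bounded so a wide row arrives as several partial
Results per RPC, and the helper pages through a single row's versions with
separate Get requests, as Andrejs suggested. As HBASE-11544 notes, the
scanner-side limits on older clients may still not fully prevent an OOME for
extremely wide rows.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BoundedVersionRead {

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mytable"))) { // placeholder table

            // Bound what a single scanner RPC may carry: one row per round trip,
            // at most `batch` cells per Result, and a byte cap on the response.
            Scan scan = new Scan();
            scan.setMaxVersions();                    // all versions, as in the MR job
            scan.setCaching(1);                       // one Result per RPC
            scan.setBatch(20);                        // <= 20 cells per Result; a wide row
                                                      // comes back as several partial Results
            scan.setMaxResultSize(64L * 1024 * 1024); // ~64 MB cap per response (placeholder)

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result partial : scanner) {
                    // Partials sharing a row key must be stitched back together
                    // by the caller before running the per-row algorithm.
                    System.out.println(Bytes.toString(partial.getRow())
                            + " -> " + partial.rawCells().length + " cells");
                }
            }
        }
    }

    // The fallback from the original question: when a row is too wide to pull in
    // one go, page through its versions with bounded Gets, walking back in time.
    static void fetchVersionsInPages(Table table, byte[] row, byte[] family,
                                     byte[] qualifier, int pageSize) throws IOException {
        long upperBound = Long.MAX_VALUE;             // exclusive upper end of the time range
        while (true) {
            Get get = new Get(row);
            get.addColumn(family, qualifier);
            get.setMaxVersions(pageSize);             // at most `pageSize` versions per Get
            get.setTimeRange(0L, upperBound);         // only versions older than the bound
            Result page = table.get(get);
            if (page.isEmpty()) {
                break;
            }
            long oldestSeen = Long.MAX_VALUE;
            for (Cell cell : page.rawCells()) {
                oldestSeen = Math.min(oldestSeen, cell.getTimestamp());
                // process the version here
            }
            if (page.rawCells().length < pageSize) {
                break;                                // last page
            }
            // Next page: everything strictly older than the oldest timestamp just
            // seen. (Versions sharing that exact timestamp would be skipped; in
            // practice duplicate timestamps on one cell are rare.)
            upperBound = oldestSeen;
        }
    }
}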
