St.Ack, I think you're sidestepping the issue concerning schema design.
Since HBase isn't my core focus, I also have to ask: since when have heap sizes over 16GB been the norm? (Really, 8GB seems to be quite a large heap size...)

On Oct 31, 2014, at 11:15 AM, Stack <st...@duboce.net> wrote:

> On Thu, Oct 30, 2014 at 8:20 AM, Andrejs Dubovskis <dubis...@gmail.com>
> wrote:
>
>> Hi!
>>
>> We have a bunch of rows in HBase which store varying sizes of data
>> (1-50MB). We use HBase versioning and keep up to 10000 column
>> versions. Typically each column has only a few versions, but in rare
>> cases one may have thousands of versions.
>>
>> Our MapReduce algorithm uses a full scan and requires all versions to
>> produce the result, so we call scan.setMaxVersions().
>>
>> In the worst case the Region Server returns only one row, but a huge
>> one. Its size is unpredictable and cannot be controlled, because the
>> scan parameters control row count only. The MR task can throw an OOME
>> even with a 50GB heap.
>>
>> Is it possible to handle this situation? For example, the RS should
>> not send the row to the client if the latter has no memory to handle
>> it. The client could then catch the error and fetch each of the row's
>> versions with a separate Get request.
>
> See HBASE-11544 "[Ergonomics] hbase.client.scanner.caching is dogged and
> will try to return batch even if it means OOME".
> St.Ack
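Incidentally, for wide rows like the ones described above, a minimal sketch of the cell-batching mitigation already available in the 0.98-era Java client (the batch size of 1000 below is an arbitrary illustration; with batching, one logical row arrives split across several Results, so the MR job must reassemble the versions itself):

    import org.apache.hadoop.hbase.client.Scan;

    Scan scan = new Scan();
    scan.setMaxVersions();   // the algorithm needs every version
    scan.setBatch(1000);     // cap the number of cells per Result so a
                             // single huge row cannot exhaust the client
                             // heap; a wide row is returned as several
                             // partial Results instead of one giant one
    scan.setCaching(1);      // fetch one Result per RPC

This only bounds memory per call to next(); size-based chunking of results (rather than cell-count-based) is what HBASE-11544 proposes.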