Re: OOM when fetching all versions of single row

2014-11-03 Thread Michael Segel
St.Ack, I think you're sidestepping the issue concerning schema design. Since HBase isn't my core focus, I also have to ask: since when have heap sizes over 16GB been the norm? (Really, 8GB seems to be quite a large heap size...) On Oct 31, 2014, at 11:15 AM, Stack st...@duboce.net wrote:

Re: OOM when fetching all versions of single row

2014-11-03 Thread Bryan Beaudreault
There are many blog posts and articles about people tuning for 16GB heaps since Java 7 and the G1 collector became mainstream. We run with a 25GB heap ourselves with very short GC pauses, using a mostly untuned G1 collector. Just one example is the excellent blog post by Intel,
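For readers landing here from the archive: a "mostly untuned G1" RegionServer setup of the kind Bryan describes usually amounts to just a heap size and a pause target in hbase-env.sh. The excerpt below is a sketch; the flag values are illustrative assumptions, not Bryan's actual configuration.

```sh
# Hypothetical hbase-env.sh excerpt: large heap with a mostly untuned G1
# collector. Values here are assumptions for illustration only.
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
  -Xms25g -Xmx25g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=100 \
  -XX:+ParallelRefProcEnabled"
```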

Re: OOM when fetching all versions of single row

2014-11-03 Thread Michael Segel
Bryan, I wasn't saying St.Ack's post wasn't relevant, but that it's not addressing the easiest thing to fix: schema design. IMHO, that's shooting oneself in the foot. You shouldn't be using versioning to capture temporal data. On Nov 3, 2014, at 1:54 PM, Bryan Beaudreault
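What "use the schema instead of versions" typically looks like in practice is promoting the timestamp into the row key, so each temporal datum becomes its own row and reads stay bounded. A minimal sketch against the 0.98-era client API in use at the time; the table, family, and entity names are hypothetical, not from the thread.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeInRowKey {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events"); // hypothetical table name

        byte[] entityId = Bytes.toBytes("sensor-42"); // hypothetical entity
        long ts = System.currentTimeMillis();
        // Reverse the timestamp so the newest row for an entity sorts first;
        // each observation is now a separate row, not another cell version.
        byte[] rowKey = Bytes.add(entityId, Bytes.toBytes(Long.MAX_VALUE - ts));

        Put put = new Put(rowKey);
        put.add(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                Bytes.toBytes("...")); // the 1-50MB value goes here
        table.put(put);
        table.close();
    }
}
```

With this layout a time-range read becomes an ordinary row-range scan over one entity's prefix, rather than a fetch of every version of one fat row.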

Re: OOM when fetching all versions of single row

2014-10-31 Thread Michael Segel
Here's the simple answer: don't do it. The way you are abusing versioning is a bad design. Redesign your schema. On Oct 30, 2014, at 10:20 AM, Andrejs Dubovskis dubis...@gmail.com wrote: Hi! We have a bunch of rows on HBase which store varying sizes of data (1-50MB). We use HBase

Re: OOM when fetching all versions of single row

2014-10-31 Thread Stack
On Thu, Oct 30, 2014 at 8:20 AM, Andrejs Dubovskis dubis...@gmail.com wrote: Hi! We have a bunch of rows on HBase which store varying sizes of data (1-50MB). We use HBase versioning and keep up to 1 column versions. Typically each column has only a few versions. But in rare cases it may
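Stack's reply is cut off in the archive, but the standard client-side remedy for rows too wide to materialize whole is to chunk them with Scan.setBatch(), so no single Result has to hold every cell version at once. A sketch under that assumption, using the 0.98-era API; the table name and batch sizes are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class BatchedScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "data"); // hypothetical table name

        Scan scan = new Scan();
        scan.setMaxVersions(); // fetch every version, not just the latest
        scan.setBatch(100);    // ...but at most 100 cells per Result
        scan.setCaching(1);    // one Result per RPC, sane for 1-50MB values

        ResultScanner scanner = table.getScanner(scan);
        for (Result chunk : scanner) {
            // With batching, consecutive Results may belong to the same row;
            // process each chunk and let it go before fetching the next,
            // so the full multi-GB row never sits in the client heap.
            process(chunk);
        }
        scanner.close();
        table.close();
    }

    static void process(Result chunk) { /* application logic */ }
}
```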

OOM when fetching all versions of single row

2014-10-30 Thread Andrejs Dubovskis
Hi! We have a bunch of rows on HBase which store varying sizes of data (1-50MB). We use HBase versioning and keep up to 1 column versions. Typically each column has only a few versions. But in rare cases it may have thousands of versions. The MapReduce algorithm uses a full scan and our algorithm
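To make the failure mode concrete: below is a reconstruction of the sort of job setup described above, where a full-table scan requests all versions, so any row with thousands of multi-megabyte cells must fit into a single mapper-side Result. Class and table names are guesses, not Andrejs's actual code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class FullScanJob {

    static class VersionMapper
            extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable row, Result result,
                           Context context) {
            // With setMaxVersions(), 'result' carries every version of every
            // cell in this row. At 1-50MB per value and thousands of
            // versions, this single object is where the OOM occurs.
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "full-scan-all-versions");

        Scan scan = new Scan();
        scan.setMaxVersions(); // all versions of every cell, per row

        TableMapReduceUtil.initTableMapperJob("data", scan, // table name assumed
                VersionMapper.class, null, null, job);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```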