RE: Hbase heap size

2013-01-18 Thread Chalcy Raja
Looking forward to the blog!

Thanks,
Chalcy

-Original Message-
From: lars hofhansl [mailto:la...@apache.org] 
Sent: Thursday, January 17, 2013 9:24 PM
To: user@hbase.apache.org
Subject: Re: Hbase heap size

You'll need more memory then, or more machines with less disk attached to each.

You can look at it this way:
- The largest useful region size is 20G (at least that is the current common 
tribal knowledge).
- Each region has at least one memstore (one per column family actually, let's 
just say one for the sake of argument).

If you have 10T of disk per region server, then you need ~170 regions per region 
server (3 * 20G * 170 ~ 10T).
If you give the memstores 35% of your heap and have 128M memstores, you would 
need 170 * 128M / 0.35 ~ 60G of heap. That's already too large.
If you make the memstores 600M, you'll need 170 * 600M / 0.35 ~ 290G of heap (if 
all memstores are being written to simultaneously).

There are ways to address that.
If you expect that not all memstores are written to at the same time, you can 
leave them smaller and increase their size multipliers, which allows them to be 
temporarily larger.

Again, this is just back of the envelope.
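In rough Python, the same back-of-envelope math (all figures are the 
illustrative ones above, not recommendations):

    # Back-of-envelope heap sizing (illustrative numbers from above).
    disk_per_rs_gb = 10 * 1024.0  # ~10T of raw disk per region server
    replication    = 3            # HDFS replication factor
    region_size_gb = 20           # largest commonly useful region size
    memstore_frac  = 0.35         # fraction of heap given to memstores

    # Regions needed to hold the post-replication data
    regions = disk_per_rs_gb / replication / region_size_gb   # ~170

    for flush_size_mb in (128, 600):
        # Heap needed if every memstore fills to its flush size at once
        heap_gb = regions * flush_size_mb / 1024.0 / memstore_frac
        print("flush size %dM -> ~%.0fG of heap" % (flush_size_mb, heap_gb))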

This is a lengthy topic; I'm planning a blog post around it. There are a 
bunch of parameters that can be tweaked based on workload.

The main takeaway for HBase is that you have to match disk space with Java 
heap.

-- Lars




 From: Varun Sharma 
To: user@hbase.apache.org; lars hofhansl 
Sent: Thursday, January 17, 2013 3:24 PM
Subject: Re: Hbase heap size
 
Thanks for the info. I am looking for a balance where I have a write-heavy 
workload but still need excellent read latency. So: 40% of heap to the block 
cache for caching, 35% to the memstores.

But I would also like to reduce the number of HFiles and the amount of 
compaction activity, so I am considering a small number of regions and a much 
larger memstore flush size - like 640M. Could a large memstore flush be a 
problem in some sense? Are updates blocked during a memstore flush? In my 
case, I would expect a 600M memstore to materialize into a 200-300M HFile.
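For a rough feel of that trade-off (the 8G heap here is only an assumed 
example, not a number from this thread):

    # How many actively written regions a heap can carry if each memstore
    # is allowed to fill to a 640M flush size (assumed example values).
    heap_gb       = 8            # assumed region server heap
    memstore_frac = 0.35         # heap fraction reserved for memstores
    flush_size_mb = 640          # large memstore flush size

    budget_mb = heap_gb * 1024 * memstore_frac
    print("~%d regions can hold a full 640M memstore at once"
          % (budget_mb // flush_size_mb))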

On Thu, Jan 17, 2013 at 2:31 PM, lars hofhansl  wrote:

> A good rule of thumb that I found is to give each region server a Java 
> heap that is roughly 1/100th of the size of the disk space per region 
> server.
> (that is assuming all the default setting: 10G regions, 128M 
> memstores, 40% of heap for memstores, 20% of heap for block cache, 
> 3-way replication)
>
>
> That is, if you give the region server a 10G heap, you can expect to 
> be able to serve about 1T worth of disk space.
>
> That can be tweaked of course (increase the region size to 20G, shrink 
> the memstores if your load is mostly read-only, etc.).
> That way you can reduce that ratio to 1/200 or even less.
>
>
> I'm sure other folks will have more detailed input.
>
>
> -- Lars
>
>
>
> 
>  From: Varun Sharma 
> To: user@hbase.apache.org
> Sent: Thursday, January 17, 2013 1:15 PM
> Subject: Hbase heap size
>
> Hi,
>
> I was wondering how much memory folks typically give to HBase and how 
> much they leave for the file system cache on the region server. I am 
> using HBase 0.94 and running only the region server and data node 
> daemons. I have a system with 15G of RAM.
>
> Thanks
>
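A quick sanity check of the 1/100 rule of thumb quoted above, using the 
default settings it assumes (10G regions, 128M memstores, 40% of heap for 
memstores, 3-way replication); a sketch only:

    # Heap-to-disk ratio under the default settings quoted above.
    heap_gb        = 10.0
    memstore_frac  = 0.40
    memstore_gb    = 0.128
    region_size_gb = 10.0
    replication    = 3

    regions = heap_gb * memstore_frac / memstore_gb    # memstores the heap can hold
    disk_gb = regions * region_size_gb * replication   # raw disk those regions cover
    print("10G heap serves ~%.1fT of disk (ratio ~1/%.0f)"
          % (disk_gb / 1024, disk_gb / heap_gb))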


RE: Just joined the user group and have a question

2013-01-17 Thread Chalcy Raja
Thank you, Anil, for your reply.  I am beginning to get the feeling that maybe 
we should not run both on the same cluster.  In three replies, I have gotten 
that same advice from two of you.

Thanks again,
Chalcy

-Original Message-
From: anil gupta [mailto:anilgupt...@gmail.com] 
Sent: Thursday, January 17, 2013 12:48 PM
To: user@hbase.apache.org
Subject: Re: Just joined the user group and have a question

Hi Chalcy,

In addition to the points others have made, also have a look at your disk I/O 
load. MapReduce jobs are disk-I/O intensive, so while a MapReduce job is 
running there may be contention for disk I/O, and that contention can lead to 
request timeouts in HBase. Hence, you will start having trouble with the HBase 
cluster.
It's a little tricky to get HBase and MapReduce going on the same cluster 
because of their completely different natures: the former is batch processing 
and the latter is near-real-time serving. If you run them on one cluster, you 
will have to sacrifice the performance of one of them; both cannot be 
optimized at the same time.

HTH,
Anil

On Thu, Jan 17, 2013 at 9:34 AM, Doug Meil wrote:

> Hi there-
>
> If you're absolutely new to Hbase, you might want to check out the 
> Hbase refGuide in the architecture, performance, and troubleshooting 
> chapters first.
>
> http://hbase.apache.org/book.html
>
> In terms of determining why your region servers "just die", I think 
> you need to read the background information and then provide more 
> information on your cluster and what you're trying to do, because 
> although there are a lot of people on this dist-list who want to 
> help, you're not giving folks a whole lot to go on.
>
>
>
>
> On 1/17/13 12:24 PM, "Chalcy Raja"  wrote:
>
> >Hi HBASE Gurus,
> >
> >
> >
> >I am Chalcy Raja and I joined the hbase group yesterday.  I am 
> >already a member of hive and sqoop user groups.  Looking forward to 
> >learn and share information about hbase here!
> >
> >
> >
> >Have a question: We have a cluster where we run Hive jobs and also 
> >HBase.  There are stability issues, like region servers just dying.  We 
> >are looking into fine tuning.  What I read about performance, and also 
> >heard from another user, is to separate MapReduce from HBase.  How do I do that?
> >If that means running TaskTrackers on some nodes and HBase region 
> >servers on others, then we will run into data locality issues and I 
> >believe it will perform poorly.
> >
> >
> >
> >Definitely I am not the only one running into this issue.  Any 
> >thoughts on how to resolve this issue?
> >
> >
> >
> >Thanks,
> >
> >Chalcy
>
>
>


--
Thanks & Regards,
Anil Gupta


RE: Just joined the user group and have a question

2013-01-17 Thread Chalcy Raja
Thanks, Doug.  I am not absolutely new to HBase.  As in Kevin's email, because 
of MapReduce job (Hive) contention, HBase region servers die and the whole 
HBase cluster goes down.

I understand that we have to somehow logically or physically separate the 
clusters.

--Chalcy

-Original Message-
From: Doug Meil [mailto:doug.m...@explorysmedical.com] 
Sent: Thursday, January 17, 2013 12:35 PM
To: user@hbase.apache.org
Subject: Re: Just joined the user group and have a question

Hi there-

If you're absolutely new to Hbase, you might want to check out the Hbase 
refGuide in the architecture, performance, and troubleshooting chapters first.

http://hbase.apache.org/book.html

In terms of determining why your region servers "just die", I think you need to 
read the background information and then provide more information on your 
cluster and what you're trying to do, because although there are a lot of 
people on this dist-list who want to help, you're not giving folks a whole lot 
to go on.




On 1/17/13 12:24 PM, "Chalcy Raja"  wrote:

>Hi HBASE Gurus,
>
>
>
>I am Chalcy Raja and I joined the hbase group yesterday.  I am already 
>a member of hive and sqoop user groups.  Looking forward to learn and 
>share information about hbase here!
>
>
>
>Have a question: We have a cluster where we run Hive jobs and also 
>HBase.  There are stability issues, like region servers just dying.  We 
>are looking into fine tuning.  What I read about performance, and also 
>heard from another user, is to separate MapReduce from HBase.  How do I do that?
>If that means running TaskTrackers on some nodes and HBase region 
>servers on others, then we will run into data locality issues and I 
>believe it will perform poorly.
>
>
>
>Definitely I am not the only one running into this issue.  Any thoughts 
>on how to resolve this issue?
>
>
>
>Thanks,
>
>Chalcy





RE: Just joined the user group and have a question

2013-01-17 Thread Chalcy Raja
Hi Kevin,

Thanks for the reply.  Currently we are using 10 mappers and 10 reducers on 
each node.  With 32 GB of memory, 2 GB allotted for the HBase heap, and 
mapred.map.child.java.opts and mapred.reduce.child.java.opts set to 1 GB each, 
having 10 mappers and 10 reducers did not look like a bad idea.

From what you are saying, I can only use 10 MR slots (12 - 2) - meaning 6 
mappers and 4 reducers? Or 8 and 2, or 5 and 5? Isn't that very few?
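For reference, the memory side of that budget (the 4 GB overhead figure below 
is just an assumption):

    # Worst-case memory budget for one node (overhead figure is an assumption).
    total_gb      = 32
    hbase_heap_gb = 2
    task_heap_gb  = 1     # mapred.map/reduce.child.java.opts = 1 GB each
    map_slots     = 10
    reduce_slots  = 10
    overhead_gb   = 4     # assumed: OS + DataNode + TaskTracker daemons

    used_gb = hbase_heap_gb + task_heap_gb * (map_slots + reduce_slots) + overhead_gb
    print("worst case: %dG of %dG committed" % (used_gb, total_gb))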

I would also like to test separating MR and HBase by running TaskTrackers on 
some nodes and region servers on others, like I thought.  I am also thinking 
of separating the clusters entirely; in that case we could get higher-CPU 
servers for HBase.

We only started seeing stability issues after adding HBase to the Hadoop 
cluster, and that is why I am trying to find a working solution.

Thanks for your time,
Chalcy

-Original Message-
From: Kevin O'dell [mailto:kevin.od...@cloudera.com] 
Sent: Thursday, January 17, 2013 12:33 PM
To: user@hbase.apache.org
Subject: Re: Just joined the user group and have a question

Chalcy,

  Glad to have you aboard. One thing to look at is the maximum number of map 
and reduce slots you are currently allowing. Typically, we look at the CPU 
architecture and say that if it is not HT (hyperthreaded) the ratio is 1:1 
slots per core, and with HT it is 1:1.5. With dual quad-core without HT you 
would be able to use 8 total MR slots, but since you have HBase you should 
give up a couple of slots for it, which means only using 6 MR slots. With dual 
quad-core with HT you would have 16 logical cores and could use 12 MR slots, 
but since you have HBase you want to leave it a couple of cores, which means 
only using 9 or 10 slots for MR. This can help with some of the pressure from 
running MR/Hive/Pig on the same cluster.
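In code form, the same slot heuristic (a sketch; the 1:1 and 1:1.5 ratios and 
the two reserved slots are just the rule of thumb above):

    # Sketch of the MR slot rule of thumb described above.
    def mr_slots(physical_cores, hyperthreaded, colocated_hbase):
        # 1 slot per core without HT, 1.5 slots per core with HT
        slots = physical_cores * (1.5 if hyperthreaded else 1.0)
        # leave a couple of slots/cores for the RegionServer when colocated
        if colocated_hbase:
            slots -= 2
        return int(slots)

    print(mr_slots(8, False, True))   # dual quad-core, no HT  -> 6
    print(mr_slots(8, True,  True))   # dual quad-core with HT -> 10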

  As for separating MR and HBase: you could break down your processes so that 
TTs run on some nodes and RSs run on others, but typically people will set up 
two separate clusters.


On Thu, Jan 17, 2013 at 12:24 PM, Chalcy Raja  wrote:

> Hi HBASE Gurus,
>
>
>
> I am Chalcy Raja and I joined the hbase group yesterday.  I am already 
> a member of hive and sqoop user groups.  Looking forward to learn and 
> share information about hbase here!
>
>
>
> Have a question: We have a cluster where we run Hive jobs and also HBase.
> There are stability issues, like region servers just dying.  We are 
> looking into fine tuning.  What I read about performance, and also 
> heard from another user, is to separate MapReduce from HBase.  How do I do 
> that?  If that means running TaskTrackers on some nodes and HBase 
> region servers on others, then we will run into data locality issues and 
> I believe it will perform poorly.
>
>
>
> Definitely I am not the only one running into this issue.  Any 
> thoughts on how to resolve this issue?
>
>
>
> Thanks,
>
> Chalcy
>



--
Kevin O'Dell
Customer Operations Engineer, Cloudera


Just joined the user group and have a question

2013-01-17 Thread Chalcy Raja
Hi HBASE Gurus,



I am Chalcy Raja and I joined the HBase group yesterday.  I am already a member 
of the Hive and Sqoop user groups.  Looking forward to learning and sharing 
information about HBase here!



Have a question: We have a cluster where we run Hive jobs and also HBase.  
There are stability issues, like region servers just dying.  We are looking 
into fine tuning.  What I read about performance, and also heard from another 
user, is to separate MapReduce from HBase.  How do I do that?  If that means 
running TaskTrackers on some nodes and HBase region servers on others, then we 
will run into data locality issues and I believe it will perform poorly.



Definitely I am not the only one running into this issue.  Any thoughts on how 
to resolve this issue?



Thanks,

Chalcy