Re: PerformanceEvaluation results

2012-03-20 Thread Oliver Meyn (GBIF)
Apologies for responding to myself, but after some more testing I've concluded 
that we had a minor network bottleneck that was partially masking the real 
problem: not enough disks.  Deductions based on ganglia metrics in a follow-up 
blog post:

http://gbif.blogspot.com/2012/03/hbase-performance-evaluation-continued.html

Cheers,
Oliver

On 2012-02-28, at 5:10 PM, Oliver Meyn (GBIF) wrote:

> Hi all,
> 
> I've spent the last couple of weeks working with PerformanceEvaluation, 
> trying to understand scan performance in our little cluster.  I've written a 
> blog post with the results and would really welcome any input you may have.
> 
> http://gbif.blogspot.com/2012/02/performance-evaluation-of-hbase.html
> 
> Cheers,
> Oliver


--
Oliver Meyn
Software Developer
Global Biodiversity Information Facility (GBIF)
+45 35 32 15 12
http://www.gbif.org



Re: PerformanceEvaluation results

2012-03-20 Thread Stack
On Tue, Mar 20, 2012 at 8:53 AM, Oliver Meyn (GBIF)  wrote:
> Apologies for responding to myself, but after some more testing I've 
> concluded that we had a minor network bottleneck that was partially masking 
> the real problem: not enough disks.  Deductions based on ganglia metrics in a 
> follow-up blog post:
>
> http://gbif.blogspot.com/2012/03/hbase-performance-evaluation-continued.html
>

Nice post Oliver.
St.Ack


Re: PerformanceEvaluation results

2012-03-20 Thread lars hofhansl
We should link to this from the reference guide.



- Original Message -
From: Stack 
To: user@hbase.apache.org
Cc: 
Sent: Tuesday, March 20, 2012 9:17 AM
Subject: Re: PerformanceEvaluation results

On Tue, Mar 20, 2012 at 8:53 AM, Oliver Meyn (GBIF)  wrote:
> Apologies for responding to myself, but after some more testing I've 
> concluded that we had a minor network bottleneck that was partially masking 
> the real problem: not enough disks.  Deductions based on ganglia metrics in a 
> follow-up blog post:
>
> http://gbif.blogspot.com/2012/03/hbase-performance-evaluation-continued.html
>

Nice post Oliver.
St.Ack



Re: PerformanceEvaluation results

2012-03-20 Thread Stack
On Tue, Mar 20, 2012 at 9:55 AM, lars hofhansl  wrote:
> We should link to this from the reference guide.
>
>
It would work well as a second case study to follow the Explorys one:
http://hbase.apache.org/book.html#trouble.casestudy

St.Ack


Re: PerformanceEvaluation results

2012-03-21 Thread Oliver Meyn (GBIF)
By all means please link - I would be very happy if we could amortize the 
amount of time I spent digging (and learning various low-level hardware things) 
across other people's problems :)

Oliver

On 2012-03-20, at 5:57 PM, Stack wrote:

> On Tue, Mar 20, 2012 at 9:55 AM, lars hofhansl  wrote:
>> We should link to this from the reference guide.
>> 
>> 
> It would work well as a second case study to follow the Explorys one:
> http://hbase.apache.org/book.html#trouble.casestudy
> 
> St.Ack
> 



Re: PerformanceEvaluation results

2012-03-21 Thread Doug Meil

Will do.





On 3/20/12 12:55 PM, "lars hofhansl"  wrote:

>We should link to this from the reference guide.
>
>
>
>- Original Message -
>From: Stack 
>To: user@hbase.apache.org
>Cc: 
>Sent: Tuesday, March 20, 2012 9:17 AM
>Subject: Re: PerformanceEvaluation results
>
>On Tue, Mar 20, 2012 at 8:53 AM, Oliver Meyn (GBIF) 
>wrote:
>> Apologies for responding to myself, but after some more testing I've
>>concluded that we had a minor network bottleneck that was partially
>>masking the real problem: not enough disks.  Deductions based on ganglia
>>metrics in a follow-up blog post:
>>
>> 
>>http://gbif.blogspot.com/2012/03/hbase-performance-evaluation-continued.html
>>
>
>Nice post Oliver.
>St.Ack
>
>




Re: PerformanceEvaluation results

2012-06-08 Thread Oliver Meyn (GBIF)
And here's a follow-up blog post with testing of our new cluster, and some more 
conclusions about PerformanceEvaluation:

http://gbif.blogspot.dk/2012/06/faster-hbase-hardware-matters.html

Cheers,
Oliver

On 2012-03-21, at 6:52 PM, Doug Meil wrote:

> 
> Will do.
> 
> 
> 
> 
> 
> On 3/20/12 12:55 PM, "lars hofhansl"  wrote:
> 
>> We should link to this from the reference guide.
>> 
>> 
>> 
>> - Original Message -
>> From: Stack 
>> To: user@hbase.apache.org
>> Cc: 
>> Sent: Tuesday, March 20, 2012 9:17 AM
>> Subject: Re: PerformanceEvaluation results
>> 
>> On Tue, Mar 20, 2012 at 8:53 AM, Oliver Meyn (GBIF) 
>> wrote:
>>> Apologies for responding to myself, but after some more testing I've
>>> concluded that we had a minor network bottleneck that was partially
>>> masking the real problem: not enough disks.  Deductions based on ganglia
>>> metrics in a follow-up blog post:
>>> 
>>> 
>>> http://gbif.blogspot.com/2012/03/hbase-performance-evaluation-continued.html
>>> 
>> 
>> Nice post Oliver.
>> St.Ack
>> 
>> 
> 
> 
> 


--
Oliver Meyn
Software Developer
Global Biodiversity Information Facility (GBIF)
+45 35 32 15 12
http://www.gbif.org



Re: PerformanceEvaluation results

2012-02-01 Thread Michael Segel
No.
What tuning did you do?
Why such a small cluster?

Sorry, but when you start off with a bad hardware configuration, you can get 
Hadoop/HBase to work, but performance will always be sub-optimal.



Sent from my iPhone

On Feb 1, 2012, at 6:52 AM, "Tim Robertson"  wrote:

> Hi all,
> 
> We have a 3 node cluster (CDH3u2) with the following hardware:
> 
> RegionServers (+DN + TT)
>  CPU: 2x Intel(R) Xeon(R) CPU E5630 @ 2.53GHz (quad)
>  Disks: 6x250G SATA 5.4K
>  Memory: 24GB
> 
> Master (+ZK, JT, NN)
>  CPU: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz, 2x6MB (quad)
>  Disks: 2x500G SATA 7.2K
>  Memory: 8GB
> 
> Memory wise, we have:
> Master:
>  NN: 1GB
>  JT: 1GB
>  HBase master: 6GB
>  ZK: 1GB
> RegionServers:
>  RegionServer: 6GB
>  TaskTracker: 1GB
>  11 Mappers @ 1GB each
>  7 Reducers @ 1GB each
> 
> HDFS was empty, and I ran randomWrite and scan, both with the number
> of clients set to 50 (it seemed to spawn 500 Mappers though...)
> 
> randomWrite:
> 12/02/01 13:27:47 INFO mapred.JobClient: ROWS=52428500
> 12/02/01 13:27:47 INFO mapred.JobClient: ELAPSED_TIME=84504886
> 
> scan:
> 12/02/01 13:42:52 INFO mapred.JobClient: ROWS=52428500
> 12/02/01 13:42:52 INFO mapred.JobClient: ELAPSED_TIME=8158664
> 
> Would I be correct in thinking that this is way below what is to be
> expected of this hardware?
> We're setting up ganglia now to start debugging, but any suggestions
> on how to diagnose this would be greatly appreciated.
> 
> Thanks!
> Tim
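
For readers coming to this thread later, a run like the one quoted above would
have been kicked off roughly as follows. This is only a sketch: it assumes the
0.90/CDH3-era PerformanceEvaluation command-line syntax and the 50-client
setting Tim describes, and the back-of-the-envelope numbers assume ROWS and
ELAPSED_TIME are MapReduce counters summed across all map tasks.

  # Launch PerformanceEvaluation in MapReduce mode (sketch; 50 clients as above).
  # Older PE versions split each client into ten map tasks, which would explain
  # the ~500 mappers observed for 50 clients.
  hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 50
  hbase org.apache.hadoop.hbase.PerformanceEvaluation scan 50

  # Rough throughput from the counters above, assuming ELAPSED_TIME is the
  # per-map elapsed milliseconds summed over all maps:
  echo "randomWrite: $((52428500 / (84504886 / 1000))) rows/sec per client task"   # ~620
  echo "scan:        $((52428500 / (8158664 / 1000))) rows/sec per client task"    # ~6400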


Re: PerformanceEvaluation results

2012-02-01 Thread Tim Robertson
Thanks Michael,

It's a small cluster, but is the hardware so bad?  We are particularly
interested in relatively low load for random read write (2000
transactions per second on <1k rows) but a decent full table scan
speed, as we aim to mount Hive tables on HBase backed tables.

Regarding tuning... not exactly sure which you would be interested in
seeing.  The config is all here:
http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet%2Fmodules%2Fhadoop%2Ftemplates

Cheers,
Tim
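
A minimal sketch of the Hive-on-HBase mapping mentioned above may help frame
the scan-speed question. The table name "occurrence" and column family "o" are
hypothetical, used only for illustration; the storage handler class and mapping
properties are the standard Hive/HBase integration pieces.

  # Map an existing HBase table into Hive (hypothetical table "occurrence"
  # with a single column family "o"); full-table scans from Hive then run
  # as MapReduce scans over the underlying HBase regions.
  hive -e "
    CREATE EXTERNAL TABLE occurrence_hbase (id STRING, scientific_name STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,o:scientific_name')
    TBLPROPERTIES ('hbase.table.name' = 'occurrence');
  "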



On Wed, Feb 1, 2012 at 1:56 PM, Michael Segel  wrote:
> No.
> What tuning did you do?
> Why such a small cluster?
>
> Sorry, but when you start off with a bad hardware configuration, you can get 
> Hadoop/HBase to work, but performance will always be sub-optimal.
>
>
>
> Sent from my iPhone
>
> On Feb 1, 2012, at 6:52 AM, "Tim Robertson"  wrote:
>
>> Hi all,
>>
>> We have a 3 node cluster (CDH3u2) with the following hardware:
>>
>> RegionServers (+DN + TT)
>>  CPU: 2x Intel(R) Xeon(R) CPU E5630 @ 2.53GHz (quad)
>>  Disks: 6x250G SATA 5.4K
>>  Memory: 24GB
>>
>> Master (+ZK, JT, NN)
>>  CPU: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz, 2x6MB (quad)
>>  Disks: 2x500G SATA 7.2K
>>  Memory: 8GB
>>
>> Memory wise, we have:
>> Master:
>>  NN: 1GB
>>  JT: 1GB
>>  HBase master: 6GB
>>  ZK: 1GB
>> RegionServers:
>>  RegionServer: 6GB
>>  TaskTracker: 1GB
>>  11 Mappers @ 1GB each
>>  7 Reducers @ 1GB each
>>
>> HDFS was empty, and I ran randomWrite and scan, both with the number
>> of clients set to 50 (it seemed to spawn 500 Mappers though...)
>>
>> randomWrite:
>> 12/02/01 13:27:47 INFO mapred.JobClient:     ROWS=52428500
>> 12/02/01 13:27:47 INFO mapred.JobClient:     ELAPSED_TIME=84504886
>>
>> scan:
>> 12/02/01 13:42:52 INFO mapred.JobClient:     ROWS=52428500
>> 12/02/01 13:42:52 INFO mapred.JobClient:     ELAPSED_TIME=8158664
>>
>> Would I be correct in thinking that this is way below what is to be
>> expected of this hardware?
>> We're setting up ganglia now to start debugging, but any suggestions
>> on how to diagnose this would be greatly appreciated.
>>
>> Thanks!
>> Tim


Re: PerformanceEvaluation results

2012-02-01 Thread Doug Meil

Hi there-

These perf tests on small clusters are fairly common questions on the
dist-list, but it needs to be stressed that HBase (and HDFS) doesn't begin
to stretch its legs until about 5 nodes.

http://hbase.apache.org/book.html#arch.overview






On 2/1/12 7:51 AM, "Tim Robertson"  wrote:

>Hi all,
>
>We have a 3 node cluster (CDH3u2) with the following hardware:
>
>RegionServers (+DN + TT)
>  CPU: 2x Intel(R) Xeon(R) CPU E5630 @ 2.53GHz (quad)
>  Disks: 6x250G SATA 5.4K
>  Memory: 24GB
>
>Master (+ZK, JT, NN)
>  CPU: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz, 2x6MB (quad)
>  Disks: 2x500G SATA 7.2K
>  Memory: 8GB
>
>Memory wise, we have:
>Master:
>  NN: 1GB
>  JT: 1GB
>  HBase master: 6GB
>  ZK: 1GB
>RegionServers:
>  RegionServer: 6GB
>  TaskTracker: 1GB
>  11 Mappers @ 1GB each
>  7 Reducers @ 1GB each
>
>HDFS was empty, and I ran randomWrite and scan, both with the number
>of clients set to 50 (it seemed to spawn 500 Mappers though...)
>
>randomWrite:
>12/02/01 13:27:47 INFO mapred.JobClient: ROWS=52428500
>12/02/01 13:27:47 INFO mapred.JobClient: ELAPSED_TIME=84504886
>
>scan:
>12/02/01 13:42:52 INFO mapred.JobClient: ROWS=52428500
>12/02/01 13:42:52 INFO mapred.JobClient: ELAPSED_TIME=8158664
>
>Would I be correct in thinking that this is way below what is to be
>expected of this hardware?
>We're setting up ganglia now to start debugging, but any suggestions
>on how to diagnose this would be greatly appreciated.
>
>Thanks!
>Tim
>




Re: PerformanceEvaluation results

2012-02-01 Thread Stack
On Wed, Feb 1, 2012 at 4:51 AM, Tim Robertson  wrote:
> We're setting up ganglia now to start debugging, but any suggestions
> on how to diagnose this would be greatly appreciated.
>

Get Ganglia set up, Tim, and then let's chat.  You've checked out the
perf section in the reference manual?  What numbers do you need?

St.Ack


Re: PerformanceEvaluation results

2012-02-01 Thread Michel Segel
Tim,

Here's the problem in a nutshell.
With respect to hardware, you have 5.4K RPM drives? 6 drives and 8 cores?
Small, slow drives, and still a spindle-to-core ratio of less than one (6
spindles against 8 cores).

I appreciate that you want to maximize performance, but when it comes to 
tuning, you have to start before you get your hardware. 

You are asking a question about tuning, but how can we say whether the numbers 
are OK?
Have you looked at your GCs and enabled MSLAB? We don't know. What about the 
network configuration?

I mean that there's a lot missing, and fine-tuning a cluster is something you 
have to do on your own. I guess I could say your numbers look fine to me for 
that config... but honestly, it would be a swag (a wild guess).


Sent from a remote device. Please excuse any typos...

Mike Segel
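
For anyone following along: the GC and MSLAB settings Mike is asking about live
in hbase-env.sh and hbase-site.xml respectively. A minimal sketch, assuming the
6GB regionserver heaps described earlier; the flag values and log path are
illustrative starting points, not tuned advice for this cluster.

  # hbase-env.sh: CMS collector plus GC logging for a 6GB regionserver heap
  # (illustrative values only).
  export HBASE_REGIONSERVER_OPTS="-Xmx6g -Xms6g \
    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
    -XX:CMSInitiatingOccupancyFraction=70 \
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
    -Xloggc:/var/log/hbase/gc-regionserver.log"

  # MSLAB itself is toggled in hbase-site.xml (off by default in 0.90.x,
  # on by default from 0.92):
  #   hbase.hregion.memstore.mslab.enabled = true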

On Feb 1, 2012, at 7:09 AM, Tim Robertson  wrote:

> Thanks Michael,
> 
> It's a small cluster, but is the hardware so bad?  We are particularly
> interested in relatively low load for random read write (2000
> transactions per second on <1k rows) but a decent full table scan
> speed, as we aim to mount Hive tables on HBase backed tables.
> 
> Regarding tuning... not exactly sure which you would be interested in
> seeing.  The config is all here:
> http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet%2Fmodules%2Fhadoop%2Ftemplates
> 
> Cheers,
> Tim
> 
> 
> 
> On Wed, Feb 1, 2012 at 1:56 PM, Michael Segel  
> wrote:
>> No.
>> What tuning did you do?
>> Why such a small cluster?
>> 
>> Sorry, but when you start off with a bad hardware configuration, you can get 
>> Hadoop/HBase to work, but performance will always be sub-optimal.
>> 
>> 
>> 
>> Sent from my iPhone
>> 
>> On Feb 1, 2012, at 6:52 AM, "Tim Robertson"  
>> wrote:
>> 
>>> Hi all,
>>> 
>>> We have a 3 node cluster (CDH3u2) with the following hardware:
>>> 
>>> RegionServers (+DN + TT)
>>>  CPU: 2x Intel(R) Xeon(R) CPU E5630 @ 2.53GHz (quad)
>>>  Disks: 6x250G SATA 5.4K
>>>  Memory: 24GB
>>> 
>>> Master (+ZK, JT, NN)
>>>  CPU: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz, 2x6MB (quad)
>>>  Disks: 2x500G SATA 7.2K
>>>  Memory: 8GB
>>> 
>>> Memory wise, we have:
>>> Master:
>>>  NN: 1GB
>>>  JT: 1GB
>>>  HBase master: 6GB
>>>  ZK: 1GB
>>> RegionServers:
>>>  RegionServer: 6GB
>>>  TaskTracker: 1GB
>>>  11 Mappers @ 1GB each
>>>  7 Reducers @ 1GB each
>>> 
>>> HDFS was empty, and I ran randomWrite and scan, both with the number
>>> of clients set to 50 (it seemed to spawn 500 Mappers though...)
>>> 
>>> randomWrite:
>>> 12/02/01 13:27:47 INFO mapred.JobClient: ROWS=52428500
>>> 12/02/01 13:27:47 INFO mapred.JobClient: ELAPSED_TIME=84504886
>>> 
>>> scan:
>>> 12/02/01 13:42:52 INFO mapred.JobClient: ROWS=52428500
>>> 12/02/01 13:42:52 INFO mapred.JobClient: ELAPSED_TIME=8158664
>>> 
>>> Would I be correct in thinking that this is way below what is to be
>>> expected of this hardware?
>>> We're setting up ganglia now to start debugging, but any suggestions
>>> on how to diagnose this would be greatly appreciated.
>>> 
>>> Thanks!
>>> Tim
> 


Re: PerformanceEvaluation results

2012-02-02 Thread Tim Robertson
Thanks all for the comments.  Ganglia set up is in progress.  We'll
keep plugging away.

I should mention that this is our first real dev cluster for
evaluation; production would likely be more like a 6-7+ node cluster
of better machines. But for sure we are the small-fry leprechauns Ted
Dunning refers to in his presentations - we're trying to understand
the potential and do some cost calculations before buying hardware.

I do feel the HBase project would benefit from some example metrics
for various operations and hardware or else it will remain a difficult
technology for some people to get into with confidence.  We'll blog
our findings, and hopefully it might be of benefit to other
leprechauns.  If we can prove the concept, we're more likely to be
able to get $ to grow.




On Thu, Feb 2, 2012 at 5:24 AM, Michel Segel  wrote:
> Tim,
>
> Here's the problem in a nutshell.
> With respect to hardware, you have 5.4K RPM drives? 6 drives and 8 cores?
> Small, slow drives, and still a spindle-to-core ratio of less than one (6
> spindles against 8 cores).
>
> I appreciate that you want to maximize performance, but when it comes to 
> tuning, you have to start before you get your hardware.
>
> You are asking a question about tuning, but how can we say whether the numbers 
> are OK?
> Have you looked at your GCs and enabled MSLAB? We don't know. What about the 
> network configuration?
>
> I mean that there's a lot missing, and fine-tuning a cluster is something you 
> have to do on your own. I guess I could say your numbers look fine to me for 
> that config... but honestly, it would be a swag (a wild guess).
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Feb 1, 2012, at 7:09 AM, Tim Robertson  wrote:
>
>> Thanks Michael,
>>
>> It's a small cluster, but is the hardware so bad?  We are particularly
>> interested in relatively low load for random read write (2000
>> transactions per second on <1k rows) but a decent full table scan
>> speed, as we aim to mount Hive tables on HBase backed tables.
>>
>> Regarding tuning... not exactly sure which you would be interested in
>> seeing.  The config is all here:
>> http://code.google.com/p/gbif-common-resources/source/browse/#svn%2Fcluster-puppet%2Fmodules%2Fhadoop%2Ftemplates
>>
>> Cheers,
>> Tim
>>
>>
>>
>> On Wed, Feb 1, 2012 at 1:56 PM, Michael Segel  
>> wrote:
>>> No.
>>> What tuning did you do?
>>> Why such a small cluster?
>>>
>>> Sorry, but when you start off with a bad hardware configuration, you can 
>>> get Hadoop/HBase to work, but performance will always be sub-optimal.
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 1, 2012, at 6:52 AM, "Tim Robertson"  
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We have a 3 node cluster (CDH3u2) with the following hardware:
>>>>
>>>> RegionServers (+DN + TT)
>>>>  CPU: 2x Intel(R) Xeon(R) CPU E5630 @ 2.53GHz (quad)
>>>>  Disks: 6x250G SATA 5.4K
>>>>  Memory: 24GB
>>>>
>>>> Master (+ZK, JT, NN)
>>>>  CPU: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz, 2x6MB (quad)
>>>>  Disks: 2x500G SATA 7.2K
>>>>  Memory: 8GB
>>>>
>>>> Memory wise, we have:
>>>> Master:
>>>>  NN: 1GB
>>>>  JT: 1GB
>>>>  HBase master: 6GB
>>>>  ZK: 1GB
>>>> RegionServers:
>>>>  RegionServer: 6GB
>>>>  TaskTracker: 1GB
>>>>  11 Mappers @ 1GB each
>>>>  7 Reducers @ 1GB each
>>>>
>>>> HDFS was empty, and I ran randomWrite and scan, both with the number
>>>> of clients set to 50 (it seemed to spawn 500 Mappers though...)
>>>>
>>>> randomWrite:
>>>> 12/02/01 13:27:47 INFO mapred.JobClient:     ROWS=52428500
>>>> 12/02/01 13:27:47 INFO mapred.JobClient:     ELAPSED_TIME=84504886
>>>>
>>>> scan:
>>>> 12/02/01 13:42:52 INFO mapred.JobClient:     ROWS=52428500
>>>> 12/02/01 13:42:52 INFO mapred.JobClient:     ELAPSED_TIME=8158664
>>>>
>>>> Would I be correct in thinking that this is way below what is to be
>>>> expected of this hardware?
>>>> We're setting up ganglia now to start debugging, but any suggestions
>>>> on how to diagnose this would be greatly appreciated.
>>>>
>>>> Thanks!
>>>> Tim
>>


Re: PerformanceEvaluation results

2012-02-02 Thread Stack
On Thu, Feb 2, 2012 at 8:00 AM, Tim Robertson  wrote:
> I do feel the HBase project would benefit from some example metrics
> for various operations and hardware or else it will remain a difficult
> technology for some people to get into with confidence.  We'll blog
> our findings, and hopefully it might be of benefit to other
> leprechauns.  If we can prove the concept, we're more likely to be
> able to get $ to grow.
>
>

Agree (except for the bit where you look like a leprechaun).  Would be
cool if folks published what stats they see doing various operations
in HBase on specific hardware.  Previously I'd have thought the
deploys, configs, etc. too various, but I suppose you have to start
somewhere.

Go easy Tim,
St.Ack


Re: PerformanceEvaluation results

2012-02-07 Thread Lars Francke
Hi Stack, Hi everyone,

>> I do feel the HBase project would benefit from some example metrics
>> for various operations and hardware or else it will remain a difficult
>> technology for some people to get into with confidence.  We'll blog
>> our findings, and hopefully it might be of benefit to other
>> leprechauns.  If we can prove the concept, we're more likely to be
>> able to get $ to grow.
>
> Agree (except for the bit where you look like a leprechaun).  Would be
> cool if folks published what stats they see doing various operations
> in HBase on specific hardware.  Previously I'd have thought the
> deploys, configs, etc. too various, but I suppose you have to start
> somewhere.

I too agree.

From my experience there are a lot of small companies[4] which can't
afford, or don't need, large clusters and don't have the knowledge and
resources to fully optimize a cluster. We're certainly one of those
organizations. It's already a challenge for us to follow the rapid
development in the projects we're using (Hadoop, HBase, Oozie, Hive,
etc.). We're still putting Hadoop and HBase to good use and it's
tremendously helpful.

As all our work is Open Source we're in the very fortunate position of
being able to point to all our configs[1], workflows[2] and metrics
(Ganglia now up and public)[3] etc. and ask for recommendations based
on that, but a lot of other companies don't enjoy that privilege. We're
more than willing to provide information and even test out different
configurations on our (admittedly small and aging) cluster and we
would hope that this'll prove helpful for others as well.

It is worth noting that we do plan to buy new and better hardware, but
need to understand the technologies and capabilities to make some
informed choices before spending our total yearly hardware budget.
Therefore, understanding the behavior even on lesser quality hardware
is still important for us.

Thanks for all the past and (hopefully) future help and it's great to
finally be able to work with HBase again.

Cheers,
Lars

PS: Tim and I work at the same organization

[1] 

[2] 

[3] 
[4] See also the cluster sizes on 


Re: PerformanceEvaluation results

2012-02-07 Thread Stack
On Tue, Feb 7, 2012 at 3:27 AM, Lars Francke  wrote:
> [1] 
> 

I don't see your hbase-site.xml up here Lars.  Am I looking in the wrong place?

Good on you,

St.Ack


Re: PerformanceEvaluation results

2012-02-08 Thread Tim Robertson
Hey Stack,

Because we run a couple of clusters now, we're using templating for
the *.site.xml files etc.

You'll find them in:
  
http://code.google.com/p/gbif-common-resources/source/browse/cluster-puppet/modules/hadoop/templates/

The values for the HBase 3 node cluster come from:
  
http://code.google.com/p/gbif-common-resources/source/browse/cluster-puppet/manifests/cluster2.pp

Thanks for looking - really appreciate some expert eyes on this!

Tim




On Tue, Feb 7, 2012 at 11:39 PM, Stack  wrote:
> On Tue, Feb 7, 2012 at 3:27 AM, Lars Francke  wrote:
>> [1] 
>> 
>
> I don't see your hbase-site.xml up here Lars.  Am I looking in the wrong 
> place?
>
> Good on you,
>
> St.Ack