Apologies for addressing this to Kafka Dev. That should have been HBase Dev.

From: Mike Freyberger <mfreyber...@appnexus.com>
Date: Tuesday, August 7, 2018 at 11:04 AM
To: "dev@hbase.apache.org" <dev@hbase.apache.org>
Subject: Hbase mutate is hogging my CPU

Kafka Dev,

I’d love some help investigating a slow HBase mutator.

The cluster runs HBase 1.2 and has 22 region servers. The region servers 
are pretty big: 24 cores and 126 GB of RAM each.

The cluster has 2 tables, each with only 1 column family. Both tables have the 
same pre-splits.

Each table is pre-split into 400 regions. The split keys are all 2 bytes and 
evenly divide the key space.
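
For illustration, split keys of this shape could be generated roughly as 
follows. This is only a sketch under assumptions: the class and method names 
are hypothetical, and "evenly divide" is taken to mean equal steps across the 
unsigned 2-byte range 0x0000..0xFFFF, which gives 399 split points for 400 
regions.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitKeys {
    // Hypothetical helper: returns (regions - 1) two-byte split keys spaced
    // evenly over the unsigned 16-bit range. Not taken from the original setup.
    static List<byte[]> evenTwoByteSplits(int regions) {
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            // Integer division rounds each boundary down.
            int point = (int) ((long) i * 65536 / regions);
            splits.add(new byte[] {(byte) (point >>> 8), (byte) point});
        }
        return splits;
    }

    public static void main(String[] args) {
        List<byte[]> splits = evenTwoByteSplits(400);
        System.out.println("split keys: " + splits.size());
    }
}
```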

The keys are 13 bytes. The key is formed by concatenating:
1-byte Kafka partition
8-byte random integer
4-byte timestamp (second-level granularity)
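
As a sketch, a key with this layout could be assembled like this. The class 
and method names are hypothetical, and big-endian byte order (the ByteBuffer 
default) is an assumption; the layout itself is the one listed above.

```java
import java.nio.ByteBuffer;

public class RowKeys {
    // Hypothetical helper: 1-byte partition + 8-byte random value +
    // 4-byte timestamp in seconds = 13 bytes. Byte order is assumed big-endian.
    static byte[] build(byte partition, long random, int epochSeconds) {
        return ByteBuffer.allocate(13)
                .put(partition)
                .putLong(random)
                .putInt(epochSeconds)
                .array();
    }

    public static void main(String[] args) {
        int nowSeconds = (int) (System.currentTimeMillis() / 1000L);
        byte[] key = build((byte) 7, 0x0123456789ABCDEFL, nowSeconds);
        System.out.println("key length: " + key.length);
    }
}
```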

The workload is 100% write for now. There are about 1M writes per second with a 
total data volume of 0.6 GB per second.

I find that my application is spending the majority of its CPU time (71.7%) 
in org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(), which in 
turn spends most of its time in 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion().

Attached are two images showing the performance of my application. The first is 
an overview showing that my application is spending a lot of time in mutate. 
The next is a deep dive into the functions that mutate is calling internally.

I am very surprised to see this function taking so long. My intuition is that 
all this needs to do is:
1) Determine which region the Mutation belongs in
2) Append the Mutation to a queue for async write to HBase.
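
To make that intuition concrete, here is a minimal sketch of the two steps 
using only the JDK. It is not how the HBase client is implemented; the class 
and its methods are hypothetical, and it assumes region start keys are kept 
in a sorted map compared as unsigned bytes.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Queue;
import java.util.TreeMap;

public class RegionRouter {
    // Unsigned lexicographic order; a shorter key sorts before its extensions.
    static final Comparator<byte[]> UNSIGNED_LEX = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    };

    // Region start key -> pending writes for that region.
    private final TreeMap<byte[], Queue<byte[]>> regions = new TreeMap<>(UNSIGNED_LEX);

    RegionRouter(Iterable<byte[]> splitKeys) {
        regions.put(new byte[0], new ArrayDeque<>()); // first region starts at the empty key
        for (byte[] split : splitKeys) {
            regions.put(split, new ArrayDeque<>());
        }
    }

    // Step 1: the owning region is the one with the greatest start key <= row key.
    byte[] locate(byte[] rowKey) {
        return regions.floorKey(rowKey);
    }

    // Step 2: append the write to that region's queue for a later flush.
    void enqueue(byte[] rowKey) {
        regions.floorEntry(rowKey).getValue().add(rowKey);
    }

    int pending(byte[] startKey) {
        return regions.get(startKey).size();
    }
}
```

With a sorted map, step 1 is a logarithmic lookup per write, which is the cost 
I had in mind when I expected mutate to be cheap.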


Any thoughts, comments, or suggestions from the community would be much 
appreciated! I’m really hoping to improve the performance profile here so that 
my CPU can be freed up.

Thanks,

Mike Freyberger
