Ahh, yes. Here is a pastebin for it: http://pastebin.com/w6mtabag
thanks again,
-chris

On May 13, 2014, at 7:47 PM, Nick Dimiduk <[email protected]> wrote:

> Hi Chris,
>
> Attachments are filtered out by the mail server. Can you pastebin it some
> place?
>
> Thanks,
> Nick
>
> On Tue, May 13, 2014 at 2:56 PM, Chris Tarnas <[email protected]> wrote:
>
>> Hello,
>>
>> We set the HBase RegionServer handler count to 10 (it appears to have
>> been set to 60 by Ambari during the install process). Now we have
>> narrowed down what causes the CPU to increase and have some detailed logs:
>>
>> If we connect using sqlline.py and execute a select that fetches one row
>> by the primary key, no increase in CPU is observed and the number of RPC
>> threads in a RUNNABLE state remains the same.
>>
>> If we execute a select that scans the table, such as "select count(*)
>> from TABLE", or one whose "where" clause only limits on non-primary-key
>> attributes, then the number of RUNNABLE RpcServer.handler threads
>> increases and the CPU utilization of the regionserver increases by ~105%.
>>
>> Disconnecting the client has no effect: the RpcServer.handler thread is
>> left RUNNABLE and the CPU stays at the higher usage.
>>
>> Checking the web console for the regionserver just shows 10
>> RpcServer.reader tasks, all in a WAITING state; no other monitored tasks
>> are happening. The regionserver has a max heap of 10G and a used heap of
>> 445.2M.
>>
>> I've attached the regionserver log with IPC debug logging turned on right
>> when one of the Phoenix statements is executed (this statement actually
>> used up the last available handler).
>>
>> thanks,
>> -chris
>>
>> On May 12, 2014, at 5:32 PM, Jeffrey Zhong <[email protected]> wrote:
>>
>>> From the stack, it seems you increased the default RPC handler count to
>>> about 60. All handlers are serving Get requests (you can search for
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2841)).
>>>
>>> You can check why there are so many Get requests by adding some log info
>>> or enabling HBase RPC trace. I guess if you decrease the number of RPC
>>> handlers per region server, it will mitigate your current issue.
>>>
>>> On 5/12/14 2:28 PM, "Chris Tarnas" <[email protected]> wrote:
>>>
>>>> We have hit a problem with Phoenix: regionserver CPU usage spikes up to
>>>> use all available CPU and the regionservers become unresponsive.
>>>>
>>>> After HDP 2.1 was released we set up a 4 compute node cluster (with 3
>>>> VMware "master" nodes) to test out Phoenix on it. It is a plain Ambari
>>>> 1.5/HDP 2.1 install; we added the HDP Phoenix RPM release and
>>>> hand-linked the jar files into the hadoop lib. Everything was going
>>>> well and we were able to load ~30k records into several tables. What
>>>> happened was that after about 3-4 days of being up the regionservers
>>>> became unresponsive and started to use most of the available CPU
>>>> (12-core boxes). Nothing terribly informative was in the logs
>>>> (initially we saw some flush messages that seemed excessive, but that
>>>> was not all of the time, and we changed back to the standard HBase WAL
>>>> codec). We are able to kill the unresponsive regionservers and then
>>>> restart them; the cluster will be fine for a day or so but will start
>>>> to lock up again.
>>>>
>>>> We've dropped the entire HBase and zookeeper information and started
>>>> from scratch, but that has not helped.
>>>>
>>>> James Taylor suggested I send this off here.
>>>> I've attached a jstack report of a locked-up regionserver in hopes
>>>> that someone can shed some light.
>>>>
>>>> thanks,
>>>> -chris
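To reproduce the two query shapes Chris contrasts above from the command line, sqlline.py can be pointed at a SQL file (it accepts a ZooKeeper quorum and a script argument). A minimal sketch, using a hypothetical EVENTS table with primary key ID and a non-key column STATUS; none of these names come from the thread:

```sh
# Hypothetical table/columns (EVENTS, ID, STATUS) and a local ZooKeeper quorum.

# Point lookup on the primary key -- did not tie up a handler in Chris's test:
echo "SELECT * FROM EVENTS WHERE ID = 'abc123';" > /tmp/point.sql
./sqlline.py localhost:2181 /tmp/point.sql

# Full scan and non-primary-key filter -- the shapes that left a handler RUNNABLE:
echo "SELECT COUNT(*) FROM EVENTS;" > /tmp/scan.sql
echo "SELECT * FROM EVENTS WHERE STATUS = 'ERROR';" >> /tmp/scan.sql
./sqlline.py localhost:2181 /tmp/scan.sql
```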
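The "handler stays RUNNABLE after the client disconnects" symptom can also be watched without the web console by counting handler threads in a jstack dump. A rough sketch; the pid-file path is a guess at the Ambari/HDP layout and may differ on your nodes:

```sh
# Guessed pid-file location for an Ambari-managed regionserver; adjust as needed.
RS_PID=$(cat /var/run/hbase/hbase-hbase-regionserver.pid)

# Each RpcServer.handler thread's state is printed on the line after its name,
# so grab one line of context and count the RUNNABLE ones.
jstack "$RS_PID" | grep -A1 '"RpcServer.handler' | grep -c 'RUNNABLE'
```

Running this before and after one of the scan queries above shows whether the count only ever goes up, i.e. whether the handlers are genuinely stuck rather than just busy.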
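Jeffrey's suggestion to enable HBase RPC tracing can be done through log4j so the regionserver log shows where the flood of Get requests originates. A sketch, assuming /etc/hbase/conf is the active config directory and that the RPC code lives under org.apache.hadoop.hbase.ipc (which the RpcServer.* thread names suggest); on an Ambari-managed cluster the same change would normally be made through the Ambari config screens rather than by editing the file directly:

```sh
# Assumption: /etc/hbase/conf is the config dir actually used by the regionserver.
cat >> /etc/hbase/conf/log4j.properties <<'EOF'
# Log every RPC so the source of the Get storm shows up in the regionserver log.
log4j.logger.org.apache.hadoop.hbase.ipc=TRACE
EOF

# Restart the regionserver, or change the level at runtime from the daemon's
# /logLevel page on its web UI, for the new setting to take effect.
```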
