Ahh, yes. Here is a pastebin for it: http://pastebin.com/w6mtabag
thanks again,
-chris

On May 13, 2014, at 7:47 PM, Nick Dimiduk <[email protected]> wrote:

> Hi Chris,
>
> Attachments are filtered out by the mail server. Can you pastebin it some
> place?
>
> Thanks,
> Nick
>
> On Tue, May 13, 2014 at 2:56 PM, Chris Tarnas <[email protected]> wrote:
>
>> Hello,
>>
>> We set the HBase RegionServer handler count to 10 (it appears to have
>> been set to 60 by Ambari during the install process). Now we have
>> narrowed down what causes the CPU to increase and have some detailed logs:
>>
>> If we connect using sqlline.py and execute a select that fetches one row
>> by the primary key, no increase in CPU is observed and the number of RPC
>> threads in a RUNNABLE state remains the same.
>>
>> If we execute a select that scans the table, such as "select count(*)
>> from TABLE", or one whose "where" clause only limits on non-primary-key
>> attributes, then the number of RUNNABLE RpcServer.handler threads
>> increases and the CPU utilization of the regionserver increases by ~105%.
>>
>> Disconnecting the client has no effect: the RpcServer.handler thread is
>> left RUNNABLE and the CPU stays at the higher usage.
>>
>> Checking the web console for the regionserver just shows 10
>> RpcServer.reader tasks, all in a WAITING state; no other monitored tasks
>> are happening. The regionserver has a max heap of 10G and a used heap of
>> 445.2M.
>>
>> I've attached the regionserver log with IPC debug logging turned on right
>> when one of the Phoenix statements is executed (this statement actually
>> used up the last available handler).
>>
>> thanks,
>> -chris
>>
>> On May 12, 2014, at 5:32 PM, Jeffrey Zhong <[email protected]> wrote:
>>
>>> From the stack, it seems you increased the default RPC handler count to
>>> about 60. All handlers are serving Get requests (you can search for
>>> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2841)).
>>>
>>> You can check why there are so many Get requests by adding some log info
>>> or enabling HBase RPC trace. I guess if you decrease the number of RPC
>>> handlers per region server, it will mitigate your current issue.
>>>
>>> On 5/12/14 2:28 PM, "Chris Tarnas" <[email protected]> wrote:
>>>
>>>> We have hit a problem with Phoenix: regionserver CPU usage spikes up to
>>>> use all available CPU and the regionservers become unresponsive.
>>>>
>>>> After HDP 2.1 was released we set up a 4 compute node cluster (with 3
>>>> VMware "master" nodes) to test out Phoenix on it. It is a plain Ambari
>>>> 1.5/HDP 2.1 install; we added the HDP Phoenix RPM release and
>>>> hand-linked the jar files into the hadoop lib. Everything was going
>>>> well and we were able to load ~30k records into several tables. What
>>>> happened was that after about 3-4 days of being up the regionservers
>>>> became unresponsive and started to use most of the available CPU
>>>> (12-core boxes). Nothing terribly informative was in the logs
>>>> (initially we saw some flush messages that seemed excessive, but that
>>>> was not all of the time, and we changed back to the standard HBase WAL
>>>> codec). We are able to kill the unresponsive regionservers and then
>>>> restart them; the cluster will be fine for a day or so but will start
>>>> to lock up again.
>>>>
>>>> We've dropped the entire HBase and zookeeper information and started
>>>> from scratch, but that has not helped.
>>>>
>>>> James Taylor suggested I send this off here.
>>>> I've attached a jstack report of a locked-up regionserver in hopes
>>>> that someone can shed some light.
>>>>
>>>> thanks,
>>>> -chris
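To reproduce the two query shapes Chris contrasts above from the command line, sqlline.py can be pointed at a SQL file (it accepts a ZooKeeper quorum and a script argument). A minimal sketch, using a hypothetical EVENTS table with primary key ID and a non-key column STATUS; none of these names come from the thread:

```sh
# Hypothetical table/columns (EVENTS, ID, STATUS) and a local ZooKeeper quorum.

# Point lookup on the primary key -- did not tie up a handler in Chris's test:
echo "SELECT * FROM EVENTS WHERE ID = 'abc123';" > /tmp/point.sql
./sqlline.py localhost:2181 /tmp/point.sql

# Full scan and non-primary-key filter -- the shapes that left a handler RUNNABLE:
echo "SELECT COUNT(*) FROM EVENTS;" > /tmp/scan.sql
echo "SELECT * FROM EVENTS WHERE STATUS = 'ERROR';" >> /tmp/scan.sql
./sqlline.py localhost:2181 /tmp/scan.sql
```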
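The "handler stays RUNNABLE after the client disconnects" symptom can also be watched without the web console by counting handler threads in a jstack dump. A rough sketch; the pid-file path is a guess at the Ambari/HDP layout and may differ on your nodes:

```sh
# Guessed pid-file location for an Ambari-managed regionserver; adjust as needed.
RS_PID=$(cat /var/run/hbase/hbase-hbase-regionserver.pid)

# Each RpcServer.handler thread's state is printed on the line after its name,
# so grab one line of context and count the RUNNABLE ones.
jstack "$RS_PID" | grep -A1 '"RpcServer.handler' | grep -c 'RUNNABLE'
```

Running this before and after one of the scan queries above shows whether the count only ever goes up, i.e. whether the handlers are genuinely stuck rather than just busy.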
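Jeffrey's suggestion to enable HBase RPC tracing can be done through log4j so the regionserver log shows where the flood of Get requests originates. A sketch, assuming /etc/hbase/conf is the active config directory and that the RPC code lives under org.apache.hadoop.hbase.ipc (which the RpcServer.* thread names suggest); on an Ambari-managed cluster the same change would normally be made through the Ambari config screens rather than by editing the file directly:

```sh
# Assumption: /etc/hbase/conf is the config dir actually used by the regionserver.
cat >> /etc/hbase/conf/log4j.properties <<'EOF'
# Log every RPC so the source of the Get storm shows up in the regionserver log.
log4j.logger.org.apache.hadoop.hbase.ipc=TRACE
EOF

# Restart the regionserver, or change the level at runtime from the daemon's
# /logLevel page on its web UI, for the new setting to take effect.
```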
