Hi Jeffrey,

The performance.py from HDP 2.1 does not work out of the box - it does not look in the right place for the Phoenix jar files (even with PHOENIX_LIB_DIR set). I downloaded the tarball release from Apache and that one works. After running the script and the counts you suggested, I saw no increase in CPU usage and no stuck RpcServer.handler threads. I ran a few scans as well, just for good measure. I used the HDP sqlline.py to run the queries.
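For reference, here is roughly what I ran (a sketch rather than verbatim shell history - the zookeeper quorum and the install paths are from my environment, and performance.py takes the quorum plus a row count):

    # the Apache tarball copy (the HDP-packaged script could not find
    # its jars, even with PHOENIX_LIB_DIR set)
    cd phoenix-4.0.0-incubating/bin
    ./performance.py zk1.example.com:2181 50000

    # then the counts and a couple of scans from the HDP sqlline.py
    /usr/lib/phoenix/bin/sqlline.py zk1.example.com:2181
    # at the prompt:
    #   select count(*) from PERFORMANCE_50000;
    #   select * from PERFORMANCE_50000 limit 10;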
Re-running the count on one of our tables, I did see the problem, and can add one more bit of info: the only affected regionserver is the one that hosts the last region of the scanned table, and I cannot reproduce it perfectly - it seems to happen about 80% of the time. Also, a stuck RpcServer.handler prevents the regionserver from exiting; it needs to be killed with -9. I have one regionserver running right now with a few stuck handlers - are there any commands you would like me to run on it? A sketch of what I can capture on it myself is below.

thank you,
-chris
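In the meantime, here is what I can grab from it on my own (a sketch only - the handler thread names are taken from the jstack I sent earlier, the rest is standard JDK/Linux tooling, and the thread id is a placeholder):

    RS_PID=$(pgrep -f HRegionServer | head -n1)

    # how many RpcServer.handler threads are RUNNABLE right now
    # (jstack prints the thread state on the line after the thread name)
    jstack "$RS_PID" | grep -A1 '"RpcServer.handler' | grep -c 'RUNNABLE'

    # tie the CPU burn to a stack: find the hottest thread with top,
    # convert its decimal id to the hex "nid" that jstack reports, and
    # pull that thread's stack out of the dump
    top -b -H -n 1 -p "$RS_PID" | head -n 20   # note the busy thread's id
    TID=12345                                  # placeholder: id from top
    jstack "$RS_PID" | grep -A 25 "nid=$(printf '0x%x' "$TID")"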
On Wed, May 14, 2014 at 5:54 PM, Jeffrey Zhong <[email protected]> wrote:

> Hey Chris,
>
> I used the performance.py tool, which created a table with 50K rows, and
> ran the following query from sqlline.py; everything seems fine, without
> the CPU running hot.
>
> 0: jdbc:phoenix:hor11n21.gq1.ygridcore.net> select count(*) from PERFORMANCE_50000;
> +------------+
> |  COUNT(1)  |
> +------------+
> | 50000      |
> +------------+
> 1 row selected (0.166 seconds)
> 0: jdbc:phoenix:hor11n21.gq1.ygridcore.net> select count(*) from PERFORMANCE_50000;
> +------------+
> |  COUNT(1)  |
> +------------+
> | 50000      |
> +------------+
> 1 row selected (0.167 seconds)
>
> Is there any way you could run a profiler to see where the CPU goes?
>
> On 5/13/14 6:40 PM, "Chris Tarnas" <[email protected]> wrote:
>
>> Ahh, yes. Here is a pastebin for it:
>>
>> http://pastebin.com/w6mtabag
>>
>> thanks again,
>> -chris
>>
>> On May 13, 2014, at 7:47 PM, Nick Dimiduk <[email protected]> wrote:
>>
>>> Hi Chris,
>>>
>>> Attachments are filtered out by the mail server. Can you pastebin it
>>> some place?
>>>
>>> Thanks,
>>> Nick
>>>
>>> On Tue, May 13, 2014 at 2:56 PM, Chris Tarnas <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> We set the HBase RegionServer handler count to 10 (it appears to have
>>>> been set to 60 by Ambari during the install process). We have now
>>>> narrowed down what causes the CPU increase and have some detailed logs:
>>>>
>>>> If we connect using sqlline.py and execute a select that fetches a
>>>> single row by primary key, no increase in CPU is observed and the
>>>> number of RPC threads in a RUNNABLE state remains the same.
>>>>
>>>> If we execute a select that scans the table, such as "select count(*)
>>>> from TABLE", or one whose "where" clause filters only on
>>>> non-primary-key attributes, then the number of RUNNABLE
>>>> RpcServer.handler threads increases and the CPU utilization of the
>>>> regionserver increases by ~105%.
>>>>
>>>> Disconnecting the client has no effect; the RpcServer.handler thread
>>>> is left RUNNABLE and the CPU stays at the higher usage.
>>>>
>>>> Checking the web console for the regionserver shows just 10
>>>> RpcServer.reader tasks, all in a WAITING state; no other monitored
>>>> tasks are running. The regionserver has a max heap of 10G and a used
>>>> heap of 445.2M.
>>>>
>>>> I've attached the regionserver log with IPC debug logging turned on
>>>> right when one of the Phoenix statements was executed (this statement
>>>> actually used up the last available handler).
>>>>
>>>> thanks,
>>>> -chris
>>>>
>>>> On May 12, 2014, at 5:32 PM, Jeffrey Zhong <[email protected]> wrote:
>>>>
>>>>> From the stack, it seems you increased the default rpc handler number
>>>>> to about 60. All handlers are serving Get requests (you can search for
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2841)).
>>>>>
>>>>> You can check why there are so many get requests by adding some log
>>>>> info or enabling hbase rpc trace. I guess if you decrease the number
>>>>> of rpc handlers per region server, it will mitigate your current issue.
>>>>>
>>>>> On 5/12/14 2:28 PM, "Chris Tarnas" <[email protected]> wrote:
>>>>>
>>>>>> We have hit a problem with Phoenix: regionserver CPU usage spikes up
>>>>>> to use all available CPU and the regionservers become unresponsive.
>>>>>>
>>>>>> After HDP 2.1 was released we set up a 4 compute node cluster (with
>>>>>> 3 VMware "master" nodes) to test out Phoenix. It is a plain Ambari
>>>>>> 1.5/HDP 2.1 install; we added the HDP Phoenix RPM release and
>>>>>> hand-linked the jar files into the hadoop lib. Everything was going
>>>>>> well and we were able to load ~30k records into several tables.
>>>>>> After about 3-4 days of uptime, the regionservers became
>>>>>> unresponsive and started to use most of the available CPU (12-core
>>>>>> boxes). Nothing terribly informative was in the logs (initially we
>>>>>> saw some flush messages that seemed excessive, but that was not all
>>>>>> of the time, and we changed back to the standard HBase WAL codec).
>>>>>> We are able to kill the unresponsive regionservers and restart them;
>>>>>> the cluster will be fine for a day or so but then starts to lock up
>>>>>> again.
>>>>>>
>>>>>> We've dropped the entire HBase and zookeeper data and started from
>>>>>> scratch, but that has not helped.
>>>>>>
>>>>>> James Taylor suggested I send this here. I've attached a jstack
>>>>>> report of a locked-up regionserver in hopes that someone can shed
>>>>>> some light.
>>>>>>
>>>>>> thanks,
>>>>>> -chris
