In the previous stack trace you sent, shortCompactions and longCompactions threads were not active.
Was the stack trace captured during a period when the number of client
operations was low? If not, can you capture a stack trace during
off-peak hours?

Cheers

On Mon, Sep 10, 2018 at 12:08 PM Srinidhi Muppalla <srinid...@trulia.com> wrote:

> Hi Ted,
>
> The highest number of filters used is 10, but the average is generally
> close to 1. Is it possible the CPU usage spike has to do with HBase
> internal maintenance operations? It looks like post-upgrade the spike
> isn't correlated with the frequency of reads/writes we are making,
> because the high CPU usage persisted when the number of operations
> went down.
>
> Thank you,
> Srinidhi
>
> On 9/8/18, 9:44 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>
> Srinidhi:
> Do you know the average / highest number of ColumnPrefixFilter's in
> the FilterList?
>
> Thanks
>
> On Fri, Sep 7, 2018 at 10:00 PM Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Thanks for the detailed background information.
> >
> > I assume your code has done de-dup for the filters contained in
> > FilterListWithOR.
> >
> > I took a look at the JIRAs which touched
> > hbase-client/src/main/java/org/apache/hadoop/hbase/filter in
> > branch-1.4. There have been a few patches (some very big) since the
> > release of 1.3.0, so it is not obvious at first glance which one(s)
> > might be related.
> >
> > I noticed ColumnPrefixFilter.getNextCellHint (and
> > KeyValueUtil.createFirstOnRow) appearing many times in the stack
> > trace.
> >
> > I plan to dig more in this area.
> >
> > Cheers
> >
> > On Fri, Sep 7, 2018 at 11:30 AM Srinidhi Muppalla
> > <srinid...@trulia.com> wrote:
> >
> >> Sure thing. For our table schema, each row represents one user and
> >> the row key is that user's unique id in our system. We currently
> >> only use one column family in the table. The column qualifiers
> >> represent an item that has been surfaced to that user, as well as
> >> additional information to differentiate the way the item has been
> >> surfaced to the user. Without getting into too many specifics, the
> >> qualifier follows the rough format of:
> >>
> >> "Channel-itemId-distinguisher"
> >>
> >> The channel here is the channel through which the item was
> >> previously surfaced to the user. The itemId is the unique id of
> >> the item that has been surfaced to the user. A distinguisher is
> >> some attribute about how that item was surfaced to the user.
> >>
> >> When we run a scan, we currently only ever run it on one row at a
> >> time. It was chosen over 'get' because (from our understanding)
> >> the performance difference is negligible, and down the road using
> >> scan would allow us some more flexibility.
> >>
> >> The filter list that is constructed with the scan works by using a
> >> ColumnPrefixFilter as you mentioned. When a user is being
> >> communicated to on a particular channel, we have a list of items
> >> that we want to potentially surface for that user. So, we
> >> construct a prefix list with the channel and each of the item ids
> >> in the form "channel-itemId". Then we run a scan on that row with
> >> that filter list using "WithOr" to get all of the matching
> >> channel-itemId combinations currently in that row/column family in
> >> the table. This way we can then know which of the items we want to
> >> surface to that user on that channel have already been surfaced on
> >> that channel. The reason we query using a prefix filter is that we
> >> don't need to know the 'distinguisher' part of the record when
> >> writing the actual query, because the distinguisher is only
> >> relevant in certain circumstances.
> >>
> >> Let me know if this is the information about our query pattern
> >> that you were looking for and if there is anything I can clarify
> >> or add.
> >>
> >> Thanks,
> >> Srinidhi
> >>
> >> On 9/6/18, 12:24 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
> >>
> >> From the stack trace, ColumnPrefixFilter is used during scan.
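[Editor's note: the single-row, OR-of-prefixes query pattern described above can be sketched as below. This is a plain-Java simulation over an in-memory sorted set of qualifiers, not the HBase client API; the channel/item names and the `scanRow` helper are illustrative only, not taken from the original code.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PrefixOrScanSketch {
    // Qualifiers in one user's row, in the "channel-itemId-distinguisher"
    // format described above. A column family stores qualifiers in sorted
    // order, which TreeSet mimics here.
    public static final TreeSet<String> QUALIFIERS = new TreeSet<>(List.of(
        "email-item1-promo",
        "email-item2-banner",
        "push-item1-badge",
        "push-item3-badge"));

    // OR-of-prefixes semantics: a qualifier matches if ANY prefix matches,
    // mirroring a FilterList(MUST_PASS_ONE) of ColumnPrefixFilters.
    public static List<String> scanRow(TreeSet<String> qualifiers,
                                       List<String> prefixes) {
        List<String> matches = new ArrayList<>();
        for (String q : qualifiers) {
            for (String p : prefixes) {
                if (q.startsWith(p)) {
                    matches.add(q);
                    break; // first matching prefix is enough (OR)
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // "channel-itemId" prefixes: which of item1/item3 were already
        // surfaced on the email channel? The distinguisher stays unknown.
        List<String> prefixes = List.of("email-item1-", "email-item3-");
        System.out.println(scanRow(QUALIFIERS, prefixes));
        // prints [email-item1-promo]
    }
}
```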
> >>
> >> Can you illustrate how the various filters are formed thru
> >> FilterListWithOR? It would be easier for other people to reproduce
> >> the problem given your query pattern.
> >>
> >> Cheers
> >>
> >> On Thu, Sep 6, 2018 at 11:43 AM Srinidhi Muppalla
> >> <srinid...@trulia.com> wrote:
> >>
> >> > Hi Vlad,
> >> >
> >> > Thank you for the suggestion. I recreated the issue and attached
> >> > the stack traces I took. Let me know if there's any other info I
> >> > can provide. We narrowed the issue down to occurring when
> >> > upgrading from 1.3.0 to any 1.4.x version.
> >> >
> >> > Thanks,
> >> > Srinidhi
> >> >
> >> > On 9/4/18, 8:19 PM, "Vladimir Rodionov" <vladrodio...@gmail.com> wrote:
> >> >
> >> > Hi, Srinidhi
> >> >
> >> > Next time you see this issue, take a jstack of a RS several
> >> > times in a row. W/o stack traces it is hard to tell what was
> >> > going on with your cluster after the upgrade.
> >> >
> >> > -Vlad
> >> >
> >> > On Tue, Sep 4, 2018 at 3:50 PM Srinidhi Muppalla
> >> > <srinid...@trulia.com> wrote:
> >> >
> >> > > Hello all,
> >> > >
> >> > > We are currently running HBase 1.3.0 on an EMR cluster running
> >> > > EMR 5.5.0. Recently, we attempted to upgrade our cluster to
> >> > > HBase 1.4.4 (along with upgrading our EMR cluster to 5.16).
> >> > > After upgrading, the CPU usage for all of our region servers
> >> > > spiked up to 90%. The load_one for all of our servers spiked
> >> > > from roughly 1-2 to 10 threads. After upgrading, the number of
> >> > > operations to the cluster hasn't increased. After giving the
> >> > > cluster a few hours, we had to revert the upgrade. From the
> >> > > logs, we are unable to tell what is occupying the CPU
> >> > > resources. Is this a known issue with 1.4.4? Any guidance or
> >> > > ideas for debugging the cause would be greatly appreciated.
> >> > > What are the best steps for debugging CPU usage?
> >> > >
> >> > > Thank you,
> >> > > Srinidhi
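[Editor's note: Vlad's suggestion of taking several jstacks of a RegionServer in a row can be scripted roughly as follows. The PID, count, and interval are placeholders chosen for illustration; nothing here comes from the original thread.]

```shell
#!/bin/sh
# Capture several RegionServer stack traces a few seconds apart, so that
# hot threads show up across consecutive dumps.
capture_jstacks() {
  pid=$1; count=$2; interval=$3
  i=1
  while [ "$i" -le "$count" ]; do
    jstack "$pid" > "rs-jstack-$i.txt"   # one dump per file
    sleep "$interval"
    i=$((i + 1))
  done
}

# Usage (12345 is a placeholder PID; find the real one with
# e.g. `jps | grep HRegionServer`):
# capture_jstacks 12345 5 10
```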