Tianying Chang created HBASE-15155:
--------------------------------------

             Summary: Show All RPC handler tasks stop working after cluster is 
under heavy load for a while
                 Key: HBASE-15155
                 URL: https://issues.apache.org/jira/browse/HBASE-15155
             Project: HBase
          Issue Type: Bug
          Components: monitoring
    Affects Versions: 0.94.19, 1.0.0, 0.98.0
            Reporter: Tianying Chang
            Assignee: Tianying Chang


After we upgraded from 0.94.7 to 0.94.26 and 1.0, we found that the "Show All 
RPC handler status" link on the RS web UI stops working after the cluster has 
run in production under relatively high load for several days.

This turns out to be a bug introduced by 
https://issues.apache.org/jira/browse/HBASE-10312. The BoundedFIFOBuffer 
causes the RPC handler statuses to be overwritten/removed permanently when 
there is a spike of non-RPC task statuses that exceeds MAX_SIZE (1000). So 
once the RS has experienced "high" load even a single time, RPC status 
monitoring is gone until the RS is restarted.
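
The eviction behavior described above can be illustrated with a minimal, 
self-contained sketch. This is not the HBase code: the class and method names 
and the task labels are hypothetical, and a plain ArrayDeque stands in for the 
bounded buffer. It assumes that RPC handler statuses are registered once at 
startup and never re-registered, which would explain why they do not come back 
after being evicted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simulation of the failure mode; not taken from HBase.
class BoundedBufferEviction {
    // Mirrors the MAX_SIZE (1000) mentioned in the report.
    static final int MAX_SIZE = 1000;

    // A simple bounded FIFO: when full, the oldest entry is dropped.
    static final class BoundedFifo {
        private final Deque<String> tasks = new ArrayDeque<>();
        private final int capacity;

        BoundedFifo(int capacity) {
            this.capacity = capacity;
        }

        void add(String task) {
            if (tasks.size() == capacity) {
                tasks.removeFirst(); // evict the oldest task status
            }
            tasks.addLast(task);
        }

        long countWithPrefix(String prefix) {
            return tasks.stream().filter(t -> t.startsWith(prefix)).count();
        }
    }

    // Registers `handlers` RPC handler statuses first (as at startup), then
    // `spike` other task statuses, and returns how many handlers survive.
    static long rpcHandlersLeftAfterSpike(int handlers, int spike) {
        BoundedFifo buffer = new BoundedFifo(MAX_SIZE);
        for (int i = 0; i < handlers; i++) {
            buffer.add("RpcHandler-" + i);
        }
        for (int i = 0; i < spike; i++) {
            buffer.add("Other-" + i);
        }
        return buffer.countWithPrefix("RpcHandler-");
    }

    public static void main(String[] args) {
        // Light load: the buffer never fills, so all handlers remain.
        System.out.println(rpcHandlersLeftAfterSpike(10, 500));  // prints 10
        // A spike of MAX_SIZE other tasks pushes every handler out.
        System.out.println(rpcHandlersLeftAfterSpike(10, 1000)); // prints 0
    }
}
```

Because the handler entries are the oldest items in the buffer, they are the 
first to be evicted once other tasks fill it. A fix would presumably need to 
keep the long-lived handler statuses outside the bounded buffer, or otherwise 
exempt them from eviction.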

We added a unit test that reproduces the problem, and the fix passes that test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
