[ https://issues.apache.org/jira/browse/HBASE-21991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783982#comment-16783982 ]
Sakthi edited comment on HBASE-21991 at 3/5/19 2:18 AM:
--------------------------------------------------------

Regarding the race condition:
 * With multiple RPC handler threads accessing the map of <meter names, meters>, one thread might try to access a metric that another thread has already removed from the map, as shown below:

{code:java}
private void markMeterIfPresent(String requestMeter) {
  ...
  if (requestsMap.containsKey(requestMeter) && requestsMap.get(requestMeter).isPresent()) {
    // Thread-1 accesses the metric here after Thread-2 (below) has removed it
    Meter metric = (Meter) requestsMap.get(requestMeter).get();
    ...
  }
}

private void registerLossyCountingMeterIfNotPresent(...) {
  ...
  Set<String> metersToBeRemoved = lossyCounting.addByOne(requestMeter);
  if (!requestsMap.containsKey(requestMeter) && metersToBeRemoved.contains(requestMeter)) {
    for (String meter : metersToBeRemoved) {
      requestsMap.remove(meter); // Thread-2 removes the meter here
      ...
    }
    ...
  }
  ...
}
{code}

Verified with a unit test, which frequently produced errors of the following kind:

{code:none}
regionserver.HRegionServer: ***** ABORTING region server x.y.z.24,16020,1548747043814: The coprocessor org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw java.lang.NullPointerException *****
{code}

> Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-21991
>                 URL: https://issues.apache.org/jira/browse/HBASE-21991
>             Project: HBase
>          Issue Type: Bug
>          Components: Coprocessors, metrics
>            Reporter: Sakthi
>            Assignee: Sakthi
>            Priority: Major
>
> Here is a list of the issues related to the MetaMetrics implementation:
> +*Bugs*+:
> # [_Lossy counting for top-k_] *Faulty remove logic of non-eligible meters*: Under certain conditions, we might end up storing/exposing all the meters rather than only the approximate top-k.
> # MetaMetrics can throw an NPE, which aborts the RS, because of a *Race Condition*.
> +*Improvements*+:
> # With a high number of regions in the cluster, exposing metrics for each region blows up the JMX output from ~140 KB to 100+ MB, depending on the number of regions. It's better to use *lossy counting to maintain top-k for region metrics* as well.
> # As the lossy meters do not represent actual counts, I think it would be better to *rename the meters to include "lossy" in the name*. That would be more informative while monitoring the metrics, and there would be less confusion between actual counts and lossy counts.
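
For illustration only, the check-then-act race described in the comment (containsKey followed by a separate get) could be avoided by reading the map entry once and null-checking the result. The sketch below is not the HBase code and not necessarily the fix that was applied for this issue: the class name MeterMapSketch, its method names, and the use of LongAdder as a stand-in for a real Meter are all assumptions made to keep the example self-contained.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for the coprocessor's meter bookkeeping; not HBase code. */
class MeterMapSketch {
  // LongAdder stands in for a real Meter so the sketch needs no external libraries.
  private final ConcurrentMap<String, LongAdder> requestsMap = new ConcurrentHashMap<>();

  /** Registers the meter if it is absent; computeIfAbsent is atomic on ConcurrentHashMap. */
  void registerIfNotPresent(String requestMeter) {
    requestsMap.computeIfAbsent(requestMeter, k -> new LongAdder());
  }

  /** Removes meters that are no longer eligible (the role of Thread-2 above). */
  void removeAll(Set<String> metersToBeRemoved) {
    for (String meter : metersToBeRemoved) {
      requestsMap.remove(meter);
    }
  }

  /**
   * Marks the meter using a single lookup (the role of Thread-1 above).
   * The reference is read once, so a concurrent remove() between the
   * check and the use cannot lead to a null dereference.
   */
  boolean markIfPresent(String requestMeter) {
    LongAdder meter = requestsMap.get(requestMeter);
    if (meter == null) {
      return false; // removed or never registered: skip instead of throwing
    }
    meter.increment();
    return true;
  }

  /** Returns the current count, or 0 if the meter is not registered. */
  long count(String requestMeter) {
    LongAdder meter = requestsMap.get(requestMeter);
    return meter == null ? 0L : meter.sum();
  }
}
```

With this shape, a thread that loses the race simply skips the mark. If the meter was removed after the lookup, the increment lands on an orphaned counter, which is harmless for approximate (lossy) metrics.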