[ https://issues.apache.org/jira/browse/HBASE-21991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783982#comment-16783982 ]

Sakthi edited comment on HBASE-21991 at 3/5/19 2:18 AM:
--------------------------------------------------------

Regarding the race condition:
 * With multiple RPC handler threads accessing the map of <meter name, meter>, 
one thread can try to read a meter that another thread has just removed from 
the map, as shown below:

{code:java}
private void markMeterIfPresent(String requestMeter) {
  ...
  if (requestsMap.containsKey(requestMeter) && requestsMap.get(requestMeter).isPresent()) {
    // Thread-1 tries to access the metric here after it was removed by Thread-2 (below)
    Meter metric = (Meter) requestsMap.get(requestMeter).get();
    ...
  }
}

private void registerLossyCountingMeterIfNotPresent(...) {
  ...
  Set<String> metersToBeRemoved = lossyCounting.addByOne(requestMeter);
  if (!requestsMap.containsKey(requestMeter) && metersToBeRemoved.contains(requestMeter)) {
    for (String meter : metersToBeRemoved) {
      requestsMap.remove(meter); // Thread-2 removes the meter here
      ...
    }
    ...
  }
  ...
}
{code}
 

Verified with a unit test, which frequently produced errors like the following:
{code:none}
regionserver.HRegionServer: ***** ABORTING region server x.y.z.24,16020,1548747043814: The coprocessor org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw java.lang.NullPointerException *****
{code}
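
One possible way to close the window, sketched here under assumptions and not taken from the actual patch: read the map entry once into a local variable and test that local for null, instead of calling containsKey() and get() separately. The class name MeterLookupSketch, the register() helper, and the use of AtomicLong in place of Meter are all hypothetical; requestsMap is assumed to be a ConcurrentHashMap holding Optional values.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the coprocessor; not the real MetaTableMetrics code.
class MeterLookupSketch {
  // Assumption: the map is a ConcurrentHashMap; AtomicLong plays the role of Meter.
  static final Map<String, Optional<AtomicLong>> requestsMap = new ConcurrentHashMap<>();

  // Hypothetical helper that registers a meter if none exists yet.
  static void register(String requestMeter) {
    requestsMap.putIfAbsent(requestMeter, Optional.of(new AtomicLong()));
  }

  // Single-lookup variant: the map is read exactly once, so a concurrent
  // remove() can only make the local reference null, never cause an NPE later.
  static boolean markMeterIfPresent(String requestMeter) {
    Optional<AtomicLong> entry = requestsMap.get(requestMeter);
    if (entry == null || !entry.isPresent()) {
      return false; // never registered, or already removed by another thread
    }
    entry.get().incrementAndGet();
    return true;
  }

  public static void main(String[] args) {
    register("get_table1");
    System.out.println(markMeterIfPresent("get_table1"));
    requestsMap.remove("get_table1"); // simulates Thread-2 removing the meter
    System.out.println(markMeterIfPresent("get_table1"));
  }
}
```

With this shape a removed meter is simply skipped. A meter that is removed right after the lookup may still be marked once, which seems harmless for lossy metrics.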


was (Author: jatsakthi):
Same text as the comment above, except that "rpc" was misspelled as "rcp".





> Fix MetaMetrics issues - [Race condition, Faulty remove logic], few improvements
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-21991
>                 URL: https://issues.apache.org/jira/browse/HBASE-21991
>             Project: HBase
>          Issue Type: Bug
>          Components: Coprocessors, metrics
>            Reporter: Sakthi
>            Assignee: Sakthi
>            Priority: Major
>
> Here is a list of the issues related to the MetaMetrics implementation:
> +*Bugs*+:
>  # [_Lossy counting for top-k_] *Faulty remove logic of non-eligible meters*: 
> Under certain conditions, we might end up storing/exposing all the meters 
> rather than only the top-k-ish ones.
>  # MetaMetrics can throw an NPE, causing the RS to abort, because of a 
> *Race Condition*.
> +*Improvements*+:
>  # With a high number of regions in the cluster, exposing metrics for each 
> region blows up the JMX output from ~140 KB to 100+ MB, depending on the 
> number of regions. It's better to use *lossy counting to maintain top-k for 
> region metrics* as well.
>  # As the lossy meters do not represent actual counts, I think it would be 
> better to *rename the meters to include "lossy" in the name*. That would be 
> more informative while monitoring the metrics, and there would be less 
> confusion between actual counts and lossy counts.
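
For readers unfamiliar with lossy counting, here is a minimal sketch of the textbook algorithm (Manku and Motwani), not HBase's own LossyCounting class. The class name LossyCountingSketch and its methods are hypothetical. It shows the periodic sweep that is supposed to evict infrequent keys, and why the resulting counts are approximate rather than exact.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Textbook lossy counting; a hypothetical illustration, not the HBase class.
class LossyCountingSketch {
  private final long bucketSize;   // w = ceil(1 / errorRate)
  private long currentBucket = 1;  // id of the bucket currently being filled
  private long totalItems = 0;     // number of items seen so far
  // key -> {count, delta}, where delta is the maximum possible undercount
  private final Map<String, long[]> entries = new HashMap<>();

  LossyCountingSketch(double errorRate) {
    this.bucketSize = (long) Math.ceil(1.0 / errorRate);
  }

  // Counts one occurrence of key and returns the keys evicted by this call.
  Set<String> add(String key) {
    long[] entry = entries.get(key);
    if (entry == null) {
      entries.put(key, new long[] {1, currentBucket - 1});
    } else {
      entry[0]++;
    }
    totalItems++;

    Set<String> swept = new HashSet<>();
    if (totalItems % bucketSize == 0) {
      // Bucket boundary: evict every key whose count + delta <= bucket id.
      Iterator<Map.Entry<String, long[]>> it = entries.entrySet().iterator();
      while (it.hasNext()) {
        Map.Entry<String, long[]> e = it.next();
        if (e.getValue()[0] + e.getValue()[1] <= currentBucket) {
          swept.add(e.getKey());
          it.remove();
        }
      }
      currentBucket++;
    }
    return swept;
  }

  boolean contains(String key) {
    return entries.containsKey(key);
  }
}
```

Because evicted keys lose their history, a surviving count can be lower than the true count by up to its delta, which is the motivation for putting "lossy" in the meter names.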



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
