Ted Yu created HBASE-21139:
------------------------------

             Summary: Concurrent invocations of MetricsTableAggregateSourceImpl.getOrCreateTableSource may return unregistered MetricsTableSource
                 Key: HBASE-21139
                 URL: https://issues.apache.org/jira/browse/HBASE-21139
             Project: HBase
          Issue Type: Bug
            Reporter: Ted Yu

From test output of TestRestoreFlushSnapshotFromClient:
{code}
2018-09-01 21:09:38,174 WARN [member: 'hw13463.attlocal.net,49623,1535861370108' subprocedure-pool6-thread-1] snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool(348): Got Exception in SnapshotSubprocedurePool
java.util.concurrent.ExecutionException: java.lang.NullPointerException
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:324)
  at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:173)
  at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:193)
  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:189)
  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.updateFlushTime(MetricsTableSourceImpl.java:375)
  at org.apache.hadoop.hbase.regionserver.MetricsTable.updateFlushTime(MetricsTable.java:56)
  at org.apache.hadoop.hbase.regionserver.MetricsRegionServer.updateFlush(MetricsRegionServer.java:210)
  at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2826)
  at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2444)
  at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2416)
  at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2306)
  at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2209)
  at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:115)
  at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:77)
{code}

In MetricsTableAggregateSourceImpl.getOrCreateTableSource:
{code}
    MetricsTableSource prev = tableSources.putIfAbsent(table, source);

    if (prev != null) {
      return prev;
    } else {
      // register the new metrics now
      register(source);
{code}

Suppose threads t1 and t2 execute the above code concurrently.
t1 calls putIfAbsent first and proceeds to run {{register(source)}}.
Before the registration completes, a context switch occurs: t2 reaches putIfAbsent and retrieves the instance stored by t1, which is not yet registered.
t2 then returns and uses that unregistered MetricsTableSource, which leads to the NullPointerException shown in the stack trace.
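
One possible way to close the window between publishing the source in the map and registering it is to do both steps inside a single atomic {{ConcurrentHashMap.computeIfAbsent}} call, so the instance only becomes visible to other threads after registration has finished. The following is a minimal sketch of that idea, not the actual patch: {{SketchTableSource}} and {{SketchAggregateSource}} are hypothetical stand-ins for MetricsTableSource and MetricsTableAggregateSourceImpl, and {{register()}} here only sets a flag.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for MetricsTableSource; not the HBase class.
class SketchTableSource {
  private final String table;
  private volatile boolean registered;

  SketchTableSource(String table) {
    this.table = table;
  }

  // Stand-in for the registration step that must finish before the source is used.
  void register() {
    registered = true;
  }

  boolean isRegistered() {
    return registered;
  }

  String getTable() {
    return table;
  }
}

// Hypothetical stand-in for MetricsTableAggregateSourceImpl.
class SketchAggregateSource {
  private final ConcurrentHashMap<String, SketchTableSource> tableSources =
      new ConcurrentHashMap<>();

  SketchTableSource getOrCreateTableSource(String table) {
    // Fast path: anything already in the map was registered before it was stored.
    SketchTableSource existing = tableSources.get(table);
    if (existing != null) {
      return existing;
    }
    // ConcurrentHashMap.computeIfAbsent runs the mapping function atomically and
    // at most once per key; the value is stored only after the function returns.
    return tableSources.computeIfAbsent(table, t -> {
      SketchTableSource source = new SketchTableSource(t);
      source.register(); // register before the source is published
      return source;
    });
  }
}
```

With this shape, a thread that loses the race either blocks until the winner's mapping function returns or sees the already registered instance. The trade-off is that registration now runs while the map entry is locked, so it should stay short and must not touch the same map.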