[ https://issues.apache.org/jira/browse/HBASE-21139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602156#comment-16602156 ]
Ted Yu commented on HBASE-21139: -------------------------------- The lazy table metrics registration results in penalty for the first flushes. Excerpt from log shows delay (note the same timestamp 08:18:23,234) : {code} 2018-09-02 08:18:23,232 DEBUG [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] regionserver.MetricsTableSourceImpl(124): Creating new MetricsTableSourceImpl for table 'testtb-1535901500805' 2018-09-02 08:18:23,233 DEBUG [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] regionserver.MetricsTableSourceImpl(137): registering metrics for testtb- 1535901500805 2018-09-02 08:18:23,234 INFO [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1] regionserver.HRegion(2822): Finished flush of dataSize ~2.29 KB/2343, heapSize ~5.16 KB/5280, currentSize=0 B/0 for fa403f6a4fb8dbc1a1c389744fce2d58 in 280ms, sequenceid=5, compaction requested=false 2018-09-02 08:18:23,234 DEBUG [rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-1] regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register testtb-1535901500805 Thread[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-1,5,FailOnTimeoutGroup] 2018-09-02 08:18:23,234 DEBUG [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1] regionserver.MetricsTableAggregateSourceImpl(84): it took 0 ms to register testtb-1535901500805 Thread[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1,5,FailOnTimeoutGroup] 2018-09-02 08:18:23,234 DEBUG [rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-1] regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register testtb-1535901500805 Thread[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-1,5,FailOnTimeoutGroup] 2018-09-02 08:18:23,234 DEBUG [rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-2] regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register testtb-1535901500805 Thread[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-2,5,FailOnTimeoutGroup] 2018-09-02 08:18:23,234 DEBUG [rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-2] regionserver.MetricsTableAggregateSourceImpl(84): it took 5 ms to register testtb-1535901500805 Thread[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-2,5,FailOnTimeoutGroup] 2018-09-02 08:18:23,234 DEBUG [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register testtb-1535901500805 Thread[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2,5,FailOnTimeoutGroup] {code} > Concurrent invocations of > MetricsTableAggregateSourceImpl.getOrCreateTableSource may return > unregistered MetricsTableSource > --------------------------------------------------------------------------------------------------------------------------- > > Key: HBASE-21139 > URL: https://issues.apache.org/jira/browse/HBASE-21139 > Project: HBase > Issue Type: Bug > Reporter: Ted Yu > Priority: Major > > From test output of TestRestoreFlushSnapshotFromClient : > {code} > 2018-09-01 21:09:38,174 WARN [member: > 'hw13463.attlocal.net,49623,1535861370108' subprocedure-pool6-thread-1] > snapshot. > RegionServerSnapshotManager$SnapshotSubprocedurePool(348): Got Exception in > SnapshotSubprocedurePool > java.util.concurrent.ExecutionException: java.lang.NullPointerException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:324) > at > org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:173) > at > org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:193) > at > org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:189) > at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.updateFlushTime(MetricsTableSourceImpl.java:375) > at > org.apache.hadoop.hbase.regionserver.MetricsTable.updateFlushTime(MetricsTable.java:56) > at > org.apache.hadoop.hbase.regionserver.MetricsRegionServer.updateFlush(MetricsRegionServer.java:210) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2826) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2444) > at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2416) > at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2306) > at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2209) > at > org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:115) > at > org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:77) > {code} > In MetricsTableAggregateSourceImpl.getOrCreateTableSource : > {code} > MetricsTableSource prev = tableSources.putIfAbsent(table, source); > if (prev != null) { > return prev; > } else { > // register the new metrics now > register(source); > {code} > Suppose threads t1 and t2 execute the above code concurrently. > t1 calls putIfAbsent first and proceeds to running {{register(source)}}. > Context switches, t2 gets to putIfAbsent and retrieves the instance stored by > t1 which is not registered yet. > We would end up with what the stack trace showed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)