[ 
https://issues.apache.org/jira/browse/HBASE-21139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602156#comment-16602156
 ] 

Ted Yu commented on HBASE-21139:
--------------------------------

The lazy table metrics registration results in penalty for the first flushes.
Excerpt from log shows delay (note the same timestamp 08:18:23,234) :
{code}
2018-09-02 08:18:23,232 DEBUG 
[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] 
regionserver.MetricsTableSourceImpl(124): Creating new                      
MetricsTableSourceImpl for table 'testtb-1535901500805'
2018-09-02 08:18:23,233 DEBUG 
[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] 
regionserver.MetricsTableSourceImpl(137): registering metrics for testtb-   
1535901500805
2018-09-02 08:18:23,234 INFO  
[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1] 
regionserver.HRegion(2822): Finished flush of dataSize ~2.29 KB/2343,       
heapSize ~5.16 KB/5280, currentSize=0 B/0 for fa403f6a4fb8dbc1a1c389744fce2d58 
in 280ms, sequenceid=5, compaction requested=false
2018-09-02 08:18:23,234 DEBUG 
[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-1] 
regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register  
testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-1,5,FailOnTimeoutGroup]
2018-09-02 08:18:23,234 DEBUG 
[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1] 
regionserver.MetricsTableAggregateSourceImpl(84): it took 0 ms to register  
testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1,5,FailOnTimeoutGroup]
2018-09-02 08:18:23,234 DEBUG 
[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-1] 
regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register   
testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-1,5,FailOnTimeoutGroup]
2018-09-02 08:18:23,234 DEBUG 
[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-2] 
regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register   
testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-2,5,FailOnTimeoutGroup]
2018-09-02 08:18:23,234 DEBUG 
[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-2] 
regionserver.MetricsTableAggregateSourceImpl(84): it took 5 ms to register  
testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-2,5,FailOnTimeoutGroup]
2018-09-02 08:18:23,234 DEBUG 
[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] 
regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register  
testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2,5,FailOnTimeoutGroup]
{code}

> Concurrent invocations of 
> MetricsTableAggregateSourceImpl.getOrCreateTableSource may return 
> unregistered MetricsTableSource
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21139
>                 URL: https://issues.apache.org/jira/browse/HBASE-21139
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Major
>
> From test output of TestRestoreFlushSnapshotFromClient :
> {code}
> 2018-09-01 21:09:38,174 WARN  [member: 
> 'hw13463.attlocal.net,49623,1535861370108' subprocedure-pool6-thread-1] 
> snapshot.                                                          
> RegionServerSnapshotManager$SnapshotSubprocedurePool(348): Got Exception in 
> SnapshotSubprocedurePool
> java.util.concurrent.ExecutionException: java.lang.NullPointerException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:324)
>   at 
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:173)
>   at 
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:193)
>   at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:189)
>   at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.updateFlushTime(MetricsTableSourceImpl.java:375)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsTable.updateFlushTime(MetricsTable.java:56)
>   at 
> org.apache.hadoop.hbase.regionserver.MetricsRegionServer.updateFlush(MetricsRegionServer.java:210)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2826)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2444)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2416)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2306)
>   at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2209)
>   at 
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:115)
>   at 
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:77)
> {code}
> In MetricsTableAggregateSourceImpl.getOrCreateTableSource :
> {code}
>     MetricsTableSource prev = tableSources.putIfAbsent(table, source);
>     if (prev != null) {
>       return prev;
>     } else {
>       // register the new metrics now
>       register(source);
> {code}
> Suppose threads t1 and t2 execute the above code concurrently.
> t1 calls putIfAbsent first and proceeds to running {{register(source)}}.
> Context switches, t2 gets to putIfAbsent and retrieves the instance stored by 
> t1 which is not registered yet.
> We would end up with what the stack trace showed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to