[ 
https://issues.apache.org/jira/browse/HBASE-30118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18076415#comment-18076415
 ] 

Duo Zhang commented on HBASE-30118:
-----------------------------------

The code is in hadoop so it is not easy to fix.

Plan to have a quick workaround to stablize our tests, and then find a more 
general way to fix the problem.

> Dead lock in metrics system cause UTs hang
> ------------------------------------------
>
>                 Key: HBASE-30118
>                 URL: https://issues.apache.org/jira/browse/HBASE-30118
>             Project: HBase
>          Issue Type: Bug
>          Components: hadoop2, hadoop3, metrics, test
>            Reporter: Duo Zhang
>            Priority: Major
>
> {noformat}
> "RS_OPEN_META-regionserver/2cd189b8f196:0-0" daemon prio=5 tid=470 blocked
> java.lang.Thread.State: BLOCKED
>         at 
> app//org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:223)
>         at 
> app//org.apache.hadoop.hbase.metrics.BaseSourceImpl.<init>(BaseSourceImpl.java:115)
>         at 
> app//org.apache.hadoop.hbase.io.MetricsIOSourceImpl.<init>(MetricsIOSourceImpl.java:44)
>         at 
> app//org.apache.hadoop.hbase.io.MetricsIOSourceImpl.<init>(MetricsIOSourceImpl.java:39)
>         at 
> app//org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactoryImpl.createIO(MetricsRegionServerSourceFactoryImpl.java:99)
>         at 
> app//org.apache.hadoop.hbase.io.MetricsIO.<init>(MetricsIO.java:36)
>         at 
> app//org.apache.hadoop.hbase.io.MetricsIO.getInstance(MetricsIO.java:52)
>         at 
> app//org.apache.hadoop.hbase.io.hfile.HFile.updateWriteLatency(HFile.java:205)
>         at 
> app//org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:1051)
>         at 
> app//org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:1036)
>         at 
> app//org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishBlock(HFileWriterImpl.java:384)
>         at 
> app//org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:653)
>         at 
> app//org.apache.hadoop.hbase.regionserver.StoreFileWriter$SingleStoreFileWriter.close(StoreFileWriter.java:781)
>         at 
> app//org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:301)
>         at 
> app//org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:70)
>         at 
> app//org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74)
>         at 
> app//org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:836)
>         at 
> app//org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1987)
>         at 
> app//org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:3158)
>         at 
> app//org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2866)
>         at 
> app//org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:5623)
>         at 
> app//org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1099)
>         at 
> app//org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:1033)
>         at 
> app//org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8038)
>         at 
> app//org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7992)
>         at 
> app//org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7964)
>         at 
> app//org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7912)
>         at 
> app//org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7843)
>         at 
> app//org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:143)
>         at 
> app//org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>         at 
> [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>         at 
> [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>         at [email protected]/java.lang.Thread.run(Thread.java:840)
> {noformat}
> {noformat}
> "HBase-Metrics2-1" daemon prio=5 tid=199 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
>         at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
>         at 
> [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
>         at 
> [email protected]/java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1864)
>         at 
> [email protected]/java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3465)
>         at 
> [email protected]/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3436)
>         at 
> [email protected]/java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1898)
>         at 
> [email protected]/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2072)
>         at 
> app//org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:182)
>         at 
> app//org.apache.hadoop.hbase.client.TableOverAsyncTable.get(TableOverAsyncTable.java:188)
>         at 
> app//org.apache.hadoop.hbase.MetaTableAccessor.getTableState(MetaTableAccessor.java:601)
>         at 
> app//org.apache.hadoop.hbase.master.TableStateManager.readMetaState(TableStateManager.java:177)
>         at 
> app//org.apache.hadoop.hbase.master.TableStateManager.isTablePresent(TableStateManager.java:107)
>         at 
> app//org.apache.hadoop.hbase.master.HMaster.getTableDescriptors(HMaster.java:3856)
>         at 
> app//org.apache.hadoop.hbase.master.HMaster.listTableDescriptors(HMaster.java:3806)
>         at 
> app//org.apache.hadoop.hbase.master.MetricsMasterWrapperImpl.getRegionCounts(MetricsMasterWrapperImpl.java:227)
>         at 
> app//org.apache.hadoop.hbase.master.MetricsMasterSourceImpl.getMetrics(MetricsMasterSourceImpl.java:95)
>         at 
> app//org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
>         at 
> app//org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:183)
>         at 
> app//org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:156)
>         at 
> [email protected]/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getClassName(DefaultMBeanServerInterceptor.java:1766)
>         at 
> [email protected]/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.safeGetClassName(DefaultMBeanServerInterceptor.java:1575)
>         at 
> [email protected]/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanPermission(DefaultMBeanServerInterceptor.java:1776)
>         at 
> [email protected]/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:426)
>         at 
> [email protected]/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:411)
>         at 
> [email protected]/com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:547)
>         at 
> app//org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:144)
>         at 
> app//org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stopMBeans(MetricsSourceAdapter.java:228)
>         at 
> app//org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.stop(MetricsSourceAdapter.java:213)
>         at 
> app//org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stopSources(MetricsSystemImpl.java:464)
>         at 
> app//org.apache.hadoop.metrics2.impl.MetricsSystemImpl.stop(MetricsSystemImpl.java:212)
>         at 
> app//org.apache.hadoop.metrics2.impl.JmxCacheBuster$JmxCacheBusterRunnable.run(JmxCacheBuster.java:98)
>         at 
> [email protected]/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
>         at 
> [email protected]/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>         at 
> [email protected]/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>         at 
> [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>         at 
> [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>         at [email protected]/java.lang.Thread.run(Thread.java:840)
> {noformat}
> In a UT, we kill the regionserver hosting meta, so we will assign meta to a 
> new regionserver, and finally when updating metrics, it blocks on the metrics 
> lock for registering. But at the same time, JmxCacheBuster is trying to 
> recreate all the jmx metrics and finally lead to access meta region under the 
> metrics lock, and is blocked since meta is not online.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to