[ 
https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377291#comment-16377291
 ] 

Ted Yu commented on HBASE-20081:
--------------------------------

After a few of these:
{code}
Thread 22 (Time-limited test):
  State: RUNNABLE
  Blocked count: 583
  Waited count: 1063
  Stack:
    sun.management.ThreadImpl.getThreadInfo1(Native Method)
    sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
    sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
    
org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:169)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke(Method.java:498)
    
org.apache.hadoop.hbase.util.Threads$PrintThreadInfoLazyHolder$1.printThreadInfo(Threads.java:294)
    org.apache.hadoop.hbase.util.Threads.printThreadInfo(Threads.java:341)
    org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:135)
    org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:385)
    
org.apache.hadoop.hbase.MiniHBaseCluster.waitUntilShutDown(MiniHBaseCluster.java:867)
    
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:1133)
    
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:1108)
{code}
The final stack frame contained:
{code}
"Time-limited test" daemon prio=5 tid=22 runnable
java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
        at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
        at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:207)
        at io.netty.channel.nio.NioEventLoop.wakeup(NioEventLoop.java:591)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:561)
        at 
io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:146)
        at 
io.netty.util.concurrent.AbstractEventExecutorGroup.shutdownGracefully(AbstractEventExecutorGroup.java:69)
        at 
org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.close(DatanodeHttpServer.java:266)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2006)
        at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNode(MiniDFSCluster.java:2015)
        at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:2005)
        at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1984)
        at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1958)
        at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1951)
        at 
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniDFSCluster(HBaseTestingUtility.java:767)
        at 
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:1109)
        at 
org.apache.hadoop.hbase.master.procedure.TestTableDDLProcedureBase.cleanupTest(TestTableDDLProcedureBase.java:53)
{code}
It seems that the test was waiting for the DataNode to shutdown.

> TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-20081
>                 URL: https://issues.apache.org/jira/browse/HBASE-20081
>             Project: HBase
>          Issue Type: Test
>            Reporter: Ted Yu
>            Priority: Major
>
> https://builds.apache.org/job/HBase-2.0-hadoop3-tests/lastCompletedBuild/org.apache.hbase$hbase-server/testReport/org.apache.hadoop.hbase.master.procedure/TestDisableTableProcedure/org_apache_hadoop_hbase_master_procedure_TestDisableTableProcedure/
>  was one recent occurrence.
> I noticed two things in test output:
> {code}
> 2018-02-25 18:12:45,053 WARN  [Time-limited test-EventThread] 
> master.RegionServerTracker(136): asf912.gq1.ygridcore.net,45649,1519582305777 
> is not online or isn't known to the master.The latter could be caused by a 
> DNS misconfiguration.
> {code}
> Since DNS misconfiguration was very unlikely on Apache Jenkins nodes, the 
> above should not have been logged.
> {code}
> 2018-02-25 18:16:51,531 WARN  [master/asf912:0.Chore.1] 
> master.CatalogJanitor(127): Failed scan of catalog table
> java.io.IOException: connection is closed
>       at 
> org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:263)
>       at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:761)
>       at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:680)
>       at 
> org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:675)
>       at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:188)
>       at 
> org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:140)
>       at 
> org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:246)
>       at 
> org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:119)
>       at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> {code}
> The above was possibly related to the lost region server.
> I searched test output of successful run where none of the above two can be 
> seen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to