[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
[ https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377808#comment-16377808 ]

stack commented on HBASE-20081:
---

It is a daemon thread. That will not hold up the shutdown.

> TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
> --
>
> Key: HBASE-20081
> URL: https://issues.apache.org/jira/browse/HBASE-20081
> Project: HBase
> Issue Type: Test
> Reporter: Ted Yu
> Priority: Major
>
> https://builds.apache.org/job/HBase-2.0-hadoop3-tests/lastCompletedBuild/org.apache.hbase$hbase-server/testReport/org.apache.hadoop.hbase.master.procedure/TestDisableTableProcedure/org_apache_hadoop_hbase_master_procedure_TestDisableTableProcedure/
> was one recent occurrence.
> I noticed two things in the test output:
> {code}
> 2018-02-25 18:12:45,053 WARN [Time-limited test-EventThread] master.RegionServerTracker(136): asf912.gq1.ygridcore.net,45649,1519582305777 is not online or isn't known to the master.The latter could be caused by a DNS misconfiguration.
> {code}
> Since DNS misconfiguration was very unlikely on Apache Jenkins nodes, the above should not have been logged.
> {code}
> 2018-02-25 18:16:51,531 WARN [master/asf912:0.Chore.1] master.CatalogJanitor(127): Failed scan of catalog table
> java.io.IOException: connection is closed
>   at org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:263)
>   at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:761)
>   at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:680)
>   at org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:675)
>   at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:188)
>   at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:140)
>   at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:246)
>   at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:119)
>   at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> {code}
> The above was possibly related to the lost region server.
> I searched the test output of a successful run, where neither of the above can be seen.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
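For reference, the JVM rule behind the daemon-thread remark above: the JVM exits once only daemon threads remain, so a daemon thread that is still running does not keep the process alive. The standalone sketch below uses only `java.lang.Thread`; it is not HBase code, and `startIdleThread` is a hypothetical helper. One caveat worth keeping in mind: daemon status only affects JVM exit, and a `Thread.join()` on a daemon thread still blocks until that thread terminates.

```java
import java.util.concurrent.TimeUnit;

/**
 * Standalone sketch (not HBase code): a still-running daemon thread does not
 * prevent the JVM from exiting once main() returns.
 */
final class DaemonThreadSketch {

    /** Hypothetical helper: starts a thread that would otherwise sleep for a day. */
    static Thread startIdleThread(boolean daemon) {
        Thread t = new Thread(() -> {
            try {
                TimeUnit.DAYS.sleep(1); // stands in for work that never finishes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the flag and let the thread end
            }
        }, daemon ? "idle-daemon" : "idle-user");
        t.setDaemon(daemon); // must be called before start()
        t.start();
        return t;
    }

    public static void main(String[] args) {
        Thread t = startIdleThread(true);
        System.out.println(t.getName() + " daemon=" + t.isDaemon() + " alive=" + t.isAlive());
        // main returns here and the JVM exits even though "idle-daemon" is still sleeping.
        // With startIdleThread(false) the JVM would instead wait for that thread to finish.
    }
}
```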
[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
[ https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377291#comment-16377291 ]

Ted Yu commented on HBASE-20081:

After a few of these:
{code}
Thread 22 (Time-limited test):
  State: RUNNABLE
  Blocked count: 583
  Waited count: 1063
  Stack:
    sun.management.ThreadImpl.getThreadInfo1(Native Method)
    sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
    sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
    org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:169)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke(Method.java:498)
    org.apache.hadoop.hbase.util.Threads$PrintThreadInfoLazyHolder$1.printThreadInfo(Threads.java:294)
    org.apache.hadoop.hbase.util.Threads.printThreadInfo(Threads.java:341)
    org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:135)
    org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:385)
    org.apache.hadoop.hbase.MiniHBaseCluster.waitUntilShutDown(MiniHBaseCluster.java:867)
    org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:1133)
    org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:1108)
{code}
The final thread dump contained:
{code}
"Time-limited test" daemon prio=5 tid=22 runnable
java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
    at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
    at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:207)
    at io.netty.channel.nio.NioEventLoop.wakeup(NioEventLoop.java:591)
    at io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:561)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:146)
    at io.netty.util.concurrent.AbstractEventExecutorGroup.shutdownGracefully(AbstractEventExecutorGroup.java:69)
    at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.close(DatanodeHttpServer.java:266)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2006)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNode(MiniDFSCluster.java:2015)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:2005)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1984)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1958)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1951)
    at org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniDFSCluster(HBaseTestingUtility.java:767)
    at org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:1109)
    at org.apache.hadoop.hbase.master.procedure.TestTableDDLProcedureBase.cleanupTest(TestTableDDLProcedureBase.java:53)
{code}
It seems that the test was waiting for the DataNode to shut down.
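The first dump above comes from a join loop that keeps printing thread dumps while it waits (`Threads.threadDumpingIsAlive` under `LocalHBaseCluster.join`). A rough standalone approximation of that idea, using the JDK's `ThreadMXBean`, is sketched below. It is not the HBase implementation; `joinWithDumps` and `dumpAllThreads` are hypothetical names, and the interval handling is an assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Sketch only: join a thread, dumping every live thread's stack while it is still alive. */
final class ThreadDumpingJoin {

    /** Plain-text dump of all live threads with full stack traces. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Waits for t to die, printing a dump every intervalMs while it lives; returns the dump count. */
    static int joinWithDumps(Thread t, long intervalMs) {
        int dumps = 0;
        try {
            while (t.isAlive()) {
                t.join(intervalMs);      // wait at most intervalMs for t to finish
                if (t.isAlive()) {       // still running: record what every thread is doing
                    System.err.println("Still waiting on " + t.getName() + "; thread dump:");
                    System.err.print(dumpAllThreads());
                    dumps++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop waiting but preserve the interrupt
        }
        return dumps;
    }

    public static void main(String[] args) {
        Thread slow = new Thread(() -> {
            try { Thread.sleep(250); } catch (InterruptedException ignored) { }
        }, "slow-shutdown");
        slow.start();
        System.out.println("dumps printed: " + joinWithDumps(slow, 100));
    }
}
```

Dumping every thread, not just the one being joined, is what makes a loop like this useful for a hang: the thread being waited on is often blocked behind some other thread, such as the Netty event-loop shutdown seen in the final dump.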
[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
[ https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377277#comment-16377277 ]

stack commented on HBASE-20081:
---

Why is it hanging?
[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
[ https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377274#comment-16377274 ]

Ted Yu commented on HBASE-20081:

From the test output, the server was carrying meta:
{code}
2018-02-25 18:12:44,833 INFO [PEWorker-1] procedure.ServerCrashProcedure(118): Start pid=54, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure server=asf912.gq1.ygridcore.net,45649,1519582305777, splitWal=true, meta=true
{code}
This happened at the end of TestDisableTableProcedure#testRecoveryAndDoubleExecution, the last subtest.
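The server name in these log lines has the form `hostname,port,startcode`. Assuming the usual HBase convention that the start code is the region server's start time in epoch milliseconds, decoding it helps line the server's lifetime up against the log timestamps. The class below is a hypothetical standalone parser for illustration, not HBase's own `ServerName`.

```java
import java.time.Instant;

/** Hypothetical parser for "hostname,port,startcode" strings (not HBase's ServerName class). */
final class ServerNameSketch {
    final String host;
    final int port;
    final long startCode; // assumed: server start time in epoch milliseconds

    private ServerNameSketch(String host, int port, long startCode) {
        this.host = host;
        this.port = port;
        this.startCode = startCode;
    }

    static ServerNameSketch parse(String s) {
        String[] parts = s.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected hostname,port,startcode: " + s);
        }
        return new ServerNameSketch(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
    }

    /** Start code interpreted as a UTC instant. */
    Instant startTime() {
        return Instant.ofEpochMilli(startCode);
    }

    public static void main(String[] args) {
        ServerNameSketch sn = parse("asf912.gq1.ygridcore.net,45649,1519582305777");
        System.out.println(sn.host + " port=" + sn.port + " started=" + sn.startTime());
    }
}
```

For the server above this prints a start time of `2018-02-25T18:11:45.777Z`. If the Jenkins log timestamps are also UTC (an assumption), that is roughly one minute before the 18:12:44 ServerCrashProcedure line.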
[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
[ https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377184#comment-16377184 ]

stack commented on HBASE-20081:
---

See .zip file in build artifacts for full logs.
[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
[ https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376560#comment-16376560 ]

Ted Yu commented on HBASE-20081:

The test failure happened for build #218:
https://builds.apache.org/job/HBase-2.0-hadoop3-tests/org.apache.hbase$hbase-server/218/testReport/junit/org.apache.hadoop.hbase.master.procedure/TestDisableTableProcedure/org_apache_hadoop_hbase_master_procedure_TestDisableTableProcedure/

However, the archive of the test output was truncated:
{code}
2018-02-25 18:12:02,398 DEBUG [RS_OPEN_REGION-regionserver/asf912:0-1] regionserver.FlushLargeStoresPolicy(61): No hbase.hregion.percolumnfamilyflush.size.lower.bound s
...[truncated 1772313 bytes]...
...[truncated 10750 chars]...
{code}
The second truncation was right above where RS-EventLoopGroup-3-9 was shown, so a lot of relevant information was not recorded.
{code}
java.io.IOException: connection is closed
    at org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:263)
    at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:761)
{code}
The above was among the truncated logs of the test output. Let me search for an option to avoid truncating the test output.

The test does not fail often. I looped it 20 times locally against Hadoop 3, and it passed every time. Doing another round of local test runs.
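Since the failure is intermittent and shows up as a hang rather than an assertion error, a local loop is more informative if it bounds each iteration and counts hangs separately from failures. The sketch below is a generic, hypothetical harness (`LoopRunner` is not an HBase or Surefire facility); in practice the task would wrap a single test run.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical harness: run a task N times with a per-iteration timeout and tally outcomes. */
final class LoopRunner {

    static final class Result {
        int passed, failed, timedOut;

        @Override public String toString() {
            return "passed=" + passed + " failed=" + failed + " timedOut=" + timedOut;
        }
    }

    static Result loop(Callable<?> task, int iterations, Duration timeout) {
        Result r = new Result();
        for (int i = 0; i < iterations; i++) {
            // Fresh daemon worker per iteration, so a hung iteration cannot keep the JVM alive.
            ExecutorService ex = Executors.newSingleThreadExecutor(run -> {
                Thread t = new Thread(run, "loop-iteration-" + i);
                t.setDaemon(true);
                return t;
            });
            Future<?> f = ex.submit(task);
            try {
                f.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
                r.passed++;
            } catch (ExecutionException e) {
                r.failed++;               // the task threw
            } catch (TimeoutException e) {
                r.timedOut++;             // treat as a hang; a thread dump here would help
                f.cancel(true);           // interrupt the stuck iteration
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                f.cancel(true);
                break;                    // caller interrupted: stop looping
            } finally {
                ex.shutdownNow();
            }
        }
        return r;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Result r = loop(() -> {
            if (calls.incrementAndGet() % 5 == 0) {
                Thread.sleep(10_000);     // simulate an occasional hang
            }
            return null;
        }, 20, Duration.ofMillis(200));
        System.out.println(r);
    }
}
```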
[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
[ https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376401#comment-16376401 ]

stack commented on HBASE-20081:
---

The link doesn't work.

How often does the test fail?

Why would a region (not meta) not being online hold up shutdown?

bq. Since DNS misconfiguration was very unlikely on Apache Jenkins nodes, the above should not have been logged.

?

bq. java.io.IOException: connection is closed
bq. The above was possibly related to the lost region server.

?

The server connection is closed on shutdown. If a CatalogJanitor chore is running, it will get "connection is closed".

What does any of the above have to do with a hung shutdown? Did the test time out? Isn't there a thread dump?
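The point about the CatalogJanitor is one of ordering: shutdown closes the connection, and a chore that fires afterwards fails its scan with "connection is closed". The toy model below uses hypothetical `FakeConnection` and `ScanChore` classes, not HBase's `CatalogJanitor` or `MetaTableAccessor`. It shows why, in this model, such a WARN is an expected side effect of shutdown: the chore logs the error and returns, and it has no reason to block.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model (hypothetical classes): a periodic scan that runs after its connection was closed. */
final class ChoreShutdownSketch {

    /** Stand-in for a shared cluster connection. */
    static final class FakeConnection implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        String scanCatalog() throws IOException {
            if (closed.get()) {
                throw new IOException("connection is closed");
            }
            return "ok";
        }

        @Override public void close() {
            closed.set(true);
        }
    }

    /** Stand-in for a chore: failures are recorded as warnings, never rethrown. */
    static final class ScanChore implements Runnable {
        private final FakeConnection conn;
        final List<String> warnings = new ArrayList<>();

        ScanChore(FakeConnection conn) {
            this.conn = conn;
        }

        @Override public void run() {
            try {
                conn.scanCatalog();
            } catch (IOException e) {
                warnings.add("Failed scan of catalog table: " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        FakeConnection conn = new FakeConnection();
        ScanChore chore = new ScanChore(conn);
        chore.run();   // normal operation: no warning
        conn.close();  // shutdown closes the connection
        chore.run();   // a chore that fires afterwards records a warning and returns
        System.out.println(chore.warnings);
    }
}
```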
[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
[ https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376290#comment-16376290 ]

Ted Yu commented on HBASE-20081:

I am looping TestDisableTableProcedure locally with some additional logging to see if I can get more clues.