[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown

2018-02-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377808#comment-16377808
 ] 

stack commented on HBASE-20081:
---

It is a daemon thread, so it will not hold up the shutdown.
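
As a rough illustration of the point about daemon threads (a minimal sketch, not taken from the HBase code base; the class name `DaemonCheck` and the thread name `hypothetical-daemon` are made up for this example): the JVM only waits for non-daemon threads before exiting, so listing the live non-daemon threads shows which ones could actually keep a process alive.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DaemonCheck {
    /** Names of live threads that are NOT daemons; only these can keep the JVM from exiting. */
    public static List<String> liveNonDaemonThreadNames() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && !t.isDaemon())
                .map(Thread::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical daemon thread that would sleep far longer than the program runs.
        Thread daemon = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "hypothetical-daemon");
        daemon.setDaemon(true);
        daemon.start();

        // The daemon thread is alive but is filtered out of this list.
        System.out.println("non-daemon threads: " + liveNonDaemonThreadNames());
        // main returns here; the JVM exits even though the daemon thread is still sleeping.
    }
}
```

Note that this only covers JVM exit. Code that explicitly joins or polls a thread can still wait on it regardless of its daemon flag.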

> TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
> --
>
> Key: HBASE-20081
> URL: https://issues.apache.org/jira/browse/HBASE-20081
> Project: HBase
> Issue Type: Test
> Reporter: Ted Yu
> Priority: Major
>
> https://builds.apache.org/job/HBase-2.0-hadoop3-tests/lastCompletedBuild/org.apache.hbase$hbase-server/testReport/org.apache.hadoop.hbase.master.procedure/TestDisableTableProcedure/org_apache_hadoop_hbase_master_procedure_TestDisableTableProcedure/
>  was one recent occurrence.
> I noticed two things in test output:
> {code}
> 2018-02-25 18:12:45,053 WARN  [Time-limited test-EventThread] 
> master.RegionServerTracker(136): asf912.gq1.ygridcore.net,45649,1519582305777 
> is not online or isn't known to the master.The latter could be caused by a 
> DNS misconfiguration.
> {code}
> Since DNS misconfiguration was very unlikely on Apache Jenkins nodes, the 
> above should not have been logged.
> {code}
> 2018-02-25 18:16:51,531 WARN  [master/asf912:0.Chore.1] 
> master.CatalogJanitor(127): Failed scan of catalog table
> java.io.IOException: connection is closed
>   at org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:263)
>   at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:761)
>   at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:680)
>   at org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:675)
>   at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:188)
>   at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:140)
>   at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:246)
>   at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:119)
>   at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
> {code}
> The above was possibly related to the lost region server.
> I searched the test output of a successful run, and neither of the above two 
> messages appears there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown

2018-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377291#comment-16377291
 ] 

Ted Yu commented on HBASE-20081:


After a few of these:
{code}
Thread 22 (Time-limited test):
  State: RUNNABLE
  Blocked count: 583
  Waited count: 1063
  Stack:
    sun.management.ThreadImpl.getThreadInfo1(Native Method)
    sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:178)
    sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:139)
    org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:169)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke(Method.java:498)
    org.apache.hadoop.hbase.util.Threads$PrintThreadInfoLazyHolder$1.printThreadInfo(Threads.java:294)
    org.apache.hadoop.hbase.util.Threads.printThreadInfo(Threads.java:341)
    org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:135)
    org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:385)
    org.apache.hadoop.hbase.MiniHBaseCluster.waitUntilShutDown(MiniHBaseCluster.java:867)
    org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:1133)
    org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:1108)
{code}
The final stack trace contained:
{code}
"Time-limited test" daemon prio=5 tid=22 runnable
java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
    at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
    at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:207)
    at io.netty.channel.nio.NioEventLoop.wakeup(NioEventLoop.java:591)
    at io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:561)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:146)
    at io.netty.util.concurrent.AbstractEventExecutorGroup.shutdownGracefully(AbstractEventExecutorGroup.java:69)
    at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.close(DatanodeHttpServer.java:266)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2006)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNode(MiniDFSCluster.java:2015)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:2005)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1984)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1958)
    at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1951)
    at org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniDFSCluster(HBaseTestingUtility.java:767)
    at org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:1109)
    at org.apache.hadoop.hbase.master.procedure.TestTableDDLProcedureBase.cleanupTest(TestTableDDLProcedureBase.java:53)
{code}
It seems that the test was waiting for the DataNode to shut down.
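
For anyone who wants to capture a similar dump for one thread from inside a test, here is a minimal sketch using the standard `java.lang.management` API. The class name `ThreadDumpSketch` and the method `dumpThreadsNamed` are hypothetical names for this example, and the output layout only loosely imitates the dump format quoted above; it is not the Hadoop or HBase implementation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {
    /** Returns a formatted dump for every live thread whose name contains nameFragment. */
    public static String dumpThreadsNamed(String nameFragment) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked-monitor and locked-synchronizer details.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || !info.getThreadName().contains(nameFragment)) {
                continue;
            }
            sb.append("Thread ").append(info.getThreadId())
              .append(" (").append(info.getThreadName()).append("):\n")
              .append("  State: ").append(info.getThreadState()).append('\n')
              .append("  Blocked count: ").append(info.getBlockedCount()).append('\n')
              .append("  Waited count: ").append(info.getWaitedCount()).append('\n')
              .append("  Stack:\n");
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Dump the calling thread as a quick demonstration.
        System.out.print(dumpThreadsNamed(Thread.currentThread().getName()));
    }
}
```

Calling `dumpThreadsNamed("Time-limited test")` at intervals during shutdown would show whether that thread keeps moving between frames or stays parked in one place.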


[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown

2018-02-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377277#comment-16377277
 ] 

stack commented on HBASE-20081:
---

Why is it hanging?



[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown

2018-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377274#comment-16377274
 ] 

Ted Yu commented on HBASE-20081:


From the test output, the server was carrying meta:
{code}
2018-02-25 18:12:44,833 INFO  [PEWorker-1] procedure.ServerCrashProcedure(118): Start pid=54, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure server=asf912.gq1.ygridcore.net,45649,1519582305777, splitWal=true, meta=true
{code}
This happened at the end of 
TestDisableTableProcedure#testRecoveryAndDoubleExecution, the last subtest.



[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown

2018-02-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377184#comment-16377184
 ] 

stack commented on HBASE-20081:
---

See the .zip file in the build artifacts for full logs.



[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown

2018-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376560#comment-16376560
 ] 

Ted Yu commented on HBASE-20081:


The test failure happened in build #218:

https://builds.apache.org/job/HBase-2.0-hadoop3-tests/org.apache.hbase$hbase-server/218/testReport/junit/org.apache.hadoop.hbase.master.procedure/TestDisableTableProcedure/org_apache_hadoop_hbase_master_procedure_TestDisableTableProcedure/

However, the archive of the test output was truncated:
{code}
2018-02-25 18:12:02,398 DEBUG [RS_OPEN_REGION-regionserver/asf912:0-1] 
regionserver.FlushLargeStoresPolicy(61): No 
hbase.hregion.percolumnfamilyflush.size.lower.bound s
...[truncated 1772313 bytes]...

...[truncated 10750 chars]...
{code}
The second truncation occurred right above where RS-EventLoopGroup-3-9 was 
shown, so a lot of relevant information was not recorded.
{code}
java.io.IOException: connection is closed
    at org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:263)
    at org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:761)
{code}
The above was among the truncated logs of the test output.
Let me look for an option to avoid truncating the test output.

The test does not fail often.

I looped the test 20 times locally against Hadoop 3 and every run passed. I am 
doing another round of local test runs.



[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown

2018-02-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376401#comment-16376401
 ] 

stack commented on HBASE-20081:
---

The link doesn't work.

How often does the test fail?

Why would a region (not meta) not being online hold up shutdown?

bq. Since DNS misconfiguration was very unlikely on Apache Jenkins nodes, the 
above should not have been logged.

?

bq. java.io.IOException: connection is closed ... The above was possibly 
related to the lost region server.

?

The server connection is closed on shutdown. If a catalog janitor is running at 
that point, it will get a "connection is closed" exception.
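
To make that ordering concrete, here is a minimal sketch. `FakeConnection` and `runChoreOnce` are hypothetical stand-ins invented for this example, not HBase classes; the only thing borrowed from the logs is the wording of the messages. A periodic task that fires once more after the connection has been closed logs a warning and carries on, which is harmless by itself.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

public class ClosedConnectionSketch {
    /** Hypothetical stand-in for a cluster connection; not an HBase class. */
    public static class FakeConnection implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        /** Pretends to scan the catalog table; fails once the connection is closed. */
        public String scanCatalog() throws IOException {
            if (closed.get()) {
                throw new IOException("connection is closed");
            }
            return "scan ok";
        }

        @Override
        public void close() {
            closed.set(true);
        }
    }

    /** One run of a hypothetical periodic chore: report the failure and carry on. */
    public static String runChoreOnce(FakeConnection conn) {
        try {
            return conn.scanCatalog();
        } catch (IOException e) {
            return "WARN Failed scan of catalog table: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        FakeConnection conn = new FakeConnection();
        System.out.println(runChoreOnce(conn)); // chore runs while the connection is open
        conn.close();                           // shutdown closes the connection
        System.out.println(runChoreOnce(conn)); // chore fires once more after the close
    }
}
```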

What does any of the above have to do with a hung shutdown?

Did the test time out? Isn't there a thread dump?





[jira] [Commented] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown

2018-02-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376290#comment-16376290
 ] 

Ted Yu commented on HBASE-20081:


I am looping TestDisableTableProcedure locally with some additional logging to 
see if I can get more clues.


