[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start
[ https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237785#comment-15237785 ]
Thejas M Nair commented on HIVE-13491:
--
[~szehon] Yes, I agree, a restart is worth trying out. Meanwhile, I will go ahead and commit this.

> Testing : log thread stacks when metastore fails to start
> --
>
> Key: HIVE-13491
> URL: https://issues.apache.org/jira/browse/HIVE-13491
> Project: Hive
> Issue Type: Bug
> Components: Test, Testing Infrastructure
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Attachments: HIVE-13491.1.patch
>
>
> Many tests are failing in ptest2 because the metastore fails to start up in the
> expected time.
> There is not enough information in the hive.log file to figure out why the
> metastore startup failed or hung. Printing the thread dumps when that
> happens would be useful for finding the root cause.
> The stack trace in a test failure looks like this:
> {code}
> java.net.ConnectException: Connection refused
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>     at java.net.Socket.connect(Socket.java:579)
>     at org.apache.hadoop.hive.metastore.MetaStoreUtils.loopUntilHMSReady(MetaStoreUtils.java:1208)
>     at org.apache.hadoop.hive.metastore.MetaStoreUtils.startMetaStore(MetaStoreUtils.java:1195)
>     at org.apache.hadoop.hive.metastore.MetaStoreUtils.startMetaStore(MetaStoreUtils.java:1177)
>     at org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.setup(TestHadoopAuthBridge23.java:153)
>     at org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testMetastoreProxyUser(TestHadoopAuthBridge23.java:241)
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
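The issue description asks for thread dumps to be printed when the metastore fails to start. As a rough illustration only (this is not the code in HIVE-13491.1.patch, and the class name `ThreadDumpSketch` is hypothetical), a test harness could build such a dump with the standard `Thread.getAllStackTraces()` API and write it to the test log:

```java
import java.util.Map;

/**
 * Hypothetical helper (not part of Hive): builds a jstack-like text dump of
 * every live thread so it can be written to a test log when startup times out.
 */
public final class ThreadDumpSketch {

  private ThreadDumpSketch() {}

  /** Returns one block per live thread: a header line followed by its stack frames. */
  public static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    // getAllStackTraces() snapshots every live thread, including the caller.
    for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
      Thread thread = entry.getKey();
      sb.append('"').append(thread.getName()).append('"')
          .append(" daemon=").append(thread.isDaemon())
          .append(" state=").append(thread.getState())
          .append('\n');
      for (StackTraceElement frame : entry.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
      sb.append('\n');
    }
    return sb.toString();
  }
}
```

A harness would call `ThreadDumpSketch.dumpAllThreads()` just before failing the test and send the result to hive.log. `Thread.getAllStackTraces()` is used here rather than `ThreadMXBean.dumpAllThreads`, because `ThreadInfo.toString()` truncates each stack to a few frames, which can hide the frame where startup is stuck.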
[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start
[ https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237675#comment-15237675 ]
Szehon Ho commented on HIVE-13491:
--
I am thinking of restarting the PTest server, which should trigger auto-generation of fresh test slaves from the image. Does anyone mind if I do that?
[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start
[ https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237667#comment-15237667 ]
Szehon Ho commented on HIVE-13491:
--
I was also thinking the other day that the machines might be heavily loaded or slow, so HMS cannot start up in time. This change will tell us for certain. I will also take a look at that if I get a chance.
[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start
[ https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237653#comment-15237653 ]
Szehon Ho commented on HIVE-13491:
--
Thanks +1
[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start
[ https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237509#comment-15237509 ]
Thejas M Nair commented on HIVE-13491:
--
[~sershe] [~szehon] [~ashutoshc] [~sseth] Can someone please review this change? It should help nail down the problem with metastore startup in a large number of tests. This change affects only the tests.
We have 30+ patches in the queue, and many runs are taking 3+ hours to finish. Committing this as soon as possible could help reduce the number of failures in those tests and might also give more clues about why the runs are taking so long.
[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start
[ https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236763#comment-15236763 ]
Thejas M Nair commented on HIVE-13491:
--
Also increased the frequency of the metastore startup check from every 10 seconds to every second. A 1-second pause between checks should be more than enough to avoid consuming significant CPU on the machine.
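The stack trace in the issue shows `MetaStoreUtils.loopUntilHMSReady` failing inside `Socket.connect` with `ConnectException: Connection refused`, meaning nothing was listening on the metastore port yet. Below is a minimal sketch of such a readiness loop using the 1-second check interval described above. It assumes a plain TCP connect is an adequate readiness check, and the class name `WaitForPortSketch`, the method `waitForPort`, and its parameters are hypothetical, not Hive's actual implementation:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical sketch (not Hive code): polls a TCP port until something
 * accepts connections or the attempt budget runs out.
 */
public final class WaitForPortSketch {

  private WaitForPortSketch() {}

  /**
   * Tries to connect to host:port up to maxAttempts times, sleeping intervalMs
   * between failed attempts. Returns true as soon as a connect succeeds.
   */
  public static boolean waitForPort(String host, int port, int maxAttempts, long intervalMs)
      throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try (Socket socket = new Socket()) {
        // Bounded connect timeout so one attempt cannot block the loop indefinitely.
        socket.connect(new InetSocketAddress(host, port), 1000);
        return true;
      } catch (IOException e) {
        // ConnectException ("Connection refused") means nothing is listening yet.
        if (attempt < maxAttempts) {
          Thread.sleep(intervalMs);
        }
      }
    }
    return false;
  }
}
```

For example, `waitForPort("localhost", port, 60, 1000L)` would give the metastore roughly a minute to come up; on a `false` result, the harness would log the thread stacks before failing the test. A shorter interval detects a ready server sooner, while the sleep keeps the idle polling cost negligible.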
[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start
[ https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236757#comment-15236757 ]
ASF GitHub Bot commented on HIVE-13491:
--
GitHub user thejasmn opened a pull request:

    https://github.com/apache/hive/pull/71

    HIVE-13491 - print thread dumps

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thejasmn/hive HIVE-13491

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/71.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #71

commit 9648db8c3cdc97a9a6449a6e801c9670d796de86
Author: Thejas Nair
Date: 2016-04-12T07:30:32Z

    HIVE-13491 - print thread dumps