[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start

2016-04-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237785#comment-15237785
 ] 

Thejas M Nair commented on HIVE-13491:
--

[~szehon] Yes, I agree, a restart is worth trying out.
Meanwhile, I will go ahead and commit this.


> Testing  : log thread stacks when metastore fails to start
> --
>
> Key: HIVE-13491
> URL: https://issues.apache.org/jira/browse/HIVE-13491
> Project: Hive
>  Issue Type: Bug
>  Components: Test, Testing Infrastructure
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-13491.1.patch
>
>
> Many tests are failing in ptest2 because the metastore fails to start up in 
> the expected time.
> The hive.log file does not contain enough information to figure out why the 
> metastore startup failed or hung. Printing thread dumps when that happens 
> would be useful for finding the root cause.
> The stack in the test failure looks like this:
> {code}
> java.net.ConnectException: Connection refused
>   at java.net.PlainSocketImpl.socketConnect(Native Method)
>   at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>   at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
>   at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>   at java.net.Socket.connect(Socket.java:579)
>   at org.apache.hadoop.hive.metastore.MetaStoreUtils.loopUntilHMSReady(MetaStoreUtils.java:1208)
>   at org.apache.hadoop.hive.metastore.MetaStoreUtils.startMetaStore(MetaStoreUtils.java:1195)
>   at org.apache.hadoop.hive.metastore.MetaStoreUtils.startMetaStore(MetaStoreUtils.java:1177)
>   at org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.setup(TestHadoopAuthBridge23.java:153)
>   at org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testMetastoreProxyUser(TestHadoopAuthBridge23.java:241)
> {code}
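
For readers unfamiliar with the idea in the description, here is a minimal sketch of how thread stacks can be collected from inside a JVM so they can be written to a log. It is not the contents of HIVE-13491.1.patch; the class and method names (`ThreadDumpSketch`, `dumpAllThreads`) are hypothetical, and only the standard `Thread.getAllStackTraces()` call is assumed.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not the code from the HIVE-13491 patch. */
final class ThreadDumpSketch {

  /** Builds a plain-text dump of the stack of every live thread in this JVM. */
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> entry
        : Thread.getAllStackTraces().entrySet()) {
      Thread t = entry.getKey();
      // One header line per thread: name, id and current state.
      sb.append('"').append(t.getName()).append("\" id=").append(t.getId())
          .append(" state=").append(t.getState()).append('\n');
      // One line per stack frame, in the usual "at ..." style.
      for (StackTraceElement frame : entry.getValue()) {
        sb.append("\tat ").append(frame).append('\n');
      }
      sb.append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // A test harness would pass this string to its logger instead of stderr.
    System.err.println(dumpAllThreads());
  }
}
```

In a test setup, a call like this would presumably be made at the point where the startup wait gives up, so that the log shows what each thread was doing at that moment.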



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start

2016-04-12 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237675#comment-15237675
 ] 

Szehon Ho commented on HIVE-13491:
--

I am thinking of restarting the PTest server, which should trigger 
auto-generation of new test slaves fresh from the image. Does anyone mind me 
doing that?



[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start

2016-04-12 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237667#comment-15237667
 ] 

Szehon Ho commented on HIVE-13491:
--

I was also thinking the other day that the machines may be getting overloaded 
or somewhat slow, so HMS cannot start up in time. But this will tell us for 
certain.

I will also take a look at that if I get a chance.



[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start

2016-04-12 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237653#comment-15237653
 ] 

Szehon Ho commented on HIVE-13491:
--

Thanks +1



[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start

2016-04-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237509#comment-15237509
 ] 

Thejas M Nair commented on HIVE-13491:
--

[~sershe] [~szehon] [~ashutoshc] [~sseth]
Can someone please review this change? It should help pin down the metastore 
startup problem that affects a large number of tests.
This change impacts only the tests.

We have 30+ patches in the queue, and many runs are taking 3+ hours to finish. 
Getting this in as soon as possible could help reduce the number of failures in 
those tests and might also give more clues about why the runs are taking so long.




[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start

2016-04-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236763#comment-15236763
 ] 

Thejas M Nair commented on HIVE-13491:
--

I also increased the frequency of the metastore startup checks from every 10 
seconds to every second. A 1-second pause should be more than enough to avoid 
consuming too much CPU on the machine.
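
As a rough illustration of the polling described above, the sketch below tries to open a socket to a port, pausing a configurable interval between attempts until a deadline passes. It is an assumption-laden sketch and not the actual `MetaStoreUtils.loopUntilHMSReady` code: the class name `PortWaitSketch`, the method `waitForPort`, its parameters, and the 1000 ms connect timeout are all hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.TimeUnit;

/** Hypothetical polling helper for illustration; not Hive's implementation. */
final class PortWaitSketch {

  /**
   * Tries to connect to host:port, sleeping pollMs between failed attempts.
   * Returns true on the first successful connect, or false once timeoutMs
   * has elapsed or the waiting thread is interrupted.
   */
  static boolean waitForPort(String host, int port, long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      try (Socket socket = new Socket()) {
        // Assumed per-attempt connect timeout of 1000 ms.
        socket.connect(new InetSocketAddress(host, port), 1000);
        return true;
      } catch (IOException e) {
        // Typically "Connection refused": nothing is listening yet.
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // The caller could log thread stacks at this point.
      }
      try {
        Thread.sleep(pollMs); // A short pause keeps CPU use negligible.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Example values only: poll localhost:9083 once per second for 3 seconds.
    System.out.println(waitForPort("localhost", 9083, 3000, 1000));
  }
}
```

With a 1-second interval, a service that comes up quickly is noticed within about a second rather than up to 10 seconds later, which is the practical effect of the change described in this comment.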




[jira] [Commented] (HIVE-13491) Testing : log thread stacks when metastore fails to start

2016-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236757#comment-15236757
 ] 

ASF GitHub Bot commented on HIVE-13491:
---

GitHub user thejasmn opened a pull request:

https://github.com/apache/hive/pull/71

HIVE-13491 - print thread dumps



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thejasmn/hive HIVE-13491

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/71.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #71


commit 9648db8c3cdc97a9a6449a6e801c9670d796de86
Author: Thejas Nair 
Date:   2016-04-12T07:30:32Z

HIVE-13491 - print thread dumps



