[jira] [Commented] (KAFKA-5608) System test failure due to timeout starting Jmx tool

2017-07-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16096194#comment-16096194
 ] 

ASF GitHub Bot commented on KAFKA-5608:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3553


> System test failure due to timeout starting Jmx tool
> 
>
> Key: KAFKA-5608
> URL: https://issues.apache.org/jira/browse/KAFKA-5608
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.11.0.1, 0.11.1.0
>
>
> Began seeing this in some failing system tests:
> {code}
> [INFO  - 2017-07-18 14:25:55,375 - background_thread - _protected_worker - 
> lineno:39]: Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/services/background_thread.py",
>  line 35, in _protected_worker
> self._worker(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/console_consumer.py",
>  line 261, in _worker
> self.start_jmx_tool(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/monitor/jmx.py",
>  line 73, in start_jmx_tool
> wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, 
> backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: ubuntu@worker7: Jmx tool took too long to start
> {code}
> This is immediately followed by a consumer timeout in the failing cases:
> {code}
> [INFO  - 2017-07-18 14:26:46,907 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.core.security_rolling_upgrade_test.TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two.broker_protocol=SASL_SSL.client_protocol=SASL_SSL:
>  FAIL: Consumer failed to consume messages for 60s.
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 106, in run_produce_consume_validate
> self.start_producer_and_consumer()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in start_producer_and_consumer
> self.consumer_start_timeout_sec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Consumer failed to consume messages for 60s.
> {code}
> There does not appear to be anything wrong with the consumer in the logs, so 
> the timeout seems to be caused by the Jmx tool timeout.
> Possibly due to https://github.com/apache/kafka/pull/3447/files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5608) System test failure due to timeout starting Jmx tool

2017-07-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094541#comment-16094541
 ] 

ASF GitHub Bot commented on KAFKA-5608:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/3553

KAFKA-5608: Follow-up to fix potential NPE and clarify method name



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka kafka-5608-follow-up

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3553.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3553


commit 7f43b0f23402c7018532f92aac438929bc83ca68
Author: Ismael Juma 
Date:   2017-07-20T11:09:02Z

KAFKA-5608: Follow-up to fix potential NPE and clarify method name




> System test failure due to timeout starting Jmx tool
> 
>
> Key: KAFKA-5608
> URL: https://issues.apache.org/jira/browse/KAFKA-5608
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.11.0.1, 0.11.1.0
>
>
> Began seeing this in some failing system tests:
> {code}
> [INFO  - 2017-07-18 14:25:55,375 - background_thread - _protected_worker - 
> lineno:39]: Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/services/background_thread.py",
>  line 35, in _protected_worker
> self._worker(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/console_consumer.py",
>  line 261, in _worker
> self.start_jmx_tool(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/monitor/jmx.py",
>  line 73, in start_jmx_tool
> wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, 
> backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: ubuntu@worker7: Jmx tool took too long to start
> {code}
> This is immediately followed by a consumer timeout in the failing cases:
> {code}
> [INFO  - 2017-07-18 14:26:46,907 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.core.security_rolling_upgrade_test.TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two.broker_protocol=SASL_SSL.client_protocol=SASL_SSL:
>  FAIL: Consumer failed to consume messages for 60s.
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 106, in run_produce_consume_validate
> self.start_producer_and_consumer()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in start_producer_and_consumer
> self.consumer_start_timeout_sec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Consumer failed to consume messages for 60s.
> {code}
> There does not appear to be anything wrong with the consumer in the logs, so 
> the timeout seems to be caused by the Jmx tool timeout.
> Possibly due to https://github.com/apache/kafka/pull/3447/files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5608) System test failure due to timeout starting Jmx tool

2017-07-19 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094188#comment-16094188
 ] 

Ewen Cheslack-Postava commented on KAFKA-5608:
--

For anyone following up, there was quite a bit more conversation on the PR 
itself: https://github.com/apache/kafka/pull/3547

> System test failure due to timeout starting Jmx tool
> 
>
> Key: KAFKA-5608
> URL: https://issues.apache.org/jira/browse/KAFKA-5608
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.11.0.1, 0.11.1.0
>
>
> Began seeing this in some failing system tests:
> {code}
> [INFO  - 2017-07-18 14:25:55,375 - background_thread - _protected_worker - 
> lineno:39]: Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/services/background_thread.py",
>  line 35, in _protected_worker
> self._worker(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/console_consumer.py",
>  line 261, in _worker
> self.start_jmx_tool(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/monitor/jmx.py",
>  line 73, in start_jmx_tool
> wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, 
> backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: ubuntu@worker7: Jmx tool took too long to start
> {code}
> This is immediately followed by a consumer timeout in the failing cases:
> {code}
> [INFO  - 2017-07-18 14:26:46,907 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.core.security_rolling_upgrade_test.TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two.broker_protocol=SASL_SSL.client_protocol=SASL_SSL:
>  FAIL: Consumer failed to consume messages for 60s.
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 106, in run_produce_consume_validate
> self.start_producer_and_consumer()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in start_producer_and_consumer
> self.consumer_start_timeout_sec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Consumer failed to consume messages for 60s.
> {code}
> There does not appear to be anything wrong with the consumer in the logs, so 
> the timeout seems to be caused by the Jmx tool timeout.
> Possibly due to https://github.com/apache/kafka/pull/3447/files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5608) System test failure due to timeout starting Jmx tool

2017-07-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094164#comment-16094164
 ] 

ASF GitHub Bot commented on KAFKA-5608:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3547


> System test failure due to timeout starting Jmx tool
> 
>
> Key: KAFKA-5608
> URL: https://issues.apache.org/jira/browse/KAFKA-5608
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.11.0.1, 0.11.1.0
>
>
> Began seeing this in some failing system tests:
> {code}
> [INFO  - 2017-07-18 14:25:55,375 - background_thread - _protected_worker - 
> lineno:39]: Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/services/background_thread.py",
>  line 35, in _protected_worker
> self._worker(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/console_consumer.py",
>  line 261, in _worker
> self.start_jmx_tool(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/monitor/jmx.py",
>  line 73, in start_jmx_tool
> wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, 
> backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: ubuntu@worker7: Jmx tool took too long to start
> {code}
> This is immediately followed by a consumer timeout in the failing cases:
> {code}
> [INFO  - 2017-07-18 14:26:46,907 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.core.security_rolling_upgrade_test.TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two.broker_protocol=SASL_SSL.client_protocol=SASL_SSL:
>  FAIL: Consumer failed to consume messages for 60s.
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 106, in run_produce_consume_validate
> self.start_producer_and_consumer()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in start_producer_and_consumer
> self.consumer_start_timeout_sec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Consumer failed to consume messages for 60s.
> {code}
> There does not appear to be anything wrong with the consumer in the logs, so 
> the timeout seems to be caused by the Jmx tool timeout.
> Possibly due to https://github.com/apache/kafka/pull/3447/files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5608) System test failure due to timeout starting Jmx tool

2017-07-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092578#comment-16092578
 ] 

ASF GitHub Bot commented on KAFKA-5608:
---

GitHub user ewencp opened a pull request:

https://github.com/apache/kafka/pull/3547

KAFKA-5608: Add --wait option for JmxTool and use in system tests to avoid 
race between JmxTool and monitored services



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ewencp/kafka wait-jmx-metrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3547.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3547


commit b33306053eda3ece107965b0db6111f36cc3232e
Author: Ewen Cheslack-Postava 
Date:   2017-07-19T04:31:23Z

KAFKA-5608: Add --wait option for JmxTool and use in system tests to avoid 
race between JmxTool and monitored services




> System test failure due to timeout starting Jmx tool
> 
>
> Key: KAFKA-5608
> URL: https://issues.apache.org/jira/browse/KAFKA-5608
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
>
> Began seeing this in some failing system tests:
> {code}
> [INFO  - 2017-07-18 14:25:55,375 - background_thread - _protected_worker - 
> lineno:39]: Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/services/background_thread.py",
>  line 35, in _protected_worker
> self._worker(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/console_consumer.py",
>  line 261, in _worker
> self.start_jmx_tool(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/monitor/jmx.py",
>  line 73, in start_jmx_tool
> wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, 
> backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: ubuntu@worker7: Jmx tool took too long to start
> {code}
> This is immediately followed by a consumer timeout in the failing cases:
> {code}
> [INFO  - 2017-07-18 14:26:46,907 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.core.security_rolling_upgrade_test.TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two.broker_protocol=SASL_SSL.client_protocol=SASL_SSL:
>  FAIL: Consumer failed to consume messages for 60s.
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 106, in run_produce_consume_validate
> self.start_producer_and_consumer()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in start_producer_and_consumer
> self.consumer_start_timeout_sec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Consumer failed to consume messages for 60s.
> {code}
> There does not appear to be anything wrong with the consumer in the logs, so 
> the timeout seems to be caused by the Jmx tool timeout.
> Possibly due to https://github.com/apache/kafka/pull/3447/files?



--
Thi

[jira] [Commented] (KAFKA-5608) System test failure due to timeout starting Jmx tool

2017-07-18 Thread Jason Gustafson (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092497#comment-16092497
 ] 

Jason Gustafson commented on KAFKA-5608:


[~ewencp] I think there is a race condition when starting up the jmx tool at 
the same time as the console consumer. If the metrics haven't been registered 
at initialization time, we may end up querying for nothing. We should probably 
modify the tool to wait until at least one expected name has been found before 
beginning polling. I did a quick dirty fix locally and it seemed to work. Does 
that sound plausible?

> System test failure due to timeout starting Jmx tool
> 
>
> Key: KAFKA-5608
> URL: https://issues.apache.org/jira/browse/KAFKA-5608
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
>
> Began seeing this in some failing system tests:
> {code}
> [INFO  - 2017-07-18 14:25:55,375 - background_thread - _protected_worker - 
> lineno:39]: Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/services/background_thread.py",
>  line 35, in _protected_worker
> self._worker(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/console_consumer.py",
>  line 261, in _worker
> self.start_jmx_tool(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/monitor/jmx.py",
>  line 73, in start_jmx_tool
> wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, 
> backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: ubuntu@worker7: Jmx tool took too long to start
> {code}
> This is immediately followed by a consumer timeout in the failing cases:
> {code}
> [INFO  - 2017-07-18 14:26:46,907 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.core.security_rolling_upgrade_test.TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two.broker_protocol=SASL_SSL.client_protocol=SASL_SSL:
>  FAIL: Consumer failed to consume messages for 60s.
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 106, in run_produce_consume_validate
> self.start_producer_and_consumer()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in start_producer_and_consumer
> self.consumer_start_timeout_sec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Consumer failed to consume messages for 60s.
> {code}
> There does not appear to be anything wrong with the consumer in the logs, so 
> the timeout seems to be caused by the Jmx tool timeout.
> Possibly due to https://github.com/apache/kafka/pull/3447/files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5608) System test failure due to timeout starting Jmx tool

2017-07-18 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092467#comment-16092467
 ] 

Ewen Cheslack-Postava commented on KAFKA-5608:
--

Another interesting point by [~hachikuji] is that all the tests that fail with 
this error seem to use SASL.

> System test failure due to timeout starting Jmx tool
> 
>
> Key: KAFKA-5608
> URL: https://issues.apache.org/jira/browse/KAFKA-5608
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
>
> Began seeing this in some failing system tests:
> {code}
> [INFO  - 2017-07-18 14:25:55,375 - background_thread - _protected_worker - 
> lineno:39]: Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/services/background_thread.py",
>  line 35, in _protected_worker
> self._worker(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/console_consumer.py",
>  line 261, in _worker
> self.start_jmx_tool(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/monitor/jmx.py",
>  line 73, in start_jmx_tool
> wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, 
> backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: ubuntu@worker7: Jmx tool took too long to start
> {code}
> This is immediately followed by a consumer timeout in the failing cases:
> {code}
> [INFO  - 2017-07-18 14:26:46,907 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.core.security_rolling_upgrade_test.TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two.broker_protocol=SASL_SSL.client_protocol=SASL_SSL:
>  FAIL: Consumer failed to consume messages for 60s.
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 106, in run_produce_consume_validate
> self.start_producer_and_consumer()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in start_producer_and_consumer
> self.consumer_start_timeout_sec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Consumer failed to consume messages for 60s.
> {code}
> There does not appear to be anything wrong with the consumer in the logs, so 
> the timeout seems to be caused by the Jmx tool timeout.
> Possibly due to https://github.com/apache/kafka/pull/3447/files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5608) System test failure due to timeout starting Jmx tool

2017-07-18 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092414#comment-16092414
 ] 

Ewen Cheslack-Postava commented on KAFKA-5608:
--

Further adding to the confusion, I've found a test 
ConnectMysqlTest.test_query_modes.query_mode=table.incremental_mode=timestamp 
which has this in the jmx_tool.err.log:

{quote}
Trying to connect to JMX url: 
service:jmx:rmi:///jndi/rmi://127.0.0.1:9192/jmxrmi.
Trying to connect to JMX url: 
service:jmx:rmi:///jndi/rmi://127.0.0.1:9192/jmxrmi.
{quote}

But based on the code 
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/JmxTool.scala#L98-L110
 it doesn't seem that it should be possible to see that message twice without 
also seeing an error message...

> System test failure due to timeout starting Jmx tool
> 
>
> Key: KAFKA-5608
> URL: https://issues.apache.org/jira/browse/KAFKA-5608
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Ewen Cheslack-Postava
>
> Began seeing this in some failing system tests:
> {code}
> [INFO  - 2017-07-18 14:25:55,375 - background_thread - _protected_worker - 
> lineno:39]: Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/services/background_thread.py",
>  line 35, in _protected_worker
> self._worker(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/console_consumer.py",
>  line 261, in _worker
> self.start_jmx_tool(idx, node)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/services/monitor/jmx.py",
>  line 73, in start_jmx_tool
> wait_until(lambda: self._jmx_has_output(node), timeout_sec=10, 
> backoff_sec=.5, err_msg="%s: Jmx tool took too long to start" % node.account)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: ubuntu@worker7: Jmx tool took too long to start
> {code}
> This is immediately followed by a consumer timeout in the failing cases:
> {code}
> [INFO  - 2017-07-18 14:26:46,907 - runner_client - log - lineno:221]: 
> RunnerClient: 
> kafkatest.tests.core.security_rolling_upgrade_test.TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two.broker_protocol=SASL_SSL.client_protocol=SASL_SSL:
>  FAIL: Consumer failed to consume messages for 60s.
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 106, in run_produce_consume_validate
> self.start_producer_and_consumer()
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in start_producer_and_consumer
> self.consumer_start_timeout_sec)
>   File 
> "/home/jenkins/workspace/system-test-kafka-0.11.0/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Consumer failed to consume messages for 60s.
> {code}
> There does not appear to be anything wrong with the consumer in the logs, so 
> the timeout seems to be caused by the Jmx tool timeout.
> Possibly due to https://github.com/apache/kafka/pull/3447/files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-5608) System test failure due to timeout starting Jmx tool

2017-07-18 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092286#comment-16092286
 ] 

Ewen Cheslack-Postava commented on KAFKA-5608:
--

It looks like the JmxTool really did take too long to generate output:

{quote}
[INFO  - 2017-07-18 14:25:44,820 - kafka - bootstrap_servers - lineno:549]: 
Bootstrap client port is: 9095
[DEBUG - 2017-07-18 14:25:44,820 - console_consumer - _worker - lineno:255]: 
Console consumer 1 command: export JMX_PORT=9192; export 
LOG_DIR=/mnt/console_consumer/logs; export 
KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/mnt/console_consumer/tools-log4j.properties";
 export KAFKA_OPTS="-Djava.security.auth.login.config=/mnt/security/jaas.conf 
-Djava.security.krb5.conf=/mnt/security/krb5.conf"; /opt/kafka-dev/bin/kafka
-console-consumer.sh --topic test_topic --consumer.config 
/mnt/console_consumer/console_consumer.properties --bootstrap-server 
worker11:9095,worker8:9095,worker4:9095 --isolation-level read_uncommitted 
--from-beginning --timeout-ms 6 --formatter 
kafka.tools.LoggingMessageFormatter 2>> 
/mnt/console_consumer/console_consumer.stderr | tee -a 
/mnt/console_consumer/console_consumer.stdout &
[DEBUG - 2017-07-18 14:25:44,820 - remoteaccount - _log - lineno:158]: 
ubuntu@worker7: Running ssh command: export JMX_PORT=9192; export 
LOG_DIR=/mnt/console_consumer/logs; export 
KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/mnt/console_consumer/tools-log4j.properties";
 export KAFKA_OPTS="-Djava.security.auth.login.config=/mnt/security/jaas.conf 
-Djava.security.krb5.conf=/mnt/security/krb5.conf"; /opt/kafka-dev/bin/ka
fka-console-consumer.sh --topic test_topic --consumer.config 
/mnt/console_consumer/console_consumer.properties --bootstrap-server 
worker11:9095,worker8:9095,worker4:9095 --isolation-level read_uncommitted 
--from-beginning --timeout-ms 6 --formatter 
kafka.tools.LoggingMessageFormatter 2>> 
/mnt/console_consumer/console_consumer.stderr | tee -a 
/mnt/console_consumer/console_consumer.stdout &
[DEBUG - 2017-07-18 14:25:44,823 - console_consumer - _worker - lineno:260]: 
collecting following jmx objects: 
['kafka.consumer:type=consumer-coordinator-metrics,client-id=console-consumer']
[DEBUG - 2017-07-18 14:25:44,823 - remoteaccount - _log - lineno:158]: 
ubuntu@worker7: Running ssh command: nc -z 127.0.0.1 9192
[DEBUG - 2017-07-18 14:25:44,929 - remoteaccount - _log - lineno:158]: 
ubuntu@worker7: Running ssh command: nc -z 127.0.0.1 9192
[DEBUG - 2017-07-18 14:25:45,035 - remoteaccount - _log - lineno:158]: 
ubuntu@worker7: Running ssh command: nc -z 127.0.0.1 9192
[DEBUG - 2017-07-18 14:25:45,141 - remoteaccount - _log - lineno:158]: 
ubuntu@worker7: Running ssh command: nc -z 127.0.0.1 9192
[DEBUG - 2017-07-18 14:25:45,148 - jmx - start_jmx_tool - lineno:71]: 
ubuntu@worker7: Start JmxTool 1 command: /opt/kafka-dev/bin/kafka-run-class.sh 
kafka.tools.JmxTool --reporting-interval 1000 --jmx-url 
service:jmx:rmi:///jndi/rmi://127.0.0.1:9192/jmxrmi --object-name 
kafka.consumer:type=consumer-coordinator-metrics,client-id=console-consumer 
--attributes assigned-partitions 1>> /mnt/jmx_tool.log 2>> 
/mnt/jmx_tool.err.log &
[DEBUG - 2017-07-18 14:25:45,148 - remoteaccount - _log - lineno:158]: 
ubuntu@worker7: Running ssh command: /opt/kafka-dev/bin/kafka-run-class.sh 
kafka.tools.JmxTool --reporting-interval 1000 --jmx-url 
service:jmx:rmi:///jndi/rmi://127.0.0.1:9192/jmxrmi --object-name 
kafka.consumer:type=consumer-coordinator-metrics,client-id=console-consumer 
--attributes assigned-partitions 1>> /mnt/jmx_tool.log 2>> 
/mnt/jmx_tool.err.log &
[DEBUG - 2017-07-18 14:25:45,210 - remoteaccount - _log - lineno:158]: 
ubuntu@worker10: Running ssh command: mkdir -p /mnt/security
[DEBUG - 2017-07-18 14:25:45,215 - remoteaccount - _log - lineno:158]: 
ubuntu@worker10: Running ssh command: java -version
[DEBUG - 2017-07-18 14:25:45,216 - remoteaccount - _log - lineno:158]: 
ubuntu@worker7: Running ssh command: test -z "$(cat /mnt/jmx_tool.log)"
[INFO  - 2017-07-18 14:25:45,356 - verifiable_producer - _worker - lineno:131]: 
verifiable_producer.properties:
[INFO  - 2017-07-18 14:25:45,356 - verifiable_producer - _worker - lineno:132]:
ssl.endpoint.identification.algorithm=HTTPS
sasl.kerberos.service.name=kafka
security.protocol=SASL_SSL
ssl.keystore.location=/mnt/security/test.keystore.jks
ssl.truststore.location=/mnt/security/test.truststore.jks
ssl.keystore.password=test-ks-passwd
sasl.mechanism=GSSAPI
ssl.truststore.password=test-ts-passwd
ssl.key.password=test-key-passwd
sasl.mechanism.inter.broker.protocol=GSSAPI

request.timeout.ms=3

[DEBUG - 2017-07-18 14:25:45,359 - kafka_path - path - lineno:92]: Using path 
resolver KafkaSystemTestPathResolver
[INFO  - 2017-07-18 14:25:45,359 - kafka - bootstrap_servers - lineno:549]: 
Bootstrap client port is: 9095
[DEBUG - 2017-07-18 14:25:45,359 - verifiable_producer - _worker -