Jayush Luniya created AMBARI-10197:
--------------------------------------
Summary: Apache builds for trunk are getting aborted
Key: AMBARI-10197
URL: https://issues.apache.org/jira/browse/AMBARI-10197
Project: Ambari
Issue Type: Bug
Components: ambari-agent
Affects Versions: 2.1.0
Reporter: Jayush Luniya
Fix For: 2.1.0
On 3/24/15, 7:50 PM, "Jonathan Hurley" <[email protected]> wrote:
Ah, I see that. Looks like TestController.TestController is a common theme here
then. I tried running the tests on CentOS 6 instead of OSX and it looks like
mine hung on test_certSigningFailed the first time and
test_heartbeat_no_host_check_cmd_in_queue the second time.
Let’s open up a Jira for this so it can be tracked and resolved.
On Mar 24, 2015, at 7:20 PM, Jayush Luniya <[email protected]> wrote:
Hi Jonathan,
Yes, as I mentioned, the unit tests hang, though not with a 100% repro rate. The BOA is
aborted after 2 hours.
However, the builds always hang during the Ambari Agent tests. If you look at the
logs further up, you will see that the actual abort happened during the
TestController UTs (i.e. Python was terminated), but the build was not yet
entirely terminated, so it continued building the Ambari client and the
Python client until it was completely aborted.
test_addToStatusQueue (TestController.TestController) ... ok
test_certSigningFailed (TestController.TestController) ... ok
test_heartbeatWithServer (TestController.TestController) ... ok
test_registerAndHeartbeat (TestController.TestController) ... ok
test_registerAndHeartbeatWithException (TestController.TestController) ... ok
test_registerAndHeartbeat_check_registration_listener (TestController.TestController) ... Build timed out (after 120 minutes). Marking the build as aborted.
Build was aborted
/home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-agent/../ambari-common/src/main/unix/ambari-python-wrap: line 40: 31955 Terminated $PYTHON "$@"
[INFO]
[INFO]
------------------------------------------------------------------------
[INFO] Building Ambari Client 2.0.0-SNAPSHOT
[INFO]
------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ ambari-client ---
[INFO] Deleting /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client (includes = [**/*.pyc], excludes = [])
[INFO]
[INFO] --- build-helper-maven-plugin:1.8:regex-property
(parse-package-version) @ ambari-client ---
[INFO]
[INFO] --- build-helper-maven-plugin:1.8:regex-property
(parse-package-release) @ ambari-client ---
[INFO]
[INFO] --- apache-rat-plugin:0.11:check (default) @ ambari-client ---
[INFO] 53 implicit excludes (use -debug for more details).
[INFO] No excludes explicitly specified.
[INFO] 2 resources included (use -debug for more details)
[INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0
approved: 2 licence.
[INFO]
[INFO] --- maven-assembly-plugin:2.2-beta-5:single (build-tarball) @
ambari-client ---
[INFO] Reading assembly descriptor: assemblies/client.xml
[INFO]
[INFO] --- maven-assembly-plugin:2.2-beta-5:single (make-assembly) @
ambari-client ---
[INFO] Reading assembly descriptor: assemblies/client.xml
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @
ambari-client ---
[INFO] Installing /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client/pom.xml to /home/jenkins/.m2/repository/org/apache/ambari/ambari-client/2.0.0-SNAPSHOT/ambari-client-2.0.0-SNAPSHOT.pom
[INFO]
[INFO]
------------------------------------------------------------------------
[INFO] Building Ambari Python Client 2.0.0-SNAPSHOT
[INFO]
------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ python-client ---
[INFO] Deleting /home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-client/python-client (includes = [**/*.pyc], excludes = [])
[INFO]
[INFO] --- build-helper-maven-plugin:1.8:regex-property
(parse-package-version) @ python-client ---
[INFO]
[INFO] --- build-helper-maven-plugin:1.8:regex-property
(parse-package-release) @ python-client ---
[INFO]
[INFO] --- exec-maven-plugin:1.2:exec (python-test) @ python-client ---
Updating AMBARI-10163
Recording test results
Warning: you have no plugins providing access control for builds, so
falling back to legacy behavior of permitting any downstream builds to be
triggered
Finished: ABORTED
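A minimal sketch of one way to find out where a hanging test is stuck before the job-level timeout kills Python. It assumes Python 3 (the `faulthandler` module is not in the Python 2 standard library), and the `HangDiagnosticsTestCase` base class, its `hang_timeout` budget and the `SlowTest` example are hypothetical names, not part of the Ambari agent tests. The watchdog only dumps the stack of every thread; it does not fail or stop the test.

```python
import faulthandler
import tempfile
import time
import unittest


class HangDiagnosticsTestCase(unittest.TestCase):
    """Hypothetical base class: dumps all thread stacks if a test runs too long."""

    hang_timeout = 60.0  # seconds before stacks are dumped (assumed budget)
    dump_file = None     # file object with a real fileno(); stderr when None

    def setUp(self):
        kwargs = {"exit": False}  # keep the process alive, only report
        if self.dump_file is not None:
            kwargs["file"] = self.dump_file
        faulthandler.dump_traceback_later(self.hang_timeout, **kwargs)

    def tearDown(self):
        # The test returned in time (or finally returned): disarm the watchdog.
        faulthandler.cancel_dump_traceback_later()


class SlowTest(HangDiagnosticsTestCase):
    """Hypothetical test that overruns its budget, standing in for a hang."""

    hang_timeout = 0.2
    dump_file = tempfile.TemporaryFile()

    def test_slow(self):
        time.sleep(0.6)


def run_example():
    """Run SlowTest and return the unittest result plus the captured stack dump."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlowTest)
    result = unittest.TestResult()
    suite.run(result)
    SlowTest.dump_file.seek(0)
    return result, SlowTest.dump_file.read().decode()
```

With something like this in place, the console log would show the frame a test such as test_run was blocked in, rather than only the "Terminated" line from ambari-python-wrap.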
Thanks
Jayush
On 3/24/15, 1:25 PM, "Jonathan Hurley" <[email protected]> wrote:
I think that we're looking in the wrong places. Consider:
https://builds.apache.org/job/Ambari-trunk-Commit/2101
and
https://builds.apache.org/job/Ambari-trunk-Commit/2100
2101 successfully built in about an hour. 2100 did not; it aborted after
2 hours. It aborted during the Groovy unit tests. Ambari unit test time
variances should not swing the total job time by an hour.
Perhaps something else is going on here. Maybe there's a network issue
and Git or one of the Maven build steps is taking too long.
The pattern seems to be that the builds are not stuck at one point, since they
are aborted at different stages from job to job: Groovy, agent tests, etc.
On Mar 24, 2015, at 4:07 PM, Jonathan Hurley <[email protected]> wrote:
No, that change should have no effect on the tests. There were aborted
runs before that change, and there were failed runs after it. It seems
like in some cases, the tests just take too long.
On Mar 24, 2015, at 3:55 PM, Jayush Luniya <[email protected]> wrote:
This is the change that went into build #2072.
Jonathan, any chance the issue below could have been caused by it?
Sumit, what was the commit version of your change to reenable
TestController tests and when was it committed?
1. AMBARI-10126 <https://issues.apache.org/jira/browse/AMBARI-10126> - Alert Scheduler Is Double Scheduling Jobs (jonathanhurley) (details <https://builds.apache.org/job/Ambari-trunk-Commit/2072/changes#detail0>)
Commit 68468feeeeb35ca9edd4899ea8b1abafb7c2742a <http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=68468feeeeb35ca9edd4899ea8b1abafb7c2742a> by jhurley <https://builds.apache.org/user/jhurley/>
AMBARI-10126 <https://issues.apache.org/jira/browse/AMBARI-10126> - Alert Scheduler Is Double Scheduling Jobs (jonathanhurley)
ambari-agent/src/main/python/ambari_agent/Controller.py <http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=blob&f=ambari-agent/src/main/python/ambari_agent/Controller.py&h=bb85337bfdf2404a6aabf78eb361c112f77c977e&hb=68468feeeeb35ca9edd4899ea8b1abafb7c2742a>
(diff) <http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=blobdiff&f=ambari-agent/src/main/python/ambari_agent/Controller.py&fp=ambari-agent/src/main/python/ambari_agent/Controller.py&h=eeca4c294399e04dae8d893f078d6e6125f3df47&hp=bb85337bfdf2404a6aabf78eb361c112f77c977e&hb=68468feeeeb35ca9edd4899ea8b1abafb7c2742a&hpb=32e1215639f3cdfea68e2955f316576f1ded85fe>
Thanks
Jayush
On 3/24/15, 12:49 PM, "Sumit Mohanty" <[email protected]> wrote:
The TestController tests are the ones I recently re-enabled to run on Mac. So
we may see these failures locally as well if your dev box is a Mac.
________________________________________
From: Jayush Luniya <[email protected]>
Sent: Tuesday, March 24, 2015 12:24 PM
To: Alejandro Fernandez; [email protected]
Subject: Re: Server unit tests take too long (30+ minutes)
Agreed, we should take a look at reducing our test times.
Also, I looked at the latest builds on trunk, and it looks like the agent
tests are hanging as well, leading to builds being aborted. The culprit seems
to be the TestController tests. This is not a consistent failure, but it has
happened very frequently since build #2072.
https://builds.apache.org/job/Ambari-trunk-Commit/
test_repeatRegistration (TestController.TestController) ... ok
test_restartAgent (TestController.TestController) ... ok
test_run (TestController.TestController) ... Build timed out (after 120 minutes). Marking the build as aborted.
Build was aborted
/home/jenkins/jenkins-slave/workspace/Ambari-trunk-Commit/ambari-agent/../ambari-common/src/main/unix/ambari-python-wrap: line 40: 20024 Terminated $PYTHON "$@"
Thanks
Jayush
From: Alejandro Fernandez <[email protected]>
Date: Tuesday, March 24, 2015 at 12:18 PM
To: "[email protected]" <[email protected]>
Cc: Jayush Luniya <[email protected]>
Subject: Re: Server unit tests take too long (30+ minutes)
+1 to that.
grep -B1 ".*sec$" ~/test_times.txt | sed 's/^.*Time elapsed: \(.*\)$/\1/'
Here's another run with all tests that took over 30 secs. Total time in
these 28 test classes was 28 mins.
The biggest culprit was AmbariManagementControllerTest at 5:17 (317.327 sec).
Running org.apache.ambari.server.agent.TestHeartbeatHandler
89.435 sec
Running org.apache.ambari.server.upgrade.UpgradeTest
76.566 sec
Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderForDNWithSpaceTest
55.582 sec
Running org.apache.ambari.server.security.authorization.TestUsers
43.228 sec
Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderTest
57.922 sec
Running org.apache.ambari.server.controller.internal.StackDefinedPropertyProviderTest
56.585 sec
Running org.apache.ambari.server.controller.internal.RepositoryVersionResourceProviderTest
60.788 sec
Running org.apache.ambari.server.controller.internal.UpgradeResourceProviderTest
40.329 sec
Running org.apache.ambari.server.controller.internal.HostStackVersionResourceProviderTest
34.812 sec
Running org.apache.ambari.server.controller.internal.StageResourceProviderTest
37.434 sec
Running org.apache.ambari.server.controller.AmbariServerTest
37.638 sec
Running org.apache.ambari.server.controller.AmbariManagementControllerTest
317.327 sec
Running org.apache.ambari.server.actionmanager.TestActionDBAccessorImpl
53.404 sec
Running org.apache.ambari.server.scheduler.ExecutionScheduleManagerTest
34.245 sec
Running org.apache.ambari.server.notifications.dispatchers.SNMPDispatcherTest
34.732 sec
Running org.apache.ambari.server.state.UpgradeHelperTest
35.616 sec
Running org.apache.ambari.server.state.alerts.AlertEventPublisherTest
62.627 sec
Running org.apache.ambari.server.state.alerts.AlertDefinitionHashTest
42.206 sec
Running org.apache.ambari.server.state.alerts.AlertStateChangedEventTest
41.462 sec
Running org.apache.ambari.server.state.stack.UpgradePackTest
72.379 sec
Running org.apache.ambari.server.state.ConfigHelperTest
72.849 sec
Running org.apache.ambari.server.state.svccomphost.ServiceComponentHostTest
50.383 sec
Running org.apache.ambari.server.state.cluster.ClusterTest
69.889 sec
Running org.apache.ambari.server.state.cluster.ClusterDeadlockTest
80.271 sec
Running org.apache.ambari.server.state.ServiceTest
45.443 sec
Running org.apache.ambari.server.orm.dao.AlertsDAOTest
57.077 sec
Running org.apache.ambari.server.orm.dao.AlertDefinitionDAOTest
33.872 sec
Running org.apache.ambari.server.metadata.RoleCommandOrderTest
31.794 sec
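For anyone who would rather get the same list without grep and sed, here is a rough Python sketch of the same idea: pair each "Running <class>" line with the "Time elapsed" line that follows it and keep the classes at or above a threshold. The function name `slow_test_classes` and the `org.example.FastTest` entry in the sample are made up for illustration; the two line shapes are assumed to match the Surefire console output pasted in this thread.

```python
import re

RUNNING = re.compile(r"^Running (\S+)$")
ELAPSED = re.compile(r"Time elapsed: ([\d.]+) sec")


def slow_test_classes(lines, threshold_sec=30.0):
    """Return (class_name, seconds) pairs at or above the threshold, slowest first."""
    results = []
    current = None
    for line in lines:
        line = line.strip()
        match = RUNNING.match(line)
        if match:
            current = match.group(1)
            continue
        match = ELAPSED.search(line)
        if match and current is not None:
            seconds = float(match.group(1))
            if seconds >= threshold_sec:
                results.append((current, seconds))
            current = None  # wait for the next "Running" line
    return sorted(results, key=lambda item: item[1], reverse=True)


# Two lines of real output from this thread plus one invented fast class.
sample = """\
Running org.apache.ambari.server.agent.TestHeartbeatHandler
Tests run: 34, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.43 sec
Running org.example.FastTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.5 sec
Running org.apache.ambari.server.upgrade.UpgradeTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 50.433 sec
""".splitlines()

slow = slow_test_classes(sample)
total_sec = sum(seconds for _, seconds in slow)
for name, seconds in slow:
    print("%8.3f sec  %s" % (seconds, name))
print("total: %.1f min" % (total_sec / 60.0))
```

Pointing it at a saved console log (for example `open(path)` on the test_times.txt file mentioned above) instead of `sample` should give the per-class list and the total in one pass.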
Thanks,
Alejandro
On 3/24/15, 11:54 AM, "Jonathan Hurley" <[email protected]> wrote:
Many of these, such as the deadlock tests and alert tests, are just going
to take a long time due to the nature of what they're doing. In general,
if b.a.o is timing out, we need to either increase the timeout for the
job or change our pom.xml to allow for forked execution of the tests.
In my local environment, 3 concurrent forks can run through the test
suite in about 20 minutes. The problem is that both LDAP tests below
always fail in a forked environment. I'd say if we want to get the build
times down, we should look into making the 2 LDAP tests work with forked
test runners in the pom.xml.
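As a back-of-the-envelope illustration of why forking helps, the sketch below spreads a set of class durations over N forks with a greedy longest-first assignment and reports the busiest fork. This is only an estimate under assumed scheduling: it is not how the Maven test runner actually hands classes to forks, and the function name `estimate_fork_wall_time` is hypothetical. The durations are the per-class times from the slow-test list quoted further down in this thread.

```python
import heapq


def estimate_fork_wall_time(durations_sec, fork_count):
    """Greedy longest-first assignment; returns per-fork loads, busiest first."""
    if fork_count < 1:
        raise ValueError("fork_count must be at least 1")
    loads = [0.0] * fork_count
    heapq.heapify(loads)
    for duration in sorted(durations_sec, reverse=True):
        lightest = heapq.heappop(loads)  # fork with the least work so far
        heapq.heappush(loads, lightest + duration)
    return sorted(loads, reverse=True)


# Class times in seconds, copied from the list of slow tests in this thread.
durations = [67.43, 55.576, 52.252, 50.433, 46.681, 44.474,
             36.421, 36.46, 35.706, 31.863, 31.247]

loads = estimate_fork_wall_time(durations, fork_count=3)
print("serial: %.1f s" % sum(durations))
print("3 forks (estimate): %.1f s" % loads[0])
```

The busiest fork can never finish faster than the serial total divided by the fork count, or faster than the single slowest class, so a very long class such as AmbariManagementControllerTest puts a floor on what forking alone can buy.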
On Mar 24, 2015, at 2:33 PM, Sumit Mohanty <[email protected]> wrote:
Hi,
These are some of the unit tests that take too long (more than 30 seconds
on my machine). Several others fall in the 10 to 30 second range and
could also use some optimization.
Jayush tells me that the Apache builds may be getting aborted as the
build + UT run takes more than an hour.
I will look into some of it when I get a chance. If any of them
pique your curiosity, take a look.
Running org.apache.ambari.server.agent.TestHeartbeatHandler
Tests run: 34, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.43 sec
Running org.apache.ambari.server.state.cluster.ClusterTest
Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 55.576 sec
Running org.apache.ambari.server.state.cluster.ClusterDeadlockTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.252 sec
Running org.apache.ambari.server.upgrade.UpgradeTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 50.433 sec
Running org.apache.ambari.server.orm.dao.AlertDispatchDAOTest
Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 46.681 sec
Running org.apache.ambari.server.orm.dao.AlertsDAOTest
Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 44.474 sec
Running org.apache.ambari.server.security.authorization.TestUsers
Tests run: 26, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 36.421 sec
Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderTest
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.46 sec
Running org.apache.ambari.server.security.authorization.AmbariLdapAuthenticationProviderForDNWithSpaceTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.706 sec
Running org.apache.ambari.server.state.ConfigHelperTest
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.863 sec
Running org.apache.ambari.server.controller.internal.StackDefinedPropertyProviderTest
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.247 sec
...
thanks
-Sumit
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)