
Re: [openstack-dev] [Trove] Trove-Gate timeouts

2014-02-17 Thread Lowery, Mathew

Hi Nikhil,

This is great!

You may have assumed this, but I wanted to be explicit: to get the higher
timeout, I think all that needs to be done is to upgrade the
boot-hpcloud-vm plugin. The plugin will then pass the configured timeout
(7200 in rdjenkins) down to the net-ssh-simple library as both the idle and
the operation timeout. (This all assumes that rdjenkins is running an older
version of the plugin.)

Thanks,
Mat

On 2/17/14, 10:13 PM, Nikhil Manchanda nik...@manchanda.me wrote:


Hi Mathew:

Nice work identifying these issues with the current Jenkins integration
tests! I'm looking into some of these issues at the moment and am
attempting to ensure that the appropriate fixes are merged so that the
Trove integration gate job is healthy again.

Some of my comments are inline.

Lowery, Mathew writes:

 Hi all,

 Issue #1: Jobs that need more than one hour
[...]

 Suggested action items:

   *   If it is acceptable to have jobs that run over one hour, then
   install the latest boot-hpcloud-vm plugin for Jenkins, which will
   make the operation timeout match the idle timeout.

It turns out that some of the configuration-parameter changes that were
checked in pushed the gate job above this timeout threshold. For the time
being, I suggest we increase the timeout value of the gate job to
accommodate this increase. However, we also need to take a closer look at
our test suite to see how we might parallelize tests and speed things up.

 Issue #2: The running time of all jobs is 1 hr 1 min
[...]

 Suggested action items:

   * Given that the minimum running time is one hour, I assume the
   problem is in the net-ssh-simple library. Needs more investigation.

I took a brief look at this, and it seems this might be a bug in the
underlying plugin. I will try to contact Matty Rhodes, the author of the
plugin, and ask him to take a look.


 Issue #3: Jenkins console log line timestamps different between full
 and truncated views
[...]
 Suggested action items:

   *   Upgrade the timestamper
   plugin (https://wiki.jenkins-ci.org/display/JENKINS/Timestamper).

I've gone ahead and updated the timestamper plugin on the rdjenkins box,
so that this should no longer be an issue.

Let me know if you find anything else.

Cheers,
-Nikhil

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Trove] Trove-Gate timeouts

2014-02-17 Thread Nikhil Manchanda


Hi Mat:

If I recall correctly, rdjenkins is actually running a version of the
plugin that Matty built specifically for it (it should be the latest). I'm
not sure why it's still hitting the timeout issue mentioned earlier, so
we'll have to check with Matty and figure out what is going on.

Cheers,
-Nikhil


Re: [openstack-dev] [Trove] Trove-Gate timeouts

2014-02-16 Thread Craig Vyvial

Trovesters,

One reason for the longer-running test is that, for configuration groups,
I added the creation of a new instance. This tests that a new instance is
created with a configuration group applied. This might make the run a
little longer, but I am surprised that it is still taking over an hour to
get through everything.

-Craig Vyvial

Re: [openstack-dev] [Trove] Trove-Gate timeouts

2014-02-16 Thread Denis Makogon

Hi, Craig.

Yes, I was thinking about the configuration test suites as well.
For now, the core team should maybe extend the gate running time.
For the tempest tests, though, I would suggest excluding some of the
longest tests from the 'gate' group.
We need to deal with this ASAP, because the gate has been failing for four
or five days.

Best regards
Denis Makogon.

Sent from an iPad
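Excluding the longest tests from the 'gate' group first requires knowing which tests are slowest. Below is a minimal Python sketch that ranks tests from the per-test timing lines quoted in the console excerpts of the original report. It assumes those lines have the shape `<seconds> proboscis.case.MethodTest (<test_name>)`, optionally preceded by an `HH:MM:SS` timestamp; `slowest_tests` is a hypothetical helper, not part of proboscis or redstack, and deciding which tests actually leave the 'gate' group remains a call for the team.

```python
import re

# Matches timing lines such as
# "22:05:00 186.89 proboscis.case.MethodTest (test_resize_down)".
# The leading HH:MM:SS timestamp is optional.
TIMING_LINE = re.compile(
    r"^(?:\d{2}:\d{2}:\d{2}\s+)?"
    r"(?P<secs>\d+(?:\.\d+)?)\s+proboscis\.case\.MethodTest\s+"
    r"\((?P<name>\w+)\)"
)


def slowest_tests(console_lines, top_n=3):
    """Return the top_n (test_name, seconds) pairs, slowest first."""
    durations = {}
    for line in console_lines:
        match = TIMING_LINE.match(line.strip())
        if match:
            name = match.group("name")
            secs = float(match.group("secs"))
            # Keep the worst observed duration for each test across builds.
            durations[name] = max(durations.get(name, 0.0), secs)
    ranked = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


# Timing lines copied from the excerpts (builds 2531, 2521, and 2518).
excerpt = [
    "05:50:10 335.41 proboscis.case.MethodTest (test_instance_created)",
    "05:50:10 194.05 proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)",
    "22:05:00 195.11 proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)",
    "22:05:00 186.89 proboscis.case.MethodTest (test_resize_down)",
    "18:40:57 210.03 proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)",
    "18:40:57 187.89 proboscis.case.MethodTest (test_resize_down)",
]

for name, secs in slowest_tests(excerpt):
    print(f"{secs:7.2f}s  {name}")
```

On these excerpts alone, test_instance_created ranks first, followed by the resize tests; a real decision would need timings from the whole suite.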

[openstack-dev] [Trove] Trove-Gate timeouts

2014-02-15 Thread Lowery, Mathew

Hi all,

Issue #1: Jobs that need more than one hour

Of the last 30 Trove-Gate (https://rdjenkins.dyndns.org/job/Trove-Gate/) builds
(spanning three days), 7 have failed due to a Jenkins job-level timeout (not a
proboscis timeout). These jobs had no failed tests when the timeout occurred.

Not having access to the job config, I used the console output to guess what
was going on. It appears that a Jenkins plugin named boot-hpcloud-vm
(https://github.com/mrhoades/boot-hpcloud-vm/blob/2272770b0ce54752eabb84229dc8939d79b2be50/models/boot_vm_concurrent.rb#L181)
is booting a VM and running the given commands, including redstack int-tests.
The console output shows that the plugin was supplied with
ssh_shell_timeout=7200. This value is passed down to another library called
net-ssh-simple
(https://github.com/busyloop/net-ssh-simple/blob/e3834f259a47606bfb06a487ca701fc20dbad8a5/lib/net/ssh/simple.rb#L632),
which has two timeouts: an idle timeout and an operation timeout.

In the latest boot-hpcloud-vm
(https://github.com/mrhoades/boot-hpcloud-vm/blob/2272770b0ce54752eabb84229dc8939d79b2be50/models/boot_vm_concurrent.rb#L182),
ssh_shell_timeout is passed down to net-ssh-simple as both the idle timeout
and the operation timeout. But in older versions of boot-hpcloud-vm
(https://github.com/mrhoades/boot-hpcloud-vm/blob/9260e957d6c54142c33dd9e9632b86e17fd5c02f/models/boot_vm_concurrent.rb#L141),
ssh_shell_timeout is passed down as the idle timeout only, leaving the
operation timeout at its default of 3600 seconds. This is why I believe these jobs
are failing after exactly one hour.
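To make the two-timeout explanation concrete, here is a small Python model of it. It is only a sketch: `SshTimeouts`, `forwarded_timeouts`, and `killed_by_timeout` are hypothetical names, not the Ruby APIs of boot-hpcloud-vm or net-ssh-simple, and the model assumes a command is aborted when either its total runtime exceeds the operation timeout or its output stays silent longer than the idle timeout.

```python
from dataclasses import dataclass

# Default operation timeout left in place by older plugin versions (seconds).
DEFAULT_OPERATION_TIMEOUT = 3600


@dataclass
class SshTimeouts:
    idle_timeout: int       # max seconds without any output
    operation_timeout: int  # max seconds for the whole command


def forwarded_timeouts(ssh_shell_timeout: int, latest_plugin: bool) -> SshTimeouts:
    """Hypothetical model of how the plugin forwards ssh_shell_timeout."""
    if latest_plugin:
        # Latest plugin: the value is used for both timeouts.
        return SshTimeouts(ssh_shell_timeout, ssh_shell_timeout)
    # Older plugin: only the idle timeout gets the configured value.
    return SshTimeouts(ssh_shell_timeout, DEFAULT_OPERATION_TIMEOUT)


def killed_by_timeout(runtime_s: int, longest_silence_s: int, t: SshTimeouts) -> bool:
    """A command is aborted if it runs too long overall or is silent too long."""
    return runtime_s > t.operation_timeout or longest_silence_s > t.idle_timeout


old = forwarded_timeouts(7200, latest_plugin=False)
new = forwarded_timeouts(7200, latest_plugin=True)

# A 65-minute int-test run that never goes silent for more than 5 minutes:
print(killed_by_timeout(65 * 60, 5 * 60, old))  # True: exceeds the 3600 s operation timeout
print(killed_by_timeout(65 * 60, 5 * 60, new))  # False: within the 7200 s operation timeout
```

Under this model, any run longer than one hour fails with the older plugin regardless of ssh_shell_timeout, which matches the "exactly one hour" pattern described above.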

FYI: Here are the jobs that failed due to the Jenkins job-level timeout (and 
had no test failures when the timeout occurred) along with their associated 
patch sets:
https://rdjenkins.dyndns.org/job/Trove-Gate/2532/console 
(http://review.openstack.org/73786)
https://rdjenkins.dyndns.org/job/Trove-Gate/2530/console 
(http://review.openstack.org/73736)
https://rdjenkins.dyndns.org/job/Trove-Gate/2517/console 
(http://review.openstack.org/63789)
https://rdjenkins.dyndns.org/job/Trove-Gate/2514/console 
(https://review.openstack.org/50944)
https://rdjenkins.dyndns.org/job/Trove-Gate/2513/console 
(https://review.openstack.org/50944)
https://rdjenkins.dyndns.org/job/Trove-Gate/2504/console 
(https://review.openstack.org/73147)
https://rdjenkins.dyndns.org/job/Trove-Gate/2503/console 
(https://review.openstack.org/73147)
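A quick tally of the list above, as a Python sketch. The build-to-change mapping is copied from the list, and the total of 30 builds comes from the first sentence of this issue. It shows that the 7 timeouts are about 23% of the sampled builds and are spread across five different changes, two of which (50944 and 73147) were hit twice.

```python
from collections import Counter

# Timed-out build number -> Gerrit change, copied from the list above.
timed_out_builds = {
    2532: 73786,
    2530: 73736,
    2517: 63789,
    2514: 50944,
    2513: 50944,
    2504: 73147,
    2503: 73147,
}
TOTAL_BUILDS = 30  # "the last 30 Trove-Gate builds"

per_change = Counter(timed_out_builds.values())
rate = len(timed_out_builds) / TOTAL_BUILDS
repeated = sorted(change for change, hits in per_change.items() if hits > 1)

print(f"{len(timed_out_builds)}/{TOTAL_BUILDS} builds timed out ({rate:.0%})")
print(f"distinct changes affected: {len(per_change)}")
print(f"changes that timed out more than once: {repeated}")
```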

Suggested action items:

  *   If it is acceptable to have jobs that run over one hour, then install the
latest boot-hpcloud-vm plugin for Jenkins, which will make the operation
timeout match the idle timeout.

Issue #2: The running time of all jobs is 1 hr 1 min

While the Jenkins job-level timeout will end the job after one hour, it also
appears to keep every job running for a minimum of one hour. To be more
precise, the timeout (or minimum running time) applies to the part of the
Jenkins job that runs commands on the VM; the VM provisioning (which takes
about one minute) is excluded from this timeout, which is why the running time
of all jobs is around 1 hr 1 min
(https://rdjenkins.dyndns.org/job/Trove-Gate/buildTimeTrend). Below is a
sampling of console logs showing when the int-tests completed and when the
timeout kicked in:

https://rdjenkins.dyndns.org/job/Trove-Gate/2531/console (00:01:03 wasted)

04:51:12 COMMAND_0: echo refs/changes/36/73736/2

...

05:50:10 335.41 proboscis.case.MethodTest (test_instance_created)
05:50:10 194.05 proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)
05:51:13 **
05:51:13 ** STDERR-BEGIN **

https://rdjenkins.dyndns.org/job/Trove-Gate/2521/console (00:06:44 wasted)

21:11:44 COMMAND_0: echo refs/changes/89/63789/13

...

22:05:00 195.11 proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)
22:05:00 186.89 proboscis.case.MethodTest (test_resize_down)
22:11:44 **
22:11:44 ** STDERR-BEGIN **

https://rdjenkins.dyndns.org/job/Trove-Gate/2518/consoleFull (00:06:01 wasted)

17:46:59 COMMAND_0: echo refs/changes/02/64302/20

...

18:40:57 210.03 proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)
18:40:57 187.89 proboscis.case.MethodTest (test_resize_down)
18:46:58 **
18:46:58 ** STDERR-BEGIN **
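
The "wasted" figures above can be checked directly from the quoted timestamps. A minimal Python sketch, assuming none of the three excerpts crosses midnight and treating the `** STDERR-BEGIN **` line as the moment the timeout fired and the last test-timing line as the moment the int-tests finished: in every sample the span from `COMMAND_0` to `STDERR-BEGIN` is within one second of 3600 seconds, which is consistent with a fixed one-hour session.

```python
from datetime import datetime, timedelta


def hms(stamp: str) -> datetime:
    """Parse an HH:MM:SS console timestamp (the date part is irrelevant here)."""
    return datetime.strptime(stamp, "%H:%M:%S")


# build -> (COMMAND_0 time, last test-timing line, STDERR-BEGIN time),
# copied from the console excerpts above.
SAMPLES = {
    2531: ("04:51:12", "05:50:10", "05:51:13"),
    2521: ("21:11:44", "22:05:00", "22:11:44"),
    2518: ("17:46:59", "18:40:57", "18:46:58"),
}


def analyze(samples):
    """Return build -> (time spent on the VM, idle time after the tests finished)."""
    results = {}
    for build, (start, last_test, stderr_begin) in samples.items():
        on_vm = hms(stderr_begin) - hms(start)
        wasted = hms(stderr_begin) - hms(last_test)
        results[build] = (on_vm, wasted)
    return results


for build, (on_vm, wasted) in analyze(SAMPLES).items():
    print(f"build {build}: on VM {on_vm}, wasted {wasted}")
```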


Suggested action items:

  *   Given that the minimum running time is one hour, I assume the problem is
in the net-ssh-simple library. This needs more investigation.


Issue #3: Jenkins console log line timestamps different between full and 
truncated views

I assume this is due to JENKINS-17779
(https://issues.jenkins-ci.org/browse/JENKINS-17779).

Suggested action items:

  *   Upgrade the timestamper plugin
(https://wiki.jenkins-ci.org/display/JENKINS/Timestamper).

Re: [openstack-dev] [Trove] Trove-Gate timeouts

2014-02-15 Thread Mirantis

Hello, Mathew.

I'm seeing the same issues with the gate.
I also tried to find out why the gate job is failing. First I ran into an issue
related to a cinder installation failure in devstack, but then I found the same
problem you described. The best option is to increase the job's time limit.
Thanks for this research. I hope the gate can be fixed in the simplest way and
as quickly as possible.

Best regards
Denis Makogon.
Sent from an iPad
