[Openstack] [Continuous-Integration] What else is running on the Jenkins slaves?

2012-06-26 Thread Eoghan Glynn

Folks,

A question for the CI side-of-the-house ...

What else is running on the Jenkins slaves, concurrently with the gating CI 
tests?

The background is the intermittent glance service launch failure - the recently
added strace-on-failure logic reveals the issue to be an EADDRINUSE when the
registry service listen socket is bound to a supposedly unused port.

Two possible explanations for this:

1. A race whereby some other process jumps in and grabs this port before the
   registry service is launched (the window of opportunity is not too narrow,
   as the API service is being launched in the meantime).

2. We identify the unused port by quickly opening and closing a socket on port
   zero, so I guess there could be some lag in recycling the port. This seems
   unlikely, though, as no connections were established, hence no need for
   TIME_WAIT.
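The port-zero probe in explanation 2 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not glance's actual test helper: `get_unused_port` and `rebind_immediately` are hypothetical names, and 127.0.0.1 is an assumed bind address. The probe binds to port 0 so the kernel picks a free port, reads it back with `getsockname()`, and closes the socket without ever connecting.

```python
import socket


def get_unused_port(host="127.0.0.1"):
    """Return a port the kernel reports as free, by binding to port 0.

    Hypothetical helper: the port is only known to be free at the moment
    of the probe; nothing reserves it once the probe socket is closed.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, 0))          # port 0: the kernel picks a free port
        return probe.getsockname()[1]  # read back the port it chose
    finally:
        probe.close()                  # never connected, so no TIME_WAIT state


def rebind_immediately(port, host="127.0.0.1"):
    """Bind and listen on `port` straight away, without SO_REUSEADDR.

    Raises OSError (errno EADDRINUSE) if the port is still held.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind((host, port))
        server.listen(1)
        return server.getsockname()[1]
    finally:
        server.close()


port = get_unused_port()
print(rebind_immediately(port) == port)
```

On Linux the immediate rebind is expected to succeed, which fits the reasoning that an unconnected socket leaves no TIME_WAIT state behind. Note that the port is not reserved once the probe closes, so nothing stops another socket from taking it in between.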

Option #1 seems the more likely, so I wanted to check whether there is indeed
other port-grabbing stuff running on the Jenkins slaves.

Cheers,
Eoghan
 

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] [Continuous-Integration] What else is running on the Jenkins slaves?

2012-06-26 Thread Andrew Hutchings

Hi Eoghan,

On 26/06/12 12:30, Eoghan Glynn wrote:
> A question for the CI side-of-the-house ...
>
> What else is running on the Jenkins slaves, concurrently with the gating CI
> tests?

Very basic things, not much other than the Jenkins Slave service and
SSH.  Nothing that should cause conflicts that you are seeing.  We also
intentionally only run one test run per slave at a time.

> The background is the intermittent glance service launch failure - the
> recently added strace-on-failure logic reveals the issue to be an EADDRINUSE
> when the registry service listen socket is bound to a supposedly unused port.

Are you closing ports with SO_REUSEADDR?  If the registry service or
something else isn't then I guess that could cause it.
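For reference, SO_REUSEADDR is a socket option enabled with `setsockopt()` before `bind()`; it is not applied when the port is closed. A minimal Python sketch, assuming a hypothetical helper name `make_listen_socket` and a localhost bind address:

```python
import socket


def make_listen_socket(port, host="127.0.0.1", backlog=5):
    """Create a listening TCP socket with SO_REUSEADDR enabled.

    The option must be set before bind(); setting it afterwards has no
    effect on a bind that has already happened.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(backlog)
    except OSError:
        sock.close()  # don't leak the socket if bind/listen fails
        raise
    return sock


server = make_listen_socket(0)  # port 0 only so the example runs anywhere
port = server.getsockname()[1]
server.close()

# A restarted server can bind the same port again right away.
server = make_listen_socket(port)
server.close()
```

Where the option generally earns its keep is restarting a server whose earlier connections on that port are still lingering in TIME_WAIT.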

Kind Regards
-- 
Andrew Hutchings - LinuxJedi - http://www.linuxjedi.co.uk/





Re: [Openstack] [Continuous-Integration] What else is running on the Jenkins slaves?

2012-06-26 Thread Eoghan Glynn

Thanks for the quick response ...
 
> Very basic things, not much other than the Jenkins Slave service and
> SSH.  Nothing that should cause conflicts that you are seeing.  We
> also intentionally only run one test run per slave at a time.

Interesting; in that case the alternative explanation, some lag on closure,
seems the more likely.
 
> Are you closing ports with SO_REUSEADDR?  If the registry service or
> something else isn't then I guess that could cause it.

We do set SO_REUSEADDR on the registry server socket, but not on the dummy
socket used to identify an unused port. However, I think setting SO_REUSEADDR
on the latter would defeat the purpose of the dummy socket, by breaking the
constraint that the port should be previously unused.
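The split described here (SO_REUSEADDR on the registry socket, none on the probe) can be sketched as follows. The names `get_unused_port` and `bind_registry` are hypothetical, not glance's code, and 127.0.0.1 is an assumed address. The sketch also suggests why SO_REUSEADDR on the registry socket would not mask explanation 1: it generally tolerates leftover TIME_WAIT state, but does not let a second socket bind a port that a live listener already holds.

```python
import errno
import socket


def get_unused_port(host="127.0.0.1"):
    """Probe for a free port by binding to port 0 (no SO_REUSEADDR)."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, 0))
        return probe.getsockname()[1]
    finally:
        probe.close()


def bind_registry(port, host="127.0.0.1"):
    """Bind a hypothetical registry listener with SO_REUSEADDR set.

    Returns None on success, or the errno name if bind()/listen() fails.
    The socket is closed either way; only the outcome is reported.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(5)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


port = get_unused_port()

# Simulate explanation 1: another process grabs the port inside the window
# between the probe and the registry's own bind().
interloper = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
interloper.bind(("127.0.0.1", port))
interloper.listen(1)

print(bind_registry(port))  # EADDRINUSE, despite SO_REUSEADDR
interloper.close()
```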

Cheers,
Eoghan
