There's a bug on this: 
https://bugs.launchpad.net/ironic/+bug/1321787?comments=all

It seems like it's been well-known for a long time that paramiko 
parallelism doesn't work well with eventlet. Ironic's aggressive use of 
the ssh power driver seems to hit this hard.

The sign that you're hitting a problem with this is the "multiple 
simultaneous readers" warning, which is spurious (but a sign of trouble).

I've started to follow up this problem with the eventletdev mailing list, 
since having had a trawl back through tickets it looks like we've seen 
other issues arising from this in various places, going back at least 18 
months. They're not all paramiko-related; nor are they necessarily 
"caused" by the thing mentioned in the TRACE lines - that's just the point 
at which eventlet can detect the problem. I've seen glanceclient, at the 
least, also trigger this - as well as Ironic's use of utils.execute to 
launch (again, parallel) qemu-img calls.

I'm just trying a tripleo run with a much reduced workers_pool_size to see 
if I can at least forcibly get a run to complete successfully.


As to where the problem lies: it seems eventlet has a registered listener 
backed by one FD. That FD gets recycled by another thread, which attempts 
to read or write on it. That's why eventlet is carping. Paramiko seems to 
trigger this quite reliably because it uses a worker thread to manage its 
ssh communication.
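
To make the FD-recycling point concrete, here is a minimal sketch in plain 
Python. It is not eventlet's or paramiko's actual code: the "listeners" 
dict is a hypothetical stand-in for a hub's table of registered listeners, 
and the pipes stand in for sockets. It assumes a POSIX system, where the 
kernel hands out the lowest free descriptor number.

```python
import os


def recycle_fd_demo():
    """Show a stale listener entry ending up attached to a recycled FD."""
    # Hypothetical stand-in for a hub's listener table: fd number -> owner.
    listeners = {}

    r1, w1 = os.pipe()
    listeners[r1] = "listener for pipe 1"

    # Some other code closes the descriptor without telling the "hub".
    os.close(r1)

    # POSIX allocates the lowest free descriptor number, so the next
    # pipe normally gets the number that was just closed.
    r2, w2 = os.pipe()
    reused = (r2 == r1)

    # The table still maps that number to the old owner, even though the
    # descriptor now refers to a completely different pipe.
    stale_owner = listeners.get(r2)

    for fd in (w1, r2, w2):
        os.close(fd)
    return reused, stale_owner


if __name__ == "__main__":
    reused, stale_owner = recycle_fd_demo()
    print("fd number reused:", reused)
    print("table still says:", stale_owner)
```

The point is only that an FD number by itself doesn't identify a 
connection: anything keyed on the number needs to be told when the 
descriptor is closed.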

Fixes to eventlet might be quite tricky - the bug above has a link to some 
quick sketches in github - although it's just struck me that there may be 
a simpler approach to investigate, which I'll pursue after sending this.


It'd be good to get some eyeballs on this eventlet problem - it's been 
hitting us for quite a while, but before Ironic it didn't occur at a 
high enough rate to cause huge amounts of pain.


Cheers,
jan

PS. A longer, rambling braindump went to the eventletdev mailing list, 
which can be found here:

https://lists.secondlife.com/pipermail/eventletdev/2014-June/thread.html

...I think I've a better handle on the problem now, but I still don't have 
a satisfactory from-first-principles explanation of exactly the sequence 
of events that causes paramiko to trigger this.

-- 
j...@ioctl.org  http://ioctl.org/jan/
stty intr ^m
