On 02/22/2016 11:30 AM, Chris Friesen wrote:
On 02/22/2016 11:17 AM, Jay Pipes wrote:
On 02/22/2016 10:43 AM, Chris Friesen wrote:
Hi all,

We've recently run into some interesting behaviour that I thought I
should bring up to see if we want to do anything about it.

Basically the problem seems to be that nova-compute is doing disk I/O
from the main thread, and if it blocks then it can block all of
nova-compute (since all eventlets will be blocked).  Examples that we've
found include glance image download, file renaming, instance directory
creation, opening the instance xml file, etc.  We've seen nova-compute
block for upwards of 50 seconds.

Now the specific case where we hit this is not a production
environment.  It's only got one spinning disk shared by all the guests,
the guests were hammering on the disk pretty hard, and the IO scheduler
for the instance disk was CFQ, which seems to be buggy in our kernel.

But the fact remains that nova-compute is doing disk I/O from the main
thread, and if the guests push that disk hard enough then nova-compute
is going to suffer.

Given the above...would it make sense to use eventlet.tpool or similar
to perform all disk access in a separate OS thread?  There'd likely be a
bit of a performance hit, but at least it would isolate the main thread
from IO blocking.

This is probably a good idea, but will require quite a bit of code
change. I think in the past we've taken the expedient route of just
exec'ing problematic code in a greenthread using utils.spawn().

I'm not an expert on eventlet, but from what I've seen this isn't
sufficient to deal with disk access in a robust way.

It's my understanding that utils.spawn() will result in the code running
in the same OS thread, but in a separate eventlet greenthread.  If that
code tries to access the disk via a potentially-blocking call the
eventlet subsystem will not jump to another greenthread.  Because of
this it can potentially block the whole OS thread (and thus all other
greenthreads running in that OS thread).

Not sure what utils.spawn() does, but if it is in fact an "exec" (or if
Jay is suggesting that an exec() be used within), then the code would be
in a different process entirely, and communicating with it becomes a
matter of pipe IO over unix sockets, which IIRC can be done in
non-blocking mode.



I think we need to use eventlet.tpool for disk IO (or else fork a whole
separate process).  Basically we need to ensure that the main OS thread
never issues a potentially-blocking syscall.

tpool would probably be easier (and more performant because no socket needed).
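
To make the failure mode concrete, here is a minimal sketch of the
scheduling problem described above. It deliberately uses only the Python
standard library: asyncio stands in for eventlet (both run cooperative
tasks on one OS thread), and loop.run_in_executor() plays the role that
eventlet.tpool.execute() would play in nova-compute. The names
slow_disk_call, heartbeat and run are made up for illustration, and
time.sleep() is an assumed stand-in for a blocking disk syscall. This is
not nova code.

```python
import asyncio
import time


def slow_disk_call(seconds):
    # Stand-in for a blocking syscall (open/rename/read on a busy disk).
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks, stop):
    # Cooperative task: records a timestamp every 10 ms until told to stop.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def run(offload):
    # Returns how many heartbeat ticks happened while the "disk" call ran.
    ticks, stop = [], asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # yield once so the heartbeat gets started
    before = len(ticks)
    if offload:
        # Run the blocking call in a worker OS thread; the main thread
        # keeps scheduling other tasks while it waits for the result.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, slow_disk_call, 0.2)
    else:
        # Blocking call issued directly from the main thread: nothing
        # else on this thread can be scheduled until it returns.
        slow_disk_call(0.2)
    during = len(ticks) - before
    stop.set()
    await hb
    return during


if __name__ == "__main__":
    print("ticks while blocked in main thread:", asyncio.run(run(False)))
    print("ticks with call offloaded:", asyncio.run(run(True)))
```

With the direct call the heartbeat cannot run at all while the call is
in progress, which is the "all eventlets will be blocked" behaviour;
with the call handed to a worker thread the heartbeat keeps ticking.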



Chris

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
