Fwiw, we've seen this with nova-scheduler as well. I think the default pool 
size is too large in general. The problem I've seen stems from the fact that 
DB calls all block the process, so you can easily get a stack of 64 workers 
all waiting on DB calls. It tends to work out such that none of the rpc pool 
threads returns until all of them have run their DB calls, and the explicit 
yield we do before every DB call in nova compounds this. The upshot is that 
all of the workers are tied up for quite a while. Since nova casts to the 
scheduler, this doesn't impact the API much. But if you were waiting on an 
RPC response, you could be waiting a while.
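
To make that concrete, here's a minimal sketch (mine, not nova's actual 
code) of how a green thread pool serializes behind calls that block the 
whole process. The blocking sleep stands in for a C-based DB driver call 
that never yields to the eventlet hub:

    import eventlet
    from eventlet import patcher

    eventlet.monkey_patch()

    # The unpatched time.sleep blocks the whole OS process, just like a
    # C database driver that eventlet can't monkey-patch.
    blocking_sleep = patcher.original('time').sleep

    pool = eventlet.GreenPool(4)  # scaled down from the default of 64

    def handle_rpc_message(msg_id):
        print('worker %d: starting "DB call"' % msg_id)
        blocking_sleep(1)  # parks the whole process, not just this thread
        print('worker %d: done' % msg_id)

    # Flood the pool. Because every handler blocks the process, the
    # messages complete one after another -- a pool of 64 behaves no
    # better, and anything else in the process (API handlers, periodic
    # tasks) waits behind all of them.
    for i in range(8):
        pool.spawn_n(handle_rpc_message, i)
    pool.waitall()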

Ironic makes a lot of RPC calls. I don't think we know the exact behavior in 
Ironic, but I assume it's something similar: if all of the rpc pool threads 
are stuck until roughly the same time, you end up with API hangs. We're also 
seeing delays in periodic task runs; the periodic tasks must be getting stuck 
behind the rpc worker threads, given that lowering the number of threads 
helps considerably.

Given that DB calls all block the process right now, there's really not much 
advantage to a larger pool size. 64 is too much, IMO. A larger pool would 
make more sense if there were more IO that could actually be parallelized.

That didn't answer your question. I've been meaning to ask the same one since 
we discovered this. :)

- Chris

> On Apr 22, 2014, at 3:54 PM, Devananda van der Veen <devananda....@gmail.com> 
> wrote:
> 
> Hi!
> 
> When a project is using oslo.messaging, how can we change our default 
> rpc_thread_pool_size?
> 
> -----------------------
> Background
> 
> Ironic has hit a bug where a flood of API requests can deplete the RPC worker 
> pool on the other end and cause things to break in very bad ways. Apparently, 
> nova-conductor hit something similar a while back too. There've been a few 
> long discussions on IRC about it, tracked partially here:
>   https://bugs.launchpad.net/ironic/+bug/1308680
> 
> tl;dr: one way we can fix this is to set rpc_thread_pool_size very small 
> (e.g., 4) and keep our conductor.worker_pool size near its current value 
> (e.g., 64). I'd like these to be the default option values, rather than 
> requiring every user to change rpc_thread_pool_size in their local 
> ironic.conf file.
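> 
> As a sketch, the per-deployment workaround we want to avoid is making 
> every operator add something like this to ironic.conf (assuming the 
> option lands in the DEFAULT group, and using the example value above):
> 
>     [DEFAULT]
>     rpc_thread_pool_size = 4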
> 
> We're also about to switch from the RPC module in oslo-incubator to the 
> oslo.messaging library.
> 
> Why are these related? Because it looks impossible for us to change the 
> default for this option from within Ironic: the option is registered when 
> EventletExecutor is instantiated (rather than when the module is imported).
> 
> https://github.com/openstack/oslo.messaging/blob/master/oslo/messaging/_executors/impl_eventlet.py#L76
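> 
> To illustrate the chicken-and-egg problem, here's roughly what an 
> import-time override from Ironic would look like, and why it can't work 
> (a sketch, not our actual code):
> 
>     from oslo.config import cfg
> 
>     try:
>         # This only works if the option is already registered...
>         cfg.CONF.set_default('rpc_thread_pool_size', 4)
>     except cfg.NoSuchOptError:
>         # ...but oslo.messaging doesn't register it until
>         # EventletExecutor.__init__ runs, long after Ironic's
>         # import-time code has executed.
>         pass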
> 
> 
> Thanks,
> Devananda