Thank you, Russell. I'll wait for the fix.

Best regards
2012/5/29 Russell Bryant <rbry...@redhat.com>

> On 05/28/2012 01:21 PM, Clint Byrum wrote:
> > Looks to me that you need to make sure the other side of that RPC
> > connection is up before nova-compute. I am not familiar with the
> > specifics of what Nova needs at startup, but I'd guess this is
> > nova-api or keystone. That's a pretty easy thing to do in a single
> > system (just mess with the upstart jobs or init scripts), but across
> > multiple systems you'll need some kind of orchestration layer, and
> > even then, modeling the dependencies on the network with some other
> > tool seems like something just begging to break.
>
> In this case, it's nova-compute expecting nova-network to be up and
> running when it starts up. This also causes a problem when restarting
> all of the services at the same time, as seen here:
>
> https://bugs.launchpad.net/nova/+bug/999698
>
> > Instead, the timeout should just be multiple minutes during startup,
> > and the services should all be able to start in parallel if they are
> > on the same box. I always think of one of those HP EcoPODs that is
> > pre-installed with everything you need for OpenStack, and just shipped
> > and then turned on. You could spend a lot of time trying to get that
> > order just right, or you could just have everything extend their
> > timeouts and get as far as they can without contact with the other
> > services.
> >
> > nova-compute doesn't *know* that the other side is in error; it just
> > knows that it is not responding. This is not a problem with
> > nova-compute, so why should nova-compute fail so quickly? One could
> > even argue that nova-compute should wait *forever* for the other
> > side. From an ops standpoint, they're both "down", so why make the
> > operations team take two actions when the actual broken service
> > recovers?
>
> The problem is that since nova-network isn't up, the request gets lost.
> nova-compute is sitting there waiting for a response to a message that
> was most likely never even received. It's also possible that
> nova-network received the message but the service stopped before it
> responded (but that is less likely, I think).
>
> The message queues get created by the consumer of messages in nova. So,
> in this case, nova-network creates the queue. Some possible solutions:
>
> 1) We could adjust this code path to just loop around and try again if
> it hits a timeout. We could make the timeout much shorter than the
> default, to make recovery quicker.
>
> The downside would be that we're fixing a single place, when this issue
> could pop up elsewhere.
>
> 2) We could make it so the sender creates the queue if it doesn't exist.
>
> This is good because it covers all cases. The bad thing is that we
> would not be able to set the queue to be auto-deleted in this case, so
> we could end up with a "leak" of unwanted message queues.
>
> I'm tempted to just write a patch that does #1 for now to address the
> immediate issue and then do something better later if we come up with
> something.
>
> --
> Russell Bryant
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
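Russell's option 1 amounts to a retry loop around the RPC call. Clint's idea of a long, or even unbounded, wait fits the same loop. Below is a minimal Python sketch. It assumes a hypothetical `send_once(timeout)` callable that publishes a fresh copy of the request, waits up to `timeout` seconds for the reply, and raises a hypothetical `RPCTimeout` if none arrives. `call_with_retry`, `RPCTimeout`, `FakeNetworkService`, and the default timeout value are illustrative names and values, not part of nova's RPC layer.

```python
import time


class RPCTimeout(Exception):
    """No reply within the per-attempt timeout (hypothetical name)."""


def call_with_retry(send_once, per_attempt_timeout=10.0, max_attempts=None,
                    retry_delay=0.0, sleep=time.sleep):
    """Option 1 sketch: re-send the request whenever a short timeout expires.

    A request published before nova-network declared its queue is lost, so
    each attempt relies on send_once() publishing a new copy. The default
    timeout is an arbitrary placeholder. max_attempts=None retries forever.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return send_once(per_attempt_timeout)
        except RPCTimeout:
            if max_attempts is not None and attempt >= max_attempts:
                raise  # give up and surface the last timeout
            if retry_delay:
                sleep(retry_delay)  # optional pause between attempts


class FakeNetworkService:
    """Hypothetical stand-in that times out until it has 'started'."""

    def __init__(self, up_after):
        self.up_after = up_after  # number of calls that time out first
        self.calls = 0

    def send_once(self, timeout):
        self.calls += 1
        if self.calls <= self.up_after:
            raise RPCTimeout(f"no reply within {timeout}s")
        return {"status": "ok", "attempt": self.calls}
```

For example, with `svc = FakeNetworkService(up_after=2)`, the call `call_with_retry(svc.send_once, per_attempt_timeout=1.0)` returns `{"status": "ok", "attempt": 3}`. Re-sending assumes the request is safe to repeat. Russell points out that nova-network may have received the message before stopping, and in that case a retry would deliver it twice.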
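A toy in-memory broker can illustrate the trade-off in option 2. This is only a sketch. It is not nova's API or any AMQP library's API: `ToyBroker`, `send`, and the `sender_declares` flag are hypothetical. Dropping messages addressed to an undeclared queue only loosely mirrors what a real broker does with an unroutable message. The consumer-side declaration follows the thread, where nova-network creates its own queue.

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory broker used only to illustrate the trade-off.

    A message published to a queue that nobody has declared yet is dropped.
    """

    def __init__(self):
        self.queues = {}       # queue name -> deque of pending messages
        self.auto_delete = {}  # queue name -> delete when last consumer leaves?
        self.consumers = {}    # queue name -> number of attached consumers

    def declare(self, name, auto_delete):
        # Declaring is idempotent here: the first declaration's settings win.
        if name not in self.queues:
            self.queues[name] = deque()
            self.auto_delete[name] = auto_delete
            self.consumers[name] = 0

    def publish(self, name, msg):
        if name not in self.queues:
            return False  # no queue yet, so the message is silently lost
        self.queues[name].append(msg)
        return True

    def subscribe(self, name):
        # Consumer-side declaration, as nova-network does for its own queue.
        self.declare(name, auto_delete=True)
        self.consumers[name] += 1
        pending = list(self.queues[name])
        self.queues[name].clear()
        return pending  # messages that were waiting when the consumer attached

    def unsubscribe(self, name):
        self.consumers[name] -= 1
        if self.consumers[name] == 0 and self.auto_delete[name]:
            del self.queues[name], self.auto_delete[name], self.consumers[name]


def send(broker, name, msg, sender_declares):
    if sender_declares:
        # Option 2: the sender makes sure the queue exists first. Per the
        # thread, such a queue cannot be auto-deleted, so it can outlive
        # every consumer (the "leak").
        broker.declare(name, auto_delete=False)
    return broker.publish(name, msg)
```

With `sender_declares=False`, a request sent before nova-network subscribes is lost, and `subscribe("network")` returns `[]`. With `sender_declares=True`, the request waits in the queue and is delivered on subscribe. However, the queue is still present after the consumer unsubscribes, which is the leak Russell describes.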