On Jul 1, 2013, at 2:27 AM, Rosa, Andrea (HP Cloud Services)
andrea.r...@hp.com wrote:
Hi Ben,
Thank you very much for your reply.
That function is using the synchronized decorator, which means that it's
wrapped by a semaphore context. As I understand it (and someone correct
me if I'm wrong), if an error happens and an exception is thrown the context
would be exited and the semaphore released. Of course, I suppose there are
situations where a thread could be terminated without being able to do that
cleanup, but I suspect most of those cases would kill the entire process,
making the lock irrelevant (since you specified that the lock is not external).
Ok, that is my understanding. Thanks for confirming it.
If not, I think all other actions for that instance are blocked
waiting for the lock. Is that correct?
That is a potential pitfall of synchronization, but I think it shouldn't
happen in this case. Are you seeing this behavior?
I am seeing some odd behaviour: sometimes (not often) I find instances in
DELETED status (vm_state) which are not marked as deleted.
Below is what I found while debugging it:
I found an instance in that odd status. Looking at the log file for the
compute node, I didn't find any errors and the service was running. The only
thing I spotted was a gap of several minutes in the compute node's log file.
That is very unlikely.
I tried to delete the same instance again, but the operation never
completed. Maybe the thread that was handling the first deletion
died while the lock was still held, so all the other attempts to delete the
same instance failed.
Were other commands working on the compute node? It seems much more likely that
the node had a hung connection to rabbit. If you are not using tcp keepalives,
a network hiccup (or failover) can cause half open connections where the server
thinks the connection is still active so it sends the message but the compute
node never receives it.
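As a rough illustration of what enabling TCP keepalives means at the socket level, here is a generic Python sketch. It does not show how nova or its messaging library configures the rabbit connection; the helper name `enable_keepalive` and the timing values are assumptions chosen for the example, and the three tuning options are only applied on platforms that expose them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an existing socket.

    idle, interval and count are illustrative values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform-specific (available on Linux),
    # so each one is applied only if this platform defines it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idle time before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between probes that go unanswered.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes allowed before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

With probes enabled, a peer that has silently gone away is eventually detected and the connection is reported as broken, instead of staying half open indefinitely.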
Vish
To fix the issue I had to restart the nova-compute service (so all locks
were released) and then I was able to complete the deletion.
Does that make sense to you?
PS: Since you are on this topic, I submitted a fix that completes pending
deletions when the compute service starts. It would be great if you could have
a look at it: https://review.openstack.org/33265
Regards
--
Andrea Rosa
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev