Re: [openstack-dev] Question about locking

2013-07-02 Thread Vishvananda Ishaya

On Jul 2, 2013, at 1:49 AM, Rosa, Andrea (HP Cloud Services) 
andrea.r...@hp.com wrote:

 Hi Vish,
 
 Were other commands working on the compute node? It seems much more
 likely that the node had a hung connection to rabbit. If you are not using
 TCP keepalives, a network hiccup (or failover) can cause half-open
 connections, where the server thinks the connection is still active so it
 sends the message but the compute node never receives it.
 
 The compute node is fine: messages are delivered, and when I send a new
 delete for the same instance I can see the message received by the compute
 node.
 As I said, I don't see this very often. It's a rare case, but I'd like to
 know if the hanging lock could be an explanation.

Definitely seems like a possibility given your explanation, but I haven't seen 
it happen myself.

Vish

 
 PS: As you are on this topic I submitted a fix to complete the 
 pending deletion when the compute service starts, it would be great 
 if you can have a look at it: https://review.openstack.org/33265
 
 Regards
 --
 Andrea Rosa
 
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] Question about locking

2013-07-01 Thread Vishvananda Ishaya

On Jul 1, 2013, at 2:27 AM, Rosa, Andrea (HP Cloud Services) 
andrea.r...@hp.com wrote:

 Hi Ben,
 
 Thank you very much for your reply.
 
 That function is using the synchronized decorator, which means that it's
 wrapped by a semaphore context.  As I understand it (and someone correct
 me if I'm wrong), if an error happens and an exception is thrown the context
 would be exited and the semaphore released.  Of course, I suppose there are
 situations where a thread could be terminated without being able to do that
 cleanup, but I suspect most of those cases would kill the entire process,
 making the lock irrelevant (since you specify not external).
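
To illustrate the point about the semaphore context, here is a minimal sketch. It is not nova's code: `synchronized` below is a hypothetical stand-in built on `threading.Lock` from the standard library, and the lock name `instance-uuid` and the `fail` parameter are made up for the example. It only shows that leaving a `with` block through an exception still releases the lock.

```python
import functools
import threading

# Registry of named in-process locks (hypothetical, for illustration only).
_locks = {}
_locks_guard = threading.Lock()


def synchronized(name):
    """Hypothetical named-lock decorator; an assumption, not nova's implementation."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with _locks_guard:
                lock = _locks.setdefault(name, threading.Lock())
            # The 'with' statement releases the lock on normal return
            # and also when the wrapped function raises.
            with lock:
                return func(*args, **kwargs)
        return wrapper
    return decorator


@synchronized("instance-uuid")
def do_terminate_instance(fail):
    if fail:
        raise RuntimeError("simulated failure during delete")
    return "deleted"


def lock_is_free(name):
    """Return True if the named lock can be taken right now."""
    lock = _locks[name]
    acquired = lock.acquire(blocking=False)
    if acquired:
        lock.release()
    return acquired


try:
    do_terminate_instance(fail=True)
except RuntimeError:
    pass

print(lock_is_free("instance-uuid"))      # True: the exception released the lock
print(do_terminate_instance(fail=False))  # deleted
```

A holder that hangs inside the `with` block, rather than raising, is a different case: nothing exits the context, so the lock stays held.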
 
 Ok, that is my understanding. Thanks for confirming it.
 
 If not, I think that all other actions for that instance are blocked
 waiting for the lock, is that correct?
 
 That is a potential pitfall of synchronization, but I think it shouldn't
 happen in this case.  Are you seeing this behavior?
 
 I am seeing odd behaviour: sometimes (not often) I find instances in
 DELETED status (vm_state) which are not marked as deleted.
 
 Below is what I found while debugging it:
 I found an instance in that odd state. Looking at the log file for the
 compute node I didn't find any error and the service was running. The only
 thing I spotted was a gap of several minutes in the compute node's log
 file, which is very unusual.
 I tried to delete the same instance again, but the operation never
 completed. Maybe the thread which was handling the first deletion died but
 the lock was still held, so all the other attempts to delete the same
 instance failed.

Were other commands working on the compute node? It seems much more likely that
the node had a hung connection to rabbit. If you are not using TCP keepalives,
a network hiccup (or failover) can cause half-open connections, where the server
thinks the connection is still active so it sends the message but the compute
node never receives it.
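
For reference, this is roughly what enabling TCP keepalives on a socket looks like in Python. It is a sketch under assumptions: `enable_keepalive` is a hypothetical helper, the timing values are arbitrary, and it says nothing about how nova or the AMQP client library configure their own connections. The `TCP_KEEP*` options are platform-dependent, so they are guarded with `hasattr`.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Hypothetical helper: turn on TCP keepalive probes for a socket.

    idle, interval and count are illustrative values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idle time before the first probe is sent.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes before the connection is reported as dead.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
print(keepalive_on)
```

With probes enabled, a peer that silently went away is eventually detected and the socket reports an error, instead of the connection staying half open indefinitely.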

Vish

 To fix the issue I had to restart the nova-compute service (so all locks 
 were released) and then I was able to complete the deletion.
 
 Does that make sense to you?
 
 PS: As you are on this topic I submitted a fix to complete the pending 
 deletion when the compute service starts, it would be great if you can have a 
 look at it: https://review.openstack.org/33265
 Regards
 --
 Andrea Rosa
 
 
 
 
 


[openstack-dev] Question about locking

2013-06-26 Thread Rosa, Andrea (HP Cloud Services)
Hi all,

What happens if a greenthread dies after acquiring a lock (not external)?
For example:
A thread is running do_terminate_instance and holds the lock, but it dies
before finishing the operation. What happens to the lock?
Is it released in some way?
If not, I think that all other actions for that instance are blocked waiting
for the lock. Is that correct?
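
The blocking scenario asked about here can be sketched as follows, under loud assumptions: OS threads from the Python standard library stand in for greenthreads, and `stuck_delete` is a hypothetical function that simulates a lock holder which hangs and never leaves the critical section. It is not nova code; it only shows that a second caller cannot take an in-process lock while the first holder is stuck.

```python
import threading

lock = threading.Lock()
holder_has_lock = threading.Event()
release_holder = threading.Event()


def stuck_delete():
    """Hypothetical first delete: takes the lock, then hangs until released."""
    with lock:
        holder_has_lock.set()
        release_holder.wait()  # simulates a call that never returns


holder = threading.Thread(target=stuck_delete, daemon=True)
holder.start()
holder_has_lock.wait()

# A second delete for the same instance cannot get the lock and gives up
# after a short timeout instead of blocking forever.
second_got_lock = lock.acquire(timeout=0.2)
print(second_got_lock)  # False

# Once the holder leaves the 'with' block, the lock is available again.
release_holder.set()
holder.join()
lock_free_afterwards = lock.acquire(timeout=1)
if lock_free_afterwards:
    lock.release()
print(lock_free_afterwards)  # True
```

Because the lock lives only in the process's memory, it also disappears when the process itself exits.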

Regards
--
Andrea

