On December 3, 2017 9:27 pm, Paul Belanger wrote:
[snip]
Please reach out to me the next time you restart it; something is seriously
wrong if we have to keep restarting nodepool every few days.
At this rate, I would even leave nodepool-launcher in the bad state until we
inspect it.

Thanks,
PB


Hello,

nodepoold was stuck again. Before restarting it I dumped the threads'
stack traces, and it seems like 8 threads were trying to acquire a single
lock (futex=0xe41de0):
https://review.rdoproject.org/paste/show/9VnzowfzBogKG4Gw0Kes/

This leaves the main loop stuck at
http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/nodepool.py#n1281

I'm not entirely sure what caused this deadlock; the other threads involved
are quite complex:
* kazoo zk_loop
* zmq received
* apscheduler mainloop
* periodicCheck paramiko client connect
* paramiko transport run
* nodepool webapp handle request

Next time, before restarting the process, it would be good to know which
thread is actually holding the lock, using (gdb) py-print, as explained
here:
https://stackoverflow.com/questions/42169768/debug-pythread-acquire-lock-deadlock/42256864#42256864
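
As a complement to gdb, here is a minimal sketch of an in-process dump of
every Python thread's stack, using only the standard library. It is not
something nodepool ships: dump_thread_stacks is a hypothetical helper name,
and it would have to be wired into the daemon (for example behind a signal
handler or a webapp endpoint) before the hang occurs. It shows where each
thread is blocked, not which thread owns the lock, so it narrows the search
rather than replacing the py-print approach.

```python
import sys
import threading
import time
import traceback


def dump_thread_stacks():
    """Return a text report with the current stack of every Python thread.

    Hypothetical helper for illustration; not part of nodepool.
    """
    # Map thread identifiers to their human-readable names.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    # sys._current_frames() maps each thread id to its topmost frame.
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (names.get(ident, "unknown"), ident))
        for entry in traceback.format_stack(frame):
            lines.append(entry.rstrip("\n"))
    return "\n".join(lines)


if __name__ == "__main__":
    # Demo: one worker blocks on a lock that the main thread holds.
    lock = threading.Lock()
    lock.acquire()

    def blocked_worker():
        with lock:
            pass

    worker = threading.Thread(target=blocked_worker, name="blocked-worker")
    worker.daemon = True
    worker.start()
    time.sleep(0.2)  # give the worker time to reach the lock

    print(dump_thread_stacks())
    lock.release()
    worker.join(1)
```

In the report, the stack of "blocked-worker" ends inside blocked_worker, at
the line that waits for the lock. Comparing such reports from a stuck
process should at least show which code paths are waiting on the same lock.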

Paul: any other debug instructions would be appreciated.

Regards,
-Tristan
