On December 3, 2017 9:27 pm, Paul Belanger wrote: [snip]
Please reach out to me the next time you restart it; something is seriously wrong if we have to keep restarting nodepool every few days. At this rate, I would even leave nodepool-launcher in the bad state until we inspect it.

Thanks, PB
Hello,

nodepoold was stuck again. Before restarting it I dumped the threads' stack traces, and it seems that 8 threads were trying to acquire a single lock (futex=0xe41de0):
https://review.rdoproject.org/paste/show/9VnzowfzBogKG4Gw0Kes/

This leaves the main loop stuck at:
http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/nodepool.py#n1281

I'm not entirely sure what caused this deadlock; the other threads involved are quite complex:

* kazoo zk_loop
* zmq received
* apscheduler mainloop
* periodicCheck paramiko client connect
* paramiko transport run
* nodepool webapp handle request

Next time, before restarting the process, it would be good to find out which thread is actually holding the lock, using (gdb) py-print, as explained here:
https://stackoverflow.com/questions/42169768/debug-pythread-acquire-lock-deadlock/42256864#42256864

Paul: any other debug instructions would be appreciated.

Regards,
-Tristan
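P.S. As a complement to the gdb approach, here is a minimal in-process sketch of how the lock holder could be recorded and the thread stacks dumped from Python itself. Everything here is an assumption for illustration: `TrackedLock` and `dump_threads` are hypothetical names, not part of nodepool, and the sketch only covers locks that the application creates itself, not locks inside third-party libraries.

```python
import sys
import threading
import traceback


class TrackedLock:
    """Hypothetical wrapper around threading.Lock that remembers its holder."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None  # name of the thread currently holding the lock

    def acquire(self, blocking=True, timeout=-1):
        acquired = self._lock.acquire(blocking, timeout)
        if acquired:
            # Record the holder only once the lock is really ours.
            self.owner = threading.current_thread().name
        return acquired

    def release(self):
        # Clear the holder before releasing so a new owner is never erased.
        self.owner = None
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()


def dump_threads():
    """Return a text dump of the current Python stack of every thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (names.get(ident, "?"), ident))
        chunks.append("".join(traceback.format_stack(frame)))
    return "".join(chunks)
```

With something like this in place, a debug hook (for example a signal handler) could log `lock.owner` together with `dump_threads()` before the process is restarted, which would show both who holds the lock and where every waiter is blocked.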