On 6/30/25 1:21 PM, Ilya Maximets wrote:
> There is an ABBA deadlock between time_init() and seq_wait():
>
> Thread 1:
>   poll_block()
>   time_poll()
>   time_init()
>   pthread_once()  <-- lock A
>   do_time_init()
>   seq_create()
>   pthread_mutex_lock(seq_mutex)  <-- lock B
>
> Thread 2:
>   seq_wait(different seqno)
>   pthread_mutex_lock(seq_mutex)  <-- lock B
>   poll_immediate_wake()
>   poll_timer_wait()
>   time_msec()
>   time_init()
>   pthread_once()  <-- lock A
>
> This is likely the same deadlock Intel CI saw last year before the lab
> was shut down.
>
> The issue should not happen with normal applications as those would
> normally have the time module initialized early in the process before
> waiting on any sequence numbers, but it happens in the test-barrier
> application from time to time causing the test suite to hang.
>
> Fix that by making sure we're not calling poll_immediate_wake() under
> the seq_mutex.  The time and seq modules are independent and it's hard
> to ensure the dependency without exporting some of their internals.
> Instead re-defining the prototype of the poll_immediate_wake_at(),
> adding the thread safety annotation, so we have some basic protection
> from this deadlock if the code ever changes.  Compiler will warn on
> the prototype mismatch as well if it ever happens, so it's not a big
> problem.  Having this prototype also gives us a spot in the code where
> we can place a comment explaining the locking order.
>
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2024-July/415436.html
> Reported-at: https://issues.redhat.com/browse/FDP-1493
> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
> ---
>  lib/seq.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
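
For readers following the thread, the ordering described in the quoted
commit message can be sketched roughly as below. This is only an
illustration and not the lib/seq.c change: every demo_* name is
hypothetical, demo_immediate_wake() stands in for poll_immediate_wake(),
and the trylock is a runtime stand-in for the compile-time thread safety
annotation. The comparison is made under the mutex ("lock B") and the
wake-up, which in the real code may reach pthread_once() ("lock A"), is
issued only after the mutex is released.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the seq module; not the lib/seq.c code. */
static pthread_mutex_t demo_seq_mutex = PTHREAD_MUTEX_INITIALIZER;
static uint64_t demo_seq_value = 1;   /* Current sequence number. */
static int demo_wake_calls;           /* Wake-ups issued in total. */
static int demo_wake_calls_locked;    /* Wake-ups issued with the mutex held. */

/* Stand-in for poll_immediate_wake().  In the real code this path can
 * reach time_init() and pthread_once() ("lock A"), so it must not run
 * while demo_seq_mutex ("lock B") is held.  The trylock below only
 * detects that situation at run time for the purpose of this demo. */
static void
demo_immediate_wake(void)
{
    int error = pthread_mutex_trylock(&demo_seq_mutex);

    assert(error == 0 || error == EBUSY);
    if (error == EBUSY) {
        demo_wake_calls_locked++;     /* Called with the mutex held. */
    } else {
        pthread_mutex_unlock(&demo_seq_mutex);
    }
    demo_wake_calls++;
}

/* Returns true if the sequence number differs from 'seen'.  The
 * comparison happens under the mutex; the wake-up happens after the
 * mutex has been released, so lock B is never held while taking A. */
static bool
demo_seq_wait(uint64_t seen)
{
    bool changed;

    pthread_mutex_lock(&demo_seq_mutex);
    changed = demo_seq_value != seen;
    pthread_mutex_unlock(&demo_seq_mutex);

    if (changed) {
        demo_immediate_wake();
    }
    return changed;
}
```

Calling demo_seq_wait() with a stale value triggers one wake-up with the
mutex released, and calling it with the current value triggers none.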

CI failed due to automake race:
  https://mail.openvswitch.org/pipermail/ovs-dev/2025-June/423797.html

Recheck-request: github-robot
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev