QEMU BUG: #1 Alright, one of the issues is (according to comment #14):
"""
Meaning that code is waiting for a futex inside kernel.

(gdb) print rcu_call_ready_event
$4 = {value = 4294967295, initialized = true}

The QemuEvent "rcu_call_ready_event->value" is set to INT_MAX and I don't know why yet.

rcu_call_ready_event->value is only touched by:

qemu_event_init() -> bool init ? EV_SET : EV_FREE
qemu_event_reset() -> atomic_or(&ev->value, EV_FREE)
qemu_event_set() -> atomic_xchg(&ev->value, EV_SET)
qemu_event_wait() -> atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY)
"""

Now I know why rcu_call_ready_event->value is set to 4294967295 (UINT_MAX rather than INT_MAX). That is because in the following declaration:

struct QemuEvent {
#ifndef __linux__
    pthread_mutex_t lock;
    pthread_cond_t cond;
#endif
    unsigned value;
    bool initialized;
};

#define EV_SET 0
#define EV_FREE 1
#define EV_BUSY -1

"value" is declared as unsigned, but EV_BUSY sets it to -1 and, under two's complement (https://en.wikipedia.org/wiki/Two%27s_complement), that is stored as UINT_MAX (4294967295).

So this is the "first bug" found AND it is definitely funny that this hasn't been seen in other architectures at all... I can reproduce it at will.

With that said, it seems that there is still another issue causing (less frequently):

(gdb) thread 2
[Switching to thread 2 (Thread 0xffffbec5ad90 (LWP 17459))]
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
38 ../sysdeps/unix/sysv/linux/aarch64/syscall.S: No such file or directory.
(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1 0x0000aaaaaabd41cc in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at ./util/qemu-thread-posix.c:438
#2 qemu_event_wait (ev=ev@entry=0xaaaaaac86ce8 <rcu_call_ready_event>) at ./util/qemu-thread-posix.c:442
#3 0x0000aaaaaabed05c in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:261
#4 0x0000aaaaaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498
#5 0x0000ffffbf25c880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486
#6 0x0000ffffbf1b6b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 to be stuck at "futex()" kernel syscall (like the FUTEX_WAKE never happened and/or wasn't atomic for this arch/binary). Need to investigate this also.

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  In Progress

Bug description:
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.
  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0 0x0000ffffae4f8154 in __GI_ppoll (fds=0xaaaae8a67dc0, nfds=187650274213760, timeout=<optimized out>, timeout@entry=0x0, sigmask=0xffffc123b950) at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1 0x0000aaaabbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3 0x0000aaaabbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
  #4 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
  #5 0x0000aaaabbe2aa30 in convert_do_copy (s=0xffffc123bb58) at qemu-img.c:1980
  #6 img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2456
  #7 0x0000aaaabbe2333c in main (argc=7, argv=<optimized out>) at qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions
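
The unsigned conversion described in the comment above (EV_BUSY defined as -1 and stored in an unsigned field) can be looked at in isolation. What follows is only a minimal sketch, not QEMU code: DemoEvent, demo_store_busy() and demo_is_busy() are hypothetical names, the atomic operations are left out, and a 32-bit unsigned is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same constants as in the declaration quoted in the comment. */
#define EV_SET   0
#define EV_FREE  1
#define EV_BUSY -1

/* Assumption of this sketch: unsigned is 32 bits wide. */
static_assert(UINT_MAX == 4294967295u, "sketch assumes 32-bit unsigned");

/* Hypothetical stand-in for QemuEvent, reduced to the Linux-only fields. */
struct DemoEvent {
    unsigned value;
    bool initialized;
};

/* Store EV_BUSY with a plain (non-atomic) assignment and return what the
 * field now holds. The int -1 is converted to unsigned on assignment. */
static unsigned demo_store_busy(struct DemoEvent *ev)
{
    ev->value = EV_BUSY;
    return ev->value;
}

/* Check whether the field holds EV_BUSY. The cast makes explicit the
 * conversion that an implicit unsigned/int comparison would also perform. */
static bool demo_is_busy(const struct DemoEvent *ev)
{
    return ev->value == (unsigned)EV_BUSY;
}
```

Under these assumptions demo_store_busy() returns 4294967295, the same value gdb printed for rcu_call_ready_event, and demo_is_busy() still returns true afterwards, because the comparison converts -1 to unsigned as well.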