Re: iotest 030 SIGSEGV

2021-10-15 Thread Vladimir Sementsov-Ogievskiy

15.10.2021 12:38, Paolo Bonzini wrote:

On 14/10/21 18:14, Vladimir Sementsov-Ogievskiy wrote:


iotest 30 failing is a long story. As I understand it, the main source of all 
these crashes is that we do different graph modifications simultaneously from 
parallel block jobs.

In the past I sent an RFC series with a global mutex, to fix a subset of the problem: https://patchew.org/QEMU/20201120161622.1537-1-vsement...@virtuozzo.com/ [just look at patch 5: https://patchew.org/QEMU/20201120161622.1537-1-vsement...@virtuozzo.com/20201120161622.1537-6-vsement...@virtuozzo.com/] 


Can you explain the way they interleave, and where the job callbacks are 
yielding in the middle of graph manipulations?


Not exactly, and I don't think it's worth recovering this concrete old problem 
about permissions: too much has changed since then, especially in the block 
permission update system.

So, I can only refer to my old comments on it:

  OK, after some debugging and looking at block-graph dumps I tend to think that
  this is a race between finish (.prepare) of mirror and block-stream. They do graph
  updates. Nothing prevents interleaving of graph-updating operations (note that
  bdrv_replace_child_noperm may do aio_poll). And nothing protects two processes
  of graph-update from intersection.

and

  aio_poll at start of mirror_exit_common is my addition. But anyway the problem
  is here: we do call mirror_prepare inside of stream_prepare!

(https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05181.html)

At this link there is a good core dump, where we are in stream_prepare: we call 
bdrv_change_backing_file(), which finally calls bdrv_pwritev(), which leads to 
aio_poll, during which we switch to mirror_prepare().
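
To make the interleaving concrete, here is a toy model of that core dump in plain C. It is only a sketch: every toy_* name is made up for illustration, toy_poll() stands in for aio_poll(), and nothing here is QEMU code. It shows one "graph update" starting while another is still in progress, because the first one polls in the middle.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in QEMU. */
typedef void (*ToyCallback)(void);

static ToyCallback toy_pending;   /* one queued callback, like a ready job */
static int toy_updates_in_flight; /* graph updates currently in progress */
static int toy_max_nesting;       /* deepest nesting observed so far */

/* Stand-in for aio_poll(): runs the pending callback, if there is one. */
void toy_poll(void)
{
    if (toy_pending) {
        ToyCallback cb = toy_pending;
        toy_pending = NULL;
        cb();
    }
}

void toy_graph_update_begin(void)
{
    toy_updates_in_flight++;
    if (toy_updates_in_flight > toy_max_nesting) {
        toy_max_nesting = toy_updates_in_flight;
    }
}

void toy_graph_update_end(void)
{
    toy_updates_in_flight--;
}

/* Stand-in for mirror_prepare(): a second, independent graph update. */
void toy_mirror_prepare(void)
{
    toy_graph_update_begin();
    toy_graph_update_end();
}

/* Stand-in for stream_prepare(): the metadata write polls mid-update. */
void toy_stream_prepare(void)
{
    toy_graph_update_begin();
    toy_poll(); /* models bdrv_change_backing_file() ending up in aio_poll() */
    toy_graph_update_end();
}

/* Runs the scenario and returns the deepest nesting of graph updates. */
int toy_run_scenario(void)
{
    toy_updates_in_flight = 0;
    toy_max_nesting = 0;
    toy_pending = toy_mirror_prepare;
    toy_stream_prepare();
    return toy_max_nesting;
}
```

A nesting depth greater than 1 from toy_run_scenario() means the second update ran inside the first one, which is the shape of the problem in the core dump.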

and

  2. The core problem is that graph modification functions may trigger
  context-switch due to nested aio_polls.. which leads to (for example) nested
  permission updates..

(https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05212.html)



The problem with a CoMutex is that plenty of graph manipulations happen outside 
coroutines, and if coroutines such as stream_co_clean yield, the monitor can do 
graph manipulations of its own.

So if the solution could be "no yielding in the middle of graph manipulations", that 
would be much better.  In fact, maybe the coroutine API should have a way to assert "no-yield 
regions" (similar to how Linux croaks if you call a sleeping function while preemption is 
disabled). More assertions = more bugs found early.


Not sure that it's possible to fix bdrv_change_backing_file() in this way. If 
some graph modifications are connected with updating metadata in images, we want 
to write data and therefore to yield. And the whole operation, both updating 
metadata in the image and updating the graph, should be protected from 
interleaving with another graph-modifying operation.




Not sure whether it was good enough to be worth recovering. I didn't look closely at Emanuele's 
"block layer: split block APIs in global state and I/O". Wasn't there something 
on protecting graph operations?


In his series, graph operations are supposed to operate from the main thread 
(which they do) but he didn't cover the case of coroutines that yield.

Paolo



Maybe it's possible to develop a critical section, a kind of mutex, that can 
be used both inside and outside of coroutines? So that a coroutine will yield 
until it can acquire the mutex, and a non-coroutine will poll.

--
Best regards,
Vladimir



Re: iotest 030 SIGSEGV

2021-10-15 Thread Paolo Bonzini

On 14/10/21 18:14, Vladimir Sementsov-Ogievskiy wrote:


iotest 30 failing is a long story. As I understand it, the main source 
of all these crashes is that we do different graph modifications 
simultaneously from parallel block jobs.


In the past I sent an RFC series with a global mutex, to fix a subset of the 
problem: 
https://patchew.org/QEMU/20201120161622.1537-1-vsement...@virtuozzo.com/   
[just look at patch 5: 
https://patchew.org/QEMU/20201120161622.1537-1-vsement...@virtuozzo.com/20201120161622.1537-6-vsement...@virtuozzo.com/] 


Can you explain the way they interleave, and where the job callbacks are 
yielding in the middle of graph manipulations?


The problem with a CoMutex is that plenty of graph manipulations happen 
outside coroutines, and if coroutines such as stream_co_clean yield, the 
monitor can do graph manipulations of its own.


So if the solution could be "no yielding in the middle of graph 
manipulations", that would be much better.  In fact, maybe the coroutine 
API should have a way to assert "no-yield regions" (similar to how Linux 
croaks if you call a sleeping function while preemption is disabled). 
More assertions = more bugs found early.
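
As a minimal sketch of such an assertion, assuming a single thread and using made-up names (no_yield_begin, no_yield_end, yield_allowed and toy_yield are not part of the coroutine API), a nesting counter that every yield point checks could look like this:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical "no-yield region" tracking. Assumption: one thread only;
 * a real version would likely need per-thread or per-coroutine state.
 */
static int no_yield_depth;

/* Enter a region in which yielding is a bug. Regions may nest. */
void no_yield_begin(void)
{
    no_yield_depth++;
}

/* Leave the innermost no-yield region. */
void no_yield_end(void)
{
    assert(no_yield_depth > 0);
    no_yield_depth--;
}

bool yield_allowed(void)
{
    return no_yield_depth == 0;
}

/* Stand-in for a yield point: aborts if called inside a no-yield region. */
void toy_yield(void)
{
    assert(yield_allowed());
    /* A real coroutine would switch back to its caller here. */
}
```

Graph manipulation code would then be wrapped in no_yield_begin()/no_yield_end(), and any yield reached in between would trip the assertion right away instead of corrupting the graph later.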


Not sure whether it was good enough to be worth recovering. I didn't look 
closely at Emanuele's "block layer: split block APIs in global state and I/O". 
Wasn't there something on protecting graph operations?


In his series, graph operations are supposed to operate from the main 
thread (which they do) but he didn't cover the case of coroutines that 
yield.


Paolo




Re: iotest 030 SIGSEGV

2021-10-14 Thread Vladimir Sementsov-Ogievskiy

14.10.2021 00:50, John Snow wrote:

In trying to replace the QMP library backend, I have now twice stumbled upon a 
SIGSEGV in iotest 030 in the last three weeks or so.

I didn't have debug symbols on at the time, so I've got only this stack trace:

(gdb) thread apply all bt

Thread 8 (Thread 0x7f0a6b8c4640 (LWP 1873554)):
#0  0x7f0a748a53ff in poll () at /lib64/libc.so.6
#1  0x7f0a759bfa36 in g_main_context_iterate.constprop () at 
/lib64/libglib-2.0.so.0
#2  0x7f0a7596d163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x557dac31d121 in iothread_run (opaque=opaque@entry=0x557dadd98800) at 
../../iothread.c:73
#4  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a6b8c3650) at 
../../util/qemu-thread-posix.c:557
#5  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 7 (Thread 0x7f0a6b000640 (LWP 1873555)):
#0  0x7f0a747ed7d2 in sigtimedwait () at /lib64/libc.so.6
#1  0x7f0a74b72cdc in sigwait () at /lib64/libpthread.so.0
#2  0x557dac2e403b in dummy_cpu_thread_fn (arg=arg@entry=0x557dae041c10) at 
../../accel/dummy-cpus.c:46
#3  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a6afff650) at 
../../util/qemu-thread-posix.c:557
#4  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#5  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 6 (Thread 0x7f0a56afa640 (LWP 1873582)):
#0  0x7f0a74b71308 in do_futex_wait.constprop () at /lib64/libpthread.so.0
#1  0x7f0a74b71433 in __new_sem_wait_slow.constprop.0 () at 
/lib64/libpthread.so.0
#2  0x557dac4d8f1f in qemu_sem_timedwait (sem=sem@entry=0x557dadd62878, 
ms=ms@entry=1) at ../../util/qemu-thread-posix.c:327
#3  0x557dac4f5ac4 in worker_thread (opaque=opaque@entry=0x557dadd62800) at 
../../util/thread-pool.c:91
#4  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a56af9650) at 
../../util/qemu-thread-posix.c:557
#5  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 5 (Thread 0x7f0a57dff640 (LWP 1873580)):
#0  0x7f0a74b71308 in do_futex_wait.constprop () at /lib64/libpthread.so.0
#1  0x7f0a74b71433 in __new_sem_wait_slow.constprop.0 () at 
/lib64/libpthread.so.0
#2  0x557dac4d8f1f in qemu_sem_timedwait (sem=sem@entry=0x557dadd62878, 
ms=ms@entry=1) at ../../util/qemu-thread-posix.c:327
#3  0x557dac4f5ac4 in worker_thread (opaque=opaque@entry=0x557dadd62800) at 
../../util/thread-pool.c:91
#4  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a57dfe650) at 
../../util/qemu-thread-posix.c:557
#5  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 4 (Thread 0x7f0a572fb640 (LWP 1873581)):
#0  0x7f0a74b7296f in pread64 () at /lib64/libpthread.so.0
#1  0x557dac39f18f in pread64 (__offset=, __nbytes=, 
__buf=, __fd=) at /usr/include/bits/unistd.h:105
#2  handle_aiocb_rw_linear (aiocb=aiocb@entry=0x7f0a573fc150, buf=0x7f0a6a47e000 
'\377' ...) at ../../block/file-posix.c:1481
#3  0x557dac39f664 in handle_aiocb_rw (opaque=0x7f0a573fc150) at 
../../block/file-posix.c:1521
#4  0x557dac4f5b54 in worker_thread (opaque=opaque@entry=0x557dadd62800) at 
../../util/thread-pool.c:104
#5  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a572fa650) at 
../../util/qemu-thread-posix.c:557
#6  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#7  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 3 (Thread 0x7f0a714e8640 (LWP 1873552)):
#0  0x7f0a748aaedd in syscall () at /lib64/libc.so.6
#1  0x557dac4d916a in qemu_futex_wait (val=, f=) at /home/jsnow/src/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x557dace1f1e8 ) at 
../../util/qemu-thread-posix.c:480
#3  0x557dac4e189a in call_rcu_thread (opaque=opaque@entry=0x0) at 
../../util/rcu.c:258
#4  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a714e7650) at 
../../util/qemu-thread-posix.c:557
#5  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 2 (Thread 0x7f0a70ae5640 (LWP 1873553)):
#0  0x7f0a74b71308 in do_futex_wait.constprop () at /lib64/libpthread.so.0
#1  0x7f0a74b71433 in __new_sem_wait_slow.constprop.0 () at 
/lib64/libpthread.so.0
#2  0x557dac4d8f1f in qemu_sem_timedwait (sem=sem@entry=0x557dadd62878, 
ms=ms@entry=1) at ../../util/qemu-thread-posix.c:327
#3  0x557dac4f5ac4 in worker_thread (opaque=opaque@entry=0x557dadd62800) at 
../../util/thread-pool.c:91
#4  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a70ae4650) at 
../../util/qemu-thread-posix.c:557
#5  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7f0a714ebec0 (LWP 1873551)):
#0  bdrv_inherits_from_recursive (parent=parent@entry=0x557dadfb5050, 
child=0xafafafafafafafaf, child@entr

Re: iotest 030 SIGSEGV

2021-10-14 Thread Vladimir Sementsov-Ogievskiy

14.10.2021 16:20, Hanna Reitz wrote:

On 13.10.21 23:50, John Snow wrote:

In trying to replace the QMP library backend, I have now twice stumbled upon a 
SIGSEGV in iotest 030 in the last three weeks or so.

I didn't have debug symbols on at the time, so I've got only this stack trace:


Re: iotest 030 SIGSEGV

2021-10-14 Thread John Snow
On Thu, Oct 14, 2021 at 9:20 AM Hanna Reitz  wrote:

> On 13.10.21 23:50, John Snow wrote:
> > In trying to replace the QMP library backend, I have now twice
> > stumbled upon a SIGSEGV in iotest 030 in the last three weeks or so.
> >
> > I didn't have debug symbols on at the time, so I've got only this
> > stack trace:
> >

Re: iotest 030 SIGSEGV

2021-10-14 Thread Hanna Reitz

On 13.10.21 23:50, John Snow wrote:
In trying to replace the QMP library backend, I have now twice 
stumbled upon a SIGSEGV in iotest 030 in the last three weeks or so.


I didn't have debug symbols on at the time, so I've got only this 
stack trace:



iotest 030 SIGSEGV

2021-10-13 Thread John Snow
In trying to replace the QMP library backend, I have now twice stumbled
upon a SIGSEGV in iotest 030 in the last three weeks or so.

I didn't have debug symbols on at the time, so I've got only this stack
trace:

(gdb) thread apply all bt

Thread 8 (Thread 0x7f0a6b8c4640 (LWP 1873554)):
#0  0x7f0a748a53ff in poll () at /lib64/libc.so.6
#1  0x7f0a759bfa36 in g_main_context_iterate.constprop () at
/lib64/libglib-2.0.so.0
#2  0x7f0a7596d163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x557dac31d121 in iothread_run (opaque=opaque@entry=0x557dadd98800)
at ../../iothread.c:73
#4  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a6b8c3650) at
../../util/qemu-thread-posix.c:557
#5  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 7 (Thread 0x7f0a6b000640 (LWP 1873555)):
#0  0x7f0a747ed7d2 in sigtimedwait () at /lib64/libc.so.6
#1  0x7f0a74b72cdc in sigwait () at /lib64/libpthread.so.0
#2  0x557dac2e403b in dummy_cpu_thread_fn (arg=arg@entry=0x557dae041c10)
at ../../accel/dummy-cpus.c:46
#3  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a6afff650) at
../../util/qemu-thread-posix.c:557
#4  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#5  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 6 (Thread 0x7f0a56afa640 (LWP 1873582)):
#0  0x7f0a74b71308 in do_futex_wait.constprop () at
/lib64/libpthread.so.0
#1  0x7f0a74b71433 in __new_sem_wait_slow.constprop.0 () at
/lib64/libpthread.so.0
#2  0x557dac4d8f1f in qemu_sem_timedwait (sem=sem@entry=0x557dadd62878,
ms=ms@entry=1) at ../../util/qemu-thread-posix.c:327
#3  0x557dac4f5ac4 in worker_thread (opaque=opaque@entry=0x557dadd62800)
at ../../util/thread-pool.c:91
#4  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a56af9650) at
../../util/qemu-thread-posix.c:557
#5  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 5 (Thread 0x7f0a57dff640 (LWP 1873580)):
#0  0x7f0a74b71308 in do_futex_wait.constprop () at
/lib64/libpthread.so.0
#1  0x7f0a74b71433 in __new_sem_wait_slow.constprop.0 () at
/lib64/libpthread.so.0
#2  0x557dac4d8f1f in qemu_sem_timedwait (sem=sem@entry=0x557dadd62878,
ms=ms@entry=1) at ../../util/qemu-thread-posix.c:327
#3  0x557dac4f5ac4 in worker_thread (opaque=opaque@entry=0x557dadd62800)
at ../../util/thread-pool.c:91
#4  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a57dfe650) at
../../util/qemu-thread-posix.c:557
#5  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 4 (Thread 0x7f0a572fb640 (LWP 1873581)):
#0  0x7f0a74b7296f in pread64 () at /lib64/libpthread.so.0
#1  0x557dac39f18f in pread64 (__offset=,
__nbytes=, __buf=, __fd=) at
/usr/include/bits/unistd.h:105
#2  handle_aiocb_rw_linear (aiocb=aiocb@entry=0x7f0a573fc150,
buf=0x7f0a6a47e000 '\377' ...) at
../../block/file-posix.c:1481
#3  0x557dac39f664 in handle_aiocb_rw (opaque=0x7f0a573fc150) at
../../block/file-posix.c:1521
#4  0x557dac4f5b54 in worker_thread (opaque=opaque@entry=0x557dadd62800)
at ../../util/thread-pool.c:104
#5  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a572fa650) at
../../util/qemu-thread-posix.c:557
#6  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#7  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 3 (Thread 0x7f0a714e8640 (LWP 1873552)):
#0  0x7f0a748aaedd in syscall () at /lib64/libc.so.6
#1  0x557dac4d916a in qemu_futex_wait (val=,
f=) at /home/jsnow/src/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x557dace1f1e8 ) at
../../util/qemu-thread-posix.c:480
#3  0x557dac4e189a in call_rcu_thread (opaque=opaque@entry=0x0) at
../../util/rcu.c:258
#4  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a714e7650) at
../../util/qemu-thread-posix.c:557
#5  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 2 (Thread 0x7f0a70ae5640 (LWP 1873553)):
#0  0x7f0a74b71308 in do_futex_wait.constprop () at
/lib64/libpthread.so.0
#1  0x7f0a74b71433 in __new_sem_wait_slow.constprop.0 () at
/lib64/libpthread.so.0
#2  0x557dac4d8f1f in qemu_sem_timedwait (sem=sem@entry=0x557dadd62878,
ms=ms@entry=1) at ../../util/qemu-thread-posix.c:327
#3  0x557dac4f5ac4 in worker_thread (opaque=opaque@entry=0x557dadd62800)
at ../../util/thread-pool.c:91
#4  0x557dac4d7f89 in qemu_thread_start (args=0x7f0a70ae4650) at
../../util/qemu-thread-posix.c:557
#5  0x7f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x7f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7f0a714ebec0 (LWP 1873551)):
#0  bdrv_inherits_from_recursive (parent=parent@entry=0x557dadfb5050,
child=0xafafafafafafafaf, child@entry=0x557dae857010) at ../../block.c:3124
#1  bdrv_set_file_or_bac