Re: s390 migration crash

2023-03-26 Thread Peter Xu
On Wed, Mar 22, 2023 at 03:16:23PM -0400, Peter Xu wrote:
> On Wed, Mar 22, 2023 at 06:13:43PM +, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > On Wed, Mar 22, 2023 at 02:05:06PM +, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (pet...@redhat.com) wrote:
> > > > > On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert 
> > > > > wrote:
> > > > > > Hi Peter's,
> > > > > >   Peter M pointed me to a seg in a migration test in CI; I can 
> > > > > > reproduce
> > > > > > it:
> > > > > >   * On an s390 host
> > > > > 
> > > > > How easy to reproduce?
> > > > 
> > > > Pretty much every time when run as:
> > > > make check -j 4
> > > > 
> > > > > >   * only as part of a make check - running migration-test by itself
> > > > > > doesn't trigger for me.
> > > > > >   * It looks like it's postcopy preempt
> > > > > > 

Re: s390 migration crash

2023-03-22 Thread Daniel P . Berrangé
On Tue, Mar 21, 2023 at 08:19:00PM -0400, Peter Xu wrote:
> On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> > Hi Peter's,
> >   Peter M pointed me to a seg in a migration test in CI; I can reproduce
> > it:
> >   * On an s390 host
> 
> How easy to reproduce?
> 
> >   * only as part of a make check - running migration-test by itself
> > doesn't trigger for me.
> >   * It looks like it's postcopy preempt

snip

> > Looking at the iov and file it's garbage; so it makes me think this is
> > something like a flush on a closed file.
> 
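The "garbage" seen in frame #0 fits a loop that adds up the iov_len of each entry in an iovec array: with a stale entry count it reads past the array and the total means nothing. A minimal sketch of such a summing loop, assuming POSIX struct iovec; total_iov_len is a hypothetical name and this is not a copy of QEMU's util/iov.c:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>   /* struct iovec (POSIX) */

/*
 * Hypothetical helper: sum the lengths of the first iov_cnt entries.
 * The result is only meaningful if iov_cnt matches the real array size;
 * a stale count makes the loop read memory past the end of the array.
 */
static size_t total_iov_len(const struct iovec *iov, unsigned int iov_cnt)
{
    size_t len = 0;
    unsigned int i;

    for (i = 0; i < iov_cnt; i++) {
        len += iov[i].iov_len;
    }
    return len;
}
```

With a valid two-entry array of 3 and 4 bytes this returns 7. A count far larger than the real array, like the one in the backtrace, would make the loop sum whatever bytes happen to follow it.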
> I didn't figure out how that could be closed, but I think there's indeed a
> possible race that the qemufile can be accessed by both the return path
> thread and the migration thread concurrently, while qemufile is not thread
> safe on that.
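A race of this kind (two threads touching one buffered file that is not thread safe) is commonly ruled out by taking a lock around every access, so a flush can never interleave with a shutdown. A minimal sketch with a pthread mutex; struct buffered_file and its helpers are hypothetical stand-ins, not QEMUFile, and this is not necessarily the fix that went into QEMU:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a buffered stream; not QEMU's QEMUFile. */
struct buffered_file {
    pthread_mutex_t lock;   /* serializes every access to the fields below */
    char buf[64];
    size_t used;            /* bytes currently buffered */
    size_t flushed;         /* total bytes handed to the transport so far */
    int closed;             /* set once the file has been shut down */
};

static void buffered_file_init(struct buffered_file *f)
{
    memset(f, 0, sizeof(*f));
    pthread_mutex_init(&f->lock, NULL);
}

/* Append data; returns 0 on success, -1 if closed or out of space. */
static int buffered_file_put(struct buffered_file *f,
                             const void *data, size_t len)
{
    int ret = -1;

    pthread_mutex_lock(&f->lock);
    if (!f->closed && len <= sizeof(f->buf) - f->used) {
        memcpy(f->buf + f->used, data, len);
        f->used += len;
        ret = 0;
    }
    pthread_mutex_unlock(&f->lock);
    return ret;
}

/* Flush under the lock; returns the number of bytes flushed. */
static size_t buffered_file_flush(struct buffered_file *f)
{
    size_t n = 0;

    pthread_mutex_lock(&f->lock);
    if (!f->closed) {
        n = f->used;
        f->flushed += n;
        f->used = 0;
    }
    pthread_mutex_unlock(&f->lock);
    return n;
}

/* Mark the file closed; later put/flush calls become no-ops. */
static void buffered_file_shutdown(struct buffered_file *f)
{
    pthread_mutex_lock(&f->lock);
    f->closed = 1;
    f->used = 0;
    pthread_mutex_unlock(&f->lock);
}
```

After buffered_file_shutdown(), a late buffered_file_flush() from the other thread returns 0 rather than walking buffers that are no longer valid.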

snip

> From 0e317fa78e9671c119f6be78a0e0a36201517dc2 Mon Sep 17 00:00:00 2001
> From: Peter Xu 
> Date: Tue, 21 Mar 2023 19:58:42 -0400
> Subject: [PATCH 1/2] io: tls: Inherit QIO_CHANNEL_FEATURE_SHUTDOWN on server
>  side
> 
> TLS iochannel will inherit io_shutdown() from the master ioc, however we
> missed to do that on the server side.
> 
> This will e.g. allow qemu_file_shutdown() to work on dest QEMU too for
> migration.
> 
> Signed-off-by: Peter Xu 
> ---
>  io/channel-tls.c | 3 +++
>  1 file changed, 3 insertions(+)

Acked-by: Daniel P. Berrangé 

> 
> diff --git a/io/channel-tls.c b/io/channel-tls.c
> index 5a7a3d48d6..9805dd0a3f 100644
> --- a/io/channel-tls.c
> +++ b/io/channel-tls.c
> @@ -74,6 +74,9 @@ qio_channel_tls_new_server(QIOChannel *master,
>      ioc = QIO_CHANNEL_TLS(object_new(TYPE_QIO_CHANNEL_TLS));
>  
>      ioc->master = master;
> +    if (qio_channel_has_feature(master, QIO_CHANNEL_FEATURE_SHUTDOWN)) {
> +        qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
> +    }
>      object_ref(OBJECT(master));
>  
>      ioc->session = qcrypto_tls_session_new(
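
The idea behind the hunk is that a wrapping channel advertises shutdown support only when the channel underneath it has it. A standalone sketch, assuming features are kept as a simple bitmask; struct chan, chan_wrap and the CHAN_FEATURE_* names are made up for illustration and are not the QIOChannel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature numbers and channel type, for illustration only. */
enum chan_feature {
    CHAN_FEATURE_FD_PASS,
    CHAN_FEATURE_SHUTDOWN,
};

struct chan {
    unsigned int features;   /* bitmask: bit N set means feature N works */
    struct chan *master;     /* underlying channel for a wrapper, else NULL */
};

static bool chan_has_feature(const struct chan *c, enum chan_feature f)
{
    return (c->features & (1u << f)) != 0;
}

static void chan_set_feature(struct chan *c, enum chan_feature f)
{
    c->features |= 1u << f;
}

/* Wrap master, advertising shutdown only if master supports it. */
static void chan_wrap(struct chan *wrapper, struct chan *master)
{
    wrapper->features = 0;
    wrapper->master = master;
    if (chan_has_feature(master, CHAN_FEATURE_SHUTDOWN)) {
        chan_set_feature(wrapper, CHAN_FEATURE_SHUTDOWN);
    }
}
```

Callers that check for the shutdown feature before acting then see the same answer on the wrapper as on the underlying channel, on both the client and the server side.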



With regards,
Daniel
-- 
|: https://berrange.com       -o-  https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org        -o-  https://fstop138.berrange.com :|
|: https://entangle-photo.org -o-  https://www.instagram.com/dberrange :|




Re: s390 migration crash

2023-03-22 Thread Peter Xu
On Wed, Mar 22, 2023 at 06:13:43PM +, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > On Wed, Mar 22, 2023 at 02:05:06PM +, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (pet...@redhat.com) wrote:
> > > > On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> > > > > Hi Peter's,
> > > > >   Peter M pointed me to a seg in a migration test in CI; I can 
> > > > > reproduce
> > > > > it:
> > > > >   * On an s390 host
> > > > 
> > > > How easy to reproduce?
> > > 
> > > Pretty much every time when run as:
> > > make check -j 4
> > > 
> > > > >   * only as part of a make check - running migration-test by itself
> > > > > doesn't trigger for me.
> > > > >   * It looks like it's postcopy preempt
> > > > > 

Re: s390 migration crash

2023-03-22 Thread Dr. David Alan Gilbert
* Peter Xu (pet...@redhat.com) wrote:
> On Wed, Mar 22, 2023 at 02:05:06PM +, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> > > > Hi Peter's,
> > > >   Peter M pointed me to a seg in a migration test in CI; I can reproduce
> > > > it:
> > > >   * On an s390 host
> > > 
> > > How easy to reproduce?
> > 
> > Pretty much every time when run as:
> > make check -j 4
> > 
> > > >   * only as part of a make check - running migration-test by itself
> > > > doesn't trigger for me.
> > > >   * It looks like it's postcopy preempt
> > > > 

Re: s390 migration crash

2023-03-22 Thread Peter Xu
On Wed, Mar 22, 2023 at 02:05:06PM +, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> > > Hi Peter's,
> > >   Peter M pointed me to a seg in a migration test in CI; I can reproduce
> > > it:
> > >   * On an s390 host
> > 
> > How easy to reproduce?
> 
> Pretty much every time when run as:
> make check -j 4
> 
> > >   * only as part of a make check - running migration-test by itself
> > > doesn't trigger for me.
> > >   * It looks like it's postcopy preempt
> > > 

Re: s390 migration crash

2023-03-22 Thread Dr. David Alan Gilbert
* Peter Xu (pet...@redhat.com) wrote:
> On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> > Hi Peter's,
> >   Peter M pointed me to a seg in a migration test in CI; I can reproduce
> > it:
> >   * On an s390 host
> 
> How easy to reproduce?

Pretty much every time when run as:
make check -j 4

> >   * only as part of a make check - running migration-test by itself
> > doesn't trigger for me.
> >   * It looks like it's postcopy preempt
> > 

Re: s390 migration crash

2023-03-21 Thread Peter Xu
On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> Hi Peter's,
>   Peter M pointed me to a seg in a migration test in CI; I can reproduce
> it:
>   * On an s390 host

How easy to reproduce?

>   * only as part of a make check - running migration-test by itself
> doesn't trigger for me.
>   * It looks like it's postcopy preempt
> 
> (gdb) bt full
> #0  iov_size (iov=iov@entry=0x2aa00e60670, iov_cnt=<optimized out>) at
> ../util/iov.c:88
>         len = 13517923312037845750
>         i = 17305
> #1  0x02aa004d068c in qemu_fflush (f=0x2aa00e58630) at
> ../migration/qemu-file.c:307
>         local_error = 0x0
> #2  0x02aa004d0e04 in qemu_fflush (f=<optimized out>) at
> ../migration/qemu-file.c:297
> #3  0x02aa00613962 in postcopy_preempt_shutdown_file
> (s=s@entry=0x2aa00d1b4e0) at ../migration/ram.c:4657
> #4  0x02aa004e12b4 in migration_completion (s=0x2aa00d1b4e0) at
> ../migration/migration.c:3469
>         ret = <optimized out>
>         current_active_state = 5
>         must_precopy = 0
>         can_postcopy = 0
>         in_postcopy = true
>         pending_size = 0
>         __func__ = "migration_iteration_run"
>         iter_state = <optimized out>
>         s = 0x2aa00d1b4e0
>         thread = <optimized out>
>         setup_start = <optimized out>
>         thr_error = <optimized out>
>         urgent = <optimized out>
> #5  migration_iteration_run (s=0x2aa00d1b4e0) at ../migration/migration.c:3882
>         must_precopy = 0
>         can_postcopy = 0
>         in_postcopy = true
>         pending_size = 0
>         __func__ = "migration_iteration_run"
>         iter_state = <optimized out>
>         s = 0x2aa00d1b4e0
>         thread = <optimized out>
>         setup_start = <optimized out>
>         thr_error = <optimized out>
>         urgent = <optimized out>
> #6  migration_thread (opaque=opaque@entry=0x2aa00d1b4e0) at
> ../migration/migration.c:4124
>         iter_state = <optimized out>
>         s = 0x2aa00d1b4e0
>         thread = <optimized out>
>         setup_start = <optimized out>
>         thr_error = <optimized out>
>         urgent = <optimized out>
> #7  0x02aa00819b8c in qemu_thread_start (args=<optimized out>) at
> ../util/qemu-thread-posix.c:541
>         __cancel_buf =
>         {__cancel_jmp_buf = {{__cancel_jmp_buf = {{__gregs =
>         {4396782422080, 4393751543808, 4397299389454, 4396844235904, 2929182727824,
>         2929182933488, 4396843986792, 4397299389455, 33679382915066768,
>         33678512846981306}, __fpregs = {4396774031360, 8392704, 2929182933488, 0,
>         4396782422272, 2929172491858, 4396774031360, 1}}}, __mask_was_saved = 0}},
>         __pad = {0x3ffb4a77a60, 0x0, 0x0, 0x0}}
>         __cancel_routine = 0x2aa00819bf0
>         __not_first_call = <optimized out>
>         start_routine = 0x2aa004e08f0
>         arg = 0x2aa00d1b4e0
>         r = <optimized out>
> #8  0x03ffb7b1e2e6 in start_thread () at /lib64/libc.so.6
> #9  0x03ffb7aafdbe in thread_start () at /lib64/libc.so.6
> 
> It looks like it's in the preempt test:
> 
> (gdb) where
> #0  0x03ffb17a0126 in __pthread_kill_implementation () from 
> /lib64/libc.so.6
> #1  0x03ffb1750890 in raise () from /lib64/libc.so.6
> #2  0x03ffb172a340 in abort () from /lib64/libc.so.6
> #3  0x02aa0041c130 in qtest_check_status (s=<optimized out>) at
> ../tests/qtest/libqtest.c:194
> #4  0x03ffb1a3b5de in g_hook_list_invoke () from /lib64/libglib-2.0.so.0
> #5  <signal handler called>
> #6  0x03ffb17a0126 in __pthread_kill_implementation () from 
> /lib64/libc.so.6
> #7  0x03ffb1750890 in raise () from /lib64/libc.so.6
> #8  0x03ffb172a340 in abort () from /lib64/libc.so.6
> #9  0x02aa00420318 in qmp_fd_receive (fd=<optimized out>) at
> ../tests/qtest/libqmp.c:80
> #10 0x02aa0041d5ee in qtest_qmp_receive_dict (s=0x2aa01eb2700) at 
> ../tests/qtest/libqtest.c:713
> #11 qtest_qmp_receive (s=0x2aa01eb2700) at ../tests/qtest/libqtest.c:701
> #12 qtest_vqmp (s=s@entry=0x2aa01eb2700, fmt=fmt@entry=0x2aa00487100 "{ 
> 'execute': 'query-migrate' }", ap=ap@entry=0x3ffc247cc68)
> at ../tests/qtest/libqtest.c:765
> #13 0x02aa00413f1e in wait_command (who=who@entry=0x2aa01eb2700, 
> command=command@entry=0x2aa00487100 "{ 'execute': 'query-migrate' }")
> at ../tests/qtest/migration-helpers.c:73
> #14 0x02aa00414078 in migrate_query (who=who@entry=0x2aa01eb2700) at 
> ../tests/qtest/migration-helpers.c:139
> #15 migrate_query_status (who=who@entry=0x2aa01eb2700) at 
> ../tests/qtest/migration-helpers.c:161
> #16 0x02aa00414480 in check_migration_status (ungoals=0x0, 
> goal=0x2aa00495c7e "completed", who=0x2aa01eb2700) at 
> ../tests/qtest/migration-helpers.c:177
> #17 wait_for_migration_status (who=0x2aa01eb2700, goal=<optimized out>,
> ungoals=0x0) at ../tests/qtest/migration-helpers.c:202
> #18 0x02aa0041300e in migrate_postcopy_complete 
> (from=from@entry=0x2aa01eb2700, to=to@entry=0x2aa01eb3000, 
> args=args@entry=0x3ffc247cf48)
> at ../tests/qtest/migration-test.c:1137
> #19 0x02aa004131a4 in test_postcopy_common (args=0x3ffc247cf48) at 
> ../tests/qtest/migration-test.c:1162
> #20 test_postcopy_preempt () at ../tests/qtest/migration-test.c:1178
> 
> Looking at the iov and file it's garbage; so it makes me think this is
>