Re: s390 migration crash
On Wed, Mar 22, 2023 at 03:16:23PM -0400, Peter Xu wrote:
> On Wed, Mar 22, 2023 at 06:13:43PM +, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > On Wed, Mar 22, 2023 at 02:05:06PM +, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (pet...@redhat.com) wrote:
> > > > > On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> > > > > > Hi Peter's,
> > > > > > Peter M pointed me to a seg in a migration test in CI; I can reproduce
> > > > > > it:
> > > > > > * On an s390 host
> > > > >
> > > > > How easy to reproduce?
> > > >
> > > > Pretty much every time when run as:
> > > > make check -j 4
> > > >
> > > > > > * only as part of a make check - running migration-test by itself
> > > > > > doesn't trigger for me.
> > > > > > * It looks like it's postcopy preempt

snip
Re: s390 migration crash
On Tue, Mar 21, 2023 at 08:19:00PM -0400, Peter Xu wrote:
> On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> > Hi Peter's,
> > Peter M pointed me to a seg in a migration test in CI; I can reproduce
> > it:
> > * On an s390 host
>
> How easy to reproduce?
>
> > * only as part of a make check - running migration-test by itself
> > doesn't trigger for me.
> > * It looks like it's postcopy preempt

snip

> > Looking at the iov and file it's garbage; so it makes me think this is
> > something like a flush on a closed file.
>
> I didn't figure out how that could be closed, but I think there's indeed a
> possible race that the qemufile can be accessed by both the return path
> thread and the migration thread concurrently, while qemufile is not thread
> safe on that.

snip

> From 0e317fa78e9671c119f6be78a0e0a36201517dc2 Mon Sep 17 00:00:00 2001
> From: Peter Xu
> Date: Tue, 21 Mar 2023 19:58:42 -0400
> Subject: [PATCH 1/2] io: tls: Inherit QIO_CHANNEL_FEATURE_SHUTDOWN on server
>  side
>
> TLS iochannel will inherit io_shutdown() from the master ioc, however we
> missed to do that on the server side.
>
> This will e.g. allow qemu_file_shutdown() to work on dest QEMU too for
> migration.
>
> Signed-off-by: Peter Xu
> ---
>  io/channel-tls.c | 3 +++
>  1 file changed, 3 insertions(+)

Acked-by: Daniel P. Berrangé

>
> diff --git a/io/channel-tls.c b/io/channel-tls.c
> index 5a7a3d48d6..9805dd0a3f 100644
> --- a/io/channel-tls.c
> +++ b/io/channel-tls.c
> @@ -74,6 +74,9 @@ qio_channel_tls_new_server(QIOChannel *master,
>      ioc = QIO_CHANNEL_TLS(object_new(TYPE_QIO_CHANNEL_TLS));
>
>      ioc->master = master;
> +    if (qio_channel_has_feature(master, QIO_CHANNEL_FEATURE_SHUTDOWN)) {
> +        qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
> +    }
>      object_ref(OBJECT(master));
>
>      ioc->session = qcrypto_tls_session_new(

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
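
The idea behind the quoted patch, a wrapper channel advertising the shutdown capability only when the channel underneath it does, can be sketched outside QEMU as below. This is a minimal illustration, not QEMU code: `struct chan`, `enum chan_feature`, `chan_wrap()` and `chan_shutdown()` are hypothetical stand-ins for the QIOChannel types and helpers named in the patch, and it assumes callers gate shutdown on the feature bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QIOChannel feature bits (not the QEMU enum). */
enum chan_feature {
    CHAN_FEATURE_FD_PASS  = 0,
    CHAN_FEATURE_SHUTDOWN = 1,
};

/* Hypothetical stand-in for a channel object. */
struct chan {
    unsigned int features;  /* bitmask of (1u << enum chan_feature) */
    struct chan *master;    /* underlying channel for a wrapper, else NULL */
};

static bool chan_has_feature(const struct chan *c, enum chan_feature f)
{
    return (c->features & (1u << f)) != 0;
}

static void chan_set_feature(struct chan *c, enum chan_feature f)
{
    c->features |= 1u << f;
}

/*
 * Build a wrapper around `master`. The wrapper starts with no features
 * and copies only the SHUTDOWN bit, mirroring the three added lines in
 * the patch.
 */
static struct chan chan_wrap(struct chan *master)
{
    struct chan w = { .features = 0, .master = master };

    if (chan_has_feature(master, CHAN_FEATURE_SHUTDOWN)) {
        chan_set_feature(&w, CHAN_FEATURE_SHUTDOWN);
    }
    return w;
}

/*
 * Assumed caller-side behaviour: shutdown is refused (-1) unless the
 * channel advertises the capability, otherwise it succeeds (0).
 */
static int chan_shutdown(const struct chan *c)
{
    return chan_has_feature(c, CHAN_FEATURE_SHUTDOWN) ? 0 : -1;
}
```

With this model, wrapping a channel that has `CHAN_FEATURE_SHUTDOWN` set yields a wrapper on which `chan_shutdown()` returns 0, while wrapping one without the bit yields -1. Without the inheritance step the wrapper would always refuse, which is the server-side gap the patch description refers to.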
Re: s390 migration crash
On Wed, Mar 22, 2023 at 06:13:43PM +, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > On Wed, Mar 22, 2023 at 02:05:06PM +, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (pet...@redhat.com) wrote:
> > > > On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> > > > > Hi Peter's,
> > > > > Peter M pointed me to a seg in a migration test in CI; I can reproduce
> > > > > it:
> > > > > * On an s390 host
> > > >
> > > > How easy to reproduce?
> > >
> > > Pretty much every time when run as:
> > > make check -j 4
> > >
> > > > > * only as part of a make check - running migration-test by itself
> > > > > doesn't trigger for me.
> > > > > * It looks like it's postcopy preempt

snip
Re: s390 migration crash
* Peter Xu (pet...@redhat.com) wrote:
> On Wed, Mar 22, 2023 at 02:05:06PM +, Dr. David Alan Gilbert wrote:
> > * Peter Xu (pet...@redhat.com) wrote:
> > > On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> > > > Hi Peter's,
> > > > Peter M pointed me to a seg in a migration test in CI; I can reproduce
> > > > it:
> > > > * On an s390 host
> > >
> > > How easy to reproduce?
> >
> > Pretty much every time when run as:
> > make check -j 4
> >
> > > > * only as part of a make check - running migration-test by itself
> > > > doesn't trigger for me.
> > > > * It looks like it's postcopy preempt

snip
Re: s390 migration crash
On Wed, Mar 22, 2023 at 02:05:06PM +, Dr. David Alan Gilbert wrote:
> * Peter Xu (pet...@redhat.com) wrote:
> > On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> > > Hi Peter's,
> > > Peter M pointed me to a seg in a migration test in CI; I can reproduce
> > > it:
> > > * On an s390 host
> >
> > How easy to reproduce?
>
> Pretty much every time when run as:
> make check -j 4
>
> > > * only as part of a make check - running migration-test by itself
> > > doesn't trigger for me.
> > > * It looks like it's postcopy preempt

snip
Re: s390 migration crash
* Peter Xu (pet...@redhat.com) wrote:
> On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> > Hi Peter's,
> > Peter M pointed me to a seg in a migration test in CI; I can reproduce
> > it:
> > * On an s390 host
>
> How easy to reproduce?

Pretty much every time when run as:
make check -j 4

> > * only as part of a make check - running migration-test by itself
> > doesn't trigger for me.
> > * It looks like it's postcopy preempt

snip
Re: s390 migration crash
On Tue, Mar 21, 2023 at 08:24:37PM +, Dr. David Alan Gilbert wrote:
> Hi Peter's,
> Peter M pointed me to a seg in a migration test in CI; I can reproduce
> it:
> * On an s390 host

How easy to reproduce?

> * only as part of a make check - running migration-test by itself
> doesn't trigger for me.
> * It looks like it's postcopy preempt
>
> (gdb) bt full
> #0 iov_size (iov=iov@entry=0x2aa00e60670, iov_cnt=) at
> ../util/iov.c:88
>         len = 13517923312037845750
>         i = 17305
> #1 0x02aa004d068c in qemu_fflush (f=0x2aa00e58630) at
> ../migration/qemu-file.c:307
>         local_error = 0x0
> #2 0x02aa004d0e04 in qemu_fflush (f=) at
> ../migration/qemu-file.c:297
> #3 0x02aa00613962 in postcopy_preempt_shutdown_file
> (s=s@entry=0x2aa00d1b4e0) at ../migration/ram.c:4657
> #4 0x02aa004e12b4 in migration_completion (s=0x2aa00d1b4e0) at
> ../migration/migration.c:3469
>         ret =
>         current_active_state = 5
>         must_precopy = 0
>         can_postcopy = 0
>         in_postcopy = true
>         pending_size = 0
>         __func__ = "migration_iteration_run"
>         iter_state =
>         s = 0x2aa00d1b4e0
>         thread =
>         setup_start =
>         thr_error =
>         urgent =
> #5 migration_iteration_run (s=0x2aa00d1b4e0) at ../migration/migration.c:3882
>         must_precopy = 0
>         can_postcopy = 0
>         in_postcopy = true
>         pending_size = 0
>         __func__ = "migration_iteration_run"
>         iter_state =
>         s = 0x2aa00d1b4e0
>         thread =
>         setup_start =
>         thr_error =
>         urgent =
> #6 migration_thread (opaque=opaque@entry=0x2aa00d1b4e0) at
> ../migration/migration.c:4124
>         iter_state =
>         s = 0x2aa00d1b4e0
> --Type for more, q to quit, c to continue without paging--
>         thread =
>         setup_start =
>         thr_error =
>         urgent =
> #7 0x02aa00819b8c in qemu_thread_start (args=) at
> ../util/qemu-thread-posix.c:541
>         __cancel_buf =
>         {__cancel_jmp_buf = {{__cancel_jmp_buf = {{__gregs =
> {4396782422080, 4393751543808, 4397299389454, 4396844235904, 2929182727824,
> 2929182933488, 4396843986792, 4397299389455, 33679382915066768,
> 33678512846981306}, __fpregs = {4396774031360, 8392704, 2929182933488, 0,
> 4396782422272, 2929172491858, 4396774031360, 1}}}, __mask_was_saved = 0}},
> __pad = {0x3ffb4a77a60, 0x0, 0x0, 0x0}}
>         __cancel_routine = 0x2aa00819bf0
>         __not_first_call =
>         start_routine = 0x2aa004e08f0
>         arg = 0x2aa00d1b4e0
>         r =
> #8 0x03ffb7b1e2e6 in start_thread () at /lib64/libc.so.6
> #9 0x03ffb7aafdbe in thread_start () at /lib64/libc.so.6
>
> It looks like it's in the preempt test:
>
> (gdb) where
> #0 0x03ffb17a0126 in __pthread_kill_implementation () from
> /lib64/libc.so.6
> #1 0x03ffb1750890 in raise () from /lib64/libc.so.6
> #2 0x03ffb172a340 in abort () from /lib64/libc.so.6
> #3 0x02aa0041c130 in qtest_check_status (s=) at
> ../tests/qtest/libqtest.c:194
> #4 0x03ffb1a3b5de in g_hook_list_invoke () from /lib64/libglib-2.0.so.0
> #5
> #6 0x03ffb17a0126 in __pthread_kill_implementation () from
> /lib64/libc.so.6
> #7 0x03ffb1750890 in raise () from /lib64/libc.so.6
> #8 0x03ffb172a340 in abort () from /lib64/libc.so.6
> #9 0x02aa00420318 in qmp_fd_receive (fd=) at
> ../tests/qtest/libqmp.c:80
> #10 0x02aa0041d5ee in qtest_qmp_receive_dict (s=0x2aa01eb2700) at
> ../tests/qtest/libqtest.c:713
> #11 qtest_qmp_receive (s=0x2aa01eb2700) at ../tests/qtest/libqtest.c:701
> #12 qtest_vqmp (s=s@entry=0x2aa01eb2700, fmt=fmt@entry=0x2aa00487100 "{
> 'execute': 'query-migrate' }", ap=ap@entry=0x3ffc247cc68)
> at ../tests/qtest/libqtest.c:765
> #13 0x02aa00413f1e in wait_command (who=who@entry=0x2aa01eb2700,
> command=command@entry=0x2aa00487100 "{ 'execute': 'query-migrate' }")
> at ../tests/qtest/migration-helpers.c:73
> #14 0x02aa00414078 in migrate_query (who=who@entry=0x2aa01eb2700) at
> ../tests/qtest/migration-helpers.c:139
> #15 migrate_query_status (who=who@entry=0x2aa01eb2700) at
> ../tests/qtest/migration-helpers.c:161
> #16 0x02aa00414480 in check_migration_status (ungoals=0x0,
> goal=0x2aa00495c7e "completed", who=0x2aa01eb2700) at
> ../tests/qtest/migration-helpers.c:177
> #17 wait_for_migration_status (who=0x2aa01eb2700, goal=,
> ungoals=0x0) at ../tests/qtest/migration-helpers.c:202
> #18 0x02aa0041300e in migrate_postcopy_complete
> (from=from@entry=0x2aa01eb2700, to=to@entry=0x2aa01eb3000,
> args=args@entry=0x3ffc247cf48)
> at ../tests/qtest/migration-test.c:1137
> #19 0x02aa004131a4 in test_postcopy_common (args=0x3ffc247cf48) at
> ../tests/qtest/migration-test.c:1162
> #20 test_postcopy_preempt () at ../tests/qtest/migration-test.c:1178
>
> Looking at the iov and file it's garbage; so it makes me think this is