[Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-12 Thread Chuan Zheng
This bug is fixed by commit a1af605bd5ade1a6dd571f553a6746b97f3d6869;
closing the issue as fixed.

** Changed in: qemu
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1902470

Title:
  migration with TLS-MultiFD is stuck when the dst-libvirtd service
  restarts

Status in QEMU:
  Fix Released

Bug description:
  hi,

  I found that the multi-channel TLS handshake gets stuck when
  dst-libvirtd restarts: both the src and dst sockets are blocked in
  recvmsg. Meanwhile, the live_migration thread is blocked in
  multifd_send_sync_main, so the migration cannot be cancelled even
  though src-libvirt has delivered the QMP command.

  Is there any way to exit the migration when the multi-channel TLS
  handshake is stuck? Would setting a TLS-handshake timeout take
  effect?

  The stack traces are as follows:

  =src qemu-system-aar stack=:
  #0  0x87d6f28c in recvmsg () from target:/usr/lib64/libpthread.so.0
  #1  0xe3817424 in qio_channel_socket_readv (ioc=0xe9e30a30, iov=0xdb58e8a8, niov=1, fds=0x0, nfds=0x0, errp=0x0) at ../io/channel-socket.c:502
  #2  0xe380f468 in qio_channel_readv_full (ioc=0xe9e30a30, iov=0xdb58e8a8, niov=1, fds=0x0, nfds=0x0, errp=0x0) at ../io/channel.c:66
  #3  0xe380f9e8 in qio_channel_read (ioc=0xe9e30a30, buf=0xea204e9b "\026\003\001\001L\001", buflen=5, errp=0x0) at ../io/channel.c:217
  #4  0xe380e7d4 in qio_channel_tls_read_handler (buf=0xea204e9b "\026\003\001\001L\001", len=5, opaque=0xfffd38001190) at ../io/channel-tls.c:53
  #5  0xe3801114 in qcrypto_tls_session_pull (opaque=0xe99d5700, buf=0xea204e9b, len=5) at ../crypto/tlssession.c:89
  #6  0x8822ed30 in _gnutls_stream_read (ms=0xdb58eaac, pull_func=0xfffd38001870, size=5, bufel=, session=0xe983cd60) at buffers.c:346
  #7  _gnutls_read (ms=0xdb58eaac, pull_func=0xfffd38001870, size=5, bufel=, session=0xe983cd60) at buffers.c:426
  #8  _gnutls_io_read_buffered (session=session@entry=0xe983cd60, total=5, recv_type=recv_type@entry=4294967295, ms=0xdb58eaac) at buffers.c:581
  #9  0x88224954 in recv_headers (ms=, record=0x883cd000 , htype=65535, type=2284006288, record_params=0xe9e22a60, session=0xe983cd60) at record.c:1163
  #10 _gnutls_recv_in_buffers (session=session@entry=0xe983cd60, type=2284006288, type@entry=GNUTLS_HANDSHAKE, htype=65535, htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, ms=, ms@entry=0) at record.c:1302
  #11 0x88230568 in _gnutls_handshake_io_recv_int (session=session@entry=0xe983cd60, htype=htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, hsk=hsk@entry=0xdb58ec38, optional=optional@entry=1) at buffers.c:1445
  #12 0x88232b90 in _gnutls_recv_handshake (session=session@entry=0xe983cd60, type=type@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, optional=optional@entry=1, buf=buf@entry=0x0) at handshake.c:1534
  #13 0x88235b40 in handshake_client (session=session@entry=0xe983cd60) at handshake.c:2925
  #14 0x88237824 in gnutls_handshake (session=0xe983cd60) at handshake.c:2739
  #15 0xe380213c in qcrypto_tls_session_handshake (session=0xe99d5700, errp=0xdb58ee58) at ../crypto/tlssession.c:493
  #16 0xe380ea40 in qio_channel_tls_handshake_task (ioc=0xfffd38001190, task=0xea61d4e0, context=0x0) at ../io/channel-tls.c:161
  #17 0xe380ec60 in qio_channel_tls_handshake (ioc=0xfffd38001190, func=0xe3394d20 , opaque=0xea189c30, destroy=0x0, context=0x0) at ../io/channel-tls.c:239
  #18 0xe3394e78 in multifd_tls_channel_connect (p=0xea189c30, ioc=0xe9e30a30, errp=0xdb58ef28) at ../migration/multifd.c:782
  #19 0xe3394f30 in multifd_channel_connect (p=0xea189c30, ioc=0xe9e30a30, error=0x0) at ../migration/multifd.c:804
  #20 0xe33950b8 in multifd_new_send_channel_async (task=0xea6855a0, opaque=0xea189c30) at ../migration/multifd.c:858
  #21 0xe3810cf8 in qio_task_complete (task=0xea6855a0) at ../io/task.c:197
  #22 0xe381096c in qio_task_thread_result (opaque=0xea6855a0) at ../io/task.c:112
  #23 0x88701df8 in ?? () from target:/usr/lib64/libglib-2.0.so.0
  #24 0x88705a7c in g_main_context_dispatch () from target:/usr/lib64/libglib-2.0.so.0
  #25 0xe3a5a29c in glib_pollfds_poll () at ../util/main-loop.c:221
  #26 0xe3a5a324 in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:244
  #27 0xe3a5a444 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:520
  #28 0xe3696b20 in qemu_main_loop () at ../softmmu/vl.c:1677
  #29 0xe30949e4 in main (argc=81, argv=0xdb58f2c8, envp=0xdb58f558) at ../softmmu/main.c:50

  =src live_migration stack=:
  #0  0x87d6a5d8 in pthread_cond_wait () from target:/usr/lib64/libpthread.so.0
  #1  0xe3a5f3ec in qem

[Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-09 Thread Chuan Zheng
The following patch has been sent and may fix this issue; it is awaiting review.
https://www.mail-archive.com/qemu-devel@nongnu.org/msg758017.html

[Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-08 Thread Chuan Zheng
** Changed in: qemu
   Status: Confirmed => In Progress

[Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-05 Thread Chuan Zheng
** Changed in: qemu
   Status: New => Confirmed

Re: [Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-03 Thread Zheng Chuan
I think I now understand the point Daniel made about this problem on another mailing list.

The hang is caused precisely by the blocking I/O in the TLS handshake.

Src: (multifd_send_0)
multifd_channel_connect
  multifd_tls_channel_connect
    multifd_tls_channel_connect
      qio_channel_tls_handshake
        qio_channel_tls_handshake_task
          qcrypto_tls_session_handshake
            gnutls_handshake
              ...
                recvmsg (Blocking I/O waiting for response)

Dst: (multifd_recv_1)
migration_channel_process_incoming
  migration_tls_channel_process_incoming
    qio_channel_tls_handshake_task
      gnutls_handshake
        ...
          recvmsg (Blocking I/O waiting for response)

Here is how the hang happens.
The Src multifd_send_0 channel starts the TLS handshake: it sends the hello to
the server and waits for the response.
However, the Dst main QEMU loop is already waiting in recvmsg() for multifd_recv_1.
The main QEMU loops on both Src and Dst are therefore blocked waiting for a
response, which results in a permanent hang.

I have verified with gdb that the sockets Src and Dst are blocked on belong to
different TLS handshakes.
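
To make the mechanism concrete, here is a small POSIX sketch (not QEMU code). It assumes two AF_UNIX socketpairs standing in for the two multifd channels; first_ready_channel() and demo_two_channels() are names made up for this illustration. A blocking recvmsg() on channel 1 would sleep forever if the hello only ever arrives on channel 0, whereas polling all channels reports whichever one actually has data.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define NUM_CHANNELS 2

/*
 * Return the index of the first channel with readable data, or -1 if no
 * channel becomes readable within timeout_ms (or poll() fails).
 */
static int first_ready_channel(const int fds[], int count, int timeout_ms)
{
    struct pollfd pfds[NUM_CHANNELS];
    int i;

    assert(count > 0 && count <= NUM_CHANNELS);
    for (i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)count, timeout_ms) <= 0) {
        return -1;
    }
    for (i = 0; i < count; i++) {
        if (pfds[i].revents & POLLIN) {
            return i;
        }
    }
    return -1;
}

/*
 * Hypothetical demo: open two channels, let the peer send a 5-byte "hello"
 * on channel hello_on (pass -1 for a silent peer), and report which local
 * end is readable. Returns -2 if the setup itself fails.
 */
int demo_two_channels(int hello_on)
{
    int local[NUM_CHANNELS];
    int peer[NUM_CHANNELS];
    int opened = 0;
    int ready = -2;
    int i;

    for (i = 0; i < NUM_CHANNELS; i++) {
        int sv[2];

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
            goto out;
        }
        local[i] = sv[0];
        peer[i] = sv[1];
        opened++;
    }
    if (hello_on >= 0 && hello_on < NUM_CHANNELS) {
        if (send(peer[hello_on], "hello", 5, 0) != 5) {
            goto out;
        }
    }
    /* Bounded wait over all channels instead of blocking on a fixed one. */
    ready = first_ready_channel(local, NUM_CHANNELS, 100);
out:
    for (i = 0; i < opened; i++) {
        close(local[i]);
        close(peer[i]);
    }
    return ready;
}
```

Calling demo_two_channels(0) reports channel 0 as ready even though a reader fixed on channel 1 would never wake up; with a silent peer the call gives up after the 100 ms timeout instead of hanging.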

So one way to solve this problem might be to extract multifd_channel_connect()
from multifd_new_send_channel_async() and run it as a qio task, which would
offload the TLS handshake to a thread other than the QEMU main loop?



Re: [Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-03 Thread Zheng Chuan



On 2020/11/3 17:29, Daniel Berrange wrote:
> This looks to me like a significant implementation flaw in the QEMU
> code. Both src and dst QEMU appear to be running code from the main
> event loop, and they appear to be doing blocking I/O operations. This is
> very bad as we should never have anything running in the main event loop
> thread that is able to block on I/O.
> 
Well, the TLS handshake does seem to use blocking I/O.

> So to solve this something needs to be done to make sure the I/O is
> either non-blocking, or if it has to be blocking, then it needs to be
> offloaded to a background thread.
> 
Yes, I agree.
Since we do the multifd TLS handshake in the main thread via multifd_save_setup(),
maybe we need to run socket_send_channel_create() in a background thread, rather
than using qio_channel_socket_connect_async()?

Besides, the hang problem itself still needs to be figured out and solved...

-- 
Regards.
Chuan



[Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-03 Thread Daniel Berrange
This looks to me like a significant implementation flaw in the QEMU
code. Both src and dst QEMU appear to be running code from the main
event loop, and they appear to be doing blocking I/O operations. This is
very bad as we should never have anything running in the main event loop
thread that is able to block on I/O.

So to solve this something needs to be done to make sure the I/O is
either non-blocking, or if it has to be blocking, then it needs to be
offloaded to a background thread.
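
As a generic illustration of the non-blocking option, here is a minimal POSIX sketch; it is not QEMU code and does not involve GnuTLS. The helper names set_nonblocking(), try_read() and demo_peer() are invented for this example, and an AF_UNIX socketpair stands in for the migration channel. With O_NONBLOCK set, reading from a silent peer returns immediately with EAGAIN instead of sleeping in the kernel, so the caller can return to its event loop and retry once the descriptor is readable.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum try_read_status {
    TRY_READ_OK = 0,          /* some bytes (or EOF) were read */
    TRY_READ_WOULD_BLOCK = 1, /* nothing available yet; retry later */
    TRY_READ_ERROR = -1       /* a real error occurred */
};

/* Put a descriptor into non-blocking mode. Returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Attempt a single read; never sleeps when fd is non-blocking. */
static int try_read(int fd, void *buf, size_t len, ssize_t *nread)
{
    ssize_t n;

    assert(nread != NULL);
    n = recv(fd, buf, len, 0);
    if (n >= 0) {
        *nread = n;
        return TRY_READ_OK;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        return TRY_READ_WOULD_BLOCK;
    }
    return TRY_READ_ERROR;
}

/*
 * Hypothetical demo: try to read a 5-byte header (the size requested in
 * the stack trace, buflen=5) from a peer that either sends 5 bytes first
 * (peer_sends != 0) or stays silent (peer_sends == 0).
 */
int demo_peer(int peer_sends)
{
    int sv[2];
    char hdr[5];
    ssize_t nread = 0;
    int rc = TRY_READ_ERROR;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return TRY_READ_ERROR;
    }
    if (set_nonblocking(sv[0]) == 0 &&
        (!peer_sends || send(sv[1], "hello", 5, 0) == 5)) {
        rc = try_read(sv[0], hdr, sizeof(hdr), &nread);
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

With a silent peer, demo_peer(0) comes back at once with TRY_READ_WOULD_BLOCK rather than hanging; the trade-off is that the caller has to keep per-channel state and resume the operation later.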

Re: [Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-02 Thread Zheng Chuan



On 2020/11/3 4:16, Dr. David Alan Gilbert wrote:
> * zhengchuan (zhengch...@huawei.com) wrote:
>> Anyone who could help with this would be appreciated, since we have been 
>> stuck for three days :(
>>
>> IIUC, the client (Src) has sent the first hello message to the server (Dst); 
>> however, due to something that happened while libvirtd was restarting,
>> the message was lost, and both sides are waiting, which leads to a hang 
>> forever, but I could not find out how so far.
> 
> If you need to un-break things, I suggest killing the destination might
> free it; but I'm not sure.
> 
Hi, Dave.
Unfortunately, no. After killing the destination, the Src main migration 
thread was left stuck at multifd_send_sync_main().

> An interesting question is if we can make migration-cancel work in this
> case.
> 
> Dave
> 
Bad news: since the main QEMU thread is stuck in recvmsg(), QEMU 
cannot respond to libvirt's qmp_migrate_cancel :(

In the meantime, I also found another problem: the Dst socket 
connections are not closed after migration-cancel;
the multifd channels are left in the CLOSE-WAIT state if we look at them 
through the 'ss' command.

This is because multifd_save_cleanup() simply calls 
socket_send_channel_destroy() and unrefs the ioc, rather than calling
qio_channel_shutdown() as multifd_recv_terminate_threads() does. That does not 
work for a TLS channel.
A simple workaround is to add qio_channel_shutdown() like this:
    for (i = 0; i < migrate_multifd_channels(); i++) {
        MultiFDSendParams *p = &multifd_send_state->params[i];

+       qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
        socket_send_channel_destroy(p->c);
    }
The residual socket is then closed, but I doubt whether this is the correct solution...

Back to the problem described in this issue: it is still not resolved by 
this workaround, but I think it is also a similar
cleanup issue, and I will dig into it further...
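
For readers less familiar with the socket semantics behind the qio_channel_shutdown() workaround, here is a plain POSIX sketch; it is not QEMU code. It rests on an assumption the thread does not confirm: that after the unref something else still holds the underlying descriptor open, which is modelled here with dup(). The helper names peer_sees_eof() and demo_teardown() are made up. In that situation close() alone sends nothing to the peer, while shutdown() acts on the socket itself and the peer sees EOF straight away.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return 1 if the peer end observes end-of-stream within timeout_ms,
 * 0 if nothing arrives in that time, and -1 on error.
 */
static int peer_sees_eof(int peer_fd, int timeout_ms)
{
    struct pollfd pfd;
    char byte;
    ssize_t n;
    int ready;

    assert(timeout_ms >= 0);
    pfd.fd = peer_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0) {
        return -1;
    }
    if (ready == 0) {
        return 0; /* timed out: the connection still looks alive */
    }
    /* Readable: a zero-length read means the other side shut down. */
    n = recv(peer_fd, &byte, 1, MSG_PEEK);
    if (n < 0) {
        return -1;
    }
    return n == 0 ? 1 : 0;
}

/*
 * Hypothetical demo: tear down one end of a connected pair while a second
 * descriptor (extra_ref) still refers to the same socket.
 */
int demo_teardown(int use_shutdown)
{
    int sv[2];
    int extra_ref;
    int eof;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    extra_ref = dup(sv[0]); /* stands in for another holder of the channel */
    if (extra_ref < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (use_shutdown) {
        shutdown(sv[0], SHUT_RDWR); /* affects the socket, not just this fd */
    }
    close(sv[0]); /* drops only this reference */

    eof = peer_sees_eof(sv[1], 100);

    close(extra_ref);
    close(sv[1]);
    return eof;
}
```

Without the shutdown() call the peer times out and still considers the connection open; with it, the peer sees EOF immediately even though extra_ref has not been closed yet.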



Re: [Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-02 Thread Dr. David Alan Gilbert
* zhengchuan (zhengch...@huawei.com) wrote:
> Any help with this would be appreciated, since we have been stuck for three
> days :(
> 
> IIUC, the client (src) has sent its first hello message to the server (dst);
> however, due to something that happened while libvirtd was restarting,
> the message was lost, and both sides are waiting, which leads to a hang
> forever, but I could not find out how for now.

If you need to un-break things, I suggest killing the destination might
free it; but I'm not sure.

An interesting question is if we can make migration-cancel work in this
case.

Dave
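
For the migration-cancel question, one general pattern is to wait on the socket
and on a separate wake-up descriptor at the same time, so that a cancel request
can interrupt the wait. The sketch below uses plain POSIX poll() and a pipe; it
is not QEMU code, and every name in it (wait_readable_or_cancel,
demo_cancel_stuck_wait, the enum values) is hypothetical.

```c
#include <assert.h>     /* only needed for the usage line shown after the block */
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

enum wait_result {
    WAIT_ERROR = -1,
    WAIT_READABLE = 0,
    WAIT_CANCELLED = 1,
};

/*
 * Block until sock_fd becomes readable or something is written to
 * cancel_fd. A pending cancel request takes priority over socket data.
 */
enum wait_result wait_readable_or_cancel(int sock_fd, int cancel_fd)
{
    struct pollfd fds[2] = {
        { .fd = sock_fd, .events = POLLIN },
        { .fd = cancel_fd, .events = POLLIN },
    };

    for (;;) {
        int rc = poll(fds, 2, -1);

        if (rc < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal, wait again */
            }
            return WAIT_ERROR;
        }
        if (fds[1].revents & (POLLIN | POLLHUP)) {
            return WAIT_CANCELLED;
        }
        if (fds[0].revents & (POLLIN | POLLHUP | POLLERR)) {
            return WAIT_READABLE;
        }
    }
}

/*
 * Demo: the peer never sends anything (like the lost hello message),
 * but a cancel byte is queued, so the wait returns instead of hanging.
 */
int demo_cancel_stuck_wait(void)
{
    int sv[2];
    int cancel_pipe[2];
    int result = WAIT_ERROR;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return WAIT_ERROR;
    }
    if (pipe(cancel_pipe) < 0) {
        close(sv[0]);
        close(sv[1]);
        return WAIT_ERROR;
    }

    if (write(cancel_pipe[1], "x", 1) == 1) {
        result = wait_readable_or_cancel(sv[0], cancel_pipe[0]);
    }

    close(cancel_pipe[0]);
    close(cancel_pipe[1]);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

For example, assert(demo_cancel_stuck_wait() == WAIT_CANCELLED) holds here. A
recvmsg() that is already blocked, as in the trace, would not benefit from this
unless the read path first waits in such a way.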


RE: [Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-02 Thread zhengchuan
Any help with this would be appreciated, since we have been stuck for three
days :(

IIUC, the client (src) has sent its first hello message to the server (dst);
however, due to something that happened while libvirtd was restarting,
the message was lost, and both sides are waiting, which leads to a hang
forever, but I could not find out how for now.
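
To make the hang concrete: if the peer's reply never arrives, a plain blocking
read waits forever, while a read guarded by a timeout gives up. The sketch below
shows the timeout variant with POSIX poll(); it says nothing about what QEMU or
GnuTLS actually offer, and read_with_timeout(), demo_lost_hello() and
READ_TIMED_OUT are names invented for the example.

```c
#include <assert.h>     /* only needed for the usage line shown after the block */
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_TIMED_OUT (-2)

/*
 * Wait up to timeout_ms for data, then read it.
 * Returns the number of bytes read, 0 on end-of-file, -1 on error,
 * or READ_TIMED_OUT if nothing arrived in time.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    /* For simplicity the full timeout is restarted after a signal. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0) {
        return -1;
    }
    if (rc == 0) {
        return READ_TIMED_OUT;
    }
    return recv(fd, buf, len, 0);
}

/*
 * Demo: a connected socket pair where the peer never sends its reply,
 * mimicking the lost message. The 5-byte read gives up after timeout_ms.
 */
ssize_t demo_lost_hello(int timeout_ms)
{
    int sv[2];
    char buf[5];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    n = read_with_timeout(sv[0], buf, sizeof(buf), timeout_ms);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

For example, assert(demo_lost_hello(50) == READ_TIMED_OUT) holds, since nothing
is ever written to the other end of the pair.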


[Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-01 Thread Yan Jin
** Description changed:

  hi,
  
  I found that the multi-channel TLS-handshake will be stuck when the dst-
  libvirtd restarts, both the src and dst sockets are blocked in recvmsg.
  In the meantime, live_migration thread is blocked in
  multifd_send_sync_main, so migration cannot be cancelled though src-
  libvirt has delivered the QMP command.
  
  Is there any way to exit migration when the multi-channel TLS-handshake
- is stuck? Does setting TLS handshake timeout function take effect?
+ is stuck? Does setting TLS-handshake timeout function take effect?
  
  The stack trace are as follows:
  
  =src qemu-system-aar stack=:
  #0  0x87d6f28c in recvmsg () from target:/usr/lib64/libpthread.so.0
  #1  0xe3817424 in qio_channel_socket_readv (ioc=0xe9e30a30, 
iov=0xdb58e8a8, niov=1, fds=0x0, nfds=0x0, errp=0x0) at 
../io/channel-socket.c:502
  #2  0xe380f468 in qio_channel_readv_full (ioc=0xe9e30a30, 
iov=0xdb58e8a8, niov=1, fds=0x0, nfds=0x0, errp=0x0) at ../io/channel.c:66
  #3  0xe380f9e8 in qio_channel_read (ioc=0xe9e30a30, 
buf=0xea204e9b "\026\003\001\001L\001", buflen=5, errp=0x0) at 
../io/channel.c:217
  #4  0xe380e7d4 in qio_channel_tls_read_handler (buf=0xea204e9b 
"\026\003\001\001L\001", len=5, opaque=0xfffd38001190) at ../io/channel-tls.c:53
  #5  0xe3801114 in qcrypto_tls_session_pull (opaque=0xe99d5700, 
buf=0xea204e9b, len=5) at ../crypto/tlssession.c:89
  #6  0x8822ed30 in _gnutls_stream_read (ms=0xdb58eaac, 
pull_func=0xfffd38001870, size=5, bufel=, 
session=0xe983cd60) at buffers.c:346
  #7  _gnutls_read (ms=0xdb58eaac, pull_func=0xfffd38001870, size=5, 
bufel=, session=0xe983cd60) at buffers.c:426
  #8  _gnutls_io_read_buffered (session=session@entry=0xe983cd60, total=5, 
recv_type=recv_type@entry=4294967295, ms=0xdb58eaac) at buffers.c:581
  #9  0x88224954 in recv_headers (ms=, 
record=0x883cd000 , 
htype=65535, type=2284006288, record_params=0xe9e22a60, 
session=0xe983cd60) at record.c:1163
  #10 _gnutls_recv_in_buffers (session=session@entry=0xe983cd60, 
type=2284006288, type@entry=GNUTLS_HANDSHAKE, htype=65535, 
htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, ms=, 
ms@entry=0) at record.c:1302
  #11 0x88230568 in _gnutls_handshake_io_recv_int 
(session=session@entry=0xe983cd60, 
htype=htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, 
hsk=hsk@entry=0xdb58ec38, optional=optional@entry=1) at buffers.c:1445
  #12 0x88232b90 in _gnutls_recv_handshake 
(session=session@entry=0xe983cd60, 
type=type@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, 
optional=optional@entry=1, buf=buf@entry=0x0) at handshake.c:1534
  #13 0x88235b40 in handshake_client 
(session=session@entry=0xe983cd60) at handshake.c:2925
  #14 0x88237824 in gnutls_handshake (session=0xe983cd60) at 
handshake.c:2739
  #15 0xe380213c in qcrypto_tls_session_handshake 
(session=0xe99d5700, errp=0xdb58ee58) at ../crypto/tlssession.c:493
  #16 0xe380ea40 in qio_channel_tls_handshake_task (ioc=0xfffd38001190, 
task=0xea61d4e0, context=0x0) at ../io/channel-tls.c:161
  #17 0xe380ec60 in qio_channel_tls_handshake (ioc=0xfffd38001190, 
func=0xe3394d20 , opaque=0xea189c30, 
destroy=0x0, context=0x0) at ../io/channel-tls.c:239
  #18 0xe3394e78 in multifd_tls_channel_connect (p=0xea189c30, 
ioc=0xe9e30a30, errp=0xdb58ef28) at ../migration/multifd.c:782
  #19 0xe3394f30 in multifd_channel_connect (p=0xea189c30, 
ioc=0xe9e30a30, error=0x0) at ../migration/multifd.c:804
  #20 0xe33950b8 in multifd_new_send_channel_async 
(task=0xea6855a0, opaque=0xea189c30) at ../migration/multifd.c:858
  #21 0xe3810cf8 in qio_task_complete (task=0xea6855a0) at 
../io/task.c:197
  #22 0xe381096c in qio_task_thread_result (opaque=0xea6855a0) at 
../io/task.c:112
  #23 0x88701df8 in ?? () from target:/usr/lib64/libglib-2.0.so.0
  #24 0x88705a7c in g_main_context_dispatch () from 
target:/usr/lib64/libglib-2.0.so.0
  #25 0xe3a5a29c in glib_pollfds_poll () at ../util/main-loop.c:221
  #26 0xe3a5a324 in os_host_main_loop_wait (timeout=0) at 
../util/main-loop.c:244
  #27 0xe3a5a444 in main_loop_wait (nonblocking=0) at 
../util/main-loop.c:520
  #28 0xe3696b20 in qemu_main_loop () at ../softmmu/vl.c:1677
  #29 0xe30949e4 in main (argc=81, argv=0xdb58f2c8, 
envp=0xdb58f558) at ../softmmu/main.c:50
  
  =src live_migration stack=:
  #0  0x87d6a5d8 in pthread_cond_wait () from 
target:/usr/lib64/libpthread.so.0
  #1  0xe3a5f3ec in qemu_sem_wait (sem=0xea189d40) at 
../util/qemu-thread-posix.c:328
  #2  0xe3394838 in multifd_send_sync_main (f=0xe983f0e0) at 
../migration/multifd.c:638
  #3  0xe37de310 in ram_save_setup (f=0xe983f0e0, 
opaqu

[Bug 1902470] Re: migration with TLS-MultiFD is stuck when the dst-libvirtd service restarts

2020-11-01 Thread Yan Jin
** Description changed:

  hi,
  
- I found that the multi-channel TLS-handshake will be stuck when the 
dst-libvirtd restarts, both the src and dst sockets are blocked in recvmsg. In 
the meantime, live_migration thread is blocked in multifd_send_sync_main, so
- migration cannot be cancelled though src-libvirt has delivered the QMP 
command.
+ I found that the multi-channel TLS-handshake will be stuck when the dst-
+ libvirtd restarts, both the src and dst sockets are blocked in recvmsg.
+ In the meantime, live_migration thread is blocked in
+ multifd_send_sync_main, so migration cannot be cancelled though src-
+ libvirt has delivered the QMP command.
  
  Is there any way to exit migration when the multi-channel TLS-handshake
  is stuck? Does setting TLS handshake timeout function take effect?
  
  The stack trace are as follows:
  
  =src qemu-system-aar stack=:
  #0  0x87d6f28c in recvmsg () from target:/usr/lib64/libpthread.so.0
  #1  0xe3817424 in qio_channel_socket_readv (ioc=0xe9e30a30, 
iov=0xdb58e8a8, niov=1, fds=0x0, nfds=0x0, errp=0x0) at 
../io/channel-socket.c:502
  #2  0xe380f468 in qio_channel_readv_full (ioc=0xe9e30a30, 
iov=0xdb58e8a8, niov=1, fds=0x0, nfds=0x0, errp=0x0) at ../io/channel.c:66
  #3  0xe380f9e8 in qio_channel_read (ioc=0xe9e30a30, 
buf=0xea204e9b "\026\003\001\001L\001", buflen=5, errp=0x0) at 
../io/channel.c:217
  #4  0xe380e7d4 in qio_channel_tls_read_handler (buf=0xea204e9b 
"\026\003\001\001L\001", len=5, opaque=0xfffd38001190) at ../io/channel-tls.c:53
  #5  0xe3801114 in qcrypto_tls_session_pull (opaque=0xe99d5700, 
buf=0xea204e9b, len=5) at ../crypto/tlssession.c:89
  #6  0x8822ed30 in _gnutls_stream_read (ms=0xdb58eaac, 
pull_func=0xfffd38001870, size=5, bufel=, 
session=0xe983cd60) at buffers.c:346
  #7  _gnutls_read (ms=0xdb58eaac, pull_func=0xfffd38001870, size=5, 
bufel=, session=0xe983cd60) at buffers.c:426
  #8  _gnutls_io_read_buffered (session=session@entry=0xe983cd60, total=5, 
recv_type=recv_type@entry=4294967295, ms=0xdb58eaac) at buffers.c:581
- #9  0x88224954 in recv_headers (ms=, 
record=0x883cd000 , 
htype=65535, type=2284006288,
- record_params=0xe9e22a60, session=0xe983cd60) at record.c:1163
- #10 _gnutls_recv_in_buffers (session=session@entry=0xe983cd60, 
type=2284006288, type@entry=GNUTLS_HANDSHAKE, htype=65535, 
htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST,
- ms=, ms@entry=0) at record.c:1302
- #11 0x88230568 in _gnutls_handshake_io_recv_int 
(session=session@entry=0xe983cd60, 
htype=htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, 
hsk=hsk@entry=0xdb58ec38,
- optional=optional@entry=1) at buffers.c:1445
- #12 0x88232b90 in _gnutls_recv_handshake 
(session=session@entry=0xe983cd60, 
type=type@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, optional=optional@entry=1,
- buf=buf@entry=0x0) at handshake.c:1534
+ #9  0x88224954 in recv_headers (ms=, 
record=0x883cd000 , 
htype=65535, type=2284006288, record_params=0xe9e22a60, 
session=0xe983cd60) at record.c:1163
+ #10 _gnutls_recv_in_buffers (session=session@entry=0xe983cd60, 
type=2284006288, type@entry=GNUTLS_HANDSHAKE, htype=65535, 
htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, ms=, 
ms@entry=0) at record.c:1302
+ #11 0x88230568 in _gnutls_handshake_io_recv_int 
(session=session@entry=0xe983cd60, 
htype=htype@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, 
hsk=hsk@entry=0xdb58ec38, optional=optional@entry=1) at buffers.c:1445
+ #12 0x88232b90 in _gnutls_recv_handshake 
(session=session@entry=0xe983cd60, 
type=type@entry=GNUTLS_HANDSHAKE_HELLO_RETRY_REQUEST, 
optional=optional@entry=1, buf=buf@entry=0x0) at handshake.c:1534
  #13 0x88235b40 in handshake_client 
(session=session@entry=0xe983cd60) at handshake.c:2925
  #14 0x88237824 in gnutls_handshake (session=0xe983cd60) at 
handshake.c:2739
  #15 0xe380213c in qcrypto_tls_session_handshake 
(session=0xe99d5700, errp=0xdb58ee58) at ../crypto/tlssession.c:493
  #16 0xe380ea40 in qio_channel_tls_handshake_task (ioc=0xfffd38001190, 
task=0xea61d4e0, context=0x0) at ../io/channel-tls.c:161
- #17 0xe380ec60 in qio_channel_tls_handshake (ioc=0xfffd38001190, 
func=0xe3394d20 , opaque=0xea189c30, 
destroy=0x0, context=0x0)
- at ../io/channel-tls.c:239
+ #17 0xe380ec60 in qio_channel_tls_handshake (ioc=0xfffd38001190, 
func=0xe3394d20 , opaque=0xea189c30, 
destroy=0x0, context=0x0) at ../io/channel-tls.c:239
  #18 0xe3394e78 in multifd_tls_channel_connect (p=0xea189c30, 
ioc=0xe9e30a30, errp=0xdb58ef28) at ../migration/multifd.c:782
  #19 0xe3394f30 in multifd_channel_connect (p=0xea189c30, 
ioc=0xe9e30a30, error=0x0) at ../migration/multifd.c:804
  #20 0xe33950b8 in multifd_new_send