On Wed, Jul 17, 2024 at 09:40:06AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2024 at 09:33:01AM -0400, Peter Xu wrote:
> > Hi, Michael,
> >
> > On Wed, Jul 17, 2024 at 04:55:52AM -0400, Michael S. Tsirkin wrote:
> > > I just want to understand how we managed to have two threads
> > > talking in parallel. BQL is normally enough, which path
> > > manages to invoke vhost-user with BQL not taken?
> > > Just check BQL taken on each vhost user invocation and
> > > you will figure it out.
> >
> > Prasad mentioned how the race happened in the cover letter:
> >
> > https://lore.kernel.org/r/20240711131424.181615-1-ppan...@redhat.com
> >
> > Thread-1                               Thread-2
> >
> > vhost_dev_start                        postcopy_ram_incoming_cleanup
> > vhost_device_iotlb_miss                postcopy_notify
> > vhost_backend_update_device_iotlb      vhost_user_postcopy_notifier
> > vhost_user_send_device_iotlb_msg       vhost_user_postcopy_end
> > process_message_reply                  process_message_reply
> > vhost_user_read                        vhost_user_read
> > vhost_user_read_header                 vhost_user_read_header
> > "Fail to update device iotlb"          "Failed to receive reply to
> >                                         postcopy_end"
> >
> > The normal case should be that thread-2 is postcopy_ram_listen_thread(),
> > and this happens when postcopy migration is close to the end.
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
> OK, so postcopy_ram_ things run without the BQL?
There are a lot of postcopy_ram_* functions; I didn't check all of them,
but I think it's true in this case.

Thanks.

--
Peter Xu
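A minimal sketch of the failure mode in the trace above, and of the "check BQL taken on each vhost-user invocation" idea, assuming a single in-order request/reply channel shared by two threads. It is written in Python only for brevity, and every name in it (`Channel`, `BigLock`, `lock_held`, `call`, `worker`, `run_demo`) is hypothetical: `Channel` stands in for the vhost-user socket and `BigLock` for the BQL; none of this is QEMU code or API. Because replies carry no owner, a thread that reads without holding the lock across its own send and read can consume the other thread's reply, which matches both threads failing in process_message_reply() in the trace.

```python
import threading
from collections import deque


class Channel:
    """Hypothetical stand-in for a vhost-user socket.

    The backend answers strictly in order, so replies queue up FIFO and
    nothing in a reply says which thread asked for it.
    """

    def __init__(self):
        self._replies = deque()

    def send(self, req):
        self._replies.append(req)  # backend echoes the request back as the reply

    def read(self):
        return self._replies.popleft()  # whoever reads first gets the head reply


# Hypothetical stand-in for the BQL, plus a per-thread "do I hold it" flag.
_big_lock = threading.Lock()
_held = threading.local()


def lock_held():
    return getattr(_held, "value", False)


class BigLock:
    def __enter__(self):
        _big_lock.acquire()
        _held.value = True

    def __exit__(self, *exc):
        _held.value = False
        _big_lock.release()  # returns None, so exceptions are not swallowed


def call(chan, req):
    # The "check the lock is taken on every invocation" idea: an unlocked
    # caller fails loudly here instead of silently racing on the channel.
    if not lock_held():
        raise RuntimeError("channel call made without the big lock")
    chan.send(req)
    return chan.read()


def worker(chan, tid, n, mismatches):
    for i in range(n):
        req = (tid, i)
        with BigLock():  # send + read form one critical section
            reply = call(chan, req)
        if reply != req:  # analogue of "Fail to update device iotlb"
            mismatches.append((req, reply))


def run_demo(n=5000):
    chan = Channel()
    mismatches = []
    threads = [threading.Thread(target=worker, args=(chan, t, n, mismatches))
               for t in (1, 2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mismatches
```

With the lock held across each send/read pair, `run_demo()` returns an empty list, and calling `call(Channel(), 1)` from a thread that does not hold the lock raises `RuntimeError`, which is how the offending path would surface. Removing the `with BigLock():` (and the check) is what would allow the cross-thread reply mix-up; whether it actually shows up in a given run depends on scheduling.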