On Fri, Apr 27, 2018 at 5:16 PM, Daniel P. Berrangé <berra...@redhat.com> wrote:
> On Fri, Apr 27, 2018 at 03:56:38PM +0800, 858585 jemmy wrote:
>> On Fri, Apr 27, 2018 at 1:36 AM, Dr. David Alan Gilbert
>> <dgilb...@redhat.com> wrote:
>> > * Lidong Chen (jemmy858...@gmail.com) wrote:
>> >> This patch implements a bi-directional RDMA QIOChannel. Because different
>> >> threads may access the RDMA QIOChannel concurrently, this patch uses RCU to
>> >> protect it.
>> >>
>> >> Signed-off-by: Lidong Chen <lidongc...@tencent.com>
>> >
>> > I'm a bit confused by this.
>> >
>> > I can see it's adding RCU to protect the rdma structures against
>> > deletion from multiple threads; that I'm OK with in principle; is that
>> > the only locking we need? (I guess the two directions are actually
>> > separate RDMAContexts, so maybe).
>>
>> qio_channel_rdma_close may be invoked by the migration thread and the
>> return path thread concurrently, so I use a mutex to protect it.
>
> Hmm, that is not good - concurrent threads calling close must not be
> allowed to happen even with non-RDMA I/O channels.
>
> For example, with the QIOChannelSocket, one thread can call close
> which sets the fd = -1, another thread can race with this and either
> end up calling close again on the same FD or calling close on -1.
> Either way the second thread will get an error from close() when
> it should have skipped the close() and returned success. Perhaps
> migration gets lucky and this doesn't result in it being marked
> as failed, but it is still not good.
>
> So only one thread should be calling close().
For live migration, the source qemu invokes qemu_fclose in different
threads, including the main thread, the migration thread and the return
path thread. The destination qemu invokes qemu_fclose in the main thread,
the listen thread and the COLO incoming thread.

So I prefer to add a lock to the QEMUFile struct, like this:

int qemu_fclose(QEMUFile *f)
{
    int ret;
    qemu_fflush(f);
    ret = qemu_file_get_error(f);

    if (f->ops->close) {
+       qemu_mutex_lock(&f->lock);
        int ret2 = f->ops->close(f->opaque);
+       qemu_mutex_unlock(&f->lock);
        if (ret >= 0) {
            ret = ret2;
        }
    }
    /* If any error was spotted before closing, we should report it
     * instead of the close() return value.
     */
    if (f->last_error) {
        ret = f->last_error;
    }
    g_free(f);
    trace_qemu_file_fclose();
    return ret;
}

Any suggestion?

>
>> If one thread invokes qio_channel_rdma_writev and another thread invokes
>> qio_channel_rdma_readv, the two threads will use separate RDMAContexts,
>> so it does not need a lock.
>>
>> If two threads invoke qio_channel_rdma_writev concurrently, it will
>> need a lock to protect it. But I find the source qemu migration thread
>> only invokes qio_channel_rdma_writev, and the return path thread only
>> invokes qio_channel_rdma_readv.
>
> QIOChannel impls are only intended to cope with a single thread doing
> I/O in each direction. If you have two threads needing to read, or
> two threads needing to write, the layer above should provide locking
> to ensure correct ordering of I/O operations.

Yes, so I think RCU is enough; we do not need more locks.

>
>> The destination qemu only invokes qio_channel_rdma_readv from the main
>> thread before postcopy, or from the listen thread after postcopy.
>>
>> The destination qemu has already protected it by using
>> qemu_mutex_lock(&mis->rp_mutex) when writing data to the
>> source qemu.
>>
>> But should we use qemu_mutex_lock to protect qio_channel_rdma_writev
>> and qio_channel_rdma_readv, to avoid some future change invoking
>> qio_channel_rdma_writev or qio_channel_rdma_readv concurrently?
>
>> >
>> > But is there nothing else to make the QIOChannel bidirectional?
>> >
>> > Also, a lot seems dependent on listen_id, can you explain how that's
>> > being used.
>>
>> The destination qemu is the server side, so listen_id is not zero. The
>> source qemu is the client side, so listen_id is zero.
>> I use listen_id to determine whether qemu is the destination or the source.
>>
>> For the destination qemu, if it writes data to the source, it needs to
>> use the return_path rdma, like this:
>>     if (rdma->listen_id) {
>>         rdma = rdma->return_path;
>>     }
>>
>> For the source qemu, if it reads data from the destination, it also
>> needs to use the return_path rdma:
>>     if (!rdma->listen_id) {
>>         rdma = rdma->return_path;
>>     }
>
> This feels unnecessarily complex to me. Why not just change the
> QIOChannelRDMA struct so that it has 2 RDMA context pointers, eg
>
> struct QIOChannelRDMA {
>     QIOChannel parent;
>     RDMAContext *rdmain;
>     RDMAContext *rdmaout;
>     QEMUFile *file;
>     bool blocking; /* XXX we don't actually honour this yet */
> };

The reason is that some functions, like qemu_rdma_accept and
rdma_start_outgoing_migration, take an RDMAContext parameter. It's easier
to implement the return path this way, so I should add some comments to
the code.

>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|