Hi, Apologize for late review.
On Tue, Jan 02, 2018 at 02:08:36AM -0800, zhike wang wrote: > From: wang zhike <wangzh...@jd.com> > > v3: > * Fix duplicate variable name, which leads to unexpected memory write. > v2: > * Move fdset_del before conn destroy. > * Fix coding style. Note that we prefer to put the change logs after "---" below Signed-off-by, so that those change logs won't be tracked in the git log history. > This patch fixes below race condition: > 1. one thread calls: rte_vhost_driver_unregister->lock conn_mutex > ->fdset_del->loop to check fd.busy. > 2. another thread calls fdset_event_dispatch, and the busy flag is > changed AFTER handling on the fd, i.e, rcb(). However, the rcb, > such as vhost_user_read_cb() would try to retrieve the conn_mutex. > > So issue is that the 1st thread will loop check the flag while holding > the mutex, while the 2nd thread would be blocked by mutex and can not > change the flag. Then dead lock is observed. I then would change the title to "vhost: fix deadlock". I'm also keen to know how do you reproduce this issue with real-life APP (say ovs) and how easy it is for reproduce. > Signed-off-by: zhike wang <wangzh...@jd.com> Again, you need fix your git config file about your name. > --- > lib/librte_vhost/socket.c | 18 +++++++++++++++++- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c > index 422da00..ea01327 100644 > --- a/lib/librte_vhost/socket.c > +++ b/lib/librte_vhost/socket.c > @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list { > struct vhost_user_socket *vsocket = vhost_user.vsockets[i]; > > if (!strcmp(vsocket->path, path)) { > + int del_fds[MAX_FDS]; > + int num_of_fds = 0, fd_index; > + I think the naming could be a bit shorter, like "fds, nr_fds (or nb_fds), fd_idx". > if (vsocket->is_server) { > fdset_del(&vhost_user.fdset, > vsocket->socket_fd); > close(vsocket->socket_fd); > @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list { > vhost_user_remove_reconnect(vsocket); > } > > + /* fdset_del() must be called without conn_mutex. */ > + pthread_mutex_lock(&vsocket->conn_mutex); > + for (conn = TAILQ_FIRST(&vsocket->conn_list); > + conn != NULL; > + conn = next) { > + next = TAILQ_NEXT(conn, next); > + > + del_fds[num_of_fds++] = conn->connfd; > + } > + pthread_mutex_unlock(&vsocket->conn_mutex); > + > + for (fd_index = 0; fd_index < num_of_fds; fd_index++) > + fdset_del(&vhost_user.fdset, del_fds[fd_index]); > + > pthread_mutex_lock(&vsocket->conn_mutex); > for (conn = TAILQ_FIRST(&vsocket->conn_list); > conn != NULL; > conn = next) { > next = TAILQ_NEXT(conn, next); > > - fdset_del(&vhost_user.fdset, conn->connfd); If you log the fd here and invoke fdset_del() and close() after the loop, you then could avoid one extra loop as you did above. --yliu > RTE_LOG(INFO, VHOST_CONFIG, > "free connfd = %d for device '%s'\n", > conn->connfd, path); > -- > 1.8.3.1