Hi,

Apologize for late review.

On Tue, Jan 02, 2018 at 02:08:36AM -0800, zhike wang wrote:
> From: wang zhike <wangzh...@jd.com>
> 
> v3:
> * Fix duplicate variable name, which leads to unexpected memory write.
> v2:
> * Move fdset_del before conn destroy.
> * Fix coding style.

Note that we prefer to put the change logs after "---" below Signed-off-by,
so that those change logs won't be tracked in the git log history.

> This patch fixes below race condition:
> 1. one thread calls: rte_vhost_driver_unregister->lock conn_mutex
>    ->fdset_del->loop to check fd.busy.
> 2. another thread calls fdset_event_dispatch, and the busy flag is
>    changed AFTER handling on the fd, i.e, rcb(). However, the rcb,
>    such as vhost_user_read_cb() would try to retrieve the conn_mutex.
> 
> So issue is that the 1st thread will loop check the flag while holding
> the mutex, while the 2nd thread would be blocked by mutex and can not
> change the flag. Then dead lock is observed.

I then would change the title to "vhost: fix deadlock".

I'm also keen to know how do you reproduce this issue with real-life
APP (say ovs) and how easy it is for reproduce.

> Signed-off-by: zhike wang <wangzh...@jd.com>

Again, you need fix your git config file about your name.

> ---
>  lib/librte_vhost/socket.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 422da00..ea01327 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list {
>               struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
>  
>               if (!strcmp(vsocket->path, path)) {
> +                     int del_fds[MAX_FDS];
> +                     int num_of_fds = 0, fd_index;
> +

I think the naming could be a bit shorter, like "fds, nr_fds (or nb_fds),
fd_idx".

>                       if (vsocket->is_server) {
>                               fdset_del(&vhost_user.fdset, 
> vsocket->socket_fd);
>                               close(vsocket->socket_fd);
> @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list {
>                               vhost_user_remove_reconnect(vsocket);
>                       }
>  
> +                     /* fdset_del() must be called without conn_mutex. */
> +                     pthread_mutex_lock(&vsocket->conn_mutex);
> +                     for (conn = TAILQ_FIRST(&vsocket->conn_list);
> +                          conn != NULL;
> +                          conn = next) {
> +                             next = TAILQ_NEXT(conn, next);
> +
> +                             del_fds[num_of_fds++] = conn->connfd;
> +                     }
> +                     pthread_mutex_unlock(&vsocket->conn_mutex);
> +
> +                     for (fd_index = 0; fd_index < num_of_fds; fd_index++)
> +                             fdset_del(&vhost_user.fdset, del_fds[fd_index]);
> +
>                       pthread_mutex_lock(&vsocket->conn_mutex);
>                       for (conn = TAILQ_FIRST(&vsocket->conn_list);
>                            conn != NULL;
>                            conn = next) {
>                               next = TAILQ_NEXT(conn, next);
>  
> -                             fdset_del(&vhost_user.fdset, conn->connfd);

If you log the fd here and invoke fdset_del() and close() after the loop,
you then could avoid one extra loop as you did above.

        --yliu
>                               RTE_LOG(INFO, VHOST_CONFIG,
>                                       "free connfd = %d for device '%s'\n",
>                                       conn->connfd, path);
> -- 
> 1.8.3.1

Reply via email to