Hi Gaoxiang,

> -----Original Message-----
> From: dev <dev-boun...@dpdk.org> On Behalf Of Xia, Chenbo
> Sent: Monday, September 6, 2021 11:18 AM
> To: Gaoxiang Liu <gaoxiangl...@163.com>
> Cc: maxime.coque...@redhat.com; dev@dpdk.org; liugaoxi...@huawei.com
> Subject: Re: [dpdk-dev] [PATCH v6] vhost: fix crash on port deletion
>
> Hi Gaoxiang,
>
> > From: Gaoxiang Liu <gaoxiangl...@163.com>
> > Sent: Thursday, September 2, 2021 11:38 PM
> > To: Xia, Chenbo <chenbo....@intel.com>
> > Cc: maxime.coque...@redhat.com; dev@dpdk.org; liugaoxi...@huawei.com
> > Subject: Re:RE: [PATCH v6] vhost: fix crash on port deletion
> >
> > Hi Chenbo,
> > Why is this not moved up?
> > >> +	if (vsocket->is_server) {
> > >> +		close(vsocket->socket_fd);
> > >> +		unlink(path);
> > >> 	}
> > ==> Because if this were moved up, and deleting a conn fd from the fdset
> > then failed, execution would reach the "again" label and close
> > vsocket->socket_fd and unlink "path" a second time, which is unnecessary.
> > Also, closing socket_fd does not trigger vhost_user_server_new_connection.
>
> But the same issue happens when you have deleted 'vsocket->socket_fd' but
> then fail to delete one of the conn fds: you will goto again, try to delete
> socket_fd again, and loop forever. So either way you need to prevent this
> from happening.
Please ignore this. I thought deleting socket_fd again would return -1.
So I am fine with putting it later; otherwise we would have to introduce a
flag to know whether the socket_fd deletion has already happened once.

Thanks,
Chenbo

> Thanks,
> Chenbo
>
> > Thanks.
> > Gaoxiang
> >
> > At 2021-08-31 13:37:38, "Xia, Chenbo" <chenbo....@intel.com> wrote:
> > Hi Gaoxiang,
> >
> >> -----Original Message-----
> >> From: Gaoxiang Liu <gaoxiangl...@163.com>
> >> Sent: Friday, August 27, 2021 10:19 PM
> >> To: maxime.coque...@redhat.com; Xia, Chenbo <chenbo....@intel.com>
> >> Cc: dev@dpdk.org; liugaoxi...@huawei.com; Gaoxiang Liu <gaoxiangl...@163.com>
> >> Subject: [PATCH v6] vhost: fix crash on port deletion
> >>
> >> The rte_vhost_driver_unregister() and vhost_user_read_cb()
> >> can be called at the same time by 2 threads.
> >> when memory of vsocket is freed in rte_vhost_driver_unregister(),
> >> the invalid memory of vsocket is accessed in vhost_user_read_cb().
> >> It's a bug of both mode for vhost as server or client.
> >>
> >> E.g.,vhostuser port is created as server.
> >
> > Put a space after ','
> >
> >> Thread1 calls rte_vhost_driver_unregister().
> >> Before the listen fd is deleted from poll waiting fds,
> >> "vhost-events" thread then calls vhost_user_server_new_connection(),
> >> then a new conn fd is added in fdset when trying to reconnect.
> >> "vhost-events" thread then calls vhost_user_read_cb() and
> >> accesses invalid memory of socket while thread1 frees the memory of
> >> vsocket.
> >>
> >> E.g.,vhostuser port is created as client.
> >
> > Same here.
> >
> >> Thread1 calls rte_vhost_driver_unregister().
> >> Before vsocket of reconn is deleted from reconn list,
> >> "vhost_reconn" thread then calls vhost_user_add_connection()
> >> then a new conn fd is added in fdset when trying to reconnect.
> >> "vhost-events" thread then calls vhost_user_read_cb() and
> >> accesses invalid memory of socket while thread1 frees the memory of
> >> vsocket.
> >>
> >> The fix is to move the "fdset_try_del" in front of free memory of conn,
> >> then avoid the race condition.
> >>
> >> The core trace is:
> >> Program terminated with signal 11, Segmentation fault.
> >>
> >> Fixes: 52d874dc6705 ("vhost: fix crash on closing in client mode")
> >>
> >> Signed-off-by: Gaoxiang Liu <liugaoxi...@huawei.com>
> >> ---
> >>
> >> v2:
> >> * Fix coding style issues.
> >>
> >> v3:
> >> * Add detailed log.
> >>
> >> v4:
> >> * Add the reason, when vhostuser port is created as server.
> >>
> >> v5:
> >> * Add detailed log when vhostuser port is created as client
> >>
> >> v6:
> >> * Add 'path' check before deleting listen fd
> >> * Fix spelling issues
> >> ---
> >>  lib/vhost/socket.c | 108 ++++++++++++++++++++++-----------------------
> >>  1 file changed, 54 insertions(+), 54 deletions(-)
> >>
> >> diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c
> >> index 5d0d728d5..27d5e8695 100644
> >> --- a/lib/vhost/socket.c
> >> +++ b/lib/vhost/socket.c
> >> @@ -1023,66 +1023,66 @@ rte_vhost_driver_unregister(const char *path)
> >>
> >>  	for (i = 0; i < vhost_user.vsocket_cnt; i++) {
> >>  		struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
> >> +		if (strcmp(vsocket->path, path)) {
> >> +			continue;
> >> +		}
> >
> > braces {} are not necessary for single statement blocks
> >
> >> -		if (!strcmp(vsocket->path, path)) {
> >> -			pthread_mutex_lock(&vsocket->conn_mutex);
> >> -			for (conn = TAILQ_FIRST(&vsocket->conn_list);
> >> -			     conn != NULL;
> >> -			     conn = next) {
> >> -				next = TAILQ_NEXT(conn, next);
> >> -
> >> -				/*
> >> -				 * If r/wcb is executing, release vsocket's
> >> -				 * conn_mutex and vhost_user's mutex locks, and
> >> -				 * try again since the r/wcb may use the
> >> -				 * conn_mutex and mutex locks.
> >> -				 */
> >> -				if (fdset_try_del(&vhost_user.fdset,
> >> -						  conn->connfd) == -1) {
> >> -					pthread_mutex_unlock(
> >> -							&vsocket->conn_mutex);
> >> -					pthread_mutex_unlock(&vhost_user.mutex);
> >> -					goto again;
> >> -				}
> >> -
> >> -				VHOST_LOG_CONFIG(INFO,
> >> -					"free connfd = %d for device '%s'\n",
> >> -					conn->connfd, path);
> >> -				close(conn->connfd);
> >> -				vhost_destroy_device(conn->vid);
> >> -				TAILQ_REMOVE(&vsocket->conn_list, conn, next);
> >> -				free(conn);
> >> -			}
> >> -			pthread_mutex_unlock(&vsocket->conn_mutex);
> >> -
> >> -			if (vsocket->is_server) {
> >> -				/*
> >> -				 * If r/wcb is executing, release vhost_user's
> >> -				 * mutex lock, and try again since the r/wcb
> >> -				 * may use the mutex lock.
> >> -				 */
> >> -				if (fdset_try_del(&vhost_user.fdset,
> >> -						  vsocket->socket_fd) == -1) {
> >> -					pthread_mutex_unlock(&vhost_user.mutex);
> >> -					goto again;
> >> -				}
> >> -
> >> -				close(vsocket->socket_fd);
> >> -				unlink(path);
> >> -			} else if (vsocket->reconnect) {
> >> -				vhost_user_remove_reconnect(vsocket);
> >> +		if (vsocket->is_server) {
> >> +			/*
> >> +			 * If r/wcb is executing, release vhost_user's
> >> +			 * mutex lock, and try again since the r/wcb
> >> +			 * may use the mutex lock.
> >> +			 */
> >> +			if (fdset_try_del(&vhost_user.fdset, vsocket->socket_fd) == -1) {
> >> +				pthread_mutex_unlock(&vhost_user.mutex);
> >> +				goto again;
> >> 			}
> >> +		} else if (vsocket->reconnect) {
> >> +			vhost_user_remove_reconnect(vsocket);
> >> +		}
> >>
> >> -		pthread_mutex_destroy(&vsocket->conn_mutex);
> >> -		vhost_user_socket_mem_free(vsocket);
> >> +		pthread_mutex_lock(&vsocket->conn_mutex);
> >> +		for (conn = TAILQ_FIRST(&vsocket->conn_list);
> >> +		     conn != NULL;
> >> +		     conn = next) {
> >> +			next = TAILQ_NEXT(conn, next);
> >>
> >> -		count = --vhost_user.vsocket_cnt;
> >> -		vhost_user.vsockets[i] = vhost_user.vsockets[count];
> >> -		vhost_user.vsockets[count] = NULL;
> >> -		pthread_mutex_unlock(&vhost_user.mutex);
> >> +			/*
> >> +			 * If r/wcb is executing, release vsocket's
> >> +			 * conn_mutex and vhost_user's mutex locks, and
> >> +			 * try again since the r/wcb may use the
> >> +			 * conn_mutex and mutex locks.
> >> +			 */
> >> +			if (fdset_try_del(&vhost_user.fdset,
> >> +					  conn->connfd) == -1) {
> >> +				pthread_mutex_unlock(&vsocket->conn_mutex);
> >> +				pthread_mutex_unlock(&vhost_user.mutex);
> >> +				goto again;
> >> +			}
> >> +
> >> +			VHOST_LOG_CONFIG(INFO,
> >> +				"free connfd = %d for device '%s'\n",
> >> +				conn->connfd, path);
> >> +			close(conn->connfd);
> >> +			vhost_destroy_device(conn->vid);
> >> +			TAILQ_REMOVE(&vsocket->conn_list, conn, next);
> >> +			free(conn);
> >> +		}
> >> +		pthread_mutex_unlock(&vsocket->conn_mutex);
> >>
> >> -		return 0;
> >> +		if (vsocket->is_server) {
> >> +			close(vsocket->socket_fd);
> >> +			unlink(path);
> >> +		}
> >
> > I think you missed my comment on v5 asking why this is not moved up after
> > the fdset_try_del of the server socket fd.
> >
> > Thanks,
> > Chenbo
> >
> >> +
> >> +		pthread_mutex_destroy(&vsocket->conn_mutex);
> >> +		vhost_user_socket_mem_free(vsocket);
> >> +
> >> +		count = --vhost_user.vsocket_cnt;
> >> +		vhost_user.vsockets[i] = vhost_user.vsockets[count];
> >> +		vhost_user.vsockets[count] = NULL;
> >> +		pthread_mutex_unlock(&vhost_user.mutex);
> >> +		return 0;
> >>  	}
> >>  	pthread_mutex_unlock(&vhost_user.mutex);
> >>
> >> --
> >> 2.32.0