[RFC] virtio_scsi: to poll and kick the virtqueue in timeout handler
This RFC is to trigger the discussion about to poll and kick the virtqueue on purpose in virtio-scsi timeout handler. The virtio-scsi relies on the virtio vring shared between VM and host. The VM side produces requests to vring and kicks the virtqueue, while the host side produces responses to vring and interrupts the VM side. By default the virtio-scsi handler depends on the host timeout handler by BLK_EH_RESET_TIMER to give host a chance to perform EH. However, this is not helpful for the case that the responses are available on vring but the notification from host to VM is lost. Would you mind share your feedback on this idea to poll the vring on purpose in timeout handler to give VM a chance to recover from stalled IO? In addition, how about to kick for an extra time, in case the stalled IO is due to the loss of VM-to-host notification? I have tested the IO can be recovered after interrupt is lost on purpose with below debug patch. https://github.com/finallyjustice/patchset/blob/master/scsi-virtio_scsi-to-lose-an-interrupt-on-purpose.patch In addition, the virtio-blk may implement the timeout handler as well, with the helper function in below patchset from Stefan. https://lore.kernel.org/linux-block/20210520141305.355961-1-stefa...@redhat.com/T/#t Thank you very much! Dongli Zhang Considering there might be loss of interrupt or kick issue, the timeout handler may poll or kick the virtqueue on purpose, in order to recover the IO. If the response is already available on vring, it indicates the host side has already processed the request and it is not helpful by giving host a chance to perform EH. There will be syslog like below by the timeout handler: [ 135.830746] virtio_scsi: Virtio SCSI HBA 0: I/O 196 QID 3 timeout, completion polled Signed-off-by: Dongli Zhang --- drivers/scsi/virtio_scsi.c | 32 1 file changed, 32 insertions(+) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index b9c86a7e3b97..633950b6336a 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -36,6 +36,9 @@ #define VIRTIO_SCSI_EVENT_LEN 8 #define VIRTIO_SCSI_VQ_BASE 2 +static bool __read_mostly timed_out_enabled; +module_param(timed_out_enabled, bool, 0644); + /* Command queue element */ struct virtio_scsi_cmd { struct scsi_cmnd *sc; @@ -732,9 +735,38 @@ static void virtscsi_commit_rqs(struct Scsi_Host *shost, u16 hwq) * The host guarantees to respond to each command, although I/O * latencies might be higher than on bare metal. Reset the timer * unconditionally to give the host a chance to perform EH. + * + * If 'timed_out_enabled' is set, the timeout handler polls or kicks the + * virtqueue on purpose, in order to recover the IO, in case there is loss + * of interrupt or kick issue with virtio. */ static enum blk_eh_timer_return virtscsi_eh_timed_out(struct scsi_cmnd *scmnd) { + struct Scsi_Host *shost; + struct virtio_scsi *vscsi; + struct virtio_scsi_vq *req_vq; + + if (!timed_out_enabled) + return BLK_EH_RESET_TIMER; + + shost = scmnd->device->host; + vscsi = shost_priv(shost); + req_vq = virtscsi_pick_vq_mq(vscsi, scmnd); + + virtscsi_vq_done(vscsi, req_vq, virtscsi_complete_cmd); + + if (test_bit(SCMD_STATE_COMPLETE, &scmnd->state)) { + pr_warn("Virtio SCSI HBA %u: I/O %u QID %d timeout, completion polled\n", + shost->host_no, scmnd->tag, req_vq->vq->index); + return BLK_EH_DONE; + } + + /* +* To kick the virtqueue in case the timeout is due to the loss of +* one prior notification. +*/ + virtqueue_notify(req_vq->vq); + return BLK_EH_RESET_TIMER; } -- 2.17.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[RFC v3] virtio-vsock: add description for datagram type
From: "jiang.wang" Add supports for datagram type for virtio-vsock. Datagram sockets are connectionless and unreliable. To avoid contention with stream and other sockets, add two more virtqueues and a new feature bit to identify if those two new queues exist or not. Also add descriptions for resource management of datagram, which does not use the existing credit update mechanism associated with stream sockets. Signed-off-by: Jiang Wang --- V2: addressed the comments for the previous version. V3: add description for the mergeable receive buffer. virtio-vsock.tex | 137 +-- 1 file changed, 124 insertions(+), 13 deletions(-) diff --git a/virtio-vsock.tex b/virtio-vsock.tex index da7e641..7eb3596 100644 --- a/virtio-vsock.tex +++ b/virtio-vsock.tex @@ -9,14 +9,36 @@ \subsection{Device ID}\label{sec:Device Types / Socket Device / Device ID} \subsection{Virtqueues}\label{sec:Device Types / Socket Device / Virtqueues} \begin{description} -\item[0] rx -\item[1] tx +\item[0] stream rx +\item[1] stream tx +\item[2] datagram rx +\item[3] datagram tx +\item[4] event +\end{description} +The virtio socket device uses 5 queues if feature bit VIRTIO_VSOCK_F_DRGAM is set. Otherwise, it +only uses 3 queues, as the following. + +\begin{description} +\item[0] stream rx +\item[1] stream tx \item[2] event \end{description} +When behavior differs between stream and datagram rx/tx virtqueues +their full names are used. Common behavior is simply described in +terms of rx/tx virtqueues and applies to both stream and datagram +virtqueues. + \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits} -There are currently no feature bits defined for this device. +\begin{description} +\item[VIRTIO_VSOCK_F_DGRAM (0)] Device has support for datagram socket type. +\end{description} + +\begin{description} +\item[VIRTIO_VSOCK_F_MRG_RXBUF (1)] Driver can merge receive buffers. +\end{description} + \subsection{Device configuration layout}\label{sec:Device Types / Socket Device / Device configuration layout} @@ -64,6 +86,8 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op Packets transmitted or received contain a header before the payload: +If feature VIRTIO_VSOCK_F_MRG_RXBUF is not negotiated, use the following header. + \begin{lstlisting} struct virtio_vsock_hdr { le64 src_cid; @@ -79,6 +103,15 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op }; \end{lstlisting} +If feature VIRTIO_VSOCK_F_MRG_RXBUF is negotianted, use the following header. +\begin{lstlisting} +struct virtio_vsock_hdr_mrg_rxbuf { + struct virtio_vsock_hdr hdr; + le16 num_buffers; +}; +\end{lstlisting} + + The upper 32 bits of src_cid and dst_cid are reserved and zeroed. Most packets simply transfer data but control packets are also used for @@ -107,6 +140,9 @@ \subsection{Device Operation}\label{sec:Device Types / Socket Device / Device Op \subsubsection{Virtqueue Flow Control}\label{sec:Device Types / Socket Device / Device Operation / Virtqueue Flow Control} +Flow control applies to stream sockets; datagram sockets do not have +flow control. + The tx virtqueue carries packets initiated by applications and replies to received packets. The rx virtqueue carries packets initiated by the device and replies to previously transmitted packets. @@ -140,12 +176,15 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera consists of a (cid, port number) tuple. The header fields used for this are \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}. -Currently only stream sockets are supported. \field{type} is 1 for stream -socket types. +Currently stream and datagram (dgram) sockets are supported. \field{type} is 1 for stream +socket types. \field{type} is 3 for dgram socket types. Stream sockets provide in-order, guaranteed, connection-oriented delivery without message boundaries. +Datagram sockets provide unordered, unreliable, connectionless messages +with message boundaries and a maximum length. + \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management} \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of stream sockets. The guest and the device publish how much buffer space is @@ -162,7 +201,7 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / u32 peer_free = peer_buf_alloc - (tx_cnt - peer_fwd_cnt); \end{lstlisting} -If there is insufficient buffer space, the sender waits until virtqueue buffers +For stream sockets, if there is insufficient buffer space, the sender waits until virtqueue buffers are returned and checks \field{buf_alloc} and \field{fwd_cnt} again. Sending the VIRTIO_VSOCK_OP_CREDIT_REQUEST packet queries how much buffer space is available. The reply to t
Re: [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts
Hi Greg, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on fuse/for-next] [also build test WARNING on linux/master linus/master v5.13-rc2 next-20210521] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Greg-Kurz/virtiofs-propagate-sync-to-file-server/20210522-210652 base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next config: sparc64-randconfig-p002-20210522 (attached as .config) compiler: sparc64-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/ee3cc45c5a2311efc82021bd5463271507bef828 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Greg-Kurz/virtiofs-propagate-sync-to-file-server/20210522-210652 git checkout ee3cc45c5a2311efc82021bd5463271507bef828 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=sparc64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): fs/fuse/dir.c: In function 'fuse_dentry_automount': >> fs/fuse/dir.c:312:21: warning: variable 'fm' set but not used >> [-Wunused-but-set-variable] 312 | struct fuse_mount *fm; | ^~ Kconfig warnings: (for reference only) WARNING: unmet direct dependencies detected for FRAME_POINTER Depends on DEBUG_KERNEL && (M68K || UML || SUPERH) || ARCH_WANT_FRAME_POINTERS || MCOUNT Selected by - LOCKDEP && DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT && !MIPS && !PPC && !ARM && !S390 && !MICROBLAZE && !ARC && !X86 vim +/fm +312 fs/fuse/dir.c 8fab010644363f Miklos Szeredi 2018-08-15 303 bf109c64040f5b Max Reitz 2020-04-21 304 /* bf109c64040f5b Max Reitz 2020-04-21 305 * Create a fuse_mount object with a new superblock (with path->dentry bf109c64040f5b Max Reitz 2020-04-21 306 * as the root), and return that mount so it can be auto-mounted on bf109c64040f5b Max Reitz 2020-04-21 307 * @path. bf109c64040f5b Max Reitz 2020-04-21 308 */ bf109c64040f5b Max Reitz 2020-04-21 309 static struct vfsmount *fuse_dentry_automount(struct path *path) bf109c64040f5b Max Reitz 2020-04-21 310 { bf109c64040f5b Max Reitz 2020-04-21 311 struct fs_context *fsc; bf109c64040f5b Max Reitz 2020-04-21 @312 struct fuse_mount *fm; bf109c64040f5b Max Reitz 2020-04-21 313 struct vfsmount *mnt; bf109c64040f5b Max Reitz 2020-04-21 314 struct fuse_inode *mp_fi = get_fuse_inode(d_inode(path->dentry)); bf109c64040f5b Max Reitz 2020-04-21 315 int err; bf109c64040f5b Max Reitz 2020-04-21 316 bf109c64040f5b Max Reitz 2020-04-21 317 fsc = fs_context_for_submount(path->mnt->mnt_sb->s_type, path->dentry); bf109c64040f5b Max Reitz 2020-04-21 318 if (IS_ERR(fsc)) { bf109c64040f5b Max Reitz 2020-04-21 319 err = PTR_ERR(fsc); bf109c64040f5b Max Reitz 2020-04-21 320 goto out; bf109c64040f5b Max Reitz 2020-04-21 321 } bf109c64040f5b Max Reitz 2020-04-21 322 ee3cc45c5a2311 Greg Kurz 2021-05-20 323 /* ee3cc45c5a2311 Greg Kurz 2021-05-20 324* Hijack fsc->fs_private to pass the mount point inode to ee3cc45c5a2311 Greg Kurz 2021-05-20 325* fuse_get_tree_submount(). It *must* be NULLified afterwards ee3cc45c5a2311 Greg Kurz 2021-05-20 326* to avoid the inode pointer to be passed to kfree() when ee3cc45c5a2311 Greg Kurz 2021-05-20 327* the context gets freed. ee3cc45c5a2311 Greg Kurz 2021-05-20 328*/ ee3cc45c5a2311 Greg Kurz 2021-05-20 329 fsc->fs_private = mp_fi; ee3cc45c5a2311 Greg Kurz 2021-05-20 330 err = vfs_get_tree(fsc); ee3cc45c5a2311 Greg Kurz 2021-05-20 331 fsc->fs_private = NULL; ee3cc45c5a2311 Greg Kurz 2021-05-20 332 if (err) bf109c64040f5b Max Reitz 2020-04-21 333 goto out_put_fsc; bf109c64040f5b Max Reitz 2020-04-21 334 ee3cc45c5a2311 Greg Kurz 2021-05-20 335 fm = get_fuse_mount_super(fsc->root->d_sb); bf109c64040f5b Max Reitz 2020-04-21 336 bf109c64040f5b Max Reitz 2020-04-21 337 /* Create the submount */ bf109c64040f5b Max Reitz 2020-04-21 338 mnt = vfs_create_mount(fsc); bf109c64040f5b Max Reitz 2020-04-21 339 if (IS_ERR(mnt)) { bf109c64040f5b Max Reitz 2020-04-21 340 err = P
Re: [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts
Hi Greg, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on fuse/for-next] [also build test WARNING on linux/master linus/master v5.13-rc2 next-20210521] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Greg-Kurz/virtiofs-propagate-sync-to-file-server/20210522-210652 base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next config: nds32-randconfig-r011-20210522 (attached as .config) compiler: nds32le-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/ee3cc45c5a2311efc82021bd5463271507bef828 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Greg-Kurz/virtiofs-propagate-sync-to-file-server/20210522-210652 git checkout ee3cc45c5a2311efc82021bd5463271507bef828 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nds32 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): fs/fuse/dir.c: In function 'fuse_dentry_automount': >> fs/fuse/dir.c:312:21: warning: variable 'fm' set but not used >> [-Wunused-but-set-variable] 312 | struct fuse_mount *fm; | ^~ vim +/fm +312 fs/fuse/dir.c 8fab010644363f Miklos Szeredi 2018-08-15 303 bf109c64040f5b Max Reitz 2020-04-21 304 /* bf109c64040f5b Max Reitz 2020-04-21 305 * Create a fuse_mount object with a new superblock (with path->dentry bf109c64040f5b Max Reitz 2020-04-21 306 * as the root), and return that mount so it can be auto-mounted on bf109c64040f5b Max Reitz 2020-04-21 307 * @path. bf109c64040f5b Max Reitz 2020-04-21 308 */ bf109c64040f5b Max Reitz 2020-04-21 309 static struct vfsmount *fuse_dentry_automount(struct path *path) bf109c64040f5b Max Reitz 2020-04-21 310 { bf109c64040f5b Max Reitz 2020-04-21 311 struct fs_context *fsc; bf109c64040f5b Max Reitz 2020-04-21 @312 struct fuse_mount *fm; bf109c64040f5b Max Reitz 2020-04-21 313 struct vfsmount *mnt; bf109c64040f5b Max Reitz 2020-04-21 314 struct fuse_inode *mp_fi = get_fuse_inode(d_inode(path->dentry)); bf109c64040f5b Max Reitz 2020-04-21 315 int err; bf109c64040f5b Max Reitz 2020-04-21 316 bf109c64040f5b Max Reitz 2020-04-21 317 fsc = fs_context_for_submount(path->mnt->mnt_sb->s_type, path->dentry); bf109c64040f5b Max Reitz 2020-04-21 318 if (IS_ERR(fsc)) { bf109c64040f5b Max Reitz 2020-04-21 319 err = PTR_ERR(fsc); bf109c64040f5b Max Reitz 2020-04-21 320 goto out; bf109c64040f5b Max Reitz 2020-04-21 321 } bf109c64040f5b Max Reitz 2020-04-21 322 ee3cc45c5a2311 Greg Kurz 2021-05-20 323 /* ee3cc45c5a2311 Greg Kurz 2021-05-20 324* Hijack fsc->fs_private to pass the mount point inode to ee3cc45c5a2311 Greg Kurz 2021-05-20 325* fuse_get_tree_submount(). It *must* be NULLified afterwards ee3cc45c5a2311 Greg Kurz 2021-05-20 326* to avoid the inode pointer to be passed to kfree() when ee3cc45c5a2311 Greg Kurz 2021-05-20 327* the context gets freed. ee3cc45c5a2311 Greg Kurz 2021-05-20 328*/ ee3cc45c5a2311 Greg Kurz 2021-05-20 329 fsc->fs_private = mp_fi; ee3cc45c5a2311 Greg Kurz 2021-05-20 330 err = vfs_get_tree(fsc); ee3cc45c5a2311 Greg Kurz 2021-05-20 331 fsc->fs_private = NULL; ee3cc45c5a2311 Greg Kurz 2021-05-20 332 if (err) bf109c64040f5b Max Reitz 2020-04-21 333 goto out_put_fsc; bf109c64040f5b Max Reitz 2020-04-21 334 ee3cc45c5a2311 Greg Kurz 2021-05-20 335 fm = get_fuse_mount_super(fsc->root->d_sb); bf109c64040f5b Max Reitz 2020-04-21 336 bf109c64040f5b Max Reitz 2020-04-21 337 /* Create the submount */ bf109c64040f5b Max Reitz 2020-04-21 338 mnt = vfs_create_mount(fsc); bf109c64040f5b Max Reitz 2020-04-21 339 if (IS_ERR(mnt)) { bf109c64040f5b Max Reitz 2020-04-21 340 err = PTR_ERR(mnt); bf109c64040f5b Max Reitz 2020-04-21 341 goto out_put_fsc; bf109c64040f5b Max Reitz 2020-04-21 342 } bf109c64040f5b Max Reitz 2020-04-21 343 mntget(mnt); bf109c64040f5b Max Reitz 2020-04-21 344 put_fs_context(fsc); bf109c64040f5b Max Reitz 2020-04-21 345 return mnt; bf109c64040f5b Max Reitz 2020-04-21 346 bf109c64040f5b Max Reitz 2020-04-21