Re: [PATCH 1/1] vhost scsi: fix lun reset completion handling

2020-11-18 Thread Stefan Hajnoczi
On Wed, Nov 18, 2020 at 12:24:20AM -0600, Mike Christie wrote:
> vhost scsi owns the scsi se_cmd but lio frees the se_cmd->se_tmr
> before calling release_cmd, so while with normal cmd completion we
> can access the se_cmd from the vhost work, we can't do the same with
> se_cmd->se_tmr. This has us copy the tmf response in
> vhost_scsi_queue_tm_rsp to our internal vhost-scsi tmf struct for
> when it gets sent to the guest from our worker thread.
> 
> Signed-off-by: Mike Christie 
> ---
>  drivers/vhost/scsi.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Acked-by: Stefan Hajnoczi 


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 1/1] vhost scsi: fix lun reset completion handling

2020-11-18 Thread Michael S. Tsirkin
On Wed, Nov 18, 2020 at 12:24:20AM -0600, Mike Christie wrote:
> vhost scsi owns the scsi se_cmd but lio frees the se_cmd->se_tmr
> before calling release_cmd, so while with normal cmd completion we
> can access the se_cmd from the vhost work, we can't do the same with
> se_cmd->se_tmr. This has us copy the tmf response in
> vhost_scsi_queue_tm_rsp to our internal vhost-scsi tmf struct for
> when it gets sent to the guest from our worker thread.
> 
> Signed-off-by: Mike Christie 

Is this a fix for
"vhost scsi: Add support for LUN resets"?

If so, please add a Fixes: tag.

> ---
>  drivers/vhost/scsi.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index f22fce5..6ff8a5096 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -220,6 +220,7 @@ struct vhost_scsi_tmf {
>   struct list_head queue_entry;
>  
>   struct se_cmd se_cmd;
> + u8 scsi_resp;
>   struct vhost_scsi_inflight *inflight;
>   struct iovec resp_iov;
>   int in_iovs;
> @@ -426,6 +427,7 @@ static void vhost_scsi_queue_tm_rsp(struct se_cmd *se_cmd)
>   struct vhost_scsi_tmf *tmf = container_of(se_cmd, struct vhost_scsi_tmf,
> se_cmd);
>  
> + tmf->scsi_resp = se_cmd->se_tmr_req->response;
>   transport_generic_free_cmd(&tmf->se_cmd, 0);
>  }
>  
> @@ -1183,7 +1185,7 @@ static void vhost_scsi_tmf_resp_work(struct vhost_work *work)
> vwork);
>   int resp_code;
>  
> - if (tmf->se_cmd.se_tmr_req->response == TMR_FUNCTION_COMPLETE)
> + if (tmf->scsi_resp == TMR_FUNCTION_COMPLETE)
>   resp_code = VIRTIO_SCSI_S_FUNCTION_SUCCEEDED;
>   else
>   resp_code = VIRTIO_SCSI_S_FUNCTION_REJECTED;
> -- 
> 1.8.3.1



Re: [PATCH v2 27/29] mm/memory_hotplug: extend offline_and_remove_memory() to handle more than one memory block

2020-11-18 Thread David Hildenbrand

On 18.11.20 05:53, Andrew Morton wrote:

On Thu, 12 Nov 2020 14:38:13 +0100 David Hildenbrand  wrote:


virtio-mem soon wants to use offline_and_remove_memory() for memory that
exceeds a single Linux memory block (memory_block_size_bytes()). Let's
remove that restriction.

Let's remember the old state and try to restore that if anything goes
wrong. While re-onlining can, in general, fail, it's highly unlikely to
happen (usually only when a notifier fails to allocate memory, and these
are rather rare).

This will be used by virtio-mem to offline+remove memory ranges that are
bigger than a single memory block - for example, with a device block
size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory
block size of 128 MiB.

While we could compress the state into 2 bits, using 8 bits is much
easier.

This handling is similar to, but different from, acpi_scan_try_to_offline():

a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG
optimization is still relevant - it should only apply to ZONE_NORMAL
(where we have no guarantees). If relevant, we can always add it.

b) acpi_scan_try_to_offline() simply onlines all memory in case
something goes wrong. It doesn't restore previous online type. Let's do
that, so we won't overwrite what e.g., user space configured.

...



uint8_t is a bit of a mouthful.  u8 is less typing ;)  Doesn't matter.


In case I have to resend, I'll change it :)



Acked-by: Andrew Morton 


Thanks!


--
Thanks,

David / dhildenb



Re: [PATCH net] vhost_vdpa: Return -EFUALT if copy_from_user() fails

2020-11-18 Thread Michael S. Tsirkin
On Wed, Nov 18, 2020 at 02:08:17PM +0800, Jason Wang wrote:
> 
> On 2020/10/26 10:59 AM, Jason Wang wrote:
> > 
> > On 2020/10/23 11:34 PM, Michael S. Tsirkin wrote:
> > > On Fri, Oct 23, 2020 at 03:08:53PM +0300, Dan Carpenter wrote:
> > > > The copy_to/from_user() functions return the number of bytes which we
> > > > weren't able to copy, but the ioctl should return -EFAULT if they fail.
> > > > 
> > > > Fixes: a127c5bbb6a8 ("vhost-vdpa: fix backend feature ioctls")
> > > > Signed-off-by: Dan Carpenter 
> > > Acked-by: Michael S. Tsirkin 
> > > Needed for stable I guess.
> > 
> > 
> > Agree.
> > 
> > Acked-by: Jason Wang 
> 
> 
> Hi Michael.
> 
> I don't see this in your tree, please consider merging it.
> 
> Thanks
> 

I do see it there:

commit 7922460e33c81f41e0d2421417228b32e6fdbe94
Author: Dan Carpenter 
Date:   Fri Oct 23 15:08:53 2020 +0300

vhost_vdpa: Return -EFAULT if copy_from_user() fails

The reason you can't find it is probably that I fixed up
a typo in the subject.


-- 
MST


Re: [PATCH v2 00/29] virtio-mem: Big Block Mode (BBM)

2020-11-18 Thread Michael S. Tsirkin
On Thu, Nov 12, 2020 at 02:37:46PM +0100, David Hildenbrand wrote:
> @Andrew, can we have an ack for patch #27, so that one can go via
> the vhost tree for 5.11?

OK, we got an ack, I'll put it in next now.
Thanks!

> ---
> 
> virtio-mem currently only supports device block sizes that span at most
> a single Linux memory block. For example, gigantic pages in the hypervisor
> result on x86-64 in a device block size of 1 GiB - when the Linux memory
> block size is 128 MiB, we cannot support such devices (we fail loading the
> driver). Of course, we want to support any device block size in any Linux
> VM.
> 
> Bigger device block sizes will become especially important once supporting
> VFIO in QEMU - each device block has to be mapped separately, and the
> maximum number of mappings for VFIO is 64k. So we usually want blocks in
> the gigabyte range when we want to grow the VM large.
> 
> Patch #1 - #10 are cleanups and optimizations
> Patch #11 - #24 are refactorings to prepare for BBM
> Patch #25 - #29 implement BBM, including one mm/memory_hotplug extension
> 
> This series is based on latest linus/master and can be found at:
>  git@github.com:davidhildenbrand/linux.git virtio-mem-bbm-v2
> 
> v1 -> v2:
> - Code-wise, the only bigger change is using a union for the sbm/bbm state
> - Reworked some subjects/patch descriptions
> - Reshuffled patches to make reviewing easier, and to have
>   cleanups+optimizations before all refactorings
> - "virtio-mem: more precise calculation in
>virtio_mem_mb_state_prepare_next_mb()"
> -- Changed subject
> -- Avoid two local variables
> 
> David Hildenbrand (29):
>   virtio-mem: determine nid only once using memory_add_physaddr_to_nid()
>   virtio-mem: more precise calculation in
> virtio_mem_mb_state_prepare_next_mb()
>   virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handling
>   virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add()
>   virtio-mem: use "unsigned long" for nr_pages when fake
> onlining/offlining
>   virtio-mem: factor out calculation of the bit number within the
> subblock bitmap
>   virtio-mem: print debug messages from virtio_mem_send_*_request()
>   virtio-mem: factor out fake-offlining into virtio_mem_fake_offline()
>   virtio-mem: factor out handling of fake-offline pages in memory
> notifier
>   virtio-mem: retry fake-offlining via alloc_contig_range() on
> ZONE_MOVABLE
>   virtio-mem: generalize check for added memory
>   virtio-mem: generalize virtio_mem_owned_mb()
>   virtio-mem: generalize virtio_mem_overlaps_range()
>   virtio-mem: drop last_mb_id
>   virtio-mem: don't always trigger the workqueue when offlining memory
>   virtio-mem: generalize handling when memory is getting onlined
> deferred
>   virito-mem: document Sub Block Mode (SBM)
>   virtio-mem: memory block states are specific to Sub Block Mode (SBM)
>   virito-mem: subblock states are specific to Sub Block Mode (SBM)
>   virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block
> Mode (SBM)
>   virtio-mem: memory block ids are specific to Sub Block Mode (SBM)
>   virito-mem: existing (un)plug functions are specific to Sub Block Mode
> (SBM)
>   virtio-mem: memory notifier callbacks are specific to Sub Block Mode
> (SBM)
>   virtio-mem: factor out adding/removing memory from Linux
>   virtio-mem: Big Block Mode (BBM) memory hotplug
>   virtio-mem: allow to force Big Block Mode (BBM) and set the big block
> size
>   mm/memory_hotplug: extend offline_and_remove_memory() to handle more
> than one memory block
>   virtio-mem: Big Block Mode (BBM) - basic memory hotunplug
>   virtio-mem: Big Block Mode (BBM) - safe memory hotunplug
> 
>  drivers/virtio/virtio_mem.c | 1789 +--
>  mm/memory_hotplug.c |  105 +-
>  2 files changed, 1376 insertions(+), 518 deletions(-)
> 
> -- 
> 2.26.2



Re: [PATCH net] vhost_vdpa: Return -EFUALT if copy_from_user() fails

2020-11-18 Thread Jason Wang


On 2020/11/18 4:59 PM, Michael S. Tsirkin wrote:

On Wed, Nov 18, 2020 at 02:08:17PM +0800, Jason Wang wrote:

On 2020/10/26 10:59 AM, Jason Wang wrote:

On 2020/10/23 11:34 PM, Michael S. Tsirkin wrote:

On Fri, Oct 23, 2020 at 03:08:53PM +0300, Dan Carpenter wrote:

The copy_to/from_user() functions return the number of bytes which we
weren't able to copy, but the ioctl should return -EFAULT if they fail.

Fixes: a127c5bbb6a8 ("vhost-vdpa: fix backend feature ioctls")
Signed-off-by: Dan Carpenter 

Acked-by: Michael S. Tsirkin 
Needed for stable I guess.


Agree.

Acked-by: Jason Wang 


Hi Michael.

I don't see this in your tree, please consider merging it.

Thanks


I do see it there:

commit 7922460e33c81f41e0d2421417228b32e6fdbe94
Author: Dan Carpenter 
Date:   Fri Oct 23 15:08:53 2020 +0300

 vhost_vdpa: Return -EFAULT if copy_from_user() fails
 
The reason you can't find it is probably that I fixed up
a typo in the subject.



I see that.

Thanks








Re: [PATCH 00/10] vhost/qemu: thread per IO SCSI vq

2020-11-18 Thread Michael S. Tsirkin
On Tue, Nov 17, 2020 at 01:13:14PM -0600, Mike Christie wrote:
> On 11/17/20 10:40 AM, Stefan Hajnoczi wrote:
> > On Thu, Nov 12, 2020 at 05:18:59PM -0600, Mike Christie wrote:
> >> The following kernel patches were made over Michael's vhost branch:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=vhost
> >>
> >> and the vhost-scsi bug fix patchset:
> >>
> >> https://lore.kernel.org/linux-scsi/20201112170008.GB1555653@stefanha-x1.localdomain/T/#t
> >>
> >> And the qemu patch was made over the qemu master branch.
> >>
> >> vhost-scsi currently supports multiple queues with the num_queues
> >> setting, but we end up with a setup where the guest's scsi/block
> >> layer can do a queue per vCPU and the layers below vhost can do
> >> a queue per CPU. vhost-scsi will then do a num_queue virtqueues,
> >> but all IO gets sent to and completed on a single vhost-scsi thread.
> >> After 2 - 4 vqs this becomes a bottleneck.
> >>
> >> This patchset allows us to create a worker thread per IO vq, so we
> >> can better utilize multiple CPUs with the multiple queues. It
> >> implements Jason's suggestion to create the initial worker like
> >> normal, then create the extra workers for IO vqs with the
> >> VHOST_SET_VRING_ENABLE ioctl command added in this patchset.
> > 
> > How does userspace find out the tids and set their CPU affinity?
> > 
> 
> When we create the worker thread we add it to the device owner's cgroup,
> so we end up inheriting those settings like affinity.
> 
> However, are you more asking about finer control, e.g. if the guest is
> doing mq and the mq hw queue is bound to cpu0, would it perform
> better if we could bind the vhost vq's worker thread to cpu0? I think the
> problem might be that if you are in the cgroup then we can't set a specific
> thread's CPU affinity to just one specific CPU. So you can either do
> cgroups or not.

Something we wanted to try for a while is to allow userspace
to create threads for us, then specify which vqs it processes.

That would address this set of concerns ...

-- 
MST



Re: [PATCH virtio] virtio: virtio_console: fix DMA memory allocation for rproc serial

2020-11-18 Thread Michael S. Tsirkin
On Tue, Nov 17, 2020 at 02:02:30PM +, Christoph Hellwig wrote:
> On Tue, Nov 17, 2020 at 03:00:32PM +0100, Arnaud POULIQUEN wrote:
> > The dma_declare_coherent_memory() call allows associating the vdev0buffer
> > memory region with the remoteproc virtio device (the vdev parent). This
> > region is used to allocate the rpmsg buffers.
> > The memory for the rpmsg buffers is allocated by the rpmsg_virtio device in
> > rpmsg_virtio_bus [1]. The size depends on the total size needed for the
> > rpmsg buffers.
> > 
> > The vrings are allocated directly by the remoteproc device.
> 
> Weird.  I thought virtio was pretty strict in not allowing direct DMA
> API usage in drivers to support the legacy no-mapping case.

Well, remoteproc is weird in that its use of the DMA API predates
standardization efforts, and it was never standardized in the virtio
spec...

> Either way, the point stands:  if you want these magic buffers handed
> out to specific rpmsg instances, I think not having to detour through the
> DMA API is going to make everyone's life easier.



Re: [PATCH RFC 04/12] vdpa: add vdpa simulator for block device

2020-11-18 Thread Stefan Hajnoczi
On Tue, Nov 17, 2020 at 06:38:11PM +0100, Stefano Garzarella wrote:
> On Tue, Nov 17, 2020 at 04:43:42PM +, Stefan Hajnoczi wrote:
> > On Tue, Nov 17, 2020 at 03:16:20PM +0100, Stefano Garzarella wrote:
> > > On Tue, Nov 17, 2020 at 11:11:21AM +, Stefan Hajnoczi wrote:
> > > > On Fri, Nov 13, 2020 at 02:47:04PM +0100, Stefano Garzarella wrote:
> > > > > +static void vdpasim_blk_work(struct work_struct *work)
> > > > > +{
> > > > > + struct vdpasim *vdpasim = container_of(work, struct vdpasim, 
> > > > > work);
> > > > > + u8 status = VIRTIO_BLK_S_OK;
> > > > > + int i;
> > > > > +
> > > > > + spin_lock(&vdpasim->lock);
> > > > > +
> > > > > + if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
> > > > > + goto out;
> > > > > +
> > > > > + for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) {
> > > > > + struct vdpasim_virtqueue *vq = &vdpasim->vqs[i];
> > > > > +
> > > > > + if (!vq->ready)
> > > > > + continue;
> > > > > +
> > > > > + while (vringh_getdesc_iotlb(&vq->vring, &vq->iov, &vq->iov,
> > > > > + &vq->head, GFP_ATOMIC) > 0) {
> > > > > +
> > > > > + int write;
> > > > > +
> > > > > + vq->iov.i = vq->iov.used - 1;
> > > > > + write = vringh_iov_push_iotlb(&vq->vring, &vq->iov, &status, 1);
> > > > > + if (write <= 0)
> > > > > + break;
> > > >
> > > > We're lucky the guest driver doesn't crash after VIRTIO_BLK_T_GET_ID? :)
> > > 
> > > The crash could happen if the simulator doesn't put the string terminator,
> > > but in virtio_blk.c, serial_show() initializes the buffer, putting the
> > > string terminator at the VIRTIO_BLK_ID_BYTES element:
> > > 
> > > buf[VIRTIO_BLK_ID_BYTES] = '\0';
> > > err = virtblk_get_id(disk, buf);
> > > 
> > > This should prevent the issue, right?
> > > 
> > > However in the last patch of this series I implemented VIRTIO_BLK_T_GET_ID
> > > support :-)
> > 
> > Windows, BSD, macOS, etc guest drivers aren't necessarily going to
> > terminate or initialize the serial string buffer.
> 
> Unfortunately I discovered that VIRTIO_BLK_T_GET_ID is not in the VIRTIO
> specs, so, just for curiosity, I checked the QEMU code and I found this:
> 
> case VIRTIO_BLK_T_GET_ID:
> {
> /*
>  * NB: per existing s/n string convention the string is
>  * terminated by '\0' only when shorter than buffer.
>  */
> const char *serial = s->conf.serial ? s->conf.serial : "";
> size_t size = MIN(strlen(serial) + 1,
>   MIN(iov_size(in_iov, in_num),
>   VIRTIO_BLK_ID_BYTES));
> iov_from_buf(in_iov, in_num, 0, serial, size);
> virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
> virtio_blk_free_request(req);
> break;
> }
> 
> It seems that the device emulation in QEMU expects that the driver
> terminates the serial string buffer.
> 
> Do you know why VIRTIO_BLK_T_GET_ID is not in the specs?
> Should we add it?

It's about to be merged into the VIRTIO spec:
https://github.com/oasis-tcs/virtio-spec/issues/63

Stefan



Re: [PATCH 00/10] vhost/qemu: thread per IO SCSI vq

2020-11-18 Thread Stefan Hajnoczi
On Tue, Nov 17, 2020 at 01:13:14PM -0600, Mike Christie wrote:
> On 11/17/20 10:40 AM, Stefan Hajnoczi wrote:
> > On Thu, Nov 12, 2020 at 05:18:59PM -0600, Mike Christie wrote:
> >> The following kernel patches were made over Michael's vhost branch:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=vhost
> >>
> >> and the vhost-scsi bug fix patchset:
> >>
> >> https://lore.kernel.org/linux-scsi/20201112170008.GB1555653@stefanha-x1.localdomain/T/#t
> >>
> >> And the qemu patch was made over the qemu master branch.
> >>
> >> vhost-scsi currently supports multiple queues with the num_queues
> >> setting, but we end up with a setup where the guest's scsi/block
> >> layer can do a queue per vCPU and the layers below vhost can do
> >> a queue per CPU. vhost-scsi will then do a num_queue virtqueues,
> >> but all IO gets sent to and completed on a single vhost-scsi thread.
> >> After 2 - 4 vqs this becomes a bottleneck.
> >>
> >> This patchset allows us to create a worker thread per IO vq, so we
> >> can better utilize multiple CPUs with the multiple queues. It
> >> implements Jason's suggestion to create the initial worker like
> >> normal, then create the extra workers for IO vqs with the
> >> VHOST_SET_VRING_ENABLE ioctl command added in this patchset.
> > 
> > How does userspace find out the tids and set their CPU affinity?
> > 
> 
> When we create the worker thread we add it to the device owner's cgroup,
> so we end up inheriting those settings like affinity.
> 
> However, are you more asking about finer control, e.g. if the guest is
> doing mq and the mq hw queue is bound to cpu0, would it perform
> better if we could bind the vhost vq's worker thread to cpu0? I think the
> problem might be that if you are in the cgroup then we can't set a specific
> thread's CPU affinity to just one specific CPU. So you can either do
> cgroups or not.
> 
> 
> > What is the meaning of the new VHOST_SET_VRING_ENABLE ioctl? It doesn't
> > really "enable" or "disable" the vq, requests are processed regardless.
> > 
> 
> Yeah, I agree. The problem I've mentioned before is:
> 
> 1. For net and vsock, it's not useful because the vqs are hard coded in
> the kernel and userspace, so you can't disable a vq and you never need
> to enable one.
> 
> 2. vdpa has its own enable ioctl.
> 
> 3. For scsi, because we already are doing multiple vqs based on the
> num_queues value, we have to have some sort of compat support and
> code to detect if userspace is even going to send the new ioctl.
> In this patchset, compat just meant enable/disable the extra functionality
> of extra worker threads for a vq. We will still use the vq if
> userspace set it up.
> 
> 
> > The purpose of the ioctl isn't clear to me because the kernel could
> > automatically create 1 thread per vq without a new ioctl. On the other
> > hand, if userspace is supposed to control worker threads then a
> > different interface would be more powerful:
> > 

The main request I have is to clearly define the meaning of the
VHOST_SET_VRING_ENABLE ioctl. If you want to keep it as-is for now and
the vhost maintainers are happy with that, that's okay. It should just
be documented so that userspace and other vhost driver authors
understand what it's supposed to do.

> My preference has been:
> 
> 1. If we were to ditch cgroups, then add a new interface that would allow
> us to bind threads to a specific CPU, so that it lines up with the guest's
> mq to CPU mapping.

A 1:1 vCPU/vq->CPU mapping isn't desirable in all cases.

The CPU affinity is a userspace policy decision. The host kernel should
provide a mechanism but not the policy. That way userspace can decide
which workers are shared by multiple vqs and on which physical CPUs they
should run.

Stefan



Re: [PATCH RFC 04/12] vdpa: add vdpa simulator for block device

2020-11-18 Thread Stefano Garzarella

On Wed, Nov 18, 2020 at 11:23:55AM +, Stefan Hajnoczi wrote:

On Tue, Nov 17, 2020 at 06:38:11PM +0100, Stefano Garzarella wrote:

On Tue, Nov 17, 2020 at 04:43:42PM +, Stefan Hajnoczi wrote:
> On Tue, Nov 17, 2020 at 03:16:20PM +0100, Stefano Garzarella wrote:
> > On Tue, Nov 17, 2020 at 11:11:21AM +, Stefan Hajnoczi wrote:
> > > On Fri, Nov 13, 2020 at 02:47:04PM +0100, Stefano Garzarella wrote:
> > > > +static void vdpasim_blk_work(struct work_struct *work)
> > > > +{
> > > > +   struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
> > > > +   u8 status = VIRTIO_BLK_S_OK;
> > > > +   int i;
> > > > +
> > > > +   spin_lock(&vdpasim->lock);
> > > > +
> > > > +   if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
> > > > +   goto out;
> > > > +
> > > > +   for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) {
> > > > +   struct vdpasim_virtqueue *vq = &vdpasim->vqs[i];
> > > > +
> > > > +   if (!vq->ready)
> > > > +   continue;
> > > > +
> > > > +   while (vringh_getdesc_iotlb(&vq->vring, &vq->iov, &vq->iov,
> > > > +   &vq->head, GFP_ATOMIC) > 0) {
> > > > +
> > > > +   int write;
> > > > +
> > > > +   vq->iov.i = vq->iov.used - 1;
> > > > +   write = vringh_iov_push_iotlb(&vq->vring, &vq->iov, &status, 1);
> > > > +   if (write <= 0)
> > > > +   break;
> > >
> > > We're lucky the guest driver doesn't crash after VIRTIO_BLK_T_GET_ID? :)
> >
> > The crash could happen if the simulator doesn't put the string terminator,
> > but in virtio_blk.c, serial_show() initializes the buffer, putting the
> > string terminator at the VIRTIO_BLK_ID_BYTES element:
> >
> > buf[VIRTIO_BLK_ID_BYTES] = '\0';
> > err = virtblk_get_id(disk, buf);
> >
> > This should prevent the issue, right?
> >
> > However in the last patch of this series I implemented VIRTIO_BLK_T_GET_ID
> > support :-)
>
> Windows, BSD, macOS, etc guest drivers aren't necessarily going to
> terminate or initialize the serial string buffer.

Unfortunately I discovered that VIRTIO_BLK_T_GET_ID is not in the VIRTIO
specs, so, just for curiosity, I checked the QEMU code and I found this:

case VIRTIO_BLK_T_GET_ID:
{
/*
 * NB: per existing s/n string convention the string is
 * terminated by '\0' only when shorter than buffer.
 */
const char *serial = s->conf.serial ? s->conf.serial : "";
size_t size = MIN(strlen(serial) + 1,
  MIN(iov_size(in_iov, in_num),
  VIRTIO_BLK_ID_BYTES));
iov_from_buf(in_iov, in_num, 0, serial, size);
virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
virtio_blk_free_request(req);
break;
}

It seems that the device emulation in QEMU expects that the driver
terminates the serial string buffer.

Do you know why VIRTIO_BLK_T_GET_ID is not in the specs?
Should we add it?


It's about to be merged into the VIRTIO spec:
https://github.com/oasis-tcs/virtio-spec/issues/63



Great! Thanks for the link!

Stefano



Re: [PATCH RFC 02/12] vdpa: split vdpasim to core and net modules

2020-11-18 Thread Stefano Garzarella

Hi Jason,
I just discovered that I missed the other questions in this email,
sorry for that!

On Mon, Nov 16, 2020 at 12:00:11PM +0800, Jason Wang wrote:


On 2020/11/13 9:47 PM, Stefano Garzarella wrote:

From: Max Gurtovoy 

Introduce new vdpa_sim_net and vdpa_sim (core) drivers. This is a
preparation for adding a vdpa simulator module for block devices.

Signed-off-by: Max Gurtovoy 
[sgarzare: various cleanups/fixes]
Signed-off-by: Stefano Garzarella 
---
v1:
- Removed unused headers
- Removed empty module_init() module_exit()
- Moved vdpasim_is_little_endian() in vdpa_sim.h
- Moved vdpasim16_to_cpu/cpu_to_vdpasim16() in vdpa_sim.h
- Added vdpasim*_to_cpu/cpu_to_vdpasim*() also for 32 and 64
- Replaced 'select VDPA_SIM' with 'depends on VDPA_SIM' since selected
  option can not depend on other [Jason]



If possible, I would suggest to split this patch further:

1) convert to use void *config, and an attribute for setting config 
size during allocation

2) introduce supported_features
3) other attributes (#vqs)
4) rename config ops (more generic one)
5) introduce ops for set|get_config, set_get_features
6) real split




[...]


-static const struct vdpa_config_ops vdpasim_net_config_ops;
-static const struct vdpa_config_ops vdpasim_net_batch_config_ops;
+static const struct vdpa_config_ops vdpasim_config_ops;
+static const struct vdpa_config_ops vdpasim_batch_config_ops;
-static struct vdpasim *vdpasim_create(void)
+struct vdpasim *vdpasim_create(struct vdpasim_init_attr *attr)
 {
const struct vdpa_config_ops *ops;
struct vdpasim *vdpasim;
+   u32 device_id;
struct device *dev;
-   int ret = -ENOMEM;
+   int i, size, ret = -ENOMEM;
-   if (batch_mapping)
-   ops = &vdpasim_net_batch_config_ops;
+   device_id = attr->device_id;
+   /* Currently, we only accept the network and block devices. */
+   if (device_id != VIRTIO_ID_NET && device_id != VIRTIO_ID_BLOCK)
+   return ERR_PTR(-EOPNOTSUPP);
+
+   if (attr->batch_mapping)
+   ops = &vdpasim_batch_config_ops;
else
-   ops = &vdpasim_net_config_ops;
+   ops = &vdpasim_config_ops;
vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops, 
VDPASIM_VQ_NUM);
if (!vdpasim)
goto err_alloc;
-   INIT_WORK(&vdpasim->work, vdpasim_work);
+   if (device_id == VIRTIO_ID_NET)
+   size = sizeof(struct virtio_net_config);
+   else
+   size = sizeof(struct virtio_blk_config);



It's better to avoid such if/else, considering we may introduce more
types of devices.


Can we have an attribute of config size instead?


Yes, I'll move the patch 7 before this.

About config size and the set/get_config ops, I'm not sure if it is better
to hide everything under the new set/get_config ops, allocating the
config structure in each device, or to leave the allocation in the core
and update it as we do now.






+   vdpasim->config = kzalloc(size, GFP_KERNEL);
+   if (!vdpasim->config)
+   goto err_iommu;
+
+   vdpasim->device_id = device_id;
+   vdpasim->supported_features = attr->features;
+   INIT_WORK(&vdpasim->work, attr->work_fn);
spin_lock_init(&vdpasim->lock);
spin_lock_init(&vdpasim->iommu_lock);
@@ -379,23 +231,10 @@ static struct vdpasim *vdpasim_create(void)
if (!vdpasim->buffer)
goto err_iommu;
-   if (macaddr) {
-   mac_pton(macaddr, vdpasim->config.mac);
-   if (!is_valid_ether_addr(vdpasim->config.mac)) {
-   ret = -EADDRNOTAVAIL;
-   goto err_iommu;
-   }
-   } else {
-   eth_random_addr(vdpasim->config.mac);
-   }
-
-   vringh_set_iotlb(&vdpasim->vqs[0].vring, vdpasim->iommu);
-   vringh_set_iotlb(&vdpasim->vqs[1].vring, vdpasim->iommu);
+   for (i = 0; i < VDPASIM_VQ_NUM; i++)
+		vringh_set_iotlb(&vdpasim->vqs[i].vring, vdpasim->iommu);



And an attribute of #vqs here.


Yes.





vdpasim->vdpa.dma_dev = dev;
-   ret = vdpa_register_device(&vdpasim->vdpa);
-   if (ret)
-   goto err_iommu;
return vdpasim;
@@ -404,6 +243,7 @@ static struct vdpasim *vdpasim_create(void)
 err_alloc:
return ERR_PTR(ret);
 }
+EXPORT_SYMBOL_GPL(vdpasim_create);
 static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
  u64 desc_area, u64 driver_area,
@@ -498,28 +338,34 @@ static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa)
 static u64 vdpasim_get_features(struct vdpa_device *vdpa)
 {
-   return vdpasim_features;
+   struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+   return vdpasim->supported_features;
 }
 static int vdpasim_set_features(struct vdpa_device *vdpa, u64 features)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
-   struct virtio_net_config *config = &vdpasim->config;
/* DMA mapping m

Re: netconsole deadlock with virtnet

2020-11-18 Thread Steven Rostedt


[ Adding netdev as perhaps someone there knows ]

On Wed, 18 Nov 2020 12:09:59 +0800
Jason Wang  wrote:

> > This CPU0 lock(_xmit_ETHER#2) -> hard IRQ -> lock(console_owner) is
> > basically
> > soft IRQ -> lock(_xmit_ETHER#2) -> hard IRQ -> printk()
> >
> > Then CPU1 spins on xmit, which is owned by CPU0, CPU0 spins on
> > console_owner, which is owned by CPU1?  

It still looks to me that target_list_lock is taken in IRQ context (which
can be the case because printk calls write_msg(), which takes that lock).
And someplace there's a:

lock(target_list_lock)
lock(xmit_lock)

which means you can remove the console lock from this scenario completely,
and you still have a possible deadlock between target_list_lock and
xmit_lock.

> 
> 
> If this is true, it looks like it is not a virtio-net-specific issue but
> something somewhere else.
> 
> I think all network drivers will synchronize through bh instead of hardirq.

I think the issue is where target_list_lock is held when we take xmit_lock.
Is there anywhere in netconsole.c that can end up taking xmit_lock while
holding the target_list_lock? If so, that's the problem. As
target_list_lock is something that can be taken in IRQ context, which means
*any* other lock that is taking while holding the target_list_lock must
also protect against interrupts from happening while it they are held.

-- Steve


Re: [PATCH 00/10] vhost/qemu: thread per IO SCSI vq

2020-11-18 Thread Jason Wang


On 2020/11/19 4:06 AM, Mike Christie wrote:

On 11/18/20 1:54 AM, Jason Wang wrote:


On 2020/11/18 2:57 PM, Mike Christie wrote:

On 11/17/20 11:17 PM, Jason Wang wrote:

On 2020/11/18 12:40 AM, Stefan Hajnoczi wrote:

On Thu, Nov 12, 2020 at 05:18:59PM -0600, Mike Christie wrote:

The following kernel patches were made over Michael's vhost branch:

https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=vhost


and the vhost-scsi bug fix patchset:

https://lore.kernel.org/linux-scsi/20201112170008.GB1555653@stefanha-x1.localdomain/T/#t


And the qemu patch was made over the qemu master branch.

vhost-scsi currently supports multiple queues with the num_queues
setting, but we end up with a setup where the guest's scsi/block
layer can do a queue per vCPU and the layers below vhost can do
a queue per CPU. vhost-scsi will then do a num_queue virtqueues,
but all IO gets sent to and completed on a single vhost-scsi thread.
After 2 - 4 vqs this becomes a bottleneck.

This patchset allows us to create a worker thread per IO vq, so we
can better utilize multiple CPUs with the multiple queues. It
implements Jason's suggestion to create the initial worker like
normal, then create the extra workers for IO vqs with the
VHOST_SET_VRING_ENABLE ioctl command added in this patchset.

How does userspace find out the tids and set their CPU affinity?

What is the meaning of the new VHOST_SET_VRING_ENABLE ioctl? It 
doesn't
really "enable" or "disable" the vq, requests are processed 
regardless.


Actually I think it should do the real "enable/disable" that tries 
to follow the virtio spec.



What does real mean here?



I think it means when a vq is disabled, vhost won't process any 
request from that virtqueue.




For the vdpa enable call, for example, would it be like
ifcvf_vdpa_set_vq_ready, where it sets the ready bit, or more like
mlx5_vdpa_set_vq_ready, where it can do some more work in the disable
case?



For vDPA, it would be more complicated.

E.g. for IFCVF, it just delays the setting of queue_enable until it gets 
DRIVER_OK. Technically it could pass queue_enable through to the 
hardware, as mlx5e did.





For net, and something like ifcvf_vdpa_set_vq_ready's design, would we 
have vhost_ring_ioctl() set some vhost_virtqueue enable bit? We would 
then have some helper vhost_vq_is_enabled() and some code to detect 
whether userspace supports the new ioctl.



Yes, vhost supports backend capabilities. When userspace negotiates the 
new capability, we should depend on SET_VRING_ENABLE; if not, we can 
fall back to vhost_vq_is_enabled().



And then in vhost_net_set_backend do we call vhost_vq_is_enabled()? What 
is done for disable then?



It needs more thought, but the question is not specific to 
SET_VRING_ENABLE. Consider that the guest may zero the ring address as 
well.


For disabling, we can simply flush the work and disable all the polls.



It doesn't seem to buy a lot of new functionality. Is it just
so we follow the spec?



My understanding is that, since the spec defines queue_enable, we should 
support it in vhost. And we can piggyback the delayed vq creation on 
this feature. Otherwise we will duplicate the function if we want to 
support queue_enable.



I had actually given up on the delayed vq creation goal. I'm still not 
sure how it's related to ENABLE and I think it gets pretty gross.


1. If we started from a semi-clean slate, and used the ENABLE ioctl 
more like a CREATE ioctl, and did the ENABLE after vhost dev open() 
but before any other ioctls, we can allocate the vq when we get the 
ENABLE ioctl. This fixes the issue where vhost scsi is allocating 128 
vqs at open() time. We can then allocate metadata like the iovecs at 
ENABLE time or when we get a setup ioctl that is related to the 
metadata, so it fixes that too.


That makes sense how ENABLE is related to delayed vq allocation and 
why we would want it.


If we now need to support old tools though, then you lose me. To try 
and keep the code paths using the same code, then at vhost dev open() 
time do we start vhost_dev_init with zero vqs like with the allocate 
at ENABLE time case? Then when we get the first vring or dev ioctl, do 
we allocate the vq and related metadata? If so, the ENABLE does not 
buy us a lot since we get the delayed allocation from the compat code. 
Also this compat case gets really messy when we are delaying the 
actual vq and not just the metadata.


If for the compat case, we keep the code that before/during 
vhost_dev_init allocates all the vqs and does the initialization, then 
we end up with 2 very very different code paths. And we also need a 
new modparam or something to tell the drivers to do the old or new 
open() behavior.



Right, so I think maybe we can take a step back. Instead of depending on 
exp

Re: [PATCH RFC 02/12] vdpa: split vdpasim to core and net modules

2020-11-18 Thread Jason Wang


On 2020/11/18 9:14 PM, Stefano Garzarella wrote:

Hi Jason,
I just discovered that I missed the other questions in this email,
sorry for that!



No problem :)




On Mon, Nov 16, 2020 at 12:00:11PM +0800, Jason Wang wrote:


On 2020/11/13 9:47 PM, Stefano Garzarella wrote:

From: Max Gurtovoy 

Introduce new vdpa_sim_net and vdpa_sim (core) drivers. This is a
preparation for adding a vdpa simulator module for block devices.

Signed-off-by: Max Gurtovoy 
[sgarzare: various cleanups/fixes]
Signed-off-by: Stefano Garzarella 
---
v1:
- Removed unused headers
- Removed empty module_init() module_exit()
- Moved vdpasim_is_little_endian() in vdpa_sim.h
- Moved vdpasim16_to_cpu/cpu_to_vdpasim16() in vdpa_sim.h
- Added vdpasim*_to_cpu/cpu_to_vdpasim*() also for 32 and 64
- Replaced 'select VDPA_SIM' with 'depends on VDPA_SIM' since a selected
  option cannot depend on others [Jason]



If possible, I would suggest to split this patch further:

1) convert to use void *config, and an attribute for setting config 
size during allocation

2) introduce supported_features
3) other attributes (#vqs)
4) rename config ops (more generic one)
5) introduce ops for set|get_config, set_get_features
6) real split




[...]


-static const struct vdpa_config_ops vdpasim_net_config_ops;
-static const struct vdpa_config_ops vdpasim_net_batch_config_ops;
+static const struct vdpa_config_ops vdpasim_config_ops;
+static const struct vdpa_config_ops vdpasim_batch_config_ops;
-static struct vdpasim *vdpasim_create(void)
+struct vdpasim *vdpasim_create(struct vdpasim_init_attr *attr)
 {
 const struct vdpa_config_ops *ops;
 struct vdpasim *vdpasim;
+    u32 device_id;
 struct device *dev;
-    int ret = -ENOMEM;
+    int i, size, ret = -ENOMEM;
-    if (batch_mapping)
-    ops = &vdpasim_net_batch_config_ops;
+    device_id = attr->device_id;
+    /* Currently, we only accept the network and block devices. */
+    if (device_id != VIRTIO_ID_NET && device_id != VIRTIO_ID_BLOCK)
+    return ERR_PTR(-EOPNOTSUPP);
+
+    if (attr->batch_mapping)
+    ops = &vdpasim_batch_config_ops;
 else
-    ops = &vdpasim_net_config_ops;
+    ops = &vdpasim_config_ops;
 vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops, VDPASIM_VQ_NUM);

 if (!vdpasim)
 goto err_alloc;
-    INIT_WORK(&vdpasim->work, vdpasim_work);
+    if (device_id == VIRTIO_ID_NET)
+    size = sizeof(struct virtio_net_config);
+    else
+    size = sizeof(struct virtio_blk_config);



It's better to avoid such if/else, considering we may introduce more 
types of devices.


Can we have an attribute of config size instead?


Yes, I'll move the patch 7 before this.

About config size and set/get_config ops, I'm not sure whether it is 
better to hide everything under the new set/get_config ops, allocating 
the config structure in each device, or to leave the allocation in the 
core and update it as now.



I think we'd better avoid having any type-specific code in the generic 
sim code.



[...]



+config VDPA_SIM_NET
+    tristate "vDPA simulator for networking device"
+    depends on VDPA_SIM
+    default n



I remember somebody told me that if we don't specify a default, a 
Kconfig option is disabled by default.


So, should I remove "default n" from vdpa_sim* entries?



Yes, but please do that in another patch.
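With the redundant `default n` dropped (a sketch of what the cleaned-up entry might look like; the option text is taken from the patch above):

```
config VDPA_SIM_NET
	tristate "vDPA simulator for networking device"
	depends on VDPA_SIM
```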

Thanks




Thanks,
Stefano


