Re: [PATCH 0/1] virtio-scsi: fix missing unplug events when all LUNs are unplugged at the same time
On Wed, 29 Jul 2020 22:48:05 +0300, Maxim Levitsky wrote:

> virtio-scsi currently has a limit of 8 outstanding notifications, so when
> more than 8 LUNs are unplugged, some events are missed.
>
> Commit 5ff843721467 ("scsi: virtio_scsi: unplug LUNs when events missed")
> fixed this by checking the 'event overflow' bit and manually scanning the
> bus to see which LUNs are still there.
>
> [...]

Applied to 5.9/scsi-queue, thanks!

[1/1] scsi: virtio-scsi: Correctly handle the case where all LUNs are unplugged
      https://git.kernel.org/mkp/scsi/c/b12149f2698c

-- 
Martin K. Petersen	Oracle Linux Engineering

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/1] scsi: virtio-scsi: handle correctly case when all LUNs were unplugged
On 29/07/20 21:48, Maxim Levitsky wrote:
> Commit 5ff843721467 ("scsi: virtio_scsi: unplug LUNs when events missed")
> almost fixed the case of mass unplugging of LUNs, but it missed a corner
> case in which all the LUNs are unplugged at the same time.
>
> In this case INQUIRY ends with DID_BAD_TARGET.
> Detect this and unplug the LUN.
>
> Signed-off-by: Maxim Levitsky
> ---
>  drivers/scsi/virtio_scsi.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 0e0910c5b9424..c7f0c22b6f11d 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -351,6 +351,16 @@ static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
>  			/* PQ indicates the LUN is not attached */
>  			scsi_remove_device(sdev);
>  		}
> +
> +		else if (host_byte(result) == DID_BAD_TARGET) {
> +			/*
> +			 * If all LUNs of a virtio-scsi device are unplugged,
> +			 * it will respond with BAD TARGET on any INQUIRY
> +			 * command.
> +			 * Remove the device in this case as well.
> +			 */
> +			scsi_remove_device(sdev);
> +		}
>  	}
>
>  	kfree(inq_result);

Acked-by: Paolo Bonzini
[PATCH 1/1] scsi: virtio-scsi: handle correctly case when all LUNs were unplugged
Commit 5ff843721467 ("scsi: virtio_scsi: unplug LUNs when events missed")
almost fixed the case of mass unplugging of LUNs, but it missed a corner
case in which all the LUNs are unplugged at the same time.

In this case INQUIRY ends with DID_BAD_TARGET.
Detect this and unplug the LUN.

Signed-off-by: Maxim Levitsky
---
 drivers/scsi/virtio_scsi.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 0e0910c5b9424..c7f0c22b6f11d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -351,6 +351,16 @@ static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
 			/* PQ indicates the LUN is not attached */
 			scsi_remove_device(sdev);
 		}
+
+		else if (host_byte(result) == DID_BAD_TARGET) {
+			/*
+			 * If all LUNs of a virtio-scsi device are unplugged,
+			 * it will respond with BAD TARGET on any INQUIRY
+			 * command.
+			 * Remove the device in this case as well.
+			 */
+			scsi_remove_device(sdev);
+		}
 	}

 	kfree(inq_result);
-- 
2.26.2
[PATCH 0/1] virtio-scsi: fix missing unplug events when all LUNs are unplugged at the same time
virtio-scsi currently has a limit of 8 outstanding notifications, so when
more than 8 LUNs are unplugged, some events are missed.

Commit 5ff843721467 ("scsi: virtio_scsi: unplug LUNs when events missed")
fixed this by checking the 'event overflow' bit and manually scanning the
bus to see which LUNs are still there.

However, there is a corner case when all LUNs are unplugged. In this case
(which is not fully SCSI conformant, IMHO), all SCSI commands to such a
device respond with INVALID TARGET.

This patch proposes to detect this and remove the LUN in this case as well.

Maxim Levitsky (1):
  scsi: virtio-scsi: handle correctly case when all LUNs were unplugged

 drivers/scsi/virtio_scsi.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

-- 
2.26.2
Re: [PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()
> On 29.07.2020 at 20:36, Mike Kravetz wrote:
>
> On 7/29/20 11:08 AM, David Hildenbrand wrote:
>> I have no clue what you mean with "reintroducing this abandoning of
>> pageblocks". All this patch is changing is not doing the dump_page() -
>> or am I missing something important?
>
> My apologies!!!

No worries, thanks for reviewing!!

> I got confused when I saw 'Return -EBUSY' removed from the comment and
> assumed the code would not return an error code. The code now more
> explicitly does return -EBUSY. My concern was when I incorrectly thought
> you were removing the error return code. Sorry for the noise.
>
> Acked-by: Mike Kravetz
> -- 
> Mike Kravetz
Re: [PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()
On 29.07.20 19:31, Mike Kravetz wrote:
> On 6/30/20 7:26 AM, David Hildenbrand wrote:
>> Right now, if we have two isolations racing, we might trigger the
>> WARN_ON_ONCE() and do dump_page(NULL), dereferencing NULL. Let's just
>> return directly.
>
> Just curious, what call path has the WARN_ON_ONCE()/dump_page(NULL)?

See below: two set_migratetype_isolate() callers racing.

>> In the future, we might want to report -EAGAIN to the caller instead, as
>> this could indicate a temporary isolation failure only.
>>
>> Cc: Andrew Morton
>> Cc: Michal Hocko
>> Cc: Michael S. Tsirkin
>> Signed-off-by: David Hildenbrand
>
> Hi David,
>
> That 'return -EAGAIN' was added as a sort of synchronization mechanism.
> See the commit message for 2c7452a075d4d. Before adding the 'return
> -EAGAIN', I could create races which would abandon isolated pageblocks.
> Repeating those races over and over would result in a good chunk of
> system memory being isolated and unusable.

It's actually -EBUSY; it should maybe later be changed to -EAGAIN (see the
comment), so the caller can decide to retry immediately. That's a separate
discussion.

> Admittedly, these races are rare and I had to work really hard to produce
> them. I'll try to find my testing mechanism. My concern is reintroducing
> this abandoning of pageblocks. I have not looked further in your series
> to see if this is potentially addressed later. If not, then we should not
> remove the return code.

Memory offlining could race with alloc_contig_range(), e.g., called when
allocating gigantic pages, or when virtio-mem tries to unplug memory. The
latter two could also race with each other. We are getting more
alloc_contig_range() users, which is why these races will become more
relevant.

I have no clue what you mean with "reintroducing this abandoning of
pageblocks". All this patch is changing is not doing the dump_page() -
or am I missing something important?
-- 
Thanks,

David / dhildenb
Re: [PATCH v1 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
On 29.07.20 15:24, Baoquan He wrote:
> On 06/30/20 at 04:26pm, David Hildenbrand wrote:
>> Inside has_unmovable_pages(), we have a comment describing how unmovable
>> data could end up in ZONE_MOVABLE - via "movable_core". Also, besides
>                                            ~~~ 'movablecore'
>> checking if the first page in the pageblock is reserved, we don't
>> perform any further checks in case of ZONE_MOVABLE.
>>
>> In case of memory offlining, we set REPORT_FAILURE, properly
>> dump_page() the page and handle the error gracefully.
>> alloc_contig_pages() users currently never allocate from ZONE_MOVABLE.
>> E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic
>> pages only, which will never end up on the MOVABLE zone
>> (see htlb_alloc_mask()).
>>
>> Cc: Andrew Morton
>> Cc: Michal Hocko
>> Cc: Michael S. Tsirkin
>> Signed-off-by: David Hildenbrand
>> ---
>>  mm/page_isolation.c | 16 ++++++++----------
>>  1 file changed, 6 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 553b49a34cf71..02a01bff6b219 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -58,16 +58,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>>  	spin_unlock_irqrestore(&zone->lock, flags);
>>  	if (!ret) {
>>  		drain_all_pages(zone);
>> -	} else {
>> -		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
>> -
>> -		if ((isol_flags & REPORT_FAILURE) && unmovable)
>> -			/*
>> -			 * printk() with zone->lock held will likely trigger a
>> -			 * lockdep splat, so defer it here.
>> -			 */
>> -			dump_page(unmovable, "unmovable page");
>> -	}
>> +	} else if ((isol_flags & REPORT_FAILURE) && unmovable)
>
> Shouldn't this 'else if' branch be enclosed in braces?

Not necessarily. And it will be gone in the next patch in this series :)

Thanks!

-- 
Thanks,

David / dhildenb
Re: [PATCH v1 1/6] mm/page_alloc: tweak comments in has_unmovable_pages()
On 29.07.20 12:47, Baoquan He wrote:
> On 07/28/20 at 04:07pm, David Hildenbrand wrote:
>> On 28.07.20 15:48, Baoquan He wrote:
>>> On 06/30/20 at 04:26pm, David Hildenbrand wrote:
>>>> Let's move the split comment regarding bootmem allocations and memory
>>>> holes, especially in the context of ZONE_MOVABLE, to the
>>>> PageReserved() check.
>>>>
>>>> Cc: Andrew Morton
>>>> Cc: Michal Hocko
>>>> Cc: Michael S. Tsirkin
>>>> Signed-off-by: David Hildenbrand
>>>> ---
>>>>  mm/page_alloc.c | 22 ++++++----------------
>>>>  1 file changed, 6 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 48eb0f1410d47..bd3ebf08f09b9 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>  	unsigned long iter = 0;
>>>>  	unsigned long pfn = page_to_pfn(page);
>>>>
>>>> -	/*
>>>> -	 * TODO we could make this much more efficient by not checking every
>>>> -	 * page in the range if we know all of them are in MOVABLE_ZONE and
>>>> -	 * that the movable zone guarantees that pages are migratable but
>>>> -	 * the later is not the case right now unfortunatelly. E.g. movablecore
>>>> -	 * can still lead to having bootmem allocations in zone_movable.
>>>> -	 */
>>>> -
>>>>  	if (is_migrate_cma_page(page)) {
>>>>  		/*
>>>>  		 * CMA allocations (alloc_contig_range) really need to mark
>>>> @@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
>>>>  		page = pfn_to_page(pfn + iter);
>>>>
>>>> +		/*
>>>> +		 * Both, bootmem allocations and memory holes are marked
>>>> +		 * PG_reserved and are unmovable. We can even have unmovable
>>>> +		 * allocations inside ZONE_MOVABLE, for example when
>>>> +		 * specifying "movable_core".
>>>
>>> should be 'movablecore', we don't have the kernel parameter
>>> 'movable_core'.
>>
>> Agreed!
>>
>>> Otherwise, this looks good to me. Esp. the code comment below had been
>>> added a very long time ago and is obsolete.
>>>
>>> Reviewed-by: Baoquan He
>>>
>>> By the way, David, do you know what the situation is with having
>>> unmovable allocations inside ZONE_MOVABLE when specifying
>>> 'movablecore'? I quickly went through
>>> find_zone_movable_pfns_for_nodes(), but didn't get why. Could you tell
>>> a little more detail about it?
>>
>> As far as I understand, it can happen that we have memblock allocations
>> during boot that fall into an area the kernel later configures to span
>> the movable zone (via movable_core).
>
> Seems yes, thanks a lot. Wondering who is still using
> movablecore|kernelcore and in what use case.

AFAIK, it's the only (guaranteed) way to get ZONE_MOVABLE without involving
memory hotplug. As I learned, the movable zone is also interesting beyond
memory hotunplug. It limits unmovable fragmentation and, e.g., makes
THP/huge pages more likely/easier to allocate.

-- 
Thanks,

David / dhildenb
Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
On 2020/7/29 5:55 PM, Eli Cohen wrote:
> On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote:
>> On 2020/7/28 5:04 PM, Eli Cohen wrote:
>>> On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
>>>> +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
>>>> +{
>>>> +	struct vhost_virtqueue *vq = &v->vqs[qid];
>>>> +	const struct vdpa_config_ops *ops = v->vdpa->config;
>>>> +	struct vdpa_device *vdpa = v->vdpa;
>>>> +	int ret, irq;
>>>> +
>>>> +	spin_lock(&vq->call_ctx.ctx_lock);
>>>> +	irq = ops->get_vq_irq(vdpa, qid);
>>>> +	if (!vq->call_ctx.ctx || irq == -EINVAL) {
>>>> +		spin_unlock(&vq->call_ctx.ctx_lock);
>>>> +		return;
>>>> +	}
>>>> +
>>>
>>> If I understand correctly, this will cause these IRQs to be forwarded
>>> directly to the VCPU, e.g. will be handled by the guest/qemu.
>>
>> Yes, if it can be bypassed, the interrupt will be delivered to the vCPU
>> directly.
>
> So, usually the network driver knows how to handle interrupts for its
> devices. I assume the virtio_net driver in the guest has some default
> processing, but what if the underlying hardware device (such as in the
> case of vdpa) needs to take some actions?

Virtio splits the bus operations out of the device operations, and so does
the driver. The virtio-net driver depends on a transport driver to talk to
the real device. Usually PCI is used as the transport for the device. In
this case the virtio-pci driver is in charge of irq
allocation/free/configuration, and it needs to cooperate with the
platform-specific irqchip (virtualized by KVM) to finish work like irq
acknowledgement. E.g. for x86, the irq offloading can only work when
there's hardware support for a virtual irqchip (APICv); then everything can
be done without vmexits.

So there is no vendor-specific part, since the device and transport are all
standard.

> Is there an option to bounce the interrupt back to the vendor-specific
> driver in the host so it can take these actions?

Currently not, but even if we could do this, I'm afraid we would lose the
performance advantage of irq bypassing.

>>> Does this mean that the host will not handle this interrupt? How does
>>> it work in the case of level-triggered interrupts?
>>
>> There's no guarantee that the KVM arch code can make irq bypass work for
>> every type of irq. If it can't, the irq will still need to be handled by
>> the host first. This means we should keep the host interrupt handler as
>> a slowpath (fallback).
>>
>>> In the case of ConnectX, I need to execute some code to acknowledge the
>>> interrupt.
>>
>> This makes it hard for irq bypassing to work. Is it because the irq is
>> shared, or what kind of ack do you need to do?

> I have an EQ which is a queue for events coming from the hardware. This
> EQ can be created so it reports only completion events, but I still need
> to execute code that roughly tells the device that I saw these event
> records and then arms it again so it can report more interrupts (e.g. if
> more packets are received or sent). This is device-specific code.

Any chance that the hardware can use MSI (which is not the case here)?

Thanks

>> Thanks
>>
>>> Can you explain how this should be done?
Re: general protection fault in vsock_poll
On Wed, Jul 29, 2020 at 01:59:05AM -0700, syzbot wrote:
> syzbot has bisected this issue to:
>
> commit 408624af4c89989117bb2c6517bd50b7708a2fcd
> Author: Stefano Garzarella
> Date:   Tue Dec 10 10:43:06 2019 +
>
>     vsock: use local transport when it is loaded
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=17e6489b10
> start commit:   92ed3019 Linux 5.8-rc7
> git tree:       upstream
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=1416489b10
> console output: https://syzkaller.appspot.com/x/log.txt?x=1016489b10
> kernel config:  https://syzkaller.appspot.com/x/.config?x=84f076779e989e69
> dashboard link: https://syzkaller.appspot.com/bug?extid=a61bac2fcc1a7c6623fe
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15930b6490
>
> Reported-by: syzbot+a61bac2fcc1a7c662...@syzkaller.appspotmail.com
> Fixes: 408624af4c89 ("vsock: use local transport when it is loaded")
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

I'll take a look. At first glance it seems strange, because if sk_state is
TCP_ESTABLISHED, the transport shouldn't be NULL; that's why we didn't put
the check in.

Stefano
Re: [PATCH] virtio_balloon: fix up endian-ness for free cmd id
On 2020/7/28 12:03 AM, Michael S. Tsirkin wrote:
> The free cmd id is read using virtio endian, but the spec says all fields
> in the balloon are LE. Fix it up.
>
> Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Michael S. Tsirkin
> ---
>  drivers/virtio/virtio_balloon.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 774deb65a9bb..798ec304fe3e 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -578,10 +578,14 @@ static int init_vqs(struct virtio_balloon *vb)
>  static u32 virtio_balloon_cmd_id_received(struct virtio_balloon *vb)
>  {
>  	if (test_and_clear_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
> -			       &vb->config_read_bitmap))
> +			       &vb->config_read_bitmap)) {
>  		virtio_cread(vb->vdev, struct virtio_balloon_config,
>  			     free_page_hint_cmd_id,
>  			     &vb->cmd_id_received_cache);
> +		/* Legacy balloon config space is LE, unlike all other devices. */
> +		if (!virtio_has_feature(vb->vdev, VIRTIO_F_VERSION_1))
> +			vb->cmd_id_received_cache = le32_to_cpu((__force __le32)vb->cmd_id_received_cache);
> +	}
>
>  	return vb->cmd_id_received_cache;
>  }

Acked-by: Jason Wang
Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa
On 2020/7/28 5:04 PM, Eli Cohen wrote:
> On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
>> +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
>> +{
>> +	struct vhost_virtqueue *vq = &v->vqs[qid];
>> +	const struct vdpa_config_ops *ops = v->vdpa->config;
>> +	struct vdpa_device *vdpa = v->vdpa;
>> +	int ret, irq;
>> +
>> +	spin_lock(&vq->call_ctx.ctx_lock);
>> +	irq = ops->get_vq_irq(vdpa, qid);
>> +	if (!vq->call_ctx.ctx || irq == -EINVAL) {
>> +		spin_unlock(&vq->call_ctx.ctx_lock);
>> +		return;
>> +	}
>> +
>
> If I understand correctly, this will cause these IRQs to be forwarded
> directly to the VCPU, e.g. will be handled by the guest/qemu.

Yes, if it can be bypassed, the interrupt will be delivered to the vCPU
directly.

> Does this mean that the host will not handle this interrupt? How does it
> work in the case of level-triggered interrupts?

There's no guarantee that the KVM arch code can make irq bypass work for
every type of irq. If it can't, the irq will still need to be handled by
the host first. This means we should keep the host interrupt handler as a
slowpath (fallback).

> In the case of ConnectX, I need to execute some code to acknowledge the
> interrupt.

This makes it hard for irq bypassing to work. Is it because the irq is
shared, or what kind of ack do you need to do?

Thanks

> Can you explain how this should be done?