Re: [PATCH 0/1] virtio-scsi: fix missing unplug events when all LUNs are unplugged at the same time

2020-07-29 Thread Martin K. Petersen
On Wed, 29 Jul 2020 22:48:05 +0300, Maxim Levitsky wrote:

> virtio-scsi currently has a limit of 8 outstanding notifications, so when
> more than 8 LUNs are unplugged, some events are missed.
> 
> Commit 5ff843721467 ("scsi: virtio_scsi: unplug LUNs when events missed")
> fixed this by checking the 'event overflow' bit and manually scanning the bus
> to see which LUNs are still there.
> 
> [...]

Applied to 5.9/scsi-queue, thanks!

[1/1] scsi: virtio-scsi: Correctly handle the case where all LUNs are unplugged
  https://git.kernel.org/mkp/scsi/c/b12149f2698c

-- 
Martin K. Petersen  Oracle Linux Engineering
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/1] scsi: virtio-scsi: handle correctly case when all LUNs were unplugged

2020-07-29 Thread Paolo Bonzini
On 29/07/20 21:48, Maxim Levitsky wrote:
> Commit 5ff843721467 ("scsi: virtio_scsi: unplug LUNs when events missed")
> almost fixed the case of mass unplugging of LUNs, but it missed a
> corner case in which all the LUNs are unplugged at the same time.
> 
> In this case INQUIRY ends with DID_BAD_TARGET.
> Detect this and unplug the LUN.
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  drivers/scsi/virtio_scsi.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 0e0910c5b9424..c7f0c22b6f11d 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -351,6 +351,16 @@ static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
>   /* PQ indicates the LUN is not attached */
>   scsi_remove_device(sdev);
>   }
> +
> + else if (host_byte(result) == DID_BAD_TARGET) {
> + /*
> +  * if all LUNs of a virtio-scsi device are unplugged,
> +  * it will respond with BAD TARGET on any INQUIRY
> +  * command.
> +  * Remove the device in this case as well
> +  */
> + scsi_remove_device(sdev);
> + }
>   }
>  
>   kfree(inq_result);
> 

Acked-by: Paolo Bonzini 



[PATCH 1/1] scsi: virtio-scsi: handle correctly case when all LUNs were unplugged

2020-07-29 Thread Maxim Levitsky
Commit 5ff843721467 ("scsi: virtio_scsi: unplug LUNs when events missed")
almost fixed the case of mass unplugging of LUNs, but it missed a
corner case in which all the LUNs are unplugged at the same time.

In this case INQUIRY ends with DID_BAD_TARGET.
Detect this and unplug the LUN.

Signed-off-by: Maxim Levitsky 
---
 drivers/scsi/virtio_scsi.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 0e0910c5b9424..c7f0c22b6f11d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -351,6 +351,16 @@ static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
/* PQ indicates the LUN is not attached */
scsi_remove_device(sdev);
}
+
+   else if (host_byte(result) == DID_BAD_TARGET) {
+   /*
+* if all LUNs of a virtio-scsi device are unplugged,
+* it will respond with BAD TARGET on any INQUIRY
+* command.
+* Remove the device in this case as well
+*/
+   scsi_remove_device(sdev);
+   }
}
 
kfree(inq_result);
-- 
2.26.2



[PATCH 0/1] virtio-scsi: fix missing unplug events when all LUNs are unplugged at the same time

2020-07-29 Thread Maxim Levitsky
virtio-scsi currently has a limit of 8 outstanding notifications, so when more
than 8 LUNs are unplugged, some events are missed.

Commit 5ff843721467 ("scsi: virtio_scsi: unplug LUNs when events missed")
fixed this by checking the 'event overflow' bit and manually scanning the bus
to see which LUNs are still there.

However, there is a corner case when all LUNs are unplugged.
In this case (which is not fully SCSI conformant, IMHO), all SCSI
commands to such a device respond with INVALID TARGET.

This patch proposes to detect this and remove the LUN in this case
as well.

Maxim Levitsky (1):
  scsi: virtio-scsi: handle correctly case when all LUNs were unplugged

 drivers/scsi/virtio_scsi.c | 10 ++
 1 file changed, 10 insertions(+)

-- 
2.26.2




Re: [PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()

2020-07-29 Thread David Hildenbrand


> On 29.07.2020 at 20:36, Mike Kravetz wrote:
> 
> On 7/29/20 11:08 AM, David Hildenbrand wrote:
>> I have no clue what you mean with "reintroducing this abandoning of
>> pageblocks". All this patch is changing is not doing the dump_page() -
>> or am I missing something important?
> 
> My apologies!!!
> 

No worries, thanks for reviewing!!

> I got confused when I saw 'Return -EBUSY' removed from the comment and
> assumed the code would not return an error code.  The code now more
> explicitly does return -EBUSY.  My concern was when I incorrectly thought
> you were removing the error return code.  Sorry for the noise.
> 
> Acked-by: Mike Kravetz 
> -- 
> Mike Kravetz
> 


Re: [PATCH v1 2/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()

2020-07-29 Thread David Hildenbrand
On 29.07.20 19:31, Mike Kravetz wrote:
> On 6/30/20 7:26 AM, David Hildenbrand wrote:
>> Right now, if we have two isolations racing, we might trigger the
>> WARN_ON_ONCE() and call dump_page(NULL), dereferencing NULL. Let's just
>> return directly.
> 
> Just curious, what call path has the WARN_ON_ONCE()/dump_page(NULL)?

See below: two set_migratetype_isolate() callers racing.

> 
>>
>> In the future, we might want to report -EAGAIN to the caller instead, as
>> this could indicate a temporary isolation failure only.
>>
>> Cc: Andrew Morton 
>> Cc: Michal Hocko 
>> Cc: Michael S. Tsirkin 
>> Signed-off-by: David Hildenbrand 
> 
> Hi David,
> 
> That 'return -EAGAIN' was added as a sort of synchronization mechanism.
> See commit message for 2c7452a075d4d.  Before adding the 'return -EAGAIN',
> I could create races which would abandon isolated pageblocks.  Repeating
> those races over and over would result in a good chunk of system memory
> being isolated and unusable.

It's actually -EBUSY; it should maybe later be changed to -EAGAIN (see the
comment), so the caller can decide to retry immediately. But that's a separate
discussion.

> 
> Admittedly, these races are rare and I had to work really hard to produce
> them.  I'll try to find my testing mechanism.  My concern is reintroducing
> this abandoning of pageblocks.  I have not looked further in your series
> to see if this is potentially addressed later.  If not, then we should not
> remove the return code.
> 

Memory offlining could race with alloc_contig_range(), e.g., called when
allocating gigantic pages, or when virtio-mem tries to unplug memory.
The latter two could also race.

We are getting more alloc_contig_range() users, which is why these races
will become more relevant.

I have no clue what you mean with "reintroducing this abandoning of
pageblocks". All this patch is changing is not doing the dump_page() -
or am I missing something important?


-- 
Thanks,

David / dhildenb



Re: [PATCH v1 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()

2020-07-29 Thread David Hildenbrand
On 29.07.20 15:24, Baoquan He wrote:
> On 06/30/20 at 04:26pm, David Hildenbrand wrote:
>> Inside has_unmovable_pages(), we have a comment describing how unmovable
>> data could end up in ZONE_MOVABLE - via "movable_core". Also, besides
> ~~~ 'movablecore'
>> checking if the first page in the pageblock is reserved, we don't
>> perform any further checks in case of ZONE_MOVABLE.
>>
>> In case of memory offlining, we set REPORT_FAILURE, properly
>> dump_page() the page and handle the error gracefully.
>> alloc_contig_pages() users currently never allocate from ZONE_MOVABLE.
>> E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic
>> pages only, which will never end up on the MOVABLE zone
>> (see htlb_alloc_mask()).
>>
>> Cc: Andrew Morton 
>> Cc: Michal Hocko 
>> Cc: Michael S. Tsirkin 
>> Signed-off-by: David Hildenbrand 
>> ---
>>  mm/page_isolation.c | 16 ++--
>>  1 file changed, 6 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 553b49a34cf71..02a01bff6b219 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -58,16 +58,12 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_flags)
>>  spin_unlock_irqrestore(&zone->lock, flags);
>>  if (!ret) {
>>  drain_all_pages(zone);
>> -} else {
>> -WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
>> -
>> -if ((isol_flags & REPORT_FAILURE) && unmovable)
>> -/*
>> - * printk() with zone->lock held will likely trigger a
>> - * lockdep splat, so defer it here.
>> - */
>> -dump_page(unmovable, "unmovable page");
>> -}
>> +} else if ((isol_flags & REPORT_FAILURE) && unmovable)
> 
> This "else if" branch should be enclosed in braces?
> 

Not necessarily. And it will be gone in the next patch in this series :)

Thanks!


-- 
Thanks,

David / dhildenb



Re: [PATCH v1 1/6] mm/page_alloc: tweak comments in has_unmovable_pages()

2020-07-29 Thread David Hildenbrand
On 29.07.20 12:47, Baoquan He wrote:
> On 07/28/20 at 04:07pm, David Hildenbrand wrote:
>> On 28.07.20 15:48, Baoquan He wrote:
>>> On 06/30/20 at 04:26pm, David Hildenbrand wrote:
 Let's move the split comment regarding bootmem allocations and memory
 holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
 check.

 Cc: Andrew Morton 
 Cc: Michal Hocko 
 Cc: Michael S. Tsirkin 
 Signed-off-by: David Hildenbrand 
 ---
  mm/page_alloc.c | 22 ++
  1 file changed, 6 insertions(+), 16 deletions(-)

 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index 48eb0f1410d47..bd3ebf08f09b9 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c
 @@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
unsigned long iter = 0;
unsigned long pfn = page_to_pfn(page);
  
 -  /*
 -   * TODO we could make this much more efficient by not checking every
 -   * page in the range if we know all of them are in MOVABLE_ZONE and
 -   * that the movable zone guarantees that pages are migratable but
 -   * the later is not the case right now unfortunatelly. E.g. movablecore
 -   * can still lead to having bootmem allocations in zone_movable.
 -   */
 -
if (is_migrate_cma_page(page)) {
/*
 * CMA allocations (alloc_contig_range) really need to mark
 @@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
  
page = pfn_to_page(pfn + iter);
  
 +  /*
 +   * Both, bootmem allocations and memory holes are marked
 +   * PG_reserved and are unmovable. We can even have unmovable
 +   * allocations inside ZONE_MOVABLE, for example when
 +   * specifying "movable_core".
>>> should be 'movablecore', we don't
>>> have kernel parameter 'movable_core'.
>>
>> Agreed!
>>
>>>
>>> Otherwise, this looks good to me. Esp the code comment at below had been
>>> added very long time ago and obsolete.
>>>
>>> Reviewed-by: Baoquan He 
>>>
>>> By the way, David, do you know what is the situation of having unmovable
>>> allocations inside ZONE_MOVABLE when specifying 'movablecore'? I quickly
>>> went through find_zone_movable_pfns_for_nodes(), but didn't get why.
> Could you give a little more detail about it?
>>
>> As far as I understand, it can happen that we have memblock allocations
>> during boot that fall into an area the kernel later configures to span
>> the movable zone (via movablecore).
> 
> Seems yes, thanks a lot. I wonder who is still using
> movablecore|kernelcore, and in what use case.
> 

AFAIK, it's the only (guaranteed) way to get ZONE_MOVABLE without
involving memory hotplug. As I learned, the movable zone is also
interesting beyond memory hotunplug. It limits unmovable fragmentation
and e.g., makes THP/huge pages more likely/easier to allocate.

-- 
Thanks,

David / dhildenb



Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-29 Thread Jason Wang


On 2020/7/29 5:55 PM, Eli Cohen wrote:

On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote:

On 2020/7/28 5:04 PM, Eli Cohen wrote:

On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:

+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
+{
+   struct vhost_virtqueue *vq = &v->vqs[qid];
+   const struct vdpa_config_ops *ops = v->vdpa->config;
+   struct vdpa_device *vdpa = v->vdpa;
+   int ret, irq;
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   irq = ops->get_vq_irq(vdpa, qid);
+   if (!vq->call_ctx.ctx || irq == -EINVAL) {
+   spin_unlock(&vq->call_ctx.ctx_lock);
+   return;
+   }
+

If I understand correctly, this will cause these IRQs to be forwarded
directly to the vCPU, i.e., they will be handled by the guest/qemu.


Yes, if it can be bypassed, the interrupt will be delivered to the vCPU directly.


So, usually the network driver knows how to handle interrupts for its
devices. I assume the virtio_net driver in the guest has some default
processing, but what if the underlying hardware device (such as in the case
of vdpa) needs to take some actions?



Virtio splits the bus operations out of the device operations, and so do
the drivers.


The virtio-net driver depends on a transport driver to talk to the real
device. Usually PCI is used as the transport for the device. In this case
the virtio-pci driver is in charge of irq allocation/freeing/configuration,
and it needs to cooperate with the platform-specific irqchip (virtualized by
KVM) to finish work like irq acknowledgement. E.g., on x86, irq offloading
can only work when there is hardware support for a virtual irqchip (APICv);
then everything can be done without vmexits.


So there is no vendor-specific part, since the device and transport are both
standard.



Is there an option to bounce the
interrupt back to the vendor-specific driver in the host so it can take
these actions?



Currently not, but even if we could, I'm afraid we would lose the
performance advantage of irq bypassing.






Does this mean that the host will not handle this interrupt? How does it
work in the case of level-triggered interrupts?


There's no guarantee that the KVM arch code can make irq bypass work for
every type of irq. If it can't, the irq will still need to be handled by
the host first. This means we should keep the host interrupt handler as a
slowpath (fallback).


In the case of ConnectX, I need to execute some code to acknowledge the
interrupt.


That makes it hard for irq bypassing to work. Is it because
the irq is shared, or what kind of ack do you need to do?

I have an EQ, which is a queue for events coming from the hardware. This
EQ can be created so it reports only completion events, but I still need to
execute code that roughly tells the device that I saw these event
records and then arms it again so it can report more interrupts (e.g. if
more packets are received or sent). This is device-specific code.



Any chance that the hardware can use MSI (which is not the case here)?

Thanks



Thanks



Can you explain how this should be done?




Re: general protection fault in vsock_poll

2020-07-29 Thread Stefano Garzarella
On Wed, Jul 29, 2020 at 01:59:05AM -0700, syzbot wrote:
> syzbot has bisected this issue to:
> 
> commit 408624af4c89989117bb2c6517bd50b7708a2fcd
> Author: Stefano Garzarella 
> Date:   Tue Dec 10 10:43:06 2019 +
> 
> vsock: use local transport when it is loaded
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=17e6489b10
> start commit:   92ed3019 Linux 5.8-rc7
> git tree:   upstream
> final oops: https://syzkaller.appspot.com/x/report.txt?x=1416489b10
> console output: https://syzkaller.appspot.com/x/log.txt?x=1016489b10
> kernel config:  https://syzkaller.appspot.com/x/.config?x=84f076779e989e69
> dashboard link: https://syzkaller.appspot.com/bug?extid=a61bac2fcc1a7c6623fe
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=15930b6490
> 
> Reported-by: syzbot+a61bac2fcc1a7c662...@syzkaller.appspotmail.com
> Fixes: 408624af4c89 ("vsock: use local transport when it is loaded")
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> 

I'll take a look.

At first glance it seems strange, because if sk_state is TCP_ESTABLISHED,
the transport shouldn't be NULL; that's why we didn't add the check.

Stefano



Re: [PATCH] virtio_balloon: fix up endian-ness for free cmd id

2020-07-29 Thread Jason Wang


On 2020/7/28 12:03 AM, Michael S. Tsirkin wrote:

The free cmd id is read using virtio endian-ness, but the spec says all
fields in the balloon are LE. Fix it up.

Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Cc: sta...@vger.kernel.org
Signed-off-by: Michael S. Tsirkin 
---
  drivers/virtio/virtio_balloon.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 774deb65a9bb..798ec304fe3e 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -578,10 +578,14 @@ static int init_vqs(struct virtio_balloon *vb)
  static u32 virtio_balloon_cmd_id_received(struct virtio_balloon *vb)
  {
	if (test_and_clear_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
-			       &vb->config_read_bitmap))
+			       &vb->config_read_bitmap)) {
 		virtio_cread(vb->vdev, struct virtio_balloon_config,
 			     free_page_hint_cmd_id,
 			     &vb->cmd_id_received_cache);
+		/* Legacy balloon config space is LE, unlike all other devices. */
+		if (!virtio_has_feature(vb->vdev, VIRTIO_F_VERSION_1))
+			vb->cmd_id_received_cache = le32_to_cpu((__force __le32)vb->cmd_id_received_cache);
+	}
  
  	return vb->cmd_id_received_cache;

  }



Acked-by: Jason Wang 



Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-29 Thread Jason Wang


On 2020/7/28 5:04 PM, Eli Cohen wrote:

On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
  
+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)

+{
+   struct vhost_virtqueue *vq = &v->vqs[qid];
+   const struct vdpa_config_ops *ops = v->vdpa->config;
+   struct vdpa_device *vdpa = v->vdpa;
+   int ret, irq;
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   irq = ops->get_vq_irq(vdpa, qid);
+   if (!vq->call_ctx.ctx || irq == -EINVAL) {
+   spin_unlock(&vq->call_ctx.ctx_lock);
+   return;
+   }
+

If I understand correctly, this will cause these IRQs to be forwarded
directly to the vCPU, i.e., they will be handled by the guest/qemu.



Yes, if it can be bypassed, the interrupt will be delivered to the vCPU directly.



Does this mean that the host will not handle this interrupt? How does it
work in the case of level-triggered interrupts?



There's no guarantee that the KVM arch code can make irq bypass
work for any type of irq. If it can't, the irq will still need to be
handled by the host first. This means we should keep the host interrupt
handler as a slowpath (fallback).





In the case of ConnectX, I need to execute some code to acknowledge the
interrupt.



That makes it hard for irq bypassing to work. Is it because the
irq is shared, or what kind of ack do you need to do?


Thanks




Can you explain how this should be done?


