[vfio-users] lspci and vfio_pci_release deadlock when destroying a PCI passthrough VM

2019-03-20 Thread Wuzongyong (Euler Dept)
Hi Alex,

I noticed a patch you posted at https://lkml.org/lkml/2019/2/18/1315
You said the previous commit may be prone to deadlock. Could you please share
the details of how to reproduce that deadlock, if you know them?
I hit a similar issue: every lspci command went into D state and libvirtd went
into Z state when destroying a VM with a GPU passed through. The stacks look
like this:

2019-03-20T13:37:14.726514+07:00|err|kernel[-]|[2427373.553663] INFO: task ps:112058 blocked for more than 120 seconds.
2019-03-20T13:37:14.726576+07:00|err|kernel[-]|[2427373.553667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-03-20T13:37:14.726599+07:00|info|kernel[-]|[2427373.553669] ps   D  0 112058  1 0x0004
2019-03-20T13:37:14.726620+07:00|warning|kernel[-]|[2427373.553673] Call Trace:
2019-03-20T13:37:14.726640+07:00|warning|kernel[-]|[2427373.553682]  [] schedule_preempt_disabled+0x29/0x70
2019-03-20T13:37:14.726668+07:00|warning|kernel[-]|[2427373.553684]  [] __mutex_lock_slowpath+0xe1/0x170
2019-03-20T13:37:14.726689+07:00|warning|kernel[-]|[2427373.553689]  [] mutex_lock+0x1f/0x2f
2019-03-20T13:37:14.726707+07:00|warning|kernel[-]|[2427373.553695]  [] pci_bus_save_and_disable+0x37/0x70
2019-03-20T13:37:14.726725+07:00|warning|kernel[-]|[2427373.553697]  [] pci_try_reset_bus+0x38/0x80
2019-03-20T13:37:14.726743+07:00|warning|kernel[-]|[2427373.553730]  [] vfio_pci_release+0x3d5/0x430 [vfio_pci]
2019-03-20T13:37:14.726761+07:00|warning|kernel[-]|[2427373.553737]  [] ? vfio_pci_rw+0xc0/0xc0 [vfio_pci]
2019-03-20T13:37:14.726779+07:00|warning|kernel[-]|[2427373.553745]  [] vfio_device_fops_release+0x22/0x40 [vfio]
2019-03-20T13:37:14.726798+07:00|warning|kernel[-]|[2427373.553751]  [] __fput+0xec/0x260
2019-03-20T13:37:14.726821+07:00|warning|kernel[-]|[2427373.553754]  [] fput+0xe/0x10
2019-03-20T13:37:14.726840+07:00|warning|kernel[-]|[2427373.553758]  [] task_work_run+0xaa/0xe0
2019-03-20T13:37:14.726858+07:00|warning|kernel[-]|[2427373.553763]  [] do_notify_resume+0x92/0xb0
2019-03-20T13:37:14.726876+07:00|warning|kernel[-]|[2427373.553767]  [] int_signal+0x12/0x17
2019-03-20T13:37:14.726892+07:00|err|kernel[-]|[2427373.553771] INFO: task lspci:139540 blocked for more than 120 seconds.
2019-03-20T13:37:14.726910+07:00|err|kernel[-]|[2427373.553772] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-03-20T13:37:14.726929+07:00|info|kernel[-]|[2427373.553773] lspci   D  0 139540 139539 0x
2019-03-20T13:37:14.726948+07:00|warning|kernel[-]|[2427373.553776] Call Trace:
2019-03-20T13:37:14.726970+07:00|warning|kernel[-]|[2427373.553778]  [] schedule+0x29/0x70
2019-03-20T13:37:14.726989+07:00|warning|kernel[-]|[2427373.553782]  [] pci_wait_cfg+0xa0/0x110
2019-03-20T13:37:14.727006+07:00|warning|kernel[-]|[2427373.553787]  [] ? wake_up_state+0x20/0x20
2019-03-20T13:37:14.727023+07:00|warning|kernel[-]|[2427373.553790]  [] pci_user_read_config_dword+0x105/0x110
2019-03-20T13:37:14.727043+07:00|warning|kernel[-]|[2427373.553794]  [] pci_read_config+0x114/0x2c0
2019-03-20T13:37:14.727063+07:00|warning|kernel[-]|[2427373.553799]  [] ? __kmalloc+0x55/0x240
2019-03-20T13:37:14.727084+07:00|warning|kernel[-]|[2427373.553804]  [] read+0xde/0x1f0
2019-03-20T13:37:14.727103+07:00|warning|kernel[-]|[2427373.553807]  [] vfs_read+0x9f/0x170
2019-03-20T13:37:14.727123+07:00|warning|kernel[-]|[2427373.553809]  [] SyS_pread64+0x92/0xc0
2019-03-20T13:37:14.727141+07:00|warning|kernel[-]|[2427373.553812]  [] system_call_fastpath+0x1c/0x21

It looks like lspci and vfio_pci_release are deadlocked against each other:
the release path is waiting on a device mutex in pci_bus_save_and_disable,
while lspci is waiting in pci_wait_cfg for user config access to be unblocked.
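
In case it helps anyone hitting the same hang, here is a small sketch I would
use to enumerate blocked tasks and their kernel stacks while the hang is live
(assuming root and a kernel that exposes /proc/<pid>/stack):

    # List tasks in uninterruptible sleep (D state) and dump their
    # kernel stacks.
    for pid in $(ps -eo pid,stat | awk '$2 ~ /^D/ {print $1}'); do
        echo "=== PID $pid ($(cat /proc/$pid/comm 2>/dev/null)) ==="
        cat "/proc/$pid/stack" 2>/dev/null
    done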

Thanks,
Zongyong Wu



Re: [vfio-users] PLX switch reports a UR when EP tries to DMA to VM's memory

2018-09-13 Thread Wuzongyong (Euler Dept)
> > Hi,
> >
> > I've hit a problem with a PCIe endpoint, behind a PLX switch, that is
> > assigned to a VM through VFIO.
> > The switch reports a UR (Unsupported Request) error when the EP tries to
> > DMA to a memory region inside the VM's address space.
> > The DMA destination address lies within the VM's RAM address space, but
> > unfortunately, from the host's point of view, that address falls inside
> > the PLX switch upstream port's BAR0 memory-mapped I/O range.
> > As a result, the DMA fails because the switch considers the memory
> > request invalid when the destination address hits its upstream port's BAR.
> > Is this a hardware bug, or does qemu/seabios fail to maintain a proper
> > address space for the VM?
> 
> Upstream switch ports are generally single function devices and therefore
> governed by 6.12.1.3 (PCIe base spec rev 4.0, v1) which indicates an ACS
> capability must not be implemented.  We can therefore read into section
> 6.12.2 on interoperability which indicates the interaction between ACS and
> non-ACS components, including:
> 
>  * When ACS P2P Request Redirect, ACS P2P Completion Redirect, or both
>are being used, certain components in the PCI Express hierarchy must
>support ACS Upstream Forwarding (of Upstream redirected Requests).
>Specifically:
>...
>Between each ACS component where P2P TLP redirection is enabled and
>its associated Root Port, any intermediate Switches must support ACS
>Upstream Forwarding. Otherwise, how such Switches handle Upstream
>redirected TLPs is undefined.
> 
> It's my interpretation therefore that in a configuration where the switch
> downstream ports support ACS, the switch upstream port must implicitly
> support upstream forwarding, thus I would consider this a hardware issue.
> The alternative is that we need to poke holes in the VM address space to
> account for any possible conflict and assigned device hot-add becomes
> nearly a non-starter.  Thanks,
> 
> Alex

Thanks for your explanation.
Do you know of any other vendors' switches that don't have this problem?
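
For anyone wanting to check their own topology, a rough sketch that walks
from an endpoint up toward the root port and prints the ACS state of each
device on the path (the endpoint address 0000:82:00.0 is a placeholder):

    # Walk from an endpoint toward the root, printing the ACS
    # capability/control of each device on the path.
    dev=0000:82:00.0
    while [ -e "/sys/bus/pci/devices/$dev" ]; do
        echo "--- $dev ---"
        sudo lspci -vvv -s "$dev" | grep -E 'ACSCap|ACSCtl' \
            || echo "no ACS capability"
        parent=$(basename "$(dirname "$(readlink -f /sys/bus/pci/devices/$dev)")")
        case "$parent" in 0000:*) dev=$parent ;; *) break ;; esac
    done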

Thanks,
Zongyong Wu



[vfio-users] PLX switch reports a UR when EP tries to DMA to VM's memory

2018-09-13 Thread Wuzongyong (Euler Dept)
Hi,

I've hit a problem with a PCIe endpoint, behind a PLX switch, that is assigned
to a VM through VFIO.
The switch reports a UR (Unsupported Request) error when the EP tries to DMA
to a memory region inside the VM's address space.
The DMA destination address lies within the VM's RAM address space, but
unfortunately, from the host's point of view, that address falls inside the
PLX switch upstream port's BAR0 memory-mapped I/O range.
As a result, the DMA fails because the switch considers the memory request
invalid when the destination address hits its upstream port's BAR.
Is this a hardware bug, or does qemu/seabios fail to maintain a proper address
space for the VM?
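
To see the conflict from the host side, a sketch that prints the host physical
range claimed by the upstream port's BARs next to the PCI windows in the host
memory map (the port address 0000:80:00.0 is a placeholder):

    # Show the upstream port's BAR assignments, then the PCI windows
    # in the host physical memory map.
    up=0000:80:00.0
    sudo lspci -v -s "$up" | grep -i 'Memory at'
    sudo grep -i pci /proc/iomem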

Thanks,
Zongyong Wu



Re: [vfio-users] host crash when assigning 4 NICs to 4 VMs separately

2018-06-27 Thread Wuzongyong (Euler Dept)
> > Hi,
> >
> > Recently my colleague ran into a kernel crash when he tried to assign
> > 4 NICs to 4 VMs separately.
> > Unfortunately he didn't collect the related logs, so currently all we
> > have is the dmesg output from the core dump.
> >
> > Here the info:
> >
> > linux:~ # lspci | grep -i eth
> > 02:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
> > 02:00.1 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
> > 02:00.2 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
> > 02:00.3 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
> > 81:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> > 81:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> > 82:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> > 82:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> >
> > He used the last four nics.
> >
> > Dmesg:
> >
> > [ 3449.519354] general protection fault:  [#1] SMP
> > [ 3449.682056] CPU: 8 PID: 26794 Comm: qemu-kvm Tainted: G OE --- 3.10.0-514.44.5.10_44.x86_64 #1
> 
> Are you able to reproduce this on an upstream kernel?  Thanks,
> 
> Alex
No, I can't reproduce it in our own environment either.



[vfio-users] How does a vfio container relate to an iommu_domain

2018-05-25 Thread Wuzongyong (Euler Dept)
Hi,

I noticed this comment in vfio_iommu_type1_attach_group:

    /*
     * Try to match an existing compatible domain.  We don't want to
     * preclude an IOMMU driver supporting multiple bus_types and being
     * able to include different bus_types in the same IOMMU domain, so
     * we test whether the domains use the same iommu_ops rather than
     * testing if they're on the same bus_type.
     */

On current Intel x86 platforms, can I assume that a virtual machine has a
one-to-one association with a vfio container, and that a vfio container has a
one-to-one association with an iommu_domain?
Is there any scenario or system where two domains use two different iommu_ops?
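
For context, my mental model is that userspace attaches each IOMMU group to a
container with the VFIO_GROUP_SET_CONTAINER ioctl, and the groups themselves
are visible in sysfs; a small sketch to list them:

    # Print the iommu_group of every PCI device (no group link means
    # the device currently has no IOMMU translation).
    for dev in /sys/bus/pci/devices/*; do
        group=$(readlink -f "$dev/iommu_group" 2>/dev/null) || continue
        printf '%s -> group %s\n' "$(basename "$dev")" "$(basename "$group")"
    done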

Thanks,
Wu Zongyong


Re: [vfio-users] win7 reports that the device cannot find enough free resources with a GPU behind a PCI bridge

2018-05-09 Thread Wuzongyong (Euler Dept)
> > Hi,
> >
> > I found that win7/win2012r2 reports "This device cannot find enough
> > free resources that it can use"
> > with an NVIDIA GPU passed through behind a PCI bridge.
> > Here is a part of my xml:
> >
> > [libvirt XML snippet lost in the archive; only fragments such as
> > function='0x0' from the hostdev/controller address elements survive]
> >
> > There is a similar problem in
> > https://bugzilla.redhat.com/show_bug.cgi?id=1273172 but I still don't
> > know what the root cause is.
> 
> Neither do we.  Current status is doesn't work, don't do it.  Thanks,
> 
> Alex
So does this problem exist in all Windows versions? There seems to be no
problem with Linux.

Thanks,
Wu Zongyong



[vfio-users] win7 reports that the device cannot find enough free resources with a GPU behind a PCI bridge

2018-05-09 Thread Wuzongyong (Euler Dept)
Hi,

I found that win7/win2012r2 reports "This device cannot find enough free
resources that it can use"
with an NVIDIA GPU passed through behind a PCI bridge.
Here is a part of my xml:

[libvirt XML snippet lost in the archive; only fragments such as
function='0x0' from the hostdev/controller address elements survive]

There is a similar problem in
https://bugzilla.redhat.com/show_bug.cgi?id=1273172 but I still don't know
what the root cause is.

Thanks,
Zongyong Wu



Re: [vfio-users] qemu gets stuck when hot-adding memory to a virtual machine with a device passthrough

2018-04-21 Thread Wuzongyong (Euler Dept)
> > > > Hi,
> > > >
> > > > The qemu process gets stuck when hot-adding a large amount of memory
> > > > to a virtual machine with a device passthrough.
> > > > We found that pinning and mapping pages in vfio_dma_do_map is very
> > > > slow. Is there any method to improve this process?
> > >
> > > At what size do you start to see problems?  The time to map a
> > > section of memory should be directly proportional to the size.  As
> > > the size is increased, it will take longer, but I don't know why
> > > you'd reach a point of not making forward progress.  Is it actually
> > > stuck or is it just taking longer than you want?  Using hugepages
> > > can certainly help, we still need to pin each PAGE_SIZE page within
> > > the hugepage, but we'll have larger contiguous regions and therefore
> > > call iommu_map() less frequently.  Please share more data.  Thanks,
> > >
> > > Alex
> > It just takes a long time rather than being actually stuck.
> > We first noticed the problem when we hot-added 16G of memory, and it
> > takes tens of minutes when we hot-add 1T.
> 
> Is the stall adding 1TB roughly 64 times the stall adding 16GB or do we
> have some inflection in the size vs time curve?  There is a cost to
> pinning and mapping through the IOMMU, perhaps we can improve that, but I
> don't see how we can eliminate it or how it wouldn't be at least linear
> compared to the size of memory added without moving to a page request
> model, which hardly any hardware currently supports.  A workaround might
> be to incrementally add memory in smaller chunks which generate a less
> noticeable stall.  Thanks,
> 
> Alex
Below is part of a perf report I collected while hot-adding 24GB of memory:
+   63.41%  0.00%  qemu-kvm  qemu-kvm-2.8.1-25.127  [.] 0xffc7534a
+   63.41%  0.00%  qemu-kvm  [kernel.vmlinux]       [k] do_vfs_ioctl
+   63.41%  0.00%  qemu-kvm  [kernel.vmlinux]       [k] sys_ioctl
+   63.41%  0.00%  qemu-kvm  libc-2.17.so           [.] __GI___ioctl
+   63.41%  0.00%  qemu-kvm  qemu-kvm-2.8.1-25.127  [.] 0xffc71c59
+   63.10%  0.00%  qemu-kvm  [vfio]                 [k] vfio_fops_unl_ioctl
+   63.10%  0.00%  qemu-kvm  qemu-kvm-2.8.1-25.127  [.] 0xffcbbb6a
+   63.10%  0.02%  qemu-kvm  [vfio_iommu_type1]     [k] vfio_iommu_type1_ioctl
+   60.67%  0.31%  qemu-kvm  [vfio_iommu_type1]     [k] vfio_pin_pages_remote
+   60.06%  0.46%  qemu-kvm  [vfio_iommu_type1]     [k] vaddr_get_pfn
+   59.61%  0.95%  qemu-kvm  [kernel.vmlinux]       [k] get_user_pages_fast
+   54.28%  0.02%  qemu-kvm  [kernel.vmlinux]       [k] get_user_pages_unlocked
+   54.24%  0.04%  qemu-kvm  [kernel.vmlinux]       [k] __get_user_pages
+   54.13%  0.01%  qemu-kvm  [kernel.vmlinux]       [k] handle_mm_fault
+   54.08%  0.03%  qemu-kvm  [kernel.vmlinux]       [k] do_huge_pmd_anonymous_page
+   52.09% 52.09%  qemu-kvm  [kernel.vmlinux]       [k] clear_page
+    9.42%  0.12%  swapper   [kernel.vmlinux]       [k] cpu_startup_entry
+    9.20%  0.00%  swapper   [kernel.vmlinux]       [k] start_secondary
+    8.85%  0.02%  swapper   [kernel.vmlinux]       [k] arch_cpu_idle
+    8.79%  0.07%  swapper   [kernel.vmlinux]       [k] cpuidle_idle_call
+    6.16%  0.29%  swapper   [kernel.vmlinux]       [k] apic_timer_interrupt
+    5.73%  0.07%  swapper   [kernel.vmlinux]       [k] smp_apic_timer_interrupt
+    4.34%  0.99%  qemu-kvm  [kernel.vmlinux]       [k] gup_pud_range
+    3.56%  0.16%  swapper   [kernel.vmlinux]       [k] local_apic_timer_interrupt
+    3.32%  0.41%  swapper   [kernel.vmlinux]       [k] hrtimer_interrupt
+    3.25%  3.21%  qemu-kvm  [kernel.vmlinux]       [k] gup_huge_pmd
+    2.31%  0.01%  qemu-kvm  [kernel.vmlinux]       [k] iommu_map
+    2.30%  0.00%  qemu-kvm  [kernel.vmlinux]       [k] intel_iommu_map

It seems the bottleneck is pinning pages through get_user_pages rather than
the IOMMU mapping itself.
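
For reference, a sketch of how such a profile can be captured (assuming a
single qemu-kvm process on the host):

    # Record call graphs for qemu-kvm while the hot-add is in
    # progress, then browse the report.
    pid=$(pidof qemu-kvm)
    perf record -g -p "$pid" -- sleep 30
    perf report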

Thanks,
Wu Zongyong



Re: [vfio-users] qemu gets stuck when hot-adding memory to a virtual machine with a device passthrough

2018-04-19 Thread Wuzongyong (Euler Dept)
> > > > Hi,
> > > >
> > > > The qemu process gets stuck when hot-adding a large amount of memory
> > > > to a virtual machine with a device passthrough.
> > > > We found that pinning and mapping pages in vfio_dma_do_map is very
> > > > slow. Is there any method to improve this process?
> > >
> > > At what size do you start to see problems?  The time to map a
> > > section of memory should be directly proportional to the size.  As
> > > the size is increased, it will take longer, but I don't know why
> > > you'd reach a point of not making forward progress.  Is it actually
> > > stuck or is it just taking longer than you want?  Using hugepages
> > > can certainly help, we still need to pin each PAGE_SIZE page within
> > > the hugepage, but we'll have larger contiguous regions and therefore
> > > call iommu_map() less frequently.  Please share more data.  Thanks,
> > >
> > > Alex
> > It just takes a long time rather than being actually stuck.
> > We first noticed the problem when we hot-added 16G of memory, and it
> > takes tens of minutes when we hot-add 1T.
> 
> Is the stall adding 1TB roughly 64 times the stall adding 16GB or do we
> have some inflection in the size vs time curve?  There is a cost to
> pinning and mapping through the IOMMU, perhaps we can improve that, but I
> don't see how we can eliminate it or how it wouldn't be at least linear
> compared to the size of memory added without moving to a page request
> model, which hardly any hardware currently supports.  A workaround might
> be to incrementally add memory in smaller chunks which generate a less
> noticeable stall.  Thanks,
> 
> Alex
It took about 1 minute to add 16GB and about 40 minutes to add 1TB.



Re: [vfio-users] qemu gets stuck when hot-adding memory to a virtual machine with a device passthrough

2018-04-18 Thread Wuzongyong (Euler Dept)
> > Hi,
> >
> > The qemu process gets stuck when hot-adding a large amount of memory to
> > a virtual machine with a device passthrough.
> > We found that pinning and mapping pages in vfio_dma_do_map is very slow.
> > Is there any method to improve this process?
> 
> At what size do you start to see problems?  The time to map a section of
> memory should be directly proportional to the size.  As the size is
> increased, it will take longer, but I don't know why you'd reach a point
> of not making forward progress.  Is it actually stuck or is it just taking
> longer than you want?  Using hugepages can certainly help, we still need
> to pin each PAGE_SIZE page within the hugepage, but we'll have larger
> contiguous regions and therefore call iommu_map() less frequently.  Please
> share more data.  Thanks,
> 
> Alex
It just takes a long time rather than being actually stuck.
We first noticed the problem when we hot-added 16G of memory, and it takes
tens of minutes when we hot-add 1T.
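
Regarding the hugepage suggestion above, a sketch of reserving 2MB hugepages
on the host to back the guest with (the count is a placeholder; 1G pages are
usually better reserved on the kernel command line):

    # Reserve 8192 x 2MB hugepages (16GB) at runtime; this can fail
    # if host memory is already fragmented.
    echo 8192 | sudo tee /proc/sys/vm/nr_hugepages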

Thanks,
Wu Zongyong



[vfio-users] qemu gets stuck when hot-adding memory to a virtual machine with a device passthrough

2018-04-18 Thread Wuzongyong (Euler Dept)
Hi,

The qemu process gets stuck when hot-adding a large amount of memory to a
virtual machine with a device passthrough.
We found that pinning and mapping pages in vfio_dma_do_map is very slow.
Is there any method to improve this process?

Thanks,
Zongyong Wu



[vfio-users] Is there some method to merge 2 iommu groups if I disable ACS?

2017-08-08 Thread Wuzongyong (Euler Dept)
Hi,

Assume that an endpoint device (ep1) belongs to iommu group 1, another
endpoint device (ep2) belongs to iommu group 2, and the two devices sit under
different downstream ports of the same switch. If I disable ACS on those
downstream ports, ep1 and ep2 should end up in the same iommu group. So the
question is: can I regenerate the iommu groups, so that ep1 and ep2 land in
the same group and can no longer be assigned to two different VMs, without
rebooting the host?
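
In case it clarifies the question, the kind of remove/rescan cycle I have in
mind is sketched below (device addresses are placeholders; my understanding is
that groups are computed when a device is added, so a rescan after changing
ACS might re-evaluate them, but I have not verified this):

    # Remove both endpoints, then rescan the bus so the kernel
    # rebuilds their iommu groups.
    echo 1 | sudo tee /sys/bus/pci/devices/0000:03:00.0/remove
    echo 1 | sudo tee /sys/bus/pci/devices/0000:04:00.0/remove
    echo 1 | sudo tee /sys/bus/pci/rescan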


Thanks,
Cordius Wu
