Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-07 Thread Peter Xu
On Mon, Apr 08, 2019 at 12:32:12AM +, Tian, Kevin wrote:

[...]

> > > > Probably.  Currently VT-d emulation does not support snooping control,
> > > > and if you modify only that ecap bit you will probably hit this
> > > > problem: the guest kernel will then set the SNP bit in the IOMMU page
> > > > table entries, which violates the reserved-bit checks in the emulation
> > > > code, and you then see these errors.
> > > >
> > > > Now, talking about implementing Snoop Control for the Intel IOMMU for
> > > > real (which corresponds to VT-d ecap bit 7) - I'll confess I'm not
> > > > 100% clear on what "snooping" means and what we need to do as an
> > > > emulator. Quoting from the spec:
> > > >
> > > >   "Snoop behavior for a memory access (to a translation structure
> > > >   entry or access to the mapped page) specifies if the access is
> > > >   coherent (snoops the processor caches) or not."
> > > >
> > > > If it is only a capability showing whether the hardware is capable of
> > > > snooping processor caches, then I don't think we need to do much as a
> > > > VT-d emulator, simply because when we access the data we do so from
> > > > the processor's side (we're only emulating the IOMMU behavior), so the
> > > > cache should always be coherent from the POV of guest vCPUs - just
> > > > like how the processors provide cache coherence between two cores (so
> > > > IMHO the VT-d emulation code can run on one core/thread while the vCPU
> > > > that runs the guest IOMMU driver runs on another).  If so, maybe we
> > > > can simply declare support for it, but we at least also need to remove
> > > > the SNP bit from the vtd_paging_entry_rsvd_field[] array to reflect
> > > > that we understand that bit.
> > > >
> > > > CCing Alex and Kevin to see whether I'm misunderstanding or in case of
> > > > any further input on the snooping support.
> > > >
> > >
> > > for software DMA, yes, snoop is guaranteed since it's just CPU access.
> > >
> > > However for a VFIO device, i.e. hardware DMA, snoop should be reported
> > > based on the physical IOMMU capability. It's fine to report no snoop
> > > control on the vIOMMU (the current state) even when it's physically
> > > supported; it just means that the L1 VMM must favor guest cache
> > > attributes instead of forcing WB in the L1 EPT when doing nested
> > > passthrough. However it's incorrect to report snoop control on the
> > > vIOMMU when it's not physically supported, since the L1 VMM may then
> > > force WB in the L1 EPT and enable the snoop field in the vIOMMU
> > > 2nd-level PTE on the assumption that hardware snoop is guaranteed
> > > (when it isn't). Then it becomes a correctness issue.
> > >
> > 
> > If my device is fully emulated, can I ignore the SNP bit in the SLPTE?
> > What is the cost of ignoring it in such a case? What could go wrong?
> > (I tried to ignore it, and it seems that translations work for me now.)
> > 
> 
> I'm not sure what you mean by 'ignore' here. But as Peter pointed out
> earlier, for emulated devices you don't need to do anything special:
> just report the snoop capability and then remove the bit from the
> SLPTE reserved-bit check.

Yes.  For simplicity, you can add a new patch introducing a property
"x-snooping" into vtd_properties, make it false by default, and then
let the user turn it on manually, given that the user should be clear
on the consequences of this knob.

Later on we can consider enriching this property by checking the host
configuration when assigned devices are detected (I feel like it could
be a VFIO_DMA_CC_IOMMU ioctl on every assigned device, or on the
container), or more.

Regards,

-- 
Peter Xu



Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-07 Thread Tian, Kevin
> From: Elijah Shakkour [mailto:elija...@mellanox.com]
> Sent: Sunday, April 7, 2019 9:47 PM
> 
> 
> > -Original Message-
> > From: Tian, Kevin 
> > Sent: Thursday, April 4, 2019 10:58 AM
> > To: Peter Xu ; Elijah Shakkour
> > 
> > Cc: Knut Omang ; Michael S. Tsirkin
> > ; Alex Williamson ;
> > Marcel Apfelbaum ; Stefan Hajnoczi
> > ; qemu-devel@nongnu.org
> > Subject: RE: QEMU and vIOMMU support for emulated VF passthrough to
> > nested (L2) VM
> >
> > > From: Peter Xu [mailto:pet...@redhat.com]
> > > Sent: Thursday, April 4, 2019 3:00 PM
> > >
> > > On Wed, Apr 03, 2019 at 10:10:35PM +, Elijah Shakkour wrote:
> > >
> > > [...]
> > >
> > > > > > > > > > You can also try to enable VT-d device log by appending:
> > > > > > > > > >
> > > > > > > > > >   -trace enable="vtd_*"
> > > > > > > > > >
> > > > > > > > > > In case it dumps anything useful for you.
> > > > > > >
> > > > > > > Here is the relevant dump (dev 01:00.01 is my VF):
> > > > > > > "
> > > > > > > vtd_inv_desc_cc_device context invalidate device 01:00.01
> > > > > > > vtd_ce_not_present Context entry bus 1 devfn 1 not present
> > > > > > > vtd_switch_address_space Device 01:00.1 switching address
> > > > > > > space (iommu
> > > > > > > enabled=1) vtd_ce_not_present Context entry bus 1 devfn 1 not
> > > > > > > present vtd_err Detected invalid context entry when trying to
> > > > > > > sync shadow page table
> > > > > >
> > > > > > These lines mean that the guest sent a device invalidation to
> > > > > > your VF but the IOMMU found that the device context entry is
> > missing.
> > > > > >
> > > > > > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1
> > > > > > > high
> > > > > > > 0x102 low 0x2d007003 gen 0 -> gen 2
> > > > > > > vtd_err_dmar_slpte_resv_error iova
> > > > > > > 0xf08e7000 level 2 slpte 0x2a54008f7
> > > > > >
> > > > > > This line should not exist in latest QEMU.  Are you sure you're
> > > > > > using the latest QEMU?
> > > > >
> > > > > I have now moved to QEMU 4.0 RC2.
> > > > > This is what I get now:
> > > > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high
> > > > > 0x102
> > > low
> > > > > 0x2f007003 gen 0 -> gen 1
> > > > > qemu-system-x86_64: vtd_iova_to_slpte: detected splte reserve
> > > > > non-zero iova=0xf0d29000, level=0x2slpte=0x29f6008f7)
> > > > > vtd_fault_disabled Fault processing disabled for context entry
> > > > > qemu-system-x86_64: vtd_iommu_translate: detected translation
> > > > > failure (dev=01:00:01, iova=0xf0d29000) Unassigned mem read
> > > f0d29000
> > > > >
> > > > > I'm not familiar with vIOMMU registers, but I noticed that I must
> > > > > report snoop control support to Hyper-V (i.e. bit 7 in the extended
> > > > > capability register of the vIOMMU) in order to satisfy IOMMU support
> > > > > for SR-IOV.
> > > > > vIOMMU.ecap before: 0xf00f5e
> > > > > vIOMMU.ecap after:  0xf00fde
> > > > > But I see that the vIOMMU doesn't really support snoop control.
> > > > > Could this be the problem that fails the IOVA range check in
> > > > > vtd_iova_range_check()?
> > > >
> > > > Sorry, I meant the SLPTE reserved non-zero check failure in
> > > > vtd_slpte_nonzero_rsvd(), and NOT an IOVA range check failure
> > > > (since the range check didn't fail).
> > >
> > > Probably.  Currently VT-d emulation does not support snooping control,
> > > and if you modify only that ecap bit you will probably hit this
> > > problem: the guest kernel will then set the SNP bit in the IOMMU page
> > > table entries, which violates the reserved-bit checks in the emulation
> > > code, and you then see these errors.
> > >
> > > Now, talking about implementing Snoop Control for the Intel IOMMU for
> > > real (which corresponds to VT-d ecap bit 7) - I'll confess I'm not
> > > 100% clear on what "snooping" means and what we need to do as an
> > > emulator. Quoting from the spec:
> > >
> > >   "Snoop behavior for a memory access (to a translation structure
> > >   entry or access to the mapped page) specifies if the access is
> > >   coherent (snoops the processor caches) or not."
> > >
> > > If it is only a capability showing whether the hardware is capable of
> > > snooping processor caches, then I don't think we need to do much as a
> > > VT-d emulator, simply because when we access the data we do so from
> > > the processor's side (we're only emulating the IOMMU behavior), so the
> > > cache should always be coherent from the POV of guest vCPUs - just
> > > like how the processors provide cache coherence between two cores (so
> > > IMHO the VT-d emulation code can run on one core/thread while the vCPU
> > > that runs the guest IOMMU driver runs on another).  If so, maybe we
> > > can simply declare support for it, but we at least also need to remove
> > > the SNP bit from the vtd_paging_entry_rsvd_field[] array to reflect
> > > that we understand that bit.
> > >
> > > CCing Alex and Kevin to see whether I'm misunderstanding or in case of
> > > any further input on the snooping support.

Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-07 Thread Elijah Shakkour


> -Original Message-
> From: Tian, Kevin 
> Sent: Thursday, April 4, 2019 10:58 AM
> To: Peter Xu ; Elijah Shakkour
> 
> Cc: Knut Omang ; Michael S. Tsirkin
> ; Alex Williamson ;
> Marcel Apfelbaum ; Stefan Hajnoczi
> ; qemu-devel@nongnu.org
> Subject: RE: QEMU and vIOMMU support for emulated VF passthrough to
> nested (L2) VM
> 
> > From: Peter Xu [mailto:pet...@redhat.com]
> > Sent: Thursday, April 4, 2019 3:00 PM
> >
> > On Wed, Apr 03, 2019 at 10:10:35PM +, Elijah Shakkour wrote:
> >
> > [...]
> >
> > > > > > > > > You can also try to enable VT-d device log by appending:
> > > > > > > > >
> > > > > > > > >   -trace enable="vtd_*"
> > > > > > > > >
> > > > > > > > > In case it dumps anything useful for you.
> > > > > >
> > > > > > Here is the relevant dump (dev 01:00.01 is my VF):
> > > > > > "
> > > > > > vtd_inv_desc_cc_device context invalidate device 01:00.01
> > > > > > vtd_ce_not_present Context entry bus 1 devfn 1 not present
> > > > > > vtd_switch_address_space Device 01:00.1 switching address
> > > > > > space (iommu
> > > > > > enabled=1) vtd_ce_not_present Context entry bus 1 devfn 1 not
> > > > > > present vtd_err Detected invalid context entry when trying to
> > > > > > sync shadow page table
> > > > >
> > > > > These lines mean that the guest sent a device invalidation to
> > > > > your VF but the IOMMU found that the device context entry is
> missing.
> > > > >
> > > > > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1
> > > > > > high
> > > > > > 0x102 low 0x2d007003 gen 0 -> gen 2
> > > > > > vtd_err_dmar_slpte_resv_error iova
> > > > > > 0xf08e7000 level 2 slpte 0x2a54008f7
> > > > >
> > > > > This line should not exist in latest QEMU.  Are you sure you're
> > > > > using the latest QEMU?
> > > >
> > > > I have now moved to QEMU 4.0 RC2.
> > > > This is what I get now:
> > > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high
> > > > 0x102
> > low
> > > > 0x2f007003 gen 0 -> gen 1
> > > > qemu-system-x86_64: vtd_iova_to_slpte: detected splte reserve
> > > > non-zero iova=0xf0d29000, level=0x2slpte=0x29f6008f7)
> > > > vtd_fault_disabled Fault processing disabled for context entry
> > > > qemu-system-x86_64: vtd_iommu_translate: detected translation
> > > > failure (dev=01:00:01, iova=0xf0d29000) Unassigned mem read
> > f0d29000
> > > >
> > > > I'm not familiar with vIOMMU registers, but I noticed that I must
> > > > report snoop control support to Hyper-V (i.e. bit 7 in the extended
> > > > capability register of the vIOMMU) in order to satisfy IOMMU support
> > > > for SR-IOV.
> > > > vIOMMU.ecap before: 0xf00f5e
> > > > vIOMMU.ecap after:  0xf00fde
> > > > But I see that the vIOMMU doesn't really support snoop control.
> > > > Could this be the problem that fails the IOVA range check in
> > > > vtd_iova_range_check()?
> > >
> > > Sorry, I meant the SLPTE reserved non-zero check failure in
> > > vtd_slpte_nonzero_rsvd(), and NOT an IOVA range check failure
> > > (since the range check didn't fail).
> >
> > Probably.  Currently VT-d emulation does not support snooping control,
> > and if you modify only that ecap bit you will probably hit this
> > problem: the guest kernel will then set the SNP bit in the IOMMU page
> > table entries, which violates the reserved-bit checks in the emulation
> > code, and you then see these errors.
> >
> > Now, talking about implementing Snoop Control for the Intel IOMMU for
> > real (which corresponds to VT-d ecap bit 7) - I'll confess I'm not
> > 100% clear on what "snooping" means and what we need to do as an
> > emulator. Quoting from the spec:
> >
> >   "Snoop behavior for a memory access (to a translation structure
> >   entry or access to the mapped page) specifies if the access is
> >   coherent (snoops the processor caches) or not."
> >
> > If it is only a capability showing whether the hardware is capable of
> > snooping processor caches, then I don't think we need to do much as a
> > VT-d emulator, simply because when we access the data we do so from
> > the processor's side (we're only emulating the IOMMU behavior), so the
> > cache should always be coherent from the POV of guest vCPUs - just
> > like how the processors provide cache coherence between two cores (so
> > IMHO the VT-d emulation code can run on one core/thread while the vCPU
> > that runs the guest IOMMU driver runs on another).  If so, maybe we
> > can simply declare support for it, but we at least also need to remove
> > the SNP bit from the vtd_paging_entry_rsvd_field[] array to reflect
> > that we understand that bit.
> >
> > CCing Alex and Kevin to see whether I'm misunderstanding or in case of
> > any further input on the snooping support.
> >
> 
> for software DMA, yes, snoop is guaranteed since it's just CPU access.
> 
> However for a VFIO device, i.e. hardware DMA, snoop should be reported
> based on the physical IOMMU capability. It's fine to report no snoop
> control on the vIOMMU (current state) even when it's physically supported.

Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-04 Thread Tian, Kevin
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Thursday, April 4, 2019 3:00 PM
> 
> On Wed, Apr 03, 2019 at 10:10:35PM +, Elijah Shakkour wrote:
> 
> [...]
> 
> > > > > > > > You can also try to enable VT-d device log by appending:
> > > > > > > >
> > > > > > > >   -trace enable="vtd_*"
> > > > > > > >
> > > > > > > > In case it dumps anything useful for you.
> > > > >
> > > > > Here is the relevant dump (dev 01:00.01 is my VF):
> > > > > "
> > > > > vtd_inv_desc_cc_device context invalidate device 01:00.01
> > > > > vtd_ce_not_present Context entry bus 1 devfn 1 not present
> > > > > vtd_switch_address_space Device 01:00.1 switching address space
> > > > > (iommu
> > > > > enabled=1) vtd_ce_not_present Context entry bus 1 devfn 1 not
> > > > > present vtd_err Detected invalid context entry when trying to sync
> > > > > shadow page table
> > > >
> > > > These lines mean that the guest sent a device invalidation to your VF
> > > > but the IOMMU found that the device context entry is missing.
> > > >
> > > > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high
> > > > > 0x102 low 0x2d007003 gen 0 -> gen 2 vtd_err_dmar_slpte_resv_error
> > > > > iova
> > > > > 0xf08e7000 level 2 slpte 0x2a54008f7
> > > >
> > > > This line should not exist in latest QEMU.  Are you sure you're using
> > > > the latest QEMU?
> > >
> > > I have now moved to QEMU 4.0 RC2.
> > > This is what I get now:
> > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high 0x102
> low
> > > 0x2f007003 gen 0 -> gen 1
> > > qemu-system-x86_64: vtd_iova_to_slpte: detected splte reserve non-zero
> > > iova=0xf0d29000, level=0x2slpte=0x29f6008f7) vtd_fault_disabled Fault
> > > processing disabled for context entry
> > > qemu-system-x86_64: vtd_iommu_translate: detected translation failure
> > > (dev=01:00:01, iova=0xf0d29000) Unassigned mem read
> f0d29000
> > >
> > > I'm not familiar with vIOMMU registers, but I noticed that I must
> > > report snoop control support to Hyper-V (i.e. bit 7 in the extended
> > > capability register of the vIOMMU) in order to satisfy IOMMU support
> > > for SR-IOV.
> > > vIOMMU.ecap before: 0xf00f5e
> > > vIOMMU.ecap after:  0xf00fde
> > > But I see that the vIOMMU doesn't really support snoop control.
> > > Could this be the problem that fails the IOVA range check in
> > > vtd_iova_range_check()?
> >
> > Sorry, I meant the SLPTE reserved non-zero check failure in
> > vtd_slpte_nonzero_rsvd(), and NOT an IOVA range check failure
> > (since the range check didn't fail).
> 
> Probably.  Currently VT-d emulation does not support snooping control,
> and if you modify only that ecap bit you will probably hit this
> problem: the guest kernel will then set the SNP bit in the IOMMU page
> table entries, which violates the reserved-bit checks in the emulation
> code, and you then see these errors.
> 
> Now, talking about implementing Snoop Control for the Intel IOMMU for
> real (which corresponds to VT-d ecap bit 7) - I'll confess I'm not
> 100% clear on what "snooping" means and what we need to do as an
> emulator. Quoting from the spec:
> 
>   "Snoop behavior for a memory access (to a translation structure
>   entry or access to the mapped page) specifies if the access is
>   coherent (snoops the processor caches) or not."
> 
> If it is only a capability showing whether the hardware is capable of
> snooping processor caches, then I don't think we need to do much as a
> VT-d emulator, simply because when we access the data we do so from
> the processor's side (we're only emulating the IOMMU behavior), so the
> cache should always be coherent from the POV of guest vCPUs - just
> like how the processors provide cache coherence between two cores (so
> IMHO the VT-d emulation code can run on one core/thread while the vCPU
> that runs the guest IOMMU driver runs on another).  If so, maybe we
> can simply declare support for it, but we at least also need to remove
> the SNP bit from the vtd_paging_entry_rsvd_field[] array to reflect
> that we understand that bit.
> 
> CCing Alex and Kevin to see whether I'm misunderstanding or in case of
> any further input on the snooping support.
> 

for software DMA, yes, snoop is guaranteed since it's just CPU access.

However for a VFIO device, i.e. hardware DMA, snoop should be reported
based on the physical IOMMU capability. It's fine to report no snoop
control on the vIOMMU (the current state) even when it's physically
supported; it just means that the L1 VMM must favor guest cache
attributes instead of forcing WB in the L1 EPT when doing nested
passthrough. However it's incorrect to report snoop control on the
vIOMMU when it's not physically supported, since the L1 VMM may then
force WB in the L1 EPT and enable the snoop field in the vIOMMU
2nd-level PTE on the assumption that hardware snoop is guaranteed
(when it isn't). Then it becomes a correctness issue.

The thing is a bit tricky regarding to two VFIO devices which are under
t

Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-04 Thread Peter Xu
On Wed, Apr 03, 2019 at 10:10:35PM +, Elijah Shakkour wrote:

[...]

> > > > > > > You can also try to enable VT-d device log by appending:
> > > > > > >
> > > > > > >   -trace enable="vtd_*"
> > > > > > >
> > > > > > > In case it dumps anything useful for you.
> > > >
> > > > Here is the relevant dump (dev 01:00.01 is my VF):
> > > > "
> > > > vtd_inv_desc_cc_device context invalidate device 01:00.01
> > > > vtd_ce_not_present Context entry bus 1 devfn 1 not present
> > > > vtd_switch_address_space Device 01:00.1 switching address space
> > > > (iommu
> > > > enabled=1) vtd_ce_not_present Context entry bus 1 devfn 1 not
> > > > present vtd_err Detected invalid context entry when trying to sync
> > > > shadow page table
> > >
> > > These lines mean that the guest sent a device invalidation to your VF
> > > but the IOMMU found that the device context entry is missing.
> > >
> > > > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high
> > > > 0x102 low 0x2d007003 gen 0 -> gen 2 vtd_err_dmar_slpte_resv_error
> > > > iova
> > > > 0xf08e7000 level 2 slpte 0x2a54008f7
> > >
> > > This line should not exist in latest QEMU.  Are you sure you're using
> > > the latest QEMU?
> > 
> > I have now moved to QEMU 4.0 RC2.
> > This is what I get now:
> > vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high 0x102 low
> > 0x2f007003 gen 0 -> gen 1
> > qemu-system-x86_64: vtd_iova_to_slpte: detected splte reserve non-zero
> > iova=0xf0d29000, level=0x2slpte=0x29f6008f7) vtd_fault_disabled Fault
> > processing disabled for context entry
> > qemu-system-x86_64: vtd_iommu_translate: detected translation failure
> > (dev=01:00:01, iova=0xf0d29000) Unassigned mem read f0d29000
> > 
> > I'm not familiar with vIOMMU registers, but I noticed that I must
> > report snoop control support to Hyper-V (i.e. bit 7 in the extended
> > capability register of the vIOMMU) in order to satisfy IOMMU support
> > for SR-IOV.
> > vIOMMU.ecap before: 0xf00f5e
> > vIOMMU.ecap after:  0xf00fde
> > But I see that the vIOMMU doesn't really support snoop control.
> > Could this be the problem that fails the IOVA range check in
> > vtd_iova_range_check()?
> 
> Sorry, I meant the SLPTE reserved non-zero check failure in
> vtd_slpte_nonzero_rsvd(), and NOT an IOVA range check failure
> (since the range check didn't fail).

Probably.  Currently VT-d emulation does not support snooping control,
and if you modify only that ecap bit you will probably hit this
problem: the guest kernel will then set the SNP bit in the IOMMU page
table entries, which violates the reserved-bit checks in the emulation
code, and you then see these errors.

Now, talking about implementing Snoop Control for the Intel IOMMU for
real (which corresponds to VT-d ecap bit 7) - I'll confess I'm not
100% clear on what "snooping" means and what we need to do as an
emulator. Quoting from the spec:

  "Snoop behavior for a memory access (to a translation structure
  entry or access to the mapped page) specifies if the access is
  coherent (snoops the processor caches) or not."

If it is only a capability showing whether the hardware is capable of
snooping processor caches, then I don't think we need to do much as a
VT-d emulator, simply because when we access the data we do so from
the processor's side (we're only emulating the IOMMU behavior), so the
cache should always be coherent from the POV of guest vCPUs - just
like how the processors provide cache coherence between two cores (so
IMHO the VT-d emulation code can run on one core/thread while the vCPU
that runs the guest IOMMU driver runs on another).  If so, maybe we
can simply declare support for it, but we at least also need to remove
the SNP bit from the vtd_paging_entry_rsvd_field[] array to reflect
that we understand that bit.

CCing Alex and Kevin to see whether I'm misunderstanding or in case of
any further input on the snooping support.

Regards,

-- 
Peter Xu



Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-03 Thread Elijah Shakkour


> -Original Message-
> From: Elijah Shakkour
> Sent: Thursday, April 4, 2019 12:57 AM
> To: 'Peter Xu' 
> Cc: Knut Omang ; Michael S. Tsirkin
> ; Alex Williamson ;
> Marcel Apfelbaum ; Stefan Hajnoczi
> ; qemu-devel@nongnu.org
> Subject: RE: QEMU and vIOMMU support for emulated VF passthrough to
> nested (L2) VM
> 
> 
> 
> > -Original Message-
> > From: Peter Xu 
> > Sent: Wednesday, April 3, 2019 5:40 AM
> > To: Elijah Shakkour 
> > Cc: Knut Omang ; Michael S. Tsirkin
> > ; Alex Williamson ;
> Marcel
> > Apfelbaum ; Stefan Hajnoczi
> > ; qemu-devel@nongnu.org
> > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to
> > nested (L2) VM
> >
> > On Tue, Apr 02, 2019 at 03:41:10PM +, Elijah Shakkour wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Knut Omang 
> > > > Sent: Monday, April 1, 2019 5:24 PM
> > > > To: Elijah Shakkour ; Peter Xu
> > > > 
> > > > Cc: Michael S. Tsirkin ; Alex Williamson
> > > > ; Marcel Apfelbaum
> > > > ; Stefan Hajnoczi
> > ;
> > > > qemu-devel@nongnu.org
> > > > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough
> > to
> > > > nested (L2) VM
> > > >
> > > > On Mon, 2019-04-01 at 14:01 +, Elijah Shakkour wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: Peter Xu 
> > > > > > Sent: Monday, April 1, 2019 1:25 PM
> > > > > > To: Elijah Shakkour 
> > > > > > Cc: Knut Omang ; Michael S. Tsirkin
> > > > > > ; Alex Williamson
> > > > > > ; Marcel Apfelbaum
> > > > > > ; Stefan Hajnoczi
> > > > > > ; qemu-devel@nongnu.org
> > > > > > Subject: Re: QEMU and vIOMMU support for emulated VF
> > passthrough
> > > > to
> > > > > > nested (L2) VM
> > > > > >
> > > > > > On Mon, Apr 01, 2019 at 09:12:38AM +, Elijah Shakkour wrote:
> > > > > > >
> > > > > > >
> > > > > > > > -Original Message-
> > > > > > > > From: Peter Xu 
> > > > > > > > Sent: Monday, April 1, 2019 5:47 AM
> > > > > > > > To: Elijah Shakkour 
> > > > > > > > Cc: Knut Omang ; Michael S. Tsirkin
> > > > > > > > ; Alex Williamson
> > > > > > > > ; Marcel Apfelbaum
> > > > > > > > ; Stefan Hajnoczi
> > > > > > > > ; qemu-devel@nongnu.org
> > > > > > > > Subject: Re: QEMU and vIOMMU support for emulated VF
> > > > passthrough
> > > > > > to
> > > > > > > > nested (L2) VM
> > > > > > > >
> > > > > > > > On Sun, Mar 31, 2019 at 11:15:00AM +, Elijah Shakkour
> wrote:
> > > > > > > >
> > > > > > > > [...]
> > > > > > > >
> > > > > > > > > I didn't have DMA nor MMIO read/write working with my
> > > > > > > > > old command
> > > > > > > > line.
> > > > > > > > > But, when I removed all CPU flags and only provided
> > > > > > > > > "-cpu host", I see that
> > > > > > > > MMIO works.
> > > > > > > > > Still, DMA read/write from emulated device doesn't work for
> VF.
> > > > > > > > > For
> > > > > > > > example:
> > > > > > > > > Driver provides me a buffer pointer through MMIO write,
> > > > > > > > > this address
> > > > > > > > (pointer) is GPA of L2, and when I try to call
> > > > > > > > pci_dma_read() with this address I get:
> > > > > > > > > "
> > > > > > > > > Unassigned mem read  "
> > > > > > > >
> > > > > > > > I don't know where this error log was dumped but if it's
> > > > > > > > during DMA then I agree it can probably be related to vIOMMU.
> > > > > > > >
> > > > > > >
> > > > > > > This log is dumped from:
> > > > > > > memory.c: unassigned_mem_read()
> > > > > > >
> > > > > > > > > As I said, my problem now is in translation of L2 GPA
> > > > > > > > > provided by driver,
> > > > > > > > when I call DMA read/write for this address from VF.
> > > > > > > > > Any insights?
> > > > > > > >
> > > > > > > > I just noticed that you were using QEMU 2.12 [1].  If
> > > > > > > > that's the case, please rebase to the latest QEMU, at
> > > > > > > > least >=3.0 because there's major refactor of the shadow
> > > > > > > > logic during
> > > > > > > > 3.0 devel cycle
> > > > AFAICT.
> > > > > > > >
> > > > > > >
> > > > > > > Rebased to QEMU 3.1
> > > > > > > Now I see the address I'm trying to read from in log but
> > > > > > > still same
> > error:
> > > > > > > "
> > > > > > > Unassigned mem read f0481000 "
> > > > > > > What do you suggest?
> > > > > >
> > > > > > Would you please answer the questions that Knut asked?  Is it
> > > > > > working for L1 guest?  How about PF?
> > > > >
> > > > > Both VF and PF are working for L1 guest.
> > > > > I don't know how to passthrough PF to nested VM in hyper-v.
> > > >
> > > > On Linux, passing through VFs and PFs works the same way.
> > > > Maybe you can try passthrough with all-Linux first (first PF, then VF)?
> > > >
> > > > > I don't invoke VF manually in hyper-v and pass it through to
> > > > > nested VM. I use hyper-v manager to configure and provide a VF
> > > > > for nested VM (I can see the VF only in the nested VM).
> > > > >
> > > > > Did someone try to run an emulated device in Linux (RH) as nested L2
> > > > > where L1 is Windows Hyper-V? Does DMA read/write work for this
> > > > > emulated device in this case?

Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-03 Thread Elijah Shakkour


> -Original Message-
> From: Peter Xu 
> Sent: Wednesday, April 3, 2019 5:40 AM
> To: Elijah Shakkour 
> Cc: Knut Omang ; Michael S. Tsirkin
> ; Alex Williamson ;
> Marcel Apfelbaum ; Stefan Hajnoczi
> ; qemu-devel@nongnu.org
> Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to
> nested (L2) VM
> 
> On Tue, Apr 02, 2019 at 03:41:10PM +, Elijah Shakkour wrote:
> >
> >
> > > -Original Message-
> > > From: Knut Omang 
> > > Sent: Monday, April 1, 2019 5:24 PM
> > > To: Elijah Shakkour ; Peter Xu
> > > 
> > > Cc: Michael S. Tsirkin ; Alex Williamson
> > > ; Marcel Apfelbaum
> > > ; Stefan Hajnoczi
> ;
> > > qemu-devel@nongnu.org
> > > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough
> to
> > > nested (L2) VM
> > >
> > > On Mon, 2019-04-01 at 14:01 +, Elijah Shakkour wrote:
> > > >
> > > > > -Original Message-
> > > > > From: Peter Xu 
> > > > > Sent: Monday, April 1, 2019 1:25 PM
> > > > > To: Elijah Shakkour 
> > > > > Cc: Knut Omang ; Michael S. Tsirkin
> > > > > ; Alex Williamson ;
> > > > > Marcel Apfelbaum ; Stefan Hajnoczi
> > > > > ; qemu-devel@nongnu.org
> > > > > Subject: Re: QEMU and vIOMMU support for emulated VF
> passthrough
> > > to
> > > > > nested (L2) VM
> > > > >
> > > > > On Mon, Apr 01, 2019 at 09:12:38AM +, Elijah Shakkour wrote:
> > > > > >
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Peter Xu 
> > > > > > > Sent: Monday, April 1, 2019 5:47 AM
> > > > > > > To: Elijah Shakkour 
> > > > > > > Cc: Knut Omang ; Michael S. Tsirkin
> > > > > > > ; Alex Williamson
> > > > > > > ; Marcel Apfelbaum
> > > > > > > ; Stefan Hajnoczi
> > > > > > > ; qemu-devel@nongnu.org
> > > > > > > Subject: Re: QEMU and vIOMMU support for emulated VF
> > > passthrough
> > > > > to
> > > > > > > nested (L2) VM
> > > > > > >
> > > > > > > On Sun, Mar 31, 2019 at 11:15:00AM +, Elijah Shakkour wrote:
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > > I didn't have DMA nor MMIO read/write working with my old
> > > > > > > > command
> > > > > > > line.
> > > > > > > > But, when I removed all CPU flags and only provided "-cpu
> > > > > > > > host", I see that
> > > > > > > MMIO works.
> > > > > > > > Still, DMA read/write from emulated device doesn't work for VF.
> > > > > > > > For
> > > > > > > example:
> > > > > > > > Driver provides me a buffer pointer through MMIO write,
> > > > > > > > this address
> > > > > > > (pointer) is GPA of L2, and when I try to call
> > > > > > > pci_dma_read() with this address I get:
> > > > > > > > "
> > > > > > > > Unassigned mem read  "
> > > > > > >
> > > > > > > I don't know where this error log was dumped but if it's
> > > > > > > during DMA then I agree it can probably be related to vIOMMU.
> > > > > > >
> > > > > >
> > > > > > This log is dumped from:
> > > > > > memory.c: unassigned_mem_read()
> > > > > >
> > > > > > > > As I said, my problem now is in translation of L2 GPA
> > > > > > > > provided by driver,
> > > > > > > when I call DMA read/write for this address from VF.
> > > > > > > > Any insights?
> > > > > > >
> > > > > > > I just noticed that you were using QEMU 2.12 [1].  If that's
> > > > > > > the case, please rebase to the latest QEMU, at least >=3.0
> > > > > > > because there's major refactor of the shadow logic during
> > > > > > > 3.0 devel cycle
> > > AFAICT.
> > > > > > >
> > > > > >
> > > > > > Rebased to QEMU 3.1
> > > > > > Now I see the address I'm trying to read from in log but still same
> error:
> > > > > > "
> > > > > > Unassigned mem read f0481000 "
> > > > > > What do you suggest?
> > > > >
> > > > > Would you please answer the questions that Knut asked?  Is it
> > > > > working for L1 guest?  How about PF?
> > > >
> > > > Both VF and PF are working for L1 guest.
> > > > I don't know how to passthrough PF to nested VM in hyper-v.
> > >
> > > On Linux passing through VFs and PFs are the same.
> > > Maybe you can try passthrough with all Linux first? (first PF then VF) ?
> > >
> > > > I don't invoke VF manually in hyper-v and pass it through to
> > > > nested VM. I use hyper-v manager to configure and provide a VF for
> > > > nested VM (I can see the VF only in the nested VM).
> > > >
> > > > Did someone try to run emulated device in linux RH as nested L2
> > > > where
> > > > L1 is windows hyper-v? Does DMA read/write work for this emulated
> > > device in this case?
> > >
> > > I have never tried that, I have only used Linux as L2, Windows might
> > > be pickier about what it expects, so starting with Linux to rule
> > > that out is probably a good idea.
> >
> > Will move to this solution after I/we give up 😊
> >
> > >
> > > > >
> > > > > You can also try to enable VT-d device log by appending:
> > > > >
> > > > >   -trace enable="vtd_*"
> > > > >
> > > > > In case it dumps anything useful for you.
> >
> > Here is the relevant dump (dev 01:00.01 is my VF):
> > "
> > vtd_inv_desc_cc_device context invalidate device 01:00.01

Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-02 Thread Peter Xu
On Tue, Apr 02, 2019 at 03:41:10PM +, Elijah Shakkour wrote:
> 
> 
> > -Original Message-
> > From: Knut Omang 
> > Sent: Monday, April 1, 2019 5:24 PM
> > To: Elijah Shakkour ; Peter Xu
> > 
> > Cc: Michael S. Tsirkin ; Alex Williamson
> > ; Marcel Apfelbaum
> > ; Stefan Hajnoczi ;
> > qemu-devel@nongnu.org
> > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to
> > nested (L2) VM
> > 
> > On Mon, 2019-04-01 at 14:01 +, Elijah Shakkour wrote:
> > >
> > > > -Original Message-
> > > > From: Peter Xu 
> > > > Sent: Monday, April 1, 2019 1:25 PM
> > > > To: Elijah Shakkour 
> > > > Cc: Knut Omang ; Michael S. Tsirkin
> > > > ; Alex Williamson ;
> > > > Marcel Apfelbaum ; Stefan Hajnoczi
> > > > ; qemu-devel@nongnu.org
> > > > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough
> > to
> > > > nested (L2) VM
> > > >
> > > > On Mon, Apr 01, 2019 at 09:12:38AM +, Elijah Shakkour wrote:
> > > > >
> > > > >
> > > > > > -Original Message-
> > > > > > From: Peter Xu 
> > > > > > Sent: Monday, April 1, 2019 5:47 AM
> > > > > > To: Elijah Shakkour 
> > > > > > Cc: Knut Omang ; Michael S. Tsirkin
> > > > > > ; Alex Williamson ;
> > > > > > Marcel Apfelbaum ; Stefan Hajnoczi
> > > > > > ; qemu-devel@nongnu.org
> > > > > > Subject: Re: QEMU and vIOMMU support for emulated VF
> > passthrough
> > > > to
> > > > > > nested (L2) VM
> > > > > >
> > > > > > On Sun, Mar 31, 2019 at 11:15:00AM +, Elijah Shakkour wrote:
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > I didn't have DMA nor MMIO read/write working with my old
> > > > > > > command
> > > > > > line.
> > > > > > > But, when I removed all CPU flags and only provided "-cpu
> > > > > > > host", I see that
> > > > > > MMIO works.
> > > > > > > Still, DMA read/write from emulated device doesn't work for VF.
> > > > > > > For
> > > > > > example:
> > > > > > > Driver provides me a buffer pointer through MMIO write, this
> > > > > > > address
> > > > > > (pointer) is GPA of L2, and when I try to call pci_dma_read()
> > > > > > with this address I get:
> > > > > > > "
> > > > > > > Unassigned mem read  "
> > > > > >
> > > > > > I don't know where this error log was dumped but if it's during
> > > > > > DMA then I agree it can probably be related to vIOMMU.
> > > > > >
> > > > >
> > > > > This log is dumped from:
> > > > > memory.c: unassigned_mem_read()
> > > > >
> > > > > > > As I said, my problem now is in translation of L2 GPA provided
> > > > > > > by driver,
> > > > > > when I call DMA read/write for this address from VF.
> > > > > > > Any insights?
> > > > > >
> > > > > > I just noticed that you were using QEMU 2.12 [1].  If that's the
> > > > > > case, please rebase to the latest QEMU, at least >=3.0 because
> > > > > > there's major refactor of the shadow logic during 3.0 devel cycle
> > AFAICT.
> > > > > >
> > > > >
> > > > > Rebased to QEMU 3.1
> > > > > Now I see the address I'm trying to read from in log but still same 
> > > > > error:
> > > > > "
> > > > > Unassigned mem read f0481000 "
> > > > > What do you suggest?
> > > >
> > > > Would you please answer the questions that Knut asked?  Is it
> > > > working for L1 guest?  How about PF?
> > >
> > > Both VF and PF are working for L1 guest.
> > > I don't know how to passthrough PF to nested VM in hyper-v.
> > 
> > On Linux passing through VFs and PFs are the same.
> > Maybe you can try passthrough with all Linux first? (first PF then VF) ?
> > 
> > > I don't invoke VF manually in hyper-v and pass it through to nested
> > > VM. I use hyper-v manager to configure and provide a VF for nested VM
> > > (I can see the VF only in the nested VM).
> > >
> > > Did someone try to run emulated device in linux RH as nested L2 where
> > > L1 is windows hyper-v? Does DMA read/write work for this emulated
> > device in this case?
> > 
> > I have never tried that, I have only used Linux as L2, Windows might be
> > pickier about what it expects, so starting with Linux to rule that out is
> > probably a good idea.
> 
> Will move to this solution after I/we give up 😊
> 
> > 
> > > >
> > > > You can also try to enable VT-d device log by appending:
> > > >
> > > >   -trace enable="vtd_*"
> > > >
> > > > In case it dumps anything useful for you.
> 
> Here is the relevant dump (dev 01:00.01 is my VF):
> "
> vtd_inv_desc_cc_device context invalidate device 01:00.01
> vtd_ce_not_present Context entry bus 1 devfn 1 not present
> vtd_switch_address_space Device 01:00.1 switching address space (iommu 
> enabled=1)
> vtd_ce_not_present Context entry bus 1 devfn 1 not present
> vtd_err Detected invalid context entry when trying to sync shadow page table

These lines mean that the guest sent a device invalidation to your VF
but the IOMMU found that the device context entry is missing.

> vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high 0x102 low 
> 0x2d007003 gen 0 -> gen 2
> vtd_err_dmar_slpte_resv_error iova 0xf08e7000 level 2 slpte 0x2a54008f7

Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-02 Thread Elijah Shakkour


> -Original Message-
> From: Knut Omang 
> Sent: Monday, April 1, 2019 5:24 PM
> To: Elijah Shakkour ; Peter Xu
> 
> Cc: Michael S. Tsirkin ; Alex Williamson
> ; Marcel Apfelbaum
> ; Stefan Hajnoczi ;
> qemu-devel@nongnu.org
> Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to
> nested (L2) VM
> 
> On Mon, 2019-04-01 at 14:01 +, Elijah Shakkour wrote:
> >
> > > -Original Message-
> > > From: Peter Xu 
> > > Sent: Monday, April 1, 2019 1:25 PM
> > > To: Elijah Shakkour 
> > > Cc: Knut Omang ; Michael S. Tsirkin
> > > ; Alex Williamson ;
> > > Marcel Apfelbaum ; Stefan Hajnoczi
> > > ; qemu-devel@nongnu.org
> > > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough
> to
> > > nested (L2) VM
> > >
> > > On Mon, Apr 01, 2019 at 09:12:38AM +, Elijah Shakkour wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Peter Xu 
> > > > > Sent: Monday, April 1, 2019 5:47 AM
> > > > > To: Elijah Shakkour 
> > > > > Cc: Knut Omang ; Michael S. Tsirkin
> > > > > ; Alex Williamson ;
> > > > > Marcel Apfelbaum ; Stefan Hajnoczi
> > > > > ; qemu-devel@nongnu.org
> > > > > Subject: Re: QEMU and vIOMMU support for emulated VF
> passthrough
> > > to
> > > > > nested (L2) VM
> > > > >
> > > > > On Sun, Mar 31, 2019 at 11:15:00AM +, Elijah Shakkour wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > > > I didn't have DMA nor MMIO read/write working with my old
> > > > > > command
> > > > > line.
> > > > > > But, when I removed all CPU flags and only provided "-cpu
> > > > > > host", I see that
> > > > > MMIO works.
> > > > > > Still, DMA read/write from emulated device doesn't work for VF.
> > > > > > For
> > > > > example:
> > > > > > Driver provides me a buffer pointer through MMIO write, this
> > > > > > address
> > > > > (pointer) is GPA of L2, and when I try to call pci_dma_read()
> > > > > with this address I get:
> > > > > > "
> > > > > > Unassigned mem read  "
> > > > >
> > > > > I don't know where this error log was dumped but if it's during
> > > > > DMA then I agree it can probably be related to vIOMMU.
> > > > >
> > > >
> > > > This log is dumped from:
> > > > memory.c: unassigned_mem_read()
> > > >
> > > > > > As I said, my problem now is in translation of L2 GPA provided
> > > > > > by driver,
> > > > > when I call DMA read/write for this address from VF.
> > > > > > Any insights?
> > > > >
> > > > > I just noticed that you were using QEMU 2.12 [1].  If that's the
> > > > > case, please rebase to the latest QEMU, at least >=3.0 because
> > > > > there's major refactor of the shadow logic during 3.0 devel cycle
> AFAICT.
> > > > >
> > > >
> > > > Rebased to QEMU 3.1
> > > > Now I see the address I'm trying to read from in log but still same 
> > > > error:
> > > > "
> > > > Unassigned mem read f0481000 "
> > > > What do you suggest?
> > >
> > > Would you please answer the questions that Knut asked?  Is it
> > > working for L1 guest?  How about PF?
> >
> > Both VF and PF are working for L1 guest.
> > I don't know how to passthrough PF to nested VM in hyper-v.
> 
> On Linux passing through VFs and PFs are the same.
> Maybe you can try passthrough with all Linux first? (first PF then VF) ?
> 
> > I don't invoke VF manually in hyper-v and pass it through to nested
> > VM. I use hyper-v manager to configure and provide a VF for nested VM
> > (I can see the VF only in the nested VM).
> >
> > Did someone try to run emulated device in linux RH as nested L2 where
> > L1 is windows hyper-v? Does DMA read/write work for this emulated
> device in this case?
> 
> I have never tried that, I have only used Linux as L2, Windows might be
> pickier about what it expects, so starting with Linux to rule that out is
> probably a good idea.

Will move to this solution after I/we give up 😊

> 
> > >
> > > You can also try to enable VT-d device log by appending:
> > >
> > >   -trace enable="vtd_*"
> > >
> > > In case it dumps anything useful for you.

Here is the relevant dump (dev 01:00.01 is my VF):
"
vtd_inv_desc_cc_device context invalidate device 01:00.01
vtd_ce_not_present Context entry bus 1 devfn 1 not present
vtd_switch_address_space Device 01:00.1 switching address space (iommu 
enabled=1)
vtd_ce_not_present Context entry bus 1 devfn 1 not present
vtd_err Detected invalid context entry when trying to sync shadow page table
vtd_iotlb_cc_update IOTLB context update bus 0x1 devfn 0x1 high 0x102 low 
0x2d007003 gen 0 -> gen 2
vtd_err_dmar_slpte_resv_error iova 0xf08e7000 level 2 slpte 0x2a54008f7
vtd_fault_disabled Fault processing disabled for context entry
vtd_err_dmar_translate dev 01:00.01 iova 0x0
Unassigned mem read f08e7000
"
What do you conclude from this dump?
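One way to start answering that is to decode the faulting SLPTE by hand. Below is a small sketch (plain Python, not QEMU code); the bit positions are taken from the VT-d second-level PTE layout, so treat them as an assumption to check against your spec revision:

```python
# Fields of the SLPTE reported by:
#   vtd_err_dmar_slpte_resv_error iova 0xf08e7000 level 2 slpte 0x2a54008f7
slpte = 0x2A54008F7

READ  = 1 << 0    # bit 0: read permission
WRITE = 1 << 1    # bit 1: write permission
PS    = 1 << 7    # bit 7: page size (leaf/large page at level 2)
SNP   = 1 << 11   # bit 11: snoop behavior

for name, bit in [("R", READ), ("W", WRITE), ("PS", PS), ("SNP", SNP)]:
    print(name, bool(slpte & bit))
```

If SNP comes out set, that is a plausible cause of the reserved-field fault: an emulated VT-d that does not advertise snoop control keeps that bit in its reserved mask, so a guest that programs it trips exactly this check.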

> >
> > Is there a way to open those traces to be dumped to stdout/stderr on
> > the fly, instead of dtrace?
> 
> It's up to you what tracer(s) to configure when you build QEMU - check out
> docs/devel/tracing.txt . There's a few trace events defined in the SR/IOV
> patch set, you might want to enable them as well.

Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-01 Thread Knut Omang
On Mon, 2019-04-01 at 14:01 +, Elijah Shakkour wrote:
> 
> > -Original Message-
> > From: Peter Xu 
> > Sent: Monday, April 1, 2019 1:25 PM
> > To: Elijah Shakkour 
> > Cc: Knut Omang ; Michael S. Tsirkin
> > ; Alex Williamson ;
> > Marcel Apfelbaum ; Stefan Hajnoczi
> > ; qemu-devel@nongnu.org
> > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to
> > nested (L2) VM
> > 
> > On Mon, Apr 01, 2019 at 09:12:38AM +, Elijah Shakkour wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Peter Xu 
> > > > Sent: Monday, April 1, 2019 5:47 AM
> > > > To: Elijah Shakkour 
> > > > Cc: Knut Omang ; Michael S. Tsirkin
> > > > ; Alex Williamson ;
> > > > Marcel Apfelbaum ; Stefan Hajnoczi
> > > > ; qemu-devel@nongnu.org
> > > > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough
> > to
> > > > nested (L2) VM
> > > >
> > > > On Sun, Mar 31, 2019 at 11:15:00AM +, Elijah Shakkour wrote:
> > > >
> > > > [...]
> > > >
> > > > > I didn't have DMA nor MMIO read/write working with my old command
> > > > line.
> > > > > But, when I removed all CPU flags and only provided "-cpu host", I
> > > > > see that
> > > > MMIO works.
> > > > > Still, DMA read/write from emulated device doesn't work for VF.
> > > > > For
> > > > example:
> > > > > Driver provides me a buffer pointer through MMIO write, this
> > > > > address
> > > > (pointer) is GPA of L2, and when I try to call pci_dma_read() with
> > > > this address I get:
> > > > > "
> > > > > Unassigned mem read  "
> > > >
> > > > I don't know where this error log was dumped but if it's during DMA
> > > > then I agree it can probably be related to vIOMMU.
> > > >
> > >
> > > This log is dumped from:
> > > memory.c: unassigned_mem_read()
> > >
> > > > > As I said, my problem now is in translation of L2 GPA provided by
> > > > > driver,
> > > > when I call DMA read/write for this address from VF.
> > > > > Any insights?
> > > >
> > > > I just noticed that you were using QEMU 2.12 [1].  If that's the
> > > > case, please rebase to the latest QEMU, at least >=3.0 because
> > > > there's major refactor of the shadow logic during 3.0 devel cycle 
> > > > AFAICT.
> > > >
> > >
> > > Rebased to QEMU 3.1
> > > Now I see the address I'm trying to read from in log but still same error:
> > > "
> > > Unassigned mem read f0481000
> > > "
> > > What do you suggest?
> > 
> > Would you please answer the questions that Knut asked?  Is it working for L1
> > guest?  How about PF?
> 
> Both VF and PF are working for L1 guest.
> I don't know how to passthrough PF to nested VM in hyper-v.

On Linux, passing through VFs and PFs is the same.
Maybe you can try passthrough with all Linux first? (first PF then VF) ?

> I don't invoke VF manually in hyper-v and pass it through to nested VM. I use 
> hyper-v
> manager to configure and provide a VF for nested VM (I can see the VF only in 
> the nested
> VM).
> 
> Did someone try to run emulated device in linux RH as nested L2 where L1 is 
> windows
> hyper-v? Does DMA read/write work for this emulated device in this case?

I have never tried that, I have only used Linux as L2, Windows might be pickier 
about what
it expects, so starting with Linux to rule that out is probably a good idea.

> > 
> > You can also try to enable VT-d device log by appending:
> > 
> >   -trace enable="vtd_*"
> > 
> > In case it dumps anything useful for you.
> 
> Is there a way to open those traces to be dumped to stdout/stderr on the fly, 
> instead of
> dtrace?

It's up to you what tracer(s) to configure when you build QEMU - check out 
docs/devel/tracing.txt . There's a few trace events defined in the SR/IOV patch 
set, you
might want to enable them as well.
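For reference, a minimal sketch of the setup Knut describes; the flags and paths are assumptions from a QEMU 3.1-era tree, so verify them against docs/devel/tracing.txt:

```shell
# Configure QEMU with the simple "log" trace backend: enabled events are
# then printed to stderr at run time, no dtrace/systemtap needed.
./configure --target-list=x86_64-softmmu --enable-trace-backends=log
make -j"$(nproc)"

# Enable the VT-d events (add further patterns for whatever event names
# the SR/IOV patch set defines in its trace-events files):
./x86_64-softmmu/qemu-system-x86_64 -M q35 -device intel-iommu \
    -trace enable='vtd_*' \
    ... # rest of the original command line
```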

Knut

> > --
> > Peter Xu




Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-01 Thread Elijah Shakkour


> -Original Message-
> From: Peter Xu 
> Sent: Monday, April 1, 2019 1:25 PM
> To: Elijah Shakkour 
> Cc: Knut Omang ; Michael S. Tsirkin
> ; Alex Williamson ;
> Marcel Apfelbaum ; Stefan Hajnoczi
> ; qemu-devel@nongnu.org
> Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to
> nested (L2) VM
> 
> On Mon, Apr 01, 2019 at 09:12:38AM +, Elijah Shakkour wrote:
> >
> >
> > > -Original Message-
> > > From: Peter Xu 
> > > Sent: Monday, April 1, 2019 5:47 AM
> > > To: Elijah Shakkour 
> > > Cc: Knut Omang ; Michael S. Tsirkin
> > > ; Alex Williamson ;
> > > Marcel Apfelbaum ; Stefan Hajnoczi
> > > ; qemu-devel@nongnu.org
> > > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough
> to
> > > nested (L2) VM
> > >
> > > On Sun, Mar 31, 2019 at 11:15:00AM +, Elijah Shakkour wrote:
> > >
> > > [...]
> > >
> > > > I didn't have DMA nor MMIO read/write working with my old command
> > > line.
> > > > But, when I removed all CPU flags and only provided "-cpu host", I
> > > > see that
> > > MMIO works.
> > > > Still, DMA read/write from emulated device doesn't work for VF.
> > > > For
> > > example:
> > > > Driver provides me a buffer pointer through MMIO write, this
> > > > address
> > > (pointer) is GPA of L2, and when I try to call pci_dma_read() with
> > > this address I get:
> > > > "
> > > > Unassigned mem read  "
> > >
> > > I don't know where this error log was dumped but if it's during DMA
> > > then I agree it can probably be related to vIOMMU.
> > >
> >
> > This log is dumped from:
> > memory.c: unassigned_mem_read()
> >
> > > > As I said, my problem now is in translation of L2 GPA provided by
> > > > driver,
> > > when I call DMA read/write for this address from VF.
> > > > Any insights?
> > >
> > > I just noticed that you were using QEMU 2.12 [1].  If that's the
> > > case, please rebase to the latest QEMU, at least >=3.0 because
> > > there's major refactor of the shadow logic during 3.0 devel cycle AFAICT.
> > >
> >
> > Rebased to QEMU 3.1
> > Now I see the address I'm trying to read from in log but still same error:
> > "
> > Unassigned mem read f0481000
> > "
> > What do you suggest?
> 
> Would you please answer the questions that Knut asked?  Is it working for L1
> guest?  How about PF?

Both VF and PF are working for L1 guest.
I don't know how to passthrough PF to nested VM in hyper-v.
I don't invoke VF manually in hyper-v and pass it through to nested VM. I use 
hyper-v manager to configure and provide a VF for nested VM (I can see the VF 
only in the nested VM).

Did someone try to run emulated device in linux RH as nested L2 where L1 is 
windows hyper-v? Does DMA read/write work for this emulated device in this case?

> 
> You can also try to enable VT-d device log by appending:
> 
>   -trace enable="vtd_*"
> 
> In case it dumps anything useful for you.

Is there a way to open those traces to be dumped to stdout/stderr on the fly, 
instead of dtrace?

> 
> --
> Peter Xu


Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-01 Thread Peter Xu
On Mon, Apr 01, 2019 at 09:12:38AM +, Elijah Shakkour wrote:
> 
> 
> > -Original Message-
> > From: Peter Xu 
> > Sent: Monday, April 1, 2019 5:47 AM
> > To: Elijah Shakkour 
> > Cc: Knut Omang ; Michael S. Tsirkin
> > ; Alex Williamson ;
> > Marcel Apfelbaum ; Stefan Hajnoczi
> > ; qemu-devel@nongnu.org
> > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to
> > nested (L2) VM
> > 
> > On Sun, Mar 31, 2019 at 11:15:00AM +, Elijah Shakkour wrote:
> > 
> > [...]
> > 
> > > I didn't have DMA nor MMIO read/write working with my old command
> > line.
> > > But, when I removed all CPU flags and only provided "-cpu host", I see 
> > > that
> > MMIO works.
> > > Still, DMA read/write from emulated device doesn't work for VF. For
> > example:
> > > Driver provides me a buffer pointer through MMIO write, this address
> > (pointer) is GPA of L2, and when I try to call pci_dma_read() with this 
> > address
> > I get:
> > > "
> > > Unassigned mem read 
> > > "
> > 
> > I don't know where this error log was dumped but if it's during DMA then I
> > agree it can probably be related to vIOMMU.
> > 
> 
> This log is dumped from:
> memory.c: unassigned_mem_read()
> 
> > > As I said, my problem now is in translation of L2 GPA provided by driver,
> > when I call DMA read/write for this address from VF.
> > > Any insights?
> > 
> > I just noticed that you were using QEMU 2.12 [1].  If that's the case, 
> > please
> > rebase to the latest QEMU, at least >=3.0 because there's major refactor of
> > the shadow logic during 3.0 devel cycle AFAICT.
> > 
> 
> Rebased to QEMU 3.1
> Now I see the address I'm trying to read from in log but still same error:
> "
> Unassigned mem read f0481000
> "
> What do you suggest?

Would you please answer the questions that Knut asked?  Is it working
for L1 guest?  How about PF?

You can also try to enable VT-d device log by appending:

  -trace enable="vtd_*"

In case it dumps anything useful for you.

-- 
Peter Xu



Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-04-01 Thread Elijah Shakkour


> -Original Message-
> From: Peter Xu 
> Sent: Monday, April 1, 2019 5:47 AM
> To: Elijah Shakkour 
> Cc: Knut Omang ; Michael S. Tsirkin
> ; Alex Williamson ;
> Marcel Apfelbaum ; Stefan Hajnoczi
> ; qemu-devel@nongnu.org
> Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to
> nested (L2) VM
> 
> On Sun, Mar 31, 2019 at 11:15:00AM +, Elijah Shakkour wrote:
> 
> [...]
> 
> > I didn't have DMA nor MMIO read/write working with my old command
> line.
> > But, when I removed all CPU flags and only provided "-cpu host", I see that
> MMIO works.
> > Still, DMA read/write from emulated device doesn't work for VF. For
> example:
> > Driver provides me a buffer pointer through MMIO write, this address
> (pointer) is GPA of L2, and when I try to call pci_dma_read() with this 
> address
> I get:
> > "
> > Unassigned mem read 
> > "
> 
> I don't know where this error log was dumped but if it's during DMA then I
> agree it can probably be related to vIOMMU.
> 

This log is dumped from:
memory.c: unassigned_mem_read()

> > As I said, my problem now is in translation of L2 GPA provided by driver,
> when I call DMA read/write for this address from VF.
> > Any insights?
> 
> I just noticed that you were using QEMU 2.12 [1].  If that's the case, please
> rebase to the latest QEMU, at least >=3.0 because there's major refactor of
> the shadow logic during 3.0 devel cycle AFAICT.
> 

Rebased to QEMU 3.1
Now I see the address I'm trying to read from in log but still same error:
"
Unassigned mem read f0481000
"
What do you suggest?

> > > > > > > > I'm using Knut Omang SRIOV patches rebased to QEMU v2.12.
> 
> [1]
> 
> Regards,
> 
> --
> Peter Xu


Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-03-31 Thread Peter Xu
On Sun, Mar 31, 2019 at 11:15:00AM +, Elijah Shakkour wrote:

[...]

> I didn't have DMA nor MMIO read/write working with my old command line.
> But, when I removed all CPU flags and only provided "-cpu host", I see that 
> MMIO works.
> Still, DMA read/write from emulated device doesn't work for VF. For example:
> Driver provides me a buffer pointer through MMIO write, this address 
> (pointer) is GPA of L2, and when I try to call pci_dma_read() with this 
> address I get:
> "
> Unassigned mem read 
> "

I don't know where this error log was dumped but if it's during DMA
then I agree it can probably be related to vIOMMU.

> As I said, my problem now is in translation of L2 GPA provided by driver, 
> when I call DMA read/write for this address from VF.
> Any insights?

I just noticed that you were using QEMU 2.12 [1].  If that's the case,
please rebase to the latest QEMU, at least >=3.0 because there's major
refactor of the shadow logic during 3.0 devel cycle AFAICT.

> > > > > > > I'm using Knut Omang SRIOV patches rebased to QEMU v2.12.

[1]

Regards,

-- 
Peter Xu



Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-03-31 Thread Elijah Shakkour


> -Original Message-
> From: Peter Xu 
> Sent: Wednesday, March 27, 2019 10:43 AM
> To: Knut Omang 
> Cc: Elijah Shakkour ; Michael S. Tsirkin
> ; Alex Williamson ;
> Marcel Apfelbaum ; Stefan Hajnoczi
> ; qemu-devel@nongnu.org
> Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to
> nested (L2) VM
> 
> On Wed, Mar 27, 2019 at 08:57:56AM +0100, Knut Omang wrote:
> > On Wed, 2019-03-27 at 14:41 +0800, Peter Xu wrote:
> > > On Tue, Mar 26, 2019 at 01:23:12PM +, Elijah Shakkour wrote:
> > > > Adding QEMU-devel
> > >
> > > Hi, Elijah,
> > >
> > > >
> > > > -Original Message-
> > > > From: Michael S. Tsirkin 
> > > > Sent: Tuesday, March 26, 2019 2:53 PM
> > > > To: Elijah Shakkour 
> > > > Cc: Knut Omang ; Alex Williamson
> > > > ;
> > > Marcel Apfelbaum ; Stefan Hajnoczi
> > > ; pet...@redhat.com
> > > > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough
> > > > to nested (L2) VM
> > > >
> > > > I think you forgot to copy the qemu mailing list.
> > > >
> > > > On Tue, Mar 26, 2019 at 10:08:17AM +, Elijah Shakkour wrote:
> > > > > My questions are:
> > > > >
> > > > > - Suppose that there is an emulated NIC that supports SRIOV (I
> > > > > implemented such a
> > > NIC), now does QEMU support a scenario of an emulated NIC that
> > > supports SRIOV in Hyper-V
> > > L1 guest, that invokes VF and pass it to nested linux L2 guest?
> > >
> > > I am not an expert of SR-IOV but I can't see a limitation to not
> > > allow that to happen.
> > >
> > > > > - I'm using vIOMMU in L1, so what is needed to be done in QEMU
> > > > > or maybe in emulated
> > > NIC PF/VF to allow DMA remapping and INT remapping work as
> expected?
> > >
> > > Your below command line should work, and even it seems to be an
> > > overkill.

I didn't have DMA nor MMIO read/write working with my old command line.
But, when I removed all CPU flags and only provided "-cpu host", I see that 
MMIO works.
Still, DMA read/write from emulated device doesn't work for VF. For example:
Driver provides me a buffer pointer through MMIO write, this address (pointer) 
is GPA of L2, and when I try to call pci_dma_read() with this address I get:
"
Unassigned mem read 
"
I expected this address to be translated correctly in the vIOMMU.
Is there something that I'm missing here?
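A toy model of the lookup that a DMA from the device ultimately depends on may help here: the address handed to pci_dma_read() is an L2 GPA, so it must first be translated by the vIOMMU into an L1 GPA before QEMU can resolve it. Purely illustrative Python with invented addresses and a fake page-granular map, not QEMU's actual code:

```python
PAGE = 4096

# Toy vIOMMU mapping programmed by the L1 hypervisor: L2-GPA page -> L1-GPA page
viommu_map = {0xF0481000 // PAGE: 0x7C300000 // PAGE}

def dma_translate(l2_gpa):
    """Translate an L2 GPA the way the vIOMMU would; None == unassigned."""
    page = viommu_map.get(l2_gpa // PAGE)
    if page is None:
        # QEMU would end up in unassigned_mem_read() -> "Unassigned mem read"
        return None
    return page * PAGE + l2_gpa % PAGE

print(hex(dma_translate(0xF0481000)))  # mapped: 0x7c300000
print(dma_translate(0xDEAD0000))       # unmapped: None
```

If the vIOMMU holds no valid mapping for the VF, every DMA falls into the None branch, which matches the "Unassigned mem read" symptom.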

> > >
> > > If your device is completely emulated, IIUC you only simply need
> > > this on the latest QEMU:
> > >
> > >   -M q35 -device intel-iommu
> > >
> > > Split-irqchip and IR is on by default now, so you'll naturally gain
> > > x2apic if it's supported.  You can use x-aw-bits but only if you
> > > really need address space beyond 39 bits (which I suspect).  The
> > > rest parameters are optional too.

What do you mean by "completely emulated"?
I think that, for the sake of our discussion, all I need is the configuration 
space and MMIO callbacks, plus the MSI-X and PCIe capabilities in both PF and 
VF, and the SR-IOV capability in the PF.

> > >
> > > > > - Does the command line below -that I use to run QEMU- seem ok
> > > > > for the scenario I
> > > described to work?
> > >
> > > Before I look into details of the cmdline - I'd say MMIO in L2
> > > should have nothing to do with IOMMU...
> >
> > The addresses used in L2 is the GPAs of the L2, which would typically
> > be different from the L2 HPAs == L1 GPAs, so I think the IOMMU mappings
> must work.
> >
> > You would need something like 'intel_iommu=on iommu=pt' as boot
> parameters for L1.
> 
> Yes, the IOMMU must work to do the assignment.  What I meant was that
> IOMMU should not be in the code path of MMIO accesses even for L2.
> IIUC that's the processor who reads/writes to the memory region and if it's a
> MMIO issue then it probably has little to do with IOMMU.
> 
> Thanks,

As I said, MMIO works, but when I try to do DMA from VF device to GPA in L2 it 
fails.
I expected this DMA read to work fine without any changes to be done.

> 
> >
> > > Are you sure the MMIO traps are
> > > setup correctly?  Can the VF do IO properly even without L2?

The answer is Yes for both questions.

> >
> > I agree with Peter that just running the VF as another function in L1
> > would be good to test before trying to get L2 passthrough to work.
> >
> > I recommend you also verify that passing the PF through works as
> > expected, unless you already have done so.
> >
> > And do you see correct BAR address values in the lspci -vvv output in
> > the L2 instance?
> >
> > The SR/IOV logic is from QEMU's perspective just another device
> > instance apart from the differences in the BAR setup code, so if
> > passing through a non-virtual device works, and VF BAR addresses
> > appear right, I believe VFs should work as well.
> >
> > > Also I don't know whether there can be some tricks when you boot L2
> > > with vfio-pci when the device to assign is a VF.
> >
> > A lot has happened since I was actively using the SR/IOV patch set
> > myself so that might entirely be possible from my perspective.
> >
> > Thanks,
> 

Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-03-27 Thread Peter Xu
On Wed, Mar 27, 2019 at 08:57:56AM +0100, Knut Omang wrote:
> On Wed, 2019-03-27 at 14:41 +0800, Peter Xu wrote:
> > On Tue, Mar 26, 2019 at 01:23:12PM +, Elijah Shakkour wrote:
> > > Adding QEMU-devel
> > 
> > Hi, Elijah,
> > 
> > > 
> > > -Original Message-
> > > From: Michael S. Tsirkin  
> > > Sent: Tuesday, March 26, 2019 2:53 PM
> > > To: Elijah Shakkour 
> > > Cc: Knut Omang ; Alex Williamson 
> > > ;
> > Marcel Apfelbaum ; Stefan Hajnoczi 
> > ; 
> > pet...@redhat.com
> > > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to 
> > > nested (L2) VM
> > > 
> > > I think you forgot to copy the qemu mailing list.
> > > 
> > > On Tue, Mar 26, 2019 at 10:08:17AM +, Elijah Shakkour wrote:
> > > > My questions are:
> > > > 
> > > > - Suppose that there is an emulated NIC that supports SRIOV (I 
> > > > implemented such a
> > NIC), now does QEMU support a scenario of an emulated NIC that supports 
> > SRIOV in Hyper-V 
> > L1 guest, that invokes VF and pass it to nested linux L2 guest?
> > 
> > I am not an expert of SR-IOV but I can't see a limitation to not allow
> > that to happen.
> > 
> > > > - I'm using vIOMMU in L1, so what is needed to be done in QEMU or maybe 
> > > > in emulated
> > NIC PF/VF to allow DMA remapping and INT remapping work as expected?
> > 
> > Your below command line should work, and even it seems to be an
> > overkill.
> > 
> > If your device is completely emulated, IIUC you only simply need this
> > on the latest QEMU:
> > 
> >   -M q35 -device intel-iommu
> > 
> > Split-irqchip and IR is on by default now, so you'll naturally gain
> > x2apic if it's supported.  You can use x-aw-bits but only if you
> > really need address space beyond 39 bits (which I suspect).  The rest
> > parameters are optional too.
> > 
> > > > - Does the command line below -that I use to run QEMU- seem ok for the 
> > > > scenario I
> > described to work?
> > 
> > Before I look into details of the cmdline - I'd say MMIO in L2 should
> > have nothing to do with IOMMU...  
> 
> The addresses used in L2 are the GPAs of the L2, which would typically be 
> different from
> the L2 HPAs == L1 GPAs, so I think the IOMMU mappings must work.
> 
> You would need something like 'intel_iommu=on iommu=pt' as boot parameters 
> for L1.

Yes, the IOMMU must work to do the assignment.  What I meant was that
IOMMU should not be in the code path of MMIO accesses even for L2.
IIUC that's the processor that reads/writes to the memory region and if
it's a MMIO issue then it probably has little to do with IOMMU.

Thanks,
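Putting the two suggestions together, the minimal setup under discussion looks roughly like this; everything beyond the IOMMU-related flags is a placeholder for the original command line:

```shell
# L0 host: minimal vIOMMU configuration on a recent QEMU
qemu-system-x86_64 -M q35 -cpu host -device intel-iommu \
    ... # emulated SR/IOV NIC, disks, memory, etc.

# L1 guest (if Linux) kernel command line addition, per Knut's suggestion:
#   intel_iommu=on iommu=pt
```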

> 
> > Are you sure the MMIO traps are
> > setup correctly?  Can the VF do IO properly even without L2?
> 
> I agree with Peter that just running the VF as another function in L1 
> would be good to test before trying to get L2 passthrough to work.
> 
> I recommend you also verify that passing the PF through works as expected, 
> unless you already have done so.
> 
> And do you see correct BAR address values in the lspci -vvv output in the L2 
> instance?
> 
> The SR/IOV logic is from QEMU's perspective just another device instance 
> apart from the differences in the BAR setup code, so if passing through a 
> non-virtual device works, and VF BAR addresses appear right, 
> I believe VFs should work as well.
> 
> > Also I don't know whether there can be some tricks when you boot L2
> > with vfio-pci when the device to assign is a VF.
> 
> A lot has happened since I was actively using the SR/IOV patch set
> myself, so that may well be possible.
> 
> Thanks,
> Knut 
> 
> > > > 
> > > > -Original Message-
> > > > From: Michael S. Tsirkin 
> > > > Sent: Monday, March 25, 2019 4:14 AM
> > > > To: Elijah Shakkour 
> > > > Cc: Knut Omang ; Alex Williamson 
> > > > ; Marcel Apfelbaum 
> > > > ; Stefan Hajnoczi 
> > > > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to 
> > > > nested (L2) VM
> > > > 
> > > > Pls post all questions on list.
> > > > I have a policy against answering off-list mail.
> > > > Cc Peter Xu might be a good idea, too.
> > > > 
> > > > On Sun, Mar 24, 2019 at 09:56:26PM +, Elijah Shakkour wrote:
> > > > > Hey,
> > > > > 
> > > > > I'm emulating Mellanox ConnectX-4 in QEMU and right now, I'm adding
> > > > > SRIOV capability.
> > > > > I'm using Knut Omang's SRIOV patches rebased to QEMU v2.12.
> > > > > My server (L0) is Linux. The L1 guest is Windows 2016 Hyper-V and
> > > > > the L2 guest is Linux RH7.2.
> > > > > I can see my device in the L1 VM, and I see the invocation of the
> > > > > VF via the SRIOV capability.
> > > > > Inside the L2 guest I see the virtual function in the 'lspci' output.
> > > > > But when the L2 guest's driver issues MMIO reads/writes, my MMIO
> > > > > ops don't get called.
> > > > > I implemented my VF basically like Omang's SRIOV example patch.
> > > > > 
> > > > > Could you please shed some light on what you think I might be missing?
> > > > > 
> > > > > Here is the command line I run:
> > > > > 
> > >

Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-03-27 Thread Knut Omang
On Wed, 2019-03-27 at 14:41 +0800, Peter Xu wrote:
> On Tue, Mar 26, 2019 at 01:23:12PM +, Elijah Shakkour wrote:
> > Adding QEMU-devel
> 
> Hi, Elijah,
> 
> > 
> > -Original Message-
> > From: Michael S. Tsirkin  
> > Sent: Tuesday, March 26, 2019 2:53 PM
> > To: Elijah Shakkour 
> > Cc: Knut Omang ; Alex Williamson 
> > ;
> Marcel Apfelbaum ; Stefan Hajnoczi 
> ; 
> pet...@redhat.com
> > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to nested 
> > (L2) VM
> > 
> > I think you forgot to copy the qemu mailing list.
> > 
> > On Tue, Mar 26, 2019 at 10:08:17AM +, Elijah Shakkour wrote:
> > > My questions are:
> > > 
> > > - Suppose there is an emulated NIC that supports SRIOV (I implemented
> > > such a NIC): does QEMU support a scenario where the emulated NIC
> > > exposes SRIOV in a Hyper-V L1 guest, which invokes a VF and passes it
> > > to a nested Linux L2 guest?
> 
> I am not an expert on SR-IOV, but I can't see any limitation that would
> prevent it.
> 
> > > - I'm using a vIOMMU in L1, so what needs to be done in QEMU, or
> > > maybe in the emulated NIC PF/VF, to make DMA remapping and interrupt
> > > remapping work as expected?
> 
> Your command line below should work; if anything, it seems to be
> overkill.
> 
> If your device is completely emulated, IIUC you simply need this on the
> latest QEMU:
> 
>   -M q35 -device intel-iommu
> 
> Split-irqchip and IR are on by default now, so you'll naturally gain
> x2apic if it's supported.  You can use x-aw-bits, but only if you
> really need address space beyond 39 bits (which I suspect you don't).
> The rest of the parameters are optional too.
> 
> > > - Does the command line below (the one I use to run QEMU) look OK
> > > for the scenario I described?
> 
> Before I look into details of the cmdline - I'd say MMIO in L2 should
> have nothing to do with IOMMU...  

The addresses used in L2 are the GPAs of L2, which would typically be
different from the L2 HPAs (== L1 GPAs), so I think the IOMMU mappings
must work.

You would need something like 'intel_iommu=on iommu=pt' as boot parameters for 
L1.
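In practice (a sketch under assumed, distribution-specific paths, not from the thread), wiring those parameters into the L1 guest's boot configuration looks something like:

```shell
# Sketch only: the vIOMMU options Knut mentions, as they would appear
# on the L1 guest's kernel command line. Paths in the comments are
# typical but distribution-specific assumptions.
IOMMU_ARGS="intel_iommu=on iommu=pt"
# Append $IOMMU_ARGS to GRUB_CMDLINE_LINUX in /etc/default/grub, then
# regenerate the grub config (e.g. grub2-mkconfig) and reboot L1.
echo "L1 kernel args: ${IOMMU_ARGS}"
```

With intel_iommu=on the L1 kernel drives the emulated VT-d unit, and iommu=pt keeps L1's own device DMA in passthrough mode while still allowing per-device mappings for the L2 assignment.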

> Are you sure the MMIO traps are
> setup correctly?  Can the VF do IO properly even without L2?

I agree with Peter that just running the VF as another function in L1 
would be good to test before trying to get L2 passthrough to work.

I recommend you also verify that passing the PF through works as expected, 
unless you already have done so.
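For reference, the usual sequence for detaching a VF from its L1 driver and binding it to vfio-pci is sketched below; the BDF is a placeholder, and the block only prints the steps instead of performing the privileged sysfs writes:

```shell
# Dry-run sketch: sysfs steps that hand a VF to vfio-pci in L1.
# BDF 0000:01:00.1 is a placeholder -- substitute the real VF address.
BDF="0000:01:00.1"
steps="echo $BDF > /sys/bus/pci/devices/$BDF/driver/unbind
echo vfio-pci > /sys/bus/pci/devices/$BDF/driver_override
echo $BDF > /sys/bus/pci/drivers_probe"
# Print rather than execute; run the lines as root on a real system.
printf '%s\n' "$steps"
```

If any of these steps fails in L1, the problem is already visible before L2 enters the picture, which is the point of testing the VF in L1 first.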

And do you see correct BAR address values in the lspci -vvv output in the L2 
instance?

From QEMU's perspective, the SR/IOV logic is just another device
instance, apart from the differences in the BAR setup code, so if
passing through a non-virtual device works and the VF BAR addresses
appear right, I believe VFs should work as well.
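One way to script the BAR sanity check suggested above; the sample 'lspci -vvv' line below is illustrative, not taken from the reporter's setup:

```shell
# Inside L2, 'lspci -vvv -s <BDF>' prints one "Region N: Memory at ..."
# line per memory BAR. A hex address means L1 programmed the BAR;
# "<unassigned>" or "<ignored>" means it never got set up, and MMIO
# accesses would then never reach the emulated device's ops.
sample='Region 0: Memory at fe000000 (64-bit, prefetchable) [size=1M]'
if printf '%s\n' "$sample" | grep -q 'Memory at [0-9a-f]'; then
  result="BAR assigned"
else
  result="BAR unassigned"
fi
echo "$result"
```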

> Also I don't know whether there can be some tricks when you boot L2
> with vfio-pci when the device to assign is a VF.

A lot has happened since I was actively using the SR/IOV patch set
myself, so that may well be possible.

Thanks,
Knut 

> > > 
> > > -Original Message-
> > > From: Michael S. Tsirkin 
> > > Sent: Monday, March 25, 2019 4:14 AM
> > > To: Elijah Shakkour 
> > > Cc: Knut Omang ; Alex Williamson 
> > > ; Marcel Apfelbaum 
> > > ; Stefan Hajnoczi 
> > > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to 
> > > nested (L2) VM
> > > 
> > > Pls post all questions on list.
> > > I have a policy against answering off-list mail.
> > > Cc Peter Xu might be a good idea, too.
> > > 
> > > On Sun, Mar 24, 2019 at 09:56:26PM +, Elijah Shakkour wrote:
> > > > Hey,
> > > > 
> > > > I'm emulating Mellanox ConnectX-4 in QEMU and right now, I'm adding
> > > > SRIOV capability.
> > > > I'm using Knut Omang's SRIOV patches rebased to QEMU v2.12.
> > > > My server (L0) is Linux. The L1 guest is Windows 2016 Hyper-V and
> > > > the L2 guest is Linux RH7.2.
> > > > I can see my device in the L1 VM, and I see the invocation of the
> > > > VF via the SRIOV capability.
> > > > Inside the L2 guest I see the virtual function in the 'lspci' output.
> > > > But when the L2 guest's driver issues MMIO reads/writes, my MMIO ops
> > > > don't get called.
> > > > I implemented my VF basically like Omang's SRIOV example patch.
> > > > 
> > > > Could you please shed some light on what you think I might be missing?
> > > > 
> > > > Here is the command line I run:
> > > > 
> > > > ./x86_64-softmmu/qemu-system-x86_64 \
> > > >   -machine q35,accel=kvm,usb=off,dump-guest-core=off,kernel-irqchip=split \
> > > >   -m 32G \
> > > >   -smp 2 \
> > > >   -enable-kvm \
> > > >   -cpu host,vmx=on,ss=on,cx16=on,x2apic=on,hypervisor=on,lahf_lm=on,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,kvm=on \
> > > >   -vnc 127.0.0.1:0,to=99,id=default \
> > > >   -drive file=$IMAGE,format=qcow2,if=none,id=drive-sata0-0-0 \
> > > >   -chardev pty,id=charserial0 \
> > > >   -device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on,eim=on,x-aw-bits=48 \
> > > >   -device
> > > > 

Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-03-26 Thread Peter Xu
On Tue, Mar 26, 2019 at 01:23:12PM +, Elijah Shakkour wrote:
> Adding QEMU-devel

Hi, Elijah,

> 
> -Original Message-
> From: Michael S. Tsirkin  
> Sent: Tuesday, March 26, 2019 2:53 PM
> To: Elijah Shakkour 
> Cc: Knut Omang ; Alex Williamson 
> ; Marcel Apfelbaum ; 
> Stefan Hajnoczi ; pet...@redhat.com
> Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to nested 
> (L2) VM
> 
> I think you forgot to copy the qemu mailing list.
> 
> On Tue, Mar 26, 2019 at 10:08:17AM +, Elijah Shakkour wrote:
> > My questions are:
> > 
> > - Suppose there is an emulated NIC that supports SRIOV (I implemented
> > such a NIC): does QEMU support a scenario where the emulated NIC
> > exposes SRIOV in a Hyper-V L1 guest, which invokes a VF and passes it
> > to a nested Linux L2 guest?

I am not an expert on SR-IOV, but I can't see any limitation that would
prevent it.

> > - I'm using a vIOMMU in L1, so what needs to be done in QEMU, or maybe
> > in the emulated NIC PF/VF, to make DMA remapping and interrupt
> > remapping work as expected?

Your command line below should work; if anything, it seems to be
overkill.

If your device is completely emulated, IIUC you simply need this on the
latest QEMU:

  -M q35 -device intel-iommu

Split-irqchip and IR are on by default now, so you'll naturally gain
x2apic if it's supported.  You can use x-aw-bits, but only if you
really need address space beyond 39 bits (which I suspect you don't).
The rest of the parameters are optional too.
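Put together as a complete (hypothetical) invocation, with image path and sizes as placeholders rather than values from the thread:

```shell
# Minimal sketch of the invocation described above: q35 machine plus an
# emulated intel-iommu, everything else at defaults. The qcow2 path is
# a placeholder.
QEMU_CMD="qemu-system-x86_64 \
  -M q35,accel=kvm \
  -m 8G -smp 2 \
  -device intel-iommu \
  -drive file=l1.qcow2,if=virtio"
echo "$QEMU_CMD"
```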

> > - Does the command line below (the one I use to run QEMU) look OK for
> > the scenario I described?

Before I look into details of the cmdline - I'd say MMIO in L2 should
have nothing to do with IOMMU...  Are you sure the MMIO traps are
setup correctly?  Can the VF do IO properly even without L2?

Also I don't know whether there can be some tricks when you boot L2
with vfio-pci when the device to assign is a VF.

> > 
> > -Original Message-
> > From: Michael S. Tsirkin 
> > Sent: Monday, March 25, 2019 4:14 AM
> > To: Elijah Shakkour 
> > Cc: Knut Omang ; Alex Williamson 
> > ; Marcel Apfelbaum 
> > ; Stefan Hajnoczi 
> > Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to 
> > nested (L2) VM
> > 
> > Pls post all questions on list.
> > I have a policy against answering off-list mail.
> > Cc Peter Xu might be a good idea, too.
> > 
> > On Sun, Mar 24, 2019 at 09:56:26PM +, Elijah Shakkour wrote:
> > > Hey,
> > > 
> > > I'm emulating Mellanox ConnectX-4 in QEMU and right now, I'm adding
> > > SRIOV capability.
> > > I'm using Knut Omang's SRIOV patches rebased to QEMU v2.12.
> > > My server (L0) is Linux. The L1 guest is Windows 2016 Hyper-V and the
> > > L2 guest is Linux RH7.2.
> > > I can see my device in the L1 VM, and I see the invocation of the VF
> > > via the SRIOV capability.
> > > Inside the L2 guest I see the virtual function in the 'lspci' output.
> > > But when the L2 guest's driver issues MMIO reads/writes, my MMIO ops
> > > don't get called.
> > > I implemented my VF basically like Omang's SRIOV example patch.
> > > 
> > > Could you please shed some light on what you think I might be missing?
> > > 
> > > Here is the command line I run:
> > > 
> > > ./x86_64-softmmu/qemu-system-x86_64 \
> > >   -machine q35,accel=kvm,usb=off,dump-guest-core=off,kernel-irqchip=split \
> > >   -m 32G \
> > >   -smp 2 \
> > >   -enable-kvm \
> > >   -cpu host,vmx=on,ss=on,cx16=on,x2apic=on,hypervisor=on,lahf_lm=on,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,kvm=on \
> > >   -vnc 127.0.0.1:0,to=99,id=default \
> > >   -drive file=$IMAGE,format=qcow2,if=none,id=drive-sata0-0-0 \
> > >   -chardev pty,id=charserial0 \
> > >   -device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on,eim=on,x-aw-bits=48 \
> > >   -device ide-hd,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=0 \
> > >   -device pcie-root-port,pref64-reserve=500M,slot=0,id=pcie_port.1,bus=pcie.0,multifunction=on \
> > >   -netdev tap,id=tap5,ifname=tap5,script=no,downscript=no \
> > >   -device connectx4,netdev=tap5,bus=pcie_port.1,multifunction=on

Regards,

-- 
Peter Xu



Re: [Qemu-devel] QEMU and vIOMMU support for emulated VF passthrough to nested (L2) VM

2019-03-26 Thread Elijah Shakkour
Adding QEMU-devel

-Original Message-
From: Michael S. Tsirkin  
Sent: Tuesday, March 26, 2019 2:53 PM
To: Elijah Shakkour 
Cc: Knut Omang ; Alex Williamson 
; Marcel Apfelbaum ; 
Stefan Hajnoczi ; pet...@redhat.com
Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to nested (L2) 
VM

I think you forgot to copy the qemu mailing list.

On Tue, Mar 26, 2019 at 10:08:17AM +, Elijah Shakkour wrote:
> My questions are:
> 
> - Suppose there is an emulated NIC that supports SRIOV (I implemented
> such a NIC): does QEMU support a scenario where the emulated NIC exposes
> SRIOV in a Hyper-V L1 guest, which invokes a VF and passes it to a
> nested Linux L2 guest?
> - I'm using a vIOMMU in L1, so what needs to be done in QEMU, or maybe
> in the emulated NIC PF/VF, to make DMA remapping and interrupt remapping
> work as expected?
> - Does the command line below (the one I use to run QEMU) look OK for
> the scenario I described?
> 
> -Original Message-
> From: Michael S. Tsirkin 
> Sent: Monday, March 25, 2019 4:14 AM
> To: Elijah Shakkour 
> Cc: Knut Omang ; Alex Williamson 
> ; Marcel Apfelbaum 
> ; Stefan Hajnoczi 
> Subject: Re: QEMU and vIOMMU support for emulated VF passthrough to 
> nested (L2) VM
> 
> Pls post all questions on list.
> I have a policy against answering off-list mail.
> Cc Peter Xu might be a good idea, too.
> 
> On Sun, Mar 24, 2019 at 09:56:26PM +, Elijah Shakkour wrote:
> > Hey,
> > 
> > I'm emulating Mellanox ConnectX-4 in QEMU and right now, I'm adding
> > SRIOV capability.
> > I'm using Knut Omang's SRIOV patches rebased to QEMU v2.12.
> > My server (L0) is Linux. The L1 guest is Windows 2016 Hyper-V and the
> > L2 guest is Linux RH7.2.
> > I can see my device in the L1 VM, and I see the invocation of the VF
> > via the SRIOV capability.
> > Inside the L2 guest I see the virtual function in the 'lspci' output.
> > But when the L2 guest's driver issues MMIO reads/writes, my MMIO ops
> > don't get called.
> > I implemented my VF basically like Omang's SRIOV example patch.
> > 
> > Could you please shed some light on what you think I might be missing?
> > 
> > Here is the command line I run:
> > 
> > ./x86_64-softmmu/qemu-system-x86_64 \
> >   -machine q35,accel=kvm,usb=off,dump-guest-core=off,kernel-irqchip=split \
> >   -m 32G \
> >   -smp 2 \
> >   -enable-kvm \
> >   -cpu host,vmx=on,ss=on,cx16=on,x2apic=on,hypervisor=on,lahf_lm=on,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,kvm=on \
> >   -vnc 127.0.0.1:0,to=99,id=default \
> >   -drive file=$IMAGE,format=qcow2,if=none,id=drive-sata0-0-0 \
> >   -chardev pty,id=charserial0 \
> >   -device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on,eim=on,x-aw-bits=48 \
> >   -device ide-hd,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=0 \
> >   -device pcie-root-port,pref64-reserve=500M,slot=0,id=pcie_port.1,bus=pcie.0,multifunction=on \
> >   -netdev tap,id=tap5,ifname=tap5,script=no,downscript=no \
> >   -device connectx4,netdev=tap5,bus=pcie_port.1,multifunction=on