Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support

2018-02-07 Thread Michael S. Tsirkin
On Wed, Feb 07, 2018 at 10:02:24AM -0800, Alexander Duyck wrote:
> On Wed, Feb 7, 2018 at 8:43 AM, Michael S. Tsirkin  wrote:
> > On Sun, Feb 04, 2018 at 01:49:46PM -0800, Alexander Duyck wrote:
> >> On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie  wrote:
> >> > On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
> >> >> On 2018-01-26 07:59, Michael S. Tsirkin wrote:
> >> >> > > The virtual IOMMU isn't supported by the accelerators for now,
> >> >> > > because vhost-user currently lacks an efficient way to share
> >> >> > > the IOMMU table in the VM with the vhost backend. That's why
> >> >> > > the software implementation of virtual IOMMU support in the
> >> >> > > vhost-user backend can't support dynamic mapping well.
> >> >> > What exactly is meant by that? vIOMMU seems to work for people,
> >> >> > it's not that fast if you change mappings all the time,
> >> >> > but e.g. dpdk within the guest doesn't.
> >> >>
> >> >> Yes, the software implementation supports dynamic mapping for sure.
> >> >> I think the point is that the current vhost-user backend cannot
> >> >> program the hardware IOMMU, so it cannot let a hardware accelerator
> >> >> work together with the software vIOMMU.
> >> >
> >> > The vhost-user backend can program the hardware IOMMU. Currently,
> >> > the vhost-user backend (or more precisely the vDPA driver in the
> >> > vhost-user backend) uses the memory table (delivered by the
> >> > VHOST_USER_SET_MEM_TABLE message) to program the IOMMU via vfio,
> >> > and that's why accelerators can use the GPA (guest physical
> >> > address) in descriptors directly.
> >> >
> >> > Theoretically, we can use the IOVA mapping info (delivered by the
> >> > VHOST_USER_IOTLB_MSG message) to program the IOMMU, and
> >> > accelerators will be able to use IOVAs. But the problem is that in
> >> > vhost-user QEMU won't push all the IOVA mappings to the backend
> >> > directly; the backend needs to ask for that info when it meets a
> >> > new IOVA. Such a design and implementation won't work well for
> >> > dynamic mappings anyway and couldn't be supported by hardware
> >> > accelerators.
> >> >
> >> >> I think that's another call to implement the offloaded path
> >> >> inside qemu, which has complete support for VFIO cooperating
> >> >> with the vIOMMU.
> >> >
> >> > Yes, that's exactly what we want. After revisiting the last
> >> > paragraph in the commit message, I found it's not really accurate.
> >> > The practicability of dynamic mapping support is a common issue
> >> > for QEMU; it also exists for vfio (hw/vfio in QEMU). If QEMU needs
> >> > to trap all the map/unmap events, the data path performance can't
> >> > be high. If we want to thoroughly fix this issue, especially for
> >> > vfio (hw/vfio in QEMU), we need to have the offload path Jason
> >> > mentioned in QEMU. And I think accelerators could use it too.
> >> >
> >> > Best regards,
> >> > Tiwei Bie
> >>
> >> I wonder if we couldn't look at coming up with an altered security
> >> model for the IOMMU drivers to address some of the performance
> >> issues seen with a typical hardware IOMMU?
> >>
> >> In the case of most network devices, we seem to be moving toward a
> >> model where the Rx pages are mapped for an extended period of time
> >> and see a fairly high rate of reuse. As such, pages mapped as
> >> writable or read/write by the device are left mapped for an
> >> extended period of time, while Tx pages, which are read-only, are
> >> often mapped/unmapped since they come from some other location in
> >> the kernel beyond the driver's control.
> >>
> >> If we were to somehow come up with a model where the read-only (Tx)
> >> pages had access to a pre-allocated memory-mapped address, while
> >> the read/write (descriptor rings) and write-only (Rx) pages were
> >> provided with dynamic addresses, we might be able to come up with a
> >> solution that would allow for fairly high network performance while
> >> at least protecting from memory corruption. The only issue it would
> >> open up is that the device would have the ability to read any/all
> >> memory on the guest. I was wondering about doing something like
> >> this with the vIOMMU with VFIO for the Intel NICs, since an
> >> interface like igb, ixgbe, ixgbevf, i40e, or i40evf would probably
> >> show pretty good performance under such a model as long as the
> >> writable pages were being tracked by the vIOMMU. It could even
> >> allow for live migration support if the vIOMMU provided the info
> >> needed for migratable/dirty page tracking and we held off on
> >> migrating any of the dynamically mapped pages until after they were
> >> either unmapped or an FLR reset the device.
> >>
> >> Thanks.
> >>
> >> - Alex
> >
> >
> >
> > It might be a good idea to change the IOMMU instead - how about a
> > variant of strict mode in the Intel IOMMU driver which forces an
> > IOTLB flush after invalidating a writable mapping but not an RO
> > mapping? Not sure what the name would be - relaxed-ro?
> >
> 

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support

2018-02-07 Thread Alexander Duyck
On Wed, Feb 7, 2018 at 8:43 AM, Michael S. Tsirkin  wrote:
> On Sun, Feb 04, 2018 at 01:49:46PM -0800, Alexander Duyck wrote:
>> On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie  wrote:
>> > On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
>> >> On 2018-01-26 07:59, Michael S. Tsirkin wrote:
>> >> > > The virtual IOMMU isn't supported by the accelerators for now,
>> >> > > because vhost-user currently lacks an efficient way to share
>> >> > > the IOMMU table in the VM with the vhost backend. That's why
>> >> > > the software implementation of virtual IOMMU support in the
>> >> > > vhost-user backend can't support dynamic mapping well.
>> >> > What exactly is meant by that? vIOMMU seems to work for people,
>> >> > it's not that fast if you change mappings all the time,
>> >> > but e.g. dpdk within the guest doesn't.
>> >>
>> >> Yes, the software implementation supports dynamic mapping for sure.
>> >> I think the point is that the current vhost-user backend cannot
>> >> program the hardware IOMMU, so it cannot let a hardware accelerator
>> >> work together with the software vIOMMU.
>> >
>> > The vhost-user backend can program the hardware IOMMU. Currently,
>> > the vhost-user backend (or more precisely the vDPA driver in the
>> > vhost-user backend) uses the memory table (delivered by the
>> > VHOST_USER_SET_MEM_TABLE message) to program the IOMMU via vfio,
>> > and that's why accelerators can use the GPA (guest physical
>> > address) in descriptors directly.
>> >
>> > Theoretically, we can use the IOVA mapping info (delivered by the
>> > VHOST_USER_IOTLB_MSG message) to program the IOMMU, and
>> > accelerators will be able to use IOVAs. But the problem is that in
>> > vhost-user QEMU won't push all the IOVA mappings to the backend
>> > directly; the backend needs to ask for that info when it meets a
>> > new IOVA. Such a design and implementation won't work well for
>> > dynamic mappings anyway and couldn't be supported by hardware
>> > accelerators.
>> >
>> >> I think that's another call to implement the offloaded path
>> >> inside qemu, which has complete support for VFIO cooperating
>> >> with the vIOMMU.
>> >
>> > Yes, that's exactly what we want. After revisiting the last
>> > paragraph in the commit message, I found it's not really accurate.
>> > The practicability of dynamic mapping support is a common issue
>> > for QEMU; it also exists for vfio (hw/vfio in QEMU). If QEMU needs
>> > to trap all the map/unmap events, the data path performance can't
>> > be high. If we want to thoroughly fix this issue, especially for
>> > vfio (hw/vfio in QEMU), we need to have the offload path Jason
>> > mentioned in QEMU. And I think accelerators could use it too.
>> >
>> > Best regards,
>> > Tiwei Bie
>>
>> I wonder if we couldn't look at coming up with an altered security
>> model for the IOMMU drivers to address some of the performance
>> issues seen with a typical hardware IOMMU?
>>
>> In the case of most network devices, we seem to be moving toward a
>> model where the Rx pages are mapped for an extended period of time
>> and see a fairly high rate of reuse. As such, pages mapped as
>> writable or read/write by the device are left mapped for an
>> extended period of time, while Tx pages, which are read-only, are
>> often mapped/unmapped since they come from some other location in
>> the kernel beyond the driver's control.
>>
>> If we were to somehow come up with a model where the read-only (Tx)
>> pages had access to a pre-allocated memory-mapped address, while
>> the read/write (descriptor rings) and write-only (Rx) pages were
>> provided with dynamic addresses, we might be able to come up with a
>> solution that would allow for fairly high network performance while
>> at least protecting from memory corruption. The only issue it would
>> open up is that the device would have the ability to read any/all
>> memory on the guest. I was wondering about doing something like
>> this with the vIOMMU with VFIO for the Intel NICs, since an
>> interface like igb, ixgbe, ixgbevf, i40e, or i40evf would probably
>> show pretty good performance under such a model as long as the
>> writable pages were being tracked by the vIOMMU. It could even
>> allow for live migration support if the vIOMMU provided the info
>> needed for migratable/dirty page tracking and we held off on
>> migrating any of the dynamically mapped pages until after they were
>> either unmapped or an FLR reset the device.
>>
>> Thanks.
>>
>> - Alex
>
>
>
> It might be a good idea to change the IOMMU instead - how about a
> variant of strict mode in the Intel IOMMU driver which forces an
> IOTLB flush after invalidating a writable mapping but not an RO
> mapping? Not sure what the name would be - relaxed-ro?
>
> This is probably easier than poking at the drivers and net core.
>
> Keeping the RX pages mapped in the IOMMU was envisioned for XDP.
> That might be a good place to start.

My plan is to update the Intel IOMMU driver first since it seems like
something that shouldn't 

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support

2018-02-07 Thread Michael S. Tsirkin
On Sun, Feb 04, 2018 at 01:49:46PM -0800, Alexander Duyck wrote:
> On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie  wrote:
> > On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
> >> On 2018-01-26 07:59, Michael S. Tsirkin wrote:
> >> > > The virtual IOMMU isn't supported by the accelerators for now,
> >> > > because vhost-user currently lacks an efficient way to share
> >> > > the IOMMU table in the VM with the vhost backend. That's why
> >> > > the software implementation of virtual IOMMU support in the
> >> > > vhost-user backend can't support dynamic mapping well.
> >> > What exactly is meant by that? vIOMMU seems to work for people,
> >> > it's not that fast if you change mappings all the time,
> >> > but e.g. dpdk within the guest doesn't.
> >>
> >> Yes, the software implementation supports dynamic mapping for sure.
> >> I think the point is that the current vhost-user backend cannot
> >> program the hardware IOMMU, so it cannot let a hardware accelerator
> >> work together with the software vIOMMU.
> >
> > The vhost-user backend can program the hardware IOMMU. Currently,
> > the vhost-user backend (or more precisely the vDPA driver in the
> > vhost-user backend) uses the memory table (delivered by the
> > VHOST_USER_SET_MEM_TABLE message) to program the IOMMU via vfio,
> > and that's why accelerators can use the GPA (guest physical
> > address) in descriptors directly.
> >
> > Theoretically, we can use the IOVA mapping info (delivered by the
> > VHOST_USER_IOTLB_MSG message) to program the IOMMU, and
> > accelerators will be able to use IOVAs. But the problem is that in
> > vhost-user QEMU won't push all the IOVA mappings to the backend
> > directly; the backend needs to ask for that info when it meets a
> > new IOVA. Such a design and implementation won't work well for
> > dynamic mappings anyway and couldn't be supported by hardware
> > accelerators.
> >
> >> I think that's another call to implement the offloaded path
> >> inside qemu, which has complete support for VFIO cooperating
> >> with the vIOMMU.
> >
> > Yes, that's exactly what we want. After revisiting the last
> > paragraph in the commit message, I found it's not really accurate.
> > The practicability of dynamic mapping support is a common issue
> > for QEMU; it also exists for vfio (hw/vfio in QEMU). If QEMU needs
> > to trap all the map/unmap events, the data path performance can't
> > be high. If we want to thoroughly fix this issue, especially for
> > vfio (hw/vfio in QEMU), we need to have the offload path Jason
> > mentioned in QEMU. And I think accelerators could use it too.
> >
> > Best regards,
> > Tiwei Bie
> 
> I wonder if we couldn't look at coming up with an altered security
> model for the IOMMU drivers to address some of the performance
> issues seen with a typical hardware IOMMU?
>
> In the case of most network devices, we seem to be moving toward a
> model where the Rx pages are mapped for an extended period of time
> and see a fairly high rate of reuse. As such, pages mapped as
> writable or read/write by the device are left mapped for an
> extended period of time, while Tx pages, which are read-only, are
> often mapped/unmapped since they come from some other location in
> the kernel beyond the driver's control.
>
> If we were to somehow come up with a model where the read-only (Tx)
> pages had access to a pre-allocated memory-mapped address, while
> the read/write (descriptor rings) and write-only (Rx) pages were
> provided with dynamic addresses, we might be able to come up with a
> solution that would allow for fairly high network performance while
> at least protecting from memory corruption. The only issue it would
> open up is that the device would have the ability to read any/all
> memory on the guest. I was wondering about doing something like
> this with the vIOMMU with VFIO for the Intel NICs, since an
> interface like igb, ixgbe, ixgbevf, i40e, or i40evf would probably
> show pretty good performance under such a model as long as the
> writable pages were being tracked by the vIOMMU. It could even
> allow for live migration support if the vIOMMU provided the info
> needed for migratable/dirty page tracking and we held off on
> migrating any of the dynamically mapped pages until after they were
> either unmapped or an FLR reset the device.
> 
> Thanks.
> 
> - Alex



It might be a good idea to change the IOMMU instead - how about a
variant of strict mode in the Intel IOMMU driver which forces an
IOTLB flush after invalidating a writable mapping but not an RO
mapping? Not sure what the name would be - relaxed-ro?
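
Roughly, the unmap path could look like this (names are invented and
this is only a sketch of the policy, not the real intel-iommu code):

#include <stdbool.h>
#include <stddef.h>

struct dom;  /* stand-in for an IOMMU domain */

/* Stubs standing in for the driver's page-table and flush routines. */
static void clear_ptes(struct dom *d, unsigned long iova, size_t size) { }
static void flush_iotlb_now(struct dom *d, unsigned long iova, size_t size) { }
static void queue_lazy_flush(struct dom *d, unsigned long iova, size_t size) { }

static void relaxed_ro_unmap(struct dom *d, unsigned long iova,
                             size_t size, bool was_writable)
{
    clear_ptes(d, iova, size);

    if (was_writable)
        /* A stale writable TLB entry would let the device keep writing
         * into freed pages, so flush synchronously. */
        flush_iotlb_now(d, iova, size);
    else
        /* A stale read-only entry can only expose data the guest
         * already handed to the device; defer and batch the flush. */
        queue_lazy_flush(d, iova, size);
}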

This is probably easier than poking at the drivers and net core.

Keeping the RX pages mapped in the IOMMU was envisioned for XDP.
That might be a good place to start.

-- 
MST



Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support

2018-02-04 Thread Alexander Duyck
On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie  wrote:
> On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
>> On 2018-01-26 07:59, Michael S. Tsirkin wrote:
>> > > The virtual IOMMU isn't supported by the accelerators for now,
>> > > because vhost-user currently lacks an efficient way to share
>> > > the IOMMU table in the VM with the vhost backend. That's why
>> > > the software implementation of virtual IOMMU support in the
>> > > vhost-user backend can't support dynamic mapping well.
>> > What exactly is meant by that? vIOMMU seems to work for people,
>> > it's not that fast if you change mappings all the time,
>> > but e.g. dpdk within the guest doesn't.
>>
>> Yes, the software implementation supports dynamic mapping for sure.
>> I think the point is that the current vhost-user backend cannot
>> program the hardware IOMMU, so it cannot let a hardware accelerator
>> work together with the software vIOMMU.
>
> The vhost-user backend can program the hardware IOMMU. Currently,
> the vhost-user backend (or more precisely the vDPA driver in the
> vhost-user backend) uses the memory table (delivered by the
> VHOST_USER_SET_MEM_TABLE message) to program the IOMMU via vfio,
> and that's why accelerators can use the GPA (guest physical
> address) in descriptors directly.
>
> Theoretically, we can use the IOVA mapping info (delivered by the
> VHOST_USER_IOTLB_MSG message) to program the IOMMU, and
> accelerators will be able to use IOVAs. But the problem is that in
> vhost-user QEMU won't push all the IOVA mappings to the backend
> directly; the backend needs to ask for that info when it meets a
> new IOVA. Such a design and implementation won't work well for
> dynamic mappings anyway and couldn't be supported by hardware
> accelerators.
>
>> I think that's another call to implement the offloaded path
>> inside qemu, which has complete support for VFIO cooperating
>> with the vIOMMU.
>
> Yes, that's exactly what we want. After revisiting the last
> paragraph in the commit message, I found it's not really accurate.
> The practicability of dynamic mapping support is a common issue
> for QEMU; it also exists for vfio (hw/vfio in QEMU). If QEMU needs
> to trap all the map/unmap events, the data path performance can't
> be high. If we want to thoroughly fix this issue, especially for
> vfio (hw/vfio in QEMU), we need to have the offload path Jason
> mentioned in QEMU. And I think accelerators could use it too.
>
> Best regards,
> Tiwei Bie

I wonder if we couldn't look at coming up with an altered security
model for the IOMMU drivers to address some of the performance
issues seen with a typical hardware IOMMU?

In the case of most network devices, we seem to be moving toward a
model where the Rx pages are mapped for an extended period of time
and see a fairly high rate of reuse. As such, pages mapped as
writable or read/write by the device are left mapped for an
extended period of time, while Tx pages, which are read-only, are
often mapped/unmapped since they come from some other location in
the kernel beyond the driver's control.

If we were to somehow come up with a model where the read-only (Tx)
pages had access to a pre-allocated memory-mapped address, while
the read/write (descriptor rings) and write-only (Rx) pages were
provided with dynamic addresses, we might be able to come up with a
solution that would allow for fairly high network performance while
at least protecting from memory corruption. The only issue it would
open up is that the device would have the ability to read any/all
memory on the guest. I was wondering about doing something like
this with the vIOMMU with VFIO for the Intel NICs, since an
interface like igb, ixgbe, ixgbevf, i40e, or i40evf would probably
show pretty good performance under such a model as long as the
writable pages were being tracked by the vIOMMU. It could even
allow for live migration support if the vIOMMU provided the info
needed for migratable/dirty page tracking and we held off on
migrating any of the dynamically mapped pages until after they were
either unmapped or an FLR reset the device.
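
To sketch what I mean (illustrative names only, not actual driver or
IOMMU-core code), the mapping decision could look roughly like this:

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

struct map_ctx {
    uintptr_t  window_host_base;  /* host VA of the pre-mapped RO window */
    dma_addr_t window_iova_base;  /* device-visible base of that window */
    dma_addr_t next_dynamic_iova; /* naive dynamic-IOVA allocator */
};

/* Stand-in for whatever actually programs the (v)IOMMU and records the
 * mapping for dirty-page tracking. */
static dma_addr_t dynamic_iommu_map(struct map_ctx *ctx, void *buf,
                                    uint64_t len, bool writable)
{
    dma_addr_t iova = ctx->next_dynamic_iova;

    ctx->next_dynamic_iova += (len + 4095) & ~4095ULL;
    (void)buf; (void)writable;
    return iova;
}

static dma_addr_t map_for_device(struct map_ctx *ctx, void *buf,
                                 uint64_t len, bool device_writable)
{
    if (!device_writable)
        /* Tx: covered by the static read-only window, so no per-packet
         * IOMMU work and no way for the device to corrupt memory. */
        return ctx->window_iova_base +
               ((uintptr_t)buf - ctx->window_host_base);

    /* Rx and descriptor rings: per-buffer mapping the vIOMMU can track
     * (and dirty-log for migration). */
    return dynamic_iommu_map(ctx, buf, len, true);
}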

Thanks.

- Alex



Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support

2018-01-25 Thread Tiwei Bie
On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote:
> On 2018-01-26 07:59, Michael S. Tsirkin wrote:
> > > The virtual IOMMU isn't supported by the accelerators for now,
> > > because vhost-user currently lacks an efficient way to share
> > > the IOMMU table in the VM with the vhost backend. That's why
> > > the software implementation of virtual IOMMU support in the
> > > vhost-user backend can't support dynamic mapping well.
> > What exactly is meant by that? vIOMMU seems to work for people,
> > it's not that fast if you change mappings all the time,
> > but e.g. dpdk within the guest doesn't.
>
> Yes, the software implementation supports dynamic mapping for sure.
> I think the point is that the current vhost-user backend cannot
> program the hardware IOMMU, so it cannot let a hardware accelerator
> work together with the software vIOMMU.

The vhost-user backend can program the hardware IOMMU. Currently,
the vhost-user backend (or more precisely the vDPA driver in the
vhost-user backend) uses the memory table (delivered by the
VHOST_USER_SET_MEM_TABLE message) to program the IOMMU via vfio,
and that's why accelerators can use the GPA (guest physical
address) in descriptors directly.
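
To make that concrete, here is a minimal sketch of how a backend could
turn those memory-table regions into static VFIO mappings. The region
struct below is a simplified stand-in for the actual vhost-user message
layout, and vfio_fd is assumed to be an already-opened and configured
VFIO container:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

struct mem_region {            /* simplified stand-in, not the wire format */
    uint64_t guest_phys_addr;  /* GPA the guest (and device) will use */
    uint64_t memory_size;
    uint64_t mmap_addr;        /* backend VA where the region is mmap'ed */
};

static int map_region_gpa(int vfio_fd, const struct mem_region *r)
{
    struct vfio_iommu_type1_dma_map map;

    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = r->mmap_addr;        /* backend virtual address */
    map.iova  = r->guest_phys_addr;  /* IOVA == GPA, so descriptors work as-is */
    map.size  = r->memory_size;

    /* One static mapping per region; no per-buffer map/unmap afterwards. */
    return ioctl(vfio_fd, VFIO_IOMMU_MAP_DMA, &map);
}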

Theoretically, we can use the IOVA mapping info (delivered by the
VHOST_USER_IOTLB_MSG message) to program the IOMMU, and accelerators
will be able to use IOVAs. But the problem is that in vhost-user QEMU
won't push all the IOVA mappings to the backend directly; the backend
needs to ask for that info when it meets a new IOVA. Such a design and
implementation won't work well for dynamic mappings anyway and
couldn't be supported by hardware accelerators.
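
Just to illustrate why the on-demand scheme is painful for hardware,
here is a rough backend-side sketch (the struct and helpers are made
up for illustration, not the actual vhost-user wire format):

#include <stdbool.h>
#include <stdint.h>

struct iotlb_entry { uint64_t iova, size, uaddr; };

struct iotlb_cache {
    struct iotlb_entry entries[256];
    unsigned count;
};

static bool iotlb_lookup(const struct iotlb_cache *c, uint64_t iova,
                         uint64_t *uaddr)
{
    for (unsigned i = 0; i < c->count; i++) {
        const struct iotlb_entry *e = &c->entries[i];
        if (iova >= e->iova && iova < e->iova + e->size) {
            *uaddr = e->uaddr + (iova - e->iova);
            return true;
        }
    }
    return false;
}

static void request_translation(struct iotlb_cache *c, uint64_t iova)
{
    /* Stub: a real backend would send an IOTLB miss to QEMU here and
     * block until the matching update arrives and is inserted into the
     * cache. A hardware IOMMU has no way to pause and wait like this. */
    (void)c; (void)iova;
}

static uint64_t translate_iova(struct iotlb_cache *c, uint64_t iova)
{
    uint64_t uaddr;

    if (iotlb_lookup(c, iova, &uaddr))
        return uaddr;               /* hit: no extra cost */

    /* Miss: the ring stalls on a round trip to QEMU for this IOVA. */
    request_translation(c, iova);
    return iotlb_lookup(c, iova, &uaddr) ? uaddr : 0;
}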

> I think that's another call to implement the offloaded path inside
> qemu, which has complete support for VFIO cooperating with the
> vIOMMU.

Yes, that's exactly what we want. After revisiting the last paragraph
in the commit message, I found it's not really accurate. The
practicability of dynamic mapping support is a common issue for QEMU;
it also exists for vfio (hw/vfio in QEMU). If QEMU needs to trap all
the map/unmap events, the data path performance can't be high. If we
want to thoroughly fix this issue, especially for vfio (hw/vfio in
QEMU), we need to have the offload path Jason mentioned in QEMU. And I
think accelerators could use it too.

Best regards,
Tiwei Bie

> 
> Thanks
> 
> > 
> > > Once this problem is solved
> > > in vhost-user, virtual IOMMU can be supported by accelerators
> > > too, and the IOMMU feature bit checking in this patch can be
> > > removed.
> > Given it works with software backends right now, I suspect
> > this will be up to you guys to address.
> > 
> 



Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support

2018-01-25 Thread Jason Wang



On 2018-01-26 07:59, Michael S. Tsirkin wrote:

The virtual IOMMU isn't supported by the accelerators for now,
because vhost-user currently lacks an efficient way to share
the IOMMU table in the VM with the vhost backend. That's why the
software implementation of virtual IOMMU support in the vhost-user
backend can't support dynamic mapping well.

What exactly is meant by that? vIOMMU seems to work for people,
it's not that fast if you change mappings all the time,
but e.g. dpdk within the guest doesn't.


Yes, the software implementation supports dynamic mapping for sure. I
think the point is that the current vhost-user backend cannot program
the hardware IOMMU, so it cannot let a hardware accelerator work
together with the software vIOMMU. I think that's another call to
implement the offloaded path inside qemu, which has complete support
for VFIO cooperating with the vIOMMU.


Thanks




Once this problem is solved
in vhost-user, virtual IOMMU can be supported by accelerators
too, and the IOMMU feature bit checking in this patch can be
removed.

Given it works with software backends right now, I suspect
this will be up to you guys to address.