Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-06 Thread Xiao Guangrong



On 07/06/2016 07:48 PM, Paolo Bonzini wrote:



On 06/07/2016 06:02, Xiao Guangrong wrote:




May I ask you what the exact issue you have with this interface for
Intel to support
your own GPU virtualization?


Intel's vGPU can work with this framework. We really appreciate your
/ nvidia's
contribution.


Then, I don't think we should embargo Paolo's patch.


This patchset is specific to the framework design, i.e., mapping memory when
a fault happens rather than at mmap() time, and this design is exactly what we
have been discussing for nearly two days.


I disagree, this patch fixes a bug because what Neo is doing is legal.
It may not be the design that will be committed, but the bug they found
in KVM is real.



I am just worried about whether we really need fault-on-demand for device
memory, i.e., whether device memory overcommit is safe enough.

It lacks a graceful way to recover the workload if the resource is really
overloaded. Unlike with normal memory, the host kernel and guest kernel can not
do anything except kill the VM in this case. So the VM gets crashed due to
device emulation; that is not safe, as the device can be accessed from
userspace even by an unprivileged user, which makes it vulnerable in a data
center.



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-06 Thread Alex Williamson
On Wed, 6 Jul 2016 08:05:15 +0200
Paolo Bonzini  wrote:

> On 06/07/2016 04:00, Xiao Guangrong wrote:
> > 
> > 
> > On 07/05/2016 08:18 PM, Paolo Bonzini wrote:  
> >>
> >>
> >> On 05/07/2016 07:41, Neo Jia wrote:  
> >>> On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:  
>  The vGPU folks would like to trap the first access to a BAR by setting
>  vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
>  handler
>  then can use remap_pfn_range to place some non-reserved pages in the
>  VMA.
> 
>  KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and
>  these
>  patches should fix this.  
> >>>
> >>> Hi Paolo,
> >>>
> >>> I have tested your patches with the mediated passthru patchset that
> >>> is being
> >>> reviewed in KVM and QEMU mailing list.
> >>>
> >>> The fault handler gets called successfully and the previously mapped
> >>> memory gets
> >>> unmapped correctly via unmap_mapping_range.
> >>
> >> Great, then I'll include them in 4.8.  
> > 
> > Code is okay, but I still doubt whether this implementation, fetching mmio
> > pages in fault handler, is needed. We'd better include these patches
> > after the design of vfio framework is decided.  
> 
> I think that this fixes a bug anyway, the previous handling of VM_PFNMAP
> is too simplistic.


Agreed, no reason to hold off on this, it's a valid interaction that
needs to be fixed regardless of how or if the vfio mediated driver
makes use of it.  Thanks,

Alex


Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-06 Thread Paolo Bonzini


On 06/07/2016 06:02, Xiao Guangrong wrote:
>>>

 May I ask you what the exact issue you have with this interface for
 Intel to support
 your own GPU virtualization?
>>>
>>> Intel's vGPU can work with this framework. We really appreciate your
>>> / nvidia's
>>> contribution.
>>
>> Then, I don't think we should embargo Paolo's patch.
> 
> This patchset is specific to the framework design, i.e., mapping memory when
> a fault happens rather than at mmap() time, and this design is exactly what
> we have been discussing for nearly two days.

I disagree, this patch fixes a bug because what Neo is doing is legal.
It may not be the design that will be committed, but the bug they found
in KVM is real.

Paolo


Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-06 Thread Paolo Bonzini


On 06/07/2016 04:00, Xiao Guangrong wrote:
> 
> 
> On 07/05/2016 08:18 PM, Paolo Bonzini wrote:
>>
>>
>> On 05/07/2016 07:41, Neo Jia wrote:
>>> On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:
 The vGPU folks would like to trap the first access to a BAR by setting
 vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
 handler
 then can use remap_pfn_range to place some non-reserved pages in the
 VMA.

 KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and
 these
 patches should fix this.
>>>
>>> Hi Paolo,
>>>
>>> I have tested your patches with the mediated passthru patchset that
>>> is being
>>> reviewed in KVM and QEMU mailing list.
>>>
>>> The fault handler gets called successfully and the previously mapped
>>> memory gets
> >> unmapped correctly via unmap_mapping_range.
>>
>> Great, then I'll include them in 4.8.
> 
> > Code is okay, but I still doubt whether this implementation, fetching mmio
> pages in fault handler, is needed. We'd better include these patches
> after the design of vfio framework is decided.

I think that this fixes a bug anyway, the previous handling of VM_PFNMAP
is too simplistic.
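
(For context, the "too simplistic" part is the arithmetic KVM used to turn a
host virtual address inside a VM_IO/VM_PFNMAP vma into a pfn. The sketch below
is a rough illustration of the idea only, not the literal KVM code or the
patch; pfnmap_hva_to_pfn() is a made-up name.)

#include <linux/mm.h>
#include <linux/kvm_host.h>

/* addr is a host virtual address that falls inside a VM_IO/VM_PFNMAP vma */
static kvm_pfn_t pfnmap_hva_to_pfn(struct vm_area_struct *vma,
                                   unsigned long addr)
{
        unsigned long pfn;

        /*
         * Old assumption: the vma is a linear remap of vm_pgoff, so
         *
         *     pfn = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
         *
         * That holds for a plain remap_pfn_range() done at mmap() time, but
         * not for a vma that a fault handler populates page by page.
         *
         * More robust for non-linear mappings: read the pfn that is actually
         * installed in the page tables (the caller has to fault the page in
         * first, e.g. with fixup_user_fault(), so the vma's fault handler
         * gets a chance to run).
         */
        if (follow_pfn(vma, addr, &pfn))
                return KVM_PFN_ERR_FAULT;

        return pfn;
}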

Paolo


Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Neo Jia
On Wed, Jul 06, 2016 at 10:22:59AM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/05/2016 11:07 PM, Neo Jia wrote:
> >This is kept there in case the validate_map_request() is not provided by 
> >vendor
> >driver then by default assume 1:1 mapping. So if validate_map_request() is 
> >not
> >provided, fault handler should not fail.
> 
> THESE are the parameters you passed to validate_map_request(), and these info 
> is
> available in mmap(), it really does not matter if you move 
> validate_map_request()
> to mmap(). That's what i want to say.

Let me answer this at the end of my response.

> 
> >
> >>
> >>>None of such information is available at VFIO mmap() time. For example, 
> >>>several VMs
> >>>are sharing the same physical device to provide mediated access. All VMs 
> >>>will
> >>>call the VFIO mmap() on their virtual BAR as part of QEMU vfio/pci 
> >>>initialization
> >>>process, at that moment, we definitely can't mmap the entire physical MMIO
> >>>into both VM blindly for obvious reason.
> >>>
> >>
> >>mmap() carries @length information, so you only need to allocate the 
> >>specified size
> >>(corresponding to @length) of memory for them.
> >
> >Again, you still look at this as a static partition at QEMU configuration 
> >time
> >where the guest mmio will be mapped as a whole at some offset of the physical
> >mmio region. (You still can do that like I said above by not providing
> >validate_map_request() in your vendor driver.)
> >
> 
> Then you can move validate_map_request() to here to achieve custom 
> allocation-policy.
> 
> >But this is not the framework we are defining here.
> >
> >The framework we have here is to provide the driver vendor flexibility to 
> >decide
> >the guest mmio and physical mmio mapping on page basis, and such information 
> >is
> >available during runtime.
> >
> >How such information gets communicated between guest and host driver is up to
> >driver vendor.
> 
> The problem is the sequence of the way "provide the driver vendor
> flexibility to decide the guest mmio and physical mmio mapping on page basis"
> and mmap().
> 
> We should provide such allocation info first then do mmap(). Your current
> design,
> do mmap() -> communication telling such info -> use such info when fault 
> happens,
> is really BAD, because you can not control the time when memory fault will 
> happen.
> The guest may access this memory before the communication you mentioned above,
> and another reason is that KVM MMU can prefetch memory at any time.

Like I have said before, if your implementation doesn't need such flexibility,
you can still do a static mapping at VFIO mmap() time; then your mediated driver
doesn't have to provide validate_map_request(), and the fault handler will not
be called.

Let me address your questions below.

1. Information available at VFIO mmap() time?

So you are saying that @req_size and @pgoff are both available at the time
when people are calling VFIO mmap(), when the guest OS is not even running,
right?

The answer is no; the only things available at VFIO mmap() time are the
following:

1) guest MMIO size

2) host physical MMIO size

3) guest MMIO starting address

4) host MMIO starting address

But none of the above are the @req_size and @pgoff that we are talking about
at validate_map_request() time.

Our host MMIO represents the means to access GPU HW resources. Those GPU HW
resources are allocated dynamically at runtime, so we have no visibility of the
@pgoff and @req_size covering some specific type of GPU HW resource at VFIO
mmap() time. Also, we don't even know whether such a resource will be required
for a particular VM at all.

For example, VM1 may need to run a lot more graphics workload than VM2. So the
end result is that VM1 will get a lot more of resource A allocated than VM2 to
support its graphics workload. And to access resource A, a host mmio region
will be allocated as well, say [pfn_a, size_a] for VM1 and [pfn_b, size_b] for
VM2.

Clearly, such a region can be destroyed and reallocated throughout the mediated
driver's lifetime. This is why we need to have a fault handler there to map the
proper pages into the guest, after validation, at runtime.

I hope the above response addresses your question of why we can't provide such
allocation info at VFIO mmap() time.
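
(To make the shape of that runtime lookup concrete: validate_map_request() is
the callback name from the patchset under review, but the structure and
argument list below are only a guess for illustration, not the real interface.
The point is simply that the (pgoff, req_size) pair comes from the vendor
driver's runtime bookkeeping, while mmap() only ever sees the BAR geometry.)

#include <linux/mm.h>

/* Illustrative sketch only; not the actual vfio mediated-device interface. */
struct illustrative_map_request {
        unsigned long virtaddr;  /* faulting user virtual address */
        unsigned long pgoff;     /* host pfn picked by the vendor driver from
                                  * its runtime HW-resource bookkeeping */
        unsigned long req_size;  /* how many bytes are safe to map right now */
        pgprot_t pg_prot;        /* protection to use for this mapping */
};

/*
 * Called from the fault handler at access time; the vendor driver validates
 * the access and fills in pgoff/req_size.  None of this is known at mmap().
 */
int illustrative_validate_map_request(struct vm_area_struct *vma,
                                      struct illustrative_map_request *req);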

2. Guest might access mmio region at any time ...

A guest with a mediated GPU inside can definitely access its BARs at any time.
If the guest accesses some BAR region that has not previously been allocated,
such access will be denied, and with the current scheme the VM will crash to
prevent malicious access from the guest. This is another reason we choose to
keep the guest MMIO mediated.

3. KVM MMU can prefetch memory at any time.

You are talking about the KVM MMU prefetching the guest mmio region which is
marked as prefetchable, right?

On bare metal, the prefetch is basically a cache line fill, where the range
needs to be marked as cacheable for the CPU, which then issues a read to
anywhere in the cache line.

Is KVM MMU prefetch the same as 

Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Xiao Guangrong



On 07/06/2016 10:57 AM, Neo Jia wrote:

On Wed, Jul 06, 2016 at 10:35:18AM +0800, Xiao Guangrong wrote:



On 07/06/2016 10:18 AM, Neo Jia wrote:

On Wed, Jul 06, 2016 at 10:00:46AM +0800, Xiao Guangrong wrote:



On 07/05/2016 08:18 PM, Paolo Bonzini wrote:



On 05/07/2016 07:41, Neo Jia wrote:

On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.

KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and these
patches should fix this.
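
(In rough terms, the scheme described in that cover letter looks like the
sketch below, assuming the ~4.7-era vm_operations_struct fault signature.
Everything with a my_ prefix is a hypothetical vendor driver, not the actual
vfio patchset; only remap_pfn_range(), the vm_ops mechanism and the VM_FAULT_*
return codes are real kernel interfaces.)

#include <linux/fs.h>
#include <linux/mm.h>

/* hypothetical: ask the vendor driver which host pfn backs this BAR offset */
int my_vendor_lookup_pfn(struct vm_area_struct *vma, unsigned long offset,
                         unsigned long *pfn);

static int my_bar_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        unsigned long vaddr = (unsigned long)vmf->virtual_address;
        unsigned long pfn;

        if (my_vendor_lookup_pfn(vma, vaddr - vma->vm_start, &pfn))
                return VM_FAULT_SIGBUS;

        /* populate just this page of the VM_PFNMAP vma at fault time */
        if (remap_pfn_range(vma, vaddr & PAGE_MASK, pfn, PAGE_SIZE,
                            vma->vm_page_prot))
                return VM_FAULT_SIGBUS;

        return VM_FAULT_NOPAGE;
}

static const struct vm_operations_struct my_bar_vm_ops = {
        .fault = my_bar_fault,
};

/* the vendor driver's mmap handler for the virtual BAR region */
static int my_bar_mmap(struct file *filp, struct vm_area_struct *vma)
{
        vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
        vma->vm_ops = &my_bar_vm_ops;  /* map nothing now; trap first access */
        return 0;
}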


Hi Paolo,

I have tested your patches with the mediated passthru patchset that is being
reviewed in KVM and QEMU mailing list.

The fault handler gets called successfully and the previously mapped memory gets
unmapped correctly via unmap_mapping_range.


Great, then I'll include them in 4.8.


Code is okay, but I still doubt whether this implementation, fetching mmio pages in
fault
handler, is needed. We'd better include these patches after the design of vfio
framework is decided.


Hi Guangrong,

I disagree. The design of VFIO framework has been actively discussed in the KVM
and QEMU mailing lists for a while and the fault handler is agreed upon to provide the
flexibility for different driver vendors' implementation. With that said, I am
still open to discuss with you and anybody else about this framework as the goal
is to allow multiple vendor to plugin into this framework to support their
mediated device virtualization scheme, such as Intel, IBM and us.


The discussion is still going on. And current vfio patchset we reviewed is still
problematic.


My point is the fault handler part has been discussed already, with that said I
am always open to any constructive suggestions to make things better and
maintainable. (Appreciate your code review on the VFIO thread, I think we still
owe you another response, will do that.)



It always can be changed, especially as the vfio patchset is not in good shape.





May I ask you what the exact issue you have with this interface for Intel to 
support
your own GPU virtualization?


Intel's vGPU can work with this framework. We really appreciate your / nvidia's
contribution.


Then, I don't think we should embargo Paolo's patch.


This patchset is specific to the framework design, i.e., mapping memory when a
fault happens rather than at mmap() time, and this design is exactly what we
have been discussing for nearly two days.



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Neo Jia
On Wed, Jul 06, 2016 at 10:35:18AM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/06/2016 10:18 AM, Neo Jia wrote:
> >On Wed, Jul 06, 2016 at 10:00:46AM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 07/05/2016 08:18 PM, Paolo Bonzini wrote:
> >>>
> >>>
> >>>On 05/07/2016 07:41, Neo Jia wrote:
> On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:
> >The vGPU folks would like to trap the first access to a BAR by setting
> >vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
> >then can use remap_pfn_range to place some non-reserved pages in the VMA.
> >
> >KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and 
> >these
> >patches should fix this.
> 
> Hi Paolo,
> 
> I have tested your patches with the mediated passthru patchset that is 
> being
> reviewed in KVM and QEMU mailing list.
> 
> The fault handler gets called successfully and the previously mapped 
> memory gets
> unmapped correctly via unmap_mapping_range.
> >>>
> >>>Great, then I'll include them in 4.8.
> >>
> >>Code is okay, but I still doubt whether this implementation, fetching mmio pages
> >>in fault
> >>handler, is needed. We'd better include these patches after the design of 
> >>vfio
> >>framework is decided.
> >
> >Hi Guangrong,
> >
> >I disagree. The design of VFIO framework has been actively discussed in the 
> >KVM
> >and QEMU mailing lists for a while and the fault handler is agreed upon to provide
> >the
> >flexibility for different driver vendors' implementation. With that said, I 
> >am
> >still open to discuss with you and anybody else about this framework as the 
> >goal
> >is to allow multiple vendor to plugin into this framework to support their
> >mediated device virtualization scheme, such as Intel, IBM and us.
> 
> The discussion is still going on. And current vfio patchset we reviewed is 
> still
> problematic.

My point is the fault handler part has been discussed already, with that said I
am always open to any constructive suggestions to make things better and
maintainable. (Appreciate your code review on the VFIO thread, I think we still
owe you another response, will do that.)

> 
> >
> >May I ask you what the exact issue you have with this interface for Intel to 
> >support
> >your own GPU virtualization?
> 
> Intel's vGPU can work with this framework. We really appreciate your / 
> nvidia's
> contribution.

Then, I don't think we should embargo Paolo's patch.

> 
> i didn’t mean to offend you, i just want to make sure if this complexity is 
> really
> needed and inspect if this framework is safe enough and think it over if we 
> have
> a better implementation.

Not at all. :-)

Suggestions are always welcome, I just want to know the exact issues you have
with the code so I can have a better response to address that with proper 
information.

Thanks,
Neo



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Xiao Guangrong



On 07/06/2016 10:18 AM, Neo Jia wrote:

On Wed, Jul 06, 2016 at 10:00:46AM +0800, Xiao Guangrong wrote:



On 07/05/2016 08:18 PM, Paolo Bonzini wrote:



On 05/07/2016 07:41, Neo Jia wrote:

On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.

KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and these
patches should fix this.


Hi Paolo,

I have tested your patches with the mediated passthru patchset that is being
reviewed in KVM and QEMU mailing list.

The fault handler gets called successfully and the previously mapped memory gets
unmapped correctly via unmap_mapping_range.


Great, then I'll include them in 4.8.


Code is okay, but I still doubt whether this implementation, fetching mmio pages in
fault
handler, is needed. We'd better include these patches after the design of vfio
framework is decided.


Hi Guangrong,

I disagree. The design of VFIO framework has been actively discussed in the KVM
and QEMU mailing lists for a while and the fault handler is agreed upon to provide the
flexibility for different driver vendors' implementation. With that said, I am
still open to discuss with you and anybody else about this framework as the goal
is to allow multiple vendor to plugin into this framework to support their
mediated device virtualization scheme, such as Intel, IBM and us.


The discussion is still going on. And current vfio patchset we reviewed is still
problematic.



May I ask you what the exact issue you have with this interface for Intel to 
support
your own GPU virtualization?


Intel's vGPU can work with this framework. We really appreciate your / nvidia's
contribution.

I didn't mean to offend you; I just want to make sure this complexity is really
needed, inspect whether this framework is safe enough, and think it over in
case we have a better implementation.





Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Xiao Guangrong



On 07/05/2016 11:07 PM, Neo Jia wrote:

On Tue, Jul 05, 2016 at 05:02:46PM +0800, Xiao Guangrong wrote:




It is physically contiguous but it is done during the runtime, physically 
contiguous doesn't mean
static partition at boot time. And only during runtime, the proper HW resource 
will be requested therefore
the right portion of MMIO region will be granted by the mediated device driver 
on the host.


Okay. This is your implementation design rather than the hardware limitation,
right?


I don't think it matters here. We are talking about framework so it should
provide the flexibility for different driver vendor.


It really matters. It is the reason why we design the framework like this and
we need to make sure whether we have a better design to fill the requirements.





For example, if the instance requires 512M of memory (the size can be specified
on the QEMU command line), it can tell its requirement to the mediated device
driver via the create() interface; then the driver can allocate the memory for
this instance before it is running.


BAR != your device memory

We don't set the BAR size via QEMU command line, BAR size is extracted by QEMU
from config space provided by vendor driver.



Anyway, there is a way to configure the BAR size, e.g., specify the size as a
parameter when you create a mdev via sysfs.



Theoretically, the hardware is able to do memory management in this style, but
for some reason you choose to allocate memory at runtime, right? If my
understanding is right, could you please tell us what benefit you want to get
from this runtime-allocation style?


Your understanding is incorrect.


Then WHY?







Then the req_size and pgoff will both come from the mediated device driver 
based on his internal book
keeping of the hw resource allocation, which is only available during runtime. 
And such book keeping
can be built part of para-virtualization scheme between guest and host device 
driver.



I am talking about the parameters you passed to validate_map_request().
req_size is calculated like this:

+   offset   = virtaddr - vma->vm_start;
+   phyaddr  = (vma->vm_pgoff << PAGE_SHIFT) + offset;
+   pgoff    = phyaddr >> PAGE_SHIFT;

All this info is from the vma, which is available in mmap().

pg_prot is got from:
+   pg_prot  = vma->vm_page_prot;
that is also available in mmap().


This is kept there in case the validate_map_request() is not provided by vendor
driver then by default assume 1:1 mapping. So if validate_map_request() is not
provided, fault handler should not fail.


THESE are the parameters you passed to validate_map_request(), and this info is
available in mmap(); it really does not matter if you move
validate_map_request() to mmap(). That's what I want to say.






None of such information is available at VFIO mmap() time. For example, several 
VMs
are sharing the same physical device to provide mediated access. All VMs will
call the VFIO mmap() on their virtual BAR as part of QEMU vfio/pci 
initialization
process, at that moment, we definitely can't mmap the entire physical MMIO
into both VM blindly for obvious reason.



mmap() carries @length information, so you only need to allocate the specified 
size
(corresponding to @length) of memory for them.


Again, you still look at this as a static partition at QEMU configuration time
where the guest mmio will be mapped as a whole at some offset of the physical
mmio region. (You still can do that like I said above by not providing
validate_map_request() in your vendor driver.)



Then you can move validate_map_request() to here to achieve custom 
allocation-policy.


But this is not the framework we are defining here.

The framework we have here is to provide the driver vendor flexibility to decide
the guest mmio and physical mmio mapping on page basis, and such information is
available during runtime.

How such information gets communicated between guest and host driver is up to
driver vendor.


The problem is the sequencing of "provide the driver vendor flexibility to
decide the guest mmio and physical mmio mapping on page basis" and mmap().

We should provide such allocation info first and then do mmap(). Your current
design, do mmap() -> communication telling such info -> use such info when a
fault happens, is really BAD, because you can not control the time when the
memory fault will happen. The guest may access this memory before the
communication you mentioned above, and another reason is that the KVM MMU can
prefetch memory at any time.



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Neo Jia
On Wed, Jul 06, 2016 at 10:00:46AM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/05/2016 08:18 PM, Paolo Bonzini wrote:
> >
> >
> >On 05/07/2016 07:41, Neo Jia wrote:
> >>On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:
> >>>The vGPU folks would like to trap the first access to a BAR by setting
> >>>vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
> >>>then can use remap_pfn_range to place some non-reserved pages in the VMA.
> >>>
> >>>KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and these
> >>>patches should fix this.
> >>
> >>Hi Paolo,
> >>
> >>I have tested your patches with the mediated passthru patchset that is being
> >>reviewed in KVM and QEMU mailing list.
> >>
> >>The fault handler gets called successfully and the previously mapped memory 
> >>gets
> >>unmapped correctly via unmap_mapping_range.
> >
> >Great, then I'll include them in 4.8.
> 
> Code is okay, but I still doubt whether this implementation, fetching mmio pages in
> fault
> handler, is needed. We'd better include these patches after the design of vfio
> framework is decided.

Hi Guangrong,

I disagree. The design of VFIO framework has been actively discussed in the KVM
and QEMU mailing lists for a while and the fault handler is agreed upon to provide the
flexibility for different driver vendors' implementation. With that said, I am
still open to discuss with you and anybody else about this framework as the goal
is to allow multiple vendor to plugin into this framework to support their
mediated device virtualization scheme, such as Intel, IBM and us.

May I ask you what the exact issue you have with this interface for Intel to 
support 
your own GPU virtualization? 

Thanks,
Neo

> 


Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Xiao Guangrong



On 07/05/2016 08:18 PM, Paolo Bonzini wrote:



On 05/07/2016 07:41, Neo Jia wrote:

On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.

KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and these
patches should fix this.


Hi Paolo,

I have tested your patches with the mediated passthru patchset that is being
reviewed in KVM and QEMU mailing list.

The fault handler gets called successfully and the previously mapped memory gets
unmapped correctly via unmap_mapping_range.
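
(The invalidation side referred to here works roughly as in the sketch below;
my_invalidate_bar_range() is a hypothetical vendor-driver helper, and only
unmap_mapping_range() itself is the real kernel API. When the vendor driver
revokes or moves the host MMIO backing, it zaps whatever the fault handler
previously remapped, so the next guest access faults again.)

#include <linux/fs.h>
#include <linux/mm.h>

/*
 * mapping is the address_space of the mmap()ed vfio device file; start/len
 * describe the byte range of the BAR whose backing just went away.
 */
static void my_invalidate_bar_range(struct address_space *mapping,
                                    loff_t start, loff_t len)
{
        /* tear down the PTEs installed by remap_pfn_range() in the fault path */
        unmap_mapping_range(mapping, start, len, 1 /* even_cows */);
}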


Great, then I'll include them in 4.8.


Code is okay, but I still doubt whether this implementation, fetching mmio pages in
fault
handler, is needed. We'd better include these patches after the design of vfio
framework is decided.



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Neo Jia
On Tue, Jul 05, 2016 at 05:02:46PM +0800, Xiao Guangrong wrote:
> 
> >
> >It is physically contiguous but it is done during the runtime, physically 
> >contiguous doesn't mean
> >static partition at boot time. And only during runtime, the proper HW 
> >resource will be requested therefore
> >the right portion of MMIO region will be granted by the mediated device 
> >driver on the host.
> 
> Okay. This is your implementation design rather than the hardware limitation,
> right?

I don't think it matters here. We are talking about framework so it should
provide the flexibility for different driver vendor.

> 
> For example, if the instance require 512M memory (the size can be specified 
> by QEMU
> command line), it can tell its requirement to the mediated device driver via 
> create()
> interface, then the driver can allocate the memory for this instance before
> it is running.

BAR != your device memory

We don't set the BAR size via QEMU command line, BAR size is extracted by QEMU
from config space provided by vendor driver.

> 
> Theoretically, the hardware is able to do memory management as this style, 
> but for some
> reasons you choose allocating memory in the runtime. right? If my 
> understanding is right,
> could you please tell us what benefit you want to get from this 
> running-allocation style?

Your understanding is incorrect.

> 
> >
> >Then the req_size and pgoff will both come from the mediated device driver 
> >based on his internal book
> >keeping of the hw resource allocation, which is only available during 
> >runtime. And such book keeping
> >can be built part of para-virtualization scheme between guest and host 
> >device driver.
> >
> 
> I am talking the parameters you passed to validate_map_request(). req_size is 
> calculated like this:
> 
> +   offset   = virtaddr - vma->vm_start;
> +   phyaddr  = (vma->vm_pgoff << PAGE_SHIFT) + offset;
> +   pgoff    = phyaddr >> PAGE_SHIFT;
> 
> All these info is from vma which is available in mmap().
> 
> pgoff is got from:
> +   pg_prot  = vma->vm_page_prot;
> that is also available in mmap().

This is kept there in case the validate_map_request() is not provided by vendor
driver then by default assume 1:1 mapping. So if validate_map_request() is not
provided, fault handler should not fail.

> 
> >None of such information is available at VFIO mmap() time. For example, 
> >several VMs
> >are sharing the same physical device to provide mediated access. All VMs will
> >call the VFIO mmap() on their virtual BAR as part of QEMU vfio/pci 
> >initialization
> >process, at that moment, we definitely can't mmap the entire physical MMIO
> >into both VM blindly for obvious reason.
> >
> 
> mmap() carries @length information, so you only need to allocate the 
> specified size
> (corresponding to @length) of memory for them.

Again, you still look at this as a static partition at QEMU configuration time
where the guest mmio will be mapped as a whole at some offset of the physical
mmio region. (You still can do that like I said above by not providing
validate_map_request() in your vendor driver.)

But this is not the framework we are defining here.

The framework we have here is to provide the driver vendor flexibility to decide
the guest mmio and physical mmio mapping on page basis, and such information is
available during runtime.

How such information gets communicated between guest and host driver is up to
driver vendor.

Thanks,
Neo



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Neo Jia
On Tue, Jul 05, 2016 at 02:18:28PM +0200, Paolo Bonzini wrote:
> 
> 
> On 05/07/2016 07:41, Neo Jia wrote:
> > On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:
> >> The vGPU folks would like to trap the first access to a BAR by setting
> >> vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
> >> then can use remap_pfn_range to place some non-reserved pages in the VMA.
> >>
> >> KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and these
> >> patches should fix this.
> > 
> > Hi Paolo,
> > 
> > I have tested your patches with the mediated passthru patchset that is being
> > reviewed in KVM and QEMU mailing list.
> > 
> > The fault handler gets called successfully and the previously mapped memory
> > gets unmapped correctly via unmap_mapping_range.
> 
> Great, then I'll include them in 4.8.

Thanks!

> 
> Paolo
> 


Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Paolo Bonzini


On 05/07/2016 07:41, Neo Jia wrote:
> On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:
>> The vGPU folks would like to trap the first access to a BAR by setting
>> vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
>> then can use remap_pfn_range to place some non-reserved pages in the VMA.
>>
>> KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and these
>> patches should fix this.
> 
> Hi Paolo,
> 
> I have tested your patches with the mediated passthru patchset that is being
> reviewed in KVM and QEMU mailing list.
> 
> The fault handler gets called successfully and the previously mapped memory
> gets unmapped correctly via unmap_mapping_range.

Great, then I'll include them in 4.8.

Paolo



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Xiao Guangrong



On 07/05/2016 03:30 PM, Neo Jia wrote:




(Just for completeness, if you really want to use a device in the above example
as VFIO passthru, the second step is not completely handled in userspace; it is
actually the guest driver who will allocate and set up the proper hw resource,
which will later be ready for you to access via some mmio pages.)


Hmm... I always treat the VM as userspace.


It is OK to treat the VM as userspace, but I think it is better to spell out
the details so we are always on the same page.



Okay. I should pay more attention to it when I discuss with the driver
people. :)







This is how QEMU/VFIO currently works, could you please tell me the special 
points
of your solution comparing with current QEMU/VFIO and why current model can not 
fit
your requirement? So that we can better understand your scenario?


The scenario I am describing here is mediated passthru case, but what you are
describing here (more or less) is VFIO direct assigned case. It is different in 
several
areas, but major difference related to this topic here is:

1) In VFIO direct assigned case, the device (and its resource) is completely 
owned by the VM
therefore its mmio region can be mapped directly into the VM during the VFIO 
mmap() call as
there is no resource sharing among VMs and there is no mediated device driver on
the host to manage such resource, so it is completely owned by the guest.


I understand this difference. However, as you told me, the MMIO region
allocated for the VM is contiguous, so I assume that portion of the physical
MMIO region is completely owned by the guest. The only difference I can see is
that the mediated device driver needs to allocate that region.


It is physically contiguous, but that happens at runtime; physically contiguous
doesn't mean a static partition at boot time. Only at runtime is the proper HW
resource requested, and only then is the right portion of the MMIO region
granted by the mediated device driver on the host.


Okay. This is your implementation design rather than a hardware limitation,
right?

For example, if the instance requires 512M of memory (the size can be specified
on the QEMU command line), it can tell its requirement to the mediated device
driver via the create() interface, and then the driver can allocate the memory
for this instance before it is running.

Theoretically, the hardware is able to do memory management in this style, but
for some reason you chose to allocate memory at runtime, right? If my
understanding is right, could you please tell us what benefit you get from this
runtime-allocation style?



Also, physically contiguous doesn't mean the guest and host mmio are always
1:1. You can have an 8GB host physical mmio while the guest will only have
256MB.


Thanks for your patience, it is clearer to me and at least I am able to try to
guess the whole picture. :)







2) In mediated passthru case, multiple VMs are sharing the same physical 
device, so how
the HW resource gets allocated is completely decided by the guest and host 
device driver of
the virtualized DMA device, here is the GPU, same as the MMIO pages used to 
access those Hw resource.


I cannot see what the guest's involvement is here. Look at your code, you
cooked the fault handler like this:


You shouldn't as that depends on how different devices are getting
para-virtualized by their own implementations.



PV method. It is interesting. More comments below.



+   ret = parent->ops->validate_map_request(mdev, virtaddr,
+&pgoff, &req_size,
+&pg_prot);

Please tell me what information is obtained from the guest. All this info can
be found at the time of mmap().


The virtaddr is the guest mmio address that triggers this fault, which will be
used by the mediated device driver to locate the resource that it has
previously allocated.


The virtaddr is not the guest mmio address, it is the virtual address of QEMU.
vfio is not able to figure out the guest mmio address, as the mapping is
handled in userspace as we discussed above.

And we can get the virtaddr from [vma->vm_start, vma->vm_end) when we do mmap().



Then the req_size and pgoff will both come from the mediated device driver,
based on its internal bookkeeping of the hw resource allocation, which is only
available during runtime. And such bookkeeping can be built as part of a
para-virtualization scheme between the guest and host device drivers.



I am talking about the parameters you passed to validate_map_request().
req_size is calculated like this:

+   offset   = virtaddr - vma->vm_start;
+   phyaddr  = (vma->vm_pgoff << PAGE_SHIFT) + offset;
+   pgoff    = phyaddr >> PAGE_SHIFT;

All this info is from the vma, which is available in mmap().

pg_prot is got from:
+   pg_prot  = vma->vm_page_prot;
which is also available in mmap().
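
For comparison, this is the whole-range, mmap()-time model being argued for
here: everything needed (vm_pgoff, the length, vm_page_prot) is already in the
vma, so the mapping can be established up front. A minimal sketch; the function
name is illustrative, and vm_pgoff is taken to encode the physical pfn, as in
the snippet quoted above:

    /* Illustrative: map the whole region 1:1 at mmap() time, using only what
     * the vma already carries. */
    static int mydev_mmap(struct file *file, struct vm_area_struct *vma)
    {
            unsigned long size = vma->vm_end - vma->vm_start;

            return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, size,
                                   vma->vm_page_prot);
    }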


None of such information is available at VFIO mmap() time. For example, several
VMs are sharing the same physical device to provide mediated access. All VMs
will call the VFIO mmap() on their virtual BAR as part of the QEMU vfio/pci
initialization process; at that moment, we definitely can't mmap the entire
physical MMIO into both VMs blindly, for obvious reasons.

Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Neo Jia
On Tue, Jul 05, 2016 at 02:26:46PM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/05/2016 01:16 PM, Neo Jia wrote:
> >On Tue, Jul 05, 2016 at 12:02:42PM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 07/05/2016 09:35 AM, Neo Jia wrote:
> >>>On Tue, Jul 05, 2016 at 09:19:40AM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/04/2016 11:33 PM, Neo Jia wrote:
> 
> >>>
> >>>Sorry, I think I misread the "allocation" as "mapping". We only delay 
> >>>the
> >>>cpu mapping, not the allocation.
> >>
> >>So how to understand your statement:
> >>"at that moment nobody has any knowledge about how the physical mmio 
> >>gets virtualized"
> >>
> >>The resource, physical MMIO region, has been allocated, why we do not 
> >>know the physical
> >>address mapped to the VM?
> >>
> >
> >>From a device driver point of view, the physical mmio region never gets 
> >>allocated until
> >the corresponding resource is requested by clients and granted by the 
> >mediated device driver.
> 
> Hmm... but you told me that you did not delay the allocation. :(
> >>>
> >>>Hi Guangrong,
> >>>
> >>>The allocation here is the allocation of device resource, and the only way 
> >>>to
> >>>access that kind of device resource is via a mmio region of some pages 
> >>>there.
> >>>
> >>>For example, if VM needs resource A, and the only way to access resource A 
> >>>is
> >>>via some kind of device memory at mmio address X.
> >>>
> >>>So, we never defer the allocation request during runtime, we just setup the
> >>>CPU mapping later when it actually gets accessed.
> >>>
> 
> So it returns to my original question: why not allocate the physical mmio 
> region in mmap()?
> 
> >>>
> >>>Without running anything inside the VM, how do you know how the hw 
> >>>resource gets
> >>>allocated, therefore no knowledge of the use of mmio region.
> >>
> >>The allocation and mapping can be two independent processes:
> >>- the first process is just allocation. The MMIO region is allocated from 
> >>physical
> >>   hardware and this region is mapped into _QEMU's_ arbitrary virtual 
> >> address by mmap().
> >>   At this time, VM can not actually use this resource.
> >>
> >>- the second process is mapping. When VM enable this region, e.g, it 
> >>enables the
> >>   PCI BAR, then QEMU maps its virtual address returned by mmap() to VM's 
> >> physical
> >>   memory. After that, VM can access this region.
> >>
> >>The second process is completed handled in userspace, that means, the 
> >>mediated
> >>device driver needn't care how the resource is mapped into VM.
> >
> >In your example, you are still picturing it as VFIO direct assign, but the 
> >solution we are
> >talking here is mediated passthru via VFIO framework to virtualize DMA 
> >devices without SR-IOV.
> >
> 
> Please see my comments below.
> 
> >(Just for completeness, if you really want to use a device in above example 
> >as
> >VFIO passthru, the second step is not completely handled in userspace, it is 
> >actually the guest
> >driver who will allocate and setup the proper hw resource which will later 
> >ready
> >for you to access via some mmio pages.)
> 
> Hmm... i always treat the VM as userspace.

It is OK to treat the VM as userspace, but I think it is better to spell out
the details so we are always on the same page.

> 
> >
> >>
> >>This is how QEMU/VFIO currently works, could you please tell me the special 
> >>points
> >>of your solution comparing with current QEMU/VFIO and why current model can 
> >>not fit
> >>your requirement? So that we can better understand your scenario?
> >
> >The scenario I am describing here is mediated passthru case, but what you are
> >describing here (more or less) is VFIO direct assigned case. It is different 
> >in several
> >areas, but major difference related to this topic here is:
> >
> >1) In VFIO direct assigned case, the device (and its resource) is completely 
> >owned by the VM
> >therefore its mmio region can be mapped directly into the VM during the VFIO 
> >mmap() call as
> >there is no resource sharing among VMs and there is no mediated device 
> >driver on
> >the host to manage such resource, so it is completely owned by the guest.
> 
> I understand this difference, However, as you told to me that the MMIO region 
> allocated for the
> VM is continuous, so i assume the portion of physical MMIO region is 
> completely owned by guest.
> The only difference i can see is mediated device driver need to allocate that 
> region.

It is physically contiguous, but that happens at runtime; physically contiguous
doesn't mean a static partition at boot time. Only at runtime is the proper HW
resource requested, and only then is the right portion of the MMIO region
granted by the mediated device driver on the host.

Also, physically contiguous doesn't mean the guest and host mmio are always
1:1. You can have an 8GB host physical mmio while the guest will only have
256MB.

Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-05 Thread Xiao Guangrong



On 07/05/2016 01:16 PM, Neo Jia wrote:

On Tue, Jul 05, 2016 at 12:02:42PM +0800, Xiao Guangrong wrote:



On 07/05/2016 09:35 AM, Neo Jia wrote:

On Tue, Jul 05, 2016 at 09:19:40AM +0800, Xiao Guangrong wrote:



On 07/04/2016 11:33 PM, Neo Jia wrote:



Sorry, I think I misread the "allocation" as "mapping". We only delay the
cpu mapping, not the allocation.


So how to understand your statement:
"at that moment nobody has any knowledge about how the physical mmio gets 
virtualized"

The resource, physical MMIO region, has been allocated, why we do not know the 
physical
address mapped to the VM?



From a device driver point of view, the physical mmio region never gets
allocated until
the corresponding resource is requested by clients and granted by the mediated 
device driver.


Hmm... but you told me that you did not delay the allocation. :(


Hi Guangrong,

The allocation here is the allocation of device resource, and the only way to
access that kind of device resource is via a mmio region of some pages there.

For example, if VM needs resource A, and the only way to access resource A is
via some kind of device memory at mmio address X.

So, we never defer the allocation request during runtime, we just setup the
CPU mapping later when it actually gets accessed.



So it returns to my original question: why not allocate the physical mmio 
region in mmap()?



Without running anything inside the VM, how do you know how the hw resource gets
allocated, therefore no knowledge of the use of mmio region.


The allocation and mapping can be two independent processes:
- the first process is just allocation. The MMIO region is allocated from
  physical hardware and this region is mapped into _QEMU's_ arbitrary virtual
  address by mmap(). At this time, the VM can not actually use this resource.

- the second process is mapping. When the VM enables this region, e.g. it
  enables the PCI BAR, then QEMU maps the virtual address returned by mmap()
  to the VM's physical memory. After that, the VM can access this region.

The second process is completely handled in userspace, which means the mediated
device driver needn't care how the resource is mapped into the VM.


In your example, you are still picturing it as VFIO direct assign, but the
solution we are talking about here is mediated passthru via the VFIO framework
to virtualize DMA devices without SR-IOV.



Please see my comments below.


(Just for completeness, if you really want to use a device in the above example
as VFIO passthru, the second step is not completely handled in userspace; it is
actually the guest driver who will allocate and set up the proper hw resource,
which will later be ready for you to access via some mmio pages.)


Hmm... I always treat the VM as userspace.





This is how QEMU/VFIO currently works, could you please tell me the special 
points
of your solution comparing with current QEMU/VFIO and why current model can not 
fit
your requirement? So that we can better understand your scenario?


The scenario I am describing here is mediated passthru case, but what you are
describing here (more or less) is VFIO direct assigned case. It is different in 
several
areas, but major difference related to this topic here is:

1) In VFIO direct assigned case, the device (and its resource) is completely 
owned by the VM
therefore its mmio region can be mapped directly into the VM during the VFIO 
mmap() call as
there is no resource sharing among VMs and there is no mediated device driver on
the host to manage such resource, so it is completely owned by the guest.


I understand this difference. However, as you told me, the MMIO region
allocated for the VM is contiguous, so I assume that portion of the physical
MMIO region is completely owned by the guest. The only difference I can see is
that the mediated device driver needs to allocate that region.



2) In mediated passthru case, multiple VMs are sharing the same physical 
device, so how
the HW resource gets allocated is completely decided by the guest and host 
device driver of
the virtualized DMA device, here is the GPU, same as the MMIO pages used to 
access those Hw resource.


I cannot see what the guest's involvement is here. Look at your code, you
cooked the fault handler like this:

+   ret = parent->ops->validate_map_request(mdev, virtaddr,
+&pgoff, &req_size,
+&pg_prot);

Please tell me what information is obtained from the guest. All this info can
be found at the time of mmap().



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Neo Jia
On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:
> The vGPU folks would like to trap the first access to a BAR by setting
> vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
> then can use remap_pfn_range to place some non-reserved pages in the VMA.
> 
> KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and these
> patches should fix this.

Hi Paolo,

I have tested your patches with the mediated passthru patchset that is being
reviewed in KVM and QEMU mailing list.

The fault handler gets called successfully and the previously mapped memory gets
unmapped correctly via unmap_mapping_range.
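
For reference, the invalidation mentioned here boils down to a single call; a
minimal sketch, assuming the host driver kept the address_space that backs the
BAR mapping (e.g. the device file's f_mapping), with an illustrative function
name:

    /* Zap the previously faulted-in PTEs for [start, start + len) in every VMA
     * of this mapping, so the next guest access faults again and can be
     * re-validated by the vendor driver. */
    static void mdev_zap_bar_range(struct address_space *mapping,
                                   loff_t start, loff_t len)
    {
            unmap_mapping_range(mapping, start, len, 1 /* even_cows */);
    }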

Thanks,
Neo

> 
> Thanks,
> 
> Paolo
> 
> Paolo Bonzini (2):
>   KVM: MMU: prepare to support mapping of VM_IO and VM_PFNMAP frames
>   KVM: MMU: try to fix up page faults before giving up
> 
>  mm/gup.c|  1 +
>  virt/kvm/kvm_main.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 51 insertions(+), 5 deletions(-)
> 
> -- 
> 1.8.3.1
> 


Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Neo Jia
On Tue, Jul 05, 2016 at 12:02:42PM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/05/2016 09:35 AM, Neo Jia wrote:
> >On Tue, Jul 05, 2016 at 09:19:40AM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 07/04/2016 11:33 PM, Neo Jia wrote:
> >>
> >
> >Sorry, I think I misread the "allocation" as "mapping". We only delay the
> >cpu mapping, not the allocation.
> 
> So how to understand your statement:
> "at that moment nobody has any knowledge about how the physical mmio gets 
> virtualized"
> 
> The resource, physical MMIO region, has been allocated, why we do not 
> know the physical
> address mapped to the VM?
> 
> >>>
> From a device driver point of view, the physical mmio region never gets 
> allocated until
> >>>the corresponding resource is requested by clients and granted by the 
> >>>mediated device driver.
> >>
> >>Hmm... but you told me that you did not delay the allocation. :(
> >
> >Hi Guangrong,
> >
> >The allocation here is the allocation of device resource, and the only way to
> >access that kind of device resource is via a mmio region of some pages there.
> >
> >For example, if VM needs resource A, and the only way to access resource A is
> >via some kind of device memory at mmio address X.
> >
> >So, we never defer the allocation request during runtime, we just setup the
> >CPU mapping later when it actually gets accessed.
> >
> >>
> >>So it returns to my original question: why not allocate the physical mmio 
> >>region in mmap()?
> >>
> >
> >Without running anything inside the VM, how do you know how the hw resource 
> >gets
> >allocated, therefore no knowledge of the use of mmio region.
> 
> The allocation and mapping can be two independent processes:
> - the first process is just allocation. The MMIO region is allocated from 
> physical
>   hardware and this region is mapped into _QEMU's_ arbitrary virtual address 
> by mmap().
>   At this time, VM can not actually use this resource.
> 
> - the second process is mapping. When VM enable this region, e.g, it enables 
> the
>   PCI BAR, then QEMU maps its virtual address returned by mmap() to VM's 
> physical
>   memory. After that, VM can access this region.
> 
> The second process is completed handled in userspace, that means, the mediated
> device driver needn't care how the resource is mapped into VM.

In your example, you are still picturing it as VFIO direct assign, but the
solution we are talking about here is mediated passthru via the VFIO framework
to virtualize DMA devices without SR-IOV.

(Just for completeness, if you really want to use a device in the above example
as VFIO passthru, the second step is not completely handled in userspace; it is
actually the guest driver who will allocate and set up the proper hw resource,
which will later be ready for you to access via some mmio pages.)

> 
> This is how QEMU/VFIO currently works, could you please tell me the special 
> points
> of your solution comparing with current QEMU/VFIO and why current model can 
> not fit
> your requirement? So that we can better understand your scenario?

The scenario I am describing here is the mediated passthru case, but what you
are describing (more or less) is the VFIO direct assigned case. It differs in
several areas, but the major difference related to this topic is:

1) In the VFIO direct assigned case, the device (and its resource) is
completely owned by the VM, therefore its mmio region can be mapped directly
into the VM during the VFIO mmap() call, as there is no resource sharing among
VMs and there is no mediated device driver on the host to manage such a
resource; it is completely owned by the guest.

2) In the mediated passthru case, multiple VMs are sharing the same physical
device, so how the HW resource gets allocated is completely decided by the
guest and host device drivers of the virtualized DMA device (here, the GPU),
and the same goes for the MMIO pages used to access that HW resource.
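
A rough userspace-side sketch of what happens at VFIO mmap() time, which also
shows how little is known then in case 2): only the region's size and file
offset come from VFIO, and how (and when) the host driver backs those pages is
entirely up to that driver. The device fd and region index are placeholders:

    #include <linux/vfio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    /* Map one mmap-capable VFIO device region (e.g. a BAR). */
    static void *map_vfio_region(int device_fd, unsigned int index)
    {
            struct vfio_region_info info = {
                    .argsz = sizeof(info),
                    .index = index,
            };

            if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info) < 0 ||
                !(info.flags & VFIO_REGION_INFO_FLAG_MMAP))
                    return MAP_FAILED;

            /* At this point userspace knows only size and offset. */
            return mmap(NULL, info.size, PROT_READ | PROT_WRITE, MAP_SHARED,
                        device_fd, info.offset);
    }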

Thanks,
Neo



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Xiao Guangrong



On 07/05/2016 09:35 AM, Neo Jia wrote:

On Tue, Jul 05, 2016 at 09:19:40AM +0800, Xiao Guangrong wrote:



On 07/04/2016 11:33 PM, Neo Jia wrote:



Sorry, I think I misread the "allocation" as "mapping". We only delay the
cpu mapping, not the allocation.


So how to understand your statement:
"at that moment nobody has any knowledge about how the physical mmio gets 
virtualized"

The resource, physical MMIO region, has been allocated, why we do not know the 
physical
address mapped to the VM?



From a device driver point of view, the physical mmio region never gets
allocated until
the corresponding resource is requested by clients and granted by the mediated 
device driver.


Hmm... but you told me that you did not delay the allocation. :(


Hi Guangrong,

The allocation here is the allocation of device resource, and the only way to
access that kind of device resource is via a mmio region of some pages there.

For example, if VM needs resource A, and the only way to access resource A is
via some kind of device memory at mmio address X.

So, we never defer the allocation request during runtime, we just setup the
CPU mapping later when it actually gets accessed.



So it returns to my original question: why not allocate the physical mmio 
region in mmap()?



Without running anything inside the VM, how do you know how the hw resource gets
allocated, therefore no knowledge of the use of mmio region.


The allocation and mapping can be two independent processes:
- the first process is just allocation. The MMIO region is allocated from
  physical hardware and this region is mapped into _QEMU's_ arbitrary virtual
  address by mmap(). At this time, the VM can not actually use this resource.

- the second process is mapping. When the VM enables this region, e.g. it
  enables the PCI BAR, then QEMU maps the virtual address returned by mmap()
  to the VM's physical memory. After that, the VM can access this region.

The second process is completely handled in userspace, which means the mediated
device driver needn't care how the resource is mapped into the VM.
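
Concretely, this second step is the generic KVM memory-slot mechanism: the VMM
hands KVM the host virtual address it got back from mmap() together with the
guest physical address where the BAR is enabled. A minimal sketch with
placeholder names (vm_fd, slot, and the addresses are whatever the VMM tracks):

    #include <linux/kvm.h>
    #include <stdint.h>
    #include <sys/ioctl.h>

    /* Expose an mmap()ed host buffer to the guest at guest_phys_addr. */
    static int map_into_guest(int vm_fd, uint32_t slot, uint64_t guest_phys_addr,
                              void *host_ptr, uint64_t size)
    {
            struct kvm_userspace_memory_region region = {
                    .slot            = slot,
                    .guest_phys_addr = guest_phys_addr,
                    .memory_size     = size,
                    .userspace_addr  = (uintptr_t)host_ptr,
            };

            return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
    }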

This is how QEMU/VFIO currently works, could you please tell me the special 
points
of your solution comparing with current QEMU/VFIO and why current model can not 
fit
your requirement? So that we can better understand your scenario?


Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Neo Jia
On Tue, Jul 05, 2016 at 09:19:40AM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/04/2016 11:33 PM, Neo Jia wrote:
> 
> >>>
> >>>Sorry, I think I misread the "allocation" as "mapping". We only delay the
> >>>cpu mapping, not the allocation.
> >>
> >>So how to understand your statement:
> >>"at that moment nobody has any knowledge about how the physical mmio gets 
> >>virtualized"
> >>
> >>The resource, physical MMIO region, has been allocated, why we do not know 
> >>the physical
> >>address mapped to the VM?
> >>
> >
> >>From a device driver point of view, the physical mmio region never gets 
> >>allocated until
> >the corresponding resource is requested by clients and granted by the 
> >mediated device driver.
> 
> Hmm... but you told me that you did not delay the allocation. :(

Hi Guangrong,

The allocation here is the allocation of device resource, and the only way to
access that kind of device resource is via a mmio region of some pages there.

For example, if VM needs resource A, and the only way to access resource A is
via some kind of device memory at mmio address X.

So, we never defer the allocation request during runtime, we just setup the
CPU mapping later when it actually gets accessed.

> 
> So it returns to my original question: why not allocate the physical mmio 
> region in mmap()?
> 

Without running anything inside the VM, how do you know how the hw resource gets
allocated, therefore no knowledge of the use of mmio region.

Thanks,
Neo



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Xiao Guangrong



On 07/04/2016 11:33 PM, Neo Jia wrote:



Sorry, I think I misread the "allocation" as "mapping". We only delay the
cpu mapping, not the allocation.


So how to understand your statement:
"at that moment nobody has any knowledge about how the physical mmio gets 
virtualized"

The resource, physical MMIO region, has been allocated, why we do not know the 
physical
address mapped to the VM?




From a device driver point of view, the physical mmio region never gets 
allocated until

the corresponding resource is requested by clients and granted by the mediated 
device driver.


Hmm... but you told me that you did not delay the allocation. :(

So it returns to my original question: why not allocate the physical mmio 
region in mmap()?






Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Neo Jia
On Mon, Jul 04, 2016 at 06:16:46PM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/04/2016 05:16 PM, Neo Jia wrote:
> >On Mon, Jul 04, 2016 at 04:45:05PM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 07/04/2016 04:41 PM, Neo Jia wrote:
> >>>On Mon, Jul 04, 2016 at 04:19:20PM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/04/2016 03:53 PM, Neo Jia wrote:
> >On Mon, Jul 04, 2016 at 03:37:35PM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 07/04/2016 03:03 PM, Neo Jia wrote:
> >>>On Mon, Jul 04, 2016 at 02:39:22PM +0800, Xiao Guangrong wrote:
> 
> 
> On 06/30/2016 09:01 PM, Paolo Bonzini wrote:
> >The vGPU folks would like to trap the first access to a BAR by 
> >setting
> >vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault 
> >handler
> >then can use remap_pfn_range to place some non-reserved pages in the 
> >VMA.
> 
> Why does it require fetching the pfn when the fault is triggered 
> rather
> than when mmap() is called?
> >>>
> >>>Hi Guangrong,
> >>>
> >>>as such mapping information between virtual mmio to physical mmio is 
> >>>only available
> >>>at runtime.
> >>
> >>Sorry, I do not know what the difference is between mmap() and the time the
> >>VM actually accesses the memory in your case. Could you please give more
> >>detail?
> >
> >Hi Guangrong,
> >
> >Sure. The mmap() gets called by qemu or any VFIO API userspace consumer 
> >when
> >setting up the virtual mmio, at that moment nobody has any knowledge 
> >about how
> >the physical mmio gets virtualized.
> >
> >When the vm (or application if we don't want to limit ourselves to vmm 
> >term)
> >starts, the virtual and physical mmio gets mapped by mpci kernel module 
> >with the
> >help from vendor supplied mediated host driver according to the hw 
> >resource
> >assigned to this vm / application.
> 
> Thanks for your explanation.
> 
> It sounds like a strategy of resource allocation, you delay the 
> allocation until VM really
> accesses it, right?
> >>>
> >>>Yes, that is where the fault handler inside mpci code comes to the picture.
> >>
> >>
> >>I am not sure this strategy is good. The instance is successfully created, 
> >>and it is started
> >>successful, but the VM is crashed due to the resource of that instance is 
> >>not enough. That sounds
> >>unreasonable.
> >
> >
> >Sorry, I think I misread the "allocation" as "mapping". We only delay the
> >cpu mapping, not the allocation.
> 
> So how to understand your statement:
> "at that moment nobody has any knowledge about how the physical mmio gets 
> virtualized"
> 
> The resource, physical MMIO region, has been allocated, why we do not know 
> the physical
> address mapped to the VM?
> 

From a device driver point of view, the physical mmio region never gets
allocated until
the corresponding resource is requested by clients and granted by the mediated 
device driver. 

The resource here is the internal hw resource.

"at that moment" == vfio client triggers mmap() call.

Thanks,
Neo



Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Xiao Guangrong



On 07/04/2016 05:16 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 04:45:05PM +0800, Xiao Guangrong wrote:



On 07/04/2016 04:41 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 04:19:20PM +0800, Xiao Guangrong wrote:



On 07/04/2016 03:53 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 03:37:35PM +0800, Xiao Guangrong wrote:



On 07/04/2016 03:03 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 02:39:22PM +0800, Xiao Guangrong wrote:



On 06/30/2016 09:01 PM, Paolo Bonzini wrote:

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.


Why does it require fetching the pfn when the fault is triggered rather
than when mmap() is called?


Hi Guangrong,

as such mapping information between virtual mmio to physical mmio is only 
available
at runtime.


Sorry, i do not know what the different between mmap() and the time VM actually
accesses the memory for your case. Could you please more detail?


Hi Guangrong,

Sure. The mmap() gets called by qemu or any VFIO API userspace consumer when
setting up the virtual mmio, at that moment nobody has any knowledge about how
the physical mmio gets virtualized.

When the vm (or application if we don't want to limit ourselves to vmm term)
starts, the virtual and physical mmio gets mapped by mpci kernel module with the
help from vendor supplied mediated host driver according to the hw resource
assigned to this vm / application.


Thanks for your expiation.

It sounds like a strategy of resource allocation, you delay the allocation 
until VM really
accesses it, right?


Yes, that is where the fault handler inside mpci code comes to the picture.



I am not sure this strategy is good. The instance is successfully created, and 
it is started
successful, but the VM is crashed due to the resource of that instance is not 
enough. That sounds
unreasonable.



Sorry, I think I misread the "allocation" as "mapping". We only delay the
cpu mapping, not the allocation.


So how should I understand your statement:
"at that moment nobody has any knowledge about how the physical mmio gets
virtualized"?

The resource, the physical MMIO region, has already been allocated, so why do
we not know the physical address mapped to the VM?








Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Neo Jia
On Mon, Jul 04, 2016 at 04:45:05PM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/04/2016 04:41 PM, Neo Jia wrote:
> >On Mon, Jul 04, 2016 at 04:19:20PM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 07/04/2016 03:53 PM, Neo Jia wrote:
> >>>On Mon, Jul 04, 2016 at 03:37:35PM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/04/2016 03:03 PM, Neo Jia wrote:
> >On Mon, Jul 04, 2016 at 02:39:22PM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 06/30/2016 09:01 PM, Paolo Bonzini wrote:
> >>>The vGPU folks would like to trap the first access to a BAR by setting
> >>>vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault 
> >>>handler
> >>>then can use remap_pfn_range to place some non-reserved pages in the 
> >>>VMA.
> >>
> >>Why does it require fetching the pfn when the fault is triggered rather
> >>than when mmap() is called?
> >
> >Hi Guangrong,
> >
> >as such mapping information between virtual mmio to physical mmio is 
> >only available
> >at runtime.
> 
> Sorry, i do not know what the different between mmap() and the time VM 
> actually
> accesses the memory for your case. Could you please more detail?
> >>>
> >>>Hi Guangrong,
> >>>
> >>>Sure. The mmap() gets called by qemu or any VFIO API userspace consumer 
> >>>when
> >>>setting up the virtual mmio, at that moment nobody has any knowledge about 
> >>>how
> >>>the physical mmio gets virtualized.
> >>>
> >>>When the vm (or application if we don't want to limit ourselves to vmm 
> >>>term)
> >>>starts, the virtual and physical mmio gets mapped by mpci kernel module 
> >>>with the
> >>>help from vendor supplied mediated host driver according to the hw resource
> >>>assigned to this vm / application.
> >>
> >>Thanks for your expiation.
> >>
> >>It sounds like a strategy of resource allocation, you delay the allocation 
> >>until VM really
> >>accesses it, right?
> >
> >Yes, that is where the fault handler inside mpci code comes to the picture.
> 
> 
> I am not sure this strategy is good. The instance is successfully created, 
> and it is started
> successful, but the VM is crashed due to the resource of that instance is not 
> enough. That sounds
> unreasonable.


Sorry, I think I misread the "allocation" as "mapping". We only delay the
cpu mapping, not the allocation.

Thanks,
Neo

> 
> 
> 




Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Xiao Guangrong



On 07/04/2016 04:45 PM, Xiao Guangrong wrote:



On 07/04/2016 04:41 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 04:19:20PM +0800, Xiao Guangrong wrote:



On 07/04/2016 03:53 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 03:37:35PM +0800, Xiao Guangrong wrote:



On 07/04/2016 03:03 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 02:39:22PM +0800, Xiao Guangrong wrote:



On 06/30/2016 09:01 PM, Paolo Bonzini wrote:

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.


Why does it require fetching the pfn when the fault is triggered rather
than when mmap() is called?


Hi Guangrong,

as such mapping information between virtual mmio to physical mmio is only 
available
at runtime.


Sorry, i do not know what the different between mmap() and the time VM actually
accesses the memory for your case. Could you please more detail?


Hi Guangrong,

Sure. The mmap() gets called by qemu or any VFIO API userspace consumer when
setting up the virtual mmio, at that moment nobody has any knowledge about how
the physical mmio gets virtualized.

When the vm (or application if we don't want to limit ourselves to vmm term)
starts, the virtual and physical mmio gets mapped by mpci kernel module with the
help from vendor supplied mediated host driver according to the hw resource
assigned to this vm / application.


Thanks for your expiation.

It sounds like a strategy of resource allocation, you delay the allocation 
until VM really
accesses it, right?


Yes, that is where the fault handler inside mpci code comes to the picture.



I am not sure this strategy is good. The instance is successfully created, and 
it is started
successful, but the VM is crashed due to the resource of that instance is not 
enough. That sounds
unreasonable.



In particular, you cannot squeeze this kind of memory to balance usage across
all VMs. Does this strategy still make sense?








Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Paolo Bonzini


On 04/07/2016 10:21, Xiao Guangrong wrote:
>>
>> /*
>>  * In case the VMA has VM_MIXEDMAP set, whoever called
>> remap_pfn_range
>>  * is also going to call e.g. unmap_mapping_range before the
>> underlying
>>  * non-reserved pages are freed, which will then call our MMU
>> notifier.
>>  * We still have to get a reference here to the page, because the
>> callers
>>  * of *hva_to_pfn* and *gfn_to_pfn* ultimately end up doing a
>>  * kvm_release_pfn_clean on the returned pfn.  If the pfn is
>>  * reserved, the kvm_get_pfn/kvm_release_pfn_clean pair will simply
>>  * do nothing.
>>  */
>>
> 
> Excellent. I like it. :)

So is it Reviewed-by Guangrong? :)

Paolo




Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Xiao Guangrong



On 07/04/2016 04:41 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 04:19:20PM +0800, Xiao Guangrong wrote:



On 07/04/2016 03:53 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 03:37:35PM +0800, Xiao Guangrong wrote:



On 07/04/2016 03:03 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 02:39:22PM +0800, Xiao Guangrong wrote:



On 06/30/2016 09:01 PM, Paolo Bonzini wrote:

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.


Why does it require fetching the pfn when the fault is triggered rather
than when mmap() is called?


Hi Guangrong,

as such mapping information between virtual mmio to physical mmio is only 
available
at runtime.


Sorry, i do not know what the different between mmap() and the time VM actually
accesses the memory for your case. Could you please more detail?


Hi Guangrong,

Sure. The mmap() gets called by qemu or any VFIO API userspace consumer when
setting up the virtual mmio, at that moment nobody has any knowledge about how
the physical mmio gets virtualized.

When the vm (or application if we don't want to limit ourselves to vmm term)
starts, the virtual and physical mmio gets mapped by mpci kernel module with the
help from vendor supplied mediated host driver according to the hw resource
assigned to this vm / application.


Thanks for your expiation.

It sounds like a strategy of resource allocation, you delay the allocation 
until VM really
accesses it, right?


Yes, that is where the fault handler inside mpci code comes to the picture.



I am not sure this strategy is good. The instance is created successfully and
started successfully, but then the VM crashes because the resources backing
that instance are not sufficient. That sounds unreasonable.










Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Neo Jia
On Mon, Jul 04, 2016 at 04:19:20PM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/04/2016 03:53 PM, Neo Jia wrote:
> >On Mon, Jul 04, 2016 at 03:37:35PM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 07/04/2016 03:03 PM, Neo Jia wrote:
> >>>On Mon, Jul 04, 2016 at 02:39:22PM +0800, Xiao Guangrong wrote:
> 
> 
> On 06/30/2016 09:01 PM, Paolo Bonzini wrote:
> >The vGPU folks would like to trap the first access to a BAR by setting
> >vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
> >then can use remap_pfn_range to place some non-reserved pages in the VMA.
> 
> Why does it require fetching the pfn when the fault is triggered rather
> than when mmap() is called?
> >>>
> >>>Hi Guangrong,
> >>>
> >>>as such mapping information between virtual mmio to physical mmio is only 
> >>>available
> >>>at runtime.
> >>
> >>Sorry, i do not know what the different between mmap() and the time VM 
> >>actually
> >>accesses the memory for your case. Could you please more detail?
> >
> >Hi Guangrong,
> >
> >Sure. The mmap() gets called by qemu or any VFIO API userspace consumer when
> >setting up the virtual mmio, at that moment nobody has any knowledge about 
> >how
> >the physical mmio gets virtualized.
> >
> >When the vm (or application if we don't want to limit ourselves to vmm term)
> >starts, the virtual and physical mmio gets mapped by mpci kernel module with 
> >the
> >help from vendor supplied mediated host driver according to the hw resource
> >assigned to this vm / application.
> 
> Thanks for your expiation.
> 
> It sounds like a strategy of resource allocation, you delay the allocation 
> until VM really
> accesses it, right?

Yes, that is where the fault handler inside the mpci code comes into the picture.

Thanks,
Neo








Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Xiao Guangrong



On 07/04/2016 04:14 PM, Paolo Bonzini wrote:



On 04/07/2016 09:59, Xiao Guangrong wrote:



But apart from this, it's much more obvious to consider the refcount.
The x86 MMU code doesn't care if the page is reserved or not;
mmu_set_spte does a kvm_release_pfn_clean, hence it makes sense for
hva_to_pfn_remapped to try doing a get_page (via kvm_get_pfn) after
invoking the fault handler, just like the get_user_pages family of
function does.


Well,  it's little strange as you always try to get refcont
for a PFNMAP region without MIXEDMAP which indicates all the memory
in this region is no 'struct page' backend.


Fair enough, I can modify the comment.

/*
 * In case the VMA has VM_MIXEDMAP set, whoever called remap_pfn_range
 * is also going to call e.g. unmap_mapping_range before the underlying
 * non-reserved pages are freed, which will then call our MMU notifier.
 * We still have to get a reference here to the page, because the callers
 * of *hva_to_pfn* and *gfn_to_pfn* ultimately end up doing a
 * kvm_release_pfn_clean on the returned pfn.  If the pfn is
 * reserved, the kvm_get_pfn/kvm_release_pfn_clean pair will simply
 * do nothing.
 */



Excellent. I like it. :)




Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Xiao Guangrong



On 07/04/2016 03:53 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 03:37:35PM +0800, Xiao Guangrong wrote:



On 07/04/2016 03:03 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 02:39:22PM +0800, Xiao Guangrong wrote:



On 06/30/2016 09:01 PM, Paolo Bonzini wrote:

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.


Why does it require fetching the pfn when the fault is triggered rather
than when mmap() is called?


Hi Guangrong,

as such mapping information between virtual mmio to physical mmio is only 
available
at runtime.


Sorry, i do not know what the different between mmap() and the time VM actually
accesses the memory for your case. Could you please more detail?


Hi Guangrong,

Sure. The mmap() gets called by qemu or any VFIO API userspace consumer when
setting up the virtual mmio, at that moment nobody has any knowledge about how
the physical mmio gets virtualized.

When the vm (or application if we don't want to limit ourselves to vmm term)
starts, the virtual and physical mmio gets mapped by mpci kernel module with the
help from vendor supplied mediated host driver according to the hw resource
assigned to this vm / application.


Thanks for your explanation.

It sounds like a resource-allocation strategy: you delay the allocation until
the VM really accesses it, right?




Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Paolo Bonzini


On 04/07/2016 09:59, Xiao Guangrong wrote:
> 
>> But apart from this, it's much more obvious to consider the refcount.
>> The x86 MMU code doesn't care if the page is reserved or not;
>> mmu_set_spte does a kvm_release_pfn_clean, hence it makes sense for
>> hva_to_pfn_remapped to try doing a get_page (via kvm_get_pfn) after
>> invoking the fault handler, just like the get_user_pages family of
>> function does.
> 
> Well,  it's little strange as you always try to get refcont
> for a PFNMAP region without MIXEDMAP which indicates all the memory
> in this region is no 'struct page' backend.

Fair enough, I can modify the comment.

/*
 * In case the VMA has VM_MIXEDMAP set, whoever called remap_pfn_range
 * is also going to call e.g. unmap_mapping_range before the underlying
 * non-reserved pages are freed, which will then call our MMU notifier.
 * We still have to get a reference here to the page, because the callers
 * of *hva_to_pfn* and *gfn_to_pfn* ultimately end up doing a
 * kvm_release_pfn_clean on the returned pfn.  If the pfn is
 * reserved, the kvm_get_pfn/kvm_release_pfn_clean pair will simply
 * do nothing.
 */

Paolo

> But it works as kvm_{get, release}_* have already been aware of
> reserved_pfn, so i am okay with it..
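
To make the refcount pairing concrete, here is a simplified sketch, under the
assumption that it mirrors the approach described above rather than the
literal patch: when get_user_pages() cannot handle a VM_IO/VM_PFNMAP vma, KVM
prefaults through the vma's own fault handler with fixup_user_fault(), reads
the pfn back with follow_pfn(), and takes a reference with kvm_get_pfn() so
that the kvm_release_pfn_clean() later issued by mmu_set_spte() is balanced.
The exact fixup_user_fault() signature varies slightly across kernel versions.

/*
 * Simplified sketch (not the literal virt/kvm/kvm_main.c change) of pfn
 * resolution for a VM_IO | VM_PFNMAP vma.  kvm_get_pfn() and
 * kvm_release_pfn_clean() are no-ops for reserved pfns, so taking the
 * reference here is harmless for pure MMIO and required for the
 * struct-page-backed pages that remap_pfn_range may have installed.
 */
static int hva_to_pfn_remapped_sketch(struct vm_area_struct *vma,
				      unsigned long addr, bool write_fault,
				      kvm_pfn_t *p_pfn)
{
	unsigned long pfn;
	bool unlocked = false;
	int r;

	r = follow_pfn(vma, addr, &pfn);
	if (r) {
		/*
		 * Not mapped yet: get_user_pages does not call the fault
		 * handler for VM_IO/VM_PFNMAP vmas, so trigger it here and
		 * then retry the lookup.
		 */
		r = fixup_user_fault(current, current->mm, addr,
				     write_fault ? FAULT_FLAG_WRITE : 0,
				     &unlocked);
		if (r)
			return r;
		r = follow_pfn(vma, addr, &pfn);
		if (r)
			return r;
	}

	/* Balanced by the kvm_release_pfn_clean() done by the MMU code. */
	kvm_get_pfn(pfn);
	*p_pfn = pfn;
	return 0;
}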






Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Xiao Guangrong



On 07/04/2016 03:48 PM, Paolo Bonzini wrote:



On 04/07/2016 09:37, Xiao Guangrong wrote:




It actually is a portion of the physical mmio which is set by vfio mmap.


So i do not think we need to care its refcount, i,e, we can consider it
as reserved_pfn,
Paolo?


nVidia provided me (offlist) with a simple patch that modified VFIO to
exhibit the problem, and it didn't use reserved PFNs.  This is why the
commit message for the patch is not entirely accurate.



It's clear now.


But apart from this, it's much more obvious to consider the refcount.
The x86 MMU code doesn't care if the page is reserved or not;
mmu_set_spte does a kvm_release_pfn_clean, hence it makes sense for
hva_to_pfn_remapped to try doing a get_page (via kvm_get_pfn) after
invoking the fault handler, just like the get_user_pages family of
function does.


Well, it's a little strange that you always try to get a refcount for a
PFNMAP region without MIXEDMAP, which indicates that none of the memory in
this region has a 'struct page' backing.

But it works, as kvm_{get, release}_* are already aware of reserved pfns, so
I am okay with it.




Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Neo Jia
On Mon, Jul 04, 2016 at 03:37:35PM +0800, Xiao Guangrong wrote:
> 
> 
> On 07/04/2016 03:03 PM, Neo Jia wrote:
> >On Mon, Jul 04, 2016 at 02:39:22PM +0800, Xiao Guangrong wrote:
> >>
> >>
> >>On 06/30/2016 09:01 PM, Paolo Bonzini wrote:
> >>>The vGPU folks would like to trap the first access to a BAR by setting
> >>>vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
> >>>then can use remap_pfn_range to place some non-reserved pages in the VMA.
> >>
> >>Why does it require fetching the pfn when the fault is triggered rather
> >>than when mmap() is called?
> >
> >Hi Guangrong,
> >
> >as such mapping information between virtual mmio to physical mmio is only 
> >available
> >at runtime.
> 
> Sorry, i do not know what the different between mmap() and the time VM 
> actually
> accesses the memory for your case. Could you please more detail?

Hi Guangrong,

Sure. The mmap() gets called by qemu or any VFIO API userspace consumer when
setting up the virtual mmio; at that moment nobody has any knowledge about how
the physical mmio gets virtualized.

When the vm (or application, if we don't want to limit ourselves to the vmm
term) starts, the virtual and physical mmio get mapped by the mpci kernel
module, with help from the vendor-supplied mediated host driver, according to
the hw resources assigned to this vm / application.

> 
> >
> >>
> >>Why the memory mapped by this mmap() is not a portion of MMIO from
> >>underlayer physical device? If it is a valid system memory, is this 
> >>interface
> >>really needed to implemented in vfio? (you at least need to set VM_MIXEDMAP
> >>if it mixed system memory with MMIO)
> >>
> >
> >It actually is a portion of the physical mmio which is set by vfio mmap.
> 
> So i do not think we need to care its refcount, i,e, we can consider it as 
> reserved_pfn,
> Paolo?
> 
> >
> >>IIUC, the kernel assumes that VM_PFNMAP is a continuous memory, e.g, like
> >>current KVM and vaddr_get_pfn() in vfio, but it seems nvdia's patchset
> >>breaks this semantic as ops->validate_map_request() can adjust the physical
> >>address arbitrarily. (again, the name 'validate' should be changed to match
> >>the thing as it is really doing)
> >
> >The vgpu api will allow you to adjust the target mmio address and the size 
> >via
> >validate_map_request, but it is still physical contiguous as  >size>.
> 
> Okay, the interface confused us, maybe this interface need to be cooked to 
> reflect
> to this fact.

Sure. We can address this in the RFC mediated device thread.

Thanks,
Neo

> 
> Thanks!
> 




Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Paolo Bonzini


On 04/07/2016 09:37, Xiao Guangrong wrote:
>>>
>>
>> It actually is a portion of the physical mmio which is set by vfio mmap.
> 
> So i do not think we need to care its refcount, i,e, we can consider it
> as reserved_pfn,
> Paolo?

nVidia provided me (offlist) with a simple patch that modified VFIO to
exhibit the problem, and it didn't use reserved PFNs.  This is why the
commit message for the patch is not entirely accurate.

But apart from this, it's much more obvious to consider the refcount.
The x86 MMU code doesn't care if the page is reserved or not;
mmu_set_spte does a kvm_release_pfn_clean, hence it makes sense for
hva_to_pfn_remapped to try doing a get_page (via kvm_get_pfn) after
invoking the fault handler, just like the get_user_pages family of
function does.

Paolo




Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Xiao Guangrong



On 07/04/2016 03:38 PM, Paolo Bonzini wrote:



On 04/07/2016 08:39, Xiao Guangrong wrote:

Why the memory mapped by this mmap() is not a portion of MMIO from
underlayer physical device? If it is a valid system memory, is this
interface
really needed to implemented in vfio? (you at least need to set VM_MIXEDMAP
if it mixed system memory with MMIO)


The KVM code does not care if VM_MIXEDMAP is set or not, it works in
either case.


Yes, it is. I mean that nvidia's vfio patchset should use VM_MIXEDMAP if the
memory is mixed. :)




Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Xiao Guangrong



On 07/04/2016 03:03 PM, Neo Jia wrote:

On Mon, Jul 04, 2016 at 02:39:22PM +0800, Xiao Guangrong wrote:



On 06/30/2016 09:01 PM, Paolo Bonzini wrote:

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.


Why does it require fetching the pfn when the fault is triggered rather
than when mmap() is called?


Hi Guangrong,

as such mapping information between virtual mmio to physical mmio is only 
available
at runtime.


Sorry, I do not see the difference between mmap() time and the time the VM
actually accesses the memory in your case. Could you please give more detail?





Why the memory mapped by this mmap() is not a portion of MMIO from
underlayer physical device? If it is a valid system memory, is this interface
really needed to implemented in vfio? (you at least need to set VM_MIXEDMAP
if it mixed system memory with MMIO)



It actually is a portion of the physical mmio which is set by vfio mmap.


So I do not think we need to care about its refcount, i.e., we can consider it
a reserved pfn, Paolo?




IIUC, the kernel assumes that VM_PFNMAP is a continuous memory, e.g, like
current KVM and vaddr_get_pfn() in vfio, but it seems nvdia's patchset
breaks this semantic as ops->validate_map_request() can adjust the physical
address arbitrarily. (again, the name 'validate' should be changed to match
the thing as it is really doing)


The vgpu api will allow you to adjust the target mmio address and the size via
validate_map_request, but it is still physical contiguous as .


Okay, the interface confused us; maybe it needs to be reworked to reflect this
fact.

Thanks!






Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Paolo Bonzini


On 04/07/2016 08:39, Xiao Guangrong wrote:
> Why the memory mapped by this mmap() is not a portion of MMIO from
> underlayer physical device? If it is a valid system memory, is this
> interface
> really needed to implemented in vfio? (you at least need to set VM_MIXEDMAP
> if it mixed system memory with MMIO)

The KVM code does not care whether VM_MIXEDMAP is set or not; it works in
either case.

Paolo

> IIUC, the kernel assumes that VM_PFNMAP is a continuous memory, e.g, like
> current KVM and vaddr_get_pfn() in vfio, but it seems nvdia's patchset
> breaks this semantic as ops->validate_map_request() can adjust the physical
> address arbitrarily. (again, the name 'validate' should be changed to match
> the thing as it is really doing)
> 
> 
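
As an illustration of why KVM is indifferent to VM_MIXEDMAP here: once
get_user_pages() has failed, the slow path only looks at VM_IO and VM_PFNMAP
when classifying the vma.  A rough sketch, not the literal kvm_main.c code,
reusing the hypothetical hva_to_pfn_remapped_sketch() helper shown earlier in
this document:

static kvm_pfn_t hva_to_pfn_vma_sketch(unsigned long addr, bool write_fault)
{
	struct vm_area_struct *vma;
	kvm_pfn_t pfn = KVM_PFN_ERR_FAULT;

	down_read(&current->mm->mmap_sem);
	vma = find_vma_intersection(current->mm, addr, addr + 1);

	if (vma && (vma->vm_flags & (VM_IO | VM_PFNMAP))) {
		/*
		 * Remapped region: resolve the pfn through the vma itself.
		 * Whether the driver also set VM_MIXEDMAP is never checked.
		 */
		if (hva_to_pfn_remapped_sketch(vma, addr, write_fault, &pfn))
			pfn = KVM_PFN_ERR_FAULT;
	}

	up_read(&current->mm->mmap_sem);
	return pfn;
}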




Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Neo Jia
On Mon, Jul 04, 2016 at 02:39:22PM +0800, Xiao Guangrong wrote:
> 
> 
> On 06/30/2016 09:01 PM, Paolo Bonzini wrote:
> >The vGPU folks would like to trap the first access to a BAR by setting
> >vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
> >then can use remap_pfn_range to place some non-reserved pages in the VMA.
> 
> Why does it require fetching the pfn when the fault is triggered rather
> than when mmap() is called?

Hi Guangrong,

because the mapping information between the virtual mmio and the physical mmio
is only available at runtime.

> 
> Why the memory mapped by this mmap() is not a portion of MMIO from
> underlayer physical device? If it is a valid system memory, is this interface
> really needed to implemented in vfio? (you at least need to set VM_MIXEDMAP
> if it mixed system memory with MMIO)
> 

It actually is a portion of the physical mmio which is set by vfio mmap.

> IIUC, the kernel assumes that VM_PFNMAP is a continuous memory, e.g, like
> current KVM and vaddr_get_pfn() in vfio, but it seems nvdia's patchset
> breaks this semantic as ops->validate_map_request() can adjust the physical
> address arbitrarily. (again, the name 'validate' should be changed to match
> the thing as it is really doing)

The vgpu api will allow you to adjust the target mmio address and the size via 
validate_map_request, but it is still physical contiguous as .

Thanks,
Neo

> 
> 




Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-07-04 Thread Xiao Guangrong



On 06/30/2016 09:01 PM, Paolo Bonzini wrote:

The vGPU folks would like to trap the first access to a BAR by setting
vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
then can use remap_pfn_range to place some non-reserved pages in the VMA.


Why does it require fetching the pfn when the fault is triggered rather
than when mmap() is called?

Why is the memory mapped by this mmap() not a portion of the MMIO from the
underlying physical device? If it is valid system memory, does this interface
really need to be implemented in vfio? (You at least need to set VM_MIXEDMAP
if it mixes system memory with MMIO.)

IIUC, the kernel assumes that a VM_PFNMAP region is contiguous memory, e.g.,
as current KVM and vaddr_get_pfn() in vfio do, but it seems nvidia's patchset
breaks this semantic, as ops->validate_map_request() can adjust the physical
address arbitrarily. (Again, the name 'validate' should be changed to match
what it is really doing.)
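
For contrast with the fault-time approach sketched near the top of this
document, an eager alternative would install the whole mapping when mmap() is
called; this is only possible if the backing host pfns are already known (and
contiguous) at that point, which is exactly the assumption being questioned in
this message.  The mdev_region/get_host_pfn names are the same hypothetical
placeholders as before:

/* Eager variant: install the whole mapping at mmap() time. */
static int mdev_region_mmap_eager(struct mdev_region *r,
				  struct vm_area_struct *vma)
{
	unsigned long base_pfn = r->get_host_pfn(r, 0);

	vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
	/* Requires the host MMIO backing to be known here, up front. */
	return remap_pfn_range(vma, vma->vm_start, base_pfn,
			       vma->vm_end - vma->vm_start, vma->vm_page_prot);
}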








Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed

2016-06-30 Thread Neo Jia
On Thu, Jun 30, 2016 at 03:01:49PM +0200, Paolo Bonzini wrote:
> The vGPU folks would like to trap the first access to a BAR by setting
> vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault handler
> then can use remap_pfn_range to place some non-reserved pages in the VMA.

Hi Paolo,

Thanks for the quick patches. I am in the middle of verifying them and will
report back asap.

Thanks,
Neo

> 
> KVM lacks support for this kind of non-linear VM_PFNMAP mapping, and these
> patches should fix this.
> 
> Thanks,
> 
> Paolo
> 
> Paolo Bonzini (2):
>   KVM: MMU: prepare to support mapping of VM_IO and VM_PFNMAP frames
>   KVM: MMU: try to fix up page faults before giving up
> 
>  mm/gup.c|  1 +
>  virt/kvm/kvm_main.c | 55 -
>  2 files changed, 51 insertions(+), 5 deletions(-)
> 
> -- 
> 1.8.3.1
> 

