[PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-07 Thread Xiao Guangrong
Changelog:
- introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
- introduce KVM_HVA_ERR_BAD and optimize error hva indicators

The test case can be found at:
http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2

In current code, if we map a readonly memory space from host to guest
and the page is not currently mapped in the host, we will get a fault-pfn
and async is not allowed, then the vm will crash.

As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
to the guest, read access is happy for readonly memslot, write access on
readonly memslot will cause KVM_EXIT_MMIO exit.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-10 Thread Marcelo Tosatti
On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
> Changelog:
> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
> 
> The test case can be found at:
> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
> 
> In current code, if we map a readonly memory space from host to guest
> and the page is not currently mapped in the host, we will get a fault-pfn
> and async is not allowed, then the vm will crash.
> 
> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
> to the guest, read access is happy for readonly memslot, write access on
> readonly memslot will cause KVM_EXIT_MMIO exit.

Memory slots whose QEMU mapping is write protected is supported
today, as long as there are no write faults.

What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
again?

The initial objective was to fix a vm crash, can you explain that
initial problem?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-10 Thread Xiao Guangrong
On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
> On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
>> Changelog:
>> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
>> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
>>
>> The test case can be found at:
>> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
>>
>> In current code, if we map a readonly memory space from host to guest
>> and the page is not currently mapped in the host, we will get a fault-pfn
>> and async is not allowed, then the vm will crash.
>>
>> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
>> to the guest, read access is happy for readonly memslot, write access on
>> readonly memslot will cause KVM_EXIT_MMIO exit.
> 
> Memory slots whose QEMU mapping is write protected is supported
> today, as long as there are no write faults.
> 
> What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> again?
> 

It is happy to map !write host memory space to the readonly memslot,
and they can coexist as well.

readonly memslot checks the write-permission by seeing slot->flags and
!write memory checks the write-permission in hva_to_pfn() function
which checks vma->flags. It is no conflict.

> The initial objective was to fix a vm crash, can you explain that
> initial problem?
>

The issue was trigged by this code:

} else {
if (async && (vma->vm_flags & VM_WRITE))
*async = true;
pfn = KVM_PFN_ERR_FAULT;
}

If the host memory region is readonly (!vma->vm_flags & VM_WRITE) and
its physical page is swapped out (or the file data does not be read in),
get_user_page_nowait will fail, above code reject to set async,
then we will get a fault pfn and async=false.

I guess this issue also exists in "QEMU write protected mapping" as
you mentioned above.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-13 Thread Marcelo Tosatti
On Sat, Aug 11, 2012 at 11:36:20AM +0800, Xiao Guangrong wrote:
> On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
> > On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
> >> Changelog:
> >> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
> >> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
> >>
> >> The test case can be found at:
> >> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
> >>
> >> In current code, if we map a readonly memory space from host to guest
> >> and the page is not currently mapped in the host, we will get a fault-pfn
> >> and async is not allowed, then the vm will crash.
> >>
> >> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
> >> to the guest, read access is happy for readonly memslot, write access on
> >> readonly memslot will cause KVM_EXIT_MMIO exit.
> > 
> > Memory slots whose QEMU mapping is write protected is supported
> > today, as long as there are no write faults.
> > 
> > What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> > again?
> > 
> 
> It is happy to map !write host memory space to the readonly memslot,
> and they can coexist as well.
> 
> readonly memslot checks the write-permission by seeing slot->flags and
> !write memory checks the write-permission in hva_to_pfn() function
> which checks vma->flags. It is no conflict.

Yes, there is no conflict. The point is, if you can use the
mmap(PROT_READ) interface (supporting read faults on read-only slots)
for this behavior, what is the advantage of a new memslot flag?

I'm not saying mmap(PROT_READ) is the best interface, i am just asking
why it is not.

> > The initial objective was to fix a vm crash, can you explain that
> > initial problem?
> >
> 
> The issue was trigged by this code:
> 
> } else {
> if (async && (vma->vm_flags & VM_WRITE))
> *async = true;
> pfn = KVM_PFN_ERR_FAULT;
> }
> 
> If the host memory region is readonly (!vma->vm_flags & VM_WRITE) and
> its physical page is swapped out (or the file data does not be read in),
> get_user_page_nowait will fail, above code reject to set async,
> then we will get a fault pfn and async=false.
> 
> I guess this issue also exists in "QEMU write protected mapping" as
> you mentioned above.

Yes, it does. As far as i understand, what that check does from a high
level pov is:

- Did get_user_pages_nowait() fail due to a swapped out page (in which 
case we should try to swappin the page asynchronously), or due to 
another reason (for which case an error should be returned).

Using vma->vm_flags VM_WRITE for that is trying to guess why
get_user_pages_nowait() failed, because it (gup_nowait return values) 
does not provide sufficient information by itself.

Can't that be fixed separately? 

Another issue which is also present with the mmap(PROT_READ) scheme is
interaction with reexecute_instruction. That is, unless i am mistaken,
reexecute_instruction can succeed (return true) on a region that is
write protected. This breaks the "write faults on read-only slots exit
to userspace via EXIT_MMIO" behaviour.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-13 Thread Xiao Guangrong
On 08/14/2012 01:39 AM, Marcelo Tosatti wrote:
> On Sat, Aug 11, 2012 at 11:36:20AM +0800, Xiao Guangrong wrote:
>> On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
>>> On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
 Changelog:
 - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
 - introduce KVM_HVA_ERR_BAD and optimize error hva indicators

 The test case can be found at:
 http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2

 In current code, if we map a readonly memory space from host to guest
 and the page is not currently mapped in the host, we will get a fault-pfn
 and async is not allowed, then the vm will crash.

 As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
 to the guest, read access is happy for readonly memslot, write access on
 readonly memslot will cause KVM_EXIT_MMIO exit.
>>>
>>> Memory slots whose QEMU mapping is write protected is supported
>>> today, as long as there are no write faults.
>>>
>>> What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
>>> again?
>>>
>>
>> It is happy to map !write host memory space to the readonly memslot,
>> and they can coexist as well.
>>
>> readonly memslot checks the write-permission by seeing slot->flags and
>> !write memory checks the write-permission in hva_to_pfn() function
>> which checks vma->flags. It is no conflict.
> 
> Yes, there is no conflict. The point is, if you can use the
> mmap(PROT_READ) interface (supporting read faults on read-only slots)
> for this behavior, what is the advantage of a new memslot flag?
> 

You can get the discussion at:
https://lkml.org/lkml/2012/5/22/228

> I'm not saying mmap(PROT_READ) is the best interface, i am just asking
> why it is not.

My fault. :(

> 
>>> The initial objective was to fix a vm crash, can you explain that
>>> initial problem?
>>>
>>
>> The issue was trigged by this code:
>>
>> } else {
>> if (async && (vma->vm_flags & VM_WRITE))
>> *async = true;
>> pfn = KVM_PFN_ERR_FAULT;
>> }
>>
>> If the host memory region is readonly (!vma->vm_flags & VM_WRITE) and
>> its physical page is swapped out (or the file data does not be read in),
>> get_user_page_nowait will fail, above code reject to set async,
>> then we will get a fault pfn and async=false.
>>
>> I guess this issue also exists in "QEMU write protected mapping" as
>> you mentioned above.
> 
> Yes, it does. As far as i understand, what that check does from a high
> level pov is:
> 
> - Did get_user_pages_nowait() fail due to a swapped out page (in which 
> case we should try to swappin the page asynchronously), or due to 
> another reason (for which case an error should be returned).
> 
> Using vma->vm_flags VM_WRITE for that is trying to guess why
> get_user_pages_nowait() failed, because it (gup_nowait return values) 
> does not provide sufficient information by itself.
> 

That is exactly what i did in the first version. :)

You can see it and the reason why it switched to the new way (readonly memslot)
in the above website (the first message in thread).

> Can't that be fixed separately? 
> 
> Another issue which is also present with the mmap(PROT_READ) scheme is
> interaction with reexecute_instruction. That is, unless i am mistaken,
> reexecute_instruction can succeed (return true) on a region that is
> write protected. This breaks the "write faults on read-only slots exit
> to userspace via EXIT_MMIO" behaviour.

Sorry, Why? After re-entry to the guest, it can not generate a correct MMIO?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-14 Thread Avi Kivity
On 08/10/2012 09:14 PM, Marcelo Tosatti wrote:
> On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
>> Changelog:
>> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
>> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
>> 
>> The test case can be found at:
>> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
>> 
>> In current code, if we map a readonly memory space from host to guest
>> and the page is not currently mapped in the host, we will get a fault-pfn
>> and async is not allowed, then the vm will crash.
>> 
>> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
>> to the guest, read access is happy for readonly memslot, write access on
>> readonly memslot will cause KVM_EXIT_MMIO exit.
> 
> Memory slots whose QEMU mapping is write protected is supported
> today, as long as there are no write faults.
> 
> What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> again?

Userspace may want to modify the ROM (for example, when programming a
flash device).  It is also possible to map an hva range rw through one
slot and ro through another.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-14 Thread Marcelo Tosatti
On Tue, Aug 14, 2012 at 10:58:07AM +0800, Xiao Guangrong wrote:
> On 08/14/2012 01:39 AM, Marcelo Tosatti wrote:
> > On Sat, Aug 11, 2012 at 11:36:20AM +0800, Xiao Guangrong wrote:
> >> On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
> >>> On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
>  Changelog:
>  - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
>  - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
> 
>  The test case can be found at:
>  http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
> 
>  In current code, if we map a readonly memory space from host to guest
>  and the page is not currently mapped in the host, we will get a fault-pfn
>  and async is not allowed, then the vm will crash.
> 
>  As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
>  to the guest, read access is happy for readonly memslot, write access on
>  readonly memslot will cause KVM_EXIT_MMIO exit.
> >>>
> >>> Memory slots whose QEMU mapping is write protected is supported
> >>> today, as long as there are no write faults.
> >>>
> >>> What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> >>> again?
> >>>
> >>
> >> It is happy to map !write host memory space to the readonly memslot,
> >> and they can coexist as well.
> >>
> >> readonly memslot checks the write-permission by seeing slot->flags and
> >> !write memory checks the write-permission in hva_to_pfn() function
> >> which checks vma->flags. It is no conflict.
> > 
> > Yes, there is no conflict. The point is, if you can use the
> > mmap(PROT_READ) interface (supporting read faults on read-only slots)
> > for this behavior, what is the advantage of a new memslot flag?
> > 
> 
> You can get the discussion at:
> https://lkml.org/lkml/2012/5/22/228
> 
> > I'm not saying mmap(PROT_READ) is the best interface, i am just asking
> > why it is not.
> 
> My fault. :(
> 
> > 
> >>> The initial objective was to fix a vm crash, can you explain that
> >>> initial problem?
> >>>
> >>
> >> The issue was trigged by this code:
> >>
> >> } else {
> >> if (async && (vma->vm_flags & VM_WRITE))
> >> *async = true;
> >> pfn = KVM_PFN_ERR_FAULT;
> >> }
> >>
> >> If the host memory region is readonly (!vma->vm_flags & VM_WRITE) and
> >> its physical page is swapped out (or the file data does not be read in),
> >> get_user_page_nowait will fail, above code reject to set async,
> >> then we will get a fault pfn and async=false.
> >>
> >> I guess this issue also exists in "QEMU write protected mapping" as
> >> you mentioned above.
> > 
> > Yes, it does. As far as i understand, what that check does from a high
> > level pov is:
> > 
> > - Did get_user_pages_nowait() fail due to a swapped out page (in which 
> > case we should try to swappin the page asynchronously), or due to 
> > another reason (for which case an error should be returned).
> > 
> > Using vma->vm_flags VM_WRITE for that is trying to guess why
> > get_user_pages_nowait() failed, because it (gup_nowait return values) 
> > does not provide sufficient information by itself.
> > 
> 
> That is exactly what i did in the first version. :)
> 
> You can see it and the reason why it switched to the new way (readonly 
> memslot)
> in the above website (the first message in thread).

Userspace can create multiple mappings for the same memory region, for
example via shared memory (shm_open), and have different protections for
the two (or more) regions. I had old patch doing this, its attached.

> > Can't that be fixed separately? 
> > 
> > Another issue which is also present with the mmap(PROT_READ) scheme is
> > interaction with reexecute_instruction. That is, unless i am mistaken,
> > reexecute_instruction can succeed (return true) on a region that is
> > write protected. This breaks the "write faults on read-only slots exit
> > to userspace via EXIT_MMIO" behaviour.
> 
> Sorry, Why? After re-entry to the guest, it can not generate a correct MMIO?

reexecute_instruction validates presence of GPA by looking at registered
memslots. But if the access is a write, and userspace memory map is
read-only, reexecute_instruction should exit via MMIO.

That is, reexecute_instruction must validate GPA using registered
memslots AND additionaly userspace map permission, not only registered
memslot.
 
Index: qemu-kvm-gpage-cache/cpu-common.h
===
--- qemu-kvm-gpage-cache.orig/cpu-common.h
+++ qemu-kvm-gpage-cache/cpu-common.h
@@ -33,6 +33,7 @@ ram_addr_t qemu_ram_alloc(ram_addr_t);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
+void *qemu_get_ram_ptr_guest(ram_addr_t addr);
 /* This should not be used by devices.  */
 int do_qemu_ram_addr_from

Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-14 Thread Marcelo Tosatti
On Tue, Aug 14, 2012 at 05:00:33PM +0300, Avi Kivity wrote:
> On 08/10/2012 09:14 PM, Marcelo Tosatti wrote:
> > On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
> >> Changelog:
> >> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
> >> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
> >> 
> >> The test case can be found at:
> >> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
> >> 
> >> In current code, if we map a readonly memory space from host to guest
> >> and the page is not currently mapped in the host, we will get a fault-pfn
> >> and async is not allowed, then the vm will crash.
> >> 
> >> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
> >> to the guest, read access is happy for readonly memslot, write access on
> >> readonly memslot will cause KVM_EXIT_MMIO exit.
> > 
> > Memory slots whose QEMU mapping is write protected is supported
> > today, as long as there are no write faults.
> > 
> > What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> > again?
> 
> Userspace may want to modify the ROM (for example, when programming a
> flash device).  It is also possible to map an hva range rw through one
> slot and ro through another.

Right, can do that with multiple userspace maps to the same anonymous 
memory region (see other email).

The bugs noticed should be fixed.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-15 Thread Avi Kivity
On 08/14/2012 06:51 PM, Marcelo Tosatti wrote:
>> 
>> Userspace may want to modify the ROM (for example, when programming a
>> flash device).  It is also possible to map an hva range rw through one
>> slot and ro through another.
> 
> Right, can do that with multiple userspace maps to the same anonymous 
> memory region (see other email).

Yes it's possible.  It requires that we move all memory allocation to be
fd based, since userspace can't predict what memory will be dual-mapped
(at least if emulated hardware allows this).  Is this a reasonable
requirement?  Do ksm/thp/autonuma work with this?



-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-15 Thread Marcelo Tosatti
On Wed, Aug 15, 2012 at 01:44:14PM +0300, Avi Kivity wrote:
> On 08/14/2012 06:51 PM, Marcelo Tosatti wrote:
> >> 
> >> Userspace may want to modify the ROM (for example, when programming a
> >> flash device).  It is also possible to map an hva range rw through one
> >> slot and ro through another.
> > 
> > Right, can do that with multiple userspace maps to the same anonymous 
> > memory region (see other email).
> 
> Yes it's possible.  It requires that we move all memory allocation to be
> fd based, since userspace can't predict what memory will be dual-mapped
> (at least if emulated hardware allows this).

It can:
- Create named memory object, with associated fd.
- Copy data from large anonymous memory region to named memory.
- Unmap region that must be dual-mapped from large anonymous memory chunk.
- Map named memory object at address.

The last step can be replaced by adjusting KVM memory slots.

The disadvantage of protection information in memory slots
is that it duplicates functionality that is handled by 
userspace mappings.

Moreover, multiple memory maps are necessary for any
split-qemu-into-smaller-pieces solutions.

>  Is this a reasonable
> requirement?  Do ksm/thp/autonuma work with this?

As mentioned, only memory used for ROM purposes must be dual mapped. 

I don't think there is any way to create multiple mappings 
to one anonymous memory object ATM, but POSIX defines it
(posix_typed_mem_open).

The limitation of thp/ksm on shared memory also affects any other user
of shared memory, so it should be fixed there.

Also, QEMU ROM is allocated separately from RAM, correct?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-15 Thread Xiao Guangrong
On 08/14/2012 11:25 PM, Marcelo Tosatti wrote:
> On Tue, Aug 14, 2012 at 10:58:07AM +0800, Xiao Guangrong wrote:
>> On 08/14/2012 01:39 AM, Marcelo Tosatti wrote:
>>> On Sat, Aug 11, 2012 at 11:36:20AM +0800, Xiao Guangrong wrote:
 On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
> On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
>> Changelog:
>> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
>> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
>>
>> The test case can be found at:
>> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
>>
>> In current code, if we map a readonly memory space from host to guest
>> and the page is not currently mapped in the host, we will get a fault-pfn
>> and async is not allowed, then the vm will crash.
>>
>> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
>> to the guest, read access is happy for readonly memslot, write access on
>> readonly memslot will cause KVM_EXIT_MMIO exit.
>
> Memory slots whose QEMU mapping is write protected is supported
> today, as long as there are no write faults.
>
> What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> again?
>

 It is happy to map !write host memory space to the readonly memslot,
 and they can coexist as well.

 readonly memslot checks the write-permission by seeing slot->flags and
 !write memory checks the write-permission in hva_to_pfn() function
 which checks vma->flags. It is no conflict.
>>>
>>> Yes, there is no conflict. The point is, if you can use the
>>> mmap(PROT_READ) interface (supporting read faults on read-only slots)
>>> for this behavior, what is the advantage of a new memslot flag?
>>>
>>
>> You can get the discussion at:
>> https://lkml.org/lkml/2012/5/22/228
>>
>>> I'm not saying mmap(PROT_READ) is the best interface, i am just asking
>>> why it is not.
>>
>> My fault. :(
>>
>>>
> The initial objective was to fix a vm crash, can you explain that
> initial problem?
>

 The issue was trigged by this code:

 } else {
 if (async && (vma->vm_flags & VM_WRITE))
 *async = true;
 pfn = KVM_PFN_ERR_FAULT;
 }

 If the host memory region is readonly (!vma->vm_flags & VM_WRITE) and
 its physical page is swapped out (or the file data does not be read in),
 get_user_page_nowait will fail, above code reject to set async,
 then we will get a fault pfn and async=false.

 I guess this issue also exists in "QEMU write protected mapping" as
 you mentioned above.
>>>
>>> Yes, it does. As far as i understand, what that check does from a high
>>> level pov is:
>>>
>>> - Did get_user_pages_nowait() fail due to a swapped out page (in which 
>>> case we should try to swappin the page asynchronously), or due to 
>>> another reason (for which case an error should be returned).
>>>
>>> Using vma->vm_flags VM_WRITE for that is trying to guess why
>>> get_user_pages_nowait() failed, because it (gup_nowait return values) 
>>> does not provide sufficient information by itself.
>>>
>>
>> That is exactly what i did in the first version. :)
>>
>> You can see it and the reason why it switched to the new way (readonly 
>> memslot)
>> in the above website (the first message in thread).
> 
> Userspace can create multiple mappings for the same memory region, for
> example via shared memory (shm_open), and have different protections for
> the two (or more) regions. I had old patch doing this, its attached.
> 

In this way, if guest try to write a readonly gfn, the vm will be crashed since
it will return FAULT_PFN on the page-fault path. VMM can not detect this kind
of fault, we have these problems:
- even if guest try to write ROM on a PCI device, the guest will die, but
  we'd ignore this write, it looks more like the real machine.

- can not implement ROMD beacuse write to a ROMD is MMIO access

Yes, we can rework get_user_page_nowait and get_user_pages_fast, let them
tell us the fault reason, but it is more complex i think.

>>> Can't that be fixed separately? 
>>>
>>> Another issue which is also present with the mmap(PROT_READ) scheme is
>>> interaction with reexecute_instruction. That is, unless i am mistaken,
>>> reexecute_instruction can succeed (return true) on a region that is
>>> write protected. This breaks the "write faults on read-only slots exit
>>> to userspace via EXIT_MMIO" behaviour.
>>
>> Sorry, Why? After re-entry to the guest, it can not generate a correct MMIO?
> 
> reexecute_instruction validates presence of GPA by looking at registered
> memslots. But if the access is a write, and userspace memory map is
> read-only, reexecute_instruction should exit via MMIO.
> 
> That is, reexecute_instruction must validate GPA usi

Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-16 Thread Avi Kivity
On 08/15/2012 08:53 PM, Marcelo Tosatti wrote:
> On Wed, Aug 15, 2012 at 01:44:14PM +0300, Avi Kivity wrote:
>> On 08/14/2012 06:51 PM, Marcelo Tosatti wrote:
>> >> 
>> >> Userspace may want to modify the ROM (for example, when programming a
>> >> flash device).  It is also possible to map an hva range rw through one
>> >> slot and ro through another.
>> > 
>> > Right, can do that with multiple userspace maps to the same anonymous 
>> > memory region (see other email).
>> 
>> Yes it's possible.  It requires that we move all memory allocation to be
>> fd based, since userspace can't predict what memory will be dual-mapped
>> (at least if emulated hardware allows this).
> 
> It can:
> - Create named memory object, with associated fd.
> - Copy data from large anonymous memory region to named memory.

That doesn't work if dma is in progress (assigned device).  It also
doubles the amount of memory in use.

> - Unmap region that must be dual-mapped from large anonymous memory chunk.
> - Map named memory object at address.
> 
> The last step can be replaced by adjusting KVM memory slots.
> 
> The disadvantage of protection information in memory slots
> is that it duplicates functionality that is handled by 
> userspace mappings.

Agree.  So does the memory slots mechanism, and even dirty logging.

> 
> Moreover, multiple memory maps are necessary for any
> split-qemu-into-smaller-pieces solutions.

Complex users can use complex mechanism, but let's keep the simple stuff
simple.

> 
>>  Is this a reasonable
>> requirement?  Do ksm/thp/autonuma work with this?
> 
> As mentioned, only memory used for ROM purposes must be dual mapped. 
> 
> I don't think there is any way to create multiple mappings 
> to one anonymous memory object ATM, but POSIX defines it
> (posix_typed_mem_open).
> 
> The limitation of thp/ksm on shared memory also affects any other user
> of shared memory, so it should be fixed there.
> 
> Also, QEMU ROM is allocated separately from RAM, correct?
> 

Correct.  But the chipset is also able to to write-protect some ranges
in the 0xc-0x10 area via the PAM.  It is able to write-protect
both RAM and PCI memory (usually mapped to flash).



-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-16 Thread Marcelo Tosatti
On Thu, Aug 16, 2012 at 12:03:01PM +0300, Avi Kivity wrote:
> On 08/15/2012 08:53 PM, Marcelo Tosatti wrote:
> > On Wed, Aug 15, 2012 at 01:44:14PM +0300, Avi Kivity wrote:
> >> On 08/14/2012 06:51 PM, Marcelo Tosatti wrote:
> >> >> 
> >> >> Userspace may want to modify the ROM (for example, when programming a
> >> >> flash device).  It is also possible to map an hva range rw through one
> >> >> slot and ro through another.
> >> > 
> >> > Right, can do that with multiple userspace maps to the same anonymous 
> >> > memory region (see other email).
> >> 
> >> Yes it's possible.  It requires that we move all memory allocation to be
> >> fd based, since userspace can't predict what memory will be dual-mapped
> >> (at least if emulated hardware allows this).
> > 
> > It can:
> > - Create named memory object, with associated fd.
> > - Copy data from large anonymous memory region to named memory.
> 
> That doesn't work if dma is in progress (assigned device).  It also
> doubles the amount of memory in use.
>
> > - Unmap region that must be dual-mapped from large anonymous memory chunk.
> > - Map named memory object at address.
> > 
> > The last step can be replaced by adjusting KVM memory slots.
> > 
> > The disadvantage of protection information in memory slots
> > is that it duplicates functionality that is handled by 
> > userspace mappings.
> 
> Agree.  So does the memory slots mechanism, and even dirty logging.
>
> > Moreover, multiple memory maps are necessary for any
> > split-qemu-into-smaller-pieces solutions.
> 
> Complex users can use complex mechanism, but let's keep the simple stuff
> simple.
> 
> > 
> >>  Is this a reasonable
> >> requirement?  Do ksm/thp/autonuma work with this?
> > 
> > As mentioned, only memory used for ROM purposes must be dual mapped. 
> > 
> > I don't think there is any way to create multiple mappings 
> > to one anonymous memory object ATM, but POSIX defines it
> > (posix_typed_mem_open).
> > 
> > The limitation of thp/ksm on shared memory also affects any other user
> > of shared memory, so it should be fixed there.
> > 
> > Also, QEMU ROM is allocated separately from RAM, correct?
> > 
> 
> Correct.  But the chipset is also able to to write-protect some ranges
> in the 0xc-0x10 area via the PAM.  It is able to write-protect
> both RAM and PCI memory (usually mapped to flash).

You are convinced that adding read-write protection information to the
memory slots, which controls access by the guest, in addition to the
userspace host pagetables, is useful. OK.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-16 Thread Marcelo Tosatti
On Thu, Aug 16, 2012 at 01:49:11PM +0800, Xiao Guangrong wrote:
> On 08/14/2012 11:25 PM, Marcelo Tosatti wrote:
> > On Tue, Aug 14, 2012 at 10:58:07AM +0800, Xiao Guangrong wrote:
> >> On 08/14/2012 01:39 AM, Marcelo Tosatti wrote:
> >>> On Sat, Aug 11, 2012 at 11:36:20AM +0800, Xiao Guangrong wrote:
>  On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
> > On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
> >> Changelog:
> >> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
> >> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
> >>
> >> The test case can be found at:
> >> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
> >>
> >> In current code, if we map a readonly memory space from host to guest
> >> and the page is not currently mapped in the host, we will get a 
> >> fault-pfn
> >> and async is not allowed, then the vm will crash.
> >>
> >> As Avi's suggestion, We introduce readonly memory region to map 
> >> ROM/ROMD
> >> to the guest, read access is happy for readonly memslot, write access 
> >> on
> >> readonly memslot will cause KVM_EXIT_MMIO exit.
> >
> > Memory slots whose QEMU mapping is write protected is supported
> > today, as long as there are no write faults.
> >
> > What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> > again?
> >
> 
>  It is happy to map !write host memory space to the readonly memslot,
>  and they can coexist as well.
> 
>  readonly memslot checks the write-permission by seeing slot->flags and
>  !write memory checks the write-permission in hva_to_pfn() function
>  which checks vma->flags. It is no conflict.
> >>>
> >>> Yes, there is no conflict. The point is, if you can use the
> >>> mmap(PROT_READ) interface (supporting read faults on read-only slots)
> >>> for this behavior, what is the advantage of a new memslot flag?
> >>>
> >>
> >> You can get the discussion at:
> >> https://lkml.org/lkml/2012/5/22/228
> >>
> >>> I'm not saying mmap(PROT_READ) is the best interface, i am just asking
> >>> why it is not.
> >>
> >> My fault. :(
> >>
> >>>
> > The initial objective was to fix a vm crash, can you explain that
> > initial problem?
> >
> 
>  The issue was trigged by this code:
> 
>  } else {
>  if (async && (vma->vm_flags & VM_WRITE))
>  *async = true;
>  pfn = KVM_PFN_ERR_FAULT;
>  }
> 
>  If the host memory region is readonly (!vma->vm_flags & VM_WRITE) and
>  its physical page is swapped out (or the file data does not be read in),
>  get_user_page_nowait will fail, above code reject to set async,
>  then we will get a fault pfn and async=false.
> 
>  I guess this issue also exists in "QEMU write protected mapping" as
>  you mentioned above.
> >>>
> >>> Yes, it does. As far as i understand, what that check does from a high
> >>> level pov is:
> >>>
> >>> - Did get_user_pages_nowait() fail due to a swapped out page (in which 
> >>> case we should try to swappin the page asynchronously), or due to 
> >>> another reason (for which case an error should be returned).
> >>>
> >>> Using vma->vm_flags VM_WRITE for that is trying to guess why
> >>> get_user_pages_nowait() failed, because it (gup_nowait return values) 
> >>> does not provide sufficient information by itself.
> >>>
> >>
> >> That is exactly what i did in the first version. :)
> >>
> >> You can see it and the reason why it switched to the new way (readonly 
> >> memslot)
> >> in the above website (the first message in thread).
> > 
> > Userspace can create multiple mappings for the same memory region, for
> > example via shared memory (shm_open), and have different protections for
> > the two (or more) regions. I had old patch doing this, its attached.
> > 
> 
> In this way, if guest try to write a readonly gfn, the vm will be crashed 
> since
> it will return FAULT_PFN on the page-fault path. VMM can not detect this kind
> of fault, we have these problems:
> - even if guest try to write ROM on a PCI device, the guest will die, but
>   we'd ignore this write, it looks more like the real machine.
> 
> - can not implement ROMD beacuse write to a ROMD is MMIO access
> 
> Yes, we can rework get_user_page_nowait and get_user_pages_fast, let them
> tell us the fault reason, but it is more complex i think.
> 
> >>> Can't that be fixed separately? 
> >>>
> >>> Another issue which is also present with the mmap(PROT_READ) scheme is
> >>> interaction with reexecute_instruction. That is, unless i am mistaken,
> >>> reexecute_instruction can succeed (return true) on a region that is
> >>> write protected. This breaks the "write faults on read-only slots exit
> >>> to userspace via EXIT_MMIO" behaviour.
> >>
> >> Sorry, Why? After 

Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-16 Thread Avi Kivity
On 08/16/2012 06:57 PM, Marcelo Tosatti wrote:
>> 
>> Correct.  But the chipset is also able to to write-protect some ranges
>> in the 0xc-0x10 area via the PAM.  It is able to write-protect
>> both RAM and PCI memory (usually mapped to flash).
> 
> You are convinced that adding read-write protection information to the
> memory slots, which controls access by the guest, in addition to the
> userspace host pagetables, is useful. OK.

In fact if we started from scratch I'd go for one huge slot, with
PROT_NONE for mmio and non-kvm APIs for dirty bitmaps, and use Linux mm
APIs to manage the details.  This would make kvm x86_64 only (no way to
access the PCI space on i386) but it would simplify a lot of the
internal translation layer.  But we're not there.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/