Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-21 Thread Andrew Jones
On Wed, May 20, 2015 at 07:29:28PM -0700, Mario Smarduch wrote:
> On 05/15/2015 10:04 AM, Andrew Jones wrote:
> > On Fri, May 15, 2015 at 08:02:59AM -0700, Christoffer Dall wrote:
> >> On Thu, May 14, 2015 at 03:32:13PM +0200, Andrew Jones wrote:
> >>> On Thu, May 14, 2015 at 12:55:49PM +0200, Christoffer Dall wrote:
>  On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
> > When S1 and S2 memory attributes combine wrt to caching policy,
> > non-cacheable types take precedence. If a guest maps a region as
> > device memory, which KVM userspace is using to emulate the device
> > using normal, cacheable memory, then we lose coherency. With
> > KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
> > regions are likely to be problematic. With this patch, as pages
> > of these types of regions are faulted into the guest, not only do
> > we flush the page's dcache, but we also change userspace's
> > mapping to NC in order to maintain coherency.
> >
> > What if the guest doesn't do what we expect? While we can't
> > force a guest to use cacheable memory, we can take advantage of
> > the non-cacheable precedence, and force it to use non-cacheable.
> > So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
> > KVM_MEM_UNCACHED regions to force them to NC.
> >
> > We now have both guest and userspace on the same page (pun intended)
> 
>  I'd like to revisit the overall approach here.  Is doing non-cached
>  accesses in both the guest and host really the right thing to do here?
> >>>
> >>> I think so, but all ideas/approaches are still on the table. This is
> >>> still an RFC.
> >>>
> 
>  The semantics of the device becomes that it is cache coherent (because
>  QEMU is), and I think Marc argued that Linux/UEFI should simply be
>  adapted to handle whatever emulated devices we have as coherent.  I also
>  remember someone arguing that would be wrong (Peter?).
> >>>
> >>> I'm not really for quirking all devices in all guest types (AAVMF, Linux,
> >>> other bootloaders, other OSes). Windows is unlikely to apply any quirks.
> >>>
> >>
> >> Well my point was that if we're emulating a platform with coherent IO
> >> memory for PCI devices that is something that the guest should work with
> >> as such, but as Paolo explained it should always be safe for a guest to
> >> assume non-coherent, so that doesn't work.
> >>
> 
>  Finally, does this address all cache coherency issues with emulated
>  devices?  Some VOS guys had seen things still not working with this
>  approach, unsure why...  I'd like to avoid us merging this only to merge
>  a more complete solution in a few weeks which reverts this solution...
> >>>
> >>> I'm not sure (this is still an RFT too :-) We definitely would need to
> >>> scatter some more memory_region_set_uncached() calls around QEMU first.
> >>>
> >>
> >> It would be good if you could sync with the VOS guys and make sure your
> >> patch set addresses their issues with the appropriate
> >> memory_region_set_uncached() added to QEMU, and if it does not, some
> >> vague idea why that falls outside of the scope of this patch set.  After
> >> all, adding a USB controller to a VM is not that an esoteric use case,
> >> is it?
> > 
> > I'll pull together a new version addressing all your comments, and also
> > put some more time into making sure it'll work...
> > 
> > Jeremy, can you give me the qemu command line you're using for your tests?
> > I'll do some experimenting with it.
> > 
> > Thanks,
> > drew
> > 
> 
> Hi Drew,
> 
> I just recently looked at these patches and little confused.
> 
> Where or how are the QEMU page tables changed to
> non-cacheable?
> 
> I noticed the logical pfn is changed to non-cacheable.

Right, this series is broken. Please stay tuned, I'll fix it the
next time around.

Thanks for reviewing.

drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-20 Thread Mario Smarduch
On 05/15/2015 10:04 AM, Andrew Jones wrote:
> On Fri, May 15, 2015 at 08:02:59AM -0700, Christoffer Dall wrote:
>> On Thu, May 14, 2015 at 03:32:13PM +0200, Andrew Jones wrote:
>>> On Thu, May 14, 2015 at 12:55:49PM +0200, Christoffer Dall wrote:
 On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
> When S1 and S2 memory attributes combine wrt to caching policy,
> non-cacheable types take precedence. If a guest maps a region as
> device memory, which KVM userspace is using to emulate the device
> using normal, cacheable memory, then we lose coherency. With
> KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
> regions are likely to be problematic. With this patch, as pages
> of these types of regions are faulted into the guest, not only do
> we flush the page's dcache, but we also change userspace's
> mapping to NC in order to maintain coherency.
>
> What if the guest doesn't do what we expect? While we can't
> force a guest to use cacheable memory, we can take advantage of
> the non-cacheable precedence, and force it to use non-cacheable.
> So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
> KVM_MEM_UNCACHED regions to force them to NC.
>
> We now have both guest and userspace on the same page (pun intended)

 I'd like to revisit the overall approach here.  Is doing non-cached
 accesses in both the guest and host really the right thing to do here?
>>>
>>> I think so, but all ideas/approaches are still on the table. This is
>>> still an RFC.
>>>

 The semantics of the device becomes that it is cache coherent (because
 QEMU is), and I think Marc argued that Linux/UEFI should simply be
 adapted to handle whatever emulated devices we have as coherent.  I also
 remember someone arguing that would be wrong (Peter?).
>>>
>>> I'm not really for quirking all devices in all guest types (AAVMF, Linux,
>>> other bootloaders, other OSes). Windows is unlikely to apply any quirks.
>>>
>>
>> Well my point was that if we're emulating a platform with coherent IO
>> memory for PCI devices that is something that the guest should work with
>> as such, but as Paolo explained it should always be safe for a guest to
>> assume non-coherent, so that doesn't work.
>>

 Finally, does this address all cache coherency issues with emulated
 devices?  Some VOS guys had seen things still not working with this
 approach, unsure why...  I'd like to avoid us merging this only to merge
 a more complete solution in a few weeks which reverts this solution...
>>>
>>> I'm not sure (this is still an RFT too :-) We definitely would need to
>>> scatter some more memory_region_set_uncached() calls around QEMU first.
>>>
>>
>> It would be good if you could sync with the VOS guys and make sure your
>> patch set addresses their issues with the appropriate
>> memory_region_set_uncached() added to QEMU, and if it does not, some
>> vague idea why that falls outside of the scope of this patch set.  After
>> all, adding a USB controller to a VM is not that an esoteric use case,
>> is it?
> 
> I'll pull together a new version addressing all your comments, and also
> put some more time into making sure it'll work...
> 
> Jeremy, can you give me the qemu command line you're using for your tests?
> I'll do some experimenting with it.
> 
> Thanks,
> drew
> 

Hi Drew,

I just recently looked at these patches and little confused.

Where or how are the QEMU page tables changed to
non-cacheable?

I noticed the logical pfn is changed to non-cacheable.

- Mario

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-15 Thread Jérémy Fanguède
On Fri, May 15, 2015 at 7:04 PM, Andrew Jones  wrote:
> On Fri, May 15, 2015 at 08:02:59AM -0700, Christoffer Dall wrote:
>> On Thu, May 14, 2015 at 03:32:13PM +0200, Andrew Jones wrote:
>> > On Thu, May 14, 2015 at 12:55:49PM +0200, Christoffer Dall wrote:
>> > > On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
>> > > > When S1 and S2 memory attributes combine wrt to caching policy,
>> > > > non-cacheable types take precedence. If a guest maps a region as
>> > > > device memory, which KVM userspace is using to emulate the device
>> > > > using normal, cacheable memory, then we lose coherency. With
>> > > > KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
>> > > > regions are likely to be problematic. With this patch, as pages
>> > > > of these types of regions are faulted into the guest, not only do
>> > > > we flush the page's dcache, but we also change userspace's
>> > > > mapping to NC in order to maintain coherency.
>> > > >
>> > > > What if the guest doesn't do what we expect? While we can't
>> > > > force a guest to use cacheable memory, we can take advantage of
>> > > > the non-cacheable precedence, and force it to use non-cacheable.
>> > > > So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
>> > > > KVM_MEM_UNCACHED regions to force them to NC.
>> > > >
>> > > > We now have both guest and userspace on the same page (pun intended)
>> > >
>> > > I'd like to revisit the overall approach here.  Is doing non-cached
>> > > accesses in both the guest and host really the right thing to do here?
>> >
>> > I think so, but all ideas/approaches are still on the table. This is
>> > still an RFC.
>> >
>> > >
>> > > The semantics of the device becomes that it is cache coherent (because
>> > > QEMU is), and I think Marc argued that Linux/UEFI should simply be
>> > > adapted to handle whatever emulated devices we have as coherent.  I also
>> > > remember someone arguing that would be wrong (Peter?).
>> >
>> > I'm not really for quirking all devices in all guest types (AAVMF, Linux,
>> > other bootloaders, other OSes). Windows is unlikely to apply any quirks.
>> >
>>
>> Well my point was that if we're emulating a platform with coherent IO
>> memory for PCI devices that is something that the guest should work with
>> as such, but as Paolo explained it should always be safe for a guest to
>> assume non-coherent, so that doesn't work.
>>
>> > >
>> > > Finally, does this address all cache coherency issues with emulated
>> > > devices?  Some VOS guys had seen things still not working with this
>> > > approach, unsure why...  I'd like to avoid us merging this only to merge
>> > > a more complete solution in a few weeks which reverts this solution...
>> >
>> > I'm not sure (this is still an RFT too :-) We definitely would need to
>> > scatter some more memory_region_set_uncached() calls around QEMU first.
>> >
>>
>> It would be good if you could sync with the VOS guys and make sure your
>> patch set addresses their issues with the appropriate
>> memory_region_set_uncached() added to QEMU, and if it does not, some
>> vague idea why that falls outside of the scope of this patch set.  After
>> all, adding a USB controller to a VM is not that an esoteric use case,
>> is it?
>
> I'll pull together a new version addressing all your comments, and also
> put some more time into making sure it'll work...
>
> Jeremy, can you give me the qemu command line you're using for your tests?
> I'll do some experimenting with it.

Hi Andrew,

Here is the command line that I used:

./qemu-system-arm -m 512 -machine type=virt \
-enable-kvm -cpu host \
-nographic \
-kernel zImage \
-drive if=none,file=ubuntu.img,id=fs,format=raw \
-device virtio-blk-device,drive=fs \
-usb -device usb-ehci \
-device usb-tablet \
-append "console=ttyAMA0 root=/dev/vda rw"

I encountered also troubles with other devices, so the test can be
extended with some devices:
-netdev type=user,id=net0 -device e1000,netdev=net0 \
-device nec-usb-xhci,id=xhci \
-device usb-kbd,bus=xhci.0 \
-drive if=scsi,file=disk.img,format=raw \
-device lsi53c895a \

I tried it with some arm32 boards, but also on arm64 (Juno board).


Jeremy
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-15 Thread Andrew Jones
On Fri, May 15, 2015 at 08:02:59AM -0700, Christoffer Dall wrote:
> On Thu, May 14, 2015 at 03:32:13PM +0200, Andrew Jones wrote:
> > On Thu, May 14, 2015 at 12:55:49PM +0200, Christoffer Dall wrote:
> > > On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
> > > > When S1 and S2 memory attributes combine wrt to caching policy,
> > > > non-cacheable types take precedence. If a guest maps a region as
> > > > device memory, which KVM userspace is using to emulate the device
> > > > using normal, cacheable memory, then we lose coherency. With
> > > > KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
> > > > regions are likely to be problematic. With this patch, as pages
> > > > of these types of regions are faulted into the guest, not only do
> > > > we flush the page's dcache, but we also change userspace's
> > > > mapping to NC in order to maintain coherency.
> > > > 
> > > > What if the guest doesn't do what we expect? While we can't
> > > > force a guest to use cacheable memory, we can take advantage of
> > > > the non-cacheable precedence, and force it to use non-cacheable.
> > > > So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
> > > > KVM_MEM_UNCACHED regions to force them to NC.
> > > > 
> > > > We now have both guest and userspace on the same page (pun intended)
> > > 
> > > I'd like to revisit the overall approach here.  Is doing non-cached
> > > accesses in both the guest and host really the right thing to do here?
> > 
> > I think so, but all ideas/approaches are still on the table. This is
> > still an RFC.
> > 
> > > 
> > > The semantics of the device becomes that it is cache coherent (because
> > > QEMU is), and I think Marc argued that Linux/UEFI should simply be
> > > adapted to handle whatever emulated devices we have as coherent.  I also
> > > remember someone arguing that would be wrong (Peter?).
> > 
> > I'm not really for quirking all devices in all guest types (AAVMF, Linux,
> > other bootloaders, other OSes). Windows is unlikely to apply any quirks.
> > 
> 
> Well my point was that if we're emulating a platform with coherent IO
> memory for PCI devices that is something that the guest should work with
> as such, but as Paolo explained it should always be safe for a guest to
> assume non-coherent, so that doesn't work.
> 
> > > 
> > > Finally, does this address all cache coherency issues with emulated
> > > devices?  Some VOS guys had seen things still not working with this
> > > approach, unsure why...  I'd like to avoid us merging this only to merge
> > > a more complete solution in a few weeks which reverts this solution...
> > 
> > I'm not sure (this is still an RFT too :-) We definitely would need to
> > scatter some more memory_region_set_uncached() calls around QEMU first.
> > 
> 
> It would be good if you could sync with the VOS guys and make sure your
> patch set addresses their issues with the appropriate
> memory_region_set_uncached() added to QEMU, and if it does not, some
> vague idea why that falls outside of the scope of this patch set.  After
> all, adding a USB controller to a VM is not that an esoteric use case,
> is it?

I'll pull together a new version addressing all your comments, and also
put some more time into making sure it'll work...

Jeremy, can you give me the qemu command line you're using for your tests?
I'll do some experimenting with it.

Thanks,
drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-15 Thread Christoffer Dall
On Thu, May 14, 2015 at 03:32:13PM +0200, Andrew Jones wrote:
> On Thu, May 14, 2015 at 12:55:49PM +0200, Christoffer Dall wrote:
> > On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
> > > When S1 and S2 memory attributes combine wrt to caching policy,
> > > non-cacheable types take precedence. If a guest maps a region as
> > > device memory, which KVM userspace is using to emulate the device
> > > using normal, cacheable memory, then we lose coherency. With
> > > KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
> > > regions are likely to be problematic. With this patch, as pages
> > > of these types of regions are faulted into the guest, not only do
> > > we flush the page's dcache, but we also change userspace's
> > > mapping to NC in order to maintain coherency.
> > > 
> > > What if the guest doesn't do what we expect? While we can't
> > > force a guest to use cacheable memory, we can take advantage of
> > > the non-cacheable precedence, and force it to use non-cacheable.
> > > So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
> > > KVM_MEM_UNCACHED regions to force them to NC.
> > > 
> > > We now have both guest and userspace on the same page (pun intended)
> > 
> > I'd like to revisit the overall approach here.  Is doing non-cached
> > accesses in both the guest and host really the right thing to do here?
> 
> I think so, but all ideas/approaches are still on the table. This is
> still an RFC.
> 
> > 
> > The semantics of the device becomes that it is cache coherent (because
> > QEMU is), and I think Marc argued that Linux/UEFI should simply be
> > adapted to handle whatever emulated devices we have as coherent.  I also
> > remember someone arguing that would be wrong (Peter?).
> 
> I'm not really for quirking all devices in all guest types (AAVMF, Linux,
> other bootloaders, other OSes). Windows is unlikely to apply any quirks.
> 

Well my point was that if we're emulating a platform with coherent IO
memory for PCI devices that is something that the guest should work with
as such, but as Paolo explained it should always be safe for a guest to
assume non-coherent, so that doesn't work.

> > 
> > Finally, does this address all cache coherency issues with emulated
> > devices?  Some VOS guys had seen things still not working with this
> > approach, unsure why...  I'd like to avoid us merging this only to merge
> > a more complete solution in a few weeks which reverts this solution...
> 
> I'm not sure (this is still an RFT too :-) We definitely would need to
> scatter some more memory_region_set_uncached() calls around QEMU first.
> 

It would be good if you could sync with the VOS guys and make sure your
patch set addresses their issues with the appropriate
memory_region_set_uncached() added to QEMU, and if it does not, some
vague idea why that falls outside of the scope of this patch set.  After
all, adding a USB controller to a VM is not that an esoteric use case,
is it?

> > 
> > More comments/questions below:
> > 
> > > 
> > > Signed-off-by: Andrew Jones 
> > > ---
> > >  arch/arm/include/asm/kvm_mmu.h|  5 -
> > >  arch/arm/include/asm/pgtable-3level.h |  1 +
> > >  arch/arm/include/asm/pgtable.h|  1 +
> > >  arch/arm/kvm/mmu.c| 37 
> > > +++
> > >  arch/arm64/include/asm/kvm_mmu.h  |  5 -
> > >  arch/arm64/include/asm/memory.h   |  1 +
> > >  arch/arm64/include/asm/pgtable.h  |  1 +
> > >  7 files changed, 36 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/arch/arm/include/asm/kvm_mmu.h 
> > > b/arch/arm/include/asm/kvm_mmu.h
> > > index 405aa18833073..e8034a80b12e5 100644
> > > --- a/arch/arm/include/asm/kvm_mmu.h
> > > +++ b/arch/arm/include/asm/kvm_mmu.h
> > > @@ -214,8 +214,11 @@ static inline void 
> > > __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
> > >   while (size) {
> > >   void *va = kmap_atomic_pfn(pfn);
> > >  
> > > - if (need_flush)
> > > + if (need_flush) {
> > >   kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> > > + if (ipa_uncached)
> > > + set_memory_nc((unsigned long)va, 1);
> > 
> > nit: consider moving this outside the need_flush
> > 
> > > + }
> > >  
> > >   if (icache_is_pipt())
> > >   __cpuc_coherent_user_range((unsigned long)va,
> > > diff --git a/arch/arm/include/asm/pgtable-3level.h 
> > > b/arch/arm/include/asm/pgtable-3level.h
> > > index a745a2a53853c..39b3f7a40e663 100644
> > > --- a/arch/arm/include/asm/pgtable-3level.h
> > > +++ b/arch/arm/include/asm/pgtable-3level.h
> > > @@ -121,6 +121,7 @@
> > >   * 2nd stage PTE definitions for LPAE.
> > >   */
> > >  #define L_PTE_S2_MT_UNCACHED (_AT(pteval_t, 0x0) << 2) /* 
> > > strongly ordered */
> > > +#define L_PTE_S2_MT_NORMAL_NC(_AT(pteval_t, 0x5) << 2) /* 
> > > normal non-cacheable */
> > >  #define L_PTE_S2_MT_WRITETHROUGH (_AT

Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-14 Thread Andrew Jones
On Thu, May 14, 2015 at 12:55:49PM +0200, Christoffer Dall wrote:
> On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
> > When S1 and S2 memory attributes combine wrt to caching policy,
> > non-cacheable types take precedence. If a guest maps a region as
> > device memory, which KVM userspace is using to emulate the device
> > using normal, cacheable memory, then we lose coherency. With
> > KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
> > regions are likely to be problematic. With this patch, as pages
> > of these types of regions are faulted into the guest, not only do
> > we flush the page's dcache, but we also change userspace's
> > mapping to NC in order to maintain coherency.
> > 
> > What if the guest doesn't do what we expect? While we can't
> > force a guest to use cacheable memory, we can take advantage of
> > the non-cacheable precedence, and force it to use non-cacheable.
> > So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
> > KVM_MEM_UNCACHED regions to force them to NC.
> > 
> > We now have both guest and userspace on the same page (pun intended)
> 
> I'd like to revisit the overall approach here.  Is doing non-cached
> accesses in both the guest and host really the right thing to do here?

I think so, but all ideas/approaches are still on the table. This is
still an RFC.

> 
> The semantics of the device becomes that it is cache coherent (because
> QEMU is), and I think Marc argued that Linux/UEFI should simply be
> adapted to handle whatever emulated devices we have as coherent.  I also
> remember someone arguing that would be wrong (Peter?).

I'm not really for quirking all devices in all guest types (AAVMF, Linux,
other bootloaders, other OSes). Windows is unlikely to apply any quirks.

> 
> Finally, does this address all cache coherency issues with emulated
> devices?  Some VOS guys had seen things still not working with this
> approach, unsure why...  I'd like to avoid us merging this only to merge
> a more complete solution in a few weeks which reverts this solution...

I'm not sure (this is still an RFT too :-) We definitely would need to
scatter some more memory_region_set_uncached() calls around QEMU first.

> 
> More comments/questions below:
> 
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  arch/arm/include/asm/kvm_mmu.h|  5 -
> >  arch/arm/include/asm/pgtable-3level.h |  1 +
> >  arch/arm/include/asm/pgtable.h|  1 +
> >  arch/arm/kvm/mmu.c| 37 
> > +++
> >  arch/arm64/include/asm/kvm_mmu.h  |  5 -
> >  arch/arm64/include/asm/memory.h   |  1 +
> >  arch/arm64/include/asm/pgtable.h  |  1 +
> >  7 files changed, 36 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> > index 405aa18833073..e8034a80b12e5 100644
> > --- a/arch/arm/include/asm/kvm_mmu.h
> > +++ b/arch/arm/include/asm/kvm_mmu.h
> > @@ -214,8 +214,11 @@ static inline void __coherent_cache_guest_page(struct 
> > kvm_vcpu *vcpu, pfn_t pfn,
> > while (size) {
> > void *va = kmap_atomic_pfn(pfn);
> >  
> > -   if (need_flush)
> > +   if (need_flush) {
> > kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> > +   if (ipa_uncached)
> > +   set_memory_nc((unsigned long)va, 1);
> 
> nit: consider moving this outside the need_flush
> 
> > +   }
> >  
> > if (icache_is_pipt())
> > __cpuc_coherent_user_range((unsigned long)va,
> > diff --git a/arch/arm/include/asm/pgtable-3level.h 
> > b/arch/arm/include/asm/pgtable-3level.h
> > index a745a2a53853c..39b3f7a40e663 100644
> > --- a/arch/arm/include/asm/pgtable-3level.h
> > +++ b/arch/arm/include/asm/pgtable-3level.h
> > @@ -121,6 +121,7 @@
> >   * 2nd stage PTE definitions for LPAE.
> >   */
> >  #define L_PTE_S2_MT_UNCACHED   (_AT(pteval_t, 0x0) << 2) /* 
> > strongly ordered */
> > +#define L_PTE_S2_MT_NORMAL_NC  (_AT(pteval_t, 0x5) << 2) /* 
> > normal non-cacheable */
> >  #define L_PTE_S2_MT_WRITETHROUGH   (_AT(pteval_t, 0xa) << 2) /* normal 
> > inner write-through */
> >  #define L_PTE_S2_MT_WRITEBACK  (_AT(pteval_t, 0xf) << 2) /* 
> > normal inner write-back */
> >  #define L_PTE_S2_MT_DEV_SHARED (_AT(pteval_t, 0x1) << 2) /* 
> > device */
> > diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> > index f40354198bad4..ae13ca8b0a23d 100644
> > --- a/arch/arm/include/asm/pgtable.h
> > +++ b/arch/arm/include/asm/pgtable.h
> > @@ -100,6 +100,7 @@ extern pgprot_t pgprot_s2_device;
> >  #define PAGE_HYP   _MOD_PROT(pgprot_kernel, L_PTE_HYP)
> >  #define PAGE_HYP_DEVICE_MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
> >  #define PAGE_S2_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
> > +#define PAGE_S2_NORMAL_NC  __pgprot((pgprot_val(PAGE_S2) & 
> > ~L_PTE_S

Re: [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-14 Thread Christoffer Dall
On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
> When S1 and S2 memory attributes combine wrt to caching policy,
> non-cacheable types take precedence. If a guest maps a region as
> device memory, which KVM userspace is using to emulate the device
> using normal, cacheable memory, then we lose coherency. With
> KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
> regions are likely to be problematic. With this patch, as pages
> of these types of regions are faulted into the guest, not only do
> we flush the page's dcache, but we also change userspace's
> mapping to NC in order to maintain coherency.
> 
> What if the guest doesn't do what we expect? While we can't
> force a guest to use cacheable memory, we can take advantage of
> the non-cacheable precedence, and force it to use non-cacheable.
> So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
> KVM_MEM_UNCACHED regions to force them to NC.
> 
> We now have both guest and userspace on the same page (pun intended)

I'd like to revisit the overall approach here.  Is doing non-cached
accesses in both the guest and host really the right thing to do here?

The semantics of the device becomes that it is cache coherent (because
QEMU is), and I think Marc argued that Linux/UEFI should simply be
adapted to handle whatever emulated devices we have as coherent.  I also
remember someone arguing that would be wrong (Peter?).

Finally, does this address all cache coherency issues with emulated
devices?  Some VOS guys had seen things still not working with this
approach, unsure why...  I'd like to avoid us merging this only to merge
a more complete solution in a few weeks which reverts this solution...

More comments/questions below:

> 
> Signed-off-by: Andrew Jones 
> ---
>  arch/arm/include/asm/kvm_mmu.h|  5 -
>  arch/arm/include/asm/pgtable-3level.h |  1 +
>  arch/arm/include/asm/pgtable.h|  1 +
>  arch/arm/kvm/mmu.c| 37 
> +++
>  arch/arm64/include/asm/kvm_mmu.h  |  5 -
>  arch/arm64/include/asm/memory.h   |  1 +
>  arch/arm64/include/asm/pgtable.h  |  1 +
>  7 files changed, 36 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 405aa18833073..e8034a80b12e5 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -214,8 +214,11 @@ static inline void __coherent_cache_guest_page(struct 
> kvm_vcpu *vcpu, pfn_t pfn,
>   while (size) {
>   void *va = kmap_atomic_pfn(pfn);
>  
> - if (need_flush)
> + if (need_flush) {
>   kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> + if (ipa_uncached)
> + set_memory_nc((unsigned long)va, 1);

nit: consider moving this outside the need_flush

> + }
>  
>   if (icache_is_pipt())
>   __cpuc_coherent_user_range((unsigned long)va,
> diff --git a/arch/arm/include/asm/pgtable-3level.h 
> b/arch/arm/include/asm/pgtable-3level.h
> index a745a2a53853c..39b3f7a40e663 100644
> --- a/arch/arm/include/asm/pgtable-3level.h
> +++ b/arch/arm/include/asm/pgtable-3level.h
> @@ -121,6 +121,7 @@
>   * 2nd stage PTE definitions for LPAE.
>   */
>  #define L_PTE_S2_MT_UNCACHED (_AT(pteval_t, 0x0) << 2) /* strongly 
> ordered */
> +#define L_PTE_S2_MT_NORMAL_NC(_AT(pteval_t, 0x5) << 2) /* 
> normal non-cacheable */
>  #define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa) << 2) /* normal 
> inner write-through */
>  #define L_PTE_S2_MT_WRITEBACK(_AT(pteval_t, 0xf) << 2) /* 
> normal inner write-back */
>  #define L_PTE_S2_MT_DEV_SHARED   (_AT(pteval_t, 0x1) << 2) /* 
> device */
> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> index f40354198bad4..ae13ca8b0a23d 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -100,6 +100,7 @@ extern pgprot_t   pgprot_s2_device;
>  #define PAGE_HYP _MOD_PROT(pgprot_kernel, L_PTE_HYP)
>  #define PAGE_HYP_DEVICE  _MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
>  #define PAGE_S2  _MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
> +#define PAGE_S2_NORMAL_NC__pgprot((pgprot_val(PAGE_S2) & 
> ~L_PTE_S2_MT_MASK) | L_PTE_S2_MT_NORMAL_NC)
>  #define PAGE_S2_DEVICE   _MOD_PROT(pgprot_s2_device, 
> L_PTE_S2_RDONLY)
>  
>  #define __PAGE_NONE  __pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | 
> L_PTE_XN | L_PTE_NONE)
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index bc1665acd73e7..6b3bd8061bd2a 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1220,7 +1220,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa,
>   struct vm_area_struct *vma;
>   pfn_t pfn;
>   pgprot_t mem_type = PAGE_S2;
> - bool fault_ipa_uncached;
>

[RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-13 Thread Andrew Jones
When S1 and S2 memory attributes combine wrt to caching policy,
non-cacheable types take precedence. If a guest maps a region as
device memory, which KVM userspace is using to emulate the device
using normal, cacheable memory, then we lose coherency. With
KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
regions are likely to be problematic. With this patch, as pages
of these types of regions are faulted into the guest, not only do
we flush the page's dcache, but we also change userspace's
mapping to NC in order to maintain coherency.

What if the guest doesn't do what we expect? While we can't
force a guest to use cacheable memory, we can take advantage of
the non-cacheable precedence, and force it to use non-cacheable.
So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
KVM_MEM_UNCACHED regions to force them to NC.

We now have both guest and userspace on the same page (pun intended)

Signed-off-by: Andrew Jones 
---
 arch/arm/include/asm/kvm_mmu.h|  5 -
 arch/arm/include/asm/pgtable-3level.h |  1 +
 arch/arm/include/asm/pgtable.h|  1 +
 arch/arm/kvm/mmu.c| 37 +++
 arch/arm64/include/asm/kvm_mmu.h  |  5 -
 arch/arm64/include/asm/memory.h   |  1 +
 arch/arm64/include/asm/pgtable.h  |  1 +
 7 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 405aa18833073..e8034a80b12e5 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -214,8 +214,11 @@ static inline void __coherent_cache_guest_page(struct 
kvm_vcpu *vcpu, pfn_t pfn,
while (size) {
void *va = kmap_atomic_pfn(pfn);
 
-   if (need_flush)
+   if (need_flush) {
kvm_flush_dcache_to_poc(va, PAGE_SIZE);
+   if (ipa_uncached)
+   set_memory_nc((unsigned long)va, 1);
+   }
 
if (icache_is_pipt())
__cpuc_coherent_user_range((unsigned long)va,
diff --git a/arch/arm/include/asm/pgtable-3level.h 
b/arch/arm/include/asm/pgtable-3level.h
index a745a2a53853c..39b3f7a40e663 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -121,6 +121,7 @@
  * 2nd stage PTE definitions for LPAE.
  */
 #define L_PTE_S2_MT_UNCACHED   (_AT(pteval_t, 0x0) << 2) /* strongly 
ordered */
+#define L_PTE_S2_MT_NORMAL_NC  (_AT(pteval_t, 0x5) << 2) /* normal 
non-cacheable */
 #define L_PTE_S2_MT_WRITETHROUGH   (_AT(pteval_t, 0xa) << 2) /* normal 
inner write-through */
 #define L_PTE_S2_MT_WRITEBACK  (_AT(pteval_t, 0xf) << 2) /* normal 
inner write-back */
 #define L_PTE_S2_MT_DEV_SHARED (_AT(pteval_t, 0x1) << 2) /* device */
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index f40354198bad4..ae13ca8b0a23d 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -100,6 +100,7 @@ extern pgprot_t pgprot_s2_device;
 #define PAGE_HYP   _MOD_PROT(pgprot_kernel, L_PTE_HYP)
 #define PAGE_HYP_DEVICE_MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
 #define PAGE_S2_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
+#define PAGE_S2_NORMAL_NC  __pgprot((pgprot_val(PAGE_S2) & 
~L_PTE_S2_MT_MASK) | L_PTE_S2_MT_NORMAL_NC)
 #define PAGE_S2_DEVICE _MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY)
 
 #define __PAGE_NONE__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | 
L_PTE_XN | L_PTE_NONE)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index bc1665acd73e7..6b3bd8061bd2a 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1220,7 +1220,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
struct vm_area_struct *vma;
pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
-   bool fault_ipa_uncached;
+   bool fault_ipa_uncached = false;
bool logging_active = memslot_is_logging(memslot);
unsigned long flags = 0;
 
@@ -1300,6 +1300,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
writable = false;
}
 
+   if (memslot->flags & KVM_MEM_UNCACHED) {
+   mem_type = PAGE_S2_NORMAL_NC;
+   fault_ipa_uncached = true;
+   }
+
spin_lock(&kvm->mmu_lock);
if (mmu_notifier_retry(kvm, mmu_seq))
goto out_unlock;
@@ -1307,8 +1312,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
if (!hugetlb && !force_pte)
hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
 
-   fault_ipa_uncached = memslot->flags & KVM_MEM_UNCACHED;
-
if (hugetlb) {
pmd_t new_pmd = pfn_pmd(pfn, mem_type);
new_pmd = pmd_mkhuge(new_pmd);
@@ -1462,6 +1465,7 @@ static int handle_hva_to_gpa(st