Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-14 Thread Andrew Jones
On Thu, May 14, 2015 at 12:55:49PM +0200, Christoffer Dall wrote:
> On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
> > When S1 and S2 memory attributes combine wrt to caching policy,
> > non-cacheable types take precedence. If a guest maps a region as
> > device memory, which KVM userspace is using to emulate the device
> > using normal, cacheable memory, then we lose coherency. With
> > KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
> > regions are likely to be problematic. With this patch, as pages
> > of these types of regions are faulted into the guest, not only do
> > we flush the page's dcache, but we also change userspace's
> > mapping to NC in order to maintain coherency.
> > 
> > What if the guest doesn't do what we expect? While we can't
> > force a guest to use cacheable memory, we can take advantage of
> > the non-cacheable precedence, and force it to use non-cacheable.
> > So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
> > KVM_MEM_UNCACHED regions to force them to NC.
> > 
> > We now have both guest and userspace on the same page (pun intended)
> 
> I'd like to revisit the overall approach here.  Is doing non-cached
> accesses in both the guest and host really the right thing to do here?

I think so, but all ideas/approaches are still on the table. This is
still an RFC.

> 
> The semantics of the device becomes that it is cache coherent (because
> QEMU is), and I think Marc argued that Linux/UEFI should simply be
> adapted to handle whatever emulated devices we have as coherent.  I also
> remember someone arguing that would be wrong (Peter?).

I'm not really for quirking all devices in all guest types (AAVMF, Linux,
other bootloaders, other OSes). Windows is unlikely to apply any quirks.

> 
> Finally, does this address all cache coherency issues with emulated
> devices?  Some VOS guys had seen things still not working with this
> approach, unsure why...  I'd like to avoid us merging this only to merge
> a more complete solution in a few weeks which reverts this solution...

I'm not sure (this is still an RFT too :-) We definitely would need to
scatter some more memory_region_set_uncached() calls around QEMU first.

> 
> More comments/questions below:
> 
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  arch/arm/include/asm/kvm_mmu.h|  5 -
> >  arch/arm/include/asm/pgtable-3level.h |  1 +
> >  arch/arm/include/asm/pgtable.h|  1 +
> >  arch/arm/kvm/mmu.c| 37 
> > +++
> >  arch/arm64/include/asm/kvm_mmu.h  |  5 -
> >  arch/arm64/include/asm/memory.h   |  1 +
> >  arch/arm64/include/asm/pgtable.h  |  1 +
> >  7 files changed, 36 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> > index 405aa18833073..e8034a80b12e5 100644
> > --- a/arch/arm/include/asm/kvm_mmu.h
> > +++ b/arch/arm/include/asm/kvm_mmu.h
> > @@ -214,8 +214,11 @@ static inline void __coherent_cache_guest_page(struct 
> > kvm_vcpu *vcpu, pfn_t pfn,
> > while (size) {
> > void *va = kmap_atomic_pfn(pfn);
> >  
> > -   if (need_flush)
> > +   if (need_flush) {
> > kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> > +   if (ipa_uncached)
> > +   set_memory_nc((unsigned long)va, 1);
> 
> nit: consider moving this outside the need_flush
> 
> > +   }
> >  
> > if (icache_is_pipt())
> > __cpuc_coherent_user_range((unsigned long)va,
> > diff --git a/arch/arm/include/asm/pgtable-3level.h 
> > b/arch/arm/include/asm/pgtable-3level.h
> > index a745a2a53853c..39b3f7a40e663 100644
> > --- a/arch/arm/include/asm/pgtable-3level.h
> > +++ b/arch/arm/include/asm/pgtable-3level.h
> > @@ -121,6 +121,7 @@
> >   * 2nd stage PTE definitions for LPAE.
> >   */
> >  #define L_PTE_S2_MT_UNCACHED   (_AT(pteval_t, 0x0) << 2) /* 
> > strongly ordered */
> > +#define L_PTE_S2_MT_NORMAL_NC  (_AT(pteval_t, 0x5) << 2) /* 
> > normal non-cacheable */
> >  #define L_PTE_S2_MT_WRITETHROUGH   (_AT(pteval_t, 0xa) << 2) /* normal 
> > inner write-through */
> >  #define L_PTE_S2_MT_WRITEBACK  (_AT(pteval_t, 0xf) << 2) /* 
> > normal inner write-back */
> >  #define L_PTE_S2_MT_DEV_SHARED (_AT(pteval_t, 0x1) << 2) /* 
> > device */
> > diff --git a/arch/arm/inc

Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Andrew Jones
On Thu, May 14, 2015 at 02:11:59PM +0100, Peter Maydell wrote:
> On 14 May 2015 at 14:03, Andrew Jones  wrote:
> > On Thu, May 14, 2015 at 11:37:46AM +0100, Peter Maydell wrote:
> >> On 14 May 2015 at 11:31, Andrew Jones  wrote:
> >> > Forgot to (4): switch from setting userspace's mapping to
> >> > device memory to normal, non-cacheable. Using device memory
> >> > caused a problem that Alex Graf found, and Peter Maydell suggested
> >> > using normal, non-cacheable instead.
> >>
> >> Did you check that non-cacheable is definitely the correct
> >> kind of Normal memory attribute we want? (ie not write-through).
> >
> > I was concerned that write-through wouldn't be sufficient. If the
> > guest writes to its non-cached memory, and QEMU needs to see what
> > it wrote, then won't write-through fail to work? Unless we some
> > how invalidate the cache first?
> 
> Well, I meant more that the correct mapping for userspace is
> the same as the guest, whatever that is, and so somebody needs
> to look at what the guest actually does rather than merely
> hoping NormalNC is OK. (For instance, do we need to provide
> support for QEMU to map both NC and writethrough?)
>

Ah, we assume the guest is mapping it as device memory, and in
this version of the series, I ensure that it is at least NC with
the S2 attributes. I don't think we can look at what some guests
do with some devices to come up with anything beyond (poor?)
heuristics. I prefer that we force both the guest and QEMU to NC
(or guest chooses Device and QEMU is forced to NC) to make sure
we get it right.

drew



Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Andrew Jones
On Thu, May 14, 2015 at 01:38:11PM +0100, Peter Maydell wrote:
> On 14 May 2015 at 13:28, Paolo Bonzini  wrote:
> > Well, PCI BARs are generally MMIO resources, and hence should not be cached.
> >
> > As an optimization, OS drivers can mark them as cacheable or
> > write-combining or something like that, but in general it's a safe
> > default to leave them uncached---one would think.
> 
> Isn't this handled by the OS mapping them in the 'prefetchable'
> MMIO window rather than the 'non-prefetchable' one? (QEMU's
> generic-PCIe device doesn't yet support the prefetchable window.)

I was thinking (with my limited PCI knowledge) the same thing, and
was planning on experimenting with that.

drew



Re: [Qemu-devel] [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc

2015-05-14 Thread Andrew Jones
On Thu, May 14, 2015 at 01:05:09PM +0200, Christoffer Dall wrote:
> On Wed, May 13, 2015 at 01:31:52PM +0200, Andrew Jones wrote:
> > Provide a method to change normal, cacheable memory to non-cacheable.
> > KVM will make use of this to keep emulated device memory regions
> > coherent with the guest.
> > 
> > Signed-off-by: Andrew Jones 
> 
> Reviewed-by: Christoffer Dall 
> 
> But you obviously need Russell and Will/Catalin to ack/merge this.

I guess this patch is going to go away in the next round. You've
pointed out that I screwed stuff up royally with my over eagerness
to reuse code. I need to reimplement change_memory_common, but a
version that takes an mm, which is more or less what I did in the
last version of this series, back when I was pinning pages.

drew



Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-15 Thread Andrew Jones
On Fri, May 15, 2015 at 08:02:59AM -0700, Christoffer Dall wrote:
> On Thu, May 14, 2015 at 03:32:13PM +0200, Andrew Jones wrote:
> > On Thu, May 14, 2015 at 12:55:49PM +0200, Christoffer Dall wrote:
> > > On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
> > > > When S1 and S2 memory attributes combine wrt to caching policy,
> > > > non-cacheable types take precedence. If a guest maps a region as
> > > > device memory, which KVM userspace is using to emulate the device
> > > > using normal, cacheable memory, then we lose coherency. With
> > > > KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
> > > > regions are likely to be problematic. With this patch, as pages
> > > > of these types of regions are faulted into the guest, not only do
> > > > we flush the page's dcache, but we also change userspace's
> > > > mapping to NC in order to maintain coherency.
> > > > 
> > > > What if the guest doesn't do what we expect? While we can't
> > > > force a guest to use cacheable memory, we can take advantage of
> > > > the non-cacheable precedence, and force it to use non-cacheable.
> > > > So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
> > > > KVM_MEM_UNCACHED regions to force them to NC.
> > > > 
> > > > We now have both guest and userspace on the same page (pun intended)
> > > 
> > > I'd like to revisit the overall approach here.  Is doing non-cached
> > > accesses in both the guest and host really the right thing to do here?
> > 
> > I think so, but all ideas/approaches are still on the table. This is
> > still an RFC.
> > 
> > > 
> > > The semantics of the device becomes that it is cache coherent (because
> > > QEMU is), and I think Marc argued that Linux/UEFI should simply be
> > > adapted to handle whatever emulated devices we have as coherent.  I also
> > > remember someone arguing that would be wrong (Peter?).
> > 
> > I'm not really for quirking all devices in all guest types (AAVMF, Linux,
> > other bootloaders, other OSes). Windows is unlikely to apply any quirks.
> > 
> 
> Well my point was that if we're emulating a platform with coherent IO
> memory for PCI devices that is something that the guest should work with
> as such, but as Paolo explained it should always be safe for a guest to
> assume non-coherent, so that doesn't work.
> 
> > > 
> > > Finally, does this address all cache coherency issues with emulated
> > > devices?  Some VOS guys had seen things still not working with this
> > > approach, unsure why...  I'd like to avoid us merging this only to merge
> > > a more complete solution in a few weeks which reverts this solution...
> > 
> > I'm not sure (this is still an RFT too :-) We definitely would need to
> > scatter some more memory_region_set_uncached() calls around QEMU first.
> > 
> 
> It would be good if you could sync with the VOS guys and make sure your
> patch set addresses their issues with the appropriate
> memory_region_set_uncached() added to QEMU, and if it does not, some
> vague idea why that falls outside of the scope of this patch set.  After
> all, adding a USB controller to a VM is not that an esoteric use case,
> is it?

I'll pull together a new version addressing all your comments, and also
put some more time into making sure it'll work...

Jeremy, can you give me the qemu command line you're using for your tests?
I'll do some experimenting with it.

Thanks,
drew



Re: [Qemu-devel] [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc

2015-05-19 Thread Andrew Jones

Hi Catalin,

Thanks for the feedback. Some comments to your comments below.

On Mon, May 18, 2015 at 04:53:03PM +0100, Catalin Marinas wrote:
> On Thu, May 14, 2015 at 02:46:44PM +0100, Andrew Jones wrote:
> > On Thu, May 14, 2015 at 01:05:09PM +0200, Christoffer Dall wrote:
> > > On Wed, May 13, 2015 at 01:31:52PM +0200, Andrew Jones wrote:
> > > > Provide a method to change normal, cacheable memory to non-cacheable.
> > > > KVM will make use of this to keep emulated device memory regions
> > > > coherent with the guest.
> > > > 
> > > > Signed-off-by: Andrew Jones 
> > > 
> > > Reviewed-by: Christoffer Dall 
> > > 
> > > But you obviously need Russell and Will/Catalin to ack/merge this.
> > 
> > I guess this patch is going to go away in the next round. You've
> > pointed out that I screwed stuff up royally with my over eagerness
> > to reuse code. I need to reimplement change_memory_common, but a
> > version that takes an mm, which is more or less what I did in the
> > last version of this series, back when I was pinning pages.
> 
> I kept wondering what this patch and the next one are doing with
> set_memory_nc(). Basically you were trying to set the cache attributes
> for the kernel linear mapping or kmap address (in the 32-bit arm case,
> which is removed anyway on kunmap).

Yeah, I was way to hasty with this version... Sorry for wasting people's
time with it.

> 
> What you need is changing the attributes of the user mapping as accessed
> by Qemu but I don't think simply re-implementing change_memory_common()
> would work, you probably need to pin the pages in memory as well.
> Otherwise, the kernel may remove such pages and, when bringing them
> back, would set the default cacheability attributes.

Yup, I read that code and saw it would inherit the memory attributes
from the vma. At the time I wasn't thinking about the userspace mapping
though, and thus had convinced myself that if "the" mapping goes away,
then we'll be invalidating the guest's mapping too, and thus we'd end up
faulting it in again when necessary, and at that time resetting the
attributes. If the guest never faulted it in again, then it wouldn't
have mattered what the attributes were anyway. Of course I was thinking
about the wrong mapping...

> 
> Another way would be to split the vma containing the non-cacheable
> memory so that you get a single vma with the vm_page_prot as
> Non-cacheable.

This sounds interesting. Actually, it even crossed my mind once when I
first saw that the vma would overwrite the attributes, but then, sigh,
I let my brain take a stupidity bath.

> 
> Yet another approach could be for KVM to mmap the necessary memory for
> Qemu via a file_operations.mmap call (but that's only for ranges outside
> the guest "RAM").

I guess I prefer the vma splitting, rather than this (the vma creating
with mmap), as it keeps the KVM interface from changing (as you point out
below). Well, unless there are other advantages to this that are worth
considering?

> 
> I didn't have time to follow these threads in details, but just to
> recap my understanding, we have two main use-cases:
> 
> 1. Qemu handling guest I/O to device (e.g. PCIe BARs)
> 2. Qemu emulating device DMA
> 
> For (1), I guess Qemu uses an anonymous mmap() and then tells KVM about
> this memory slot. The memory attributes in this case could be Device
> because that's how the guest would normally map it. The
> file_operations.mmap trick would work in this case but this means
> expanding the KVM ABI beyond just an ioctl().
> 
> For (2), since Qemu is writing to the guest "RAM" (e.g. video
> framebuffer allocated by the guest), I still think the simplest is to
> tell the guest (via DT) that such device is cache coherent rather than
> trying to remap the Qemu mapping as non-cacheable.

If we need a solution for (1), then I'd prefer that it work and be
applied to (2) as well. Anyway, I'm still not 100% sure we can count on
all guest types (booloaders, different OSes) to listen to us. They may
assume non-cacheable is typical and safe, and thus just do that always.
We can certainly change some of those bootloaders and OSes, but probably
not all of them.

Thanks,
drew



Re: [Qemu-devel] [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc

2015-05-19 Thread Andrew Jones
On Tue, May 19, 2015 at 12:18:54PM +0100, Catalin Marinas wrote:
> On Tue, May 19, 2015 at 11:03:22AM +0100, Andrew Jones wrote:
> > On Mon, May 18, 2015 at 04:53:03PM +0100, Catalin Marinas wrote:
> > > Another way would be to split the vma containing the non-cacheable
> > > memory so that you get a single vma with the vm_page_prot as
> > > Non-cacheable.
> > 
> > This sounds interesting. Actually, it even crossed my mind once when I
> > first saw that the vma would overwrite the attributes, but then, sigh,
> > I let my brain take a stupidity bath.
> > 
> > > 
> > > Yet another approach could be for KVM to mmap the necessary memory for
> > > Qemu via a file_operations.mmap call (but that's only for ranges outside
> > > the guest "RAM").
> > 
> > I guess I prefer the vma splitting, rather than this (the vma creating
> > with mmap), as it keeps the KVM interface from changing (as you point out
> > below). Well, unless there are other advantages to this that are worth
> > considering?
> 
> The advantage is that you don't need to deal with the mm internals in
> the KVM code.
> 
> But you can probably add such code directly to mm/ and reuse some of the
> existing code in there already as part of change_protection(),
> mprotect_fixup(), sys_mprotect(). Actually, once you split the vma and
> set the new protection (something similar to mprotect_fixup), it looks
> to me like you can just call change_protection(vma->vm_page_prot).

I'll start playing around with this today.

> 
> > > I didn't have time to follow these threads in details, but just to
> > > recap my understanding, we have two main use-cases:
> > > 
> > > 1. Qemu handling guest I/O to device (e.g. PCIe BARs)
> > > 2. Qemu emulating device DMA
> > > 
> > > For (1), I guess Qemu uses an anonymous mmap() and then tells KVM about
> > > this memory slot. The memory attributes in this case could be Device
> > > because that's how the guest would normally map it. The
> > > file_operations.mmap trick would work in this case but this means
> > > expanding the KVM ABI beyond just an ioctl().
> > > 
> > > For (2), since Qemu is writing to the guest "RAM" (e.g. video
> > > framebuffer allocated by the guest), I still think the simplest is to
> > > tell the guest (via DT) that such device is cache coherent rather than
> > > trying to remap the Qemu mapping as non-cacheable.
> > 
> > If we need a solution for (1), then I'd prefer that it work and be
> > applied to (2) as well. Anyway, I'm still not 100% sure we can count on
> > all guest types (booloaders, different OSes) to listen to us. They may
> > assume non-cacheable is typical and safe, and thus just do that always.
> > We can certainly change some of those bootloaders and OSes, but probably
> > not all of them.
> 
> That's fine by me. Once you get the vma splitting and attributes
> changing done, I think you get the second one for free.
> 
> Do we want to differentiate between Device and Normal Non-cacheable
> memory? Something like KVM_MEMSLOT_DEVICE?
> 
> Nitpick: I'm not sure whether "uncached" is clear enough. In Linux,
> pgprot_noncached() returns Strongly Ordered memory. For Normal
> Non-cachable we used pgprot_writecombine (e.g. a video framebuffer).
> 
> Maybe something like KVM_MEMSLOT_COHERENT meaning a request to KVM to

Sounds good to me. I'll rename for the next round.

> ensure that guest and host access it coherently (which would mean
> writecombine for ARM). That's similar naming to functions like
> dma_alloc_coherent() that return cacheable or non-cacheable memory based
> on what the device supports. Anyway, I'm not to bothered with the
> naming.
>

Thanks for your help!

drew 



Re: [Qemu-devel] [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

2015-05-21 Thread Andrew Jones
On Wed, May 20, 2015 at 07:29:28PM -0700, Mario Smarduch wrote:
> On 05/15/2015 10:04 AM, Andrew Jones wrote:
> > On Fri, May 15, 2015 at 08:02:59AM -0700, Christoffer Dall wrote:
> >> On Thu, May 14, 2015 at 03:32:13PM +0200, Andrew Jones wrote:
> >>> On Thu, May 14, 2015 at 12:55:49PM +0200, Christoffer Dall wrote:
> >>>> On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
> >>>>> When S1 and S2 memory attributes combine wrt to caching policy,
> >>>>> non-cacheable types take precedence. If a guest maps a region as
> >>>>> device memory, which KVM userspace is using to emulate the device
> >>>>> using normal, cacheable memory, then we lose coherency. With
> >>>>> KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
> >>>>> regions are likely to be problematic. With this patch, as pages
> >>>>> of these types of regions are faulted into the guest, not only do
> >>>>> we flush the page's dcache, but we also change userspace's
> >>>>> mapping to NC in order to maintain coherency.
> >>>>>
> >>>>> What if the guest doesn't do what we expect? While we can't
> >>>>> force a guest to use cacheable memory, we can take advantage of
> >>>>> the non-cacheable precedence, and force it to use non-cacheable.
> >>>>> So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
> >>>>> KVM_MEM_UNCACHED regions to force them to NC.
> >>>>>
> >>>>> We now have both guest and userspace on the same page (pun intended)
> >>>>
> >>>> I'd like to revisit the overall approach here.  Is doing non-cached
> >>>> accesses in both the guest and host really the right thing to do here?
> >>>
> >>> I think so, but all ideas/approaches are still on the table. This is
> >>> still an RFC.
> >>>
> >>>>
> >>>> The semantics of the device becomes that it is cache coherent (because
> >>>> QEMU is), and I think Marc argued that Linux/UEFI should simply be
> >>>> adapted to handle whatever emulated devices we have as coherent.  I also
> >>>> remember someone arguing that would be wrong (Peter?).
> >>>
> >>> I'm not really for quirking all devices in all guest types (AAVMF, Linux,
> >>> other bootloaders, other OSes). Windows is unlikely to apply any quirks.
> >>>
> >>
> >> Well my point was that if we're emulating a platform with coherent IO
> >> memory for PCI devices that is something that the guest should work with
> >> as such, but as Paolo explained it should always be safe for a guest to
> >> assume non-coherent, so that doesn't work.
> >>
> >>>>
> >>>> Finally, does this address all cache coherency issues with emulated
> >>>> devices?  Some VOS guys had seen things still not working with this
> >>>> approach, unsure why...  I'd like to avoid us merging this only to merge
> >>>> a more complete solution in a few weeks which reverts this solution...
> >>>
> >>> I'm not sure (this is still an RFT too :-) We definitely would need to
> >>> scatter some more memory_region_set_uncached() calls around QEMU first.
> >>>
> >>
> >> It would be good if you could sync with the VOS guys and make sure your
> >> patch set addresses their issues with the appropriate
> >> memory_region_set_uncached() added to QEMU, and if it does not, some
> >> vague idea why that falls outside of the scope of this patch set.  After
> >> all, adding a USB controller to a VM is not that an esoteric use case,
> >> is it?
> > 
> > I'll pull together a new version addressing all your comments, and also
> > put some more time into making sure it'll work...
> > 
> > Jeremy, can you give me the qemu command line you're using for your tests?
> > I'll do some experimenting with it.
> > 
> > Thanks,
> > drew
> > 
> 
> Hi Drew,
> 
> I just recently looked at these patches and little confused.
> 
> Where or how are the QEMU page tables changed to
> non-cacheable?
> 
> I noticed the logical pfn is changed to non-cacheable.

Right, this series is broken. Please stay tuned, I'll fix it the
next time around.

Thanks for reviewing.

drew



Re: [Qemu-devel] [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc

2015-05-25 Thread Andrew Jones
On Fri, May 22, 2015 at 06:08:30PM -0700, Mario Smarduch wrote:
> On 05/18/2015 08:53 AM, Catalin Marinas wrote:
> > On Thu, May 14, 2015 at 02:46:44PM +0100, Andrew Jones wrote:
> >> On Thu, May 14, 2015 at 01:05:09PM +0200, Christoffer Dall wrote:
> >>> On Wed, May 13, 2015 at 01:31:52PM +0200, Andrew Jones wrote:
> >>>> Provide a method to change normal, cacheable memory to non-cacheable.
> >>>> KVM will make use of this to keep emulated device memory regions
> >>>> coherent with the guest.
> >>>>
> >>>> Signed-off-by: Andrew Jones 
> >>>
> >>> Reviewed-by: Christoffer Dall 
> >>>
> >>> But you obviously need Russell and Will/Catalin to ack/merge this.
> >>
> >> I guess this patch is going to go away in the next round. You've
> >> pointed out that I screwed stuff up royally with my over eagerness
> >> to reuse code. I need to reimplement change_memory_common, but a
> >> version that takes an mm, which is more or less what I did in the
> >> last version of this series, back when I was pinning pages.
> > 
> > I kept wondering what this patch and the next one are doing with
> > set_memory_nc(). Basically you were trying to set the cache attributes
> > for the kernel linear mapping or kmap address (in the 32-bit arm case,
> > which is removed anyway on kunmap).
> > 
> > What you need is changing the attributes of the user mapping as accessed
> > by Qemu but I don't think simply re-implementing change_memory_common()
> > would work, you probably need to pin the pages in memory as well.
> > Otherwise, the kernel may remove such pages and, when bringing them
> > back, would set the default cacheability attributes.
> > 
> > Another way would be to split the vma containing the non-cacheable
> > memory so that you get a single vma with the vm_page_prot as
> > Non-cacheable.
> > 
> > Yet another approach could be for KVM to mmap the necessary memory for
> > Qemu via a file_operations.mmap call (but that's only for ranges outside
> > the guest "RAM").
> 
> I think this option with a basic loadable driver
> that allocates non-cachable/pinned pages for QEMU to mmap()
> may provide a reference point to build on. If that covers
> all the cases then perhaps move to more generic solution. This
> should be quicker to implement and test.

I've pulled together a different approach for experimentation. I added a
new mmap/mprotect prot flag, like what was done for the powerpc SAO bit
(see commit b845f313d78e4e "mm: Allow architectures to define additional
protection bits" and commit ef3d3246a0d06 " powerpc/mm: Add Strong Access
Ordering support"). So far I've only tested with a simple test program that
forks and messes around with mapping shared memory in different ways. With
some cache flushing added to set_pte_at on memattr changes, then it seems
to work pretty well.

Now I'll start experimenting with QEMU again to see if the "map QEMU's
memory as noncacheable" approach makes any sense at all. If it does,
then I'm not sure we want to do it with mprotect, but I could clean up
the patches and post them for discussion. The main problems I see with it
are the need to define a new PROT_ flag, and the fact that it might not
be a good idea for userspace to have this capability in the first place,
at least not for MAP_SHARED regions.

> 
> I wonder if kernel mm will ever have a reason
> to create a cacheable mapping even if the pages are pinned?
> Like reading /dev/mem although that's not a likely case here.

Actually for a version with pinned pages, then I think this one
http://www.spinics.net/lists/kvm-arm/msg14021.html
is a good start. There are some problems with it, but I can fix
them in order for it to be useful for experimenting with QEMU. It
suffers from the Device-nGnRnE issue Peter and Alex pointed out, and
I also see that I'm using the wrong address in there as well for the
dcache flush, because I'm using kvm_flush_dcache_pte(*ptep) in
set_page_uncached, which ends up using page_address(page). I should
just do __flush_dcache_area(addr, PAGE_SIZE), instead.

Thanks,
drew



[Qemu-devel] [PATCH] kvm-all: put kvm_mem_flags to more work

2015-03-06 Thread Andrew Jones
Currently kvm_mem_flags just translates bools to bits, let's
make it also determine the bools first. This avoids its parameter
list growing each time we add a flag.

Signed-off-by: Andrew Jones 
---
This patch comes from an experimental series that added a kvm
memslot flag. That series probably won't ever be merged, but
this patch on it's own seems like a nice cleanup to me.

 kvm-all.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 05a79c20e0bba..507fa7204e062 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -294,10 +294,14 @@ err:
  * dirty pages logging control
  */
 
-static int kvm_mem_flags(KVMState *s, bool log_dirty, bool readonly)
+static int kvm_mem_flags(MemoryRegion *mr)
 {
+bool readonly = mr->readonly || memory_region_is_romd(mr);
 int flags = 0;
-flags = log_dirty ? KVM_MEM_LOG_DIRTY_PAGES : 0;
+
+if (memory_region_is_logging(mr)) {
+flags |= KVM_MEM_LOG_DIRTY_PAGES;
+}
 if (readonly && kvm_readonly_mem_allowed) {
 flags |= KVM_MEM_READONLY;
 }
@@ -312,7 +316,10 @@ static int kvm_slot_dirty_pages_log_change(KVMSlot *mem, 
bool log_dirty)
 
 old_flags = mem->flags;
 
-flags = (mem->flags & ~mask) | kvm_mem_flags(s, log_dirty, false);
+flags = mem->flags & ~mask;
+if (log_dirty) {
+flags |= KVM_MEM_LOG_DIRTY_PAGES;
+}
 mem->flags = flags;
 
 /* If nothing changed effectively, no need to issue ioctl */
@@ -642,9 +649,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, 
bool add)
 KVMSlot *mem, old;
 int err;
 MemoryRegion *mr = section->mr;
-bool log_dirty = memory_region_is_logging(mr);
 bool writeable = !mr->readonly && !mr->rom_device;
-bool readonly_flag = mr->readonly || memory_region_is_romd(mr);
 hwaddr start_addr = section->offset_within_address_space;
 ram_addr_t size = int128_get64(section->size);
 void *ram = NULL;
@@ -688,7 +693,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, 
bool add)
 (ram - start_addr == mem->ram - mem->start_addr)) {
 /* The new slot fits into the existing one and comes with
  * identical parameters - update flags and done. */
-kvm_slot_dirty_pages_log_change(mem, log_dirty);
+kvm_slot_dirty_pages_log_change(mem, memory_region_is_logging(mr));
 return;
 }
 
@@ -721,7 +726,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, 
bool add)
 mem->memory_size = old.memory_size;
 mem->start_addr = old.start_addr;
 mem->ram = old.ram;
-mem->flags = kvm_mem_flags(s, log_dirty, readonly_flag);
+mem->flags = kvm_mem_flags(mr);
 
 err = kvm_set_user_memory_region(s, mem);
 if (err) {
@@ -742,7 +747,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, 
bool add)
 mem->memory_size = start_addr - old.start_addr;
 mem->start_addr = old.start_addr;
 mem->ram = old.ram;
-mem->flags =  kvm_mem_flags(s, log_dirty, readonly_flag);
+mem->flags =  kvm_mem_flags(mr);
 
 err = kvm_set_user_memory_region(s, mem);
 if (err) {
@@ -766,7 +771,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, 
bool add)
 size_delta = mem->start_addr - old.start_addr;
 mem->memory_size = old.memory_size - size_delta;
 mem->ram = old.ram + size_delta;
-mem->flags = kvm_mem_flags(s, log_dirty, readonly_flag);
+mem->flags = kvm_mem_flags(mr);
 
 err = kvm_set_user_memory_region(s, mem);
 if (err) {
@@ -788,7 +793,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, 
bool add)
 mem->memory_size = size;
 mem->start_addr = start_addr;
 mem->ram = ram;
-mem->flags = kvm_mem_flags(s, log_dirty, readonly_flag);
+mem->flags = kvm_mem_flags(mr);
 
 err = kvm_set_user_memory_region(s, mem);
 if (err) {
-- 
1.8.3.1




[Qemu-devel] the arm cache coherency cluster

2015-03-06 Thread Andrew Jones
In reply to this message I'll send two series' one for KVM and
one for QEMU. The two series' are their respective component
complements, and attempt to implement cache coherency for arm
guests using emulated devices, where the emulator (qemu) uses
cached memory for the device memory, but the guest uses
uncached - as device memory is generally used. Right now I've
just focused on VGA vram.

This approach starts as the "add a new memslot flag" approach,
and then turns into the "make qemu do some cache maintenance"
approach with the final patch of each series (6/6). It stops
short of the "add syscalls..." approach. Below is a summary of
all the approaches discussed so far, to my knowledge.

"MAIR manipulating"
Posted[1] by Ard. Works. No performance degradation. Potential
issues with device assignment and the guest getting confused.

"add a new memslot flag"
This posting (not counting patches 6/6). Works. Huge performance
degradation.

"make qemu do some cache maintenance"
This posting (patches 6/6). We can only do so much in qemu
without syscalls. This series does what it can. Almost works,
probably could work, after playing 'find the missing flush'.
This approach still requires the new memslot flag, as userspace
can't invalidate the cache, only clean, or clean+invalidate.
No noticeable performance degradation.

"add syscalls to make qemu do all cache maintenance"
Variant 1: implement as kvm ioctls - to avoid trying to get
   syscalls into the general kernel
Variant 2: add real syscalls, or maybe just ARM private SWIs
   like __ARM_NR_cacheflush
This approach should work, and if we add an invalidate syscall,
then we shouldn't need any kvm changes at all, i.e. no need for
the memslot flag. I haven't experimented with this yet, but I'm
starting to like the idea of variant 2, with a private SWI, so
will try to pull something together soon for that.

"describe the problematic memory as cached to the guest"
Not an ideal solution for virt. Could maybe be workable as a
quirk for a specific device though.

re: $SUBJECT; Here 'cluster' is defined by the urban dictionary.

[1] http://thread.gmane.org/gmane.comp.emulators.kvm.arm.devel/34/



[Qemu-devel] [RFC PATCH 3/6] KVM: ARM: change __coherent_cache_guest_page interface

2015-03-06 Thread Andrew Jones
Remove the vcpu parameter. We can do this by doing the same
query in the caller of __coherent_cache_guest_page, and then
folding the result into its ipa_uncached parameter, which we
rename to need_flush.

A later patch will add a new caller for __coherent_cache_guest_page
that does not have a vcpu parameter.

Signed-off-by: Andrew Jones 
---
 arch/arm/include/asm/kvm_mmu.h   | 7 ++-
 arch/arm/kvm/mmu.c   | 3 ++-
 arch/arm64/include/asm/kvm_mmu.h | 7 +++
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 37ca2a4c6f094..fd801e96fdd3c 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -183,9 +183,8 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu 
*vcpu)
return (vcpu->arch.cp15[c1_SCTLR] & 0b101) == 0b101;
 }
 
-static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t 
pfn,
-  unsigned long size,
-  bool ipa_uncached)
+static inline void __coherent_cache_guest_page(pfn_t pfn, unsigned long size,
+  bool need_flush)
 {
/*
 * If we are going to insert an instruction page and the icache is
@@ -205,8 +204,6 @@ static inline void __coherent_cache_guest_page(struct 
kvm_vcpu *vcpu, pfn_t pfn,
 * and iterate over the range.
 */
 
-   bool need_flush = !vcpu_has_cache_enabled(vcpu) || ipa_uncached;
-
VM_BUG_ON(size & PAGE_MASK);
 
if (!need_flush && !icache_is_pipt())
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index a806e8cecc01b..781afc712871c 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1157,7 +1157,8 @@ void kvm_arch_mmu_write_protect_pt_masked(struct kvm *kvm,
 static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
  unsigned long size, bool uncached)
 {
-   __coherent_cache_guest_page(vcpu, pfn, size, uncached);
+   bool need_flush = uncached || !vcpu_has_cache_enabled(vcpu);
+   __coherent_cache_guest_page(pfn, size, need_flush);
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 6458b53731421..56a976c776bc2 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -264,13 +264,12 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu 
*vcpu)
return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
-static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t 
pfn,
-  unsigned long size,
-  bool ipa_uncached)
+static inline void __coherent_cache_guest_page(pfn_t pfn, unsigned long size,
+  bool need_flush)
 {
void *va = page_address(pfn_to_page(pfn));
 
-   if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
+   if (need_flush)
kvm_flush_dcache_to_poc(va, size);
 
if (!icache_is_aliasing()) {/* PIPT */
-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 1/6] kvm: promote KVM_MEMSLOT_INCOHERENT to uapi

2015-03-06 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 arch/arm/kvm/mmu.c   | 9 +++--
 include/linux/kvm_host.h | 1 -
 include/uapi/linux/kvm.h | 1 +
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index bcc1b3ad2adce..a806e8cecc01b 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1260,7 +1260,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
if (!hugetlb && !force_pte)
hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
 
-   fault_ipa_uncached = memslot->flags & KVM_MEMSLOT_INCOHERENT;
+   fault_ipa_uncached = memslot->flags & KVM_MEM_INCOHERENT;
 
if (hugetlb) {
pmd_t new_pmd = pfn_pmd(pfn, mem_type);
@@ -1784,15 +1784,20 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct 
kvm_memory_slot *free,
 int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
unsigned long npages)
 {
+#if 1
/*
 * Readonly memslots are not incoherent with the caches by definition,
 * but in practice, they are used mostly to emulate ROMs or NOR flashes
 * that the guest may consider devices and hence map as uncached.
 * To prevent incoherency issues in these cases, tag all readonly
 * regions as incoherent.
+*
+* This heuristic can be removed after userspace has been updated to
+* use KVM_MEM_INCOHERENT on readonly regions when necessary.
 */
if (slot->flags & KVM_MEM_READONLY)
-   slot->flags |= KVM_MEMSLOT_INCOHERENT;
+   slot->flags |= KVM_MEM_INCOHERENT;
+#endif
return 0;
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3b934cc94cc83..9dfb519c51e5b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -43,7 +43,6 @@
  * include/linux/kvm_h.
  */
 #define KVM_MEMSLOT_INVALID(1UL << 16)
-#define KVM_MEMSLOT_INCOHERENT (1UL << 17)
 
 /* Two fragments for cross MMIO pages. */
 #define KVM_MAX_MMIO_FRAGMENTS 2
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a37fd1224f363..f7f9432bcf485 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -108,6 +108,7 @@ struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES(1UL << 0)
 #define KVM_MEM_READONLY   (1UL << 1)
+#define KVM_MEM_INCOHERENT (1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 4/6] KVM: ARM: extend __coherent_cache_guest_page

2015-03-06 Thread Andrew Jones
Also support only invalidating, rather than always invalidate+clear.

Signed-off-by: Andrew Jones 
---
 arch/arm/include/asm/kvm_mmu.h   | 7 +--
 arch/arm/kvm/mmu.c   | 2 +-
 arch/arm64/include/asm/kvm_mmu.h | 7 +--
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index fd801e96fdd3c..a1c7f554f5de8 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -176,7 +176,8 @@ static inline void *kvm_get_hwpgd(struct kvm *kvm)
 
 struct kvm;
 
-#define kvm_flush_dcache_to_poc(a,l)   __cpuc_flush_dcache_area((a), (l))
+#define kvm_flush_dcache_to_poc(a,l)   __cpuc_flush_dcache_area((a), 
(l))
+#define kvm_invalidate_cache_to_poc(a,l)   dmac_unmap_area((a), (l), 0)
 
 static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 {
@@ -184,7 +185,7 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu 
*vcpu)
 }
 
 static inline void __coherent_cache_guest_page(pfn_t pfn, unsigned long size,
-  bool need_flush)
+  bool need_flush, bool invalidate)
 {
/*
 * If we are going to insert an instruction page and the icache is
@@ -214,6 +215,8 @@ static inline void __coherent_cache_guest_page(pfn_t pfn, 
unsigned long size,
 
if (need_flush)
kvm_flush_dcache_to_poc(va, PAGE_SIZE);
+   if (invalidate)
+   kvm_invalidate_cache_to_poc(va, PAGE_SIZE);
 
if (icache_is_pipt())
__cpuc_coherent_user_range((unsigned long)va,
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 781afc712871c..2f3a6581b9200 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1158,7 +1158,7 @@ static void coherent_cache_guest_page(struct kvm_vcpu 
*vcpu, pfn_t pfn,
  unsigned long size, bool uncached)
 {
bool need_flush = uncached || !vcpu_has_cache_enabled(vcpu);
-   __coherent_cache_guest_page(pfn, size, need_flush);
+   __coherent_cache_guest_page(pfn, size, need_flush, false);
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 56a976c776bc2..e1090ad70133d 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -257,7 +257,8 @@ static inline bool kvm_page_empty(void *ptr)
 
 struct kvm;
 
-#define kvm_flush_dcache_to_poc(a,l)   __flush_dcache_area((a), (l))
+#define kvm_flush_dcache_to_poc(a,l)   __flush_dcache_area((a), (l))
+#define kvm_invalidate_cache_to_poc(a,l)   __dma_unmap_area((a), (l), 0)
 
 static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 {
@@ -265,12 +266,14 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu 
*vcpu)
 }
 
 static inline void __coherent_cache_guest_page(pfn_t pfn, unsigned long size,
-  bool need_flush)
+  bool need_flush, bool invalidate)
 {
void *va = page_address(pfn_to_page(pfn));
 
if (need_flush)
kvm_flush_dcache_to_poc(va, size);
+   if (invalidate)
+   kvm_invalidate_cache_to_poc(va, size);
 
if (!icache_is_aliasing()) {/* PIPT */
flush_icache_range((unsigned long)va,
-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 0/6] flush/invalidate on entry/exit

2015-03-06 Thread Andrew Jones
Userspace flags memory regions as incoherent and kvm flushes/
invalidates those regions on entry/exit from userspace.

 Result before patch 6/6: restores coherency, way t sloow
 Result with patch 6/6: fast again - well, we removed the code...

Andrew Jones (6):
  kvm: promote KVM_MEMSLOT_INCOHERENT to uapi
  KVM: Introduce incoherent cache maintenance API
  KVM: ARM: change __coherent_cache_guest_page interface
  KVM: ARM: extend __coherent_cache_guest_page
  KVM: ARM: implement kvm_*_incoherent_memory_regions
  KVM: ARM: no need for kvm_arch_flush_incoherent

 arch/arm/include/asm/kvm_mmu.h| 12 +++
 arch/arm/include/uapi/asm/kvm.h   |  1 +
 arch/arm/kvm/arm.c|  4 +++
 arch/arm/kvm/mmu.c| 72 +--
 arch/arm64/include/asm/kvm_mmu.h  | 12 ---
 arch/arm64/include/uapi/asm/kvm.h |  1 +
 include/linux/kvm_host.h  | 15 +++-
 include/uapi/linux/kvm.h  |  1 +
 virt/kvm/kvm_main.c   | 45 +++-
 9 files changed, 147 insertions(+), 16 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 2/6] HACK: linux header update

2015-03-06 Thread Andrew Jones
Should do a proper update-linux-headers.sh update.

Signed-off-by: Andrew Jones 
---
 linux-headers/linux/kvm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 12045a11c036b..d04e2d781c43b 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -108,6 +108,7 @@ struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES(1UL << 0)
 #define KVM_MEM_READONLY   (1UL << 1)
+#define KVM_MEM_INCOHERENT (1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 2/6] KVM: Introduce incoherent cache maintenance API

2015-03-06 Thread Andrew Jones
Add two new memslot functions to the API

  kvm_flush_incoherent_memory_regions
flush all KVM_MEM_INCOHERENT memslot addresses

  kvm_invalidate_incoherent_memory_regions
invalidate all KVM_MEM_INCOHERENT memslot addresses

Signed-off-by: Andrew Jones 
---
 include/linux/kvm_host.h | 14 ++
 virt/kvm/kvm_main.c  | 45 -
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9dfb519c51e5b..2bdbeeb1b2704 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -517,6 +517,8 @@ int kvm_set_memory_region(struct kvm *kvm,
  struct kvm_userspace_memory_region *mem);
 int __kvm_set_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
+void kvm_flush_incoherent_memory_regions(struct kvm *kvm);
+void kvm_invalidate_incoherent_memory_regions(struct kvm *kvm);
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
   struct kvm_memory_slot *dont);
 int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -705,6 +707,18 @@ static inline bool kvm_arch_has_noncoherent_dma(struct kvm 
*kvm)
 }
 #endif
 
+#ifdef __KVM_HAVE_INCOHERENT_MEM
+void kvm_arch_flush_incoherent(struct kvm *kvm, struct kvm_memory_slot *slot);
+void kvm_arch_invalidate_incoherent(struct kvm *kvm, struct kvm_memory_slot 
*slot);
+#else
+void kvm_arch_flush_incoherent(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+}
+void kvm_arch_invalidate_incoherent(struct kvm *kvm, struct kvm_memory_slot 
*slot)
+{
+}
+#endif
+
 static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
 {
 #ifdef __KVM_HAVE_ARCH_WQP
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 449c1c9ee68b4..96f44c57b8808 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -716,7 +716,7 @@ static void update_memslots(struct kvm_memslots *slots,
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
 {
-   u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
+   u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_INCOHERENT;
 
 #ifdef __KVM_HAVE_READONLY_MEM
valid_flags |= KVM_MEM_READONLY;
@@ -960,6 +960,49 @@ static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
return kvm_set_memory_region(kvm, mem);
 }
 
+static inline bool memory_region_is_incoherent(struct kvm_memory_slot *slot)
+{
+   return slot && slot->id < KVM_USER_MEM_SLOTS
+   && !(slot->flags & KVM_MEMSLOT_INVALID)
+   && slot->flags & KVM_MEM_INCOHERENT;
+}
+
+void kvm_flush_incoherent_memory_regions(struct kvm *kvm)
+{
+   struct kvm_memslots *slots = kvm_memslots(kvm);
+   struct kvm_memory_slot *slot;
+   int idx;
+
+   idx = srcu_read_lock(&kvm->srcu);
+
+   kvm_for_each_memslot(slot, slots) {
+   if (!memory_region_is_incoherent(slot))
+   continue;
+   kvm_arch_flush_incoherent(kvm, slot);
+   }
+   srcu_read_unlock(&kvm->srcu, idx);
+}
+EXPORT_SYMBOL_GPL(kvm_flush_incoherent_memory_regions);
+
+void kvm_invalidate_incoherent_memory_regions(struct kvm *kvm)
+{
+   struct kvm_memslots *slots = kvm_memslots(kvm);
+   struct kvm_memory_slot *slot;
+   int idx;
+
+   idx = srcu_read_lock(&kvm->srcu);
+
+   kvm_for_each_memslot(slot, slots) {
+   if (slot->flags & KVM_MEM_READONLY)
+   continue;
+   if (!memory_region_is_incoherent(slot))
+   continue;
+   kvm_arch_invalidate_incoherent(kvm, slot);
+   }
+   srcu_read_unlock(&kvm->srcu, idx);
+}
+EXPORT_SYMBOL_GPL(kvm_invalidate_incoherent_memory_regions);
+
 int kvm_get_dirty_log(struct kvm *kvm,
struct kvm_dirty_log *log, int *is_dirty)
 {
-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 6/6] KVM: ARM: no need for kvm_arch_flush_incoherent

2015-03-06 Thread Andrew Jones
kvm_arch_flush_incoherent makes things too slow, and we don't
need it. Userspace can flush for us, as the necessary cache
maintenance instruction is not (necessarily) privileged.

Signed-off-by: Andrew Jones 
---
 arch/arm/kvm/mmu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 2f45db9cd436a..b2d87587a9d79 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1822,6 +1822,11 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct 
kvm_memory_slot *slot,
 
 void kvm_arch_flush_incoherent(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
+   /*
+* We no longer need this function, now that userspace does the
+* flushing.
+*/
+#if 0
if (slot->flags & KVM_MEM_READONLY) {
/*
 * Readonly memory shouldn't be changing, and we do a
@@ -1840,6 +1845,7 @@ void kvm_arch_flush_incoherent(struct kvm *kvm, struct 
kvm_memory_slot *slot)
 */
 
coherent_cache_memslot(slot, true);
+#endif
 }
 
 void kvm_arch_invalidate_incoherent(struct kvm *kvm, struct kvm_memory_slot 
*slot)
-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 5/6] KVM: ARM: implement kvm_*_incoherent_memory_regions

2015-03-06 Thread Andrew Jones
Add the kvm_*_incoherent_memory_regions calls to arm's
kvm_arch_vcpu_ioctl_run and implement the corresponding
arch flush/invalidate functions.

Signed-off-by: Andrew Jones 
---
 arch/arm/include/uapi/asm/kvm.h   |  1 +
 arch/arm/kvm/arm.c|  4 +++
 arch/arm/kvm/mmu.c| 54 +++
 arch/arm64/include/uapi/asm/kvm.h |  1 +
 4 files changed, 60 insertions(+)

diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 09ee408c1a676..cb0898a995c4f 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -26,6 +26,7 @@
 #define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_IRQ_LINE
 #define __KVM_HAVE_READONLY_MEM
+#define __KVM_HAVE_INCOHERENT_MEM
 
 #define KVM_REG_SIZE(id)   \
(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index c9e6ef1f7403a..789c03c84e7c0 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -486,6 +486,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
return ret;
}
 
+   kvm_flush_incoherent_memory_regions(vcpu->kvm);
+
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 
@@ -556,6 +558,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
ret = handle_exit(vcpu, run, ret);
}
 
+   kvm_invalidate_incoherent_memory_regions(vcpu->kvm);
+
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &sigsaved, NULL);
return ret;
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 2f3a6581b9200..2f45db9cd436a 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1161,6 +1161,24 @@ static void coherent_cache_guest_page(struct kvm_vcpu 
*vcpu, pfn_t pfn,
__coherent_cache_guest_page(pfn, size, need_flush, false);
 }
 
+static void coherent_cache_memslot(struct kvm_memory_slot *slot, bool flush)
+{
+   gfn_t gfn, end = slot->base_gfn + slot->npages;
+   pfn_t pfn;
+
+   for (gfn = slot->base_gfn; gfn < end; ++gfn) {
+   pfn = gfn_to_pfn_memslot(slot, gfn);
+   if (is_error_pfn(pfn)) {
+   pr_err("%s: Bad pfn: gfn=%llx, pfn=%llx, "
+   "userspace_addr=%lx\n", __func__,
+   gfn, pfn, slot->userspace_addr);
+   continue;
+   }
+   __coherent_cache_guest_page(pfn, PAGE_SIZE, flush, !flush);
+   kvm_release_pfn_clean(pfn);
+   }
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  struct kvm_memory_slot *memslot, unsigned long hva,
  unsigned long fault_status)
@@ -1802,6 +1820,42 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct 
kvm_memory_slot *slot,
return 0;
 }
 
+void kvm_arch_flush_incoherent(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+   if (slot->flags & KVM_MEM_READONLY) {
+   /*
+* Readonly memory shouldn't be changing, and we do a
+* clean+invalidate for KVM_MEM_INCOHERENT memory when
+* faulting it in. So, there's nothing to do now.
+*/
+   return;
+   }
+
+   /*
+* Ideally, we would further filter out all pages not touched by
+* userspace on the last exit. No way to know those though, unless
+* we force userspace to fault on all pages in the incoherent
+* memory regions, but even then, I don't see any sane way for
+* do_wp_page to handle the faults without modification. So, sigh...
+*/
+
+   coherent_cache_memslot(slot, true);
+}
+
+void kvm_arch_invalidate_incoherent(struct kvm *kvm, struct kvm_memory_slot 
*slot)
+{
+   if (slot->flags & KVM_MEM_LOG_DIRTY_PAGES) {
+   /*
+* We fault each write when logging is enabled, and do a
+* clean+invalidate on KVM_MEM_INCOHERENT memory while
+* handling the fault. So, there's nothing to do now.
+*/
+   return;
+   }
+
+   coherent_cache_memslot(slot, false);
+}
+
 void kvm_arch_memslots_updated(struct kvm *kvm)
 {
 }
diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h
index 8e38878c87c61..29ddf77958c2a 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -38,6 +38,7 @@
 #define __KVM_HAVE_GUEST_DEBUG
 #define __KVM_HAVE_IRQ_LINE
 #define __KVM_HAVE_READONLY_MEM
+#define __KVM_HAVE_INCOHERENT_MEM
 
 #define KVM_REG_SIZE(id)   \
(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
-- 
1.8.3.1




[Qemu-devel] [PATCH 3/6] kvm-all: put kvm_mem_flags to more work

2015-03-06 Thread Andrew Jones
Currently kvm_mem_flags just translates bools to bits, let's
make it also determine the bools first. This avoids its parameter
list growing each time we add a flag.

Signed-off-by: Andrew Jones 
---
Posted this, as it makes sense without this series.
http://lists.gnu.org/archive/html/qemu-devel/2015-03/msg01225.html

 kvm-all.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 05a79c20e0bba..507fa7204e062 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -294,10 +294,14 @@ err:
  * dirty pages logging control
  */
 
-static int kvm_mem_flags(KVMState *s, bool log_dirty, bool readonly)
+static int kvm_mem_flags(MemoryRegion *mr)
 {
+bool readonly = mr->readonly || memory_region_is_romd(mr);
 int flags = 0;
-flags = log_dirty ? KVM_MEM_LOG_DIRTY_PAGES : 0;
+
+if (memory_region_is_logging(mr)) {
+flags |= KVM_MEM_LOG_DIRTY_PAGES;
+}
 if (readonly && kvm_readonly_mem_allowed) {
 flags |= KVM_MEM_READONLY;
 }
@@ -312,7 +316,10 @@ static int kvm_slot_dirty_pages_log_change(KVMSlot *mem, 
bool log_dirty)
 
 old_flags = mem->flags;
 
-flags = (mem->flags & ~mask) | kvm_mem_flags(s, log_dirty, false);
+flags = mem->flags & ~mask;
+if (log_dirty) {
+flags |= KVM_MEM_LOG_DIRTY_PAGES;
+}
 mem->flags = flags;
 
 /* If nothing changed effectively, no need to issue ioctl */
@@ -642,9 +649,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, 
bool add)
 KVMSlot *mem, old;
 int err;
 MemoryRegion *mr = section->mr;
-bool log_dirty = memory_region_is_logging(mr);
 bool writeable = !mr->readonly && !mr->rom_device;
-bool readonly_flag = mr->readonly || memory_region_is_romd(mr);
 hwaddr start_addr = section->offset_within_address_space;
 ram_addr_t size = int128_get64(section->size);
 void *ram = NULL;
@@ -688,7 +693,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, 
bool add)
 (ram - start_addr == mem->ram - mem->start_addr)) {
 /* The new slot fits into the existing one and comes with
  * identical parameters - update flags and done. */
-kvm_slot_dirty_pages_log_change(mem, log_dirty);
+kvm_slot_dirty_pages_log_change(mem, memory_region_is_logging(mr));
 return;
 }
 
@@ -721,7 +726,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, 
bool add)
 mem->memory_size = old.memory_size;
 mem->start_addr = old.start_addr;
 mem->ram = old.ram;
-mem->flags = kvm_mem_flags(s, log_dirty, readonly_flag);
+mem->flags = kvm_mem_flags(mr);
 
 err = kvm_set_user_memory_region(s, mem);
 if (err) {
@@ -742,7 +747,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, 
bool add)
 mem->memory_size = start_addr - old.start_addr;
 mem->start_addr = old.start_addr;
 mem->ram = old.ram;
-mem->flags =  kvm_mem_flags(s, log_dirty, readonly_flag);
+mem->flags =  kvm_mem_flags(mr);
 
 err = kvm_set_user_memory_region(s, mem);
 if (err) {
@@ -766,7 +771,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, 
bool add)
 size_delta = mem->start_addr - old.start_addr;
 mem->memory_size = old.memory_size - size_delta;
 mem->ram = old.ram + size_delta;
-mem->flags = kvm_mem_flags(s, log_dirty, readonly_flag);
+mem->flags = kvm_mem_flags(mr);
 
 err = kvm_set_user_memory_region(s, mem);
 if (err) {
@@ -788,7 +793,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, 
bool add)
 mem->memory_size = size;
 mem->start_addr = start_addr;
 mem->ram = ram;
-mem->flags = kvm_mem_flags(s, log_dirty, readonly_flag);
+mem->flags = kvm_mem_flags(mr);
 
 err = kvm_set_user_memory_region(s, mem);
 if (err) {
-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 0/6] support KVM_MEM_INCOHERENT

2015-03-06 Thread Andrew Jones
Add support for the new KVM_MEM_INCOHERENT flag, and flag
appropriate memory. (Only flags vram for now.)

Patch 6/6 doesn't appear to be complete. While the VGA output is
99% corruption free, it's not perfect, so it's missing flushes
somewhere... 

Andrew Jones (6):
  memory: add incoherent cache flag
  HACK: linux header update
  kvm-all: put kvm_mem_flags to more work
  kvm-all: set KVM_MEM_INCOHERENT
  vga: flag vram as incoherent
  memory: add clear_cache_to_poc

 exec.c| 16 ++--
 hw/display/vga.c  |  1 +
 include/exec/exec-all.h   | 41 +
 include/exec/memory.h | 23 +++
 kvm-all.c | 28 ++--
 linux-headers/linux/kvm.h |  1 +
 memory.c  | 15 +++
 7 files changed, 109 insertions(+), 16 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [RFC/WIP PATCH 6/6] memory: add clear_cache_to_poc

2015-03-06 Thread Andrew Jones
Add a function that flushes the cache to PoC. We need a new
function because __builtin___clear_cache only flushes to
PoU. Call this function each time an address in a memory
region that has been flagged as having an incoherent cache
is written. For starters we only implement it for ARM. Most
other architectures don't need it anyway.

Signed-off-by: Andrew Jones 
---
Currently only implemented for aarch64, doesn't completely work yet.

 exec.c  | 16 ++--
 include/exec/exec-all.h | 41 +
 2 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index c85321a38ba69..68268a5961ff5 100644
--- a/exec.c
+++ b/exec.c
@@ -2261,7 +2261,7 @@ int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
 
 #else
 
-static void invalidate_and_set_dirty(hwaddr addr,
+static void invalidate_and_set_dirty(MemoryRegion *mr, hwaddr addr,
  hwaddr length)
 {
 if (cpu_physical_memory_range_includes_clean(addr, length)) {
@@ -2269,6 +2269,10 @@ static void invalidate_and_set_dirty(hwaddr addr,
 cpu_physical_memory_set_dirty_range_nocode(addr, length);
 }
 xen_modified_memory(addr, length);
+if (memory_region_has_incoherent_cache(mr)) {
+char *start = qemu_get_ram_ptr(addr);
+clear_cache_to_poc(start, start + length);
+}
 }
 
 static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
@@ -2348,7 +2352,7 @@ bool address_space_rw(AddressSpace *as, hwaddr addr, 
uint8_t *buf,
 /* RAM case */
 ptr = qemu_get_ram_ptr(addr1);
 memcpy(ptr, buf, l);
-invalidate_and_set_dirty(addr1, l);
+invalidate_and_set_dirty(mr, addr1, l);
 }
 } else {
 if (!memory_access_is_direct(mr, is_write)) {
@@ -2437,7 +2441,7 @@ static inline void 
cpu_physical_memory_write_rom_internal(AddressSpace *as,
 switch (type) {
 case WRITE_DATA:
 memcpy(ptr, buf, l);
-invalidate_and_set_dirty(addr1, l);
+invalidate_and_set_dirty(mr, addr1, l);
 break;
 case FLUSH_CACHE:
 flush_icache_range((uintptr_t)ptr, (uintptr_t)ptr + l);
@@ -2622,7 +2626,7 @@ void address_space_unmap(AddressSpace *as, void *buffer, 
hwaddr len,
 mr = qemu_ram_addr_from_host(buffer, &addr1);
 assert(mr != NULL);
 if (is_write) {
-invalidate_and_set_dirty(addr1, access_len);
+invalidate_and_set_dirty(mr, addr1, access_len);
 }
 if (xen_enabled()) {
 xen_invalidate_map_cache_entry(buffer);
@@ -2904,7 +2908,7 @@ static inline void stl_phys_internal(AddressSpace *as,
 stl_p(ptr, val);
 break;
 }
-invalidate_and_set_dirty(addr1, 4);
+invalidate_and_set_dirty(mr, addr1, 4);
 }
 }
 
@@ -2967,7 +2971,7 @@ static inline void stw_phys_internal(AddressSpace *as,
 stw_p(ptr, val);
 break;
 }
-invalidate_and_set_dirty(addr1, 2);
+invalidate_and_set_dirty(mr, addr1, 2);
 }
 }
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 8eb0db3910e86..9bf74e791f357 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -106,6 +106,43 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
   hwaddr paddr, int prot,
   int mmu_idx, target_ulong size);
 void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr);
+#if defined(__aarch64__)
+static inline void clear_cache_to_poc(char *begin, char *end)
+{
+/* Unfortunately __builtin___clear_cache only flushes
+ * to PoU, we need to implement this for PoC.
+ */
+static unsigned long line_sz = 0;
+unsigned long start, stop, addr;
+
+if (!line_sz) {
+unsigned int ctr_el0;
+asm volatile("mrs %0, ctr_el0" : "=&r" (ctr_el0));
+line_sz = (1UL << ((ctr_el0 >> 16) & 0xf)) * sizeof(int);
+}
+
+start = (unsigned long)begin & ~(line_sz - 1);
+stop = ((unsigned long)(end + line_sz) & ~(line_sz - 1));
+
+for (addr = start; addr < stop; addr += line_sz) {
+asm volatile("dc cvac, %0" : : "r" (addr));
+}
+
+/* FIXME: Ideally, we'd also flush the icache now, just in
+ * case this is for an executable region. But, AArch64 can't
+ * flush it to PoC from userspace. We need a syscall.
+ */
+}
+#elif defined(__arm__)
+static inline void clear_cache_to_poc(char *begin, char *end)
+{
+/* TODO */
+}
+#else
+static inline void clear_cache_to_poc(char *begin, char *end)
+{
+}
+#endif
 #else
 static inline void tlb_flush_page(CPUState *cpu, target_ulong addr)
 {
@@ -114,6 +151,10 @@ static inline void tlb_flush_page(CPUState *cpu, 
target_ulong addr)
 static inline void tlb

[Qemu-devel] [RFC PATCH 1/6] memory: add incoherent cache flag

2015-03-06 Thread Andrew Jones
Add an incoherent cache flag, which indicates the region
needs explicit cache maintenance.

Signed-off-by: Andrew Jones 
---
 include/exec/memory.h | 23 +++
 memory.c  | 15 +++
 2 files changed, 38 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 06ffa1d185b93..c947b88b87241 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -158,6 +158,7 @@ struct MemoryRegion {
 bool rom_device;
 bool warning_printed; /* For reservations */
 bool flush_coalesced_mmio;
+bool incoherent_cache;
 MemoryRegion *alias;
 hwaddr alias_offset;
 int32_t priority;
@@ -778,6 +779,28 @@ void memory_region_set_flush_coalesced(MemoryRegion *mr);
 void memory_region_clear_flush_coalesced(MemoryRegion *mr);
 
 /**
+ * memory_region_set_incoherent_cache: Flag this memory region as needing
+ * explicit cache maintenance.
+ *
+ * @mr: the memory region to be updated.
+ */
+void memory_region_set_incoherent_cache(MemoryRegion *mr);
+
+/**
+ * memory_region_clear_incoherent_cache: Remove the incoherent cache flag.
+ *
+ * @mr: the memory region to be updated.
+ */
+void memory_region_clear_incoherent_cache(MemoryRegion *mr);
+
+/**
+ * memory_region_has_incoherent_cache: Return the incoherent cache flag.
+ *
+ * @mr: the memory region to check.
+ */
+bool memory_region_has_incoherent_cache(MemoryRegion *mr);
+
+/**
  * memory_region_add_eventfd: Request an eventfd to be triggered when a word
  *is written to a location.
  *
diff --git a/memory.c b/memory.c
index 20f6d9eeac737..fa74bcb8c1e4c 100644
--- a/memory.c
+++ b/memory.c
@@ -1549,6 +1549,21 @@ void memory_region_clear_flush_coalesced(MemoryRegion 
*mr)
 }
 }
 
+void memory_region_set_incoherent_cache(MemoryRegion *mr)
+{
+mr->incoherent_cache = true;
+}
+
+void memory_region_clear_incoherent_cache(MemoryRegion *mr)
+{
+mr->incoherent_cache = false;
+}
+
+bool memory_region_has_incoherent_cache(MemoryRegion *mr)
+{
+return mr->incoherent_cache;
+}
+
 void memory_region_add_eventfd(MemoryRegion *mr,
hwaddr addr,
unsigned size,
-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 4/6] kvm-all: set KVM_MEM_INCOHERENT

2015-03-06 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 kvm-all.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kvm-all.c b/kvm-all.c
index 507fa7204e062..924b4a0bec21c 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -305,6 +305,9 @@ static int kvm_mem_flags(MemoryRegion *mr)
 if (readonly && kvm_readonly_mem_allowed) {
 flags |= KVM_MEM_READONLY;
 }
+if (memory_region_has_incoherent_cache(mr)) {
+flags |= KVM_MEM_INCOHERENT;
+}
 return flags;
 }
 
-- 
1.8.3.1




[Qemu-devel] [RFC PATCH 5/6] vga: flag vram as incoherent

2015-03-06 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 hw/display/vga.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/display/vga.c b/hw/display/vga.c
index c8c49abc6e8ba..2b74eb8e96462 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -2135,6 +2135,7 @@ void vga_common_init(VGACommonState *s, Object *obj, bool 
global_vmstate)
 s->is_vbe_vmstate = 1;
 memory_region_init_ram(&s->vram, obj, "vga.vram", s->vram_size,
&error_abort);
+memory_region_set_incoherent_cache(&s->vram);
 vmstate_register_ram(&s->vram, global_vmstate ? NULL : DEVICE(obj));
 xen_register_framebuffer(&s->vram);
 s->vram_ptr = memory_region_get_ram_ptr(&s->vram);
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH 1/5] target-arm: convert check_ap to get_rw_prot

2015-03-10 Thread Andrew Jones
On Tue, Mar 10, 2015 at 03:12:20PM +, Peter Maydell wrote:
> On 12 February 2015 at 15:05, Andrew Jones  wrote:
> > +*prot = get_rw_prot(env, mmu_idx, ap, domain_prot);
> > +*prot |= *prot && !xn ? PAGE_EXEC : 0;
> 
> I'm not a great fan of complicated ?: expressions, incidentally;
> I think this is clearer to read as
>if (*prot && !xn) {
>*prot |= PAGE_EXEC;
>}
> 
> -- PMM

OK, I'll change it if we need a v2.

Thanks,
drew



Re: [Qemu-devel] [PATCH 2/5] target-arm: enable get_rw_prot to take simple AP

2015-03-10 Thread Andrew Jones
On Tue, Mar 10, 2015 at 03:22:55PM +, Peter Maydell wrote:
> On 12 February 2015 at 15:05, Andrew Jones  wrote:
> > Teach get_rw_prot about the simple AP format AP[2:1]. An additional
> > switch was added, as opposed to converting ap := AP[2:1] to AP[2:0]
> > with a simple shift - and then modifying cases 0,2,4,6, because the
> > resulting code is easier to read with the switch.
> >
> > Signed-off-by: Andrew Jones 
> > ---
> >  target-arm/helper.c | 22 +-
> >  1 file changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/target-arm/helper.c b/target-arm/helper.c
> > index 610f305c4d661..b63ec7b7979f9 100644
> > --- a/target-arm/helper.c
> > +++ b/target-arm/helper.c
> > @@ -4698,12 +4698,32 @@ static inline bool regime_is_user(CPUARMState *env, 
> > ARMMMUIdx mmu_idx)
> >  static inline int get_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
> >int ap, int domain_prot)
> >  {
> > +bool simple_ap = regime_using_lpae_format(env, mmu_idx)
> > + || (regime_sctlr(env, mmu_idx) & SCTLR_AFE);
> 
> We should check arm_feature(env, ARM_FEATURE_V6K) && (SCTLR.AFE is set);
> that bit isn't defined til v6K.

Indeed. Will send v2 for that.

> 
> > +bool domain_prot_valid = !regime_using_lpae_format(env, mmu_idx);
> 
> Given that the lpae code path is totally separate (and not even
> calling this function yet), can't you just have it pass in a
> zero domain_prot ? Or have the callers do the domain protection
> check themselves...

domain_prot=0 is a valid access permission (no access), so I didn't
want to overload the meaning with 'not used'. I can move the check to
the callers that need it though. It would actually be nice to remove
the need for a 0 place holder from the other callers.

> 
> >  bool is_user = regime_is_user(env, mmu_idx);
> >
> > -if (domain_prot == 3) {
> > +if (domain_prot_valid && domain_prot == 3) {
> >  return PAGE_READ | PAGE_WRITE;
> >  }
> >
> > +/* ap is AP[2:1] */
> > +if (simple_ap) {
> > +switch (ap) {
> > +case 0:
> > +return is_user ? 0 : PAGE_READ | PAGE_WRITE;
> > +case 1:
> > +return PAGE_READ | PAGE_WRITE;
> > +case 2:
> > +return is_user ? 0 : PAGE_READ;
> > +case 3:
> > +return PAGE_READ;
> > +default:
> > +g_assert_not_reached();
> > +}
> > +}
> 
> I'm confused. Even if we're using the simple-permissions
> model, the ap parameter is still AP[2:0]. Shouldn't this
> switch be for cases 0, 2, 4, 6 ?

Depends on how we choose to implement the callers. Currently
I only require the caller to send in 2 bits for the simple
model. If we want to require them to send in 3, then we'll
need to shift a zero in for the lpae caller, rather than
shift a zero out for the v6 caller.

> 
> > +
> > +/* ap is AP[2:0] */
> >  switch (ap) {
> >  case 0:
> >  if (arm_feature(env, ARM_FEATURE_V7)) {
> > --
> > 1.9.3
> >
> 
> -- PMM



Re: [Qemu-devel] [PATCH 4/5] target-arm: get_phys_addr_lpae: more xn control

2015-03-10 Thread Andrew Jones
On Tue, Mar 10, 2015 at 03:56:11PM +, Peter Maydell wrote:
> On 12 February 2015 at 15:05, Andrew Jones  wrote:
> > This patch makes the following changes to the determination of
> > whether an address is executable, when translating addresses
> > using LPAE.
> >
> > 1. No longer assumes that PL0 can't execute when it can't read.
> >It can in AArch64, a difference from AArch32.
> > 2. Use va_size == 64 to determine we're in AArch64, rather than
> >arm_feature(env, ARM_FEATURE_V8), which is insufficient.
> > 3. Add additional XN determinants
> >- NS && is_secure && (SCR & SCR_SIF)
> >- WXN && (prot & PAGE_WRITE)
> >- AArch64: (prot_PL0 & PAGE_WRITE)
> >- AArch32: UWXN && (prot_PL0 & PAGE_WRITE)
> >- XN determination should also work in secure mode (untested)
> >- XN may even work in EL2 (currently impossible to test)
> > 4. Cleans up the bloated PAGE_EXEC condition - by removing it.
> >
> > The helper get_S1prot is introduced, which also handles short-
> > descriptors and v6 XN determination. It may even work in EL2,
> > when support for that comes, but, as the function name implies,
> > it only works for stage 1 translations.
> >
> > Signed-off-by: Andrew Jones 
> > ---
> >  target-arm/helper.c | 111 
> > 
> >  1 file changed, 86 insertions(+), 25 deletions(-)
> >
> > diff --git a/target-arm/helper.c b/target-arm/helper.c
> > index df78f481e92f4..20e5753bd216d 100644
> > --- a/target-arm/helper.c
> > +++ b/target-arm/helper.c
> > @@ -4695,8 +4695,8 @@ static inline bool regime_is_user(CPUARMState *env, 
> > ARMMMUIdx mmu_idx)
> >  /* Translate section/page access permissions to page
> >   * R/W protection flags
> >   */
> > -static inline int get_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
> > -  bool is_user, int ap, int domain_prot)
> > +static int get_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
> > +   bool is_user, int ap, int domain_prot)
> 
> ...why does this suddenly lose its 'inline' ?

Adding another caller, and thought it was a bit fat for explicit
inlining, but have no problem returning it.

> 
> >  {
> >  bool simple_ap = regime_using_lpae_format(env, mmu_idx)
> >   || (regime_sctlr(env, mmu_idx) & SCTLR_AFE);
> > @@ -4762,6 +4762,84 @@ static inline int get_rw_prot(CPUARMState *env, 
> > ARMMMUIdx mmu_idx,
> >  }
> >  }
> >
> > +/* Translate section/page access permissions to protection flags */
> 
> This is LPAE-format only so it would be nice to mention that in the comment
> and function name.

Not after the next patch. I probably should have made it completely
LPAE-only first, then added two more patches, one preparing it for
non-LPAE, and then the next patch, or just done a better job pointing
that out in the commit message.

> 
> > +static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
> > +  int ap, int domain_prot, int ns, int xn, int pxn)
> > +{
> > +bool domain_prot_valid = !regime_using_lpae_format(env, mmu_idx);
> 
> But this is always false!

Same response as above - not after next patch.

> 
> > +bool is_user = regime_is_user(env, mmu_idx);
> > +bool have_wxn;
> > +int prot_rw, user_rw;
> > +int wxn = 0;
> > +
> > +assert(mmu_idx != ARMMMUIdx_S2NS);
> > +
> > +if (domain_prot_valid && domain_prot == 3) {
> > +return PAGE_READ | PAGE_WRITE | PAGE_EXEC;
> > +}
> > +
> > +user_rw = get_rw_prot(env, mmu_idx, true, ap, domain_prot);
> > +if (is_user) {
> > +prot_rw = user_rw;
> > +} else {
> > +prot_rw = get_rw_prot(env, mmu_idx, false, ap, domain_prot);
> > +}
> 
> I think it would be much better not to try to use the short-descriptor
> get_rw_prot function. For one thing, we know for definite that we
> won't be using the old-fashioned AP[2:0] access format, and that
> we don't have to worry about the domain protection stuff. So it's
> much simpler and better not to tangle it up with the legacy stuff.
> (Pull the simple-access-permissions check out into its own function
> if you like.)

legacy stuff is here for next patch too

> 
> For instance, you're missing a shift here on the ap bits, because
> get_rw_prot needs AP[2:0] and 'ap' here is AP[2:1].

Don't need the shift because get_rw_prot support

Re: [Qemu-devel] [PATCH 5/5] target-arm: apply get_S1prot to get_phys_addr_v6

2015-03-10 Thread Andrew Jones
On Tue, Mar 10, 2015 at 03:57:27PM +, Peter Maydell wrote:
> On 12 February 2015 at 17:08, Andrew Jones  wrote:
> > On Thu, Feb 12, 2015 at 04:05:07PM +0100, Andrew Jones wrote:
> >> Now that we have get_S1prot, we can apply it to get_phys_addr_v6
> >> for a minor code cleanup.
> 
> I think this is a bad idea -- better to keep the long
> and short descriptor code paths separate. It's too easy
> to get confused otherwise.

I don't mind keeping them separate, but I disagree with it being
confusing keeping them together :-)

> 
> > Actually, I should point out that this isn't just a cleanup, but
> > also a fix. See below.
> 
> > The original code didn't take into account that it may be calling check_ap
> > with a simple AP, AP[2:1].
> 
> No, because check_ap() always takes AP[2:0]...

No, it's really wrong. It's not the 2 vs. 3 bit issue that's the
problem, it's the cases. You snipped most of my reply to myself.
This part is pertinent

> As a simple AP wouldn't be properly translated to protection flags with
> check_ap (except for case 6), then I think this should have caused some
> problems. Maybe this path just hasn't been tested? I don't see CR_AFE
> getting used by Linux, so possibly not.

So, yes, a simple (3-bit) ap would be handled by the 8-case switch with
cases 0, 2, 4, and 6, but only case 6 would give the correct result.

Thanks for the review.

drew



Re: [Qemu-devel] [PATCH 2/5] target-arm: enable get_rw_prot to take simple AP

2015-03-10 Thread Andrew Jones
On Tue, Mar 10, 2015 at 04:41:45PM +, Peter Maydell wrote:
> On 10 March 2015 at 16:32, Andrew Jones  wrote:
> > On Tue, Mar 10, 2015 at 03:22:55PM +, Peter Maydell wrote:
> >> I'm confused. Even if we're using the simple-permissions
> >> model, the ap parameter is still AP[2:0]. Shouldn't this
> >> switch be for cases 0, 2, 4, 6 ?
> >
> > Depends on how we choose to implement the callers. Currently
> > I only require the caller to send in 2 bits for the simple
> > model. If we want to require them to send in 3, then we'll
> > need to shift a zero in for the lpae caller, rather than
> > shift a zero out for the v6 caller.
> 
> You have to have the callers just pass in AP[2:0], unless
> you want them to have to duplicate the "are we using the
> simple permissions model?" condition to figure out whether
> to shift the argument around, which doesn't seem very
> sensible.
>

The v6 caller is the only one that could be either-or, and it
already checked regime_sctlr(env, mmu_idx) & SCTLR_AFE, as it
wanted to see if it cared about the access flag anyway. I
suspect any new callers that could be either-or would do the
same.

drew



Re: [Qemu-devel] [PATCH 4/5] target-arm: get_phys_addr_lpae: more xn control

2015-03-10 Thread Andrew Jones
On Tue, Mar 10, 2015 at 04:55:53PM +, Peter Maydell wrote:
> On 10 March 2015 at 16:48, Andrew Jones  wrote:
> > On Tue, Mar 10, 2015 at 03:56:11PM +, Peter Maydell wrote:
> 
> >> For instance, you're missing a shift here on the ap bits, because
> >> get_rw_prot needs AP[2:0] and 'ap' here is AP[2:1].
> >
> > Don't need the shift because get_rw_prot supports the 2-bit format.
> 
> No it doesn't...

Yes it does :-) That's the support patch 2/5 adds.

> 
> >> Doesn't this lose us the "you need read permission to execute"
> >> check (for 32-bit)? Something in here should be doing a
> >> PAGE_READ check to see if we can have PAGE_EXEC.
> >
> > It's there. It's the '!user_rw' and the '!prot_rw'
> 
> Ah yes, and that works because you can't have a page which
> is writable but not readable (which is what I'd forgotten).
> 
> -- PMM



Re: [Qemu-devel] [PATCH 5/5] target-arm: apply get_S1prot to get_phys_addr_v6

2015-03-10 Thread Andrew Jones
On Tue, Mar 10, 2015 at 05:03:48PM +, Peter Maydell wrote:
> On 10 March 2015 at 16:54, Andrew Jones  wrote:
> > On Tue, Mar 10, 2015 at 03:57:27PM +, Peter Maydell wrote:
> >> On 12 February 2015 at 17:08, Andrew Jones  wrote:
> >> > Actually, I should point out that this isn't just a cleanup, but
> >> > also a fix. See below.
> >>
> >> > The original code didn't take into account that it may be calling 
> >> > check_ap
> >> > with a simple AP, AP[2:1].
> >>
> >> No, because check_ap() always takes AP[2:0]...
> >
> > No, it's really wrong. It's not the 2 vs. 3 bit issue that's the
> > problem, it's the cases. You snipped most of my reply to myself.
> > This part is pertinent
> >
> >> As a simple AP wouldn't be properly translated to protection flags with
> >> check_ap (except for case 6), then I think this should have caused some
> >> problems. Maybe this path just hasn't been tested? I don't see CR_AFE
> >> getting used by Linux, so possibly not.
> >
> > So, yes, a simple (3-bit) ap would be handled by the 8-case switch with
> > cases 0, 2, 4, and 6, but only case 6 would give the correct result.
> 
> Well, we didn't correctly support the simple permission model at all
> before your patches. But the point is that check_ap() always takes
> AP[2:0] regardless.
> 
> Hmm, or you could have check_simple_ap() be totally separate
> from check_ap(), and call it from inside the check in the v6
> code path for AFE that we already have.

Agreed. Creating a check_simple_ap function for the new switch added
by 2/5 would look better.

drew



Re: [Qemu-devel] [PATCH 4/5] target-arm: get_phys_addr_lpae: more xn control

2015-03-10 Thread Andrew Jones
On Tue, Mar 10, 2015 at 05:14:21PM +, Peter Maydell wrote:
> On 10 March 2015 at 17:02, Andrew Jones  wrote:
> > On Tue, Mar 10, 2015 at 04:55:53PM +, Peter Maydell wrote:
> >> On 10 March 2015 at 16:48, Andrew Jones  wrote:
> >> > On Tue, Mar 10, 2015 at 03:56:11PM +, Peter Maydell wrote:
> >>
> >> >> For instance, you're missing a shift here on the ap bits, because
> >> >> get_rw_prot needs AP[2:0] and 'ap' here is AP[2:1].
> >> >
> >> > Don't need the shift because get_rw_prot supports the 2-bit format.
> >>
> >> No it doesn't...
> >
> > Yes it does :-) That's the support patch 2/5 adds.
> 
> No, because patch 2 doesn't do anything in the callers to
> make them pass only bits [2:1]. So after patch 2 get_rw_prot
> still requires bits [2:0]. Except it's broken, because the
> function itself assumes it gets bits [2:1].

You've lost me. Patch 2 adds support for 2-bit ap, but
doesn't remove support for 3-bit. There are not callers
expecting it to support the simple model as 2 or 3 bits
at that time, except v6, but that was broken already, and
we fix it in patch 5 (and we add the ap shift there too).
IOW, how can preparing a function for new callers, while
still supporting the old callers, be 'broken'?

> 
> Having thought a bit more about it, I think we should
> just have two totally separate functions for the old
> style and simplified permission checks, because we can
> always call the correct one (always old for v5, either
> one for v6 since we already have the SCTLR.AFE check,
> and the simplified one for the lpae code path).

I like that.

Thanks,
drew



[Qemu-devel] [PATCH v2 0/3] tcg-arm: LPAE: fix and extend xn control

2015-03-10 Thread Andrew Jones
This series fixes and extends the determination of whether or
not an address is executable for LPAE translations. The main
patch is 3/3, and describes the details in its commit message.
Patch 1/3 prepares for patch 2/3, which is prep for 3/3, and also
fixes a potential problem with checking access permissions on
v6 processors when they use the simple AP format.

Changes since v1:
* There are now 3 patches instead of 5. This is due to approaching
  the prep patches in a different way. Most notably, simple AP
  format is now handled with its own function. [Peter Maydell]
* A check of SCTLR_AFE is guarded with ARM_FEATURE_V6K [Peter Maydell]
* All other cleanups suggested by Peter Maydell.

Tested by booting Linux on mach-virt with cortex-a15 and cortex-a57
(just up to 'Unable to mount root fs'), and also with a kvm-unit-tests
test. The curious can check out the unit test here [1].

Thanks in advance for reviews!

drew

[1] 
https://github.com/rhdrjones/kvm-unit-tests/commit/ee553e4bb795b0150e31c794bf8953dfb08d619a

Andrew Jones (3):
  target-arm: convert check_ap to ap_to_rw_prot
  target-arm: fix get_phys_addr_v6/SCTLR_AFE access check
  target-arm: get_phys_addr_lpae: more xn control

 target-arm/helper.c | 213 +---
 1 file changed, 154 insertions(+), 59 deletions(-)

-- 
1.9.3




[Qemu-devel] [PATCH v2 1/3] target-arm: convert check_ap to ap_to_rw_prot

2015-03-10 Thread Andrew Jones
Instead of mixing access permission checking with access permissions
to page protection flags translation, just do the translation, and
leave it to the caller to check the protection flags against the access
type. Also rename to ap_to_rw_prot to better describe the new behavior.

Signed-off-by: Andrew Jones 
Reviewed-by: Peter Maydell 
---
 target-arm/helper.c | 49 +++--
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 3bc20af04f012..d1e70a86a647c 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4903,34 +4903,23 @@ static inline bool regime_is_user(CPUARMState *env, 
ARMMMUIdx mmu_idx)
 }
 }
 
-/* Check section/page access permissions.
-   Returns the page protection flags, or zero if the access is not
-   permitted.  */
-static inline int check_ap(CPUARMState *env, ARMMMUIdx mmu_idx,
-   int ap, int domain_prot,
-   int access_type)
-{
-int prot_ro;
+/* Translate section/page access permissions to page
+ * R/W protection flags
+ */
+static inline int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
+int ap, int domain_prot)
+{
 bool is_user = regime_is_user(env, mmu_idx);
 
 if (domain_prot == 3) {
 return PAGE_READ | PAGE_WRITE;
 }
 
-if (access_type == 1) {
-prot_ro = 0;
-} else {
-prot_ro = PAGE_READ;
-}
-
 switch (ap) {
 case 0:
 if (arm_feature(env, ARM_FEATURE_V7)) {
 return 0;
 }
-if (access_type == 1) {
-return 0;
-}
 switch (regime_sctlr(env, mmu_idx) & (SCTLR_S | SCTLR_R)) {
 case SCTLR_S:
 return is_user ? 0 : PAGE_READ;
@@ -4943,7 +4932,7 @@ static inline int check_ap(CPUARMState *env, ARMMMUIdx 
mmu_idx,
 return is_user ? 0 : PAGE_READ | PAGE_WRITE;
 case 2:
 if (is_user) {
-return prot_ro;
+return PAGE_READ;
 } else {
 return PAGE_READ | PAGE_WRITE;
 }
@@ -4952,16 +4941,16 @@ static inline int check_ap(CPUARMState *env, ARMMMUIdx 
mmu_idx,
 case 4: /* Reserved.  */
 return 0;
 case 5:
-return is_user ? 0 : prot_ro;
+return is_user ? 0 : PAGE_READ;
 case 6:
-return prot_ro;
+return PAGE_READ;
 case 7:
 if (!arm_feature(env, ARM_FEATURE_V6K)) {
 return 0;
 }
-return prot_ro;
+return PAGE_READ;
 default:
-abort();
+g_assert_not_reached();
 }
 }
 
@@ -5083,12 +5072,12 @@ static int get_phys_addr_v5(CPUARMState *env, uint32_t 
address, int access_type,
 }
 code = 15;
 }
-*prot = check_ap(env, mmu_idx, ap, domain_prot, access_type);
-if (!*prot) {
+*prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
+*prot |= *prot ? PAGE_EXEC : 0;
+if (!(*prot & (1 << access_type))) {
 /* Access permission fault.  */
 goto do_fault;
 }
-*prot |= PAGE_EXEC;
 *phys_ptr = phys_addr;
 return 0;
 do_fault:
@@ -5204,14 +5193,14 @@ static int get_phys_addr_v6(CPUARMState *env, uint32_t 
address, int access_type,
 code = (code == 15) ? 6 : 3;
 goto do_fault;
 }
-*prot = check_ap(env, mmu_idx, ap, domain_prot, access_type);
-if (!*prot) {
+*prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
+if (*prot && !xn) {
+*prot |= PAGE_EXEC;
+}
+if (!(*prot & (1 << access_type))) {
 /* Access permission fault.  */
 goto do_fault;
 }
-if (!xn) {
-*prot |= PAGE_EXEC;
-}
 }
 *phys_ptr = phys_addr;
 return 0;
-- 
1.9.3




[Qemu-devel] [PATCH v2 2/3] target-arm: fix get_phys_addr_v6/SCTLR_AFE access check

2015-03-10 Thread Andrew Jones
Introduce simple_ap_to_rw_prot(), which has the same behavior as
ap_to_rw_prot(), but takes the 2-bit simple AP[2:1] instead of
the 3-bit AP[2:0]. Use this in get_phys_addr_v6 when SCTLR_AFE
is set, as that bit indicates we should be using the simple AP
format.

It's unlikely this path is getting used. I don't see CR_AFE
getting used by Linux, so possibly not. If it had been, then
the check would have been wrong for all but AP[2:1] = 0b11.
Anyway, this should fix it up, in case it ever does get used.

Signed-off-by: Andrew Jones 
---
 target-arm/helper.c | 49 ++---
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index d1e70a86a647c..d996659652f8d 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4905,6 +4905,11 @@ static inline bool regime_is_user(CPUARMState *env, 
ARMMMUIdx mmu_idx)
 
 /* Translate section/page access permissions to page
  * R/W protection flags
+ *
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ * @ap:  The 3-bit access permissions (AP[2:0])
+ * @domain_prot: The 2-bit domain access permissions
  */
 static inline int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
 int ap, int domain_prot)
@@ -4954,6 +4959,32 @@ static inline int ap_to_rw_prot(CPUARMState *env, 
ARMMMUIdx mmu_idx,
 }
 }
 
+/* Translate section/page access permissions to page
+ * R/W protection flags.
+ *
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ * @ap:  The 2-bit simple AP (AP[2:1])
+ */
+static inline int
+simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+{
+bool is_user = regime_is_user(env, mmu_idx);
+
+switch (ap) {
+case 0:
+return is_user ? 0 : PAGE_READ | PAGE_WRITE;
+case 1:
+return PAGE_READ | PAGE_WRITE;
+case 2:
+return is_user ? 0 : PAGE_READ;
+case 3:
+return PAGE_READ;
+default:
+g_assert_not_reached();
+}
+}
+
 static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
  uint32_t *table, uint32_t address)
 {
@@ -5186,14 +5217,18 @@ static int get_phys_addr_v6(CPUARMState *env, uint32_t 
address, int access_type,
 if (xn && access_type == 2)
 goto do_fault;
 
-/* The simplified model uses AP[0] as an access control bit.  */
-if ((regime_sctlr(env, mmu_idx) & SCTLR_AFE)
-&& (ap & 1) == 0) {
-/* Access flag fault.  */
-code = (code == 15) ? 6 : 3;
-goto do_fault;
+if (arm_feature(env, ARM_FEATURE_V6K) &&
+(regime_sctlr(env, mmu_idx) & SCTLR_AFE)) {
+/* The simplified model uses AP[0] as an access control bit.  */
+if ((ap & 1) == 0) {
+/* Access flag fault.  */
+code = (code == 15) ? 6 : 3;
+goto do_fault;
+}
+*prot = simple_ap_to_rw_prot(env, mmu_idx, ap >> 1);
+} else {
+*prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
 }
-*prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
 if (*prot && !xn) {
 *prot |= PAGE_EXEC;
 }
-- 
1.9.3




[Qemu-devel] [PATCH v2 3/3] target-arm: get_phys_addr_lpae: more xn control

2015-03-10 Thread Andrew Jones
This patch makes the following changes to the determination of
whether an address is executable, when translating addresses
using LPAE.

1. No longer assumes that PL0 can't execute when it can't read.
   It can in AArch64, a difference from AArch32.
2. Use va_size == 64 to determine we're in AArch64, rather than
   arm_feature(env, ARM_FEATURE_V8), which is insufficient.
3. Add additional XN determinants
   - NS && is_secure && (SCR & SCR_SIF)
   - WXN && (prot & PAGE_WRITE)
   - AArch64: (prot_PL0 & PAGE_WRITE)
   - AArch32: UWXN && (prot_PL0 & PAGE_WRITE)
   - XN determination should also work in secure mode (untested)
   - XN may even work in EL2 (currently impossible to test)
4. Cleans up the bloated PAGE_EXEC condition - by removing it.

The helper get_S1prot is introduced. It may even work in EL2,
when support for that comes, but, as the function name implies,
it only works for stage 1 translations.

Signed-off-by: Andrew Jones 
---
 target-arm/helper.c | 129 
 1 file changed, 100 insertions(+), 29 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index d996659652f8d..c457e9ab8c85a 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4962,15 +4962,11 @@ static inline int ap_to_rw_prot(CPUARMState *env, 
ARMMMUIdx mmu_idx,
 /* Translate section/page access permissions to page
  * R/W protection flags.
  *
- * @env: CPUARMState
- * @mmu_idx: MMU index indicating required translation regime
  * @ap:  The 2-bit simple AP (AP[2:1])
+ * @is_user: TRUE if accessing from PL0
  */
-static inline int
-simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+static inline int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
 {
-bool is_user = regime_is_user(env, mmu_idx);
-
 switch (ap) {
 case 0:
 return is_user ? 0 : PAGE_READ | PAGE_WRITE;
@@ -4985,6 +4981,94 @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx 
mmu_idx, int ap)
 }
 }
 
+static inline int
+simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+{
+return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
+}
+
+/* Translate section/page access permissions to protection flags
+ *
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ * @is_aa64: TRUE if AArch64
+ * @ap:  The 2-bit simple AP (AP[2:1])
+ * @ns:  NS (non-secure) bit
+ * @xn:  XN (execute-never) bit
+ * @pxn: PXN (privileged execute-never) bit
+ */
+static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
+  int ap, int ns, int xn, int pxn)
+{
+bool is_user = regime_is_user(env, mmu_idx);
+int prot_rw, user_rw;
+bool have_wxn;
+int wxn = 0;
+
+assert(mmu_idx != ARMMMUIdx_S2NS);
+
+user_rw = simple_ap_to_rw_prot_is_user(ap, true);
+if (is_user) {
+prot_rw = user_rw;
+} else {
+prot_rw = simple_ap_to_rw_prot_is_user(ap, false);
+}
+
+if (ns && arm_is_secure(env) && (env->cp15.scr_el3 & SCR_SIF)) {
+return prot_rw;
+}
+
+/* TODO have_wxn should be replaced with
+ *   ARM_FEATURE_V8 || (ARM_FEATURE_V7 && ARM_FEATURE_EL2)
+ * when ARM_FEATURE_EL2 starts getting set. For now we assume all LPAE
+ * compatible processors have EL2, which is required for [U]WXN.
+ */
+have_wxn = arm_feature(env, ARM_FEATURE_LPAE);
+
+if (have_wxn) {
+wxn = regime_sctlr(env, mmu_idx) & SCTLR_WXN;
+}
+
+if (is_aa64) {
+switch (regime_el(env, mmu_idx)) {
+case 1:
+if (is_user && !user_rw) {
+wxn = 0;
+} else if (!is_user) {
+xn = pxn || (user_rw & PAGE_WRITE);
+}
+break;
+case 2:
+case 3:
+break;
+}
+} else if (arm_feature(env, ARM_FEATURE_V7)) {
+switch (regime_el(env, mmu_idx)) {
+case 1:
+case 3:
+if (is_user) {
+xn = xn || !user_rw;
+} else {
+int uwxn = 0;
+if (have_wxn) {
+uwxn = regime_sctlr(env, mmu_idx) & SCTLR_UWXN;
+}
+xn = xn || !prot_rw || pxn || (uwxn && (user_rw & PAGE_WRITE));
+}
+break;
+case 2:
+break;
+}
+} else {
+xn = wxn = 0;
+}
+
+if (xn || (wxn && (prot_rw & PAGE_WRITE))) {
+return prot_rw;
+}
+return prot_rw | PAGE_EXEC;
+}
+
 static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
  uint32_t *table, uint32_t address)
 {
@@ -5273,8 +5357,8 @@ static int get_phys_addr_lpae(CPUARMState *env, 
target_ulong address,
 int32_t granule_sz = 9;
 int32_t va_size = 32;
 int32_t tbi =

Re: [Qemu-devel] [PATCH 4/5] target-arm: get_phys_addr_lpae: more xn control

2015-03-11 Thread Andrew Jones
On Thu, Feb 12, 2015 at 04:05:06PM +0100, Andrew Jones wrote:
> This patch makes the following changes to the determination of
> whether an address is executable, when translating addresses
> using LPAE.
> 
> 1. No longer assumes that PL0 can't execute when it can't read.
>It can in AArch64, a difference from AArch32.
> 2. Use va_size == 64 to determine we're in AArch64, rather than
>arm_feature(env, ARM_FEATURE_V8), which is insufficient.
> 3. Add additional XN determinants
>- NS && is_secure && (SCR & SCR_SIF)
>- WXN && (prot & PAGE_WRITE)
>- AArch64: (prot_PL0 & PAGE_WRITE)
>- AArch32: UWXN && (prot_PL0 & PAGE_WRITE)
>- XN determination should also work in secure mode (untested)
>- XN may even work in EL2 (currently impossible to test)
> 4. Cleans up the bloated PAGE_EXEC condition - by removing it.
> 
> The helper get_S1prot is introduced, which also handles short-
> descriptors and v6 XN determination. It may even work in EL2,
> when support for that comes, but, as the function name implies,
> it only works for stage 1 translations.
> 
> Signed-off-by: Andrew Jones 
> ---
>  target-arm/helper.c | 111 
> 
>  1 file changed, 86 insertions(+), 25 deletions(-)
> 
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index df78f481e92f4..20e5753bd216d 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -4695,8 +4695,8 @@ static inline bool regime_is_user(CPUARMState *env, 
> ARMMMUIdx mmu_idx)
>  /* Translate section/page access permissions to page
>   * R/W protection flags
>   */
> -static inline int get_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
> -  bool is_user, int ap, int domain_prot)
> +static int get_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
> +   bool is_user, int ap, int domain_prot)
>  {
>  bool simple_ap = regime_using_lpae_format(env, mmu_idx)
>   || (regime_sctlr(env, mmu_idx) & SCTLR_AFE);
> @@ -4762,6 +4762,84 @@ static inline int get_rw_prot(CPUARMState *env, 
> ARMMMUIdx mmu_idx,
>  }
>  }
>  
> +/* Translate section/page access permissions to protection flags */
> +static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
> +  int ap, int domain_prot, int ns, int xn, int pxn)
> +{
> +bool domain_prot_valid = !regime_using_lpae_format(env, mmu_idx);
> +bool is_user = regime_is_user(env, mmu_idx);
> +bool have_wxn;
> +int prot_rw, user_rw;
> +int wxn = 0;
> +
> +assert(mmu_idx != ARMMMUIdx_S2NS);
> +
> +if (domain_prot_valid && domain_prot == 3) {
> +return PAGE_READ | PAGE_WRITE | PAGE_EXEC;
> +}
> +
> +user_rw = get_rw_prot(env, mmu_idx, true, ap, domain_prot);
> +if (is_user) {
> +prot_rw = user_rw;
> +} else {
> +prot_rw = get_rw_prot(env, mmu_idx, false, ap, domain_prot);
> +}
> +
> +if (ns && arm_is_secure(env) && (env->cp15.scr_el3 & SCR_SIF)) {
> +return prot_rw;
> +}
> +
> +/* TODO have_wxn should be replaced with arm_feature(env, 
> ARM_FEATURE_EL2),
> + * when ARM_FEATURE_EL2 starts getting set. For now we assume all v7
> + * compatible processors have EL2, which is required for [U]WXN.
> + */
> +have_wxn = arm_feature(env, ARM_FEATURE_V7);
> +
> +if (have_wxn) {
> +wxn = regime_sctlr(env, mmu_idx) & SCTLR_WXN;
> +}
> +
> +if (is_aa64) {
> +assert(arm_feature(env, ARM_FEATURE_V8));
> +switch (regime_el(env, mmu_idx)) {
> +case 1:
> +if (is_user && !user_rw) {
> +wxn = 0;
> +} else if (!is_user) {
> +xn = pxn || (user_rw & PAGE_WRITE);
> +}
> +break;
> +case 2:
> +case 3:
> +break;
> +}
> +} else if (arm_feature(env, ARM_FEATURE_V6K)) {
> +switch (regime_el(env, mmu_idx)) {
> +case 1:
> +case 3:
> +if (is_user) {
> +xn = xn || !user_rw;
> +} else {
> +int uwxn = 0;
> +if (have_wxn) {
> +uwxn = regime_sctlr(env, mmu_idx) & SCTLR_UWXN;
> +}
> +xn = xn || !prot_rw || pxn || (uwxn && (user_rw & 
> PAGE_WRITE));
> +}
> +break;
> +case 2:
> +break;
> +}
> +} else {
> +xn = wxn = 0;
> +}
> +
&g

Re: [Qemu-devel] [PATCH v2 3/3] target-arm: get_phys_addr_lpae: more xn control

2015-03-11 Thread Andrew Jones
On Wed, Mar 11, 2015 at 05:02:00PM +, Peter Maydell wrote:
> On 10 March 2015 at 21:06, Andrew Jones  wrote:
> > This patch makes the following changes to the determination of
> > whether an address is executable, when translating addresses
> > using LPAE.
> >
> > 1. No longer assumes that PL0 can't execute when it can't read.
> >It can in AArch64, a difference from AArch32.
> > 2. Use va_size == 64 to determine we're in AArch64, rather than
> >arm_feature(env, ARM_FEATURE_V8), which is insufficient.
> > 3. Add additional XN determinants
> >- NS && is_secure && (SCR & SCR_SIF)
> >- WXN && (prot & PAGE_WRITE)
> >- AArch64: (prot_PL0 & PAGE_WRITE)
> >- AArch32: UWXN && (prot_PL0 & PAGE_WRITE)
> >- XN determination should also work in secure mode (untested)
> >- XN may even work in EL2 (currently impossible to test)
> > 4. Cleans up the bloated PAGE_EXEC condition - by removing it.
> >
> > The helper get_S1prot is introduced. It may even work in EL2,
> > when support for that comes, but, as the function name implies,
> > it only works for stage 1 translations.
> >
> > Signed-off-by: Andrew Jones 
> 
> I like the general shape of this patch. Minor comment below:
> 
> > ---
> >  target-arm/helper.c | 129 
> > 
> >  1 file changed, 100 insertions(+), 29 deletions(-)
> >
> > diff --git a/target-arm/helper.c b/target-arm/helper.c
> > index d996659652f8d..c457e9ab8c85a 100644
> > --- a/target-arm/helper.c
> > +++ b/target-arm/helper.c
> > @@ -4962,15 +4962,11 @@ static inline int ap_to_rw_prot(CPUARMState *env, 
> > ARMMMUIdx mmu_idx,
> >  /* Translate section/page access permissions to page
> >   * R/W protection flags.
> >   *
> > - * @env: CPUARMState
> > - * @mmu_idx: MMU index indicating required translation regime
> >   * @ap:  The 2-bit simple AP (AP[2:1])
> > + * @is_user: TRUE if accessing from PL0
> >   */
> > -static inline int
> > -simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
> > +static inline int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
> >  {
> > -bool is_user = regime_is_user(env, mmu_idx);
> > -
> >  switch (ap) {
> >  case 0:
> >  return is_user ? 0 : PAGE_READ | PAGE_WRITE;
> > @@ -4985,6 +4981,94 @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx 
> > mmu_idx, int ap)
> >  }
> >  }
> >
> > +static inline int
> > +simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
> > +{
> > +return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
> > +}
> > +
> > +/* Translate section/page access permissions to protection flags
> > + *
> > + * @env: CPUARMState
> > + * @mmu_idx: MMU index indicating required translation regime
> > + * @is_aa64: TRUE if AArch64
> > + * @ap:  The 2-bit simple AP (AP[2:1])
> > + * @ns:  NS (non-secure) bit
> > + * @xn:  XN (execute-never) bit
> > + * @pxn: PXN (privileged execute-never) bit
> > + */
> > +static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
> > +  int ap, int ns, int xn, int pxn)
> > +{
> > +bool is_user = regime_is_user(env, mmu_idx);
> > +int prot_rw, user_rw;
> > +bool have_wxn;
> > +int wxn = 0;
> > +
> > +assert(mmu_idx != ARMMMUIdx_S2NS);
> > +
> > +user_rw = simple_ap_to_rw_prot_is_user(ap, true);
> > +if (is_user) {
> > +prot_rw = user_rw;
> > +} else {
> > +prot_rw = simple_ap_to_rw_prot_is_user(ap, false);
> > +}
> > +
> > +if (ns && arm_is_secure(env) && (env->cp15.scr_el3 & SCR_SIF)) {
> > +return prot_rw;
> > +}
> > +
> > +/* TODO have_wxn should be replaced with
> > + *   ARM_FEATURE_V8 || (ARM_FEATURE_V7 && ARM_FEATURE_EL2)
> > + * when ARM_FEATURE_EL2 starts getting set. For now we assume all LPAE
> > + * compatible processors have EL2, which is required for [U]WXN.
> > + */
> > +have_wxn = arm_feature(env, ARM_FEATURE_LPAE);
> > +
> > +if (have_wxn) {
> > +wxn = regime_sctlr(env, mmu_idx) & SCTLR_WXN;
> > +}
> > +
> > +if (is_aa64) {
> > +switch (regime_el(env, mmu_idx)) {
> > +case 1:
> > +if (is_user && !user_rw) {
&

[Qemu-devel] [PATCH v3 0/3] tcg-arm: LPAE: fix and extend xn control

2015-03-11 Thread Andrew Jones
This series fixes and extends the determination of whether or
not an address is executable for LPAE translations. The main
patch is 3/3, and describes the details in its commit message.
Patch 1/3 prepares for patch 2/3, which is prep for 3/3, and also
fixes a potential problem with checking access permissions on
v6 processors when they use the simple AP format.

Changes since v2:
* Collect NSTable bits into ns
* Explicitly check {prot,user}_rw against PAGE_READ. !{prot,user}_rw
  means no read or write, but, since we can't have write without read,
  it also means !({prot,user}_rw & PAGE_READ). Making this explicit
  can reduce confusion when trying to determine with prot flag we
  actually care about. [Peter Maydell]

Changes since v1:
* There are now 3 patches instead of 5. This is due to approaching
  the prep patches in a different way. Most notably, simple AP
  format is now handled with its own function. [Peter Maydell]
* A check of SCTLR_AFE is guarded with ARM_FEATURE_V6K [Peter Maydell]
* All other cleanups suggested by Peter Maydell.

Tested by booting Linux on mach-virt with cortex-a15 and cortex-a57
(just up to 'Unable to mount root fs'), and also with a kvm-unit-tests
test. The curious can check out the unit test here [1].

Thanks in advance for reviews!

drew

[1] 
https://github.com/rhdrjones/kvm-unit-tests/commit/ee553e4bb795b0150e31c794bf8953dfb08d619a

Andrew Jones (3):
  target-arm: convert check_ap to ap_to_rw_prot
  target-arm: fix get_phys_addr_v6/SCTLR_AFE access check
  target-arm: get_phys_addr_lpae: more xn control

 target-arm/helper.c | 216 +---
 1 file changed, 156 insertions(+), 60 deletions(-)

-- 
1.9.3




[Qemu-devel] [PATCH v3 1/3] target-arm: convert check_ap to ap_to_rw_prot

2015-03-11 Thread Andrew Jones
Instead of mixing access permission checking with access permissions
to page protection flags translation, just do the translation, and
leave it to the caller to check the protection flags against the access
type. Also rename to ap_to_rw_prot to better describe the new behavior.

Signed-off-by: Andrew Jones 
Reviewed-by: Peter Maydell 
---
 target-arm/helper.c | 49 +++--
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 3bc20af04f012..d1e70a86a647c 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4903,34 +4903,23 @@ static inline bool regime_is_user(CPUARMState *env, 
ARMMMUIdx mmu_idx)
 }
 }
 
-/* Check section/page access permissions.
-   Returns the page protection flags, or zero if the access is not
-   permitted.  */
-static inline int check_ap(CPUARMState *env, ARMMMUIdx mmu_idx,
-   int ap, int domain_prot,
-   int access_type)
-{
-int prot_ro;
+/* Translate section/page access permissions to page
+ * R/W protection flags
+ */
+static inline int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
+int ap, int domain_prot)
+{
 bool is_user = regime_is_user(env, mmu_idx);
 
 if (domain_prot == 3) {
 return PAGE_READ | PAGE_WRITE;
 }
 
-if (access_type == 1) {
-prot_ro = 0;
-} else {
-prot_ro = PAGE_READ;
-}
-
 switch (ap) {
 case 0:
 if (arm_feature(env, ARM_FEATURE_V7)) {
 return 0;
 }
-if (access_type == 1) {
-return 0;
-}
 switch (regime_sctlr(env, mmu_idx) & (SCTLR_S | SCTLR_R)) {
 case SCTLR_S:
 return is_user ? 0 : PAGE_READ;
@@ -4943,7 +4932,7 @@ static inline int check_ap(CPUARMState *env, ARMMMUIdx 
mmu_idx,
 return is_user ? 0 : PAGE_READ | PAGE_WRITE;
 case 2:
 if (is_user) {
-return prot_ro;
+return PAGE_READ;
 } else {
 return PAGE_READ | PAGE_WRITE;
 }
@@ -4952,16 +4941,16 @@ static inline int check_ap(CPUARMState *env, ARMMMUIdx 
mmu_idx,
 case 4: /* Reserved.  */
 return 0;
 case 5:
-return is_user ? 0 : prot_ro;
+return is_user ? 0 : PAGE_READ;
 case 6:
-return prot_ro;
+return PAGE_READ;
 case 7:
 if (!arm_feature(env, ARM_FEATURE_V6K)) {
 return 0;
 }
-return prot_ro;
+return PAGE_READ;
 default:
-abort();
+g_assert_not_reached();
 }
 }
 
@@ -5083,12 +5072,12 @@ static int get_phys_addr_v5(CPUARMState *env, uint32_t 
address, int access_type,
 }
 code = 15;
 }
-*prot = check_ap(env, mmu_idx, ap, domain_prot, access_type);
-if (!*prot) {
+*prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
+*prot |= *prot ? PAGE_EXEC : 0;
+if (!(*prot & (1 << access_type))) {
 /* Access permission fault.  */
 goto do_fault;
 }
-*prot |= PAGE_EXEC;
 *phys_ptr = phys_addr;
 return 0;
 do_fault:
@@ -5204,14 +5193,14 @@ static int get_phys_addr_v6(CPUARMState *env, uint32_t 
address, int access_type,
 code = (code == 15) ? 6 : 3;
 goto do_fault;
 }
-*prot = check_ap(env, mmu_idx, ap, domain_prot, access_type);
-if (!*prot) {
+*prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
+if (*prot && !xn) {
+*prot |= PAGE_EXEC;
+}
+if (!(*prot & (1 << access_type))) {
 /* Access permission fault.  */
 goto do_fault;
 }
-if (!xn) {
-*prot |= PAGE_EXEC;
-}
 }
 *phys_ptr = phys_addr;
 return 0;
-- 
1.9.3




[Qemu-devel] [PATCH v3 2/3] target-arm: fix get_phys_addr_v6/SCTLR_AFE access check

2015-03-11 Thread Andrew Jones
Introduce simple_ap_to_rw_prot(), which has the same behavior as
ap_to_rw_prot(), but takes the 2-bit simple AP[2:1] instead of
the 3-bit AP[2:0]. Use this in get_phys_addr_v6 when SCTLR_AFE
is set, as that bit indicates we should be using the simple AP
format.

It's unlikely this path is getting used. I don't see CR_AFE
getting used by Linux, so possibly not. If it had been, then
the check would have been wrong for all but AP[2:1] = 0b11.
Anyway, this should fix it up, in case it ever does get used.

Signed-off-by: Andrew Jones 
Reviewed-by: Peter Maydell 
---
 target-arm/helper.c | 49 ++---
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index d1e70a86a647c..d996659652f8d 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4905,6 +4905,11 @@ static inline bool regime_is_user(CPUARMState *env, 
ARMMMUIdx mmu_idx)
 
 /* Translate section/page access permissions to page
  * R/W protection flags
+ *
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ * @ap:  The 3-bit access permissions (AP[2:0])
+ * @domain_prot: The 2-bit domain access permissions
  */
 static inline int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
 int ap, int domain_prot)
@@ -4954,6 +4959,32 @@ static inline int ap_to_rw_prot(CPUARMState *env, 
ARMMMUIdx mmu_idx,
 }
 }
 
+/* Translate section/page access permissions to page
+ * R/W protection flags.
+ *
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ * @ap:  The 2-bit simple AP (AP[2:1])
+ */
+static inline int
+simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+{
+bool is_user = regime_is_user(env, mmu_idx);
+
+switch (ap) {
+case 0:
+return is_user ? 0 : PAGE_READ | PAGE_WRITE;
+case 1:
+return PAGE_READ | PAGE_WRITE;
+case 2:
+return is_user ? 0 : PAGE_READ;
+case 3:
+return PAGE_READ;
+default:
+g_assert_not_reached();
+}
+}
+
 static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
  uint32_t *table, uint32_t address)
 {
@@ -5186,14 +5217,18 @@ static int get_phys_addr_v6(CPUARMState *env, uint32_t 
address, int access_type,
 if (xn && access_type == 2)
 goto do_fault;
 
-/* The simplified model uses AP[0] as an access control bit.  */
-if ((regime_sctlr(env, mmu_idx) & SCTLR_AFE)
-&& (ap & 1) == 0) {
-/* Access flag fault.  */
-code = (code == 15) ? 6 : 3;
-goto do_fault;
+if (arm_feature(env, ARM_FEATURE_V6K) &&
+(regime_sctlr(env, mmu_idx) & SCTLR_AFE)) {
+/* The simplified model uses AP[0] as an access control bit.  */
+if ((ap & 1) == 0) {
+/* Access flag fault.  */
+code = (code == 15) ? 6 : 3;
+goto do_fault;
+}
+*prot = simple_ap_to_rw_prot(env, mmu_idx, ap >> 1);
+} else {
+*prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
 }
-*prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
 if (*prot && !xn) {
 *prot |= PAGE_EXEC;
 }
-- 
1.9.3




[Qemu-devel] [PATCH v3 3/3] target-arm: get_phys_addr_lpae: more xn control

2015-03-11 Thread Andrew Jones
This patch makes the following changes to the determination of
whether an address is executable, when translating addresses
using LPAE.

1. No longer assumes that PL0 can't execute when it can't read.
   It can in AArch64, a difference from AArch32.
2. Use va_size == 64 to determine we're in AArch64, rather than
   arm_feature(env, ARM_FEATURE_V8), which is insufficient.
3. Add additional XN determinants
   - NS && is_secure && (SCR & SCR_SIF)
   - WXN && (prot & PAGE_WRITE)
   - AArch64: (prot_PL0 & PAGE_WRITE)
   - AArch32: UWXN && (prot_PL0 & PAGE_WRITE)
   - XN determination should also work in secure mode (untested)
   - XN may even work in EL2 (currently impossible to test)
4. Cleans up the bloated PAGE_EXEC condition - by removing it.

The helper get_S1prot is introduced. It may even work in EL2,
when support for that comes, but, as the function name implies,
it only works for stage 1 translations.

Signed-off-by: Andrew Jones 
---
 target-arm/helper.c | 132 
 1 file changed, 102 insertions(+), 30 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index d996659652f8d..193e6aa7c1f26 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4962,15 +4962,11 @@ static inline int ap_to_rw_prot(CPUARMState *env, 
ARMMMUIdx mmu_idx,
 /* Translate section/page access permissions to page
  * R/W protection flags.
  *
- * @env: CPUARMState
- * @mmu_idx: MMU index indicating required translation regime
  * @ap:  The 2-bit simple AP (AP[2:1])
+ * @is_user: TRUE if accessing from PL0
  */
-static inline int
-simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+static inline int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
 {
-bool is_user = regime_is_user(env, mmu_idx);
-
 switch (ap) {
 case 0:
 return is_user ? 0 : PAGE_READ | PAGE_WRITE;
@@ -4985,6 +4981,95 @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx 
mmu_idx, int ap)
 }
 }
 
+static inline int
+simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+{
+return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
+}
+
+/* Translate section/page access permissions to protection flags
+ *
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ * @is_aa64: TRUE if AArch64
+ * @ap:  The 2-bit simple AP (AP[2:1])
+ * @ns:  NS (non-secure) bit
+ * @xn:  XN (execute-never) bit
+ * @pxn: PXN (privileged execute-never) bit
+ */
+static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
+  int ap, int ns, int xn, int pxn)
+{
+bool is_user = regime_is_user(env, mmu_idx);
+int prot_rw, user_rw;
+bool have_wxn;
+int wxn = 0;
+
+assert(mmu_idx != ARMMMUIdx_S2NS);
+
+user_rw = simple_ap_to_rw_prot_is_user(ap, true);
+if (is_user) {
+prot_rw = user_rw;
+} else {
+prot_rw = simple_ap_to_rw_prot_is_user(ap, false);
+}
+
+if (ns && arm_is_secure(env) && (env->cp15.scr_el3 & SCR_SIF)) {
+return prot_rw;
+}
+
+/* TODO have_wxn should be replaced with
+ *   ARM_FEATURE_V8 || (ARM_FEATURE_V7 && ARM_FEATURE_EL2)
+ * when ARM_FEATURE_EL2 starts getting set. For now we assume all LPAE
+ * compatible processors have EL2, which is required for [U]WXN.
+ */
+have_wxn = arm_feature(env, ARM_FEATURE_LPAE);
+
+if (have_wxn) {
+wxn = regime_sctlr(env, mmu_idx) & SCTLR_WXN;
+}
+
+if (is_aa64) {
+switch (regime_el(env, mmu_idx)) {
+case 1:
+if (is_user && !(user_rw & PAGE_READ)) {
+wxn = 0;
+} else if (!is_user) {
+xn = pxn || (user_rw & PAGE_WRITE);
+}
+break;
+case 2:
+case 3:
+break;
+}
+} else if (arm_feature(env, ARM_FEATURE_V7)) {
+switch (regime_el(env, mmu_idx)) {
+case 1:
+case 3:
+if (is_user) {
+xn = xn || !(user_rw & PAGE_READ);
+} else {
+int uwxn = 0;
+if (have_wxn) {
+uwxn = regime_sctlr(env, mmu_idx) & SCTLR_UWXN;
+}
+xn = xn || !(prot_rw & PAGE_READ) || pxn ||
+ (uwxn && (user_rw & PAGE_WRITE));
+}
+break;
+case 2:
+break;
+}
+} else {
+xn = wxn = 0;
+}
+
+if (xn || (wxn && (prot_rw & PAGE_WRITE))) {
+return prot_rw;
+}
+return prot_rw | PAGE_EXEC;
+}
+
 static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
  uint32_t *table, uint32_t address)
 {
@@ -5273,8 +5358,8 @@ static int get_phys_addr_lpae(CPUARMState *env, 
target_ulong addres

Re: [Qemu-devel] [PATCH v2 3/3] target-arm: get_phys_addr_lpae: more xn control

2015-03-11 Thread Andrew Jones
On Wed, Mar 11, 2015 at 05:49:39PM +, Peter Maydell wrote:
> On 11 March 2015 at 17:42, Andrew Jones  wrote:
> > On Wed, Mar 11, 2015 at 05:02:00PM +, Peter Maydell wrote:
> 
> >> > +if (is_aa64) {
> >> > +switch (regime_el(env, mmu_idx)) {
> >> > +case 1:
> >> > +if (is_user && !user_rw) {
> >> > +wxn = 0;
> >>
> >> I don't understand this. We ignore the WXN bit if this is
> >> a user access and the page is not readable ?
> >
> > Yup. If the page is not readable or writeable, AP[1]=0. I almost
> > submitted an errata to the ARM ARM when I saw this on the 2nd line
> > of table D4-32. I thought it must be a typo. However I tested it
> > on hardware, and it works this way. So at least the weirdness has
> > been implemented consistently...
> 
> Still confused. If the page isn't readable or writable
> then WXN isn't going to kick in anyway because WXN only
> affects writable pages. I don't see what the case is
> where this bit of code will make a difference.
>

Ah, that is true. Too bad I didn't read this before sending v3,
as I could have removed it, if you prefer. I had it here to
be explicit about the ignoring of wxn - matching the spec, but
you're right, it's useless code. Should I send a v4?

drew



Re: [Qemu-devel] [PATCH v2 3/3] target-arm: get_phys_addr_lpae: more xn control

2015-03-11 Thread Andrew Jones
On Wed, Mar 11, 2015 at 06:15:47PM +, Peter Maydell wrote:
> On 11 March 2015 at 18:10, Andrew Jones  wrote:
> > On Wed, Mar 11, 2015 at 05:49:39PM +, Peter Maydell wrote:
> >> Still confused. If the page isn't readable or writable
> >> then WXN isn't going to kick in anyway because WXN only
> >> affects writable pages. I don't see what the case is
> >> where this bit of code will make a difference.
> >>
> >
> > Ah, that is true. Too bad I didn't read this before sending v3,
> > as I could have removed it, if you prefer. I had it here to
> > be explicit about the ignoring of wxn - matching the spec, but
> > you're right, it's useless code. Should I send a v4?
> 
> Yes, please send a v4.
> 
> I don't see what you mean about matching the spec, though.
> The spec doesn't say anything about "ignore WXN if the
> page isn't readable". It just straightforwardly says "if
> the WXN bit is set then writable regions are treated as
> XN", which is exactly what the code at the bottom of your
> function does.
>

My interpretation of SCTLR_EL1.WXN was just wrong. There is
talks about "EL1&0", and I assumed it meant that when WXN is
on, then both EL1 and EL0 should lose executability. However
it can certainly be interpreted as applying to them both, but
based on their respective access permissions, which is how
I guess I should have interpreted it.

v4 coming

drew 



[Qemu-devel] [PATCH v4 0/3] tcg-arm: LPAE: fix and extend xn control

2015-03-11 Thread Andrew Jones
This series fixes and extends the determination of whether or
not an address is executable for LPAE translations. The main
patch is 3/3, and describes the details in its commit message.
Patch 1/3 prepares for patch 2/3, which is prep for 3/3, and also
fixes a potential problem with checking access permissions on
v6 processors when they use the simple AP format.

Changes since v3:
* Remove a useless is_aa64/case1/is_user code setting wxn=0 [Peter]

Changes since v2:
* Collect NSTable bits into ns
* Explicitly check {prot,user}_rw against PAGE_READ. !{prot,user}_rw
  means no read or write, but, since we can't have write without read,
  it also means !({prot,user}_rw & PAGE_READ). Making this explicit
  can reduce confusion when trying to determine with prot flag we
  actually care about. [Peter Maydell]

Changes since v1:
* There are now 3 patches instead of 5. This is due to approaching
  the prep patches in a different way. Most notably, simple AP
  format is now handled with its own function. [Peter Maydell]
* A check of SCTLR_AFE is guarded with ARM_FEATURE_V6K [Peter Maydell]
* All other cleanups suggested by Peter Maydell.

Tested by booting Linux on mach-virt with cortex-a15 and cortex-a57
(just up to 'Unable to mount root fs'), and also with a kvm-unit-tests
test. The curious can check out the unit test here [1].

Thanks in advance for reviews!

drew

[1] 
https://github.com/rhdrjones/kvm-unit-tests/commit/ee553e4bb795b0150e31c794bf8953dfb08d619a

Andrew Jones (3):
  target-arm: convert check_ap to ap_to_rw_prot
  target-arm: fix get_phys_addr_v6/SCTLR_AFE access check
  target-arm: get_phys_addr_lpae: more xn control

 target-arm/helper.c | 214 +---
 1 file changed, 154 insertions(+), 60 deletions(-)

-- 
1.9.3




[Qemu-devel] [PATCH v4 1/3] target-arm: convert check_ap to ap_to_rw_prot

2015-03-11 Thread Andrew Jones
Instead of mixing access permission checking with access permissions
to page protection flags translation, just do the translation, and
leave it to the caller to check the protection flags against the access
type. Also rename to ap_to_rw_prot to better describe the new behavior.

Signed-off-by: Andrew Jones 
Reviewed-by: Peter Maydell 
---
 target-arm/helper.c | 49 +++--
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 3bc20af04f012..d1e70a86a647c 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4903,34 +4903,23 @@ static inline bool regime_is_user(CPUARMState *env, 
ARMMMUIdx mmu_idx)
 }
 }
 
-/* Check section/page access permissions.
-   Returns the page protection flags, or zero if the access is not
-   permitted.  */
-static inline int check_ap(CPUARMState *env, ARMMMUIdx mmu_idx,
-   int ap, int domain_prot,
-   int access_type)
-{
-int prot_ro;
+/* Translate section/page access permissions to page
+ * R/W protection flags
+ */
+static inline int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
+int ap, int domain_prot)
+{
 bool is_user = regime_is_user(env, mmu_idx);
 
 if (domain_prot == 3) {
 return PAGE_READ | PAGE_WRITE;
 }
 
-if (access_type == 1) {
-prot_ro = 0;
-} else {
-prot_ro = PAGE_READ;
-}
-
 switch (ap) {
 case 0:
 if (arm_feature(env, ARM_FEATURE_V7)) {
 return 0;
 }
-if (access_type == 1) {
-return 0;
-}
 switch (regime_sctlr(env, mmu_idx) & (SCTLR_S | SCTLR_R)) {
 case SCTLR_S:
 return is_user ? 0 : PAGE_READ;
@@ -4943,7 +4932,7 @@ static inline int check_ap(CPUARMState *env, ARMMMUIdx 
mmu_idx,
 return is_user ? 0 : PAGE_READ | PAGE_WRITE;
 case 2:
 if (is_user) {
-return prot_ro;
+return PAGE_READ;
 } else {
 return PAGE_READ | PAGE_WRITE;
 }
@@ -4952,16 +4941,16 @@ static inline int check_ap(CPUARMState *env, ARMMMUIdx 
mmu_idx,
 case 4: /* Reserved.  */
 return 0;
 case 5:
-return is_user ? 0 : prot_ro;
+return is_user ? 0 : PAGE_READ;
 case 6:
-return prot_ro;
+return PAGE_READ;
 case 7:
 if (!arm_feature(env, ARM_FEATURE_V6K)) {
 return 0;
 }
-return prot_ro;
+return PAGE_READ;
 default:
-abort();
+g_assert_not_reached();
 }
 }
 
@@ -5083,12 +5072,12 @@ static int get_phys_addr_v5(CPUARMState *env, uint32_t 
address, int access_type,
 }
 code = 15;
 }
-*prot = check_ap(env, mmu_idx, ap, domain_prot, access_type);
-if (!*prot) {
+*prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
+*prot |= *prot ? PAGE_EXEC : 0;
+if (!(*prot & (1 << access_type))) {
 /* Access permission fault.  */
 goto do_fault;
 }
-*prot |= PAGE_EXEC;
 *phys_ptr = phys_addr;
 return 0;
 do_fault:
@@ -5204,14 +5193,14 @@ static int get_phys_addr_v6(CPUARMState *env, uint32_t 
address, int access_type,
 code = (code == 15) ? 6 : 3;
 goto do_fault;
 }
-*prot = check_ap(env, mmu_idx, ap, domain_prot, access_type);
-if (!*prot) {
+*prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
+if (*prot && !xn) {
+*prot |= PAGE_EXEC;
+}
+if (!(*prot & (1 << access_type))) {
 /* Access permission fault.  */
 goto do_fault;
 }
-if (!xn) {
-*prot |= PAGE_EXEC;
-}
 }
 *phys_ptr = phys_addr;
 return 0;
-- 
1.9.3




[Qemu-devel] [PATCH v4 3/3] target-arm: get_phys_addr_lpae: more xn control

2015-03-11 Thread Andrew Jones
This patch makes the following changes to the determination of
whether an address is executable, when translating addresses
using LPAE.

1. No longer assumes that PL0 can't execute when it can't read.
   It can in AArch64, a difference from AArch32.
2. Use va_size == 64 to determine we're in AArch64, rather than
   arm_feature(env, ARM_FEATURE_V8), which is insufficient.
3. Add additional XN determinants
   - NS && is_secure && (SCR & SCR_SIF)
   - WXN && (prot & PAGE_WRITE)
   - AArch64: (prot_PL0 & PAGE_WRITE)
   - AArch32: UWXN && (prot_PL0 & PAGE_WRITE)
   - XN determination should also work in secure mode (untested)
   - XN may even work in EL2 (currently impossible to test)
4. Cleans up the bloated PAGE_EXEC condition - by removing it.

The helper get_S1prot is introduced. It may even work in EL2,
when support for that comes, but, as the function name implies,
it only works for stage 1 translations.

Signed-off-by: Andrew Jones 
---
 target-arm/helper.c | 130 
 1 file changed, 100 insertions(+), 30 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index d996659652f8d..7fe3d14773068 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4962,15 +4962,11 @@ static inline int ap_to_rw_prot(CPUARMState *env, 
ARMMMUIdx mmu_idx,
 /* Translate section/page access permissions to page
  * R/W protection flags.
  *
- * @env: CPUARMState
- * @mmu_idx: MMU index indicating required translation regime
  * @ap:  The 2-bit simple AP (AP[2:1])
+ * @is_user: TRUE if accessing from PL0
  */
-static inline int
-simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+static inline int simple_ap_to_rw_prot_is_user(int ap, bool is_user)
 {
-bool is_user = regime_is_user(env, mmu_idx);
-
 switch (ap) {
 case 0:
 return is_user ? 0 : PAGE_READ | PAGE_WRITE;
@@ -4985,6 +4981,93 @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx 
mmu_idx, int ap)
 }
 }
 
+static inline int
+simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+{
+return simple_ap_to_rw_prot_is_user(ap, regime_is_user(env, mmu_idx));
+}
+
+/* Translate section/page access permissions to protection flags
+ *
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ * @is_aa64: TRUE if AArch64
+ * @ap:  The 2-bit simple AP (AP[2:1])
+ * @ns:  NS (non-secure) bit
+ * @xn:  XN (execute-never) bit
+ * @pxn: PXN (privileged execute-never) bit
+ */
+static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
+  int ap, int ns, int xn, int pxn)
+{
+bool is_user = regime_is_user(env, mmu_idx);
+int prot_rw, user_rw;
+bool have_wxn;
+int wxn = 0;
+
+assert(mmu_idx != ARMMMUIdx_S2NS);
+
+user_rw = simple_ap_to_rw_prot_is_user(ap, true);
+if (is_user) {
+prot_rw = user_rw;
+} else {
+prot_rw = simple_ap_to_rw_prot_is_user(ap, false);
+}
+
+if (ns && arm_is_secure(env) && (env->cp15.scr_el3 & SCR_SIF)) {
+return prot_rw;
+}
+
+/* TODO have_wxn should be replaced with
+ *   ARM_FEATURE_V8 || (ARM_FEATURE_V7 && ARM_FEATURE_EL2)
+ * when ARM_FEATURE_EL2 starts getting set. For now we assume all LPAE
+ * compatible processors have EL2, which is required for [U]WXN.
+ */
+have_wxn = arm_feature(env, ARM_FEATURE_LPAE);
+
+if (have_wxn) {
+wxn = regime_sctlr(env, mmu_idx) & SCTLR_WXN;
+}
+
+if (is_aa64) {
+switch (regime_el(env, mmu_idx)) {
+case 1:
+if (!is_user) {
+xn = pxn || (user_rw & PAGE_WRITE);
+}
+break;
+case 2:
+case 3:
+break;
+}
+} else if (arm_feature(env, ARM_FEATURE_V7)) {
+switch (regime_el(env, mmu_idx)) {
+case 1:
+case 3:
+if (is_user) {
+xn = xn || !(user_rw & PAGE_READ);
+} else {
+int uwxn = 0;
+if (have_wxn) {
+uwxn = regime_sctlr(env, mmu_idx) & SCTLR_UWXN;
+}
+xn = xn || !(prot_rw & PAGE_READ) || pxn ||
+ (uwxn && (user_rw & PAGE_WRITE));
+}
+break;
+case 2:
+break;
+}
+} else {
+xn = wxn = 0;
+}
+
+if (xn || (wxn && (prot_rw & PAGE_WRITE))) {
+return prot_rw;
+}
+return prot_rw | PAGE_EXEC;
+}
+
 static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
  uint32_t *table, uint32_t address)
 {
@@ -5273,8 +5356,8 @@ static int get_phys_addr_lpae(CPUARMState *env, 
target_ulong address,
 int32_t granule_sz = 9;
 int32_t va_size = 32;
 int32_t tbi = 0;
-bool is_user;

[Qemu-devel] [PATCH v4 2/3] target-arm: fix get_phys_addr_v6/SCTLR_AFE access check

2015-03-11 Thread Andrew Jones
Introduce simple_ap_to_rw_prot(), which has the same behavior as
ap_to_rw_prot(), but takes the 2-bit simple AP[2:1] instead of
the 3-bit AP[2:0]. Use this in get_phys_addr_v6 when SCTLR_AFE
is set, as that bit indicates we should be using the simple AP
format.

It's unlikely this path is getting used. I don't see CR_AFE
getting used by Linux, so possibly not. If it had been, then
the check would have been wrong for all but AP[2:1] = 0b11.
Anyway, this should fix it up, in case it ever does get used.

Signed-off-by: Andrew Jones 
Reviewed-by: Peter Maydell 
---
 target-arm/helper.c | 49 ++---
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index d1e70a86a647c..d996659652f8d 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -4905,6 +4905,11 @@ static inline bool regime_is_user(CPUARMState *env, 
ARMMMUIdx mmu_idx)
 
 /* Translate section/page access permissions to page
  * R/W protection flags
+ *
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ * @ap:  The 3-bit access permissions (AP[2:0])
+ * @domain_prot: The 2-bit domain access permissions
  */
 static inline int ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx,
 int ap, int domain_prot)
@@ -4954,6 +4959,32 @@ static inline int ap_to_rw_prot(CPUARMState *env, 
ARMMMUIdx mmu_idx,
 }
 }
 
+/* Translate section/page access permissions to page
+ * R/W protection flags.
+ *
+ * @env: CPUARMState
+ * @mmu_idx: MMU index indicating required translation regime
+ * @ap:  The 2-bit simple AP (AP[2:1])
+ */
+static inline int
+simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
+{
+bool is_user = regime_is_user(env, mmu_idx);
+
+switch (ap) {
+case 0:
+return is_user ? 0 : PAGE_READ | PAGE_WRITE;
+case 1:
+return PAGE_READ | PAGE_WRITE;
+case 2:
+return is_user ? 0 : PAGE_READ;
+case 3:
+return PAGE_READ;
+default:
+g_assert_not_reached();
+}
+}
+
 static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
  uint32_t *table, uint32_t address)
 {
@@ -5186,14 +5217,18 @@ static int get_phys_addr_v6(CPUARMState *env, uint32_t 
address, int access_type,
 if (xn && access_type == 2)
 goto do_fault;
 
-/* The simplified model uses AP[0] as an access control bit.  */
-if ((regime_sctlr(env, mmu_idx) & SCTLR_AFE)
-&& (ap & 1) == 0) {
-/* Access flag fault.  */
-code = (code == 15) ? 6 : 3;
-goto do_fault;
+if (arm_feature(env, ARM_FEATURE_V6K) &&
+(regime_sctlr(env, mmu_idx) & SCTLR_AFE)) {
+/* The simplified model uses AP[0] as an access control bit.  */
+if ((ap & 1) == 0) {
+/* Access flag fault.  */
+code = (code == 15) ? 6 : 3;
+goto do_fault;
+}
+*prot = simple_ap_to_rw_prot(env, mmu_idx, ap >> 1);
+} else {
+*prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
 }
-*prot = ap_to_rw_prot(env, mmu_idx, ap, domain_prot);
 if (*prot && !xn) {
 *prot |= PAGE_EXEC;
 }
-- 
1.9.3




Re: [Qemu-devel] [RFC/WIP PATCH 6/6] memory: add clear_cache_to_poc

2015-03-11 Thread Andrew Jones
On Fri, Mar 06, 2015 at 01:53:38PM -0500, Andrew Jones wrote:
> Add a function that flushes the cache to PoC. We need a new
> function because __builtin___clear_cache only flushes to
> PoU. Call this function each time an address in a memory
> region that has been flagged as having an incoherent cache
> is written. For starters we only implement it for ARM. Most
> other architectures don't need it anyway.

I started looking for my missing flushes, and see I have stupidity
in this patch. I'm not flushing in the right place at all... My
testing was just [un]lucky, making me think it was on the right
track. I'll send an update to this tomorrow after I remove my head
from a dark hole near my chair.

drew


> 
> Signed-off-by: Andrew Jones 
> ---
> Currently only implemented for aarch64, doesn't completely work yet.
> 
>  exec.c  | 16 ++--
>  include/exec/exec-all.h | 41 +
>  2 files changed, 51 insertions(+), 6 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index c85321a38ba69..68268a5961ff5 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2261,7 +2261,7 @@ int cpu_memory_rw_debug(CPUState *cpu, target_ulong 
> addr,
>  
>  #else
>  
> -static void invalidate_and_set_dirty(hwaddr addr,
> +static void invalidate_and_set_dirty(MemoryRegion *mr, hwaddr addr,
>   hwaddr length)
>  {
>  if (cpu_physical_memory_range_includes_clean(addr, length)) {
> @@ -2269,6 +2269,10 @@ static void invalidate_and_set_dirty(hwaddr addr,
>  cpu_physical_memory_set_dirty_range_nocode(addr, length);
>  }
>  xen_modified_memory(addr, length);
> +if (memory_region_has_incoherent_cache(mr)) {
> +char *start = qemu_get_ram_ptr(addr);
> +clear_cache_to_poc(start, start + length);
> +}
>  }
>  
>  static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
> @@ -2348,7 +2352,7 @@ bool address_space_rw(AddressSpace *as, hwaddr addr, 
> uint8_t *buf,
>  /* RAM case */
>  ptr = qemu_get_ram_ptr(addr1);
>  memcpy(ptr, buf, l);
> -invalidate_and_set_dirty(addr1, l);
> +invalidate_and_set_dirty(mr, addr1, l);
>  }
>  } else {
>  if (!memory_access_is_direct(mr, is_write)) {
> @@ -2437,7 +2441,7 @@ static inline void 
> cpu_physical_memory_write_rom_internal(AddressSpace *as,
>  switch (type) {
>  case WRITE_DATA:
>  memcpy(ptr, buf, l);
> -invalidate_and_set_dirty(addr1, l);
> +invalidate_and_set_dirty(mr, addr1, l);
>  break;
>  case FLUSH_CACHE:
>  flush_icache_range((uintptr_t)ptr, (uintptr_t)ptr + l);
> @@ -2622,7 +2626,7 @@ void address_space_unmap(AddressSpace *as, void 
> *buffer, hwaddr len,
>  mr = qemu_ram_addr_from_host(buffer, &addr1);
>  assert(mr != NULL);
>  if (is_write) {
> -invalidate_and_set_dirty(addr1, access_len);
> +invalidate_and_set_dirty(mr, addr1, access_len);
>  }
>  if (xen_enabled()) {
>  xen_invalidate_map_cache_entry(buffer);
> @@ -2904,7 +2908,7 @@ static inline void stl_phys_internal(AddressSpace *as,
>  stl_p(ptr, val);
>  break;
>  }
> -invalidate_and_set_dirty(addr1, 4);
> +invalidate_and_set_dirty(mr, addr1, 4);
>  }
>  }
>  
> @@ -2967,7 +2971,7 @@ static inline void stw_phys_internal(AddressSpace *as,
>  stw_p(ptr, val);
>  break;
>  }
> -invalidate_and_set_dirty(addr1, 2);
> +invalidate_and_set_dirty(mr, addr1, 2);
>  }
>  }
>  
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 8eb0db3910e86..9bf74e791f357 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -106,6 +106,43 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
>hwaddr paddr, int prot,
>int mmu_idx, target_ulong size);
>  void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr);
> +#if defined(__aarch64__)
> +static inline void clear_cache_to_poc(char *begin, char *end)
> +{
> +/* Unfortunately __builtin___clear_cache only flushes
> + * to PoU, we need to implement this for PoC.
> + */
> +static unsigned long line_sz = 0;
> +unsigned long start, stop, addr;
> +
> +if (!line_sz) {
> +unsigned int ctr_el0;
> +asm volatile("mrs %0, ctr_el0" : "=&r" (ctr_el0));
> +line_sz = 

Re: [Qemu-devel] ARM CPU affinities

2015-09-28 Thread Andrew Jones
On Mon, Sep 28, 2015 at 10:13:15AM +0100, Peter Maydell wrote:
> On 27 September 2015 at 22:28, Peter Crosthwaite
>  wrote:
> > Hi Peter,
> >
> > I am looking at this:
> >
> > static void arm_cpu_initfn(Object *obj)
> > {
> > ...
> > Aff1 = cs->cpu_index / ARM_CPUS_PER_CLUSTER;
> > Aff0 = cs->cpu_index % ARM_CPUS_PER_CLUSTER;
> > cpu->mp_affinity = (Aff1 << ARM_AFF1_SHIFT) | Aff0;
> >
> >
> > Should we push this up to the machine model? I am trying to fix a
> > machine where Aff1 of the one and only cluster is non-zero. The kernel
> > SMP support barfs when Aff1 is mismatched to whats in the DTB (CPU reg
> > property).
> >
> > I think this modulo 8 starting from 0:0 policy might be specific to 
> > mach-virt?
> 
> Yeah, it should be a CPU property if you need it to be something
> different. I think we just left it as a hardcoded thing until
> somebody needed it to be different.
> 
> NB that as the comment says KVM currently imposes its own numbering
> anyway -- if you care about that you need to get the kernel to
> support having userspace tell it about affinity numbering.

I've been thinking about picking this work up. Let me confirm first that
Pavel hasn't already, because, iirc, he was the last to look into it.
Pavel?

Thanks,
drew



Re: [Qemu-devel] ARM CPU affinities

2015-09-28 Thread Andrew Jones
On Mon, Sep 28, 2015 at 06:16:26PM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > I've been thinking about picking this work up. Let me confirm first that
> > Pavel hasn't already, because, iirc, he was the last to look into it.
> > Pavel?
> 
>  I am here. Yes, i did not do anything else with this. The problem is in the 
> kernel; every time vCPU
> is reset, it resets its MPIDR value to something that it wants it to be. IIRC 
> the hardcoded
> assignment is 16 CPUs per cluster starting from 0:0.

I think we'll need to cache mpidr in the vcpu data in order to properly
reset it.

> 
>  I have some strange problem with delivering mails, my emails do not seem to 
> make it to the list.
> Please reply me privately if you got this.

Got your mail, replying not only to you, but also the list though, as I'll
use the opportunity to confirm that I'll go ahead and pick up this work.

Thanks,
drew



Re: [Qemu-devel] [PATCH] hw/arm/virt: smbios: inform guest of kvm

2015-09-28 Thread Andrew Jones
On Thu, Sep 24, 2015 at 12:13:08PM +0200, Andrew Jones wrote:
> On Wed, Sep 23, 2015 at 08:43:39AM -0700, Peter Maydell wrote:
> > On 23 September 2015 at 07:18, Andrew Jones  wrote:
> > > ARM/AArch64 KVM guests don't have any way to identify
> > > themselves as KVM guests (x86 guests use a CPUID leaf). Now, we
> > > could discuss all sorts of reasons why guests shouldn't need to
> > > know that, but then there's always some case where it'd be
> > > nice... Anyway, now that we have SMBIOS tables in ARM guests,
> > > it's easy for the guest to know that it's a QEMU instance. This
> > > patch takes that one step further, also identifying KVM, when
> > > appropriate. Again, we could debate why generally nothing
> > > should care whether it's of type QEMU or QEMU/KVM, but again,
> > > sometimes it's nice to know...
> > 
> > This doesn't seem great to me, because it's ACPI/SMBIOS
> > specific. A mechanism that worked whether the guest was
> > booted via APCI or DT would seem preferable to me...
> 
> SMBIOS is populated on both ACPI and devicetree boots. We already
> have detection in virt-what and systemd-detect-virt for DT boots,
> although it only detects QEMU (it can't determine if KVM is used).
> That detection is DT-specific, and much more of a heuristic, it
> checks for the presence of the fw-cfg node in the DT. Actually, I'd
> like to patch those virt detection tools to try SMBIOS first (which,
> with this patch, could also give KVM info), and then fall back to
> trying the current DT-only, QEMU-only detection, before giving up.
>

Hi Peter,

Anymore thoughts on this?

Thanks,
drew 



Re: [Qemu-devel] ARM CPU affinities

2015-09-29 Thread Andrew Jones
On Mon, Sep 28, 2015 at 06:36:29PM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > I think we'll need to cache mpidr in the vcpu data in order to properly
> > reset it.
> 
>  We already have it in our data. I took a quick look at that, but kernel 
> needs patching. IIRC reset

Right. I was referring to the kernel's vcpu data (struct kvm_vcpu).

> for secondary vCPUs happens inside kernel's PSCI code, before primary tells 
> it to start up. You
> cannot separate these two events.
>  In a short: you can use KVM_SET_ONE_REG for the MPIDR, and the value will go 
> into kernel, but as
> soon as you call KVM_VCPU_RUN for the secondary, it gets reset.

My current thinking is that we'll set MPIDR with KVM_SET_ONE_REG, which
will also set the cached value in struct kvm_vcpu. Reset will then use
that value.

Thanks,
drew



Re: [Qemu-devel] [PATCH] kvm-unit-tests: arm: Fail on unknown subtest

2015-09-30 Thread Andrew Jones
On Tue, Sep 29, 2015 at 11:08:27AM -0400, Christopher Covington wrote:
> Signed-off-by: Christopher Covington 
> ---
>  arm/selftest.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arm/selftest.c b/arm/selftest.c
> index fc9ec60..aa16a91 100644
> --- a/arm/selftest.c
> +++ b/arm/selftest.c
> @@ -376,6 +376,8 @@ int main(int argc, char **argv)
>   cpumask_set_cpu(0, &smp_reported);
>   while (!cpumask_full(&smp_reported))
>   cpu_relax();
> + } else {
> + report("Unknown subtest", false);

I think

 printf("Unknown subtest\n");
 abort();

might be better. Also, please post kvm-unit-tests patches to the
kvm mailing list.

Thanks,
drew

>   }
>  
>   return report_summary();
> -- 
> Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 



[Qemu-devel] [PATCH] configure: arm/aarch64: allow enable-seccomp

2015-10-01 Thread Andrew Jones
This is a revert of ae6e8ef11e6cb, but with a bit of refactoring,
and also specifically adding arm/aarch64, rather than all
architectures. Currently, libseccomp code appears to also support
mips, ppc, and s390. We could therefore allow qemu to enable
seccomp for those platforms as well, with additional configure
patches, given they're tested and proven to work.

Signed-off-by: Andrew Jones 
---
Depends on
http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg00191.html

 configure | 32 +---
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/configure b/configure
index f14454e691b36..2d993bf210b56 100755
--- a/configure
+++ b/configure
@@ -1870,16 +1870,34 @@ fi
 # libseccomp check
 
 if test "$seccomp" != "no" ; then
-if test "$cpu" = "i386" || test "$cpu" = "x86_64" &&
-$pkg_config --atleast-version=2.1.1 libseccomp; then
+case "$cpu" in
+i386|x86_64)
+libseccomp_minver="2.1.1"
+;;
+arm|aarch64)
+libseccomp_minver="2.2.3"
+;;
+*)
+libseccomp_minver=""
+;;
+esac
+
+if test "$libseccomp_minver" != "" &&
+   $pkg_config --atleast-version=$libseccomp_minver libseccomp ; then
 libs_softmmu="$libs_softmmu `$pkg_config --libs libseccomp`"
 QEMU_CFLAGS="$QEMU_CFLAGS `$pkg_config --cflags libseccomp`"
-   seccomp="yes"
+seccomp="yes"
 else
-   if test "$seccomp" = "yes"; then
-feature_not_found "libseccomp" "Install libseccomp devel >= 2.1.1"
-   fi
-   seccomp="no"
+if test "$seccomp" = "yes" ; then
+if test "$libseccomp_minver" != "" ; then
+feature_not_found "libseccomp" \
+"Install libseccomp devel >= $libseccomp_minver"
+else
+feature_not_found "libseccomp" \
+"libseccomp is not supported for host cpu $cpu"
+fi
+fi
+seccomp="no"
 fi
 fi
 ##
-- 
2.5.2




Re: [Qemu-devel] [PATCHv2] arm: Fail on unknown subtest

2015-10-02 Thread Andrew Jones
On Thu, Oct 01, 2015 at 03:46:25PM -0400, Christopher Covington wrote:
> Signed-off-by: Christopher Covington 
> ---
>  arm/selftest.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arm/selftest.c b/arm/selftest.c
> index fc9ec60..f4a5030 100644
> --- a/arm/selftest.c
> +++ b/arm/selftest.c
> @@ -376,6 +376,9 @@ int main(int argc, char **argv)
>   cpumask_set_cpu(0, &smp_reported);
>   while (!cpumask_full(&smp_reported))
>   cpu_relax();
> + } else {
> + printf("Unknown subtest\n");
> + abort();
>   }
>  
>   return report_summary();
> -- 
> Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 
> 

Reviewed-by: Andrew Jones 

and applied to https://github.com/rhdrjones/kvm-unit-tests/tree/staging

Thanks,
drew



Re: [Qemu-devel] [PATCH] arm: Add PMU test

2015-10-02 Thread Andrew Jones
On Thu, Oct 01, 2015 at 03:47:21PM -0400, Christopher Covington wrote:
> Beginning with just a read of the control register, add plumbing
> for testing the ARM Performance Monitors Unit (PMU).
> 
> Signed-off-by: Christopher Covington 

Hi Christopher,

PMU tests are a great idea. I think Wei Huang has started working
on them too. I've CC'ed him so you two can work out how to proceed
without duplication of effort.

For kvm-unit-tests patches please make sure the kvm-unit-tests flag
is in the PATCH tag. You may want to run

  git config format.subjectprefix "kvm-unit-tests PATCH"

in your git repo.


> ---
>  arm/pmu.c| 31 +++
>  arm/unittests.cfg|  5 +
>  config/config-arm-common.mak |  4 +++-
>  3 files changed, 39 insertions(+), 1 deletion(-)
>  create mode 100644 arm/pmu.c
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> new file mode 100644
> index 000..b1e3c7a
> --- /dev/null
> +++ b/arm/pmu.c
> @@ -0,0 +1,31 @@
> +#include "libcflat.h"
> +
> +union pmcr_el0 {
> + struct {
> + unsigned int implementor:8;
> + unsigned int identification_code:8;
> + unsigned int num_counters:5;
> + unsigned int zeros:4;
> + unsigned int cycle_counter_long:1;
> + unsigned int cycle_counter_disable_when_prohibited:1;
> + unsigned int event_counter_export:1;
> + unsigned int cycle_counter_clock_divider:1;
> + unsigned int cycle_counter_reset:1;
> + unsigned int event_counter_reset:1;
> + unsigned int enable:1;
> + } split;

I'd use an anonymous struct here. And, since I prefer to avoid needing
to remember a data structure is a union, I'd also make the union anonymous
and wrap it inside a struct, i.e.

struct pmu_data {
union {
uint32_t pmcr_el0;
struct {
...
};
};
};

> + uint32_t full;
> +};
> +
> +int main()
> +{
> + union pmcr_el0 pmcr;
> +
> + asm volatile("mrs %0, pmcr_el0\n" : "=r" (pmcr));
> +
> + printf("PMU implementor: 0x%x\n", pmcr.split.implementor);
> + printf("Identification code: 0x%x\n", pmcr.split.identification_code);
> + printf("Event counters:  %d\n", pmcr.split.num_counters);
> +
> + return report_summary();
> +}
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index e068a0c..d3deb6a 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -35,3 +35,8 @@ file = selftest.flat
>  smp = `getconf _NPROCESSORS_CONF`
>  extra_params = -append 'smp'
>  groups = selftest
> +
> +# Test PMU support
> +[pmu]
> +file = pmu.flat
> +groups = pmu
> diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
> index 698555d..b34d04c 100644
> --- a/config/config-arm-common.mak
> +++ b/config/config-arm-common.mak
> @@ -11,7 +11,8 @@ endif
>  
>  tests-common = \
>   $(TEST_DIR)/selftest.flat \
> - $(TEST_DIR)/spinlock-test.flat
> + $(TEST_DIR)/spinlock-test.flat \
> + $(TEST_DIR)/pmu.flat
>  
>  all: test_cases
>  
> @@ -70,3 +71,4 @@ test_cases: $(generated_files) $(tests-common) $(tests)
>  
>  $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
>  $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
> +$(TEST_DIR)/pmu.elf: $(cstart.o) $(TEST_DIR)/pmu.o
> -- 
> Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 
>

Thanks,
drew 



Re: [Qemu-devel] [PATCH] configure: arm/aarch64: allow enable-seccomp

2015-10-02 Thread Andrew Jones
On Fri, Oct 02, 2015 at 04:02:12PM +0200, Eduardo Otubo wrote:
> On Wed, Sep 30, 2015 at 11=59=18AM -0400, Andrew Jones wrote:
> > This is a revert of ae6e8ef11e6cb, but with a bit of refactoring,
> > and also specifically adding arm/aarch64, rather than all
> > architectures. Currently, libseccomp code appears to also support
> > mips, ppc, and s390. We could therefore allow qemu to enable
> > seccomp for those platforms as well, with additional configure
> > patches, given they're tested and proven to work.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> > Depends on
> > http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg00191.html
> > 
> >  configure | 32 +---
> >  1 file changed, 25 insertions(+), 7 deletions(-)
> > 
> > diff --git a/configure b/configure
> > index f14454e691b36..2d993bf210b56 100755
> > --- a/configure
> > +++ b/configure
> > @@ -1870,16 +1870,34 @@ fi
> >  # libseccomp check
> >  
> >  if test "$seccomp" != "no" ; then
> > -if test "$cpu" = "i386" || test "$cpu" = "x86_64" &&
> > -$pkg_config --atleast-version=2.1.1 libseccomp; then
> > +case "$cpu" in
> > +i386|x86_64)
> > +libseccomp_minver="2.1.1"
> > +;;
> > +arm|aarch64)
> > +libseccomp_minver="2.2.3"
> > +;;
> > +*)
> > +libseccomp_minver=""
> > +;;
> > +esac
> > +
> > +if test "$libseccomp_minver" != "" &&
> > +   $pkg_config --atleast-version=$libseccomp_minver libseccomp ; then
> >  libs_softmmu="$libs_softmmu `$pkg_config --libs libseccomp`"
> >  QEMU_CFLAGS="$QEMU_CFLAGS `$pkg_config --cflags libseccomp`"
> > -   seccomp="yes"
> > +seccomp="yes"
> >  else
> > -   if test "$seccomp" = "yes"; then
> > -feature_not_found "libseccomp" "Install libseccomp devel >= 
> > 2.1.1"
> > -   fi
> > -   seccomp="no"
> > +if test "$seccomp" = "yes" ; then
> > +if test "$libseccomp_minver" != "" ; then
> > +feature_not_found "libseccomp" \
> > +"Install libseccomp devel >= $libseccomp_minver"
> > +else
> > +feature_not_found "libseccomp" \
> > +"libseccomp is not supported for host cpu $cpu"
> > +fi
> > +fi
> > +seccomp="no"
> >  fi
> >  fi
> >  ##
> > -- 
> > 2.5.2
> > 
> 
> The patch does look good, but I'll just delay a little bit the pull
> request due to more patches incoming. I just want to create a single
> batch. Is ok if we just merge it mid or end of next week?

Works for me!

Thanks,
drew

> 
> Thanks for the contribution.
> 
> Acked-by: Eduardo Otubo 
> 
> -- 
> Eduardo Otubo
> ProfitBricks GmbH





[Qemu-devel] [PATCH 2/2] hw/arm/virt: don't use a15memmap directly

2015-10-06 Thread Andrew Jones
We should always go through VirtBoardInfo when we need the memmap.
To avoid using a15memmap directly, in this case, we need to defer
the max-cpus check from class init time to instance init time. In
class init we now use MAX_CPUMASK_BITS for max_cpus initialization,
which is the maximum QEMU supports, and also, incidentally, the
maximum KVM/gicv3 currently supports. Also, a nice side-effect of
delaying the max-cpus check is that we now get more appropriate
error messages for gicv2 machines that try to configure more than
123 cpus. Before this patch it would complain that the requested
number of cpus was greater than 123, but for gicv2 configs, it
should complain that the number is greater than 8.

Signed-off-by: Andrew Jones 
---
 hw/arm/virt.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d25d6cfce74cd..a9901983731ae 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -918,7 +918,7 @@ static void machvirt_init(MachineState *machine)
 qemu_irq pic[NUM_IRQS];
 MemoryRegion *sysmem = get_system_memory();
 int gic_version = vms->gic_version;
-int n;
+int n, max_cpus;
 MemoryRegion *ram = g_new(MemoryRegion, 1);
 const char *cpu_model = machine->cpu_model;
 VirtBoardInfo *vbi;
@@ -952,6 +952,21 @@ static void machvirt_init(MachineState *machine)
 exit(1);
 }
 
+/* The maximum number of CPUs depends on the GIC version, or on how
+ * many redistributors we can fit into the memory map.
+ */
+if (gic_version == 3) {
+max_cpus = vbi->memmap[VIRT_GIC_REDIST].size / 0x2;
+} else {
+max_cpus = GICV2_NCPU;
+}
+
+if (smp_cpus > max_cpus) {
+error_report("mach-virt: Number of SMP cpus requested (%d), "
+ "exceeds max cpus supported %d", smp_cpus, max_cpus);
+exit(1);
+}
+
 vbi->smp_cpus = smp_cpus;
 
 if (machine->ram_size > vbi->memmap[VIRT_MEM].size) {
@@ -1150,10 +1165,7 @@ static void virt_class_init(ObjectClass *oc, void *data)
 
 mc->desc = "ARM Virtual Machine",
 mc->init = machvirt_init;
-/* Our maximum number of CPUs depends on how many redistributors
- * we can fit into memory map
- */
-mc->max_cpus = a15memmap[VIRT_GIC_REDIST].size / 0x2;
+mc->max_cpus = MAX_CPUMASK_BITS;
 mc->has_dynamic_sysbus = true;
 mc->block_default_type = IF_VIRTIO;
 mc->no_cdrom = 1;
-- 
2.4.3




[Qemu-devel] [PATCH 0/2] hw/arm/virt: max-cpus init/check fixup

2015-10-06 Thread Andrew Jones
Andrew Jones (2):
  [RFC] arm_gic_common.h: add gicv2 aliases for defines
  hw/arm/virt: don't use a15memmap directly

 hw/arm/virt.c| 22 +-
 include/hw/intc/arm_gic_common.h |  2 ++
 2 files changed, 19 insertions(+), 5 deletions(-)

-- 
2.4.3




[Qemu-devel] [PATCH 1/2] [RFC] arm_gic_common.h: add gicv2 aliases for defines

2015-10-06 Thread Andrew Jones
I'm not sure if arm_gic_common.h is supposed to be common, not
only between tcg and kvm, but also v2 and v3, but it currently
is (arm_gicv3_common.h includes it, and it's the only gic header
included by hw/arm/virt.c). If it should be the super-common
header, then it's unfortunate that the define names are too
generic. This patch doesn't help much, as it doesn't rename
anything, but it does start heading down the right path. With
it, code including the super-common header can start using more
appropriate names for a couple very gic-version-specific defines.

Signed-off-by: Andrew Jones 
---
 include/hw/intc/arm_gic_common.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/hw/intc/arm_gic_common.h b/include/hw/intc/arm_gic_common.h
index 564a72b2cf77f..299226064a30f 100644
--- a/include/hw/intc/arm_gic_common.h
+++ b/include/hw/intc/arm_gic_common.h
@@ -25,11 +25,13 @@
 
 /* Maximum number of possible interrupts, determined by the GIC architecture */
 #define GIC_MAXIRQ 1020
+#define GICV2_MAXIRQ GIC_MAXIRQ
 /* First 32 are private to each CPU (SGIs and PPIs). */
 #define GIC_INTERNAL 32
 #define GIC_NR_SGIS 16
 /* Maximum number of possible CPU interfaces, determined by GIC architecture */
 #define GIC_NCPU 8
+#define GICV2_NCPU GIC_NCPU
 
 #define MAX_NR_GROUP_PRIO 128
 #define GIC_NR_APRS (MAX_NR_GROUP_PRIO / 32)
-- 
2.4.3




Re: [Qemu-devel] [PATCH v4] ivshmem: allow the sharing of hugepages

2015-10-06 Thread Andrew Jones
On Tue, Oct 06, 2015 at 04:33:23PM +0300, Pavel Fedin wrote:
> This patch permits to share memory areas that do not specifically belong to
> /dev/shm. In such case, the file must be already present when launching qemu.
> A new parameter 'file' has been added to specify the file to use.
> 
> A use case for this patch is sharing huge pages available through a
> hugetlbfs mountpoint.

ivshmem is getting a big reboot right now. See this series
"[PATCH v5 00/48] ivshmem improvements". Part of that work
is to better support sharing huge pages. Please look at that
series first, and if there's still something missing, then
additional patches to that series should be posted.

Thanks,
drew



Re: [Qemu-devel] [kvm-unit-tests PATCHv3 1/3] arm: Add PMU test

2015-10-06 Thread Andrew Jones
On Tue, Oct 06, 2015 at 01:49:24PM -0400, Christopher Covington wrote:
> Beginning with a simple sanity check of the control register, add
> a unit test for the ARM Performance Monitors Unit (PMU).
> 
> Signed-off-by: Christopher Covington 
> ---
>  arm/pmu.c   | 66 
> +
>  arm/unittests.cfg   |  5 
>  config/config-arm64.mak |  4 ++-
>  3 files changed, 74 insertions(+), 1 deletion(-)
>  create mode 100644 arm/pmu.c
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> new file mode 100644
> index 000..91a3688
> --- /dev/null
> +++ b/arm/pmu.c
> @@ -0,0 +1,66 @@
> +/*
> + * Test the ARM Performance Monitors Unit (PMU).
> + *
> + * Copyright 2015 The Linux Foundation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU Lesser General Public License version 2.1 and
> + * only version 2.1 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but 
> WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public 
> License
> + * for more details.
> + */
> +#include "libcflat.h"
> +
> +struct pmu_data {
> + union {
> + uint32_t pmcr_el0;
> + struct {
> + uint32_t enable:1;
> + uint32_t event_counter_reset:1;
> + uint32_t cycle_counter_reset:1;
> + uint32_t cycle_counter_clock_divider:1;
> + uint32_t event_counter_export:1;
> + uint32_t cycle_counter_disable_when_prohibited:1;
> + uint32_t cycle_counter_long:1;
> + uint32_t zeros:4;

resv might be a better name than zeros. Actually, does the spec even
state these bits will always be read as zeros?

> + uint32_t num_counters:5;
> + uint32_t identification_code:8;
> + uint32_t implementer:8;
> + };
> + };
> +};
> +
> +/* As a simple sanity check on the PMCR_EL0, ensure the implementer field 
> isn't
> + * null. Also print out a couple other interesting fields for diagnostic
> + * purposes. For example, as of fall 2015, QEMU TCG mode doesn't implement
> + * event counters and therefore reports zero of them, but hopefully support 
> for
^for
> + * at least the instructions event will be added in the future and the 
> reported
> + * number of event counters will become nonzero.
> + */
> +static bool check_pmcr(void)
> +{
> + struct pmu_data pmcr;

Thanks for making the union and bitfield anonymous. I suggested
naming the struct pmu_data because I'm expecting more stuff to
eventually get shoved in there. If not, then maybe it needs to
be renamed something more pmcr-ish. Or, if it might expand its
purpose as I expect, then this variable should be renamed, i.e.

  struct pmu_data pmu;

> +
> + asm volatile("mrs %0, pmcr_el0" : "=r" (pmcr));

And this should be

  asm volatile("mrs %0, pmcr_el0" : "=r" (pmu.pmcr_el0));

> +
> + printf("PMU implementer: %c\n", pmcr.implementer);
> + printf("Identification code: 0x%x\n", pmcr.identification_code);
> + printf("Event counters:  %d\n", pmcr.num_counters);
> +
> + if (pmcr.implementer)
> + return true;
> +
> + return false;

How about

  return pmcr.implementer != 0;

> +}
> +
> +int main(void)
> +{
> + report_prefix_push("pmu");
> +
> + report("Control register", check_pmcr());
> +
> + return report_summary();
> +}
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index e068a0c..fd94adb 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -35,3 +35,8 @@ file = selftest.flat
>  smp = `getconf _NPROCESSORS_CONF`
>  extra_params = -append 'smp'
>  groups = selftest
> +
> +# Test PMU support without -icount
> +[pmu]
> +file = pmu.flat
> +groups = pmu
> diff --git a/config/config-arm64.mak b/config/config-arm64.mak
> index d61b703..140b611 100644
> --- a/config/config-arm64.mak
> +++ b/config/config-arm64.mak
> @@ -12,9 +12,11 @@ cflatobjs += lib/arm64/processor.o
>  cflatobjs += lib/arm64/spinlock.o
>  
>  # arm64 specific tests
> -tests =
> +tests = $(TEST_DIR)/pmu.flat
>  
>  include config/config-arm-common.mak
>  
>  arch_clean: arm_clean
>   $(RM) lib/arm64/.*.d
> +
> +$(TEST_DIR)/pmu.elf: $(cstart.o) $(TEST_DIR)/pmu.o

We can have a PMU on 32-bit ARM processors too. If this test file
needs to be specific to 64-bit, then we should probably name it
less generically, like pmu64?

Thanks,
drew



Re: [Qemu-devel] [kvm-unit-tests PATCHv3 2/3] arm: pmu: Check cycle count increases

2015-10-06 Thread Andrew Jones
On Tue, Oct 06, 2015 at 01:49:25PM -0400, Christopher Covington wrote:
> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> even for the smallest delta of two subsequent reads.
> 
> Signed-off-by: Christopher Covington 
> ---
>  arm/pmu.c | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 91a3688..589e605 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -33,6 +33,8 @@ struct pmu_data {
>   };
>  };
>  
> +static const int samples = 10;

#define NR_SAMPLES 10

> +
>  /* As a simple sanity check on the PMCR_EL0, ensure the implementer field 
> isn't
>   * null. Also print out a couple other interesting fields for diagnostic
>   * purposes. For example, as of fall 2015, QEMU TCG mode doesn't implement
> @@ -56,11 +58,38 @@ static bool check_pmcr(void)
>   return false;
>  }
>  
> +/* Ensure that the cycle counter progresses between back-to-back reads.
> + */

style nit: your block quotes don't have opening wing (the preferred
kernel style - and, fwiw, my preference too...)

> +static bool check_cycles_increase(void)
> +{
> + struct pmu_data pmcr;
> +
> + pmcr.enable = 1;
> + asm volatile("msr pmcr_el0, %0" : : "r" (pmcr));
> +
> + for (int i = 0; i < samples; i++) {
> + int a, b;
> +
> + asm volatile(
> + "mrs %[a], pmccntr_el0\n"
> + "mrs %[b], pmccntr_el0\n"
> + : [a] "=r" (a), [b] "=r" (b));
> +
> + if (a >= b) {
> + printf("Read %d then %d.\n", a, b);
> + return false;
> + }
> + }
> +
> + return true;
> +}
> +
>  int main(void)
>  {
>   report_prefix_push("pmu");
>  
>   report("Control register", check_pmcr());
> + report("Monotonically increasing cycle count", check_cycles_increase());
>  
>   return report_summary();
>  }
> -- 
> Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 
> 



Re: [Qemu-devel] [kvm-unit-tests PATCHv3 3/3] arm: pmu: Add CPI checking

2015-10-06 Thread Andrew Jones
On Tue, Oct 06, 2015 at 01:49:26PM -0400, Christopher Covington wrote:
> Check the numbers of cycles per instruction (CPI) implied by ARM PMU
> cycle counter values. Check that in -icount mode these strictly
> match the specified rate.
> 
> Signed-off-by: Christopher Covington 
> ---
>  arm/pmu.c | 72 
> ++-
>  arm/unittests.cfg | 13 ++
>  2 files changed, 84 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 589e605..0ad113d 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -84,12 +84,82 @@ static bool check_cycles_increase(void)
>   return true;
>  }
>  
> -int main(void)
> +/* Execute a known number of guest instructions. Only odd instruction counts
> + * greater than or equal to 3 are supported by the in-line assembly code. The
> + * control register (PMCR_EL0) is initialized with the provided value 
> (allowing
> + * for example for the cycle counter or event counters to be reset). At the 
> end
> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> + * counting, allowing the cycle counter or event counters to be read at the
> + * leisure of the calling code.
> + */
> +static void measure_instrs(int num, struct pmu_data pmcr)
> +{
> + int i = (num - 1) / 2;
> +
> + if (num < 3 || ((num - 1) % 2))
> + abort();

assert(num >= 3 && ((num - 1) % 2) == 0);

> +
> + asm volatile(
> + "msr pmcr_el0, %[pmcr]\n"
^\t
> + "1: subs %[i], %[i], #1\n"
   ^\t  ^\t
> + "b.gt 1b\n"
 ^\t
> + "msr pmcr_el0, xzr"
^\t
> + : [i] "+r" (i) : [pmcr] "r" (pmcr) : "cc");
> +}
> +
> +/* Measure cycle counts for various known instruction counts. Ensure that the
> + * cycle counter progresses (similar to check_cycles_increase() but with more
> + * instructions and using reset and stop controls). If supplied a positive,
> + * nonzero CPI parameter, also strictly check that every measurement matches
> + * it. Strict CPI checking is used to test -icount mode.
> + */
> +static bool check_cpi(int cpi)
> +{
> + struct pmu_data pmcr;
> +
> + pmcr.cycle_counter_reset = 1;
> + pmcr.enable = 1;
> +
> + if (cpi > 0)
> + printf("Checking for CPI=%d.\n", cpi);
> + printf("instrs : cycles0 cycles1 ...\n");
> +
> + for (int i = 3; i < 300; i += 32) {
> + int avg, sum = 0;
> +
> + printf("%d :", i);
> + for (int j = 0; j < samples; j++) {
> + int cycles;
> +
> + measure_instrs(i, pmcr);
> + asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
> + printf(" %d", cycles);
> +
> + if (!cycles || (cpi > 0 && cycles != i * cpi)) {
> + printf("\n");
> + return false;
> + }
> +
> + sum += cycles;
> + }
> + avg = sum / samples;
> + printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
> + sum, avg, i / avg, avg / i);
> + }
> +
> + return true;
> +}
> +
> +int main(int argc, char *argv[])
>  {
>   report_prefix_push("pmu");
>  
>   report("Control register", check_pmcr());
>   report("Monotonically increasing cycle count", check_cycles_increase());
>  
> + int cpi = (argc == 1 ? atol(argv[0]) : 0);

I prefer variable declarations at the top of the function, and

  int cpi = 0;

  if (argc > 1)
cpi = atol(argv[0]);

looks a bit better to me.


> +
> + report("Cycle/instruction ratio", check_cpi(cpi));
> +
>   return report_summary();
>  }
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index fd94adb..333ee0d 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -39,4 +39,17 @@ groups = selftest
>  # Test PMU support without -icount
>  [pmu]
>  file = pmu.flat
> +extra_params = -append '-1'

Why do we need this cpu == -1? Can't it just be zero?

> +groups = pmu
> +
> +# Test PMU support with -icount IPC=1
> +[pmu-icount-1]
> +file = pmu.flat
> +extra_params = -icount 0 -append '1'
> +groups = pmu
> +
> +# Test PMU support with -icount IPC=256
> +[pmu-icount-256]
> +file = pmu.flat
> +extra_params = -icount 8 -append '256'
>  groups = pmu

-icount is a tcg specific parameter. I have a patch[*] in my staging
branch which allows you to specify 'accel = tcg' in unittests.cfg for
this type of test. You'll need to use that for anything with -icount
on the extra_params list.

Thanks,
drew

[*] 
https://github.com/rhdrjones/kvm-unit-tests/commit/85e084cf263e76484f7d82cbc9add4e7602f80a4



Re: [Qemu-devel] [PULL 00/48] ivshmem series

2015-10-07 Thread Andrew Jones
On Wed, Oct 07, 2015 at 08:16:40AM -0400, Marc-André Lureau wrote:
> 
> Hi Andreas
> 
> - Original Message -
> > Am 06.10.2015 um 21:18 schrieb marcandre.lur...@redhat.com:
> > > From: Marc-André Lureau 
> > > 
> > > The following changes since commit
> > > 5fdb4671b08e0d1631447e81348b2b50a6b85bf7:
> > > 
> > >   Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request'
> > >   into staging (2015-10-06 13:42:33 +0100)
> > > 
> > > are available in the git repository at:
> > > 
> > >   https://github.com/elmarco/qemu tags/ivshmem-series
> > > 
> > > for you to fetch changes up to 097cadb155ef22be286af1403240b4fbf0f038ef:
> > > 
> > >   ivshmem: use little-endian int64_t for the protocol (2015-10-06 21:17:22
> > >   +0200)
> > > 
> > > 
> > > Ivshmem series
> > > 
> > > 
> > [...]
> > > Marc-André Lureau (45):
> > [...]
> > >   tests: add ivshmem qtest
> > 
> > I had NAK'ed this patch in v1 and it has not been fixed. If this pull
> > gets merged I will immediately revert it. Not funny.
> > 
> 
> 
> Could stick to technical review, please. The test runs fine without kvm. 
> Regarding your copyright claim, I already explain that your older version of 
> boilerplate test is really nothing compare to this one. But if you feel so 
> strongly about it, I don't care you add a copyright line.
> 

I would care if we added it. If contributors are getting bullied into
outrageous demands, then there's something wrong. Something wrong is
something we should try to fix, not just shrug off. And, in this case,
Andreas' claim is quite outrageous. The patch[*] in question provided
absolutely nothing that couldn't have been copy+pasted from any other
qtest.

drew

[*] http://patchwork.ozlabs.org/patch/336367/



Re: [Qemu-devel] [PATCH 2/2] hw/arm/virt: don't use a15memmap directly

2015-10-09 Thread Andrew Jones
On Fri, Oct 09, 2015 at 05:45:24PM +0100, Peter Maydell wrote:
> On 6 October 2015 at 15:37, Andrew Jones  wrote:
> > We should always go through VirtBoardInfo when we need the memmap.
> > To avoid using a15memmap directly, in this case, we need to defer
> > the max-cpus check from class init time to instance init time. In
> > class init we now use MAX_CPUMASK_BITS for max_cpus initialization,
> > which is the maximum QEMU supports, and also, incidentally, the
> > maximum KVM/gicv3 currently supports. Also, a nice side-effect of
> > delaying the max-cpus check is that we now get more appropriate
> > error messages for gicv2 machines that try to configure more than
> > 123 cpus. Before this patch it would complain that the requested
> > number of cpus was greater than 123, but for gicv2 configs, it
> > should complain that the number is greater than 8.
> 
> Yes, this seems like a good plan.
> 
> > Signed-off-by: Andrew Jones 
> > ---
> >  hw/arm/virt.c | 22 +-
> >  1 file changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index d25d6cfce74cd..a9901983731ae 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -918,7 +918,7 @@ static void machvirt_init(MachineState *machine)
> >  qemu_irq pic[NUM_IRQS];
> >  MemoryRegion *sysmem = get_system_memory();
> >  int gic_version = vms->gic_version;
> > -int n;
> > +int n, max_cpus;
> >  MemoryRegion *ram = g_new(MemoryRegion, 1);
> >  const char *cpu_model = machine->cpu_model;
> >  VirtBoardInfo *vbi;
> > @@ -952,6 +952,21 @@ static void machvirt_init(MachineState *machine)
> >  exit(1);
> >  }
> >
> > +/* The maximum number of CPUs depends on the GIC version, or on how
> > + * many redistributors we can fit into the memory map.
> > + */
> > +if (gic_version == 3) {
> > +max_cpus = vbi->memmap[VIRT_GIC_REDIST].size / 0x2;
> > +} else {
> > +max_cpus = GICV2_NCPU;
> > +}
> > +
> > +if (smp_cpus > max_cpus) {
> > +error_report("mach-virt: Number of SMP cpus requested (%d), "
> 
> The comma here in the error message is unnecessary.
> 
> > + "exceeds max cpus supported %d", smp_cpus, max_cpus);
> 
> ...and there should be parens around the %d here for consistency.
> 
> Since this is a user-facing error message, "CPUs" is nicer than
> "cpus".

Sounds good to me, but I actually took that message from vl.c, where
there's already a message for the same purpose, so I was trying to be
consistent with that. Maybe we should clean that one up too.

> 
> > +exit(1);
> > +}
> > +
> >  vbi->smp_cpus = smp_cpus;
> >
> >  if (machine->ram_size > vbi->memmap[VIRT_MEM].size) {
> > @@ -1150,10 +1165,7 @@ static void virt_class_init(ObjectClass *oc, void 
> > *data)
> >
> >  mc->desc = "ARM Virtual Machine",
> >  mc->init = machvirt_init;
> > -/* Our maximum number of CPUs depends on how many redistributors
> > - * we can fit into memory map
> > - */
> > -mc->max_cpus = a15memmap[VIRT_GIC_REDIST].size / 0x2;
> > +mc->max_cpus = MAX_CPUMASK_BITS;
> 
> A brief comment that we do a more restrictive check later at init
> time would be a good idea I think.

OK

> 
> >  mc->has_dynamic_sysbus = true;
> >  mc->block_default_type = IF_VIRTIO;
> >  mc->no_cdrom = 1;
> > --
> 
> thanks
> -- PMM

Thanks for the review. I'll send out a v2 soon.

drew



Re: [Qemu-devel] [PATCH 2/2] hw/arm/virt: don't use a15memmap directly

2015-10-12 Thread Andrew Jones
On Mon, Oct 12, 2015 at 10:00:19AM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > Before this patch it would complain that the requested
> > number of cpus was greater than 123, but for gicv2 configs, it
> > should complain that the number is greater than 8.
> 
>  Actually, gicv2 code has own check, and it would complain about >8 CPU (see
> arm_gic_common_realize()).

It only gets a chance to do that check if the class init check doesn't
fail first, i.e. nr_cpus > 8 <= 123 will fail with a "> 8" message, but
nr_cpus > 123, as I wrote above, will fail with a "> 123" message.

drew

> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 
> 
> 



Re: [Qemu-devel] [PATCH] vl.c: Replace fprintf(stderr) with error_report()

2015-10-27 Thread Andrew Jones
On Mon, Oct 26, 2015 at 03:13:43PM -0200, Eduardo Habkost wrote:
> This replaces most fprintf(stderr) calls on vl.c with error_report().
> 
> The trailing newlines, "qemu:" and "error:" message prefixes were
> removed.
> 
> The only remaining fprintf(stderr) calls are the ones at
> qemu_kill_report(), because the error mesage is split in multiple
> fprintf() calls.
> 
> Signed-off-by: Eduardo Habkost 
> ---
> Not sure if this is appropriate post soft-freeze, but if we are going to apply
> the max-cpus patch from Drew before 2.5.0, we could simply change all the
> fprintf() calls in a single step.

In addition to Markus' and Eric's comments, I think we should

1. make sure the first word's case is correct. Lowercase for phrases,
   uppercase for sentences and proper nouns.
2. make sure the 'warning' prefix is consistent in all uses. This should
   be a QEMU-wide change. Maybe we need an error_report_warn variant?
3. make sure past tense is used in phrases like "failed to do..."
4. only break the line if necessary. Some of the lines are broken even
   though they could fit in 80 char. Probably the opposite exists too,
   i.e. some are > 80.
  
I've tried to point out a few of the cases below that inspired me to
to write these suggestions.

Thanks,
drew


> ---
>  vl.c | 228 
> +--
>  1 file changed, 112 insertions(+), 116 deletions(-)
> 
> diff --git a/vl.c b/vl.c
> index dffaf09..db4a1f5 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -674,9 +674,9 @@ void runstate_set(RunState new_state)
>  assert(new_state < RUN_STATE_MAX);
>  
>  if (!runstate_valid_transitions[current_run_state][new_state]) {
> -fprintf(stderr, "ERROR: invalid runstate transition: '%s' -> '%s'\n",
> -RunState_lookup[current_run_state],
> -RunState_lookup[new_state]);
> +error_report("invalid runstate transition: '%s' -> '%s'",
> + RunState_lookup[current_run_state],
> + RunState_lookup[new_state]);
>  abort();
>  }
>  trace_runstate_set(new_state);
> @@ -828,8 +828,8 @@ static void configure_rtc_date_offset(const char 
> *startdate, int legacy)
>  rtc_start_date = mktimegm(&tm);
>  if (rtc_start_date == -1) {
>  date_fail:
> -fprintf(stderr, "Invalid date format. Valid formats are:\n"
> -"'2006-06-17T16:01:21' or '2006-06-17'\n");
> +error_report("Invalid date format. Valid formats are:\n"
> + "'2006-06-17T16:01:21' or '2006-06-17'");
>  exit(1);
>  }
>  rtc_date_offset = qemu_time() - rtc_start_date;
> @@ -859,7 +859,7 @@ static void configure_rtc(QemuOpts *opts)
>  } else if (!strcmp(value, "vm")) {
>  rtc_clock = QEMU_CLOCK_VIRTUAL;
>  } else {
> -fprintf(stderr, "qemu: invalid option value '%s'\n", value);
> +error_report("invalid option value '%s'", value);
>  exit(1);
>  }
>  }
> @@ -879,7 +879,7 @@ static void configure_rtc(QemuOpts *opts)
>  } else if (!strcmp(value, "none")) {
>  /* discard is default */
>  } else {
> -fprintf(stderr, "qemu: invalid option value '%s'\n", value);
> +error_report("invalid option value '%s'", value);
>  exit(1);
>  }
>  }
> @@ -905,7 +905,7 @@ static int bt_hci_parse(const char *str)
>  bdaddr_t bdaddr;
>  
>  if (nb_hcis >= MAX_NICS) {
> -fprintf(stderr, "qemu: Too many bluetooth HCIs (max %i).\n", 
> MAX_NICS);
> +error_report("Too many bluetooth HCIs (max %i).", MAX_NICS);
>  return -1;
>  }
>  
> @@ -931,8 +931,8 @@ static void bt_vhci_add(int vlan_id)
>  struct bt_scatternet_s *vlan = qemu_find_bt_vlan(vlan_id);
>  
>  if (!vlan->slave)
> -fprintf(stderr, "qemu: warning: adding a VHCI to "
> -"an empty scatternet %i\n", vlan_id);
> +error_report("warning: adding a VHCI to "
> + "an empty scatternet %i", vlan_id);
>  
>  bt_vhci_init(bt_new_hci(vlan));
>  }
> @@ -950,7 +950,7 @@ static struct bt_device_s *bt_device_add(const char *opt)
>  if (endp) {
>  vlan_id = strtol(endp + 6, &endp, 0);
>  if (*endp) {
> -fprintf(stderr, "qemu: unrecognised bluetooth vlan Id\n");
> +error_report("unrecognised bluetooth vlan Id");
>  return 0;
>  }
>  }
> @@ -958,13 +958,13 @@ static struct bt_device_s *bt_device_add(const char 
> *opt)
>  vlan = qemu_find_bt_vlan(vlan_id);
>  
>  if (!vlan->slave)
> -fprintf(stderr, "qemu: warning: adding a slave device to "
> -"an empty scatternet %i\n", vlan_id);
> +error_report("warning: adding a slave device to "
> + "an empty scatternet %i", vlan_id);
>  
>  if (!strcmp(devname, "keyboar

Re: [Qemu-devel] proposal: new qemu-arm mailing list

2015-10-27 Thread Andrew Jones
On Tue, Oct 27, 2015 at 10:15:32AM +, Peter Maydell wrote:
> Hi; it's been suggested to me that it would be helpful to have a
> qemu-arm mailing list, along the lines of the existing qemu-ppc
> and qemu-block lists. The idea would be to get people to cc the
> list with their ARM related patches, so it would mostly act as
> a way for people to filter their mail to separate the ARM stuff
> out from the qemu-devel firehose. (Everything should still be
> cc'd to qemu-devel as well.)
> 
> Any objections/opinions?

FWIW, I like the idea. Although

notmuch tag +qemu-arm -- tag:new \
 and to:qemu-devel@nongnu.org \
 and \( subject:arm or subject:aarch64 \)

also seems to work pretty well for me.

Thanks,
drew



Re: [Qemu-devel] [PATCH] vl.c: Replace fprintf(stderr) with error_report()

2015-10-28 Thread Andrew Jones
On Tue, Oct 27, 2015 at 08:36:07PM +0100, Laszlo Ersek wrote:
> On 10/27/15 20:25, Eduardo Habkost wrote:
> > On Tue, Oct 27, 2015 at 08:30:38AM +0100, Andrew Jones wrote:
> >> In addition to Markus' and Eric's comments, I think we should
> >>
> >> 1. make sure the first word's case is correct. Lowercase for phrases,
> >>uppercase for sentences and proper nouns.
> > 
> > I think I can understand this in the more obvious cases, but I don't
> > know how I could convert the following instance:
> > 
> > [...]
> >>>  case QEMU_OPTION_no_kvm_pit: {
> >>> -fprintf(stderr, "Warning: KVM PIT can no longer be 
> >>> disabled "
> >>> -"separately.\n");
> >>> +error_report("Warning: KVM PIT can no longer be disabled 
> >>> "
> >>> + "separately.");
> >>
> >> Could change this from a sentence into a phrase. Also, we need a consistent
> >> 'warning' prefix. Should we make a error_report_warn variant?
> > 
> > Converting that sentence to a phrase is beyond my non-native english
> > speaker skills. Do you have any suggestions? :)
> > 
> 
> easy-peasy, "KVM PIT no longer disableable separately"!
> 
> /me hides ;)
As well you should with a word like "disableable" :-)

How about

  Warning: ignoring deprecated option no-kvm-pit

drew

> 



Re: [Qemu-devel] [kvm-unit-tests PATCHv5 3/3] arm: pmu: Add CPI checking

2015-10-30 Thread Andrew Jones
On Wed, Oct 28, 2015 at 03:12:55PM -0400, Christopher Covington wrote:
> Calculate the numbers of cycles per instruction (CPI) implied by ARM
> PMU cycle counter values. The code includes a strict checking facility
> intended for the -icount option in TCG mode but it is not yet enabled
> in the configuration file. Enabling it must wait on infrastructure
> improvements which allow for different tests to be run on TCG versus
> KVM.
> 
> Signed-off-by: Christopher Covington 
> ---
>  arm/pmu.c | 103 
> +-
>  1 file changed, 102 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 4334de4..76a 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -43,6 +43,23 @@ static inline unsigned long get_pmccntr(void)
>   asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
>   return cycles;
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to 
> compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting.
> + */
> +static inline void loop(int i, uint32_t pmcr)
> +{
> + asm volatile(
> + "   mcr p15, 0, %[pmcr], c9, c12, 0\n"
> + "1: subs%[i], %[i], #1\n"
> + "   bgt 1b\n"
> + "   mcr p15, 0, %[z], c9, c12, 0\n"
> + : [i] "+r" (i)
> + : [pmcr] "r" (pmcr), [z] "r" (0)
> + : "cc");
> +}
>  #elif defined(__aarch64__)
>  static inline uint32_t get_pmcr(void)
>  {
> @@ -64,6 +81,23 @@ static inline unsigned long get_pmccntr(void)
>   asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
>   return cycles;
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to 
> compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting.
> + */
> +static inline void loop(int i, uint32_t pmcr)
> +{
> + asm volatile(
> + "   msr pmcr_el0, %[pmcr]\n"
> + "1: subs%[i], %[i], #1\n"
> + "   b.gt1b\n"
> + "   msr pmcr_el0, xzr\n"
> + : [i] "+r" (i)
> + : [pmcr] "r" (pmcr)
> + : "cc");
> +}
>  #endif
>  
>  struct pmu_data {
> @@ -131,12 +165,79 @@ static bool check_cycles_increase(void)
>   return true;
>  }
>  
> -int main(void)
> +/*
> + * Execute a known number of guest instructions. Only odd instruction counts
> + * greater than or equal to 3 are supported by the in-line assembly code. The
> + * control register (PMCR_EL0) is initialized with the provided value 
> (allowing
> + * for example for the cycle counter or event counters to be reset). At the 
> end
> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> + * counting, allowing the cycle counter or event counters to be read at the
> + * leisure of the calling code.
> + */
> +static void measure_instrs(int num, uint32_t pmcr)
> +{
> + int i = (num - 1) / 2;
> +
> + assert(num >= 3 && ((num - 1) % 2 == 0));
> + loop(i, pmcr);
> +}
> +
> +/*
> + * Measure cycle counts for various known instruction counts. Ensure that the
> + * cycle counter progresses (similar to check_cycles_increase() but with more
> + * instructions and using reset and stop controls). If supplied a positive,
> + * nonzero CPI parameter, also strictly check that every measurement matches
> + * it. Strict CPI checking is used to test -icount mode.
> + */
> +static bool check_cpi(int cpi)
> +{
> + struct pmu_data pmu = {0};
> +
> + pmu.cycle_counter_reset = 1;
> + pmu.enable = 1;
> +
> + if (cpi > 0)
> + printf("Checking for CPI=%d.\n", cpi);
> + printf("instrs : cycles0 cycles1 ...\n");
> +
> + for (int i = 3; i < 300; i += 32) {
> + int avg, sum = 0;
> +
> + printf("%d :", i);
> + for (int j = 0; j < NR_SAMPLES; j++) {
> + int cycles;
> +
> + measure_instrs(i, pmu.pmcr_el0);
> + cycles = get_pmccntr();
> + printf(" %d", cycles);
> +
> + if (!cycles || (cpi > 0 && cycles != i * cpi)) {
> + printf("\n");
> + return false;
> + }
> +
> + sum += cycles;
> + }
> + avg = sum / NR_SAMPLES;
> + printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
> + sum, avg, i / avg, avg / i);
> + }
> +
> + return true;
> +}
> +
> +int main(int argc, char *argv[])
>  {
> + int cpi = 0;
> +
> + if (argc >= 1)
> + cpi = atol(argv[0]);
> +
>   report_prefix_push("pmu");
>  
>   report("Control register", check_pmcr());
>   report("Monotonically increasing cycle count", check_cycles_increase());
> + report("Cycle/instruction ratio", check_cpi(cpi));
>  
>   return report_summary();
>  }
> -- 
> Qualcomm Innovation Center, Inc.
> 

Re: [Qemu-devel] [PATCH v2 07/16] vl.c: Use "%s support disabled" error messages consistently

2015-10-30 Thread Andrew Jones
On Thu, Oct 29, 2015 at 05:59:13PM +0100, Markus Armbruster wrote:
> Eduardo Habkost  writes:
> 
> > Signed-off-by: Eduardo Habkost 
> > ---
> >  vl.c | 34 --
> >  1 file changed, 16 insertions(+), 18 deletions(-)
> >
> > diff --git a/vl.c b/vl.c
> > index d417dd9..f37e3a9 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -1018,8 +1018,7 @@ static int parse_sandbox(void *opaque, QemuOpts 
> > *opts, Error **errp)
> >  return -1;
> >  }
> >  #else
> > -error_report("sandboxing request but seccomp is not compiled "
> > - "into this build");
> > +error_report("seccomp support disabled");
> >  return -1;
> >  #endif
> >  }
> 
> Yes, please.
> 
> > @@ -2112,7 +2111,7 @@ static DisplayType select_display(const char *p)
> >  opts = nextopt;
> >  }
> >  #else
> > -error_report("SDL support is disabled");
> > +error_report("SDL support disabled");
> >  exit(1);
> >  #endif
> >  } else if (strstart(p, "vnc", &opts)) {
> 
> I'd prefer the old wording.  Drew?

I like the 'is' version better too. I'm OK with both though.

> 
> > @@ -2128,14 +2127,14 @@ static DisplayType select_display(const char *p)
> >  exit(1);
> >  }
> >  #else
> > -error_report("VNC support is disabled");
> > +error_report("VNC support disabled");
> >  exit(1);
> >  #endif
> >  } else if (strstart(p, "curses", &opts)) {
> >  #ifdef CONFIG_CURSES
> >  display = DT_CURSES;
> >  #else
> > -error_report("Curses support is disabled");
> > +error_report("curses support disabled");
> >  exit(1);
> >  #endif
> >  } else if (strstart(p, "gtk", &opts)) {
> > @@ -2170,7 +2169,7 @@ static DisplayType select_display(const char *p)
> >  opts = nextopt;
> >  }
> >  #else
> > -error_report("GTK support is disabled");
> > +error_report("GTK support disabled");
> >  exit(1);
> >  #endif
> >  } else if (strstart(p, "none", &opts)) {
> > @@ -2635,7 +2634,7 @@ static gint machine_class_cmp(gconstpointer a, 
> > gconstpointer b)
> >  return mc;
> >  }
> >  if (name && !is_help_option(name)) {
> > -error_report("Unsupported machine type");
> > +error_report("unsupported machine type");
> >  error_printf("Use -machine help to list supported machines!\n");
> >  } else {
> >  printf("Supported machines are:\n");
> 
> This belongs to PATCH 11.
> 
> > @@ -3011,7 +3010,7 @@ int main(int argc, char **argv, char **envp)
> >  runstate_init();
> >  
> >  if (qcrypto_init(&err) < 0) {
> > -error_report("Cannot initialize crypto: %s", 
> > error_get_pretty(err));
> > +error_report("cannot initialize crypto: %s", 
> > error_get_pretty(err));
> >  exit(1);
> >  }
> >  rtc_clock = QEMU_CLOCK_HOST;
> 
> Likewise.
> 
> > @@ -3212,7 +3211,7 @@ int main(int argc, char **argv, char **envp)
> >  #ifdef CONFIG_CURSES
> >  display_type = DT_CURSES;
> >  #else
> > -error_report("Curses support is disabled");
> > +error_report("curses support disabled");
> >  exit(1);
> >  #endif
> >  break;
> 
> Likewise.
> 
> > @@ -3440,7 +3439,7 @@ int main(int argc, char **argv, char **envp)
> >  case QEMU_OPTION_fsdev:
> >  olist = qemu_find_opts("fsdev");
> >  if (!olist) {
> > -error_report("fsdev is not supported by this qemu 
> > build");
> > +error_report("fsdev support disabled");
> >  exit(1);
> >  }
> >  opts = qemu_opts_parse_noisily(olist, optarg, true);
> > @@ -3455,7 +3454,7 @@ int main(int argc, char **argv, char **envp)
> >  
> >  olist = qemu_find_opts("virtfs");
> >  if (!olist) {
> > -error_report("virtfs is not supported by this qemu 
> > build");
> > +error_report("virtfs support disabled");
> >  exit(1);
> >  }
> >  opts = qemu_opts_parse_noisily(olist, optarg, true);
> > @@ -3593,7 +3592,7 @@ int main(int argc, char **argv, char **envp)
> >  display_type = DT_SDL;
> >  break;
> >  #else
> > -error_report("SDL support is disabled");
> > +error_report("SDL support disabled");
> >  exit(1);
> >  #endif
> >  case QEMU_OPTION_pidfile:
> > @@ -3705,7 +3704,7 @@ int main(int argc, char **argv, char **envp)
> >  exit(1);
> >  }
> >  #else
> > -error_report("VNC support is disabled");
> > +error_report("VNC support disabled");
> >  exit(1);
> >  #endif
> >  break;
> > @@ -3899,7 +3898,7 @@ int main(int argc, char 

Re: [Qemu-devel] [PATCH v2 09/16] vl.c: Reword -no-kvm-pit-reinjection deprecation warning

2015-10-30 Thread Andrew Jones
On Thu, Oct 29, 2015 at 06:10:17PM +0100, Markus Armbruster wrote:
> Eduardo Habkost  writes:
> 
> > Signed-off-by: Eduardo Habkost 
> > ---
> >  vl.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/vl.c b/vl.c
> > index af8d09c..67147e0 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -3667,8 +3667,8 @@ int main(int argc, char **argv, char **envp)
> >  { /* end of list */ }
> >  };
> >  
> > -error_report("warning: option deprecated, use "
> > - "lost_tick_policy property of kvm-pit 
> > instead");
> > +error_report("warning: -no-kvm-pit-reinjection deprecated 
> > by "
> > + "-global kvm-pit.lost_tick_policy=discard");
> >  qdev_prop_register_global_list(kvm_pit_lost_tick_policy);
> >  break;
> >  }
> 
> Repeating the option in the error message rarely improves the message,
> because error_report() already shows it.  It's actually misleading when
> the option comes from a configuration file read with -readconfig.
> 
> Fortunately, -readconfig doesn't support the option, so that's not an
> issue here.
> 
> For the command line, the message changes from
> 
> qemu-system-x86_64: -no-kvm-pit-reinjection: warning: option deprecated, 
> use lost_tick_policy property of kvm-pit instead
> 
> to
> 
> qemu-system-x86_64: -no-kvm-pit-reinjection: warning: 
> -no-kvm-pit-reinjection deprecated by -global kvm-pit.lost_tick_policy=discard
> 
> Perhaps:
> 
> qemu-system-x86_64: -no-kvm-pit-reinjection: warning: deprecated, 
> replaced by -global kvm-pit.lost_tick_policy=discard
> 
> Similar argument for PATCH 08.
> 

Ah, I hadn't looked closely enough at error_report to know that the option
would be output already with my "ignoring" phrase suggestion. I definitely
agree that repeating it should be avoided.

"warning: deprecated, ignoring"

Thanks,
drew



Re: [Qemu-devel] [PULL 00/05] seccomp branch queue

2015-10-30 Thread Andrew Jones
On Fri, Oct 30, 2015 at 04:30:05PM +, Peter Maydell wrote:
> On 30 October 2015 at 13:44, Eduardo Otubo
>  wrote:
> > The following changes since commit c49d3411faae8ffaab8f7e5db47405a008411c10:
> >
> >   Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2015-10-12' 
> > into staging (2015-10-13 10:42:06 +0100)
> >
> > are available in the git repository at:
> >
> >   git://github.com/otubo/qemu.git tags/pull-seccomp-20151030
> >
> > for you to fetch changes up to b1e1f0bbe7268d0bb8f63da220b41803b2e54081:
> >
> >   seccomp: loosen library version dependency (2015-10-30 14:33:00 +0100)
> >
> > 
> > seccomp branch queue
> >
> > 
> 
> Hi. I'm afraid this fails to build on x86 Linux:
> 
> /home/petmay01/linaro/qemu-for-merges/qemu-seccomp.c:241:8: error:
> ‘__NR_cacheflush’ undeclared here (not in a function)
>  { SCMP_SYS(cacheflush), 240 },

Ugh...  Of course we can't have a common seccomp syscall table and different
atleast-versions for different architectures... So x86 only requires 2.1.0,
but libseccomp didn't know about cacheflush until 2.2.1, and we require 2.2.3
for other reasons. Ideally, we'd compose the seccomp syscall table by starting
with a common base (something 2.1.0 knows about) and then add an architecture
specific table, and maybe even a machine-type specific table. That's pretty
easy to do, we just need a list of tables, but as it's a framework change and
will touch many places, then I don't expect it to be a super quick job... I
think in the interim the fix for patch 1/5 of this series is something
like shown below. If there are no objections, then I'll send it.

drew

diff --git a/qemu-seccomp.c b/qemu-seccomp.c
index 80d034a8d5190..e76097e958779 100644
--- a/qemu-seccomp.c
+++ b/qemu-seccomp.c
@@ -16,6 +16,10 @@
 #include 
 #include "sysemu/seccomp.h"
 
+#if SCMP_VER_MAJOR >= 2 && SCMP_VER_MINOR >= 2 && SCMP_VER_MICRO >= 3
+#define HAVE_CACHEFLUSH
+#endif
+
 struct QemuSeccompSyscall {
 int32_t num;
 uint8_t priority;
@@ -238,7 +242,10 @@ static const struct QemuSeccompSyscall
seccomp_whitelist[] = {
 { SCMP_SYS(inotify_init1), 240 },
 { SCMP_SYS(inotify_add_watch), 240 },
-{ SCMP_SYS(mbind), 240 }
+{ SCMP_SYS(mbind), 240 },
+#ifdef HAVE_CACHEFLUSH
+{ SCMP_SYS(cacheflush), 240 },
+#endif
 };
 
 int seccomp_start(void)



Re: [Qemu-devel] [kvm-unit-tests PATCHv5 3/3] arm: pmu: Add CPI checking

2015-11-02 Thread Andrew Jones
On Fri, Oct 30, 2015 at 03:32:43PM -0400, Christopher Covington wrote:
> Hi Drew,
> 
> On 10/30/2015 09:00 AM, Andrew Jones wrote:
> > On Wed, Oct 28, 2015 at 03:12:55PM -0400, Christopher Covington wrote:
> >> Calculate the numbers of cycles per instruction (CPI) implied by ARM
> >> PMU cycle counter values. The code includes a strict checking facility
> >> intended for the -icount option in TCG mode but it is not yet enabled
> >> in the configuration file. Enabling it must wait on infrastructure
> >> improvements which allow for different tests to be run on TCG versus
> >> KVM.
> >>
> >> Signed-off-by: Christopher Covington 
> >> ---
> >>  arm/pmu.c | 103 
> >> +-
> >>  1 file changed, 102 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arm/pmu.c b/arm/pmu.c
> >> index 4334de4..76a 100644
> >> --- a/arm/pmu.c
> >> +++ b/arm/pmu.c
> >> @@ -43,6 +43,23 @@ static inline unsigned long get_pmccntr(void)
> >>asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
> >>return cycles;
> >>  }
> >> +
> >> +/*
> >> + * Extra instructions inserted by the compiler would be difficult to 
> >> compensate
> >> + * for, so hand assemble everything between, and including, the PMCR 
> >> accesses
> >> + * to start and stop counting.
> >> + */
> >> +static inline void loop(int i, uint32_t pmcr)
> >> +{
> >> +  asm volatile(
> >> +  "   mcr p15, 0, %[pmcr], c9, c12, 0\n"
> >> +  "1: subs%[i], %[i], #1\n"
> >> +  "   bgt 1b\n"
> >> +  "   mcr p15, 0, %[z], c9, c12, 0\n"
> >> +  : [i] "+r" (i)
> >> +  : [pmcr] "r" (pmcr), [z] "r" (0)
> >> +  : "cc");
> >> +}
> >>  #elif defined(__aarch64__)
> >>  static inline uint32_t get_pmcr(void)
> >>  {
> >> @@ -64,6 +81,23 @@ static inline unsigned long get_pmccntr(void)
> >>asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
> >>return cycles;
> >>  }
> >> +
> >> +/*
> >> + * Extra instructions inserted by the compiler would be difficult to 
> >> compensate
> >> + * for, so hand assemble everything between, and including, the PMCR 
> >> accesses
> >> + * to start and stop counting.
> >> + */
> >> +static inline void loop(int i, uint32_t pmcr)
> >> +{
> >> +  asm volatile(
> >> +  "   msr pmcr_el0, %[pmcr]\n"
> >> +  "1: subs%[i], %[i], #1\n"
> >> +  "   b.gt1b\n"
> >> +  "   msr pmcr_el0, xzr\n"
> >> +  : [i] "+r" (i)
> >> +  : [pmcr] "r" (pmcr)
> >> +  : "cc");
> >> +}
> >>  #endif
> >>  
> >>  struct pmu_data {
> >> @@ -131,12 +165,79 @@ static bool check_cycles_increase(void)
> >>return true;
> >>  }
> >>  
> >> -int main(void)
> >> +/*
> >> + * Execute a known number of guest instructions. Only odd instruction 
> >> counts
> >> + * greater than or equal to 3 are supported by the in-line assembly code. 
> >> The
> >> + * control register (PMCR_EL0) is initialized with the provided value 
> >> (allowing
> >> + * for example for the cycle counter or event counters to be reset). At 
> >> the end
> >> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> >> + * counting, allowing the cycle counter or event counters to be read at 
> >> the
> >> + * leisure of the calling code.
> >> + */
> >> +static void measure_instrs(int num, uint32_t pmcr)
> >> +{
> >> +  int i = (num - 1) / 2;
> >> +
> >> +  assert(num >= 3 && ((num - 1) % 2 == 0));
> >> +  loop(i, pmcr);
> >> +}
> >> +
> >> +/*
> >> + * Measure cycle counts for various known instruction counts. Ensure that 
> >> the
> >> + * cycle counter progresses (similar to check_cycles_increase() but with 
> >> more
> >> + * instructions and using reset and stop controls). If supplied a 
> >> positive,
> >> + * nonzero CPI parameter, also strictly check that every measurement 
> >> matches
> >> + * it. Stri

[Qemu-devel] [PATCH v2] seccomp: add cacheflush to whitelist

2015-11-02 Thread Andrew Jones
cacheflush is an arm-specific syscall that qemu built for arm
uses. Add it to the whitelist, but only if we're linking with
a recent enough libseccomp.

Signed-off-by: Andrew Jones 
---
v2: only add cacheflush if libseccomp supports it

 qemu-seccomp.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/qemu-seccomp.c b/qemu-seccomp.c
index 80d034a8d5190..e76097e958779 100644
--- a/qemu-seccomp.c
+++ b/qemu-seccomp.c
@@ -16,6 +16,10 @@
 #include 
 #include "sysemu/seccomp.h"
 
+#if SCMP_VER_MAJOR >= 2 && SCMP_VER_MINOR >= 2 && SCMP_VER_MICRO >= 3
+#define HAVE_CACHEFLUSH
+#endif
+
 struct QemuSeccompSyscall {
 int32_t num;
 uint8_t priority;
@@ -238,7 +242,10 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] 
= {
 { SCMP_SYS(inotify_init1), 240 },
 { SCMP_SYS(inotify_add_watch), 240 },
 { SCMP_SYS(mbind), 240 },
-{ SCMP_SYS(memfd_create), 240 }
+{ SCMP_SYS(memfd_create), 240 },
+#ifdef HAVE_CACHEFLUSH
+{ SCMP_SYS(cacheflush), 240 },
+#endif
 };
 
 int seccomp_start(void)
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v2] seccomp: add cacheflush to whitelist

2015-11-02 Thread Andrew Jones
On Mon, Nov 02, 2015 at 06:09:41PM +, Peter Maydell wrote:
> On 2 November 2015 at 17:56, Andrew Jones  wrote:
> > cacheflush is an arm-specific syscall that qemu built for arm
> > uses. Add it to the whitelist, but only if we're linking with
> > a recent enough libseccomp.
> >
> > Signed-off-by: Andrew Jones 
> > ---
> > v2: only add cacheflush if libseccomp supports it
> >
> >  qemu-seccomp.c | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> > index 80d034a8d5190..e76097e958779 100644
> > --- a/qemu-seccomp.c
> > +++ b/qemu-seccomp.c
> > @@ -16,6 +16,10 @@
> >  #include 
> >  #include "sysemu/seccomp.h"
> >
> > +#if SCMP_VER_MAJOR >= 2 && SCMP_VER_MINOR >= 2 && SCMP_VER_MICRO >= 3
> > +#define HAVE_CACHEFLUSH
> > +#endif
> 
> This will claim that a hypothetical future version 3.0.0 does not
> have cacheflush...

Indeed. Sigh... In that case, how about just

#if defined(TARGET_ARM) || defined(TARGET_AARCH64)
{ SCMP_SYS(cacheflush), 240 },
#endif

Thanks,
drew

> 
> thanks
> -- PMM
> 



Re: [Qemu-devel] [PATCH v2] seccomp: add cacheflush to whitelist

2015-11-02 Thread Andrew Jones
On Mon, Nov 02, 2015 at 08:37:15PM +, Peter Maydell wrote:
> On 2 November 2015 at 19:04, Andrew Jones  wrote:
> > On Mon, Nov 02, 2015 at 06:09:41PM +, Peter Maydell wrote:
> >> On 2 November 2015 at 17:56, Andrew Jones  wrote:
> >> > cacheflush is an arm-specific syscall that qemu built for arm
> >> > uses. Add it to the whitelist, but only if we're linking with
> >> > a recent enough libseccomp.
> >> >
> >> > Signed-off-by: Andrew Jones 
> >> > ---
> >> > v2: only add cacheflush if libseccomp supports it
> >> >
> >> >  qemu-seccomp.c | 9 -
> >> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> >> > index 80d034a8d5190..e76097e958779 100644
> >> > --- a/qemu-seccomp.c
> >> > +++ b/qemu-seccomp.c
> >> > @@ -16,6 +16,10 @@
> >> >  #include 
> >> >  #include "sysemu/seccomp.h"
> >> >
> >> > +#if SCMP_VER_MAJOR >= 2 && SCMP_VER_MINOR >= 2 && SCMP_VER_MICRO >= 3
> >> > +#define HAVE_CACHEFLUSH
> >> > +#endif
> >>
> >> This will claim that a hypothetical future version 3.0.0 does not
> >> have cacheflush...
> >
> > Indeed. Sigh... In that case, how about just
> >
> > #if defined(TARGET_ARM) || defined(TARGET_AARCH64)
> > { SCMP_SYS(cacheflush), 240 },
> > #endif
> 
> You want to be checking based on the host architecture,
> not the target architecture. Also, not doing the check based
> on seccomp version means there's no hint in the code that the
> ifdefs become obsolete if we raise our cross-architecture
> minimum seccomp version requirement in the future, so really
> a version check is better I think.

Right. OK, I'll stop flinging junk and pull a better version
check together.

Thanks,
drew

> 
> thanks
> -- PMM
> 



[Qemu-devel] [PATCH v3] seccomp: add cacheflush to whitelist

2015-11-02 Thread Andrew Jones
cacheflush is an arm-specific syscall that qemu built for arm
uses. Add it to the whitelist, but only if we're linking with
a recent enough libseccomp.

Signed-off-by: Andrew Jones 
---
v3: deal with major and minor version number bumps
v2: only add cacheflush if libseccomp supports it

 qemu-seccomp.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/qemu-seccomp.c b/qemu-seccomp.c
index 80d034a8d5190..c831fe83ad500 100644
--- a/qemu-seccomp.c
+++ b/qemu-seccomp.c
@@ -16,6 +16,14 @@
 #include 
 #include "sysemu/seccomp.h"
 
+#if SCMP_VER_MAJOR >= 3
+  #define HAVE_CACHEFLUSH
+#elif SCMP_VER_MAJOR == 2 && SCMP_VER_MINOR >= 3
+  #define HAVE_CACHEFLUSH
+#elif SCMP_VER_MAJOR == 2 && SCMP_VER_MINOR == 2 && SCMP_VER_MICRO >= 3
+  #define HAVE_CACHEFLUSH
+#endif
+
 struct QemuSeccompSyscall {
 int32_t num;
 uint8_t priority;
@@ -238,7 +246,10 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] 
= {
 { SCMP_SYS(inotify_init1), 240 },
 { SCMP_SYS(inotify_add_watch), 240 },
 { SCMP_SYS(mbind), 240 },
-{ SCMP_SYS(memfd_create), 240 }
+{ SCMP_SYS(memfd_create), 240 },
+#ifdef HAVE_CACHEFLUSH
+{ SCMP_SYS(cacheflush), 240 },
+#endif
 };
 
 int seccomp_start(void)
-- 
1.8.3.1




[Qemu-devel] [PATCH] hw/arm/virt: error_report cleanups

2015-11-07 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 hw/arm/virt.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 77d9267599b7e..9c6792cea16f6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -941,8 +941,8 @@ static void machvirt_init(MachineState *machine)
 if (!gic_version) {
 gic_version = kvm_arm_vgic_probe();
 if (!gic_version) {
-error_report("Unable to determine GIC version supported by host\n"
- "Probably KVM acceleration is not supported\n");
+error_report("Unable to determine GIC version supported by host");
+error_printf("KVM acceleration is probably not supported\n");
 exit(1);
 }
 }
@@ -990,7 +990,7 @@ static void machvirt_init(MachineState *machine)
 char *cpuopts = g_strdup(cpustr[1]);
 
 if (!oc) {
-fprintf(stderr, "Unable to find CPU definition\n");
+error_report("Unable to find CPU definition");
 exit(1);
 }
 cpuobj = object_new(object_class_get_name(oc));
@@ -1126,8 +1126,8 @@ static void virt_set_gic_version(Object *obj, const char 
*value, Error **errp)
 } else if (!strcmp(value, "host")) {
 vms->gic_version = 0; /* Will probe later */
 } else {
-error_report("Invalid gic-version option value\n"
- "Allowed values are: 3, 2, host\n");
+error_report("Invalid gic-version option value");
+error_printf("Allowed gic-version values are: 3, 2, host\n");
 exit(1);
 }
 }
-- 
2.4.3




Re: [Qemu-devel] [PATCH v3] seccomp: add cacheflush to whitelist

2015-11-09 Thread Andrew Jones
On Mon, Nov 02, 2015 at 11:53:26PM +0100, Andrew Jones wrote:
> cacheflush is an arm-specific syscall that qemu built for arm
> uses. Add it to the whitelist, but only if we're linking with
> a recent enough libseccomp.
> 
> Signed-off-by: Andrew Jones 
> ---
> v3: deal with major and minor version number bumps
> v2: only add cacheflush if libseccomp supports it
> 
>  qemu-seccomp.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> index 80d034a8d5190..c831fe83ad500 100644
> --- a/qemu-seccomp.c
> +++ b/qemu-seccomp.c
> @@ -16,6 +16,14 @@
>  #include 
>  #include "sysemu/seccomp.h"
>  
> +#if SCMP_VER_MAJOR >= 3
> +  #define HAVE_CACHEFLUSH
> +#elif SCMP_VER_MAJOR == 2 && SCMP_VER_MINOR >= 3
> +  #define HAVE_CACHEFLUSH
> +#elif SCMP_VER_MAJOR == 2 && SCMP_VER_MINOR == 2 && SCMP_VER_MICRO >= 3
> +  #define HAVE_CACHEFLUSH
> +#endif
> +
>  struct QemuSeccompSyscall {
>  int32_t num;
>  uint8_t priority;
> @@ -238,7 +246,10 @@ static const struct QemuSeccompSyscall 
> seccomp_whitelist[] = {
>  { SCMP_SYS(inotify_init1), 240 },
>  { SCMP_SYS(inotify_add_watch), 240 },
>  { SCMP_SYS(mbind), 240 },
> -{ SCMP_SYS(memfd_create), 240 }
> +{ SCMP_SYS(memfd_create), 240 },
> +#ifdef HAVE_CACHEFLUSH
> +{ SCMP_SYS(cacheflush), 240 },
> +#endif
>  };
>  
>  int seccomp_start(void)
> -- 
> 1.8.3.1
>

Eduardo, ping?

drew 



[Qemu-devel] [PATCH] kvm-all: PAGE_SIZE should be real host page size

2015-11-09 Thread Andrew Jones
Just noticed this while grepping TARGET_PAGE_SIZE for an unrelated
reason. I didn't use qemu_real_host_page_size as kvm_set_phys_mem()
does, because we'd need to make sure page_size_init() has run first.

Signed-off-by: Andrew Jones 
---
 kvm-all.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 1bc12737723c3..de9ff5971fb3b 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -45,8 +45,10 @@
 #include 
 #endif
 
-/* KVM uses PAGE_SIZE in its definition of COALESCED_MMIO_MAX */
-#define PAGE_SIZE TARGET_PAGE_SIZE
+/* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We
+ * need to use the real host PAGE_SIZE, as that's what KVM will use.
+ */
+#define PAGE_SIZE getpagesize()
 
 //#define DEBUG_KVM
 
-- 
2.4.3




Re: [Qemu-devel] [PATCH] hw/arm/virt: error_report cleanups

2015-11-10 Thread Andrew Jones
On Tue, Nov 10, 2015 at 11:55:55AM +, Peter Maydell wrote:
> On 10 November 2015 at 09:39, Markus Armbruster  wrote:
> > Peter Maydell  writes:
> >> ...so in conclusion Andrew's patch is correct as it stands
> >> and I should just apply it? :-)
> >
> > Yes.  It got my R-by :)
> 
> OK, applied to target-arm.next. Thanks for walking me through this.
>

Thanks guys! And I'm glad you saw this patch Markus! I'll definitely
remember to CC you on all error_* patches in the future :-)

drew 



Re: [Qemu-devel] [PATCH] kvm-all: PAGE_SIZE should be real host page size

2015-11-10 Thread Andrew Jones
On Tue, Nov 10, 2015 at 04:41:16PM +0100, Paolo Bonzini wrote:
> 
> 
> On 10/11/2015 01:23, Andrew Jones wrote:
> > Just noticed this while grepping TARGET_PAGE_SIZE for an unrelated
> > reason. I didn't use qemu_real_host_page_size as kvm_set_phys_mem()
> > does, because we'd need to make sure page_size_init() has run first.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  kvm-all.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kvm-all.c b/kvm-all.c
> > index 1bc12737723c3..de9ff5971fb3b 100644
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -45,8 +45,10 @@
> >  #include 
> >  #endif
> >  
> > -/* KVM uses PAGE_SIZE in its definition of COALESCED_MMIO_MAX */
> > -#define PAGE_SIZE TARGET_PAGE_SIZE
> > +/* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We
> > + * need to use the real host PAGE_SIZE, as that's what KVM will use.
> > + */
> > +#define PAGE_SIZE getpagesize()
> >  
> >  //#define DEBUG_KVM
> >  
> > 
> 
> Is this a bugfix or just a cleanup?  If the former, on which targets?

It's a bugfix for any targets that have a TARGET_PAGE_SIZE !=
real-host-page-size. For example ARM has TARGET_PAGE_SIZE set to 1024,
even when the host is using 4k or 64k pages. However, I didn't find this
due to a bug, because on ARM I'm not using emulated devices that make
use of the coalesced-mmio feature at this time.

Thanks,
drew

> 
> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [Qemu-devel] [PATCH] kvm-all: PAGE_SIZE should be real host page size

2015-11-10 Thread Andrew Jones
On Tue, Nov 10, 2015 at 04:29:31PM +, Peter Maydell wrote:
> On 10 November 2015 at 00:23, Andrew Jones  wrote:
> > Just noticed this while grepping TARGET_PAGE_SIZE for an unrelated
> > reason. I didn't use qemu_real_host_page_size as kvm_set_phys_mem()
> > does, because we'd need to make sure page_size_init() has run first.
> >
> > Signed-off-by: Andrew Jones 
> > ---
> >  kvm-all.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/kvm-all.c b/kvm-all.c
> > index 1bc12737723c3..de9ff5971fb3b 100644
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -45,8 +45,10 @@
> >  #include 
> >  #endif
> >
> > -/* KVM uses PAGE_SIZE in its definition of COALESCED_MMIO_MAX */
> > -#define PAGE_SIZE TARGET_PAGE_SIZE
> > +/* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We
> > + * need to use the real host PAGE_SIZE, as that's what KVM will use.
> > + */
> > +#define PAGE_SIZE getpagesize()
> 
> Rather than defining PAGE_SIZE here (a confusing macro given
> we have several page sizes to deal with), why not just use
> getpagesize() in the one and only location where we currently
> use this macro?

The macro is used by kernel headers that we import and include in
kvm-all.c. It's ugly, I agree, but that's how the this cookie crumbled.

> 
> Also, you're guaranteed that page_size_init() has been run, because
> we call that from kvm_init(), and you can't call kvm_vcpu_init()
> before kvm_init().

True, but having that dependency seemed error prone to me. If we
we some day changed when/if page_size_init is called then there
could be an issue, or if somebody did something like

kvm_init()
{
  my_page_size = PAGE_SIZE;
  ...
  page_size_init();
  ...
  use(my_page_size)
}

things would break.

Thanks,
drew



Re: [Qemu-devel] [PATCH v4 0/5] ARM: Add NUMA support for machine virt

2016-01-29 Thread Andrew Jones
On Fri, Jan 29, 2016 at 02:52:35PM +0800, Shannon Zhao wrote:
> 
> 
> On 2016/1/29 14:32, Ashok Kumar wrote:
> > Hi, 
> > 
> > On Sat, Jan 23, 2016 at 07:36:41PM +0800, Shannon Zhao wrote:
> >> > From: Shannon Zhao 
> >> > 
> >> > Add NUMA support for machine virt. Tested successfully running a guest
> >> > Linux kernel with the following patch applied:
> >> > 
> >> > - [PATCH v9 0/6] arm64, numa: Add numa support for arm64 platforms
> >> > https://lwn.net/Articles/672329/
> >> > - [PATCH v2 0/4] ACPI based NUMA support for ARM64
> >> > http://www.spinics.net/lists/linux-acpi/msg61795.html
> >> > 
> >> > Changes since v3:
> >> > * based on new kernel driver and device bindings
> >> > * add ACPI part
> >> > 
> >> > Changes since v2:
> >> > * update to use NUMA node property arm,associativity.
> >> > 
> >> > Changes since v1:
> >> > Take into account Peter's comments:
> >> > * rename virt_memory_init to arm_generate_memory_dtb
> >> > * move arm_generate_memory_dtb to boot.c and make it a common func
> >> > * use a struct numa_map to generate numa dtb
> >> > 
> >> > Example qemu command line:
> >> > qemu-system-aarch64 \
> >> > -enable-kvm -smp 4\
> >> > -kernel Image \
> >> > -m 512 -machine virt,kernel_irqchip=on \
> >> > -initrd guestfs.cpio.gz \
> >> > -cpu host -nographic \
> >> > -numa node,mem=256M,cpus=0-1,nodeid=0 \
> >> > -numa node,mem=256M,cpus=2-3,nodeid=1 \
> >> > -append "console=ttyAMA0 root=/dev/ram"
> >> > 
> >> > Shannon Zhao (5):
> >> >   ARM: Virt: Add /distance-map node for NUMA
> >> >   ARM: Virt: Set numa-node-id for CPUs
> >> >   ARM: Add numa-node-id for /memory node
> >> >   include/hw/acpi/acpi-defs: Add GICC Affinity Structure
> >> >   hw/arm/virt-acpi-build: Generate SRAT table
> >> > 
> >> >  hw/arm/boot.c   | 29 ++-
> >> >  hw/arm/virt-acpi-build.c| 58 
> >> > +
> >> >  hw/arm/virt.c   | 37 +
> >> >  hw/i386/acpi-build.c|  2 +-
> >> >  include/hw/acpi/acpi-defs.h | 15 +++-
> >> >  5 files changed, 138 insertions(+), 3 deletions(-)
> >> > 
> >> > -- 
> >> > 2.0.4
> >> > 
> > Don't we need to populate the NUMA node in the Affinity byte of MPIDR?
> > Linux uses the Affinity information in MPIDR to build topology which
> > might go wrong for the guest in this case. 
> > Maybe a non Linux OS might be impacted more?
> > 
> Ah, yes. It needs to update the MPIDR. But currently QEMU uses the value
> from KVM when using KVM. It needs to call kvm_set_one_reg to set the
> MPIDR and I'm not sure if this will affect KVM by looking at following
> comments:
> /*
>  * When KVM is in use, PSCI is emulated in-kernel and not by qemu.
>  * Currently KVM has its own idea about MPIDR assignment, so we
>  * override our defaults with what we get from KVM.
>  */
> 
> Peter, do you have any suggestion?
> 
> > distance-map compatible string has been changed from
> > "numa,distance-map-v1" to "numa-distance-map-v1"
> Will update this.

Sigh... I've had the MPIDR rework (to let QEMU dictate it0 on my TODO list
for a loong time. I'll pick it up today and hopefully have something to
discuss next week. I'll keep this series in mind too.

Thanks,
drew



Re: [Qemu-devel] [PATCH] arm: virt-acpi: each MADT.GICC entry as enabled unconditionally

2016-01-29 Thread Andrew Jones
On Fri, Jan 29, 2016 at 10:59:32PM +0800, Shannon Zhao wrote:
> 
> 
> On 2016/1/29 22:24, Igor Mammedov wrote:
> >in current impl. condition
> >
> >build_madt() {
> >   ...
> >   if (test_bit(i, cpuinfo->found_cpus))
> >
> >is always true since loop handles only present CPUs
> >in range [0..smp_cpus).
> >But to fill usless cpuinfo->found_cpus we do unnecessary
> >scan over QOM tree to find the same CPUs.
> >So mark GICC as present always and drop not needed
> >code that fills cpuinfo->found_cpus.
> >
> >Signed-off-by: Igor Mammedov
> >---
> >It's just simple cleanup but I'm trying to generalize
> >a bit CPU related ACPI tables and as part of it get rid
> >of found_cpus bitmap and if possible cpu_index usage
> >in ACPI parts of code.
> >---
> >  hw/arm/virt-acpi-build.c | 26 +++---
> >  1 file changed, 3 insertions(+), 23 deletions(-)
> >
> >diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> >index 87fbe7c..3ed39fc 100644
> >--- a/hw/arm/virt-acpi-build.c
> >+++ b/hw/arm/virt-acpi-build.c
> >@@ -46,20 +46,6 @@
> >  #define ARM_SPI_BASE 32
> >  #define ACPI_POWER_BUTTON_DEVICE "PWRB"
> >
> >-typedef struct VirtAcpiCpuInfo {
> >-DECLARE_BITMAP(found_cpus, VIRT_ACPI_CPU_ID_LIMIT);
> >-} VirtAcpiCpuInfo;
> >-
> >-static void virt_acpi_get_cpu_info(VirtAcpiCpuInfo *cpuinfo)
> >-{
> >-CPUState *cpu;
> >-
> >-memset(cpuinfo->found_cpus, 0, sizeof cpuinfo->found_cpus);
> >-CPU_FOREACH(cpu) {
> >-set_bit(cpu->cpu_index, cpuinfo->found_cpus);
> >-}
> >-}
> >-
> >  static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus)
> >  {
> >  uint16_t i;
> >@@ -458,8 +444,7 @@ build_gtdt(GArray *table_data, GArray *linker)
> >
> >  /* MADT */
> >  static void
> >-build_madt(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info,
> >-   VirtAcpiCpuInfo *cpuinfo)
> >+build_madt(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info)
> >  {
> >  int madt_start = table_data->len;
> >  const MemMapEntry *memmap = guest_info->memmap;
> >@@ -489,9 +474,7 @@ build_madt(GArray *table_data, GArray *linker, 
> >VirtGuestInfo *guest_info,
> >  gicc->cpu_interface_number = i;
> >  gicc->arm_mpidr = armcpu->mp_affinity;
> >  gicc->uid = i;
> >-if (test_bit(i, cpuinfo->found_cpus)) {
> >-gicc->flags = cpu_to_le32(ACPI_GICC_ENABLED);
> >-}
> >+gicc->flags = cpu_to_le32(ACPI_GICC_ENABLED);
> >  }
> Ah, yes, it uses smp_cpus not max_cpus. But we still needs to support
> max_cpus usage even though it doesn't support vcpu hotplug currently. So we
> may need to introduce guest_info->max_cpus and use it here.

We should leave that for when the hotplug patches come, and we should
probably leave the hotplug patches until we see what Igor plans for
sharing more ACPI code between x86 and ARM.

> And below check in virt.c is not right while it should compare the global
> max_cpus with the max_cpus GIC supports.
> 
> if (smp_cpus > max_cpus) {
> error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
>  "supported by machine 'mach-virt' (%d)",
>  smp_cpus, max_cpus);
> exit(1);
> }

max_cpus is getting set to the number the gic supports just above this
check. So max_cpus == gic_supported_cpus already, and this check is just
confirming the number of cpus the user has selected is OK.

drew



Re: [Qemu-devel] [PATCH] arm: virt-acpi: each MADT.GICC entry as enabled unconditionally

2016-01-29 Thread Andrew Jones
On Fri, Jan 29, 2016 at 11:44:24PM +0800, Shannon Zhao wrote:
> 
> 
> On 2016/1/29 23:26, Andrew Jones wrote:
> >On Fri, Jan 29, 2016 at 10:59:32PM +0800, Shannon Zhao wrote:
> >>>
> >>>
> >>>On 2016/1/29 22:24, Igor Mammedov wrote:
> >>>> >in current impl. condition
> >>>> >
> >>>> >build_madt() {
> >>>> >   ...
> >>>> >   if (test_bit(i, cpuinfo->found_cpus))
> >>>> >
> >>>> >is always true since loop handles only present CPUs
> >>>> >in range [0..smp_cpus).
> >>>> >But to fill usless cpuinfo->found_cpus we do unnecessary
> >>>> >scan over QOM tree to find the same CPUs.
> >>>> >So mark GICC as present always and drop not needed
> >>>> >code that fills cpuinfo->found_cpus.
> >>>> >
> >>>> >Signed-off-by: Igor Mammedov
> >>>> >---
> >>>> >It's just simple cleanup but I'm trying to generalize
> >>>> >a bit CPU related ACPI tables and as part of it get rid
> >>>> >of found_cpus bitmap and if possible cpu_index usage
> >>>> >in ACPI parts of code.
> >>>> >---
> >>>> >  hw/arm/virt-acpi-build.c | 26 +++---
> >>>> >  1 file changed, 3 insertions(+), 23 deletions(-)
> >>>> >
> >>>> >diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> >>>> >index 87fbe7c..3ed39fc 100644
> >>>> >--- a/hw/arm/virt-acpi-build.c
> >>>> >+++ b/hw/arm/virt-acpi-build.c
> >>>> >@@ -46,20 +46,6 @@
> >>>> >  #define ARM_SPI_BASE 32
> >>>> >  #define ACPI_POWER_BUTTON_DEVICE "PWRB"
> >>>> >
> >>>> >-typedef struct VirtAcpiCpuInfo {
> >>>> >-DECLARE_BITMAP(found_cpus, VIRT_ACPI_CPU_ID_LIMIT);
> >>>> >-} VirtAcpiCpuInfo;
> >>>> >-
> >>>> >-static void virt_acpi_get_cpu_info(VirtAcpiCpuInfo *cpuinfo)
> >>>> >-{
> >>>> >-CPUState *cpu;
> >>>> >-
> >>>> >-memset(cpuinfo->found_cpus, 0, sizeof cpuinfo->found_cpus);
> >>>> >-CPU_FOREACH(cpu) {
> >>>> >-set_bit(cpu->cpu_index, cpuinfo->found_cpus);
> >>>> >-}
> >>>> >-}
> >>>> >-
> >>>> >  static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus)
> >>>> >  {
> >>>> >  uint16_t i;
> >>>> >@@ -458,8 +444,7 @@ build_gtdt(GArray *table_data, GArray *linker)
> >>>> >
> >>>> >  /* MADT */
> >>>> >  static void
> >>>> >-build_madt(GArray *table_data, GArray *linker, VirtGuestInfo 
> >>>> >*guest_info,
> >>>> >-   VirtAcpiCpuInfo *cpuinfo)
> >>>> >+build_madt(GArray *table_data, GArray *linker, VirtGuestInfo 
> >>>> >*guest_info)
> >>>> >  {
> >>>> >  int madt_start = table_data->len;
> >>>> >  const MemMapEntry *memmap = guest_info->memmap;
> >>>> >@@ -489,9 +474,7 @@ build_madt(GArray *table_data, GArray *linker, 
> >>>> >VirtGuestInfo *guest_info,
> >>>> >  gicc->cpu_interface_number = i;
> >>>> >  gicc->arm_mpidr = armcpu->mp_affinity;
> >>>> >  gicc->uid = i;
> >>>> >-if (test_bit(i, cpuinfo->found_cpus)) {
> >>>> >-gicc->flags = cpu_to_le32(ACPI_GICC_ENABLED);
> >>>> >-}
> >>>> >+gicc->flags = cpu_to_le32(ACPI_GICC_ENABLED);
> >>>> >  }
> >>>Ah, yes, it uses smp_cpus not max_cpus. But we still needs to support
> >>>max_cpus usage even though it doesn't support vcpu hotplug currently. So we
> >>>may need to introduce guest_info->max_cpus and use it here.
> >We should leave that for when the hotplug patches come, and we should
> >probably leave the hotplug patches until we see what Igor plans for
> >sharing more ACPI code between x86 and ARM.
> >
> Even if ignoring the vcpu hotplug, we still need to support max_cpus and
> smp_cpus usage like -smp 1,maxcpus=4.

OK, without hotplug, max > smp doesn't gain anything, max < smp results
in an error, and therefore the only useful case is max == smp.

> 
> >>>And below check in virt.c is not right while it should compare the global
> >>>max_cpus with the max_cpus GIC supports.
> >>>
> >>> if (smp_cpus > max_cpus) {
> >>> error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
> >>>  "supported by machine 'mach-virt' (%d)",
> >>>  smp_cpus, max_cpus);
> >>> exit(1);
> >>> }
> >max_cpus is getting set to the number the gic supports just above this
> >check. So max_cpus == gic_supported_cpus already, and this check is just
> >confirming the number of cpus the user has selected is OK.
> No, the global max_cpus (which is defined in vl.c and exported in
> sysemu/sysemu.h) is not the local variable max_cpus.

I now see what you mean though. If we don't want something like
-smp 1,maxcpus=9 to erroneously succeed on a gicv2 machine, then we
should be checking the global max_cpus here. I agree it should be
fixed, because, even though it changes nothing atm, we don't want to
allow invalid command lines.

Will you send the patch?

Thanks,
drew



[Qemu-devel] [PATCH] kvm-all: trace: strerror fixup

2016-02-01 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 kvm-all.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 9148889921197..330f509a0da84 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2362,7 +2362,7 @@ int kvm_set_one_reg(CPUState *cs, uint64_t id, void 
*source)
 reg.addr = (uintptr_t) source;
 r = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
 if (r) {
-trace_kvm_failed_reg_set(id, strerror(r));
+trace_kvm_failed_reg_set(id, strerror(-r));
 }
 return r;
 }
@@ -2376,7 +2376,7 @@ int kvm_get_one_reg(CPUState *cs, uint64_t id, void 
*target)
 reg.addr = (uintptr_t) target;
 r = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
 if (r) {
-trace_kvm_failed_reg_get(id, strerror(r));
+trace_kvm_failed_reg_get(id, strerror(-r));
 }
 return r;
 }
-- 
2.4.3




Re: [Qemu-devel] [libvirt] ARM KVM GICv3 Support

2016-02-02 Thread Andrew Jones
On Tue, Feb 02, 2016 at 12:10:10PM +, Daniel P. Berrange wrote:
> On Tue, Feb 02, 2016 at 12:49:33PM +0100, Christoffer Dall wrote:
> > On Fri, Jan 22, 2016 at 02:44:32PM +, Daniel P. Berrange wrote:
> > > On Wed, Jan 06, 2016 at 01:30:16PM +, Peter Maydell wrote:
> > > > On 6 January 2016 at 12:49, Andrea Bolognani  
> > > > wrote:
> > > > > That's correct, having a QMP command that lists the values gic-version
> > > > > can have on the current host would be just great.
> > > > >
> > > > > If we had that, we could validate the GIC version chosen for a guest,
> > > > > and expose it in the capabilities XML so that higher-level tools can
> > > > > provide a list of choices to the user.
> > > > >
> > > > > Please note that this QMP command would have to work regardless of the
> > > > > machine type selected on QEMU's command line, because libvirt always
> > > > > runs a QEMU binary with '-M none' when probing its capabilities.
> > > > 
> > > > On the other hand, if you don't tell us the machine type you care
> > > > about then we can't tell you:
> > > >  (a) "this machine type doesn't support setting this property at all"
> > > >  (which applies to machines like vexpress-a15 which you can use with
> > > >  KVM on 32-bit hosts, and of course also to all the non-KVM models)
> > > 
> > > We have just recently merged support for registering properties against
> > > classes instead of object instances. There is also a proposed command
> > > to allow querying the list of properties against a class
> > > 
> > >   https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04348.html
> > > 
> > > So if we now update the machine types to register their properties against
> > > the class instead of object, then we can query what properties are present
> > > against each machine type, while still using '-M none'.
> > > 
> > > >  (b) "this machine type only supports GIC versions X and Y even if the
> > > >   host supports more" (this is currently only hypothetical, though,
> > > >   since we only have the property on 'virt'. it would only happen
> > > >   if in the future we needed something other than '2' or '3' or
> > > >   'host' I think.)
> > > 
> > > Our introspection support in QOM only allows us to say that a property
> > > is a particular type (int / enum / str / whatever). We don't have any
> > > way to expose info about what subset of possible values for a type are
> > > permitted. So I don't see any near term way to inform apps that the
> > > gic property accepts values x, y and but not z.

This actually doesn't matter for the v2 vs. v3 case. The gic-version property
doesn't exist at all for v2-only QEMU. Although maybe the gic-version property
should be reworked. Instead of gic-version=, we could create
one boolean property per version supported, i.e. gicv2, gicv3, gicv4...

> > > 
> > > IMHO, we shouldn't try to overthink this. Libvirt can query the host
> > > to find out what GIC versions are supported and default to using the
> > > most recent version, on the basis that people are likely to have a
> > > matching QEMU. We can just rely on QEMU to report error if we pass
> > > it a version it doesn't support and not try to pre-emptively check
> > > before launch. The key is just getting the default right IMHO.
> > > 
> > This sounds fine to me.
> > 
> > However, not being familiar with the internals of libvirt I really can't
> > just what the right implementation approach here is.
> > 
> > As I understand we need to either:
> > 
> >   1) Add a QMP command that lets you ask for -M virt, if GIC version X
> >  is supported
> > 
> >   2) Just implement something in libvirt that checks what the kernel
> >  supports directly via the well-defined KVM interface and chooses
> >  the highest supported version per default.
> > 
> > To me it sounds like we should just go ahead with (2) and document
> > somehwere that if you get an error, you need to either manually change
> > the gic version setting in your VM definition file or upgrade QEMU.
> > 
> > Can someone with libvirt authority please advice whether (1) or (2) is
> > what we need to do?
> 
> IMHO we should just go for 2.

I'm not familiar enough with libvirt, nor the use of QMP, to really argue
one way or another, but I find it a bit strange that we'd prefer libvirt
to query two entities over one. And, why should the libvirt installed on
a particular host prefer gicv3 as the default, just because KVM supports
it, even when QEMU does not? The default will fail every time until QEMU
is upgraded, which may not be necessary/desired. Finally, I thought we
were trying to get away from relying on QEMU's error messages to make any
sort of decisions.

I don't know what else libvirt queries directly from KVM, but IMO, it
should be a long-term goal to stop doing it, not add to more. Besides
libvirt then properly selecting defaults that both KVM and QEMU support,
it would allow /dev/kvm to have QEMU-only selinux labels applied.

Re: [Qemu-devel] [libvirt] ARM KVM GICv3 Support

2016-02-02 Thread Andrew Jones
On Tue, Feb 02, 2016 at 01:15:49PM +, Peter Maydell wrote:
> On 2 February 2016 at 12:59, Andrew Jones  wrote:
> > This actually doesn't matter for the v2 vs. v3 case. The gic-version 
> > property
> > doesn't exist at all for v2-only QEMU. Although maybe the gic-version 
> > property
> > should be reworked. Instead of gic-version=, we could 
> > create
> > one boolean property per version supported, i.e. gicv2, gicv3, gicv4...
> 
> QEMU 2.5 released with the gic-version property, so we can't just drop
> it now; that would break users' command lines. We'd need a strong
> argument for why it's worth adding new properties that duplicate
> the legacy syntax.

I agree we should have a good argument to justify messing with it, but
until we start versioning mach-virt, then I don't think anybody should
depend on mach-virt command lines working everywhere. Every time we add
a new property, and a user adds it to their command line, then they've
just broken that command line for older QEMU. Yes, I know the upgrade
path is a bit different, but still... Anyway, properties are changeable
with versioned machines.

drew



Re: [Qemu-devel] [libvirt] ARM KVM GICv3 Support

2016-02-02 Thread Andrew Jones
On Tue, Feb 02, 2016 at 03:05:28PM +0100, Christoffer Dall wrote:
> On Tue, Feb 02, 2016 at 01:59:26PM +0100, Andrew Jones wrote:
> > On Tue, Feb 02, 2016 at 12:10:10PM +, Daniel P. Berrange wrote:
> > > On Tue, Feb 02, 2016 at 12:49:33PM +0100, Christoffer Dall wrote:
> > > > On Fri, Jan 22, 2016 at 02:44:32PM +, Daniel P. Berrange wrote:
> > > > > On Wed, Jan 06, 2016 at 01:30:16PM +, Peter Maydell wrote:
> > > > > > On 6 January 2016 at 12:49, Andrea Bolognani  
> > > > > > wrote:
> > > > > > > That's correct, having a QMP command that lists the values 
> > > > > > > gic-version
> > > > > > > can have on the current host would be just great.
> > > > > > >
> > > > > > > If we had that, we could validate the GIC version chosen for a 
> > > > > > > guest,
> > > > > > > and expose it in the capabilities XML so that higher-level tools 
> > > > > > > can
> > > > > > > provide a list of choices to the user.
> > > > > > >
> > > > > > > Please note that this QMP command would have to work regardless 
> > > > > > > of the
> > > > > > > machine type selected on QEMU's command line, because libvirt 
> > > > > > > always
> > > > > > > runs a QEMU binary with '-M none' when probing its capabilities.
> > > > > > 
> > > > > > On the other hand, if you don't tell us the machine type you care
> > > > > > about then we can't tell you:
> > > > > >  (a) "this machine type doesn't support setting this property at 
> > > > > > all"
> > > > > >  (which applies to machines like vexpress-a15 which you can use 
> > > > > > with
> > > > > >  KVM on 32-bit hosts, and of course also to all the non-KVM 
> > > > > > models)
> > > > > 
> > > > > We have just recently merged support for registering properties 
> > > > > against
> > > > > classes instead of object instances. There is also a proposed command
> > > > > to allow querying the list of properties against a class
> > > > > 
> > > > >   https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04348.html
> > > > > 
> > > > > So if we now update the machine types to register their properties 
> > > > > against
> > > > > the class instead of object, then we can query what properties are 
> > > > > present
> > > > > against each machine type, while still using '-M none'.
> > > > > 
> > > > > >  (b) "this machine type only supports GIC versions X and Y even if 
> > > > > > the
> > > > > >   host supports more" (this is currently only hypothetical, 
> > > > > > though,
> > > > > >   since we only have the property on 'virt'. it would only 
> > > > > > happen
> > > > > >   if in the future we needed something other than '2' or '3' or
> > > > > >   'host' I think.)
> > > > > 
> > > > > Our introspection support in QOM only allows us to say that a property
> > > > > is a particular type (int / enum / str / whatever). We don't have any
> > > > > way to expose info about what subset of possible values for a type are
> > > > > permitted. So I don't see any near term way to inform apps that the
> > > > > gic property accepts values x, y and but not z.
> > 
> > This actually doesn't matter for the v2 vs. v3 case. The gic-version 
> > property
> > doesn't exist at all for v2-only QEMU. Although maybe the gic-version 
> > property
> > should be reworked. Instead of gic-version=, we could 
> > create
> > one boolean property per version supported, i.e. gicv2, gicv3, gicv4...
> > 
> > > > > 
> > > > > IMHO, we shouldn't try to overthink this. Libvirt can query the host
> > > > > to find out what GIC versions are supported and default to using the
> > > > > most recent version, on the basis that people are likely to have a
> > > > > matching QEMU. We can just rely on QEMU to report error if we pass
> > > > > i

Re: [Qemu-devel] [PATCH] arm: virt-acpi: each MADT.GICC entry as enabled unconditionally

2016-02-03 Thread Andrew Jones
On Fri, Jan 29, 2016 at 05:07:31PM +0100, Andrew Jones wrote:
> On Fri, Jan 29, 2016 at 11:44:24PM +0800, Shannon Zhao wrote:
> > >>>And below check in virt.c is not right while it should compare the global
> > >>>max_cpus with the max_cpus GIC supports.
> > >>>
> > >>> if (smp_cpus > max_cpus) {
> > >>> error_report("Number of SMP CPUs requested (%d) exceeds max 
> > >>> CPUs "
> > >>>  "supported by machine 'mach-virt' (%d)",
> > >>>  smp_cpus, max_cpus);
> > >>> exit(1);
> > >>> }
> > >max_cpus is getting set to the number the gic supports just above this
> > >check. So max_cpus == gic_supported_cpus already, and this check is just
> > >confirming the number of cpus the user has selected is OK.
> > No, the global max_cpus (which is defined in vl.c and exported in
> > sysemu/sysemu.h) is not the local variable max_cpus.
> 
> I now see what you mean though. If we don't want something like
> -smp 1,maxcpus=9 to erroneously succeed on a gicv2 machine, then we
> should be checking the global max_cpus here. I agree it should be
> fixed, because, even though it changes nothing atm, we don't want to
> allow invalid command lines.
> 
> Will you send the patch?

I'll send one in a second.

drew



[Qemu-devel] [PATCH] hw/arm/virt: fix max-cpus check

2016-02-03 Thread Andrew Jones
mach-virt doesn't yet support hotplug, but command lines specifying
-smp ,maxcpus= don't fail. Of course specifying
bigger-num as something bigger than the machine supports, e.g. > 8
on a gicv2 machine, should fail though. This fix also makes mach-
virt's max-cpus check truly consistent with the one in vl.c:main,
as the one there was already correctly checking max-cpus instead
of smp-cpus.

Reported-by: Shannon Zhao 
Signed-off-by: Andrew Jones 
---
 hw/arm/virt.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 15658f49c4e06..44bbbea92b1cf 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1013,7 +1013,7 @@ static void machvirt_init(MachineState *machine)
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *secure_sysmem = NULL;
 int gic_version = vms->gic_version;
-int n, max_cpus;
+int n, virt_max_cpus;
 MemoryRegion *ram = g_new(MemoryRegion, 1);
 const char *cpu_model = machine->cpu_model;
 VirtBoardInfo *vbi;
@@ -1051,15 +1051,15 @@ static void machvirt_init(MachineState *machine)
  * many redistributors we can fit into the memory map.
  */
 if (gic_version == 3) {
-max_cpus = vbi->memmap[VIRT_GIC_REDIST].size / 0x2;
+virt_max_cpus = vbi->memmap[VIRT_GIC_REDIST].size / 0x2;
 } else {
-max_cpus = GIC_NCPU;
+virt_max_cpus = GIC_NCPU;
 }
 
-if (smp_cpus > max_cpus) {
+if (max_cpus > virt_max_cpus) {
 error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
  "supported by machine 'mach-virt' (%d)",
- smp_cpus, max_cpus);
+ max_cpus, virt_max_cpus);
 exit(1);
 }
 
-- 
2.4.3




Re: [Qemu-devel] [PATCH] hw/arm/virt: fix max-cpus check

2016-02-04 Thread Andrew Jones
On Thu, Feb 04, 2016 at 09:54:45AM +0800, Shannon Zhao wrote:
> 
> 
> On 2016/2/3 22:59, Andrew Jones wrote:
> > mach-virt doesn't yet support hotplug, but command lines specifying
> > -smp ,maxcpus= don't fail. Of course specifying
> > bigger-num as something bigger than the machine supports, e.g. > 8
> > on a gicv2 machine, should fail though. This fix also makes mach-
> > virt's max-cpus check truly consistent with the one in vl.c:main,
> > as the one there was already correctly checking max-cpus instead
> > of smp-cpus.
> > 
> > Reported-by: Shannon Zhao 
> > Signed-off-by: Andrew Jones 
> > ---
> >  hw/arm/virt.c | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 15658f49c4e06..44bbbea92b1cf 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -1013,7 +1013,7 @@ static void machvirt_init(MachineState *machine)
> >  MemoryRegion *sysmem = get_system_memory();
> >  MemoryRegion *secure_sysmem = NULL;
> >  int gic_version = vms->gic_version;
> > -int n, max_cpus;
> > +int n, virt_max_cpus;
> >  MemoryRegion *ram = g_new(MemoryRegion, 1);
> >  const char *cpu_model = machine->cpu_model;
> >  VirtBoardInfo *vbi;
> > @@ -1051,15 +1051,15 @@ static void machvirt_init(MachineState *machine)
> >   * many redistributors we can fit into the memory map.
> >   */
> >  if (gic_version == 3) {
> > -max_cpus = vbi->memmap[VIRT_GIC_REDIST].size / 0x2;
> > +virt_max_cpus = vbi->memmap[VIRT_GIC_REDIST].size / 0x2;
> >  } else {
> > -max_cpus = GIC_NCPU;
> > +virt_max_cpus = GIC_NCPU;
> >  }
> >  
> > -if (smp_cpus > max_cpus) {
> > +if (max_cpus > virt_max_cpus) {
> >  error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
> Does it need to change the SMP?^~~~

If we consider max_cpus to be [the maximum] "SMP CPUs" after hotplug, then
I think it's OK. When I changed smp_cpus to max_cpus in vl.c (c00cd995),
either that was my reasoning for leaving it (or it was just overlooked :-),
but now that the messages are consistent, I should change them both if that's
the consensus.

Thanks,
drew



Re: [Qemu-devel] [kvm-unit-tests PATCHv5 3/3] arm: pmu: Add CPI checking

2015-11-10 Thread Andrew Jones
On Mon, Nov 02, 2015 at 09:58:14AM -0600, Andrew Jones wrote:
> On Fri, Oct 30, 2015 at 03:32:43PM -0400, Christopher Covington wrote:
> > Hi Drew,
> > 
> > On 10/30/2015 09:00 AM, Andrew Jones wrote:
> > > On Wed, Oct 28, 2015 at 03:12:55PM -0400, Christopher Covington wrote:
> > >> Calculate the numbers of cycles per instruction (CPI) implied by ARM
> > >> PMU cycle counter values. The code includes a strict checking facility
> > >> intended for the -icount option in TCG mode but it is not yet enabled
> > >> in the configuration file. Enabling it must wait on infrastructure
> > >> improvements which allow for different tests to be run on TCG versus
> > >> KVM.
> > >>
> > >> Signed-off-by: Christopher Covington 
> > >> ---
> > >>  arm/pmu.c | 103 
> > >> +-
> > >>  1 file changed, 102 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/arm/pmu.c b/arm/pmu.c
> > >> index 4334de4..76a 100644
> > >> --- a/arm/pmu.c
> > >> +++ b/arm/pmu.c
> > >> @@ -43,6 +43,23 @@ static inline unsigned long get_pmccntr(void)
> > >>  asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
> > >>  return cycles;
> > >>  }
> > >> +
> > >> +/*
> > >> + * Extra instructions inserted by the compiler would be difficult to 
> > >> compensate
> > >> + * for, so hand assemble everything between, and including, the PMCR 
> > >> accesses
> > >> + * to start and stop counting.
> > >> + */
> > >> +static inline void loop(int i, uint32_t pmcr)
> > >> +{
> > >> +asm volatile(
> > >> +"   mcr p15, 0, %[pmcr], c9, c12, 0\n"
> > >> +"1: subs%[i], %[i], #1\n"
> > >> +"   bgt 1b\n"
> > >> +"   mcr p15, 0, %[z], c9, c12, 0\n"
> > >> +: [i] "+r" (i)
> > >> +: [pmcr] "r" (pmcr), [z] "r" (0)
> > >> +: "cc");
> > >> +}
> > >>  #elif defined(__aarch64__)
> > >>  static inline uint32_t get_pmcr(void)
> > >>  {
> > >> @@ -64,6 +81,23 @@ static inline unsigned long get_pmccntr(void)
> > >>  asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
> > >>  return cycles;
> > >>  }
> > >> +
> > >> +/*
> > >> + * Extra instructions inserted by the compiler would be difficult to 
> > >> compensate
> > >> + * for, so hand assemble everything between, and including, the PMCR 
> > >> accesses
> > >> + * to start and stop counting.
> > >> + */
> > >> +static inline void loop(int i, uint32_t pmcr)
> > >> +{
> > >> +asm volatile(
> > >> +"   msr pmcr_el0, %[pmcr]\n"
> > >> +"1: subs%[i], %[i], #1\n"
> > >> +"   b.gt1b\n"
> > >> +"   msr pmcr_el0, xzr\n"
> > >> +: [i] "+r" (i)
> > >> +: [pmcr] "r" (pmcr)
> > >> +: "cc");
> > >> +}
> > >>  #endif
> > >>  
> > >>  struct pmu_data {
> > >> @@ -131,12 +165,79 @@ static bool check_cycles_increase(void)
> > >>  return true;
> > >>  }
> > >>  
> > >> -int main(void)
> > >> +/*
> > >> + * Execute a known number of guest instructions. Only odd instruction 
> > >> counts
> > >> + * greater than or equal to 3 are supported by the in-line assembly 
> > >> code. The
> > >> + * control register (PMCR_EL0) is initialized with the provided value 
> > >> (allowing
> > >> + * for example for the cycle counter or event counters to be reset). At 
> > >> the end
> > >> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> > >> + * counting, allowing the cycle counter or event counters to be read at 
> > >> the
> > >> + * leisure of the calling code.
> > >> + */
> > >> +static void measure_instrs(int num, uint32_t pmcr)
> >

[Qemu-devel] [PATCH 0/5] target-arm: enable qmp-dump-guest-memory

2015-11-19 Thread Andrew Jones
This series brings qmp-dump-guest-memory to arm and aarch64
targets. I've detailed my testing and the results in the
following table.

arm/aarch64 kvm guest kdump testing (P - PASS, F - FAIL). Testing done
with a latest mainline crash utility patched with [*]

.---.
|   Host| arm32 | arm64 | arm64 | arm64 |
|---|---|---|---|---|
|   Guest   | arm32 | arm64 | arm64 | arm32 |
|---|---|---|---|---|
|   Pagesize| 4K| 4K| 64K   | 4K|
|===|
| kdump in guest| F[1]  | P[2]  | P[3]  | F[1]  |
|---|---|---|---|---|
| qmp-dump-guest-memory [4]   | P | P | P | P |
|---|---|---|---|---|
| qmp-dump-guest-memory -z [5]| F[8]  | P | P | F[8]  |
|---|---|---|---|---|
| qmp-dump-guest-memory -l [6]| F[8]  | P | P | F[8]  |
|---|---|---|---|---|
| qmp-dump-guest-memory -s [7]| F[8]  | P | P | F[8]  |
.---.

[1] Kernel v4.4-rc1 crashes with a NULL pointer dereference at virtual
address  in a memcpy (crash_kexec/machine_kexec/fncpy/memcpy).
Needs kernel debugging.
[2] Not sure about mainline, but works with the RHEL kernel,
makedumpfile does not yet support arm64 with 4K pages, but using
'core_collector cp' in /etc/kdump.conf allows saving an uncompressed
elf file.
[3] Not sure about mainline, but works with the RHEL kernel,
uses makedumpfile, thus generates a makedumpfile formatted file
using zlib compression.
[4] No format specified, creates an uncompressed elf formatted file.
[5] makedumpfile format, with zlib compression
[6] makedumpfile format, with lzo compression
[7] makedumpfile format, with snappy compression
[8] The crash utility doesn't seem to like arm32 dumps in makedumpfile
format. Looks like the physical page bitmap is all zeros? Needs
qemu and crash debugging.

Additional notes:
1) QEMU also has scripts/dump-guest-memory.py, which can and should be
   updated to support multiple architectures, pagesizes, and physbases.
   This is currently left as future work.

[*] https://www.redhat.com/archives/crash-utility/2015-November/msg00031.html

Andrew Jones (5):
  qapi-schema: dump-guest-memory: Improve text
  dump: qemunotes aren't commonly needed
  dump: allow target to set the page size
  dump: allow target to set the phys_base
  target-arm: support QMP dump-guest-memory

 dump.c  | 129 +++--
 include/sysemu/dump-arch.h  |   9 +-
 include/sysemu/dump.h   |  11 +--
 qapi-schema.json|   4 +-
 qom/cpu.c   |   4 +-
 target-arm/Makefile.objs|   3 +-
 target-arm/arch_dump.c  | 222 
 target-arm/cpu-qom.h|   5 +
 target-arm/cpu.c|   3 +
 target-ppc/arch_dump.c  |   6 --
 target-ppc/cpu-qom.h|   2 -
 target-ppc/translate_init.c |   1 -
 target-s390x/arch_dump.c|   6 --
 target-s390x/cpu-qom.h  |   2 -
 target-s390x/cpu.c  |   1 -
 15 files changed, 321 insertions(+), 87 deletions(-)
 create mode 100644 target-arm/arch_dump.c

-- 
2.4.3




[Qemu-devel] [PATCH 3/5] dump: allow target to set the page size

2015-11-19 Thread Andrew Jones
This is necessary for targets that don't have TARGET_PAGE_SIZE ==
real-target-page-size. The target should set the page size to the
correct one, if known, or, if not known, to the maximum page size
it supports.

(No functional change.)

Signed-off-by: Andrew Jones 
---
 dump.c | 127 -
 include/sysemu/dump-arch.h |   8 +--
 include/sysemu/dump.h  |  10 +---
 3 files changed, 85 insertions(+), 60 deletions(-)

diff --git a/dump.c b/dump.c
index 78b7d843ceb29..e1d9bae9e89d1 100644
--- a/dump.c
+++ b/dump.c
@@ -347,18 +347,18 @@ static void write_memory(DumpState *s, GuestPhysBlock 
*block, ram_addr_t start,
 int64_t i;
 Error *local_err = NULL;
 
-for (i = 0; i < size / TARGET_PAGE_SIZE; i++) {
-write_data(s, block->host_addr + start + i * TARGET_PAGE_SIZE,
-   TARGET_PAGE_SIZE, &local_err);
+for (i = 0; i < size / s->dump_info.page_size; i++) {
+write_data(s, block->host_addr + start + i * s->dump_info.page_size,
+   s->dump_info.page_size, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
 }
 }
 
-if ((size % TARGET_PAGE_SIZE) != 0) {
-write_data(s, block->host_addr + start + i * TARGET_PAGE_SIZE,
-   size % TARGET_PAGE_SIZE, &local_err);
+if ((size % s->dump_info.page_size) != 0) {
+write_data(s, block->host_addr + start + i * s->dump_info.page_size,
+   size % s->dump_info.page_size, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -737,7 +737,7 @@ static void create_header32(DumpState *s, Error **errp)
 
 strncpy(dh->signature, KDUMP_SIGNATURE, strlen(KDUMP_SIGNATURE));
 dh->header_version = cpu_to_dump32(s, 6);
-block_size = TARGET_PAGE_SIZE;
+block_size = s->dump_info.page_size;
 dh->block_size = cpu_to_dump32(s, block_size);
 sub_hdr_size = sizeof(struct KdumpSubHeader32) + s->note_size;
 sub_hdr_size = DIV_ROUND_UP(sub_hdr_size, block_size);
@@ -837,7 +837,7 @@ static void create_header64(DumpState *s, Error **errp)
 
 strncpy(dh->signature, KDUMP_SIGNATURE, strlen(KDUMP_SIGNATURE));
 dh->header_version = cpu_to_dump32(s, 6);
-block_size = TARGET_PAGE_SIZE;
+block_size = s->dump_info.page_size;
 dh->block_size = cpu_to_dump32(s, block_size);
 sub_hdr_size = sizeof(struct KdumpSubHeader64) + s->note_size;
 sub_hdr_size = DIV_ROUND_UP(sub_hdr_size, block_size);
@@ -933,6 +933,11 @@ static void write_dump_header(DumpState *s, Error **errp)
 }
 }
 
+static size_t dump_bitmap_get_bufsize(DumpState *s)
+{
+return s->dump_info.page_size;
+}
+
 /*
  * set dump_bitmap sequencely. the bit before last_pfn is not allowed to be
  * rewritten, so if need to set the first bit, set last_pfn and pfn to 0.
@@ -946,6 +951,8 @@ static int set_dump_bitmap(uint64_t last_pfn, uint64_t pfn, 
bool value,
 off_t old_offset, new_offset;
 off_t offset_bitmap1, offset_bitmap2;
 uint32_t byte, bit;
+size_t bitmap_bufsize = dump_bitmap_get_bufsize(s);
+size_t bits_per_buf = bitmap_bufsize * CHAR_BIT;
 
 /* should not set the previous place */
 assert(last_pfn <= pfn);
@@ -956,14 +963,14 @@ static int set_dump_bitmap(uint64_t last_pfn, uint64_t 
pfn, bool value,
  * making new_offset be bigger than old_offset can also sync remained data
  * into vmcore.
  */
-old_offset = BUFSIZE_BITMAP * (last_pfn / PFN_BUFBITMAP);
-new_offset = BUFSIZE_BITMAP * (pfn / PFN_BUFBITMAP);
+old_offset = bitmap_bufsize * (last_pfn / bits_per_buf);
+new_offset = bitmap_bufsize * (pfn / bits_per_buf);
 
 while (old_offset < new_offset) {
 /* calculate the offset and write dump_bitmap */
 offset_bitmap1 = s->offset_dump_bitmap + old_offset;
 if (write_buffer(s->fd, offset_bitmap1, buf,
- BUFSIZE_BITMAP) < 0) {
+ bitmap_bufsize) < 0) {
 return -1;
 }
 
@@ -971,17 +978,17 @@ static int set_dump_bitmap(uint64_t last_pfn, uint64_t 
pfn, bool value,
 offset_bitmap2 = s->offset_dump_bitmap + s->len_dump_bitmap +
  old_offset;
 if (write_buffer(s->fd, offset_bitmap2, buf,
- BUFSIZE_BITMAP) < 0) {
+ bitmap_bufsize) < 0) {
 return -1;
 }
 
-memset(buf, 0, BUFSIZE_BITMAP);
-old_offset += BUFSIZE_BITMAP;
+memset(buf, 0, bitmap_bufsize);
+old_offset += bitmap_bufsize;
 }
 
 /* get the exact place of the bit in the buf, and set it */
-byte = (pfn % PFN_BUFBITMAP) / CHAR_BIT;
-bit = (pfn % PFN_BUFBITMAP) % CHAR_BIT;
+byte = (pfn % bits_per_buf) / CHAR_BIT;
+  

<    1   2   3   4   5   6   7   8   9   10   >