[RFC] How to improve KVM VM resource assignment and per-VM process/thread scheduling

2014-04-24 Thread Huangpeng (Peter)
Hi, ALL

Currently the KVM hypervisor has many features that depend on standard Linux APIs,
such as vCPU pinning, memory pinning, and process pinning. But in real production
environments we need automated resource assignment and/or scheduling. Is there any
plan to implement this?

Resource assignment requirements, for example for CPUs:
  - which CPUs are eligible for use by VMs
  - if a CPU is eligible, whether it is in use
  - if it is in use, whether it is dedicated to one VM or shared by many VMs
  - for shared CPUs:
    - a configurable oversubscription ratio
    - used-ratio information

Similar requirements apply to memory and I/O device assignment.

Per-VM process/thread scheduling requirements: on the hypervisor side, VMs that
use vhost-net or virtio-scsi devices have QEMU I/O threads, vhost-net threads,
OVS threads, and host NIC interrupt contexts (hardirq/softirq). You should place
these threads on the same NUMA node to get the best performance. Another
important point: you should balance these threads' CPU load across the node's
cores to avoid imbalance in the resources available to the vCPUs.

Thanks.

Peter Huang 


Re: [PATCH v3 2/4] live migration support for initial write protect of VM

2014-04-24 Thread Mario Smarduch
On 04/24/2014 09:39 AM, Steve Capper wrote:
> On Wed, Apr 23, 2014 at 12:18:07AM +0100, Mario Smarduch wrote:
>>
>>
>> Support for live migration initial write protect.
>> - moved write protect to the architecture memory region prepare function. This
>>   way you can fail and abort migration without keeping track of migration status.
>> - The above also allows generalizing the read dirty log function with x86
>> - Added stage2_mark_pte_ro()
>> - optimized initial write protect, skip upper table lookups
>> - added stage2pmd_addr_end() to do generic 4 level table walk
>> - changed kvm_flush_remote_tlbs() to weak function
> 
> Hello Mario,
> I've taken a quick look at this and have a few suggestions below.
> (I'm not a KVM expert, but took a look at the memory manipulation).

Hi Steve,
Your suggestions are very helpful; my responses are inline.

Thanks.
  Mario
> 
> Future versions of this series could probably benefit from being sent
> to lakml too?
> 
> Cheers,
> --
> Steve
> 
>>
>> Signed-off-by: Mario Smarduch 
>> ---
>>  arch/arm/include/asm/kvm_host.h |8 ++
>>  arch/arm/kvm/arm.c  |3 +
>>  arch/arm/kvm/mmu.c  |  163 +++
>>  virt/kvm/kvm_main.c |5 +-
>>  4 files changed, 178 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 1e739f9..9f827c8 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -67,6 +67,12 @@ struct kvm_arch {
>>
>> /* Interrupt controller */
>> struct vgic_distvgic;
>> +
>> +   /* Marks start of migration, used to handle 2nd stage page faults
>> +* during migration, prevent installing huge pages and split huge pages
>> +* to small pages.
>> +*/
>> +   int migration_in_progress;
>>  };
>>
>>  #define KVM_NR_MEM_OBJS 40
>> @@ -230,4 +236,6 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>>
>>  void kvm_tlb_flush_vmid(struct kvm *kvm);
>>
>> +int kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
>> +
>>  #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 9a4bc10..b916478 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -233,6 +233,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>struct kvm_userspace_memory_region *mem,
>>enum kvm_mr_change change)
>>  {
>> +   /* Request for migration issued by user, write protect memory slot */
>> +   if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
>> +   return kvm_mmu_slot_remove_write_access(kvm, mem->slot);
>> return 0;
>>  }
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 7ab77f3..4d029a6 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -31,6 +31,11 @@
>>
>>  #include "trace.h"
>>
>> +#define stage2pud_addr_end(addr, end)  \
>> +({ u64 __boundary = ((addr) + PUD_SIZE) & PUD_MASK;\
>> +   (__boundary - 1 < (end) - 1) ? __boundary : (end);  \
>> +})
> 
> A matter of personal preference: can this be a static inline function
> instead? That way you could avoid ambiguity with the parameter types.
> (not an issue here, but this has bitten me in the past).

Yes good point, will change.
> 
>> +
>>  extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
>>
>>  static pgd_t *boot_hyp_pgd;
>> @@ -569,6 +574,15 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>> return 0;
>>  }
>>
>> +/* Write protect page */
>> +static void stage2_mark_pte_ro(pte_t *pte)
>> +{
>> +   pte_t new_pte;
>> +
>> +   new_pte = pfn_pte(pte_pfn(*pte), PAGE_S2);
>> +   *pte = new_pte;
>> +}
> 
> This isn't making the pte read only.
> It's nuking all the flags from the pte and replacing them with factory
> settings. (In this case the PAGE_S2 pgprot).
> If we had other attributes that we later wish to retain this could be
> easily overlooked. Perhaps a new name for the function?

Yes that's pretty bad, I'll clear the write protect bit only.

> 
>> +
>>  /**
>>   * kvm_phys_addr_ioremap - map a device range to guest IPA
>>   *
>> @@ -649,6 +663,155 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
>> return false;
>>  }
>>
>> +/**
>> + * split_pmd - splits huge pages to small pages, required to keep a dirty log of
>> + *  smaller memory granules, otherwise huge pages would need to be
>> + *  migrated. Practically an idle system has problems migrating with
>> + *  huge pages.  Called during WP of entire VM address space, done
>> + *  initially when the migration thread issues the KVM_MEM_LOG_DIRTY_PAGES
>> + *  ioctl.
>> + *  The mmu_lock is held during splitting.
>> + *
>> + * @kvm:The KVM p

Re: [patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed

2014-04-24 Thread Eduardo Habkost
On Fri, Apr 25, 2014 at 12:57:48AM +0200, Paolo Bonzini wrote:
> Il 24/04/2014 22:57, Eduardo Habkost ha scritto:
> >On Thu, Apr 24, 2014 at 04:42:33PM -0400, Paolo Bonzini wrote:
> >>Il 22/04/2014 21:14, Eduardo Habkost ha scritto:
> >>>Not for "-cpu host". If somebody needs migration to work, they shouldn't
> >>>be using "-cpu host" anyway (I don't know if you have seen the other
> >>>comments in my message?).
> >>
> >>I'm not entirely sure.  If you have hosts with exactly identical
> >>chipsets, "-cpu host" migration will in all likelihood work.
> >>Marcelo's approach is safer.
> >
> >If that didn't break other use cases, I would agree.
> >
> >But "-cpu host" today covers two use cases: 1) enabling everything that
> >can be enabled, even if it breaks migration; 2) enabling all stuff that
> >can be safely enabled without breaking migration.
> 
> What does it enable *now* that breaks migration?

Every single feature it enables can break it. It breaks if you upgrade
to a QEMU version with new feature words. It breaks if you upgrade to a
kernel which supports new features.

A feature that doesn't let you upgrade the kernel isn't a feature I
expect users to be relying upon. libvirt even blocks migration if "-cpu
host" is in use.

> 
> >Now we can't do both at the same time[1].
> >
> >(1) is important for management software;
> >(2) works only if you are lucky.
> 
> Or if you plan ahead.  With additional logic even invariant TSC in
> principle can be made to work across migration if the host clocks are
> synchronized well enough (PTP accuracy is in the 100-1000 TSC ticks
> range).

Yes, it is possible in the future. But we never planned for it, so "-cpu
host" never supported migration.

> 
> >Why would it make sense to break (1) to try to make (2) work?
> >
> >[1] I would even argue that we never did both at the same time. "-cpu
> >host" depends on host hardware capabilities, host kernel capabilities,
> >and host QEMU version (we never took care of keeping guest ABI with
> >"-cpu host"). If migration did work, it was never supposed to.
> 
> I think this is where I disagree.  Migration of the PMU is one thing
> that obviously was done with "-cpu host" in mind.

We may try to make a reliable implementation of use case (2) some day,
yes. But the choice I see right now is between trying not to break a
feature that was never declared to exist, and breaking an existing
interface that is required to solve existing bugs between libvirt and
QEMU.

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed

2014-04-24 Thread Paolo Bonzini

Il 24/04/2014 22:57, Eduardo Habkost ha scritto:

On Thu, Apr 24, 2014 at 04:42:33PM -0400, Paolo Bonzini wrote:

Il 22/04/2014 21:14, Eduardo Habkost ha scritto:

Not for "-cpu host". If somebody needs migration to work, they shouldn't
be using "-cpu host" anyway (I don't know if you have seen the other
comments in my message?).


I'm not entirely sure.  If you have hosts with exactly identical
chipsets, "-cpu host" migration will in all likelihood work.
Marcelo's approach is safer.


If that didn't break other use cases, I would agree.

But "-cpu host" today covers two use cases: 1) enabling everything that
can be enabled, even if it breaks migration; 2) enabling all stuff that
can be safely enabled without breaking migration.


What does it enable *now* that breaks migration?


Now we can't do both at the same time[1].

(1) is important for management software;
(2) works only if you are lucky.


Or if you plan ahead.  With additional logic even invariant TSC in 
principle can be made to work across migration if the host clocks are 
synchronized well enough (PTP accuracy is in the 100-1000 TSC ticks range).



Why would it make sense to break (1) to try to make (2) work?

[1] I would even argue that we never did both at the same time. "-cpu
host" depends on host hardware capabilities, host kernel capabilities,
and host QEMU version (we never took care of keeping guest ABI with
"-cpu host"). If migration did work, it was never supposed to.


I think this is where I disagree.  Migration of the PMU is one thing 
that obviously was done with "-cpu host" in mind.


Paolo


Re: [PATCH v2] kvm: Use pci_enable_msix_exact() instead of pci_enable_msix()

2014-04-24 Thread Paolo Bonzini
> > >>> So, do I have to pull something (which I'd rather not, since pulling
> > >>> the wrong thing in a submaintainer tree will make Linus angry), or
> > >>> should I do it in the next merge window after pci_enable_msix_exact
> > >>> gets in?
> > >So it is already in.
> > 
> > It is not, because maintainer branches are not rebased.  KVM
> > development is based on 3.14-rc1, and will not get that commit until
> > the first 3.15 pull request is sent to Linus.
> > 
> > No big deal, I'll include this patch in a second 3.15 pull request.
> 
> Hi Paolo,
> 
> I believe it is safe to pull it now?

Yup, vacation got in the way of doing this during the merge window but I
can safely send this for -rc next week.  It was on my todo list.

Paolo


Re: [patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed

2014-04-24 Thread Eduardo Habkost
On Thu, Apr 24, 2014 at 04:42:33PM -0400, Paolo Bonzini wrote:
> Il 22/04/2014 21:14, Eduardo Habkost ha scritto:
> >Not for "-cpu host". If somebody needs migration to work, they shouldn't
> >be using "-cpu host" anyway (I don't know if you have seen the other
> >comments in my message?).
> 
> I'm not entirely sure.  If you have hosts with exactly identical
> chipsets, "-cpu host" migration will in all likelihood work.
> Marcelo's approach is safer.

If that didn't break other use cases, I would agree.

But "-cpu host" today covers two use cases: 1) enabling everything that
can be enabled, even if it breaks migration; 2) enabling all stuff that
can be safely enabled without breaking migration.

Now we can't do both at the same time[1].

(1) is important for management software;
(2) works only if you are lucky.

Why would it make sense to break (1) to try to make (2) work?

[1] I would even argue that we never did both at the same time. "-cpu
host" depends on host hardware capabilities, host kernel capabilities,
and host QEMU version (we never took care of keeping guest ABI with
"-cpu host"). If migration did work, it was never supposed to.

-- 
Eduardo


Re: [patch 2/2] target-i386: block migration and savevm if invariant tsc is exposed

2014-04-24 Thread Paolo Bonzini

Il 22/04/2014 21:14, Eduardo Habkost ha scritto:

Not for "-cpu host". If somebody needs migration to work, they shouldn't
be using "-cpu host" anyway (I don't know if you have seen the other
comments in my message?).


I'm not entirely sure.  If you have hosts with exactly identical 
chipsets, "-cpu host" migration will in all likelihood work.  Marcelo's 
approach is safer.


Paolo


Re: target-i386: block migration and savevm if invariant tsc is exposed (v3)

2014-04-24 Thread Eduardo Habkost
On Wed, Apr 23, 2014 at 06:04:45PM -0300, Marcelo Tosatti wrote:
> 
> Invariant TSC documentation mentions that "invariant TSC will run at a
> constant rate in all ACPI P-, C-, and T-states".
> 
> This is not the case if migration to a host with different TSC frequency 
> is allowed, or if savevm is performed. So block migration/savevm.
> 
> Signed-off-by: Marcelo Tosatti 
> 
[...]
> @@ -702,6 +706,16 @@ int kvm_arch_init_vcpu(CPUState *cs)
>!!(c->ecx & CPUID_EXT_SMX);
>  }
>  
> +c = cpuid_find_entry(&cpuid_data.cpuid, 0x80000007, 0);
> +if (c && (c->edx & 1<<8) && invtsc_mig_blocker == NULL) {
> +/* for migration */
> +error_set(&invtsc_mig_blocker,
> +  QERR_DEVICE_FEATURE_BLOCKS_MIGRATION, "invtsc", "cpu");
> +migrate_add_blocker(invtsc_mig_blocker);
> +/* for savevm */
> +vmstate_x86_cpu.unmigratable = 1;

Did you ensure this will always happen before vmstate_register() is
called for vmstate_x86_cpu? I believe kvm_arch_init_vcpu() is called a
long long time after device_set_realized() (which is where
vmstate_register() is called for DeviceState objects).

-- 
Eduardo


[PATCH v3 0/9] kvmtool: handle guests of a different endianness

2014-04-24 Thread Marc Zyngier
This patch series adds some infrastructure to kvmtool to allow a BE
guest to use virtio-mmio on a LE host, provided that the architecture
actually supports such madness.

Not all the backends have been converted, only those I actually cared
about. Converting them is pretty easy though, and will be done if the
method is deemed acceptable.

This has been tested on both arm and arm64 (I use this on a daily
basis to test BE code). The corresponding kernel changes have all been
merged.

Also available at:
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git 
kvm-arm64/kvmtool-be-on-le

From v2 (never posted):
- Fixed tons of bugs (config space)
- Fixed TAP networking

From v1:
- Gave up on the virtio extension after the push back from the PPC
  guys. Instead, we snapshot the endianness of the vcpu when it
  tries to reset the device. A bit ugly, but doesn't require any
  change on the kernel side.

Marc Zyngier (9):
  kvmtool: pass trapped vcpu to MMIO accessors
  kvmtool: virt_queue configuration based on endianness
  kvmtool: sample CPU endianness on virtio-mmio device reset
  kvmtool: add queue endianness initializer
  kvmtool: convert console backend to support bi-endianness
  kvmtool: convert 9p backend to support bi-endianness
  kvmtool: convert blk backend to support bi-endianness
  kvmtool: convert net backend to support bi-endianness
  kvmtool: virtio: enable arm/arm64 support for bi-endianness

 tools/kvm/arm/aarch32/kvm-cpu.c  | 14 
 tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h |  2 +
 tools/kvm/arm/aarch64/kvm-cpu.c  | 25 
 tools/kvm/arm/include/arm-common/kvm-arch.h  |  2 +
 tools/kvm/arm/include/arm-common/kvm-cpu-arch.h  |  4 +-
 tools/kvm/arm/kvm-cpu.c  | 10 +--
 tools/kvm/hw/pci-shmem.c |  2 +-
 tools/kvm/include/kvm/kvm-cpu.h  |  1 +
 tools/kvm/include/kvm/kvm.h  |  4 +-
 tools/kvm/include/kvm/virtio.h   | 82 +++-
 tools/kvm/kvm-cpu.c  | 10 ++-
 tools/kvm/mmio.c | 11 ++--
 tools/kvm/pci.c  |  3 +-
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h |  2 +-
 tools/kvm/powerpc/kvm-cpu.c  |  4 +-
 tools/kvm/powerpc/spapr_pci.h|  6 +-
 tools/kvm/virtio/9p.c|  3 +
 tools/kvm/virtio/blk.c   | 31 +++--
 tools/kvm/virtio/console.c   |  8 ++-
 tools/kvm/virtio/core.c  | 59 +
 tools/kvm/virtio/mmio.c  | 21 --
 tools/kvm/virtio/net.c   | 45 +++--
 tools/kvm/virtio/pci.c   |  6 +-
 tools/kvm/x86/include/kvm/kvm-cpu-arch.h |  4 +-
 24 files changed, 284 insertions(+), 75 deletions(-)

-- 
1.8.3.4



[PATCH v3 5/9] kvmtool: convert console backend to support bi-endianness

2014-04-24 Thread Marc Zyngier
Configure the queues to follow the guest endianness, and make sure
the configuration space is doing the same.

Signed-off-by: Marc Zyngier 
---
 tools/kvm/virtio/console.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/kvm/virtio/console.c b/tools/kvm/virtio/console.c
index 0474e2b..384eac1 100644
--- a/tools/kvm/virtio/console.c
+++ b/tools/kvm/virtio/console.c
@@ -131,7 +131,12 @@ static u32 get_host_features(struct kvm *kvm, void *dev)
 
 static void set_guest_features(struct kvm *kvm, void *dev, u32 features)
 {
-   /* Unused */
+   struct con_dev *cdev = dev;
+   struct virtio_console_config *conf = &cdev->config;
+
+   conf->cols = virtio_host_to_guest_u16(&cdev->vdev, conf->cols);
+   conf->rows = virtio_host_to_guest_u16(&cdev->vdev, conf->rows);
+   conf->max_nr_ports = virtio_host_to_guest_u32(&cdev->vdev, conf->max_nr_ports);
 }
 
 static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
@@ -149,6 +154,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
p   = virtio_get_vq(kvm, queue->pfn, page_size);
 
vring_init(&queue->vring, VIRTIO_CONSOLE_QUEUE_SIZE, p, align);
+   virtio_init_device_vq(&cdev.vdev, queue);
 
if (vq == VIRTIO_CONSOLE_TX_QUEUE) {
thread_pool__init_job(&cdev.jobs[vq], kvm, virtio_console_handle_callback, queue);
-- 
1.8.3.4



[PATCH v3 9/9] kvmtool: virtio: enable arm/arm64 support for bi-endianness

2014-04-24 Thread Marc Zyngier
Implement the kvm_cpu__get_endianness call for both AArch32 and
AArch64, and advertise the bi-endianness support.

Signed-off-by: Marc Zyngier 
---
 tools/kvm/arm/aarch32/kvm-cpu.c  | 14 +
 tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h |  2 ++
 tools/kvm/arm/aarch64/kvm-cpu.c  | 25 
 tools/kvm/arm/include/arm-common/kvm-arch.h  |  2 ++
 4 files changed, 43 insertions(+)

diff --git a/tools/kvm/arm/aarch32/kvm-cpu.c b/tools/kvm/arm/aarch32/kvm-cpu.c
index bd71037..464b473 100644
--- a/tools/kvm/arm/aarch32/kvm-cpu.c
+++ b/tools/kvm/arm/aarch32/kvm-cpu.c
@@ -1,5 +1,6 @@
 #include "kvm/kvm-cpu.h"
 #include "kvm/kvm.h"
+#include "kvm/virtio.h"
 
 #include 
 
@@ -76,6 +77,19 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu)
die_perror("KVM_SET_ONE_REG failed (pc)");
 }
 
+int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
+{
+   struct kvm_one_reg reg;
+   u32 data;
+
+   reg.id = ARM_CORE_REG(usr_regs.ARM_cpsr);
+   reg.addr = (u64)(unsigned long)&data;
+   if (ioctl(vcpu->vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
+   die("KVM_GET_ONE_REG failed (cpsr)");
+
+   return (data & PSR_E_BIT) ? VIRTIO_ENDIAN_BE : VIRTIO_ENDIAN_LE;
+}
+
 void kvm_cpu__show_code(struct kvm_cpu *vcpu)
 {
struct kvm_one_reg reg;
diff --git a/tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h b/tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h
index 7d70c3b..ed7da45 100644
--- a/tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h
+++ b/tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h
@@ -13,5 +13,7 @@
 #define ARM_MPIDR_HWID_BITMASK 0xFF00FFUL
 #define ARM_CPU_ID 3, 0, 0, 0
 #define ARM_CPU_ID_MPIDR   5
+#define ARM_CPU_CTRL   3, 0, 1, 0
+#define ARM_CPU_CTRL_SCTLR 0
 
 #endif /* KVM__KVM_CPU_ARCH_H */
diff --git a/tools/kvm/arm/aarch64/kvm-cpu.c b/tools/kvm/arm/aarch64/kvm-cpu.c
index 059e42c..b3ce2c8 100644
--- a/tools/kvm/arm/aarch64/kvm-cpu.c
+++ b/tools/kvm/arm/aarch64/kvm-cpu.c
@@ -1,12 +1,16 @@
 #include "kvm/kvm-cpu.h"
 #include "kvm/kvm.h"
+#include "kvm/virtio.h"
 
 #include 
 
 #define COMPAT_PSR_F_BIT   0x0040
 #define COMPAT_PSR_I_BIT   0x0080
+#define COMPAT_PSR_E_BIT   0x0200
 #define COMPAT_PSR_MODE_SVC0x0013
 
+#define SCTLR_EL1_EE_MASK  (1 << 25)
+
 #define ARM64_CORE_REG(x)  (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
 KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
 
@@ -133,6 +137,27 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu)
return reset_vcpu_aarch64(vcpu);
 }
 
+int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
+{
+   struct kvm_one_reg reg;
+   u64 data;
+
+   reg.id = ARM64_CORE_REG(regs.pstate);
+   reg.addr = (u64)&data;
+   if (ioctl(vcpu->vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
+   die("KVM_GET_ONE_REG failed (spsr[EL1])");
+
+   if (data & PSR_MODE32_BIT)
+   return (data & COMPAT_PSR_E_BIT) ? VIRTIO_ENDIAN_BE : VIRTIO_ENDIAN_LE;
+
+   reg.id = ARM64_SYS_REG(ARM_CPU_CTRL, ARM_CPU_CTRL_SCTLR); /* SCTLR_EL1 */
+   reg.addr = (u64)&data;
+   if (ioctl(vcpu->vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
+   die("KVM_GET_ONE_REG failed (SCTLR_EL1)");
+
+   return (data & SCTLR_EL1_EE_MASK) ? VIRTIO_ENDIAN_BE : VIRTIO_ENDIAN_LE;
+}
+
 void kvm_cpu__show_code(struct kvm_cpu *vcpu)
 {
struct kvm_one_reg reg;
diff --git a/tools/kvm/arm/include/arm-common/kvm-arch.h b/tools/kvm/arm/include/arm-common/kvm-arch.h
index b6c4bf8..5d2fab2 100644
--- a/tools/kvm/arm/include/arm-common/kvm-arch.h
+++ b/tools/kvm/arm/include/arm-common/kvm-arch.h
@@ -35,6 +35,8 @@
 #define VIRTIO_DEFAULT_TRANS(kvm)  \
((kvm)->cfg.arch.virtio_trans_pci ? VIRTIO_PCI : VIRTIO_MMIO)
 
+#define VIRTIO_RING_ENDIAN (VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
+
 static inline bool arm_addr_in_ioport_region(u64 phys_addr)
 {
u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
-- 
1.8.3.4



[PATCH v3 3/9] kvmtool: sample CPU endianness on virtio-mmio device reset

2014-04-24 Thread Marc Zyngier
Save the CPU endianness when the device is reset. It is widely
assumed that the guest won't change its endianness afterwards, or at
least not without resetting the device first.

A default implementation of the endianness sampling just returns
the default "host endianness" value so that unsuspecting architectures
are not affected.

Signed-off-by: Marc Zyngier 
---
 tools/kvm/include/kvm/kvm-cpu.h | 1 +
 tools/kvm/include/kvm/virtio.h  | 1 +
 tools/kvm/kvm-cpu.c | 6 ++
 tools/kvm/virtio/mmio.c | 3 +++
 4 files changed, 11 insertions(+)

diff --git a/tools/kvm/include/kvm/kvm-cpu.h b/tools/kvm/include/kvm/kvm-cpu.h
index 0ece28c..aa0cb54 100644
--- a/tools/kvm/include/kvm/kvm-cpu.h
+++ b/tools/kvm/include/kvm/kvm-cpu.h
@@ -15,6 +15,7 @@ void kvm_cpu__run(struct kvm_cpu *vcpu);
 void kvm_cpu__reboot(struct kvm *kvm);
 int kvm_cpu__start(struct kvm_cpu *cpu);
 bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu);
+int kvm_cpu__get_endianness(struct kvm_cpu *vcpu);
 
 int kvm_cpu__get_debug_fd(void);
 void kvm_cpu__set_debug_fd(int fd);
diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h
index f6bddd9..1180a3e 100644
--- a/tools/kvm/include/kvm/virtio.h
+++ b/tools/kvm/include/kvm/virtio.h
@@ -132,6 +132,7 @@ struct virtio_device {
booluse_vhost;
void*virtio;
struct virtio_ops   *ops;
+   u16 endian;
 };
 
 struct virtio_ops {
diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index 5c70b00..9575b32 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -3,6 +3,7 @@
 #include "kvm/symbol.h"
 #include "kvm/util.h"
 #include "kvm/kvm.h"
+#include "kvm/virtio.h"
 
 #include 
 #include 
@@ -14,6 +15,11 @@
 
 extern __thread struct kvm_cpu *current_kvm_cpu;
 
+int __attribute__((weak)) kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
+{
+   return VIRTIO_ENDIAN_HOST;
+}
+
 void kvm_cpu__enable_singlestep(struct kvm_cpu *vcpu)
 {
struct kvm_guest_debug debug = {
diff --git a/tools/kvm/virtio/mmio.c b/tools/kvm/virtio/mmio.c
index 9d385e2..3a2bd62 100644
--- a/tools/kvm/virtio/mmio.c
+++ b/tools/kvm/virtio/mmio.c
@@ -4,6 +4,7 @@
 #include "kvm/ioport.h"
 #include "kvm/virtio.h"
 #include "kvm/kvm.h"
+#include "kvm/kvm-cpu.h"
 #include "kvm/irq.h"
 #include "kvm/fdt.h"
 
@@ -159,6 +160,8 @@ static void virtio_mmio_config_out(struct kvm_cpu *vcpu,
break;
case VIRTIO_MMIO_STATUS:
vmmio->hdr.status = ioport__read32(data);
+   if (!vmmio->hdr.status) /* Sample endianness on reset */
+   vdev->endian = kvm_cpu__get_endianness(vcpu);
if (vdev->ops->notify_status)
vdev->ops->notify_status(kvm, vmmio->dev, vmmio->hdr.status);
break;
-- 
1.8.3.4



[PATCH v3 4/9] kvmtool: add queue endianness initializer

2014-04-24 Thread Marc Zyngier
Add a utility function that transfers the endianness sampled
at device reset time to a queue being set up.

Signed-off-by: Marc Zyngier 
---
 tools/kvm/include/kvm/virtio.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h
index 1180a3e..8a9eab5 100644
--- a/tools/kvm/include/kvm/virtio.h
+++ b/tools/kvm/include/kvm/virtio.h
@@ -28,6 +28,7 @@ struct virt_queue {
   It's where we assume the next request index is at.  */
u16 last_avail_idx;
u16 last_used_signalled;
+   u16 endian;
 };
 
 /*
@@ -165,4 +166,10 @@ static inline void *virtio_get_vq(struct kvm *kvm, u32 pfn, u32 page_size)
return guest_flat_to_host(kvm, (u64)pfn * page_size);
 }
 
+static inline void virtio_init_device_vq(struct virtio_device *vdev,
+struct virt_queue *vq)
+{
+   vq->endian = vdev->endian;
+}
+
 #endif /* KVM__VIRTIO_H */
-- 
1.8.3.4



[PATCH v3 7/9] kvmtool: convert blk backend to support bi-endianness

2014-04-24 Thread Marc Zyngier
Configure the queues to follow the guest endianness, and make sure
the configuration space is doing the same.

Signed-off-by: Marc Zyngier 
---
 tools/kvm/virtio/blk.c | 31 +--
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c
index 4bed3a9..edfa8e6 100644
--- a/tools/kvm/virtio/blk.c
+++ b/tools/kvm/virtio/blk.c
@@ -77,13 +77,15 @@ void virtio_blk_complete(void *param, long len)
bdev->vdev.ops->signal_vq(req->kvm, &bdev->vdev, queueid);
 }
 
-static void virtio_blk_do_io_request(struct kvm *kvm, struct blk_dev_req *req)
+static void virtio_blk_do_io_request(struct kvm *kvm, struct virt_queue *vq, struct blk_dev_req *req)
 {
struct virtio_blk_outhdr *req_hdr;
ssize_t block_cnt;
struct blk_dev *bdev;
struct iovec *iov;
u16 out, in;
+   u32 type;
+   u64 sector;
 
block_cnt   = -1;
bdev= req->bdev;
@@ -92,13 +94,16 @@ static void virtio_blk_do_io_request(struct kvm *kvm, struct blk_dev_req *req)
in  = req->in;
req_hdr = iov[0].iov_base;
 
-   switch (req_hdr->type) {
+   type = virtio_guest_to_host_u32(vq, req_hdr->type);
+   sector = virtio_guest_to_host_u64(vq, req_hdr->sector);
+
+   switch (type) {
case VIRTIO_BLK_T_IN:
-   block_cnt = disk_image__read(bdev->disk, req_hdr->sector,
+   block_cnt = disk_image__read(bdev->disk, sector,
iov + 1, in + out - 2, req);
break;
case VIRTIO_BLK_T_OUT:
-   block_cnt = disk_image__write(bdev->disk, req_hdr->sector,
+   block_cnt = disk_image__write(bdev->disk, sector,
iov + 1, in + out - 2, req);
break;
case VIRTIO_BLK_T_FLUSH:
@@ -112,7 +117,7 @@ static void virtio_blk_do_io_request(struct kvm *kvm, struct blk_dev_req *req)
virtio_blk_complete(req, block_cnt);
break;
default:
-   pr_warning("request type %d", req_hdr->type);
+   pr_warning("request type %d", type);
block_cnt   = -1;
break;
}
@@ -130,7 +135,7 @@ static void virtio_blk_do_io(struct kvm *kvm, struct virt_queue *vq, struct blk_
&req->in, head, kvm);
req->vq = vq;
 
-   virtio_blk_do_io_request(kvm, req);
+   virtio_blk_do_io_request(kvm, vq, req);
}
 }
 
@@ -152,8 +157,21 @@ static u32 get_host_features(struct kvm *kvm, void *dev)
 static void set_guest_features(struct kvm *kvm, void *dev, u32 features)
 {
struct blk_dev *bdev = dev;
+   struct virtio_blk_config *conf = &bdev->blk_config;
+   struct virtio_blk_geometry *geo = &conf->geometry;
 
bdev->features = features;
+
+   conf->capacity = virtio_host_to_guest_u64(&bdev->vdev, conf->capacity);
+   conf->size_max = virtio_host_to_guest_u32(&bdev->vdev, conf->size_max);
+   conf->seg_max = virtio_host_to_guest_u32(&bdev->vdev, conf->seg_max);
+
+   /* Geometry */
+   geo->cylinders = virtio_host_to_guest_u16(&bdev->vdev, geo->cylinders);
+
+   conf->blk_size = virtio_host_to_guest_u32(&bdev->vdev, conf->blk_size);
+   conf->min_io_size = virtio_host_to_guest_u16(&bdev->vdev, conf->min_io_size);
+   conf->opt_io_size = virtio_host_to_guest_u32(&bdev->vdev, conf->opt_io_size);
 }
 
 static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
@@ -170,6 +188,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
p   = virtio_get_vq(kvm, queue->pfn, page_size);
 
vring_init(&queue->vring, VIRTIO_BLK_QUEUE_SIZE, p, align);
+   virtio_init_device_vq(&bdev->vdev, queue);
 
return 0;
 }
-- 
1.8.3.4



[PATCH v3 8/9] kvmtool: convert net backend to support bi-endianness

2014-04-24 Thread Marc Zyngier
Configure the queues to follow the guest endianness, and make sure
the configuration space is doing the same.

Extra care is taken for the handling of the virtio_net_hdr structures
on both the TX and RX ends.

Signed-off-by: Marc Zyngier 
---
 tools/kvm/virtio/net.c | 45 -
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index dbb4431..363ec73 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -73,6 +73,24 @@ static bool has_virtio_feature(struct net_dev *ndev, u32 
feature)
return ndev->features & (1 << feature);
 }
 
+static void virtio_net_fix_tx_hdr(struct virtio_net_hdr *hdr, struct net_dev *ndev)
+{
+   hdr->hdr_len= virtio_guest_to_host_u16(&ndev->vdev, hdr->hdr_len);
+   hdr->gso_size   = virtio_guest_to_host_u16(&ndev->vdev, hdr->gso_size);
+   hdr->csum_start = virtio_guest_to_host_u16(&ndev->vdev, hdr->csum_start);
+   hdr->csum_offset= virtio_guest_to_host_u16(&ndev->vdev, hdr->csum_offset);
+}
+
+static void virtio_net_fix_rx_hdr(struct virtio_net_hdr_mrg_rxbuf *hdr, struct 
net_dev *ndev)
+{
+   hdr->hdr.hdr_len= virtio_host_to_guest_u16(&ndev->vdev, 
hdr->hdr.hdr_len);
+   hdr->hdr.gso_size   = virtio_host_to_guest_u16(&ndev->vdev, 
hdr->hdr.gso_size);
+   hdr->hdr.csum_start = virtio_host_to_guest_u16(&ndev->vdev, 
hdr->hdr.csum_start);
+   hdr->hdr.csum_offset= virtio_host_to_guest_u16(&ndev->vdev, 
hdr->hdr.csum_offset);
+   if (has_virtio_feature(ndev, VIRTIO_NET_F_MRG_RXBUF))
+   hdr->num_buffers= virtio_host_to_guest_u16(&ndev->vdev, 
hdr->num_buffers);
+}
+
 static void *virtio_net_rx_thread(void *p)
 {
struct iovec iov[VIRTIO_NET_QUEUE_SIZE];
@@ -106,6 +124,7 @@ static void *virtio_net_rx_thread(void *p)
.iov_len  = sizeof(buffer),
};
struct virtio_net_hdr_mrg_rxbuf *hdr;
+   int i;
 
len = ndev->ops->rx(&dummy_iov, 1, ndev);
if (len < 0) {
@@ -114,16 +133,20 @@ static void *virtio_net_rx_thread(void *p)
goto out_err;
}
 
-   copied = 0;
+   copied = i = 0;
head = virt_queue__get_iov(vq, iov, &out, &in, kvm);
-   hdr = (void *)iov[0].iov_base;
+   hdr = iov[0].iov_base;
while (copied < len) {
size_t iovsize = min_t(size_t, len - copied, 
iov_size(iov, in));
 
memcpy_toiovec(iov, buffer + copied, iovsize);
copied += iovsize;
-   if (has_virtio_feature(ndev, 
VIRTIO_NET_F_MRG_RXBUF))
-   hdr->num_buffers++;
+   if (i++ == 0)
+   virtio_net_fix_rx_hdr(hdr, ndev);
+   if (has_virtio_feature(ndev, 
VIRTIO_NET_F_MRG_RXBUF)) {
+   u16 num_buffers = 
virtio_guest_to_host_u16(vq, hdr->num_buffers);
+   hdr->num_buffers = 
virtio_host_to_guest_u16(vq, num_buffers + 1);
+   }
virt_queue__set_used_elem(vq, head, iovsize);
if (copied == len)
break;
@@ -170,11 +193,14 @@ static void *virtio_net_tx_thread(void *p)
mutex_unlock(&ndev->io_lock[id]);
 
while (virt_queue__available(vq)) {
+   struct virtio_net_hdr *hdr;
head = virt_queue__get_iov(vq, iov, &out, &in, kvm);
+   hdr = iov[0].iov_base;
+   virtio_net_fix_tx_hdr(hdr, ndev);
len = ndev->ops->tx(iov, out, ndev);
if (len < 0) {
pr_warning("%s: tx on vq %u failed (%d)\n",
-   __func__, id, len);
+   __func__, id, errno);
goto out_err;
}
 
@@ -415,9 +441,14 @@ static int virtio_net__vhost_set_features(struct net_dev 
*ndev)
 static void set_guest_features(struct kvm *kvm, void *dev, u32 features)
 {
struct net_dev *ndev = dev;
+   struct virtio_net_config *conf = &ndev->config;
 
ndev->features = features;
 
+   conf->status = virtio_host_to_guest_u16(&ndev->vdev, conf->status);
+   conf->max_virtqueue_pairs = virtio_host_to_guest_u16(&ndev->vdev,
+
conf->max_virtqueue_pairs);
+
if (ndev->mode

[PATCH v3 6/9] kvmtool: convert 9p backend to support bi-endianness

2014-04-24 Thread Marc Zyngier
Configure the queues to follow the guest endianness, and make sure
the configuration space is doing the same.

Signed-off-by: Marc Zyngier 
---
 tools/kvm/virtio/9p.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/kvm/virtio/9p.c b/tools/kvm/virtio/9p.c
index 847eddb..9073a1e 100644
--- a/tools/kvm/virtio/9p.c
+++ b/tools/kvm/virtio/9p.c
@@ -1252,8 +1252,10 @@ static u32 get_host_features(struct kvm *kvm, void *dev)
 static void set_guest_features(struct kvm *kvm, void *dev, u32 features)
 {
struct p9_dev *p9dev = dev;
+   struct virtio_9p_config *conf = p9dev->config;
 
p9dev->features = features;
+   conf->tag_len = virtio_host_to_guest_u16(&p9dev->vdev, conf->tag_len);
 }
 
 static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 
align,
@@ -1272,6 +1274,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, 
u32 page_size, u32 align,
job = &p9dev->jobs[vq];
 
vring_init(&queue->vring, VIRTQUEUE_NUM, p, align);
+   virtio_init_device_vq(&p9dev->vdev, queue);
 
*job= (struct p9_dev_job) {
.vq = queue,
-- 
1.8.3.4



[PATCH v3 2/9] kvmtool: virt_queue configuration based on endianness

2014-04-24 Thread Marc Zyngier
Define a simple infrastructure to configure a virt_queue
depending on the guest endianness, as reported by the feature
flags. At this stage, the endianness is always the host's.

Wrap all accesses to virt_queue data structures shared between
host and guest with byte swapping helpers.

Should the architecture only support one endianness, these helpers
are reduced to the identity function.

Signed-off-by: Marc Zyngier 
---
 tools/kvm/include/kvm/virtio.h | 74 --
 tools/kvm/virtio/core.c| 59 +++--
 2 files changed, 105 insertions(+), 28 deletions(-)

diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h
index 820b94a..f6bddd9 100644
--- a/tools/kvm/include/kvm/virtio.h
+++ b/tools/kvm/include/kvm/virtio.h
@@ -1,6 +1,8 @@
 #ifndef KVM__VIRTIO_H
 #define KVM__VIRTIO_H
 
+#include 
+
 #include 
 #include 
 
@@ -15,6 +17,10 @@
 #define VIRTIO_PCI_O_CONFIG0
 #define VIRTIO_PCI_O_MSIX  1
 
+#define VIRTIO_ENDIAN_HOST 0
+#define VIRTIO_ENDIAN_LE   (1 << 0)
+#define VIRTIO_ENDIAN_BE   (1 << 1)
+
 struct virt_queue {
struct vringvring;
u32 pfn;
@@ -24,9 +30,71 @@ struct virt_queue {
u16 last_used_signalled;
 };
 
+/*
+ * The default policy is not to cope with the guest endianness.
+ * It also helps not breaking archs that do not care about supporting
+ * such a configuration.
+ */
+#ifndef VIRTIO_RING_ENDIAN
+#define VIRTIO_RING_ENDIAN VIRTIO_ENDIAN_HOST
+#endif
+
+#if (VIRTIO_RING_ENDIAN & (VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE))
+
+static inline __u16 __virtio_g2h_u16(u16 endian, __u16 val)
+{
+   return (endian == VIRTIO_ENDIAN_LE) ? le16toh(val) : be16toh(val);
+}
+
+static inline __u16 __virtio_h2g_u16(u16 endian, __u16 val)
+{
+   return (endian == VIRTIO_ENDIAN_LE) ? htole16(val) : htobe16(val);
+}
+
+static inline __u32 __virtio_g2h_u32(u16 endian, __u32 val)
+{
+   return (endian == VIRTIO_ENDIAN_LE) ? le32toh(val) : be32toh(val);
+}
+
+static inline __u32 __virtio_h2g_u32(u16 endian, __u32 val)
+{
+   return (endian == VIRTIO_ENDIAN_LE) ? htole32(val) : htobe32(val);
+}
+
+static inline __u64 __virtio_g2h_u64(u16 endian, __u64 val)
+{
+   return (endian == VIRTIO_ENDIAN_LE) ? le64toh(val) : be64toh(val);
+}
+
+static inline __u64 __virtio_h2g_u64(u16 endian, __u64 val)
+{
+   return (endian == VIRTIO_ENDIAN_LE) ? htole64(val) : htobe64(val);
+}
+
+#define virtio_guest_to_host_u16(x, v) __virtio_g2h_u16((x)->endian, (v))
+#define virtio_host_to_guest_u16(x, v) __virtio_h2g_u16((x)->endian, (v))
+#define virtio_guest_to_host_u32(x, v) __virtio_g2h_u32((x)->endian, (v))
+#define virtio_host_to_guest_u32(x, v) __virtio_h2g_u32((x)->endian, (v))
+#define virtio_guest_to_host_u64(x, v) __virtio_g2h_u64((x)->endian, (v))
+#define virtio_host_to_guest_u64(x, v) __virtio_h2g_u64((x)->endian, (v))
+
+#else
+
+#define virtio_guest_to_host_u16(x, v) (v)
+#define virtio_host_to_guest_u16(x, v) (v)
+#define virtio_guest_to_host_u32(x, v) (v)
+#define virtio_host_to_guest_u32(x, v) (v)
+#define virtio_guest_to_host_u64(x, v) (v)
+#define virtio_host_to_guest_u64(x, v) (v)
+
+#endif
+
 static inline u16 virt_queue__pop(struct virt_queue *queue)
 {
-   return queue->vring.avail->ring[queue->last_avail_idx++ % 
queue->vring.num];
+   __u16 guest_idx;
+
+   guest_idx = queue->vring.avail->ring[queue->last_avail_idx++ % 
queue->vring.num];
+   return virtio_guest_to_host_u16(queue, guest_idx);
 }
 
 static inline struct vring_desc *virt_queue__get_desc(struct virt_queue 
*queue, u16 desc_ndx)
@@ -39,8 +107,8 @@ static inline bool virt_queue__available(struct virt_queue 
*vq)
if (!vq->vring.avail)
return 0;
 
-   vring_avail_event(&vq->vring) = vq->last_avail_idx;
-   return vq->vring.avail->idx !=  vq->last_avail_idx;
+   vring_avail_event(&vq->vring) = virtio_host_to_guest_u16(vq, 
vq->last_avail_idx);
+   return virtio_guest_to_host_u16(vq, vq->vring.avail->idx) != 
vq->last_avail_idx;
 }
 
 struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, 
u32 head, u32 len);
diff --git a/tools/kvm/virtio/core.c b/tools/kvm/virtio/core.c
index 2dfb828..9ae7887 100644
--- a/tools/kvm/virtio/core.c
+++ b/tools/kvm/virtio/core.c
@@ -15,10 +15,11 @@
 struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, 
u32 head, u32 len)
 {
struct vring_used_elem *used_elem;
+   u16 idx = virtio_guest_to_host_u16(queue, queue->vring.used->idx);
 
-   used_elem   = &queue->vring.used->ring[queue->vring.used->idx % 
queue->vring.num];
-   used_elem->id   = head;
-   used_elem->len  = len;
+   used_elem   = &queue->vring.used->ring[idx % queue->vring.num];
+   used_elem->id   = virtio_host_to_guest_u32(queue, head);
+   used_elem->len  = virtio_host_to_guest_u32(queue, len);
 
/*
 * Use wmb to assure that used e

[PATCH v3 1/9] kvmtool: pass trapped vcpu to MMIO accessors

2014-04-24 Thread Marc Zyngier
In order to be able to find out about the endianness of a virtual
CPU, it is necessary to pass a pointer to the kvm_cpu structure
down to the MMIO accessors.

This patch just pushes such pointer as far as required for the
MMIO accessors to have a play with the vcpu.

Signed-off-by: Marc Zyngier 
---
 tools/kvm/arm/include/arm-common/kvm-cpu-arch.h |  4 ++--
 tools/kvm/arm/kvm-cpu.c | 10 +-
 tools/kvm/hw/pci-shmem.c|  2 +-
 tools/kvm/include/kvm/kvm.h |  4 ++--
 tools/kvm/kvm-cpu.c |  4 ++--
 tools/kvm/mmio.c| 11 ++-
 tools/kvm/pci.c |  3 ++-
 tools/kvm/powerpc/include/kvm/kvm-cpu-arch.h|  2 +-
 tools/kvm/powerpc/kvm-cpu.c |  4 ++--
 tools/kvm/powerpc/spapr_pci.h   |  6 +++---
 tools/kvm/virtio/mmio.c | 18 +++---
 tools/kvm/virtio/pci.c  |  6 --
 tools/kvm/x86/include/kvm/kvm-cpu-arch.h|  4 ++--
 13 files changed, 43 insertions(+), 35 deletions(-)

diff --git a/tools/kvm/arm/include/arm-common/kvm-cpu-arch.h 
b/tools/kvm/arm/include/arm-common/kvm-cpu-arch.h
index bef1761..355a02d 100644
--- a/tools/kvm/arm/include/arm-common/kvm-cpu-arch.h
+++ b/tools/kvm/arm/include/arm-common/kvm-cpu-arch.h
@@ -42,8 +42,8 @@ static inline bool kvm_cpu__emulate_io(struct kvm *kvm, u16 
port, void *data,
return false;
 }
 
-bool kvm_cpu__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len,
-  u8 is_write);
+bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data,
+  u32 len, u8 is_write);
 
 unsigned long kvm_cpu__get_vcpu_mpidr(struct kvm_cpu *vcpu);
 
diff --git a/tools/kvm/arm/kvm-cpu.c b/tools/kvm/arm/kvm-cpu.c
index 9c9616f..53afa35 100644
--- a/tools/kvm/arm/kvm-cpu.c
+++ b/tools/kvm/arm/kvm-cpu.c
@@ -98,17 +98,17 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
return false;
 }
 
-bool kvm_cpu__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len,
-  u8 is_write)
+bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data,
+  u32 len, u8 is_write)
 {
if (arm_addr_in_virtio_mmio_region(phys_addr)) {
-   return kvm__emulate_mmio(kvm, phys_addr, data, len, is_write);
+   return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
} else if (arm_addr_in_ioport_region(phys_addr)) {
int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
u16 port = (phys_addr - KVM_IOPORT_AREA) & USHRT_MAX;
-   return kvm__emulate_io(kvm, port, data, direction, len, 1);
+   return kvm__emulate_io(vcpu->kvm, port, data, direction, len, 
1);
} else if (arm_addr_in_pci_region(phys_addr)) {
-   return kvm__emulate_mmio(kvm, phys_addr, data, len, is_write);
+   return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
}
 
return false;
diff --git a/tools/kvm/hw/pci-shmem.c b/tools/kvm/hw/pci-shmem.c
index 34de747..4b837eb 100644
--- a/tools/kvm/hw/pci-shmem.c
+++ b/tools/kvm/hw/pci-shmem.c
@@ -105,7 +105,7 @@ static struct ioport_operations shmem_pci__io_ops = {
.io_out = shmem_pci__io_out,
 };
 
-static void callback_mmio_msix(u64 addr, u8 *data, u32 len, u8 is_write, void 
*ptr)
+static void callback_mmio_msix(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 
len, u8 is_write, void *ptr)
 {
void *mem;
 
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index d05b936..f1b71a0 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -84,10 +84,10 @@ int kvm_timer__exit(struct kvm *kvm);
 void kvm__irq_line(struct kvm *kvm, int irq, int level);
 void kvm__irq_trigger(struct kvm *kvm, int irq);
 bool kvm__emulate_io(struct kvm *kvm, u16 port, void *data, int direction, int 
size, u32 count);
-bool kvm__emulate_mmio(struct kvm *kvm, u64 phys_addr, u8 *data, u32 len, u8 
is_write);
+bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, 
u8 is_write);
 int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void 
*userspace_addr);
 int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool 
coalesce,
-   void (*mmio_fn)(u64 addr, u8 *data, u32 len, u8 
is_write, void *ptr),
+  void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 
*data, u32 len, u8 is_write, void *ptr),
void *ptr);
 bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr);
 void kvm__pause(struct kvm *kvm);
diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index be05c49..5c70b00 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -54,7 +54,7 @@ static void kvm_cpu__handle_coalesced_mmio(struct kvm

[PATCH 0/4] Random kvmtool fixes

2014-04-24 Thread Marc Zyngier
This small series addresses a number of issues that have been
pestering me for a while. Nothing major though:

The first two patches simply ensure that we can always use THP if they
have been enabled on the host. The third one fixes an annoying issue when
--tty is used.

The fourth patch allows me to use TAP interfaces *and* run kvmtool as
a non-privileged user (tunctl is your BFF).

The whole series applies on top of kvmtool/next as of yesterday.

Thanks,

M.

Marc Zyngier (4):
  kvmtool: ARM: force alignment of memory for THP
  kvmtool: ARM: pass MADV_HUGEPAGE to madvise
  kvmtool: Fix handling of POLLHUP when --tty is used
  kvmtool: allow the TAP interface to be specified on the command line

 tools/kvm/arm/kvm.c| 10 ++
 tools/kvm/include/kvm/virtio-net.h |  1 +
 tools/kvm/term.c   |  4 +++-
 tools/kvm/virtio/net.c | 21 ++---
 4 files changed, 24 insertions(+), 12 deletions(-)

-- 
1.8.3.4



[PATCH 3/4] kvmtool: Fix handling of POLLHUP when --tty is used

2014-04-24 Thread Marc Zyngier
The --tty option allows the redirection of a console (serial or virtio)
to a pseudo-terminal. As long as the slave port of this pseudo-terminal
is not opened by another process, a poll() call on the master port will
return POLLHUP in the .revents field.

This confuses the virtio console code, as term_readable() returns
a positive value, indicating that something is available, while the
call to term_getc_iov will fail.

The fix is to check for the presence of the POLLIN flag in the .revents
field. Note that this is only a partial fix, as kvmtool will still
consume vast amounts of CPU resource by spinning like crazy until
the slave port is actually opened.

Signed-off-by: Marc Zyngier 
---
 tools/kvm/term.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/kvm/term.c b/tools/kvm/term.c
index 5c3e543..214f5e2 100644
--- a/tools/kvm/term.c
+++ b/tools/kvm/term.c
@@ -89,8 +89,10 @@ bool term_readable(int term)
.events = POLLIN,
.revents = 0,
};
+   int err;
 
-   return poll(&pollfd, 1, 0) > 0;
+   err = poll(&pollfd, 1, 0);
+   return (err > 0 && (pollfd.revents & POLLIN));
 }
 
 static void *term_poll_thread_loop(void *param)
-- 
1.8.3.4



[PATCH 1/4] kvmtool: ARM: force alignment of memory for THP

2014-04-24 Thread Marc Zyngier
Use of THP requires that the VMA containing the guest memory is
2MB aligned. Unfortunately, nothing in kvmtool ensures that the
memory is actually aligned, making the use of THP very unlikely.

Just follow what we're already doing for virtio, and expand our
forced alignment to 2M.

* without this patch:
root@muffin-man:~# for i in $(seq 1 5); do ./hackbench 50 process 1000; done
Running with 50*40 (== 2000) tasks.
Time: 113.600
Running with 50*40 (== 2000) tasks.
Time: 108.650
Running with 50*40 (== 2000) tasks.
Time: 110.753
Running with 50*40 (== 2000) tasks.
Time: 116.992
Running with 50*40 (== 2000) tasks.
Time: 117.317

* with this patch:
root@muffin-man:~# for i in $(seq 1 5); do ./hackbench 50 process 1000; done
Running with 50*40 (== 2000) tasks.
Time: 97.613
Running with 50*40 (== 2000) tasks.
Time: 96.111
Running with 50*40 (== 2000) tasks.
Time: 97.090
Running with 50*40 (== 2000) tasks.
Time: 100.820
Running with 50*40 (== 2000) tasks.
Time: 100.298

Acked-by: Will Deacon 
Signed-off-by: Marc Zyngier 
---
 tools/kvm/arm/kvm.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/arm/kvm.c b/tools/kvm/arm/kvm.c
index 008b7fe..d0d64ff 100644
--- a/tools/kvm/arm/kvm.c
+++ b/tools/kvm/arm/kvm.c
@@ -61,11 +61,13 @@ void kvm__arch_set_cmdline(char *cmdline, bool video)
 void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 {
/*
-* Allocate guest memory. We must align out buffer to 64K to
+* Allocate guest memory. We must align our buffer to 64K to
 * correlate with the maximum guest page size for virtio-mmio.
+* If using THP, then our minimal alignment becomes 2M.
+* 2M trumps 64K, so let's go with that.
 */
kvm->ram_size = min(ram_size, (u64)ARM_MAX_MEMORY(kvm));
-   kvm->arch.ram_alloc_size = kvm->ram_size + SZ_64K;
+   kvm->arch.ram_alloc_size = kvm->ram_size + SZ_2M;
kvm->arch.ram_alloc_start = mmap_anon_or_hugetlbfs(kvm, hugetlbfs_path,
kvm->arch.ram_alloc_size);
 
@@ -74,7 +76,7 @@ void kvm__arch_init(struct kvm *kvm, const char 
*hugetlbfs_path, u64 ram_size)
kvm->arch.ram_alloc_size, errno);
 
kvm->ram_start = (void *)ALIGN((unsigned long)kvm->arch.ram_alloc_start,
-   SZ_64K);
+   SZ_2M);
 
madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
MADV_MERGEABLE);
-- 
1.8.3.4



[PATCH 2/4] kvmtool: ARM: pass MADV_HUGEPAGE to madvise

2014-04-24 Thread Marc Zyngier
If the host kernel is configured with CONFIG_TRANSPARENT_HUGEPAGE_MADVISE,
it is important to madvise(MADV_HUGEPAGE) the memory region.
Otherwise, the guest won't benefit from using THP.

Acked-by: Will Deacon 
Signed-off-by: Marc Zyngier 
---
 tools/kvm/arm/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/kvm/arm/kvm.c b/tools/kvm/arm/kvm.c
index d0d64ff..58ad9fa 100644
--- a/tools/kvm/arm/kvm.c
+++ b/tools/kvm/arm/kvm.c
@@ -79,7 +79,7 @@ void kvm__arch_init(struct kvm *kvm, const char 
*hugetlbfs_path, u64 ram_size)
SZ_2M);
 
madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
-   MADV_MERGEABLE);
+   MADV_MERGEABLE | MADV_HUGEPAGE);
 
/* Initialise the virtual GIC. */
if (gic__init_irqchip(kvm))
-- 
1.8.3.4



[PATCH 4/4] kvmtool: allow the TAP interface to be specified on the command line

2014-04-24 Thread Marc Zyngier
In order to overcome the fact that a TAP interface can only be created
by root, allow the use of an interface that has already been created,
configured, made persistent and owned by a specific user/group (such
as done with tunctl).

In this case, any kind of configuration can be skipped (IP, up and
running mode), and the TAP is assumed to be ready for use.

This is done by introducing the "tapif" option, as used here:
--network trans=mmio,mode=tap,tapif=blah

where "blah" is a TAP interface.

This allows the creation/configuration of the interface to be controlled
by root, and lkvm to be run as a normal user.

Signed-off-by: Marc Zyngier 
---
 tools/kvm/include/kvm/virtio-net.h |  1 +
 tools/kvm/virtio/net.c | 21 ++---
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/tools/kvm/include/kvm/virtio-net.h 
b/tools/kvm/include/kvm/virtio-net.h
index 0f4d1e5..f435cc3 100644
--- a/tools/kvm/include/kvm/virtio-net.h
+++ b/tools/kvm/include/kvm/virtio-net.h
@@ -10,6 +10,7 @@ struct virtio_net_params {
const char *host_ip;
const char *script;
const char *trans;
+   const char *tapif;
char guest_mac[6];
char host_mac[6];
struct kvm *kvm;
diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index dbb4431..82dbb88 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -257,6 +257,7 @@ static bool virtio_net__tap_init(struct net_dev *ndev)
struct sockaddr_in sin = {0};
struct ifreq ifr;
const struct virtio_net_params *params = ndev->params;
+   bool skipconf = !!params->tapif;
 
/* Did the user already gave us the FD? */
if (params->fd) {
@@ -272,6 +273,8 @@ static bool virtio_net__tap_init(struct net_dev *ndev)
 
memset(&ifr, 0, sizeof(ifr));
ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
+   if (params->tapif)
+   strncpy(ifr.ifr_name, params->tapif, sizeof(ifr.ifr_name));
if (ioctl(ndev->tap_fd, TUNSETIFF, &ifr) < 0) {
pr_warning("Config tap device error. Are you root?");
goto fail;
@@ -308,7 +311,7 @@ static bool virtio_net__tap_init(struct net_dev *ndev)
goto fail;
}
}
-   } else {
+   } else if (!skipconf) {
memset(&ifr, 0, sizeof(ifr));
strncpy(ifr.ifr_name, ndev->tap_name, sizeof(ndev->tap_name));
sin.sin_addr.s_addr = inet_addr(params->host_ip);
@@ -320,12 +323,14 @@ static bool virtio_net__tap_init(struct net_dev *ndev)
}
}
 
-   memset(&ifr, 0, sizeof(ifr));
-   strncpy(ifr.ifr_name, ndev->tap_name, sizeof(ndev->tap_name));
-   ioctl(sock, SIOCGIFFLAGS, &ifr);
-   ifr.ifr_flags |= IFF_UP | IFF_RUNNING;
-   if (ioctl(sock, SIOCSIFFLAGS, &ifr) < 0)
-   pr_warning("Could not bring tap device up");
+   if (!skipconf) {
+   memset(&ifr, 0, sizeof(ifr));
+   strncpy(ifr.ifr_name, ndev->tap_name, sizeof(ndev->tap_name));
+   ioctl(sock, SIOCGIFFLAGS, &ifr);
+   ifr.ifr_flags |= IFF_UP | IFF_RUNNING;
+   if (ioctl(sock, SIOCSIFFLAGS, &ifr) < 0)
+   pr_warning("Could not bring tap device up");
+   }
 
close(sock);
 
@@ -650,6 +655,8 @@ static int set_net_param(struct kvm *kvm, struct 
virtio_net_params *p,
p->host_ip = strdup(val);
} else if (strcmp(param, "trans") == 0) {
p->trans = strdup(val);
+   } else if (strcmp(param, "tapif") == 0) {
+   p->tapif = strdup(val);
} else if (strcmp(param, "vhost") == 0) {
p->vhost = atoi(val);
} else if (strcmp(param, "fd") == 0) {
-- 
1.8.3.4



Re: [PATCH v3 2/4] live migration support for initial write protect of VM

2014-04-24 Thread Steve Capper
On Thu, Apr 24, 2014 at 05:39:29PM +0100, Steve Capper wrote:
[ ... ]

> 
> -- IMPORTANT NOTICE: The contents of this email and any attachments are 
> confidential and may also be privileged. If you are not the intended 
> recipient, please notify the sender immediately and do not disclose the 
> contents to any other person, use it for any purpose, or store or copy the 
> information in any medium.  Thank you.
> 
> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, 
> Registered in England & Wales, Company No:  2557590
> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, 
> Registered in England & Wales, Company No:  2548782
> 

Please ignore this notice, apologies for it appearing.
I will learn how to configure email.

Cheers,
-- 
Steve



Re: [PATCH v3 2/4] live migration support for initial write protect of VM

2014-04-24 Thread Steve Capper
On Wed, Apr 23, 2014 at 12:18:07AM +0100, Mario Smarduch wrote:
>
>
> Support for live migration initial write protect.
> - moved write protect to architecture memory region prepare function. This
>   way you can fail, abort migration without keep track of migration status.
>   way you can fail and abort migration without keeping track of migration status.
> - Added stage2_mark_pte_ro()
> - optimized initial write protect, skip upper table lookups
> - added stage2pmd_addr_end() to do generic 4 level table walk
> - changed kvm_flush_remote_tlbs() to weak function

Hello Mario,
I've taken a quick look at this and have a few suggestions below.
(I'm not a KVM expert, but took a look at the memory manipulation).

Future versions of this series could probably benefit from being sent
to lakml too?

Cheers,
--
Steve

>
> Signed-off-by: Mario Smarduch 
> ---
>  arch/arm/include/asm/kvm_host.h |8 ++
>  arch/arm/kvm/arm.c  |3 +
>  arch/arm/kvm/mmu.c  |  163 
> +++
>  virt/kvm/kvm_main.c |5 +-
>  4 files changed, 178 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 1e739f9..9f827c8 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -67,6 +67,12 @@ struct kvm_arch {
>
> /* Interrupt controller */
> struct vgic_distvgic;
> +
> +   /* Marks start of migration, used to handle 2nd stage page faults
> +* during migration, prevent installing huge pages and split huge 
> pages
> +* to small pages.
> +*/
> +   int migration_in_progress;
>  };
>
>  #define KVM_NR_MEM_OBJS 40
> @@ -230,4 +236,6 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, 
> u64 value);
>
>  void kvm_tlb_flush_vmid(struct kvm *kvm);
>
> +int kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> +
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 9a4bc10..b916478 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -233,6 +233,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>struct kvm_userspace_memory_region *mem,
>enum kvm_mr_change change)
>  {
> +   /* Request for migration issued by user, write protect memory slot */
> +   if ((change != KVM_MR_DELETE) && (mem->flags & 
> KVM_MEM_LOG_DIRTY_PAGES))
> +   return kvm_mmu_slot_remove_write_access(kvm, mem->slot);
> return 0;
>  }
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7ab77f3..4d029a6 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -31,6 +31,11 @@
>
>  #include "trace.h"
>
> +#define stage2pud_addr_end(addr, end)  \
> +({ u64 __boundary = ((addr) + PUD_SIZE) & PUD_MASK;\
> +   (__boundary - 1 < (end) - 1) ? __boundary : (end);  \
> +})

A matter of personal preference: can this be a static inline function
instead? That way you could avoid ambiguity with the parameter types.
(not an issue here, but this has bitten me in the past).

> +
>  extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
>
>  static pgd_t *boot_hyp_pgd;
> @@ -569,6 +574,15 @@ static int stage2_set_pte(struct kvm *kvm, struct 
> kvm_mmu_memory_cache *cache,
> return 0;
>  }
>
> +/* Write protect page */
> +static void stage2_mark_pte_ro(pte_t *pte)
> +{
> +   pte_t new_pte;
> +
> +   new_pte = pfn_pte(pte_pfn(*pte), PAGE_S2);
> +   *pte = new_pte;
> +}

This isn't making the pte read only.
It's nuking all the flags from the pte and replacing them with factory
settings. (In this case the PAGE_S2 pgprot).
If we had other attributes that we later wish to retain this could be
easily overlooked. Perhaps a new name for the function?

> +
>  /**
>   * kvm_phys_addr_ioremap - map a device range to guest IPA
>   *
> @@ -649,6 +663,155 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, 
> phys_addr_t *ipap)
> return false;
>  }
>
> +/**
> + * split_pmd - splits huge pages to small pages, required to keep a dirty 
> log of
> + *  smaller memory granules, otherwise huge pages would need to be
> + *  migrated. Practically an idle system has problems migrating with
> + *  huge pages.  Called during WP of entire VM address space, done
> + *  initially when  migration thread isses the KVM_MEM_LOG_DIRTY_PAGES
> + *  initially when the migration thread issues the KVM_MEM_LOG_DIRTY_PAGES
> + *  The mmu_lock is held during splitting.
> + *
> + * @kvm:The KVM pointer
> + * @pmd:Pmd to 2nd stage huge page
> + * @addr: ` Guest Physical Address
Nitpick: typo `

> + */
> +int split_pmd(struct kvm *kvm, pmd_t *pmd, u64 addr)

Maybe worth renaming to something like kvm_split_pmd to avoid future
namespace collisions (either compiler or cscope/ctags)? It should also
probably be static?

> +{
> +   struct page *page;
> +   pfn_t pf

Re: [PATCH v5 0/5] KVM: x86: flush tlb out of mmu-lock after write protection

2014-04-24 Thread Marcelo Tosatti
On Thu, Apr 17, 2014 at 05:06:11PM +0800, Xiao Guangrong wrote:
> Since Marcelo agreed to the comment improvements in an off-line mail, I
> consider this his Ack. :) Please let me know if I misunderstood it.
> 
> This patchset is split from my previous patchset:
> [PATCH v3 00/15] KVM: MMU: locklessly write-protect
> that can be found at:
> https://lkml.org/lkml/2013/10/23/265

Applied, thanks.



Re: [PATCH 0/2] KVM: async_pf: use_mm/mm_users fixes

2014-04-24 Thread Oleg Nesterov
On 04/24, Christian Borntraeger wrote:
>
> On 21/04/14 15:25, Oleg Nesterov wrote:
> > Hello.
> >
> > Completely untested and I know nothing about kvm ;) Please review.
> >
> > But use_mm() really looks misleading, and the usage of mm_users looks
> > "obviously wrong". I already sent this change while we were discussing
> > vmacache, but it was ignored. Since then kvm_async_page_present_sync()
> > was added into async_pf_execute(), but it seems
> > to me that use_mm() is still unnecessary.
> >
> > Oleg.
> >
> >  virt/kvm/async_pf.c |   10 --
> >  1 files changed, 4 insertions(+), 6 deletions(-)
> >
>
> I gave both patches some testing on s390, seems fine. I think patch2 really
> does fix a bug. So if Paolo, Marcelo, Gleb agree (maybe do a test on x86 for
> async_pf) both patches are good to go. Given that somebody tests this on x86:
>
> Acked-by: Christian Borntraeger 

Thanks!

I think x86 should be fine, it doesn't select CONFIG_KVM_ASYNC_PF_SYNC and
get_user_pages() is certainly fine without use_mm(). And I still think it
should do get_user_pages(tsk => NULL) but this is minor.

Oleg.



Re: [PATCH 0/2] KVM: async_pf: use_mm/mm_users fixes

2014-04-24 Thread Christian Borntraeger
On 21/04/14 15:25, Oleg Nesterov wrote:
> Hello.
> 
> Completely untested and I know nothing about kvm ;) Please review.
> 
> But use_mm() really looks misleading, and the usage of mm_users looks
> "obviously wrong". I already sent this change while we were discussing
> vmacache, but it was ignored. Since then kvm_async_page_present_sync()
> was added into async_pf_execute(), but it seems
> to me that use_mm() is still unnecessary.
> 
> Oleg.
> 
>  virt/kvm/async_pf.c |   10 --
>  1 files changed, 4 insertions(+), 6 deletions(-)
> 

I gave both patches some testing on s390, seems fine. I think patch2 really
does fix a bug. So if Paolo, Marcelo, Gleb agree (maybe do a test on x86 for
async_pf) both patches are good to go. Given that somebody tests this on x86:

Acked-by: Christian Borntraeger 



[PATCH 01/13] KVM: PPC: Book3S PR: Implement LPCR ONE_REG

2014-04-24 Thread Alexander Graf
To control whether we should inject interrupts in little or big endian
mode, user space sets the LPCR.ILE bit accordingly via ONE_REG.

Let's implement it, so we are able to trigger interrupts in LE mode.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h | 1 +
 arch/powerpc/kvm/book3s_64_mmu.c  | 8 +++-
 arch/powerpc/kvm/book3s_pr.c  | 6 ++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index bb1e38a..27b1041 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -106,6 +106,7 @@ struct kvmppc_vcpu_book3s {
 #endif
int hpte_cache_count;
spinlock_t mmu_lock;
+   ulong lpcr;
 };
 
 #define CONTEXT_HOST   0
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 83da1f8..4a77725 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -38,7 +38,13 @@
 
 static void kvmppc_mmu_book3s_64_reset_msr(struct kvm_vcpu *vcpu)
 {
-   kvmppc_set_msr(vcpu, MSR_SF);
+   struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+   ulong new_msr = MSR_SF;
+
+   if (vcpu_book3s->lpcr & LPCR_ILE)
+   new_msr |= MSR_LE;
+
+   kvmppc_set_msr(vcpu, new_msr);
 }
 
 static struct kvmppc_slb *kvmppc_mmu_book3s_64_find_slbe(
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index c5c052a..9189ac5 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1110,6 +1110,9 @@ static int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, u64 id,
case KVM_REG_PPC_HIOR:
*val = get_reg_val(id, to_book3s(vcpu)->hior);
break;
+   case KVM_REG_PPC_LPCR:
+   *val = get_reg_val(id, to_book3s(vcpu)->lpcr);
+   break;
default:
r = -EINVAL;
break;
@@ -1128,6 +1131,9 @@ static int kvmppc_set_one_reg_pr(struct kvm_vcpu *vcpu, u64 id,
to_book3s(vcpu)->hior = set_reg_val(id, *val);
to_book3s(vcpu)->hior_explicit = true;
break;
+   case KVM_REG_PPC_LPCR:
+   to_book3s(vcpu)->lpcr = set_reg_val(id, *val) & LPCR_ILE;
+   break;
default:
r = -EINVAL;
break;
-- 
1.8.1.4



[PATCH 13/13] KVM: PPC: Book3S: Move little endian conflict to HV KVM

2014-04-24 Thread Alexander Graf
With the previous patches applied, we can now successfully use PR KVM on
little endian hosts which means we can now allow users to select it.

However, HV KVM still needs some work, so let's keep the kconfig conflict
on that one.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 141b202..d6a53b9 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -6,7 +6,6 @@ source "virt/kvm/Kconfig"
 
 menuconfig VIRTUALIZATION
bool "Virtualization"
-   depends on !CPU_LITTLE_ENDIAN
---help---
  Say Y here to get to see options for using your Linux host to run
  other operating systems inside virtual machines (guests).
@@ -76,6 +75,7 @@ config KVM_BOOK3S_64
 config KVM_BOOK3S_64_HV
tristate "KVM support for POWER7 and PPC970 using hypervisor mode in host"
depends on KVM_BOOK3S_64
+   depends on !CPU_LITTLE_ENDIAN
select KVM_BOOK3S_HV_POSSIBLE
select MMU_NOTIFIER
select CMA
-- 
1.8.1.4



[PATCH 00/13] PPC: KVM: Enable PR KVM on ppc64le

2014-04-24 Thread Alexander Graf
During the enablement of ppc64le, KVM was left unfixed. This patch set is the
initial attempt to make all of KVM work on ppc64le hosts. It starts the effort
by bringing PR KVM over.

With this patch set I am successfully able to run book3s_32 (BE) and
book3s_64 (BE, LE) guests on a host ppc64le system.

Please bear in mind that this patch set does *not* implement POWER8 support,
so if you're running on a POWER8 host you definitely want to pass in -cpu POWER7
and cross your fingers that the guest doesn't trigger a facility unavailable
interrupt which we don't trap on yet.


Alex

Alexander Graf (13):
  KVM: PPC: Book3S PR: Implement LPCR ONE_REG
  KVM: PPC: Book3S: PR: Fix C/R bit setting
  KVM: PPC: Book3S_32: PR: Access HTAB in big endian
  KVM: PPC: Book3S_64 PR: Access HTAB in big endian
  KVM: PPC: Book3S_64 PR: Access shadow slb in big endian
  KVM: PPC: Book3S PR: Give guest control over MSR_LE
  KVM: PPC: Book3S PR: Default to big endian guest
  KVM: PPC: Book3S PR: PAPR: Access HTAB in big endian
  KVM: PPC: Book3S PR: PAPR: Access RTAS in big endian
  KVM: PPC: PR: Fill pvinfo hcall instructions in big endian
  KVM: PPC: Make shared struct aka magic page guest endian
  KVM: PPC: Book3S PR: Do dcbz32 patching with big endian instructions
  KVM: PPC: Book3S: Move little endian conflict to HV KVM

 arch/powerpc/include/asm/kvm_book3s.h|   4 +-
 arch/powerpc/include/asm/kvm_host.h  |   3 +
 arch/powerpc/include/asm/kvm_ppc.h   |  80 ++-
 arch/powerpc/kernel/asm-offsets.c|   2 +
 arch/powerpc/kvm/Kconfig |   2 +-
 arch/powerpc/kvm/book3s.c|  72 ++--
 arch/powerpc/kvm/book3s_32_mmu.c |  41 +++-
 arch/powerpc/kvm/book3s_32_mmu_host.c|   4 +-
 arch/powerpc/kvm/book3s_64_mmu.c |  42 +++-
 arch/powerpc/kvm/book3s_64_mmu_host.c|   4 +-
 arch/powerpc/kvm/book3s_64_slb.S |  33 +-
 arch/powerpc/kvm/book3s_emulate.c|  28 
 arch/powerpc/kvm/book3s_hv.c |  11 
 arch/powerpc/kvm/book3s_interrupts.S |  23 ++-
 arch/powerpc/kvm/book3s_paired_singles.c |  16 +++--
 arch/powerpc/kvm/book3s_pr.c | 109 +++
 arch/powerpc/kvm/book3s_pr_papr.c|  16 +++--
 arch/powerpc/kvm/book3s_rtas.c   |  29 
 arch/powerpc/kvm/emulate.c   |  24 +++
 arch/powerpc/kvm/powerpc.c   |  50 +++---
 arch/powerpc/kvm/trace_pr.h  |   2 +-
 21 files changed, 410 insertions(+), 185 deletions(-)

-- 
1.8.1.4



[PATCH 06/13] KVM: PPC: Book3S PR: Give guest control over MSR_LE

2014-04-24 Thread Alexander Graf
When we calculate the actual MSR that the guest is running with when in guest
context, we take a few MSR bits from the MSR the guest thinks it's using.

Add MSR_LE to these bits, so the guest gets full control over its own endianness
setting.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_pr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 9189ac5..8076543 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -249,7 +249,7 @@ static void kvmppc_recalc_shadow_msr(struct kvm_vcpu *vcpu)
ulong smsr = vcpu->arch.shared->msr;
 
/* Guest MSR values */
-   smsr &= MSR_FE0 | MSR_FE1 | MSR_SF | MSR_SE | MSR_BE;
+   smsr &= MSR_FE0 | MSR_FE1 | MSR_SF | MSR_SE | MSR_BE | MSR_LE;
/* Process MSR values */
smsr |= MSR_ME | MSR_RI | MSR_IR | MSR_DR | MSR_PR | MSR_EE;
/* External providers the guest reserved */
-- 
1.8.1.4



[PATCH 03/13] KVM: PPC: Book3S_32: PR: Access HTAB in big endian

2014-04-24 Thread Alexander Graf
The HTAB is always big endian. We access the guest's HTAB using
copy_from/to_user, but don't yet take care of the fact that we might
be running on an LE host.

Wrap all accesses to the guest HTAB with big endian accessors.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_32_mmu.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 60fc3f4..0e42b16 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -208,6 +208,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
u32 sre;
hva_t ptegp;
u32 pteg[16];
+   u32 pte0, pte1;
u32 ptem = 0;
int i;
int found = 0;
@@ -233,11 +234,13 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
}
 
for (i=0; i<16; i+=2) {
-   if (ptem == pteg[i]) {
+   pte0 = be32_to_cpu(pteg[i]);
+   pte1 = be32_to_cpu(pteg[i + 1]);
+   if (ptem == pte0) {
u8 pp;
 
-   pte->raddr = (pteg[i+1] & ~(0xFFFULL)) | (eaddr & 0xFFF);
-   pp = pteg[i+1] & 3;
+   pte->raddr = (pte1 & ~(0xFFFULL)) | (eaddr & 0xFFF);
+   pp = pte1 & 3;
 
if ((sr_kp(sre) &&  (vcpu->arch.shared->msr & MSR_PR)) ||
(sr_ks(sre) && !(vcpu->arch.shared->msr & MSR_PR)))
@@ -260,7 +263,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
}
 
dprintk_pte("MMU: Found PTE -> %x %x - %x\n",
-   pteg[i], pteg[i+1], pp);
+   pte0, pte1, pp);
found = 1;
break;
}
@@ -269,7 +272,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
/* Update PTE C and A bits, so the guest's swapper knows we used the
   page */
if (found) {
-   u32 pte_r = pteg[i+1];
+   u32 pte_r = pte1;
char __user *addr = (char __user *) (ptegp + (i+1) * sizeof(u32));
 
/*
@@ -296,7 +299,8 @@ no_page_found:
to_book3s(vcpu)->sdr1, ptegp);
for (i=0; i<16; i+=2) {
dprintk_pte("   %02d: 0x%x - 0x%x (0x%x)\n",
-   i, pteg[i], pteg[i+1], ptem);
+   i, be32_to_cpu(pteg[i]),
+   be32_to_cpu(pteg[i+1]), ptem);
}
}
 
-- 
1.8.1.4



[PATCH 10/13] KVM: PPC: PR: Fill pvinfo hcall instructions in big endian

2014-04-24 Thread Alexander Graf
We expose a blob of hypercall instructions to user space that it gives to
the guest via device tree again. That blob should contain a stream of
instructions necessary to do a hypercall in big endian, as it just gets
passed into the guest and old guests use them straight away.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/powerpc.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 3cf541a..a9bd0ff 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1015,10 +1015,10 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
u32 inst_nop = 0x60000000;
 #ifdef CONFIG_KVM_BOOKE_HV
u32 inst_sc1 = 0x44000022;
-   pvinfo->hcall[0] = inst_sc1;
-   pvinfo->hcall[1] = inst_nop;
-   pvinfo->hcall[2] = inst_nop;
-   pvinfo->hcall[3] = inst_nop;
+   pvinfo->hcall[0] = cpu_to_be32(inst_sc1);
+   pvinfo->hcall[1] = cpu_to_be32(inst_nop);
+   pvinfo->hcall[2] = cpu_to_be32(inst_nop);
+   pvinfo->hcall[3] = cpu_to_be32(inst_nop);
 #else
u32 inst_lis = 0x3c000000;
u32 inst_ori = 0x60000000;
@@ -1034,10 +1034,10 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
 *sc
 *nop
 */
-   pvinfo->hcall[0] = inst_lis | ((KVM_SC_MAGIC_R0 >> 16) & inst_imm_mask);
-   pvinfo->hcall[1] = inst_ori | (KVM_SC_MAGIC_R0 & inst_imm_mask);
-   pvinfo->hcall[2] = inst_sc;
-   pvinfo->hcall[3] = inst_nop;
+   pvinfo->hcall[0] = cpu_to_be32(inst_lis | ((KVM_SC_MAGIC_R0 >> 16) & inst_imm_mask));
+   pvinfo->hcall[1] = cpu_to_be32(inst_ori | (KVM_SC_MAGIC_R0 & inst_imm_mask));
+   pvinfo->hcall[2] = cpu_to_be32(inst_sc);
+   pvinfo->hcall[3] = cpu_to_be32(inst_nop);
 #endif
 
pvinfo->flags = KVM_PPC_PVINFO_FLAGS_EV_IDLE;
-- 
1.8.1.4



[PATCH 04/13] KVM: PPC: Book3S_64 PR: Access HTAB in big endian

2014-04-24 Thread Alexander Graf
The HTAB is always big endian. We access the guest's HTAB using
copy_from/to_user, but don't yet take care of the fact that we might
be running on an LE host.

Wrap all accesses to the guest HTAB with big endian accessors.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_mmu.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index e9854e7..158fb22 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -281,12 +281,15 @@ do_second:
key = 4;
 
for (i=0; i<16; i+=2) {
+   u64 pte0 = be64_to_cpu(pteg[i]);
+   u64 pte1 = be64_to_cpu(pteg[i + 1]);
+
/* Check all relevant fields of 1st dword */
-   if ((pteg[i] & v_mask) == v_val) {
+   if ((pte0 & v_mask) == v_val) {
/* If large page bit is set, check pgsize encoding */
if (slbe->large &&
(vcpu->arch.hflags & BOOK3S_HFLAG_MULTI_PGSIZE)) {
-   pgsize = decode_pagesize(slbe, pteg[i+1]);
+   pgsize = decode_pagesize(slbe, pte1);
if (pgsize < 0)
continue;
}
@@ -303,8 +306,8 @@ do_second:
goto do_second;
}
 
-   v = pteg[i];
-   r = pteg[i+1];
+   v = be64_to_cpu(pteg[i]);
+   r = be64_to_cpu(pteg[i+1]);
pp = (r & HPTE_R_PP) | key;
if (r & HPTE_R_PP0)
pp |= 8;
-- 
1.8.1.4



[PATCH 05/13] KVM: PPC: Book3S_64 PR: Access shadow slb in big endian

2014-04-24 Thread Alexander Graf
The "shadow SLB" in the PACA is shared with the hypervisor, so it has to
be big endian. We access the shadow SLB during world switch, so let's make
sure we access it in big endian even when we're on a little endian host.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_64_slb.S | 33 -
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
index 4f12e8f..596140e 100644
--- a/arch/powerpc/kvm/book3s_64_slb.S
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -17,29 +17,28 @@
  * Authors: Alexander Graf 
  */
 
-#ifdef __LITTLE_ENDIAN__
-#error Need to fix SLB shadow accesses in little endian mode
-#endif
-
 #define SHADOW_SLB_ESID(num)   (SLBSHADOW_SAVEAREA + (num * 0x10))
 #define SHADOW_SLB_VSID(num)   (SLBSHADOW_SAVEAREA + (num * 0x10) + 0x8)
 #define UNBOLT_SLB_ENTRY(num) \
-   ld  r9, SHADOW_SLB_ESID(num)(r12); \
-   /* Invalid? Skip. */; \
-   rldicl. r0, r9, 37, 63; \
-   beq slb_entry_skip_ ## num; \
-   xoris   r9, r9, SLB_ESID_V@h; \
-   std r9, SHADOW_SLB_ESID(num)(r12); \
+   li  r11, SHADOW_SLB_ESID(num);  \
+   LDX_BE  r9, r12, r11;   \
+   /* Invalid? Skip. */;   \
+   rldicl. r0, r9, 37, 63; \
+   beq slb_entry_skip_ ## num; \
+   xoris   r9, r9, SLB_ESID_V@h;   \
+   STDX_BE r9, r12, r11;   \
   slb_entry_skip_ ## num:
 
 #define REBOLT_SLB_ENTRY(num) \
-   ld  r10, SHADOW_SLB_ESID(num)(r11); \
-   cmpdi   r10, 0; \
-   beq slb_exit_skip_ ## num; \
-   orisr10, r10, SLB_ESID_V@h; \
-   ld  r9, SHADOW_SLB_VSID(num)(r11); \
-   slbmte  r9, r10; \
-   std r10, SHADOW_SLB_ESID(num)(r11); \
+   li  r8, SHADOW_SLB_ESID(num);   \
+   li  r7, SHADOW_SLB_VSID(num);   \
+   LDX_BE  r10, r11, r8;   \
+   cmpdi   r10, 0; \
+   beq slb_exit_skip_ ## num;  \
+   orisr10, r10, SLB_ESID_V@h; \
+   LDX_BE  r9, r11, r7;\
+   slbmte  r9, r10;\
+   STDX_BE r10, r11, r8;   \
 slb_exit_skip_ ## num:
 
 /**
-- 
1.8.1.4



[PATCH 09/13] KVM: PPC: Book3S PR: PAPR: Access RTAS in big endian

2014-04-24 Thread Alexander Graf
When the guest does an RTAS hypercall it keeps all RTAS variables inside a
big endian data structure.

To make sure we don't have to bother about endianness inside the actual RTAS
handlers, let's just convert the whole structure to host endian before we
call our RTAS handlers and back to big endian when we return to the guest.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_rtas.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c
index 7a05315..edb14ba 100644
--- a/arch/powerpc/kvm/book3s_rtas.c
+++ b/arch/powerpc/kvm/book3s_rtas.c
@@ -205,6 +205,32 @@ int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp)
return rc;
 }
 
+static void kvmppc_rtas_swap_endian_in(struct rtas_args *args)
+{
+#ifdef __LITTLE_ENDIAN__
+   int i;
+
+   args->token = be32_to_cpu(args->token);
+   args->nargs = be32_to_cpu(args->nargs);
+   args->nret = be32_to_cpu(args->nret);
+   for (i = 0; i < args->nargs; i++)
+   args->args[i] = be32_to_cpu(args->args[i]);
+#endif
+}
+
+static void kvmppc_rtas_swap_endian_out(struct rtas_args *args)
+{
+#ifdef __LITTLE_ENDIAN__
+   int i;
+
+   for (i = 0; i < args->nret; i++)
+   args->args[i] = cpu_to_be32(args->args[i]);
+   args->token = cpu_to_be32(args->token);
+   args->nargs = cpu_to_be32(args->nargs);
+   args->nret = cpu_to_be32(args->nret);
+#endif
+}
+
 int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
 {
struct rtas_token_definition *d;
@@ -223,6 +249,8 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
if (rc)
goto fail;
 
+   kvmppc_rtas_swap_endian_in(&args);
+
/*
 * args->rets is a pointer into args->args. Now that we've
 * copied args we need to fix it up to point into our copy,
@@ -247,6 +275,7 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
 
if (rc == 0) {
args.rets = orig_rets;
+   kvmppc_rtas_swap_endian_out(&args);
rc = kvm_write_guest(vcpu->kvm, args_phys, &args, sizeof(args));
if (rc)
goto fail;
-- 
1.8.1.4



[PATCH 02/13] KVM: PPC: Book3S: PR: Fix C/R bit setting

2014-04-24 Thread Alexander Graf
Commit 9308ab8e2d made C/R HTAB updates go byte-wise into the target HTAB.
However, it didn't update the guest's copy of the HTAB, but instead the
host local copy of it.

Write to the guest's HTAB instead.

Signed-off-by: Alexander Graf 
CC: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_32_mmu.c | 2 +-
 arch/powerpc/kvm/book3s_64_mmu.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 76a64ce..60fc3f4 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -270,7 +270,7 @@ static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
   page */
if (found) {
u32 pte_r = pteg[i+1];
-   char __user *addr = (char __user *) &pteg[i+1];
+   char __user *addr = (char __user *) (ptegp + (i+1) * sizeof(u32));
 
/*
 * Use single-byte writes to update the HPTE, to
diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 4a77725..e9854e7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -348,14 +348,14 @@ do_second:
 * non-PAPR platforms such as mac99, and this is
 * what real hardware does.
 */
-   char __user *addr = (char __user *) &pteg[i+1];
+   char __user *addr = (char __user *) (ptegp + (i + 1) * sizeof(u64));
r |= HPTE_R_R;
put_user(r >> 8, addr + 6);
}
if (iswrite && gpte->may_write && !(r & HPTE_R_C)) {
/* Set the dirty flag */
/* Use a single byte write */
-   char __user *addr = (char __user *) &pteg[i+1];
+   char __user *addr = (char __user *) (ptegp + (i + 1) * sizeof(u64));
r |= HPTE_R_C;
put_user(r, addr + 7);
}
-- 
1.8.1.4



[PATCH 08/13] KVM: PPC: Book3S PR: PAPR: Access HTAB in big endian

2014-04-24 Thread Alexander Graf
The HTAB on PPC is always in big endian. When we access it via hypercalls
on behalf of the guest and we're running on a little endian host, we need
to make sure we swap the bits accordingly.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_pr_papr.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index 5efa97b..255e5b1 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -57,7 +57,7 @@ static int kvmppc_h_pr_enter(struct kvm_vcpu *vcpu)
for (i = 0; ; ++i) {
if (i == 8)
goto done;
-   if ((*hpte & HPTE_V_VALID) == 0)
+   if ((be64_to_cpu(*hpte) & HPTE_V_VALID) == 0)
break;
hpte += 2;
}
@@ -67,8 +67,8 @@ static int kvmppc_h_pr_enter(struct kvm_vcpu *vcpu)
goto done;
}
 
-   hpte[0] = kvmppc_get_gpr(vcpu, 6);
-   hpte[1] = kvmppc_get_gpr(vcpu, 7);
+   hpte[0] = cpu_to_be64(kvmppc_get_gpr(vcpu, 6));
+   hpte[1] = cpu_to_be64(kvmppc_get_gpr(vcpu, 7));
pteg_addr += i * HPTE_SIZE;
copy_to_user((void __user *)pteg_addr, hpte, HPTE_SIZE);
kvmppc_set_gpr(vcpu, 4, pte_index | i);
@@ -93,6 +93,8 @@ static int kvmppc_h_pr_remove(struct kvm_vcpu *vcpu)
pteg = get_pteg_addr(vcpu, pte_index);
mutex_lock(&vcpu->kvm->arch.hpt_mutex);
copy_from_user(pte, (void __user *)pteg, sizeof(pte));
+   pte[0] = be64_to_cpu(pte[0]);
+   pte[1] = be64_to_cpu(pte[1]);
 
ret = H_NOT_FOUND;
if ((pte[0] & HPTE_V_VALID) == 0 ||
@@ -169,6 +171,8 @@ static int kvmppc_h_pr_bulk_remove(struct kvm_vcpu *vcpu)
 
pteg = get_pteg_addr(vcpu, tsh & H_BULK_REMOVE_PTEX);
copy_from_user(pte, (void __user *)pteg, sizeof(pte));
+   pte[0] = be64_to_cpu(pte[0]);
+   pte[1] = be64_to_cpu(pte[1]);
 
/* tsl = AVPN */
flags = (tsh & H_BULK_REMOVE_FLAGS) >> 26;
@@ -207,6 +211,8 @@ static int kvmppc_h_pr_protect(struct kvm_vcpu *vcpu)
pteg = get_pteg_addr(vcpu, pte_index);
mutex_lock(&vcpu->kvm->arch.hpt_mutex);
copy_from_user(pte, (void __user *)pteg, sizeof(pte));
+   pte[0] = be64_to_cpu(pte[0]);
+   pte[1] = be64_to_cpu(pte[1]);
 
ret = H_NOT_FOUND;
if ((pte[0] & HPTE_V_VALID) == 0 ||
@@ -225,6 +231,8 @@ static int kvmppc_h_pr_protect(struct kvm_vcpu *vcpu)
 
rb = compute_tlbie_rb(v, r, pte_index);
vcpu->arch.mmu.tlbie(vcpu, rb, rb & 1 ? true : false);
+   pte[0] = cpu_to_be64(pte[0]);
+   pte[1] = cpu_to_be64(pte[1]);
copy_to_user((void __user *)pteg, pte, sizeof(pte));
ret = H_SUCCESS;
 
-- 
1.8.1.4



[PATCH 11/13] KVM: PPC: Make shared struct aka magic page guest endian

2014-04-24 Thread Alexander Graf
The shared (magic) page is a data structure that contains often used
supervisor privileged SPRs accessible via memory to the user to reduce
the number of exits we have to take to read/write them.

When we actually share this structure with the guest we have to maintain
it in guest endianness, because some of the patch tricks only work with
native endian load/store operations.

Since we only share the structure with either host or guest in little
endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv.

For booke, the shared struct stays big endian. For book3s_64 hv we maintain
the struct in host native endian, since it never gets shared with the guest.

For book3s_64 pr we introduce a variable that tells us which endianness the
shared struct is in and route every access to it through helper inline
functions that evaluate this variable.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_book3s.h|  3 +-
 arch/powerpc/include/asm/kvm_host.h  |  3 +
 arch/powerpc/include/asm/kvm_ppc.h   | 80 ++-
 arch/powerpc/kernel/asm-offsets.c|  2 +
 arch/powerpc/kvm/book3s.c| 72 
 arch/powerpc/kvm/book3s_32_mmu.c | 21 +++
 arch/powerpc/kvm/book3s_32_mmu_host.c|  4 +-
 arch/powerpc/kvm/book3s_64_mmu.c | 19 ---
 arch/powerpc/kvm/book3s_64_mmu_host.c|  4 +-
 arch/powerpc/kvm/book3s_emulate.c| 28 +-
 arch/powerpc/kvm/book3s_hv.c | 11 
 arch/powerpc/kvm/book3s_interrupts.S | 23 +++-
 arch/powerpc/kvm/book3s_paired_singles.c | 16 +++---
 arch/powerpc/kvm/book3s_pr.c | 95 +++-
 arch/powerpc/kvm/book3s_pr_papr.c|  2 +-
 arch/powerpc/kvm/emulate.c   | 24 
 arch/powerpc/kvm/powerpc.c   | 34 +++-
 arch/powerpc/kvm/trace_pr.h  |  2 +-
 18 files changed, 306 insertions(+), 137 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 27b1041..ca3b8f1 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -269,9 +269,10 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
return vcpu->arch.pc;
 }
 
+static u64 kvmppc_get_msr(struct kvm_vcpu *vcpu);
 static inline bool kvmppc_need_byteswap(struct kvm_vcpu *vcpu)
 {
-   return (vcpu->arch.shared->msr & MSR_LE) != (MSR_KERNEL & MSR_LE);
+   return (kvmppc_get_msr(vcpu) & MSR_LE) != (MSR_KERNEL & MSR_LE);
 }
 
 static inline u32 kvmppc_get_last_inst_internal(struct kvm_vcpu *vcpu, ulong 
pc)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1eaea2d..3fffb2e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -622,6 +622,9 @@ struct kvm_vcpu_arch {
wait_queue_head_t cpu_run;
 
struct kvm_vcpu_arch_shared *shared;
+#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
+   bool shared_big_endian;
+#endif
unsigned long magic_page_pa; /* phys addr to map the magic page to */
unsigned long magic_page_ea; /* effect. addr to map the magic page to */
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 4096f16..4a7cc45 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -449,6 +449,84 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pfn)
 }
 
 /*
+ * Shared struct helpers. The shared struct can be little or big endian,
+ * depending on the guest endianness. So expose helpers to all of them.
+ */
+static inline bool kvmppc_shared_big_endian(struct kvm_vcpu *vcpu)
+{
+#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
+   /* Only Book3S_64 PR supports bi-endian for now */
+   return vcpu->arch.shared_big_endian;
+#elif defined(CONFIG_PPC_BOOK3S_64) && defined(__LITTLE_ENDIAN__)
+   /* Book3s_64 HV on little endian is always little endian */
+   return false;
+#else
+   return true;
+#endif
+}
+
+#define SHARED_WRAPPER_GET(reg, size)  \
+static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)  \
+{  \
+   if (kvmppc_shared_big_endian(vcpu)) \
+  return be##size##_to_cpu(vcpu->arch.shared->reg);\
+   else\
+  return le##size##_to_cpu(vcpu->arch.shared->reg);\
+}  \
+
+#define SHARED_WRAPPER_SET(reg, size)  \
+static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)
\
+{  \
+   if (kvmppc_shared_big_endi

[PATCH 12/13] KVM: PPC: Book3S PR: Do dcbz32 patching with big endian instructions

2014-04-24 Thread Alexander Graf
When the host CPU we're running on doesn't support dcbz32 itself, but the
guest wants to have dcbz only clear 32 bytes of data, we loop through every
executable mapped page to search for dcbz instructions and patch them with
a special privileged instruction that we emulate as dcbz32.

The only guests that want to see dcbz act as 32byte are book3s_32 guests, so
we don't have to worry about little endian instruction ordering. So let's
just always search for big endian dcbz instructions, also when we're on a
little endian host.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_32_mmu.c | 2 +-
 arch/powerpc/kvm/book3s_pr.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 628d90e..93503bb 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -131,7 +131,7 @@ static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvm_vcpu 
*vcpu,
pteg = (vcpu_book3s->sdr1 & 0xffff0000) | hash;
 
dprintk("MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n",
-   kvmppc_get_pc(&vcpu_book3s->vcpu), eaddr, vcpu_book3s->sdr1, 
pteg,
+   kvmppc_get_pc(vcpu), eaddr, vcpu_book3s->sdr1, pteg,
sr_vsid(sre));
 
r = gfn_to_hva(vcpu->kvm, pteg >> PAGE_SHIFT);
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index a3d705e..96dbb5f 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -428,8 +428,8 @@ static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
 
/* patch dcbz into reserved instruction, so we trap */
for (i=hpage_offset; i < hpage_offset + (HW_PAGE_SIZE / 4); i++)
-   if ((page[i] & 0xff0007ff) == INS_DCBZ)
-   page[i] &= 0xfffffff7;
+   if ((be32_to_cpu(page[i]) & 0xff0007ff) == INS_DCBZ)
+   page[i] &= cpu_to_be32(0xfffffff7);
 
kunmap_atomic(page);
put_page(hpage);
-- 
1.8.1.4



[PATCH 07/13] KVM: PPC: Book3S PR: Default to big endian guest

2014-04-24 Thread Alexander Graf
The default MSR when user space does not define anything should be identical
on little and big endian hosts, so remove MSR_LE from it.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_pr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 8076543..1644d17 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1193,7 +1193,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_pr(struct kvm *kvm,
kvmppc_set_pvr_pr(vcpu, vcpu->arch.pvr);
vcpu->arch.slb_nr = 64;
 
-   vcpu->arch.shadow_msr = MSR_USER64;
+   vcpu->arch.shadow_msr = MSR_USER64 & ~MSR_LE;
 
err = kvmppc_mmu_init(vcpu);
if (err < 0)
-- 
1.8.1.4



Re: commit 0bf1457f0cfca7b " mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-24 Thread Johannes Weiner
Hi Rik,

On Tue, Apr 22, 2014 at 10:40:17AM -0400, Rik van Riel wrote:
> On 04/22/2014 07:57 AM, Christian Borntraeger wrote:
> > On 22/04/14 12:55, Christian Borntraeger wrote:
> >> While preparing/testing some KVM on s390 patches for the next merge window 
> >> (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
> >> performance hickup on guest paging (all anonymous memory).
> >>
> >> All memory bound guests are in "D" state now and the system is barely 
> >> unusable.
> >>
> >> Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
> >> "mm: vmscan: do not swap anon pages just because free+file is low" makes 
> >> the problem go away.
> >>
> >> According to /proc/vmstat the system is now in direct reclaim almost all 
> >> the time for every page fault (more than 10x more direct reclaims than 
> >> kswap reclaims)
> >> With the patch being reverted everything is fine again.
> >>
> >> Any ideas?
> > 
> > Here is an idea to tackle my problem and the original problem:
> > 
> > reverting  0bf1457f0cfca7bc026a82323ad34bcf58ad035d + checking against low, 
> > also seems to make my system usable.
> > 
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1923,7 +1923,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> >  */
> > if (global_reclaim(sc)) {
> > free = zone_page_state(zone, NR_FREE_PAGES);
> > -   if (unlikely(file + free <= high_wmark_pages(zone))) {
> > +   if (unlikely(file + free <= low_wmark_pages(zone))) {
> > scan_balance = SCAN_ANON;
> > goto out;
> > }
> > 
> 
> Looks reasonable to me.  Johannes?

I went with a full revert to be on the safe side.  Since kswapd's goal
is the high watermark, I kind of liked the idea that we start swapping
once the file pages alone are not enough anymore to restore the wmark.


RE: Change dev->needed_headroom of virtio_net to make patch "virtio-net: put virtio net header inline with data" work better

2014-04-24 Thread Zhangjie (HZ)
Thank you very much!
I will post a patch tomorrow.

--Best wishes!
//Zhang Jie
-Original Message-
From: Michael S. Tsirkin [mailto:m...@redhat.com] 
Sent: Thursday, April 24, 2014 7:45 PM
To: Zhangjie (HZ)
Cc: jasow...@redhat.com; Qinchuanyu; Liuyongan; kvm@vger.kernel.org; 
net...@vger.kernel.org
Subject: Re: Change dev->needed_headroom of virtio_net to make patch 
"virtio-net: put virtio net header inline with data" work better

On Thu, Apr 24, 2014 at 10:19:58AM +, Zhangjie (HZ) wrote:
> Hi!
> 
> The patch “virtio-net: put virtio net header inline with data” has a 
> notable improvement for TCP packets.
> 
> But UDP packets from the virtio_net NIC do not have enough headroom. I 
> wonder if we can set dev->needed_headroom to the size of the virtio net 
> header, so as to fit the header in. By doing this, UDP gets about a 5% 
> improvement in bandwidth.

Sounds like a reasonable thing to do.
Want to post the patch so people can try it out?

>  
> 
> --
> 
> Thanks,
> 
> //Zhang Jie
> 
>  
> 

Re: Change dev->needed_headroom of virtio_net to make patch "virtio-net: put virtio net header inline with data" work better

2014-04-24 Thread Michael S. Tsirkin
On Thu, Apr 24, 2014 at 10:19:58AM +, Zhangjie (HZ) wrote:
> Hi!
> 
> The patch “virtio-net: put virtio net header inline with data” has a notable
> improvement for TCP packets.
> 
> But UDP packets from the virtio_net NIC do not have enough headroom. I wonder
> if we can set dev->needed_headroom to the size of the virtio net header,
> so as to fit the header in. By doing this, UDP gets about a 5% improvement
> in bandwidth.

Sounds like a reasonable thing to do.
Want to post the patch so people can try it out?

>  
> 
> --
> 
> Thanks,
> 
> //Zhang Jie
> 
>  
> 


[PATCH 3.11 078/182] MIPS: KVM: Pass reserved instruction exceptions to guest

2014-04-24 Thread Luis Henriques
3.11.10.9 -stable review patch.  If anyone has any objections, please let me 
know.

--

From: James Hogan 

commit 15505679362270d02c449626385cb74af8905514 upstream.

Previously a reserved instruction exception while in guest code would
cause a KVM internal error if kvm_mips_handle_ri() didn't recognise the
instruction (including a RDHWR from an unrecognised hardware register).

However the guest OS should really have the opportunity to catch the
exception so that it can take the appropriate actions such as sending a
SIGILL to the guest user process or emulating the instruction itself.

Therefore in these cases emulate a guest RI exception and only return
EMULATE_FAIL if that fails, being careful to revert the PC first in case
the exception occurred in a branch delay slot in which case the PC will
already point to the branch target.

Also turn the printk messages relating to these cases into kvm_debug
messages so that they aren't usually visible.

This allows crashme to run in the guest without killing the entire VM.

Signed-off-by: James Hogan 
Cc: Ralf Baechle 
Cc: Gleb Natapov 
Cc: Paolo Bonzini 
Cc: Sanjay Lal 
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini 
Signed-off-by: Luis Henriques 
---
 arch/mips/kvm/kvm_mips_emul.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c
index 4b6274b..e75ef82 100644
--- a/arch/mips/kvm/kvm_mips_emul.c
+++ b/arch/mips/kvm/kvm_mips_emul.c
@@ -1571,17 +1571,17 @@ kvm_mips_handle_ri(unsigned long cause, uint32_t *opc,
arch->gprs[rt] = kvm_read_c0_guest_userlocal(cop0);
 #else
/* UserLocal not implemented */
-   er = kvm_mips_emulate_ri_exc(cause, opc, run, vcpu);
+   er = EMULATE_FAIL;
 #endif
break;
 
default:
-   printk("RDHWR not supported\n");
+   kvm_debug("RDHWR %#x not supported @ %p\n", rd, opc);
er = EMULATE_FAIL;
break;
}
} else {
-   printk("Emulate RI not supported @ %p: %#x\n", opc, inst);
+   kvm_debug("Emulate RI not supported @ %p: %#x\n", opc, inst);
er = EMULATE_FAIL;
}
 
@@ -1590,6 +1590,7 @@ kvm_mips_handle_ri(unsigned long cause, uint32_t *opc,
 */
if (er == EMULATE_FAIL) {
vcpu->arch.pc = curr_pc;
+   er = kvm_mips_emulate_ri_exc(cause, opc, run, vcpu);
}
return er;
 }
-- 
1.9.1
