Re: [PATCH 2/2] kvm tools: Allow easily sandboxing applications within a guest

2011-12-01 Thread Pekka Enberg
On Fri, Dec 2, 2011 at 9:44 AM, Sasha Levin  wrote:
>> Would it not be better to introduce a new command that works like 'perf
>> stat', for example:
>>
>>    ./kvm sandbox -k  -- trinity --mode=random --quiet -i
>>
>> ?
>
> So basically proxy the first set of parameters to 'kvm run' and run the
> second one as the script? Thats possible as well.

Yes.

> I did the '--sandbox' parameters so that we could pass a script that
> could do more complex testing in the guest, but it's also possible with
> your suggestion so we could do it that way as well.

We probably should do both, actually. 'kvm sandbox' can be a wrapper
on top of 'kvm run'.

Pekka
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm tools: Allow easily sandboxing applications within a guest

2011-12-01 Thread Sasha Levin
On Fri, 2011-12-02 at 09:39 +0200, Pekka Enberg wrote:
> On Fri, 2011-12-02 at 09:26 +0200, Pekka Enberg wrote:
> >> On Fri, Dec 2, 2011 at 9:16 AM, Sasha Levin  
> >> wrote:
> >>> This patch adds a '--sandbox' argument when used in conjuction with a 
> >>> custom
> >>> rootfs, it allows running a script or an executable in the guest 
> >>> environment
> >>> by using executables and other files from the host.
> >>>
> >>> This is useful when testing code that might cause problems on the host, or
> >>> to automate kernel testing since it's now easy to link a kvm tools test
> >>> script with 'git bisect run'.
> >>>
> >>> Suggested-by: Ingo Molnar 
> >>> Signed-off-by: Sasha Levin 
> >>
> >> Nice! How do I use this to run trinity sandboxed in a guest?
> 
> On Fri, 2 Dec 2011, Sasha Levin wrote:
> > Assuming you have trinity installed in /usr/bin or something similar in
> > on the host (you can just 'cp trinity /usr/bin/'), just write this
> > script:
> >
> > test-trinity.sh:
> > #! /bin/bash
> > trinity --mode=random --quiet -i
> >
> > and run using:
> > ./kvm run -k [kernel to test] --sandbox test-trinity.sh
> 
> Would it not be better to introduce a new command that works like 'perf 
> stat', for example:
> 
>./kvm sandbox -k  -- trinity --mode=random --quiet -i
> 
> ?

So basically proxy the first set of parameters to 'kvm run' and run the
second one as the script? Thats possible as well.

I did the '--sandbox' parameters so that we could pass a script that
could do more complex testing in the guest, but it's also possible with
your suggestion so we could do it that way as well.

-- 

Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm tools: Allow easily sandboxing applications within a guest

2011-12-01 Thread Pekka Enberg

On Fri, 2011-12-02 at 09:26 +0200, Pekka Enberg wrote:

On Fri, Dec 2, 2011 at 9:16 AM, Sasha Levin  wrote:

This patch adds a '--sandbox' argument when used in conjuction with a custom
rootfs, it allows running a script or an executable in the guest environment
by using executables and other files from the host.

This is useful when testing code that might cause problems on the host, or
to automate kernel testing since it's now easy to link a kvm tools test
script with 'git bisect run'.

Suggested-by: Ingo Molnar 
Signed-off-by: Sasha Levin 


Nice! How do I use this to run trinity sandboxed in a guest?


On Fri, 2 Dec 2011, Sasha Levin wrote:

Assuming you have trinity installed in /usr/bin or something similar in
on the host (you can just 'cp trinity /usr/bin/'), just write this
script:

test-trinity.sh:
#! /bin/bash
trinity --mode=random --quiet -i

and run using:
./kvm run -k [kernel to test] --sandbox test-trinity.sh


Would it not be better to introduce a new command that works like 'perf 
stat', for example:


  ./kvm sandbox -k  -- trinity --mode=random --quiet -i

?

Pekka

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm tools: Allow easily sandboxing applications within a guest

2011-12-01 Thread Sasha Levin
On Fri, 2011-12-02 at 09:26 +0200, Pekka Enberg wrote:
> On Fri, Dec 2, 2011 at 9:16 AM, Sasha Levin  wrote:
> > This patch adds a '--sandbox' argument when used in conjuction with a custom
> > rootfs, it allows running a script or an executable in the guest environment
> > by using executables and other files from the host.
> >
> > This is useful when testing code that might cause problems on the host, or
> > to automate kernel testing since it's now easy to link a kvm tools test
> > script with 'git bisect run'.
> >
> > Suggested-by: Ingo Molnar 
> > Signed-off-by: Sasha Levin 
> 
> Nice! How do I use this to run trinity sandboxed in a guest?

Assuming you have trinity installed in /usr/bin or something similar in
on the host (you can just 'cp trinity /usr/bin/'), just write this
script:

test-trinity.sh:
#! /bin/bash
trinity --mode=random --quiet -i

and run using:
./kvm run -k [kernel to test] --sandbox test-trinity.sh

-- 

Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm tools: Allow easily sandboxing applications within a guest

2011-12-01 Thread Pekka Enberg
On Fri, Dec 2, 2011 at 9:16 AM, Sasha Levin  wrote:
> This patch adds a '--sandbox' argument when used in conjuction with a custom
> rootfs, it allows running a script or an executable in the guest environment
> by using executables and other files from the host.
>
> This is useful when testing code that might cause problems on the host, or
> to automate kernel testing since it's now easy to link a kvm tools test
> script with 'git bisect run'.
>
> Suggested-by: Ingo Molnar 
> Signed-off-by: Sasha Levin 

Nice! How do I use this to run trinity sandboxed in a guest?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] kvm tools: Allow easily sandboxing applications within a guest

2011-12-01 Thread Sasha Levin
This patch adds a '--sandbox' argument when used in conjuction with a custom
rootfs, it allows running a script or an executable in the guest environment
by using executables and other files from the host.

This is useful when testing code that might cause problems on the host, or
to automate kernel testing since it's now easy to link a kvm tools test
script with 'git bisect run'.

Suggested-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---
 tools/kvm/builtin-run.c |   32 
 tools/kvm/guest/init.c  |   13 -
 2 files changed, 44 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 33de4f6..f5341ae 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -82,6 +82,7 @@ static const char *guest_mac;
 static const char *host_mac;
 static const char *script;
 static const char *guest_name;
+static const char *sandbox;
 static struct virtio_net_params *net_params;
 static bool single_step;
 static bool readonly_image[MAX_DISK_IMAGES];
@@ -420,6 +421,8 @@ static const struct option options[] = {
OPT_CALLBACK('\0', "tty", NULL, "tty id",
 "Remap guest TTY into a pty on the host",
 tty_parser),
+   OPT_STRING('\0', "sandbox", &sandbox, "script",
+   "Run this script when booting into custom rootfs"),
 
OPT_GROUP("Kernel options:"),
OPT_STRING('k', "kernel", &kernel_filename, "kernel",
@@ -702,6 +705,32 @@ void kvm_run_help(void)
usage_with_options(run_usage, options);
 }
 
+static int kvm_run_set_sandbox(void)
+{
+   const char *guestfs_name = "default";
+   char path[PATH_MAX], script[PATH_MAX], *tmp;
+
+   if (image_filename[0])
+   guestfs_name = image_filename[0];
+
+   snprintf(path, PATH_MAX, "%s%s/virt/sandbox.sh", kvm__get_dir(), 
guestfs_name);
+
+   remove(path);
+
+   if (sandbox == NULL)
+   return 0;
+
+   tmp = realpath(sandbox, NULL);
+   if (tmp == NULL)
+   return -ENOMEM;
+
+   snprintf(script, PATH_MAX, "/host/%s", tmp);
+   free(tmp);
+
+   return symlink(script, path);
+}
+
+
 int kvm_cmd_run(int argc, const char **argv, const char *prefix)
 {
static char real_cmdline[2048], default_name[20];
@@ -861,7 +890,10 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
if (using_rootfs) {
strcat(real_cmdline, " root=/dev/root rw 
rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p");
if (custom_rootfs) {
+   kvm_run_set_sandbox();
+
strcat(real_cmdline, " init=/virt/init");
+
if (!no_dhcp)
strcat(real_cmdline, "  ip=dhcp");
}
diff --git a/tools/kvm/guest/init.c b/tools/kvm/guest/init.c
index 8975023..b71491c 100644
--- a/tools/kvm/guest/init.c
+++ b/tools/kvm/guest/init.c
@@ -16,6 +16,14 @@ static int run_process(char *filename)
return execve(filename, new_argv, new_env);
 }
 
+static int run_process_sandbox(char *filename)
+{
+   char *new_argv[] = { filename, "/virt/sandbox.sh", NULL };
+   char *new_env[] = { "TERM=linux", NULL };
+
+   return execve(filename, new_argv, new_env);
+}
+
 static void do_mounts(void)
 {
mount("hostfs", "/host", "9p", MS_RDONLY, 
"trans=virtio,version=9p2000.L");
@@ -38,7 +46,10 @@ int main(int argc, char *argv[])
 
puts("Starting '/bin/sh'...");
 
-   run_process("/bin/sh");
+   if (access("/virt/sandbox.sh", R_OK) == 0)
+   run_process_sandbox("/bin/sh");
+   else
+   run_process("/bin/sh");
 
printf("Init failed: %s\n", strerror(errno));
 
-- 
1.7.8.rc4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] kvm tools: Remove double 'init=' kernel param

2011-12-01 Thread Sasha Levin
Signed-off-by: Sasha Levin 
---
 tools/kvm/builtin-run.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index 43cf2c4..33de4f6 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -856,9 +856,6 @@ int kvm_cmd_run(int argc, const char **argv, const char 
*prefix)
if (virtio_9p__register(kvm, "/", "hostfs") < 0)
die("Unable to initialize virtio 9p");
using_rootfs = custom_rootfs = 1;
-
-   if (!strstr(real_cmdline, "init="))
-   strlcat(real_cmdline, " init=/bin/sh ", 
sizeof(real_cmdline));
}
 
if (using_rootfs) {
-- 
1.7.8.rc4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Veirfy memory slot only for readability

2011-12-01 Thread Takuya Yoshikawa

(2011/12/02 14:46), Sasha Levin wrote:

On Fri, 2011-12-02 at 12:16 +0900, Takuya Yoshikawa wrote:

(2011/12/02 10:15), Takuya Yoshikawa wrote:

(2011/12/02 4:42), Sasha Levin wrote:

It's enough for memory slot to be readable, as the comment above the check
states.

A user should be able to create read-only memory slot.


Note: at that time, it looked to me that the API did not allow me to know
which type was being registered.


Which code exactly assumes that all memory slots are writable?


I agree with you, but as I wrote, my patch only used the check for
read with __xxx_user() because I had not thought changing the code
for write, not hot compared to read, would not worth it.

But access_ok() does the same for the two cases, so I did not mind
changing VERIFY_READ to VERIFY_WRITE: you should not have any problem
to do what you want, no?

I am not mentioning whether it is good as a programming style, here.




Do you want to create read only memory slots for kvm tool?


What KVM tool currently does is copy the kernel into guest memory and
run it from there. An idea raised recently was instead of copying it we
should mmap it into the memory to reduce footprint.


Interesting.



This is why I'm looking into adding a read only memory slot. The KVM
code doesn't have to know it's read only.



What I wanted to say was that, it might be cleaner if I could put the
exactly needed flag only.

And if you want to change the VERIFY_WRITE to VERIFY_READ, please check
the following commit as well:

commit ccddff20b9368cce9b774f6a22440a7c45367784
Author: Xiao Guangrong 

KVM: use __copy_to_user/__clear_user to write guest page

Ah, and the code is not x86 specific, so you may better check other
architectures as well.

Anyway, adding some more descriptions based on the history will be nice.

Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: make vcpu life cycle separated from kvm instance

2011-12-01 Thread Liu Ping Fan
From: Liu Ping Fan 

Currently, vcpu can be destructed only when kvm instance destroyed.
Change this to vcpu's destruction taken when its refcnt is zero,
and then vcpu MUST and CAN be destroyed before kvm's destroy.

Signed-off-by: Liu Ping Fan 
---
 arch/x86/kvm/i8254.c |   14 +-
 arch/x86/kvm/i8259.c |   14 +-
 arch/x86/kvm/mmu.c   |   12 -
 arch/x86/kvm/x86.c   |   66 +++
 include/linux/kvm_host.h |   24 +--
 virt/kvm/irq_comm.c  |   16 ++--
 virt/kvm/kvm_main.c  |   98 --
 7 files changed, 188 insertions(+), 56 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 76e3f1c..36e9943 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -289,7 +289,7 @@ static void pit_do_work(struct work_struct *work)
struct kvm_pit *pit = container_of(work, struct kvm_pit, expired);
struct kvm *kvm = pit->kvm;
struct kvm_vcpu *vcpu;
-   int i;
+   int i, cnt;
struct kvm_kpit_state *ps = &pit->pit_state;
int inject = 0;
 
@@ -315,9 +315,17 @@ static void pit_do_work(struct work_struct *work)
 * LVT0 to NMI delivery. Other PIC interrupts are just sent to
 * VCPU0, and only if its LVT0 is in EXTINT mode.
 */
-   if (kvm->arch.vapics_in_nmi_mode > 0)
-   kvm_for_each_vcpu(i, vcpu, kvm)
+   if (kvm->arch.vapics_in_nmi_mode > 0) {
+   rcu_read_lock();
+   kvm_for_each_vcpu(i, cnt, vcpu, kvm) {
+   vcpu = kvm_get_vcpu(kvm, i);
+   if (vcpu == NULL)
+   continue;
+   cnt++;
kvm_apic_nmi_wd_deliver(vcpu);
+   }
+   rcu_read_unlock();
+   }
}
 }
 
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index cac4746..529057c 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -50,25 +50,33 @@ static void pic_unlock(struct kvm_pic *s)
 {
bool wakeup = s->wakeup_needed;
struct kvm_vcpu *vcpu, *found = NULL;
-   int i;
+   struct kvm *kvm = s->kvm;
+   int i, cnt;
 
s->wakeup_needed = false;
 
spin_unlock(&s->lock);
 
if (wakeup) {
-   kvm_for_each_vcpu(i, vcpu, s->kvm) {
+   rcu_read_lock();
+   kvm_for_each_vcpu(i, cnt, vcpu, kvm) {
+   vcpu = kvm_get_vcpu(kvm, i);
+   if (vcpu == NULL)
+   continue;
+   cnt++;
if (kvm_apic_accept_pic_intr(vcpu)) {
found = vcpu;
break;
}
}
-
+   found = kvm_vcpu_get(found);
+   rcu_read_unlock();
if (!found)
return;
 
kvm_make_request(KVM_REQ_EVENT, found);
kvm_vcpu_kick(found);
+   kvm_vcpu_put(found);
}
 }
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f1b36cf..b9c3a01 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1833,11 +1833,17 @@ static void kvm_mmu_put_page(struct kvm_mmu_page *sp, 
u64 *parent_pte)
 
 static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm)
 {
-   int i;
+   int i, cnt;
struct kvm_vcpu *vcpu;
-
-   kvm_for_each_vcpu(i, vcpu, kvm)
+   rcu_read_lock();
+   kvm_for_each_vcpu(i, cnt, vcpu, kvm) {
+   vcpu = kvm_get_vcpu(kvm, i);
+   if (vcpu == NULL)
+   continue;
+   cnt++;
vcpu->arch.last_pte_updated = NULL;
+   }
+   rcu_read_unlock();
 }
 
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c38efd7..5bd8b95 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1830,11 +1830,19 @@ static int get_msr_hyperv(struct kvm_vcpu *vcpu, u32 
msr, u64 *pdata)
 
switch (msr) {
case HV_X64_MSR_VP_INDEX: {
-   int r;
+   int r, cnt;
struct kvm_vcpu *v;
-   kvm_for_each_vcpu(r, v, vcpu->kvm)
+   struct kvm *kvm =  vcpu->kvm;
+   rcu_read_lock();
+   kvm_for_each_vcpu(r, cnt, v, kvm) {
+   v = kvm_get_vcpu(kvm, r);
+   if (v == NULL)
+   continue;
+   cnt++;
if (v == vcpu)
data = r;
+   }
+   rcu_read_unlock();
break;
}
case HV_X64_MSR_EOI:
@@ -4966,7 +4974,7 @@ static int kvmclock_cpufreq_notifier(stru

Re: [PATCH] KVM: Veirfy memory slot only for readability

2011-12-01 Thread Sasha Levin
On Fri, 2011-12-02 at 12:16 +0900, Takuya Yoshikawa wrote:
> (2011/12/02 10:15), Takuya Yoshikawa wrote:
> > (2011/12/02 4:42), Sasha Levin wrote:
> >> It's enough for memory slot to be readable, as the comment above the check
> >> states.
> >>
> >> A user should be able to create read-only memory slot.
> 
> Note: at that time, it looked to me that the API did not allow me to know
> which type was being registered.

Which code exactly assumes that all memory slots are writable?

> Do you want to create read only memory slots for kvm tool?

What KVM tool currently does is copy the kernel into guest memory and
run it from there. An idea raised recently was instead of copying it we
should mmap it into the memory to reduce footprint.

This is why I'm looking into adding a read only memory slot. The KVM
code doesn't have to know it's read only.

-- 

Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] vfio: VFIO Driver core framework

2011-12-01 Thread Alexey Kardashevskiy
On 29/11/11 16:48, Alex Williamson wrote:
> On Tue, 2011-11-29 at 15:34 +1100, Alexey Kardashevskiy wrote:
>> Hi!
>>
>> On 29/11/11 14:46, Alex Williamson wrote:
>>> On Tue, 2011-11-29 at 12:52 +1100, Alexey Kardashevskiy wrote:
 Hi!

 I tried (successfully) to run it on POWER and while doing that I found 
 some issues. I'll try to
 explain them in separate mails.
>>>
>>> Great!
>>>
 IOMMU domain setup. On POWER, the linux drivers capable of DMA transfer 
 want to know
 a DMA window, i.e. its start and length in the PHB address space. This 
 comes from
 hardware. On X86 (correct if I am wrong), every device driver in the guest 
 allocates
 memory from the same pool.
>>>
>>> Yes, current VT-d/AMD-Vi provide independent IOVA spaces for each
>>> device.
>>>
  On POWER, device drivers get DMA window and allocate pages
 for DMA within this window. In the case of VFIO, that means that QEMU has 
 to
 preallocate this DMA window before running a quest, pass it to a guest (via
 device tree) and then a guest tells the host what pages are taken/released 
 by
 calling map/unmap callbacks of iommu_ops. Deallocation is made in a device 
 detach
 callback as I did not want to add more ioctls.
 So, there are 2 patches:

 - new VFIO_IOMMU_SETUP ioctl introduced which allocates a DMA window via 
 IOMMU API on
 POWER.
 btw do we need an additional capability bit for it?

 KERNEL PATCH:

 diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
 index 10615ad..a882e08 100644
 --- a/drivers/iommu/iommu.c
 +++ b/drivers/iommu/iommu.c
 @@ -247,3 +247,12 @@ int iommu_device_group(struct device *dev, unsigned 
 int *groupid)
return -ENODEV;
  }
  EXPORT_SYMBOL_GPL(iommu_device_group);
 +
 +int iommu_setup(struct iommu_domain *domain,
 +  size_t requested_size, size_t *allocated_size,
 +  phys_addr_t *start_address)
 +{
 +  return domain->ops->setup(domain, requested_size, allocated_size,
 +  start_address);
 +}
 +EXPORT_SYMBOL_GPL(iommu_setup);
>>>
>>> requested_size seems redundant both here and in struct vfio_setup.  We
>>> can just pre-load size/start with desired values.  I assume x86 IOMMUs
>>> would ignore requested values and return start = 0 and size = hardware
>>> decoder address bits.  The IOMMU API currently allows:
>>>
>>> iommu_domain_alloc
>>> [iommu_attach_device]
>>> [iommu_map]
>>> [iommu_unmap]
>>> [iommu_detach_device]
>>> iommu_domain_free
>>>
>>> where everything between alloc and free can be called in any order.  How
>>> does setup fit into that model?
>>
>> This is why I posted a QEMU patch :)
> 
> Right, but qemu/vfio is by no means the defacto standard of how one must
> use the IOMMU API.  KVM currently orders the map vs attach differently.
> When is it valid to call setup when factoring in hot attached/detached
> devices?
> 


>>> For this it seems like we'd almost want
>>> to combine alloc, setup, and the first attach into a single call (ie.
>>> create a domain with this initial device and these parameters), then
>>> subsequent attaches would only allow compatible devices.
>>
>>
>> Not exactly. This setup is more likely to get combined with domain alloc 
>> only.
> 
> At domain_alloc we don't have any association to actual hardware other
> than a bus_type, how would you know which iommu is being setup?


Yes. This is exact problem. We do have preallocated PEs (aka groups) on POWER 
but cannot
use this information during setup until the first device is attached.

Generally speaking, we should not be adding devices to an IOMMU domain, we 
should be adding groups
instead as it is the smallest entity which IOMMU can handle. At least in API, 
it is better to have
as simple as the idea it is implementing.

For example, we could implement a tool to put devices into a group (at the 
moment on POWER it would
check that a domain has no more than just a single group in it). We could pass 
this group ID to QEMU
instead of passing "-device vfio-pci ..." (as we still must pass _all_ devices 
of a group to QEMU).

Sure, we'll have change VFIO to tell QEMU what devices are included in what 
group, quite easy to do.

It is a tree: domain -> group -> device. Lets reflect it in the API.




>> On POWER, we have iommu_table per DMA window which can be or can be not 
>> shared
>> between devices. At the moment there is one window per PCIe _device_ (so 
>> multiple
>> functions of multiport network adapter share one DMA window) and one window 
>> for
>> all the devices behind PCIe-to-PCI bridge. It is more or less so.
>>
>>
>>> I'm a little confused though, is the window determined by hardware or is
>>> it configurable via requested_size?
>>
>>
>> The window parameters are calculated by software and then written to 
>> hardware so
>> hardware does filtering and prevents bad devices from memory corrup

Re: [PATCH] KVM: Veirfy memory slot only for readability

2011-12-01 Thread Takuya Yoshikawa
(2011/12/02 10:15), Takuya Yoshikawa wrote:
> (2011/12/02 4:42), Sasha Levin wrote:
>> It's enough for memory slot to be readable, as the comment above the check
>> states.
>>
>> A user should be able to create read-only memory slot.

Note: at that time, it looked to me that the API did not allow me to know
which type was being registered.

Do you want to create read only memory slots for kvm tool?

Takuya

> 
> I submitted the original patch like you to speed up page table walking,
> a hot path in KVM, and Avi applied the patch with changing the VERIFY_READ
> to VERIFY_WRITE:  on x86, both do the same check.  You can see that on the
> commit.
> 
> After that, Xiao started to write with __xxx_user() based on this check IIRC.
> So you should keep the code as is and change the comment if you like!
> 
> Thanks,
>   Takuya
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] net: use this_cpu_xxx replace percpu_xxx funcs

2011-12-01 Thread Alex,Shi
On Mon, 2011-11-21 at 17:00 +0800, Alex,Shi wrote:
> On Thu, 2011-10-20 at 16:38 +0800, Eric Dumazet wrote:
> > Le jeudi 20 octobre 2011 à 15:32 +0800, Alex,Shi a écrit :
> > > percpu_xxx funcs are duplicated with this_cpu_xxx funcs, so replace them
> > > for further code clean up.
> > > 
> > > And in preempt safe scenario, __this_cpu_xxx funcs has a bit better
> > > performance since __this_cpu_xxx has no redundant preempt_disable()
> > > 
> > > Signed-off-by: Alex Shi 
> > > ---
> > >  net/netfilter/xt_TEE.c |   12 ++--
> > >  net/socket.c   |4 ++--
> > >  2 files changed, 8 insertions(+), 8 deletions(-)
> > 
> > Acked-by: Eric Dumazet 
> > 
> > Thanks !
> 
> Anyone like to pick up this patch? or more comments for this? 

Kaber, David: 
I appreciate for your any comments on this. Could you like do me a
favor? 



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Veirfy memory slot only for readability

2011-12-01 Thread Takuya Yoshikawa
(2011/12/02 4:42), Sasha Levin wrote:
> It's enough for memory slot to be readable, as the comment above the check
> states.
> 
> A user should be able to create read-only memory slot.

I submitted the original patch like you to speed up page table walking,
a hot path in KVM, and Avi applied the patch with changing the VERIFY_READ
to VERIFY_WRITE:  on x86, both do the same check.  You can see that on the
commit.

After that, Xiao started to write with __xxx_user() based on this check IIRC.
So you should keep the code as is and change the comment if you like!

Thanks,
Takuya

> 
> Cc: Avi Kivity
> Cc: Marcelo Tosatti
> Cc: kvm@vger.kernel.org
> Signed-off-by: Sasha Levin
> ---
>   virt/kvm/kvm_main.c |2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e289486..b92883f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -727,7 +727,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   /* We can read the guest memory with __xxx_user() later on. */
>   if (user_alloc&&
>   ((mem->userspace_addr&  (PAGE_SIZE - 1)) ||
> -  !access_ok(VERIFY_WRITE,
> +  !access_ok(VERIFY_READ,
>   (void __user *)(unsigned long)mem->userspace_addr,
>   mem->memory_size)))
>   goto out;

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] virtio: use mandatory barriers for remote processor vdevs

2011-12-01 Thread Rusty Russell
On Thu, 1 Dec 2011 10:12:37 +0200, "Michael S. Tsirkin"  wrote:
> On Thu, Dec 01, 2011 at 12:58:59PM +1030, Rusty Russell wrote:
> > On Thu, 1 Dec 2011 01:13:07 +0200, "Michael S. Tsirkin"  
> > wrote:
> > > For x86, stores into memory are ordered. So I think that yes, smp_XXX
> > > can be selected at compile time.
> > > 
> > > So let's forget the virtio strangeness for a minute,
> > 
> > Hmm, we got away with light barriers because we knew we were not
> > *really* talking to a device.  But now with virtio-mmio, turns out we
> > are :)
> 
> You think virtio-mmio this issue too?  It's reported on remoteproc...

I think any non-virtual, non-PCI device has to worry about it.  Perhaps
all virtio-mmio are virtual (at this point).

I'm tempted to say we want permission from the device to do relaxed
barriers (so I don't have to worry about it!)

> > I'm really tempted to revert d57ed95 for 3.2, and we can revisit this
> > optimization later if it proves worthwhile.
> 
> Generally it does seem the best we can do for 3.2.
> 
> Given it's rc3, I'd be a bit wary of introducing regressions - I'll try
> to find some real setups (as in - not my laptop) to run some benchmarks
> on, to verify there's no major problem.
> I hope I can report on this in about a week from now - want to hold onto this 
> meanwhile?

Yep, no huge hurry.  Thanks!

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors

2011-12-01 Thread Rusty Russell
On Thu, 1 Dec 2011 12:26:42 +0200, "Michael S. Tsirkin"  wrote:
> On Thu, Dec 01, 2011 at 10:09:37AM +0200, Sasha Levin wrote:
> > On Thu, 2011-12-01 at 09:58 +0200, Michael S. Tsirkin wrote:
> > > We'll presumably need some logic to increment is back,
> > > to account for random workload changes.
> > > Something like slow start?
> > 
> > We can increment it each time the queue was less than 10% full, it
> > should act like slow start, no?
> 
> No, we really shouldn't get an empty ring as long as things behave
> well. What I meant is something like:

I was thinking of the network output case, but you're right.  We need to
distinguish between usually full (eg. virtio-net input) and usually
empty (eg. virtio-net output).

The signal for "we to pack more into the ring" is different.  We could
use some hacky heuristic like "out == 0" but I'd rather make it explicit
when we set up the virtqueue.

Our other alternative, moving the logic to the driver, is worse.

As to fading the effect over time, that's harder.  We have to deplete
the ring quite a few times before it turns into always-indirect.  We
could back off every time the ring is totally idle, but that may hurt
bursty traffic.  Let's try simple first?

Thanks,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] KVM test: Introduce qemu_iotests for the KVM test

2011-12-01 Thread Lucas Meneghel Rodrigues
In order to run always the latest up to date qemu iotests
on all KVM branches, insert qemu-iotests execution inside
KVM autotest. It'll attempt to git fetch the latest contents
of the qemu-iotests test suite, then carry on with tests,
reporting errors to autotest.

Besides having the latest test, with this approach it is
easier to specify to the test suite which paths we want
to get tested, which are under kvm autotest's control.
This patch actually depends on a couple of patches sent
to the qemu_iotest upstream maintainers.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/subtests.cfg.sample   |   13 +++
 client/tests/kvm/tests/qemu_iotests.py |   62 
 2 files changed, 75 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/qemu_iotests.py

diff --git a/client/tests/kvm/subtests.cfg.sample 
b/client/tests/kvm/subtests.cfg.sample
index 55f1b21..9b997dd 100644
--- a/client/tests/kvm/subtests.cfg.sample
+++ b/client/tests/kvm/subtests.cfg.sample
@@ -80,6 +80,19 @@ variants:
 extra_params = " --append ks=REPLACE_THIS_WITH_URL_OF_KS"
 url = REPLACE_THIS_WITH_TREE_URL
 
+- qemu_iotests:
+type = qemu_iotests
+vms = ''
+profilers = ''
+take_regular_screendumps = no
+qemu_io_uri = 
git://git.kernel.org/pub/scm/linux/kernel/git/hch/qemu-iotests.git
+qemu_io_branch = master
+qemu_io_lbranch = master
+qemu_io_formats = "raw qcow qcow2 qed vdi vpc vmdk rdb sheepdog"
+#qemu_io_commit =
+#qemu_io_base_uri =
+#qemu_io_extra_options =
+
 - qemu_img:
 type = qemu_img
 vms = ''
diff --git a/client/tests/kvm/tests/qemu_iotests.py 
b/client/tests/kvm/tests/qemu_iotests.py
new file mode 100644
index 000..2b44a42
--- /dev/null
+++ b/client/tests/kvm/tests/qemu_iotests.py
@@ -0,0 +1,62 @@
+import os
+from autotest_lib.client.common_lib import git, error
+from autotest_lib.client.bin import utils
+from autotest_lib.client.virt import virt_utils
+
+
+def run_qemu_iotests(test, params, env):
+"""
+Fetch from git and run qemu-iotests using the qemu binaries under test.
+
+1) Fetch qemu-io from git
+2) Parse help output to figure out supported file formats
+3) Run test for each file format detected
+4) Report any errors found to autotest
+
+@param test:   KVM test object.
+@param params: Dictionary with the test parameters.
+@param env:Dictionary with test environment.
+"""
+# First, let's get qemu-io
+std = "git://git.kernel.org/pub/scm/linux/kernel/git/hch/qemu-iotests.git"
+uri = params.get("qemu_io_uri", std)
+branch = params.get("qemu_io_branch", 'master')
+lbranch = params.get("qemu_io_lbranch", 'master')
+commit = params.get("qemu_io_commit", None)
+base_uri = params.get("qemu_io_base_uri", None)
+destination_dir = os.path.join(test.srcdir, "qemu_io_tests")
+git.get_repo(uri=uri, branch=branch, lbranch=lbranch, commit=commit,
+ destination_dir=destination_dir, base_uri=base_uri)
+
+# Then, set the qemu paths for the use of the testsuite
+os.environ["QEMU_PROG"] = virt_utils.get_path(test.bindir,
+params.get("qemu_binary", "qemu"))
+os.environ["QEMU_IMG_PROG"] = virt_utils.get_path(test.bindir,
+params.get("qemu_img_binary", "qemu-img"))
+os.environ["QEMU_IO_PROG"] = virt_utils.get_path(test.bindir,
+params.get("qemu_io_binary", "qemu-io"))
+
+# Parse help output to figure out supported file formats
+os.chdir(destination_dir)
+formats = params.get("qemu_io_image_formats",
+ "raw qcow2 qed qcow vdi vpc vmdk rdb sheepdog")
+extra_options = params.get("qemu_io_extra_options", "")
+
+formats = formats.split()
+
+# Run test for each file format detected
+err = []
+cmd = './check'
+if extra_options:
+cmd += extra_options
+for f in formats:
+try:
+utils.system("%s -%s" % (cmd, f))
+except error.CmdError:
+err.append(format)
+
+# Report all errors found to autotest
+if err:
+err = ", ".join(err)
+e_msg = "Testsuite qemu-io reported errors for formats: %s" % err
+raise error.TestFail(e_msg)
-- 
1.7.7.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM test: Make tests aware of the qemu-io path

2011-12-01 Thread Lucas Meneghel Rodrigues
As it is important for qemu-iotests that this particular
program is used.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/base.cfg.sample  |1 +
 client/tests/kvm/tests.cfg.sample |4 
 client/virt/kvm_installer.py  |   24 
 3 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/base.cfg.sample b/client/tests/kvm/base.cfg.sample
index c99add6..21fa513 100644
--- a/client/tests/kvm/base.cfg.sample
+++ b/client/tests/kvm/base.cfg.sample
@@ -3,6 +3,7 @@
 # Absolute paths and/or names of binaries (default path is /usr/bin)
 qemu_binary = qemu
 qemu_img_binary = qemu-img
+qemu_io_binary = qemu-io
 
 # List of virtual machine object names (whitespace seperated)
 vms = vm1
diff --git a/client/tests/kvm/tests.cfg.sample 
b/client/tests/kvm/tests.cfg.sample
index b011540..bd0d05e 100644
--- a/client/tests/kvm/tests.cfg.sample
+++ b/client/tests/kvm/tests.cfg.sample
@@ -60,6 +60,7 @@ variants:
 # We want qemu-kvm for this run
 qemu_binary = /usr/bin/qemu-kvm
 qemu_img_binary = /usr/bin/qemu-img
+qemu_io_binary = /usr/bin/qemu-io
 # Only qcow2 file format
 only qcow2
 # Only rtl8139 for nw card (default on qemu-kvm)
@@ -82,6 +83,7 @@ variants:
 # We want qemu for this run
 qemu_binary = /usr/bin/qemu
 qemu_img_binary = /usr/bin/qemu-img
+qemu_io_binary = /usr/bin/qemu-io
 only qcow2
 # The default nw card for qemu is e1000
 only e1000
@@ -100,6 +102,7 @@ variants:
 # We want qemu-kvm for this run
 qemu_binary = /usr/bin/qemu-kvm
 qemu_img_binary = /usr/bin/qemu-img
+qemu_io_binary = /usr/bin/qemu-io
 only qcow2
 only rtl8139
 only ide
@@ -117,6 +120,7 @@ variants:
 # We want qemu-kvm for this run
 qemu_binary = /usr/bin/qemu-kvm
 qemu_img_binary = /usr/bin/qemu-img
+qemu_io_binary = /usr/bin/qemu-io
 only qcow2
 only rtl8139
 only ide
diff --git a/client/virt/kvm_installer.py b/client/virt/kvm_installer.py
index 6bebd87..2af359d 100644
--- a/client/virt/kvm_installer.py
+++ b/client/virt/kvm_installer.py
@@ -32,6 +32,7 @@ class KVMBaseInstaller(base_installer.BaseInstaller):
 #
 QEMU_BIN = 'qemu'
 QEMU_IMG_BIN = 'qemu-img'
+QEMU_IO_BIN = 'qemu-io'
 
 
 def _kill_qemu_processes(self):
@@ -131,6 +132,23 @@ class KVMBaseInstaller(base_installer.BaseInstaller):
 return None
 
 
+def _qemu_io_bin_exists_at_prefix(self):
+'''
+Attempts to find the qemu-io binary at the installation prefix
+
+@return: full path of qemu-io binary or None if not found
+'''
+qemu_io_bin_name = os.path.join(self.install_prefix,
+ 'bin', self.QEMU_IO_BIN)
+if os.path.isfile(qemu_io_bin_name):
+logging.debug('Found qemu-io binary at %s', qemu_io_bin_name)
+return qemu_io_bin_name
+else:
+logging.debug('Could not find qemu-img binary at prefix %s',
+  self.install_prefix)
+return None
+
+
 def _create_symlink_qemu(self):
 """
 Create symbolic links for qemu and qemu-img commands on test bindir
@@ -141,6 +159,7 @@ class KVMBaseInstaller(base_installer.BaseInstaller):
 
 qemu_dst = os.path.join(self.test_bindir, self.QEMU_BIN)
 qemu_img_dst = os.path.join(self.test_bindir, self.QEMU_IMG_BIN)
+qemu_io_dst = os.path.join(self.test_bindir, self.QEMU_IO_BIN)
 
 qemu_bin = self._qemu_bin_exists_at_prefix()
 if qemu_bin is not None:
@@ -154,6 +173,11 @@ class KVMBaseInstaller(base_installer.BaseInstaller):
 else:
 raise error.TestError('Invalid qemu-img path')
 
+qemu_io_bin = self._qemu_io_bin_exists_at_prefix()
+if qemu_img_bin is not None:
+os.symlink(qemu_io_bin, qemu_io_dst)
+else:
+raise error.TestError('Invalid qemu-img path')
 
 def _install_phase_init(self):
 '''
-- 
1.7.7.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] Suport for qemu-iotests from inside KVM autotest

2011-12-01 Thread Lucas Meneghel Rodrigues
Currently, qemu-iotests is implemented outside
the KVM autotest realm. Although convenient if
you want to test the default qemu shipped with
your distro, it's not so convenient to integrate
with the KVM autotest workflow.

In order to take advantage of the fact kvm autotest
tests are already testing a set of paths
[qemu, qemu-img and qemu-io], implement a qemu-iotests
that fetch the latest testsuite from git and run the
available tests, with the appropriate qemu paths.

Lucas Meneghel Rodrigues (2):
  KVM test: Make tests aware of the qemu-io path
  KVM test: Introduce qemu_iotests for the KVM test

 client/tests/kvm/base.cfg.sample   |1 +
 client/tests/kvm/subtests.cfg.sample   |   13 +++
 client/tests/kvm/tests.cfg.sample  |4 ++
 client/tests/kvm/tests/qemu_iotests.py |   62 
 client/virt/kvm_installer.py   |   24 
 5 files changed, 104 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/qemu_iotests.py

-- 
1.7.7.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Guest stop notification

2011-12-01 Thread Eric B Munson
On Thu, 01 Dec 2011, Marcelo Tosatti wrote:

> On Thu, Dec 01, 2011 at 06:36:17PM +0100, Jan Kiszka wrote:
> > On 2011-12-01 18:22, Eric B Munson wrote:
> > > On Thu, 01 Dec 2011, Jan Kiszka wrote:
> > > 
> > >> On 2011-11-29 22:36, Eric B Munson wrote:
> > >>> Often when a guest is stopped from the qemu console, it will report 
> > >>> spurious
> > >>> soft lockup warnings on resume.  There are kernel patches being 
> > >>> discussed that
> > >>> will give the host the ability to tell the guest that it is being 
> > >>> stopped and
> > >>> should ignore the soft lockup warning that generates.
> > >>>
> > >>> Signed-off-by: Eric B Munson 
> > >>> Cc: ry...@linux.vnet.ibm.com
> > >>> Cc: aligu...@us.ibm.com
> > >>> Cc: mtosa...@redhat.com
> > >>> Cc: a...@redhat.com
> > >>> Cc: kvm@vger.kernel.org
> > >>> Cc: linux-ker...@vger.kernel.org
> > >>> ---
> > >>>  target-i386/kvm.c |6 ++
> > >>>  1 files changed, 6 insertions(+), 0 deletions(-)
> > >>>
> > >>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> > >>> index 5bfc21f..defd364 100644
> > >>> --- a/target-i386/kvm.c
> > >>> +++ b/target-i386/kvm.c
> > >>> @@ -336,12 +336,18 @@ static int kvm_inject_mce_oldstyle(CPUState *env)
> > >>>  return 0;
> > >>>  }
> > >>>  
> > >>> +static void kvm_put_guest_paused(CPUState *penv)
> > >>> +{
> > >>> +kvm_vcpu_ioctl(penv, KVM_GUEST_PAUSED, 0);
> > >>> +}
> > >>
> > >> I see no need in encapsulating this in a separate function.
> > >>
> > >>> +
> > >>>  static void cpu_update_state(void *opaque, int running, RunState state)
> > >>>  {
> > >>>  CPUState *env = opaque;
> > >>>  
> > >>>  if (running) {
> > >>>  env->tsc_valid = false;
> > >>> +   kvm_put_guest_paused(env);
> > >>
> > >> checkpatch.pl would have asked you to remove this tab.
> > >>
> > >> More general:
> > >>
> > >> Why is this x86-only? If the kernel interface is x86-only, what prevents
> > >> making it generic right from the beginning?
> > > 
> > > Sorry, missed this question on the first pass, this is x86 only because 
> > > the
> > > flag used lives in the pvclock structure.  AFAICT, there aren't any other
> > > architectures out there that implement paravirtualized clocks yet.
> > 
> > That's an implementation "detail" of the kernel. The interface (IOCTL or
> > kvm_run field) is generic, no?
> > 
> > I would just fire this notification from generic code, evaluate the
> > error (that was lacking so far), and only report it if it's something
> > else than "not supported".
> 
> Yes, it should live in hw/kvmclock.c preferably.
> 

Okay, I get a V3 with this moved around out tomorrow.

Thanks for the feedback,
Eric


signature.asc
Description: Digital signature


Re: [PATCH] Guest stop notification

2011-12-01 Thread Marcelo Tosatti
On Thu, Dec 01, 2011 at 12:19:38PM -0500, Eric B Munson wrote:
> On Thu, 01 Dec 2011, Jan Kiszka wrote:
> 
> > On 2011-11-29 22:36, Eric B Munson wrote:
> > > Often when a guest is stopped from the qemu console, it will report 
> > > spurious
> > > soft lockup warnings on resume.  There are kernel patches being discussed 
> > > that
> > > will give the host the ability to tell the guest that it is being stopped 
> > > and
> > > should ignore the soft lockup warning that generates.
> > > 
> > > Signed-off-by: Eric B Munson 
> > > Cc: ry...@linux.vnet.ibm.com
> > > Cc: aligu...@us.ibm.com
> > > Cc: mtosa...@redhat.com
> > > Cc: a...@redhat.com
> > > Cc: kvm@vger.kernel.org
> > > Cc: linux-ker...@vger.kernel.org
> > > ---
> > >  target-i386/kvm.c |6 ++
> > >  1 files changed, 6 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> > > index 5bfc21f..defd364 100644
> > > --- a/target-i386/kvm.c
> > > +++ b/target-i386/kvm.c
> > > @@ -336,12 +336,18 @@ static int kvm_inject_mce_oldstyle(CPUState *env)
> > >  return 0;
> > >  }
> > >  
> > > +static void kvm_put_guest_paused(CPUState *penv)
> > > +{
> > > +kvm_vcpu_ioctl(penv, KVM_GUEST_PAUSED, 0);
> > > +}
> > 
> > I see no need in encapsulating this in a separate function.
> > 
> 
> The encapsulated function was from a previous idea, I will remove it for V2.
> 
> > > +
> > >  static void cpu_update_state(void *opaque, int running, RunState state)
> > >  {
> > >  CPUState *env = opaque;
> > >  
> > >  if (running) {
> > >  env->tsc_valid = false;
> > > + kvm_put_guest_paused(env);
> > 
> > checkpatch.pl would have asked you to remove this tab.
> 
> Will change to spaces for V2.
> 
> > 
> > More general:
> > 
> > Why is this x86-only? If the kernel interface is x86-only, what prevents
> > making it generic right from the beginning?
> > 
> > Why do we need a new IOCTL for this? Was there no space left in the
> > kvm_run structure e.g. to pass this flag down on next vcpu execution? No
> > big deal, just wondering.
> 
> Thanks for your review/feedback.
> 
> When I started looking into this problem, the ioctl was the first suggestion I
> got for how to communicate from qemu to guest kernel.  I don't see a technical
> reason that this could not be added to the kvm_run structure in one of the
> bytes currently used as padding.  I would prefer to keep the ioctl because I
> have the corresponding kernel patches out to work with this, however, if there
> is a strong preference for using kvm_run, I can rework both sets.
> 
> Eric

This functionality being on top of kvmclock, it is more natural for this
command to be an ioctl (in similarity with other kvmclock commands).


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Guest stop notification

2011-12-01 Thread Marcelo Tosatti
On Thu, Dec 01, 2011 at 06:36:17PM +0100, Jan Kiszka wrote:
> On 2011-12-01 18:22, Eric B Munson wrote:
> > On Thu, 01 Dec 2011, Jan Kiszka wrote:
> > 
> >> On 2011-11-29 22:36, Eric B Munson wrote:
> >>> Often when a guest is stopped from the qemu console, it will report 
> >>> spurious
> >>> soft lockup warnings on resume.  There are kernel patches being discussed 
> >>> that
> >>> will give the host the ability to tell the guest that it is being stopped 
> >>> and
> >>> should ignore the soft lockup warning that generates.
> >>>
> >>> Signed-off-by: Eric B Munson 
> >>> Cc: ry...@linux.vnet.ibm.com
> >>> Cc: aligu...@us.ibm.com
> >>> Cc: mtosa...@redhat.com
> >>> Cc: a...@redhat.com
> >>> Cc: kvm@vger.kernel.org
> >>> Cc: linux-ker...@vger.kernel.org
> >>> ---
> >>>  target-i386/kvm.c |6 ++
> >>>  1 files changed, 6 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> >>> index 5bfc21f..defd364 100644
> >>> --- a/target-i386/kvm.c
> >>> +++ b/target-i386/kvm.c
> >>> @@ -336,12 +336,18 @@ static int kvm_inject_mce_oldstyle(CPUState *env)
> >>>  return 0;
> >>>  }
> >>>  
> >>> +static void kvm_put_guest_paused(CPUState *penv)
> >>> +{
> >>> +kvm_vcpu_ioctl(penv, KVM_GUEST_PAUSED, 0);
> >>> +}
> >>
> >> I see no need in encapsulating this in a separate function.
> >>
> >>> +
> >>>  static void cpu_update_state(void *opaque, int running, RunState state)
> >>>  {
> >>>  CPUState *env = opaque;
> >>>  
> >>>  if (running) {
> >>>  env->tsc_valid = false;
> >>> + kvm_put_guest_paused(env);
> >>
> >> checkpatch.pl would have asked you to remove this tab.
> >>
> >> More general:
> >>
> >> Why is this x86-only? If the kernel interface is x86-only, what prevents
> >> making it generic right from the beginning?
> > 
> > Sorry, missed this question on the first pass, this is x86 only because the
> > flag used lives in the pvclock structure.  AFAICT, there aren't any other
> > architectures out there that implement paravirtualized clocks yet.
> 
> That's an implementation "detail" of the kernel. The interface (IOCTL or
> kvm_run field) is generic, no?
> 
> I would just fire this notification from generic code, evaluate the
> error (that was lacking so far), and only report it if it's something
> else than "not supported".

Yes, it should live in hw/kvmclock.c preferably.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5 V4] Avoid soft lockup message when KVM is stopped by host

2011-12-01 Thread Marcelo Tosatti
On Tue, Nov 29, 2011 at 04:35:34PM -0500, Eric B Munson wrote:
> Changes from V3:
> Include CC's on patch 3
> Drop clear flag ioctl and have the watchdog clear the flag when it is reset
> 
> Changes from V2:
> A new kvm functions defined in kvm_para.h, the only change to pvclock is the
> initial flag definition
> 
> Changes from V1:
> (Thanks Marcelo)
> Host code has all been moved to arch/x86/kvm/x86.c
> KVM_PAUSE_GUEST was renamed to KVM_GUEST_PAUSED
> 
> When a guest kernel is stopped by the host hypervisor it can look like a soft
> lockup to the guest kernel.  This false warning can mask later soft lockup
> warnings which may be real.  This patch series adds a method for a host
> hypervisor to communicate to a guest kernel that it is being stopped.  The
> final patch in the series has the watchdog check this flag when it goes to
> issue a soft lockup warning and skip the warning if the guest knows it was
> stopped.
> 
> It was attempted to solve this in Qemu, but the side effects of saving and
> restoring the clock and tsc for each vcpu put the wall clock of the guest 
> behind
> by the amount of time of the pause.  This forces a guest to have ntp running
> in order to keep the wall clock accurate.

Looks good to me.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

2011-12-01 Thread Alex Williamson
On Thu, 2011-12-01 at 14:58 -0600, Stuart Yoder wrote:
> >> The attributes are not intrinsic features of the domain.  User space will
> >> need to set them.  But in thinking about it a bit more I think the 
> >> attributes
> >> are more properties of the domain rather than a per map() operation
> >> characteristic.  I think a separate API might be appropriate.  Define a
> >> new set_domain_attrs() op in the iommu_ops.In user space, perhaps
> >>  a new vfio group API-- VFIO_GROUP_SET_ATTRS,
> >> VFIO_GROUP_GET_ATTRS.
> >
> > In that case, you should definitely be following what Alexey is thinking
> > about with an iommu_setup IOMMU API callback.  I think it's shaping up
> > to do:
> >
> > x86:
> >  - Report any IOVA range restrictions imposed by hw implementation
> > POWER:
> >  - Request IOVA window size, report size and base
> > powerpc:
> >  - Set domain attributes, probably report range as well.
> 
> One other mechanism we need as well is the ability to
> enable/disable a domain.
> 
> For example-- suppose a device is assigned to a VM, the
> device is in use when the VM is abruptly terminated.  The
> VM terminate would shut off DMA at the IOMMU, but now
> the device is in an indeterminate state.   Some devices
> have no simple reset bit and getting the device back into
> a sane state could be complicated-- something the hypervisor
> doesn't want to do.
> 
> So now KVM restarts the VM, vfio init happens for the device
> and  the IOMMU for that device is re-configured,
> etc, but we really can't re-enable DMA until the guest OS tells us
> (via an hcall) that it is ready.   The guest needs to get the
> assigned device in a sane state before DMA is enabled.

Giant red flag.  We need to paravirtualize the guest?  Not on x86.  Some
devices are better for assignment than others.  PCI devices are moving
towards supporting standard reset mechanisms.

> Does this warrant a new domain enable/disable API, or should
> we make this part of the setup API we are discussing
> here?

What's wrong with simply not adding any DMA mapping entries until you
think your guest is ready?  Isn't that effectively the same thing?
Unmap ~= disable.  If the IOMMU API had a mechanism to toggle the iommu
domain on and off, I wouldn't be opposed to adding an ioctl to do it,
but it really seems like just a shortcut vs map/unmap.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH] vfio: VFIO Driver core framework

2011-12-01 Thread Stuart Yoder
>> The attributes are not intrinsic features of the domain.  User space will
>> need to set them.  But in thinking about it a bit more I think the attributes
>> are more properties of the domain rather than a per map() operation
>> characteristic.  I think a separate API might be appropriate.  Define a
>> new set_domain_attrs() op in the iommu_ops.    In user space, perhaps
>>  a new vfio group API-- VFIO_GROUP_SET_ATTRS,
>> VFIO_GROUP_GET_ATTRS.
>
> In that case, you should definitely be following what Alexey is thinking
> about with an iommu_setup IOMMU API callback.  I think it's shaping up
> to do:
>
> x86:
>  - Report any IOVA range restrictions imposed by hw implementation
> POWER:
>  - Request IOVA window size, report size and base
> powerpc:
>  - Set domain attributes, probably report range as well.

One other mechanism we need as well is the ability to
enable/disable a domain.

For example-- suppose a device is assigned to a VM, the
device is in use when the VM is abruptly terminated.  The
VM terminate would shut off DMA at the IOMMU, but now
the device is in an indeterminate state.   Some devices
have no simple reset bit and getting the device back into
a sane state could be complicated-- something the hypervisor
doesn't want to do.

So now KVM restarts the VM, vfio init happens for the device
and  the IOMMU for that device is re-configured,
etc, but we really can't re-enable DMA until the guest OS tells us
(via an hcall) that it is ready.   The guest needs to get the
assigned device in a sane state before DMA is enabled.

Does this warrant a new domain enable/disable API, or should
we make this part of the setup API we are discussing
here?

Stuart
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2011-12-01 Thread Raghavendra K T

[ CCing Jeremy's new email ID ]
Hi Avi,
Thanks for review and inputs.

On 12/01/2011 04:41 PM, Avi Kivity wrote:

On 11/30/2011 10:59 AM, Raghavendra K T wrote:

The hypercall needs to be documented in
Documentation/virtual/kvm/hypercalls.txt.



Yes, Sure 'll document.  hypercalls.txt is a new file right?. also I 
have to document APIs in Documentation/virtual/kvm/api.txt.



Have you tested it on AMD machines?  There are some differences in the
hypercall infrastructure there.


Yes. 'll test on AMD machine and update on that.




  /* This indicates that the new set of kvmclock msrs
   * are available. The use of 0x11 and 0x12 is deprecated
   */
  #define KVM_FEATURE_CLOCKSOURCE23
  #define KVM_FEATURE_ASYNC_PF  4
  #define KVM_FEATURE_STEAL_TIME5
+#define KVM_FEATURE_KICK_VCPU  6


Documentation/virtual/kvm/cpuid.txt.


Yes. 'll do that.





  /* The last 8 bits are used to indicate how to interpret the flags field
   * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c38efd7..6e1c8b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2103,6 +2103,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
case KVM_CAP_GET_TSC_KHZ:
+   case KVM_CAP_KICK_VCPU:


This is redundant with the feature bit?  In general, KVM_CAP is for the
host API, while KVM_FEATURE is for the guest API.



No its not. I have to check the flow again. I understood that this 
completes the guest querying, that checks for KVM_FEATURE availability.

Did I miss something? I need a little deep dive though.



+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @cpu - vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
+{
+   struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);


There is no guarantee that guest cpu numbers match host vcpu numbers.
Use APIC IDs, and kvm_apic_match_dest().


Okay. I have to dig more on this.




+   struct kvm_mp_state mp_state;
+
+   mp_state.mp_state = KVM_MP_STATE_RUNNABLE;
+   if (vcpu) {
+   vcpu->kicked = 1;
+   /* Ensure kicked is always set before wakeup */
+   barrier();
+   }
+   kvm_arch_vcpu_ioctl_set_mpstate(vcpu,&mp_state);


This must only be called from the vcpu thread.


Hmm. Okay. 'll check this.




+   kvm_vcpu_kick(vcpu);
+}
+

struct kvm_vcpu_arch arch;
+
+   /*
+* blocked vcpu wakes up by checking this flag set by unlocker.
+*/
+   int kicked;



Write only variable.


Yes 'll remove comments.



An alternative approach is to use an MSR protocol like x2apic ICR.



'll explore on this too.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Veirfy memory slot only for readability

2011-12-01 Thread Sasha Levin
It's enough for memory slot to be readable, as the comment above the check
states.

A user should be able to create read-only memory slot.

Cc: Avi Kivity 
Cc: Marcelo Tosatti 
Cc: kvm@vger.kernel.org
Signed-off-by: Sasha Levin 
---
 virt/kvm/kvm_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e289486..b92883f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -727,7 +727,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
/* We can read the guest memory with __xxx_user() later on. */
if (user_alloc &&
((mem->userspace_addr & (PAGE_SIZE - 1)) ||
-!access_ok(VERIFY_WRITE,
+!access_ok(VERIFY_READ,
(void __user *)(unsigned long)mem->userspace_addr,
mem->memory_size)))
goto out;
-- 
1.7.8.rc4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] KVM: SVM: Notify if SVM is already in use

2011-12-01 Thread Sasha Levin
Currently we silently fail if SVM is already in use by a different
virtualization technology.

This is bad since it's non-obvious for the user, and its not too uncommon
for users to have several of these installed on same host.

This patch adds a message to notify the user of the problem.

Cc: Joerg Roedel 
Cc: Avi Kivity 
Cc: Marcelo Tosatti 
Cc: kvm@vger.kernel.org
Signed-off-by: Sasha Levin 
---
 arch/x86/kvm/svm.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e32243e..8d1a066 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -587,8 +587,11 @@ static int svm_hardware_enable(void *garbage)
int me = raw_smp_processor_id();
 
rdmsrl(MSR_EFER, efer);
-   if (efer & EFER_SVME)
+   if (efer & EFER_SVME) {
+   printk(KERN_ERR "svm_hardware_enable: SVM already in use on 
CPU%d. "
+   "Are you already another hypervisor?\n", me);
return -EBUSY;
+   }
 
if (!has_svm()) {
printk(KERN_ERR "svm_hardware_enable: err EOPNOTSUPP on %d\n",
-- 
1.7.8.rc4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: VMX: Notify if VMX is already in use

2011-12-01 Thread Sasha Levin
Currently we silently fail if VMX is already in use by a different
virtualization technology.

This is bad since it's non-obvious for the user, and its not too uncommon
for users to have several of these installed on same host.

This patch adds a message to notify the user of the problem.

Cc: Joerg Roedel 
Cc: Avi Kivity 
Cc: Marcelo Tosatti 
Cc: kvm@vger.kernel.org
Signed-off-by: Sasha Levin 
---
 arch/x86/kvm/vmx.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4ceced2..0ef59ed 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2300,8 +2300,11 @@ static int hardware_enable(void *garbage)
u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
u64 old, test_bits;
 
-   if (read_cr4() & X86_CR4_VMXE)
+   if (read_cr4() & X86_CR4_VMXE) {
+   printk(KERN_ERR "hardware_enable: VMX already in use on CPU%d. "
+   "Are you already another hypervisor?\n", cpu);
return -EBUSY;
+   }
 
INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
rdmsrl(MSR_IA32_FEATURE_CONTROL, old);
-- 
1.7.8.rc4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-12-01 Thread Dipankar Sarma
On Thu, Dec 01, 2011 at 06:36:23PM +0100, Andrea Arcangeli wrote:
> On Thu, Dec 01, 2011 at 10:55:20PM +0530, Dipankar Sarma wrote:
> > On Wed, Nov 30, 2011 at 06:41:13PM +0100, Andrea Arcangeli wrote:
> > > On Wed, Nov 30, 2011 at 09:52:37PM +0530, Dipankar Sarma wrote:
> > > > create the guest topology correctly and optimize for NUMA. This
> > > > would work for us.
> > > 
> > > Even on the case of 1 guest that fits in one node, you're not going to
> > > max out the full bandwidth of all memory channels with this.
> > > 
> > > qemu all can do with ms_mbind/tbind is to create a vtopology that
> > > matches the hardware topology. It has these limits:
> > > 
> > > 1) requires all userland applications to be modified to scan either
> > >the physical topology if run on host, or the vtopology if run on
> > >guest to get the full benefit.
> > 
> > Not sure why you would need that. qemu can reflect the
> > topology based on -numa specifications and the corresponding
> > ms_tbind/mbind in FDT (in the case of Power, I guess ACPI
> > tables for x86) and guest kernel would detect this virtualized
> > topology. So there is no need for two types of topologies afaics.
> > It will all be reflected in /sys/devices/system/node in the guest.
> 
> The point is: what a vtopology gives you? If you don't modify all apps
> running in the guest to use it? vtopology on guest, helps exactly like
> the topology on host -> very little unless you modify qemu on host to
> use ms_tbind/mbind.

Sure, ms_tbind/mbind will be needed in qemu. For the rest, NUMA aware apps 
already use topology while running on physical systems and they wouldn't need 
modification for this kind of virtualized topology.

Thanks
Dipankar

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-12-01 Thread Peter Zijlstra
On Wed, 2011-11-23 at 16:03 +0100, Andrea Arcangeli wrote:
> Hi!
> 
> On Mon, Nov 21, 2011 at 07:51:21PM -0600, Anthony Liguori wrote:
> > Fundamentally, the entity that should be deciding what memory should be 
> > present 
> > and where it should located is the kernel.  I'm fundamentally opposed to 
> > trying 
> > to make QEMU override the scheduler/mm by using cpu or memory pinning in 
> > QEMU.
> > 
> >  From what I can tell about ms_mbind(), it just uses process knowledge to 
> > bind 
> > specific areas of memory to a memsched group and let's the kernel decide 
> > what to 
> > do with that knowledge.  This is exactly the type of interface that QEMU 
> > should 
> > be using.
> > 
> > QEMU should tell the kernel enough information such that the kernel can 
> > make 
> > good decisions.  QEMU should not be the one making the decisions.
> 
> True, QEMU won't have to decide where the memory and vcpus should be
> located (but hey it wouldn't need to decide that even if you use
> cpusets, you can use relative mbind with cpusets, the admin or a
> cpuset job scheduler could decide) but it's still QEMU making the
> decision of what memory and which vcpus threads to
> ms_mbind/ms_tbind. Think how you're going to create the input of those
> syscalls...
> 
> If it wasn't qemu to decide that, qemu wouldn't be required to scan
> the whole host physical numa (cpu/memory) topology in order to create
> the "input" arguments of "ms_mbind/ms_tbind".

That's a plain falsehood, you don't need to scan host physcal topology
in order to create useful ms_[mt]bind arguments. You can use physical
topology to optimize for particular hardware, but its not a strict
requirement.

>  And when you migrate the
> VM to another host, the whole vtopology may be counter-productive
> because the kernel isn't automatically detecting the numa affinity
> between threads and the guest vtopology will stick to whatever numa
> _physical_ topology that was seen on the first node where the VM was
> created.

This doesn't make any sense at all.

> I doubt that the assumption that all cloud nodes will have the same
> physical numa topology is reasonable.

So what? If you want to be very careful you can make sure you vnodes are
small enough they fit any any physical node in your cloud (god I f*king
hate that word).

If you're slightly less careful, things will still work, you might get
less max parallelism, but typically (from what I understood) these VM
hosting thingies are overloaded so you never get your max cpu anyway, so
who cares.

Things is, whatever you set-up it will always work, it might not be
optimal, but the one guarantee: [threads,vrange] will stay on the same
node will be kept true, no matter where you run it.

Also, migration between non-identical hosts is always 'tricky'. You're
always stuck with some minimally supported subset or average case thing.
Really, why do you think NUMA would be any different.

> Furthermore to get the same benefits that qemu gets on host by using
> ms_mbind/ms_tbind, every single guest application should be modified
> to scan the guest vtopology and call ms_mbind/ms_tbind too (or use the
> hard bindings which is what we try to avoid).

No! ms_[tm]bind() is just part of the solution, the other part is what
to do for simple programs, and like I wrote in my email earlier, and
what we talked about in Prague, is that for normal simple proglets we
simply pick a numa node and stick to it. Much like:

 http://home.arcor.de/efocht/sched/

Except we could actually migrate the whole thing if needed. Basically
you give each task its own 1 vnode and assign all threads to it.

Only big programs that need to span multiple nodes need to be modified
to get best advantage of numa. But that has always been true.

> In my view the trouble of the numa hard bindings is not the fact
> they're hard and qemu has to also decide the location (in fact it
> doesn't need to decide the location if you use cpusets and relative
> mbinds). The bigger problem is the fact either the admin or the app
> developer has to explicitly scan the numa physical topology (both cpus
> and memory) and tell the kernel how much memory to bind to each
> thread. ms_mbind/ms_tbind only partially solve that problem. They're
> similar to the mbind MPOL_F_RELATIVE_NODES with cpusets, except you
> don't need an admin or a cpuset-job-scheduler (or a perl script) to
> redistribute the hardware resources.

You're full of crap Andrea. 

Yes you need some clue as to your actual topology, but that's life, you
can't get SMP for free either, you need to have some clue.

Just like with regular SMP where you need to be aware of data sharing,
NUMA just makes it worse. If your app decomposes well enough to create a
vnode per thread, that's excellent, if you want to scale your app to fit
your machine that's fine too, heck, every multi-threaded app out there
worth using already queries machine topology one way or another, its not
a big deal.

But cpusets and relative_nodes d

Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-12-01 Thread Andrea Arcangeli
On Thu, Dec 01, 2011 at 10:55:20PM +0530, Dipankar Sarma wrote:
> On Wed, Nov 30, 2011 at 06:41:13PM +0100, Andrea Arcangeli wrote:
> > On Wed, Nov 30, 2011 at 09:52:37PM +0530, Dipankar Sarma wrote:
> > > create the guest topology correctly and optimize for NUMA. This
> > > would work for us.
> > 
> > Even on the case of 1 guest that fits in one node, you're not going to
> > max out the full bandwidth of all memory channels with this.
> > 
> > qemu all can do with ms_mbind/tbind is to create a vtopology that
> > matches the hardware topology. It has these limits:
> > 
> > 1) requires all userland applications to be modified to scan either
> >the physical topology if run on host, or the vtopology if run on
> >guest to get the full benefit.
> 
> Not sure why you would need that. qemu can reflect the
> topology based on -numa specifications and the corresponding
> ms_tbind/mbind in FDT (in the case of Power, I guess ACPI
> tables for x86) and guest kernel would detect this virtualized
> topology. So there is no need for two types of topologies afaics.
> It will all be reflected in /sys/devices/system/node in the guest.

The point is: what a vtopology gives you? If you don't modify all apps
running in the guest to use it? vtopology on guest, helps exactly like
the topology on host -> very little unless you modify qemu on host to
use ms_tbind/mbind.

> > 2) breaks across live migration if host physical topology changes
> 
> That is indeed an issue. Either VM placement software needs to
> be really smart to migrate VMs that fit well or, more likely,
> we will have to find a way to make guest kernels aware of
> topology changes. But the latter has impact on userspace
> as well for applications that might have optimized for NUMA.

Making guest kernel aware about "memory" topology changes is going to
be a whole mess. Or at least harder than memory hotplug.

> I agree. Specifying NUMA topology for guest can result in
> sub-optimal performance in some cases, it is a tradeoff.

I see it more like a limit of this solution, which is a common limit
to the hard bindings than a tradeoff.

> Agreed.

Yep I just wanted to make clear the limits remains with this solution.

I'll try to teach knumad to detect thread<->memory affinity too with
some logic, we'll see how well that can work.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Guest stop notification

2011-12-01 Thread Jan Kiszka
On 2011-12-01 18:22, Eric B Munson wrote:
> On Thu, 01 Dec 2011, Jan Kiszka wrote:
> 
>> On 2011-11-29 22:36, Eric B Munson wrote:
>>> Often when a guest is stopped from the qemu console, it will report spurious
>>> soft lockup warnings on resume.  There are kernel patches being discussed 
>>> that
>>> will give the host the ability to tell the guest that it is being stopped 
>>> and
>>> should ignore the soft lockup warning that generates.
>>>
>>> Signed-off-by: Eric B Munson 
>>> Cc: ry...@linux.vnet.ibm.com
>>> Cc: aligu...@us.ibm.com
>>> Cc: mtosa...@redhat.com
>>> Cc: a...@redhat.com
>>> Cc: kvm@vger.kernel.org
>>> Cc: linux-ker...@vger.kernel.org
>>> ---
>>>  target-i386/kvm.c |6 ++
>>>  1 files changed, 6 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>>> index 5bfc21f..defd364 100644
>>> --- a/target-i386/kvm.c
>>> +++ b/target-i386/kvm.c
>>> @@ -336,12 +336,18 @@ static int kvm_inject_mce_oldstyle(CPUState *env)
>>>  return 0;
>>>  }
>>>  
>>> +static void kvm_put_guest_paused(CPUState *penv)
>>> +{
>>> +kvm_vcpu_ioctl(penv, KVM_GUEST_PAUSED, 0);
>>> +}
>>
>> I see no need in encapsulating this in a separate function.
>>
>>> +
>>>  static void cpu_update_state(void *opaque, int running, RunState state)
>>>  {
>>>  CPUState *env = opaque;
>>>  
>>>  if (running) {
>>>  env->tsc_valid = false;
>>> +   kvm_put_guest_paused(env);
>>
>> checkpatch.pl would have asked you to remove this tab.
>>
>> More general:
>>
>> Why is this x86-only? If the kernel interface is x86-only, what prevents
>> making it generic right from the beginning?
> 
> Sorry, missed this question on the first pass, this is x86 only because the
> flag used lives in the pvclock structure.  AFAICT, there aren't any other
> architectures out there that implement paravirtualized clocks yet.

That's an implementation "detail" of the kernel. The interface (IOCTL or
kvm_run field) is generic, no?

I would just fire this notification from generic code, evaluate the
error (that was lacking so far), and only report it if it's something
else than "not supported".

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Guest stop notification

2011-12-01 Thread Jan Kiszka
On 2011-12-01 18:31, Arend van Spriel wrote:
> On 12/01/2011 06:19 PM, Eric B Munson wrote:
>> On Thu, 01 Dec 2011, Jan Kiszka wrote:
>>
>>> On 2011-11-29 22:36, Eric B Munson wrote:
 +
  static void cpu_update_state(void *opaque, int running, RunState state)
  {
  CPUState *env = opaque;
  
  if (running) {
  env->tsc_valid = false;
 +  kvm_put_guest_paused(env);
>>>
>>> checkpatch.pl would have asked you to remove this tab.
>>
>> Will change to spaces for V2.
>>
> 
> Huh. Remove what tab? Indent are always 8-space TAB according
> CodingStyle doc:
> 
> Chapter 1: Indentation
> 
> Tabs are 8 characters, and thus indentations are also 8 characters.
> There are heretic movements that try to make indentations 4 (or even 2!)
> characters deep, and that is akin to trying to define the value of PI to
> be 3.

Don't worry, this is not kernel code. :) QEMU uses a different coding style.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Guest stop notification

2011-12-01 Thread Jan Kiszka
On 2011-12-01 18:19, Eric B Munson wrote:
> On Thu, 01 Dec 2011, Jan Kiszka wrote:
> 
>> On 2011-11-29 22:36, Eric B Munson wrote:
>>> Often when a guest is stopped from the qemu console, it will
>>> report spurious soft lockup warnings on resume.  There are
>>> kernel patches being discussed that will give the host the
>>> ability to tell the guest that it is being stopped and should
>>> ignore the soft lockup warning that generates.
>>> 
>>> Signed-off-by: Eric B Munson  Cc:
>>> ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc:
>>> mtosa...@redhat.com Cc: a...@redhat.com Cc: kvm@vger.kernel.org 
>>> Cc: linux-ker...@vger.kernel.org --- target-i386/kvm.c |6
>>> ++ 1 files changed, 6 insertions(+), 0 deletions(-)
>>> 
>>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c index
>>> 5bfc21f..defd364 100644 --- a/target-i386/kvm.c +++
>>> b/target-i386/kvm.c @@ -336,12 +336,18 @@ static int
>>> kvm_inject_mce_oldstyle(CPUState *env) return 0; }
>>> 
>>> +static void kvm_put_guest_paused(CPUState *penv) +{ +
>>> kvm_vcpu_ioctl(penv, KVM_GUEST_PAUSED, 0); +}
>> 
>> I see no need in encapsulating this in a separate function.
>> 
> 
> The encapsulated function was from a previous idea, I will remove
> it for V2.
> 
>>> + static void cpu_update_state(void *opaque, int running,
>>> RunState state) { CPUState *env = opaque;
>>> 
>>> if (running) { env->tsc_valid = false; +
>>> kvm_put_guest_paused(env);
>> 
>> checkpatch.pl would have asked you to remove this tab.
> 
> Will change to spaces for V2.
> 
>> 
>> More general:
>> 
>> Why is this x86-only? If the kernel interface is x86-only, what
>> prevents making it generic right from the beginning?
>> 
>> Why do we need a new IOCTL for this? Was there no space left in
>> the kvm_run structure e.g. to pass this flag down on next vcpu
>> execution? No big deal, just wondering.
> 
> Thanks for your review/feedback.
> 
> When I started looking into this problem, the ioctl was the first
> suggestion I got for how to communicate from qemu to guest kernel.
> I don't see a technical reason that this could not be added to the
> kvm_run structure in one of the bytes currently used as padding.  I
> would prefer to keep the ioctl because I have the corresponding
> kernel patches out to work with this, however, if there is a strong
> preference for using kvm_run, I can rework both sets.

My feeling is that a run field would be more elegant, but I might be
wrong on this as well. In any case: You need KVM_CAP in your kernel
interface to announce the new feature and you have to sync in the new
kernel header into QEMU to make it build.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Guest stop notification

2011-12-01 Thread Arend van Spriel
On 12/01/2011 06:19 PM, Eric B Munson wrote:
> On Thu, 01 Dec 2011, Jan Kiszka wrote:
> 
>> On 2011-11-29 22:36, Eric B Munson wrote:
>>> +
>>>  static void cpu_update_state(void *opaque, int running, RunState state)
>>>  {
>>>  CPUState *env = opaque;
>>>  
>>>  if (running) {
>>>  env->tsc_valid = false;
>>> +   kvm_put_guest_paused(env);
>>
>> checkpatch.pl would have asked you to remove this tab.
> 
> Will change to spaces for V2.
> 

Huh. Remove what tab? Indent are always 8-space TAB according
CodingStyle doc:

Chapter 1: Indentation

Tabs are 8 characters, and thus indentations are also 8 characters.
There are heretic movements that try to make indentations 4 (or even 2!)
characters deep, and that is akin to trying to define the value of PI to
be 3.


Gr. AvS

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2] Guest stop notification

2011-12-01 Thread Eric B Munson
Often when a guest is stopped from the qemu console, it will report spurious
soft lockup warnings on resume.  There are kernel patches being discussed that
will give the host the ability to tell the guest that it is being stopped and
should ignore the soft lockup warning that generates.  This patch uses the qemu
Notifier system to tell the guest it is about to be stopped.

Signed-off-by: Eric B Munson 
Cc: Avi Kivity 
Cc: Marcelo Tosatti 
Cc: Jan Kiszka 
Cc: ry...@linux.vnet.ibm.com
Cc: aligu...@us.ibm.com
Cc: linux-ker...@vger.kernel.org
Cc: kvm@vger.kernel.org
---
 target-i386/kvm.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5bfc21f..12ac91a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -342,6 +342,7 @@ static void cpu_update_state(void *opaque, int running, 
RunState state)
 
 if (running) {
 env->tsc_valid = false;
+kvm_vcpu_ioctl(env, KVM_GUEST_PAUSED, 0);
 }
 }
 
-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-12-01 Thread Dipankar Sarma
On Wed, Nov 30, 2011 at 06:41:13PM +0100, Andrea Arcangeli wrote:
> On Wed, Nov 30, 2011 at 09:52:37PM +0530, Dipankar Sarma wrote:
> > create the guest topology correctly and optimize for NUMA. This
> > would work for us.
> 
> Even on the case of 1 guest that fits in one node, you're not going to
> max out the full bandwidth of all memory channels with this.
> 
> qemu all can do with ms_mbind/tbind is to create a vtopology that
> matches the hardware topology. It has these limits:
> 
> 1) requires all userland applications to be modified to scan either
>the physical topology if run on host, or the vtopology if run on
>guest to get the full benefit.

Not sure why you would need that. qemu can reflect the
topology based on -numa specifications and the corresponding
ms_tbind/mbind in FDT (in the case of Power, I guess ACPI
tables for x86) and guest kernel would detect this virtualized
topology. So there is no need for two types of topologies afaics.
It will all be reflected in /sys/devices/system/node in the guest.

> 
> 2) breaks across live migration if host physical topology changes

That is indeed an issue. Either VM placement software needs to
be really smart to migrate VMs that fit well or, more likely,
we will have to find a way to make guest kernels aware of
topology changes. But the latter has impact on userspace
as well for applications that might have optimized for NUMA.

> 3) 1 small guest on a idle numa system that fits in one numa node will
>tell not enough information to the host kernel
> 
> 4) if used outside of qemu and one threads allocates more memory than
>what fits in one node it won't tell enough info to the host kernel.
> 
> About 3): if you've just one guest that fits in one node, each vcpu
> should be spread across all the nodes probably, and behave like
> MADV_INTERLEAVE if the guest CPU scheduler migrate guests processes in
> reverse, the global memory bandwidth will still be used full even if
> they will both access remote memory. I've just seen benchmarks where
> no pinning runs more than _twice_ as fast than pinning with just 1
> guest and only 10 vcpu threads, probably because of that.

I agree. Specifying NUMA topology for guest can result in
sub-optimal performance in some cases, it is a tradeoff.


> In short it's an incremental step that moves some logic to the kernel
> but I don't see it solving all situations optimally and it shares a
> lot of the limits of the hard bindings.

Agreed.

Thanks
Dipankar

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Guest stop notification

2011-12-01 Thread Eric B Munson
On Thu, 01 Dec 2011, Jan Kiszka wrote:

> On 2011-11-29 22:36, Eric B Munson wrote:
> > Often when a guest is stopped from the qemu console, it will report spurious
> > soft lockup warnings on resume.  There are kernel patches being discussed 
> > that
> > will give the host the ability to tell the guest that it is being stopped 
> > and
> > should ignore the soft lockup warning that generates.
> > 
> > Signed-off-by: Eric B Munson 
> > Cc: ry...@linux.vnet.ibm.com
> > Cc: aligu...@us.ibm.com
> > Cc: mtosa...@redhat.com
> > Cc: a...@redhat.com
> > Cc: kvm@vger.kernel.org
> > Cc: linux-ker...@vger.kernel.org
> > ---
> >  target-i386/kvm.c |6 ++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> > 
> > diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> > index 5bfc21f..defd364 100644
> > --- a/target-i386/kvm.c
> > +++ b/target-i386/kvm.c
> > @@ -336,12 +336,18 @@ static int kvm_inject_mce_oldstyle(CPUState *env)
> >  return 0;
> >  }
> >  
> > +static void kvm_put_guest_paused(CPUState *penv)
> > +{
> > +kvm_vcpu_ioctl(penv, KVM_GUEST_PAUSED, 0);
> > +}
> 
> I see no need in encapsulating this in a separate function.
> 
> > +
> >  static void cpu_update_state(void *opaque, int running, RunState state)
> >  {
> >  CPUState *env = opaque;
> >  
> >  if (running) {
> >  env->tsc_valid = false;
> > +   kvm_put_guest_paused(env);
> 
> checkpatch.pl would have asked you to remove this tab.
> 
> More general:
> 
> Why is this x86-only? If the kernel interface is x86-only, what prevents
> making it generic right from the beginning?

Sorry, missed this question on the first pass, this is x86 only because the
flag used lives in the pvclock structure.  AFAICT, there aren't any other
architectures out there that implement paravirtualized clocks yet.

> 
> Why do we need a new IOCTL for this? Was there no space left in the
> kvm_run structure e.g. to pass this flag down on next vcpu execution? No
> big deal, just wondering.
> 
> Jan
> 
> -- 
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
> 


signature.asc
Description: Digital signature


Re: [PATCH] Guest stop notification

2011-12-01 Thread Eric B Munson
On Thu, 01 Dec 2011, Jan Kiszka wrote:

> On 2011-11-29 22:36, Eric B Munson wrote:
> > Often when a guest is stopped from the qemu console, it will report spurious
> > soft lockup warnings on resume.  There are kernel patches being discussed 
> > that
> > will give the host the ability to tell the guest that it is being stopped 
> > and
> > should ignore the soft lockup warning that generates.
> > 
> > Signed-off-by: Eric B Munson 
> > Cc: ry...@linux.vnet.ibm.com
> > Cc: aligu...@us.ibm.com
> > Cc: mtosa...@redhat.com
> > Cc: a...@redhat.com
> > Cc: kvm@vger.kernel.org
> > Cc: linux-ker...@vger.kernel.org
> > ---
> >  target-i386/kvm.c |6 ++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> > 
> > diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> > index 5bfc21f..defd364 100644
> > --- a/target-i386/kvm.c
> > +++ b/target-i386/kvm.c
> > @@ -336,12 +336,18 @@ static int kvm_inject_mce_oldstyle(CPUState *env)
> >  return 0;
> >  }
> >  
> > +static void kvm_put_guest_paused(CPUState *penv)
> > +{
> > +kvm_vcpu_ioctl(penv, KVM_GUEST_PAUSED, 0);
> > +}
> 
> I see no need in encapsulating this in a separate function.
> 

The encapsulated function was from a previous idea, I will remove it for V2.

> > +
> >  static void cpu_update_state(void *opaque, int running, RunState state)
> >  {
> >  CPUState *env = opaque;
> >  
> >  if (running) {
> >  env->tsc_valid = false;
> > +   kvm_put_guest_paused(env);
> 
> checkpatch.pl would have asked you to remove this tab.

Will change to spaces for V2.

> 
> More general:
> 
> Why is this x86-only? If the kernel interface is x86-only, what prevents
> making it generic right from the beginning?
> 
> Why do we need a new IOCTL for this? Was there no space left in the
> kvm_run structure e.g. to pass this flag down on next vcpu execution? No
> big deal, just wondering.

Thanks for your review/feedback.

When I started looking into this problem, the ioctl was the first suggestion I
got for how to communicate from qemu to guest kernel.  I don't see a technical
reason that this could not be added to the kvm_run structure in one of the
bytes currently used as padding.  I would prefer to keep the ioctl because I
have the corresponding kernel patches out to work with this, however, if there
is a strong preference for using kvm_run, I can rework both sets.

Eric



signature.asc
Description: Digital signature


Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-12-01 Thread Catalin Marinas
On Thu, Dec 01, 2011 at 04:44:40PM +, Arnd Bergmann wrote:
> On Thursday 01 December 2011, Catalin Marinas wrote:
> > On Thu, Dec 01, 2011 at 03:42:19PM +, Arnd Bergmann wrote:
> > > On Thursday 01 December 2011, Catalin Marinas wrote:
> > > How do you deal with signed integer arguments passed into SVC or HVC from
> > > a caller? If I understand the architecture correctly, the upper
> > > halves of the argument register end up zero-padded, while the callee
> > > expects sign-extension.
> > 
> > If you treat it as an "int" (32-bit) and function prototype defined
> > accordingly, then the generated code only accesses it as a W (rather
> > than X) register and the top 32-bit part is ignored (no need for
> > sign-extension). If it is defined as a "long" in the 32-bit world, then
> > it indeed needs explicit conversion given the different sizes for long
> > (for example sys_lseek, the second argument is a 'long' and we do
> > explicit sign extension in the wrapper).
...
> What about unsigned long and pointer? Can we always rely on the upper
> half of the register to be zero-filled when we get an exception from 32
> bit into 64 bit state, or do we also have to zero-extend those?

They are also fine, no need for zero-extension.

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-12-01 Thread Arnd Bergmann
On Thursday 01 December 2011, Catalin Marinas wrote:
> On Thu, Dec 01, 2011 at 03:42:19PM +, Arnd Bergmann wrote:
> > On Thursday 01 December 2011, Catalin Marinas wrote:
> > How do you deal with signed integer arguments passed into SVC or HVC from
> > a caller? If I understand the architecture correctly, the upper
> > halves of the argument register end up zero-padded, while the callee
> > expects sign-extension.
> 
> If you treat it as an "int" (32-bit) and function prototype defined
> accordingly, then the generated code only accesses it as a W (rather
> than X) register and the top 32-bit part is ignored (no need for
> sign-extension). If it is defined as a "long" in the 32-bit world, then
> it indeed needs explicit conversion given the different sizes for long
> (for example sys_lseek, the second argument is a 'long' and we do
> explicit sign extension in the wrapper).

Ok, so it's actually different from most other 64 bit architectures, which
normally operate on 64-bit registers and expect the caller to do the
correct sign-extension.

Doing the sign-extension for long arguments then falls into the same
category as long long and unsigned long long arguments, which also need
a wrapper, as you mentioned. Strictly speaking, we only need to do it
for those where the long argument has a meaning outside of the 0..2^31
range, e.g. io_submit can only take small positive numbers although
the type is 'long'.

What about unsigned long and pointer? Can we always rely on the upper
half of the register to be zero-filled when we get an exception from 32
bit into 64 bit state, or do we also have to zero-extend those?

Arnd
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-12-01 Thread Catalin Marinas
On Thu, Dec 01, 2011 at 03:42:19PM +, Arnd Bergmann wrote:
> On Thursday 01 December 2011, Catalin Marinas wrote:
> > Given the way register banking is done on AArch64, issuing an HVC on a
> > 32-bit guest OS doesn't require translation on a 64-bit hypervisor. We
> > have a similar implementation at the SVC level (for 32-bit user apps on
> > a 64-bit kernel), the only modification was where a 32-bit SVC takes a
> > 64-bit parameter in two separate 32-bit registers, so packing needs to
> > be done in a syscall wrapper.
> 
> How do you deal with signed integer arguments passed into SVC or HVC from
> a caller? If I understand the architecture correctly, the upper
> halves of the argument register end up zero-padded, while the callee
> expects sign-extension.

If you treat it as an "int" (32-bit) and function prototype defined
accordingly, then the generated code only accesses it as a W (rather
than X) register and the top 32-bit part is ignored (no need for
sign-extension). If it is defined as a "long" in the 32-bit world, then
it indeed needs explicit conversion given the different sizes for long
(for example sys_lseek, the second argument is a 'long' and we do
explicit sign extension in the wrapper).

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-12-01 Thread Ian Campbell
On Thu, 2011-12-01 at 15:10 +, Catalin Marinas wrote:
> On Thu, Dec 01, 2011 at 10:26:37AM +, Ian Campbell wrote:
> > On Wed, 2011-11-30 at 18:32 +, Stefano Stabellini wrote:
> > > On Wed, 30 Nov 2011, Arnd Bergmann wrote:
> > > > KVM and Xen at least both fall into the single-return-value category,
> > > > so we should be able to agree on a calling conventions. KVM does not
> > > > have an hcall API on ARM yet, and I see no reason not to use the
> > > > same implementation that you have in the Xen guest.
> > > > 
> > > > Stefano, can you split out the generic parts of your asm/xen/hypercall.h
> > > > file into a common asm/hypercall.h and submit it for review to the
> > > > arm kernel list?
> > > 
> > > Sure, I can do that.
> > > Usually the hypercall calling convention is very hypervisor specific,
> > > but if it turns out that we have the same requirements I happy to design
> > > a common interface.
> > 
> > I expect the only real decision to be made is hypercall page vs. raw hvc
> > instruction.
> > 
> > The page was useful on x86 where there is a variety of instructions
> > which could be used (at least for PV there was systenter/syscall/int, I
> > think vmcall instruction differs between AMD and Intel also) and gives
> > some additional flexibility. It's hard to predict but I don't think I'd
> > expect that to be necessary on ARM.
> > 
> > Another reason for having a hypercall page instead of a raw instruction
> > might be wanting to support 32 bit guests (from ~today) on a 64 bit
> > hypervisor in the future and perhaps needing to do some shimming/arg
> > translation. It would be better to aim for having the interface just be
> > 32/64 agnostic but mistakes do happen.
> 
> Given the way register banking is done on AArch64, issuing an HVC on a
> 32-bit guest OS doesn't require translation on a 64-bit hypervisor.

The issue I was thinking about was struct packing for arguments passed
as pointers etc rather than the argument registers themselves. Since the
preference appears to be for raw hvc we should just be careful that they
are agnostic in these.

Ian.

>  We
> have a similar implementation at the SVC level (for 32-bit user apps on
> a 64-bit kernel), the only modification was where a 32-bit SVC takes a
> 64-bit parameter in two separate 32-bit registers, so packing needs to
> be done in a syscall wrapper.
> 
> I'm not closely involved with any of the Xen or KVM work but I would
> vote for using HVC than a hypercall page.
> 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-12-01 Thread Arnd Bergmann
On Thursday 01 December 2011, Catalin Marinas wrote:
> Given the way register banking is done on AArch64, issuing an HVC on a
> 32-bit guest OS doesn't require translation on a 64-bit hypervisor. We
> have a similar implementation at the SVC level (for 32-bit user apps on
> a 64-bit kernel), the only modification was where a 32-bit SVC takes a
> 64-bit parameter in two separate 32-bit registers, so packing needs to
> be done in a syscall wrapper.

How do you deal with signed integer arguments passed into SVC or HVC from
a caller? If I understand the architecture correctly, the upper
halves of the argument register end up zero-padded, while the callee
expects sign-extension.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-12-01 Thread Avi Kivity
On 11/30/2011 07:15 AM, Takuya Yoshikawa wrote:
> (2011/11/30 14:02), Takuya Yoshikawa wrote:
>
>> IIUC, even though O(1) is O(1) at the timing of GET DIRTY LOG, it
>> needs O(N) write
>> protections with respect to the total number of dirty pages:
>> distributed, but
>> actually each page fault, which should be logged, does some write
>> protection?
>
> Sorry, was not precise.  It depends on the level, and not completely
> distributed.
> But I think it is O(N), and the total number of costs will not change
> so much,
> I guess.

That's true.  But some applications do require low latency, and the
current code can impose a lot of time with the mmu spinlock held.

The total amount of work actually increases slightly, from O(N) to O(N
log N), but since the tree is so wide, the overhead is small.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Android-virt] [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-12-01 Thread Catalin Marinas
On Thu, Dec 01, 2011 at 10:26:37AM +, Ian Campbell wrote:
> On Wed, 2011-11-30 at 18:32 +, Stefano Stabellini wrote:
> > On Wed, 30 Nov 2011, Arnd Bergmann wrote:
> > > KVM and Xen at least both fall into the single-return-value category,
> > > so we should be able to agree on a calling conventions. KVM does not
> > > have an hcall API on ARM yet, and I see no reason not to use the
> > > same implementation that you have in the Xen guest.
> > > 
> > > Stefano, can you split out the generic parts of your asm/xen/hypercall.h
> > > file into a common asm/hypercall.h and submit it for review to the
> > > arm kernel list?
> > 
> > Sure, I can do that.
> > Usually the hypercall calling convention is very hypervisor specific,
> > but if it turns out that we have the same requirements I happy to design
> > a common interface.
> 
> I expect the only real decision to be made is hypercall page vs. raw hvc
> instruction.
> 
> The page was useful on x86 where there is a variety of instructions
> which could be used (at least for PV there was systenter/syscall/int, I
> think vmcall instruction differs between AMD and Intel also) and gives
> some additional flexibility. It's hard to predict but I don't think I'd
> expect that to be necessary on ARM.
> 
> Another reason for having a hypercall page instead of a raw instruction
> might be wanting to support 32 bit guests (from ~today) on a 64 bit
> hypervisor in the future and perhaps needing to do some shimming/arg
> translation. It would be better to aim for having the interface just be
> 32/64 agnostic but mistakes do happen.

Given the way register banking is done on AArch64, issuing an HVC on a
32-bit guest OS doesn't require translation on a 64-bit hypervisor. We
have a similar implementation at the SVC level (for 32-bit user apps on
a 64-bit kernel), the only modification was where a 32-bit SVC takes a
64-bit parameter in two separate 32-bit registers, so packing needs to
be done in a syscall wrapper.

I'm not closely involved with any of the Xen or KVM work but I would
vote for using HVC than a hypercall page.

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap

2011-12-01 Thread Avi Kivity
On 11/30/2011 09:03 AM, Xiao Guangrong wrote:
> On 11/29/2011 08:01 PM, Avi Kivity wrote:
>
> > On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
> >> On 11/29/2011 07:20 PM, Avi Kivity wrote:
> >>
> >>
> >>> We used to have a bitmap in a shadow page with a bit set for every slot
> >>> pointed to by the page.  If we extend this to non-leaf pages (so, when
> >>> we set a bit, we propagate it through its parent_ptes list), then we do
> >>> the following on write fault:
> >>>
> >>
> >>
> >> Thanks for the detail.
> >>
> >> Um, propagating slot bit to parent ptes is little slow, especially, it
> >> is the overload for no Xwindow guests which is dirty logged only in the
> >> migration(i guess most linux guests are running on this mode and migration
> >> is not frequent). No?
> > 
> > You need to propagate very infrequently.  The first pte added to a page
> > will need to propagate, but the second (if from the same slot, which is
> > likely) will already have the bit set in the page, so we're assured it's
> > set in all its parents.
> > 
>
>
> What will happen if a guest page is unmapped or a shadow page is zapped?
> It should immediately clear the "slot" bit of the shadow page and its
> parent, it means it should propagate this "clear slot bit" event to all
> parents, in the case of softmmu. zapping shadow page is frequently, maybe
> it is unacceptable?

You can keep the bit set.  A clear bit means there are exactly zero
pages from the slot in this mmu page or its descendents.  A set bit
means there are zero or more pages.  If we have a bit accidentally set,
nothing bad happens.

> It does not like "unsync" bit which can be lazily cleared, because all bits
> of hierarchy can be cleared when cr3 reload.

With tdp (and without nested virt) the mappings never change anyway. 
With shadow, they do change.  Not sure how to handle that at the higher
levels.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-12-01 Thread Stefano Stabellini
On Thu, 1 Dec 2011, Ian Campbell wrote:
> On Wed, 2011-11-30 at 18:32 +, Stefano Stabellini wrote:
> > On Wed, 30 Nov 2011, Arnd Bergmann wrote:
> > > KVM and Xen at least both fall into the single-return-value category,
> > > so we should be able to agree on a calling conventions. KVM does not
> > > have an hcall API on ARM yet, and I see no reason not to use the
> > > same implementation that you have in the Xen guest.
> > > 
> > > Stefano, can you split out the generic parts of your asm/xen/hypercall.h
> > > file into a common asm/hypercall.h and submit it for review to the
> > > arm kernel list?
> > 
> > Sure, I can do that.
> > Usually the hypercall calling convention is very hypervisor specific,
> > but if it turns out that we have the same requirements I happy to design
> > a common interface.
> 
> I expect the only real decision to be made is hypercall page vs. raw hvc
> instruction.
> 
> The page was useful on x86 where there is a variety of instructions
> which could be used (at least for PV there was systenter/syscall/int, I
> think vmcall instruction differs between AMD and Intel also) and gives
> some additional flexibility. It's hard to predict but I don't think I'd
> expect that to be necessary on ARM.
>
> Another reason for having a hypercall page instead of a raw instruction
> might be wanting to support 32 bit guests (from ~today) on a 64 bit
> hypervisor in the future and perhaps needing to do some shimming/arg
> translation. It would be better to aim for having the interface just be
> 32/64 agnostic but mistakes do happen.

I always like to keep things as simple as possible, so I am in favor of
using the raw hvc instruction.
Besides with the bulk of mmu hypercalls gone, it should not be difficult
to design a 32/64 bit agnostic interface.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] PCI: Rework config space locking, add INTx masking services

2011-12-01 Thread Jesse Barnes
On Thu, 01 Dec 2011 13:04:15 +0100
Jan Kiszka  wrote:

> On 2011-11-04 09:45, Jan Kiszka wrote:
> > [ Rebase of v1 over yesterday's linux-next ]
> > 
> > This series tries to heal the currently broken locking scheme
> > around PCI config space accesses.
> > 
> > We have an interface lock out access via sysfs, but that service
> > wrongly assumes it is only called by one instance at a time for
> > some device. So two loops doing
> > 
> > echo 1 > /sys/bus/pci/devices//reset
> > 
> > in parallel will trigger a kernel BUG at the moment.
> > 
> > Besides synchronizing with user space, we also need to manage config
> > space access of generic PCI drivers. They need to mask legacy
> > interrupt lines while the specific driver runs in user space or a
> > guest OS.
> > 
> > The approach taken here is provide mutex-like locking for general
> > access - which still requires a special mechanism due to
> > requirements of the IBM Power RAID SCSI driver. Furthermore, INTx
> > masking is now available via the PCI core and synchronized via the
> > internal pci_lock.
> > 
> > Jan Kiszka (3):
> >   pci: Rework config space blocking services
> >   pci: Introduce INTx check & mask API
> >   uio: Convert uio_generic_pci to new intx masking API
> > 
> >  drivers/pci/access.c  |   76 +--
> >  drivers/pci/iov.c |   12 ++--
> >  drivers/pci/pci.c |  114
> > -
> > drivers/pci/pci.h |2 +
> > drivers/scsi/ipr.c|   67 +---
> > drivers/scsi/ipr.h|1 +
> > drivers/uio/uio_pci_generic.c |   76 ++-
> > include/linux/pci.h   |   17 -- 8 files changed, 248
> > insertions(+), 117 deletions(-)
> > 
> 
> I just received yet another request regarding the KVM feature that
> depends on this. What's the status of these patches? Were they merged
> into some staging tree already (didn't find any traces so far)?

Yeah I think it's ok now; I'll give it one more look and pull it into
-next.

Thanks,
Jesse
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] [PATCH RFC v2 2/2] hyper-v: initialize Hyper-V CPUID leaves.

2011-12-01 Thread Jan Kiszka
On 2011-10-23 17:39, Vadim Rozenfeld wrote:
> ---
>  target-i386/kvm.c |   73 +++-
>  1 files changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 82fec8c..c061e3b 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -29,6 +29,7 @@
>  #include "hw/pc.h"
>  #include "hw/apic.h"
>  #include "ioport.h"
> +#include "hyperv.h"
>  
>  //#define DEBUG_KVM
>  
> @@ -380,11 +381,16 @@ int kvm_arch_init_vcpu(CPUState *env)
>  cpuid_i = 0;
>  
>  /* Paravirtualization CPUIDs */
> -memcpy(signature, "KVMKVMKVM\0\0\0", 12);
>  c = &cpuid_data.entries[cpuid_i++];
>  memset(c, 0, sizeof(*c));
>  c->function = KVM_CPUID_SIGNATURE;
> -c->eax = 0;
> +if (!hyperv_enabled()) {
> +memcpy(signature, "KVMKVMKVM\0\0\0", 12);
> +c->eax = 0;
> +} else {
> +memcpy(signature, "Microsoft Hv", 12);
> +c->eax = HYPERV_CPUID_MIN;
> +}
>  c->ebx = signature[0];
>  c->ecx = signature[1];
>  c->edx = signature[2];
> @@ -395,6 +401,54 @@ int kvm_arch_init_vcpu(CPUState *env)
>  c->eax = env->cpuid_kvm_features &
>  kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
>  
> +if (hyperv_enabled()) {
> +memcpy(signature, "Hv#1\0\0\0\0\0\0\0\0", 12);
> +c->eax = signature[0];
> +
> +c = &cpuid_data.entries[cpuid_i++];
> +memset(c, 0, sizeof(*c));
> +c->function = HYPERV_CPUID_VERSION;
> +c->eax = 0x1bbc;
> +c->ebx = 0x00060001;
> +
> +c = &cpuid_data.entries[cpuid_i++];
> +memset(c, 0, sizeof(*c));
> +c->function = HYPERV_CPUID_FEATURES;
> +if (hyperv_relaxed_timing_enabled()) {
> +c->eax |= HV_X64_MSR_HYPERCALL_AVAILABLE;
> +}
> +if (hyperv_vapic_recommended()) {
> +c->eax |= HV_X64_MSR_HYPERCALL_AVAILABLE;
> +c->eax |= HV_X64_MSR_APIC_ACCESS_AVAILABLE;
> +}
> +
> +c = &cpuid_data.entries[cpuid_i++];
> +memset(c, 0, sizeof(*c));
> +c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
> +if (hyperv_relaxed_timing_enabled()) {
> +c->eax |= HV_X64_RELAXED_TIMING_RECOMMENDED;
> +}
> +if (hyperv_vapic_recommended()) {
> +c->eax |= HV_X64_APIC_ACCESS_RECOMMENDED;
> +}
> +c->ebx = hyperv_get_spinlock_retries();
> +
> +c = &cpuid_data.entries[cpuid_i++];
> +memset(c, 0, sizeof(*c));
> +c->function = HYPERV_CPUID_IMPLEMENT_LIMITS;
> +c->eax = 0x40;
> +c->ebx = 0x40;
> +
> +c = &cpuid_data.entries[cpuid_i++];
> +memset(c, 0, sizeof(*c));
> +c->function = KVM_CPUID_SIGNATURE_NEXT;
> +memcpy(signature, "KVMKVMKVM\0\0\0", 12);
> +c->eax = 0;
> +c->ebx = signature[0];
> +c->ecx = signature[1];
> +c->edx = signature[2];
> +}
> +
>  has_msr_async_pf_en = c->eax & (1 << KVM_FEATURE_ASYNC_PF);
>  
>  cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
> @@ -953,6 +1007,13 @@ static int kvm_put_msrs(CPUState *env, int level)
>  kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN,
>env->async_pf_en_msr);
>  }
> +if (hyperv_hypercall_available()) {
> +kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_GUEST_OS_ID, 0);
> +kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_HYPERCALL, 0);
> +}
> +if (hyperv_vapic_recommended()) {
> +kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_APIC_ASSIST_PAGE, 0);
> +}
>  }
>  if (env->mcg_cap) {
>  int i;
> @@ -1190,6 +1251,14 @@ static int kvm_get_msrs(CPUState *env)
>  msrs[n++].index = MSR_KVM_ASYNC_PF_EN;
>  }
>  
> +if (hyperv_hypercall_available()) {
> +msrs[n++].index = HV_X64_MSR_GUEST_OS_ID;
> +msrs[n++].index = HV_X64_MSR_HYPERCALL;
> +}
> +if (hyperv_vapic_recommended()) {
> +msrs[n++].index = HV_X64_MSR_APIC_ASSIST_PAGE;
> +}
> +
>  if (env->mcg_cap) {
>  msrs[n++].index = MSR_MCG_STATUS;
>  msrs[n++].index = MSR_MCG_CTL;

Not sure if we discussed this already: Is there no need to save/restore
the HV MSR values in the vcpu structure? Specifcially kvm_get_msrs looks
fishy as you request to read the state from the kernel but then simply
ignore it.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] [PATCH RFC v2 1/2] hyper-v: introduce Hyper-V support infrastructure.

2011-12-01 Thread Jan Kiszka
On 2011-10-23 17:39, Vadim Rozenfeld wrote:
> ---
>  Makefile.target  |2 +
>  target-i386/cpuid.c  |   14 ++
>  target-i386/hyperv.c |   65 
> ++
>  target-i386/hyperv.h |   37 
>  4 files changed, 118 insertions(+), 0 deletions(-)
>  create mode 100644 target-i386/hyperv.c
>  create mode 100644 target-i386/hyperv.h
> 
> diff --git a/Makefile.target b/Makefile.target
> index 325f26e..446fabc 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -202,6 +202,8 @@ obj-$(CONFIG_NO_KVM) += kvm-stub.o
>  obj-y += memory.o
>  LIBS+=-lz
>  
> +obj-i386-y +=hyperv.o
> +
>  QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
>  QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
>  QEMU_CFLAGS += $(VNC_JPEG_CFLAGS)
> diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
> index 1e8bcff..261c168 100644
> --- a/target-i386/cpuid.c
> +++ b/target-i386/cpuid.c
> @@ -27,6 +27,8 @@
>  #include "qemu-option.h"
>  #include "qemu-config.h"
>  
> +#include "hyperv.h"
> +
>  /* feature flags taken from "Intel Processor Identification and the CPUID
>   * Instruction" and AMD's "CPUID Specification".  In cases of disagreement
>   * between feature naming conventions, aliases may be added.
> @@ -716,6 +718,14 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
> const char *cpu_model)
>  goto error;
>  }
>  x86_cpu_def->tsc_khz = tsc_freq / 1000;
> +} else if (!strcmp(featurestr, "hv_spinlocks")) {
> +   char *err;
> +   numvalue = strtoul(val, &err, 0);
> +   if (!*val || *err) {

Indention is off here.

> +fprintf(stderr, "bad numerical value %s\n", val);
> +goto error;
> +}
> +hyperv_set_spinlock_retries(numvalue);
>  } else {
>  fprintf(stderr, "unrecognized feature %s\n", featurestr);
>  goto error;
> @@ -724,6 +734,10 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
> const char *cpu_model)
>  check_cpuid = 1;
>  } else if (!strcmp(featurestr, "enforce")) {
>  check_cpuid = enforce_cpuid = 1;
> +} else if (!strcmp(featurestr, "hv_relaxed")) {
> +hyperv_enable_relaxed_timing(true);
> +} else if (!strcmp(featurestr, "hv_vapic")) {
> +hyperv_enable_vapic_recommended(true);
>  } else {
>  fprintf(stderr, "feature string `%s' not in format 
> (+feature|-feature|feature=xyz)\n", featurestr);
>  goto error;
> diff --git a/target-i386/hyperv.c b/target-i386/hyperv.c
> new file mode 100644
> index 000..b2e57ad
> --- /dev/null
> +++ b/target-i386/hyperv.c
> @@ -0,0 +1,65 @@
> +/*
> + * QEMU Hyper-V support
> + *
> + * Copyright Red Hat, Inc. 2011
> + *
> + * Author: Vadim Rozenfeld 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "hyperv.h"
> +
> +static bool hyperv_vapic;
> +static bool hyperv_relaxed_timing;
> +static int hyperv_spinlock_attempts = HYPERV_SPINLOCK_NEVER_RETRY;
> +
> +void hyperv_enable_vapic_recommended(bool val)
> +{
> +hyperv_vapic = val;
> +}
> +
> +void hyperv_enable_relaxed_timing(bool val)
> +{
> +hyperv_relaxed_timing = val;
> +}
> +
> +void hyperv_set_spinlock_retries(int val)
> +{
> +hyperv_spinlock_attempts = val;
> +if (hyperv_spinlock_attempts < 0xFFF) {
> +hyperv_spinlock_attempts = 0xFFF;
> +}
> +}
> +
> +bool hyperv_enabled(void)
> +{
> +return hyperv_hypercall_available() || hyperv_relaxed_timing_enabled();
> +}
> +
> +bool hyperv_hypercall_available(void)
> +{
> +if (hyperv_vapic ||
> +(hyperv_spinlock_attempts != HYPERV_SPINLOCK_NEVER_RETRY)) {

[ minor: braces arount the second condition aren't needed. ]

> +  return true;
> +}
> +return false;
> +}
> +
> +bool hyperv_vapic_recommended(void)
> +{
> +return hyperv_vapic;
> +}
> +
> +bool hyperv_relaxed_timing_enabled(void)
> +{
> +return hyperv_relaxed_timing;
> +}
> +
> +int hyperv_get_spinlock_retries(void)
> +{
> +return hyperv_spinlock_attempts;
> +}
> +
> diff --git a/target-i386/hyperv.h b/target-i386/hyperv.h
> new file mode 100644
> index 000..0d742f8
> --- /dev/null
> +++ b/target-i386/hyperv.h
> @@ -0,0 +1,37 @@
> +/*
> + * QEMU Hyper-V support
> + *
> + * Copyright Red Hat, Inc. 2011
> + *
> + * Author: Vadim Rozenfeld 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef QEMU_HW_HYPERV_H
> +#define QEMU_HW_HYPERV_H 1
> +
> +#include "qemu-common.h"
> +#include 
> +
> +#ifndef HYPERV_SPINLOCK_NEVER_RETRY
> +#define HYPERV_SPINLOCK_NEVER_RETRY 0x
> +#endif
> +
> +#ifndef KVM_CPUID_SIGNATURE_NEXT
>

Re: [PATCH] Guest stop notification

2011-12-01 Thread Jan Kiszka
On 2011-11-29 22:36, Eric B Munson wrote:
> Often when a guest is stopped from the qemu console, it will report spurious
> soft lockup warnings on resume.  There are kernel patches being discussed that
> will give the host the ability to tell the guest that it is being stopped and
> should ignore the soft lockup warning that generates.
> 
> Signed-off-by: Eric B Munson 
> Cc: ry...@linux.vnet.ibm.com
> Cc: aligu...@us.ibm.com
> Cc: mtosa...@redhat.com
> Cc: a...@redhat.com
> Cc: kvm@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> ---
>  target-i386/kvm.c |6 ++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 5bfc21f..defd364 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -336,12 +336,18 @@ static int kvm_inject_mce_oldstyle(CPUState *env)
>  return 0;
>  }
>  
> +static void kvm_put_guest_paused(CPUState *penv)
> +{
> +kvm_vcpu_ioctl(penv, KVM_GUEST_PAUSED, 0);
> +}

I see no need in encapsulating this in a separate function.

> +
>  static void cpu_update_state(void *opaque, int running, RunState state)
>  {
>  CPUState *env = opaque;
>  
>  if (running) {
>  env->tsc_valid = false;
> + kvm_put_guest_paused(env);

checkpatch.pl would have asked you to remove this tab.

More general:

Why is this x86-only? If the kernel interface is x86-only, what prevents
making it generic right from the beginning?

Why do we need a new IOCTL for this? Was there no space left in the
kvm_run structure e.g. to pass this flag down on next vcpu execution? No
big deal, just wondering.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 01/12] [PATCH] kvm-s390: ioctl to switch to user controlled virtual machines

2011-12-01 Thread Martin Schwidefsky
On Thu, 01 Dec 2011 15:15:03 +0200
Avi Kivity  wrote:

> > +
> > +   if (kvm->arch.gmap)
> > +   gmap_free(kvm->arch.gmap);
> > +
> > +   kvm->arch.gmap = NULL;
> 
> Locking?
> 
> What happens if a vcpu is created afterwards?
> 
> I guess you don't mind too much since this is a privileged interface for
> a single purpose.

That is indeed a race. A malicious user space could create a new cpu with
KVM_CREATE_VCPU on another thread after the for loop checked that there
are no VCPUs. The new VCPU could then pick up the kvm->arch.gmap and use it
while the caller of KVM_S390_ENABLE_UCONTROL frees the structure.
The kvm_s390_enable_ucontrol function needs to lock with the kvm->lock mutex.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user

2011-12-01 Thread Avi Kivity
On 12/01/2011 03:59 PM, Carsten Otte wrote:
> On 01.12.2011 14:26, Avi Kivity wrote:
>> On 12/01/2011 02:57 PM, Carsten Otte wrote:
>>> This patch exports the SIE hardware control block to userspace
>>> via the mapping of the vcpu file descriptor.
>>>   else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET)
>>>   page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
>>>   #endif
>>> +#if defined(CONFIG_S390)&&  defined(CONFIG_KVM_UCONTROL)
>>> +else if ((vmf->pgoff == KVM_SIE_PAGE_OFFSET)
>>> +&&  (!(vcpu->kvm->arch.gmap)))
>>
>> Is the second test "kvm_is_ucontrol()"?
> Yes it is, but that is defined in arch/s390/kvm/kvm-s390.h which
> I do not dare to include here.
>

Oh, so kvm_arch_vcpu_fault() will help.  The pio stuff sets a bad
example, but we needn't follow it.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user

2011-12-01 Thread Carsten Otte

On 01.12.2011 14:26, Avi Kivity wrote:

On 12/01/2011 02:57 PM, Carsten Otte wrote:

This patch exports the SIE hardware control block to userspace
via the mapping of the vcpu file descriptor.
else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET)
page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
  #endif
+#if defined(CONFIG_S390)&&  defined(CONFIG_KVM_UCONTROL)
+   else if ((vmf->pgoff == KVM_SIE_PAGE_OFFSET)
+   &&  (!(vcpu->kvm->arch.gmap)))


Is the second test "kvm_is_ucontrol()"?

Yes it is, but that is defined in arch/s390/kvm/kvm-s390.h which
I do not dare to include here.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: monitor flag on native kvm tool guest

2011-12-01 Thread Avi Kivity
On 12/01/2011 03:54 PM, Daniele Carollo wrote:
> Il 01 dicembre 2011 14:48, Avi Kivity  ha scritto:
> > On 12/01/2011 03:37 PM, Daniele Carollo wrote:
> >> Hi,
> >> my name's Daniele and I'm using the native linux kvm tool.
> >> If I try to execute it in this way ./kvm run -d ~/linux-0.2.img I get
> >> a kernel panic like this: http://paste.org/41673
> >> Only using the addictional option -p "idle=halt" I can run a virtual 
> >> machine.
> >> Printing cat /proc/cpuinfo on the host i get http://paste.org/41663
> >> and on the guest http://paste.org/41664
> >> Sashal from the native linux kvm tool team noticed that there is the
> >> monitor cpu flag even on the guest.
> >>
> >
> > from cpuid.c:
> >
> >/* cpuid 1.ecx */
> >const u32 kvm_supported_word4_x86_features =
> >F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> >
> > so either the masking later on is subtly wrong, or kvm tool doesn't pass
> > it on correctly, or Linux ignores it.
> >
> > Please run the attached program on the host and post its output.
> >
> > --
> > error compiling committee.c: too many arguments to function
> >
>
> It gave me:
> func  ind  flags  -> 000d 756e6547 6c65746e 
> 49656e69
> func 0001 ind  flags  -> 000206a7 03100800 14b8220b 
> 0f8bfbff
>

cpuid 1.ecx[3] = 1, so it's a host kernel issue.

Please try git://git.kernel.org/pub/scm/virt/kvm/kvm.git
kvm-updates/3.1, see it if fixes it for you.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: monitor flag on native kvm tool guest

2011-12-01 Thread Daniele Carollo
Il 01 dicembre 2011 14:48, Avi Kivity  ha scritto:
> On 12/01/2011 03:37 PM, Daniele Carollo wrote:
>> Hi,
>> my name's Daniele and I'm using the native linux kvm tool.
>> If I try to execute it in this way ./kvm run -d ~/linux-0.2.img I get
>> a kernel panic like this: http://paste.org/41673
>> Only using the addictional option -p "idle=halt" I can run a virtual machine.
>> Printing cat /proc/cpuinfo on the host i get http://paste.org/41663
>> and on the guest http://paste.org/41664
>> Sashal from the native linux kvm tool team noticed that there is the
>> monitor cpu flag even on the guest.
>>
>
> from cpuid.c:
>
>    /* cpuid 1.ecx */
>    const u32 kvm_supported_word4_x86_features =
>        F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
>
> so either the masking later on is subtly wrong, or kvm tool doesn't pass
> it on correctly, or Linux ignores it.
>
> Please run the attached program on the host and post its output.
>
> --
> error compiling committee.c: too many arguments to function
>

It gave me:
func  ind  flags  -> 000d 756e6547 6c65746e 49656e69
func 0001 ind  flags  -> 000206a7 03100800 14b8220b 0f8bfbff
func 0002 ind  flags 0006 -> 76035a01 00f0b2ff  00ca
func 0003 ind  flags  ->    
func 0004 ind  flags 0001 -> 1c004121 01c0003f 003f 
func 0004 ind 0001 flags 0001 -> 1c004122 01c0003f 003f 
func 0004 ind 0002 flags 0001 -> 1c004143 01c0003f 01ff 
func 0004 ind 0003 flags 0001 -> 1c03c163 02c0003f 0fff 0006
func 0004 ind 0004 flags 0001 ->    
func 0005 ind  flags  ->    
func 0006 ind  flags  ->    
func 0007 ind  flags 0001 ->    
func 0008 ind  flags  ->    
func 0009 ind  flags  ->    
func 000a ind  flags  ->    
func 000b ind  flags 0001 -> 0001 0002 0100 0003
func 000b ind 0001 flags 0001 -> 0004 0004 0201 0003
func 000b ind 0002 flags 0001 ->   0002 0003
func 000c ind  flags  ->    
func 000d ind  flags 0001 -> 0007 0340 0340 
func 000d ind 0001 flags 0001 -> 0001   
func 000d ind 0002 flags 0001 -> 0100 0240  
func 8000 ind  flags  -> 8008   
func 8001 ind  flags  ->   0001 28100800
func 8002 ind  flags  ->    
func 8003 ind  flags  ->    
func 8004 ind  flags  ->    
func 8005 ind  flags  ->    
func 8006 ind  flags  ->    
func 8007 ind  flags  ->    
func 8008 ind  flags  -> 3024   
func 4000 ind  flags  ->  4b4d564b 564b4d56 004d
func 4001 ind  flags  -> 013b   
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: monitor flag on native kvm tool guest

2011-12-01 Thread Avi Kivity
On 12/01/2011 03:50 PM, Sasha Levin wrote:
> On Thu, 2011-12-01 at 15:48 +0200, Avi Kivity wrote:
> > On 12/01/2011 03:37 PM, Daniele Carollo wrote:
> > > Hi,
> > > my name's Daniele and I'm using the native linux kvm tool.
> > > If I try to execute it in this way ./kvm run -d ~/linux-0.2.img I get
> > > a kernel panic like this: http://paste.org/41673
> > > Only using the addictional option -p "idle=halt" I can run a virtual 
> > > machine.
> > > Printing cat /proc/cpuinfo on the host i get http://paste.org/41663
> > > and on the guest http://paste.org/41664
> > > Sashal from the native linux kvm tool team noticed that there is the
> > > monitor cpu flag even on the guest.
> > >
> > 
> > from cpuid.c:
> > 
> > /* cpuid 1.ecx */
> > const u32 kvm_supported_word4_x86_features =
> > F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> > 
> > so either the masking later on is subtly wrong, or kvm tool doesn't pass
> > it on correctly, or Linux ignores it.
> > 
> > Please run the attached program on the host and post its output.
> > 
>
> Avi,
>
> We tested it with qemu where Daniele has reported that monitor is also
> visible in the cpuid there.
>

Not here (3.2).  Maybe it's another manifestation of the -E2BIG cpuid bug.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: monitor flag on native kvm tool guest

2011-12-01 Thread Sasha Levin
On Thu, 2011-12-01 at 15:48 +0200, Avi Kivity wrote:
> On 12/01/2011 03:37 PM, Daniele Carollo wrote:
> > Hi,
> > my name's Daniele and I'm using the native linux kvm tool.
> > If I try to execute it in this way ./kvm run -d ~/linux-0.2.img I get
> > a kernel panic like this: http://paste.org/41673
> > Only using the addictional option -p "idle=halt" I can run a virtual 
> > machine.
> > Printing cat /proc/cpuinfo on the host i get http://paste.org/41663
> > and on the guest http://paste.org/41664
> > Sashal from the native linux kvm tool team noticed that there is the
> > monitor cpu flag even on the guest.
> >
> 
> from cpuid.c:
> 
> /* cpuid 1.ecx */
> const u32 kvm_supported_word4_x86_features =
> F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> 
> so either the masking later on is subtly wrong, or kvm tool doesn't pass
> it on correctly, or Linux ignores it.
> 
> Please run the attached program on the host and post its output.
> 

Avi,

We tested it with qemu where Daniele has reported that monitor is also
visible in the cpuid there.

-- 

Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: monitor flag on native kvm tool guest

2011-12-01 Thread Avi Kivity
On 12/01/2011 03:37 PM, Daniele Carollo wrote:
> Hi,
> my name's Daniele and I'm using the native linux kvm tool.
> If I try to execute it in this way ./kvm run -d ~/linux-0.2.img I get
> a kernel panic like this: http://paste.org/41673
> Only using the addictional option -p "idle=halt" I can run a virtual machine.
> Printing cat /proc/cpuinfo on the host i get http://paste.org/41663
> and on the guest http://paste.org/41664
> Sashal from the native linux kvm tool team noticed that there is the
> monitor cpu flag even on the guest.
>

from cpuid.c:

/* cpuid 1.ecx */
const u32 kvm_supported_word4_x86_features =
F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |

so either the masking later on is subtly wrong, or kvm tool doesn't pass
it on correctly, or Linux ignores it.

Please run the attached program on the host and post its output.

-- 
error compiling committee.c: too many arguments to function

#include 
#include 
#include 
#include 
#include 

int main(void)
{
struct kvm_cpuid2 *cpuid;
int kvm, r = 0, i, j;

kvm = open("/dev/kvm", O_RDWR);
cpuid = malloc(sizeof(*cpuid) + sizeof(struct kvm_cpuid_entry2) * 100);
cpuid->nent = 100;

r = ioctl(kvm, KVM_GET_SUPPORTED_CPUID, cpuid);
if (r) {
	printf("KVM_GET_SUPPORTED_CPUID returned %d with errno %d\n", r, errno);
	return 1;
}

for (j = 0; j < cpuid->nent; ++j) {
	struct kvm_cpuid_entry2 *e = &cpuid->entries[j];

	printf("func %08x ind %08x flags %08x -> %08x %08x %08x %08x\n",
	   e->function, e->index, e->flags,
	   e->eax, e->ebx, e->ecx, e->edx);
}
free(cpuid);
close(kvm);

return 0;
}


[PATCH 2/2] common.config: Allow use of arbitrary qemu* paths

2011-12-01 Thread Lucas Meneghel Rodrigues
Since we might want to test arbitrary qemu, qemu-img and
qemu-io paths, allow users to specify environment variable
values for QEMU_PROG, QEMU_IMG_PROG and QEMU_IO_PROG so
the testsuite will use those values rather than find them
on PATH. Obviously, if such env variables are not set
prior to script execution, normal detection mechanism
takes place.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 common.config |   12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/common.config b/common.config
index d5a72af..d07f435 100644
--- a/common.config
+++ b/common.config
@@ -87,13 +87,19 @@ export BC_PROG="`set_prog_path bc`"
 
 export PS_ALL_FLAGS="-ef"
 
-export QEMU_PROG="`set_prog_path qemu`"
+if [ -z "$QEMU_PROG" ]; then
+export QEMU_PROG="`set_prog_path qemu`"
+fi
 [ "$QEMU_PROG" = "" ] && _fatal "qemu not found"
 
-export QEMU_IMG_PROG="`set_prog_path qemu-img`"
+if [ -z "$QEMU_IMG_PROG" ]; then
+export QEMU_IMG_PROG="`set_prog_path qemu-img`"
+fi
 [ "$QEMU_IMG_PROG" = "" ] && _fatal "qemu-img not found"
 
-export QEMU_IO_PROG="`set_prog_path qemu-io`"
+if [ -z "$QEMU_IO_PROG" ]; then
+export QEMU_IO_PROG="`set_prog_path qemu-io`"
+fi
 [ "$QEMU_IO_PROG" = "" ] && _fatal "qemu-io not found"
 
 export QEMU=$QEMU_PROG
-- 
1.7.7.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] check: print relevant path information

2011-12-01 Thread Lucas Meneghel Rodrigues
Print the paths of the programs under test
(qemu, qemu-img and qemu-io).

Signed-off-by: Lucas Meneghel Rodrigues 
---
 check |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/check b/check
index 84ef3e5..8499a04 100755
--- a/check
+++ b/check
@@ -158,6 +158,9 @@ FULL_HOST_DETAILS=`_full_platform_details`
 #FULL_MOUNT_OPTIONS=`_scratch_mount_options`
 
 cat 

[PATCH 0/2] qemu-io tests: More fine grained control of qemu paths

2011-12-01 Thread Lucas Meneghel Rodrigues
In automated test environments, we often build and test
qemu from arbitrary paths, rather than installing them
on standard PATH directories. Of course, appending directories
to PATH might produce the desired result, but making it
possible to specify arbitrary qemu paths through environment
variables is very convenient and minimally intrusive.

So, make it possible to set qemu paths through env
variables, and also, print the paths of the qemu versions
being tested, for clarity sake.

Lucas Meneghel Rodrigues (2):
  check: print relevant path information
  common.config: Allow use of arbitrary qemu* paths

 check |3 +++
 common.config |   12 +---
 2 files changed, 12 insertions(+), 3 deletions(-)

-- 
1.7.7.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


monitor flag on native kvm tool guest

2011-12-01 Thread Daniele Carollo
Hi,
my name's Daniele and I'm using the native linux kvm tool.
If I try to execute it in this way ./kvm run -d ~/linux-0.2.img I get
a kernel panic like this: http://paste.org/41673
Only using the addictional option -p "idle=halt" I can run a virtual machine.
Printing cat /proc/cpuinfo on the host i get http://paste.org/41663
and on the guest http://paste.org/41664
Sashal from the native linux kvm tool team noticed that there is the
monitor cpu flag even on the guest.
Thanks for the attention,
Daniele Carollo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 00/12] User controlled virtual machines

2011-12-01 Thread Avi Kivity
On 12/01/2011 03:10 PM, Avi Kivity wrote:
> On 12/01/2011 02:57 PM, Carsten Otte wrote:
> > Hi Avi, Hi Marcelo,
> >
> > this patch series introduces an interface to allow a privileged userspace
> > program to control a KVM virtual machine. The interface is intended for
> > use by a machine simulator called CECSIM that can simulate an entire
> > mainframe machine with nested virtualization and I/O for the purpose
> > of testing and debugging its firmware prior to availability of silicon.
> > This patchset allows for concurrent use of the KVM device driver to drive
> > both regular and user controlled virtual machines at the same time.
> > I would kindly like to ask for review and inclusion of these patches.
>
> So, this moves the responsibility of handling intercepts from the kernel
> to userspace, yes?
>
> How much of the kvm code continues to be active after this?  Perhaps it
> makes sense to have a separate interface for this.

Okay, I read the code and I even think I understood a little bit of it. 
In general the patches look okay, I had only minor comments.  But please
do document all the new interfaces.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 01/12] [PATCH] kvm-s390: ioctl to switch to user controlled virtual machines

2011-12-01 Thread Avi Kivity
On 12/01/2011 03:15 PM, Avi Kivity wrote:
>
> > +
> > +   if (kvm->arch.gmap)
> > +   gmap_free(kvm->arch.gmap);
> > +
> > +   kvm->arch.gmap = NULL;
>
> Locking?
>
> What happens if a vcpu is created afterwards?
>

Having read the code, I think you can repurpose the argument of
KVM_CREATE_VM to be a vm type; with 0 being a normal vm, and 1 being
your new monster.  The code already checks that the argument is zero, so
we're safe there.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user

2011-12-01 Thread Avi Kivity
On 12/01/2011 02:57 PM, Carsten Otte wrote:
> This patch exports the SIE hardware control block to userspace
> via the mapping of the vcpu file descriptor.
>   else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET)
>   page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
>  #endif
> +#if defined(CONFIG_S390) && defined(CONFIG_KVM_UCONTROL)
> + else if ((vmf->pgoff == KVM_SIE_PAGE_OFFSET)
> +  && (!(vcpu->kvm->arch.gmap)))

Is the second test "kvm_is_ucontrol()"?

> + page = virt_to_page(vcpu->arch.sie_block);
> +#endif
>   else
>   return VM_FAULT_SIGBUS;
>   get_page(page);
>


-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user

2011-12-01 Thread Avi Kivity
On 12/01/2011 02:57 PM, Carsten Otte wrote:
> This patch exports the SIE hardware control block to userspace
> via the mapping of the vcpu file descriptor.
>
> Signed-off-by: Carsten Otte 
> ---
> ---
>  arch/s390/include/asm/kvm_host.h |2 ++
>  virt/kvm/kvm_main.c  |5 +
>  2 files changed, 7 insertions(+)
>
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -24,6 +24,8 @@
>  /* memory slots that does not exposed to userspace */
>  #define KVM_PRIVATE_MEM_SLOTS 4
>  
> +#define KVM_SIE_PAGE_OFFSET 1
> +

How does userspace get to know what this value is?

Coalesced mmio does this with the return value of
KVM_CAP_COALESCED_MMIO.  You can hardcode it, but in a user visible
header please.

>  struct sca_entry {
>   atomic_t scn;
>   __u32   reserved;
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1656,6 +1656,11 @@ static int kvm_vcpu_fault(struct vm_area
>   else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET)
>   page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
>  #endif
> +#if defined(CONFIG_S390) && defined(CONFIG_KVM_UCONTROL)
> + else if ((vmf->pgoff == KVM_SIE_PAGE_OFFSET)
> +  && (!(vcpu->kvm->arch.gmap)))
> + page = virt_to_page(vcpu->arch.sie_block);
> +#endif

kvm_arch_vcpu_fault()

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 02/12] [PATCH] kvm-s390-ucontrol: per vcpu address spaces

2011-12-01 Thread Avi Kivity
On 12/01/2011 02:57 PM, Carsten Otte wrote:
> This patch introduces two ioctls for virtual cpus, that are only
> valid for kernel virtual machines that are controlled by userspace.
> Each virtual cpu has its individual address space in this mode of
> operation, and each address space is backed by the gmap
> implementation just like the address space for regular KVM guests.
> KVM_S390_UCAS_MAP allows to map a part of the user's virtual address
> space to the vcpu. Starting offset and length in both the user and
> the vcpu address space need to be aligned to 1M.
> KVM_S390_UCAS_UNMAP can be used to unmap a range of memory from a
> virtual cpu in a similar way.

>   }
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -655,7 +655,14 @@ struct kvm_clock_data {
>  #define KVM_SET_TSS_ADDR  _IO(KVMIO,   0x47)
>  #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
>  /* enable ucontrol for s390 */
> +struct kvm_s390_ucas_mapping {
> + unsigned long user_addr;
> + unsigned long vcpu_addr;
> + unsigned long length;
> +};

Do you have 32/64 issues on s390?  This struct changes size with host
userspace bitness.

>  #define KVM_S390_ENABLE_UCONTROL  _IO(KVMIO,  0x49)
> +#define KVM_S390_UCAS_MAP_IOW(KVMIO, 0x50, struct 
> kvm_s390_ucas_mapping)
> +#define KVM_S390_UCAS_UNMAP  _IOW(KVMIO, 0x51, struct 
> kvm_s390_ucas_mapping)
>  
>

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 01/12] [PATCH] kvm-s390: ioctl to switch to user controlled virtual machines

2011-12-01 Thread Avi Kivity
On 12/01/2011 02:57 PM, Carsten Otte wrote:
> This patch introduces a new config option for user controlled kernel
> virtual machines. It introduces a new ioctl named
> KVM_S390_ENABLE_UCONTROL on the kvm file descriptor which allows for
> a one way transition from a regular kernel virtual machine to a
> user controlled virtual machine. The virtual machine must not have
> any memory slots installed, and no virtual cpus defined.
> Note that the user controlled virtual machines require CAP_SYS_ADMIN
> privileges.
>
> Signed-off-by: Carsten Otte 
> ---
> ---
>  arch/s390/kvm/Kconfig|9 +
>  arch/s390/kvm/kvm-s390.c |   30 ++
>  arch/s390/kvm/kvm-s390.h |   10 ++
>  include/linux/kvm.h  |3 +++
>  4 files changed, 52 insertions(+)

Documentation/virtual/kvm/api.txt +++

>  
> +int kvm_s390_enable_ucontrol(struct kvm *kvm)
> +{
> +#ifdef CONFIG_KVM_UCONTROL
> + int i;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
> +
> + for (i = 0; i < KVM_MAX_VCPUS; i++)
> + if (kvm->vcpus[i])
> + return -EINVAL;
> +
> + if (kvm->memslots->nmemslots)
> + return -EPERM;

Why different errors?


> +
> + if (kvm->arch.gmap)
> + gmap_free(kvm->arch.gmap);
> +
> + kvm->arch.gmap = NULL;

Locking?

What happens if a vcpu is created afterwards?

I guess you don't mind too much since this is a privileged interface for
a single purpose.

> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -654,6 +654,9 @@ struct kvm_clock_data {
>   struct kvm_userspace_memory_region)
>  #define KVM_SET_TSS_ADDR  _IO(KVMIO,   0x47)
>  #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
> +/* enable ucontrol for s390 */
> +#define KVM_S390_ENABLE_UCONTROL  _IO(KVMIO,  0x49)
> +
>  /* Device model IOC */
>  #define KVM_CREATE_IRQCHIP_IO(KVMIO,   0x60)
>  #define KVM_IRQ_LINE  _IOW(KVMIO,  0x61, struct kvm_irq_level)
>

#define KVM_CAP_S390_UCONTROL


-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 00/12] User controlled virtual machines

2011-12-01 Thread Avi Kivity
On 12/01/2011 02:57 PM, Carsten Otte wrote:
> Hi Avi, Hi Marcelo,
>
> this patch series introduces an interface to allow a privileged userspace
> program to control a KVM virtual machine. The interface is intended for
> use by a machine simulator called CECSIM that can simulate an entire
> mainframe machine with nested virtualization and I/O for the purpose
> of testing and debugging its firmware prior to availability of silicon.
> This patchset allows for concurrent use of the KVM device driver to drive
> both regular and user controlled virtual machines at the same time.
> I would kindly like to ask for review and inclusion of these patches.

So, this moves the responsibility of handling intercepts from the kernel
to userspace, yes?

How much of the kvm code continues to be active after this?  Perhaps it
makes sense to have a separate interface for this.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 12/12] From: Carsten Otte

2011-12-01 Thread Carsten Otte
This patch fixes the return code of kvm_arch_vcpu_ioctl in case
of an unkown ioctl number.

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -928,7 +928,7 @@ long kvm_arch_vcpu_ioctl(struct file *fi
break;
}
default:
-   r = -EINVAL;
+   r = -ENOTTY;
}
return r;
 }

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 08/12] [PATCH] kvm-s390-ucontrol: disable sca

2011-12-01 Thread Carsten Otte
This patch makes sure user controlled virtual machines do not use a
system control area (sca). This is needed in order to create
virtual machines with more cpus than the size of the sca [64].

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -244,10 +244,13 @@ out_err:
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
VCPU_EVENT(vcpu, 3, "%s", "free cpu");
-   clear_bit(63 - vcpu->vcpu_id, (unsigned long *) 
&vcpu->kvm->arch.sca->mcn);
-   if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda ==
-   (__u64) vcpu->arch.sie_block)
-   vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
+   if (!kvm_is_ucontrol(vcpu->kvm)) {
+   clear_bit(63 - vcpu->vcpu_id,
+ (unsigned long *) &vcpu->kvm->arch.sca->mcn);
+   if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda ==
+   (__u64) vcpu->arch.sie_block)
+   vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
+   }
smp_mb();
 
if (kvm_is_ucontrol(vcpu->kvm))
@@ -384,12 +387,19 @@ struct kvm_vcpu *kvm_arch_vcpu_create(st
goto out_free_cpu;
 
vcpu->arch.sie_block->icpua = id;
-   BUG_ON(!kvm->arch.sca);
-   if (!kvm->arch.sca->cpu[id].sda)
-   kvm->arch.sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
-   vcpu->arch.sie_block->scaoh = (__u32)(((__u64)kvm->arch.sca) >> 32);
-   vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca;
-   set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn);
+   if (!kvm_is_ucontrol(kvm)) {
+   if (!kvm->arch.sca) {
+   WARN_ON_ONCE(1);
+   goto out_free_cpu;
+   }
+   if (!kvm->arch.sca->cpu[id].sda)
+   kvm->arch.sca->cpu[id].sda =
+   (__u64) vcpu->arch.sie_block;
+   vcpu->arch.sie_block->scaoh =
+   (__u32)(((__u64)kvm->arch.sca) >> 32);
+   vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca;
+   set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn);
+   }
 
spin_lock_init(&vcpu->arch.local_int.lock);
INIT_LIST_HEAD(&vcpu->arch.local_int.list);

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 05/12] [PATCH] kvm-s390-ucontrol: disable in-kernel handling of SIE intercepts

2011-12-01 Thread Carsten Otte
This patch disables in-kernel handling of SIE intercepts for user
controlled virtual machines. All intercepts are passed to userspace
via KVM_EXIT_SIE exit reason just like SIE intercepts that cannot be
handled in-kernel for regular KVM guests.

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -582,7 +582,10 @@ rerun_vcpu:
rc = __vcpu_run(vcpu);
if (rc)
break;
-   rc = kvm_handle_sie_intercept(vcpu);
+   if (kvm_is_ucontrol(vcpu->kvm))
+   rc = -EOPNOTSUPP;
+   else
+   rc = kvm_handle_sie_intercept(vcpu);
} while (!signal_pending(current) && !rc);
 
if (rc == SIE_INTERCEPT_RERUNVCPU)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 09/12] [PATCH] kvm-s390: fix assumption for KVM_MAX_VCPUS

2011-12-01 Thread Carsten Otte
This patch fixes definition of the idle_mask and the local_int array
in kvm_s390_float_interrupt. Previous definition had 64 cpus max
hardcoded instead of using KVM_MAX_VCPUS.

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h
===
--- linux-2.5-cecsim.orig/arch/s390/include/asm/kvm_host.h
+++ linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h
@@ -222,8 +222,9 @@ struct kvm_s390_float_interrupt {
struct list_head list;
atomic_t active;
int next_rr_cpu;
-   unsigned long idle_mask [(64 + sizeof(long) - 1) / sizeof(long)];
-   struct kvm_s390_local_interrupt *local_int[64];
+   unsigned long idle_mask[(KVM_MAX_VCPUS + sizeof(long) - 1)
+   / sizeof(long)];
+   struct kvm_s390_local_interrupt *local_int[KVM_MAX_VCPUS];
 };
 
 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user

2011-12-01 Thread Carsten Otte
This patch exports the SIE hardware control block to userspace
via the mapping of the vcpu file descriptor.

Signed-off-by: Carsten Otte 
---
---
 arch/s390/include/asm/kvm_host.h |2 ++
 virt/kvm/kvm_main.c  |5 +
 2 files changed, 7 insertions(+)

--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -24,6 +24,8 @@
 /* memory slots that does not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 4
 
+#define KVM_SIE_PAGE_OFFSET 1
+
 struct sca_entry {
atomic_t scn;
__u32   reserved;
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1656,6 +1656,11 @@ static int kvm_vcpu_fault(struct vm_area
else if (vmf->pgoff == KVM_COALESCED_MMIO_PAGE_OFFSET)
page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
 #endif
+#if defined(CONFIG_S390) && defined(CONFIG_KVM_UCONTROL)
+   else if ((vmf->pgoff == KVM_SIE_PAGE_OFFSET)
+&& (!(vcpu->kvm->arch.gmap)))
+   page = virt_to_page(vcpu->arch.sie_block);
+#endif
else
return VM_FAULT_SIGBUS;
get_page(page);

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 11/12] [PATCH] kvm-s390-ucontrol: announce capability for user controlled vms

2011-12-01 Thread Carsten Otte
This patch announces a new capability KVM_CAP_S390_UCONTROL that
indicates that kvm can now support virtual machines that are
controlled by userspace.

Signed-off-by: Carsten Otte 
---
---
 arch/s390/kvm/kvm-s390.c |3 +++
 include/linux/kvm.h  |1 +
 2 files changed, 4 insertions(+)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -260,6 +260,9 @@ int kvm_dev_ioctl_check_extension(long e
case KVM_CAP_S390_PSW:
case KVM_CAP_S390_GMAP:
case KVM_CAP_SYNC_MMU:
+#ifdef CONFIG_KVM_UCONTROL
+   case KVM_CAP_S390_UCONTROL:
+#endif
r = 1;
break;
default:
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -570,6 +570,7 @@ struct kvm_s390_keyop {
 #define KVM_CAP_MAX_VCPUS 66   /* returns max vcpus per vm */
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
+#define KVM_CAP_S390_UCONTROL 72
 
 #ifdef KVM_CAP_IRQ_ROUTING
 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 03/12] [PATCH] kvm-s390-ucontrol: export page faults to user

2011-12-01 Thread Carsten Otte
This patch introduces a new exit reason in the kvm_run structure
named KVM_EXIT_UCONTROL. This exit indicates, that a virtual cpu
has regognized a fault on the host page table. The idea is that
userspace can handle this fault by mapping memory at the fault
location into the cpu's address space and then continue to run the
virtual cpu.

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -509,8 +509,10 @@ int kvm_arch_vcpu_ioctl_set_mpstate(stru
return -EINVAL; /* not implemented yet */
 }
 
-static void __vcpu_run(struct kvm_vcpu *vcpu)
+static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
+   int rc;
+
memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16);
 
if (need_resched())
@@ -527,9 +529,15 @@ static void __vcpu_run(struct kvm_vcpu *
local_irq_enable();
VCPU_EVENT(vcpu, 6, "entering sie flags %x",
   atomic_read(&vcpu->arch.sie_block->cpuflags));
-   if (sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs)) {
-   VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
-   kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+   rc = sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs);
+   if (rc) {
+   if (kvm_is_ucontrol(vcpu->kvm)) {
+   rc = SIE_INTERCEPT_UCONTROL;
+   } else {
+   VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
+   kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+   rc = 0;
+   }
}
VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
   vcpu->arch.sie_block->icptcode);
@@ -538,6 +546,7 @@ static void __vcpu_run(struct kvm_vcpu *
local_irq_enable();
 
memcpy(&vcpu->arch.guest_gprs[14], &vcpu->arch.sie_block->gg14, 16);
+   return rc;
 }
 
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
@@ -558,6 +567,7 @@ rerun_vcpu:
case KVM_EXIT_UNKNOWN:
case KVM_EXIT_INTR:
case KVM_EXIT_S390_RESET:
+   case KVM_EXIT_UCONTROL:
break;
default:
BUG();
@@ -569,7 +579,9 @@ rerun_vcpu:
might_fault();
 
do {
-   __vcpu_run(vcpu);
+   rc = __vcpu_run(vcpu);
+   if (rc)
+   break;
rc = kvm_handle_sie_intercept(vcpu);
} while (!signal_pending(current) && !rc);
 
@@ -581,6 +593,16 @@ rerun_vcpu:
rc = -EINTR;
}
 
+#ifdef CONFIG_KVM_UCONTROL
+   if (rc == SIE_INTERCEPT_UCONTROL) {
+   kvm_run->exit_reason = KVM_EXIT_UCONTROL;
+   kvm_run->s390_ucontrol.trans_exc_code =
+   current->thread.gmap_addr;
+   kvm_run->s390_ucontrol.pgm_code = 0x10;
+   rc = 0;
+   }
+#endif
+
if (rc == -EOPNOTSUPP) {
/* intercept cannot be handled in-kernel, prepare kvm-run */
kvm_run->exit_reason = KVM_EXIT_S390_SIEIC;
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.h
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.h
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.h
@@ -26,6 +26,7 @@ typedef int (*intercept_handler_t)(struc
 
 /* negativ values are error codes, positive values for internal conditions */
 #define SIE_INTERCEPT_RERUNVCPU(1<<0)
+#define SIE_INTERCEPT_UCONTROL (1<<1)
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu);
 
 #define VM_EVENT(d_kvm, d_loglevel, d_string, d_args...)\
Index: linux-2.5-cecsim/include/linux/kvm.h
===
--- linux-2.5-cecsim.orig/include/linux/kvm.h
+++ linux-2.5-cecsim/include/linux/kvm.h
@@ -162,6 +162,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
 #define KVM_EXIT_PAPR_HCALL  19
+#define KVM_EXIT_UCONTROL20
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -249,6 +250,11 @@ struct kvm_run {
 #define KVM_S390_RESET_CPU_INIT  8
 #define KVM_S390_RESET_IPL   16
__u64 s390_reset_flags;
+   /* KVM_EXIT_UCONTROL */
+   struct {
+   __u64 trans_exc_code;
+   __u32 pgm_code;
+   } s390_ucontrol;
/* KVM_EXIT_DCR */
struct {
__u32 dcrn;

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 10/12] [PATCH] kvm-s390: storage key interface

2011-12-01 Thread Carsten Otte
This patch introduces an interface to access the guest visible
storage keys. It supports three operations that model the behavior
that SSKE/ISKE/RRBE instructions would have if they were issued by
the guest. These instructions are all documented in the z architecture
principles of operation book.

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h
===
--- linux-2.5-cecsim.orig/arch/s390/include/asm/kvm_host.h
+++ linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h
@@ -25,6 +25,9 @@
 #define KVM_PRIVATE_MEM_SLOTS 4
 
 #define KVM_SIE_PAGE_OFFSET 1
+#define KVM_S390_KEYOP_SSKE 0x01
+#define KVM_S390_KEYOP_ISKE 0x02
+#define KVM_S390_KEYOP_RRBE 0x03
 
 struct sca_entry {
atomic_t scn;
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -112,13 +112,144 @@ void kvm_arch_exit(void)
 {
 }
 
+pte_t *ptep_for_addr(unsigned long addr)
+{
+   struct vm_area_struct *vma;
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *rc;
+
+   down_read(¤t->mm->mmap_sem);
+
+   rc = NULL;
+   vma = find_vma(current->mm, addr);
+   if (!vma)
+   goto up_out;
+
+   pgd = pgd_offset(current->mm, addr);
+   pud = pud_alloc(current->mm, pgd, addr);
+   if (!pud)
+   goto up_out;
+
+   pmd = pmd_alloc(current->mm, pud, addr);
+   if (!pmd)
+   goto up_out;
+
+   if (!pmd_present(*pmd) &&
+   __pte_alloc(current->mm, vma, pmd, addr))
+   goto up_out;
+
+   rc = pte_offset(pmd, addr);
+up_out:
+   up_read(¤t->mm->mmap_sem);
+   return rc;
+}
+
+static long kvm_s390_keyop(struct kvm_s390_keyop *kop)
+{
+   unsigned long addr = kop->user_addr;
+   pte_t *ptep;
+   pgste_t pgste;
+   int r;
+   unsigned long skey;
+   unsigned long bits;
+
+   /* make sure this process is a hypervisor */
+   r = -EINVAL;
+   if (!mm_has_pgste(current->mm))
+   goto out;
+
+   r = -ENXIO;
+   if (addr >= PGDIR_SIZE)
+   goto out;
+
+   spin_lock(¤t->mm->page_table_lock);
+   ptep = ptep_for_addr(addr);
+   if (!ptep)
+   goto out_unlock;
+
+   pgste = pgste_get_lock(ptep);
+
+   switch (kop->operation) {
+   case KVM_S390_KEYOP_SSKE:
+   pgste = pgste_update_all(ptep, pgste);
+   /* set the real key back w/o rc bits */
+   skey = kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT);
+   if (pte_present(*ptep))
+   page_set_storage_key(pte_val(*ptep), skey, 1);
+   /* put acc+f plus guest refereced and changed into the pgste */
+   pgste_val(pgste) &= ~(RCP_ACC_BITS | RCP_FP_BIT | RCP_GR_BIT
+| RCP_GC_BIT);
+   bits = (kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT));
+   pgste_val(pgste) |= bits << 56;
+   bits = (kop->key & (_PAGE_CHANGED | _PAGE_REFERENCED));
+   pgste_val(pgste) |= bits << 48;
+   r = 0;
+   break;
+   case KVM_S390_KEYOP_ISKE:
+   if (pte_present(*ptep)) {
+   skey = page_get_storage_key(pte_val(*ptep));
+   kop->key = skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT);
+   } else {
+   skey = 0;
+   kop->key = (pgste_val(pgste) >> 56) &
+  (_PAGE_ACC_BITS | _PAGE_FP_BIT);
+   }
+   kop->key |= skey & (_PAGE_CHANGED | _PAGE_REFERENCED);
+   kop->key |= (pgste_val(pgste) >> 48) &
+   (_PAGE_CHANGED | _PAGE_REFERENCED);
+   r = 0;
+   break;
+   case KVM_S390_KEYOP_RRBE:
+   pgste = pgste_update_all(ptep, pgste);
+   kop->key = 0;
+   if (pgste_val(pgste) & RCP_GR_BIT)
+   kop->key |= _PAGE_REFERENCED;
+   pgste_val(pgste) &= ~RCP_GR_BIT;
+   r = 0;
+   break;
+   default:
+   r = -EINVAL;
+   }
+   pgste_set_unlock(ptep, pgste);
+
+out_unlock:
+   spin_unlock(¤t->mm->page_table_lock);
+out:
+   return r;
+}
+
 /* Section: device related */
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
 {
-   if (ioctl == KVM_S390_ENABLE_SIE)
-   return s390_enable_sie();
-   return -EINVAL;
+   void __user *argp = (void __user *)arg;
+   int r;
+
+   switch (ioctl) {
+   case KVM_S390_ENABLE_SIE:
+   r = s390_enable_sie();
+   break;
+   case KVM_S390_KEYOP: {
+   struct kvm_s390_keyop kop;
+  

[patch 06/12] [PATCH] kvm-s390-ucontrol: disable in-kernel irq stack

2011-12-01 Thread Carsten Otte
This patch disables the in-kernel interrupt stack for KVM virtual
machines that are controlled by user. Userspace has to take care
of handling interrupts on its own.

Signed-off-by: Carsten Otte 
---
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -521,7 +521,8 @@ static int __vcpu_run(struct kvm_vcpu *v
if (test_thread_flag(TIF_MCCK_PENDING))
s390_handle_mcck();
 
-   kvm_s390_deliver_pending_interrupts(vcpu);
+   if (!kvm_is_ucontrol(vcpu->kvm))
+   kvm_s390_deliver_pending_interrupts(vcpu);
 
vcpu->arch.sie_block->icptcode = 0;
local_irq_disable();

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 00/12] User controlled virtual machines

2011-12-01 Thread Carsten Otte
Hi Avi, Hi Marcelo,

this patch series introduces an interface to allow a privileged userspace
program to control a KVM virtual machine. The interface is intended for
use by a machine simulator called CECSIM that can simulate an entire
mainframe machine with nested virtualization and I/O for the purpose
of testing and debugging its firmware prior to availability of silicon.
This patchset allows for concurrent use of the KVM device driver to drive
both regular and user controlled virtual machines at the same time.
I would kindly like to ask for review and inclusion of these patches.

with kind regards,
Carsten

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 07/12] [PATCH] kvm-s390-ucontrol: interface to inject faults on a vcpu page table

2011-12-01 Thread Carsten Otte
This patch allows the user to fault in pages on a virtual cpus
address space for user controlled virtual machines. Typically this
is superfluous because userspace can just create a mapping and
let the kernel's page fault logic take are of it. There is one
exception: SIE won't start if the lowcore is not present. Normally
the kernel takes care of this [handle_validity() in
arch/s390/kvm/intercept.c] but since the kernel does not handle
intercepts for user controlled virtual machines, userspace needs to
be able to handle this condition.

Signed-off-by: Carsten Otte 
---
---
 arch/s390/kvm/kvm-s390.c |6 ++
 include/linux/kvm.h  |1 +
 2 files changed, 7 insertions(+)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -777,6 +777,12 @@ long kvm_arch_vcpu_ioctl(struct file *fi
break;
}
 #endif
+   case KVM_S390_VCPU_FAULT: {
+   r = gmap_fault(arg, vcpu->arch.gmap);
+   if (!IS_ERR_VALUE(r))
+   r = 0;
+   break;
+   }
default:
r = -EINVAL;
}
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -669,6 +669,7 @@ struct kvm_s390_ucas_mapping {
 #define KVM_S390_ENABLE_UCONTROL  _IO(KVMIO,  0x49)
 #define KVM_S390_UCAS_MAP_IOW(KVMIO, 0x50, struct 
kvm_s390_ucas_mapping)
 #define KVM_S390_UCAS_UNMAP  _IOW(KVMIO, 0x51, struct 
kvm_s390_ucas_mapping)
+#define KVM_S390_VCPU_FAULT _IOW(KVMIO, 0x52, unsigned long)
 
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP_IO(KVMIO,   0x60)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 01/12] [PATCH] kvm-s390: ioctl to switch to user controlled virtual machines

2011-12-01 Thread Carsten Otte
This patch introduces a new config option for user controlled kernel
virtual machines. It introduces a new ioctl named
KVM_S390_ENABLE_UCONTROL on the kvm file descriptor which allows for
a one way transition from a regular kernel virtual machine to a
user controlled virtual machine. The virtual machine must not have
any memory slots installed, and no virtual cpus defined.
Note that the user controlled virtual machines require CAP_SYS_ADMIN
privileges.

Signed-off-by: Carsten Otte 
---
---
 arch/s390/kvm/Kconfig|9 +
 arch/s390/kvm/kvm-s390.c |   30 ++
 arch/s390/kvm/kvm-s390.h |   10 ++
 include/linux/kvm.h  |3 +++
 4 files changed, 52 insertions(+)

--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -34,6 +34,15 @@ config KVM
 
  If unsure, say N.
 
+config KVM_UCONTROL
+   bool "Userspace controlled virtual machines"
+   depends on KVM
+   ---help---
+ Allow CAP_SYS_ADMIN users to create KVM virtual machines that are
+ controlled by userspace.
+
+ If unsure, say N.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/vhost/Kconfig
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -147,6 +147,32 @@ int kvm_vm_ioctl_get_dirty_log(struct kv
return 0;
 }
 
+int kvm_s390_enable_ucontrol(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_UCONTROL
+   int i;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   for (i = 0; i < KVM_MAX_VCPUS; i++)
+   if (kvm->vcpus[i])
+   return -EINVAL;
+
+   if (kvm->memslots->nmemslots)
+   return -EPERM;
+
+   if (kvm->arch.gmap)
+   gmap_free(kvm->arch.gmap);
+
+   kvm->arch.gmap = NULL;
+
+   return 0;
+#else
+   return -ENOTTY;
+#endif
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
@@ -164,6 +190,10 @@ long kvm_arch_vm_ioctl(struct file *filp
r = kvm_s390_inject_vm(kvm, &s390int);
break;
}
+   case KVM_S390_ENABLE_UCONTROL: {
+   r = kvm_s390_enable_ucontrol(kvm);
+   break;
+   }
default:
r = -ENOTTY;
}
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -47,6 +47,16 @@ static inline int __cpu_is_stopped(struc
return atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_STOP_INT;
 }
 
+static inline int kvm_is_ucontrol(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_UCONTROL
+   if (kvm->arch.gmap)
+   return 0;
+   return 1;
+#else
+   return 0;
+#endif
+}
 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
 enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer);
 void kvm_s390_tasklet(unsigned long parm);
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -654,6 +654,9 @@ struct kvm_clock_data {
struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR  _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+/* enable ucontrol for s390 */
+#define KVM_S390_ENABLE_UCONTROL  _IO(KVMIO,  0x49)
+
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP_IO(KVMIO,   0x60)
 #define KVM_IRQ_LINE  _IOW(KVMIO,  0x61, struct kvm_irq_level)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch 02/12] [PATCH] kvm-s390-ucontrol: per vcpu address spaces

2011-12-01 Thread Carsten Otte
This patch introduces two ioctls for virtual cpus, that are only
valid for kernel virtual machines that are controlled by userspace.
Each virtual cpu has its individual address space in this mode of
operation, and each address space is backed by the gmap
implementation just like the address space for regular KVM guests.
KVM_S390_UCAS_MAP allows to map a part of the user's virtual address
space to the vcpu. Starting offset and length in both the user and
the vcpu address space need to be aligned to 1M.
KVM_S390_UCAS_UNMAP can be used to unmap a range of memory from a
virtual cpu in a similar way.

Signed-off-by: Carsten Otte 
---
---
 arch/s390/kvm/kvm-s390.c |   50 ++-
 include/linux/kvm.h  |7 ++
 2 files changed, 56 insertions(+), 1 deletion(-)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -249,6 +249,10 @@ void kvm_arch_vcpu_destroy(struct kvm_vc
(__u64) vcpu->arch.sie_block)
vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
smp_mb();
+
+   if (kvm_is_ucontrol(vcpu->kvm))
+   gmap_free(vcpu->arch.gmap);
+
free_page((unsigned long)(vcpu->arch.sie_block));
kvm_vcpu_uninit(vcpu);
kfree(vcpu);
@@ -279,12 +283,20 @@ void kvm_arch_destroy_vm(struct kvm *kvm
kvm_free_vcpus(kvm);
free_page((unsigned long)(kvm->arch.sca));
debug_unregister(kvm->arch.dbf);
-   gmap_free(kvm->arch.gmap);
+   if (!kvm_is_ucontrol(kvm))
+   gmap_free(kvm->arch.gmap);
 }
 
 /* Section: vcpu related */
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
+   if (kvm_is_ucontrol(vcpu->kvm)) {
+   vcpu->arch.gmap = gmap_alloc(current->mm);
+   if (!vcpu->arch.gmap)
+   return -ENOMEM;
+   return 0;
+   }
+
vcpu->arch.gmap = vcpu->kvm->arch.gmap;
return 0;
 }
@@ -703,6 +715,42 @@ long kvm_arch_vcpu_ioctl(struct file *fi
case KVM_S390_INITIAL_RESET:
r = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
break;
+#ifdef CONFIG_KVM_UCONTROL
+   case KVM_S390_UCAS_MAP: {
+   struct kvm_s390_ucas_mapping ucasmap;
+
+   if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) {
+   r = -EFAULT;
+   break;
+   }
+
+   if (!kvm_is_ucontrol(vcpu->kvm)) {
+   r = -EINVAL;
+   break;
+   }
+
+   r = gmap_map_segment(vcpu->arch.gmap, ucasmap.user_addr,
+ucasmap.vcpu_addr, ucasmap.length);
+   break;
+   }
+   case KVM_S390_UCAS_UNMAP: {
+   struct kvm_s390_ucas_mapping ucasmap;
+
+   if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) {
+   r = -EFAULT;
+   break;
+   }
+
+   if (!kvm_is_ucontrol(vcpu->kvm)) {
+   r = -EINVAL;
+   break;
+   }
+
+   r = gmap_unmap_segment(vcpu->arch.gmap, ucasmap.vcpu_addr,
+   ucasmap.length);
+   break;
+   }
+#endif
default:
r = -EINVAL;
}
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -655,7 +655,14 @@ struct kvm_clock_data {
 #define KVM_SET_TSS_ADDR  _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
 /* enable ucontrol for s390 */
+struct kvm_s390_ucas_mapping {
+   unsigned long user_addr;
+   unsigned long vcpu_addr;
+   unsigned long length;
+};
 #define KVM_S390_ENABLE_UCONTROL  _IO(KVMIO,  0x49)
+#define KVM_S390_UCAS_MAP_IOW(KVMIO, 0x50, struct 
kvm_s390_ucas_mapping)
+#define KVM_S390_UCAS_UNMAP  _IOW(KVMIO, 0x51, struct 
kvm_s390_ucas_mapping)
 
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP_IO(KVMIO,   0x60)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] PCI: Rework config space locking, add INTx masking services

2011-12-01 Thread Jan Kiszka
On 2011-11-04 09:45, Jan Kiszka wrote:
> [ Rebase of v1 over yesterday's linux-next ]
> 
> This series tries to heal the currently broken locking scheme around PCI
> config space accesses.
> 
> We have an interface lock out access via sysfs, but that service wrongly
> assumes it is only called by one instance at a time for some device. So
> two loops doing
> 
> echo 1 > /sys/bus/pci/devices//reset
> 
> in parallel will trigger a kernel BUG at the moment.
> 
> Besides synchronizing with user space, we also need to manage config
> space access of generic PCI drivers. They need to mask legacy interrupt
> lines while the specific driver runs in user space or a guest OS.
> 
> The approach taken here is provide mutex-like locking for general
> access - which still requires a special mechanism due to requirements of
> the IBM Power RAID SCSI driver. Furthermore, INTx masking is now
> available via the PCI core and synchronized via the internal pci_lock.
> 
> Jan Kiszka (3):
>   pci: Rework config space blocking services
>   pci: Introduce INTx check & mask API
>   uio: Convert uio_generic_pci to new intx masking API
> 
>  drivers/pci/access.c  |   76 +--
>  drivers/pci/iov.c |   12 ++--
>  drivers/pci/pci.c |  114 
> -
>  drivers/pci/pci.h |2 +
>  drivers/scsi/ipr.c|   67 +---
>  drivers/scsi/ipr.h|1 +
>  drivers/uio/uio_pci_generic.c |   76 ++-
>  include/linux/pci.h   |   17 --
>  8 files changed, 248 insertions(+), 117 deletions(-)
> 

I just received yet another request regarding the KVM feature that
depends on this. What's the status of these patches? Were they merged
into some staging tree already (didn't find any traces so far)?

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Thanks

2011-12-01 Thread Song Li
l am a Staff of Hang Seng Bank HongKong, I do not know if we can work together 
in transferring $19,500,000.USD from my bank. Finally if you are interested I 
shall provide you with more details. my private Email: mrsong.l...@yahoo.cn
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Thanks

2011-12-01 Thread Song Li
l am a Staff of Hang Seng Bank HongKong, I do not know if we can work together 
in transferring $19,500,000.USD from my bank. Finally if you are interested I 
shall provide you with more details. my private Email: mrsong.l...@yahoo.cn
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2011-12-01 Thread Avi Kivity
On 11/30/2011 10:59 AM, Raghavendra K T wrote:
> Add a hypercall to KVM hypervisor to support pv-ticketlocks 
>
> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt 
> state.
> 
> The presence of these hypercalls is indicated to guest via
> KVM_FEATURE_KICK_VCPU/KVM_CAP_KICK_VCPU.
>
> Qemu needs a corresponding patch to pass up the presence of this feature to 
> guest via cpuid. Patch to qemu will be sent separately.
>
> There is no Xen/KVM hypercall interface to await kick from.

The hypercall needs to be documented in
Documentation/virtual/kvm/hypercalls.txt.

Have you tested it on AMD machines?  There are some differences in the
hypercall infrastructure there.

>  /* This indicates that the new set of kvmclock msrs
>   * are available. The use of 0x11 and 0x12 is deprecated
>   */
>  #define KVM_FEATURE_CLOCKSOURCE23
>  #define KVM_FEATURE_ASYNC_PF 4
>  #define KVM_FEATURE_STEAL_TIME   5
> +#define KVM_FEATURE_KICK_VCPU6

Documentation/virtual/kvm/cpuid.txt.

>  
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c38efd7..6e1c8b4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2103,6 +2103,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>   case KVM_CAP_XSAVE:
>   case KVM_CAP_ASYNC_PF:
>   case KVM_CAP_GET_TSC_KHZ:
> + case KVM_CAP_KICK_VCPU:

This is redundant with the feature bit?  In general, KVM_CAP is for the
host API, while KVM_FEATURE is for the guest API.

>  
> +/*
> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
> + *
> + * @cpu - vcpu to be kicked.
> + */
> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
> +{
> + struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);

There is no guarantee that guest cpu numbers match host vcpu numbers. 
Use APIC IDs, and kvm_apic_match_dest().

> + struct kvm_mp_state mp_state;
> +
> + mp_state.mp_state = KVM_MP_STATE_RUNNABLE;
> + if (vcpu) {
> + vcpu->kicked = 1;
> + /* Ensure kicked is always set before wakeup */
> + barrier();
> + }
> + kvm_arch_vcpu_ioctl_set_mpstate(vcpu, &mp_state);

This must only be called from the vcpu thread.

> + kvm_vcpu_kick(vcpu);
> +}
> +
>  
>   struct kvm_vcpu_arch arch;
> +
> + /*
> +  * blocked vcpu wakes up by checking this flag set by unlocker.
> +  */
> + int kicked;
>

Write only variable.

An alternative approach is to use an MSR protocol like x2apic ICR.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Process virito blk requests in separate thread

2011-12-01 Thread Pekka Enberg
On Thu, Dec 1, 2011 at 10:58 AM, Sasha Levin  wrote:
> I was looking into the concept of adding 'dedicated' threads to the
> threadpool, since when the threadpool was added originally one of the
> purposes was to have all worker threads in a single place.
>
> This way we could still have threads in one place, which would make it
> easier to control all of them but still won't hurt performance of those
> threads.
>
> We can merge this patch, and I'll do the dedicated thread patch on top
> of it.

Nah. Lets add the new API and change this patch to use it instead.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-12-01 Thread Ian Campbell
On Wed, 2011-11-30 at 18:15 +, Arnd Bergmann wrote:
> On Wednesday 30 November 2011, Ian Campbell wrote:
> > On Wed, 2011-11-30 at 14:32 +, Arnd Bergmann wrote:
> > > On Wednesday 30 November 2011, Ian Campbell wrote:
> > > What I suggested to the KVM developers is to start out with the
> > > vexpress platform, but then generalize it to the point where it fits
> > > your needs. All hardware that one expects a guest to have (GIC, timer,
> > > ...) will still show up in the same location as on a real vexpress,
> > > while anything that makes no sense or is better paravirtualized (LCD,
> > > storage, ...) just becomes optional and has to be described in the
> > > device tree if it's actually there.
> > 
> > That's along the lines of what I was thinking as well.
> > 
> > The DT contains the address of GIC, timer etc as well right? So at least
> > in principal we needn't provide e.g. the GIC at the same address as any
> > real platform but in practice I expect we will.
> 
> Yes.
> 
> > In principal we could also offer the user options as to which particular
> > platform a guest looks like.
> 
> At least when using a qemu based simulation. Most platforms have some
> characteristics that are not meaningful in a classic virtualization
> scenario, but it would certainly be helpful to use the virtualization
> extensions to run a kernel that was built for a particular platform
> faster than with pure qemu, when you want to test that kernel image.
> 
> It has been suggested in the past that it would be nice to run the
> guest kernel built for the same platform as the host kernel by
> default, but I think it would be much better to have just one
> platform that we end up using for guests on any host platform,
> unless there is a strong reason to do otherwise.

Yes, I agree, certainly that is what we were planning to target in the
first instance. Doing this means that we can get away with minimal
emulation of actual hardware, relying instead on PV drivers or hardware
virtualisation features.

Supporting specific board platforms as guests would be nice to have
eventually. We would need to do more emulation (e.g. running qemu as a
device model) for that case.

> There is also ongoing restructuring in the ARM Linux kernel to
> allow running the same kernel binary on multiple platforms. While
> there is still a lot of work to be done, you should assume that
> we will finish it before you see lots of users in production, there
> is no need to plan for the current one-kernel-per-board case.

We were absolutely banking on targeting the results of this work, so
that's good ;-)

Ian.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] KVM: MMU: audit: replace mmu audit tracepoint with jump-lable

2011-12-01 Thread Avi Kivity
On 11/30/2011 11:43 AM, Xiao Guangrong wrote:
> Subject: [PATCH] KVM: MMU: audit: inline audit function
>
> inline audit function and little cleanup
>
>

Thanks, applied.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Embeddedxen-devel] [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions

2011-12-01 Thread Ian Campbell
On Wed, 2011-11-30 at 18:32 +, Stefano Stabellini wrote:
> On Wed, 30 Nov 2011, Arnd Bergmann wrote:

> 
> > KVM and Xen at least both fall into the single-return-value category,
> > so we should be able to agree on a calling conventions. KVM does not
> > have an hcall API on ARM yet, and I see no reason not to use the
> > same implementation that you have in the Xen guest.
> > 
> > Stefano, can you split out the generic parts of your asm/xen/hypercall.h
> > file into a common asm/hypercall.h and submit it for review to the
> > arm kernel list?
> 
> Sure, I can do that.
> Usually the hypercall calling convention is very hypervisor specific,
> but if it turns out that we have the same requirements I happy to design
> a common interface.

I expect the only real decision to be made is hypercall page vs. raw hvc
instruction.

The page was useful on x86 where there is a variety of instructions
which could be used (at least for PV there was systenter/syscall/int, I
think vmcall instruction differs between AMD and Intel also) and gives
some additional flexibility. It's hard to predict but I don't think I'd
expect that to be necessary on ARM.

Another reason for having a hypercall page instead of a raw instruction
might be wanting to support 32 bit guests (from ~today) on a 64 bit
hypervisor in the future and perhaps needing to do some shimming/arg
translation. It would be better to aim for having the interface just be
32/64 agnostic but mistakes do happen.

Ian.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors

2011-12-01 Thread Michael S. Tsirkin
On Thu, Dec 01, 2011 at 10:09:37AM +0200, Sasha Levin wrote:
> On Thu, 2011-12-01 at 09:58 +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 01, 2011 at 01:12:25PM +1030, Rusty Russell wrote:
> > > On Wed, 30 Nov 2011 18:11:51 +0200, Sasha Levin  
> > > wrote:
> > > > On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> > > > > On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > > > > > > 
> > > > > > > Which is actually strange, weren't indirect buffers introduced to 
> > > > > > > make
> > > > > > > the performance *better*? From what I see it's pretty much the
> > > > > > > same/worse for virtio-blk.
> > > > > >
> > > > > > I know they were introduced to allow adding very large bufs.
> > > > > > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > > > > > Mark, you wrote the patch, could you tell us which workloads
> > > > > > benefit the most from indirect bufs?
> > > > > >
> > > > > 
> > > > > Indirects are really for block devices with many spindles, since there
> > > > > the limiting factor is the number of requests in flight.  Network
> > > > > interfaces are limited by bandwidth, it's better to increase the ring
> > > > > size and use direct buffers there (so the ring size more or less
> > > > > corresponds to the buffer size).
> > > > > 
> > > > 
> > > > I did some testing of indirect descriptors under different workloads.
> > > 
> > > MST and I discussed getting clever with dynamic limits ages ago, but it
> > > was down low on the TODO list.  Thanks for diving into this...
> > > 
> > > AFAICT, if the ring never fills, direct is optimal.  When the ring
> > > fills, indirect is optimal (we're better to queue now than later).
> > > 
> > > Why not something simple, like a threshold which drops every time we
> > > fill the ring?
> > > 
> > > struct vring_virtqueue
> > > {
> > > ...
> > > int indirect_thresh;
> > > ...
> > > }
> > > 
> > > virtqueue_add_buf_gfp()
> > > {
> > > ...
> > > 
> > > if (vq->indirect &&
> > > (vq->vring.num - vq->num_free) + out + in > 
> > > vq->indirect_thresh)
> > > return indirect()
> > > ...
> > > 
> > >   if (vq->num_free < out + in) {
> > > if (vq->indirect && vq->indirect_thresh > 0)
> > > vq->indirect_thresh--;
> > > 
> > > ...
> > > }
> > > 
> > > Too dumb?
> > > 
> > > Cheers,
> > > Rusty.
> > 
> > We'll presumably need some logic to increment is back,
> > to account for random workload changes.
> > Something like slow start?
> 
> We can increment it each time the queue was less than 10% full, it
> should act like slow start, no?

No, we really shouldn't get an empty ring as long as things behave
well. What I meant is something like:

#define VIRTIO_DECREMENT 2
#define VIRTIO_INCREMENT 1
if (vq->num_free < out + in) {
 if (vq->indirect && vq->indirect_thresh > VIRTIO_DECREMENT)
 vq->indirect_thresh /= VIRTIO_DECREMENT;
} else {
if (vq->indirect_thresh < vq->num)
vq->indirect_thresh += VIRTIO_INCREMENT;
}

So we try to avoid indirect but the moment there's no space, we decrease
the threshold drastically.  If you make the increment/decrement module
parameters it's easy to try different values.


> -- 
> 
> Sasha.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call minutes for November 29

2011-12-01 Thread Avi Kivity
On 11/29/2011 09:10 PM, Markus Armbruster wrote:
> Avi Kivity  writes:
>
> > On 11/29/2011 05:51 PM, Juan Quintela wrote:
> >> How to do high level stuff?
> >> - python?
> >>
> >
> > One of the disadvantages of the various scripting languages is the lack
> > of static type checking, which makes it harder to do full sweeps of the
> > source for API changes, relying on the compiler to catch type (or other)
> > errors.
> >
> > On the other hand, the statically typed languages usually have more
> > boilerplate.  Since one of the goals is to simplify things, this
> > indicates the need for a language with type inference.
> >
> > On the third hand, languages with type inferences are still immature
> > (golang?), so we probably need to keep this discussion going until an
> > obvious choice presents itself.
>
> I wouldn't call ML immature.  But I wouldn't call it a scripting
> language, either.

It was just off the radar for me.  We should consider it, by all means.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] virtio: use mandatory barriers for remote processor vdevs

2011-12-01 Thread Michael S. Tsirkin
On Thu, Dec 01, 2011 at 08:14:26AM +0200, Ohad Ben-Cohen wrote:
> On Thu, Dec 1, 2011 at 1:13 AM, Michael S. Tsirkin  wrote:
> > For x86, stores into memory are ordered. So I think that yes, smp_XXX
> > can be selected at compile time.
> 
> But then you can't use the same kernel image for both scenarios.

I was talking about virtio-pci. That always allocates the ring
in the normal memory.

> It won't take long until people will use virtio on ARM for both
> virtualization and for talking to devices, and having to rebuild the
> kernel for different use cases is nasty.

Yes, I understand that it's nasty.

> ___
> Virtualization mailing list
> virtualizat...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Process virito blk requests in separate thread

2011-12-01 Thread Sasha Levin
On Thu, 2011-12-01 at 10:48 +0200, Pekka Enberg wrote:
> On Wed, 30 Nov 2011, Asias He wrote:
> >>> In virtio net's notify_vq(), we simply signal the tx/rx handle thread
> >>> and return.
> >>
> >> Why not use the threadpool?
> >
> > No.
> 
> Sasha?

I was looking into the concept of adding 'dedicated' threads to the
threadpool, since when the threadpool was added originally one of the
purposes was to have all worker threads in a single place.

This way we could still have threads in one place, which would make it
easier to control all of them but still won't hurt performance of those
threads.

We can merge this patch, and I'll do the dedicated thread patch on top
of it.

-- 

Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Process virito blk requests in separate thread

2011-12-01 Thread Pekka Enberg

On Wed, 30 Nov 2011, Asias He wrote:

In virtio net's notify_vq(), we simply signal the tx/rx handle thread
and return.


Why not use the threadpool?


No.


Sasha?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] virtio: use mandatory barriers for remote processor vdevs

2011-12-01 Thread Michael S. Tsirkin
On Thu, Dec 01, 2011 at 12:58:59PM +1030, Rusty Russell wrote:
> On Thu, 1 Dec 2011 01:13:07 +0200, "Michael S. Tsirkin"  
> wrote:
> > For x86, stores into memory are ordered. So I think that yes, smp_XXX
> > can be selected at compile time.
> > 
> > So let's forget the virtio strangeness for a minute,
> 
> Hmm, we got away with light barriers because we knew we were not
> *really* talking to a device.  But now with virtio-mmio, turns out we
> are :)

You think virtio-mmio this issue too?  It's reported on remoteproc...

> I'm really tempted to revert d57ed95 for 3.2, and we can revisit this
> optimization later if it proves worthwhile.
> 
> Thoughts?
> Rusty. 

Generally it does seem the best we can do for 3.2.

Given it's rc3, I'd be a bit wary of introducing regressions - I'll try
to find some real setups (as in - not my laptop) to run some benchmarks
on, to verify there's no major problem.
I hope I can report on this in about a week from now - want to hold onto this 
meanwhile?

Further, if we do revert, need to remember to apply the following
beforehand, to avoid breaking virtio tool:

tools/virtio: implement mandatory barriers for x86

Signed-off-by: Michael S. Tsirkin 

diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
index 68b8b8d..1bf0e80 100644
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -172,11 +172,18 @@ struct virtqueue {
 #define MODULE_LICENSE(__MODULE_LICENSE_value) \
const char *__MODULE_LICENSE_name = __MODULE_LICENSE_value
 
 #define CONFIG_SMP
 
 #if defined(__i386__) || defined(__x86_64__)
 #define barrier() asm volatile("" ::: "memory")
 #define mb() __sync_synchronize()
+#if defined(__i386__)
+#define wmb() mb()
+#define rmb() mb()
+#else
+#define wmb() asm volatile("sfence" ::: "memory")
+#define rmb() asm volatile("lfence" ::: "memory")
+#endif
 
 #define smp_mb()   mb()
 # define smp_rmb() barrier()

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio-ring: Use threshold for switching to indirect descriptors

2011-12-01 Thread Sasha Levin
On Thu, 2011-12-01 at 09:58 +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 01, 2011 at 01:12:25PM +1030, Rusty Russell wrote:
> > On Wed, 30 Nov 2011 18:11:51 +0200, Sasha Levin  
> > wrote:
> > > On Tue, 2011-11-29 at 16:58 +0200, Avi Kivity wrote:
> > > > On 11/29/2011 04:54 PM, Michael S. Tsirkin wrote:
> > > > > > 
> > > > > > Which is actually strange, weren't indirect buffers introduced to 
> > > > > > make
> > > > > > the performance *better*? From what I see it's pretty much the
> > > > > > same/worse for virtio-blk.
> > > > >
> > > > > I know they were introduced to allow adding very large bufs.
> > > > > See 9fa29b9df32ba4db055f3977933cd0c1b8fe67cd
> > > > > Mark, you wrote the patch, could you tell us which workloads
> > > > > benefit the most from indirect bufs?
> > > > >
> > > > 
> > > > Indirects are really for block devices with many spindles, since there
> > > > the limiting factor is the number of requests in flight.  Network
> > > > interfaces are limited by bandwidth, it's better to increase the ring
> > > > size and use direct buffers there (so the ring size more or less
> > > > corresponds to the buffer size).
> > > > 
> > > 
> > > I did some testing of indirect descriptors under different workloads.
> > 
> > MST and I discussed getting clever with dynamic limits ages ago, but it
> > was down low on the TODO list.  Thanks for diving into this...
> > 
> > AFAICT, if the ring never fills, direct is optimal.  When the ring
> > fills, indirect is optimal (we're better to queue now than later).
> > 
> > Why not something simple, like a threshold which drops every time we
> > fill the ring?
> > 
> > struct vring_virtqueue
> > {
> > ...
> > int indirect_thresh;
> > ...
> > }
> > 
> > virtqueue_add_buf_gfp()
> > {
> > ...
> > 
> > if (vq->indirect &&
> > (vq->vring.num - vq->num_free) + out + in > vq->indirect_thresh)
> > return indirect()
> > ...
> > 
> > if (vq->num_free < out + in) {
> > if (vq->indirect && vq->indirect_thresh > 0)
> > vq->indirect_thresh--;
> > 
> > ...
> > }
> > 
> > Too dumb?
> > 
> > Cheers,
> > Rusty.
> 
> We'll presumably need some logic to increment is back,
> to account for random workload changes.
> Something like slow start?

We can increment it each time the queue was less than 10% full, it
should act like slow start, no?

-- 

Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >