Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-18 Thread Borislav Petkov
On Thu, Sep 18, 2014 at 02:29:54AM +0200, Radim Krčmář wrote:
> I think you proposed to use a magic constant in place of MASK_FAM_X, so

Huh, what?

> Second problem:  Most elements don't begin at offset 0, so the usual
> retrieval would add a shift (repurposing max_monitor_line_size)

So what?

>  max_monitor_line_size = (cpuid & MASK_FAM_X) >> OFFSET_FAM_X;
> 
> and it's not better when we write it back after doing stuff.

Writing back CPUID on baremetal? You can't be serious.

Ok, this is getting ridiculous so I'll stop wasting time. I stand by my
initial statement - this is a bad idea.

-- 
Regards/Gruss,
Boris.


[Bug 84781] New: The guest will hang after live migration.

2014-09-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=84781

Bug ID: 84781
   Summary: The guest will hang after live migration.
   Product: Virtualization
   Version: unspecified
Kernel Version: 3.17.0-rc1
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: kvm
  Assignee: virtualization_...@kernel-bugs.osdl.org
  Reporter: chao.z...@intel.com
Regression: No

Environment:

Host OS (ia32/ia32e/IA64): ia32e
Guest OS (ia32/ia32e/IA64): ia32e
Guest OS Type (Linux/Windows): Linux
kvm.git Commit: fd2752352bbc98850d83b5448a288d8991590317
qemu.git Commit: e4d50d47a9eb15f42bdd561803a29a4d7c3eb8ec
Host Kernel Version: 3.17.0-rc1
Hardware: Ivytown_EP, Haswell_EP


Bug detailed description:
--
After guest live migration, the guest hangs and pinging the guest fails.

notes:
1. The guest also hangs after save/restore.
2. This appears to be a kernel bug:
   kvm      + qemu     = result
   fd275235 + e4d50d47 = bad
   c77dcacb + e4d50d47 = good


Reproduce steps:

1. Start a TCP daemon for migration:
qemu-system-x86_64 -enable-kvm -m 4G -smp 4 -net nic,macaddr=00:12:31:45:41:23
-net tap,script=/etc/kvm/qemu-ifup rhel6u5.qcow -incoming tcp:localhost:
2. Create the guest:
qemu-system-x86_64 -enable-kvm -m 4G -smp 4 -net nic,macaddr=00:12:31:45:41:23
-net tap,script=/etc/kvm/qemu-ifup rhel6u5.qcow
3. Press ctrl+alt+2 to switch to the QEMU monitor.
4. migrate tcp:localhost:


Current result:

The guest hangs after live migration or save/restore.

Expected result:

After live migration or save/restore, the guest works fine.


Basic root-causing log:
--



[Bug 84781] The guest will hang after live migration.

2014-09-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=84781

--- Comment #1 from Zhou, Chao  ---
the first bad commit is:
commit cbcf2dd3b3d4d990610259e8d878fc8dc1f17d80
Author: Thomas Gleixner 
Date:   Wed Jul 16 21:04:54 2014 +

x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based

Convert the relevant base data right away to nanoseconds instead of
doing the conversion on every readout. Reduces text size by 160 bytes.

Signed-off-by: Thomas Gleixner 
Cc: Gleb Natapov 
Cc: kvm@vger.kernel.org
Acked-by: Paolo Bonzini 
Signed-off-by: John Stultz 



KVM Test report, kernel fd275235... qemu e4d50d47...

2014-09-18 Thread Hu, Robert
Hi All,

This is the KVM upstream test result against the kvm.git next branch and the
qemu.git master branch.
 kvm.git next branch: fd2752352bbc98850d83b5448a288d8991590317 based on
 kernel 3.17.0-rc1
 qemu.git master branch: e4d50d47a9eb15f42bdd561803a29a4d7c3eb8ec

We found one new bug and no fixed bugs in the past three weeks.

New issue (1):
1. The guest will hang after live migration.
https://bugzilla.kernel.org/show_bug.cgi?id=84781

Fixed issues (0):

Old issues (4):
--
1. Guest hang when doing kernel build and writing data in guest.
   https://bugs.launchpad.net/qemu/+bug/1096814
2. [Nested] Windows XP Mode can not work
   https://bugzilla.kernel.org/show_bug.cgi?id=60782
3. Cannot boot Xen under KVM with X2APIC enabled
   https://bugzilla.kernel.org/show_bug.cgi?id=82211
4. guest can't get ip when create guest with bridge using virtio driver.
   https://bugs.launchpad.net/qemu/+bug/1364249




Test environment:
==
  Platform      IvyBridge-EP   Haswell-EP
  CPU Cores     32             56
  Memory size   128GB          160GB


Regards
Robert Ho



[INVITE] OVMF BoF session at the KVM Forum 2014

2014-09-18 Thread Laszlo Ersek
Hello All,

I've been made an offer that I couldn't refuse :) to "organize" a Birds
of a Feather session concerning OVMF at the KVM Forum 2014.

Interested people, please sign up:

  http://www.linux-kvm.org/page/KVM_Forum_2014_BOF#OVMF

Everyone else: apologies about the noise.

Thanks,
Laszlo


[PATCH] blk-mq: Avoid race condition with uninitialized requests

2014-09-18 Thread David Hildenbrand
This patch should fix the bug reported in https://lkml.org/lkml/2014/9/11/249.

We have to initialize at least the atomic_flags and the cmd_flags when
allocating storage for the requests.

Otherwise blk_mq_timeout_check() might dereference uninitialized pointers when
racing with the creation of a request.

Also move the reset of cmd_flags from the initialization code to the point where a
request is freed. This way we never end up with pending flush request indicators
that might trigger dereferences of invalid pointers in blk_mq_timeout_check().
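
To illustrate the window (my reading of the race described above; this
interleaving is inferred from the description, not taken from the report):

/*
 *  CPU A: blk_mq_timeout_check()        CPU B: __blk_mq_alloc_request()
 *  walks tags->rqs[] and reads
 *  rq->atomic_flags / rq->cmd_flags
 *  (storage never initialized, so
 *   it sees garbage)
 *                                       rq->cmd_flags = 0;  /* too late */
 *
 * Initializing the fields when the request storage is allocated in
 * blk_mq_init_rq_map() closes this window.
 */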

Cc: sta...@vger.kernel.org
Signed-off-by: David Hildenbrand 
---
 block/blk-mq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 383ea0c..eed6340 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -203,7 +203,6 @@ __blk_mq_alloc_request(struct blk_mq_alloc_data *data, int rw)
if (tag != BLK_MQ_TAG_FAIL) {
rq = data->hctx->tags->rqs[tag];
 
-   rq->cmd_flags = 0;
if (blk_mq_tag_busy(data->hctx)) {
rq->cmd_flags = REQ_MQ_INFLIGHT;
atomic_inc(&data->hctx->nr_active);
@@ -258,6 +257,7 @@ static void __blk_mq_free_request(struct blk_mq_hw_ctx *hctx,
 
if (rq->cmd_flags & REQ_MQ_INFLIGHT)
atomic_dec(&hctx->nr_active);
+   rq->cmd_flags = 0;
 
clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
blk_mq_put_tag(hctx, tag, &ctx->last_tag);
@@ -1404,6 +1404,8 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set,
left -= to_do * rq_size;
for (j = 0; j < to_do; j++) {
tags->rqs[i] = p;
+   tags->rqs[i]->atomic_flags = 0;
+   tags->rqs[i]->cmd_flags = 0;
if (set->ops->init_request) {
if (set->ops->init_request(set->driver_data,
tags->rqs[i], hctx_idx, i,
-- 
1.8.5.5



[PATCH] blk-mq: Avoid race condition with uninitialized requests

2014-09-18 Thread David Hildenbrand
This patch should fix the bug reported in https://lkml.org/lkml/2014/9/11/249.

Test is still pending.

David Hildenbrand (1):
  blk-mq: Avoid race condition with uninitialized requests

 block/blk-mq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
1.8.5.5



[Bug 84781] The guest will hang after live migration.

2014-09-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=84781

Paolo Bonzini  changed:

   What|Removed |Added

 CC||bonz...@gnu.org

--- Comment #2 from Paolo Bonzini  ---
Chao, can you test a more recent 3.17-rc version (3.17-rc4 or newer)?  It
should be fixed already.



Re: KVM Test report, kernel fd275235... qemu e4d50d47...

2014-09-18 Thread Paolo Bonzini
On 18/09/2014 10:23, Hu, Robert wrote:
> Hi All,
> 
> This is the KVM upstream test result against the kvm.git next branch and the
> qemu.git master branch.
>  kvm.git next branch: fd2752352bbc98850d83b5448a288d8991590317 based on
>  kernel 3.17.0-rc1
>  qemu.git master branch: e4d50d47a9eb15f42bdd561803a29a4d7c3eb8ec
> 
> We found one new bug and no fixed bugs in the past three weeks.

Thanks.

> New issue (1):
> 1. The guest will hang after live migration.
> https://bugzilla.kernel.org/show_bug.cgi?id=84781

> the first bad commit is:
> commit cbcf2dd3b3d4d990610259e8d878fc8dc1f17d80
> Author: Thomas Gleixner 
> Date:   Wed Jul 16 21:04:54 2014 +
> 
> x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based
> 
> Convert the relevant base data right away to nanoseconds instead of
> doing the conversion on every readout. Reduces text size by 160 bytes.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Gleb Natapov 
> Cc: kvm@vger.kernel.org
> Acked-by: Paolo Bonzini 
> Signed-off-by: John Stultz 

This should be fixed by 3.17-rc4.  Please test Linus' tree.

Paolo


Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-18 Thread Radim Krčmář
2014-09-18 09:19+0200, Borislav Petkov:
> On Thu, Sep 18, 2014 at 02:29:54AM +0200, Radim Krčmář wrote:
> > I think you proposed to use a magic constant in place of MASK_FAM_X, so
> 
> Huh, what?

Your example.  It cannot be verbatim MASK_FAM_X in real code.

I interpreted it as a placeholder for 0x1ffff (the first 17 bits) and
said why I think that is not a good thing.
The other interpretation (a well-named macro) was against your goal.

> > Second problem:  Most elements don't begin at offset 0, so the usual
> > retrieval would add a shift (repurposing max_monitor_line_size)
> 
> So what?

Your goal was easy parsing (last sentence of the mail).

My argument is that bitfields are easier to read.  (With examples.)
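
To make the comparison concrete (a rough sketch; the macro values and the
union layout are made up for illustration, not taken from the patch):

/* mask + shift style */
#define FAM_X_MASK   0x0000ffffU
#define FAM_X_OFFSET 0

max_monitor_line_size = (cpuid_ebx & FAM_X_MASK) >> FAM_X_OFFSET;

/* bitfield style */
union cpuid5_ebx {
	struct {
		unsigned int max_monitor_line_size : 16;
		unsigned int reserved              : 16;
	} split;
	unsigned int full;
};

max_monitor_line_size = ebx.split.max_monitor_line_size;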

> >  max_monitor_line_size = (cpuid & MASK_FAM_X) >> OFFSET_FAM_X;
> > 
> > and it's not better when we write it back after doing stuff.
> 
> Writing back CPUID on baremetal? You can't be serious.

We are not talking just about baremetal, see patch [3/3].

There are, and will be, new use cases for modifying and storing cpuid.

> Ok, this is getting ridiculous so I'll stop wasting time. I stand by my
> initial statement - this is a bad idea.

Acknowledged.


Re: [PATCH kvm-unit-tests] arm: fix crash when caches are off

2014-09-18 Thread Paolo Bonzini
On 16/09/2014 04:06, Andrew Jones wrote:
> We shouldn't try Load-Exclusive instructions unless we've enabled memory
> management, as these instructions depend on the data cache unit's
> coherency monitor. This patch adds a new setup boolean, initialized to false,
> that is used to guard Load-Exclusive instructions. Eventually we'll add more
> setup code that sets it true.
> 
> Note: This problem became visible on boards with Cortex-A7 processors. Testing
> with Cortex-A15 didn't expose it, as those may have an external coherency
> monitor that still allows the instruction to execute (on A7s we got data
> aborts). Although even on A15's it's not clear from the spec if the
> instructions will behave as expected while caches are off, so we no longer
> allow Load-Exclusive instructions on those processors without caches enabled
> either.
> 
> Signed-off-by: Andrew Jones 
> ---
>  lib/arm/asm/setup.h |  2 ++
>  lib/arm/setup.c |  1 +
>  lib/arm/spinlock.c  | 10 ++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/lib/arm/asm/setup.h b/lib/arm/asm/setup.h
> index 21445ef2085fc..9c54c184e2866 100644
> --- a/lib/arm/asm/setup.h
> +++ b/lib/arm/asm/setup.h
> @@ -20,6 +20,8 @@ extern phys_addr_t __phys_offset, __phys_end;
>  #define PHYS_SIZE(1ULL << PHYS_SHIFT)
>  #define PHYS_MASK(PHYS_SIZE - 1ULL)
>  
> +extern bool mem_caches_enabled;
> +
>  #define L1_CACHE_SHIFT   6
>  #define L1_CACHE_BYTES   (1 << L1_CACHE_SHIFT)
>  #define SMP_CACHE_BYTES  L1_CACHE_BYTES
> diff --git a/lib/arm/setup.c b/lib/arm/setup.c
> index 3941c9757dcb2..f7ed639c9d499 100644
> --- a/lib/arm/setup.c
> +++ b/lib/arm/setup.c
> @@ -25,6 +25,7 @@ u32 cpus[NR_CPUS] = { [0 ... NR_CPUS-1] = (~0UL) };
>  int nr_cpus;
>  
>  phys_addr_t __phys_offset, __phys_end;
> +bool mem_caches_enabled;
>  
>  static void cpu_set(int fdtnode __unused, u32 regval, void *info __unused)
>  {
> diff --git a/lib/arm/spinlock.c b/lib/arm/spinlock.c
> index d8a6d4c3383d6..43539c5e84062 100644
> --- a/lib/arm/spinlock.c
> +++ b/lib/arm/spinlock.c
> @@ -1,12 +1,22 @@
>  #include "libcflat.h"
>  #include "asm/spinlock.h"
>  #include "asm/barrier.h"
> +#include "asm/setup.h"
>  
>  void spin_lock(struct spinlock *lock)
>  {
>   u32 val, fail;
>  
>   dmb();
> +
> + /*
> +  * Without caches enabled Load-Exclusive instructions may fail.
> +  * In that case we do nothing, and just hope the caller knows
> +  * what they're doing.
> +  */
> + if (!mem_caches_enabled)
> + return;
> +
>   do {
>   asm volatile(
>   "1: ldrex   %0, [%2]\n"
> 

Tested and pushed.  Thanks!

Paolo


Re: [Qemu-devel] [INVITE] OVMF BoF session at the KVM Forum 2014

2014-09-18 Thread Andreas Färber
Hello Laszlo,

On 18.09.2014 at 10:23, Laszlo Ersek wrote:
> I've been made an offer that I couldn't refuse :) to "organize" a Birds
> of a Feather session concerning OVMF at the KVM Forum 2014.
> 
> Interested people, please sign up:
> 
>   http://www.linux-kvm.org/page/KVM_Forum_2014_BOF#OVMF

Nice idea. Your summary mentions only ia32 and x86_64 - I would be
interested in an update on OVMF for AArch64 - there seemed to already be
support for ARM's Foundation Model but not yet for QEMU.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: [Qemu-devel] [INVITE] OVMF BoF session at the KVM Forum 2014

2014-09-18 Thread Laszlo Ersek
On 09/18/14 13:44, Andreas Färber wrote:
> Hello Laszlo,
> 
> On 18.09.2014 at 10:23, Laszlo Ersek wrote:
>> I've been made an offer that I couldn't refuse :) to "organize" a Birds
>> of a Feather session concerning OVMF at the KVM Forum 2014.
>>
>> Interested people, please sign up:
>>
>>   http://www.linux-kvm.org/page/KVM_Forum_2014_BOF#OVMF
> 
> Nice idea. Your summary mentions only ia32 and x86_64 - I would be
> interested in an update on OVMF for AArch64 - there seemed to already be
> support for ARM's Foundation Model but not yet for QEMU.

We've successfully UEFI-booted
- GNU/Linux guest(s) on
- upstream edk2 (*) and
- upstream qemu-system-aarch64 with
  - TCG on x86_64 host,
  - KVM on aarch64 host (**)

(*) Ard's patches for upstream edk2 are in the process of being tested /
merged.

(**) Ard's patches for the upstream host kernel (== KVM) have been...
ugh, not sure... applied to a maintainer tree? Ard? :)

So, it works (as far as I tested it myself on TCG, and heard reports
about it on KVM), but right now you need to apply a handful of pending
patches manually.

We can certainly talk about AArch64 at the BoF, but then Ard should
co-organize. No good deed goes unpunished, as ever! :)

Cheers,
Laszlo


Re: [PATCH 2/5] hw_random: use reference counts on each struct hwrng.

2014-09-18 Thread Amos Kong
On Thu, Sep 18, 2014 at 12:18:23PM +0930, Rusty Russell wrote:
> current_rng holds one reference, and we bump it every time we want
> to do a read from it.
> 
> This means we only hold the rng_mutex to grab or drop a reference,
> so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
> block on read of /dev/hwrng.
> 
> Using a kref is overkill (we're always under the rng_mutex), but
> a standard pattern.
> 
> This also solves the problem that the hwrng_fillfn thread was
> accessing current_rng without a lock, which could change (eg. to NULL)
> underneath it.

Hi Rusty,
 
> Signed-off-by: Rusty Russell 
> ---
>  drivers/char/hw_random/core.c | 135 --
>  include/linux/hw_random.h |   2 +
>  2 files changed, 94 insertions(+), 43 deletions(-)

...

>  static int rng_dev_open(struct inode *inode, struct file *filp)
>  {
>   /* enforce read-only access to this chrdev */
> @@ -154,21 +202,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
>   ssize_t ret = 0;
>   int err = 0;
>   int bytes_read, len;
> + struct hwrng *rng;
>  
>   while (size) {
> - if (mutex_lock_interruptible(&rng_mutex)) {
> - err = -ERESTARTSYS;
> + rng = get_current_rng();
> + if (IS_ERR(rng)) {
> + err = PTR_ERR(rng);
>   goto out;
>   }
> -
> - if (!current_rng) {
> + if (!rng) {
>   err = -ENODEV;
> - goto out_unlock;
> + goto out;
>   }
>  
>   mutex_lock(&reading_mutex);
>   if (!data_avail) {
> - bytes_read = rng_get_data(current_rng, rng_buffer,
> + bytes_read = rng_get_data(rng, rng_buffer,
>   rng_buffer_size(),
>   !(filp->f_flags & O_NONBLOCK));
>   if (bytes_read < 0) {
> @@ -200,7 +249,6 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
>   ret += len;
>   }
>  
> - mutex_unlock(&rng_mutex);
>   mutex_unlock(&reading_mutex);
>  
>   if (need_resched())
> @@ -210,15 +258,16 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
>   err = -ERESTARTSYS;

We need put_rng() in this error path.  Otherwise, hot-unplug will hang
at the end of hwrng_unregister():

|/* Just in case rng is reading right now, wait. */
|wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0);

Steps to reproduce the hang:
  guest) # dd if=/dev/hwrng of=/dev/null 
  cancel dd process after 10 seconds
  guest) # dd if=/dev/hwrng of=/dev/null &
  hotunplug rng device from qemu monitor
  result: device can't be removed (still visible in the QEMU monitor)


diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 96fa067..4e22d70 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -258,6 +258,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 
if (signal_pending(current)) {
err = -ERESTARTSYS;
+   put_rng(rng);
goto out;
}

>   goto out;
>   }
> +
> + put_rng(rng);
>   }
>  out:
>   return ret ? : err;
> -out_unlock:
> - mutex_unlock(&rng_mutex);
> - goto out;
> +
>  out_unlock_reading:
>   mutex_unlock(&reading_mutex);
> - goto out_unlock;
> + put_rng(rng);
> + goto out;
>  }
>  
>  
> @@ -257,8 +306,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev,
>   err = hwrng_init(rng);
>   if (err)
>   break;
> - hwrng_cleanup(current_rng);
> - current_rng = rng;
> + drop_current_rng();
> + set_current_rng(rng);
>   err = 0;
>   break;
>   }
> @@ -272,17 +321,15 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
>  struct device_attribute *attr,
>  char *buf)
>  {
> - int err;
>   ssize_t ret;
> - const char *name = "none";
> + struct hwrng *rng;
>  
> - err = mutex_lock_interruptible(&rng_mutex);
> - if (err)
> - return -ERESTARTSYS;
> - if (current_rng)
> - name = current_rng->name;
> - ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
> - mutex_unlock(&rng_mutex);
> + rng = get_current_rng();
> + if (IS_ERR(rng))
> + return PTR_ERR(rng);
> +
> + ret = snprintf(buf, PAGE_SIZE, "%s\n", rng ? rng->name : "none");
> + put_rng(rng);
>  
>   return ret;
>  }
> @@ -35

[PATCH v2 6/6] hw_random: don't init list element we're about to add to list.

2014-09-18 Thread Amos Kong
From: Rusty Russell 

Another interesting anti-pattern.

Signed-off-by: Rusty Russell 
---
 drivers/char/hw_random/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 6420409..12187dd 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -486,7 +486,6 @@ int hwrng_register(struct hwrng *rng)
goto out_unlock;
}
}
-   INIT_LIST_HEAD(&rng->list);
list_add_tail(&rng->list, &rng_list);
 
if (old_rng && !rng->init) {
-- 
1.9.3



[PATCH v2 2/6] hw_random: move some code out mutex_lock for avoiding underlying deadlock

2014-09-18 Thread Amos Kong
In the next patch, we use reference counting for each struct hwrng;
changing the reference count also requires taking the mutex lock. If,
before releasing the lock, we try to stop a kthread that is waiting to
take the lock to drop its reference count, a deadlock will occur.

Signed-off-by: Amos Kong 
---
 drivers/char/hw_random/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index b1b6042..a0905c8 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -474,12 +474,12 @@ void hwrng_unregister(struct hwrng *rng)
}
}
if (list_empty(&rng_list)) {
+   mutex_unlock(&rng_mutex);
unregister_miscdev();
if (hwrng_fill)
kthread_stop(hwrng_fill);
-   }
-
-   mutex_unlock(&rng_mutex);
+   } else
+   mutex_unlock(&rng_mutex);
 }
 EXPORT_SYMBOL_GPL(hwrng_unregister);
 
-- 
1.9.3



[PATCH v2 4/6] hw_random: fix unregister race.

2014-09-18 Thread Amos Kong
From: Rusty Russell 

The previous patch added one potential problem: we can still be
reading from a hwrng when it's unregistered.  Add a wait for zero
in the hwrng_unregister path.

Signed-off-by: Rusty Russell 
---
 drivers/char/hw_random/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index ee9e504..9f6f5bb 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex);
 static DEFINE_MUTEX(reading_mutex);
 static int data_avail;
 static u8 *rng_buffer, *rng_fillbuf;
+static DECLARE_WAIT_QUEUE_HEAD(rng_done);
 static unsigned short current_quality;
 static unsigned short default_quality; /* = 0; default to "off" */
 
@@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref)
 
if (rng->cleanup)
rng->cleanup(rng);
+   wake_up_all(&rng_done);
 }
 
 static void set_current_rng(struct hwrng *rng)
-- 
1.9.3



[PATCH v2 5/6] hw_random: don't double-check old_rng.

2014-09-18 Thread Amos Kong
From: Rusty Russell 

Interesting anti-pattern.

Signed-off-by: Rusty Russell 
---
 drivers/char/hw_random/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 9f6f5bb..6420409 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -473,14 +473,13 @@ int hwrng_register(struct hwrng *rng)
}
 
old_rng = current_rng;
+   err = 0;
if (!old_rng) {
err = hwrng_init(rng);
if (err)
goto out_unlock;
set_current_rng(rng);
-   }
-   err = 0;
-   if (!old_rng) {
+
err = register_miscdev();
if (err) {
drop_current_rng();
-- 
1.9.3



[PATCH v2 3/6] hw_random: use reference counts on each struct hwrng.

2014-09-18 Thread Amos Kong
From: Rusty Russell 

current_rng holds one reference, and we bump it every time we want
to do a read from it.

This means we only hold the rng_mutex to grab or drop a reference,
so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
block on read of /dev/hwrng.

Using a kref is overkill (we're always under the rng_mutex), but
a standard pattern.

This also solves the problem that the hwrng_fillfn thread was
accessing current_rng without a lock, which could change (eg. to NULL)
underneath it.

V2: reduce reference count in signal_pending path

Signed-off-by: Rusty Russell 
Signed-off-by: Amos Kong 
---
 drivers/char/hw_random/core.c | 136 +-
 include/linux/hw_random.h |   2 +
 2 files changed, 95 insertions(+), 43 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index a0905c8..ee9e504 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 
@@ -91,6 +92,59 @@ static void add_early_randomness(struct hwrng *rng)
add_device_randomness(bytes, bytes_read);
 }
 
+static inline void cleanup_rng(struct kref *kref)
+{
+   struct hwrng *rng = container_of(kref, struct hwrng, ref);
+
+   if (rng->cleanup)
+   rng->cleanup(rng);
+}
+
+static void set_current_rng(struct hwrng *rng)
+{
+   BUG_ON(!mutex_is_locked(&rng_mutex));
+   kref_get(&rng->ref);
+   current_rng = rng;
+}
+
+static void drop_current_rng(void)
+{
+   BUG_ON(!mutex_is_locked(&rng_mutex));
+   if (!current_rng)
+   return;
+
+   kref_put(&current_rng->ref, cleanup_rng);
+   current_rng = NULL;
+}
+
+/* Returns ERR_PTR(), NULL or refcounted hwrng */
+static struct hwrng *get_current_rng(void)
+{
+   struct hwrng *rng;
+
+   if (mutex_lock_interruptible(&rng_mutex))
+   return ERR_PTR(-ERESTARTSYS);
+
+   rng = current_rng;
+   if (rng)
+   kref_get(&rng->ref);
+
+   mutex_unlock(&rng_mutex);
+   return rng;
+}
+
+static void put_rng(struct hwrng *rng)
+{
+   /*
+* Hold rng_mutex here so we serialize in case they set_current_rng
+* on rng again immediately.
+*/
+   mutex_lock(&rng_mutex);
+   if (rng)
+   kref_put(&rng->ref, cleanup_rng);
+   mutex_unlock(&rng_mutex);
+}
+
 static inline int hwrng_init(struct hwrng *rng)
 {
if (rng->init) {
@@ -113,12 +167,6 @@ static inline int hwrng_init(struct hwrng *rng)
return 0;
 }
 
-static inline void hwrng_cleanup(struct hwrng *rng)
-{
-   if (rng && rng->cleanup)
-   rng->cleanup(rng);
-}
-
 static int rng_dev_open(struct inode *inode, struct file *filp)
 {
/* enforce read-only access to this chrdev */
@@ -154,21 +202,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
ssize_t ret = 0;
int err = 0;
int bytes_read, len;
+   struct hwrng *rng;
 
while (size) {
-   if (mutex_lock_interruptible(&rng_mutex)) {
-   err = -ERESTARTSYS;
+   rng = get_current_rng();
+   if (IS_ERR(rng)) {
+   err = PTR_ERR(rng);
goto out;
}
-
-   if (!current_rng) {
+   if (!rng) {
err = -ENODEV;
-   goto out_unlock;
+   goto out;
}
 
mutex_lock(&reading_mutex);
if (!data_avail) {
-   bytes_read = rng_get_data(current_rng, rng_buffer,
+   bytes_read = rng_get_data(rng, rng_buffer,
rng_buffer_size(),
!(filp->f_flags & O_NONBLOCK));
if (bytes_read < 0) {
@@ -200,7 +249,6 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
ret += len;
}
 
-   mutex_unlock(&rng_mutex);
mutex_unlock(&reading_mutex);
 
if (need_resched())
@@ -208,17 +256,19 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 
if (signal_pending(current)) {
err = -ERESTARTSYS;
+   put_rng(rng);
goto out;
}
+
+   put_rng(rng);
}
 out:
return ret ? : err;
-out_unlock:
-   mutex_unlock(&rng_mutex);
-   goto out;
+
 out_unlock_reading:
mutex_unlock(&reading_mutex);
-   goto out_unlock;
+   put_rng(rng);
+   goto out;
 }
 
 
@@ -257,8 +307,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev,
err = hwrng_init(rng);
if (err)
break;
-   hwrng_cleanup(current_rng);
-

[PATCH v2 0/6] fix hw_random stuck

2014-09-18 Thread Amos Kong
When I hot-unplug a busy virtio-rng device or try to access
hwrng attributes in a non-SMP guest, it gets stuck.

My original fix was a pain; Rusty posted a real fix. This patchset
fixes two issues from v1 and has been tested with my 6+ test cases.


| test 0:
|   hotunplug rng device from qemu monitor
|
| test 1:
|   guest) # dd if=/dev/hwrng of=/dev/null &
|   hotunplug rng device from qemu monitor
|
| test 2:
|   guest) # dd if=/dev/random of=/dev/null &
|   hotunplug rng device from qemu monitor
|
| test 4:
|   guest) # dd if=/dev/hwrng of=/dev/null &
|   cat /sys/devices/virtual/misc/hw_random/rng_*
|
| test 5:
|   guest) # dd if=/dev/hwrng of=/dev/null
|   cancel dd process after 10 seconds
|   guest) # dd if=/dev/hwrng of=/dev/null &
|   hotunplug rng device from qemu monitor
|
| test 6:
|   use a fifo as rng backend, execute test 0 ~ 5 with no input of fifo

V2: added patch 2 to fix a deadlock, and updated patch 3 to fix a reference
counting issue

Amos Kong (1):
  hw_random: move some code out mutex_lock for avoiding underlying
deadlock

Rusty Russell (5):
  hw_random: place mutex around read functions and buffers.
  hw_random: use reference counts on each struct hwrng.
  hw_random: fix unregister race.
  hw_random: don't double-check old_rng.
  hw_random: don't init list element we're about to add to list.

 drivers/char/hw_random/core.c | 166 +-
 include/linux/hw_random.h |   2 +
 2 files changed, 117 insertions(+), 51 deletions(-)

-- 
1.9.3



[PATCH v2 1/6] hw_random: place mutex around read functions and buffers.

2014-09-18 Thread Amos Kong
From: Rusty Russell 

There's currently a big lock around everything, and it means that we
can't query sysfs (eg /sys/devices/virtual/misc/hw_random/rng_current)
while the rng is reading.  This is a real problem when the rng is slow,
or blocked (eg. virtio_rng with qemu's default /dev/random backend)

This doesn't help (it leaves the current lock untouched), just adds a
lock to protect the read function and the static buffers, in preparation
for transition.

Signed-off-by: Rusty Russell 
---
 drivers/char/hw_random/core.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25..b1b6042 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -53,7 +53,10 @@
 static struct hwrng *current_rng;
 static struct task_struct *hwrng_fill;
 static LIST_HEAD(rng_list);
+/* Protects rng_list and current_rng */
 static DEFINE_MUTEX(rng_mutex);
+/* Protects rng read functions, data_avail, rng_buffer and rng_fillbuf */
+static DEFINE_MUTEX(reading_mutex);
 static int data_avail;
 static u8 *rng_buffer, *rng_fillbuf;
 static unsigned short current_quality;
@@ -81,7 +84,9 @@ static void add_early_randomness(struct hwrng *rng)
unsigned char bytes[16];
int bytes_read;
 
+   mutex_lock(&reading_mutex);
bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1);
+   mutex_unlock(&reading_mutex);
if (bytes_read > 0)
add_device_randomness(bytes, bytes_read);
 }
@@ -128,6 +133,7 @@ static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size,
int wait) {
int present;
 
+   BUG_ON(!mutex_is_locked(&reading_mutex));
if (rng->read)
return rng->read(rng, (void *)buffer, size, wait);
 
@@ -160,13 +166,14 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
goto out_unlock;
}
 
+   mutex_lock(&reading_mutex);
if (!data_avail) {
bytes_read = rng_get_data(current_rng, rng_buffer,
rng_buffer_size(),
!(filp->f_flags & O_NONBLOCK));
if (bytes_read < 0) {
err = bytes_read;
-   goto out_unlock;
+   goto out_unlock_reading;
}
data_avail = bytes_read;
}
@@ -174,7 +181,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
if (!data_avail) {
if (filp->f_flags & O_NONBLOCK) {
err = -EAGAIN;
-   goto out_unlock;
+   goto out_unlock_reading;
}
} else {
len = data_avail;
@@ -186,7 +193,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
if (copy_to_user(buf + ret, rng_buffer + data_avail,
len)) {
err = -EFAULT;
-   goto out_unlock;
+   goto out_unlock_reading;
}
 
size -= len;
@@ -194,6 +201,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
}
 
mutex_unlock(&rng_mutex);
+   mutex_unlock(&reading_mutex);
 
if (need_resched())
schedule_timeout_interruptible(1);
@@ -208,6 +216,9 @@ out:
 out_unlock:
mutex_unlock(&rng_mutex);
goto out;
+out_unlock_reading:
+   mutex_unlock(&reading_mutex);
+   goto out_unlock;
 }
 
 
@@ -348,13 +359,16 @@ static int hwrng_fillfn(void *unused)
while (!kthread_should_stop()) {
if (!current_rng)
break;
+   mutex_lock(&reading_mutex);
rc = rng_get_data(current_rng, rng_fillbuf,
  rng_buffer_size(), 1);
+   mutex_unlock(&reading_mutex);
if (rc <= 0) {
pr_warn("hwrng: no data available\n");
msleep_interruptible(1);
continue;
}
+   /* Outside lock, sure, but y'know: randomness. */
add_hwgenerator_randomness((void *)rng_fillbuf, rc,
   rc * current_quality * 8 >> 10);
}
-- 
1.9.3
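
A side note on the hwrng_fillfn() hunk above: "quality" is the driver's
estimate of entropy in bits per 1024 bits of input, so the expression
credits the pool with rc * 8 * quality / 1024 bits. A minimal sketch of
the arithmetic, just to spell it out:

/* rc bytes carry rc * 8 bits of data; quality is entropy bits per
 * 1024 bits of input, so the credited entropy is rc*8 * quality/1024. */
static unsigned int entropy_credit_bits(unsigned int rc,
					unsigned short quality)
{
	return rc * quality * 8 >> 10;
}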



Re: [PATCH v2] KVM: x86: count actual tlb flushes

2014-09-18 Thread Paolo Bonzini
On 18/09/2014 07:05, Xiao Guangrong wrote:
> On 09/18/2014 02:35 AM, Liang Chen wrote:
>> - we count KVM_REQ_TLB_FLUSH requests, not actual flushes
>> (KVM can have multiple requests for one flush)
>> - flushes from kvm_flush_remote_tlbs aren't counted
>> - it's easy to make a direct request by mistake
>>
>> Solve these by postponing the counting to kvm_check_request(),
> 
> It's good.
> 
>> and refactor the code to use kvm_make_request again.
> 
> Why this refactor is needed? It's really a bad idea using
> raw-bit-set instead of meaningful name.

set_bit is worse than kvm_make_request, but adding a one-line wrapper
around kvm_make_request is not particularly useful.

We have the following requests:

- used multiple times:
kvm_make_request(KVM_REQ_APF_HALT, vcpu);
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
kvm_make_request(KVM_REQ_EVENT, vcpu);
kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);

- used once:
kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
kvm_make_request(KVM_REQ_NMI, vcpu);
kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
kvm_make_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu);

So I'm pretty much ambivalent about that.  However, I agree with this
suggestion:

> [ Btw, maybe kvm_mmu_flush_local_tlb is a better name than
> kvm_mmu_flush_tlb() in the current code. ]

Or even better, call it kvm_vcpu_flush_tlb since it is all within
x86.c/vmx.c/svm.c and the MMU is not involved at all.
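
For reference, a minimal sketch of the counting idea under discussion:
bump the stat where the request is consumed, e.g. in vcpu_enter_guest(),
so every maker of the request is counted exactly once. This is only a
sketch, not the final patch:

	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
		++vcpu->stat.tlb_flush;
		kvm_x86_ops->tlb_flush(vcpu);
	}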

Paolo


Re: [PATCH v2 2/3] hw_random: fix stuck in catting hwrng attributes

2014-09-18 Thread Amos Kong
On Thu, Sep 18, 2014 at 12:13:08PM +0930, Rusty Russell wrote:
> Amos Kong  writes:
> 
> > I started a QEMU (non-smp) guest with one virtio-rng device, and read
> > random data from /dev/hwrng by dd:
> >
> >  # dd if=/dev/hwrng of=/dev/null &
> >
> > In the same time, if I check hwrng attributes from sysfs by cat:
> >
> >  # cat /sys/class/misc/hw_random/rng_*
> >
> > The cat process always gets stuck with a slow backend (5 k/s); with a
> > quick backend (1.2 M/s), the cat process still takes 1 to 2 minutes.
> > The hang doesn't occur on SMP guests.
> >
> > The read syscall enters the kernel and calls rng_dev_read() in user
> > context. We used need_resched() to check if other tasks need to run,
> > but it almost always returns false, and we re-take the mutex lock.
> > The attribute-accessing process always fails to acquire the lock, so
> > the cat gets stuck.
> >
> > A user context doesn't let other user contexts run on that CPU unless
> > the kernel code sleeps for some reason. This is why need_resched()
> > always returns false here.
> >
> > This patch removes need_resched() and always schedules other tasks, so
> > that other tasks have a chance to take the lock and execute the
> > protected code.
 
Hi Rusty,

> OK, this is going to be a rant.
> 
> Your explanation doesn't make sense at all.  Worse, your solution breaks
> the advice of Kernighan & Plauger: "Don't patch bad code - rewrite
> it.".
> 
> But worst of all, this detailed explanation might have convinced me you
> understood the problem better than I did, and applied your patch.
 
I'm sorry about the misleading.

> I did some tests.  For me, as expected, the process spends its time
> inside the virtio rng read function, holding the mutex and thus blocking
> sysfs access; it's not a failure of this code at all.

Got it now.

The cat hang bug was found while I was trying to fix the hot-unplug
issue; the hot-unplug issue can't be reproduced when I debug with gdb or
printk. So I forgot to debug the cat hang ... and instead spent time
misunderstanding the scheduler code :(

> Your schedule_timeout() "fix" probably just helps by letting the host
> refresh entropy, so we spend less time waiting in the read fn.
> 
> I will post a series, which unfortunately is only lightly tested, then
> I'm going to have some beer to begin my holiday.  That may help me
> forget my disappointment at seeing respected fellow developers
> monkey-patching random code they don't understand.

I just posted a V2 with two additional fixes; hot-unplugging works well now :)

> Grrr

Enjoy your holiday!
Amos

> Rusty.
>
> > Signed-off-by: Amos Kong 
> > ---
> >  drivers/char/hw_random/core.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> > index c591d7e..263a370 100644
> > --- a/drivers/char/hw_random/core.c
> > +++ b/drivers/char/hw_random/core.c
> > @@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
> >  
> > mutex_unlock(&rng_mutex);
> >  
> > -   if (need_resched())
> > -   schedule_timeout_interruptible(1);
> > +   schedule_timeout_interruptible(1);
> >  
> > if (signal_pending(current)) {
> > err = -ERESTARTSYS;
> > -- 
> > 1.9.3

-- 
Amos.


Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-18 Thread Paolo Bonzini
On 17/09/2014 16:06, Borislav Petkov wrote:
>> > AFAIK backward compatibility is usually maintained in x86. I did not
>> > see in Intel SDM anything that says "this CPUID field means something
>> > for CPU X and something else for CPU Y". Anyhow, it is not different
>> > than bitmasks in this respect.
> You still don't get my point: what are you going to do when
> min_monitor_line_size needs to be 17 bits all of a sudden?

The extra bit used to be reserved and thus will be zero on older
families.  So, nothing?
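
Concretely (a hedged sketch, not the actual patch): widen the bitfield
over the formerly-reserved bit, and older CPUs simply report zero there:

union cpuid5_eax {
	struct {
		unsigned int min_monitor_line_size : 17; /* was :16 + 1 reserved bit */
		unsigned int reserved              : 15;
	} split;
	unsigned int full;
};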

Paolo


Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-18 Thread Borislav Petkov
On Thu, Sep 18, 2014 at 03:06:59PM +0200, Paolo Bonzini wrote:
> The extra bit used to be reserved and thus will be zero on older
> families.  So, nothing?

"thus will be zero" is unfortunately simply not true.

From the SDM:

"1.3.2 Reserved Bits and Software Compatibility

In many register and memory layout descriptions, certain bits are marked
as reserved. When bits are marked as reserved, it is essential for
compatibility with future processors that software treat these bits as
having a future, though unknown, effect. The behavior of reserved bits
should be regarded as not only undefined, but unpredict- able. Software
should follow these guidelines in dealing with reserved bits:

* Do not depend on the states of any reserved bits when testing the values of 
registers that contain such bits.
Mask out the reserved bits before testing.

*  Do not depend on the states of any reserved bits when storing to memory or 
to a register.

* Do not depend on the ability to retain information written into any reserved 
bits.

* When loading a register, always load the reserved bits with the values 
indicated in the documentation, if any,
or reload them with values previously read from the same register."

And so on and so on.

Aaanyway, I see kvm folks are strongly determined to do this so I'll
propose you do it for kvm only and not touch the rest of arch/x86/.

I really really disagree with casting CPUID words in stone and with
touching code which is perfectly fine.

But this is just me and the final word lies with x86 people so it'll
happen whatever they say.

-- 
Regards/Gruss,
Boris.


Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-18 Thread Paolo Bonzini
On 18/09/2014 15:26, Borislav Petkov wrote:
> On Thu, Sep 18, 2014 at 03:06:59PM +0200, Paolo Bonzini wrote:
>> The extra bit used to be reserved and thus will be zero on older
>> families.  So, nothing?
> 
> "thus will be zero" is unfortunately simply not true.
> 
> From the SDM:
> 
> "1.3.2 Reserved Bits and Software Compatibility
> 
> In many register and memory layout descriptions, certain bits are marked
> as reserved. When bits are marked as reserved, it is essential for
> compatibility with future processors that software treat these bits as
> having a future, though unknown, effect. The behavior of reserved bits
> should be regarded as not only undefined, but unpredictable. Software
> should follow these guidelines in dealing with reserved bits:

We're talking about the case where the field is not reserved anymore and
we _know_ that the vendor has just decided to grow the bitfield that
precedes it.

As soon as we know that the field is not reserved anymore, we obviously
rely on reserved bits being zero in older processors, and in future
processors from other vendors.  The trivial example is feature bits like
XSAVE.  We query them all the time without checking the family when they
were first introduced, don't we?

> * Do not depend on the states of any reserved bits when testing the values
>   of registers that contain such bits. Mask out the reserved bits before
>   testing.

Done by the bitfields.

> * Do not depend on the states of any reserved bits when storing to memory
>   or to a register.

Doesn't apply to CPUID.

> * Do not depend on the ability to retain information written into any
>   reserved bits.

Doesn't apply to CPUID.

> * When loading a register, always load the reserved bits with the values
>   indicated in the documentation, if any, or reload them with values
>   previously read from the same register."

Only applies to KVM, where it can be done with the bitfields too.

> And so on and so on.
> 
> Aaanyway, I see kvm folks are strongly determined to do this so I'll
> propose you do it for kvm only and not touch the rest of arch/x86/.
> 
> I really really disagree with casting CPUID words in stone and with
> touching code which is perfectly fine.

It's not really the KVM folks.  I actually kinda agree with you about
"touching code that works", which is why I hadn't yet commented on the
series.  But Ingo and Peter like it, so I can accept the patch as well.

> But this is just me and the final word lies with x86 people so it'll
> happen whatever they say.

Yep, same with me---KVM will just follow suit.

Paolo


Re: [PATCH v2] KVM: x86: count actual tlb flushes

2014-09-18 Thread Liang Chen

On 09/18/2014 08:47 AM, Paolo Bonzini wrote:
> On 18/09/2014 07:05, Xiao Guangrong wrote:
>> On 09/18/2014 02:35 AM, Liang Chen wrote:
>>> - we count KVM_REQ_TLB_FLUSH requests, not actual flushes
>>> (KVM can have multiple requests for one flush)
>>> - flushes from kvm_flush_remote_tlbs aren't counted
>>> - it's easy to make a direct request by mistake
>>>
>>> Solve these by postponing the counting to kvm_check_request(),
>> It's good.
>>
>>> and refactor the code to use kvm_make_request again.
>> Why this refactor is needed? It's really a bad idea using
>> raw-bit-set instead of meaningful name.
> set_bit is worse than kvm_make_request, but adding a one-line wrapper
> around kvm_make_request is not particularly useful.
>
> We have the following requests:
>
> - used multiple times:
> kvm_make_request(KVM_REQ_APF_HALT, vcpu);
> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> kvm_make_request(KVM_REQ_EVENT, vcpu);
> kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
> kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
> kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
>
> - used once:
> kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
> kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
> kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> kvm_make_request(KVM_REQ_NMI, vcpu);
> kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
> kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
> kvm_make_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu);
>
> So I'm pretty much ambivalent about that.  However, I agree with this
> suggestion:
>
>> [ Btw, maybe kvm_mmu_flush_local_tlb is a better name than
>> kvm_mmu_flush_tlb() in the current code. ]
> Or even better, call it kvm_vcpu_flush_tlb since it is all within
> x86.c/vmx.c/svm.c and the MMU is not involved at all.
>
> Paolo

Thanks for the quick feedback! I will incorporate the name change into the
next iteration of the patch.

Thanks!
Liang Chen



Re: [PATCH v2] KVM: x86: count actual tlb flushes

2014-09-18 Thread Radim Krčmář
2014-09-17 14:35-0400, Liang Chen:
> - we count KVM_REQ_TLB_FLUSH requests, not actual flushes
> (KVM can have multiple requests for one flush)
> - flushes from kvm_flush_remote_tlbs aren't counted
> - it's easy to make a direct request by mistake
> 
> Solve these by postponing the counting to kvm_check_request(),
> and refactor the code to use kvm_make_request again.
> 
> Signed-off-by: Liang Chen 
> ---
> Changes since v2:
> 
> * Instead of calling kvm_mmu_flush_tlb everywhere to make sure the
> stat is always incremented, postponing the counting to
> kvm_check_request.
> 
> (The idea comes from Radim. Much of the work is indeed done by him
> and is included in this patch, otherwise I couldn't start working
> on the followup work as I promised early. As I'm new to kvm
> development, please let me know if I am doing wrong here.)

I found (shame on me) Documentation/development-process/ when looking
how to help and it looks really good.
(If you read it, the rest of my mail will be obsolete :)

You usually want to Cc linux-ker...@vger.kernel.org.
(I've heard that someone actually reads it directly and it is a good
 archive otherwise.  It allows people to `git blame` your code and find
 the discussion in their preferred mail reader.)

The hard part about posting a patch is splitting it ...
You want to separate logical changes to make the code maintainable:
For this patch, I would create at least two-part series (cover letter!)

 - change the meaning of tlb_flush
 - refactor code

And see if it would make sense to split the refactoring further or if it
breaks when only a first part of the whole series is applied.

It's not a problem if your code depends on unmerged patches, you can
include someone else's code in the series as long as it isn't modified.
(Which probably is better than just mentioning that your code depends on
 some other patches from the list, but I'm not applying it ... Paolo?)


Re: [PATCH v2] KVM: x86: count actual tlb flushes

2014-09-18 Thread Radim Krčmář
2014-09-18 14:47+0200, Paolo Bonzini:
> On 18/09/2014 07:05, Xiao Guangrong wrote:
> > On 09/18/2014 02:35 AM, Liang Chen wrote:
> >> - we count KVM_REQ_TLB_FLUSH requests, not actual flushes
> >> (KVM can have multiple requests for one flush)
> >> - flushes from kvm_flush_remote_tlbs aren't counted
> >> - it's easy to make a direct request by mistake
> >>
> >> Solve these by postponing the counting to kvm_check_request(),
> > 
> > It's good.
> > 
> >> and refactor the code to use kvm_make_request again.
> > 
> > Why this refactor is needed? It's really a bad idea using
> > raw-bit-set instead of meaningful name.
> 
> set_bit is worse than kvm_make_request, but adding a one-line wrapper
> around kvm_make_request is not particularly useful.
> 
> We have the following requests:
> 
> - used multiple times:
> kvm_make_request(KVM_REQ_APF_HALT, vcpu);
> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> kvm_make_request(KVM_REQ_EVENT, vcpu);
> kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
> kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
> kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> 
> - used once:
> kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
> kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
> kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> kvm_make_request(KVM_REQ_NMI, vcpu);
> kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
> kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
> kvm_make_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu);
> 
> So I'm pretty much ambivalent about that.

I love abstractions, but consistency is above that, so I'd rather use
kvm_make_requests unless we modified all of them ... which is a lot of
work for the gain.  (make_request is just clunky)

> However, I agree with this > suggestion:
> 
> > [ Btw, maybe kvm_mmu_flush_local_tlb is a better name than
> > kvm_mmu_flush_tlb() in the current code. ]
> 
> Or even better, call it kvm_vcpu_flush_tlb since it is all within
> x86.c/vmx.c/svm.c and the MMU is not involved at all.

Good ideas.


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread H. Peter Anvin
On 09/18/2014 07:40 AM, KY Srinivasan wrote:
>>
>> The main questions are what MSR index to use and how to detect the
>> presence of the MSR.  I've played with two approaches:
>>
>> 1. Use CPUID to detect the presence of this feature.  This is very easy for
>> KVM to implement by using a KVM-specific CPUID feature.  The problem is
>> that this will necessarily be KVM-specific, as the guest must first probe for
>> KVM and then probe for the KVM feature.  I doubt that Hyper-V, for
>> example, wants to claim to be KVM.  If we could standardize a non-
>> hypervisor-specific CPUID feature, then this problem would go away.
> 
> We would prefer a CPUID feature bit to detect this feature.
>  

I guess if we're introducing the concept of pan-OS MSRs we could also
have pan-OS CPUID.  The real issue is to get a single non-conflicting
standard.

-hpa
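
For reference, a sketch of the KVM-specific probe from approach 1 above.
The 0x40000000 signature leaf is the established hypervisor convention;
the feature bit number is a placeholder, since nothing has been assigned:

#include <cpuid.h>
#include <stdbool.h>
#include <string.h>

#define HYPERVISOR_SIGNATURE_LEAF 0x40000000u
#define KVM_FEATURES_LEAF         0x40000001u
#define SEED_MSR_FEATURE_BIT      24  /* placeholder, nothing assigned yet */

static bool kvm_seed_msr_present(void)
{
	unsigned int eax, ebx, ecx, edx;
	char sig[13];

	/* Leaf 0x40000000: EBX/ECX/EDX carry the hypervisor signature. */
	__cpuid(HYPERVISOR_SIGNATURE_LEAF, eax, ebx, ecx, edx);
	memcpy(&sig[0], &ebx, 4);
	memcpy(&sig[4], &ecx, 4);
	memcpy(&sig[8], &edx, 4);
	sig[12] = '\0';

	if (strcmp(sig, "KVMKVMKVM") != 0)
		return false;	/* not KVM; Hyper-V etc. would stop here */

	/* Leaf 0x40000001: KVM feature bits are reported in EAX. */
	__cpuid(KVM_FEATURES_LEAF, eax, ebx, ecx, edx);
	return eax & (1u << SEED_MSR_FEATURE_BIT);
}

This is exactly why the probe is KVM-specific: a guest has to match the
signature before it may interpret leaf 0x40000001 at all.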




Re: [PATCH v2] KVM: x86: count actual tlb flushes

2014-09-18 Thread Liang Chen

On 09/18/2014 10:00 AM, Radim Krčmář wrote:
> 2014-09-17 14:35-0400, Liang Chen:
>> - we count KVM_REQ_TLB_FLUSH requests, not actual flushes
>> (KVM can have multiple requests for one flush)
>> - flushes from kvm_flush_remote_tlbs aren't counted
>> - it's easy to make a direct request by mistake
>>
>> Solve these by postponing the counting to kvm_check_request(),
>> and refactor the code to use kvm_make_request again.
>>
>> Signed-off-by: Liang Chen 
>> ---
>> Changes since v2:
>>
>> * Instead of calling kvm_mmu_flush_tlb everywhere to make sure the
>> stat is always incremented, postponing the counting to
>> kvm_check_request.
>>
>> (The idea comes from Radim. Much of the work is indeed done by him
>> and is included in this patch, otherwise I couldn't start working
>> on the followup work as I promised early. As I'm new to kvm
>> development, please let me know if I am doing wrong here.)
> I found (shame on me) Documentation/development-process/ when looking
> how to help and it looks really good.
> (If you read it, the rest of my mail will be obsolete :)
>
> You usually want to Cc linux-ker...@vger.kernel.org.
> (I've heard that someone actually reads it directly and it is a good
>  archive otherwise.  It allows people to `git blame` your code and find
>  the discussion in their preferred mail reader.)
>
> The hard part about posting a patch is splitting it ...
> You want to separate logical changes to make the code maintainable:
> For this patch, I would create at least two-part series (cover letter!)
>
>  - change the meaning of tlb_flush
>  - refactor code
>
> And see if it would make sense to split the refactoring further or if it
> breaks when only a first part of the whole series is applied.
>
> It's not a problem if your code depends on unmerged patches, you can
> include someone else's code in the series as long as it isn't modified.
> (Which probably is better than just mentioning that your code depends on
>  some other patches from the list, but I'm not applying it ... Paolo?)

Thank you very much for the help! Creating a patch series and including
your patch intact as the first one sounds best ;)

Thanks,
Liang




RE: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread KY Srinivasan


> -Original Message-
> From: virtualization-boun...@lists.linux-foundation.org
> [mailto:virtualization-boun...@lists.linux-foundation.org] On Behalf Of Andy
> Lutomirski
> Sent: Wednesday, September 17, 2014 7:51 PM
> To: Linux Virtualization; kvm list
> Cc: Gleb Natapov; Paolo Bonzini; Theodore Ts'o; H. Peter Anvin
> Subject: Standardizing an MSR or other hypercall to get an RNG seed?
> 
> Hi all-
> 
> I would like to standardize on a very simple protocol by which a guest OS can
> obtain an RNG seed early in boot.
> 
> The main design requirements are:
> 
>  - The interface should be very easy to use.  Linux, at least, will want to 
> use it
> extremely early in boot as part of kernel ASLR.  This means that PCI and ACPI
> will not work.
> 
>  - It should be synchronous.  We don't want to delay boot while waiting for a
> slow host RNG.  (On Linux, at least, we have a separate interface for that:
> virtio-rng.  I think that Windows has some support for virtio-rng as well.)
> 
>  - Random numbers obtained through this interface should be best-effort.
> We want the best quality randomness that the host can provide
> immediately.
> 
> It seems to me that the best interface for the actual request for a random
> number is rdmsr.  This is supported on all hypervisors and all virtualization
> technologies.  It can return a 64 bit random number, and it is easy to rdmsr
> the same register more than once to get a larger random number.
> 
> The main questions are what MSR index to use and how to detect the
> presence of the MSR.  I've played with two approaches:
> 
> 1. Use CPUID to detect the presence of this feature.  This is very easy for
> KVM to implement by using a KVM-specific CPUID feature.  The problem is
> that this will necessarily be KVM-specific, as the guest must first probe for
> KVM and then probe for the KVM feature.  I doubt that Hyper-V, for
> example, wants to claim to be KVM.  If we could standardize a non-
> hypervisor-specific CPUID feature, then this problem would go away.

We would prefer a CPUID feature bit to detect this feature.
 
> 
> 2. Detect the existence of the MSR by trying to read it and handling the
> #GP(0) that will occur if the MSR is not present.  Linux, at least, is okay 
> with
> doing this, and I have code to enable an IDT and an rdmsr fixup early enough
> in boot to use it for ASLR.  I don't know whether other operating systems can
> do this, though.
> 
> The major questions, then, are what enumeration mechanism should be
> used and what MSR index should be used.
> 
> For the MSR index, we could use an MSR from the Intel range if Intel were to
> give explicit approval, thus guaranteeing that nothing would conflict.  Or we
> could try to agree on an MSR index in the 0x40000000-0x4fffffff range that is
> unlikely to conflict with anything.
> 
> For enumeration, we could just probe the MSR if all relevant guests are okay
> with this or we could standardize on a CPUID-based mechanism.
> If we do the latter, I don't know what that mechanism would be.
> 
> NB: This thread will be cc'd to Microsoft and possibly Hyper-V people shortly.
> I very much appreciate Jun Nakajima's help with this!
> 
> Thanks,
> Andy


Regards,

K. Y
> 
> --
> Andy Lutomirski
> AMA Capital Management, LLC
> ___
> Virtualization mailing list
> virtualizat...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
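
[For illustration, a minimal guest-side sketch of the "just probe it"
variant described above. Everything here is hypothetical:
MSR_CROSS_HV_RNG_SEED (0x40000999) is a placeholder index with no real
assignment; the only real API used is Linux's rdmsrl_safe(), which
absorbs the #GP raised when the MSR does not exist.]

#include <linux/errno.h>
#include <linux/types.h>
#include <asm/msr.h>

#define MSR_CROSS_HV_RNG_SEED	0x40000999	/* hypothetical, unassigned */

static int cross_hv_get_seed(u64 *seed)
{
	/*
	 * rdmsrl_safe() returns nonzero if the rdmsr faulted, so a
	 * hypervisor (or bare metal) without the feature is detected
	 * without any CPUID enumeration at all.
	 */
	if (rdmsrl_safe(MSR_CROSS_HV_RNG_SEED, seed))
		return -ENODEV;
	return 0;
}

A caller would mix the result into its entropy pool and fall back to
other sources on -ENODEV.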


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 7:43 AM, H. Peter Anvin  wrote:
> On 09/18/2014 07:40 AM, KY Srinivasan wrote:
>>>
>>> The main questions are what MSR index to use and how to detect the
>>> presence of the MSR.  I've played with two approaches:
>>>
>>> 1. Use CPUID to detect the presence of this feature.  This is very easy for
>>> KVM to implement by using a KVM-specific CPUID feature.  The problem is
>>> that this will necessarily be KVM-specific, as the guest must first probe 
>>> for
>>> KVM and then probe for the KVM feature.  I doubt that Hyper-V, for
>>> example, wants to claim to be KVM.  If we could standardize a non-
>>> hypervisor-specific CPUID feature, then this problem would go away.
>>
>> We would prefer a CPUID feature bit to detect this feature.
>>
>
> I guess if we're introducing the concept of pan-OS MSRs we could also
> have pan-OS CPUID.  The real issue is to get a single non-conflicting
> standard.

Agreed.

KVM currently puts 0 in 0x40000000.EAX, meaning that a feature bit in
Microsoft's leaf 0x40000003 would probably not work well for KVM.  I
don't expect that Microsoft wants to start claiming to be KVM for the
purpose of using a KVM-style feature bit, so, if we went the CPUID
route, we would probably need something new.

--Andy

>
> -hpa
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 8:38 AM, Andy Lutomirski  wrote:
> On Thu, Sep 18, 2014 at 7:43 AM, H. Peter Anvin  wrote:
>> On 09/18/2014 07:40 AM, KY Srinivasan wrote:

>>>> The main questions are what MSR index to use and how to detect the
>>>> presence of the MSR.  I've played with two approaches:
>>>>
>>>> 1. Use CPUID to detect the presence of this feature.  This is very easy for
>>>> KVM to implement by using a KVM-specific CPUID feature.  The problem is
>>>> that this will necessarily be KVM-specific, as the guest must first probe
>>>> for KVM and then probe for the KVM feature.  I doubt that Hyper-V, for
>>>> example, wants to claim to be KVM.  If we could standardize a non-
>>>> hypervisor-specific CPUID feature, then this problem would go away.
>>>
>>> We would prefer a CPUID feature bit to detect this feature.
>>>
>>
>> I guess if we're introducing the concept of pan-OS MSRs we could also
>> have pan-OS CPUID.  The real issue is to get a single non-conflicting
>> standard.
>
> Agreed.
>
> KVM currently puts 0 in 0x40000000.EAX, meaning that a feature bit in
> Microsoft's leaf 0x40000003 would probably not work well for KVM.  I
> don't expect that Microsoft wants to start claiming to be KVM for the
> purpose of using a KVM-style feature bit, so, if we went the CPUID
> route, we would probably need something new.

Slight correction: QEMU/KVM has optional support for Hyper-V feature
enumeration.  Ideally the RNG seed mechanism would be enabled by
default, but I don't know whether the QEMU maintainers would be okay
with enabling the Hyper-V cpuid mechanism in a default configuration.

--Andy

>
> --Andy
>
>>
>> -hpa
>>
>>
>
>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC



-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Paolo Bonzini
Il 18/09/2014 17:44, Andy Lutomirski ha scritto:
> Slight correction: QEMU/KVM has optional support for Hyper-V feature
> enumeration.  Ideally the RNG seed mechanism would be enabled by
> default, but I don't know whether the QEMU maintainers would be okay
> with enabling the Hyper-V cpuid mechanism in a default configuration.

Some guests cannot find the KVM leaves at 0x40000100, so it wouldn't be
great.  And I also don't know what VMware folks would think, but I think
they would be even less thrilled than me.

Note that even if there is no well-defined CPUID leaf, and the main
detection mechanism is #GP, each hypervisor is free to define a CPUID
bit of its own.

However, if it's going to be an architectural (Intel-defined) MSR, I
think the right place for a feature bit is in the low leaves (like
EAX=7, ECX=0).

Paolo
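
[As a concrete reference point for the low-leaves suggestion: the one
architecturally defined hint today is CPUID.1:ECX bit 31, which returns
0 on bare metal and is set by hypervisors. A sketch of the check, using
the kernel's cpuid() helper; any new feature bit in EAX=7/ECX=0 or
elsewhere would sit behind a test of the same shape.]

#include <linux/types.h>
#include <asm/processor.h>

static bool running_on_hypervisor(void)
{
	unsigned int eax, ebx, ecx, edx;

	cpuid(1, &eax, &ebx, &ecx, &edx);
	return ecx & (1U << 31);	/* "hypervisor present" bit */
}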


[PATCH v3 0/2] KVM: count actual tlb flushes

2014-09-18 Thread Liang Chen
* Instead of counting the number of coalesced flush requests,
  we count the actual tlb flushes.
* Flushes from kvm_flush_remote_tlbs will also be counted.
* Free the namespace a bit by replacing kvm_mmu_flush_tlb()
  with kvm_make_request() again.

---
v2 -> v3:

* split the patch into a series of two patches.
* rename the util function kvm_mmu_flush_tlb in x86.c to
  kvm_vcpu_flush_tlb

v1 -> v2:

* Instead of calling kvm_mmu_flush_tlb everywhere to make sure the
  stat is always incremented, postponing the counting to
  kvm_check_request.


Liang Chen (1):
  KVM: x86: directly use kvm_make_request again

Radim Krčmář (1):
  KVM: x86: count actual tlb flushes

 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/mmu.c  | 17 +
 arch/x86/kvm/vmx.c  |  2 +-
 arch/x86/kvm/x86.c  | 13 ++---
 4 files changed, 16 insertions(+), 17 deletions(-)

-- 
1.9.1



[PATCH v3 1/2] KVM: x86: count actual tlb flushes

2014-09-18 Thread Liang Chen
From: Radim Krčmář 

- we count KVM_REQ_TLB_FLUSH requests, not actual flushes
  (KVM can have multiple requests for one flush)
- flushes from kvm_flush_remote_tlbs aren't counted
- it's easy to make a direct request by mistake

Solve these by postponing the counting to kvm_check_request().

Signed-off-by: Radim Krčmář 
Signed-off-by: Liang Chen 
---
 arch/x86/kvm/mmu.c | 1 -
 arch/x86/kvm/x86.c | 4 +++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..b41fd97 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3452,7 +3452,6 @@ static void nonpaging_init_context(struct kvm_vcpu *vcpu,
 
 void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
 {
-   ++vcpu->stat.tlb_flush;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_flush_tlb);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..9eb5458 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6017,8 +6017,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
}
if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
kvm_mmu_sync_roots(vcpu);
-   if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
+   if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
+   ++vcpu->stat.tlb_flush;
kvm_x86_ops->tlb_flush(vcpu);
+   }
if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
r = 0;
-- 
1.9.1



[PATCH v3 2/2] KVM: x86: directly use kvm_make_request again

2014-09-18 Thread Liang Chen
A one-line wrapper around kvm_make_request does not seem
particularly useful. Replace kvm_mmu_flush_tlb() with
kvm_make_request() again to free the namespace a bit.

Signed-off-by: Liang Chen 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/mmu.c  | 16 +---
 arch/x86/kvm/vmx.c  |  2 +-
 arch/x86/kvm/x86.c  | 11 ---
 4 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..77ade89 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -917,7 +917,6 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
 int fx_init(struct kvm_vcpu *vcpu);
 
-void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu);
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
   const u8 *new, int bytes);
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b41fd97..acc2d0c5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1749,7 +1749,7 @@ static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp,
return 1;
}
 
-   kvm_mmu_flush_tlb(vcpu);
+   kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
return 0;
 }
 
@@ -1802,7 +1802,7 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu,  gfn_t 
gfn)
 
kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
if (flush)
-   kvm_mmu_flush_tlb(vcpu);
+   kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 }
 
 struct mmu_page_path {
@@ -2536,7 +2536,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
  true, host_writable)) {
if (write_fault)
*emulate = 1;
-   kvm_mmu_flush_tlb(vcpu);
+   kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
 
if (unlikely(is_mmio_spte(*sptep) && emulate))
@@ -3450,12 +3450,6 @@ static void nonpaging_init_context(struct kvm_vcpu *vcpu,
context->nx = false;
 }
 
-void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
-{
-   kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
-}
-EXPORT_SYMBOL_GPL(kvm_mmu_flush_tlb);
-
 void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu)
 {
mmu_free_roots(vcpu);
@@ -3961,7 +3955,7 @@ static void mmu_pte_write_flush_tlb(struct kvm_vcpu 
*vcpu, bool zap_page,
if (remote_flush)
kvm_flush_remote_tlbs(vcpu->kvm);
else if (local_flush)
-   kvm_mmu_flush_tlb(vcpu);
+   kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 }
 
 static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa,
@@ -4222,7 +4216,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
 {
vcpu->arch.mmu.invlpg(vcpu, gva);
-   kvm_mmu_flush_tlb(vcpu);
+   kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
++vcpu->stat.invlpg;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..bb0a7ab 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6617,7 +6617,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
switch (type) {
case VMX_EPT_EXTENT_GLOBAL:
kvm_mmu_sync_roots(vcpu);
-   kvm_mmu_flush_tlb(vcpu);
+   kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
nested_vmx_succeed(vcpu);
break;
default:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9eb5458..fc3df50 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -726,7 +726,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
if (cr3 == kvm_read_cr3(vcpu) && !pdptrs_changed(vcpu)) {
kvm_mmu_sync_roots(vcpu);
-   kvm_mmu_flush_tlb(vcpu);
+   kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
return 0;
}
 
@@ -5989,6 +5989,12 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
+{
+   ++vcpu->stat.tlb_flush;
+   kvm_x86_ops->tlb_flush(vcpu);
+}
+
 /*
  * Returns 1 to let __vcpu_run() continue the guest execution loop without
  * exiting to the userspace.  Otherwise, the value will be returned to the
@@ -6018,8 +6024,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
kvm_mmu_sync_roots(vcpu);
if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
-   ++vcpu->stat.tlb_flush;
-   kvm_x86_ops->tlb_flush(vcpu);
+   kvm_vcpu_flush_tlb(vcpu);
}
if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
-- 
1.9.1


RE: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread KY Srinivasan


> -Original Message-
> From: Andy Lutomirski [mailto:l...@amacapital.net]
> Sent: Thursday, September 18, 2014 8:38 AM
> To: H. Peter Anvin
> Cc: KY Srinivasan; Linux Virtualization; kvm list; Gleb Natapov; Paolo 
> Bonzini;
> Theodore Ts'o
> Subject: Re: Standardizing an MSR or other hypercall to get an RNG seed?
> 
> On Thu, Sep 18, 2014 at 7:43 AM, H. Peter Anvin  wrote:
> > On 09/18/2014 07:40 AM, KY Srinivasan wrote:
> >>>
> >>> The main questions are what MSR index to use and how to detect the
> >>> presence of the MSR.  I've played with two approaches:
> >>>
> >>> 1. Use CPUID to detect the presence of this feature.  This is very
> >>> easy for KVM to implement by using a KVM-specific CPUID feature.
> >>> The problem is that this will necessarily be KVM-specific, as the
> >>> guest must first probe for KVM and then probe for the KVM feature.
> >>> I doubt that Hyper-V, for example, wants to claim to be KVM.  If we
> >>> could standardize a non- hypervisor-specific CPUID feature, then this
> problem would go away.
> >>
> >> We would prefer a CPUID feature bit to detect this feature.
> >>
> >
> > I guess if we're introducing the concept of pan-OS MSRs we could also
> > have pan-OS CPUID.  The real issue is to get a single non-conflicting
> > standard.
> 
> Agreed.
> 
> KVM currently puts 0 in 0x40000000.EAX, meaning that a feature bit in
> Microsoft's leaf 0x40000003 would probably not work well for KVM.  I don't
> expect that Microsoft wants to start claiming to be KVM for the purpose of
> using a KVM-style feature bit, so, if we went the CPUID route, we would
> probably need something new.
> 
> --Andy

I am copying other Hyper-V engineers to this discussion.

Regards,

K. Y
> 
> >
> > -hpa
> >
> >
> 
> 
> 
> --
> Andy Lutomirski
> AMA Capital Management, LLC


Re: [PATCH v2 3/6] hw_random: use reference counts on each struct hwrng.

2014-09-18 Thread Amos Kong
On Thu, Sep 18, 2014 at 08:37:44PM +0800, Amos Kong wrote:
> From: Rusty Russell 
> 
> current_rng holds one reference, and we bump it every time we want
> to do a read from it.
> 
> This means we only hold the rng_mutex to grab or drop a reference,
> so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
> block on read of /dev/hwrng.
> 
> Using a kref is overkill (we're always under the rng_mutex), but
> a standard pattern.
> 
> This also solves the problem that the hwrng_fillfn thread was
> accessing current_rng without a lock, which could change (eg. to NULL)
> underneath it.
> 
> V2: reduce reference count in signal_pending path
> 
> Signed-off-by: Rusty Russell 
> Signed-off-by: Amos Kong 

Replying to myself.

> ---
>  drivers/char/hw_random/core.c | 136 
> +-
>  include/linux/hw_random.h |   2 +
>  2 files changed, 95 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index a0905c8..ee9e504 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -42,6 +42,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  
> @@ -91,6 +92,59 @@ static void add_early_randomness(struct hwrng *rng)
>   add_device_randomness(bytes, bytes_read);
>  }
>  
> +static inline void cleanup_rng(struct kref *kref)
> +{
> + struct hwrng *rng = container_of(kref, struct hwrng, ref);
> +
> + if (rng->cleanup)
> + rng->cleanup(rng);
> +}
> +
> +static void set_current_rng(struct hwrng *rng)
> +{
> + BUG_ON(!mutex_is_locked(&rng_mutex));
> + kref_get(&rng->ref);
> + current_rng = rng;
> +}
> +
> +static void drop_current_rng(void)
> +{
> + BUG_ON(!mutex_is_locked(&rng_mutex));
> + if (!current_rng)
> + return;
> +
> + kref_put(&current_rng->ref, cleanup_rng);
> + current_rng = NULL;
> +}
> +
> +/* Returns ERR_PTR(), NULL or refcounted hwrng */
> +static struct hwrng *get_current_rng(void)
> +{
> + struct hwrng *rng;
> +
> + if (mutex_lock_interruptible(&rng_mutex))
> + return ERR_PTR(-ERESTARTSYS);
> +
> + rng = current_rng;
> + if (rng)
> + kref_get(&rng->ref);
> +
> + mutex_unlock(&rng_mutex);
> + return rng;
> +}
> +
> +static void put_rng(struct hwrng *rng)
> +{
> + /*
> +  * Hold rng_mutex here so we serialize in case they set_current_rng
> +  * on rng again immediately.
> +  */
> + mutex_lock(&rng_mutex);
> + if (rng)
> + kref_put(&rng->ref, cleanup_rng);
> + mutex_unlock(&rng_mutex);
> +}
> +
>  static inline int hwrng_init(struct hwrng *rng)
>  {
>   if (rng->init) {
> @@ -113,12 +167,6 @@ static inline int hwrng_init(struct hwrng *rng)
>   return 0;
>  }
>  
> -static inline void hwrng_cleanup(struct hwrng *rng)
> -{
> - if (rng && rng->cleanup)
> - rng->cleanup(rng);
> -}
> -
>  static int rng_dev_open(struct inode *inode, struct file *filp)
>  {
>   /* enforce read-only access to this chrdev */
> @@ -154,21 +202,22 @@ static ssize_t rng_dev_read(struct file *filp, char 
> __user *buf,
>   ssize_t ret = 0;
>   int err = 0;
>   int bytes_read, len;
> + struct hwrng *rng;
>  
>   while (size) {
> - if (mutex_lock_interruptible(&rng_mutex)) {
> - err = -ERESTARTSYS;
> + rng = get_current_rng();
> + if (IS_ERR(rng)) {
> + err = PTR_ERR(rng);
>   goto out;
>   }
> -
> - if (!current_rng) {
> + if (!rng) {
>   err = -ENODEV;
> - goto out_unlock;
> + goto out;
>   }
>  
>   mutex_lock(&reading_mutex);
>   if (!data_avail) {
> - bytes_read = rng_get_data(current_rng, rng_buffer,
> + bytes_read = rng_get_data(rng, rng_buffer,
>   rng_buffer_size(),
>   !(filp->f_flags & O_NONBLOCK));
>   if (bytes_read < 0) {
> @@ -200,7 +249,6 @@ static ssize_t rng_dev_read(struct file *filp, char 
> __user *buf,
>   ret += len;
>   }
>  
> - mutex_unlock(&rng_mutex);
>   mutex_unlock(&reading_mutex);
>  
>   if (need_resched())
> @@ -208,17 +256,19 @@ static ssize_t rng_dev_read(struct file *filp, char 
> __user *buf,
>  
>   if (signal_pending(current)) {
>   err = -ERESTARTSYS;
> + put_rng(rng);
>   goto out;
>   }
> +
> + put_rng(rng);
>   }
>  out:
>   return ret ? : err;
> -out_unlock:
> - mutex_unlock(&rng_mutex);
> - goto out;
> +
>  out_unlock_reading:
>   mutex_unlock(&reading_mutex);
> - goto out_unlock;
> + put_rng(rng);
> + goto out;
>  }
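
[For readers following the locking change: the shape of the scheme the
patch introduces, reduced to a generic sketch. The names below are
illustrative, not from the patch. The long-lived pointer holds one
reference of its own, and every reader takes another for the duration
of its access, so a concurrent switch of the current device can never
free it mid-read.]

#include <linux/kref.h>
#include <linux/mutex.h>
#include <linux/slab.h>

struct obj {
	struct kref ref;
};

static struct obj *current_obj;		/* owns one reference itself */
static DEFINE_MUTEX(obj_mutex);

static void obj_release(struct kref *ref)
{
	kfree(container_of(ref, struct obj, ref));
}

static struct obj *obj_get(void)
{
	struct obj *o;

	mutex_lock(&obj_mutex);		/* held only to grab a reference */
	o = current_obj;
	if (o)
		kref_get(&o->ref);
	mutex_unlock(&obj_mutex);
	return o;			/* safe to use without obj_mutex */
}

static void obj_put(struct obj *o)
{
	if (o)
		kref_put(&o->ref, obj_release);
}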

Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Nakajima, Jun
On Thu, Sep 18, 2014 at 9:36 AM, KY Srinivasan  wrote:
>
> I am copying other Hyper-V engineers to this discussion.
>

Thanks, K.Y.

In terms of the address for the MSR, I suggest that you choose one
from the range between 40000000H - 400000FFH. The SDM (35.1
ARCHITECTURAL MSRS) says "All existing and
future processors will not implement any features using any MSR in
this range." Hyper-V already defines many synthetic MSRs in this
range, and I think it would be reasonable for you to pick one for this
to avoid a conflict?

-- 
Jun
Intel Open Source Technology Center


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Paolo Bonzini
Il 18/09/2014 19:13, Nakajima, Jun ha scritto:
> In terms of the address for the MSR, I suggest that you choose one
> from the range between 40000000H - 400000FFH. The SDM (35.1
> ARCHITECTURAL MSRS) says "All existing and
> future processors will not implement any features using any MSR in
> this range." Hyper-V already defines many synthetic MSRs in this
> range, and I think it would be reasonable for you to pick one for this
> to avoid a conflict?

KVM is not using any MSR in that range.

However, I think it would be better to have the MSR (and perhaps CPUID)
outside the hypervisor-reserved ranges, so that it becomes
architecturally defined.  In some sense it is similar to the HYPERVISOR
CPUID feature.

Paolo


RE: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread KY Srinivasan


> -Original Message-
> From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Thursday, September 18, 2014 10:18 AM
> To: Nakajima, Jun; KY Srinivasan
> Cc: Mathew John; Theodore Ts'o; John Starks; kvm list; Gleb Natapov; Niels
> Ferguson; Andy Lutomirski; David Hepkin; H. Peter Anvin; Jake Oshins; Linux
> Virtualization
> Subject: Re: Standardizing an MSR or other hypercall to get an RNG seed?
> 
> Il 18/09/2014 19:13, Nakajima, Jun ha scritto:
> > In terms of the address for the MSR, I suggest that you choose one
> > from the range between 40000000H - 400000FFH. The SDM (35.1
> > ARCHITECTURAL MSRS) says "All existing and future processors will not
> > implement any features using any MSR in this range." Hyper-V already
> > defines many synthetic MSRs in this range, and I think it would be
> > reasonable for you to pick one for this to avoid a conflict?
> 
> KVM is not using any MSR in that range.
> 
> However, I think it would be better to have the MSR (and perhaps CPUID)
> outside the hypervisor-reserved ranges, so that it becomes architecturally
> defined.  In some sense it is similar to the HYPERVISOR CPUID feature.

Yes, given that we want this to be hypervisor agnostic.

K. Y


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Nakajima, Jun
On Thu, Sep 18, 2014 at 10:20 AM, KY Srinivasan  wrote:
>
>
>> -Original Message-
>> From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo
>> Bonzini
>> Sent: Thursday, September 18, 2014 10:18 AM
>> To: Nakajima, Jun; KY Srinivasan
>> Cc: Mathew John; Theodore Ts'o; John Starks; kvm list; Gleb Natapov; Niels
>> Ferguson; Andy Lutomirski; David Hepkin; H. Peter Anvin; Jake Oshins; Linux
>> Virtualization
>> Subject: Re: Standardizing an MSR or other hypercall to get an RNG seed?
>>
>> Il 18/09/2014 19:13, Nakajima, Jun ha scritto:
>> > In terms of the address for the MSR, I suggest that you choose one
>> > from the range between 40000000H - 400000FFH. The SDM (35.1
>> > ARCHITECTURAL MSRS) says "All existing and future processors will not
>> > implement any features using any MSR in this range." Hyper-V already
>> > defines many synthetic MSRs in this range, and I think it would be
>> > reasonable for you to pick one for this to avoid a conflict?
>>
>> KVM is not using any MSR in that range.
>>
>> However, I think it would be better to have the MSR (and perhaps CPUID)
>> outside the hypervisor-reserved ranges, so that it becomes architecturally
>> defined.  In some sense it is similar to the HYPERVISOR CPUID feature.
>
> Yes, given that we want this to be hypervisor agnostic.
>

Actually, that MSR address range has been reserved for that purpose, along with:
- CPUID.EAX=1 -> ECX bit 31 (always returns 0 on bare metal)
- CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)


-- 
Jun
Intel Open Source Technology Center


RE: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Jake Oshins
That certainly sounds reasonable to me.  How do you see discovery of that
working?

Thanks,
Jake Oshins


-Original Message-
From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini
Sent: Thursday, September 18, 2014 10:18 AM
To: Nakajima, Jun; KY Srinivasan
Cc: Mathew John; Theodore Ts'o; John Starks; kvm list; Gleb Natapov; Niels 
Ferguson; Andy Lutomirski; David Hepkin; H. Peter Anvin; Jake Oshins; Linux 
Virtualization
Subject: Re: Standardizing an MSR or other hypercall to get an RNG seed?

Il 18/09/2014 19:13, Nakajima, Jun ha scritto:
> In terms of the address for the MSR, I suggest that you choose one
> from the range between 40000000H - 400000FFH. The SDM (35.1
> ARCHITECTURAL MSRS) says "All existing and
> future processors will not implement any features using any MSR in
> this range." Hyper-V already defines many synthetic MSRs in this
> range, and I think it would be reasonable for you to pick one for this
> to avoid a conflict?

KVM is not using any MSR in that range.

However, I think it would be better to have the MSR (and perhaps CPUID)
outside the hypervisor-reserved ranges, so that it becomes
architecturally defined.  In some sense it is similar to the HYPERVISOR
CPUID feature.

Paolo


Re: [PATCH v3 2/2] KVM: x86: directly use kvm_make_request again

2014-09-18 Thread Radim Krčmář
2014-09-18 12:38-0400, Liang Chen:
> A one-line wrapper around kvm_make_request does not seem
> particularly useful. Replace kvm_mmu_flush_tlb() with
> kvm_make_request() again to free the namespace a bit.
> 
> Signed-off-by: Liang Chen 
> ---

Reviewed-by: Radim Krčmář 

>  arch/x86/include/asm/kvm_host.h |  1 -
>  arch/x86/kvm/mmu.c  | 16 +---
>  arch/x86/kvm/vmx.c  |  2 +-
>  arch/x86/kvm/x86.c  | 11 ---
>  4 files changed, 14 insertions(+), 16 deletions(-)
> [...]
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9eb5458..fc3df50 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> [...]
> @@ -6018,8 +6024,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
>   kvm_mmu_sync_roots(vcpu);
>   if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
> - ++vcpu->stat.tlb_flush;
> - kvm_x86_ops->tlb_flush(vcpu);
> + kvm_vcpu_flush_tlb(vcpu);
>   }

(Braces are not necessary.)


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 10:42 AM, Nakajima, Jun  wrote:
> On Thu, Sep 18, 2014 at 10:20 AM, KY Srinivasan  wrote:
>>
>>
>>> -Original Message-
>>> From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo
>>> Bonzini
>>> Sent: Thursday, September 18, 2014 10:18 AM
>>> To: Nakajima, Jun; KY Srinivasan
>>> Cc: Mathew John; Theodore Ts'o; John Starks; kvm list; Gleb Natapov; Niels
>>> Ferguson; Andy Lutomirski; David Hepkin; H. Peter Anvin; Jake Oshins; Linux
>>> Virtualization
>>> Subject: Re: Standardizing an MSR or other hypercall to get an RNG seed?
>>>
>>> Il 18/09/2014 19:13, Nakajima, Jun ha scritto:
>>> > In terms of the address for the MSR, I suggest that you choose one
>>> > from the range between 40000000H - 400000FFH. The SDM (35.1
>>> > ARCHITECTURAL MSRS) says "All existing and future processors will not
>>> > implement any features using any MSR in this range." Hyper-V already
>>> > defines many synthetic MSRs in this range, and I think it would be
>>> > reasonable for you to pick one for this to avoid a conflict?
>>>
>>> KVM is not using any MSR in that range.
>>>
>>> However, I think it would be better to have the MSR (and perhaps CPUID)
>>> outside the hypervisor-reserved ranges, so that it becomes architecturally
>>> defined.  In some sense it is similar to the HYPERVISOR CPUID feature.
>>
>> Yes, given that we want this to be hypervisor agnostic.
>>
>
> Actually, that MSR address range has been reserved for that purpose, along 
> with:
> - CPUID.EAX=1 -> ECX bit 31 (always returns 0 on bare metal)
> - CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)

I don't know whether this is documented anywhere, but Linux tries to
detect a hypervisor by searching CPUID leaves 0x400xyz00 for
"KVMKVMKVM\0\0\0", so at least Linux can handle the KVM leaves being
in a somewhat variable location.

Do we consider this mechanism to work across all hypervisors and
guests?  That is, could we put something like "CrossHVPara\0"
somewhere in that range, where each hypervisor would be free to decide
exactly where it ends up?

--Andy
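
[For reference, the scan Andy describes looks roughly like the helper
Linux already has (cf. hypervisor_cpuid_base() in
arch/x86/include/asm/processor.h); this sketch is simplified from it.]

#include <linux/string.h>
#include <linux/types.h>
#include <asm/processor.h>

static u32 find_hv_base(const char *sig)	/* e.g. "KVMKVMKVM\0\0\0" */
{
	u32 base, eax, signature[3];

	/* Walk the 0x40000000 region in 0x100 steps, comparing the
	 * 12-byte signature returned in EBX:ECX:EDX. */
	for (base = 0x40000000; base < 0x40010000; base += 0x100) {
		cpuid(base, &eax, &signature[0],
		      &signature[1], &signature[2]);
		if (!memcmp(sig, signature, 12))
			return base;	/* signature found at this leaf */
	}
	return 0;	/* no such hypervisor signature in the region */
}

A hypothetical cross-hypervisor signature such as Andy's "CrossHVPara\0"
could be probed the same way, at whatever base each hypervisor chooses.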


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread H. Peter Anvin
Quite frankly it might make more sense to define a cross-VM *cpuid* range.  The 
cpuid leaf can just point to the MSR.  The big question is who will be willing 
to be the registrar.

On September 18, 2014 11:35:39 AM PDT, Andy Lutomirski  
wrote:
>On Thu, Sep 18, 2014 at 10:42 AM, Nakajima, Jun
> wrote:
>> On Thu, Sep 18, 2014 at 10:20 AM, KY Srinivasan 
>wrote:
>>>
>>>
 -Original Message-
 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of
>Paolo
 Bonzini
 Sent: Thursday, September 18, 2014 10:18 AM
 To: Nakajima, Jun; KY Srinivasan
 Cc: Mathew John; Theodore Ts'o; John Starks; kvm list; Gleb
>Natapov; Niels
 Ferguson; Andy Lutomirski; David Hepkin; H. Peter Anvin; Jake
>Oshins; Linux
 Virtualization
 Subject: Re: Standardizing an MSR or other hypercall to get an RNG
>seed?

 Il 18/09/2014 19:13, Nakajima, Jun ha scritto:
 > In terms of the address for the MSR, I suggest that you choose
>one
 > from the range between 40000000H - 400000FFH. The SDM (35.1
 > ARCHITECTURAL MSRS) says "All existing and future processors will
>not
 > implement any features using any MSR in this range." Hyper-V
>already
 > defines many synthetic MSRs in this range, and I think it would
>be
 > reasonable for you to pick one for this to avoid a conflict?

 KVM is not using any MSR in that range.

 However, I think it would be better to have the MSR (and perhaps
>CPUID)
 outside the hypervisor-reserved ranges, so that it becomes
>architecturally
 defined.  In some sense it is similar to the HYPERVISOR CPUID
>feature.
>>>
>>> Yes, given that we want this to be hypervisor agnostic.
>>>
>>
>> Actually, that MSR address range has been reserved for that purpose,
>along with:
>> - CPUID.EAX=1 -> ECX bit 31 (always returns 0 on bare metal)
>> - CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)
>
>I don't know whether this is documented anywhere, but Linux tries to
>detect a hypervisor by searching CPUID leaves 0x400xyz00 for
>"KVMKVMKVM\0\0\0", so at least Linux can handle the KVM leaves being
>in a somewhat variable location.
>
>Do we consider this mechanism to work across all hypervisors and
>guests?  That is, could we put something like "CrossHVPara\0"
>somewhere in that range, where each hypervisor would be free to decide
>exactly where it ends up?
>
>--Andy

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Paolo Bonzini

> >> However, I think it would be better to have the MSR (and perhaps CPUID)
> >> outside the hypervisor-reserved ranges, so that it becomes architecturally
> >> defined.  In some sense it is similar to the HYPERVISOR CPUID feature.
> >
> > Yes, given that we want this to be hypervisor agnostic.
> 
> Actually, that MSR address range has been reserved for that purpose, along
> with:
> - CPUID.EAX=1 -> ECX bit 31 (always returns 0 on bare metal)
> - CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)

No, that has been reserved for hypervisor-specific information (same for the 
MSR).
Here we want a feature that is standardized across all hypervisors.

Of course we could just agree to have a common 4000_00C0H to 4000_00FFH range
agreed upon by KVM/Xen/Hyper-V/VMware for both MSRs and CPUID.  But it would
be nice for Intel to act as the registrar, also because this particular
feature in principle can be implemented by processors too (not that it makes
much sense since you could use RDRAND, but it _could_).

Paolo


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Paolo Bonzini

> > Actually, that MSR address range has been reserved for that purpose, along
> > with:
> > - CPUID.EAX=1 -> ECX bit 31 (always returns 0 on bare metal)
> > - CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)
> 
> I don't know whether this is documented anywhere, but Linux tries to
> detect a hypervisor by searching CPUID leaves 0x400xyz00 for
> "KVMKVMKVM\0\0\0", so at least Linux can handle the KVM leaves being
> in a somewhat variable location.
> 
> Do we consider this mechanism to work across all hypervisors and
> guests?  That is, could we put something like "CrossHVPara\0"
> somewhere in that range, where each hypervisor would be free to decide
> exactly where it ends up?

That's also possible, but extending the hypervisor CPUID range
beyond 400000FFH is not officially sanctioned by Intel.

Xen started doing that in order to expose both Hyper-V and Xen
CPUID leaves, and KVM followed the practice.

Paolo


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 11:54 AM, Niels Ferguson  wrote:
> Defining a standard way of transferring random numbers between the host and 
> the guest is an excellent idea.
>
> As the person who writes the RNG code in Windows, I have a few comments:
>
> DETECTION:
> It should be possible to detect this feature through CPUID or similar 
> mechanism. That allows the code that uses this feature to be written without 
> needing the ability to catch CPU exceptions. I could be wrong, but as far as 
> I know there is no support for exception handling in the Windows OS loader 
> where we gather our initial random state.
>

Linux is like this, too, except that I have experimental code to
create an IDT in that code, so we can handle it.  I agree, though,
that using CPUID in early boot is easier.

> EFFICIENCY:
> Is there a way we can transfer more bytes per interaction? With a single 
> 64-bit MSR we always need multiple reads to get a seed, and each of them 
> results in a context switch to the host, which is expensive. This is even 
> worse for 32-bit guests. Windows would typically need to fetch 64 bytes of 
> random data at boot and at regular intervals. It is not a show-stopper, but 
> better efficiency would be nice.

I thought about this for a while and didn't come up with anything that
wouldn't be messy.  We could fudge the MSR rax/rdx high bits to get 128
bits, but that's nonportable and awful to implement.  We could return
a random number directly from CPUID, but that's weird.

In very informal benchmarking, rdmsr wasn't that bad.  On the other
hand, I wasn't immediately planning on using the msr on an ongoing
basis on Linux guests except after suspend/resume.

>
> GUEST-TO-HOST:
> Can we also define a way to have random values flow from the guest to the 
> host? Guests are also gathering entropy from their own sources, and if we 
> allow the guests to send random data to the host, then the host can treat it 
> as an entropy source and all the VMs on a single host can share their 
> entropy. (This is not a security problem; any reasonable host RNG cannot be 
> hurt even by maliciously chosen entropy inputs.)
>

wrmsr on the same MSR?

>
> I don't know much about how hypervisors work on the inside, but maybe we can 
> define a mechanism for standardized hypervisor calls that work on all 
> hypervisors that support this feature. Then we could define a function to do 
> an entropy exchange: the guest provides N bytes of random data to the host, 
> and the host replies with N bytes of random data. The data exchange can now 
> be done through memory.
>
> A standardized hypervisor-call mechanism also seems generally useful for 
> future features, whereas the MSR solution is very limited in what it can do. 
> We might end up with standardized hypervisor-calls in the future for some 
> other reason, and then the MSR solution looks very odd.

I think there'll be resistance to a standardized hypercall mechanism,
just because the implementations tend to be complex.  Hyper-V uses a
special page in guest physical memory that contains a trampoline.

We could use wrmsr to a register where the payload is a pointer to a
buffer to receive random bytes, but that loses some of the simplicity
of just calling rdmsr a few times.

--Andy
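
[To make the efficiency and guest-to-host points concrete, a sketch
under the same assumptions as earlier in the thread:
MSR_CROSS_HV_RNG_SEED (0x40000999) is a hypothetical, unassigned index,
and nothing today defines write semantics for it. Eight back-to-back
reads would cover the 64 bytes Windows wants, at the cost of one VM
exit each.]

#include <linux/errno.h>
#include <linux/types.h>
#include <asm/msr.h>

#define MSR_CROSS_HV_RNG_SEED	0x40000999	/* hypothetical, unassigned */

static int cross_hv_read_seed_buf(u64 *buf, int words)
{
	int i;

	for (i = 0; i < words; i++)	/* words = 8 gives 64 bytes */
		if (rdmsrl_safe(MSR_CROSS_HV_RNG_SEED, &buf[i]))
			return -ENODEV;	/* each read is a VM exit */
	return 0;
}

static void cross_hv_write_seed(u64 seed)
{
	/* Purely speculative guest->host direction, if the same MSR
	 * accepted writes; failure is ignorable. */
	wrmsrl_safe(MSR_CROSS_HV_RNG_SEED, seed);
}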


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 11:58 AM, Paolo Bonzini  wrote:
>
>> > Actually, that MSR address range has been reserved for that purpose, along
>> > with:
>> > - CPUID.EAX=1 -> ECX bit 31 (always returns 0 on bare metal)
>> > - CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)
>>
>> I don't know whether this is documented anywhere, but Linux tries to
>> detect a hypervisor by searching CPUID leaves 0x400xyz00 for
>> "KVMKVMKVM\0\0\0", so at least Linux can handle the KVM leaves being
>> in a somewhat variable location.
>>
>> Do we consider this mechanism to work across all hypervisors and
>> guests?  That is, could we put something like "CrossHVPara\0"
>> somewhere in that range, where each hypervisor would be free to decide
>> exactly where it ends up?
>
> That's also possible, but extending the hypervisor CPUID range
> beyond 400000FFH is not officially sanctioned by Intel.
>
> Xen started doing that in order to expose both Hyper-V and Xen
> CPUID leaves, and KVM followed the practice.
>

Whoops.

Might Intel be willing to extend that range to 0x40000000 -
0x400fffff?  And would Microsoft be okay with using this mechanism for
discovery?

Do we have anyone from VMware in this thread?  I don't have any VMware contacts.

--Andy


Re: [PATCH RFC] virtio-pci: share config interrupt between virtio devices

2014-09-18 Thread Stefan Fritsch
On Monday 01 September 2014 09:37:30, Michael S. Tsirkin wrote:
> Why do we need INT#x?
> How about setting IRQF_SHARED for the config interrupt
> while using MSI-X? You'd have to read ISR to check that the
> interrupt was intended for your device.

The virtio 0.9.5 spec says that ISR is "unused" when in MSI-X mode. I 
don't think that you can depend on the device to set the configuration 
changed bit.

The virtio 1.0 spec seems to have fixed that.



Re: How KVM hypervisor allocates physical pages to the VM.

2014-09-18 Thread Steven
On Thu, Sep 18, 2014 at 6:00 AM, Paolo Bonzini  wrote:
> Il 18/09/2014 01:11, Steven ha scritto:
>> I agree with you that the memory allocated from the
>> kmem_cache_alloc_node and kmalloc_node should be not returned to the
>> user space process in the VM.
>>
>> Can I understand it this way: "for small memory allocations, or when
>> the VM already has some pages, the hypervisor uses kmalloc or
>> kmem_cache_alloc for its own bookkeeping while providing the VM with
>> free memory, instead of calling mm_page_alloc"? Is this correct? Thanks.
>
> No.  The hypervisor's uses of kmalloc or kmem_cache_alloc are unrelated
> to providing the VM with memory.
>
> For example, the hypervisor could be allocating its own private data
> structure to track the guest's memory.  But pages will always be
> obtained via get_user_pages and the buddy allocator.

Thanks for your reply. Could you give me some guidance about how to
trace the page allocation for the user space application in the VM?

Say I have a simple program in the VM that allocates and initializes
an array like
  int array[10240]; // a 10-page integer array
When the program runs, it should cause page faults that reach the
hypervisor. How can I track the guest virtual address and the physical
page address during page fault handling? Thanks.

- Steven




>
> Paolo
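
[One possible starting point, sketched against a 3.17-era tree:
kvm_mmu_page_fault() in arch/x86/kvm/mmu.c sees every guest page fault
KVM handles, with the faulting address (a GVA under shadow paging, a
GPA under EPT). An ad-hoc trace_printk() there logs them; the existing
kvm:kvm_page_fault tracepoint reports similar data without patching.]

/* At the top of kvm_mmu_page_fault() in arch/x86/kvm/mmu.c: */
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
		       void *insn, int insn_len)
{
	/* Debug-only: log every guest fault to the ftrace buffer. */
	trace_printk("vcpu%d fault addr=%lx err=%x\n",
		     vcpu->vcpu_id, cr2, error_code);

	/* ... existing fault handling continues unchanged ... */
}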


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Nakajima, Jun
On Thu, Sep 18, 2014 at 12:07 PM, Andy Lutomirski  wrote:

> Might Intel be willing to extend that range to 0x40000000 -
> 0x400fffff?  And would Microsoft be okay with using this mechanism for
> discovery?

So, for CPUID, the SDM (Table 3-17. Information Returned by CPUID) says today:
"No existing or future CPU will return processor identification or
feature information if the initial EAX value is in the range 40000000H
to 4FFFFFFFH."

We can define a cross-VM CPUID range from there. The CPUID can return
the index of the MSR if needed.

-- 
Jun
Intel Open Source Technology Center


KVM: nested VMX: disable perf cpuid reporting

2014-09-18 Thread Marcelo Tosatti

Initialization of an L2 guest with -cpu host, on an L1 guest with -cpu host,
triggers:

(qemu) KVM: entry failed, hardware error 0x7
...
nested_vmx_run: VMCS MSR_{LOAD,STORE} unsupported

Nested VMX MSR load/store support is not sufficient to
allow perf for the L2 guest.

Until properly fixed, trap CPUID and disable function 0xA.

Signed-off-by: Marcelo Tosatti 

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e28d798..976e3a5 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -774,6 +774,12 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, 
u32 *ecx, u32 *edx)
if (!best)
best = check_cpuid_limit(vcpu, function, index);
 
+   /*
+* Perfmon not yet supported for L2 guest.
+*/
+   if (is_guest_mode(vcpu) && function == 0xa)
+   best = NULL;
+
if (best) {
*eax = best->eax;
*ebx = best->ebx;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e8f08e9..2ab5047 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6986,6 +6986,8 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
case EXIT_REASON_TASK_SWITCH:
return 1;
case EXIT_REASON_CPUID:
+   if (kvm_register_read(vcpu, VCPU_REGS_RAX) == 0xa)
+   return 0;
return 1;
case EXIT_REASON_HLT:
return nested_cpu_has(vmcs12, CPU_BASED_HLT_EXITING);


RE: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Niels Ferguson
Defining a standard way of transferring random numbers between the host and the 
guest is an excellent idea.

As the person who writes the RNG code in Windows, I have a few comments:

DETECTION:
It should be possible to detect this feature through CPUID or similar 
mechanism. That allows the code that uses this feature to be written without 
needing the ability to catch CPU exceptions. I could be wrong, but as far as I 
know there is no support for exception handling in the Windows OS loader where 
we gather our initial random state.

EFFICIENCY:
Is there a way we can transfer more bytes per interaction? With a single 64-bit 
MSR we always need multiple reads to get a seed, and each of them results in a 
context switch to the host, which is expensive. This is even worse for 32-bit 
guests. Windows would typically need to fetch 64 bytes of random data at boot 
and at regular intervals. It is not a show-stopper, but better efficiency would 
be nice.

GUEST-TO-HOST:
Can we also define a way to have random values flow from the guest to the host? 
Guests are also gathering entropy from their own sources, and if we allow the 
guests to send random data to the host, then the host can treat it as an 
entropy source and all the VMs on a single host can share their entropy. (This 
is not a security problem; any reasonable host RNG cannot be hurt even by 
maliciously chosen entropy inputs.)


I don't know much about how hypervisors work on the inside, but maybe we can 
define a mechanism for standardized hypervisor calls that work on all 
hypervisors that support this feature. Then we could define a function to do an 
entropy exchange: the guest provides N bytes of random data to the host, and 
the host replies with N bytes of random data. The data exchange can now be done 
through memory.

A standardized hypervisor-call mechanism also seems generally useful for future 
features, whereas the MSR solution is very limited in what it can do. We might 
end up with standardized hypervisor-calls in the future for some other reason, 
and then the MSR solution looks very odd.

Niels


-Original Message-
From: H. Peter Anvin [mailto:h...@zytor.com] 
Sent: Thursday, September 18, 2014 11:39 AM
To: Andy Lutomirski; Nakajima, Jun
Cc: KY Srinivasan; Paolo Bonzini; Mathew John; Theodore Ts'o; John Starks; kvm 
list; Gleb Natapov; Niels Ferguson; David Hepkin; Jake Oshins; Linux 
Virtualization
Subject: Re: Standardizing an MSR or other hypercall to get an RNG seed?

Quite frankly it might make more sense to define a cross-VM *cpuid* range.  The 
cpuid leaf can just point to the MSR.  The big question is who will be willing 
to be the registrar.

On September 18, 2014 11:35:39 AM PDT, Andy Lutomirski  
wrote:
>On Thu, Sep 18, 2014 at 10:42 AM, Nakajima, Jun 
> wrote:
>> On Thu, Sep 18, 2014 at 10:20 AM, KY Srinivasan 
>wrote:
>>>
>>>
 -Original Message-
 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of
>Paolo
 Bonzini
 Sent: Thursday, September 18, 2014 10:18 AM
 To: Nakajima, Jun; KY Srinivasan
 Cc: Mathew John; Theodore Ts'o; John Starks; kvm list; Gleb
>Natapov; Niels
 Ferguson; Andy Lutomirski; David Hepkin; H. Peter Anvin; Jake
>Oshins; Linux
 Virtualization
 Subject: Re: Standardizing an MSR or other hypercall to get an RNG
>seed?

 Il 18/09/2014 19:13, Nakajima, Jun ha scritto:
 > In terms of the address for the MSR, I suggest that you choose
>one
 > from the range between 40000000H - 400000FFH. The SDM (35.1
 > ARCHITECTURAL MSRS) says "All existing and future processors will
>not
 > implement any features using any MSR in this range." Hyper-V
>already
 > defines many synthetic MSRs in this range, and I think it would
>be
 > reasonable for you to pick one for this to avoid a conflict?

 KVM is not using any MSR in that range.

 However, I think it would be better to have the MSR (and perhaps
>CPUID)
 outside the hypervisor-reserved ranges, so that it becomes
>architecturally
 defined.  In some sense it is similar to the HYPERVISOR CPUID
>feature.
>>>
>>> Yes, given that we want this to be hypervisor agnostic.
>>>
>>
>> Actually, that MSR address range has been reserved for that purpose,
>along with:
>> - CPUID.EAX=1 -> ECX bit 31 (always returns 0 on bare metal)
>> - CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)
>
>I don't know whether this is documented anywhere, but Linux tries to 
>detect a hypervisor by searching CPUID leaves 0x400xyz00 for 
>"KVMKVMKVM\0\0\0", so at least Linux can handle the KVM leaves being in 
>a somewhat variable location.
>
>Do we consider this mechanism to work across all hypervisors and 
>guests?  That is, could we put something like "CrossHVPara\0"
>somewhere in that range, where each hypervisor would be free to decide 
>exactly where it ends up?
>
>--Andy

--
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 2:21 PM, Nakajima, Jun  wrote:
> On Thu, Sep 18, 2014 at 12:07 PM, Andy Lutomirski  wrote:
>
>> Might Intel be willing to extend that range to 0x40000000 -
>> 0x400fffff?  And would Microsoft be okay with using this mechanism for
>> discovery?
>
> So, for CPUID, the SDM (Table 3-17. Information Returned by CPUID) says today:
> "No existing or future CPU will return processor identification or
> feature information if the initial EAX value is in the range 40000000H
> to 4FFFFFFFH."
>
> We can define a cross-VM CPUID range from there. The CPUID can return
> the index of the MSR if needed.

Right, sorry.  I was looking at this sentence in SDM Volume 3 Section 35.1:

MSR address range between 40000000H - 400000FFH is marked as a
specially reserved range. All existing and
future processors will not implement any features using any MSR in this range.

That's not really a large enough range for us to reserve an MSR for
this.  However, KVM, is already using MSRs outside that range: it uses
0x4b564d00-0x4b564d04 or so.  I wonder whether KVM got confused by the
differing ranges for cpuid leaves and MSR indices.

Any chance that Intel could reserve a larger range to include the KVM
MSRs?  It would also be easier if the MSR indices for cross-HV
features were constants.

Thanks,
Andy


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread H. Peter Anvin
On 09/18/2014 02:46 PM, David Hepkin wrote:
> I'm not sure what you mean by "this mechanism?"  Are you suggesting that each 
> hypervisor put "CrossHVPara\0" somewhere in the 0x40000000 - 0x400fffff CPUID 
> range, and an OS has to do a full scan of this CPUID range on boot to find 
> it?  That seems pretty inefficient.  An OS will take 1000's of hypervisor 
> intercepts on every boot just to search this CPUID range.
> 
> I suggest we come to consensus on a specific CPUID leaf where an OS needs to 
> look to determine if a hypervisor supports this capability.  We could define 
> a new CPUID leaf range at a well-defined location, or we could just use one 
> of the existing CPUID leaf ranges implemented by an existing hypervisor.  I'm 
> not familiar with the KVM CPUID leaf range, but in the case of Hyper-V, the 
> Hyper-V CPUID leaf range was architected to allow for other hypervisors to 
> implement it and just show through specific capabilities supported by the 
> hypervisor.  So, we could define a bit in the Hyper-V CPUID leaf range (since 
> Xen and KVM also implement this range), but that would require Linux to look 
> in that range on boot to discover this capability.
> 

Yes, I would agree that if anything we should define a new range unique
to this cross-VM interface, e.g. 0x48000000.

-hpa
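
[A sketch of what discovery at such a fixed leaf could look like, with
everything hypothetical: 0x48000000 is only hpa's example base, and
"CrossHVPara\0" is Andy's strawman signature. One cpuid replaces a scan
of the whole 0x40000000 region.]

#include <linux/string.h>
#include <linux/types.h>
#include <asm/processor.h>

#define CROSS_HV_CPUID_BASE	0x48000000	/* hypothetical fixed leaf */

static bool cross_hv_present(void)
{
	unsigned int eax, sig[3];

	cpuid(CROSS_HV_CPUID_BASE, &eax, &sig[0], &sig[1], &sig[2]);
	/* eax could, e.g., return the seed MSR index, as Jun suggests. */
	return !memcmp(sig, "CrossHVPara\0", 12);	/* 11 chars + NUL */
}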




Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 2:46 PM, David Hepkin  wrote:
> I'm not sure what you mean by "this mechanism?"  Are you suggesting that each 
> hypervisor put "CrossHVPara\0" somewhere in the 0x40000000 - 0x400fffff CPUID 
> range, and an OS has to do a full scan of this CPUID range on boot to find 
> it?  That seems pretty inefficient.  An OS will take 1000's of hypervisor 
> intercepts on every boot just to search this CPUID range.

Linux already does this, which is arguably unfortunate.  But it's not
quite that bad; the KVM and Xen code is only scanning at increments of
0x100.

I think that Linux as a guest would have no problem with checking the
Hyper-V range or some new range.  I don't think that Linux would want
to have to set a guest OS identity, and it's not entirely clear to me
whether this would be necessary to use the Hyper-V mechanism.

>
> I suggest we come to consensus on a specific CPUID leaf where an OS needs to 
> look to determine if a hypervisor supports this capability.  We could define 
> a new CPUID leaf range at a well-defined location, or we could just use one 
> of the existing CPUID leaf ranges implemented by an existing hypervisor.  I'm 
> not familiar with the KVM CPUID leaf range, but in the case of Hyper-V, the 
> Hyper-V CPUID leaf range was architected to allow for other hypervisors to 
> implement it and just show through specific capabilities supported by the 
> hypervisor.  So, we could define a bit in the Hyper-V CPUID leaf range (since 
> Xen and KVM also implement this range), but that would require Linux to look 
> in that range on boot to discover this capability.

I also don't know whether QEMU and KVM would be okay with implementing
the host side of the Hyper-V mechanism by default.  They would have to
implement at least leaves 0x40000001 and 0x40000002, plus correctly
reporting zeros through whatever leaf is used for this new feature.
Gleb?  Paolo?

--Andy


RE: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread David Hepkin
I'm not sure what you mean by "this mechanism?"  Are you suggesting that each 
hypervisor put "CrossHVPara\0" somewhere in the 0x40000000 - 0x4000ffff CPUID 
range, and an OS has to do a full scan of this CPUID range on boot to find it?  
That seems pretty inefficient.  An OS will take 1000's of hypervisor intercepts 
on every boot just to search this CPUID range.

I suggest we come to consensus on a specific CPUID leaf where an OS needs to 
look to determine if a hypervisor supports this capability.  We could define a 
new CPUID leaf range at a well-defined location, or we could just use one of 
the existing CPUID leaf ranges implemented by an existing hypervisor.  I'm not 
familiar with the KVM CPUID leaf range, but in the case of Hyper-V, the Hyper-V 
CPUID leaf range was architected to allow for other hypervisors to implement it 
and just show through specific capabilities supported by the hypervisor.  So, 
we could define a bit in the Hyper-V CPUID leaf range (since Xen and KVM also 
implement this range), but that would require Linux to look in that range on 
boot to discover this capability.

Thanks...

David

-Original Message-
From: Andy Lutomirski [mailto:l...@amacapital.net] 
Sent: Thursday, September 18, 2014 12:07 PM
To: Paolo Bonzini
Cc: Jun Nakajima; KY Srinivasan; Mathew John; Theodore Ts'o; John Starks; kvm 
list; Gleb Natapov; Niels Ferguson; David Hepkin; H. Peter Anvin; Jake Oshins; 
Linux Virtualization
Subject: Re: Standardizing an MSR or other hypercall to get an RNG seed?

On Thu, Sep 18, 2014 at 11:58 AM, Paolo Bonzini  wrote:
>
>> > Actually, that MSR address range has been reserved for that purpose, along with:
>> > - CPUID.EAX=1 -> ECX bit 31 (always returns 0 on bare metal)
>> > - CPUID.EAX=4000_00xxH leaves (i.e. HYPERVISOR CPUID)
>>
>> I don't know whether this is documented anywhere, but Linux tries to 
>> detect a hypervisor by searching CPUID leaves 0x400xyz00 for 
>> "KVMKVMKVM\0\0\0", so at least Linux can handle the KVM leaves being 
>> in a somewhat variable location.
>>
>> Do we consider this mechanism to work across all hypervisors and 
>> guests?  That is, could we put something like "CrossHVPara\0"
>> somewhere in that range, where each hypervisor would be free to 
>> decide exactly where it ends up?
>
> That's also possible, but extending the hypervisor CPUID range beyond 
> 400000FFH is not officially sanctioned by Intel.
>
> Xen started doing that in order to expose both Hyper-V and Xen CPUID 
> leaves, and KVM followed the practice.
>

Whoops.

Might Intel be willing to extend that range to 0x40000000 - 0x4000ffff?  And 
would Microsoft be okay with using this mechanism for discovery?

Do we have anyone from VMware in this thread?  I don't have any VMware contacts.

--Andy


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread H. Peter Anvin
On 09/18/2014 03:00 PM, Andy Lutomirski wrote:
> On Thu, Sep 18, 2014 at 2:46 PM, David Hepkin  wrote:
>> I'm not sure what you mean by "this mechanism?"  Are you suggesting that 
>> each hypervisor put "CrossHVPara\0" somewhere in the 0x40000000 - 0x4000ffff 
>> CPUID range, and an OS has to do a full scan of this CPUID range on boot to 
>> find it?  That seems pretty inefficient.  An OS will take 1000's of 
>> hypervisor intercepts on every boot just to search this CPUID range.
> 
> Linux already does this, which is arguably unfortunate.  But it's not
> quite that bad; the KVM and Xen code is only scanning at increments of
> 0x100.
> 
> I think that Linux as a guest would have no problem with checking the
> Hyper-V range or some new range.  I don't think that Linux would want
> to have to set a guest OS identity, and it's not entirely clear to me
> whether this would be necessary to use the Hyper-V mechanism.
> 

We really don't want to have to do this in early code, though.

>>
>> I suggest we come to consensus on a specific CPUID leaf where an OS needs to 
>> look to determine if a hypervisor supports this capability.  We could define 
>> a new CPUID leaf range at a well-defined location, or we could just use one 
>> of the existing CPUID leaf ranges implemented by an existing hypervisor.  
>> I'm not familiar with the KVM CPUID leaf range, but in the case of Hyper-V, 
>> the Hyper-V CPUID leaf range was architected to allow for other hypervisors 
>> to implement it and just show through specific capabilities supported by the 
>> hypervisor.  So, we could define a bit in the Hyper-V CPUID leaf range 
>> (since Xen and KVM also implement this range), but that would require Linux 
>> to look in that range on boot to discover this capability.
> 
> I also don't know whether QEMU and KVM would be okay with implementing
> the host side of the Hyper-V mechanism by default.  They would have to
> implement at least leaves 0x40000001 and 0x40000002, plus correctly
> reporting zeros through whatever leaf is used for this new feature.
> Gleb?  Paolo?
> 

The problem is what happens with a noncooperating hypervisor.  I guess
we could put a magic number in one of the leaf registers, but still...

-hpa




Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 2:57 PM, H. Peter Anvin  wrote:
> On 09/18/2014 02:46 PM, David Hepkin wrote:
>> I'm not sure what you mean by "this mechanism?"  Are you suggesting that 
>> each hypervisor put "CrossHVPara\0" somewhere in the 0x40000000 - 0x4000ffff 
>> CPUID range, and an OS has to do a full scan of this CPUID range on boot to 
>> find it?  That seems pretty inefficient.  An OS will take 1000's of 
>> hypervisor intercepts on every boot just to search this CPUID range.
>>
>> I suggest we come to consensus on a specific CPUID leaf where an OS needs to 
>> look to determine if a hypervisor supports this capability.  We could define 
>> a new CPUID leaf range at a well-defined location, or we could just use one 
>> of the existing CPUID leaf ranges implemented by an existing hypervisor.  
>> I'm not familiar with the KVM CPUID leaf range, but in the case of Hyper-V, 
>> the Hyper-V CPUID leaf range was architected to allow for other hypervisors 
>> to implement it and just show through specific capabilities supported by the 
>> hypervisor.  So, we could define a bit in the Hyper-V CPUID leaf range 
>> (since Xen and KVM also implement this range), but that would require Linux 
>> to look in that range on boot to discover this capability.
>>
>
> Yes, I would agree that if anything we should define a new range unique
to this cross-VM interface, e.g. 0x48000000.

So, as a concrete straw-man:

CPUID leaf 0x48000000 would return a maximum leaf number in EAX (e.g.
0x48000001) along with a signature value (e.g. "CrossHVPara\0") in
EBX, ECX, and EDX.

CPUID 0x48000001.EAX would contain an MSR number to read to get a
random number if supported and zero if not supported.
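
In guest code, the whole dance would look something like this (an
illustrative sketch only -- the 0x48000000/0x48000001 leaves and the
"CrossHVPara\0" signature are just the straw-man values above, and
rdmsr is of course privileged):

#include <stdint.h>
#include <string.h>

#define CROSSHV_BASE 0x48000000u

static void cpuid(uint32_t leaf, uint32_t *a, uint32_t *b,
                  uint32_t *c, uint32_t *d)
{
        asm volatile("cpuid" : "=a"(*a), "=b"(*b), "=c"(*c), "=d"(*d)
                     : "a"(leaf), "c"(0));
}

static uint64_t rdmsr(uint32_t msr)
{
        uint32_t lo, hi;

        asm volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
        return ((uint64_t)hi << 32) | lo;
}

/* Return a 64-bit seed from the host, or 0 if the feature is absent. */
static uint64_t crosshv_get_seed(void)
{
        uint32_t max_leaf, sig[3], msr, unused[3];

        cpuid(CROSSHV_BASE, &max_leaf, &sig[0], &sig[1], &sig[2]);
        if (memcmp(sig, "CrossHVPara\0", 12) || max_leaf < CROSSHV_BASE + 1)
                return 0;
        cpuid(CROSSHV_BASE + 1, &msr, &unused[0], &unused[1], &unused[2]);
        if (!msr)
                return 0;
        return rdmsr(msr);
}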

Questions:

1. Can we use a fixed MSR number?  This would be a little bit simpler,
but it would depend on getting a wider MSR range from Intel.

2. Who would host and maintain such a spec?  I could do it on github,
but this seems a bit silly.  Other options would include Intel,
Microsoft, or perhaps the Linux Foundation.  I don't know whether
Intel or LF would want to do this, and MS isn't exactly
vendor-neutral.  (Even L-F isn't entirely neutral, since they sort of
represent two hypervisors.)  Or we could do something temporary and
then try to work with a group like OASIS, but that might end up being
a lot of work.

--Andy


Re: [PATCH v3] ARM: KVM: add irqfd support

2014-09-18 Thread Christoffer Dall
On Thu, Sep 11, 2014 at 07:03:32PM +0200, Christoffer Dall wrote:
> On Thu, Sep 11, 2014 at 10:14:13AM +0200, Eric Auger wrote:
> > On 09/11/2014 05:09 AM, Christoffer Dall wrote:
> > > On Mon, Sep 01, 2014 at 10:53:04AM +0200, Eric Auger wrote:
> > >> This patch enables irqfd on ARM.
> > >>
> > >> irqfd framework enables to inject a virtual IRQ into a guest upon an
> > >> eventfd trigger. User-side uses KVM_IRQFD VM ioctl to provide KVM with
> > >> a kvm_irqfd struct that associates a VM, an eventfd, a virtual IRQ number
> > >> (aka. the gsi). When an actor signals the eventfd (typically a VFIO
> > >> platform driver), the kvm irqfd subsystem injects the provided virtual
> > >> IRQ into the guest.
> > >>
> > >> Resamplefd also is supported for level sensitive interrupts, ie. the
> > >> user can provide another eventfd that is triggered when the completion
> > >> of the virtual IRQ (gsi) is detected by the GIC.
> > >>
> > >> The gsi must correspond to a shared peripheral interrupt (SPI), ie the
> > >> GIC interrupt ID is gsi+32.
> > >>
> > >> this patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.
> > >> CONFIG_HAVE_KVM_IRQCHIP is removed. No IRQ routing table is used
> > >> (irqchip.c and irqcomm.c are not used).
> > >>
> > >> Both KVM_CAP_IRQFD & KVM_CAP_IRQFD_RESAMPLE capabilities are exposed
> > >>
> > >> Signed-off-by: Eric Auger 
> > >>
> > >> ---
> > >>
> > >> This patch serie deprecates the previous serie featuring GSI routing
> > >> (https://patches.linaro.org/32261/)
> > >>
> > >> The patch serie has the following dependencies:
> > >> - arm/arm64: KVM: Various VGIC cleanups and improvements
> > >>   https://lists.cs.columbia.edu/pipermail/kvmarm/2014-June/009979.html
> > >> - "KVM: EVENTFD: remove inclusion of irq.h"
> > >>
> > >> All pieces can be found on 
> > >> git://git.linaro.org/people/eric.auger/linux.git
> > >> branch irqfd_norouting_integ_v3
> > >>
> > >> This work was tested with Calxeda Midway xgmac main interrupt with
> > >> qemu-system-arm and QEMU VFIO platform device.
> > >>
> > >> v2 -> v3:
> > >> - removal of irq.h from eventfd.c put in a separate patch to increase
> > >>   visibility
> > >> - properly expose KVM_CAP_IRQFD capability in arm.c
> > >> - remove CONFIG_HAVE_KVM_IRQCHIP meaningfull only if irq_comm.c is used
> > >>
> > >> v1 -> v2:
> > >> - rebase on 3.17rc1
> > >> - move of the dist unlock in process_maintenance
> > >> - remove of dist lock in __kvm_vgic_sync_hwstate
> > >> - rewording of the commit message (add resamplefd reference)
> > >> - remove irq.h
> > >> ---
> > >>  Documentation/virtual/kvm/api.txt |  5 +++-
> > >>  arch/arm/include/uapi/asm/kvm.h   |  3 +++
> > >>  arch/arm/kvm/Kconfig  |  4 +--
> > >>  arch/arm/kvm/Makefile |  2 +-
> > >>  arch/arm/kvm/arm.c|  3 +++
> > >>  virt/kvm/arm/vgic.c   | 56 
> > >> ---
> > >>  6 files changed, 65 insertions(+), 8 deletions(-)
> > >>
>> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > >> index beae3fd..8118b12 100644
> > >> --- a/Documentation/virtual/kvm/api.txt
> > >> +++ b/Documentation/virtual/kvm/api.txt
> > >> @@ -2204,7 +2204,7 @@ into the hash PTE second double word).
> > >>  4.75 KVM_IRQFD
> > >>  
> > >>  Capability: KVM_CAP_IRQFD
> > >> -Architectures: x86 s390
> > >> +Architectures: x86 s390 arm
> > >>  Type: vm ioctl
> > >>  Parameters: struct kvm_irqfd (in)
> > >>  Returns: 0 on success, -1 on error
> > >> @@ -2230,6 +2230,9 @@ Note that closing the resamplefd is not sufficient 
> > >> to disable the
> > >>  irqfd.  The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
> > >>  and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
> > >>  
>> >> +On ARM/arm64 the injected interrupt must be a shared peripheral interrupt (SPI).
> > >> +This means the programmed GIC interrupt ID is gsi+32.
> > >> +
> > > 
> > > See above comment.
> > Hi Christoffer,
> > 
> > sorry which comment do you refer to?
> 
> good question, I thought I had a comment above, just disregard.
> 
> > wrt your last comment do you
> > consider PPI injection support is a mandated feature for this patch to
> > be upstreamable?
> 
> well, right now, the only reason it's not supported is "we didn't bother
> thinking about it or doing it" and I haven't heard a valid reason for
> why we should be designing a new user space API etc. without supporting
> PPIs.
> 
> So yes, either argue why it's better to not include PPI support in the
> first round, why we never need to, or just support it ;)
> 
So we had a talk at Linaro Connect between Eric, Marc, and myself, and
basically the reason not to support this is that any device using a PPI
will be a private-to-the-CPU device (think about the timer), so its
state would have to be context-switched along with the VCPU and require
in-kernel wiring anyhow, and is therefore simply not a relevant use case
for irqfds.

Therefore, you can ignore my comments 

RE: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread David Hepkin
The chief advantage I see to using a hypercall-based mechanism is that it would 
work across more architectures.  MSRs and CPUIDs are specific to x86.  If we 
ever wanted this same mechanism to be available on an architecture that doesn't 
support MSRs, a hypercall-based approach would allow for a more consistent 
mechanism across the architectures.

I agree, though, that converging on a common hypercall interface that would be 
implemented by all of the hypervisors would likely be much harder to achieve.

Thanks...

David

-Original Message-
From: Andy Lutomirski [mailto:l...@amacapital.net] 
Sent: Thursday, September 18, 2014 12:04 PM
To: Niels Ferguson
Cc: H. Peter Anvin; Nakajima, Jun; KY Srinivasan; Paolo Bonzini; Mathew John; 
Theodore Ts'o; John Starks; kvm list; Gleb Natapov; David Hepkin; Jake Oshins; 
Linux Virtualization
Subject: Re: Standardizing an MSR or other hypercall to get an RNG seed?

On Thu, Sep 18, 2014 at 11:54 AM, Niels Ferguson  wrote:
> Defining a standard way of transferring random numbers between the host and 
> the guest is an excellent idea.
>
> As the person who writes the RNG code in Windows, I have a few comments:
>
> DETECTION:
> It should be possible to detect this feature through CPUID or similar 
> mechanism. That allows the code that uses this feature to be written without 
> needing the ability to catch CPU exceptions. I could be wrong, but as far as 
> I know there is no support for exception handling in the Windows OS loader 
> where we gather our initial random state.
>

Linux is like this, too, except that I have experimental code to create an IDT 
in that code, so we can handle it.  I agree, though, that using CPUID in early 
boot is easier.

> EFFICIENCY:
> Is there a way we can transfer more bytes per interaction? With a single 
> 64-bit MSR we always need multiple reads to get a seed, and each of them 
> results in a context switch to the host, which is expensive. This is even 
> worse for 32-bit guests. Windows would typically need to fetch 64 bytes of 
> random data at boot and at regular intervals. It is not a show-stopper, but 
> better efficiency would be nice.

I thought about this for a while and didn't come up with anything that wouldn't 
be messy.  We could fudge the MSR rax/rdx high bits to get 128 bits, but that's 
nonportable and awful to implement.  We could return a random number directly 
from CPUID, but that's weird.

In very informal benchmarking, rdmsr wasn't that bad.  On the other hand, I 
wasn't immediately planning on using the msr on an ongoing basis on Linux 
guests except after suspend/resume.
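
(For scale: gathering the 64 bytes Niels mentions from a single 64-bit
MSR is eight reads, i.e. eight exits.  A sketch, assuming the MSR
returns fresh entropy on every read -- rdmsr() here is just the usual
edx:eax wrapper:

#include <stdint.h>
#include <string.h>

static uint64_t rdmsr(uint32_t msr)
{
        uint32_t lo, hi;

        asm volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
        return ((uint64_t)hi << 32) | lo;
}

/* One rdmsr -- one host exit -- per 8 bytes of seed. */
static void rng_msr_fill(uint32_t msr, uint8_t *buf, unsigned int len)
{
        while (len >= 8) {
                uint64_t v = rdmsr(msr);

                memcpy(buf, &v, 8);
                buf += 8;
                len -= 8;
        }
}

So a 64-byte boot-time seed costs 8 exits on a 64-bit guest.)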

>
> GUEST-TO-HOST:
> Can we also define a way to have random values flow from the guest to 
> the host? Guests are also gathering entropy from their own sources, 
> and if we allow the guests to send random data to the host, then the 
> host can treat it as an entropy source and all the VMs on a single 
> host can share their entropy. (This is not a security problem; any 
> reasonable host RNG cannot be hurt even by maliciously chosen entropy 
> inputs.)
>

wrmsr on the same MSR?
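
(Sketched, the guest-to-host direction would just be the mirror of the
read path -- purely hypothetical, since no spec defines write semantics
for this MSR today, and rng_msr is whatever index the detection step
returned:

#include <stdint.h>

static void wrmsr(uint32_t msr, uint64_t val)
{
        asm volatile("wrmsr" :: "c"(msr),
                     "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
}

/* Contribute one 64-bit word of guest entropy back to the host pool. */
static void rng_msr_contribute(uint32_t rng_msr, uint64_t entropy_word)
{
        wrmsr(rng_msr, entropy_word);
}

As Niels notes below, a sane host RNG can mix in such writes without
trusting them.)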

>
> I don't know much about how hypervisors work on the inside, but maybe we can 
> define a mechanism for standardized hypervisor calls that work on all 
> hypervisors that support this feature. Then we could define a function to do 
> an entropy exchange: the guest provides N bytes of random data to the host, 
> and the host replies with N bytes of random data. The data exchange can now 
> be done through memory.
>
> A standardized hypervisor-call mechanism also seems generally useful for 
> future features, whereas the MSR solution is very limited in what it can do. 
> We might end up with standardized hypervisor-calls in the future for some 
> other reason, and then the MSR solution looks very odd.

I think there'll be resistance to a standardized hypercall mechanism, just 
because the implementations tend to be complex.  Hyper-V uses a special page in 
guest physical memory that contains a trampoline.

We could use wrmsr to a register where the payload is a pointer to a buffer to 
receive random bytes, but that loses some of the simplicity of just calling 
rdmsr a few times.

--Andy


[PATCH 1/1] KVM: correct null pid check in kvm_vcpu_yield_to()

2014-09-18 Thread Sam Bobroff
Correct a simple mistake of checking the wrong variable
before a dereference, resulting in the dereference not being
properly protected by rcu_dereference().

Signed-off-by: Sam Bobroff 
---

 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..688acb0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1725,7 +1725,7 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target)
rcu_read_lock();
pid = rcu_dereference(target->pid);
if (pid)
-   task = get_pid_task(target->pid, PIDTYPE_PID);
+   task = get_pid_task(pid, PIDTYPE_PID);
rcu_read_unlock();
if (!task)
return ret;
-- 
2.1.0



[PATCH v4 0/4] kvmtool: ARM/ARM64: Misc updates

2014-09-18 Thread Anup Patel
This patchset updates KVMTOOL to use some of the features
supported by Linux-3.16 KVM ARM/ARM64, such as:

1. Target CPU == Host using KVM_ARM_PREFERRED_TARGET vm ioctl
2. Target CPU type Potenza for using KVMTOOL on X-Gene
3. PSCI v0.2 support for Aarch32 and Aarch64 guest
4. System event exit reason

Changes since v3:
- Add generic targets for aarch32 and aarch64 which are used
  by KVMTOOL when the target type returned by the
  KVM_ARM_PREFERRED_TARGET vm ioctl is not known to KVMTOOL
- Print more info when handling system reset event

Changes since v2:
- Use target type returned by KVM_ARM_PREFERRED_TARGET vm ioctl
  for VCPU init such that we don't need to update KVMTOOL for
  every new host hardware
- Simplify DTB generation for PSCI node

Changes since v1:
- Drop the patch to fix compile error for aarch64
- Fall back to the old method of trying all target types if
  the KVM_ARM_PREFERRED_TARGET vm ioctl fails
- Print more info when handling KVM_EXIT_SYSTEM_EVENT

Anup Patel (4):
  kvmtool: ARM: Use KVM_ARM_PREFERRED_TARGET vm ioctl to determine
target cpu
  kvmtool: ARM64: Add target type potenza for aarch64
  kvmtool: Handle exit reason KVM_EXIT_SYSTEM_EVENT
  kvmtool: ARM/ARM64: Provide PSCI-0.2 to guest when KVM supports it

 tools/kvm/arm/aarch32/arm-cpu.c |9 ++-
 tools/kvm/arm/aarch64/arm-cpu.c |   19 ++---
 tools/kvm/arm/fdt.c |   52 +++
 tools/kvm/arm/kvm-cpu.c |   57 +++
 tools/kvm/kvm-cpu.c |   21 +++
 5 files changed, 138 insertions(+), 20 deletions(-)

-- 
1.7.9.5



[PATCH v4 1/4] kvmtool: ARM: Use KVM_ARM_PREFERRED_TARGET vm ioctl to determine target cpu

2014-09-18 Thread Anup Patel
Instead of trying out each and every target type, we should
use the KVM_ARM_PREFERRED_TARGET vm ioctl to determine the
target type for KVM ARM/ARM64.

If the KVM_ARM_PREFERRED_TARGET vm ioctl fails, then we fall back
to the old method of trying all known target types.

If the KVM_ARM_PREFERRED_TARGET vm ioctl succeeds but the returned
target type is not known to KVMTOOL, then we forcefully init the
VCPU with that target type.

Signed-off-by: Pranavkumar Sawargaonkar 
Signed-off-by: Anup Patel 
---
 tools/kvm/arm/aarch32/arm-cpu.c |9 ++-
 tools/kvm/arm/aarch64/arm-cpu.c |   10 ++--
 tools/kvm/arm/kvm-cpu.c |   52 ++-
 3 files changed, 57 insertions(+), 14 deletions(-)

diff --git a/tools/kvm/arm/aarch32/arm-cpu.c b/tools/kvm/arm/aarch32/arm-cpu.c
index 71b98fe..0d2ff11 100644
--- a/tools/kvm/arm/aarch32/arm-cpu.c
+++ b/tools/kvm/arm/aarch32/arm-cpu.c
@@ -22,6 +22,12 @@ static int arm_cpu__vcpu_init(struct kvm_cpu *vcpu)
return 0;
 }
 
+static struct kvm_arm_target target_generic_v7 = {
+   .id = UINT_MAX,
+   .compatible = "arm,arm-v7",
+   .init   = arm_cpu__vcpu_init,
+};
+
 static struct kvm_arm_target target_cortex_a15 = {
.id = KVM_ARM_TARGET_CORTEX_A15,
.compatible = "arm,cortex-a15",
@@ -36,7 +42,8 @@ static struct kvm_arm_target target_cortex_a7 = {
 
 static int arm_cpu__core_init(struct kvm *kvm)
 {
-   return (kvm_cpu__register_kvm_arm_target(&target_cortex_a15) ||
+   return (kvm_cpu__register_kvm_arm_target(&target_generic_v7) ||
+   kvm_cpu__register_kvm_arm_target(&target_cortex_a15) ||
kvm_cpu__register_kvm_arm_target(&target_cortex_a7));
 }
 core_init(arm_cpu__core_init);
diff --git a/tools/kvm/arm/aarch64/arm-cpu.c b/tools/kvm/arm/aarch64/arm-cpu.c
index ce5ea2f..9ee3da3 100644
--- a/tools/kvm/arm/aarch64/arm-cpu.c
+++ b/tools/kvm/arm/aarch64/arm-cpu.c
@@ -16,13 +16,18 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle)
timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
 }
 
-
 static int arm_cpu__vcpu_init(struct kvm_cpu *vcpu)
 {
vcpu->generate_fdt_nodes = generate_fdt_nodes;
return 0;
 }
 
+static struct kvm_arm_target target_generic_v8 = {
+   .id = UINT_MAX,
+   .compatible = "arm,arm-v8",
+   .init   = arm_cpu__vcpu_init,
+};
+
 static struct kvm_arm_target target_aem_v8 = {
.id = KVM_ARM_TARGET_AEM_V8,
.compatible = "arm,arm-v8",
@@ -43,7 +48,8 @@ static struct kvm_arm_target target_cortex_a57 = {
 
 static int arm_cpu__core_init(struct kvm *kvm)
 {
-   return (kvm_cpu__register_kvm_arm_target(&target_aem_v8) ||
+   return (kvm_cpu__register_kvm_arm_target(&target_generic_v8) ||
+   kvm_cpu__register_kvm_arm_target(&target_aem_v8) ||
kvm_cpu__register_kvm_arm_target(&target_foundation_v8) ||
kvm_cpu__register_kvm_arm_target(&target_cortex_a57));
 }
diff --git a/tools/kvm/arm/kvm-cpu.c b/tools/kvm/arm/kvm-cpu.c
index aeaa4cf..6de5344 100644
--- a/tools/kvm/arm/kvm-cpu.c
+++ b/tools/kvm/arm/kvm-cpu.c
@@ -13,7 +13,7 @@ int kvm_cpu__get_debug_fd(void)
return debug_fd;
 }
 
-static struct kvm_arm_target *kvm_arm_targets[KVM_ARM_NUM_TARGETS];
+static struct kvm_arm_target *kvm_arm_targets[KVM_ARM_NUM_TARGETS+1];
 int kvm_cpu__register_kvm_arm_target(struct kvm_arm_target *target)
 {
unsigned int i = 0;
@@ -33,7 +33,8 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
struct kvm_arm_target *target;
struct kvm_cpu *vcpu;
int coalesced_offset, mmap_size, err = -1;
-   unsigned int i;
+   unsigned int i, target_type;
+   struct kvm_vcpu_init preferred_init;
struct kvm_vcpu_init vcpu_init = {
.features = ARM_VCPU_FEATURE_FLAGS(kvm, cpu_id)
};
@@ -55,19 +56,47 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
if (vcpu->kvm_run == MAP_FAILED)
die("unable to mmap vcpu fd");
 
-   /* Find an appropriate target CPU type. */
-   for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
-   if (!kvm_arm_targets[i])
-   continue;
-   target = kvm_arm_targets[i];
-   vcpu_init.target = target->id;
+   /*
+* If preferred target ioctl successful then use preferred target
+* else try each and every target type.
+*/
+   err = ioctl(kvm->vm_fd, KVM_ARM_PREFERRED_TARGET, &preferred_init);
+   if (!err) {
+   /* Match preferred target CPU type. */
+   target = NULL;
+   for (i = 0; i < ARRAY_SIZE(kvm_arm_targets); ++i) {
+   if (!kvm_arm_targets[i])
+   continue;
+   if (kvm_arm_targets[i]->id == preferr

[PATCH v4 4/4] kvmtool: ARM/ARM64: Provide PSCI-0.2 to guest when KVM supports it

2014-09-18 Thread Anup Patel
If in-kernel KVM supports PSCI-0.2 emulation then we should set
the KVM_ARM_VCPU_PSCI_0_2 feature for each guest VCPU and also
provide "arm,psci-0.2","arm,psci" as the PSCI compatible string.

This patch updates kvm_cpu__arch_init() and setup_fdt() as
per above.

Signed-off-by: Pranavkumar Sawargaonkar 
Signed-off-by: Anup Patel 
---
 tools/kvm/arm/fdt.c |   52 ++-
 tools/kvm/arm/kvm-cpu.c |5 +
 2 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/tools/kvm/arm/fdt.c b/tools/kvm/arm/fdt.c
index 186a718..a15450e 100644
--- a/tools/kvm/arm/fdt.c
+++ b/tools/kvm/arm/fdt.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static char kern_cmdline[COMMAND_LINE_SIZE];
 
@@ -84,6 +85,34 @@ static void generate_irq_prop(void *fdt, u8 irq)
_FDT(fdt_property(fdt, "interrupts", irq_prop, sizeof(irq_prop)));
 }
 
+struct psci_fns {
+   u32 cpu_suspend;
+   u32 cpu_off;
+   u32 cpu_on;
+   u32 migrate;
+};
+
+static struct psci_fns psci_0_1_fns = {
+   .cpu_suspend = KVM_PSCI_FN_CPU_SUSPEND,
+   .cpu_off = KVM_PSCI_FN_CPU_OFF,
+   .cpu_on = KVM_PSCI_FN_CPU_ON,
+   .migrate = KVM_PSCI_FN_MIGRATE,
+};
+
+static struct psci_fns psci_0_2_aarch32_fns = {
+   .cpu_suspend = PSCI_0_2_FN_CPU_SUSPEND,
+   .cpu_off = PSCI_0_2_FN_CPU_OFF,
+   .cpu_on = PSCI_0_2_FN_CPU_ON,
+   .migrate = PSCI_0_2_FN_MIGRATE,
+};
+
+static struct psci_fns psci_0_2_aarch64_fns = {
+   .cpu_suspend = PSCI_0_2_FN64_CPU_SUSPEND,
+   .cpu_off = PSCI_0_2_FN_CPU_OFF,
+   .cpu_on = PSCI_0_2_FN64_CPU_ON,
+   .migrate = PSCI_0_2_FN64_MIGRATE,
+};
+
 static int setup_fdt(struct kvm *kvm)
 {
struct device_header *dev_hdr;
@@ -93,6 +122,7 @@ static int setup_fdt(struct kvm *kvm)
cpu_to_fdt64(kvm->arch.memory_guest_start),
cpu_to_fdt64(kvm->ram_size),
};
+   struct psci_fns *fns;
void *fdt   = staging_fdt;
void *fdt_dest  = guest_flat_to_host(kvm,
 kvm->arch.dtb_guest_start);
@@ -162,12 +192,24 @@ static int setup_fdt(struct kvm *kvm)
 
/* PSCI firmware */
_FDT(fdt_begin_node(fdt, "psci"));
-   _FDT(fdt_property_string(fdt, "compatible", "arm,psci"));
+   if (kvm__supports_extension(kvm, KVM_CAP_ARM_PSCI_0_2)) {
+   const char compatible[] = "arm,psci-0.2\0arm,psci";
+   _FDT(fdt_property(fdt, "compatible",
+ compatible, sizeof(compatible)));
+   if (kvm->cfg.arch.aarch32_guest) {
+   fns = &psci_0_2_aarch32_fns;
+   } else {
+   fns = &psci_0_2_aarch64_fns;
+   }
+   } else {
+   _FDT(fdt_property_string(fdt, "compatible", "arm,psci"));
+   fns = &psci_0_1_fns;
+   }
_FDT(fdt_property_string(fdt, "method", "hvc"));
-   _FDT(fdt_property_cell(fdt, "cpu_suspend", KVM_PSCI_FN_CPU_SUSPEND));
-   _FDT(fdt_property_cell(fdt, "cpu_off", KVM_PSCI_FN_CPU_OFF));
-   _FDT(fdt_property_cell(fdt, "cpu_on", KVM_PSCI_FN_CPU_ON));
-   _FDT(fdt_property_cell(fdt, "migrate", KVM_PSCI_FN_MIGRATE));
+   _FDT(fdt_property_cell(fdt, "cpu_suspend", fns->cpu_suspend));
+   _FDT(fdt_property_cell(fdt, "cpu_off", fns->cpu_off));
+   _FDT(fdt_property_cell(fdt, "cpu_on", fns->cpu_on));
+   _FDT(fdt_property_cell(fdt, "migrate", fns->migrate));
_FDT(fdt_end_node(fdt));
 
/* Finalise. */
diff --git a/tools/kvm/arm/kvm-cpu.c b/tools/kvm/arm/kvm-cpu.c
index 6de5344..219de16 100644
--- a/tools/kvm/arm/kvm-cpu.c
+++ b/tools/kvm/arm/kvm-cpu.c
@@ -56,6 +56,11 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
if (vcpu->kvm_run == MAP_FAILED)
die("unable to mmap vcpu fd");
 
+   /* Set KVM_ARM_VCPU_PSCI_0_2 if available */
+   if (kvm__supports_extension(kvm, KVM_CAP_ARM_PSCI_0_2)) {
+   vcpu_init.features[0] |= (1UL << KVM_ARM_VCPU_PSCI_0_2);
+   }
+
/*
 * If preferred target ioctl successful then use preferred target
 * else try each and every target type.
-- 
1.7.9.5



[PATCH v4 2/4] kvmtool: ARM64: Add target type potenza for aarch64

2014-09-18 Thread Anup Patel
The VCPU target type KVM_ARM_TARGET_XGENE_POTENZA is available
in Linux-3.16-rcX or higher, hence register an aarch64 target
type for it.

This patch enables us to run KVMTOOL on X-Gene Potenza host.

Signed-off-by: Pranavkumar Sawargaonkar 
Signed-off-by: Anup Patel 
---
 tools/kvm/arm/aarch64/arm-cpu.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/kvm/arm/aarch64/arm-cpu.c b/tools/kvm/arm/aarch64/arm-cpu.c
index 9ee3da3..51a1e2f 100644
--- a/tools/kvm/arm/aarch64/arm-cpu.c
+++ b/tools/kvm/arm/aarch64/arm-cpu.c
@@ -46,11 +46,18 @@ static struct kvm_arm_target target_cortex_a57 = {
.init   = arm_cpu__vcpu_init,
 };
 
+static struct kvm_arm_target target_potenza = {
+   .id = KVM_ARM_TARGET_XGENE_POTENZA,
+   .compatible = "arm,arm-v8",
+   .init   = arm_cpu__vcpu_init,
+};
+
 static int arm_cpu__core_init(struct kvm *kvm)
 {
return (kvm_cpu__register_kvm_arm_target(&target_generic_v8) ||
kvm_cpu__register_kvm_arm_target(&target_aem_v8) ||
kvm_cpu__register_kvm_arm_target(&target_foundation_v8) ||
-   kvm_cpu__register_kvm_arm_target(&target_cortex_a57));
+   kvm_cpu__register_kvm_arm_target(&target_cortex_a57) ||
+   kvm_cpu__register_kvm_arm_target(&target_potenza));
 }
 core_init(arm_cpu__core_init);
-- 
1.7.9.5



[PATCH v4 3/4] kvmtool: Handle exit reason KVM_EXIT_SYSTEM_EVENT

2014-09-18 Thread Anup Patel
The KVM_EXIT_SYSTEM_EVENT exit reason was added to define
architecture-independent system-wide events for a guest.

Currently, it is used by the in-kernel PSCI-0.2 emulation of
KVM ARM/ARM64 to inform user space about a PSCI SYSTEM_OFF
or PSCI SYSTEM_RESET request.

For now, we simply treat all system-wide guest events as
shutdown requests in KVMTOOL.

Signed-off-by: Pranavkumar Sawargaonkar 
Signed-off-by: Anup Patel 
---
 tools/kvm/kvm-cpu.c |   21 +
 1 file changed, 21 insertions(+)

diff --git a/tools/kvm/kvm-cpu.c b/tools/kvm/kvm-cpu.c
index ee0a8ec..5180039 100644
--- a/tools/kvm/kvm-cpu.c
+++ b/tools/kvm/kvm-cpu.c
@@ -160,6 +160,27 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
goto exit_kvm;
case KVM_EXIT_SHUTDOWN:
goto exit_kvm;
+   case KVM_EXIT_SYSTEM_EVENT:
+   /*
+* Print the type of system event and
+* treat all system events as shutdown request.
+*/
+   switch (cpu->kvm_run->system_event.type) {
+   case KVM_SYSTEM_EVENT_SHUTDOWN:
+   printf("  # Info: shutdown system event\n");
+   goto exit_kvm;
+   case KVM_SYSTEM_EVENT_RESET:
+   printf("  # Info: reset system event\n");
+   printf("  # Info: KVMTOOL does not support VM 
reset\n");
+   printf("  # Info: please re-launch the VM 
manually\n");
+   goto exit_kvm;
+   default:
+   printf("  # Warning: unknown system event 
type=%d\n",
+  cpu->kvm_run->system_event.type);
+   printf("  # Info: exiting KVMTOOL\n");
+   goto exit_kvm;
+   };
+   break;
default: {
bool ret;
 
-- 
1.7.9.5



Re: [PATCH v2] kvm: Faults which trigger IO release the mmap_sem

2014-09-18 Thread Wanpeng Li
On Thu, Sep 18, 2014 at 09:13:26AM +0300, Gleb Natapov wrote:
>On Thu, Sep 18, 2014 at 08:29:17AM +0800, Wanpeng Li wrote:
>> Hi Andres,
>> On Wed, Sep 17, 2014 at 10:51:48AM -0700, Andres Lagar-Cavilla wrote:
>> [...]
>> > static inline int check_user_page_hwpoison(unsigned long addr)
>> > {
>> >int rc, flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_WRITE;
>> >@@ -1177,9 +1214,15 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
>> >npages = get_user_page_nowait(current, current->mm,
>> >  addr, write_fault, page);
>> >up_read(&current->mm->mmap_sem);
>> >-   } else
>> >-   npages = get_user_pages_fast(addr, 1, write_fault,
>> >-page);
>> >+   } else {
>> >+   /*
>> >+* By now we have tried gup_fast, and possibly async_pf, and we
>> >+* are certainly not atomic. Time to retry the gup, allowing
>> >+* mmap semaphore to be relinquished in the case of IO.
>> >+*/
>> >+   npages = kvm_get_user_page_io(current, current->mm, addr,
>> >+ write_fault, page);
>> >+   }
>> 
>> try_async_pf 
>>  gfn_to_pfn_async 
>>   __gfn_to_pfn   async = false 
>*async = false
>
>>__gfn_to_pfn_memslot
>> hva_to_pfn 
>>   hva_to_pfn_fast 
>>   hva_to_pfn_slow 
>hva_to_pfn_slow checks async not *async.

Got it, thanks for your pointing out.

Reviewed-by: Wanpeng Li 

Regards,
Wanpeng Li 

>
>>kvm_get_user_page_io
>> 
>> page will always be ready after kvm_get_user_page_io, which means APF
>> no longer needs to do any work.
>> 
>> Regards,
>> Wanpeng Li
>> 
>> >if (npages != 1)
>> >return npages;
>> > 
>> >-- 
>> >2.1.0.rc2.206.gedb03e5
>> >
>
>--
>   Gleb.


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Nakajima, Jun
On Thu, Sep 18, 2014 at 3:07 PM, Andy Lutomirski  wrote:

> So, as a concrete straw-man:
>
> CPUID leaf 0x48000000 would return a maximum leaf number in EAX (e.g.
> 0x48000001) along with a signature value (e.g. "CrossHVPara\0") in
> EBX, ECX, and EDX.
>
> CPUID 0x48000001.EAX would contain an MSR number to read to get a
> random number if supported and zero if not supported.
>
> Questions:
>
> 1. Can we use a fixed MSR number?  This would be a little bit simpler,
> but it would depend on getting a wider MSR range from Intel.
>

Why do you need a wider MSR range if you always detect the feature by
CPUID.0x48000001?
Or are you still trying to avoid the detection by CPUID?

-- 
Jun
Intel Open Source Technology Center


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 5:49 PM, Nakajima, Jun  wrote:
> On Thu, Sep 18, 2014 at 3:07 PM, Andy Lutomirski  wrote:
>
>> So, as a concrete straw-man:
>>
>> CPUID leaf 0x48000000 would return a maximum leaf number in EAX (e.g.
>> 0x48000001) along with a signature value (e.g. "CrossHVPara\0") in
>> EBX, ECX, and EDX.
>>
>> CPUID 0x48000001.EAX would contain an MSR number to read to get a
>> random number if supported and zero if not supported.
>>
>> Questions:
>>
>> 1. Can we use a fixed MSR number?  This would be a little bit simpler,
>> but it would depend on getting a wider MSR range from Intel.
>>
>
> Why do you need a wider MSR range if you always detect the feature by
> CPUID.0x48000001?
> Or are you still trying to avoid the detection by CPUID?

Detecting the feature is one thing, but figuring out the MSR index is
another.  We could shove the index into the cpuid leaf, but that seems
unnecessarily indirect.  I'd much rather just say that CPUID leaves
*and* MSR indexes 0x48000000-0x4800ffff or so are reserved for the
cross-HV mechanism, but we can't do that without either knowingly
violating the SDM assignments or asking Intel to consider allocating
more MSR indexes.

Also, KVM is already conflicting with the SDM right now in its MSR
choice :(  I *think* that KVM could be changed to fix that, but 256
MSRs is rather confining given that KVM currently implements its own
MSR index *and* part of the Hyper-V index.

--Andy


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Andy Lutomirski
On Thu, Sep 18, 2014 at 6:03 PM, Andy Lutomirski  wrote:
> On Thu, Sep 18, 2014 at 5:49 PM, Nakajima, Jun  wrote:
>> On Thu, Sep 18, 2014 at 3:07 PM, Andy Lutomirski  wrote:
>>
>>> So, as a concrete straw-man:
>>>
>>> CPUID leaf 0x48000000 would return a maximum leaf number in EAX (e.g.
>>> 0x48000001) along with a signature value (e.g. "CrossHVPara\0") in
>>> EBX, ECX, and EDX.
>>>
>>> CPUID 0x48000001.EAX would contain an MSR number to read to get a
>>> random number if supported and zero if not supported.
>>>
>>> Questions:
>>>
>>> 1. Can we use a fixed MSR number?  This would be a little bit simpler,
>>> but it would depend on getting a wider MSR range from Intel.
>>>
>>
>> Why do you need a wider MSR range if you always detect the feature by
>> CPUID.0x48000001?
>> Or are you still trying to avoid the detection by CPUID?
>
> Detecting the feature is one thing, but figuring out the MSR index is
> another.  We could shove the index into the cpuid leaf, but that seems
> unnecessarily indirect.  I'd much rather just say that CPUID leaves
> *and* MSR indexes 0x48000000-0x4800ffff or so are reserved for the
> cross-HV mechanism, but we can't do that without either knowingly
> violating the SDM assignments or asking Intel to consider allocating
> more MSR indexes.
>
> Also, KVM is already conflicting with the SDM right now in its MSR
> choice :(  I *think* that KVM could be changed to fix that, but 256
> MSRs is rather confining given that KVM currently implements its own
> MSR index *and* part of the Hyper-V index.

Correction and update:

KVM currently implements its own MSRs and, optionally, some of the
Hyper-V MSRs.  By my count, Linux knows about 68 Hyper-V MSRs (in a
header file), and there are current 7 KVM MSRs, so over 1/4 of the
available MSR indices are taken (and even more would be taken if KVM
were to move its MSRs into the correct range).

--Andy


[Bug 84781] The guest will hang after live migration.

2014-09-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=84781

--- Comment #3 from Zhou, Chao  ---
Tested the bug on linux.git:
Commit: 2ce7598c9a453e0acd0e07be7be3f5eb39608ebd (kernel 3.17.0-rc4)
Commit: d9773ceabfaf3f27b8a36fac035b74ee599df900 (kernel 3.17.0-rc5+)

After live migration, the guest works fine; the bug can't be reproduced.



Re: [PATCH v2] kvm: Faults which trigger IO release the mmap_sem

2014-09-18 Thread Andres Lagar-Cavilla
On Thu, Sep 18, 2014 at 5:32 PM, Wanpeng Li  wrote:
> On Thu, Sep 18, 2014 at 09:13:26AM +0300, Gleb Natapov wrote:
>>On Thu, Sep 18, 2014 at 08:29:17AM +0800, Wanpeng Li wrote:
>>> Hi Andres,
>>> On Wed, Sep 17, 2014 at 10:51:48AM -0700, Andres Lagar-Cavilla wrote:
>>> [...]
>>> > static inline int check_user_page_hwpoison(unsigned long addr)
>>> > {
>>> >int rc, flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_WRITE;
> Got it, thanks for your pointing out.
>
> Reviewed-by: Wanpeng Li 
>
> Regards,
> Wanpeng Li
>
Thanks.

Paolo, should I recut including the recent Reviewed-by's?

Thanks
Andres
ps: shrunk cc

>>
>>>kvm_get_user_page_io
>>>
>>> page will always be ready after kvm_get_user_page_io, which means APF
>>> no longer needs to do any work.
>>>
>>> Regards,
>>> Wanpeng Li
>>>
>>> >if (npages != 1)
>>> >return npages;
>>> >
>>> >--
>>> >2.1.0.rc2.206.gedb03e5
>>> >
>>
>>--
>>   Gleb.


RE: KVM Test report, kernel fd275235... qemu e4d50d47...

2014-09-18 Thread Hu, Robert

> -Original Message-
> From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo
> Bonzini
> Sent: Thursday, September 18, 2014 5:26 PM
> To: Hu, Robert; kvm@vger.kernel.org
> Subject: Re: KVM Test report, kernel fd275235... qemu e4d50d47...
> 
> On 18/09/2014 10:23, Hu, Robert wrote:
> > Hi All,
> >
> > This is KVM upstream test result against kvm.git next branch and qemu.git
> master branch.
> >  kvm.git next branch: fd2752352bbc98850d83b5448a288d8991590317
> based on kernel 3.17.0-rc1
> >  qemu.git master branch:
> e4d50d47a9eb15f42bdd561803a29a4d7c3eb8ec
> >
> > We found one new bug and no fixed bugs in the past three weeks.
> 
> Thanks.
> 
> > New issue (1):
> > 1. The guest will hang after live migration.
> > https://bugzilla.kernel.org/show_bug.cgi?id=84781
> 
> > the first bad commit is:
> > commit cbcf2dd3b3d4d990610259e8d878fc8dc1f17d80
> > Author: Thomas Gleixner 
> > Date:   Wed Jul 16 21:04:54 2014 +
> >
> > x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based
> >
> > Convert the relevant base data right away to nanoseconds instead of
> > doing the conversion on every readout. Reduces text size by 160 bytes.
> >
> > Signed-off-by: Thomas Gleixner 
> > Cc: Gleb Natapov 
> > Cc: kvm@vger.kernel.org
> > Acked-by: Paolo Bonzini 
> > Signed-off-by: John Stultz 
> 
> This should be fixed by 3.17-rc4.  Please test Linus tree.
Tested the bug on linux.git:
Commit: 2ce7598c9a453e0acd0e07be7be3f5eb39608ebd (kernel 3.17.0-rc4)
Commit: d9773ceabfaf3f27b8a36fac035b74ee599df900 (kernel 3.17.0-rc5+)

The bug doesn't exist on either.
> 
> Paolo



Re: [Qemu-devel] [question] virtio-blk performance degradationhappened with virito-serial

2014-09-18 Thread Fam Zheng
On Tue, 09/02 12:06, Amit Shah wrote:
> On (Mon) 01 Sep 2014 [20:52:46], Zhang Haoyu wrote:
> > >>> Hi, all
> > >>> 
> > >>> I start a VM with virtio-serial (default ports number: 31), and found 
> > >>> that virtio-blk performance degradation happened, about 25%, this 
> > >>> problem can be reproduced 100%.
> > >>> without virtio-serial:
> > >>> 4k-read-random 1186 IOPS
> > >>> with virtio-serial:
> > >>> 4k-read-random 871 IOPS
> > >>> 
> > >>> but if use max_ports=2 option to limit the max number of virio-serial 
> > >>> ports, then the IO performance degradation is not so serious, about 5%.
> > >>> 
> > >>> And, ide performance degradation does not happen with virtio-serial.
> > >>
> > >>Pretty sure it's related to MSI vectors in use.  It's possible that
> > >>the virtio-serial device takes up all the avl vectors in the guests,
> > >>leaving old-style irqs for the virtio-blk device.
> > >>
> > >I don't think so,
> > >I use iometer to test 64k-read(or write)-sequence case, if I disable the 
> > >virtio-serial dynamically via device manager->virtio-serial => disable,
> > >then the performance get promotion about 25% immediately, then I re-enable 
> > >the virtio-serial via device manager->virtio-serial => enable,
> > >the performance got back again, very obvious.
> > add comments:
> > Although the virtio-serial is enabled, I don't use it at all, the 
> > degradation still happened.
> 
> Using the vectors= option as mentioned below, you can restrict the
> number of MSI vectors the virtio-serial device gets.  You can then
> confirm whether it's MSI that's related to these issues.

Amit,

It's related to the large number of ioeventfds used by virtio-serial-pci. With
virtio-serial-pci's ioeventfd=off, the performance is not affected, no matter
whether the guest initializes the device or not.

In my test, there are 12 fds to poll in qemu_poll_ns before loading the guest's
virtio_console.ko, and 76 once virtio_console is modprobed.

It looks like ppoll takes more time when it has more fds to poll.

Some trace data with systemtap:

12 fds:

time  rel_time  symbol
15(+1)  qemu_poll_ns  [enter]
18(+3)  qemu_poll_ns  [return]

76 fds:

12(+2)  qemu_poll_ns  [enter]
18(+6)  qemu_poll_ns  [return]

I haven't looked at the virtio-serial code, so I'm not sure whether we should
reduce the number of ioeventfds in virtio-serial-pci or focus on lower-level
efficiency.

I haven't compared with g_poll, but I think the underlying syscall should be the
same.
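
(The fd-count effect is easy to reproduce outside QEMU with a
standalone ppoll microbenchmark -- a quick sketch, with 12 and 76
mirroring the fd counts observed above:

#define _GNU_SOURCE
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <time.h>

/* Average cost of one ppoll() over nfds idle eventfds, in ns. */
static long ppoll_cost(int nfds, int iters)
{
        struct pollfd *fds = calloc(nfds, sizeof(*fds));
        struct timespec zero = {0, 0}, t0, t1;
        int i;

        for (i = 0; i < nfds; i++) {
                fds[i].fd = eventfd(0, 0);
                fds[i].events = POLLIN;
        }
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < iters; i++)
                ppoll(fds, nfds, &zero, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return ((t1.tv_sec - t0.tv_sec) * 1000000000L +
                (t1.tv_nsec - t0.tv_nsec)) / iters;
}

int main(void)
{
        printf("12 fds: %ld ns/ppoll\n", ppoll_cost(12, 100000));
        printf("76 fds: %ld ns/ppoll\n", ppoll_cost(76, 100000));
        return 0;
}

It leaks the eventfds, but that's fine for a one-shot measurement.)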

Any ideas?

Fam


> 
> > >So, I think it has no business with legacy interrupt mode, right?
> > >
> > >I am going to observe the difference of perf top data on qemu and perf kvm 
> > >stat data when disable/enable virtio-serial in guest,
> > >and the difference of perf top data on guest when disable/enable 
> > >virtio-serial in guest,
> > >any ideas?
> > >
> > >Thanks,
> > >Zhang Haoyu
> > >>If you restrict the number of vectors the virtio-serial device gets
> > >>(using the -device virtio-serial-pci,vectors= param), does that make
> > >>things better for you?
> 
> 
> 
>   Amit


Re: Standardizing an MSR or other hypercall to get an RNG seed?

2014-09-18 Thread Paolo Bonzini
On 18/09/2014 23:54, David Hepkin wrote:
> The chief advantage I see to using a hypercall-based mechanism is
> that it would work across more architectures.  MSRs and CPUIDs are
> specific to x86.  If we ever wanted this same mechanism to be
> available on an architecture that doesn't support MSRs, a
> hypercall-based approach would allow for a more consistent mechanism
> across the architectures.
> 
> I agree, though, that converging on a common hypercall interface that
> would be implemented by all of the hypervisors would likely be much
> harder to achieve.

There are differences between architectures at the hypercall level,
starting with the calling convention.  So I don't think it makes much
sense to use a hypercall.

Paolo


Re: [PATCH v2] kvm: Faults which trigger IO release the mmap_sem

2014-09-18 Thread Paolo Bonzini
On 19/09/2014 05:58, Andres Lagar-Cavilla wrote:
> Paolo, should I recut including the recent Reviewed-by's?

No, I'll add them myself.

Paolo


[Bug 84781] The guest will hang after live migration.

2014-09-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=84781

Paolo Bonzini  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |CODE_FIX



Re: [PATCH v3 2/2] KVM: x86: directly use kvm_make_request again

2014-09-18 Thread Xiao Guangrong
On 09/19/2014 12:38 AM, Liang Chen wrote:
> A one-line wrapper around kvm_make_request does not seem
> particularly useful. Replace kvm_mmu_flush_tlb() with
> kvm_make_request() again to free the namespace a bit.
> 
> Signed-off-by: Liang Chen 
> ---
>  arch/x86/include/asm/kvm_host.h |  1 -
>  arch/x86/kvm/mmu.c  | 16 +---
>  arch/x86/kvm/vmx.c  |  2 +-
>  arch/x86/kvm/x86.c  | 11 ---
>  4 files changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 7c492ed..77ade89 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -917,7 +917,6 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu);
> 
>  int fx_init(struct kvm_vcpu *vcpu);
> 
> -void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu);
>  void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
>  const u8 *new, int bytes);
>  int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index b41fd97..acc2d0c5 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1749,7 +1749,7 @@ static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>   return 1;
>   }
> 
> - kvm_mmu_flush_tlb(vcpu);
> + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>   return 0;
>  }
> 
> @@ -1802,7 +1802,7 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
> 
>   kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
>   if (flush)
> - kvm_mmu_flush_tlb(vcpu);
> + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>  }
> 
>  struct mmu_page_path {
> @@ -2536,7 +2536,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> true, host_writable)) {
>   if (write_fault)
>   *emulate = 1;
> - kvm_mmu_flush_tlb(vcpu);
> + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>   }
> 
>   if (unlikely(is_mmio_spte(*sptep) && emulate))
> @@ -3450,12 +3450,6 @@ static void nonpaging_init_context(struct kvm_vcpu *vcpu,
>   context->nx = false;
>  }
> 
> -void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
> -{
> - kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> -}
> -EXPORT_SYMBOL_GPL(kvm_mmu_flush_tlb);
> -
>  void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu)
>  {
>   mmu_free_roots(vcpu);
> @@ -3961,7 +3955,7 @@ static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, bool zap_page,
>   if (remote_flush)
>   kvm_flush_remote_tlbs(vcpu->kvm);
>   else if (local_flush)
> - kvm_mmu_flush_tlb(vcpu);
> + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>  }
> 
>  static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa,
> @@ -4222,7 +4216,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
>  void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
>  {
>   vcpu->arch.mmu.invlpg(vcpu, gva);
> - kvm_mmu_flush_tlb(vcpu);
> + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>   ++vcpu->stat.invlpg;
>  }
>  EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index bfe11cf..bb0a7ab 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6617,7 +6617,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>   switch (type) {
>   case VMX_EPT_EXTENT_GLOBAL:
>   kvm_mmu_sync_roots(vcpu);
> - kvm_mmu_flush_tlb(vcpu);
> + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>   nested_vmx_succeed(vcpu);
>   break;
>   default:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9eb5458..fc3df50 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -726,7 +726,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>  {
>   if (cr3 == kvm_read_cr3(vcpu) && !pdptrs_changed(vcpu)) {
>   kvm_mmu_sync_roots(vcpu);
> - kvm_mmu_flush_tlb(vcpu);
> + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>   return 0;
>   }
> 
> @@ -5989,6 +5989,12 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
>   kvm_apic_update_tmr(vcpu, tmr);
>  }
> 
> +static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
> +{
> + ++vcpu->stat.tlb_flush;
> + kvm_x86_ops->tlb_flush(vcpu);
> +}
> +
>  /*
>   * Returns 1 to let __vcpu_run() continue the guest execution loop without
>   * exiting to the userspace.  Otherwise, the value will be returned to the
> @@ -6018,8 +6024,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
>   kvm_mmu_sync_roots(vcpu);
>   if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
> - ++vcpu->stat.tlb_flush;
> - kvm_x86_ops->tlb_flush(vcpu);
> + kvm_vcpu_flush_tlb(vcpu);

NACK!

Do not understand why you hav