Re: kvm ivy bridge support?

2014-12-05 Thread Thomas Lau
virsh cpu-models x86_64 | sort

486
Conroe
Haswell
Nehalem
Opteron_G1
Opteron_G2
Opteron_G3
Opteron_G4
Opteron_G5
Penryn
SandyBridge
Westmere
athlon
core2duo
coreduo
cpu64-rhel5
cpu64-rhel6
kvm32
kvm64
n270
pentium
pentium2
pentium3
pentiumpro
phenom
qemu32
qemu64

interesting that it doesn't have Ivy Bridge, any reason?

On Fri, Dec 5, 2014 at 3:29 PM, Paolo Bonzini pbonz...@redhat.com wrote:


 On 05/12/2014 02:33, t...@tetrioncapital.com wrote:
 May I know how could I check CPU support list on KVM?

 QEMU: qemu-kvm -cpu help|grep x86

 (or /usr/libexec/qemu-kvm -cpu help|grep x86 depending on your
 distribution)

 Libvirt: virsh cpu-models x86_64|sort

 Paolo



-- 
Thomas Lau
Director of Infrastructure
Tetrion Capital Limited

Direct: +852-3976-8903
Mobile: +852-9323-9670
Address: 20/F, IFC 1, Central district, Hong Kong
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM

2014-12-05 Thread Gerd Hoffmann
  Hi,

   In KVMGT, we need to register an iodev only *after* BAR 
 registers are
   written by guest.

Oh, the guest can write the bar register at any time.  Typically it
happens at boot only, but it can also happen at runtime, for example on
reboot.

I've also seen the kernel redoing the pci mappings created by the bios,
due to buggy _crs declarations in the qemu acpi tables.

   
 https://www.usenix.org/conference/atc14/technical-sessions/presentation/tian

/me goes read this.

A few comments on the kernel stuff (brief look so far, also
compile-tested only, intel gfx on my test machine is too old).

 * Noticed the kernel bits don't even compile when configured as
   module.  Everything (vgt, i915, kvm) must be compiled into the
   kernel.
 * Design approach still seems to be i915 on vgt not the other way
   around.

Qemu/SeaBIOS bits:

I've seen the host bridge changes identity from i440fx to
copy-pci-ids-from-host.  Guess the reason for this is that seabios uses
this device to figure whenever it is running on i440fx or q35.  Correct?

What are the exact requirements for the device?  Must it match the host
exactly, to not confuse the guest intel graphics driver?  Or would
something more recent -- such as the q35 emulation qemu has -- be good
enough to make things work (assuming we add support for the
graphic-related pci config space registers there)?

The patch also adds a dummy isa bridge at 0x1f.  Simliar question here:
What exactly is needed here?  Would things work if we simply use the q35
lpc device here?

more to come after I've read the paper linked above ...

cheers,
  Gerd


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 2/5] ARM: on IO mem abort - route the call to KVM MMIO bus

2014-12-05 Thread Nikolay Nikolaev
On Sat, Nov 29, 2014 at 1:28 PM, Christoffer Dall
christoffer.d...@linaro.org wrote:
 On Mon, Nov 24, 2014 at 11:26:51PM +0200, Nikolay Nikolaev wrote:
 On IO memory abort, try to handle the MMIO access thorugh the KVM
 registered read/write callbacks. This is done by invoking the relevant
 kvm_io_bus_* API.

 Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
 ---
  arch/arm/kvm/mmio.c |   33 +
  1 file changed, 33 insertions(+)

 diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
 index 4cb5a93..81230da 100644
 --- a/arch/arm/kvm/mmio.c
 +++ b/arch/arm/kvm/mmio.c
 @@ -162,6 +162,36 @@ static int decode_hsr(struct kvm_vcpu *vcpu, 
 phys_addr_t fault_ipa,
   return 0;
  }

 +/**
 + * kvm_handle_mmio - handle an in-kernel MMIO access
 + * @vcpu:pointer to the vcpu performing the access
 + * @run: pointer to the kvm_run structure
 + * @mmio:pointer to the data describing the access
 + *
 + * returns true if the MMIO access has been performed in kernel space,
 + * and false if it needs to be emulated in user space.
 + */
 +static bool handle_kernel_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 + struct kvm_exit_mmio *mmio)
 +{
 + int ret;
 +
 + if (mmio-is_write) {
 + ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, mmio-phys_addr,
 + mmio-len, mmio-data);
 +
 + } else {
 + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, mmio-phys_addr,
 + mmio-len, mmio-data);
 + }
 + if (!ret) {
 + kvm_prepare_mmio(run, mmio);
 + kvm_handle_mmio_return(vcpu, run);
 + }
 +
 + return !ret;
 +}
 +
  int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
phys_addr_t fault_ipa)
  {
 @@ -200,6 +230,9 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run 
 *run,
   if (vgic_handle_mmio(vcpu, run, mmio))
   return 1;

 + if (handle_kernel_mmio(vcpu, run, mmio))
 + return 1;
 +

 Is this stuff always synchronously handled so that the mmio is properly
 populated upon handle_kernel_mmio on reads?

If I get it right the kvm_io_bus_ API is intended to work
synchronously. Of course it probably depends
on how the registered device handles the read/write call.
Or maybe I misunderstand your question? Please clarify in that case.

regards,
Nikolay Nikolaev


 -Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/5] KVM: ARM VGIC add kvm_io_bus_ frontend

2014-12-05 Thread Nikolay Nikolaev
On Sat, Nov 29, 2014 at 3:54 PM, Nikolay Nikolaev
n.nikol...@virtualopensystems.com wrote:
 On Sat, Nov 29, 2014 at 1:29 PM, Christoffer Dall
 christoffer.d...@linaro.org wrote:
 On Mon, Nov 24, 2014 at 11:26:58PM +0200, Nikolay Nikolaev wrote:
 In io_mem_abort remove the call to vgic_handle_mmio. The target is to have
 a single MMIO handling path - that is through the kvm_io_bus_ API.

 Register a kvm_io_device in kvm_vgic_init on the whole vGIC MMIO region.
 Both read and write calls are redirected to vgic_io_dev_access where
 kvm_exit_mmio is composed to pass it to vm_ops.handle_mmio.


 Signed-off-by: Nikolay Nikolaev n.nikol...@virtualopensystems.com
 ---
  arch/arm/kvm/mmio.c|3 --
  include/kvm/arm_vgic.h |3 +-
  virt/kvm/arm/vgic.c|   88 
 
  3 files changed, 74 insertions(+), 20 deletions(-)

 diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
 index 81230da..1c44a2b 100644
 --- a/arch/arm/kvm/mmio.c
 +++ b/arch/arm/kvm/mmio.c
 @@ -227,9 +227,6 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run 
 *run,
   if (mmio.is_write)
   mmio_write_buf(mmio.data, mmio.len, data);

 - if (vgic_handle_mmio(vcpu, run, mmio))
 - return 1;
 -
   if (handle_kernel_mmio(vcpu, run, mmio))
   return 1;

 diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
 index e452ef7..d9b7d2a 100644
 --- a/include/kvm/arm_vgic.h
 +++ b/include/kvm/arm_vgic.h
 @@ -233,6 +233,7 @@ struct vgic_dist {
   unsigned long   *irq_pending_on_cpu;

   struct vgic_vm_ops  vm_ops;
 + struct kvm_io_device*io_dev;
  #endif
  };

 @@ -307,8 +308,6 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, 
 unsigned int irq_num,
   bool level);
  void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 -   struct kvm_exit_mmio *mmio);

  #define irqchip_in_kernel(k) (!!((k)-arch.vgic.in_kernel))
  #define vgic_initialized(k)  ((k)-arch.vgic.ready)
 diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
 index 1213da5..3da1115 100644
 --- a/virt/kvm/arm/vgic.c
 +++ b/virt/kvm/arm/vgic.c
 @@ -31,6 +31,9 @@
  #include asm/kvm_emulate.h
  #include asm/kvm_arm.h
  #include asm/kvm_mmu.h
 +#include asm/kvm.h
 +
 +#include iodev.h

  /*
   * How the whole thing works (courtesy of Christoffer Dall):
 @@ -775,28 +778,81 @@ bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, 
 struct kvm_run *run,
   return true;
  }

 -/**
 - * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC emulation
 - * @vcpu:  pointer to the vcpu performing the access
 - * @run:   pointer to the kvm_run structure
 - * @mmio:  pointer to the data describing the access
 - *
 - * returns true if the MMIO access has been performed in kernel space,
 - * and false if it needs to be emulated in user space.
 - * Calls the actual handling routine for the selected VGIC model.
 - */
 -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 -   struct kvm_exit_mmio *mmio)
 +static int vgic_io_dev_access(struct kvm_vcpu *vcpu, struct kvm_io_device 
 *this,
 + gpa_t addr, int len, void *val, bool is_write)
  {
 - if (!irqchip_in_kernel(vcpu-kvm))
 - return false;
 + struct kvm_exit_mmio mmio;
 + bool ret;
 +
 + mmio = (struct kvm_exit_mmio) {
 + .phys_addr = addr,
 + .len = len,
 + .is_write = is_write,
 + };
 +
 + if (is_write)
 + memcpy(mmio.data, val, len);

   /*
* This will currently call either vgic_v2_handle_mmio() or
* vgic_v3_handle_mmio(), which in turn will call
* vgic_handle_mmio_range() defined above.
*/
 - return vcpu-kvm-arch.vgic.vm_ops.handle_mmio(vcpu, run, mmio);
 + ret = vcpu-kvm-arch.vgic.vm_ops.handle_mmio(vcpu, vcpu-run, mmio);
 +
 + if (!is_write)
 + memcpy(val, mmio.data, len);
 +
 + return ret ? 0 : 1;
 +}
 +
 +static int vgic_io_dev_read(struct kvm_vcpu *vcpu, struct kvm_io_device 
 *this,
 +   gpa_t addr, int len, void *val)
 +{
 + return vgic_io_dev_access(vcpu, this, addr, len, val, false);
 +}
 +
 +static int vgic_io_dev_write(struct kvm_vcpu *vcpu, struct kvm_io_device 
 *this,
 +gpa_t addr, int len, const void *val)
 +{
 + return vgic_io_dev_access(vcpu, this, addr, len, (void *)val, true);
 +}
 +
 +static const struct kvm_io_device_ops vgic_io_dev_ops = {
 + .read   = vgic_io_dev_read,
 + .write  = vgic_io_dev_write,
 +};
 +
 +static int vgic_register_kvm_io_dev(struct kvm *kvm)
 +{
 + struct kvm_io_device *dev;
 + int ret;
 +
 + struct vgic_dist *dist = kvm-arch.vgic;
 + unsigned long base = dist-vgic_dist_base;
 +
 + dev = 

Re: [RFC PATCH 4/5] ARM: enable linking against eventfd and irqchip

2014-12-05 Thread Nikolay Nikolaev
On Sat, Nov 29, 2014 at 1:29 PM, Christoffer Dall
christoffer.d...@linaro.org wrote:
 Inidicate KVM in the subject please.

 On Mon, Nov 24, 2014 at 11:27:06PM +0200, Nikolay Nikolaev wrote:
 This enables compilation of the eventfd feature on ARM.

 does this enable anything else than the ioeventfd stuff so that
 everything is in place to do this safely now?
Inspecting the eventfd.c I see that there is the IRQFD code
conditionally compiled when CONFIG_HAVE_KVM_IRQFD is defined.
 I think this is enabled for arm/arm64 in Eric's IRQFD patcheries, so
we should be safe now.

regards,
Nikolay Nikolaev


 -Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM

2014-12-05 Thread Paolo Bonzini


On 05/12/2014 09:50, Gerd Hoffmann wrote:
 A few comments on the kernel stuff (brief look so far, also
 compile-tested only, intel gfx on my test machine is too old).
 
  * Noticed the kernel bits don't even compile when configured as
module.  Everything (vgt, i915, kvm) must be compiled into the
kernel.

I'll add that the patch is basically impossible to review with all the
XenGT bits still in.  For example, the x86 emulator seems to be
unnecessary for KVMGT, but I am not 100% sure.

I would like a clear understanding of why/how Andrew Barnes was able to
do i915 passthrough (GVT-d) without hacking the ISA bridge, and why this
does not apply to GVT-g.

Paolo

  * Design approach still seems to be i915 on vgt not the other way
around.
 
 Qemu/SeaBIOS bits:
 
 I've seen the host bridge changes identity from i440fx to
 copy-pci-ids-from-host.  Guess the reason for this is that seabios uses
 this device to figure whenever it is running on i440fx or q35.  Correct?
 
 What are the exact requirements for the device?  Must it match the host
 exactly, to not confuse the guest intel graphics driver?  Or would
 something more recent -- such as the q35 emulation qemu has -- be good
 enough to make things work (assuming we add support for the
 graphic-related pci config space registers there)?
 
 The patch also adds a dummy isa bridge at 0x1f.  Simliar question here:
 What exactly is needed here?  Would things work if we simply use the q35
 lpc device here?
 
 more to come after I've read the paper linked above ...
 
 cheers,
   Gerd
 
 
 ___
 Intel-gfx mailing list
 intel-...@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/intel-gfx
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/9] kvm: x86: Add kvm_x86_ops hook that enables XSAVES for guest

2014-12-05 Thread Radim Krčmář
(I somehow managed to review one patch twice instead of this one ...)

2014-12-04 16:57+0100, Paolo Bonzini:
 From: Wanpeng Li wanpeng...@linux.intel.com
 
 Expose the XSAVES feature to the guest if the kvm_x86_ops say it is
 available.
 
 Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---

Reviewed-by: Radim Krčmář rkrc...@redhat.com

 --- a/arch/x86/include/asm/vmx.h
 +++ b/arch/x86/include/asm/vmx.h
 @@ -69,6 +69,7 @@
  #define SECONDARY_EXEC_PAUSE_LOOP_EXITING0x0400
  #define SECONDARY_EXEC_ENABLE_INVPCID0x1000
  #define SECONDARY_EXEC_SHADOW_VMCS  0x4000
 +#define SECONDARY_EXEC_XSAVES0x0010

(Email exposes tab alignment nicely.)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Intel-gfx] [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM

2014-12-05 Thread Daniel Vetter
On Fri, Dec 05, 2014 at 09:50:21AM +0100, Gerd Hoffmann wrote:
  
  https://www.usenix.org/conference/atc14/technical-sessions/presentation/tian
 
 /me goes read this.
 
 A few comments on the kernel stuff (brief look so far, also
 compile-tested only, intel gfx on my test machine is too old).
 
  * Noticed the kernel bits don't even compile when configured as
module.  Everything (vgt, i915, kvm) must be compiled into the
kernel.
  * Design approach still seems to be i915 on vgt not the other way
around.

Yeah done a quick read-through of just the i915 bits too, same comment. I
guess this is just the first RFC and the redesign we've discussed about
already with xengt is in progress somewhere?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[virtio-mmio]: arm-only?

2014-12-05 Thread Vasile Catalin-B50542

Is virtio_mmio used only for arm emulation?
I was looking through qemu sources and the only file
that I have found to instantiate virtio-mmio proxies is arm/virt.c .

Cata
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvm ivy bridge support?

2014-12-05 Thread Rik van Riel
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 12/05/2014 03:17 AM, Thomas Lau wrote:
 virsh cpu-models x86_64 | sort

...
 Penryn SandyBridge Westmere

...

 interesting that it doesn't have Ivy Bridge, any reason?

The reason would be that you are running an older version.

- -- 
All rights reversed
-BEGIN PGP SIGNATURE-
Version: GnuPG v1

iQEcBAEBAgAGBQJUgc4KAAoJEM553pKExN6DHGoH/iRWWBCGO0oIgRuA2iEGVYhh
M5iUVJfxzdK6S4sNRPRnUHW8YqQVkKQSULBna4zYPI6q/9d5llDZog8Ejk0XES23
8yembpEM5SsKW2bUzaxpd1t8g096+tX7tmX/i06VOyzSnRDMpyqBAR0cX2c/Yr44
2lFu7IoGI8E+ZM9CeOnCjtKBIsr5ZMV2S9++WKo9EpZAvhMdsxFQh5a47PSShJes
0Dib/KEmg3S45+eWRvDoqI6KwAgTXJ0MvvMkXvWT2uQ25BHu7AzekKl2jsSnKJeq
4PWgD23dDn1quUfeP7GJq5/zN+Rv2Apw4cQZYtVAmm7pfmMOaM8xV5A2R0ygyTw=
=o0NK
-END PGP SIGNATURE-
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Stupid Xen vs KVM question

2014-12-05 Thread Konrad Rzeszutek Wilk
On Fri, Dec 05, 2014 at 08:29:54AM +0100, Paolo Bonzini wrote:
 
 
 On 05/12/2014 03:24, Konrad Rzeszutek Wilk wrote:
  We could do a simple thing - which is that the paravirt_enabled
  could have the value 1 for Xen and 2 for KVM. The assembler logic
  would be inverted and just check for 1. I am not going to attempt
  to write the assembler code :-)
 
 Wouldn't Xen HVM also want to be 2?

Oddly enough it was never set!

Looking at where the paravit_enabled() macro is used, on KVM it could
be just set to zero.
 
 Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch -rt 2/2] KVM: lapic: mark LAPIC timer handler as irqsafe

2014-12-05 Thread Marcelo Tosatti
On Thu, Dec 04, 2014 at 02:53:26PM +0100, Paolo Bonzini wrote:
 
 
 On 25/11/2014 21:24, Thomas Gleixner wrote:
  On Tue, 25 Nov 2014, Rik van Riel wrote:
  On 11/25/2014 12:21 PM, Marcelo Tosatti wrote:
  Since lapic timer handler only wakes up a simple waitqueue,
  it can be executed from hardirq context.
 
  Reduces average cyclictest latency by 3us.
 
  Can this patch be merged in the KVM tree, and go
  upstream via Paolo?
  
  Not really as it has RT dependencies 
 
 Can hrtimer_start_expires even return ETIME on a non-RT kernel?  If yes,
 I can take patch 2.  If not, it's better to keep both patches in the RT
 tree.
 
 Paolo

It can't. We're still evaluating the necessity of this patch, will
resubmit accordingly.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

2014-12-05 Thread Luis R. Rodriguez
On Wed, Dec 03, 2014 at 08:39:47PM +0100, Luis R. Rodriguez wrote:
 On Wed, Dec 03, 2014 at 05:37:51AM +0100, Juergen Gross wrote:
  On 12/03/2014 03:28 AM, Luis R. Rodriguez wrote:
  On Tue, Dec 02, 2014 at 11:11:18AM +, David Vrabel wrote:
  On 01/12/14 22:36, Luis R. Rodriguez wrote:
 
  Then I do agree its a fair analogy (and find this obviously odd that how
  widespread cond_resched() is), we just don't have an equivalent for IRQ
  context, why not avoid the special check then and use this all the time 
  in the
  middle of a hypercall on the return from an interrupt (e.g., the timer
  interrupt)?
 
  http://lists.xen.org/archives/html/xen-devel/2014-02/msg01101.html
 
  OK thanks! That explains why we need some asm code but in that submission 
  you
  still also had used is_preemptible_hypercall(regs) and in the new
  implementation you use a CPU variable xen_in_preemptible_hcall prior to 
  calling
  preempt_schedule_irq(). I believe you added the CPU variable because
  preempt_schedule_irq() will preempt first without any checks if it should, 
  I'm
  asking why not do something like cond_resched_irq() where we check with
  should_resched() prior to preempting and that way we can avoid having to 
  use
  the CPU variable?
 
  Because that could preempt at any asynchronous interrupt making the
  no-preempt kernel fully preemptive. 
 
 OK yeah I see. That still doesn't negate the value of using something
 like cond_resched_irq() with a should_resched() on only critical hypercalls.
 The current implementation (patch by David) forces preemption without
 checking for should_resched() so it would preempt unnecessarily at least
 once.
 
  How would you know you are just
  doing a critical hypercall which should be preempted?
 
 You would not, you're right. I was just trying to see if we could generalize
 an API for this to avoid having users having to create their own CPU variables
 but this all seems very specialized as we want to use this on the timer
 so if we do generalize a cond_resched_irq() perhaps the documentation can
 warn about this type of case or abuse.

David's patch had the check only it was x86 based, if we use cond_resched_irq()
we can leave that aspect out to be done through asm inlines or it'll use the
generic shoudl_resched(), that should save some code on the asm implementations.

I have some patches now which generalizees this, I also have more information
about this can happen exactly, and a way to triggger it on small systems with
some hacks to emulate possibly backend behaviour on larger systems. In the worst
case this can be a dangerious situation to be in. I'll send some new RFTs.

 Luis
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvm ivy bridge support?

2014-12-05 Thread Paolo Bonzini


- Original Message -
 From: Rik van Riel r...@redhat.com
 To: Thomas Lau t...@tetrioncapital.com, Paolo Bonzini 
 pbonz...@redhat.com
 Cc: Yong Wang yong.y.w...@linux.intel.com, kvm kvm@vger.kernel.org
 Sent: Friday, December 5, 2014 4:23:54 PM
 Subject: Re: kvm ivy bridge support?
 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On 12/05/2014 03:17 AM, Thomas Lau wrote:
  virsh cpu-models x86_64 | sort
 
 ...
  Penryn SandyBridge Westmere
 
 ...
 
  interesting that it doesn't have Ivy Bridge, any reason?
 
 The reason would be that you are running an older version.


No, the reason could also be that for some reason we added Haswell and
Broadwell, but forgot Ivy Bridge.  I'll send QEMU patches next week.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/6] Improve PSCI system events and fix reboot bugs

2014-12-05 Thread Andrew Jones
On Wed, Dec 03, 2014 at 10:18:36PM +0100, Christoffer Dall wrote:
 Several people have reported problems with rebooting ARM VMs, especially
 on 32-bit ARM.  This is mainly due to the same reason we were seeing
 boot errors in the past, namely that the ram, dcache, and icache weren't
 coherent on guest boot with the guest (stage-1) MMU disabled.  We solved
 this by ensuring coherency when we fault in pages, but since most memory
 is already mapped after a reboot, we don't do anything.
 
 The solution is to unmap the regular RAM on VCPU init, but we must
 take care to not unmap the GIC or other IO regions, hence the somehwat
 complicated solution.
 
 As part of figuring this out, it became clear that some semantics around
 the KVM_ARM_VCPU_INIT ABI and system event ABI was unclear (what is
 userspace expected to do when it receives a system event).  This series
 also clarifies the ABI and changes the kernel functionality to do what
 userspace expects (turn off VCPUs on a system shutdown event).
 
 The code is avaliable here as well:
 http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git 
 vcpu_init_fixes-v2
 
 There is an alternative version with more code-reuse for the unmapping
 implementation for the previous version of this patch series available
 in the following git repo:
 
 http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git 
 vcpu_init_fixes-alternative
 
 Testing
 ---
 This has been tested on CubieBoard, Arndale, TC2, and Juno.  On Arndale
 and TC2 it was extremely easy to reproduce the problem (just start a VM
 that runs reboot from /etc/rc.local or similar) and this series clearly
 fixes the behavior.  For the previous version of this series, I was
 seeing some problems on Juno, but it turned out to be because I wasn't
 limiting my testing to one of the clusters, and since we don't support
 re-initing a VCPU on a different physical host CPU (big.LITTLE), it was
 failing.  For this version of the patch series, it has been running a
 reboot loop on Juno for hours.

Just tested this version. Looks good. No problems after install nor
after many, many reboots.

drew

 
 Changelog
 -
 Changes v1-v2:
  - New patch to not clear the VCPU_POWER_OFF flag
  - Fixed spelling error in commit message
  - Adapted ABI texts based on Peter's feedback
  - Check for changed parameters to KVM_ARM_VCPU_INIT
  - Now unmap the Stage-2 RAM mappings at VCPU init instead of at PSCI
system event time.
 
 Christoffer Dall (6):
   arm/arm64: KVM: Don't clear the VCPU_POWER_OFF flag
   arm/arm64: KVM: Correct KVM_ARM_VCPU_INIT power off option
   arm/arm64: KVM: Reset the HCR on each vcpu when resetting the vcpu
   arm/arm64: KVM: Clarify KVM_ARM_VCPU_INIT ABI
   arm/arm64: KVM: Turn off vcpus on PSCI shutdown/reboot
   arm/arm64: KVM: Introduce stage2_unmap_vm
 
  Documentation/virtual/kvm/api.txt| 17 +-
  arch/arm/include/asm/kvm_emulate.h   |  5 +++
  arch/arm/include/asm/kvm_host.h  |  2 --
  arch/arm/include/asm/kvm_mmu.h   |  1 +
  arch/arm/kvm/arm.c   | 56 ++-
  arch/arm/kvm/guest.c | 26 ---
  arch/arm/kvm/mmu.c   | 65 
 
  arch/arm/kvm/psci.c  | 19 +++
  arch/arm64/include/asm/kvm_emulate.h |  5 +++
  arch/arm64/include/asm/kvm_host.h|  3 +-
  arch/arm64/include/asm/kvm_mmu.h |  1 +
  arch/arm64/kvm/guest.c   | 26 ---
  12 files changed, 168 insertions(+), 58 deletions(-)
 
 -- 
 2.1.2.330.g565301e.dirty
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Stupid Xen vs KVM question

2014-12-05 Thread Andy Lutomirski
On Dec 5, 2014 8:09 AM, Konrad Rzeszutek Wilk konrad.w...@oracle.com wrote:

 On Fri, Dec 05, 2014 at 08:29:54AM +0100, Paolo Bonzini wrote:
 
 
  On 05/12/2014 03:24, Konrad Rzeszutek Wilk wrote:
   We could do a simple thing - which is that the paravirt_enabled
   could have the value 1 for Xen and 2 for KVM. The assembler logic
   would be inverted and just check for 1. I am not going to attempt
   to write the assembler code :-)
 
  Wouldn't Xen HVM also want to be 2?

 Oddly enough it was never set!

 Looking at where the paravit_enabled() macro is used, on KVM it could
 be just set to zero.

I noticed that KVM is setting paravirt_enabled = 1 twice, once the
main kvm guest code and once in kvmclock.

Will the EBDA code in head.c care?

That would also increase sanity a little IMO.  We currently have
paravirt meaning that the normal HW architecture isn't present (Den,
lguest) and paravirt meaning that there are extra optional-to-use
hypervisor features (KVM, etc).

--Andy
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] uio/uio_pci_generic: don't return zero on failure path in probe()

2014-12-05 Thread Alexey Khoroshilov
If uio_register_device() fails in probe(), it breaks off initialization,
deallocates all resources, but returns zero.

The patch adds proper error code propagation.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov khoroshi...@ispras.ru
---
 drivers/uio/uio_pci_generic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/uio/uio_pci_generic.c b/drivers/uio/uio_pci_generic.c
index 077ae12269ce..d0b508b68f3c 100644
--- a/drivers/uio/uio_pci_generic.c
+++ b/drivers/uio/uio_pci_generic.c
@@ -91,7 +91,8 @@ static int probe(struct pci_dev *pdev,
gdev-info.handler = irqhandler;
gdev-pdev = pdev;
 
-   if (uio_register_device(pdev-dev, gdev-info))
+   err = uio_register_device(pdev-dev, gdev-info);
+   if (err)
goto err_register;
pci_set_drvdata(pdev, gdev);
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC] KVM: x86: nested: support for MSR loading/storing

2014-12-05 Thread Eugene Korenevsky
BitVisor hypervisor fails to start a nested guest due to lack of MSR
load/store support in KVM.

This patch fixes this problem according to Intel SDM.


Signed-off-by: Eugene Korenevsky ekorenev...@gmail.com
---
 arch/x86/include/asm/vmx.h|   6 ++
 arch/x86/include/uapi/asm/msr-index.h |   3 +
 arch/x86/include/uapi/asm/vmx.h   |   1 +
 arch/x86/kvm/vmx.c| 191 +-
 virt/kvm/kvm_main.c   |   1 +
 5 files changed, 197 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index bcbfade..e07414c 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -454,6 +454,12 @@ struct vmx_msr_entry {
 #define ENTRY_FAIL_VMCS_LINK_PTR   4
 
 /*
+ * VMX Abort codes
+ */
+#define VMX_ABORT_MSR_STORE1
+#define VMX_ABORT_MSR_LOAD 4
+
+/*
  * VM-instruction error numbers
  */
 enum vm_instruction_error_number {
diff --git a/arch/x86/include/uapi/asm/msr-index.h 
b/arch/x86/include/uapi/asm/msr-index.h
index e21331c..3c9c601 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -316,6 +316,9 @@
 #define MSR_IA32_UCODE_WRITE   0x0079
 #define MSR_IA32_UCODE_REV 0x008b
 
+#define MSR_IA32_SMM_MONITOR_CTL   0x009b
+#define MSR_IA32_SMBASE0x009e
+
 #define MSR_IA32_PERF_STATUS   0x0198
 #define MSR_IA32_PERF_CTL  0x0199
 #define MSR_AMD_PSTATE_DEF_BASE0xc0010064
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 990a2fe..f5f8a62 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -56,6 +56,7 @@
 #define EXIT_REASON_MSR_READ31
 #define EXIT_REASON_MSR_WRITE   32
 #define EXIT_REASON_INVALID_STATE   33
+#define EXIT_REASON_MSR_LOAD_FAILURE34
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
 #define EXIT_REASON_MONITOR_INSTRUCTION 39
 #define EXIT_REASON_PAUSE_INSTRUCTION   40
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6a951d8..deebc96 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8498,6 +8498,163 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
struct vmcs12 *vmcs12)
kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12-guest_rip);
 }
 
+struct msr_entry {
+   u32 index;
+   u32 reserved;
+   u64 data;
+} __packed;
+
+static bool vmx_msr_switch_area_verify(u32 count, u64 addr, int maxphyaddr)
+{
+   if (count == 0)
+   return true;
+   if ((addr  0xf) != 0)
+   return false;
+   if ((addr  maxphyaddr) != 0)
+   return false;
+   if (((addr + count * sizeof(struct msr_entry) - 1)  maxphyaddr) != 0)
+   return false;
+   return true;
+}
+
+static bool nested_vmx_msr_switch_verify(struct kvm_vcpu *vcpu,
+struct vmcs12 *vmcs12)
+{
+   int maxphyaddr = cpuid_maxphyaddr(vcpu);
+
+#define VMCS12_MSR_SWITCH_VERIFY(name) { \
+   if (!vmx_msr_switch_area_verify(vmcs12-name##_count, \
+   vmcs12-name##_addr, maxphyaddr)) { \
+   pr_warn_ratelimited( \
+   %s: invalid MSR_{LOAD,STORE} parameters, \
+   #name); \
+   return false; \
+   } \
+   }
+   VMCS12_MSR_SWITCH_VERIFY(vm_exit_msr_load);
+   VMCS12_MSR_SWITCH_VERIFY(vm_exit_msr_store);
+   VMCS12_MSR_SWITCH_VERIFY(vm_entry_msr_load);
+   return true;
+}
+
+static bool vmx_load_msr_entry_verify(struct kvm_vcpu *vcpu,
+ struct msr_entry *e)
+{
+   if (e-index == MSR_FS_BASE || e-index == MSR_GS_BASE)
+   return false;
+   /* SMM is not supported */
+   if (e-index == MSR_IA32_SMM_MONITOR_CTL)
+   return false;
+   /* x2APIC MSR accesses are not allowed */
+   if (apic_x2apic_mode(vcpu-arch.apic)  (e-index  24) == 0x800)
+   return false;
+   if (e-reserved != 0)
+   return false;
+   return true;
+}
+
+static bool vmx_msr_switch_is_protected_msr(u32 msr_index)
+{
+   /* Intel SDM: a processor MAY prevent writing to these MSRs when
+* loading guest states on VM entries or saving guest states on VM
+* exits.
+* Assume our emulated processor DOES prevent writing */
+   return msr_index == MSR_IA32_UCODE_WRITE ||
+  msr_index == MSR_IA32_UCODE_REV;
+}
+
+static unsigned int nested_vmx_load_msrs(struct kvm_vcpu *vcpu,
+u32 count, u64 addr)
+{
+   unsigned int i;
+   struct msr_entry msr_entry;
+   struct msr_data msr;
+
+   for (i = 1; i = count; i++, addr += sizeof(msr_entry)) {
+   if (kvm_read_guest(vcpu-kvm, addr,
+  msr_entry, 

[PATCH] x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit

2014-12-05 Thread Andy Lutomirski
paravirt_enabled has the following effects:

 - Disables the F00F bug workaround warning.  There is no F00F bug
   workaround any more because Linux's standard IDT handling already
   works around the F00F bug, but the warning still exists.  This
   is only cosmetic, and, in any event, there is no such thing as
   KVM on a CPU with the F00F bug.

 - Disables 32-bit APM BIOS detection.  On a KVM paravirt system,
   there should be no APM BIOS anyway.

 - Disables tboot.  I think that the tboot code should check the
   CPUID hypervisor bit directly if it matters.

 - paravirt_enabled disables espfix32.  espfix32 should *not* be
   disabled under KVM paravirt.

The last point is the purpose of this patch.  It fixes a leak of the
high 16 bits of the kernel stack address on 32-bit KVM paravirt
guests.

While I'm at it, this removes pv_info setup from kvmclock.  That
code seems to serve no purpose.

Cc: sta...@vger.kernel.org
Signed-off-by: Andy Lutomirski l...@amacapital.net
---
 arch/x86/kernel/kvm.c  | 9 -
 arch/x86/kernel/kvmclock.c | 2 --
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index f6945bef2cd1..94f643484300 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -283,7 +283,14 @@ NOKPROBE_SYMBOL(do_async_page_fault);
 static void __init paravirt_ops_setup(void)
 {
pv_info.name = KVM;
-   pv_info.paravirt_enabled = 1;
+
+   /*
+* KVM isn't paravirt in the sense of paravirt_enabled.  A KVM
+* guest kernel works like a bare metal kernel with additional
+* features, and paravirt_enabled is about features that are
+* missing.
+*/
+   pv_info.paravirt_enabled = 0;
 
if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
pv_cpu_ops.io_delay = kvm_io_delay;
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index d9156ceecdff..d4d9a8ad7893 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -263,8 +263,6 @@ void __init kvmclock_init(void)
 #endif
kvm_get_preset_lpj();
clocksource_register_hz(kvm_clock, NSEC_PER_SEC);
-   pv_info.paravirt_enabled = 1;
-   pv_info.name = KVM;
 
if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 4/6] hw_random: fix unregister race.

2014-12-05 Thread Amos Kong
On Wed, Nov 12, 2014 at 02:33:00PM +1030, Rusty Russell wrote:
 Amos Kong ak...@redhat.com writes:
  From: Rusty Russell ru...@rustcorp.com.au
 
  The previous patch added one potential problem: we can still be
  reading from a hwrng when it's unregistered.  Add a wait for zero
  in the hwrng_unregister path.
 
  v4: add cleanup_done flag to insure that cleanup is done
 
 That's a bit weird.  The usual pattern would be to hold a reference
 until we're actually finished, but this reference is a bit weird.
 
 We hold the mutex across cleanup, so we could grab that but we have to
 take care sleeping inside wait_event, otherwise Peter will have to fix
 my code again :)
 
 AFAICT the wake_woken() stuff isn't merged yet, so your patch will
 have to do for now.
 
  @@ -98,6 +99,8 @@ static inline void cleanup_rng(struct kref *kref)
   
  if (rng-cleanup)
  rng-cleanup(rng);
  +   rng-cleanup_done = true;
  +   wake_up_all(rng_done);
   }
   
   static void set_current_rng(struct hwrng *rng)
  @@ -536,6 +539,11 @@ void hwrng_unregister(struct hwrng *rng)
  kthread_stop(hwrng_fill);
  } else
  mutex_unlock(rng_mutex);
  +
  +   /* Just in case rng is reading right now, wait. */
  +   wait_event(rng_done, rng-cleanup_done 
  +  atomic_read(rng-ref.refcount) == 0);
  +
 
 The atomic_read() isn't necessary here.

At least, we need it to convert refcount from atomic_t to int.
Otherwise, we will touch this error:

  error: invalid operands to binary == (have 'atomic_t' and 'int')
 
 However, you should probably init cleanup_done in hwrng_register().
 (Probably noone does unregister then register, but let's be clear).
 
 Thanks,
 Rusty.

-- 
Amos.


signature.asc
Description: Digital signature


[PATCH v5 4/6] hw_random: fix unregister race.

2014-12-05 Thread Amos Kong
From: Rusty Russell ru...@rustcorp.com.au

The previous patch added one potential problem: we can still be
reading from a hwrng when it's unregistered.  Add a wait for zero
in the hwrng_unregister path.

v5: reset cleanup_done flag, use compiler barrier to prevent recording.
v4: add cleanup_done flag to insure that cleanup is done

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Signed-off-by: Amos Kong ak...@redhat.com
---
 drivers/char/hw_random/core.c | 12 
 include/linux/hw_random.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 83516cb..067270b 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex);
 static DEFINE_MUTEX(reading_mutex);
 static int data_avail;
 static u8 *rng_buffer, *rng_fillbuf;
+static DECLARE_WAIT_QUEUE_HEAD(rng_done);
 static unsigned short current_quality;
 static unsigned short default_quality; /* = 0; default to off */
 
@@ -98,6 +99,11 @@ static inline void cleanup_rng(struct kref *kref)
 
if (rng-cleanup)
rng-cleanup(rng);
+
+   /* cleanup_done should be updated after cleanup finishes */
+   smp_wmb();
+   rng-cleanup_done = true;
+   wake_up_all(rng_done);
 }
 
 static void set_current_rng(struct hwrng *rng)
@@ -498,6 +504,8 @@ int hwrng_register(struct hwrng *rng)
add_early_randomness(rng);
}
 
+   rng-cleanup_done = false;
+
 out_unlock:
mutex_unlock(rng_mutex);
 out:
@@ -529,6 +537,10 @@ void hwrng_unregister(struct hwrng *rng)
kthread_stop(hwrng_fill);
} else
mutex_unlock(rng_mutex);
+
+   /* Just in case rng is reading right now, wait. */
+   wait_event(rng_done, rng-cleanup_done 
+  atomic_read(rng-ref.refcount) == 0);
 }
 EXPORT_SYMBOL_GPL(hwrng_unregister);
 
diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
index c212e71..7832e50 100644
--- a/include/linux/hw_random.h
+++ b/include/linux/hw_random.h
@@ -46,6 +46,7 @@ struct hwrng {
/* internal. */
struct list_head list;
struct kref ref;
+   bool cleanup_done;
 };
 
 /** Register a new Hardware Random Number Generator driver. */
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 6/6] hw_random: don't init list element we're about to add to list.

2014-12-05 Thread Amos Kong
From: Rusty Russell ru...@rustcorp.com.au

Another interesting anti-pattern.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
---
 drivers/char/hw_random/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index a9286bf..4d13ac5 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -489,7 +489,6 @@ int hwrng_register(struct hwrng *rng)
goto out_unlock;
}
}
-   INIT_LIST_HEAD(rng-list);
list_add_tail(rng-list, rng_list);
 
if (old_rng  !rng-init) {
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 5/6] hw_random: don't double-check old_rng.

2014-12-05 Thread Amos Kong
From: Rusty Russell ru...@rustcorp.com.au

Interesting anti-pattern.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
---
 drivers/char/hw_random/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 067270b..a9286bf 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -476,14 +476,13 @@ int hwrng_register(struct hwrng *rng)
}
 
old_rng = current_rng;
+   err = 0;
if (!old_rng) {
err = hwrng_init(rng);
if (err)
goto out_unlock;
set_current_rng(rng);
-   }
-   err = 0;
-   if (!old_rng) {
+
err = register_miscdev();
if (err) {
drop_current_rng();
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 3/6] hw_random: use reference counts on each struct hwrng.

2014-12-05 Thread Amos Kong
From: Rusty Russell ru...@rustcorp.com.au

current_rng holds one reference, and we bump it every time we want
to do a read from it.

This means we only hold the rng_mutex to grab or drop a reference,
so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
block on read of /dev/hwrng.

Using a kref is overkill (we're always under the rng_mutex), but
a standard pattern.

This also solves the problem that the hwrng_fillfn thread was
accessing current_rng without a lock, which could change (eg. to NULL)
underneath it.

v5: drop redundant kref_init()
v4: decrease last reference for triggering the cleanup
v3: initialize kref (thanks Amos Kong)
v2: fix missing put_rng() on exit path (thanks Amos Kong)
Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Signed-off-by: Amos Kong ak...@redhat.com
---
 drivers/char/hw_random/core.c | 135 --
 include/linux/hw_random.h |   2 +
 2 files changed, 94 insertions(+), 43 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index a0905c8..83516cb 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -42,6 +42,7 @@
 #include linux/delay.h
 #include linux/slab.h
 #include linux/random.h
+#include linux/err.h
 #include asm/uaccess.h
 
 
@@ -91,6 +92,60 @@ static void add_early_randomness(struct hwrng *rng)
add_device_randomness(bytes, bytes_read);
 }
 
+static inline void cleanup_rng(struct kref *kref)
+{
+   struct hwrng *rng = container_of(kref, struct hwrng, ref);
+
+   if (rng-cleanup)
+   rng-cleanup(rng);
+}
+
+static void set_current_rng(struct hwrng *rng)
+{
+   BUG_ON(!mutex_is_locked(rng_mutex));
+   kref_get(rng-ref);
+   current_rng = rng;
+}
+
+static void drop_current_rng(void)
+{
+   BUG_ON(!mutex_is_locked(rng_mutex));
+   if (!current_rng)
+   return;
+
+   /* decrease last reference for triggering the cleanup */
+   kref_put(current_rng-ref, cleanup_rng);
+   current_rng = NULL;
+}
+
+/* Returns ERR_PTR(), NULL or refcounted hwrng */
+static struct hwrng *get_current_rng(void)
+{
+   struct hwrng *rng;
+
+   if (mutex_lock_interruptible(rng_mutex))
+   return ERR_PTR(-ERESTARTSYS);
+
+   rng = current_rng;
+   if (rng)
+   kref_get(rng-ref);
+
+   mutex_unlock(rng_mutex);
+   return rng;
+}
+
+static void put_rng(struct hwrng *rng)
+{
+   /*
+* Hold rng_mutex here so we serialize in case they set_current_rng
+* on rng again immediately.
+*/
+   mutex_lock(rng_mutex);
+   if (rng)
+   kref_put(rng-ref, cleanup_rng);
+   mutex_unlock(rng_mutex);
+}
+
 static inline int hwrng_init(struct hwrng *rng)
 {
if (rng-init) {
@@ -113,12 +168,6 @@ static inline int hwrng_init(struct hwrng *rng)
return 0;
 }
 
-static inline void hwrng_cleanup(struct hwrng *rng)
-{
-   if (rng  rng-cleanup)
-   rng-cleanup(rng);
-}
-
 static int rng_dev_open(struct inode *inode, struct file *filp)
 {
/* enforce read-only access to this chrdev */
@@ -154,21 +203,22 @@ static ssize_t rng_dev_read(struct file *filp, char 
__user *buf,
ssize_t ret = 0;
int err = 0;
int bytes_read, len;
+   struct hwrng *rng;
 
while (size) {
-   if (mutex_lock_interruptible(rng_mutex)) {
-   err = -ERESTARTSYS;
+   rng = get_current_rng();
+   if (IS_ERR(rng)) {
+   err = PTR_ERR(rng);
goto out;
}
-
-   if (!current_rng) {
+   if (!rng) {
err = -ENODEV;
-   goto out_unlock;
+   goto out;
}
 
mutex_lock(reading_mutex);
if (!data_avail) {
-   bytes_read = rng_get_data(current_rng, rng_buffer,
+   bytes_read = rng_get_data(rng, rng_buffer,
rng_buffer_size(),
!(filp-f_flags  O_NONBLOCK));
if (bytes_read  0) {
@@ -200,8 +250,8 @@ static ssize_t rng_dev_read(struct file *filp, char __user 
*buf,
ret += len;
}
 
-   mutex_unlock(rng_mutex);
mutex_unlock(reading_mutex);
+   put_rng(rng);
 
if (need_resched())
schedule_timeout_interruptible(1);
@@ -213,12 +263,11 @@ static ssize_t rng_dev_read(struct file *filp, char 
__user *buf,
}
 out:
return ret ? : err;
-out_unlock:
-   mutex_unlock(rng_mutex);
-   goto out;
+
 out_unlock_reading:
mutex_unlock(reading_mutex);
-   goto out_unlock;
+   put_rng(rng);
+   goto out;
 }
 
 
@@ -257,8 +306,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev,

[PATCH v5 0/6] fix hw_random stuck

2014-12-05 Thread Amos Kong
When I hotunplug a busy virtio-rng device or try to access
hwrng attributes in non-smp guest, it gets stuck.

My hotplug tests:

| test 0:
|   hotunplug rng device from qemu monitor
|
| test 1:
|   guest) # dd if=/dev/hwrng of=/dev/null 
|   hotunplug rng device from qemu monitor
|
| test 2:
|   guest) # dd if=/dev/random of=/dev/null 
|   hotunplug rng device from qemu monitor
|
| test 4:
|   guest) # dd if=/dev/hwrng of=/dev/null 
|   cat /sys/devices/virtual/misc/hw_random/rng_*
|
| test 5:
|   guest) # dd if=/dev/hwrng of=/dev/null
|   cancel dd process after 10 seconds
|   guest) # dd if=/dev/hwrng of=/dev/null 
|   hotunplug rng device from qemu monitor
|
| test 6:
|   use a fifo as rng backend, execute test 0 ~ 5 with no input of fifo

V5: reset cleanup_done flag, drop redundant init of reference count, use
compiler barrier to prevent recording.
V4: update patch 4 to fix corrupt, decrease last reference for triggering
the cleanup, fix unregister race pointed by Herbert
V3: initialize kref to 1
V2: added patch 2 to fix a deadlock, update current patch 3 to fix reference
counting issue

Amos Kong (1):
  hw_random: move some code out mutex_lock for avoiding underlying
deadlock

Rusty Russell (5):
  hw_random: place mutex around read functions and buffers.
  hw_random: use reference counts on each struct hwrng.
  hw_random: fix unregister race.
  hw_random: don't double-check old_rng.
  hw_random: don't init list element we're about to add to list.

 drivers/char/hw_random/core.c | 173 ++
 include/linux/hw_random.h |   3 +
 2 files changed, 126 insertions(+), 50 deletions(-)

-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 2/6] hw_random: move some code out mutex_lock for avoiding underlying deadlock

2014-12-05 Thread Amos Kong
In next patch, we use reference counting for each struct hwrng,
changing reference count also needs to take mutex_lock. Before
releasing the lock, if we try to stop a kthread that waits to
take the lock to reduce the referencing count, deadlock will
occur.

Signed-off-by: Amos Kong ak...@redhat.com
---
 drivers/char/hw_random/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index b1b6042..a0905c8 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -474,12 +474,12 @@ void hwrng_unregister(struct hwrng *rng)
}
}
if (list_empty(rng_list)) {
+   mutex_unlock(rng_mutex);
unregister_miscdev();
if (hwrng_fill)
kthread_stop(hwrng_fill);
-   }
-
-   mutex_unlock(rng_mutex);
+   } else
+   mutex_unlock(rng_mutex);
 }
 EXPORT_SYMBOL_GPL(hwrng_unregister);
 
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 1/6] hw_random: place mutex around read functions and buffers.

2014-12-05 Thread Amos Kong
From: Rusty Russell ru...@rustcorp.com.au

There's currently a big lock around everything, and it means that we
can't query sysfs (eg /sys/devices/virtual/misc/hw_random/rng_current)
while the rng is reading.  This is a real problem when the rng is slow,
or blocked (eg. virtio_rng with qemu's default /dev/random backend)

This doesn't help (it leaves the current lock untouched), just adds a
lock to protect the read function and the static buffers, in preparation
for transition.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
---
 drivers/char/hw_random/core.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25..b1b6042 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -53,7 +53,10 @@
 static struct hwrng *current_rng;
 static struct task_struct *hwrng_fill;
 static LIST_HEAD(rng_list);
+/* Protects rng_list and current_rng */
 static DEFINE_MUTEX(rng_mutex);
+/* Protects rng read functions, data_avail, rng_buffer and rng_fillbuf */
+static DEFINE_MUTEX(reading_mutex);
 static int data_avail;
 static u8 *rng_buffer, *rng_fillbuf;
 static unsigned short current_quality;
@@ -81,7 +84,9 @@ static void add_early_randomness(struct hwrng *rng)
unsigned char bytes[16];
int bytes_read;
 
+   mutex_lock(reading_mutex);
bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1);
+   mutex_unlock(reading_mutex);
if (bytes_read  0)
add_device_randomness(bytes, bytes_read);
 }
@@ -128,6 +133,7 @@ static inline int rng_get_data(struct hwrng *rng, u8 
*buffer, size_t size,
int wait) {
int present;
 
+   BUG_ON(!mutex_is_locked(reading_mutex));
if (rng-read)
return rng-read(rng, (void *)buffer, size, wait);
 
@@ -160,13 +166,14 @@ static ssize_t rng_dev_read(struct file *filp, char 
__user *buf,
goto out_unlock;
}
 
+   mutex_lock(reading_mutex);
if (!data_avail) {
bytes_read = rng_get_data(current_rng, rng_buffer,
rng_buffer_size(),
!(filp-f_flags  O_NONBLOCK));
if (bytes_read  0) {
err = bytes_read;
-   goto out_unlock;
+   goto out_unlock_reading;
}
data_avail = bytes_read;
}
@@ -174,7 +181,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user 
*buf,
if (!data_avail) {
if (filp-f_flags  O_NONBLOCK) {
err = -EAGAIN;
-   goto out_unlock;
+   goto out_unlock_reading;
}
} else {
len = data_avail;
@@ -186,7 +193,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user 
*buf,
if (copy_to_user(buf + ret, rng_buffer + data_avail,
len)) {
err = -EFAULT;
-   goto out_unlock;
+   goto out_unlock_reading;
}
 
size -= len;
@@ -194,6 +201,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user 
*buf,
}
 
mutex_unlock(rng_mutex);
+   mutex_unlock(reading_mutex);
 
if (need_resched())
schedule_timeout_interruptible(1);
@@ -208,6 +216,9 @@ out:
 out_unlock:
mutex_unlock(rng_mutex);
goto out;
+out_unlock_reading:
+   mutex_unlock(reading_mutex);
+   goto out_unlock;
 }
 
 
@@ -348,13 +359,16 @@ static int hwrng_fillfn(void *unused)
while (!kthread_should_stop()) {
if (!current_rng)
break;
+   mutex_lock(reading_mutex);
rc = rng_get_data(current_rng, rng_fillbuf,
  rng_buffer_size(), 1);
+   mutex_unlock(reading_mutex);
if (rc = 0) {
pr_warn(hwrng: no data available\n);
msleep_interruptible(1);
continue;
}
+   /* Outside lock, sure, but y'know: randomness. */
add_hwgenerator_randomness((void *)rng_fillbuf, rc,
   rc * current_quality * 8  10);
}
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM

2014-12-05 Thread Jike Song

On 12/05/2014 04:50 PM, Gerd Hoffmann wrote:

A few comments on the kernel stuff (brief look so far, also
compile-tested only, intel gfx on my test machine is too old).

  * Noticed the kernel bits don't even compile when configured as
module.  Everything (vgt, i915, kvm) must be compiled into the
kernel.


Yes, that's planned to be done along with separating hypervisor-related
code from vgt.


  * Design approach still seems to be i915 on vgt not the other way
around.


So far yes.



Qemu/SeaBIOS bits:

I've seen the host bridge changes identity from i440fx to
copy-pci-ids-from-host.  Guess the reason for this is that seabios uses
this device to figure whenever it is running on i440fx or q35.  Correct?



I did some trick in seabios/qemu. The purpose is to make qemu:

- provide IDs of an old host bridge to SeaBIOS
- provide IDs of new host bridge(the physical ones) to guest OS

So I made seabios to tell qemu that POST is done before jumping to guest
OS context.

This may be the simplest method to make things work, but yes, q35 emulation
of qemu may have this unnecessary, see below.


What are the exact requirements for the device?  Must it match the host
exactly, to not confuse the guest intel graphics driver?  Or would
something more recent -- such as the q35 emulation qemu has -- be good
enough to make things work (assuming we add support for the
graphic-related pci config space registers there)?



I don't know that is exactly needed, we also need to have Windows
driver considered.  However, I'm quite confident that, if things gonna
work for IGD passthrough, it gonna work for GVT-g.


The patch also adds a dummy isa bridge at 0x1f.  Simliar question here:
What exactly is needed here?  Would things work if we simply use the q35
lpc device here?



Ditto.


more to come after I've read the paper linked above ...


Thanks for review :)



cheers,
   Gerd



--
Thanks,
Jike
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM

2014-12-05 Thread Jike Song

CC Andy :)

On 12/05/2014 09:03 PM, Paolo Bonzini wrote:


On 05/12/2014 09:50, Gerd Hoffmann wrote:

A few comments on the kernel stuff (brief look so far, also
compile-tested only, intel gfx on my test machine is too old).

  * Noticed the kernel bits don't even compile when configured as
module.  Everything (vgt, i915, kvm) must be compiled into the
kernel.


I'll add that the patch is basically impossible to review with all the
XenGT bits still in.  For example, the x86 emulator seems to be
unnecessary for KVMGT, but I am not 100% sure.



This is not ready for merge yet, please wait for a while, we'll have
Xen/KVM specific code separated.

BTW, definitely you are right, the emulator is unnecessary for KVMGT,
and ... unnecessary for XenGT :)


I would like a clear understanding of why/how Andrew Barnes was able to
do i915 passthrough (GVT-d) without hacking the ISA bridge, and why this
does not apply to GVT-g.


AFAIK, the graphics drivers need to figure out the offset of
some MMIO registers, by the IDs of this ISA bridge. It simply won't work
without this information.

Talked with Andy about the pass-through but I don't have his implementation,
CC Andy for his advice :)



Paolo



Thanks for review. Would you please also have a look at the issues I mentioned
in the original email? they are most KVM-related: the SRCU trickiness, domid,
and the memslot created in kernel.

Thank you!

--
Thanks,
Jike
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Intel-gfx] [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM

2014-12-05 Thread Jike Song

On 12/05/2014 09:54 PM, Daniel Vetter wrote:

Yeah done a quick read-through of just the i915 bits too, same comment. I
guess this is just the first RFC and the redesign we've discussed about
already with xengt is in progress somewhere?


Yes, it's marching on with Xen now. The KVM implementation is
currently not even feature complete - we still have PPGTT missing.




Thanks, Daniel



--
Thanks,
Jike
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 0/6] fix hw_random stuck

2014-12-05 Thread Herbert Xu
On Sat, Dec 06, 2014 at 12:16:36PM +0800, Amos Kong wrote:
 When I hotunplug a busy virtio-rng device or try to access
 hwrng attributes in non-smp guest, it gets stuck.

Please resend these via the linux-crypto mailing list so they
can be picked up by patchwork.

Thanks,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html