Re: [PATCH 3.14 58/73] KVM: MIPS: Dont leak FPU/DSP to guest

2015-03-04 Thread James Hogan
Hi Greg,

On Tue, Mar 03, 2015 at 10:13:26PM -0800, Greg Kroah-Hartman wrote:
> 3.14-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: James Hogan 
> 
> commit f798217dfd038af981a18bbe4bc57027a08bb182 upstream.
> 
> The FPU and DSP are enabled via the CP0 Status CU1 and MX bits by
> kvm_mips_set_c0_status() on a guest exit, presumably in case there is
> active state that needs saving if pre-emption occurs. However neither of
> these bits are cleared again when returning to the guest.
> 
> This effectively gives the guest access to the FPU/DSP hardware after
> the first guest exit even though it is not aware of its presence,
> allowing FP instructions in guest user code to intermittently actually
> execute instead of trapping into the guest OS for emulation. It will
> then read & manipulate the hardware FP registers which technically
> belong to the user process (e.g. QEMU), or are stale from another user
> process. It can also crash the guest OS by causing an FP exception, for
> which a guest exception handler won't have been registered.
> 
> First lets save and disable the FPU (and MSA) state with lose_fpu(1)
> before entering the guest. This simplifies the problem, especially for
> when guest FPU/MSA support is added in the future, and prevents FR=1 FPU
> state being live when the FR bit gets cleared for the guest, which
> according to the architecture causes the contents of the FPU and vector
> registers to become UNPREDICTABLE.
> 
> We can then safely remove the enabling of the FPU in
> kvm_mips_set_c0_status(), since there should never be any active FPU or
> MSA state to save at pre-emption, which should plug the FPU leak.
> 
> DSP state is always live rather than being lazily restored, so for that
> it is simpler to just clear the MX bit again when re-entering the guest.
> 
> Signed-off-by: James Hogan 
> Cc: Paolo Bonzini 
> Cc: Ralf Baechle 
> Cc: Sanjay Lal 
> Cc: Gleb Natapov 
> Cc: kvm@vger.kernel.org
> Cc: linux-m...@linux-mips.org
> Cc:  # v3.10+: 044f0f03eca0: MIPS: KVM: Deliver guest 
> interrupts

The original 3.10 and 3.12/3.14 backports had this added:
Cc:  # v3.10+: 3ce465e04bfd: MIPS: Export FP functions 
used by lose_fpu(1) for KVM 
Which I can't see included in the v3.10 stable queue or branch. It fixes
a build error with MIPS malta_kvm_defconfig (MIPS=y, KVM=m) after this
patch is applied.

Same applies to the 3.14 queue too I think.

Cheers
James

> Cc:  # v3.10+
> Signed-off-by: Paolo Bonzini 
> Signed-off-by: James Hogan 
> Signed-off-by: Greg Kroah-Hartman 
> ---
> This should apply to stable trees 3.12 and 3.14, but not 3.10. The files
> had been renamed since v3.14 so it cherry-picked cleanly but the patch
> didn't apply cleanly. I've also added a reference to the "MIPS: Export
> FP functions used by lose_fpu(1) for KVM" commit which is itself marked
> for stable, but is needed to avoid a build failure when KVM=m.
> ---
>  arch/mips/kvm/kvm_locore.S |2 +-
>  arch/mips/kvm/kvm_mips.c   |6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> --- a/arch/mips/kvm/kvm_locore.S
> +++ b/arch/mips/kvm/kvm_locore.S
> @@ -428,7 +428,7 @@ __kvm_mips_return_to_guest:
>   /* Setup status register for running guest in UM */
>   .setat
>   or  v1, v1, (ST0_EXL | KSU_USER | ST0_IE)
> - and v1, v1, ~ST0_CU0
> + and v1, v1, ~(ST0_CU0 | ST0_MX)
>   .setnoat
>   mtc0v1, CP0_STATUS
>   ehb
> --- a/arch/mips/kvm/kvm_mips.c
> +++ b/arch/mips/kvm/kvm_mips.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -418,6 +419,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v
>   vcpu->mmio_needed = 0;
>   }
>  
> + lose_fpu(1);
> +
>   local_irq_disable();
>   /* Check if we have any exceptions/interrupts pending */
>   kvm_mips_deliver_interrupts(vcpu,
> @@ -1021,9 +1024,6 @@ void kvm_mips_set_c0_status(void)
>  {
>   uint32_t status = read_c0_status();
>  
> - if (cpu_has_fpu)
> - status |= (ST0_CU1);
> -
>   if (cpu_has_dsp)
>   status |= (ST0_MX);
>  
> 
> 


signature.asc
Description: Digital signature


Re: virtio balloon: do not call blocking ops when !TASK_RUNNING

2015-03-04 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> On Mon, Mar 02, 2015 at 10:37:26AM +1030, Rusty Russell wrote:
>> Thomas Huth  writes:
>> > On Thu, 26 Feb 2015 11:50:42 +1030
>> > Rusty Russell  wrote:
>> >
>> >> Thomas Huth  writes:
>> >> >  Hi all,
>> >> >
>> >> > with the recent kernel 3.19, I get a kernel warning when I start my
>> >> > KVM guest on s390 with virtio balloon enabled:
>> >> 
>> >> The deeper problem is that virtio_ccw_get_config just silently fails on
>> >> OOM.
>> >> 
>> >> Neither get_config nor set_config are expected to fail.
>> >
>> > AFAIK this is currently not a problem. According to
>> > http://lwn.net/Articles/627419/ these kmalloc calls never
>> > fail because they allocate less than a page.
>> 
>> I strongly suggest you unlearn that fact.
>> The fix for this is in two parts:
>> 
>> 1) Annotate using sched_annotate_sleep() and add a comment: we may spin
>>a few times in low memory situations, but this isn't a high
>>performance path.
>> 
>> 2) Handle get_config (and other) failure in some more elegant way.
>> 
>> Cheers,
>> Rusty.
>
> I agree, but I'd like to point out that even without kmalloc,
> on s390 get_config is blocking - it's waiting
> for a hardware interrupt.
>
> And it makes sense: config is not data path, I don't think
> we should spin there.
>
> So I think besides these two parts, we still need my two patches:
> virtio-balloon: do not call blocking ops when !TASK_RUNNING

I prefer to annotate, over trying to fix this.

Because it's not important.  We might spin a few times, but it's very
unlikely, and it's certainly not performance critical.

Thanks,
Rusty.

Subject: virtio_balloon: annotate possible sleep waiting for event.

CCW (s390) does this.

Reported-by: Thomas Huth 
Signed-off-by: Rusty Russell 

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 0413157f3b49..3f4d5acdbde0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -340,6 +340,15 @@ static int balloon(void *_vballoon)
s64 diff;
 
try_to_freeze();
+
+   /*
+* Reading the config on the ccw backend involves an
+* allocation, so we may actually sleep and have an
+* extra iteration.  It's extremely unlikely, and this
+* isn't a fast path in any sense.
+*/
+   sched_annotate_sleep();
+
wait_event_interruptible(vb->config_change,
 (diff = towards_target(vb)) != 0
 || vb->need_stats_update
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3.14 28/73] MIPS: KVM: Deliver guest interrupts after local_irq_disable()

2015-03-04 Thread Greg Kroah-Hartman
3.14-stable review patch.  If anyone has any objections, please let me know.

--

From: James Hogan 

commit 044f0f03eca0110e1835b2ea038a484b93950328 upstream.

When about to run the guest, deliver guest interrupts after disabling
host interrupts. This should prevent an hrtimer interrupt from being
handled after delivering guest interrupts, and therefore not delivering
the guest timer interrupt until after the next guest exit.

Signed-off-by: James Hogan 
Cc: Paolo Bonzini 
Cc: Gleb Natapov 
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle 
Cc: linux-m...@linux-mips.org
Cc: Sanjay Lal 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/mips/kvm/kvm_mips.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/mips/kvm/kvm_mips.c
+++ b/arch/mips/kvm/kvm_mips.c
@@ -418,11 +418,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v
vcpu->mmio_needed = 0;
}
 
+   local_irq_disable();
/* Check if we have any exceptions/interrupts pending */
kvm_mips_deliver_interrupts(vcpu,
kvm_read_c0_guest_cause(vcpu->arch.cop0));
 
-   local_irq_disable();
kvm_guest_enter();
 
r = __kvm_mips_vcpu_run(run, vcpu);


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3.10 43/53] KVM: MIPS: Dont leak FPU/DSP to guest

2015-03-04 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: James Hogan 

commit f798217dfd038af981a18bbe4bc57027a08bb182 upstream.

The FPU and DSP are enabled via the CP0 Status CU1 and MX bits by
kvm_mips_set_c0_status() on a guest exit, presumably in case there is
active state that needs saving if pre-emption occurs. However neither of
these bits are cleared again when returning to the guest.

This effectively gives the guest access to the FPU/DSP hardware after
the first guest exit even though it is not aware of its presence,
allowing FP instructions in guest user code to intermittently actually
execute instead of trapping into the guest OS for emulation. It will
then read & manipulate the hardware FP registers which technically
belong to the user process (e.g. QEMU), or are stale from another user
process. It can also crash the guest OS by causing an FP exception, for
which a guest exception handler won't have been registered.

First lets save and disable the FPU (and MSA) state with lose_fpu(1)
before entering the guest. This simplifies the problem, especially for
when guest FPU/MSA support is added in the future, and prevents FR=1 FPU
state being live when the FR bit gets cleared for the guest, which
according to the architecture causes the contents of the FPU and vector
registers to become UNPREDICTABLE.

We can then safely remove the enabling of the FPU in
kvm_mips_set_c0_status(), since there should never be any active FPU or
MSA state to save at pre-emption, which should plug the FPU leak.

DSP state is always live rather than being lazily restored, so for that
it is simpler to just clear the MX bit again when re-entering the guest.

Signed-off-by: James Hogan 
Cc: Paolo Bonzini 
Cc: Ralf Baechle 
Cc: Sanjay Lal 
Cc: Gleb Natapov 
Cc: kvm@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc:  # v3.10+: 044f0f03eca0: MIPS: KVM: Deliver guest 
interrupts
Cc:  # v3.10+
Signed-off-by: Paolo Bonzini 
Signed-off-by: James Hogan 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/mips/kvm/kvm_locore.S |2 +-
 arch/mips/kvm/kvm_mips.c   |6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

--- a/arch/mips/kvm/kvm_locore.S
+++ b/arch/mips/kvm/kvm_locore.S
@@ -431,7 +431,7 @@ __kvm_mips_return_to_guest:
 /* Setup status register for running guest in UM */
 .set at
 or v1, v1, (ST0_EXL | KSU_USER | ST0_IE)
-and v1, v1, ~ST0_CU0
+and v1, v1, ~(ST0_CU0 | ST0_MX)
 .set noat
 mtc0v1, CP0_STATUS
 ehb
--- a/arch/mips/kvm/kvm_mips.c
+++ b/arch/mips/kvm/kvm_mips.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -413,6 +414,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v
vcpu->mmio_needed = 0;
}
 
+   lose_fpu(1);
+
local_irq_disable();
/* Check if we have any exceptions/interrupts pending */
kvm_mips_deliver_interrupts(vcpu,
@@ -1017,9 +1020,6 @@ void kvm_mips_set_c0_status(void)
 {
uint32_t status = read_c0_status();
 
-   if (cpu_has_fpu)
-   status |= (ST0_CU1);
-
if (cpu_has_dsp)
status |= (ST0_MX);
 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3.10 20/53] MIPS: KVM: Deliver guest interrupts after local_irq_disable()

2015-03-04 Thread Greg Kroah-Hartman
3.10-stable review patch.  If anyone has any objections, please let me know.

--

From: James Hogan 

commit 044f0f03eca0110e1835b2ea038a484b93950328 upstream.

When about to run the guest, deliver guest interrupts after disabling
host interrupts. This should prevent an hrtimer interrupt from being
handled after delivering guest interrupts, and therefore not delivering
the guest timer interrupt until after the next guest exit.

Signed-off-by: James Hogan 
Cc: Paolo Bonzini 
Cc: Gleb Natapov 
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle 
Cc: linux-m...@linux-mips.org
Cc: Sanjay Lal 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/mips/kvm/kvm_mips.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/mips/kvm/kvm_mips.c
+++ b/arch/mips/kvm/kvm_mips.c
@@ -413,11 +413,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v
vcpu->mmio_needed = 0;
}
 
+   local_irq_disable();
/* Check if we have any exceptions/interrupts pending */
kvm_mips_deliver_interrupts(vcpu,
kvm_read_c0_guest_cause(vcpu->arch.cop0));
 
-   local_irq_disable();
kvm_guest_enter();
 
r = __kvm_mips_vcpu_run(run, vcpu);


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 14/16] target-s390x: Extend QMP command query-cpu-definitions

2015-03-04 Thread Michael Mueller
On Mon, 2 Mar 2015 16:11:00 -0300
Eduardo Habkost  wrote:

> On Mon, Mar 02, 2015 at 01:44:06PM +0100, Michael Mueller wrote:
> > This patch implements the QMP command 'query-cpu-definitions' in the S390
> > context. The command returns a list of cpu model names in the current host
> > context. A consumer may successfully request each listed cpu model as long
> > for a given accelerator this model is runnable.
> > 
> > The QMP type AccelCpuModelInfo is introduced and the type CpuDefinitionInfo
> > is extended by the optional field 'accelerators'. It contains a list of 
> > named
> > accelerators and some indication whether the associated cpu model is 
> > runnable
> > or the default cpu model. The default cpu model is used if either no 
> > specific
> > cpu model is requested during QEMU startup or if the cpu model with name
> > 'host' is requested.
> > 
> > request:
> >   {"execute": "query-cpu-definitions"}
> > 
> > answer:
> >   {"return":
> > 
> > [{"name":"2964-ga1","accelerators":[{"name":"kvm","runnable":false,"default":false}]},
> >  
> > {"name":"2828-ga1","accelerators":[{"name":"kvm","runnable":false,"default":false}]},
> >  
> > {"name":"2827-ga2","accelerators":[{"name":"kvm","runnable":true,"default":true}]},
> >  
> > {"name":"2827-ga1","accelerators":[{"name":"kvm","runnable":true,"default":false}]},
> >  
> > {"name":"2818-ga1","accelerators":[{"name":"kvm","runnable":true,"default":false}]},
> >  ...
> >  
> > {"name":"2064-ga1","accelerators":[{"runnable":true,"name":"kvm","default":false}]}
> > ]
> >}
> 
> On x86, being runnable or not is something that depends on the
> machine-type. I expect that to happen in other machines as soon as they
> start implementing backwards compatiblity.
> 
> I see two options to implement that: 1) adding a "machine-type" argument
> to query-cpu-definitions; 2) returning a machine-type-based dictionary
> on the "runnable" property. The former sounds better to me as it won't
> require enumerating all machine-types every time.
> 
> In that case, why we do need to enumerate all accelerators on every
> query, either? We could have both "machine-type" and "accel" arguments
> to query-cpu-definitions, so callers will just ask for the
> acceleratores/machine-types they are interested into.
> 
> e.g.:
> 
> request:
>   {"execute": "query-cpu-definitions",
>"arguments": {"machine":"s390-virtio", "accel":"kvm"}}
> 
> answer:
>   {"return":
> [{"name":"2964-ga1","runnable":true},
>  {"name":"2828-ga1","runnable":false}
>  ...
> ]
>   }

I had some discussion with out libvirt people on that. I will compile a patch 
implementing this
to see how it fits..

> 
> We can also extend this to other variables, such as extra CPU flags that
> could make the CPU runnable or not. e.g.: want to know if "-machine
> foo,accel=bar -cpu xxx,+yyy,-zzz" is runnable? Send this request:
>   {"execute": "query-cpu-definitions",
>"arguments": {"machine":"s390-virtio", "accel":"kvm", 
> "cpu":"xxx,+yyy,-zzz"}}
> and get this response:
>   {"return": [{"name":"xxx","runnable":false}]}
> or maybe being more explicit in the response about the extra CPU flags:
>   {"return": [{"name":"xxx,+yyy,-zzz","runnable":false}]}
> 
> 
> >   {"execute": "query-cpu-definitions"}
> > 
> > answer:
> >   {"return":
> > 
> > [{"name":"2964-ga1","accelerators":[{"name":"kvm","runnable":false,"default":false}]},
> >  
> > {"name":"2828-ga1","accelerators":[{"name":"kvm","runnable":false,"default":false}]},
> >  
> > {"name":"2827-ga2","accelerators":[{"name":"kvm","runnable":true,"default":true}]},
> >  
> > {"name":"2827-ga1","accelerators":[{"name":"kvm","runnable":true,"default":false}]},
> >  
> > {"name":"2818-ga1","accelerators":[{"name":"kvm","runnable":true,"default":false}]},
> > 
> > Signed-off-by: Michael Mueller 
> > [...]
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v9 1/5] KVM: arm/arm64: unset CONFIG_HAVE_KVM_IRQCHIP

2015-03-04 Thread Eric Auger
CONFIG_HAVE_KVM_IRQCHIP is needed to support IRQ routing (along
with irq_comm.c and irqchip.c usage). This is not the case for
arm/arm64 currently.

This patch unsets the flag for both arm and arm64.

Signed-off-by: Eric Auger 
Reviewed-by: Andre Przywara 
Acked-by: Christoffer Dall 
Acked-by: Will Deacon 
---
 arch/arm/kvm/Kconfig   | 2 --
 arch/arm64/kvm/Kconfig | 1 -
 2 files changed, 3 deletions(-)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 338ace7..aae5242 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -58,7 +58,6 @@ config KVM_ARM_MAX_VCPUS
 config KVM_ARM_VGIC
bool "KVM support for Virtual GIC"
depends on KVM_ARM_HOST && OF
-   select HAVE_KVM_IRQCHIP
default y
---help---
  Adds support for a hardware assisted, in-kernel GIC emulation.
@@ -66,7 +65,6 @@ config KVM_ARM_VGIC
 config KVM_ARM_TIMER
bool "KVM support for Architected Timers"
depends on KVM_ARM_VGIC && ARM_ARCH_TIMER
-   select HAVE_KVM_IRQCHIP
default y
---help---
  Adds support for the Architected Timers in virtual machines
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index f5590c8..039d8cf 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -53,7 +53,6 @@ config KVM_ARM_MAX_VCPUS
 config KVM_ARM_VGIC
bool
depends on KVM_ARM_HOST && OF
-   select HAVE_KVM_IRQCHIP
---help---
  Adds support for a hardware assisted, in-kernel GIC emulation.
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v9 5/5] KVM: arm/arm64: add irqfd support

2015-03-04 Thread Eric Auger
This patch enables irqfd on arm/arm64.

Both irqfd and resamplefd are supported. Injection is implemented
in vgic.c without routing.

This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.

KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability
automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set.

Irqfd injection is restricted to SPI. The rationale behind not
supporting PPI irqfd injection is that any device using a PPI would
be a private-to-the-CPU device (timer for instance), so its state
would have to be context-switched along with the VCPU and would
require in-kernel wiring anyhow. It is not a relevant use case for
irqfds.

Signed-off-by: Eric Auger 
Reviewed-by: Christoffer Dall 

---
v8 -> v9:
- replace kvm_debug by trace_kvm_set_irq and add
  BUG_ON(!vgic_initialized(kvm));

v7 -> v8:
- remove kvm_irq_has_notifier call
- part of dist locking changes now are part of previous patch file
- remove gic_initialized() check in kvm_set_irq
- remove Christoffer's Reviewed-by after this change

v5 -> v6:
- KVM_CAP_IRQFD support depends on vgic_present
- add Christoffer's Reviewed-by

v4 -> v5:
- squash [PATCH v4 3/3] KVM: arm64: add irqfd support into this patch
- some rewording in Documentation/virtual/kvm/api.txt and in vgic
  vgic_process_maintenance unlock comment.
- move explanation of why not supporting PPI into commit message
- in case of injection before gic readiness, -ENODEV is returned. It is
  up to the user space to avoid this situation.

v3 -> v4:
- reword commit message
- explain why we unlock the distributor before calling kvm_notify_acked_irq
- rename is_assigned_irq into has_notifier
- change EOI and injection kvm_debug format string
- remove error local variable in kvm_set_irq
- Move HAVE_KVM_IRQCHIP unset in a separate patch
- handle case were the irqfd injection is attempted before the vgic is ready.
  in such a case the notifier, if any, is called immediatly
- use nr_irqs to test spi is within correct range

v2 -> v3:
- removal of irq.h from eventfd.c put in a separate patch to increase
  visibility
- properly expose KVM_CAP_IRQFD capability in arm.c
- remove CONFIG_HAVE_KVM_IRQCHIP meaningfull only if irq_comm.c is used

v1 -> v2:
- rebase on 3.17rc1
- move of the dist unlock in process_maintenance
- remove of dist lock in __kvm_vgic_sync_hwstate
- rewording of the commit message (add resamplefd reference)
- remove irq.h

Conflicts:
arch/arm64/kvm/Kconfig
---
 Documentation/virtual/kvm/api.txt |  6 -
 arch/arm/include/uapi/asm/kvm.h   |  3 +++
 arch/arm/kvm/Kconfig  |  2 ++
 arch/arm/kvm/Makefile |  2 +-
 arch/arm/kvm/arm.c|  5 
 arch/arm64/include/uapi/asm/kvm.h |  3 +++
 arch/arm64/kvm/Kconfig|  2 ++
 arch/arm64/kvm/Makefile   |  2 +-
 virt/kvm/arm/vgic.c   | 48 +++
 9 files changed, 70 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index b112efc..b265d8e 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2234,7 +2234,7 @@ into the hash PTE second double word).
 4.75 KVM_IRQFD
 
 Capability: KVM_CAP_IRQFD
-Architectures: x86 s390
+Architectures: x86 s390 arm arm64
 Type: vm ioctl
 Parameters: struct kvm_irqfd (in)
 Returns: 0 on success, -1 on error
@@ -2260,6 +2260,10 @@ Note that closing the resamplefd is not sufficient to 
disable the
 irqfd.  The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
 and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
 
+On ARM/ARM64, the gsi field in the kvm_irqfd struct specifies the Shared
+Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is
+given by gsi + 32.
+
 4.76 KVM_PPC_ALLOCATE_HTAB
 
 Capability: KVM_CAP_PPC_ALLOC_HTAB
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 0db25bc..2499867 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -198,6 +198,9 @@ struct kvm_arch_memory_slot {
 /* Highest supported SPI, from VGIC_NR_IRQS */
 #define KVM_ARM_IRQ_GIC_MAX127
 
+/* One single KVM irqchip, ie. the VGIC */
+#define KVM_NR_IRQCHIPS  1
+
 /* PSCI interface */
 #define KVM_PSCI_FN_BASE   0x95c1ba5e
 #define KVM_PSCI_FN(n) (KVM_PSCI_FN_BASE + (n))
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index aae5242..d735a3e 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -27,6 +27,7 @@ config KVM
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
select SRCU
depends on ARM_VIRT_EXT && ARM_LPAE
+   select HAVE_KVM_EVENTFD
---help---
  Support hosting virtualized guest machines. You will also
  need to select one or more of the processor modules below.
@@ -58,6 +59,7 @@ config KVM_ARM_MAX_VCPUS
 config KVM_ARM_VGIC
bool "KVM support for Virtual GIC"
depends on K

[PATCH v9 3/5] KVM: arm/arm64: implement kvm_arch_intc_initialized

2015-03-04 Thread Eric Auger
On arm/arm64 the VGIC is dynamically instantiated and it is useful
to expose its state, especially for irqfd setup.

This patch defines __KVM_HAVE_ARCH_INTC_INITIALIZED and
implements kvm_arch_intc_initialized.

Signed-off-by: Eric Auger 
Acked-by: Christoffer Dall 
Reviewed-by: Andre Przywara 

---

v7 -> v8:
- add Andre's Reviewed-by

v6 -> v7:
- usage of the new function in irqfd is introduced in previous patch
  file.
- Changed the title and the commit message accordingly

v5 -> v6:
- remove kvm_arch_intc_initialized declaration from
  architecture specific kvm_host.h since it is now in
  generic kvm_host.h
- squash v5 patch files 3 & 4
---
 arch/arm/include/asm/kvm_host.h   | 2 ++
 arch/arm/kvm/arm.c| 5 +
 arch/arm64/include/asm/kvm_host.h | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 41008cd..902a7d1 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -27,6 +27,8 @@
 #include 
 #include 
 
+#define __KVM_HAVE_ARCH_INTC_INITIALIZED
+
 #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
 #else
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 07e7eb1..5300d5a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -452,6 +452,11 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
return 0;
 }
 
+bool kvm_arch_intc_initialized(struct kvm *kvm)
+{
+   return vgic_initialized(kvm);
+}
+
 static void vcpu_pause(struct kvm_vcpu *vcpu)
 {
wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 8ac3c70..967fb1c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -28,6 +28,8 @@
 #include 
 #include 
 
+#define __KVM_HAVE_ARCH_INTC_INITIALIZED
+
 #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
 #else
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v9 2/5] KVM: introduce kvm_arch_intc_initialized and use it in irqfd

2015-03-04 Thread Eric Auger
Introduce __KVM_HAVE_ARCH_INTC_INITIALIZED define and
associated kvm_arch_intc_initialized function. This latter
allows to test whether the virtual interrupt controller is initialized
and ready to accept virtual IRQ injection. On some architectures,
the virtual interrupt controller is dynamically instantiated, justifying
that kind of check.

The new function can now be used by irqfd to check whether the
virtual interrupt controller is ready on KVM_IRQFD request. If not,
KVM_IRQFD returns -EAGAIN.

Signed-off-by: Eric Auger 
Acked-by: Christoffer Dall 
Reviewed-by: Andre Przywara 

---
v7 -> v8:
- correct typo in comment
- add Andre's Reviewed-by

v6 -> v7:
- From now on, kvm_irqfd_assign calls kvm_arch_intc_initialized
  (previously introduced in subsequent patch file).
- add Christoffer's ack

v5 -> v6:
- rename function name and macro
- add kvm_arch_intc_initialized declaration in case the archi defines
  the macro
---
 include/linux/kvm_host.h | 14 ++
 virt/kvm/eventfd.c   |  3 +++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d12b210..ae9c720 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -700,6 +700,20 @@ static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct 
kvm_vcpu *vcpu)
 #endif
 }
 
+#ifdef __KVM_HAVE_ARCH_INTC_INITIALIZED
+/*
+ * returns true if the virtual interrupt controller is initialized and
+ * ready to accept virtual IRQ. On some architectures the virtual interrupt
+ * controller is dynamically instantiated and this is not always true.
+ */
+bool kvm_arch_intc_initialized(struct kvm *kvm);
+#else
+static inline bool kvm_arch_intc_initialized(struct kvm *kvm)
+{
+   return true;
+}
+#endif
+
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type);
 void kvm_arch_destroy_vm(struct kvm *kvm);
 void kvm_arch_sync_events(struct kvm *kvm);
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 148b239..fc5f43e 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -311,6 +311,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
unsigned int events;
int idx;
 
+   if (!kvm_arch_intc_initialized(kvm))
+   return -EAGAIN;
+
irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
if (!irqfd)
return -ENOMEM;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v9 4/5] KVM: arm/arm64: remove coarse grain dist locking at kvm_vgic_sync_hwstate

2015-03-04 Thread Eric Auger
To prepare for irqfd addition, coarse grain locking is removed at
kvm_vgic_sync_hwstate level and finer grain locking is introduced in
vgic_process_maintenance only.

Signed-off-by: Eric Auger 
Acked-by: Christoffer Dall 
---
 virt/kvm/arm/vgic.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 0cc6ab6..4e9b6d3 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1081,6 +1081,7 @@ epilog:
 static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 {
u32 status = vgic_get_interrupt_status(vcpu);
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
bool level_pending = false;
 
kvm_debug("STATUS = %08x\n", status);
@@ -1098,6 +1099,7 @@ static bool vgic_process_maintenance(struct kvm_vcpu 
*vcpu)
struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
 
+   spin_lock(&dist->lock);
vgic_irq_clear_queued(vcpu, vlr.irq);
WARN_ON(vlr.state & LR_STATE_MASK);
vlr.state = 0;
@@ -1125,6 +1127,8 @@ static bool vgic_process_maintenance(struct kvm_vcpu 
*vcpu)
vgic_cpu_irq_clear(vcpu, vlr.irq);
}
 
+   spin_unlock(&dist->lock);
+
/*
 * Despite being EOIed, the LR may not have
 * been marked as empty.
@@ -1139,10 +1143,7 @@ static bool vgic_process_maintenance(struct kvm_vcpu 
*vcpu)
return level_pending;
 }
 
-/*
- * Sync back the VGIC state after a guest run. The distributor lock is
- * needed so we don't get preempted in the middle of the state processing.
- */
+/* Sync back the VGIC state after a guest run */
 static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
@@ -1189,14 +1190,10 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 
 void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
-   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
-
if (!irqchip_in_kernel(vcpu->kvm))
return;
 
-   spin_lock(&dist->lock);
__kvm_vgic_sync_hwstate(vcpu);
-   spin_unlock(&dist->lock);
 }
 
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v9 0/5] irqfd support for arm/arm64

2015-03-04 Thread Eric Auger
This patch series enables irqfd on arm and arm64.

Irqfd framework enables to inject a virtual IRQ into a guest upon an
eventfd trigger. User-side uses KVM_IRQFD VM ioctl to provide KVM with
a kvm_irqfd struct that associates a VM, an eventfd, a virtual IRQ number
(aka. the gsi). When an actor signals the eventfd (typically a VFIO
platform driver), the kvm irqfd subsystem injects the gsi into the VM.

Resamplefd also is supported for level sensitive interrupts, ie. the
user can provide another eventfd that is triggered when the completion
of the virtual IRQ (gsi) is detected by the GIC.

The gsi must correspond to a shared peripheral interrupt (SPI), ie the
GIC interrupt ID is gsi + 32.

The rationale behind not supporting PPI irqfd injection is that
any device using a PPI would be a private-to-the-CPU device (timer for
instance), so its state would have to be context-switched along with the
VCPU and would require in-kernel wiring anyhow. It is not a relevant use
case for irqfds.

This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.

No IRQ routing table is used, enabling to remove CONFIG_HAVE_KVM_IRQCHIP

The ARM virtual interrupt controller, the VGIC, is dynamically
instantiated. The user-space may attempt to assign an irqfd before
the virtual interrupt controller is ready. For that reason a
check is added in the generic irqfd code to test whether the virtual
interrupt controller is ready.

This work was tested with Calxeda Midway xgmac main interrupt with
qemu-system-arm and QEMU VFIO platform device. Also irqfd was proven
functional on several vhost-net prototypes.

Available on ssh://git.linaro.org/people/eric.auger/linux.git
branch irqfd_v9_stdalone_official_release

v8 -> v9:
- add Christoffer's Reviewed-by on "KVM: arm/arm64: add irqfd support"
  and Acked-by on "KVM: arm/arm64: remove coarse grain dist locking at
  kvm_vgic_sync_hwstate"
- replace kvm_debug by trace_kvm_set_irq and add
  BUG_ON(!vgic_initialized(kvm)) in vgic.c kvm_set_irq.

v7 -> v8:
- Address Andre's comments:
  - introduce a separate patch file for coarse grain dist lock removal at
kvm_vgic_sync_hwstate level
  - update comment above __kvm_vgic_sync_hwstate
  - remove kvm_irq_has_notifier call
  - remove gic_initialized() check in kvm_set_irq
  - fix a typo in one comment (kvm_host.h)

v6 -> v7:
- kvm_arch_intc_initialized call introduced in first patch file
  related to kvm_arch_intc_initialized (generic part)
- add Christoffer's acks

v5 -> v6:
- take into account Christoffer's comments:
  - rename macro and function enabling to check the state of virtual
interrupt controller (kvm_arch_intc_initialized)
  - kvm_arch_intc_initialized is declared in kvm_host.h whatever the
archi support.
  - squash v5 patch files 3 & 4
  - KVM_CAP_IRQFD support depends on vgic_present
  - add Christoffer's Reviewed-by on last patch file

v4 -> v5:
- add the capability to check whether vgic is initialized when
  assigning an irqfd.  objective is to avoid injecting IRQ before
  this vgic is ready: this corresponds to new patch files 2, 3, 4.
- do not specifically handle early virtual IRQ injections in
  kvm_set_irq.  In case of injection when vgic is not yet ready,
  simply return an error.  User-space now has means to force vgic
  init and get notified if irqfd assign takes place too early.
- squash [PATCH v4 2/3] KVM: arm: add irqfd support and
 [PATCH v4 3/3] KVM: arm64: add irqfd support
- add Acked-by's in KVM: arm/arm64: unset CONFIG_HAVE_KVM_IRQCHIP
- some comment rewording in vgic

v3 -> v4:
- rebase on 3.18rc5
- vgic dynamic instantiation brought new challenges:
  handling of irqfd injection when vgic is not ready
- unset of CONFIG_HAVE_KVM_IRQCHIP in a separate patch
- add arm64 enable
- vgic.c style modifications according to Christoffer comments

v2 -> v3:
- removal of irq.h from eventfd.c put in a separate patch to increase
  visibility
- properly expose KVM_CAP_IRQFD capability in arm.c
- remove CONFIG_HAVE_KVM_IRQCHIP meaningfull only if irq_comm.c is used

v1 -> v2:
- rebase on 3.17rc1
- move of the dist unlock in process_maintenance
- remove of dist lock in __kvm_vgic_sync_hwstate
- rewording of the commit message (add resamplefd reference)
- remove irq.h

Eric Auger (5):
  KVM: arm/arm64: unset CONFIG_HAVE_KVM_IRQCHIP
  KVM: introduce kvm_arch_intc_initialized and use it in irqfd
  KVM: arm/arm64: implement kvm_arch_intc_initialized
  KVM: arm/arm64: remove coarse grain dist locking at
kvm_vgic_sync_hwstate
  KVM: arm/arm64: add irqfd support

 Documentation/virtual/kvm/api.txt |  6 +++-
 arch/arm/include/asm/kvm_host.h   |  2 ++
 arch/arm/include/uapi/asm/kvm.h   |  3 ++
 arch/arm/kvm/Kconfig  |  4 +--
 arch/arm/kvm/Makefile |  2 +-
 arch/arm/kvm/arm.c| 10 +++
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/include/uapi/asm/kvm.h |  3 ++
 arch/arm64/kvm/Kconfig|  3 +-
 arch/arm64/kvm/Makefile   |  2 +-
 include/linux/

Re: virtio balloon: do not call blocking ops when !TASK_RUNNING

2015-03-04 Thread Michael S. Tsirkin
On Wed, Mar 04, 2015 at 04:44:54PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin"  writes:
> > On Mon, Mar 02, 2015 at 10:37:26AM +1030, Rusty Russell wrote:
> >> Thomas Huth  writes:
> >> > On Thu, 26 Feb 2015 11:50:42 +1030
> >> > Rusty Russell  wrote:
> >> >
> >> >> Thomas Huth  writes:
> >> >> >  Hi all,
> >> >> >
> >> >> > with the recent kernel 3.19, I get a kernel warning when I start my
> >> >> > KVM guest on s390 with virtio balloon enabled:
> >> >> 
> >> >> The deeper problem is that virtio_ccw_get_config just silently fails on
> >> >> OOM.
> >> >> 
> >> >> Neither get_config nor set_config are expected to fail.
> >> >
> >> > AFAIK this is currently not a problem. According to
> >> > http://lwn.net/Articles/627419/ these kmalloc calls never
> >> > fail because they allocate less than a page.
> >> 
> >> I strongly suggest you unlearn that fact.
> >> The fix for this is in two parts:
> >> 
> >> 1) Annotate using sched_annotate_sleep() and add a comment: we may spin
> >>a few times in low memory situations, but this isn't a high
> >>performance path.
> >> 
> >> 2) Handle get_config (and other) failure in some more elegant way.
> >> 
> >> Cheers,
> >> Rusty.
> >
> > I agree, but I'd like to point out that even without kmalloc,
> > on s390 get_config is blocking - it's waiting
> > for a hardware interrupt.
> >
> > And it makes sense: config is not data path, I don't think
> > we should spin there.
> >
> > So I think besides these two parts, we still need my two patches:
> > virtio-balloon: do not call blocking ops when !TASK_RUNNING
> 
> I prefer to annotate, over trying to fix this.
> 
> Because it's not important.  We might spin a few times, but it's very
> unlikely, and it's certainly not performance critical.
> 
> Thanks,
> Rusty.
> 
> Subject: virtio_balloon: annotate possible sleep waiting for event.
> 
> CCW (s390) does this.
> 
> Reported-by: Thomas Huth 
> Signed-off-by: Rusty Russell 
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 0413157f3b49..3f4d5acdbde0 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -340,6 +340,15 @@ static int balloon(void *_vballoon)
>   s64 diff;
>  
>   try_to_freeze();
> +
> + /*
> +  * Reading the config on the ccw backend involves an
> +  * allocation, so we may actually sleep and have an
> +  * extra iteration.  It's extremely unlikely,

Hmm, this part of the comment seems wrong to me.
Reading the config on the ccw backend always sleeps
because it's interrupt driven.

This is the relevant code:

static int ccw_io_helper(struct virtio_ccw_device *vcdev,
 struct ccw1 *ccw, __u32 intparm)
{
int ret;
unsigned long flags;
int flag = intparm & VIRTIO_CCW_INTPARM_MASK;

do {
spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags);
ret = ccw_device_start(vcdev->cdev, ccw, intparm, 0, 0);
if (!ret) {
if (!vcdev->curr_io)
vcdev->err = 0;
vcdev->curr_io |= flag;
}
spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev), flags);
cpu_relax();
} while (ret == -EBUSY);
wait_event(vcdev->wait_q, doing_io(vcdev, flag) == 0);
return ret ? ret : vcdev->err;
}


>and this
> +  * isn't a fast path in any sense.
> +  */
> + sched_annotate_sleep();
> +
>   wait_event_interruptible(vb->config_change,
>(diff = towards_target(vb)) != 0
>|| vb->need_stats_update


So the wait_event_interruptible always calls wait_event
which then becomes a busy wait on s390, which is not nice.

So I suspect
http://mid.gmane.org/1424874878-17155-1-git-send-email-...@redhat.com
is better.

What do you think?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] KVM fixes for 4.0-rc2

2015-03-04 Thread Marcelo Tosatti

Linus,

Please pull from

git://git.kernel.org/pub/scm/virt/kvm/kvm.git master

To receive the following KVM bug fixes, including a SVM
interrupt injection regression fix, MIPS and ARM bug fixes.


James Hogan (1):
  KVM: MIPS: Fix trace event to save PC directly

Jan Kiszka (1):
  ARM: KVM: Fix size check in __coherent_cache_guest_page

Paolo Bonzini (1):
  KVM: emulate: fix CMPXCHG8B on 32-bit hosts

Radim Krčmář (2):
  KVM: VMX: fix build without CONFIG_SMP
  KVM: SVM: fix interrupt injection (apic->isr_count always 0)

Tapasweni Pathak (1):
  KVM: MIPS: Enable after disabling interrupt

Wei Huang (1):
  arm/arm64: KVM: Add exit reaons to kvm_exit event tracing

 arch/arm/include/asm/kvm_mmu.h |2 +-
 arch/arm/kvm/arm.c |2 +-
 arch/arm/kvm/trace.h   |   10 +++---
 arch/mips/kvm/tlb.c|1 +
 arch/mips/kvm/trace.h  |6 +++---
 arch/x86/kvm/emulate.c |3 ++-
 arch/x86/kvm/lapic.c   |4 ++--
 arch/x86/kvm/svm.c |6 --
 arch/x86/kvm/vmx.c |   23 ++-
 9 files changed, 31 insertions(+), 26 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-04 Thread Catalin Marinas
(please try to avoid top-posting)

On Mon, Mar 02, 2015 at 06:20:19PM -0800, Mario Smarduch wrote:
> On 03/02/2015 08:31 AM, Christoffer Dall wrote:
> > However, my concern with these patches are on two points:
> > 
> > 1. It's not a fix-all.  We still have the case where the guest expects
> > the behavior of device memory (for strong ordering for example) on a RAM
> > region, which we now break.  Similiarly this doesn't support the
> > non-coherent DMA to RAM region case.
> > 
> > 2. While the code is probably as nice as this kind of stuff gets, it
> > is non-trivial and extremely difficult to debug.  The counter-point here
> > is that we may end up handling other stuff at EL2 for performanc reasons
> > in the future.
> > 
> > Mainly because of point 1 above, I am leaning to thinking userspace
> > should do the invalidation when it knows it needs to, either through KVM
> > via a memslot flag or through some other syscall mechanism.

I expressed my concerns as well, I'm definitely against merging this
series.

> I don't understand how can the CPU handle different cache attributes
> used by QEMU and Guest won't you run into B2.9 checklist? Wouldn't
> cache evictions or cleans wipe out guest updates to same cache
> line(s)?

"Clean+invalidate" is a safe operation even if the guest accesses the
memory in a cacheable way. But if the guest can update the cache lines,
Qemu should avoid cache maintenance from a performance perspective.

The guest is either told that the DMA is coherent (via DT properties) or
Qemu deals with (non-)coherency itself. The latter is fully in line with
the B2.9 chapter in the ARM ARM, more precisely point 5:

  If the mismatched attributes for a memory location all assign the same
  shareability attribute to the location, any loss of uniprocessor
  semantics or coherency within a shareability domain can be avoided by
  use of software cache management.

... it continues with what kind of cache maintenance is required,
together with:

  A clean and invalidate instruction can be used instead of a clean
  instruction, or instead of an invalidate instruction.

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 2/5] KVM: s390/cpacf: Fix kernel bug under z/VM

2015-03-04 Thread Christian Borntraeger
Under z/VM PQAP might trigger an operation exception if no crypto cards
are defined via APVIRTUAL or APDEDICATED.

[  386.098666] Kernel BUG at 00135c56 [verbose debug info unavailable]
[  386.098693] illegal operation: 0001 ilc:2 [#1] SMP
[...]
[  386.098751] Krnl PSW : 0704c0018000 00135c56 
(kvm_s390_apxa_installed+0x46/0x98)
[...]
[  386.098804]  [<0013627c>] kvm_arch_init_vm+0x29c/0x358
[  386.098806]  [<0012d008>] kvm_dev_ioctl+0xc0/0x460
[  386.098809]  [<002c639a>] do_vfs_ioctl+0x332/0x508
[  386.098811]  [<002c660e>] SyS_ioctl+0x9e/0xb0
[  386.098814]  [<0070476a>] system_call+0xd6/0x258
[  386.098815]  [<03fffc7400a2>] 0x3fffc7400a2

Lets add an extable entry and provide a zeroed config in that case.

Reported-by: Stefan Zimmermann 
Signed-off-by: Christian Borntraeger 
Reviewed-by: Thomas Huth 
Tested-by: Stefan Zimmermann 
---
 arch/s390/kvm/kvm-s390.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index b4d2030..18965f9 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -778,15 +778,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
 static int kvm_s390_query_ap_config(u8 *config)
 {
u32 fcn_code = 0x0400UL;
-   u32 cc;
+   u32 cc = 0;
 
+   memset(config, 0, 128);
asm volatile(
"lgr 0,%1\n"
"lgr 2,%2\n"
".long 0xb2af\n"/* PQAP(QCI) */
-   "ipm %0\n"
+   "0: ipm %0\n"
"srl %0,28\n"
-   : "=r" (cc)
+   "1:\n"
+   EX_TABLE(0b, 1b)
+   : "+r" (cc)
: "r" (fcn_code), "r" (config)
: "cc", "0", "2", "memory"
);
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 5/5] KVM: s390: non-LPAR case obsolete during facilities mask init

2015-03-04 Thread Christian Borntraeger
From: Michael Mueller 

With patch "include guest facilities in kvm facility test" it is no
longer necessary to have special handling for the non-LPAR case.

Signed-off-by: Michael Mueller 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/kvm-s390.c | 17 +
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 5a02be4..f6579cf 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -902,24 +902,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (!kvm->arch.model.fac)
goto out_nofac;
 
+   /* Populate the facility mask initially. */
memcpy(kvm->arch.model.fac->mask, S390_lowcore.stfle_fac_list,
   S390_ARCH_FAC_LIST_SIZE_BYTE);
-
-   /*
-* If this KVM host runs *not* in a LPAR, relax the facility bits
-* of the kvm facility mask by all missing facilities. This will allow
-* to determine the right CPU model by means of the remaining 
facilities.
-* Live guest migration must prohibit the migration of KVMs running in
-* a LPAR to non LPAR hosts.
-*/
-   if (!MACHINE_IS_LPAR)
-   for (i = 0; i < kvm_s390_fac_list_mask_size(); i++)
-   kvm_s390_fac_list_mask[i] &= 
kvm->arch.model.fac->mask[i];
-
-   /*
-* Apply the kvm facility mask to limit the kvm supported/tolerated
-* facility list.
-*/
for (i = 0; i < S390_ARCH_FAC_LIST_SIZE_U64; i++) {
if (i < kvm_s390_fac_list_mask_size())
kvm->arch.model.fac->mask[i] &= 
kvm_s390_fac_list_mask[i];
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 0/5] KVM: s390: Fixups for changes in merge window for 4.0

2015-03-04 Thread Christian Borntraeger
Paolo, Marcelo,

here is a pull request for kvm/master targetting 4.0. It is based on
4.0-rc1 as I want all patches in an upcoming pull request against 
kvm/next as well to avoid nasty merge conflicts with other changes
that we have pending. So my next pull request for kvm/next will
be based on fb5bf93f84c277546473, which is the HEAD of this pull
request.

All patches fix commits that were merged during this merge window.

Please pull.


Christian


The following changes since commit c517d838eb7d07bbe9507871fab3931deccff539:

  Linux 4.0-rc1 (2015-02-22 18:21:14 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  
tags/kvm-s390-master-20150303

for you to fetch changes up to fb5bf93f84c277546473be35543ed7890f6e6742:

  KVM: s390: non-LPAR case obsolete during facilities mask init (2015-03-04 
10:33:25 +0100)


KVM: s390: Fixups for changes in merge window for 4.0

Here are some fixups/improvements for

commit 658b6eda204 ("KVM: s390: add cpu model support")
commit 9d8d578605b ("KVM: s390: use facilities and cpu_id per KVM")
commit a374e892c34 ("KVM: s390/cpacf: Enable/disable protected key
functions for kvm guest")
commit 45c9b47c588 ("KVM: s390/CPACF: Choose crypto control block format")

which all have been merged during the merge window for 4.0.


Christian Borntraeger (1):
  KVM: s390/cpacf: Fix kernel bug under z/VM

Michael Mueller (3):
  KVM: s390: fix in memory copy of facility lists
  KVM: s390: include guest facilities in kvm facility test
  KVM: s390: non-LPAR case obsolete during facilities mask init

Tony Krowiak (1):
  KVM: s390/cpacf: Enable key wrapping by default

 arch/s390/include/asm/kvm_host.h | 12 +++
 arch/s390/kvm/kvm-s390.c | 68 ++--
 arch/s390/kvm/kvm-s390.h |  3 +-
 arch/s390/kvm/priv.c |  2 +-
 4 files changed, 40 insertions(+), 45 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 1/5] KVM: s390/cpacf: Enable key wrapping by default

2015-03-04 Thread Christian Borntraeger
From: Tony Krowiak 

z/VM and LPAR enable key wrapping by default, lets do the same on KVM.

Signed-off-by: Tony Krowiak 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/kvm-s390.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 0c36239..b4d2030 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -839,9 +839,13 @@ static int kvm_s390_crypto_init(struct kvm *kvm)
 
kvm_s390_set_crycb_format(kvm);
 
-   /* Disable AES/DEA protected key functions by default */
-   kvm->arch.crypto.aes_kw = 0;
-   kvm->arch.crypto.dea_kw = 0;
+   /* Enable AES/DEA protected key functions by default */
+   kvm->arch.crypto.aes_kw = 1;
+   kvm->arch.crypto.dea_kw = 1;
+   get_random_bytes(kvm->arch.crypto.crycb->aes_wrapping_key_mask,
+sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask));
+   get_random_bytes(kvm->arch.crypto.crycb->dea_wrapping_key_mask,
+sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask));
 
return 0;
 }
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 4/5] KVM: s390: include guest facilities in kvm facility test

2015-03-04 Thread Christian Borntraeger
From: Michael Mueller 

Most facility related decisions in KVM have to take into account:

- the facilities offered by the underlying run container (LPAR/VM)
- the facilities supported by the KVM code itself
- the facilities requested by a guest VM

This patch adds the KVM driver requested facilities to the test routine.

It additionally renames struct s390_model_fac to kvm_s390_fac and its field
names to be more meaningful.

The semantics of the facilities stored in the KVM architecture structure
is changed. The address arch.model.fac->list now points to the guest
facility list and arch.model.fac->mask points to the KVM facility mask.

This patch fixes the behaviour of KVM for some facilities for guests
that ignore the guest visible facility bits, e.g. guests could use
transactional memory intructions on hosts supporting them even if the
chosen cpu model would not offer them.

The userspace interface is not affected by this change.

Signed-off-by: Michael Mueller 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/include/asm/kvm_host.h | 12 ++--
 arch/s390/kvm/kvm-s390.c | 30 --
 arch/s390/kvm/kvm-s390.h |  3 ++-
 arch/s390/kvm/priv.c |  2 +-
 4 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index d84559e..f407bbf 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -515,15 +515,15 @@ struct s390_io_adapter {
 #define S390_ARCH_FAC_MASK_SIZE_U64 \
(S390_ARCH_FAC_MASK_SIZE_BYTE / sizeof(u64))
 
-struct s390_model_fac {
-   /* facilities used in SIE context */
-   __u64 sie[S390_ARCH_FAC_LIST_SIZE_U64];
-   /* subset enabled by kvm */
-   __u64 kvm[S390_ARCH_FAC_LIST_SIZE_U64];
+struct kvm_s390_fac {
+   /* facility list requested by guest */
+   __u64 list[S390_ARCH_FAC_LIST_SIZE_U64];
+   /* facility mask supported by kvm & hosting machine */
+   __u64 mask[S390_ARCH_FAC_LIST_SIZE_U64];
 };
 
 struct kvm_s390_cpu_model {
-   struct s390_model_fac *fac;
+   struct kvm_s390_fac *fac;
struct cpuid cpu_id;
unsigned short ibc;
 };
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 76894c8..5a02be4 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -522,7 +522,7 @@ static int kvm_s390_set_processor(struct kvm *kvm, struct 
kvm_device_attr *attr)
memcpy(&kvm->arch.model.cpu_id, &proc->cpuid,
   sizeof(struct cpuid));
kvm->arch.model.ibc = proc->ibc;
-   memcpy(kvm->arch.model.fac->kvm, proc->fac_list,
+   memcpy(kvm->arch.model.fac->list, proc->fac_list,
   S390_ARCH_FAC_LIST_SIZE_BYTE);
} else
ret = -EFAULT;
@@ -556,7 +556,7 @@ static int kvm_s390_get_processor(struct kvm *kvm, struct 
kvm_device_attr *attr)
}
memcpy(&proc->cpuid, &kvm->arch.model.cpu_id, sizeof(struct cpuid));
proc->ibc = kvm->arch.model.ibc;
-   memcpy(&proc->fac_list, kvm->arch.model.fac->kvm, 
S390_ARCH_FAC_LIST_SIZE_BYTE);
+   memcpy(&proc->fac_list, kvm->arch.model.fac->list, 
S390_ARCH_FAC_LIST_SIZE_BYTE);
if (copy_to_user((void __user *)attr->addr, proc, sizeof(*proc)))
ret = -EFAULT;
kfree(proc);
@@ -576,8 +576,8 @@ static int kvm_s390_get_machine(struct kvm *kvm, struct 
kvm_device_attr *attr)
}
get_cpu_id((struct cpuid *) &mach->cpuid);
mach->ibc = sclp_get_ibc();
-   memcpy(&mach->fac_mask, kvm_s390_fac_list_mask,
-  kvm_s390_fac_list_mask_size() * sizeof(u64));
+   memcpy(&mach->fac_mask, kvm->arch.model.fac->mask,
+  S390_ARCH_FAC_LIST_SIZE_BYTE);
memcpy((unsigned long *)&mach->fac_list, S390_lowcore.stfle_fac_list,
   S390_ARCH_FAC_LIST_SIZE_BYTE);
if (copy_to_user((void __user *)attr->addr, mach, sizeof(*mach)))
@@ -893,16 +893,16 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
/*
 * The architectural maximum amount of facilities is 16 kbit. To store
 * this amount, 2 kbyte of memory is required. Thus we need a full
-* page to hold the active copy (arch.model.fac->sie) and the current
-* facilities set (arch.model.fac->kvm). Its address size has to be
+* page to hold the guest facility list (arch.model.fac->list) and the
+* facility mask (arch.model.fac->mask). Its address size has to be
 * 31 bits and word aligned.
 */
kvm->arch.model.fac =
-   (struct s390_model_fac *) get_zeroed_page(GFP_KERNEL | GFP_DMA);
+   (struct kvm_s390_fac *) get_zeroed_page(GFP_KERNEL | GFP_DMA);
if (!kvm->arch.model.fac)
goto out_nofac;
 
-   memcpy(kvm->arch.model.fac->kvm, S390_lowcore.stfle_fac_list,
+   memcpy(kvm->arch.model.fac->mask, S390_lowcore.s

[GIT PULL 3/5] KVM: s390: fix in memory copy of facility lists

2015-03-04 Thread Christian Borntraeger
From: Michael Mueller 

The facility lists were not fully copied.

Signed-off-by: Michael Mueller 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/kvm-s390.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 18965f9..76894c8 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -579,7 +579,7 @@ static int kvm_s390_get_machine(struct kvm *kvm, struct 
kvm_device_attr *attr)
memcpy(&mach->fac_mask, kvm_s390_fac_list_mask,
   kvm_s390_fac_list_mask_size() * sizeof(u64));
memcpy((unsigned long *)&mach->fac_list, S390_lowcore.stfle_fac_list,
-  S390_ARCH_FAC_LIST_SIZE_U64);
+  S390_ARCH_FAC_LIST_SIZE_BYTE);
if (copy_to_user((void __user *)attr->addr, mach, sizeof(*mach)))
ret = -EFAULT;
kfree(mach);
@@ -903,7 +903,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
goto out_nofac;
 
memcpy(kvm->arch.model.fac->kvm, S390_lowcore.stfle_fac_list,
-  S390_ARCH_FAC_LIST_SIZE_U64);
+  S390_ARCH_FAC_LIST_SIZE_BYTE);
 
/*
 * If this KVM host runs *not* in a LPAR, relax the facility bits
-- 
2.3.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL 0/5] KVM: s390: Fixups for changes in merge window for 4.0

2015-03-04 Thread Christian Borntraeger
Am 04.03.2015 um 12:41 schrieb Christian Borntraeger:
> Paolo, Marcelo,
> 
> here is a pull request for kvm/master targetting 4.0. It is based on
> 4.0-rc1 as I want all patches in an upcoming pull request against 
> kvm/next as well to avoid nasty merge conflicts with other changes
> that we have pending. So my next pull request for kvm/next will
> be based on fb5bf93f84c277546473, which is the HEAD of this pull
> request.

Or just against 4.0-rc1 thus having 5 identical commits.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-04 Thread Ard Biesheuvel
On 4 March 2015 at 12:35, Catalin Marinas  wrote:
> (please try to avoid top-posting)
>
> On Mon, Mar 02, 2015 at 06:20:19PM -0800, Mario Smarduch wrote:
>> On 03/02/2015 08:31 AM, Christoffer Dall wrote:
>> > However, my concern with these patches are on two points:
>> >
>> > 1. It's not a fix-all.  We still have the case where the guest expects
>> > the behavior of device memory (for strong ordering for example) on a RAM
>> > region, which we now break.  Similiarly this doesn't support the
>> > non-coherent DMA to RAM region case.
>> >
>> > 2. While the code is probably as nice as this kind of stuff gets, it
>> > is non-trivial and extremely difficult to debug.  The counter-point here
>> > is that we may end up handling other stuff at EL2 for performanc reasons
>> > in the future.
>> >
>> > Mainly because of point 1 above, I am leaning to thinking userspace
>> > should do the invalidation when it knows it needs to, either through KVM
>> > via a memslot flag or through some other syscall mechanism.
>
> I expressed my concerns as well, I'm definitely against merging this
> series.
>

Don't worry, that was never the intention, at least not as-is :-)

I think we have established that the performance hit is not the
problem but the correctness is.

I do have a remaining question, though: my original [non-working]
approach was to replace uncached mappings with write-through
read-allocate write-allocate, which I expected would keep the caches
in sync with main memory, but apparently I am misunderstanding
something here. (This is the reason for s/0xbb/0xff/ in patch #2 to
get it to work: it replaces WT/RA/WA with WB/RA/WA)

Is there no way to use write-through caching here?

>> I don't understand how can the CPU handle different cache attributes
>> used by QEMU and Guest won't you run into B2.9 checklist? Wouldn't
>> cache evictions or cleans wipe out guest updates to same cache
>> line(s)?
>
> "Clean+invalidate" is a safe operation even if the guest accesses the
> memory in a cacheable way. But if the guest can update the cache lines,
> Qemu should avoid cache maintenance from a performance perspective.
>
> The guest is either told that the DMA is coherent (via DT properties) or
> Qemu deals with (non-)coherency itself. The latter is fully in line with
> the B2.9 chapter in the ARM ARM, more precisely point 5:
>
>   If the mismatched attributes for a memory location all assign the same
>   shareability attribute to the location, any loss of uniprocessor
>   semantics or coherency within a shareability domain can be avoided by
>   use of software cache management.
>
> ... it continues with what kind of cache maintenance is required,
> together with:
>
>   A clean and invalidate instruction can be used instead of a clean
>   instruction, or instead of an invalidate instruction.
>
> --
> Catalin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

2015-03-04 Thread Marcelo Tosatti
On Mon, Mar 02, 2015 at 01:36:51PM +, Wu, Feng wrote:
> 
> 
> > -Original Message-
> > From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
> > Sent: Friday, February 27, 2015 7:41 AM
> > To: Wu, Feng
> > Cc: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org;
> > g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org;
> > j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com;
> > eric.au...@linaro.org; linux-ker...@vger.kernel.org;
> > io...@lists.linux-foundation.org; kvm@vger.kernel.org
> > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> > is blocked
> > 
> > On Fri, Dec 12, 2014 at 11:14:58PM +0800, Feng Wu wrote:
> > > This patch updates the Posted-Interrupts Descriptor when vCPU
> > > is blocked.
> > >
> > > pre-block:
> > > - Add the vCPU to the blocked per-CPU list
> > > - Clear 'SN'
> > > - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> > >
> > > post-block:
> > > - Remove the vCPU from the per-CPU list
> > >
> > > Signed-off-by: Feng Wu 
> > > ---
> > >  arch/x86/include/asm/kvm_host.h |  2 +
> > >  arch/x86/kvm/vmx.c  | 96
> > +
> > >  arch/x86/kvm/x86.c  | 22 +++---
> > >  include/linux/kvm_host.h|  4 ++
> > >  virt/kvm/kvm_main.c |  6 +++
> > >  5 files changed, 123 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h
> > b/arch/x86/include/asm/kvm_host.h
> > > index 13e3e40..32c110a 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -101,6 +101,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t
> > base_gfn, int level)
> > >
> > >  #define ASYNC_PF_PER_VCPU 64
> > >
> > > +extern void (*wakeup_handler_callback)(void);
> > > +
> > >  enum kvm_reg {
> > >   VCPU_REGS_RAX = 0,
> > >   VCPU_REGS_RCX = 1,
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > index bf2e6cd..a1c83a2 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -832,6 +832,13 @@ static DEFINE_PER_CPU(struct vmcs *,
> > current_vmcs);
> > >  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
> > >  static DEFINE_PER_CPU(struct desc_ptr, host_gdt);
> > >
> > > +/*
> > > + * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
> > > + * can find which vCPU should be waken up.
> > > + */
> > > +static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
> > > +static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
> > > +
> > >  static unsigned long *vmx_io_bitmap_a;
> > >  static unsigned long *vmx_io_bitmap_b;
> > >  static unsigned long *vmx_msr_bitmap_legacy;
> > > @@ -1921,6 +1928,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu,
> > int cpu)
> > >   struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > >   struct pi_desc old, new;
> > >   unsigned int dest;
> > > + unsigned long flags;
> > >
> > >   memset(&old, 0, sizeof(old));
> > >   memset(&new, 0, sizeof(new));
> > > @@ -1942,6 +1950,20 @@ static void vmx_vcpu_load(struct kvm_vcpu
> > *vcpu, int cpu)
> > >   new.nv = POSTED_INTR_VECTOR;
> > >   } while (cmpxchg(&pi_desc->control, old.control,
> > >   new.control) != old.control);
> > > +
> > > + /*
> > > +  * Delete the vCPU from the related wakeup queue
> > > +  * if we are resuming from blocked state
> > > +  */
> > > + if (vcpu->blocked) {
> > > + vcpu->blocked = false;
> > > + spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > + vcpu->wakeup_cpu), flags);
> > > + list_del(&vcpu->blocked_vcpu_list);
> > > + 
> > > spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > + vcpu->wakeup_cpu), flags);
> > > + vcpu->wakeup_cpu = -1;
> > > + }
> > >   }
> > >  }
> > >
> > > @@ -1950,6 +1972,9 @@ static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
> > >   if (irq_remapping_cap(IRQ_POSTING_CAP)) {
> > >   struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > >   struct pi_desc old, new;
> > > + unsigned long flags;
> > > + int cpu;
> > > + struct cpumask cpu_others_mask;
> > >
> > >   memset(&old, 0, sizeof(old));
> > >   memset(&new, 0, sizeof(new));
> > > @@ -1961,6 +1986,54 @@ static void vmx_vcpu_put(struct kvm_vcpu
> > *vcpu)
> > >   pi_set_sn(&new);
> > >   } while (cmpxchg(&pi_desc->control, old.control,
> > >   new.control) != old.control);
> > > + } else if (vcpu->blocked) {
> > > + /*
> > > +  * The vcpu is blocked on the wait queue.
> > > +  * Store the blocked vCPU on the list of the
> > > +  * vcpu->wakeup_cpu, which is

Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-04 Thread Catalin Marinas
On Wed, Mar 04, 2015 at 12:50:57PM +0100, Ard Biesheuvel wrote:
> On 4 March 2015 at 12:35, Catalin Marinas  wrote:
> > On Mon, Mar 02, 2015 at 06:20:19PM -0800, Mario Smarduch wrote:
> >> On 03/02/2015 08:31 AM, Christoffer Dall wrote:
> >> > However, my concern with these patches are on two points:
> >> >
> >> > 1. It's not a fix-all.  We still have the case where the guest expects
> >> > the behavior of device memory (for strong ordering for example) on a RAM
> >> > region, which we now break.  Similiarly this doesn't support the
> >> > non-coherent DMA to RAM region case.
> >> >
> >> > 2. While the code is probably as nice as this kind of stuff gets, it
> >> > is non-trivial and extremely difficult to debug.  The counter-point here
> >> > is that we may end up handling other stuff at EL2 for performanc reasons
> >> > in the future.
> >> >
> >> > Mainly because of point 1 above, I am leaning to thinking userspace
> >> > should do the invalidation when it knows it needs to, either through KVM
> >> > via a memslot flag or through some other syscall mechanism.
> >
> > I expressed my concerns as well, I'm definitely against merging this
> > series.
> 
> Don't worry, that was never the intention, at least not as-is :-)

I wasn't worried, just wanted to make my position clearer ;).

> I think we have established that the performance hit is not the
> problem but the correctness is.

I haven't looked at the performance figures but has anyone assessed the
hit caused by doing cache maintenance in Qemu vs cacheable guest
accesses (and no maintenance)?

> I do have a remaining question, though: my original [non-working]
> approach was to replace uncached mappings with write-through
> read-allocate write-allocate,

Does it make sense to have write-through and write-allocate at the same
time? The write-allocate hint would probably be ignored as write-through
writes do not generate linefills.

> which I expected would keep the caches
> in sync with main memory, but apparently I am misunderstanding
> something here. (This is the reason for s/0xbb/0xff/ in patch #2 to
> get it to work: it replaces WT/RA/WA with WB/RA/WA)
> 
> Is there no way to use write-through caching here?

Write-through is considered non-cacheable from a write perspective when
it does not hit in the cache. AFAIK, it should still be able to hit
existing cache lines and evict. The ARM ARM states that cache cleaning
to _PoU_ is not required for coherency when the writes are to
write-through memory but I have to dig further into the PoC because
that's what we care about here.

What platform did you test it on? I can't tell what the behaviour of
system caches is. I know they intercept explicit cache maintenance by VA
but not sure what happens to write-through writes when they hit in the
system cache (are they evicted to RAM or not?). If such write-through
writes are only evicted to the point-of-unification, they won't work
since non-cacheable accesses go all the way to PoC.

I need to do more reading through the ARM ARM, it should be hidden
somewhere ;).

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-04 Thread Ard Biesheuvel
On 4 March 2015 at 13:29, Catalin Marinas  wrote:
> On Wed, Mar 04, 2015 at 12:50:57PM +0100, Ard Biesheuvel wrote:
>> On 4 March 2015 at 12:35, Catalin Marinas  wrote:
>> > On Mon, Mar 02, 2015 at 06:20:19PM -0800, Mario Smarduch wrote:
>> >> On 03/02/2015 08:31 AM, Christoffer Dall wrote:
>> >> > However, my concern with these patches are on two points:
>> >> >
>> >> > 1. It's not a fix-all.  We still have the case where the guest expects
>> >> > the behavior of device memory (for strong ordering for example) on a RAM
>> >> > region, which we now break.  Similiarly this doesn't support the
>> >> > non-coherent DMA to RAM region case.
>> >> >
>> >> > 2. While the code is probably as nice as this kind of stuff gets, it
>> >> > is non-trivial and extremely difficult to debug.  The counter-point here
>> >> > is that we may end up handling other stuff at EL2 for performanc reasons
>> >> > in the future.
>> >> >
>> >> > Mainly because of point 1 above, I am leaning to thinking userspace
>> >> > should do the invalidation when it knows it needs to, either through KVM
>> >> > via a memslot flag or through some other syscall mechanism.
>> >
>> > I expressed my concerns as well, I'm definitely against merging this
>> > series.
>>
>> Don't worry, that was never the intention, at least not as-is :-)
>
> I wasn't worried, just wanted to make my position clearer ;).
>
>> I think we have established that the performance hit is not the
>> problem but the correctness is.
>
> I haven't looked at the performance figures but has anyone assessed the
> hit caused by doing cache maintenance in Qemu vs cacheable guest
> accesses (and no maintenance)?
>

No, I don't think so. The performance hit I am referring to is the
performance hit caused by leaving the trapping of VM system register
writes enabled all the time, so that writes to MAIR_EL1 are always
caught. This is why patch #1 implements some of the sysreg write
handling in EL2

>> I do have a remaining question, though: my original [non-working]
>> approach was to replace uncached mappings with write-through
>> read-allocate write-allocate,
>
> Does it make sense to have write-through and write-allocate at the same
> time? The write-allocate hint would probably be ignored as write-through
> writes do not generate linefills.
>

OK, that answers my question then. The availability of a
write-allocate setting on write-through attributes suggested to me
that writes would go to both the cache and main memory, so that the
write-back cached attribute the host is using for the same memory
would not result in it reading stale data.

>> which I expected would keep the caches
>> in sync with main memory, but apparently I am misunderstanding
>> something here. (This is the reason for s/0xbb/0xff/ in patch #2 to
>> get it to work: it replaces WT/RA/WA with WB/RA/WA)
>>
>> Is there no way to use write-through caching here?
>
> Write-through is considered non-cacheable from a write perspective when
> it does not hit in the cache. AFAIK, it should still be able to hit
> existing cache lines and evict. The ARM ARM states that cache cleaning
> to _PoU_ is not required for coherency when the writes are to
> write-through memory but I have to dig further into the PoC because
> that's what we care about here.
>
> What platform did you test it on? I can't tell what the behaviour of
> system caches is. I know they intercept explicit cache maintenance by VA
> but not sure what happens to write-through writes when they hit in the
> system cache (are they evicted to RAM or not?). If such write-through
> writes are only evicted to the point-of-unification, they won't work
> since non-cacheable accesses go all the way to PoC.
>

This was tested on APM, by Drew and Laszlo (thanks guys)

I have recently received a Seattle myself, but I haven't had time yet
to test these patches myself.

> I need to do more reading through the ARM ARM,

If you say so :-)

> it should be hidden
> somewhere ;).
>

-- 
Ard.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-04 Thread Andrew Jones
On Wed, Mar 04, 2015 at 01:43:02PM +0100, Ard Biesheuvel wrote:
> On 4 March 2015 at 13:29, Catalin Marinas  wrote:
> > On Wed, Mar 04, 2015 at 12:50:57PM +0100, Ard Biesheuvel wrote:
> >> On 4 March 2015 at 12:35, Catalin Marinas  wrote:
> >> > On Mon, Mar 02, 2015 at 06:20:19PM -0800, Mario Smarduch wrote:
> >> >> On 03/02/2015 08:31 AM, Christoffer Dall wrote:
> >> >> > However, my concern with these patches are on two points:
> >> >> >
> >> >> > 1. It's not a fix-all.  We still have the case where the guest expects
> >> >> > the behavior of device memory (for strong ordering for example) on a 
> >> >> > RAM
> >> >> > region, which we now break.  Similiarly this doesn't support the
> >> >> > non-coherent DMA to RAM region case.
> >> >> >
> >> >> > 2. While the code is probably as nice as this kind of stuff gets, it
> >> >> > is non-trivial and extremely difficult to debug.  The counter-point 
> >> >> > here
> >> >> > is that we may end up handling other stuff at EL2 for performanc 
> >> >> > reasons
> >> >> > in the future.
> >> >> >
> >> >> > Mainly because of point 1 above, I am leaning to thinking userspace
> >> >> > should do the invalidation when it knows it needs to, either through 
> >> >> > KVM
> >> >> > via a memslot flag or through some other syscall mechanism.
> >> >
> >> > I expressed my concerns as well, I'm definitely against merging this
> >> > series.
> >>
> >> Don't worry, that was never the intention, at least not as-is :-)
> >
> > I wasn't worried, just wanted to make my position clearer ;).
> >
> >> I think we have established that the performance hit is not the
> >> problem but the correctness is.
> >
> > I haven't looked at the performance figures but has anyone assessed the
> > hit caused by doing cache maintenance in Qemu vs cacheable guest
> > accesses (and no maintenance)?
> >

I'm working on a PoC of a QEMU/KVM cache maintenance approach now.
Hopefully I'll send it out this evening. Tomorrow at the latest.
Getting numbers of that approach vs. a guest's use of cached memory
for devices would take a decent amount of additional work, so won't
be part of that post. I'm actually not sure we should care about
the numbers for a guest using normal mem attributes for device
memory - other than out of curiosity. For correctness this issue
really needs to be solved 100% host-side. We can't rely on a
guest to do different/weird things, just because it's a guest.
Ideally guests don't even know that they're guests. (Even if we
describe the memory as cache-able to the guest, I don't think we
can rely on the guest believing us.)

drew
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/6] target-arm: kvm: save/restore mp state

2015-03-04 Thread Alex Bennée
This adds the saving and restore of the current Multi-Processing state
of the machine. While the KVM_GET/SET_MP_STATE API exposes a number of
potential states for x86 we only use two for ARM. Either the process is
running or not. We then save this state into the cpu_powered TCG state
to avoid changing the serialisation format.

Signed-off-by: Alex Bennée 

---
v2
  - make mpstate field runtime dependant (kvm_enabled())
  - drop initial KVM_CAP_MP_STATE requirement
  - re-use cpu_powered instead of new field

diff --git a/target-arm/machine.c b/target-arm/machine.c
index 9446e5a..185f9a2 100644
--- a/target-arm/machine.c
+++ b/target-arm/machine.c
@@ -161,6 +161,7 @@ static const VMStateInfo vmstate_cpsr = {
 .put = put_cpsr,
 };
 
+
 static void cpu_pre_save(void *opaque)
 {
 ARMCPU *cpu = opaque;
@@ -170,6 +171,20 @@ static void cpu_pre_save(void *opaque)
 /* This should never fail */
 abort();
 }
+#if defined CONFIG_KVM
+if (kvm_check_extension(CPU(cpu)->kvm_state, KVM_CAP_MP_STATE)) {
+struct kvm_mp_state mp_state;
+int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MP_STATE, &mp_state);
+if (ret) {
+fprintf(stderr, "%s: failed to get MP_STATE %d/%s\n",
+__func__, ret, strerror(ret));
+abort();
+}
+cpu->powered_off =
+(mp_state.mp_state == KVM_MP_STATE_RUNNABLE)
+? false : true;
+}
+#endif
 } else {
 if (!write_cpustate_to_list(cpu)) {
 /* This should never fail. */
@@ -222,6 +237,20 @@ static int cpu_post_load(void *opaque, int version_id)
  * we're using it.
  */
 write_list_to_cpustate(cpu);
+#if defined CONFIG_KVM
+if (kvm_check_extension(CPU(cpu)->kvm_state, KVM_CAP_MP_STATE)) {
+struct kvm_mp_state mp_state = {
+.mp_state =
+cpu->powered_off ? KVM_MP_STATE_HALTED : KVM_MP_STATE_RUNNABLE
+};
+int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MP_STATE, &mp_state);
+if (ret) {
+fprintf(stderr, "%s: failed to set MP_STATE %d/%s\n",
+__func__, ret, strerror(ret));
+return -1;
+}
+}
+#endif
 } else {
 if (!write_list_to_cpustate(cpu)) {
 return -1;
-- 
2.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/6] QEMU ARM64 Migration Fixes

2015-03-04 Thread Alex Bennée
This is an update to the series I posted last week addressing some of
the comments so far.

The main changes to this series are:

v2

  - Save/Restore MP STATE
- no longer needs CAP_MP_STATE at start
- re-uses cpu->powered_off for storing state (no stream ABI change)
- kvm_enabled() runtime check (although ioctl still in #if defined)
  - Save/Restore SPSR
- use the correct bank_number for aarch32
- only tweak SPSR for elevated exception levels
  - arm_giv_kvm
- add Christoffer's Acked-by:

The only question up in the air at the moment is defining a new
constant for the MP_STATE to represent powered off which I'm going to
look at when I re-spin the kernel series.

Branch: https://github.com/stsquad/qemu/tree/migration/fixes-v3

Alex Bennée (5):
  target-arm: kvm: save/restore mp state
  hw/intc: arm_gic_kvm.c restore config first
  hw/char: pl011 don't keep setting the IRQ if nothing changed
  target-arm: kvm64 sync FP register state
  target-arm: cpu.h document why env->spsr exists

Christoffer Dall (1):
  target-arm: kvm64 fix save/restore of SPSR regs

 hw/char/pl011.c   |  12 --
 hw/intc/arm_gic_kvm.c |   7 +++-
 target-arm/cpu.h  |   5 +++
 target-arm/kvm64.c| 109 +++---
 target-arm/machine.c  |  29 ++
 5 files changed, 151 insertions(+), 11 deletions(-)

-- 
2.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-04 Thread Peter Maydell
On 4 March 2015 at 23:29, Catalin Marinas  wrote:
> I disagree it is 100% a host-side issue. It is a host-side issue _if_
> the host tells the guest that the (virtual) device is non-coherent (or,
> more precisely, it does not explicitly tell the guest that the device is
> coherent). If the guest thinks the (virtual) device is non-coherent
> because of information passed by the host, I fully agree that the host
> needs to manage the cache coherency.
>
> However, the host could also pass a "dma-coherent" property in the DT
> given to the guest and avoid any form of cache maintenance. If the guest
> does not honour such coherency property, it's a guest problem and it
> needs fixing in the guest. This isn't any different from a real physical
> device behaviour.

Right, and we should do that for things like virtio, because we want
the performance. But we also have devices (like vga framebuffers)
which shouldn't be handled as cacheable, so we need to be able to
deal with both situations.

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/6] hw/char: pl011 don't keep setting the IRQ if nothing changed

2015-03-04 Thread Alex Bennée
While observing KVM traces I can see additional IRQ calls on pretty much
every MMIO access which is just plain inefficient. Only update the QEMU
IRQ level if something has actually changed from last time. Otherwise we
may be papering over other failure modes.

Signed-off-by: Alex Bennée 

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index 0a45115..bb554bc 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -36,6 +36,9 @@ typedef struct PL011State {
 CharDriverState *chr;
 qemu_irq irq;
 const unsigned char *id;
+
+/* not serialised, prevents pl011_update doing extra set_irqs */
+uint32_t current_irq;
 } PL011State;
 
 #define PL011_INT_TX 0x20
@@ -53,10 +56,11 @@ static const unsigned char pl011_id_luminary[8] =
 
 static void pl011_update(PL011State *s)
 {
-uint32_t flags;
-
-flags = s->int_level & s->int_enabled;
-qemu_set_irq(s->irq, flags != 0);
+uint32_t flags = s->int_level & s->int_enabled;
+if (flags != s->current_irq) {
+s->current_irq = flags;
+qemu_set_irq(s->irq, s->current_irq != 0);
+}
 }
 
 static uint64_t pl011_read(void *opaque, hwaddr offset,
-- 
2.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-04 Thread Catalin Marinas
On Wed, Mar 04, 2015 at 03:12:12PM +0100, Andrew Jones wrote:
> On Wed, Mar 04, 2015 at 01:43:02PM +0100, Ard Biesheuvel wrote:
> > On 4 March 2015 at 13:29, Catalin Marinas  wrote:
> > > On Wed, Mar 04, 2015 at 12:50:57PM +0100, Ard Biesheuvel wrote:
> > >> I think we have established that the performance hit is not the
> > >> problem but the correctness is.
> > >
> > > I haven't looked at the performance figures but has anyone assessed the
> > > hit caused by doing cache maintenance in Qemu vs cacheable guest
> > > accesses (and no maintenance)?
> 
> I'm working on a PoC of a QEMU/KVM cache maintenance approach now.
> Hopefully I'll send it out this evening. Tomorrow at the latest.
> Getting numbers of that approach vs. a guest's use of cached memory
> for devices would take a decent amount of additional work, so won't
> be part of that post.

OK.

> I'm actually not sure we should care about
> the numbers for a guest using normal mem attributes for device
> memory - other than out of curiosity. For correctness this issue
> really needs to be solved 100% host-side. We can't rely on a
> guest to do different/weird things, just because it's a guest.
> Ideally guests don't even know that they're guests. (Even if we
> describe the memory as cache-able to the guest, I don't think we
> can rely on the guest believing us.)

I disagree it is 100% a host-side issue. It is a host-side issue _if_
the host tells the guest that the (virtual) device is non-coherent (or,
more precisely, it does not explicitly tell the guest that the device is
coherent). If the guest thinks the (virtual) device is non-coherent
because of information passed by the host, I fully agree that the host
needs to manage the cache coherency.

However, the host could also pass a "dma-coherent" property in the DT
given to the guest and avoid any form of cache maintenance. If the guest
does not honour such coherency property, it's a guest problem and it
needs fixing in the guest. This isn't any different from a real physical
device behaviour.

(there are counter arguments for the latter as well like emulating old
platforms that never had coherency but from a performance/production
perspective, I strongly recommend that guests are passed the
"dma-coherent" property for such virtual devices)

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 5/6] target-arm: kvm64 fix save/restore of SPSR regs

2015-03-04 Thread Alex Bennée
From: Christoffer Dall 

The current code was negatively indexing the cpu state array and not
synchronizing banked spsr register state with the current mode's spsr
state, causing occasional failures with migration.

Some munging is done to take care of the aarch64 mapping and also to
ensure the most current value of the spsr is updated to the banked
registers (relevant for KVM<->TCG migration).

Signed-off-by: Christoffer Dall 
Signed-off-by: Alex Bennée 

---
v2 (ajb)
  - minor tweaks and clarifications
v3
  - Use the correct bank index function for setting/getting env->spsr
  - only deal with spsrs in elevated exception levels

diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
index c60e989..45e5c3f 100644
--- a/target-arm/kvm64.c
+++ b/target-arm/kvm64.c
@@ -140,6 +140,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 uint64_t val;
 int i;
 int ret;
+unsigned int el;
 
 ARMCPU *cpu = ARM_CPU(cs);
 CPUARMState *env = &cpu->env;
@@ -206,9 +207,27 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
+/* Saved Program State Registers
+ *
+ * Before we restore from the banked_spsr[] array we need to
+ * ensure that any modifications to env->spsr are correctly
+ * reflected and map aarch64 exception levels if required.
+ */
+el = arm_current_el(env);
+if (el > 0) {
+if (is_a64(env)) {
+g_assert(el == 1);
+/* KVM only maps KVM_SPSR_SVC to KVM_SPSR_EL1 for aarch64 ATM */
+env->banked_spsr[1] = env->banked_spsr[0];
+} else {
+i = bank_number(env->uncached_cpsr & CPSR_M);
+env->banked_spsr[i] = env->spsr;
+}
+}
+
 for (i = 0; i < KVM_NR_SPSR; i++) {
 reg.id = AARCH64_CORE_REG(spsr[i]);
-reg.addr = (uintptr_t) &env->banked_spsr[i - 1];
+reg.addr = (uintptr_t) &env->banked_spsr[i+1];
 ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
 if (ret) {
 return ret;
@@ -253,6 +272,7 @@ int kvm_arch_get_registers(CPUState *cs)
 struct kvm_one_reg reg;
 uint64_t val;
 uint32_t fpr;
+unsigned int el;
 int i;
 int ret;
 
@@ -325,15 +345,35 @@ int kvm_arch_get_registers(CPUState *cs)
 return ret;
 }
 
+/* Fetch the SPSR registers
+ *
+ * KVM has an array of state indexed for all the possible aarch32
+ * privilage levels. Although not all are valid at all points
+ * there are some transitions possible which can access old state
+ * so it is worth keeping them all.
+ */
 for (i = 0; i < KVM_NR_SPSR; i++) {
 reg.id = AARCH64_CORE_REG(spsr[i]);
-reg.addr = (uintptr_t) &env->banked_spsr[i - 1];
+reg.addr = (uintptr_t) &env->banked_spsr[i+1];
 ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
 if (ret) {
 return ret;
 }
 }
 
+el = arm_current_el(env);
+if (el > 0) {
+if (is_a64(env)) {
+g_assert(el == 1);
+/* KVM maps KVM_SPSR_SVC to KVM_SPSR_EL1 for aarch64 */
+env->banked_spsr[0] = env->banked_spsr[1];
+i = aarch64_banked_spsr_index(el);
+} else {
+i = bank_number(env->uncached_cpsr & CPSR_M);
+}
+env->spsr = env->banked_spsr[i];
+}
+
 /* Advanced SIMD and FP registers */
 for (i = 0; i < 32; i++) {
 reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
-- 
2.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/6] hw/intc: arm_gic_kvm.c restore config first

2015-03-04 Thread Alex Bennée
As there is logic to deal with the difference between edge and level
triggered interrupts in the kernel we must ensure it knows the
configuration of the IRQs before we restore the pending state.

Signed-off-by: Alex Bennée 
Acked-by: Christoffer Dall 

diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
index 1ad3eb0..2f21ae7 100644
--- a/hw/intc/arm_gic_kvm.c
+++ b/hw/intc/arm_gic_kvm.c
@@ -370,6 +370,11 @@ static void kvm_arm_gic_put(GICState *s)
  * the appropriate CPU interfaces in the kernel) */
 kvm_dist_put(s, 0x800, 8, s->num_irq, translate_targets);
 
+/* irq_state[n].trigger -> GICD_ICFGRn
+ * (restore targets before pending IRQs so we treat level/edge
+ * correctly */
+kvm_dist_put(s, 0xc00, 2, s->num_irq, translate_trigger);
+
 /* irq_state[n].pending + irq_state[n].level -> GICD_ISPENDRn */
 kvm_dist_put(s, 0x280, 1, s->num_irq, translate_clear);
 kvm_dist_put(s, 0x200, 1, s->num_irq, translate_pending);
@@ -378,8 +383,6 @@ static void kvm_arm_gic_put(GICState *s)
 kvm_dist_put(s, 0x380, 1, s->num_irq, translate_clear);
 kvm_dist_put(s, 0x300, 1, s->num_irq, translate_active);
 
-/* irq_state[n].trigger -> GICD_ICFRn */
-kvm_dist_put(s, 0xc00, 2, s->num_irq, translate_trigger);
 
 /* s->priorityX[irq] -> ICD_IPRIORITYRn */
 kvm_dist_put(s, 0x400, 8, s->num_irq, translate_priority);
-- 
2.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 6/6] target-arm: cpu.h document why env->spsr exists

2015-03-04 Thread Alex Bennée
I was getting very confused about the duplication of state. Perhaps we
should just get rid of env->spsr and use helpers that understand the
banking?

Signed-off-by: Alex Bennée 

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 11845a6..d7fd13f 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -155,6 +155,11 @@ typedef struct CPUARMState {
This contains all the other bits.  Use cpsr_{read,write} to access
the whole CPSR.  */
 uint32_t uncached_cpsr;
+/* The spsr is a alias for spsr_elN where N is the current
+ * exception level. It is provided for here so the TCG msr/mrs
+ * implementation can access one register. Care needs to be taken
+ * to ensure the banked_spsr[] is also updated.
+ */
 uint32_t spsr;
 
 /* Banked registers.  */
-- 
2.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/6] target-arm: kvm64 sync FP register state

2015-03-04 Thread Alex Bennée
For migration to work we need to sync all of the register state. This is
especially noticeable when GCC starts using FP registers as spill
registers even with integer programs.

Signed-off-by: Alex Bennée 

diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
index 8cf3a62..c60e989 100644
--- a/target-arm/kvm64.c
+++ b/target-arm/kvm64.c
@@ -126,9 +126,17 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx)
 #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
  KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
 
+/* The linux headers don't define a 128 bit wide SIMD macro for us */
+#define AARCH64_SIMD_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U128 | \
+ KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
+
+#define AARCH64_SIMD_CTRL_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U32 | \
+ KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
+
 int kvm_arch_put_registers(CPUState *cs, int level)
 {
 struct kvm_one_reg reg;
+uint32_t fpr;
 uint64_t val;
 int i;
 int ret;
@@ -207,13 +215,36 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 }
 }
 
+/* Advanced SIMD and FP registers */
+for (i = 0; i < 32; i++) {
+reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
+reg.addr = (uintptr_t)(&env->vfp.regs[i]);
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+reg.id++;
+}
+
+reg.addr = (uintptr_t)(&fpr);
+fpr = vfp_get_fpsr(env);
+reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+
+fpr = vfp_get_fpcr(env);
+reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+
 if (!write_list_to_kvmstate(cpu)) {
 return EINVAL;
 }
 
-/* TODO:
- * FP state
- */
 return ret;
 }
 
@@ -221,6 +252,7 @@ int kvm_arch_get_registers(CPUState *cs)
 {
 struct kvm_one_reg reg;
 uint64_t val;
+uint32_t fpr;
 int i;
 int ret;
 
@@ -302,9 +334,36 @@ int kvm_arch_get_registers(CPUState *cs)
 }
 }
 
+/* Advanced SIMD and FP registers */
+for (i = 0; i < 32; i++) {
+reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
+reg.addr = (uintptr_t)(&env->vfp.regs[i]);
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+reg.id++;
+}
+
+reg.addr = (uintptr_t)(&fpr);
+reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+vfp_set_fpsr(env, fpr);
+
+reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+vfp_set_fpcr(env, fpr);
+
 if (!write_kvmstate_to_list(cpu)) {
 return EINVAL;
 }
+
 /* Note that it's OK to have registers which aren't in CPUState,
  * so we can ignore a failure return here.
  */
-- 
2.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 6/6] target-arm: cpu.h document why env->spsr exists

2015-03-04 Thread Peter Maydell
On 4 March 2015 at 23:35, Alex Bennée  wrote:
> I was getting very confused about the duplication of state. Perhaps we
> should just get rid of env->spsr and use helpers that understand the
> banking?

Doesn't seem worth changing the current working code to something
else that's strictly less efficient... spsr is by no means the
only banked-by-mode register, and it works just like all the others.

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "target-ppc: Create versionless CPU class per family if KVM"

2015-03-04 Thread Andreas Färber
Am 03.03.2015 um 23:14 schrieb Alexey Kardashevskiy:
> On 03/04/2015 07:43 AM, Alexander Graf wrote:
>> On 03.03.15 01:42, Alexey Kardashevskiy wrote:
>>> On 03/03/2015 12:51 AM, Alexander Graf wrote:
 On 02.03.15 14:42, Andreas Färber wrote:
> Am 02.03.2015 um 14:37 schrieb Alexander Graf:
>> On 01.03.15 01:31, Andreas Färber wrote:
>>> This reverts commit 5b79b1cadd3e565b6d1a5ba59764bd47af58b271 to
>>> avoid
>>> double-registration of types:
>>>
>>> Registering `POWER5+-powerpc64-cpu' which already exists
>>>
>>> Taking the textual description of a CPU type as part of a new type
>>> name
>>> is plain wrong, and so is unconditionally registering a new type
>>> here.
>>>
>>> Cc: Alexey Kardashevskiy 
>>> Cc: qemu-sta...@nongnu.org
>>> Signed-off-by: Andreas Färber 
>>
>> Doesn't this break p8 support?
>
> Maybe, but p5 support was in longer and this is definitely a
> regression
> and really really wrong. If you know a way to fix it without
> handing it
> back to the IBM guys for more thought, feel free to give it a shot.

 I honestly don't fully remember what this was about. Wasn't this our
 special KVM class that we use to create a compatible cpu type on the
 fly?

 Alexey, please take a look at it.
>>>
>>>
>>> I sent a note yesterday :-/ Here it is again:
>>>
>>> With this revert, running qemu with HV KVM and -cpu POWER7 fails on real
>>> POWER7 machine as my machine has pvr 003f 0201 and POWER7 is an alias of
>>> POWER7_v2.3 (pvr 003f 0203); and this is what I tried to fix at the
>>> first place. QEMU looks at classes first, and if not found - at aliases,
>>> so this worked.
>>>
>>> I would rename "POWER5+" to "POWER5+_0.0" and make "POWER5+" an alias
>>> for POWER5+_v2.1 (or POWER5+_0.0).
>>
>> Care to send a patch?
> 
> I wonder if Andreas has a better solution to my initial problem - he
> obviously won't like the proposed patch :)

Quite predictable, am I? ;)

Could you please explain in detail what problem you are seeing on POWER8
without this patch?

>From your new patch it rather sounds as if this was totally unrelated to
-cpu host and a new KVM-only feature, reinforcing my feeling that my
function is the wrong place for your code.

Also, as I pointed out, the description cannot safely be used as part of
the type name, as it may contain prohibited characters, so this still
needs fixing.

And for sure, if registering new types is indeed needed here, then a
check is needed for whether that type already exists and appropriate
error handling. I just don't understand why that is needed at all with
-cpu host taking the PVR as you say is needed here.

If you can precisely tell me what it is that you need then I'd be happy
to cook up a patch.

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu,
Graham Norton; HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


QEMU, libvirt, and KVM are participating in Outreachy May-August 2015

2015-03-04 Thread Stefan Hajnoczi
QEMU, libvirt, and KVM are participating in the Outreachy (formerly
known as Outreach Program for Women) May-August 2015 round.

Outreachy is a 12-week full-time paid internship for working on open
source projects.  We are able to participate thanks to sponsors who
will be listed on the Outreachy website in the future:
http://outreachy.org/

>From the website: Outreachy is "open to women (cis and trans), trans
men, genderqueer people, and all participants of the Ascend Project
(http://ascendproject.org/) regardless of gender".

Check out the project ideas list and details on how to apply:
http://qemu-project.org/Outreachy_2015_MayAugust

We encourage applicants to apply for both Outreachy and Google Summer
of Code if they are eligible:
http://qemu-project.org/Google_Summer_of_Code_2015

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: vmx: Set msr bitmap correctly if vcpu is in guest mode

2015-03-04 Thread Bandan Das
Hi Wincy,

Wincy Van  writes:

> In commit 3af18d9c5fe9 ("KVM: nVMX: Prepare for using hardware MSR bitmap"),
> we are setting MSR_BITMAP in prepare_vmcs02 if we should use hardware. This
> is not enough since the field will be modified by following vmx_set_efer.
>
> Fix this by setting vmx_msr_bitmap_nested in vmx_set_msr_bitmap if vcpu is
> in guest mode.
>
> Signed-off-by: Wincy Van 
> ---
>  arch/x86/kvm/vmx.c |   11 +++
>  1 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f7b20b4..10a481b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2168,7 +2168,10 @@ static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
>  {
>   unsigned long *msr_bitmap;
>  
> - if (irqchip_in_kernel(vcpu->kvm) && apic_x2apic_mode(vcpu->arch.apic)) {
> + if (is_guest_mode(vcpu))
> + msr_bitmap = vmx_msr_bitmap_nested;
Do you think this should be (is_guest_mode(vcpu) &
  (exec_control &  CPU_BASED_USE_MSR_BITMAPS)) ?

> + else if (irqchip_in_kernel(vcpu->kvm) &&
> + apic_x2apic_mode(vcpu->arch.apic)) {
>   if (is_long_mode(vcpu))
>   msr_bitmap = vmx_msr_bitmap_longmode_x2apic;
>   else
> @@ -9218,9 +9221,9 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
> struct vmcs12 *vmcs12)
>   }
>  
>   if (cpu_has_vmx_msr_bitmap() &&
> - exec_control & CPU_BASED_USE_MSR_BITMAPS &&
> - nested_vmx_merge_msr_bitmap(vcpu, vmcs12)) {
> - vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_nested));
> + exec_control & CPU_BASED_USE_MSR_BITMAPS) {
> + nested_vmx_merge_msr_bitmap(vcpu, vmcs12);
> + /* MSR_BITMAP will be set by following vmx_set_efer. */
>   } else
>   exec_control &= ~CPU_BASED_USE_MSR_BITMAPS;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 1/4] vfio: implement iommu driver capabilities with an enum

2015-03-04 Thread Baptiste Reynal
From: Antonios Motakis 

Currently a VFIO driver's IOMMU capabilities are encoded as a series of
numerical defines. Replace this with an enum for future maintainability.

Signed-off-by: Antonios Motakis 
Signed-off-by: Baptiste Reynal 
---
 include/uapi/linux/vfio.h | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 82889c3..5fb3d46 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -19,22 +19,20 @@
 
 /* Kernel & User level defines for VFIO IOCTLs. */
 
-/* Extensions */
-
-#define VFIO_TYPE1_IOMMU   1
-#define VFIO_SPAPR_TCE_IOMMU   2
-#define VFIO_TYPE1v2_IOMMU 3
 /*
- * IOMMU enforces DMA cache coherence (ex. PCIe NoSnoop stripping).  This
- * capability is subject to change as groups are added or removed.
+ * Capabilities exposed by the VFIO IOMMU driver. Some capabilities are subject
+ * to change as groups are added or removed.
  */
-#define VFIO_DMA_CC_IOMMU  4
-
-/* Check if EEH is supported */
-#define VFIO_EEH   5
+enum vfio_iommu_cap {
+   VFIO_TYPE1_IOMMU = 1,
+   VFIO_SPAPR_TCE_IOMMU = 2,
+   VFIO_TYPE1v2_IOMMU = 3,
+   VFIO_DMA_CC_IOMMU = 4,  /* IOMMU enforces DMA cache coherence
+  (ex. PCIe NoSnoop stripping) */
+   VFIO_EEH = 5,   /* Check if EEH is supported */
+   VFIO_TYPE1_NESTING_IOMMU = 6,   /* Two-stage IOMMU, implies v2  */
+};
 
-/* Two-stage IOMMU */
-#define VFIO_TYPE1_NESTING_IOMMU   6   /* Implies v2 */
 
 /*
  * The IOCTL interface is designed for extensibility by embedding the
-- 
2.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 4/4] vfio: type1: implement the VFIO_DMA_MAP_FLAG_NOEXEC flag

2015-03-04 Thread Baptiste Reynal
From: Antonios Motakis 

Some IOMMU drivers, such as the ARM SMMU driver, make available the
IOMMU_NOEXEC flag to set the page tables for a device as XN (execute never).
This affects devices such as the ARM PL330 DMA Controller, which respects
this flag and will refuse to fetch DMA instructions from memory where the
XN flag has been set.

The flag can be used only if all IOMMU domains behind the container support
the IOMMU_NOEXEC flag. Also, if any mappings are created with the flag, any
new domains with devices will have to support it as well.

Signed-off-by: Antonios Motakis 
Signed-off-by: Baptiste Reynal 
---
 drivers/vfio/vfio_iommu_type1.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a5847e8..ec313e5 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -591,6 +591,12 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
if (!prot || !size || (size | iova | vaddr) & mask)
return -EINVAL;
 
+   if (map->flags & VFIO_DMA_MAP_FLAG_NOEXEC) {
+   if (!vfio_domains_have_iommu_cap(iommu, IOMMU_CAP_NOEXEC))
+   return -EINVAL;
+   prot |= IOMMU_NOEXEC;
+   }
+
/* Don't allow IOVA or virtual address wrap */
if (iova + size - 1 < iova || vaddr + size - 1 < vaddr)
return -EINVAL;
@@ -672,11 +678,20 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 
for (; n; n = rb_next(n)) {
struct vfio_dma *dma;
+   const struct iommu_ops *ops = domain->domain->ops;
dma_addr_t iova;
 
dma = rb_entry(n, struct vfio_dma, node);
iova = dma->iova;
 
+   /*
+* if any of the mappings to be replayed has the NOEXEC flag
+* set, then the new iommu domain must support it
+*/
+   if ((dma->prot & IOMMU_NOEXEC) &&
+   !(ops->capable(IOMMU_CAP_NOEXEC)))
+   return -EINVAL;
+
while (iova < dma->iova + dma->size) {
phys_addr_t phys = iommu_iova_to_phys(d->domain, iova);
size_t size;
@@ -969,6 +984,11 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
return 0;
return vfio_domains_have_iommu_cap(iommu,
  IOMMU_CAP_CACHE_COHERENCY);
+   case VFIO_DMA_NOEXEC_IOMMU:
+   if (!iommu)
+   return 0;
+   return vfio_domains_have_iommu_cap(iommu,
+  IOMMU_CAP_NOEXEC);
default:
return 0;
}
@@ -992,7 +1012,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
} else if (cmd == VFIO_IOMMU_MAP_DMA) {
struct vfio_iommu_type1_dma_map map;
uint32_t mask = VFIO_DMA_MAP_FLAG_READ |
-   VFIO_DMA_MAP_FLAG_WRITE;
+   VFIO_DMA_MAP_FLAG_WRITE |
+   VFIO_DMA_MAP_FLAG_NOEXEC;
 
minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
 
-- 
2.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 3/4] vfio: type1: replace vfio_domains_have_iommu_cache with generic function

2015-03-04 Thread Baptiste Reynal
From: Antonios Motakis 

Replace the function vfio_domains_have_iommu_cache() with a more generic
function vfio_domains_have_iommu_cap() which allows to check all domains
of an vfio_iommu structure for a given cached capability.

Signed-off-by: Antonios Motakis 
Signed-off-by: Baptiste Reynal 
---
 drivers/vfio/vfio_iommu_type1.c | 37 +++--
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 57d8c37..a5847e8 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -82,6 +82,23 @@ struct vfio_group {
struct list_headnext;
 };
 
+static int vfio_domains_have_iommu_cap(struct vfio_iommu *iommu, int cap)
+{
+   struct vfio_domain *domain;
+   int ret = 1;
+
+   mutex_lock(&iommu->lock);
+   list_for_each_entry(domain, &iommu->domain_list, next) {
+   if (!(domain->domain->ops->capable(cap))) {
+   ret = 0;
+   break;
+   }
+   }
+   mutex_unlock(&iommu->lock);
+
+   return ret;
+}
+
 /*
  * This code handles mapping and unmapping of user data buffers
  * into DMA'ble space using the IOMMU
@@ -935,23 +952,6 @@ static void vfio_iommu_type1_release(void *iommu_data)
kfree(iommu);
 }
 
-static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
-{
-   struct vfio_domain *domain;
-   int ret = 1;
-
-   mutex_lock(&iommu->lock);
-   list_for_each_entry(domain, &iommu->domain_list, next) {
-   if (!(domain->prot & IOMMU_CACHE)) {
-   ret = 0;
-   break;
-   }
-   }
-   mutex_unlock(&iommu->lock);
-
-   return ret;
-}
-
 static long vfio_iommu_type1_ioctl(void *iommu_data,
   unsigned int cmd, unsigned long arg)
 {
@@ -967,7 +967,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
case VFIO_DMA_CC_IOMMU:
if (!iommu)
return 0;
-   return vfio_domains_have_iommu_cache(iommu);
+   return vfio_domains_have_iommu_cap(iommu,
+ IOMMU_CAP_CACHE_COHERENCY);
default:
return 0;
}
-- 
2.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 2/4] vfio: introduce the VFIO_DMA_MAP_FLAG_NOEXEC flag

2015-03-04 Thread Baptiste Reynal
From: Antonios Motakis 

We introduce the VFIO_DMA_MAP_FLAG_NOEXEC flag to the VFIO dma map call,
and expose its availability via the capability VFIO_DMA_NOEXEC_IOMMU.
This way the user can control whether the XN flag will be set on the
requested mappings. The IOMMU_NOEXEC flag needs to be available for all
the IOMMUs of the container used.

Signed-off-by: Antonios Motakis 
Signed-off-by: Baptiste Reynal 
---
 include/uapi/linux/vfio.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 5fb3d46..30801a7 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -31,6 +31,7 @@ enum vfio_iommu_cap {
   (ex. PCIe NoSnoop stripping) */
VFIO_EEH = 5,   /* Check if EEH is supported */
VFIO_TYPE1_NESTING_IOMMU = 6,   /* Two-stage IOMMU, implies v2  */
+   VFIO_DMA_NOEXEC_IOMMU = 7,
 };
 
 
@@ -397,12 +398,17 @@ struct vfio_iommu_type1_info {
  *
  * Map process virtual addresses to IO virtual addresses using the
  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
+ *
+ * To use the VFIO_DMA_MAP_FLAG_NOEXEC flag, the container must support the
+ * VFIO_DMA_NOEXEC_IOMMU capability. If mappings are created using this flag,
+ * any groups subsequently added to the container must support this capability.
  */
 struct vfio_iommu_type1_dma_map {
__u32   argsz;
__u32   flags;
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)/* readable from device 
*/
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)   /* writable from device */
+#define VFIO_DMA_MAP_FLAG_NOEXEC (1 << 2)  /* not executable from device */
__u64   vaddr;  /* Process virtual address */
__u64   iova;   /* IO virtual address */
__u64   size;   /* Size of mapping (bytes) */
-- 
2.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 6/6] target-arm: cpu.h document why env->spsr exists

2015-03-04 Thread Alex Bennée

Peter Maydell  writes:

> On 4 March 2015 at 23:35, Alex Bennée  wrote:
>> I was getting very confused about the duplication of state. Perhaps we
>> should just get rid of env->spsr and use helpers that understand the
>> banking?
>
> Doesn't seem worth changing the current working code to something
> else that's strictly less efficient... spsr is by no means the
> only banked-by-mode register, and it works just like all the others.

Fair enough. I just wanted to make it clear it was a cached value for
the benefit of TCG.

>
> -- PMM

-- 
Alex Bennée
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-04 Thread Paolo Bonzini


On 04/03/2015 15:29, Catalin Marinas wrote:
> I disagree it is 100% a host-side issue. It is a host-side issue _if_
> the host tells the guest that the (virtual) device is non-coherent (or,
> more precisely, it does not explicitly tell the guest that the device is
> coherent). If the guest thinks the (virtual) device is non-coherent
> because of information passed by the host, I fully agree that the host
> needs to manage the cache coherency.
> 
> However, the host could also pass a "dma-coherent" property in the DT
> given to the guest and avoid any form of cache maintenance. If the guest
> does not honour such coherency property, it's a guest problem and it
> needs fixing in the guest. This isn't any different from a real physical
> device behaviour.

Can you add that property to the device tree for PCI devices too?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-03-04 Thread Catalin Marinas
On Wed, Mar 04, 2015 at 06:03:11PM +0100, Paolo Bonzini wrote:
> 
> 
> On 04/03/2015 15:29, Catalin Marinas wrote:
> > I disagree it is 100% a host-side issue. It is a host-side issue _if_
> > the host tells the guest that the (virtual) device is non-coherent (or,
> > more precisely, it does not explicitly tell the guest that the device is
> > coherent). If the guest thinks the (virtual) device is non-coherent
> > because of information passed by the host, I fully agree that the host
> > needs to manage the cache coherency.
> > 
> > However, the host could also pass a "dma-coherent" property in the DT
> > given to the guest and avoid any form of cache maintenance. If the guest
> > does not honour such coherency property, it's a guest problem and it
> > needs fixing in the guest. This isn't any different from a real physical
> > device behaviour.
> 
> Can you add that property to the device tree for PCI devices too?

Yes but not with mainline yet:

http://thread.gmane.org/gmane.linux.kernel.iommu/8935

We can add the property at the PCI host bridge level (with the drawback
that it covers all the PCI devices), like here:

Documentation/devicetree/bindings/pci/xgene-pci.txt

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3.14 58/73] KVM: MIPS: Dont leak FPU/DSP to guest

2015-03-04 Thread Greg Kroah-Hartman
On Wed, Mar 04, 2015 at 08:10:40AM +, James Hogan wrote:
> Hi Greg,
> 
> On Tue, Mar 03, 2015 at 10:13:26PM -0800, Greg Kroah-Hartman wrote:
> > 3.14-stable review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > From: James Hogan 
> > 
> > commit f798217dfd038af981a18bbe4bc57027a08bb182 upstream.
> > 
> > The FPU and DSP are enabled via the CP0 Status CU1 and MX bits by
> > kvm_mips_set_c0_status() on a guest exit, presumably in case there is
> > active state that needs saving if pre-emption occurs. However neither of
> > these bits are cleared again when returning to the guest.
> > 
> > This effectively gives the guest access to the FPU/DSP hardware after
> > the first guest exit even though it is not aware of its presence,
> > allowing FP instructions in guest user code to intermittently actually
> > execute instead of trapping into the guest OS for emulation. It will
> > then read & manipulate the hardware FP registers which technically
> > belong to the user process (e.g. QEMU), or are stale from another user
> > process. It can also crash the guest OS by causing an FP exception, for
> > which a guest exception handler won't have been registered.
> > 
> > First lets save and disable the FPU (and MSA) state with lose_fpu(1)
> > before entering the guest. This simplifies the problem, especially for
> > when guest FPU/MSA support is added in the future, and prevents FR=1 FPU
> > state being live when the FR bit gets cleared for the guest, which
> > according to the architecture causes the contents of the FPU and vector
> > registers to become UNPREDICTABLE.
> > 
> > We can then safely remove the enabling of the FPU in
> > kvm_mips_set_c0_status(), since there should never be any active FPU or
> > MSA state to save at pre-emption, which should plug the FPU leak.
> > 
> > DSP state is always live rather than being lazily restored, so for that
> > it is simpler to just clear the MX bit again when re-entering the guest.
> > 
> > Signed-off-by: James Hogan 
> > Cc: Paolo Bonzini 
> > Cc: Ralf Baechle 
> > Cc: Sanjay Lal 
> > Cc: Gleb Natapov 
> > Cc: kvm@vger.kernel.org
> > Cc: linux-m...@linux-mips.org
> > Cc:  # v3.10+: 044f0f03eca0: MIPS: KVM: Deliver 
> > guest interrupts
> 
> The original 3.10 and 3.12/3.14 backports had this added:
> Cc:  # v3.10+: 3ce465e04bfd: MIPS: Export FP 
> functions used by lose_fpu(1) for KVM 
> Which I can't see included in the v3.10 stable queue or branch. It fixes
> a build error with MIPS malta_kvm_defconfig (MIPS=y, KVM=m) after this
> patch is applied.
> 
> Same applies to the 3.14 queue too I think.

Odd, I remember having problems in this area and thought I had queued
this up.  It's now applied to both trees, thanks.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 01/16] Introduce probe mode for machine type none

2015-03-04 Thread Eduardo Habkost
On Tue, Mar 03, 2015 at 11:55:24AM +0100, Michael Mueller wrote:
> On Mon, 02 Mar 2015 17:57:01 +0100
> Andreas Färber  wrote:
> 
> > Am 02.03.2015 um 17:43 schrieb Michael Mueller:
> > > On Mon, 02 Mar 2015 14:57:21 +0100
> > > Andreas Färber  wrote:
> > > 
> > >>>  int configure_accelerator(MachineState *ms)
> > >>>  {
> > >>> -const char *p;
> > >>> +const char *p, *name;
> > >>>  char buf[10];
> > >>>  int ret;
> > >>>  bool accel_initialised = false;
> > >>>  bool init_failed = false;
> > >>>  AccelClass *acc = NULL;
> > >>> +ObjectClass *oc;
> > >>> +bool probe_mode = false;
> > >>>  
> > >>>  p = qemu_opt_get(qemu_get_machine_opts(), "accel");
> > >>>  if (p == NULL) {
> > >>> -/* Use the default "accelerator", tcg */
> > >>> -p = "tcg";
> > >>> +oc = (ObjectClass *) MACHINE_GET_CLASS(current_machine);
> > >>> +name = object_class_get_name(oc);
> > >>> +probe_mode = !strcmp(name, "none" TYPE_MACHINE_SUFFIX);
> > >>> +if (probe_mode) {
> > >>> +/* Use these accelerators in probe mode, tcg should be 
> > >>> last */
> > >>> +p = probe_mode_accels;
> > >>> +} else {
> > >>> +/* Use the default "accelerator", tcg */
> > >>> +p = "tcg";
> > >>> +}
> > >>>  }  
> > >>
> > >> Can't we instead use an explicit ,accel=probe or ,accel=auto?
> > >> That would then obsolete the next patch.
> > > 
> > > How would you express the following with the accel= 
> > > approach?
> > > 
> > > -probe -machine s390-ccw,accel=kvm 
> > > 
> > > Using machine "none" as default with tcg as last accelerator initialized 
> > > should not break
> > > anything.
> > > 
> > > -M none
> > 
> > Let me ask differently: What does -machine none or -M none have to do
> > with probing? It reads as if you are introducing two probe modes. Why do
> 
> The machine none? nothing directly, I guess. What are real world use cases 
> for that
> machine type?
> 
> > you need both? If we have -probe, isn't that independent of which
> 
> It is just two different means to switch on the same mode.
> 
> > machine we specify? Who is going to call either, with which respective goal?
> 
> -probe itself would be sufficient but I currently do not want to enforce the 
> use of
> a new parameter. Best would be not to have that mode at all if possible. 
> 
> The intended use case is driven by management interfaces that need to draw 
> decisions
> on, in this particular case runnable cpu models, with information originated 
> by qemu.
> 
> Let me walk through Eduardo's suggestion first and crosscheck it with my 
> requirements
> before we enter in a maybe afterwards obsolete discussion.

I have been working on some changes to implement x86 CPU probing code
that creates accel objects on the fly, that may be useful. See:
  https://github.com/ehabkost/qemu-hacks/tree/work/user-accel-init

Especially the commit:
  kvm: Move /dev/kvm opening/closing to open/close methods

The next steps I plan are:
 * Create AccelState object on TCG too, and somehow pass it as argument
   to cpu_x86_init()
 * Change all kvm_enabled() occurrences on target-i386/cpu.c to use
   the provided accel object (including
   x86_cpu_get_supported_feature_word() and x86_cpu_filter_features())
 * Use the new
   x86_cpu_get_supported_feature_word()/x86_cpu_filter_features() code
   to implement a is_runnable(X86CPUClass*, AccelState*) check
 * Use the new is_runnable() check to extend query-cpu-definitions for x86 too
 * Add -cpu string and machine-type arguments to the is_runnable() check

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] vfio-pci: Remove warning if try-reset fails

2015-03-04 Thread Alex Williamson
As indicated in the comment, this is not entirely uncommon and
causes user concern for no reason.

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/vfio_pci.c |   10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 9c854b0..e5eb2e6 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -171,14 +171,8 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 * Try to reset the device.  The success of this is dependent on
 * being able to lock the device, which is not always possible.
 */
-   if (vdev->reset_works) {
-   int ret = pci_try_reset_function(pdev);
-   if (ret)
-   pr_warn("%s: Failed to reset device %s (%d)\n",
-   __func__, dev_name(&pdev->dev), ret);
-   else
-   vdev->needs_reset = false;
-   }
+   if (vdev->reset_works && !pci_try_reset_function(pdev))
+   vdev->needs_reset = false;
 
pci_restore_state(pdev);
 out:

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] vfio-pci: Move idle devices to D3hot power state

2015-03-04 Thread Alex Williamson
We can save some power by putting devices that are bound to vfio-pci
but not in use by the user in the D3hot power state.  Devices get
woken into D0 when opened by the user.  Resets return the device to
D0, so we need to re-apply the low power state after a bus reset.
It's tempting to try to use D3cold, but we have no reason to inhibit
hotplug of idle devices and we might get into a loop of having the
device disappear before we have a chance to try to use it.

A new module parameter "disable_idle_pm" allows this feature to be
disabled if there are devices that misbehave as a result of this
change.

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/vfio_pci.c |   33 ++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index e5eb2e6..2133b74 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -48,6 +48,11 @@ MODULE_PARM_DESC(disable_vga,
 "Disable VGA resource access for VGA-capable devices");
 #endif
 
+static bool disable_idle_pm;
+module_param_named(disable_idle_pm, disable_idle_pm, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(disable_idle_pm,
+"Disable support for trying to move idle, unused devices to a 
low power state.  This might be necessary if devices do not behave properly 
moving into or out of low power states.");
+
 static DEFINE_MUTEX(driver_lock);
 
 static inline bool vfio_vga_disabled(void)
@@ -68,6 +73,8 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
u16 cmd;
u8 msix_pos;
 
+   pci_set_power_state(pdev, PCI_D0);
+
/* Don't allow our initial saved state to include busmaster */
pci_clear_master(pdev);
 
@@ -179,6 +186,9 @@ out:
pci_disable_device(pdev);
 
vfio_pci_try_bus_reset(vdev);
+
+   if (!disable_idle_pm)
+   pci_set_power_state(pdev, PCI_D3hot);
 }
 
 static void vfio_pci_release(void *device_data)
@@ -899,6 +909,17 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
kfree(vdev);
}
 
+   if (!disable_idle_pm) {
+   /*
+* pci-core sets the device power state to an unknown value at
+* bootup and after being removed from a driver.  The only
+* transition it allows from this unknown state is to D0.  We
+* therefore first do a D0 transition before going to D3.
+*/
+   pci_set_power_state(pdev, PCI_D0);
+   pci_set_power_state(pdev, PCI_D3hot);
+   }
+
return ret;
 }
 
@@ -911,6 +932,9 @@ static void vfio_pci_remove(struct pci_dev *pdev)
iommu_group_put(pdev->dev.iommu_group);
kfree(vdev);
}
+
+   if (!disable_idle_pm)
+   pci_set_power_state(pdev, PCI_D0);
 }
 
 static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
@@ -1029,10 +1053,13 @@ static void vfio_pci_try_bus_reset(struct 
vfio_pci_device *vdev)
 
 put_devs:
for (i = 0; i < devs.cur_index; i++) {
-   if (!ret) {
-   tmp = vfio_device_data(devs.devices[i]);
+   tmp = vfio_device_data(devs.devices[i]);
+   if (!ret)
tmp->needs_reset = false;
-   }
+
+   if (!tmp->refcnt && !disable_idle_pm)
+   pci_set_power_state(tmp->pdev, PCI_D3hot);
+
vfio_device_put(devs.devices[i]);
}
 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] vfio-pci: Allow PCI IDs to be specified as module options

2015-03-04 Thread Alex Williamson
This copies the same support from pci-stub for exactly the same
purpose, enabling a set of PCI IDs to be automatically added to the
driver's dynamic ID table at module load time.  The code here is
pretty simple and both vfio-pci and pci-stub are fairly unique in
being meta drivers, capable of attaching to any device, so there's no
attempt made to generalize the code into pci-core.

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/vfio_pci.c |   46 +++
 1 file changed, 46 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index f8a1863..b3bae4c 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -32,6 +32,10 @@
 #define DRIVER_AUTHOR   "Alex Williamson "
 #define DRIVER_DESC "VFIO PCI - User Level meta-driver"
 
+static char ids[1024] __initdata;
+module_param_string(ids, ids, sizeof(ids), 0);
+MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio driver, format is 
\"vendor:device[:subvendor[:subdevice[:class[:class_mask\" and multiple 
comma separated entries can be specified");
+
 static bool nointxmask;
 module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(nointxmask,
@@ -1034,6 +1038,46 @@ static void __exit vfio_pci_cleanup(void)
vfio_pci_uninit_perm_bits();
 }
 
+static void __init vfio_pci_fill_ids(void)
+{
+   char *p, *id;
+   int rc;
+
+   /* no ids passed actually */
+   if (ids[0] == '\0')
+   return;
+
+   /* add ids specified in the module parameter */
+   p = ids;
+   while ((id = strsep(&p, ","))) {
+   unsigned int vendor, device, subvendor = PCI_ANY_ID,
+   subdevice = PCI_ANY_ID, class = 0, class_mask = 0;
+   int fields;
+
+   if (!strlen(id))
+   continue;
+
+   fields = sscanf(id, "%x:%x:%x:%x:%x:%x",
+   &vendor, &device, &subvendor, &subdevice,
+   &class, &class_mask);
+
+   if (fields < 2) {
+   pr_warn("vfio-pci: invalid id string \"%s\"\n", id);
+   continue;
+   }
+
+   pr_info("vfio-pci: add %04X:%04X sub=%04X:%04X cls=%08X/%08X\n",
+   vendor, device, subvendor, subdevice,
+   class, class_mask);
+
+   rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
+  subvendor, subdevice, class, class_mask, 0);
+   if (rc)
+   pr_warn("vfio-pci: failed to add dynamic id (%d)\n",
+   rc);
+   }
+}
+
 static int __init vfio_pci_init(void)
 {
int ret;
@@ -1053,6 +1097,8 @@ static int __init vfio_pci_init(void)
if (ret)
goto out_driver;
 
+   vfio_pci_fill_ids();
+
return 0;
 
 out_driver:

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] vfio-pci: Add VGA arbiter client

2015-03-04 Thread Alex Williamson
If VFIO VGA access is disabled for the user, either by CONFIG option
or module parameter, we can often opt-out of VGA arbitration.  We can
do this when PCI bridge control of VGA routing is possible.  This
means that we must have a parent bridge and there must only be a
single VGA device below that bridge.  Fortunately this is the typical
case for discrete GPUs.

Doing this allows us to minimize the impact of additional GPUs, in
terms of VGA arbitration, when they are only used via vfio-pci for
non-VGA applications.

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/vfio_pci.c |   67 ---
 1 file changed, 63 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 2133b74..2522d7c 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vfio_pci_private.h"
 
@@ -64,6 +65,50 @@ static inline bool vfio_vga_disabled(void)
 #endif
 }
 
+/*
+ * Our VGA arbiter participation is limited since we don't know anything
+ * about the device itself.  However, if the device is the only VGA device
+ * downstream of a bridge and VFIO VGA support is disabled, then we can
+ * safely return legacy VGA IO and memory as not decoded since the user
+ * has no way to get to it and routing can be disabled externally at the
+ * bridge.
+ */
+static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
+{
+   struct vfio_pci_device *vdev = opaque;
+   struct pci_dev *tmp = NULL, *pdev = vdev->pdev;
+   unsigned char max_busnr;
+   unsigned int decodes;
+
+   if (single_vga || !vfio_vga_disabled() || pci_is_root_bus(pdev->bus))
+   return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
+  VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
+
+   max_busnr = pci_bus_max_busnr(pdev->bus);
+   decodes = VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM;
+
+   while ((tmp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, tmp)) != NULL) {
+   if (tmp == pdev ||
+   pci_domain_nr(tmp->bus) != pci_domain_nr(pdev->bus) ||
+   pci_is_root_bus(tmp->bus))
+   continue;
+
+   if (tmp->bus->number >= pdev->bus->number &&
+   tmp->bus->number <= max_busnr) {
+   pci_dev_put(tmp);
+   decodes |= VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
+   break;
+   }
+   }
+
+   return decodes;
+}
+
+static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
+{
+   return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
+}
+
 static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
 
 static int vfio_pci_enable(struct vfio_pci_device *vdev)
@@ -120,7 +165,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
} else
vdev->msix_bar = 0xFF;
 
-   if (!vfio_vga_disabled() && (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
+   if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
vdev->has_vga = true;
 
return 0;
@@ -909,6 +954,12 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
kfree(vdev);
}
 
+   if (vfio_pci_is_vga(pdev)) {
+   vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
+   vga_set_legacy_decoding(pdev,
+   vfio_pci_set_vga_decode(vdev, false));
+   }
+
if (!disable_idle_pm) {
/*
 * pci-core sets the device power state to an unknown value at
@@ -928,9 +979,17 @@ static void vfio_pci_remove(struct pci_dev *pdev)
struct vfio_pci_device *vdev;
 
vdev = vfio_del_group_dev(&pdev->dev);
-   if (vdev) {
-   iommu_group_put(pdev->dev.iommu_group);
-   kfree(vdev);
+   if (!vdev)
+   return;
+
+   iommu_group_put(pdev->dev.iommu_group);
+   kfree(vdev);
+
+   if (vfio_pci_is_vga(pdev)) {
+   vga_client_register(pdev, NULL, NULL, NULL);
+   vga_set_legacy_decoding(pdev,
+   VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
+   VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM);
}
 
if (!disable_idle_pm)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5] vfio-pci: Misc enhancements

2015-03-04 Thread Alex Williamson
Each of these patches stands on it's own, but they all touch the
same file and therefore build on each other to some extent.  The first
patch allows vfio-pci to accept a list of IDs as a module parameter,
exactly like pci-stub.  This is really useful to get vfio-pci bound to
devices that are intended to be used exclusively by userspace without
more complex scripts (particuarly with modprobe.d softdep, pre
loading).

The 2nd patch allows a module option to disable VGA access if vfio-pci
is configured with VGA support.  By itself this doesn't do much, but
with patch 5, we add a VGA arbiter client that will opt-out devices
from arbitration if the user cannot access VGA regions of the device.
If you have extra GPUs in your system exclusively for use via vfio-pci
and don't depend on VGA resource access, this opt-out can allow host
graphics to enable DRI2 support as if the additional GPUs weren't
there.  This depends on https://lkml.org/lkml/2015/2/24/430 or else
vga arbiter support gets ugly (still looking for an Ack there).

The other interesting feature here is idle power management.  It's
possible that there are extra devices in a host that are used
exclusively via vfio-pci and that those devices are not used 100% of
the time.  To be power conscious we can put devices into a low power
state when they're not actively being used.  This doesn't always save
much, or necessarily any power, but we can at least try.

I'll also note that the last patch makes use of pci_bus_max_busnr()
which seems to be in flux with some current upstream patches.  I don't
really care if we use pci_bus_max_busnr() or some replacement for it,
so long as pci-core allows a way to determine the bus number range
below a bridge so I can determine if there's more than one VGA device
below it.  Comments welcome.  Thanks,

Alex

---

Alex Williamson (5):
  vfio-pci: Allow PCI IDs to be specified as module options
  vfio-pci: Add module option to disable VGA region access
  vfio-pci: Remove warning if try-reset fails
  vfio-pci: Move idle devices to D3hot power state
  vfio-pci: Add VGA arbiter client


 drivers/vfio/pci/vfio_pci.c |  174 +++
 1 file changed, 157 insertions(+), 17 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] vfio-pci: Add module option to disable VGA region access

2015-03-04 Thread Alex Williamson
Add a module option so that we don't require a CONFIG change and
kernel rebuild to disable VGA support.  Not only can VGA support be
troublesome in itself, but by disabling it we can reduce the impact
to host devices by doing a VGA arbitration opt-out.

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/vfio_pci.c |   20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index b3bae4c..9c854b0 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -41,8 +41,24 @@ module_param_named(nointxmask, nointxmask, bool, S_IRUGO | 
S_IWUSR);
 MODULE_PARM_DESC(nointxmask,
  "Disable support for PCI 2.3 style INTx masking.  If this 
resolves problems for specific devices, report lspci -vvvxxx to 
linux-...@vger.kernel.org so the device can be fixed automatically via the 
broken_intx_masking flag.");
 
+#ifdef CONFIG_VFIO_PCI_VGA
+static bool disable_vga;
+module_param_named(disable_vga, disable_vga, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_vga,
+"Disable VGA resource access for VGA-capable devices");
+#endif
+
 static DEFINE_MUTEX(driver_lock);
 
+static inline bool vfio_vga_disabled(void)
+{
+#ifdef CONFIG_VFIO_PCI_VGA
+   return disable_vga;
+#else
+   return true;
+#endif
+}
+
 static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
 
 static int vfio_pci_enable(struct vfio_pci_device *vdev)
@@ -97,10 +113,8 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
} else
vdev->msix_bar = 0xFF;
 
-#ifdef CONFIG_VFIO_PCI_VGA
-   if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
+   if (!vfio_vga_disabled() && (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
vdev->has_vga = true;
-#endif
 
return 0;
 }

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] vfio-pci: Allow PCI IDs to be specified as module options

2015-03-04 Thread Bandan Das
Hi Alex,

Alex Williamson  writes:
...
> + if (fields < 2) {
> + pr_warn("vfio-pci: invalid id string \"%s\"\n", id);
> + continue;
> + }
> +
> + pr_info("vfio-pci: add %04X:%04X sub=%04X:%04X cls=%08X/%08X\n",
> + vendor, device, subvendor, subdevice,
> + class, class_mask);
> +
> + rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
> +subvendor, subdevice, class, class_mask, 0);
> + if (rc)
> + pr_warn("vfio-pci: failed to add dynamic id (%d)\n",
> + rc);
Just a minor observation - maybe we should print out the
vendor/device/subvendor/subdevice as part of the failure message too. The info
message could potentially get lost in a system with high syslog traffic.

Bandan

> + }
> +}
> +
>  static int __init vfio_pci_init(void)
>  {
>   int ret;
> @@ -1053,6 +1097,8 @@ static int __init vfio_pci_init(void)
>   if (ret)
>   goto out_driver;
>  
> + vfio_pci_fill_ids();
> +
>   return 0;
>  
>  out_driver:
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Multicast, macvtap, bridge

2015-03-04 Thread Hoggins!
Hello list,

I'm trying to have guests communicate with each other on multicast. I
don't know where to look to enable this, and if its possible.
My guests communicate normally with each other on their bridged
(macvtap) interface on unicast, but I cannot seem to be able to do it
with multicast.

Other tests conducted between bare metal hosts are successful, so my
testing methods don't seem to be the problem here. I'm using sockperf.

All the interfaces I'm using, either on host or guest, have MULTICAST
enabled.

What am I missing ?

Thanks for your help.



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/5] vfio-pci: Allow PCI IDs to be specified as module options

2015-03-04 Thread Alex Williamson
On Wed, 2015-03-04 at 15:49 -0500, Bandan Das wrote:
> Hi Alex,
> 
> Alex Williamson  writes:
> ...
> > +   if (fields < 2) {
> > +   pr_warn("vfio-pci: invalid id string \"%s\"\n", id);
> > +   continue;
> > +   }
> > +
> > +   pr_info("vfio-pci: add %04X:%04X sub=%04X:%04X cls=%08X/%08X\n",
> > +   vendor, device, subvendor, subdevice,
> > +   class, class_mask);
> > +
> > +   rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
> > +  subvendor, subdevice, class, class_mask, 0);
> > +   if (rc)
> > +   pr_warn("vfio-pci: failed to add dynamic id (%d)\n",
> > +   rc);
> Just a minor observation - maybe we should print out the
> vendor/device/subvendor/subdevice as part of the failure message too. The info
> message could potentially get lost in a system with high syslog traffic.

Thanks for the comment.  I don't think we want to duplicate the pr_info
above it, so are you thinking something like this?

if (fields < 2) {
pr_warn("vfio-pci: invalid id string \"%s\"\n", id);
continue;
}

rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
   subvendor, subdevice, class, class_mask, 0);
if (rc)
pr_warn("vfio-pci: failed to add dynamic id: %04X:%04X 
sub=%04X:%04X cls=%08X/%08X (%d)\n",
vendor, device, subvendor, subdevice,
class, class_mask, rc);
else
pr_info("vfio-pci: add %04X:%04X sub=%04X:%04X 
cls=%08X/%08X\n",
vendor, device, subvendor, subdevice,
class, class_mask);
}


> > +   }
> > +}
> > +
> >  static int __init vfio_pci_init(void)
> >  {
> > int ret;
> > @@ -1053,6 +1097,8 @@ static int __init vfio_pci_init(void)
> > if (ret)
> > goto out_driver;
> >  
> > +   vfio_pci_fill_ids();
> > +
> > return 0;
> >  
> >  out_driver:
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] vfio-pci: Allow PCI IDs to be specified as module options

2015-03-04 Thread Bandan Das
Alex Williamson  writes:

> On Wed, 2015-03-04 at 15:49 -0500, Bandan Das wrote:
>> Hi Alex,
>> 
>> Alex Williamson  writes:
>> ...
>> > +  if (fields < 2) {
>> > +  pr_warn("vfio-pci: invalid id string \"%s\"\n", id);
>> > +  continue;
>> > +  }
>> > +
>> > +  pr_info("vfio-pci: add %04X:%04X sub=%04X:%04X cls=%08X/%08X\n",
>> > +  vendor, device, subvendor, subdevice,
>> > +  class, class_mask);
>> > +
>> > +  rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
>> > + subvendor, subdevice, class, class_mask, 0);
>> > +  if (rc)
>> > +  pr_warn("vfio-pci: failed to add dynamic id (%d)\n",
>> > +  rc);
>> Just a minor observation - maybe we should print out the
>> vendor/device/subvendor/subdevice as part of the failure message too. The 
>> info
>> message could potentially get lost in a system with high syslog traffic.
>
> Thanks for the comment.  I don't think we want to duplicate the pr_info
> above it, so are you thinking something like this?
>
> if (fields < 2) {
> pr_warn("vfio-pci: invalid id string \"%s\"\n", id);
> continue;
> }
>
> rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
>subvendor, subdevice, class, class_mask, 
> 0);
> if (rc)
> pr_warn("vfio-pci: failed to add dynamic id: 
> %04X:%04X sub=%04X:%04X cls=%08X/%08X (%d)\n",
> vendor, device, subvendor, subdevice,
> class, class_mask, rc);
> else
> pr_info("vfio-pci: add %04X:%04X sub=%04X:%04X 
> cls=%08X/%08X\n",
> vendor, device, subvendor, subdevice,
> class, class_mask);
> }

Yep, this is good. (BTW, we can define pr_fmt at the top of the file and avoid
the "vfio-pci" prefix)

>
>> > +  }
>> > +}
>> > +
>> >  static int __init vfio_pci_init(void)
>> >  {
>> >int ret;
>> > @@ -1053,6 +1097,8 @@ static int __init vfio_pci_init(void)
>> >if (ret)
>> >goto out_driver;
>> >  
>> > +  vfio_pci_fill_ids();
>> > +
>> >return 0;
>> >  
>> >  out_driver:
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe kvm" in
>> > the body of a message to majord...@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] vfio-pci: Allow PCI IDs to be specified as module options

2015-03-04 Thread Alex Williamson
On Wed, 2015-03-04 at 17:32 -0500, Bandan Das wrote:
> Alex Williamson  writes:
> 
> > On Wed, 2015-03-04 at 15:49 -0500, Bandan Das wrote:
> >> Hi Alex,
> >> 
> >> Alex Williamson  writes:
> >> ...
> >> > +if (fields < 2) {
> >> > +pr_warn("vfio-pci: invalid id string \"%s\"\n", 
> >> > id);
> >> > +continue;
> >> > +}
> >> > +
> >> > +pr_info("vfio-pci: add %04X:%04X sub=%04X:%04X 
> >> > cls=%08X/%08X\n",
> >> > +vendor, device, subvendor, subdevice,
> >> > +class, class_mask);
> >> > +
> >> > +rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
> >> > +   subvendor, subdevice, class, 
> >> > class_mask, 0);
> >> > +if (rc)
> >> > +pr_warn("vfio-pci: failed to add dynamic id 
> >> > (%d)\n",
> >> > +rc);
> >> Just a minor observation - maybe we should print out the
> >> vendor/device/subvendor/subdevice as part of the failure message too. The 
> >> info
> >> message could potentially get lost in a system with high syslog traffic.
> >
> > Thanks for the comment.  I don't think we want to duplicate the pr_info
> > above it, so are you thinking something like this?
> >
> > if (fields < 2) {
> > pr_warn("vfio-pci: invalid id string \"%s\"\n", id);
> > continue;
> > }
> >
> > rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
> >subvendor, subdevice, class, class_mask, 
> > 0);
> > if (rc)
> > pr_warn("vfio-pci: failed to add dynamic id: 
> > %04X:%04X sub=%04X:%04X cls=%08X/%08X (%d)\n",
> > vendor, device, subvendor, subdevice,
> > class, class_mask, rc);
> > else
> > pr_info("vfio-pci: add %04X:%04X sub=%04X:%04X 
> > cls=%08X/%08X\n",
> > vendor, device, subvendor, subdevice,
> > class, class_mask);
> > }
> 
> Yep, this is good. (BTW, we can define pr_fmt at the top of the file and avoid
> the "vfio-pci" prefix)

Cool, I'll remove the "vfio-pci: " prefixes and define the standard:

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

Thanks!

> >> > +}
> >> > +}
> >> > +
> >> >  static int __init vfio_pci_init(void)
> >> >  {
> >> >  int ret;
> >> > @@ -1053,6 +1097,8 @@ static int __init vfio_pci_init(void)
> >> >  if (ret)
> >> >  goto out_driver;
> >> >  
> >> > +vfio_pci_fill_ids();
> >> > +
> >> >  return 0;
> >> >  
> >> >  out_driver:
> >> >
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> > the body of a message to majord...@vger.kernel.org
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "target-ppc: Create versionless CPU class per family if KVM"

2015-03-04 Thread Alexey Kardashevskiy

On 03/05/2015 01:55 AM, Andreas Färber wrote:

Am 03.03.2015 um 23:14 schrieb Alexey Kardashevskiy:

On 03/04/2015 07:43 AM, Alexander Graf wrote:

On 03.03.15 01:42, Alexey Kardashevskiy wrote:

On 03/03/2015 12:51 AM, Alexander Graf wrote:

On 02.03.15 14:42, Andreas Färber wrote:

Am 02.03.2015 um 14:37 schrieb Alexander Graf:

On 01.03.15 01:31, Andreas Färber wrote:

This reverts commit 5b79b1cadd3e565b6d1a5ba59764bd47af58b271 to
avoid
double-registration of types:

 Registering `POWER5+-powerpc64-cpu' which already exists

Taking the textual description of a CPU type as part of a new type
name
is plain wrong, and so is unconditionally registering a new type
here.

Cc: Alexey Kardashevskiy 
Cc: qemu-sta...@nongnu.org
Signed-off-by: Andreas Färber 


Doesn't this break p8 support?


Maybe, but p5 support was in longer and this is definitely a
regression
and really really wrong. If you know a way to fix it without
handing it
back to the IBM guys for more thought, feel free to give it a shot.


I honestly don't fully remember what this was about. Wasn't this our
special KVM class that we use to create a compatible cpu type on the
fly?

Alexey, please take a look at it.



I sent a note yesterday :-/ Here it is again:

With this revert, running qemu with HV KVM and -cpu POWER7 fails on real
POWER7 machine as my machine has pvr 003f 0201 and POWER7 is an alias of
POWER7_v2.3 (pvr 003f 0203); and this is what I tried to fix at the
first place. QEMU looks at classes first, and if not found - at aliases,
so this worked.

I would rename "POWER5+" to "POWER5+_0.0" and make "POWER5+" an alias
for POWER5+_v2.1 (or POWER5+_0.0).


Care to send a patch?


I wonder if Andreas has a better solution to my initial problem - he
obviously won't like the proposed patch :)


Quite predictable, am I? ;)

Could you please explain in detail what problem you are seeing on POWER8
without this patch?

 From your new patch it rather sounds as if this was totally unrelated to
-cpu host and a new KVM-only feature, reinforcing my feeling that my
function is the wrong place for your code.

Also, as I pointed out, the description cannot safely be used as part of
the type name, as it may contain prohibited characters, so this still
needs fixing.


So I can duplicate the CPU family name in PowerPCCPUClass. Which will 
always be the same as DeviceClass::desc. Well, it may be the right thing to do.




And for sure, if registering new types is indeed needed here, then a
check is needed for whether that type already exists and appropriate
error handling.


I'll cook a patch for this.

> I just don't understand why that is needed at all with

-cpu host taking the PVR as you say is needed here.

If you can precisely tell me what it is that you need then I'd be happy
to cook up a patch.


I thought I did... I need a way to run QEMU under HV KVM with a CPU name, 
just like this - -cpu POWER7 on _any_ real POWER7 CPU (v2.1, v2.3 or even 
POWER7+). Simply because all CPUs from the family behave the same. But HV 
KVM cannot virtualize PVR, it is a hardware limitation. So this is what my 
original patch fixed/bandaid'd.


The original request for -cpu POWER7 vs. -cpu host came from libvirt for 
migration purposes - afair the issue was that the destination QEMU must not 
start if it is POWER8-host and the source is POWER7-host so trying -cpu 
POWER7 will fail on POWER8 (but -cpu host would work and this would be 
wrong); but migration between POWER7 2.1 and POWER7 2.3 should still work.


And in general there is no good reason why -cpu POWER7 cannot work on any 
POWER7 CPU.


I could try adding a dynamic alias for "POWER7" to "host" but at the moment 
aliases are static so new dynamic class seemed simpler.



--
Alexey
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL 0/5] KVM: s390: Fixups for changes in merge window for 4.0

2015-03-04 Thread Marcelo Tosatti
On Wed, Mar 04, 2015 at 12:41:49PM +0100, Christian Borntraeger wrote:
> Paolo, Marcelo,
> 
> here is a pull request for kvm/master targetting 4.0. It is based on
> 4.0-rc1 as I want all patches in an upcoming pull request against 
> kvm/next as well to avoid nasty merge conflicts with other changes
> that we have pending. So my next pull request for kvm/next will
> be based on fb5bf93f84c277546473, which is the HEAD of this pull
> request.
> 
> All patches fix commits that were merged during this merge window.
> 
> Please pull.

Can you generate pull against current master?

$ git pull --ff-only 
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git
tags/kvm-s390-master-20150303   
>From git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
 * tag   kvm-s390-master-20150303 -> FETCH_HEAD
fatal: Not possible to fast-forward, aborting.

> Christian
> 
> 
> The following changes since commit c517d838eb7d07bbe9507871fab3931deccff539:
> 
>   Linux 4.0-rc1 (2015-02-22 18:21:14 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  
> tags/kvm-s390-master-20150303
> 
> for you to fetch changes up to fb5bf93f84c277546473be35543ed7890f6e6742:
> 
>   KVM: s390: non-LPAR case obsolete during facilities mask init (2015-03-04 
> 10:33:25 +0100)
> 
> 
> KVM: s390: Fixups for changes in merge window for 4.0
> 
> Here are some fixups/improvements for
> 
> commit 658b6eda204 ("KVM: s390: add cpu model support")
> commit 9d8d578605b ("KVM: s390: use facilities and cpu_id per KVM")
> commit a374e892c34 ("KVM: s390/cpacf: Enable/disable protected key
> functions for kvm guest")
> commit 45c9b47c588 ("KVM: s390/CPACF: Choose crypto control block format")
> 
> which all have been merged during the merge window for 4.0.
> 
> 
> Christian Borntraeger (1):
>   KVM: s390/cpacf: Fix kernel bug under z/VM
> 
> Michael Mueller (3):
>   KVM: s390: fix in memory copy of facility lists
>   KVM: s390: include guest facilities in kvm facility test
>   KVM: s390: non-LPAR case obsolete during facilities mask init
> 
> Tony Krowiak (1):
>   KVM: s390/cpacf: Enable key wrapping by default
> 
>  arch/s390/include/asm/kvm_host.h | 12 +++
>  arch/s390/kvm/kvm-s390.c | 68 
> ++--
>  arch/s390/kvm/kvm-s390.h |  3 +-
>  arch/s390/kvm/priv.c |  2 +-
>  4 files changed, 40 insertions(+), 45 deletions(-)
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq

2015-03-04 Thread Marcelo Tosatti
On Thu, Feb 26, 2015 at 09:23:57PM -0300, Marcelo Tosatti wrote:
> On Tue, Feb 17, 2015 at 06:44:19PM +0100, Sebastian Andrzej Siewior wrote:
> > * Peter Zijlstra | 2015-01-21 16:07:16 [+0100]:
> > 
> > >On Tue, Jan 20, 2015 at 01:16:13PM -0500, Steven Rostedt wrote:
> > >> I'm actually wondering if we should just nuke the _interruptible()
> > >> version of swait. As it should only be all interruptible or all not
> > >> interruptible, that the swait_wake() should just do the wake up
> > >> regardless. In which case, swait_wake() is good enough. No need to have
> > >> different versions where people may think do something special.
> > >> 
> > >> Peter?
> > >
> > >Yeah, I think the lastest thing I have sitting here on my disk only has
> > >the swake_up() which does TASK_NORMAL, no choice there.
> > 
> > what is the swait status in terms of mainline? This sounds like it
> > beeing worked on.
> > I could take the series but then I would drop it again if the mainline
> > implementation changes…
> 
> Hi Sebastian,
> 
> No, you would just adjust it to the upstream kernel interfaces, as the rest of
> the -rt users of the swait interfaces.
> 
> Can you please include the series?
> 
> Thanks

Sebastian?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: vmx: Set msr bitmap correctly if vcpu is in guest mode

2015-03-04 Thread Wincy Van
On Wed, Mar 4, 2015 at 11:58 PM, Bandan Das  wrote:
> Hi Wincy,
>
> Wincy Van  writes:
>
>> In commit 3af18d9c5fe9 ("KVM: nVMX: Prepare for using hardware MSR bitmap"),
>> we are setting MSR_BITMAP in prepare_vmcs02 if we should use hardware. This
>> is not enough since the field will be modified by following vmx_set_efer.
>>
>> Fix this by setting vmx_msr_bitmap_nested in vmx_set_msr_bitmap if vcpu is
>> in guest mode.
>>
>> Signed-off-by: Wincy Van 
>> ---
>>  arch/x86/kvm/vmx.c |   11 +++
>>  1 files changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index f7b20b4..10a481b 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2168,7 +2168,10 @@ static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu)
>>  {
>>   unsigned long *msr_bitmap;
>>
>> - if (irqchip_in_kernel(vcpu->kvm) && apic_x2apic_mode(vcpu->arch.apic)) 
>> {
>> + if (is_guest_mode(vcpu))
>> + msr_bitmap = vmx_msr_bitmap_nested;
> Do you think this should be (is_guest_mode(vcpu) &
>   (exec_control &  
> CPU_BASED_USE_MSR_BITMAPS)) ?
>

We don't need to do that, because if we don't use the hardware, we will disable
hardware msr bitmap:

if (cpu_has_vmx_msr_bitmap() &&
exec_control & CPU_BASED_USE_MSR_BITMAPS) {
nested_vmx_merge_msr_bitmap(vcpu, vmcs12);
/* MSR_BITMAP will be set by following vmx_set_efer. */
} else
exec_control &= ~CPU_BASED_USE_MSR_BITMAPS;

Then, the hardware will ignore MSR_BITMAP.
Setting MSR_BITMAP without enabling it may be unnecessary, but it is no
harm, and we can reduce the code complexity of vmx_set_msr_bitmap.


Thanks,
Wincy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch -rt 1/2] KVM: use simple waitqueue for vcpu->wq

2015-03-04 Thread Sebastian Andrzej Siewior
On 03/05/2015 02:09 AM, Marcelo Tosatti wrote:
>> Can you please include the series?
>>
>> Thanks
> 
> Sebastian?

I will pick it up, don't worry. I think I do my -RT day tomorrow.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html