Re: [net-next RFC V4 PATCH 0/4] Multiqueue virtio-net

2012-06-26 Thread Jason Wang

On 06/26/2012 01:49 AM, Sridhar Samudrala wrote:

On 6/25/2012 2:16 AM, Jason Wang wrote:

Hello All:

This series is an updated version of the multiqueue virtio-net driver, based on
Krishna Kumar's work, that lets virtio-net use multiple rx/tx queues for
packet reception and transmission. Please review and comment.

Test Environment:
- Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, 8 cores, 2 NUMA nodes
- Two directly connected 82599 NICs

Test Summary:

- Highlights: huge improvements on the TCP_RR test
- Lowlights: regression on small-packet transmission and higher cpu
  utilization than single queue; needs further optimization

Does this also scale with increased number of VMs?



Hi Sridhar:

Good suggestion; I didn't measure that. I will run the tests and post the results.

Thanks


Thanks
Sridhar


Analysis of the performance result:

- I counted the number of packets sent and received during the test;
   multiqueue handles many more packets per second.

- For the tx regression, multiqueue sends about 1-2 times more packets
   than single queue, and the packets are much smaller than with single
   queue. I suspect TCP does less batching with multiqueue, so I hacked
   tcp_write_xmit() to force more batching; with that, multiqueue works as
   well as single queue for both small-packet transmission and throughput.

- I didn't include accelerated RFS for virtio-net in this series, as it
   still needs further shaping. For those interested in it, please see:
   http://www.mail-archive.com/kvm@vger.kernel.org/msg64111.html




--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [net-next RFC V4 PATCH 0/4] Multiqueue virtio-net

2012-06-26 Thread Jason Wang

On 06/26/2012 02:01 AM, Shirley Ma wrote:

Hello Jason,

Good work. Do you have local guest to guest results?

Thanks
Shirley

Hi Shirley:

I will run tests to measure the performance and post the results here.

Thanks


Re: [PATCHv2 4/5] KVM: emulator: move linearize() out of emulator code.

2012-06-26 Thread Gleb Natapov
On Mon, Jun 25, 2012 at 06:50:06PM +0300, Avi Kivity wrote:
Later we can extend x86_decode_insn() and the other
   functions to follow the same rule.
   
   What rule? We cannot not initialize a context. You can reduce things
   that should be initialized to minimum (getting GP registers on demand,
   etc), but still some initialization is needed since ctxt holds emulation
   state and it needs to be reset before each emulation.
  
  An alternative is to use two contexts, the base context only holds ops
  and is the parameter to all the callbacks on the non-state APIs, the
  derived context holds the state:
  
  struct x86_emulation_ctxt {
          struct x86_ops *ops;
          /* state that always needs to be initialized, preferably none */
  };
  
  struct x86_insn_ctxt {
          struct x86_emulation_ctxt em;
          /* instruction state */
  };
  
  and so we have a compile-time split between users of the state and
  non-users.
  
  I do not understand how you will divide current ctxt structure between
  those two.
  
  Where will you put those for instance: interruptibility, have_exception,
  perm_ok, only_vendor_specific_insn and how can they not be initialized
  before each instruction emulation?
 
 x86_emulate_ops::get_interruptibility()
 x86_emulate_ops::set_interruptibility()
 x86_emulate_ops::exception()
 
They do not remove the need for initialization before instruction
execution, they just move things that need to be initialized somewhere
else (to kvm_arch_vcpu likely).

 x86_decode_insn(struct x86_insn_ctxt *ctxt, unsigned flags)
 {
         ctxt->flags = flags;
         ctxt->perm_ok = false;
 }
 
 In short, instruction emulation state is only seen by instruction
 emulation functions, the others don't get to see it.
 
So you want to divide emulator.c into two types of functions: those without
side effects, that do some kind of calculation on vcpu state according
to weird x86 rules, and those that change vcpu state and write it back
eventually. I do not really see the justification for that complication.
emulator.c is complicated enough already, and the line between the two may
be blurred.

If you dislike the linearize() callback so much, I can make
kvm_linearize_address() do its calculation based on its parameters only.
It is almost there; only cpl and seg base/desc are missing from the
parameter list. I can put it into a header, and x86.c/emulator.c will both
be able to use it.
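
For illustration, a parameter-only linearization helper could look roughly like the sketch below. This is not the actual kvm_linearize_address() code: the struct, its field names, and the function name are hypothetical, and the checks are deliberately simplified (expand-up segments only, no cpl or access-type checks, no alignment checks).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs; none of these names come from the KVM sources. */
struct lin_params {
	uint64_t seg_base;   /* base of the segment used for the access */
	uint32_t seg_limit;  /* byte-granular limit, expand-up segment assumed */
	bool     long_mode;  /* true when the vcpu executes 64-bit code */
};

/*
 * Compute the linear address for an access of `size` bytes at effective
 * address `ea`, using only the passed-in parameters (no vcpu or emulator
 * context). Returns false when the access would fault.
 */
static bool linearize_sketch(const struct lin_params *p, uint64_t ea,
			     unsigned int size, uint64_t *linear)
{
	uint64_t la = p->seg_base + ea;

	if (size == 0)
		return false;

	if (p->long_mode) {
		/* 64-bit mode: no limit check, but bits 63..47 must all match. */
		uint64_t top = la >> 47;

		if (top != 0 && top != 0x1ffff)
			return false;
	} else {
		/* Outside long mode: the whole access must fit below the limit. */
		if (ea > p->seg_limit || size - 1 > p->seg_limit - ea)
			return false;
		la &= 0xffffffffu;  /* linear addresses are 32 bits wide here */
	}

	*linear = la;
	return true;
}
```

Because the helper depends only on its arguments, both x86.c and emulator.c could call such a function without sharing any emulation state.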

--
Gleb.


Re: [PATCHv2 4/5] KVM: emulator: move linearize() out of emulator code.

2012-06-26 Thread Avi Kivity
On 06/26/2012 11:30 AM, Gleb Natapov wrote:
  
  Where will you put those for instance: interruptibility, have_exception,
  perm_ok, only_vendor_specific_insn and how can they not be initialized
  before each instruction emulation?
 
 x86_emulate_ops::get_interruptibility()
 x86_emulate_ops::set_interruptibility()
 x86_emulate_ops::exception()
 
 They do not remove the need for initialization before instruction
 execution, they just move things that need to be initialized somewhere
 else (to kvm_arch_vcpu likely).
 
 x86_decode_insn(struct x86_insn_ctxt *ctxt, unsigned flags)
 {
         ctxt->flags = flags;
         ctxt->perm_ok = false;
 }
 
 In short, instruction emulation state is only seen by instruction
 emulation functions, the others don't get to see it.
 
 So you want to divide emulator.c into two types of functions: those without
 side effects, that do some kind of calculation on vcpu state according
 to weird x86 rules, and those that change vcpu state and write it back
 eventually. I do not really see the justification for that complication.
 emulator.c is complicated enough already, and the line between the two may
 be blurred.

Really, the only issue is that the read/write callbacks sometimes cannot
return a result.  Otherwise the entire thing would be stateless.

 If you dislike the linearize() callback so much, I can make
 kvm_linearize_address() do its calculation based on its parameters only.
 It is almost there; only cpl and seg base/desc are missing from the
 parameter list. I can put it into a header, and x86.c/emulator.c will both
 be able to use it.

And all the stack mask and stuff?  Yuck.


-- 
error compiling committee.c: too many arguments to function




[PATCH 0/1] vhost, use_mm and KERNEL_DS

2012-06-26 Thread Christian Borntraeger
Folks,

here is a patch that fixes vhost to use USER_DS before
doing a use_mm/usercopy operation. This was found during
vhost prototyping on s390, where we have separate user and kernel
address spaces.


Jens Freimann (1):
  use USER_DS in vhost_worker thread

 drivers/vhost/vhost.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)



Re: [PATCH 1/1] use USER_DS in vhost_worker thread

2012-06-26 Thread Michael S. Tsirkin
On Tue, Jun 26, 2012 at 12:59:58PM +0200, Christian Borntraeger wrote:
 From: Jens Freimann jf...@linux.vnet.ibm.com
 
 On some architectures address spaces are set up in a way that this is
 not necessary to work properly but on some others (like s390) it is.
 Make sure we operate on the user address space to allow copy_xxx_user()
 from the vhost_worker() thread by setting it explicitly before calling
 use_mm() and revert it after unuse_mm().
 
 Signed-off-by: Jens Freimann jf...@linux.vnet.ibm.com
 Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com

Acked-by: Michael S. Tsirkin m...@redhat.com

Dave, can you queue this up for 3.5 please?

Thanks.

 ---
  drivers/vhost/vhost.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index 94dbd25..112156f 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -191,7 +191,9 @@ static int vhost_worker(void *data)
   struct vhost_dev *dev = data;
   struct vhost_work *work = NULL;
   unsigned uninitialized_var(seq);
 + mm_segment_t oldfs = get_fs();
  
 + set_fs(USER_DS);
   use_mm(dev->mm);
  
   for (;;) {
 @@ -229,6 +231,7 @@ static int vhost_worker(void *data)
  
   }
   unuse_mm(dev->mm);
 + set_fs(oldfs);
   return 0;
  }
  
 -- 
 1.7.0.4


grub is stuck in grub_biosdisk_check_int13_extensions() with qemu-kvm

2012-06-26 Thread Alexander Lyakas
Greetings everybody,

We are running qemu-kvm-0.14.0 on stock ubuntu-natty 2.6.38-8, and
booting stock ubuntu-precise image (3.2.0-25-generic), as a guest.
Occasionally, grub gets stuck before it reaches the OS selection menu.
After adding prints to grub, we discovered that it gets stuck in
grub_biosdisk_check_int13_extensions(), which does:

  struct grub_bios_int_registers regs;
  regs.edx = drive & 0xff;
  regs.eax = 0x4100;
  regs.ebx = 0x55aa;
  regs.flags = GRUB_CPU_INT_FLAGS_DEFAULT;
  grub_bios_interrupt (0x13, &regs);

We have confirmed that drive=0x80 at this point.

Also, during normal boots we see that this function always returns the
same values: flags:0x3202 eax:0x3000 ebx:0xaa55 ecx:0x7. This is in
line with code in extboot.S: int13_handler() and
check_if_extensions_present().
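
For reference, the values seen on a normal boot can be decoded with the usual interpretation of the INT 13h AH=41h interface (carry flag clear, BX swapped to 0xaa55, AH holding the version, CX holding the feature bitmap). The sketch below is a standalone illustration of that decoding; the struct and function names are hypothetical and this is not the grub source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register values after INT 13h, AH=41h (hypothetical container type). */
struct int13_result {
	uint32_t flags;
	uint32_t eax;
	uint32_t ebx;
	uint32_t ecx;
};

#define FLAGS_CARRY 0x1u  /* x86 EFLAGS.CF is bit 0 */

/* True when the BIOS reports usable INT 13h extensions for the drive. */
static bool int13_extensions_present(const struct int13_result *r)
{
	if (r->flags & FLAGS_CARRY)
		return false;  /* CF set: the call failed */
	if ((r->ebx & 0xffff) != 0xaa55)
		return false;  /* signature was not swapped to 0xaa55 */
	return (r->ecx & 0x1) != 0;  /* bit 0: extended disk access (AH=42h etc.) */
}

/* AH carries the version of the extensions. */
static unsigned int int13_extensions_version(const struct int13_result *r)
{
	return (r->eax >> 8) & 0xff;
}
```

With the values reported above (flags 0x3202, eax 0x3000, ebx 0xaa55, ecx 0x7), the carry flag is clear, the signature matches, all three feature bits are set, and the version byte is 0x30.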

Can anybody advise on how to debug this problem further?

I am also attaching a screenshot of a stuck grub.

Thanks,
Alex.
attachment: grub_hang_with_drive.png

[PATCH 5/6] KVM: s390: use sigp condition code defines

2012-06-26 Thread Cornelia Huck
From: Heiko Carstens heiko.carst...@de.ibm.com

Just use the defines instead of using plain numbers and adding
a comment behind each line.

Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/kvm/sigp.c |   58 +-
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
index ca544d5..97c9f36 100644
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -26,19 +26,19 @@ static int __sigp_sense(struct kvm_vcpu *vcpu, u16 cpu_addr,
 	int rc;
 
 	if (cpu_addr >= KVM_MAX_VCPUS)
-		return 3; /* not operational */
+		return SIGP_CC_NOT_OPERATIONAL;
 
 	spin_lock(&fi->lock);
 	if (fi->local_int[cpu_addr] == NULL)
-		rc = 3; /* not operational */
+		rc = SIGP_CC_NOT_OPERATIONAL;
 	else if (!(atomic_read(fi->local_int[cpu_addr]->cpuflags)
 		   & CPUSTAT_STOPPED)) {
 		*reg &= 0xffffffff00000000UL;
-		rc = 1; /* status stored */
+		rc = SIGP_CC_STATUS_STORED;
 	} else {
 		*reg &= 0xffffffff00000000UL;
 		*reg |= SIGP_STATUS_STOPPED;
-		rc = 1; /* status stored */
+		rc = SIGP_CC_STATUS_STORED;
 	}
 	spin_unlock(&fi->lock);
 
@@ -54,7 +54,7 @@ static int __sigp_emergency(struct kvm_vcpu *vcpu, u16 cpu_addr)
 	int rc;
 
 	if (cpu_addr >= KVM_MAX_VCPUS)
-		return 3; /* not operational */
+		return SIGP_CC_NOT_OPERATIONAL;
 
 	inti = kzalloc(sizeof(*inti), GFP_KERNEL);
 	if (!inti)
@@ -66,7 +66,7 @@ static int __sigp_emergency(struct kvm_vcpu *vcpu, u16 cpu_addr)
 	spin_lock(&fi->lock);
 	li = fi->local_int[cpu_addr];
 	if (li == NULL) {
-		rc = 3; /* not operational */
+		rc = SIGP_CC_NOT_OPERATIONAL;
 		kfree(inti);
 		goto unlock;
 	}
@@ -77,7 +77,7 @@ static int __sigp_emergency(struct kvm_vcpu *vcpu, u16 cpu_addr)
 	if (waitqueue_active(&li->wq))
 		wake_up_interruptible(&li->wq);
 	spin_unlock_bh(&li->lock);
-	rc = 0; /* order accepted */
+	rc = SIGP_CC_ORDER_CODE_ACCEPTED;
 	VCPU_EVENT(vcpu, 4, "sent sigp emerg to cpu %x", cpu_addr);
 unlock:
 	spin_unlock(&fi->lock);
@@ -92,7 +92,7 @@ static int __sigp_external_call(struct kvm_vcpu *vcpu, u16 cpu_addr)
 	int rc;
 
 	if (cpu_addr >= KVM_MAX_VCPUS)
-		return 3; /* not operational */
+		return SIGP_CC_NOT_OPERATIONAL;
 
 	inti = kzalloc(sizeof(*inti), GFP_KERNEL);
 	if (!inti)
@@ -104,7 +104,7 @@ static int __sigp_external_call(struct kvm_vcpu *vcpu, u16 cpu_addr)
 	spin_lock(&fi->lock);
 	li = fi->local_int[cpu_addr];
 	if (li == NULL) {
-		rc = 3; /* not operational */
+		rc = SIGP_CC_NOT_OPERATIONAL;
 		kfree(inti);
 		goto unlock;
 	}
@@ -115,7 +115,7 @@ static int __sigp_external_call(struct kvm_vcpu *vcpu, u16 cpu_addr)
 	if (waitqueue_active(&li->wq))
 		wake_up_interruptible(&li->wq);
 	spin_unlock_bh(&li->lock);
-	rc = 0; /* order accepted */
+	rc = SIGP_CC_ORDER_CODE_ACCEPTED;
 	VCPU_EVENT(vcpu, 4, "sent sigp ext call to cpu %x", cpu_addr);
 unlock:
 	spin_unlock(&fi->lock);
@@ -143,7 +143,7 @@ static int __inject_sigp_stop(struct kvm_s390_local_interrupt *li, int action)
 out:
 	spin_unlock_bh(&li->lock);
 
-	return 0; /* order accepted */
+	return SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
 static int __sigp_stop(struct kvm_vcpu *vcpu, u16 cpu_addr, int action)
@@ -153,12 +153,12 @@ static int __sigp_stop(struct kvm_vcpu *vcpu, u16 cpu_addr, int action)
 	int rc;
 
 	if (cpu_addr >= KVM_MAX_VCPUS)
-		return 3; /* not operational */
+		return SIGP_CC_NOT_OPERATIONAL;
 
 	spin_lock(&fi->lock);
 	li = fi->local_int[cpu_addr];
 	if (li == NULL) {
-		rc = 3; /* not operational */
+		rc = SIGP_CC_NOT_OPERATIONAL;
 		goto unlock;
 	}
 
@@ -182,11 +182,11 @@ static int __sigp_set_arch(struct kvm_vcpu *vcpu, u32 parameter)
 
 	switch (parameter & 0xff) {
 	case 0:
-		rc = 3; /* not operational */
+		rc = SIGP_CC_NOT_OPERATIONAL;
 		break;
 	case 1:
 	case 2:
-		rc = 0; /* order accepted */
+		rc = SIGP_CC_ORDER_CODE_ACCEPTED;
 		break;
 	default:
 		rc = -EOPNOTSUPP;
@@ -209,12 +209,12 @@ static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 cpu_addr, u32 address,
 	   copy_from_guest_absolute(vcpu, &tmp, address + PAGE_SIZE, 1)) {
 		*reg &= 0xffffffff00000000UL;
 		*reg |= SIGP_STATUS_INVALID_PARAMETER;
-   

[PATCH 4/6] KVM: s390: fix sigp set prefix status stored cases

2012-06-26 Thread Cornelia Huck
From: Heiko Carstens heiko.carst...@de.ibm.com

If an invalid parameter is passed or the addressed cpu is in an
incorrect state, sigp set prefix will store a status.
This status must only have bits set as defined by the architecture.
The current kvm implementation failed to clear bits and also did
not set the intended status bit (it used an AND instead of an OR operation).
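
The difference between the two operations can be seen in a small standalone sketch. The function names are hypothetical, and the status value 0x200 is assumed to match SIGP_STATUS_INCORRECT_STATE from the sigp.h header in this series; the register is modelled as a plain 64-bit integer.

```c
#include <stdint.h>

/* Assumed to match SIGP_STATUS_INCORRECT_STATE in the series' sigp.h. */
#define STATUS_INCORRECT_STATE 0x00000200ULL

/* Old behaviour: AND can only clear bits, so the status bit is never set. */
static uint64_t store_status_and(uint64_t reg, uint64_t status)
{
	return reg & status;
}

/* Fixed behaviour: clear the low 32 status bits, then OR in the status. */
static uint64_t store_status_or(uint64_t reg, uint64_t status)
{
	reg &= 0xffffffff00000000ULL;
	reg |= status;
	return reg;
}
```

Starting from a register that holds unrelated bits, the AND variant yields a value without the intended status bit, while the clear-then-OR variant keeps the upper half and reports exactly one status bit.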

Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/kvm/sigp.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
index caccc0e..ca544d5 100644
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -207,6 +207,7 @@ static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 cpu_addr, u32 address,
 	address = address & 0x7fffe000u;
 	if (copy_from_guest_absolute(vcpu, &tmp, address, 1) ||
 	   copy_from_guest_absolute(vcpu, &tmp, address + PAGE_SIZE, 1)) {
+		*reg &= 0xffffffff00000000UL;
 		*reg |= SIGP_STATUS_INVALID_PARAMETER;
 		return 1; /* invalid parameter */
 	}
@@ -220,8 +221,9 @@ static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 cpu_addr, u32 address,
 	li = fi->local_int[cpu_addr];
 
 	if (li == NULL) {
+		*reg &= 0xffffffff00000000UL;
+		*reg |= SIGP_STATUS_INCORRECT_STATE;
 		rc = 1; /* incorrect state */
-		*reg &= SIGP_STATUS_INCORRECT_STATE;
 		kfree(inti);
 		goto out_fi;
 	}
@@ -229,8 +231,9 @@ static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 cpu_addr, u32 address,
 	spin_lock_bh(&li->lock);
 	/* cpu must be in stopped state */
 	if (!(atomic_read(li->cpuflags) & CPUSTAT_STOPPED)) {
+		*reg &= 0xffffffff00000000UL;
+		*reg |= SIGP_STATUS_INCORRECT_STATE;
 		rc = 1; /* incorrect state */
-		*reg &= SIGP_STATUS_INCORRECT_STATE;
 		kfree(inti);
 		goto out_li;
 	}
-- 
1.7.10.4



[PATCH 0/6] kvm/s390: sigp related changes for 3.6

2012-06-26 Thread Cornelia Huck
Avi, Marcelo,

here are some more s390 patches for the next release.

Patches 1 and 2 are included for dependency reasons; they will also
be sent through Martin's s390 tree.

The other patches fix several problems in our sigp handling code
and make it nicer to read.

Cornelia Huck (1):
  KVM: s390: Fix sigp sense handling.

Heiko Carstens (5):
  s390/smp: remove redundant check
  s390/smp/kvm: unify sigp definitions
  KVM: s390: fix sigp sense running condition code handling
  KVM: s390: fix sigp set prefix status stored cases
  KVM: s390: use sigp condition code defines

 arch/s390/include/asm/sigp.h |   32 
 arch/s390/kernel/smp.c   |   76 ++-
 arch/s390/kvm/sigp.c |  117 +-
 3 files changed, 106 insertions(+), 119 deletions(-)
 create mode 100644 arch/s390/include/asm/sigp.h

-- 
1.7.10.4



[PATCH 2/6] s390/smp/kvm: unify sigp definitions

2012-06-26 Thread Cornelia Huck
From: Heiko Carstens heiko.carst...@de.ibm.com

The smp and the kvm code have different defines for the sigp order codes.
Let's just have a single place where these are defined.
Also move the sigp condition code and sigp cpu status bits to the new
sigp.h header file.

Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
Signed-off-by: Martin Schwidefsky schwidef...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/include/asm/sigp.h |   31 ++
 arch/s390/kernel/smp.c   |   72 ++
 arch/s390/kvm/sigp.c |   46 ++-
 3 files changed, 64 insertions(+), 85 deletions(-)
 create mode 100644 arch/s390/include/asm/sigp.h

diff --git a/arch/s390/include/asm/sigp.h b/arch/s390/include/asm/sigp.h
new file mode 100644
index 0000000..7306270
--- /dev/null
+++ b/arch/s390/include/asm/sigp.h
@@ -0,0 +1,31 @@
+#ifndef __S390_ASM_SIGP_H
+#define __S390_ASM_SIGP_H
+
+/* SIGP order codes */
+#define SIGP_SENSE			1
+#define SIGP_EXTERNAL_CALL		2
+#define SIGP_EMERGENCY_SIGNAL		3
+#define SIGP_STOP			5
+#define SIGP_RESTART			6
+#define SIGP_STOP_AND_STORE_STATUS	9
+#define SIGP_INITIAL_CPU_RESET		11
+#define SIGP_SET_PREFIX			13
+#define SIGP_STORE_STATUS_AT_ADDRESS	14
+#define SIGP_SET_ARCHITECTURE		18
+#define SIGP_SENSE_RUNNING		21
+
+/* SIGP condition codes */
+#define SIGP_CC_ORDER_CODE_ACCEPTED	0
+#define SIGP_CC_STATUS_STORED		1
+#define SIGP_CC_BUSY			2
+#define SIGP_CC_NOT_OPERATIONAL		3
+
+/* SIGP cpu status bits */
+
+#define SIGP_STATUS_CHECK_STOP		0x00000010UL
+#define SIGP_STATUS_STOPPED		0x00000040UL
+#define SIGP_STATUS_INVALID_PARAMETER	0x00000100UL
+#define SIGP_STATUS_INCORRECT_STATE	0x00000200UL
+#define SIGP_STATUS_NOT_RUNNING		0x00000400UL
+
+#endif /* __S390_ASM_SIGP_H */
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index c78074c..6e4047e 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -44,34 +44,10 @@
 #include <asm/vdso.h>
 #include <asm/debug.h>
 #include <asm/os_info.h>
+#include <asm/sigp.h>
 #include "entry.h"
 
 enum {
-	sigp_sense = 1,
-	sigp_external_call = 2,
-	sigp_emergency_signal = 3,
-	sigp_start = 4,
-	sigp_stop = 5,
-	sigp_restart = 6,
-	sigp_stop_and_store_status = 9,
-	sigp_initial_cpu_reset = 11,
-	sigp_cpu_reset = 12,
-	sigp_set_prefix = 13,
-	sigp_store_status_at_address = 14,
-	sigp_store_extended_status_at_address = 15,
-	sigp_set_architecture = 18,
-	sigp_conditional_emergency_signal = 19,
-	sigp_sense_running = 21,
-};
-
-enum {
-	sigp_order_code_accepted = 0,
-	sigp_status_stored = 1,
-	sigp_busy = 2,
-	sigp_not_operational = 3,
-};
-
-enum {
 	ec_schedule = 0,
 	ec_call_function,
 	ec_call_function_single,
@@ -124,7 +100,7 @@ static inline int __pcpu_sigp_relax(u16 addr, u8 order, u32 parm, u32 *status)
 
 	while (1) {
 		cc = __pcpu_sigp(addr, order, parm, status);
-		if (cc != sigp_busy)
+		if (cc != SIGP_CC_BUSY)
 			return cc;
 		cpu_relax();
 	}
@@ -136,7 +112,7 @@ static int pcpu_sigp_retry(struct pcpu *pcpu, u8 order, u32 parm)
 
 	for (retry = 0; ; retry++) {
 		cc = __pcpu_sigp(pcpu->address, order, parm, &pcpu->status);
-		if (cc != sigp_busy)
+		if (cc != SIGP_CC_BUSY)
 			break;
 		if (retry >= 3)
 			udelay(10);
@@ -146,8 +122,8 @@ static int pcpu_sigp_retry(struct pcpu *pcpu, u8 order, u32 parm)
 
 static inline int pcpu_stopped(struct pcpu *pcpu)
 {
-	if (__pcpu_sigp(pcpu->address, sigp_sense,
-			0, &pcpu->status) != sigp_status_stored)
+	if (__pcpu_sigp(pcpu->address, SIGP_SENSE,
+			0, &pcpu->status) != SIGP_CC_STATUS_STORED)
 		return 0;
 	/* Check for stopped and check stop state */
 	return !!(pcpu->status & 0x50);
@@ -155,8 +131,8 @@ static inline int pcpu_stopped(struct pcpu *pcpu)
 
 static inline int pcpu_running(struct pcpu *pcpu)
 {
-	if (__pcpu_sigp(pcpu->address, sigp_sense_running,
-			0, &pcpu->status) != sigp_status_stored)
+	if (__pcpu_sigp(pcpu->address, SIGP_SENSE_RUNNING,
+			0, &pcpu->status) != SIGP_CC_STATUS_STORED)
 		return 1;
 	/* Status stored condition code is equivalent to cpu not running. */
 	return 0;
@@ -181,7 +157,7 @@ static void pcpu_ec_call(struct pcpu *pcpu, int ec_bit)
 
 	set_bit(ec_bit, &pcpu->ec_mask);
 	order = pcpu_running(pcpu) ?
-		sigp_external_call : sigp_emergency_signal;
+		SIGP_EXTERNAL_CALL : SIGP_EMERGENCY_SIGNAL;

[PATCH 3/6] KVM: s390: fix sigp sense running condition code handling

2012-06-26 Thread Cornelia Huck
From: Heiko Carstens heiko.carst...@de.ibm.com

A status is stored only if the sensed cpu is not running, which
is reflected by condition code 1. If the cpu is running, condition
code 0 should be returned.
This is just the opposite of what the code is doing.
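
As a standalone illustration of the intended mapping (not the kvm code itself), the sketch below models the register as a 64-bit integer. The function name is hypothetical, and the status value 0x400 is assumed to match SIGP_STATUS_NOT_RUNNING from the sigp.h header in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match SIGP_STATUS_NOT_RUNNING in the series' sigp.h. */
#define STATUS_NOT_RUNNING 0x00000400ULL

/*
 * Returns the condition code for sigp sense running:
 *   0 - cpu is running, nothing is stored
 *   1 - cpu is not running, the "not running" status is stored in *reg
 */
static int sense_running_cc(bool running, uint64_t *reg)
{
	if (running)
		return 0;

	*reg &= 0xffffffff00000000ULL;  /* clear the low 32 status bits */
	*reg |= STATUS_NOT_RUNNING;
	return 1;
}
```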

Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/kvm/sigp.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
index fda1d64..caccc0e 100644
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -268,12 +268,12 @@ static int __sigp_sense_running(struct kvm_vcpu *vcpu, u16 cpu_addr,
 		if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
 		    & CPUSTAT_RUNNING) {
 			/* running */
-			rc = 1;
+			rc = 0;
 		} else {
 			/* not running */
 			*reg &= 0xffffffff00000000UL;
 			*reg |= SIGP_STATUS_NOT_RUNNING;
-			rc = 0;
+			rc = 1;
 		}
 	}
 	spin_unlock(&fi->lock);
-- 
1.7.10.4



[PATCH 1/6] s390/smp: remove redundant check

2012-06-26 Thread Cornelia Huck
From: Heiko Carstens heiko.carst...@de.ibm.com

The condition code "status stored" for sigp sense running always implies
that only the "not running" status bit is set. Therefore there is no need
to check whether it is set.

Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
Signed-off-by: Martin Schwidefsky schwidef...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/kernel/smp.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 15cca26..c78074c 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -158,8 +158,8 @@ static inline int pcpu_running(struct pcpu *pcpu)
 	if (__pcpu_sigp(pcpu->address, sigp_sense_running,
 			0, &pcpu->status) != sigp_status_stored)
 		return 1;
-	/* Check for running status */
-	return !(pcpu->status & 0x400);
+	/* Status stored condition code is equivalent to cpu not running. */
+	return 0;
 }
 
 
 /*
-- 
1.7.10.4



[PATCH 6/6] KVM: s390: Fix sigp sense handling.

2012-06-26 Thread Cornelia Huck
If sigp sense doesn't have any status bits to report, it should set
cc 0 and leave the register as-is.

Since we know about the external call pending bit, we should report
it if it is set as well.
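
The behaviour described above can be sketched outside the kernel as follows. The flag bits and names are hypothetical stand-ins for the CPUSTAT_* values, and the two status values are assumed to match the sigp.h definitions in this series.

```c
#include <stdint.h>

/* Hypothetical flag bits; the real CPUSTAT_* values live in kernel headers. */
#define FLAG_STOPPED     0x1u
#define FLAG_ECALL_PEND  0x2u

/* Assumed to match the SIGP_STATUS_* values in the series' sigp.h. */
#define STATUS_STOPPED           0x00000040ULL
#define STATUS_EXT_CALL_PENDING  0x00000080ULL

/*
 * Returns condition code 0 and leaves *reg untouched when there is nothing
 * to report; otherwise stores the status bits and returns condition code 1.
 */
static int sigp_sense_sketch(unsigned int cpuflags, uint64_t *reg)
{
	if (!(cpuflags & (FLAG_STOPPED | FLAG_ECALL_PEND)))
		return 0;

	*reg &= 0xffffffff00000000ULL;  /* clear the low 32 status bits */
	if (cpuflags & FLAG_ECALL_PEND)
		*reg |= STATUS_EXT_CALL_PENDING;
	if (cpuflags & FLAG_STOPPED)
		*reg |= STATUS_STOPPED;
	return 1;
}
```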

Acked-by: Heiko Carstens heiko.carst...@de.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/include/asm/sigp.h |1 +
 arch/s390/kvm/sigp.c |   14 +-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/s390/include/asm/sigp.h b/arch/s390/include/asm/sigp.h
index 7306270..5a87d16 100644
--- a/arch/s390/include/asm/sigp.h
+++ b/arch/s390/include/asm/sigp.h
@@ -24,6 +24,7 @@
 
 #define SIGP_STATUS_CHECK_STOP		0x00000010UL
 #define SIGP_STATUS_STOPPED		0x00000040UL
+#define SIGP_STATUS_EXT_CALL_PENDING	0x00000080UL
 #define SIGP_STATUS_INVALID_PARAMETER	0x00000100UL
 #define SIGP_STATUS_INCORRECT_STATE	0x00000200UL
 #define SIGP_STATUS_NOT_RUNNING		0x00000400UL
diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
index 97c9f36..6ed8175 100644
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -32,12 +32,16 @@ static int __sigp_sense(struct kvm_vcpu *vcpu, u16 cpu_addr,
 	if (fi->local_int[cpu_addr] == NULL)
 		rc = SIGP_CC_NOT_OPERATIONAL;
 	else if (!(atomic_read(fi->local_int[cpu_addr]->cpuflags)
-		   & CPUSTAT_STOPPED)) {
-		*reg &= 0xffffffff00000000UL;
-		rc = SIGP_CC_STATUS_STORED;
-	} else {
+		   & (CPUSTAT_ECALL_PEND | CPUSTAT_STOPPED)))
+		rc = SIGP_CC_ORDER_CODE_ACCEPTED;
+	else {
 		*reg &= 0xffffffff00000000UL;
-		*reg |= SIGP_STATUS_STOPPED;
+		if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
+		    & CPUSTAT_ECALL_PEND)
+			*reg |= SIGP_STATUS_EXT_CALL_PENDING;
+		if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
+		    & CPUSTAT_STOPPED)
+			*reg |= SIGP_STATUS_STOPPED;
 		rc = SIGP_CC_STATUS_STORED;
 	}
 	spin_unlock(&fi->lock);
-- 
1.7.10.4



Re: [PATCH 3/6] KVM: s390: fix sigp sense running condition code handling

2012-06-26 Thread Alexander Graf

On 26.06.2012, at 16:06, Cornelia Huck wrote:

 From: Heiko Carstens heiko.carst...@de.ibm.com
 
 Only if the sensed cpu is not running a status is stored, which
 is reflected by condition code 1. If the cpu is running, condition
 code 0 should be returned.
 Just the opposite of what the code is doing.
 
 Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
 Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com

Yikes. Is this a stable candidate?


Alex




Re: [PATCH 4/6] KVM: s390: fix sigp set prefix status stored cases

2012-06-26 Thread Alexander Graf

On 26.06.2012, at 16:06, Cornelia Huck wrote:

 From: Heiko Carstens heiko.carst...@de.ibm.com
 
 If an invalid parameter is passed or the addressed cpu is in an
 incorrect state sigp set prefix will store a status.
 This status must only have bits set as defined by the architecture.
 The current kvm implementation missed to clear bits and also did
 not set the intended status bit (and instead of or operation).
 
 Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com

What was the net effect of this for a guest? Any problems arising from it?


Alex




Re: qemu-kvm-1.0 crashes with threaded vnc server?

2012-06-26 Thread Peter Lieven

On 13.03.2012 16:06, Alexander Graf wrote:

On 13.03.2012, at 16:05, Corentin Chary wrote:


On Tue, Mar 13, 2012 at 12:29 PM, Peter Lieven p...@dlh.net wrote:

On 11.02.2012 09:55, Corentin Chary wrote:

On Thu, Feb 9, 2012 at 7:08 PM, Peter Lieven p...@dlh.net wrote:

Hi,

is anyone aware if there are still problems when enabling the threaded vnc
server?
I saw some VMs crashing when using a qemu-kvm build with
--enable-vnc-thread.

qemu-kvm-1.0[22646]: segfault at 0 ip 7fec1ca7ea0b sp 7fec19d056d0 error 6 in libz.so.1.2.3.3[7fec1ca75000+16000]
qemu-kvm-1.0[26056]: segfault at 7f06d8d6e010 ip 7f06e0a30d71 sp 7f06df035748 error 6 in libc-2.11.1.so[7f06e09aa000+17a000]

I had no time to debug further. It seems to happen shortly after migrating,
but that's uncertain. At least the segfault in libz seems to
hint at VNC, since I cannot imagine any other part of qemu-kvm
using libz except for the VNC server.

Thanks,
Peter



Hi Peter,
I found two patches on my git tree that I sent long ago but that somehow
got lost on the mailing list. I rebased the tree but did not have the
time (yet) to test them.
http://git.iksaif.net/?p=qemu.git;a=shortlog;h=refs/heads/wip
Feel free to try them. If QEMU segfaults again, please send a full gdb
backtrace / valgrind trace / way to reproduce :).
Thanks,


I have seen no more crashes with these two patches applied. I would suggest
pushing them to the master repository.

Thank you,
Peter


Ccing Alexander,

Ah, cool. Corentin, I think you're right now the closest thing we have to a 
maintainer for VNC. Could you please just send out a pull request for those?

hi all,

i suspect there is still a problem with the threaded vnc server. it's
just a guess, but we saw a reasonable number of vms hanging in the
last weeks. hanging means the emulation is stopped and the qemu-kvm
process no longer reacts, not on monitor, not on vnc, not on qmp.
the reason i suspect the threaded vnc server is that in all cases we have
analyzed, this happened with an open vnc session and only on nodes with
the threaded vnc server enabled. it might also be the case that this
happens at a resolution change. is anything known about this, or does
someone have an idea?


we are running qemu-kvm 1.0.1 with

  vnc: don't mess up with iohandlers in the vnc thread

  vnc: Limit r/w access to size of allocated memory

compiled in.

unfortunately, i was not yet able to reproduce this with a debugger 
attached.


thanks,
peter



Alex



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 42782] IO_PAGE_FAULT while starting xorg

2012-06-26 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42782





--- Comment #11 from Reartes Guillermo rtgui...@gmail.com  2012-06-26 15:11:16 ---
I booted with the iommu=pt kernel parameter and I stopped getting these messages.

I am experimenting with PCIe KVM pass-through, so I ended up using that parameter.
Sadly the whole host freezes when I power on the guest, but that is outside the
scope of this bug report.

 In my case, it only happens at POWER-ON, not when doing a RESET.
 So to reproduce it one must POWER-CYCLE.

I think that was actually not 100% accurate.



# dmesg | grep -i amd-vi
[1.774021] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: 3e info 1300
[1.774024] AMD-Vi:mmio-addr: feb2
[1.774259] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:00.0 flags: 00
[1.774262] AMD-Vi:   DEV_RANGE_END   devid: 00:00.2
[1.774264] AMD-Vi:   DEV_SELECT  devid: 00:02.0 flags: 00
[1.774266] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 01:00.0 flags: 00
[1.774268] AMD-Vi:   DEV_RANGE_END   devid: 01:00.1
[1.774270] AMD-Vi:   DEV_SELECT  devid: 00:04.0 flags: 00
[1.774272] AMD-Vi:   DEV_SELECT  devid: 02:00.0 flags: 00
[1.774274] AMD-Vi:   DEV_SELECT  devid: 00:05.0 flags: 00
[1.774275] AMD-Vi:   DEV_SELECT  devid: 03:00.0 flags: 00
[1.774277] AMD-Vi:   DEV_SELECT  devid: 00:06.0 flags: 00
[1.774279] AMD-Vi:   DEV_SELECT  devid: 04:00.0 flags: 00
[1.774281] AMD-Vi:   DEV_SELECT  devid: 00:09.0 flags: 00
[1.774283] AMD-Vi:   DEV_SELECT  devid: 05:00.0 flags: 00
[1.774284] AMD-Vi:   DEV_SELECT  devid: 00:11.0 flags: 00
[1.774286] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:12.0 flags: 00
[1.774288] AMD-Vi:   DEV_RANGE_END   devid: 00:12.2
[1.774290] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:13.0 flags: 00
[1.774292] AMD-Vi:   DEV_RANGE_END   devid: 00:13.2
[1.774294] AMD-Vi:   DEV_SELECT  devid: 00:14.0 flags: d7
[1.774296] AMD-Vi:   DEV_SELECT  devid: 00:14.3 flags: 00
[1.774297] AMD-Vi:   DEV_SELECT  devid: 00:14.4 flags: 00
[1.774300] AMD-Vi:   DEV_ALIAS_RANGE devid: 06:00.0 flags: 00 devid_to: 00:14.4
[1.774302] AMD-Vi:   DEV_RANGE_END   devid: 06:1f.7
[1.774311] AMD-Vi:   DEV_SELECT  devid: 00:14.5 flags: 00
[1.774313] AMD-Vi:   DEV_SELECT  devid: 00:15.0 flags: 00
[1.774315] AMD-Vi:   DEV_SELECT  devid: 07:00.0 flags: 00
[1.774317] AMD-Vi:   DEV_SELECT  devid: 00:15.1 flags: 00
[1.774319] AMD-Vi:   DEV_SELECT  devid: 08:00.0 flags: 00
[1.774320] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:16.0 flags: 00
[1.774322] AMD-Vi:   DEV_RANGE_END   devid: 00:16.2
[1.774428] AMD-Vi: Enabling IOMMU at :00:00.2 cap 0x40
[1.827755] AMD-Vi: Initialized for Passthrough Mode



Re: [PATCH 3/6] KVM: s390: fix sigp sense running condition code handling

2012-06-26 Thread Cornelia Huck
On Tue, 26 Jun 2012 16:52:56 +0200
Alexander Graf ag...@suse.de wrote:

 
 On 26.06.2012, at 16:06, Cornelia Huck wrote:
 
  From: Heiko Carstens heiko.carst...@de.ibm.com
  
  Only if the sensed cpu is not running a status is stored, which
  is reflected by condition code 1. If the cpu is running, condition
  code 0 should be returned.
  Just the opposite of what the code is doing.
  
  Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
  Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
  Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 
 Yikes. Is this a stable candidate?

This code will only hit when running on a host running virtualized
itself (where sigp sense running will cause an intercept), so I doubt
many people will see the effects.

Cornelia



Re: [PATCH 3/6] KVM: s390: fix sigp sense running condition code handling

2012-06-26 Thread Alexander Graf

On 26.06.2012, at 17:33, Cornelia Huck wrote:

 On Tue, 26 Jun 2012 16:52:56 +0200
 Alexander Graf ag...@suse.de wrote:
 
 
 On 26.06.2012, at 16:06, Cornelia Huck wrote:
 
 From: Heiko Carstens heiko.carst...@de.ibm.com
 
 Only if the sensed cpu is not running a status is stored, which
 is reflected by condition code 1. If the cpu is running, condition
 code 0 should be returned.
 Just the opposite of what the code is doing.
 
 Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
 Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 
 Yikes. Is this a stable candidate?
 
 This code will only hit when running on a host running virtualized
 itself (where sigp sense running will cause an intercept), so I doubt
 many people will see the effects.

You mean this will hit when running kvm inside of a z/VM VM? That's a pretty 
valid use case.


Alex



Re: [PATCH 4/6] KVM: s390: fix sigp set prefix status stored cases

2012-06-26 Thread Cornelia Huck
On Tue, 26 Jun 2012 16:56:08 +0200
Alexander Graf ag...@suse.de wrote:

 
 On 26.06.2012, at 16:06, Cornelia Huck wrote:
 
  From: Heiko Carstens heiko.carst...@de.ibm.com
  
  If an invalid parameter is passed or the addressed cpu is in an
  incorrect state, sigp set prefix will store a status.
  This status must only have bits set as defined by the architecture.
  The current kvm implementation failed to clear bits and also did
  not set the intended status bit (it used an "and" instead of an "or"
  operation).
  
  Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
  Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 
 What was the net effect of this for a guest? Any problems rising from it?

The guest might see some unexpected status bits if it did call sigp set
prefix in an incorrect way. I'm not aware that anybody has actually
seen that.

Cornelia
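
To make the "and instead of or" remark concrete, here is a small sketch. The register layout (status in the low 32 bits), the bit value, and both function names are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative status bit (assumption), not the architected value. */
#define STATUS_INVALID_PARAMETER 0x00000100u

/* Intended behaviour: drop stale status bits, then set exactly one bit. */
static uint64_t store_status(uint64_t reg, uint32_t status_bit)
{
    assert(status_bit != 0);
    reg &= 0xffffffff00000000ULL;   /* clear the low status word */
    reg |= status_bit;              /* set the intended bit */
    return reg;
}

/*
 * The mistake described above: "and" can only keep a bit that was already
 * set, so the intended status bit is usually lost.
 */
static uint64_t store_status_with_and(uint64_t reg, uint32_t status_bit)
{
    return reg & status_bit;
}
```

With a register that holds leftover low bits, the first variant yields a clean status word while the second one yields whatever overlap happened to exist.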



Re: [PATCH 4/6] KVM: s390: fix sigp set prefix status stored cases

2012-06-26 Thread Alexander Graf


Am 26.06.2012 um 17:39 schrieb Cornelia Huck cornelia.h...@de.ibm.com:

 On Tue, 26 Jun 2012 16:56:08 +0200
 Alexander Graf ag...@suse.de wrote:
 
 
 On 26.06.2012, at 16:06, Cornelia Huck wrote:
 
 From: Heiko Carstens heiko.carst...@de.ibm.com
 
 If an invalid parameter is passed or the addressed cpu is in an
 incorrect state, sigp set prefix will store a status.
 This status must only have bits set as defined by the architecture.
 The current kvm implementation failed to clear bits and also did
 not set the intended status bit (it used an "and" instead of an "or"
 operation).
 
 Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 
 What was the net effect of this for a guest? Any problems rising from it?
 
 The guest might see some unexpected status bits if it did call sigp set
 prefix in an incorrect way. I'm not aware that anybody has actually
 seen that.

Yeah, we only need the set prefix on vm init and here it should always succeed, 
so this one is not all that urgent :).

Alex

 
 Cornelia
 


Re: [PATCH 3/6] KVM: s390: fix sigp sense running condition code handling

2012-06-26 Thread Cornelia Huck
On Tue, 26 Jun 2012 17:36:19 +0200
Alexander Graf ag...@suse.de wrote:

 
 On 26.06.2012, at 17:33, Cornelia Huck wrote:
 
  On Tue, 26 Jun 2012 16:52:56 +0200
  Alexander Graf ag...@suse.de wrote:
  
  
  On 26.06.2012, at 16:06, Cornelia Huck wrote:
  
  From: Heiko Carstens heiko.carst...@de.ibm.com
  
  Only if the sensed cpu is not running a status is stored, which
  is reflected by condition code 1. If the cpu is running, condition
  code 0 should be returned.
  Just the opposite of what the code is doing.
  
  Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
  Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
  Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
  
  Yikes. Is this a stable candidate?
  
  This code will only hit when running on a host running virtualized
  itself (where sigp sense running will cause an intercept), so I doubt
  many people will see the effects.
 
 You mean this will hit when running kvm inside of a z/VM VM? That's a pretty 
 valid use case.

I'd have thought it was a very uncommon one. But I certainly don't
object against putting the fix into stable.

Cornelia



Re: [PATCH 3/6] KVM: s390: fix sigp sense running condition code handling

2012-06-26 Thread Alexander Graf

On 26.06.2012, at 17:57, Cornelia Huck wrote:

 On Tue, 26 Jun 2012 17:36:19 +0200
 Alexander Graf ag...@suse.de wrote:
 
 
 On 26.06.2012, at 17:33, Cornelia Huck wrote:
 
 On Tue, 26 Jun 2012 16:52:56 +0200
 Alexander Graf ag...@suse.de wrote:
 
 
 On 26.06.2012, at 16:06, Cornelia Huck wrote:
 
 From: Heiko Carstens heiko.carst...@de.ibm.com
 
 Only if the sensed cpu is not running a status is stored, which
 is reflected by condition code 1. If the cpu is running, condition
 code 0 should be returned.
 Just the opposite of what the code is doing.
 
 Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
 Signed-off-by: Heiko Carstens heiko.carst...@de.ibm.com
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 
 Yikes. Is this a stable candidate?
 
 This code will only hit when running on a host running virtualized
 itself (where sigp sense running will cause an intercept), so I doubt
 many people will see the effects.
 
 You mean this will hit when running kvm inside of a z/VM VM? That's a pretty 
 valid use case.
 
 I'd have thought it was a very uncommon one. But I certainly don't
 object against putting the fix into stable.

It's not exactly useful for productive things, but just for prototyping KVM, 
people tend to not have spare LPARs lying around :)
So yes, this should go into stable.


Alex



[PATCH v2 1/3] KVM: Add new -cpu best

2012-06-26 Thread Alexander Graf
During discussions on whether to make -cpu host the default in SLE, I found
myself disagreeing with the idea, because it opens a big can of worms of
potential bugs. But if I am already so opposed to it for SLE, how
can it possibly be reasonable to default to -cpu host in upstream QEMU? And
what would a sane default look like?

So I had this idea of looping through all available CPU definitions. We can
pretty well tell if our host is able to execute any of them by checking the
respective flags and seeing if our host has all features the CPU definition
requires. With that, we can create a -cpu type that would fall back to the
best known CPU definition that our host can fulfill. On my Phenom II
system for example, that would be -cpu phenom.

With this approach we can test and verify that CPU types actually work on
any random user setup, because we can always verify that all the -cpu types
we ship actually work. And we only default to some clever mechanism that
chooses one of these.

Signed-off-by: Alexander Graf ag...@suse.de
---
 target-i386/cpu.c |   81 +
 1 files changed, 81 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index fdd95be..98cc1ec 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -558,6 +558,85 @@ static int cpu_x86_fill_host(x86_def_t *x86_cpu_def)
     return 0;
 }
 
+/* Are all guest feature bits present on the host? */
+static bool cpu_x86_feature_subset(uint32_t host, uint32_t guest)
+{
+    int i;
+
+    for (i = 0; i < 32; i++) {
+        uint32_t mask = 1 << i;
+        if ((guest & mask) && !(host & mask)) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
+/* Does the host support all the features of the CPU definition? */
+static bool cpu_x86_fits_host(x86_def_t *x86_cpu_def)
+{
+    uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
+
+    host_cpuid(0x0, 0, &eax, &ebx, &ecx, &edx);
+    if (x86_cpu_def->level > eax) {
+        return false;
+    }
+    if ((x86_cpu_def->vendor1 != ebx) ||
+        (x86_cpu_def->vendor2 != edx) ||
+        (x86_cpu_def->vendor3 != ecx)) {
+        return false;
+    }
+
+    host_cpuid(0x1, 0, &eax, &ebx, &ecx, &edx);
+    if (!cpu_x86_feature_subset(ecx, x86_cpu_def->ext_features) ||
+        !cpu_x86_feature_subset(edx, x86_cpu_def->features)) {
+        return false;
+    }
+
+    host_cpuid(0x80000000, 0, &eax, &ebx, &ecx, &edx);
+    if (x86_cpu_def->xlevel > eax) {
+        return false;
+    }
+
+    host_cpuid(0x80000001, 0, &eax, &ebx, &ecx, &edx);
+    if (!cpu_x86_feature_subset(edx, x86_cpu_def->ext2_features) ||
+        !cpu_x86_feature_subset(ecx, x86_cpu_def->ext3_features)) {
+        return false;
+    }
+
+    return true;
+}
+
+/* Returns true when new_def is higher versioned than old_def */
+static int cpu_x86_fits_higher(x86_def_t *new_def, x86_def_t *old_def)
+{
+    int old_fammod = (old_def->family << 24) | (old_def->model << 8)
+                   | (old_def->stepping);
+    int new_fammod = (new_def->family << 24) | (new_def->model << 8)
+                   | (new_def->stepping);
+
+    return new_fammod > old_fammod;
+}
+
+static void cpu_x86_fill_best(x86_def_t *x86_cpu_def)
+{
+    x86_def_t *def;
+
+    x86_cpu_def->family = 0;
+    x86_cpu_def->model = 0;
+    for (def = x86_defs; def; def = def->next) {
+        if (cpu_x86_fits_host(def) && cpu_x86_fits_higher(def, x86_cpu_def)) {
+            memcpy(x86_cpu_def, def, sizeof(*def));
+        }
+    }
+
+    if (!x86_cpu_def->family && !x86_cpu_def->model) {
+        fprintf(stderr, "No fitting CPU model found!\n");
+        exit(1);
+    }
+}
+
 static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
 {
     int i;
@@ -878,6 +957,8 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model)
             break;
     if (kvm_enabled() && name && strcmp(name, "host") == 0) {
         cpu_x86_fill_host(x86_cpu_def);
+    } else if (kvm_enabled() && name && strcmp(name, "best") == 0) {
+        cpu_x86_fill_best(x86_cpu_def);
     } else if (!def) {
         goto error;
     } else {
-- 
1.6.0.2
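
The selection idea in this patch can be tried outside of QEMU with a minimal, self-contained sketch. The struct cpu_def type, pick_best() and the feature values are hypothetical stand-ins and not QEMU's x86_def_t API; the per-bit loop is replaced by the equivalent single mask test (guest & ~host) == 0, and only one feature word is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for a CPU definition. */
struct cpu_def {
    const char *name;
    int family, model, stepping;
    uint32_t features;              /* one feature word only, for brevity */
};

/* True when every feature bit the guest model needs is set on the host. */
static bool feature_subset(uint32_t host, uint32_t guest)
{
    return (guest & ~host) == 0;
}

/* Same ordering key as the patch: family, then model, then stepping. */
static int version_key(const struct cpu_def *d)
{
    return (d->family << 24) | (d->model << 8) | d->stepping;
}

/* Returns the highest-versioned definition the host can run, or NULL. */
static const struct cpu_def *pick_best(const struct cpu_def *defs, size_t n,
                                       uint32_t host_features)
{
    const struct cpu_def *best = NULL;

    assert(defs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!feature_subset(host_features, defs[i].features)) {
            continue;               /* host lacks a required feature */
        }
        if (best == NULL || version_key(&defs[i]) > version_key(best)) {
            best = &defs[i];
        }
    }
    return best;
}
```

Returning NULL here corresponds to the "No fitting CPU model found!" exit path in the patch.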



[PATCH v2 2/3] KVM: Use -cpu best as default on x86

2012-06-26 Thread Alexander Graf
When running QEMU without a -cpu parameter, the user usually wants a sane
default. So far, we're using the qemu64/qemu32 CPU type, which basically
means "the maximum TCG can emulate".

That's a really good default when using TCG, but when running with KVM
we would much rather have a default saying "the maximum performance I can get".

Fortunately we just added an option that gives us the best performance
while still staying safe on the testability side of things: -cpu best.
So all we need to do is make -cpu best the default when the user doesn't
explicitly specify a CPU type.

This fixes a lot of subtle breakage in the GNU toolchain (libgmp), which
hiccups on QEMU's non-existent CPU models.

This patch also adds a new pc-1.2 machine type to stay backwards compatible
with older versions of QEMU.
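
The defaulting rule described above boils down to one condition. A minimal sketch, assuming a hypothetical helper name (effective_cpu_model) and plain booleans in place of kvm_enabled() and the per-machine-type flag:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns the CPU model string that would be used.
 * A NULL result means "let the existing default (qemu64/qemu32) apply".
 */
static const char *effective_cpu_model(const char *cpu_model,
                                       bool kvm_on,
                                       bool may_cpu_best)
{
    /* Only override when the user gave no -cpu, KVM is in use, and the
     * machine type opted in to the new default. */
    if (cpu_model == NULL && kvm_on && may_cpu_best) {
        return "best";
    }
    return cpu_model;
}
```

Older machine types pass may_cpu_best as false, which is how they keep their previous default.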

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 - v2:

  - rebase
---
 hw/pc_piix.c |   45 -
 1 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index eae258c..eafd383 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -127,7 +127,8 @@ static void pc_init1(MemoryRegion *system_memory,
                      const char *initrd_filename,
                      const char *cpu_model,
                      int pci_enabled,
-                     int kvmclock_enabled)
+                     int kvmclock_enabled,
+                     int may_cpu_best)
 {
     int i;
     ram_addr_t below_4g_mem_size, above_4g_mem_size;
@@ -149,6 +150,9 @@ static void pc_init1(MemoryRegion *system_memory,
     MemoryRegion *rom_memory;
     void *fw_cfg = NULL;
 
+    if (!cpu_model && kvm_enabled() && may_cpu_best) {
+        cpu_model = "best";
+    }
     pc_cpus_init(cpu_model);
 
     if (kvmclock_enabled) {
@@ -298,7 +302,21 @@ static void pc_init_pci(ram_addr_t ram_size,
              get_system_io(),
              ram_size, boot_device,
              kernel_filename, kernel_cmdline,
-             initrd_filename, cpu_model, 1, 1);
+             initrd_filename, cpu_model, 1, 1, 1);
+}
+
+static void pc_init_pci_oldcpu(ram_addr_t ram_size,
+                               const char *boot_device,
+                               const char *kernel_filename,
+                               const char *kernel_cmdline,
+                               const char *initrd_filename,
+                               const char *cpu_model)
+{
+    pc_init1(get_system_memory(),
+             get_system_io(),
+             ram_size, boot_device,
+             kernel_filename, kernel_cmdline,
+             initrd_filename, cpu_model, 1, 1, 0);
 }
 
 static void pc_init_pci_no_kvmclock(ram_addr_t ram_size,
@@ -312,7 +330,7 @@ static void pc_init_pci_no_kvmclock(ram_addr_t ram_size,
              get_system_io(),
              ram_size, boot_device,
              kernel_filename, kernel_cmdline,
-             initrd_filename, cpu_model, 1, 0);
+             initrd_filename, cpu_model, 1, 0, 0);
 }
 
 static void pc_init_isa(ram_addr_t ram_size,
@@ -328,7 +346,7 @@ static void pc_init_isa(ram_addr_t ram_size,
              get_system_io(),
              ram_size, boot_device,
              kernel_filename, kernel_cmdline,
-             initrd_filename, cpu_model, 0, 1);
+             initrd_filename, cpu_model, 0, 1, 0);
 }
 
 #ifdef CONFIG_XEN
@@ -349,8 +367,8 @@ static void pc_xen_hvm_init(ram_addr_t ram_size,
 }
 #endif
 
-static QEMUMachine pc_machine_v1_1 = {
-    .name = "pc-1.1",
+static QEMUMachine pc_machine_v1_2 = {
+    .name = "pc-1.2",
     .alias = "pc",
     .desc = "Standard PC",
     .init = pc_init_pci,
@@ -358,6 +376,14 @@ static QEMUMachine pc_machine_v1_1 = {
     .is_default = 1,
 };
 
+static QEMUMachine pc_machine_v1_1 = {
+    .name = "pc-1.1",
+    .desc = "Standard PC",
+    .init = pc_init_pci_oldcpu,
+    .max_cpus = 255,
+    .is_default = 1,
+};
+
 #define PC_COMPAT_1_0 \
     {\
         .driver   = "pc-sysfw",\
@@ -384,7 +410,7 @@ static QEMUMachine pc_machine_v1_1 = {
 static QEMUMachine pc_machine_v1_0 = {
     .name = "pc-1.0",
     .desc = "Standard PC",
-    .init = pc_init_pci,
+    .init = pc_init_pci_oldcpu,
     .max_cpus = 255,
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_1_0,
@@ -399,7 +425,7 @@ static QEMUMachine pc_machine_v1_0 = {
 static QEMUMachine pc_machine_v0_15 = {
     .name = "pc-0.15",
     .desc = "Standard PC",
-    .init = pc_init_pci,
+    .init = pc_init_pci_oldcpu,
     .max_cpus = 255,
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_0_15,
@@ -431,7 +457,7 @@ static QEMUMachine pc_machine_v0_15 = {
 static QEMUMachine pc_machine_v0_14 = {
     .name = "pc-0.14",
     .desc = "Standard PC",
-    .init = pc_init_pci,
+    .init = pc_init_pci_oldcpu,
     .max_cpus = 255,
     .compat_props = (GlobalProperty[]) {
         PC_COMPAT_0_14,
@@ -612,6 +638,7 @@ static QEMUMachine xenfv_machine = {
 
 static void pc_machine_init(void)
 {
+    qemu_register_machine(&pc_machine_v1_2);
 

[PATCH v2 3/3] i386: KVM: List -cpu host and best in -cpu ?

2012-06-26 Thread Alexander Graf
The kvm_enabled() helper doesn't work in a function as early as -cpu ?
yet. It also doesn't make sense to list the -cpu ? output conditional on
the -enable-kvm parameter. So let's always mention -cpu host in the
CPU list when KVM is supported on that configuration.

In addition, this patch also adds listing of -cpu best in the -cpu ?
list, so that people know that this option exists.

Signed-off-by: Alexander Graf ag...@suse.de
---
 target-i386/cpu.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 98cc1ec..6c20798 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1199,9 +1199,10 @@ void x86_cpu_list(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
             (*cpu_fprintf)(f, "\n");
         }
     }
-    if (kvm_enabled()) {
-        (*cpu_fprintf)(f, "x86 %16s\n", "[host]");
-    }
+#ifdef CONFIG_KVM
+    (*cpu_fprintf)(f, "x86 %16s\n", "KVM only: [host]");
+    (*cpu_fprintf)(f, "x86 %16s\n", "KVM only: [best]");
+#endif
 }
 
 int cpu_x86_register(X86CPU *cpu, const char *cpu_model)
-- 
1.6.0.2



KVM call minutes (2012-06-26)

2012-06-26 Thread Juan Quintela

Hi
These are the minutes for today's call

- q35 integration
  why not ICH10? ICH9 is already obsolete.
  what are the differences?
  We need to check guests from Windows XP and some *BSD.
  Having it as the default for 1.2? Anthony.
  Having it as an option in 1.2 and making it the default in 1.3. Alex?
  Anthony doesn't like today's patches.  He wants to have the refactoring first,
  and then merge q35 sharing things with current code.
  Add requirements to the Q35 wiki page (Anthony task).
  People use ISA devices?
- irq chip integration?
  Continue discussion on mailing list
- having pc-next, and not changing it each version.
  Continue discussion on mailing list

Thanks, Juan.


Re: [PATCH] kvm: First step to push iothread lock out of inner run loop

2012-06-26 Thread Marcelo Tosatti
On Sat, Jun 23, 2012 at 12:55:49AM +0200, Jan Kiszka wrote:
 Should have declared this [RFC] in the subject and CC'ed kvm...
 
 On 2012-06-23 00:45, Jan Kiszka wrote:
  This sketches a possible path to get rid of the iothread lock on vmexits
  in KVM mode. On x86, the in-kernel irqchip has to be used because
  we otherwise need to synchronize APIC and other per-cpu state accesses
  that could be changed concurrently. Not yet fully analyzed is the NMI
  injection path in the absence of an APIC.
  
  s390x should be fine without specific locking as their pre/post-run
  callbacks are empty. Power requires locking for the pre-run callback.
  
  This patch is untested, but a similar version was successfully used in
  a x86 setup with a network I/O path that needed no central iothread
  locking anymore (required special MMIO exit handling).
  ---
   kvm-all.c |   18 --
   target-i386/kvm.c |7 +++
   target-ppc/kvm.c  |4 
   3 files changed, 27 insertions(+), 2 deletions(-)
  
  diff --git a/kvm-all.c b/kvm-all.c
  index f8e4328..9c3e26f 100644
  --- a/kvm-all.c
  +++ b/kvm-all.c
  @@ -1460,6 +1460,8 @@ int kvm_cpu_exec(CPUArchState *env)
           return EXCP_HLT;
       }
   
  +    qemu_mutex_unlock_iothread();
  +
       do {
           if (env->kvm_vcpu_dirty) {
               kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
  @@ -1476,14 +1478,16 @@ int kvm_cpu_exec(CPUArchState *env)
                */
               qemu_cpu_kick_self();
           }
  -        qemu_mutex_unlock_iothread();
   
           run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
   
  -        qemu_mutex_lock_iothread();
           kvm_arch_post_run(env, run);
   
  +        /* TODO: push coalesced mmio flushing to the point where we access
  +         * devices that are using it (currently VGA and E1000). */
  +        qemu_mutex_lock_iothread();
           kvm_flush_coalesced_mmio_buffer();
  +        qemu_mutex_unlock_iothread();
   
           if (run_ret < 0) {
               if (run_ret == -EINTR || run_ret == -EAGAIN) {
  @@ -1499,19 +1503,23 @@ int kvm_cpu_exec(CPUArchState *env)
           switch (run->exit_reason) {
           case KVM_EXIT_IO:
               DPRINTF("handle_io\n");
  +            qemu_mutex_lock_iothread();
               kvm_handle_io(run->io.port,
                             (uint8_t *)run + run->io.data_offset,
                             run->io.direction,
                             run->io.size,
                             run->io.count);
  +            qemu_mutex_unlock_iothread();
               ret = 0;
               break;
           case KVM_EXIT_MMIO:
               DPRINTF("handle_mmio\n");
  +            qemu_mutex_lock_iothread();
               cpu_physical_memory_rw(run->mmio.phys_addr,
                                      run->mmio.data,
                                      run->mmio.len,
                                      run->mmio.is_write);
  +            qemu_mutex_unlock_iothread();
               ret = 0;
               break;
           case KVM_EXIT_IRQ_WINDOW_OPEN:
  @@ -1520,7 +1528,9 @@ int kvm_cpu_exec(CPUArchState *env)
               break;
           case KVM_EXIT_SHUTDOWN:
               DPRINTF("shutdown\n");
  +            qemu_mutex_lock_iothread();
               qemu_system_reset_request();
  +            qemu_mutex_unlock_iothread();
               ret = EXCP_INTERRUPT;
               break;
           case KVM_EXIT_UNKNOWN:
  @@ -1533,11 +1543,15 @@ int kvm_cpu_exec(CPUArchState *env)
               break;
           default:
               DPRINTF("kvm_arch_handle_exit\n");
  +            qemu_mutex_lock_iothread();
               ret = kvm_arch_handle_exit(env, run);
  +            qemu_mutex_unlock_iothread();
               break;
           }
       } while (ret == 0);
   
  +    qemu_mutex_lock_iothread();
  +
       if (ret < 0) {
           cpu_dump_state(env, stderr, fprintf, CPU_DUMP_CODE);
           vm_stop(RUN_STATE_INTERNAL_ERROR);
  diff --git a/target-i386/kvm.c b/target-i386/kvm.c
  index 0d0d8f6..0ad64d1 100644
  --- a/target-i386/kvm.c
  +++ b/target-i386/kvm.c
  @@ -1631,7 +1631,10 @@ void kvm_arch_pre_run(CPUX86State *env, struct kvm_run *run)
   
       /* Inject NMI */
       if (env->interrupt_request & CPU_INTERRUPT_NMI) {
  +        qemu_mutex_lock_iothread();
           env->interrupt_request &= ~CPU_INTERRUPT_NMI;
  +        qemu_mutex_unlock_iothread();
  +
           DPRINTF("injected NMI\n");
           ret = kvm_vcpu_ioctl(env, KVM_NMI);
           if (ret < 0) {
  @@ -1641,6 +1644,8 @@ void kvm_arch_pre_run(CPUX86State *env, struct kvm_run *run)
       }
   
       if (!kvm_irqchip_in_kernel()) {
  +        qemu_mutex_lock_iothread();
  +
           /* Force the VCPU out of its inner loop to process any INIT requests
            * or pending TPR access reports. */
           if (env->interrupt_request &
  @@ -1682,6 +1687,8 @@ void kvm_arch_pre_run(CPUX86State *env, struct 
  kvm_run *run)
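
The locking pattern of the quoted patch can be modelled without QEMU or threads. The sketch below is an illustration under stated assumptions: struct big_lock is a hypothetical bookkeeping stand-in for the iothread mutex (it only tracks whether the lock is held), and the exit reasons are reduced to three made-up values. It shows the idea of dropping the lock for the whole run loop and re-taking it only around device accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up exit reasons for the sketch. */
enum exit_reason { EXIT_IO, EXIT_MMIO, EXIT_IRQ_WINDOW };

/* Hypothetical stand-in for the global lock; tracks state, does not block. */
struct big_lock {
    bool held;
    int acquisitions;
};

static void big_lock_acquire(struct big_lock *l)
{
    assert(!l->held);
    l->held = true;
    l->acquisitions++;
}

static void big_lock_release(struct big_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Device emulation must only run with the lock held. */
static void device_access(const struct big_lock *l)
{
    assert(l->held);
}

/* Caller holds the lock on entry and holds it again on return. */
static int run_loop(struct big_lock *l, const enum exit_reason *exits, size_t n)
{
    int handled = 0;

    big_lock_release(l);            /* dropped once, before the loop */
    for (size_t i = 0; i < n; i++) {
        switch (exits[i]) {
        case EXIT_IO:
        case EXIT_MMIO:
            big_lock_acquire(l);    /* taken only around device access */
            device_access(l);
            big_lock_release(l);
            handled++;
            break;
        case EXIT_IRQ_WINDOW:
            break;                  /* no shared state touched here */
        }
    }
    big_lock_acquire(l);            /* re-taken after the loop */
    return handled;
}
```

Unlike the real loop, this sketch does not leave the loop on an interrupt-window exit; it only demonstrates where the lock is and is not held.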
   
 

[PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Frank Swiderski
This implementation of a virtio balloon driver uses the page cache to
store pages that have been released to the host.  The communication
(outside of target counts) is one way--the guest notifies the host when
it adds a page to the page cache, allowing the host to madvise(2) with
MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
(via the regular page reclaim).  This means that inflating the balloon
is similar to the existing balloon mechanism, but the deflate is
different--it re-uses existing Linux kernel functionality to
automatically reclaim.

Signed-off-by: Frank Swiderski f...@google.com
---
 drivers/virtio/Kconfig  |   13 +
 drivers/virtio/Makefile |1 +
 drivers/virtio/virtio_fileballoon.c |  636 +++
 include/linux/virtio_balloon.h  |9 +
 include/linux/virtio_ids.h  |1 +
 5 files changed, 660 insertions(+), 0 deletions(-)
 create mode 100644 drivers/virtio/virtio_fileballoon.c

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index f38b17a..cffa2a7 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -35,6 +35,19 @@ config VIRTIO_BALLOON
 
 If unsure, say M.
 
+config VIRTIO_FILEBALLOON
+   tristate "Virtio page cache-backed balloon driver"
+   select VIRTIO
+   select VIRTIO_RING
+   ---help---
+This driver supports decreasing and automatically reclaiming the
+memory within a guest VM.  Unlike VIRTIO_BALLOON, this driver instead
+tries to maintain a specific target balloon size using the page cache.
+This allows the guest to implicitly deflate the balloon by flushing
+pages from the cache and touching the page.
+
+If unsure, say N.
+
  config VIRTIO_MMIO
    tristate "Platform bus driver for memory mapped virtio devices (EXPERIMENTAL)"
    depends on HAS_IOMEM && EXPERIMENTAL
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 5a4c63c..7ca0a3f 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
 obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VIRTIO_FILEBALLOON) += virtio_fileballoon.o
diff --git a/drivers/virtio/virtio_fileballoon.c b/drivers/virtio/virtio_fileballoon.c
new file mode 100644
index 000..ff252ec
--- /dev/null
+++ b/drivers/virtio/virtio_fileballoon.c
@@ -0,0 +1,636 @@
+/* Virtio file (page cache-backed) balloon implementation, inspired by
+ * Dor Laor and Marcelo Tosatti's implementations, and based on Rusty Russell's
+ * implementation.
+ *
+ * This implementation of the virtio balloon driver re-uses the page cache to
+ * allow memory consumed by inflating the balloon to be reclaimed by linux.  It
+ * creates and mounts a bare-bones filesystem containing a single inode.  When
+ * the host requests the balloon to inflate, it does so by reading pages at
+ * offsets into the inode mapping's page_tree.  The host is notified when the
+ * pages are added to the page_tree, allowing it (the host) to madvise(2) the
+ * corresponding host memory, reducing the RSS of the virtual machine.  In this
+ * implementation, the host is only notified when a page is added to the
+ * balloon.  Reclaim happens under the existing TTFP logic, which flushes unused
+ * pages in the page cache.  If the host used MADV_DONTNEED, then when the guest
+ * uses the page, the zero page will be mapped in, allowing automatic (and fast,
+ * compared to requiring a host notification via a virtio queue to get memory
+ * back) reclaim.
+ *
+ *  Copyright 2008 Rusty Russell IBM Corporation
+ *  Copyright 2011 Frank Swiderski Google Inc
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include <linux/backing-dev.h>
+#include <linux/delay.h>
+#include <linux/file.h>
+#include <linux/freezer.h>
+#include <linux/fs.h>
+#include <linux/jiffies.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/virtio.h>
+#include <linux/virtio_balloon.h>
+#include <linux/writeback.h>
+
+#define VIRTBALLOON_PFN_ARRAY_SIZE 256
+
+struct virtio_balloon {
+   struct 
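
The host-side effect that the description relies on can be observed from plain userspace. The sketch below is not part of the patch: discard_and_read() is a hypothetical helper, and it assumes Linux semantics for MADV_DONTNEED on a private anonymous mapping, where the discarded page reads back as zeroes on the next access.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Fill one anonymous page with `fill`, discard it with MADV_DONTNEED and
 * report what its first byte reads back as.  Returns 0 on success, -1 on
 * any failure.
 */
static int discard_and_read(unsigned char fill, unsigned char *first_byte)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *p;

    assert(first_byte != NULL);
    if (page <= 0) {
        return -1;
    }

    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }

    memset(p, fill, (size_t)page);          /* page is now resident */
    if (madvise(p, (size_t)page, MADV_DONTNEED) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    *first_byte = p[0];                     /* touching it maps in zeroes */
    munmap(p, (size_t)page);
    return 0;
}
```

This is why no explicit deflate message is needed in the scheme above: the guest simply touches the page again and gets fresh zero-filled memory.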

Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Rik van Riel

On 06/26/2012 04:32 PM, Frank Swiderski wrote:

This implementation of a virtio balloon driver uses the page cache to
store pages that have been released to the host.  The communication
(outside of target counts) is one way--the guest notifies the host when
it adds a page to the page cache, allowing the host to madvise(2) with
MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
(via the regular page reclaim).  This means that inflating the balloon
is similar to the existing balloon mechanism, but the deflate is
different--it re-uses existing Linux kernel functionality to
automatically reclaim.

Signed-off-by: Frank Swiderski f...@google.com


It is a great idea, but how can this memory balancing
possibly work if someone uses memory cgroups inside a
guest?

Having said that, we currently do not have proper
memory reclaim balancing between cgroups at all, so
requiring that of this balloon driver would be
unreasonable.

The code looks good to me, my only worry is the
code duplication. We now have 5 balloon drivers,
for 4 hypervisors, all implementing everything
from scratch...


Re: Request VFIO inclusion in linux-next

2012-06-26 Thread Benjamin Herrenschmidt
On Mon, 2012-06-25 at 22:55 -0600, Alex Williamson wrote:
 Hi,
 
 VFIO has been kicking around for well over a year now and has been
 posted numerous times for review.  The pre-requirements are finally
 available in linux-next (or will be in the 20120626 build) so I'd like
 to request a new branch be included in linux-next with a goal of being
 accepted into v3.6.

Ack. Let's get that in, it's been simmering for too long and we'll need
that to do PCI pass-through on KVM powerpc.

Cheers,
Ben.

 VFIO is a userspace driver interface designed to support assignment of
 devices into virtual machines using IOMMU level access control.  This
 IOMMU requirement, secure resource access, and flexible interrupt
 support make VFIO unique from existing drivers, like UIO.  VFIO supports
 modular backends for both IOMMU and device access.  Initial backends are
 included for PCI device assignment using the IOMMU API in a manner
 compatible with x86 device assignment.  POWER support is also under
 development, making use of the same PCI device backend, but adding new
 IOMMU support for their platforms.
 
 As with previous versions of VFIO, Qemu is targeted as a primary user
 and a working development tree including vfio-pci support can be found
 here:
 
 git://github.com/awilliam/qemu-vfio.git iommu-group-vfio
 
 Eventually we hope VFIO can deprecate the x86, PCI-specific device
 assignment currently used by KVM.
 
 The info for linux-next:
 
 Tree: git://github.com/awilliam/linux-vfio.git
 Branch: next
 Contact: Alex Williamson alex.william...@redhat.com
 
 This branch should be applied after both Bjorn's PCI next branch and
 Joerg's IOMMU next branch and contains the following changes:
 
  Documentation/ioctl/ioctl-number.txt |1 
  Documentation/vfio.txt   |  315 +++
  MAINTAINERS  |8 
  drivers/Kconfig  |2 
  drivers/Makefile |1 
  drivers/vfio/Kconfig |   16 
  drivers/vfio/Makefile|3 
  drivers/vfio/pci/Kconfig |8 
  drivers/vfio/pci/Makefile|4 
  drivers/vfio/pci/vfio_pci.c  |  565 
  drivers/vfio/pci/vfio_pci_config.c   | 1528 
 +++
  drivers/vfio/pci/vfio_pci_intrs.c|  727 
  drivers/vfio/pci/vfio_pci_private.h  |   91 ++
  drivers/vfio/pci/vfio_pci_rdwr.c |  269 ++
  drivers/vfio/vfio.c  | 1420 
  drivers/vfio/vfio_iommu_type1.c  |  754 +
  include/linux/vfio.h |  445 ++
  17 files changed, 6157 insertions(+)
 
 If there are any objections to including this, please speak now.  If
 anything looks amiss in the branch, let me know.  I've never hosted a
 next branch.  Review comments welcome and I'll be glad to post the
 series in email again if requested.  Thanks,
 
 Alex
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-pci in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Frank Swiderski
On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel <r...@redhat.com> wrote:
 On 06/26/2012 04:32 PM, Frank Swiderski wrote:

 This implementation of a virtio balloon driver uses the page cache to
 store pages that have been released to the host.  The communication
 (outside of target counts) is one way--the guest notifies the host when
 it adds a page to the page cache, allowing the host to madvise(2) with
 MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
 (via the regular page reclaim).  This means that inflating the balloon
 is similar to the existing balloon mechanism, but the deflate is
 different--it re-uses existing Linux kernel functionality to
 automatically reclaim.

 Signed-off-by: Frank Swiderski <f...@google.com>


 It is a great idea, but how can this memory balancing
 possibly work if someone uses memory cgroups inside a
 guest?

Thanks and good point--this isn't something that I considered in the
implementation.

 Having said that, we currently do not have proper
 memory reclaim balancing between cgroups at all, so
 requiring that of this balloon driver would be
 unreasonable.

 The code looks good to me, my only worry is the
 code duplication. We now have 5 balloon drivers,
 for 4 hypervisors, all implementing everything
 from scratch...

Do you have any recommendations on this?  I could (I think reasonably
so) modify the existing virtio_balloon.c and have it change behavior
based on a feature bit or other configuration.  I'm not sure that
really addresses the root of what you're pointing out--it's still
adding a different implementation, but doing so as an extension of an
existing one.

fes


Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Michael S. Tsirkin
On Tue, Jun 26, 2012 at 01:32:58PM -0700, Frank Swiderski wrote:
 This implementation of a virtio balloon driver uses the page cache to
 store pages that have been released to the host.  The communication
 (outside of target counts) is one way--the guest notifies the host when
 it adds a page to the page cache, allowing the host to madvise(2) with
 MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
 (via the regular page reclaim).  This means that inflating the balloon
 is similar to the existing balloon mechanism, but the deflate is
 different--it re-uses existing Linux kernel functionality to
 automatically reclaim.
 
 Signed-off-by: Frank Swiderski <f...@google.com>

I'm pondering this:

Should it really be a separate driver/device ID?
If it behaves the same from host POV, maybe it
should be up to the guest how to inflate/deflate
the balloon internally?

 ---
  drivers/virtio/Kconfig  |   13 +
  drivers/virtio/Makefile |1 +
  drivers/virtio/virtio_fileballoon.c |  636 
 +++
  include/linux/virtio_balloon.h  |9 +
  include/linux/virtio_ids.h  |1 +
  5 files changed, 660 insertions(+), 0 deletions(-)
  create mode 100644 drivers/virtio/virtio_fileballoon.c
 
 diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
 index f38b17a..cffa2a7 100644
 --- a/drivers/virtio/Kconfig
 +++ b/drivers/virtio/Kconfig
 @@ -35,6 +35,19 @@ config VIRTIO_BALLOON
  
If unsure, say M.
  
 +config VIRTIO_FILEBALLOON
 + tristate "Virtio page cache-backed balloon driver"
 + select VIRTIO
 + select VIRTIO_RING
 + ---help---
 +  This driver supports decreasing and automatically reclaiming the
 +  memory within a guest VM.  Unlike VIRTIO_BALLOON, this driver instead
 +  tries to maintain a specific target balloon size using the page cache.
 +  This allows the guest to implicitly deflate the balloon by flushing
 +  pages from the cache and touching the page.
 +
 +  If unsure, say N.
 +
   config VIRTIO_MMIO
   tristate "Platform bus driver for memory mapped virtio devices (EXPERIMENTAL)"
   depends on HAS_IOMEM && EXPERIMENTAL
 diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
 index 5a4c63c..7ca0a3f 100644
 --- a/drivers/virtio/Makefile
 +++ b/drivers/virtio/Makefile
 @@ -3,3 +3,4 @@ obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
  obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 +obj-$(CONFIG_VIRTIO_FILEBALLOON) += virtio_fileballoon.o
 diff --git a/drivers/virtio/virtio_fileballoon.c 
 b/drivers/virtio/virtio_fileballoon.c
 new file mode 100644
 index 000..ff252ec
 --- /dev/null
 +++ b/drivers/virtio/virtio_fileballoon.c
 @@ -0,0 +1,636 @@
 +/* Virtio file (page cache-backed) balloon implementation, inspired by
 + * Dor Laor and Marcelo Tosatti's implementations, and based on Rusty Russell's
 + * implementation.
 + *
 + * This implementation of the virtio balloon driver re-uses the page cache to
 + * allow memory consumed by inflating the balloon to be reclaimed by linux.  It
 + * creates and mounts a bare-bones filesystem containing a single inode.  When
 + * the host requests the balloon to inflate, it does so by reading pages at
 + * offsets into the inode mapping's page_tree.  The host is notified when the
 + * pages are added to the page_tree, allowing it (the host) to madvise(2) the
 + * corresponding host memory, reducing the RSS of the virtual machine.  In this
 + * implementation, the host is only notified when a page is added to the
 + * balloon.  Reclaim happens under the existing TTFP logic, which flushes unused
 + * pages in the page cache.  If the host used MADV_DONTNEED, then when the guest
 + * uses the page, the zero page will be mapped in, allowing automatic (and fast,
 + * compared to requiring a host notification via a virtio queue to get memory
 + * back) reclaim.
 + *
 + *  Copyright 2008 Rusty Russell IBM Corporation
 + *  Copyright 2011 Frank Swiderski Google Inc
 + *
 + *  This program is free software; you can redistribute it and/or modify
 + *  it under the terms of the GNU General Public License as published by
 + *  the Free Software Foundation; either version 2 of the License, or
 + *  (at your option) any later version.
 + *
 + *  This program is distributed in the hope that it will be useful,
 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + *  GNU General Public License for more details.
 + *
 + *  You should have received a copy of the GNU General Public License
 + *  along with this program; if not, write to the Free Software
 + *  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 + */
 +#include <linux/backing-dev.h>
 +#include <linux/delay.h>
 +#include <linux/file.h>
 +#include <linux/freezer.h>
 +#include 

Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Rik van Riel

On 06/26/2012 05:31 PM, Frank Swiderski wrote:

On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel <r...@redhat.com> wrote:



The code looks good to me, my only worry is the
code duplication. We now have 5 balloon drivers,
for 4 hypervisors, all implementing everything
from scratch...


Do you have any recommendations on this?  I could (I think reasonably
so) modify the existing virtio_balloon.c and have it change behavior
based on a feature bit or other configuration.  I'm not sure that
really addresses the root of what you're pointing out--it's still
adding a different implementation, but doing so as an extension of an
existing one.


Ideally, I believe we would have two balloon
top parts in a guest (one classical balloon,
one on the LRU), and four bottom parts (kvm,
xen, vmware & s390).

That way the virt specific bits of a balloon
driver would be essentially a ->balloon_page
and ->release_page callback for pages, as well
as methods to communicate with the host.

All the management of pages, including stuff
like putting them on the LRU, or isolating
them for migration, would be done with the
same common code, regardless of what virt
software we are running on.

Of course, that is a substantial amount of
work and I feel it would be unreasonable to
block anyone's code on that kind of thing
(especially considering that your code is good),
but I do believe the explosion of balloon
code is a little worrying.
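
As a rough illustration of the split described above, here is a minimal
userspace sketch, not kernel code. Every name in it (struct balloon_ops,
balloon_inflate_one(), the demo backend) is hypothetical and chosen only
for this example; struct page is a stand-in for the kernel type. The common
"top part" owns the page bookkeeping, and a hypervisor-specific "bottom
part" supplies only the two callbacks.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
	int in_balloon;
};

/* Hypervisor-specific "bottom part": only the two callbacks named above. */
struct balloon_ops {
	int  (*balloon_page)(struct page *page);	/* hand a page to the host */
	void (*release_page)(struct page *page);	/* take a page back */
};

/* Common "top part": page bookkeeping shared by every backend. */
struct balloon {
	const struct balloon_ops *ops;
	size_t nr_pages;
};

static int balloon_inflate_one(struct balloon *b, struct page *page)
{
	int err = b->ops->balloon_page(page);

	if (err)
		return err;
	page->in_balloon = 1;
	b->nr_pages++;
	return 0;
}

static void balloon_deflate_one(struct balloon *b, struct page *page)
{
	if (!page->in_balloon)
		return;
	b->ops->release_page(page);
	page->in_balloon = 0;
	b->nr_pages--;
}

/* Example backend: counts notifications instead of talking to a host. */
static int demo_host_calls;

static int demo_balloon_page(struct page *page)
{
	(void)page;
	demo_host_calls++;
	return 0;
}

static void demo_release_page(struct page *page)
{
	(void)page;
	demo_host_calls++;
}

static const struct balloon_ops demo_ops = {
	.balloon_page = demo_balloon_page,
	.release_page = demo_release_page,
};
```

A backend for another hypervisor would provide its own balloon_ops and
reuse balloon_inflate_one() and balloon_deflate_one() unchanged.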



Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Michael S. Tsirkin
On Tue, Jun 26, 2012 at 02:31:26PM -0700, Frank Swiderski wrote:
 On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel <r...@redhat.com> wrote:
  On 06/26/2012 04:32 PM, Frank Swiderski wrote:
 
  This implementation of a virtio balloon driver uses the page cache to
  store pages that have been released to the host.  The communication
  (outside of target counts) is one way--the guest notifies the host when
  it adds a page to the page cache, allowing the host to madvise(2) with
  MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
  (via the regular page reclaim).  This means that inflating the balloon
  is similar to the existing balloon mechanism, but the deflate is
  different--it re-uses existing Linux kernel functionality to
  automatically reclaim.
 
  Signed-off-by: Frank Swiderski <f...@google.com>
 
 
  It is a great idea, but how can this memory balancing
  possibly work if someone uses memory cgroups inside a
  guest?
 
 Thanks and good point--this isn't something that I considered in the
 implementation.
 
  Having said that, we currently do not have proper
  memory reclaim balancing between cgroups at all, so
  requiring that of this balloon driver would be
  unreasonable.
 
  The code looks good to me, my only worry is the
  code duplication. We now have 5 balloon drivers,
  for 4 hypervisors, all implementing everything
  from scratch...
 
 Do you have any recommendations on this?  I could (I think reasonably
 so) modify the existing virtio_balloon.c and have it change behavior
 based on a feature bit or other configuration.  I'm not sure that
 really addresses the root of what you're pointing out--it's still
 adding a different implementation, but doing so as an extension of an
 existing one.
 
 fes

Let's assume it's a feature bit: how would you
formulate what the feature does *from host point of view*?

-- 
MST


Re: [RFC PATCH 10/17] PowerPC: booke64: Refactor exception prolog for save/restore regs

2012-06-26 Thread Benjamin Herrenschmidt
On Mon, 2012-06-25 at 15:26 +0300, Mihai Caraman wrote:
 Refactor exception prolog to allow save/restore register parameters. Add
 addition none definition for exception prolog usage.
 This is needed for exceptions like Guest Doorbell that use GSRRx registers
 which do not map on exception type.
 
 Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com>
 ---
  arch/powerpc/kernel/exceptions-64e.S |   23 ---
  1 files changed, 8 insertions(+), 15 deletions(-)
 
 diff --git a/arch/powerpc/kernel/exceptions-64e.S 
 b/arch/powerpc/kernel/exceptions-64e.S
 index 7215cc2..52aa96b 100644
 --- a/arch/powerpc/kernel/exceptions-64e.S
 +++ b/arch/powerpc/kernel/exceptions-64e.S
 @@ -35,7 +35,7 @@
  #define  SPECIAL_EXC_FRAME_SIZE  INT_FRAME_SIZE
  
  /* Exception prolog code for all exceptions */
 -#define EXCEPTION_PROLOG(n, type, addition)  \
 +#define EXCEPTION_PROLOG(n, type, srr0, srr1, addition)  \
   mtspr   SPRN_SPRG_##type##_SCRATCH,r13; /* get spare registers */   \
   mfspr   r13,SPRN_SPRG_PACA; /* get PACA */  \
   std r10,PACA_EX##type+EX_R10(r13);  \
 @@ -44,54 +44,47 @@
   addition;   /* additional code for that exc. */ \
   std r1,PACA_EX##type+EX_R1(r13); /* save old r1 in the PACA */  \
   stw r10,PACA_EX##type+EX_CR(r13); /* save old CR in the PACA */ \
 - mfspr   r11,SPRN_##type##_SRR1;/* what are we coming from */\
 + mfspr   r11,srr1;/* what are we coming from */  \
   type##_SET_KSTACK;  /* get special stack if necessary */\
   andi.   r10,r11,MSR_PR; /* save stack pointer */\
   beq 1f; /* branch around if supervisor */   \
   ld  r1,PACAKSAVE(r13);  /* get kernel stack coming from usr */\
  1:   cmpdi   cr1,r1,0;   /* check if SP makes sense */   \
   bge-cr1,exc_##n##_bad_stack;/* bad stack (TODO: out of line) */ \
 - mfspr   r10,SPRN_##type##_SRR0; /* read SRR0 before touching stack */
 + mfspr   r10,srr0;   /* read SRR0 before touching stack */

No, use the existing macro, use a ##type## specific to guest doorbells,
with appropriate definitions of the corresponding SPRN_ macros.
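
The suggestion above relies on preprocessor token pasting. The following is
a minimal sketch in plain C of how a new type name can be routed to
different registers without adding macro parameters. The GDBELL type name
and the EXC_SRR0()/EXC_SRR1() helpers are hypothetical, and the SPR numbers
are placeholders for illustration only.

```c
/* Placeholder SPR numbers, for illustration only. */
#define SPRN_SRR0	26
#define SPRN_SRR1	27
#define SPRN_GSRR0	378
#define SPRN_GSRR1	379

/* Per-type names in the style the prolog already pastes together. */
#define SPRN_GEN_SRR0		SPRN_SRR0
#define SPRN_GEN_SRR1		SPRN_SRR1

/* Hypothetical new type for guest doorbells: map its names onto GSRRx. */
#define SPRN_GDBELL_SRR0	SPRN_GSRR0
#define SPRN_GDBELL_SRR1	SPRN_GSRR1

/* Same shape as "mfspr rX,SPRN_##type##_SRR0" in the prolog macro. */
#define EXC_SRR0(type)	SPRN_##type##_SRR0
#define EXC_SRR1(type)	SPRN_##type##_SRR1

static int exc_srr0_gen(void)    { return EXC_SRR0(GEN); }
static int exc_srr0_gdbell(void) { return EXC_SRR0(GDBELL); }
static int exc_srr1_gdbell(void) { return EXC_SRR1(GDBELL); }
```

With this approach the prolog macro keeps its existing parameter list; only
the per-type definitions grow.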

Cheers,
Ben.




Re: [RFC PATCH 11/17] PowerPC: booke64: Fix machine check handler to use the right prolog

2012-06-26 Thread Benjamin Herrenschmidt
On Mon, 2012-06-25 at 15:26 +0300, Mihai Caraman wrote:
 Machine check exception handler was using a wrong prolog. Hypervisors, like
 KVM, which are called early from the exception handler rely on the interrupt
 source.
 
 Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com>

Ack.

Please separate your core patches from your KVM series and submit them
separately. I'll take care of the core Book3E part.

Cheers,
Ben.

 ---
  arch/powerpc/kernel/exceptions-64e.S |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/arch/powerpc/kernel/exceptions-64e.S 
 b/arch/powerpc/kernel/exceptions-64e.S
 index 52aa96b..06f7aec 100644
 --- a/arch/powerpc/kernel/exceptions-64e.S
 +++ b/arch/powerpc/kernel/exceptions-64e.S
 @@ -290,7 +290,7 @@ interrupt_end_book3e:
  
  /* Machine Check Interrupt */
   START_EXCEPTION(machine_check);
 - CRIT_EXCEPTION_PROLOG(0x200, PROLOG_ADDITION_NONE)
 + MC_EXCEPTION_PROLOG(0x200, PROLOG_ADDITION_NONE)
  //   EXCEPTION_COMMON(0x200, PACA_EXMC, INTS_DISABLE)
  //   bl  special_reg_save_mc
  //   addir3,r1,STACK_FRAME_OVERHEAD




Re: [RFC PATCH 13/17] PowerPC: booke64: Use SPRG0/3 scratch for bolted TLB miss crit int

2012-06-26 Thread Benjamin Herrenschmidt
On Mon, 2012-06-25 at 15:26 +0300, Mihai Caraman wrote:
 Embedded.Hypervisor category defines GSPRG0..3 physical registers for guests.
 Avoid SPRG4-7 usage as scratch in host exception handlers, otherwise guest
 SPRG4-7 registers will be clobbered.
 For bolted TLB miss exception handlers, which is the version currently
 supported by KVM, use SPRN_SPRG_GEN_SCRATCH (aka SPRG0) instead of
 SPRN_SPRG_TLB_SCRATCH (aka SPRG6) and replace TLB with GEN PACA slots to
 keep consistency.
 For critical exception handler use SPRG3 instead of SPRG7.

Beware with SPRG3 usage. It's user space visible and we plan to use it
for other things (see Anton's patch to stick topology information in
there for use by the vdso). If you clobber it, you may want to restore
it later.

I think Anton's patch should put the proper value we want in the PACA
anyway since we also need to restore it on exit from KVM, so you can
still use it as scratch, just restore the value before going to C.

Cheers,
Ben.

 Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com>
 ---
  arch/powerpc/include/asm/exception-64e.h |   14 +++---
  arch/powerpc/include/asm/reg.h   |6 +++---
  arch/powerpc/mm/tlb_low_64e.S|   28 ++--
  3 files changed, 24 insertions(+), 24 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/exception-64e.h 
 b/arch/powerpc/include/asm/exception-64e.h
 index ac13add..c90a9a4 100644
 --- a/arch/powerpc/include/asm/exception-64e.h
 +++ b/arch/powerpc/include/asm/exception-64e.h
 @@ -38,8 +38,11 @@
   */
  
 
 -/* We are out of SPRGs so we save some things in the PACA. The normal
 - * exception frame is smaller than the CRIT or MC one though
 +/* We are out of SPRGs so we save some things in the 8 slots available in PACA.
 + * The normal exception frame is smaller than the CRIT or MC one though
 + *
 + * Bolted TLB miss exception variant also uses these slots which in combination
 + * with pgd and kernel_pgd fits in one 64-byte cache line.
   */
  #define EX_R1    (0 * 8)
  #define EX_CR    (1 * 8)
 @@ -47,13 +50,10 @@
  #define EX_R11   (3 * 8)
  #define EX_R14   (4 * 8)
  #define EX_R15   (5 * 8)
 +#define EX_R16   (6 * 8)
  
  /*
 - * The TLB miss exception uses different slots.
 - *
 - * The bolted variant uses only the first six fields,
 - * which in combination with pgd and kernel_pgd fits in
 - * one 64-byte cache line.
 + * PACA slots offset for standard TLB miss exception.
   */
  
  #define EX_TLB_R10   ( 0 * 8)
 diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
 index f0cb7f4..51c14a7 100644
 --- a/arch/powerpc/include/asm/reg.h
 +++ b/arch/powerpc/include/asm/reg.h
 @@ -760,10 +760,10 @@
   * 64-bit embedded
   *   - SPRG0 generic exception scratch
   *   - SPRG2 TLB exception stack
 - *   - SPRG3 unused (user visible)
 + *   - SPRG3 critical exception scratch (user visible)
   *   - SPRG4 unused (user visible)
   *   - SPRG6 TLB miss scratch (user visible, sorry !)
 - *   - SPRG7 critical exception scratch
 + *   - SPRG7 unused (user visible)
   *   - SPRG8 machine check exception scratch
   *   - SPRG9 debug exception scratch
   *
 @@ -857,7 +857,7 @@
  
  #ifdef CONFIG_PPC_BOOK3E_64
  #define SPRN_SPRG_MC_SCRATCH SPRN_SPRG8
 -#define SPRN_SPRG_CRIT_SCRATCH   SPRN_SPRG7
 +#define SPRN_SPRG_CRIT_SCRATCH   SPRN_SPRG3
  #define SPRN_SPRG_DBG_SCRATCH    SPRN_SPRG9
  #define SPRN_SPRG_TLB_EXFRAME    SPRN_SPRG2
  #define SPRN_SPRG_TLB_SCRATCH    SPRN_SPRG6
 diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
 index 88feaaa..4192ade 100644
 --- a/arch/powerpc/mm/tlb_low_64e.S
 +++ b/arch/powerpc/mm/tlb_low_64e.S
 @@ -40,36 +40,36 @@
   **/
  
  .macro tlb_prolog_bolted intnum addr
 - mtspr   SPRN_SPRG_TLB_SCRATCH,r13
 + mtspr   SPRN_SPRG_GEN_SCRATCH,r13
   mfspr   r13,SPRN_SPRG_PACA
 - std r10,PACA_EXTLB+EX_TLB_R10(r13)
 + std r10,PACA_EXGEN+EX_R10(r13)
   mfcrr10
 - std r11,PACA_EXTLB+EX_TLB_R11(r13)
 + std r11,PACA_EXGEN+EX_R11(r13)
  #ifdef CONFIG_KVM_BOOKE_HV
  BEGIN_FTR_SECTION
   mfspr   r11, SPRN_SRR1
  END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
  #endif
   DO_KVM  \intnum, SPRN_SRR1
 - std r16,PACA_EXTLB+EX_TLB_R16(r13)
 + std r16,PACA_EXGEN+EX_R16(r13)
   mfspr   r16,\addr   /* get faulting address */
 - std r14,PACA_EXTLB+EX_TLB_R14(r13)
 + std r14,PACA_EXGEN+EX_R14(r13)
   ld  r14,PACAPGD(r13)
 - std r15,PACA_EXTLB+EX_TLB_R15(r13)
 - std r10,PACA_EXTLB+EX_TLB_CR(r13)
 + std r15,PACA_EXGEN+EX_R15(r13)
 + std r10,PACA_EXGEN+EX_CR(r13)
   TLB_MISS_PROLOG_STATS_BOLTED
  .endm
  
  .macro tlb_epilog_bolted
 - ld  r14,PACA_EXTLB+EX_TLB_CR(r13)
 - ld  r10,PACA_EXTLB+EX_TLB_R10(r13)
 - ld  

Re: [RFC PATCH 13/17] PowerPC: booke64: Use SPRG0/3 scratch for bolted TLB miss crit int

2012-06-26 Thread Scott Wood
On 06/25/2012 07:26 AM, Mihai Caraman wrote:
 Embedded.Hypervisor category defines GSPRG0..3 physical registers for guests.
 Avoid SPRG4-7 usage as scratch in host exception handlers, otherwise guest
 SPRG4-7 registers will be clobbered.
 For bolted TLB miss exception handlers, which is the version currently
 supported by KVM, use SPRN_SPRG_GEN_SCRATCH (aka SPRG0) instead of
 SPRN_SPRG_TLB_SCRATCH (aka SPRG6) and replace TLB with GEN PACA slots to
 keep consistency.
 For critical exception handler use SPRG3 instead of SPRG7.

extlb is in the same cache line as other TLB stuff we need, while exgen
isn't.  Let's stick with extlb.

-Scott



Re: [RFC PATCH 03/17] KVM: PPC64: booke: Add EPCR support in sregs

2012-06-26 Thread Scott Wood
On 06/25/2012 07:26 AM, Mihai Caraman wrote:
 Add KVM_SREGS_E_64 feature and EPCR spr support in get/set sregs
 for 64-bit hosts.
 
 Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com>
 ---
  arch/powerpc/kvm/booke.c |   14 ++
  1 files changed, 14 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index f9fa260..d15c4b5 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -1052,6 +1052,9 @@ static void get_sregs_base(struct kvm_vcpu *vcpu,
   u64 tb = get_tb();
  
   sregs->u.e.features |= KVM_SREGS_E_BASE;
 +#ifdef CONFIG_64BIT
 + sregs->u.e.features |= KVM_SREGS_E_64;
 +#endif
  
   sregs->u.e.csrr0 = vcpu->arch.csrr0;
   sregs->u.e.csrr1 = vcpu->arch.csrr1;
 @@ -1063,6 +1066,9 @@ static void get_sregs_base(struct kvm_vcpu *vcpu,
   sregs->u.e.dec = kvmppc_get_dec(vcpu, tb);
   sregs->u.e.tb = tb;
   sregs->u.e.vrsave = vcpu->arch.vrsave;
 +#ifdef CONFIG_64BIT
 + sregs->u.e.epcr = vcpu->arch.epcr;
 +#endif
  }
  
  static int set_sregs_base(struct kvm_vcpu *vcpu,
 @@ -1071,6 +1077,11 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
   if (!(sregs->u.e.features & KVM_SREGS_E_BASE))
   return 0;
  
 +#ifdef CONFIG_64BIT
 + if (!(sregs->u.e.features & KVM_SREGS_E_64))
 + return 0;
 +#endif
 +#endif

This means that a QEMU targeting a 32-bit guest won't be able to set any
special registers, if it sets feature bits manually rather than getting
them from GET_SREGS.

This check should only qualify whether we look at sregs.u.e.epcr, not
whether this function works at all.

BTW, shouldn't the BASE check return an error rather than silently no-op?
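
To make the two review points concrete, here is a minimal userspace sketch
of the control flow being asked for. It is not the KVM code: the structures
are trimmed-down hypothetical stand-ins, the DEMO_ feature bits are
placeholder values, and returning -EINVAL for a missing BASE bit is an
assumption taken from the question above rather than settled behaviour.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder feature bits; the real values live in the KVM headers. */
#define DEMO_SREGS_E_BASE	(1u << 0)
#define DEMO_SREGS_E_64		(1u << 1)

/* Hypothetical, trimmed-down stand-ins for the sregs and vcpu state. */
struct demo_sregs {
	uint32_t features;
	uint64_t csrr0;
	uint64_t epcr;
};

struct demo_vcpu {
	uint64_t csrr0;
	uint64_t epcr;
};

static int demo_set_sregs_base(struct demo_vcpu *vcpu,
			       const struct demo_sregs *sregs)
{
	/* Assumption: a missing BASE bit is reported, not silently ignored. */
	if (!(sregs->features & DEMO_SREGS_E_BASE))
		return -EINVAL;

	/* Base registers are always applied, for 32-bit guests too. */
	vcpu->csrr0 = sregs->csrr0;

	/* The 64-bit flag only decides whether epcr is consumed. */
	if (sregs->features & DEMO_SREGS_E_64)
		vcpu->epcr = sregs->epcr;

	return 0;
}
```

A caller that sets only the BASE bit still gets its base registers applied;
epcr is simply left untouched.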

-Scott



Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Frank Swiderski
On Tue, Jun 26, 2012 at 2:47 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
 On Tue, Jun 26, 2012 at 02:31:26PM -0700, Frank Swiderski wrote:
 On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel <r...@redhat.com> wrote:
  On 06/26/2012 04:32 PM, Frank Swiderski wrote:
 
  This implementation of a virtio balloon driver uses the page cache to
  store pages that have been released to the host.  The communication
  (outside of target counts) is one way--the guest notifies the host when
  it adds a page to the page cache, allowing the host to madvise(2) with
  MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
  (via the regular page reclaim).  This means that inflating the balloon
  is similar to the existing balloon mechanism, but the deflate is
  different--it re-uses existing Linux kernel functionality to
  automatically reclaim.
 
  Signed-off-by: Frank Swiderski <f...@google.com>
 
 
  It is a great idea, but how can this memory balancing
  possibly work if someone uses memory cgroups inside a
  guest?

 Thanks and good point--this isn't something that I considered in the
 implementation.

  Having said that, we currently do not have proper
  memory reclaim balancing between cgroups at all, so
  requiring that of this balloon driver would be
  unreasonable.
 
  The code looks good to me, my only worry is the
  code duplication. We now have 5 balloon drivers,
  for 4 hypervisors, all implementing everything
  from scratch...

 Do you have any recommendations on this?  I could (I think reasonably
 so) modify the existing virtio_balloon.c and have it change behavior
 based on a feature bit or other configuration.  I'm not sure that
 really addresses the root of what you're pointing out--it's still
 adding a different implementation, but doing so as an extension of an
 existing one.

 fes

 Let's assume it's a feature bit: how would you
 formulate what the feature does *from host point of view*?

 --
 MST

In this implementation, the host doesn't keep track of pages in the
balloon, as there is no explicit deflate path.  The host device for
this implementation should merely, for example, MADV_DONTNEED on the
pages sent in an inflate.  Thus, the inflate becomes a notification
that the guest doesn't need those pages mapped in, but that they
should be available if the guest touches them.  In that sense, it's
not a rigid shrink of guest memory.  I'm not sure what I'd call the
feature bit though.
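
The host-side behaviour described above can be tried out in userspace. This
is a minimal sketch, assuming Linux and a private anonymous mapping standing
in for guest memory; the function name is made up for the example. It
dirties one page, drops it with madvise(MADV_DONTNEED), and then checks that
touching the page again yields zero-filled memory without any explicit
"deflate" step.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS and madvise() on glibc */
#endif
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a private anonymous page reads back as zero after
 * MADV_DONTNEED, 0 if it does not, and -1 on setup failure.
 */
static int demo_dontneed_zeroes_page(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *mem;
	int zeroed;

	if (page_size <= 0)
		return -1;

	mem = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	memset(mem, 0xaa, (size_t)page_size);	/* page is now resident */

	/* "Inflate": tell the kernel the contents are no longer needed. */
	if (madvise(mem, (size_t)page_size, MADV_DONTNEED) != 0) {
		munmap(mem, (size_t)page_size);
		return -1;
	}

	/* Touching the page again faults in a fresh zero-filled page. */
	zeroed = (mem[0] == 0 && mem[page_size - 1] == 0);

	munmap(mem, (size_t)page_size);
	return zeroed;
}
```

The mapping stays valid throughout, which matches the point that an inflate
here is a hint rather than a rigid shrink of guest memory.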

Was that the question you were asking, or did I misread?

fes


Re: [PATCH] Add a page cache-backed balloon device driver.

2012-06-26 Thread Frank Swiderski
On Tue, Jun 26, 2012 at 2:45 PM, Rik van Riel <r...@redhat.com> wrote:
 On 06/26/2012 05:31 PM, Frank Swiderski wrote:

 On Tue, Jun 26, 2012 at 1:40 PM, Rik van Riel <r...@redhat.com> wrote:


 The code looks good to me, my only worry is the
 code duplication. We now have 5 balloon drivers,
 for 4 hypervisors, all implementing everything
 from scratch...


 Do you have any recommendations on this?  I could (I think reasonably
 so) modify the existing virtio_balloon.c and have it change behavior
 based on a feature bit or other configuration.  I'm not sure that
 really addresses the root of what you're pointing out--it's still
 adding a different implementation, but doing so as an extension of an
 existing one.


 Ideally, I believe we would have two balloon
 top parts in a guest (one classical balloon,
 one on the LRU), and four bottom parts (kvm,
 xen, vmware & s390).

 That way the virt specific bits of a balloon
 driver would be essentially a ->balloon_page
 and ->release_page callback for pages, as well
 as methods to communicate with the host.

 All the management of pages, including stuff
 like putting them on the LRU, or isolating
 them for migration, would be done with the
 same common code, regardless of what virt
 software we are running on.

 Of course, that is a substantial amount of
 work and I feel it would be unreasonable to
 block anyone's code on that kind of thing
 (especially considering that your code is good),
 but I do believe the explosion of balloon
 code is a little worrying.


Hm, that makes a lot of sense.  That would be a few patches definitely
worth doing, IMHO.  I'm not entirely sure how I feel about inflating
the balloon drivers in the meantime.  Sigh, and I didn't even mean
that as a pun.

fes


Re: Request VFIO inclusion in linux-next

2012-06-26 Thread Stephen Rothwell
Hi Alex,

On Mon, 25 Jun 2012 22:55:52 -0600 Alex Williamson <alex.william...@redhat.com> wrote:

 VFIO has been kicking around for well over a year now and has been
 posted numerous times for review.  The pre-requirements are finally
 available in linux-next (or will be in the 20120626 build) so I'd like
 to request a new branch be included in linux-next with a goal of being
 accepted into v3.6.
 
 The info for linux-next:
 
 Tree: git://github.com/awilliam/linux-vfio.git
 Branch: next
 Contact: Alex Williamson alex.william...@redhat.com
 
 This branch should be applied after both Bjorn's PCI next branch and
 Joerg's IOMMU next branch

I have added this from today.  Since this tree merges (parts of) the pci
and iommu trees, I am hoping that those are to remain stable - if not,
then you need to stay alert if they are rebased.

Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgment of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
 * submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
 * posted to the relevant mailing list,
 * reviewed by you (or another maintainer of your subsystem tree),
 * successfully unit tested, and 
 * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary.

-- 
Cheers,
Stephen Rothwell 
s...@canb.auug.org.au

Legal Stuff:
By participating in linux-next, your subsystem tree contributions are
public and will be included in the linux-next trees.  You may be sent
e-mail messages indicating errors or other issues when the
patches/commits from your subsystem tree are merged and tested in
linux-next.  These messages may also be cross-posted to the linux-next
mailing list, the linux-kernel mailing list, etc.  The linux-next tree
project and IBM (my employer) make no warranties regarding the linux-next
project, the testing procedures, the results, the e-mails, etc.  If you
don't agree to these ground rules, let me know and I'll remove your tree
from participation in linux-next.




[PATCH 2/4] KVM: Use __print_hex() for kvm_emulate_insn tracepoint

2012-06-26 Thread Namhyung Kim
From: Namhyung Kim namhyung@lge.com

The kvm_emulate_insn tracepoint used __print_insn()
to print the instruction bytes. However, that makes
the event format hard to parse because it exposes
tracepoint (TP) internals.

Fortunately, the kernel provides __print_hex for almost
the same purpose, so we can use it instead of
open-coding it. User space can be changed to parse
it later.

That means raw kernel tracing will not be affected
by this change:

 # cd /sys/kernel/debug/tracing/
 # cat events/kvm/kvm_emulate_insn/format
 name: kvm_emulate_insn
 ID: 29
 format:
...
 print fmt: "%x:%llx:%s (%s)%s", REC->csbase, REC->rip, __print_hex(REC->insn, REC->len), \
 __print_symbolic(REC->flags, { 0, "real" }, { (1 << 0) | (1 << 1), "vm16" }, \
 { (1 << 0), "prot16" }, { (1 << 0) | (1 << 2), "prot32" }, { (1 << 0) | (1 << 3), "prot64" }), \
 REC->failed ? " failed" : ""

 # echo 1 > events/kvm/kvm_emulate_insn/enable
 # cat trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 2183/2183   #P:12
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
 qemu-kvm-1782  [002] ...1   140.931636: kvm_emulate_insn: 0:c102fa25:89 10 (prot32)
 qemu-kvm-1781  [004] ...1   140.931637: kvm_emulate_insn: 0:c102fa25:89 10 (prot32)

Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/n/tip-wfw6y3b9ugtey8snaow9n...@git.kernel.org
Signed-off-by: Namhyung Kim namhy...@kernel.org
---
 arch/x86/kvm/trace.h   |   12 +---
 include/trace/ftrace.h |1 +
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 911d2641f14c..62d02e3c3ed6 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -710,16 +710,6 @@ TRACE_EVENT(kvm_skinit,
 		  __entry->rip, __entry->slb)
 );
 
-#define __print_insn(insn, ilen) ({                             \
-	int i;                                                   \
-	const char *ret = p->buffer + p->len;                    \
-	                                                         \
-	for (i = 0; i < ilen; ++i)                               \
-		trace_seq_printf(p, " %02x", insn[i]);           \
-	trace_seq_printf(p, "%c", 0);                            \
-	ret;                                                     \
-	})
-
 #define KVM_EMUL_INSN_F_CR0_PE (1 << 0)
 #define KVM_EMUL_INSN_F_EFL_VM (1 << 1)
 #define KVM_EMUL_INSN_F_CS_D   (1 << 2)
@@ -786,7 +776,7 @@ TRACE_EVENT(kvm_emulate_insn,
 
 	TP_printk("%x:%llx:%s (%s)%s",
 		  __entry->csbase, __entry->rip,
-		  __print_insn(__entry->insn, __entry->len),
+		  __print_hex(__entry->insn, __entry->len),
 		  __print_symbolic(__entry->flags,
 				   kvm_trace_symbol_emul_flags),
 		  __entry->failed ? " failed" : ""
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 769724944fc6..c6bc2faaf261 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -571,6 +571,7 @@ static inline void ftrace_test_probe_##call(void)	\
 
 #undef __print_flags
 #undef __print_symbolic
+#undef __print_hex
 #undef __get_dynamic_array
 #undef __get_str
 
-- 
1.7.10.2

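
The output format the patch switches to (bytes printed as two-digit hex values separated by single spaces, as in the "89 10" trace lines above) can be illustrated with a small userspace sketch. The function name format_hex() and its signature are assumptions made for this example; it is not the kernel's __print_hex() implementation, only a model of the text it produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative stand-in for the output of __print_hex(): format `len`
 * bytes from `buf` as two-digit hex values separated by single spaces.
 * Name and signature are hypothetical.
 * Returns the number of characters written (excluding the NUL), or -1
 * if `out` is too small to hold the result.
 */
int format_hex(char *out, size_t outsz, const unsigned char *buf, size_t len)
{
	size_t pos = 0;
	size_t i;

	assert(out != NULL && buf != NULL);
	if (outsz == 0)
		return -1;
	out[0] = '\0';

	for (i = 0; i < len; i++) {
		/* No separator before the first byte, one space before the rest. */
		int n = snprintf(out + pos, outsz - pos, "%s%02x",
				 i ? " " : "", buf[i]);
		if (n < 0 || (size_t)n >= outsz - pos)
			return -1;	/* output would have been truncated */
		pos += (size_t)n;
	}
	return (int)pos;
}
```

Note that the removed __print_insn() macro emitted a space before every byte, including the first, so parsers that depended on that leading space would need adjusting.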


[PATCH v2 3/6] kvm: Sanitize KVM_IRQFD flags

2012-06-26 Thread Alex Williamson
We only know of one so far.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 virt/kvm/eventfd.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c307c24..7d7e2aa 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -340,6 +340,9 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 int
 kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
+	if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
+		return -EINVAL;
+
 	if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
 		return kvm_irqfd_deassign(kvm, args);
 

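
The check added by this patch follows a common pattern: reject any flag bit the interface does not yet understand, so that those bits can be given a meaning later without silently changing behaviour for old callers. A minimal userspace sketch, assuming a hypothetical check_irqfd_flags() helper and a locally redefined flag constant:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same bit as KVM_IRQFD_FLAG_DEASSIGN in the patch; redefined so the sketch stands alone. */
#define IRQFD_FLAG_DEASSIGN (1u << 0)

/* Every flag bit this (hypothetical) version of the interface understands. */
#define IRQFD_VALID_FLAGS   (IRQFD_FLAG_DEASSIGN)

/*
 * Return 0 when `flags` contains only known bits, -EINVAL otherwise.
 * Hypothetical helper; the patch performs the same test inline.
 */
int check_irqfd_flags(uint32_t flags)
{
	if (flags & ~IRQFD_VALID_FLAGS)
		return -EINVAL;
	return 0;
}
```

When a later patch introduces a new flag, only the valid-flags mask grows; callers passing the new bit to an older kernel get -EINVAL instead of having it ignored.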


[PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

2012-06-26 Thread Alex Williamson
In order to inject an interrupt from an external source using an
irqfd, we need to allocate a new irq_source_id.  This allows us to
assert and (later) de-assert an interrupt line independently from
users of KVM_IRQ_LINE and avoid lost interrupts.

We also add what may appear like a bit of excessive infrastructure
around an object for storing this irq_source_id.  However, notice
that we only provide a way to assert the interrupt here.  A follow-on
interface will make use of the same irq_source_id to allow de-assert.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Documentation/virtual/kvm/api.txt |5 ++
 arch/x86/kvm/x86.c|1 
 include/linux/kvm.h   |3 +
 virt/kvm/eventfd.c|   95 +++--
 4 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index ea9edce..b216709 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1981,6 +1981,11 @@ the guest using the specified gsi pin.  The irqfd is removed using
 the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
 and kvm_irqfd.gsi.
 
+With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source ID for
+the requested irqfd.  This is necessary to share level triggered
+interrupts with those injected through KVM_IRQ_LINE.  IRQFDs created
+with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
+KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
 
 5. The kvm_run structure
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a01a424..80bed07 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_GET_TSC_KHZ:
case KVM_CAP_PCI_2_3:
case KVM_CAP_KVMCLOCK_CTRL:
+   case KVM_CAP_IRQFD_LEVEL:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..b2e6e4f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
+#define KVM_CAP_IRQFD_LEVEL 81
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
 #endif
 
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+/* Available with KVM_CAP_IRQFD_LEVEL */
+#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
 
 struct kvm_irqfd {
__u32 fd;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 7d7e2aa..18cc284 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -36,6 +36,64 @@
 #include "iodev.h"
 
 /*
+ * An irq_source_id can be created from KVM_IRQFD for level interrupt
+ * injections and shared with other interfaces for EOI or de-assert.
+ * Create an object with reference counting to make it easy to use.
+ */
+struct _irq_source {
+	int id;
+	struct kvm *kvm;
+	struct kref kref;
+};
+
+static void release_irq_source(struct kref *kref)
+{
+	struct _irq_source *source;
+
+	source = container_of(kref, struct _irq_source, kref);
+
+	kvm_free_irq_source_id(source->kvm, source->id);
+	kfree(source);
+}
+
+static void put_irq_source(struct _irq_source *source)
+{
+	if (source)
+		kref_put(&source->kref, release_irq_source);
+}
+
+static struct _irq_source *__attribute__ ((used)) /* white lie for now */
+get_irq_source(struct _irq_source *source)
+{
+	if (source)
+		kref_get(&source->kref);
+
+	return source;
+}
+
+static struct _irq_source *new_irq_source(struct kvm *kvm)
+{
+	struct _irq_source *source;
+	int id;
+
+	source = kzalloc(sizeof(*source), GFP_KERNEL);
+	if (!source)
+		return ERR_PTR(-ENOMEM);
+
+	id = kvm_request_irq_source_id(kvm);
+	if (id < 0) {
+		kfree(source);
+		return ERR_PTR(id);
+	}
+
+	kref_init(&source->kref);
+	source->kvm = kvm;
+	source->id = id;
+
+	return source;
+}
+
+/*
  * 
  * irqfd: Allows an fd to be used to inject an interrupt to the guest
  *
@@ -52,6 +110,7 @@ struct _irqfd {
/* Used for level IRQ fast-path */
int gsi;
struct work_struct inject;
+   struct _irq_source *source;
/* Used for setup/shutdown */
struct eventfd_ctx *eventfd;
struct list_head list;
@@ -62,7 +121,7 @@ struct _irqfd {
 static struct workqueue_struct *irqfd_cleanup_wq;
 
 static void
-irqfd_inject(struct work_struct *work)
+irqfd_inject_edge(struct work_struct *work)
 {
struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
 	struct kvm *kvm = irqfd->kvm;
@@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
kvm_set_irq(kvm, 
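
The reference-counted _irq_source object introduced by this patch can be modelled in userspace to show why the release order of its holders does not matter. This is only a sketch: the struct irq_source type, the function names and the `released` counter are hypothetical, and a plain int stands in for the kernel's kref, so unlike kref it is not safe for concurrent use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct _irq_source. */
struct irq_source {
	int id;
	int refcount;	/* plain counter; the kernel uses struct kref */
	int *released;	/* incremented when the object is finally freed */
};

/* Create an object holding one reference, like kref_init(). */
struct irq_source *irq_source_new(int id, int *released)
{
	struct irq_source *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->id = id;
	s->refcount = 1;
	s->released = released;
	return s;
}

/* Take an additional reference, like kref_get(). */
struct irq_source *irq_source_get(struct irq_source *s)
{
	if (s)
		s->refcount++;
	return s;
}

/* Drop a reference; the last put frees the object, like kref_put(). */
void irq_source_put(struct irq_source *s)
{
	if (!s)
		return;
	assert(s->refcount > 0);
	if (--s->refcount == 0) {
		if (s->released)
			(*s->released)++;
		free(s);
	}
}
```

Whichever holder calls irq_source_put() last triggers the release, which is the property the commit message relies on for sharing the source ID between an irqfd and a follow-on interface.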

[PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

2012-06-26 Thread Alex Williamson
This new ioctl enables an eventfd to be triggered when an EOI is
written for a specified irqchip pin.  By default this is a simple
notification, but we can also tie the eoifd to a level irqfd, which
enables the irqchip pin to be automatically de-asserted on EOI.
This mode is particularly useful for device-assignment applications
where the unmask and notify triggers a hardware unmask.  The default
mode is most applicable to simple notify with no side-effects for
userspace usage, such as Qemu.

Here we make use of the reference counting of the _irq_source
object, allowing us to share it with an irqfd and clean up regardless
of the release order.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Documentation/virtual/kvm/api.txt |   24 +
 arch/x86/kvm/x86.c|1 
 include/linux/kvm.h   |   14 +++
 include/linux/kvm_host.h  |   13 +++
 virt/kvm/eventfd.c|  189 +
 virt/kvm/kvm_main.c   |   11 ++
 6 files changed, 250 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index b216709..87a2558 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE.  IRQFDs created
 with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
 KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
 
+4.77 KVM_EOIFD
+
+Capability: KVM_CAP_EOIFD
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_eoifd (in)
+Returns: 0 on success, -1 on error
+
+KVM_EOIFD allows userspace to receive EOI notification through an
+eventfd for level triggered irqchip interrupts.  Behavior for edge
+triggered interrupts is undefined.  kvm_eoifd.fd specifies the eventfd
+used for notification and kvm_eoifd.gsi specifies the irchip pin,
+similar to KVM_IRQFD.  KVM_EOIFD_FLAG_DEASSIGN is used to deassign
+a previously enabled eoifd and should also set fd and gsi to match.
+
+The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the EOI is for
+a level triggered EOI and the kvm_eoifd structure includes
+kvm_eoifd.irqfd, which must be previously configured using KVM_IRQFD
+with the KVM_IRQFD_FLAG_LEVEL flag.  This allows both EOI notification
+through kvm_eoifd.fd as well as automatically de-asserting level
+irqfds on EOI.  Both KVM_EOIFD_FLAG_DEASSIGN and
+KVM_EOIFD_FLAG_LEVEL_IRQFD should be used to de-assign an eoifd
+initially setup with KVM_EOIFD_FLAG_LEVEL_IRQFD.
+
 5. The kvm_run structure
 
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 80bed07..62d6eca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2149,6 +2149,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_PCI_2_3:
case KVM_CAP_KVMCLOCK_CTRL:
case KVM_CAP_IRQFD_LEVEL:
+   case KVM_CAP_EOIFD:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index b2e6e4f..7567e7d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
 #define KVM_CAP_IRQFD_LEVEL 81
+#define KVM_CAP_EOIFD 82
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -694,6 +695,17 @@ struct kvm_irqfd {
__u8  pad[20];
 };
 
+#define KVM_EOIFD_FLAG_DEASSIGN (1 << 0)
+#define KVM_EOIFD_FLAG_LEVEL_IRQFD (1 << 1)
+
+struct kvm_eoifd {
+   __u32 fd;
+   __u32 gsi;
+   __u32 flags;
+   __u32 irqfd;
+   __u8 pad[16];
+};
+
 struct kvm_clock_data {
__u64 clock;
__u32 flags;
@@ -834,6 +846,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_GET_SMMU_INFO     _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
 /* Available with KVM_CAP_PPC_ALLOC_HTAB */
 #define KVM_PPC_ALLOCATE_HTAB     _IOWR(KVMIO, 0xa7, __u32)
+/* Available with KVM_CAP_EOIFD */
+#define KVM_EOIFD                 _IOW(KVMIO,  0xa8, struct kvm_eoifd)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae3b426..83472eb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -285,6 +285,10 @@ struct kvm {
struct list_head  items;
} irqfds;
 	struct list_head ioeventfds;
+	struct {
+		spinlock_t        lock;
+		struct list_head  items;
+	} eoifds;
 #endif
struct kvm_vm_stat stat;
struct kvm_arch arch;
@@ -828,6 +832,8 @@ int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
 void kvm_irqfd_release(struct kvm *kvm);
 void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args);
+void kvm_eoifd_release(struct kvm *kvm);
 
 #else
 
@@ -853,6 +859,13 @@ static inline int kvm_ioeventfd(struct kvm 

[PATCH v2 6/6] kvm: Level IRQ de-assert for KVM_IRQFD

2012-06-26 Thread Alex Williamson
This is an alternate level irqfd de-assert mode that's potentially
useful for emulated drivers.  It's included here to show how easy it
is to implement with the new level irqfd and eoifd support.  It's
possible this mode might also prove interesting for device-assignment
where we inject via level irqfd, receive an EOI (w/o de-assert), and
use the level de-assert irqfd here.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Documentation/virtual/kvm/api.txt |8 
 include/linux/kvm.h   |6 +-
 virt/kvm/eventfd.c|   28 ++--
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 87a2558..b356937 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1987,6 +1987,14 @@ interrupts with those injected through KVM_IRQ_LINE.  IRQFDs created
 with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
 KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
 
+The KVM_IRQFD_FLAG_LEVEL_DEASSERT flag creates an irqfd similar to
+KVM_IRQFD_FLAG_LEVEL, except the irqfd de-asserts the irqchip pin
+rather than asserts it.  The level irqfd must first be created as
+aboved and passed to this ioctl in kvm_irqfd.irqfd.  This ensures
+the de-assert irqfd uses the same IRQ source ID as the assert irqfd.
+This flag should also be specified on de-assign.  This feature is
+present when KVM_CAP_IRQFD_LEVEL_DEASSERT is available.
+
 4.77 KVM_EOIFD
 
 Capability: KVM_CAP_EOIFD
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 7567e7d..0bbfd47 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -620,6 +620,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_ALLOC_HTAB 80
 #define KVM_CAP_IRQFD_LEVEL 81
 #define KVM_CAP_EOIFD 82
+#define KVM_CAP_IRQFD_LEVEL_DEASSERT 83
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -687,12 +688,15 @@ struct kvm_xen_hvm_config {
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
 /* Available with KVM_CAP_IRQFD_LEVEL */
 #define KVM_IRQFD_FLAG_LEVEL (1 << 1)
+/* Available with KVM_CAP_IRQFD_LEVEL_DEASSERT */
+#define KVM_IRQFD_FLAG_LEVEL_DEASSERT (1 << 2)
 
 struct kvm_irqfd {
__u32 fd;
__u32 gsi;
__u32 flags;
-   __u8  pad[20];
+   __u32 irqfd;
+   __u8  pad[16];
 };
 
 #define KVM_EOIFD_FLAG_DEASSIGN (1 << 0)
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 02ca50f..50ace0e 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -172,6 +172,14 @@ irqfd_inject_level(struct work_struct *work)
 	kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
 }
 
+static void
+irqfd_inject_level_deassert(struct work_struct *work)
+{
+	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
+
+	kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 0);
+}
+
 /*
  * Race-free decouple logic (ordering is critical)
  */
@@ -320,6 +328,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	INIT_LIST_HEAD(&irqfd->list);
 
 	if (args->flags & KVM_IRQFD_FLAG_LEVEL) {
+		if (args->flags & KVM_IRQFD_FLAG_LEVEL_DEASSERT)
+			return -EINVAL; /* mutually exclusive */
+
 		irqfd->source = new_irq_source(kvm);
 		if (IS_ERR(irqfd->source)) {
 			ret = PTR_ERR(irqfd->source);
@@ -328,6 +339,16 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 		}
 
 		INIT_WORK(&irqfd->inject, irqfd_inject_level);
+
+	} else if (args->flags & KVM_IRQFD_FLAG_LEVEL_DEASSERT) {
+		irqfd->source = get_irq_source_from_irqfd(kvm, args->irqfd);
+		if (IS_ERR(irqfd->source)) {
+			ret = PTR_ERR(irqfd->source);
+			irqfd->source = NULL;
+			goto fail;
+		}
+
+		INIT_WORK(&irqfd->inject, irqfd_inject_level_deassert);
 	} else
 		INIT_WORK(&irqfd->inject, irqfd_inject_edge);
 
@@ -421,7 +442,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 {
 	struct _irqfd *irqfd, *tmp;
 	struct eventfd_ctx *eventfd;
-	bool is_level = (args->flags & KVM_IRQFD_FLAG_LEVEL) != 0;
+	bool is_level = (args->flags & (KVM_IRQFD_FLAG_LEVEL |
+			 KVM_IRQFD_FLAG_LEVEL_DEASSERT)) != 0;
 
 	eventfd = eventfd_ctx_fdget(args->fd);
 	if (IS_ERR(eventfd))
@@ -461,7 +483,9 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 int
 kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
-	if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_LEVEL))
+	if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN |
+			    KVM_IRQFD_FLAG_LEVEL |
+			    KVM_IRQFD_FLAG_LEVEL_DEASSERT))
 		return -EINVAL;
 
 	if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)

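
Patch 6/6 carves the new irqfd field out of the padding of struct kvm_irqfd, which keeps the structure size unchanged; that matters because the _IOW() macro encodes the argument size into the ioctl number. The layout arithmetic can be checked at compile time with a sketch like the following, where the two struct names are hypothetical copies of the before and after layouts using <stdint.h> types instead of __u32/__u8:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of struct kvm_irqfd before patch 6/6 (hypothetical local copy). */
struct irqfd_args_old {
	uint32_t fd;
	uint32_t gsi;
	uint32_t flags;
	uint8_t  pad[20];
};

/* Layout after patch 6/6: four bytes of padding become the irqfd field. */
struct irqfd_args_new {
	uint32_t fd;
	uint32_t gsi;
	uint32_t flags;
	uint32_t irqfd;
	uint8_t  pad[16];
};

/* 3 * 4 + 20 == 4 * 4 + 16 == 32 bytes, so the ioctl number is unaffected. */
_Static_assert(sizeof(struct irqfd_args_old) == 32, "old layout is 32 bytes");
_Static_assert(sizeof(struct irqfd_args_new) == sizeof(struct irqfd_args_old),
	       "new field must not change the structure size");
/* The new field occupies exactly the start of the old padding. */
_Static_assert(offsetof(struct irqfd_args_new, irqfd) ==
	       offsetof(struct irqfd_args_old, pad),
	       "irqfd must reuse former padding bytes");

/* Runtime accessor so the result can also be inspected from a test. */
size_t irqfd_args_size(void)
{
	return sizeof(struct irqfd_args_new);
}
```

The same arithmetic applies to struct kvm_eoifd from patch 5/6, which also totals 32 bytes.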

[PATCH v2 0/6] kvm: level triggered irqfd support

2012-06-26 Thread Alex Williamson
Ok, let's see how this flies.  I actually quite like this, so be
gentle tearing it apart ;)

I just couldn't bring myself to contort KVM_IRQFD into something
that either sets up an irqfd or specifies a nearly unrelated EOI
eventfd.  The solution I've come up with, that also avoids exposing
irq_source_ids to userspace, is to work through the irqfd.  If we
set up a level irqfd, we can optionally associate an eoifd with
the same irq_source_id, by passing the irqfd.  To do this, we just
need to create a new reference counted object for the source ID
so we don't run into problems ordering release.  This means we
end up with a KVM_EOIFD ioctl that has both general usefulness and
can still tie into an irqfd.  In patch 6/6 I also include an
alternate de-assert mechanism via an irqfd with the opposite
polarity.  I don't currently use this, but it's pretty trivial
and at least available in archives now.

I don't address whether injecting an edge irqfd really needs an assert
followed by de-assert (I don't know).  This new interface really
unties itself from caring.  We might be able to consolidate inject
functions at some future point, but it doesn't change how we'd name
flags as it did in the previous version.  Thanks,

Alex

---

Alex Williamson (6):
  kvm: Level IRQ de-assert for KVM_IRQFD
  kvm: KVM_EOIFD, an eventfd for EOIs
  kvm: Extend irqfd to support level interrupts
  kvm: Sanitize KVM_IRQFD flags
  kvm: Add missing KVM_IRQFD API documentation
  kvm: Pass kvm_irqfd to functions


 Documentation/virtual/kvm/api.txt |   53 ++
 arch/x86/kvm/x86.c|2 
 include/linux/kvm.h   |   23 +++
 include/linux/kvm_host.h  |   17 ++
 virt/kvm/eventfd.c|  323 -
 virt/kvm/kvm_main.c   |   13 +
 6 files changed, 414 insertions(+), 17 deletions(-)


[PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

2012-06-26 Thread Alex Williamson
Prune this down to just the struct kvm_irqfd so we can avoid
changing function definition for every flag or field we use.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 include/linux/kvm_host.h |4 ++--
 virt/kvm/eventfd.c   |   20 ++--
 virt/kvm/kvm_main.c  |2 +-
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 27ac8a4..ae3b426 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -824,7 +824,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
 #ifdef CONFIG_HAVE_KVM_EVENTFD
 
 void kvm_eventfd_init(struct kvm *kvm);
-int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
+int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
 void kvm_irqfd_release(struct kvm *kvm);
 void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
@@ -833,7 +833,7 @@ int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
 
 static inline void kvm_eventfd_init(struct kvm *kvm) {}
 
-static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
+static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
return -EINVAL;
 }
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index f59c1e8..c307c24 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -198,7 +198,7 @@ static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd,
 }
 
 static int
-kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
+kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 {
 	struct kvm_irq_routing_table *irq_rt;
 	struct _irqfd *irqfd, *tmp;
@@ -212,12 +212,12 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
 		return -ENOMEM;
 
 	irqfd->kvm = kvm;
-	irqfd->gsi = gsi;
+	irqfd->gsi = args->gsi;
 	INIT_LIST_HEAD(&irqfd->list);
 	INIT_WORK(&irqfd->inject, irqfd_inject);
 	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
 
-	file = eventfd_fget(fd);
+	file = eventfd_fget(args->fd);
 	if (IS_ERR(file)) {
 		ret = PTR_ERR(file);
 		goto fail;
@@ -298,19 +298,19 @@ kvm_eventfd_init(struct kvm *kvm)
  * shutdown any irqfd's that match fd+gsi
  */
 static int
-kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
+kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
 {
 	struct _irqfd *irqfd, *tmp;
 	struct eventfd_ctx *eventfd;
 
-	eventfd = eventfd_ctx_fdget(fd);
+	eventfd = eventfd_ctx_fdget(args->fd);
 	if (IS_ERR(eventfd))
 		return PTR_ERR(eventfd);
 
 	spin_lock_irq(&kvm->irqfds.lock);
 
 	list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
-		if (irqfd->eventfd == eventfd && irqfd->gsi == gsi) {
+		if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi) {
 			/*
 			 * This rcu_assign_pointer is needed for when
 			 * another thread calls kvm_irq_routing_update before
@@ -338,12 +338,12 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
 }
 
 int
-kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
+kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 {
-	if (flags & KVM_IRQFD_FLAG_DEASSIGN)
-		return kvm_irqfd_deassign(kvm, fd, gsi);
+	if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
+		return kvm_irqfd_deassign(kvm, args);
 
-	return kvm_irqfd_assign(kvm, fd, gsi);
+	return kvm_irqfd_assign(kvm, args);
 }
 
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 02cb440..b4ad14cc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2059,7 +2059,7 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = -EFAULT;
 		if (copy_from_user(&data, argp, sizeof data))
 			goto out;
-		r = kvm_irqfd(kvm, data.fd, data.gsi, data.flags);
+		r = kvm_irqfd(kvm, &data);
 		break;
 	}
 	case KVM_IOEVENTFD: {



[PATCH v2 2/6] kvm: Add missing KVM_IRQFD API documentation

2012-06-26 Thread Alex Williamson
Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Documentation/virtual/kvm/api.txt |   16 
 1 file changed, 16 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 310fe50..ea9edce 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1965,6 +1965,22 @@ return the hash table order in the parameter.  (If the guest is using
 the virtualized real-mode area (VRMA) facility, the kernel will
 re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
 
+4.76 KVM_IRQFD
+
+Capability: KVM_CAP_IRQFD
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_irqfd (in)
+Returns: 0 on success, -1 on error
+
+Allows setting an eventfd to directly trigger a guest interrupt.
+kvm_irqfd.fd specifies the file descriptor to use as the eventfd and
+kvm_irqfd.gsi specifies the irqchip pin toggled by this event.  When
+an event is tiggered on the eventfd, an interrupt is injected into
+the guest using the specified gsi pin.  The irqfd is removed using
+the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
+and kvm_irqfd.gsi.
+
 
 5. The kvm_run structure
 

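
The KVM_IRQFD documentation above builds on the eventfd primitive: a file descriptor wrapping a 64-bit counter that is incremented by writes and drained by reads. The following Linux-only userspace sketch shows just that signalling mechanism using the eventfd(2) call; it does not touch KVM, and the helper name signal_and_drain() is made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Signal a fresh eventfd `n` times, then drain it with a single read.
 * Returns the counter value read back, or -1 on error. EFD_NONBLOCK
 * makes the read fail instead of blocking when nothing was signalled.
 */
long signal_and_drain(unsigned int n)
{
	uint64_t one = 1, count = 0;
	unsigned int i;
	long result = -1;
	int fd = eventfd(0, EFD_NONBLOCK);

	if (fd < 0)
		return -1;

	/* Each 8-byte write adds its value to the eventfd counter. */
	for (i = 0; i < n; i++) {
		if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one))
			goto out;
	}

	/* A read returns the accumulated counter and resets it to zero. */
	if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		goto out;
	result = (long)count;
out:
	close(fd);
	return result;
}
```

With an irqfd, the kernel is the consumer of such writes: per the documentation above, an event on the eventfd injects an interrupt on the configured gsi.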


Re: [RFC PATCH 10/17] PowerPC: booke64: Refactor exception prolog for save/restore regs

2012-06-26 Thread Benjamin Herrenschmidt
On Mon, 2012-06-25 at 15:26 +0300, Mihai Caraman wrote:
 Refactor exception prolog to allow save/restore register parameters. Add
 addition none definition for exception prolog usage.
 This is needed for exceptions like Guest Doorbell that use GSRRx registers
 which do not map on exception type.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
  arch/powerpc/kernel/exceptions-64e.S |   23 ---
  1 files changed, 8 insertions(+), 15 deletions(-)
 
 diff --git a/arch/powerpc/kernel/exceptions-64e.S 
 b/arch/powerpc/kernel/exceptions-64e.S
 index 7215cc2..52aa96b 100644
 --- a/arch/powerpc/kernel/exceptions-64e.S
 +++ b/arch/powerpc/kernel/exceptions-64e.S
 @@ -35,7 +35,7 @@
  #define  SPECIAL_EXC_FRAME_SIZE  INT_FRAME_SIZE
  
  /* Exception prolog code for all exceptions */
 -#define EXCEPTION_PROLOG(n, type, addition)  \
 +#define EXCEPTION_PROLOG(n, type, srr0, srr1, addition)              \
   mtspr   SPRN_SPRG_##type##_SCRATCH,r13; /* get spare registers */   \
   mfspr   r13,SPRN_SPRG_PACA; /* get PACA */  \
   std r10,PACA_EX##type+EX_R10(r13);  \
 @@ -44,54 +44,47 @@
   addition;   /* additional code for that exc. */ \
   std r1,PACA_EX##type+EX_R1(r13); /* save old r1 in the PACA */  \
   stw r10,PACA_EX##type+EX_CR(r13); /* save old CR in the PACA */ \
 - mfspr   r11,SPRN_##type##_SRR1;/* what are we coming from */\
 + mfspr   r11,srr1;/* what are we coming from */  \
   type##_SET_KSTACK;  /* get special stack if necessary */\
   andi.   r10,r11,MSR_PR; /* save stack pointer */\
   beq 1f; /* branch around if supervisor */   \
   ld  r1,PACAKSAVE(r13);  /* get kernel stack coming from usr */\
  1:   cmpdi   cr1,r1,0;   /* check if SP makes sense */   \
   bge-cr1,exc_##n##_bad_stack;/* bad stack (TODO: out of line) */ \
 - mfspr   r10,SPRN_##type##_SRR0; /* read SRR0 before touching stack */
 + mfspr   r10,srr0;   /* read SRR0 before touching stack */

No, use the existing macro, use a ##type## specific to guest doorbells,
with appropriate definitions of the corresponding SPRN_ macros.

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
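
Ben's suggestion relies on the existing macro's token pasting: the exception type selects the save/restore register pair through SPRN_##type##_SRR0 and SPRN_##type##_SRR1, so a new type only needs matching definitions rather than extra macro parameters. A minimal C preprocessor sketch of that idea follows; the GDBELL type name, the helper macros and the numeric values are placeholders chosen for illustration and are not the real SPR encodings.

```c
#include <assert.h>

/* Placeholder register numbers for illustration only, not real SPR encodings. */
#define SPRN_GEN_SRR0     100
#define SPRN_GEN_SRR1     101
/* Hypothetical guest-doorbell type mapped onto its own register pair. */
#define SPRN_GDBELL_SRR0  200
#define SPRN_GDBELL_SRR1  201

/* The exception type picks the register pair by token pasting. */
#define EXC_SRR0(type)  SPRN_##type##_SRR0
#define EXC_SRR1(type)  SPRN_##type##_SRR1

/* Small accessors so the expansion can be checked at run time. */
int srr0_for_gen(void)    { return EXC_SRR0(GEN); }
int srr1_for_gen(void)    { return EXC_SRR1(GEN); }
int srr0_for_gdbell(void) { return EXC_SRR0(GDBELL); }
int srr1_for_gdbell(void) { return EXC_SRR1(GDBELL); }
```

Under this scheme the prolog macro keeps its existing parameter list, and supporting Guest Doorbell amounts to adding the two definitions for the new type.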


Re: [RFC PATCH 13/17] PowerPC: booke64: Use SPRG0/3 scratch for bolted TLB miss crit int

2012-06-26 Thread Benjamin Herrenschmidt
On Mon, 2012-06-25 at 15:26 +0300, Mihai Caraman wrote:
 Embedded.Hypervisor category defines GSPRG0..3 physical registers for guests.
 Avoid SPRG4-7 usage as scratch in host exception handlers, otherwise guest
 SPRG4-7 registers will be clobbered.
 For bolted TLB miss exception handlers, which is the version currently
 supported by KVM, use SPRN_SPRG_GEN_SCRATCH (aka SPRG0) instead of
 SPRN_SPRG_TLB_SCRATCH (aka SPRG6) and replace TLB with GEN PACA slots to
 keep consistency.
 For critical exception handler use SPRG3 instead of SPRG7.

Beware with SPRG3 usage. It's user space visible and we plan to use it
for other things (see Anton's patch to stick topology information in
there for use by the vdso). If you clobber it, you may want to restore
it later.

I think Anton's patch should put the proper value we want in the PACA
anyway since we also need to restore it on exit from KVM, so you can
still use it as scratch, just restore the value before going to C.

Cheers,
Ben.

 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
  arch/powerpc/include/asm/exception-64e.h |   14 +++---
  arch/powerpc/include/asm/reg.h   |6 +++---
  arch/powerpc/mm/tlb_low_64e.S|   28 ++--
  3 files changed, 24 insertions(+), 24 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/exception-64e.h 
 b/arch/powerpc/include/asm/exception-64e.h
 index ac13add..c90a9a4 100644
 --- a/arch/powerpc/include/asm/exception-64e.h
 +++ b/arch/powerpc/include/asm/exception-64e.h
 @@ -38,8 +38,11 @@
   */
  
 
 -/* We are out of SPRGs so we save some things in the PACA. The normal
 - * exception frame is smaller than the CRIT or MC one though
 +/* We are out of SPRGs so we save some things in the 8 slots available in PACA.
 + * The normal exception frame is smaller than the CRIT or MC one though
 + *
 + * Bolted TLB miss exception variant also uses these slots which in combination
 + * with pgd and kernel_pgd fits in one 64-byte cache line.
   */
  #define EX_R1    (0 * 8)
  #define EX_CR    (1 * 8)
 @@ -47,13 +50,10 @@
  #define EX_R11   (3 * 8)
  #define EX_R14   (4 * 8)
  #define EX_R15   (5 * 8)
 +#define EX_R16   (6 * 8)
  
  /*
 - * The TLB miss exception uses different slots.
 - *
 - * The bolted variant uses only the first six fields,
 - * which in combination with pgd and kernel_pgd fits in
 - * one 64-byte cache line.
 + * PACA slots offset for standard TLB miss exception.
   */
  
  #define EX_TLB_R10   ( 0 * 8)
 diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
 index f0cb7f4..51c14a7 100644
 --- a/arch/powerpc/include/asm/reg.h
 +++ b/arch/powerpc/include/asm/reg.h
 @@ -760,10 +760,10 @@
   * 64-bit embedded
   *   - SPRG0 generic exception scratch
   *   - SPRG2 TLB exception stack
 - *   - SPRG3 unused (user visible)
 + *   - SPRG3 critical exception scratch (user visible)
   *   - SPRG4 unused (user visible)
   *   - SPRG6 TLB miss scratch (user visible, sorry !)
 - *   - SPRG7 critical exception scratch
 + *   - SPRG7 unused (user visible)
   *   - SPRG8 machine check exception scratch
   *   - SPRG9 debug exception scratch
   *
 @@ -857,7 +857,7 @@
  
  #ifdef CONFIG_PPC_BOOK3E_64
  #define SPRN_SPRG_MC_SCRATCH     SPRN_SPRG8
 -#define SPRN_SPRG_CRIT_SCRATCH   SPRN_SPRG7
 +#define SPRN_SPRG_CRIT_SCRATCH   SPRN_SPRG3
  #define SPRN_SPRG_DBG_SCRATCH    SPRN_SPRG9
  #define SPRN_SPRG_TLB_EXFRAME    SPRN_SPRG2
  #define SPRN_SPRG_TLB_SCRATCH    SPRN_SPRG6
 diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
 index 88feaaa..4192ade 100644
 --- a/arch/powerpc/mm/tlb_low_64e.S
 +++ b/arch/powerpc/mm/tlb_low_64e.S
 @@ -40,36 +40,36 @@
   **/
  
  .macro tlb_prolog_bolted intnum addr
 - mtspr   SPRN_SPRG_TLB_SCRATCH,r13
 + mtspr   SPRN_SPRG_GEN_SCRATCH,r13
   mfspr   r13,SPRN_SPRG_PACA
 - std r10,PACA_EXTLB+EX_TLB_R10(r13)
 + std r10,PACA_EXGEN+EX_R10(r13)
   mfcr    r10
 - std r11,PACA_EXTLB+EX_TLB_R11(r13)
 + std r11,PACA_EXGEN+EX_R11(r13)
  #ifdef CONFIG_KVM_BOOKE_HV
  BEGIN_FTR_SECTION
   mfspr   r11, SPRN_SRR1
  END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
  #endif
   DO_KVM  \intnum, SPRN_SRR1
 - std r16,PACA_EXTLB+EX_TLB_R16(r13)
 + std r16,PACA_EXGEN+EX_R16(r13)
   mfspr   r16,\addr   /* get faulting address */
 - std r14,PACA_EXTLB+EX_TLB_R14(r13)
 + std r14,PACA_EXGEN+EX_R14(r13)
   ld  r14,PACAPGD(r13)
 - std r15,PACA_EXTLB+EX_TLB_R15(r13)
 - std r10,PACA_EXTLB+EX_TLB_CR(r13)
 + std r15,PACA_EXGEN+EX_R15(r13)
 + std r10,PACA_EXGEN+EX_CR(r13)
   TLB_MISS_PROLOG_STATS_BOLTED
  .endm
  
  .macro tlb_epilog_bolted
 - ld  r14,PACA_EXTLB+EX_TLB_CR(r13)
 - ld  r10,PACA_EXTLB+EX_TLB_R10(r13)
 - ld  

Re: [RFC PATCH 13/17] PowerPC: booke64: Use SPRG0/3 scratch for bolted TLB miss crit int

2012-06-26 Thread Scott Wood
On 06/25/2012 07:26 AM, Mihai Caraman wrote:
 Embedded.Hypervisor category defines GSPRG0..3 physical registers for guests.
 Avoid SPRG4-7 usage as scratch in host exception handlers, otherwise guest
 SPRG4-7 registers will be clobbered.
 For bolted TLB miss exception handlers, which is the version currently
 supported by KVM, use SPRN_SPRG_GEN_SCRATCH (aka SPRG0) instead of
 SPRN_SPRG_TLB_SCRATCH (aka SPRG6) and replace TLB with GEN PACA slots to
 keep consistency.
 For critical exception handler use SPRG3 instead of SPRG7.

extlb is in the same cache line as other TLB stuff we need, while exgen
isn't.  Let's stick with extlb.

-Scott
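
The cache-line argument above can be checked with a little arithmetic. The following is only a sketch under stated assumptions: `struct sketch_paca` is a hypothetical layout, not the real `paca_struct`, and the `sketch_` names are made up for illustration. It takes the figures from the patch comment (six 8-byte slots for the bolted variant, plus pgd and kernel_pgd, in a 64-byte line) and places exgen on its own line, as the review describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64	/* bytes, per the comment in the patch */
#define SKETCH_BOLTED_SLOTS 6	/* "only the first six fields" */

/* Hypothetical layout for illustration; this is not the real paca_struct. */
struct sketch_paca {
	_Alignas(SKETCH_CACHE_LINE) uint64_t pgd;	/* stands in for pgd_t * */
	uint64_t kernel_pgd;
	uint64_t extlb[SKETCH_BOLTED_SLOTS];
	_Alignas(SKETCH_CACHE_LINE) uint64_t exgen[8];
};

/* Two 8-byte pointers plus six 8-byte slots fill exactly one cache line. */
static_assert(offsetof(struct sketch_paca, extlb) +
	      SKETCH_BOLTED_SLOTS * sizeof(uint64_t) == SKETCH_CACHE_LINE,
	      "pgd, kernel_pgd and bolted extlb slots span one cache line");

/* Index of the cache line (relative to the struct start) holding an offset. */
static size_t sketch_cache_line_of(size_t offset)
{
	return offset / SKETCH_CACHE_LINE;
}
```

In this assumed layout the bolted handler touches a single line when it saves into extlb and then loads pgd, whereas saving into exgen would pull in a second line.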



Re: [RFC PATCH 03/17] KVM: PPC64: booke: Add EPCR support in sregs

2012-06-26 Thread Scott Wood
On 06/25/2012 07:26 AM, Mihai Caraman wrote:
 Add KVM_SREGS_E_64 feature and EPCR spr support in get/set sregs
 for 64-bit hosts.
 
 Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
 ---
  arch/powerpc/kvm/booke.c |   14 ++
  1 files changed, 14 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index f9fa260..d15c4b5 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -1052,6 +1052,9 @@ static void get_sregs_base(struct kvm_vcpu *vcpu,
   u64 tb = get_tb();
  
   sregs->u.e.features |= KVM_SREGS_E_BASE;
 +#ifdef CONFIG_64BIT
 + sregs->u.e.features |= KVM_SREGS_E_64;
 +#endif
  
   sregs->u.e.csrr0 = vcpu->arch.csrr0;
   sregs->u.e.csrr1 = vcpu->arch.csrr1;
 @@ -1063,6 +1066,9 @@ static void get_sregs_base(struct kvm_vcpu *vcpu,
   sregs->u.e.dec = kvmppc_get_dec(vcpu, tb);
   sregs->u.e.tb = tb;
   sregs->u.e.vrsave = vcpu->arch.vrsave;
 +#ifdef CONFIG_64BIT
 + sregs->u.e.epcr = vcpu->arch.epcr;
 +#endif
  }
  
  static int set_sregs_base(struct kvm_vcpu *vcpu,
 @@ -1071,6 +1077,11 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
   if (!(sregs->u.e.features & KVM_SREGS_E_BASE))
           return 0;
  
 +#ifdef CONFIG_64BIT
 + if (!(sregs->u.e.features & KVM_SREGS_E_64))
 +         return 0;
 +#endif

This means that a QEMU targeting a 32-bit guest won't be able to set any
special registers, if it sets feature bits manually rather than getting
them from GET_SREGS.

This check should only qualify whether we look at sregs.u.e.epcr, not
whether this function works at all.

BTW, shouldn't the BASE check return an error rather than silently no-op?

-Scott
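
One way to read the two review points above is sketched below as a standalone C mock. Everything prefixed `sketch_`/`SKETCH_` is hypothetical: the structs are reduced stand-ins rather than the real KVM types, the feature bit values are illustrative, and returning `-EINVAL` for a missing BASE bit is an assumption, since the review only asks whether an error would be more appropriate. In the actual patch the epcr handling would presumably stay under `CONFIG_64BIT`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical feature bits; the real values live in the KVM uapi headers. */
#define SKETCH_SREGS_E_BASE (1u << 0)
#define SKETCH_SREGS_E_64   (1u << 1)

/* Reduced stand-ins for the sregs payload and the vcpu state. */
struct sketch_sregs {
	uint32_t features;
	uint64_t csrr0;
	uint64_t epcr;
};

struct sketch_vcpu {
	uint64_t csrr0;
	uint64_t epcr;
};

static int sketch_set_sregs_base(struct sketch_vcpu *vcpu,
				 const struct sketch_sregs *sregs)
{
	/* Assumption: a missing BASE bit is reported, not silently ignored. */
	if (!(sregs->features & SKETCH_SREGS_E_BASE))
		return -EINVAL;

	/* Base registers are set regardless of the 64-bit feature bit. */
	vcpu->csrr0 = sregs->csrr0;

	/* E_64 only decides whether the epcr field is consumed. */
	if (sregs->features & SKETCH_SREGS_E_64)
		vcpu->epcr = sregs->epcr;

	return 0;
}

/* Usage: a caller that sets only BASE still gets its base registers applied. */
static void sketch_self_check(void)
{
	struct sketch_vcpu vcpu = { .csrr0 = 0, .epcr = 0x11 };
	struct sketch_sregs only_base = {
		.features = SKETCH_SREGS_E_BASE, .csrr0 = 0x1234, .epcr = 0x99,
	};

	assert(sketch_set_sregs_base(&vcpu, &only_base) == 0);
	assert(vcpu.csrr0 == 0x1234);	/* base register updated */
	assert(vcpu.epcr == 0x11);	/* epcr untouched without E_64 */
}
```

With this shape, a userspace that sets feature bits by hand for a 32-bit guest can still set the base registers on a 64-bit host, which is the case the review says the posted check would break.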
