date:20111014

[Qemu-devel] [PATCH 1/1 V4] qemu-kvm: fix improper nmi emulation

2011-10-14 Thread Lai Jiangshan

On 10/14/2011 01:53 PM, Jan Kiszka wrote:
 On 2011-10-14 02:53, Lai Jiangshan wrote:


 As explained in some other mail, we could then emulate the missing
 kernel feature by reading out the current in-kernel APIC state, testing
 if LINT1 is unmasked, and then delivering the NMI directly.


 Only the thread of the VCPU can safely get the in-kernel LAPIC states,
 so this approach will cause some troubles.
 
 run_on_cpu() can help.
 
 Jan
 

Ah, I forgot it, Thanks.

From: Lai Jiangshan la...@cn.fujitsu.com

Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
button event happens. This doesn't properly emulate real hardware on
which NMI button event triggers LINT1. Because of this, NMI is sent to
the processor even when LINT1 is masked in LVT. For example, this
causes the problem that kdump initiated by NMI sometimes doesn't work
on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

With this patch, inject-nmi request is handled as follows.

- When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
  interrupt.
- When in-kernel irqchip is enabled, get the in-kernel LAPIC state
  and test APIC_LVT_MASKED; if LINT1 is unmasked, deliver the NMI
  directly. (Suggested by Jan Kiszka)

Changes from the previous version:
  re-implemented following Jan's suggestion.

Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
---
 hw/apic.c |   48 
 hw/apic.h |1 +
 monitor.c |6 +-
 3 files changed, 54 insertions(+), 1 deletions(-)
diff --git a/hw/apic.c b/hw/apic.c
index 69d6ac5..9a40129 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -205,6 +205,54 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
     }
 }
 
+#ifdef KVM_CAP_IRQCHIP
+static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
+
+struct kvm_get_remote_lapic_params {
+    CPUState *env;
+    struct kvm_lapic_state klapic;
+};
+
+static void kvm_get_remote_lapic(void *p)
+{
+    struct kvm_get_remote_lapic_params *params = p;
+
+    kvm_get_lapic(params->env, &params->klapic);
+}
+
+void apic_deliver_nmi(DeviceState *d)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+
+    if (kvm_irqchip_in_kernel()) {
+        struct kvm_get_remote_lapic_params p = {.env = s->cpu_env,};
+        uint32_t lvt;
+
+        run_on_cpu(s->cpu_env, kvm_get_remote_lapic, &p);
+        lvt = kapic_reg(&p.klapic, 0x32 + APIC_LVT_LINT1);
+
+        if (lvt & APIC_LVT_MASKED) {
+            return;
+        }
+
+        if (((lvt >> 8) & 7) != APIC_DM_NMI) {
+            return;
+        }
+
+        cpu_interrupt(s->cpu_env, CPU_INTERRUPT_NMI);
+    } else {
+        apic_local_deliver(s, APIC_LVT_LINT1);
+    }
+}
+#else
+void apic_deliver_nmi(DeviceState *d)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+
+    apic_local_deliver(s, APIC_LVT_LINT1);
+}
+#endif
+
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
 int __i, __j, __mask;\
diff --git a/hw/apic.h b/hw/apic.h
index c857d52..3a4be0a 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -10,6 +10,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
  uint8_t trigger_mode);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
+void apic_deliver_nmi(DeviceState *d);
 int apic_get_interrupt(DeviceState *s);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
diff --git a/monitor.c b/monitor.c
index cb485bf..0b81f17 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2616,7 +2616,11 @@ static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data)
     CPUState *env;
 
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
-        cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        if (!env->apic_state) {
+            cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        } else {
+            apic_deliver_nmi(env->apic_state);
+        }
     }
 
     return 0;
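
As a rough illustration of the LVT test that this patch performs on the in-kernel APIC state, the check can be written as a standalone predicate. This is only a sketch: the helper name `lint1_delivers_nmi` and the `LVT_*`/`DELIVERY_MODE_NMI` constants are made up for this example (they mirror the architectural mask bit 16 and the delivery-mode field in bits 8..10, where 100b means NMI), and it is not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of a local-APIC LVT entry (hypothetical names for this sketch). */
#define LVT_MASKED         (1u << 16)  /* mask bit: entry delivers nothing */
#define LVT_DELIVERY_SHIFT 8           /* delivery mode lives in bits 8..10 */
#define LVT_DELIVERY_MASK  0x7u
#define DELIVERY_MODE_NMI  4u          /* 100b = NMI */

/* Hypothetical helper: true when an NMI-button event should reach this CPU. */
static bool lint1_delivers_nmi(uint32_t lvt)
{
    if (lvt & LVT_MASKED) {
        return false;                  /* LINT1 is masked: drop the event */
    }
    /* Unmasked: only forward when LINT1 is programmed for NMI delivery. */
    return ((lvt >> LVT_DELIVERY_SHIFT) & LVT_DELIVERY_MASK) == DELIVERY_MODE_NMI;
}
```

A guest that masks LINT1 on its secondary CPUs would make this predicate false for them, so the event is dropped there and only CPUs with LINT1 unmasked and programmed for NMI delivery receive it.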

Re: [Qemu-devel] [Qemu-ppc] [PATCH] ppcr: Avoid decrementer related kvm exits

2011-10-14 Thread David Gibson

On Fri, Oct 14, 2011 at 07:30:09AM +0200, Alexander Graf wrote:
 
 On 14.10.2011, at 07:19, David Gibson wrote:
 
  In __cpu_ppc_store_decr(), we set up a regular timer used to trigger
  decrementer interrupts.  This is necessary to implement the decrementer
  properly under TCG, but is unnecessary under KVM (true for both Book3S-PR
  and Book3S-HV KVM variants), because the kernel handles generating and
  delivering decrementer exceptions.
  
  Under kvm, in fact, the timer causes expensive and unnecessary exits from
  kvm to qemu.  This patch, therefore, disables setting the timer when kvm
  is in use.
  
  Signed-off-by: Anton Blanchard an...@au1.ibm.com
  Signed-off-by: David Gibson da...@gibson.dropbear.id.au
  ---
  hw/ppc.c |   25 ++---
  1 files changed, 14 insertions(+), 11 deletions(-)
  
  diff --git a/hw/ppc.c b/hw/ppc.c
  index 25b59dd..87aa4e5 100644
  --- a/hw/ppc.c
  +++ b/hw/ppc.c
  @@ -658,21 +658,24 @@ static void __cpu_ppc_store_decr (CPUState *env, 
  uint64_t *nextp,
 
 Do we ever call store_decr in the kvm case? Isn't that only called
 from emulated mtdec?

Yes, from cpu_ppc_set_tb_clk().  Anton observed the kvm exits in the
wild, they're not theoretical.

Agh, which reminds me, I forgot to fixup the git author again.  The
patch should show authorship by Anton Blanchard an...@au1.ibm.com,
as in the s-o-b.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

Re: [Qemu-devel] [Qemu-ppc] [PATCH] ppcr: Avoid decrementer related kvm exits

2011-10-14 Thread Alexander Graf


On 14.10.2011, at 08:36, David Gibson wrote:

 On Fri, Oct 14, 2011 at 07:30:09AM +0200, Alexander Graf wrote:
 
 On 14.10.2011, at 07:19, David Gibson wrote:
 
 In __cpu_ppc_store_decr(), we set up a regular timer used to trigger
 decrementer interrupts.  This is necessary to implement the decrementer
 properly under TCG, but is unnecessary under KVM (true for both Book3S-PR
 and Book3S-HV KVM variants), because the kernel handles generating and
 delivering decrementer exceptions.
 
 Under kvm, in fact, the timer causes expensive and unnecessary exits from
 kvm to qemu.  This patch, therefore, disables setting the timer when kvm
 is in use.
 
 Signed-off-by: Anton Blanchard an...@au1.ibm.com
 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 ---
 hw/ppc.c |   25 ++---
 1 files changed, 14 insertions(+), 11 deletions(-)
 
 diff --git a/hw/ppc.c b/hw/ppc.c
 index 25b59dd..87aa4e5 100644
 --- a/hw/ppc.c
 +++ b/hw/ppc.c
 @@ -658,21 +658,24 @@ static void __cpu_ppc_store_decr (CPUState *env, 
 uint64_t *nextp,
 
 Do we ever call store_decr in the kvm case? Isn't that only called
 from emulated mtdec?
 
 Yes, from cpu_ppc_set_tb_clk().  Anton observed the kvm exits in the
 wild, they're not theoretical.
 
 Agh, which reminds me, I forgot to fixup the git author again.  The
 patch should show authorship by Anton Blanchard an...@au1.ibm.com,
 as in the s-o-b.

Wouldn't a simple

if (kvm_enabled()) {
    return;
}

at the beginning of the function make more sense? There's no code connecting
the in-qemu and the in-kvm decrementers atm, so any logic applying to the
in-qemu one is moot for kvm.


Alex
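
To show the shape of that early return in context, here is a minimal sketch with stand-in state. Everything in it is hypothetical (`kvm_in_use`, `kvm_enabled_stub`, `timers_armed`, `store_decr_sketch`); it only models "skip all timer setup when KVM is in use" and does not reproduce the real `__cpu_ppc_store_decr()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU state; not the real QEMU definitions. */
static bool kvm_in_use;      /* what kvm_enabled() would report */
static int timers_armed;     /* counts how often the emulated timer was armed */

static bool kvm_enabled_stub(void)
{
    return kvm_in_use;
}

/* Sketch of a decrementer store that bails out early under KVM. */
static void store_decr_sketch(uint32_t value)
{
    if (kvm_enabled_stub()) {
        /* The kernel generates and delivers decrementer exceptions itself. */
        return;
    }

    (void)value;             /* a real implementation would compute a deadline */
    timers_armed++;          /* TCG path: arm the emulated timer */
}
```

With the guard first, none of the TCG-only bookkeeping runs under KVM, which is the point of the suggestion.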

Re: [Qemu-devel] [PATCH v2 1/2] Introduce QLIST_INSERT_HEAD_RCU and dummy RCU wrappers.

2011-10-14 Thread Paolo Bonzini


On 10/13/2011 10:35 PM, Harsh Prateek Bora wrote:


+#define QLIST_INSERT_HEAD_RCU(head, elm, field) do {                    \
+    (elm)->field.le_prev = &(head)->lh_first;                           \
+    smp_wmb();                                                          \
+    if (((elm)->field.le_next = (head)->lh_first) != NULL)              \
+        (head)->lh_first->field.le_prev = &(elm)->field.le_next;        \
+    smp_wmb();                                                          \
+    (head)->lh_first = (elm);                                           \
+    smp_wmb();                                                          \
+} while (/* CONSTCOND*/0)


Actually, looking more at it, it should be more like

   (elm)->field.le_prev = &(head)->lh_first;
   (elm)->field.le_next = (head)->lh_first;
   smp_wmb(); /* fill elm before linking it */
   if ((head)->lh_first != NULL)
      (head)->lh_first->field.le_prev = &(elm)->field.le_next;
   (head)->lh_first = (elm);
   smp_wmb();

... which even saves a memory barrier.

Paolo
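
The ordering idea (fill in the new element completely, then publish it) can be sketched with C11 atomics instead of `smp_wmb()`. This is an illustration under assumptions, not the QEMU macro: `struct node`, `struct list_head` and `list_insert_head_rcu` are hypothetical names, writers are assumed to be serialized externally (for example by a lock), and a release store stands in for the write barrier before linking.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list types for this sketch. */
struct node {
    int value;
    struct node *_Atomic next;    /* readers follow this pointer */
    struct node *_Atomic *prev;   /* address of the pointer that points at us */
};

struct list_head {
    struct node *_Atomic first;
};

/* Insert elm at the head; assumes only one writer runs at a time. */
static void list_insert_head_rcu(struct list_head *head, struct node *elm)
{
    struct node *old = atomic_load_explicit(&head->first, memory_order_relaxed);

    /* Fill in the new element while no reader can reach it yet. */
    elm->prev = &head->first;
    atomic_store_explicit(&elm->next, old, memory_order_relaxed);

    /* Back-pointers are only consulted by writers. */
    if (old != NULL) {
        old->prev = &elm->next;
    }

    /* Publish: the release store orders all writes above before the link. */
    atomic_store_explicit(&head->first, elm, memory_order_release);
}
```

A reader that loads `first` with acquire (or consume) ordering then sees either the old head or a fully initialized new element, never a half-filled one.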

Re: [Qemu-devel] [Qemu-ppc] [PATCH] ppcr: Avoid decrementer related kvm exits

2011-10-14 Thread David Gibson

On Fri, Oct 14, 2011 at 08:44:06AM +0200, Alexander Graf wrote:
 
 On 14.10.2011, at 08:36, David Gibson wrote:
 
  On Fri, Oct 14, 2011 at 07:30:09AM +0200, Alexander Graf wrote:
  
  On 14.10.2011, at 07:19, David Gibson wrote:
  
  In __cpu_ppc_store_decr(), we set up a regular timer used to trigger
  decrementer interrupts.  This is necessary to implement the decrementer
  properly under TCG, but is unnecessary under KVM (true for both Book3S-PR
  and Book3S-HV KVM variants), because the kernel handles generating and
  delivering decrementer exceptions.
  
  Under kvm, in fact, the timer causes expensive and unnecessary exits from
  kvm to qemu.  This patch, therefore, disables setting the timer when kvm
  is in use.
  
  Signed-off-by: Anton Blanchard an...@au1.ibm.com
  Signed-off-by: David Gibson da...@gibson.dropbear.id.au
  ---
  hw/ppc.c |   25 ++---
  1 files changed, 14 insertions(+), 11 deletions(-)
  
  diff --git a/hw/ppc.c b/hw/ppc.c
  index 25b59dd..87aa4e5 100644
  --- a/hw/ppc.c
  +++ b/hw/ppc.c
  @@ -658,21 +658,24 @@ static void __cpu_ppc_store_decr (CPUState *env, 
  uint64_t *nextp,
  
  Do we ever call store_decr in the kvm case? Isn't that only called
  from emulated mtdec?
  
  Yes, from cpu_ppc_set_tb_clk().  Anton observed the kvm exits in the
  wild, they're not theoretical.
  
  Agh, which reminds me, I forgot to fixup the git author again.  The
  patch should show authorship by Anton Blanchard an...@au1.ibm.com,
  as in the s-o-b.
 
 Wouldn't a simple
 
 if (kvm_enabled()) {
     return;
 }
 
 at the beginning of the function make more sense? There's no code
 connecting the in-qemu and the in-kvm decrementers atm, so any logic
 applying to the in-qemu one is moot for kvm.

Uh.. I guess so.  I wasn't 100% sure the last bit of code in the
function wouldn't have some effect on kvm.  But I guess it doesn't;
I'll revise.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

Re: [Qemu-devel] [PATCH 1/1 V4] qemu-kvm: fix improper nmi emulation

2011-10-14 Thread Jan Kiszka

On 2011-10-14 08:36, Lai Jiangshan wrote:
 On 10/14/2011 01:53 PM, Jan Kiszka wrote:
 On 2011-10-14 02:53, Lai Jiangshan wrote:


 As explained in some other mail, we could then emulate the missing
 kernel feature by reading out the current in-kernel APIC state, testing
 if LINT1 is unmasked, and then delivering the NMI directly.


 Only the thread of the VCPU can safely get the in-kernel LAPIC states,
 so this approach will cause some troubles.

 run_on_cpu() can help.

 Jan

 
 Ah, I forgot it, Thanks.
 
 From: Lai Jiangshan la...@cn.fujitsu.com
 
 Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
 button event happens. This doesn't properly emulate real hardware on
 which NMI button event triggers LINT1. Because of this, NMI is sent to
 the processor even when LINT1 is masked in LVT. For example, this
 causes the problem that kdump initiated by NMI sometimes doesn't work
 on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
 
 With this patch, inject-nmi request is handled as follows.
 
 - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
   interrupt.
 - When in-kernel irqchip is enabled, get the in-kernel LAPIC states
   and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then
   delivering the NMI directly. (Suggested by Jan Kiszka)
 
 Changed from old version:
   re-implement it by the Jan's suggestion.
 
 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
 ---
  hw/apic.c |   48 
  hw/apic.h |1 +
  monitor.c |6 +-
  3 files changed, 54 insertions(+), 1 deletions(-)
 diff --git a/hw/apic.c b/hw/apic.c
 index 69d6ac5..9a40129 100644
 --- a/hw/apic.c
 +++ b/hw/apic.c
 @@ -205,6 +205,54 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
  }
  }
  
 +#ifdef KVM_CAP_IRQCHIP

Again, this is always defined on x86 thus pointless to test.

 +static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
 +
 +struct kvm_get_remote_lapic_params {
 +    CPUState *env;
 +    struct kvm_lapic_state klapic;
 +};
 +
 +static void kvm_get_remote_lapic(void *p)
 +{
 +    struct kvm_get_remote_lapic_params *params = p;
 +
 +    kvm_get_lapic(params->env, &params->klapic);

When you already interrupted that vcpu, why not inject from here? Avoids
one further ping-pong round.

 +}
 +
 +void apic_deliver_nmi(DeviceState *d)
 +{
 +    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
 +
 +    if (kvm_irqchip_in_kernel()) {
 +        struct kvm_get_remote_lapic_params p = {.env = s->cpu_env,};
 +        uint32_t lvt;
 +
 +        run_on_cpu(s->cpu_env, kvm_get_remote_lapic, &p);
 +        lvt = kapic_reg(&p.klapic, 0x32 + APIC_LVT_LINT1);
 +
 +        if (lvt & APIC_LVT_MASKED) {
 +            return;
 +        }
 +
 +        if (((lvt >> 8) & 7) != APIC_DM_NMI) {
 +            return;
 +        }
 +
 +        cpu_interrupt(s->cpu_env, CPU_INTERRUPT_NMI);

Err, aren't you introducing KVM_CAP_LAPIC_NMI that allows to test if
this workaround is needed? Oh, your latest kernel patch is missing this
again - requires fixing as well.

Jan




Re: [Qemu-devel] [PATCH v2] runstate: add more valid transitions

2011-10-14 Thread Paolo Bonzini


On 10/13/2011 10:26 PM, Luiz Capitulino wrote:

I'm going to take my word back on this one, I've found the real cause of the
problem. Will post the patch right now.



Are you keeping the vl.c hunks though?

Paolo

[Qemu-devel] [PATCH] correct spelling

2011-10-14 Thread Dong Xu Wang


Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
---
 block/sheepdog.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index c1f6e07..ae857e2 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -66,7 +66,7 @@
  * 20 - 31 (12 bits): reserved data object space
  * 32 - 55 (24 bits): vdi object space
  * 56 - 59 ( 4 bits): reserved vdi object space
- * 60 - 63 ( 4 bits): object type indentifier space
+ * 60 - 63 ( 4 bits): object type identifier space
  */
 
 #define VDI_SPACE_SHIFT   32
-- 
1.7.5.4

Re: [Qemu-devel] [PATCH 1/1 V4] qemu-kvm: fix improper nmi emulation

2011-10-14 Thread Lai Jiangshan

On 10/14/2011 02:49 PM, Jan Kiszka wrote:
 On 2011-10-14 08:36, Lai Jiangshan wrote:
 On 10/14/2011 01:53 PM, Jan Kiszka wrote:
 On 2011-10-14 02:53, Lai Jiangshan wrote:


 As explained in some other mail, we could then emulate the missing
 kernel feature by reading out the current in-kernel APIC state, testing
 if LINT1 is unmasked, and then delivering the NMI directly.


 Only the thread of the VCPU can safely get the in-kernel LAPIC states,
 so this approach will cause some troubles.

 run_on_cpu() can help.

 Jan


 Ah, I forgot it, Thanks.

 From: Lai Jiangshan la...@cn.fujitsu.com

 Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
 button event happens. This doesn't properly emulate real hardware on
 which NMI button event triggers LINT1. Because of this, NMI is sent to
 the processor even when LINT1 is masked in LVT. For example, this
 causes the problem that kdump initiated by NMI sometimes doesn't work
 on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

 With this patch, inject-nmi request is handled as follows.

 - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
   interrupt.
 - When in-kernel irqchip is enabled, get the in-kernel LAPIC states
   and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then
   delivering the NMI directly. (Suggested by Jan Kiszka)

 Changed from old version:
   re-implement it by the Jan's suggestion.

 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
 ---
  hw/apic.c |   48 
  hw/apic.h |1 +
  monitor.c |6 +-
  3 files changed, 54 insertions(+), 1 deletions(-)
 diff --git a/hw/apic.c b/hw/apic.c
 index 69d6ac5..9a40129 100644
 --- a/hw/apic.c
 +++ b/hw/apic.c
 @@ -205,6 +205,54 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
  }
  }
  
 +#ifdef KVM_CAP_IRQCHIP
 
 Again, this is always defined on x86 thus pointless to test.
 
 +static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
 +
 +struct kvm_get_remote_lapic_params {
 +    CPUState *env;
 +    struct kvm_lapic_state klapic;
 +};
 +
 +static void kvm_get_remote_lapic(void *p)
 +{
 +    struct kvm_get_remote_lapic_params *params = p;
 +
 +    kvm_get_lapic(params->env, &params->klapic);
 
 When you already interrupted that vcpu, why not inject from here? Avoids
 one further ping-pong round.

get_remote_lapic and inject nmi are two different things,
so I don't inject nmi from here. I didn't notice this ping-pong overhead.
Thank you.

 
 +}
 +
 +void apic_deliver_nmi(DeviceState *d)
 +{
 +    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
 +
 +    if (kvm_irqchip_in_kernel()) {
 +        struct kvm_get_remote_lapic_params p = {.env = s->cpu_env,};
 +        uint32_t lvt;
 +
 +        run_on_cpu(s->cpu_env, kvm_get_remote_lapic, &p);
 +        lvt = kapic_reg(&p.klapic, 0x32 + APIC_LVT_LINT1);
 +
 +        if (lvt & APIC_LVT_MASKED) {
 +            return;
 +        }
 +
 +        if (((lvt >> 8) & 7) != APIC_DM_NMI) {
 +            return;
 +        }
 +
 +        cpu_interrupt(s->cpu_env, CPU_INTERRUPT_NMI);
 
 Err, aren't you introducing KVM_CAP_LAPIC_NMI that allows to test if
 this workaround is needed? Oh, your latest kernel patch is missing this
 again - requires fixing as well.
 


The kernel-side patch is dropped with this v4 patch.

Did you mean you want KVM_CAP_SET_LINT1 + KVM_SET_LINT1 patches?
I have made them.

Sent soon.

Thanks,
Lai

Re: [Qemu-devel] [PATCH] linux-aio: Allow reads beyond the end of growable images

2011-10-14 Thread Christoph Hellwig

On Thu, Oct 13, 2011 at 03:49:39PM +0200, Kevin Wolf wrote:
 This is the linux-aio version of commits 22afa7b5 (raw-posix, synchronous) and
 ba1d1afd (posix-aio-compat). Reads now produce zeros after the end of file
 instead of failing or resulting in short reads, making linux-aio compatible
 with the behaviour of synchronous raw-posix requests and posix-aio-compat.
 
 The problem can be reproduced like this:
 
 dd if=/dev/zero of=/tmp/test.raw bs=1 count=1234
 ./qemu-io -k -n -g -c 'read -p 1024 512' /tmp/test.raw

Can you send a patch doing this for qemu-iotests?
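
For reference, the behaviour described in the quoted commit message (reads past end of file return zeros rather than failing or coming back short) can be sketched with plain synchronous `pread()`. This is an analogue for illustration only, not the linux-aio code from the patch; the helper name `pread_zero_fill` is hypothetical.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill buf with len bytes starting at offset, padding
 * with zeros whatever lies beyond end of file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int pread_zero_fill(int fd, void *buf, size_t len, off_t offset)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = pread(fd, p + done, len - done, offset + (off_t)done);

        if (n < 0) {
            if (errno == EINTR) {
                continue;            /* interrupted: retry the same range */
            }
            return -errno;
        }
        if (n == 0) {
            /* End of file: the rest of the request reads as zeros. */
            memset(p + done, 0, len - done);
            break;
        }
        done += (size_t)n;
    }
    return 0;
}
```

With the 1234-byte file from the reproducer, a 512-byte read at offset 1024 would return 210 bytes of file data followed by 302 zero bytes.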

Re: [Qemu-devel] [PATCH 1/1 V4] qemu-kvm: fix improper nmi emulation

2011-10-14 Thread Jan Kiszka

On 2011-10-14 09:43, Lai Jiangshan wrote:
 On 10/14/2011 02:49 PM, Jan Kiszka wrote:
 On 2011-10-14 08:36, Lai Jiangshan wrote:
 On 10/14/2011 01:53 PM, Jan Kiszka wrote:
 On 2011-10-14 02:53, Lai Jiangshan wrote:


 As explained in some other mail, we could then emulate the missing
 kernel feature by reading out the current in-kernel APIC state, testing
 if LINT1 is unmasked, and then delivering the NMI directly.


 Only the thread of the VCPU can safely get the in-kernel LAPIC states,
 so this approach will cause some troubles.

 run_on_cpu() can help.

 Jan


 Ah, I forgot it, Thanks.

 From: Lai Jiangshan la...@cn.fujitsu.com

 Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
 button event happens. This doesn't properly emulate real hardware on
 which NMI button event triggers LINT1. Because of this, NMI is sent to
 the processor even when LINT1 is masked in LVT. For example, this
 causes the problem that kdump initiated by NMI sometimes doesn't work
 on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

 With this patch, inject-nmi request is handled as follows.

 - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
   interrupt.
 - When in-kernel irqchip is enabled, get the in-kernel LAPIC states
   and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then
   delivering the NMI directly. (Suggested by Jan Kiszka)

 Changed from old version:
   re-implement it by the Jan's suggestion.

 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
 ---
  hw/apic.c |   48 
  hw/apic.h |1 +
  monitor.c |6 +-
  3 files changed, 54 insertions(+), 1 deletions(-)
 diff --git a/hw/apic.c b/hw/apic.c
 index 69d6ac5..9a40129 100644
 --- a/hw/apic.c
 +++ b/hw/apic.c
 @@ -205,6 +205,54 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
  }
  }
  
 +#ifdef KVM_CAP_IRQCHIP

 Again, this is always defined on x86 thus pointless to test.

 +static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
 +
 +struct kvm_get_remote_lapic_params {
 +    CPUState *env;
 +    struct kvm_lapic_state klapic;
 +};
 +
 +static void kvm_get_remote_lapic(void *p)
 +{
 +    struct kvm_get_remote_lapic_params *params = p;
 +
 +    kvm_get_lapic(params->env, &params->klapic);

 When you already interrupted that vcpu, why not inject from here? Avoids
 one further ping-pong round.
 
 get_remote_lapic and inject nmi are two different things,
 so I don't inject nmi from here. I didn't notice this ping-pong overhead.
 Thank you.

Actually, it is not performance-critical. But there is a race between
obtaining the APIC state and testing for the NMI injection path. So it's
better to define an on-vcpu LINT1 NMI injection service.

 

 +}
 +
 +void apic_deliver_nmi(DeviceState *d)
 +{
 +    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
 +
 +    if (kvm_irqchip_in_kernel()) {
 +        struct kvm_get_remote_lapic_params p = {.env = s->cpu_env,};
 +        uint32_t lvt;
 +
 +        run_on_cpu(s->cpu_env, kvm_get_remote_lapic, &p);
 +        lvt = kapic_reg(&p.klapic, 0x32 + APIC_LVT_LINT1);
 +
 +        if (lvt & APIC_LVT_MASKED) {
 +            return;
 +        }
 +
 +        if (((lvt >> 8) & 7) != APIC_DM_NMI) {
 +            return;
 +        }
 +
 +        cpu_interrupt(s->cpu_env, CPU_INTERRUPT_NMI);

 Err, aren't you introducing KVM_CAP_LAPIC_NMI that allows to test if
 this workaround is needed? Oh, your latest kernel patch is missing this
 again - requires fixing as well.

 
 
 The kernel-side patch is dropped with this v4 patch.
 
 Did you mean you want KVM_CAP_SET_LINT1 + KVM_SET_LINT1 patches?
 I have made them.

OK, so this is going to be applied on top? Then I take this remark back.

Jan




[Qemu-devel] [PATCH 0/4] coroutinization of flush and discard (split out of NBD series)

2011-10-14 Thread Paolo Bonzini

Thanks to Stefan's series from today, I managed to understand what
the coroutinization was all about.  So I split this part out of the
NBD series.

Applies on top of block branch + the other five patches.

Paolo Bonzini (3):
  block: rename bdrv_co_rw_bh
  block: unify flush implementations
  block: add bdrv_co_discard and bdrv_aio_discard support

Stefan Hajnoczi (1):
  block: drop redundant bdrv_flush implementation

 block.c   |  236 ++---
 block.h   |3 +
 block/blkdebug.c  |6 --
 block/blkverify.c |9 --
 block/qcow.c  |6 --
 block/qcow2.c |   19 
 block/qed.c   |6 --
 block/raw-posix.c |   18 
 block/raw.c   |   19 ++---
 block_int.h   |   10 ++-
 trace-events  |1 +
 11 files changed, 152 insertions(+), 181 deletions(-)

-- 
1.7.6

[Qemu-devel] [PATCH 1/4] block: rename bdrv_co_rw_bh

2011-10-14 Thread Paolo Bonzini

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 9873b57..7184a0f 100644
--- a/block.c
+++ b/block.c
@@ -2735,7 +2735,7 @@ static AIOPool bdrv_em_co_aio_pool = {
     .cancel = bdrv_aio_co_cancel_em,
 };
 
-static void bdrv_co_rw_bh(void *opaque)
+static void bdrv_co_em_bh(void *opaque)
 {
     BlockDriverAIOCBCoroutine *acb = opaque;
 
@@ -2758,7 +2758,7 @@ static void coroutine_fn bdrv_co_do_rw(void *opaque)
             acb->req.nb_sectors, acb->req.qiov);
     }
 
-    acb->bh = qemu_bh_new(bdrv_co_rw_bh, acb);
+    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
     qemu_bh_schedule(acb->bh);
 }
 
 
-- 
1.7.6

Re: [Qemu-devel] [PATCH] savevm: qemu_savevm_state(): Drop stop VM logic

2011-10-14 Thread Kevin Wolf

Am 13.10.2011 22:27, schrieb Luiz Capitulino:
 qemu_savevm_state() has some logic to stop the VM and to (or not to)
 resume it. But this seems to be a big noop, as qemu_savevm_state()
 is only called by do_savevm() when the VM is already stopped.
 
 So, let's drop qemu_savevm_state()'s stop VM logic.
 
 Signed-off-by: Luiz Capitulino lcapitul...@redhat.com

Reviewed-by: Kevin Wolf kw...@redhat.com

[Qemu-devel] [PATCH 2/4] block: unify flush implementations

2011-10-14 Thread Paolo Bonzini

Add coroutine support for flush and apply the same emulation that
we already do for read/write.  bdrv_aio_flush is simplified to always
go through a coroutine.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block.c |  160 ++-
 block_int.h |1 +
 2 files changed, 71 insertions(+), 90 deletions(-)

diff --git a/block.c b/block.c
index 7184a0f..0af9a89 100644
--- a/block.c
+++ b/block.c
@@ -53,17 +53,12 @@ static BlockDriverAIOCB *bdrv_aio_readv_em(BlockDriverState 
*bs,
 static BlockDriverAIOCB *bdrv_aio_writev_em(BlockDriverState *bs,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockDriverCompletionFunc *cb, void *opaque);
-static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
-BlockDriverCompletionFunc *cb, void *opaque);
-static BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
-BlockDriverCompletionFunc *cb, void *opaque);
 static int coroutine_fn bdrv_co_readv_em(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors,
  QEMUIOVector *iov);
 static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
  int64_t sector_num, int nb_sectors,
  QEMUIOVector *iov);
-static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
 static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
@@ -203,9 +198,6 @@ void bdrv_register(BlockDriver *bdrv)
         }
     }
 
-    if (!bdrv->bdrv_aio_flush)
-        bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
-
     QLIST_INSERT_HEAD(&bdrv_drivers, bdrv, list);
 }
 
@@ -1027,11 +1019,6 @@ static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num,
                                    nb_sectors * BDRV_SECTOR_SIZE);
 }
 
-static inline bool bdrv_has_async_flush(BlockDriver *drv)
-{
-    return drv->bdrv_aio_flush != bdrv_aio_flush_em;
-}
-
 typedef struct RwCo {
     BlockDriverState *bs;
     int64_t sector_num;
@@ -1759,33 +1746,6 @@ const char *bdrv_get_device_name(BlockDriverState *bs)
     return bs->device_name;
 }
 
-int bdrv_flush(BlockDriverState *bs)
-{
-    if (bs->open_flags & BDRV_O_NO_FLUSH) {
-        return 0;
-    }
-
-    if (bs->drv && bdrv_has_async_flush(bs->drv) && qemu_in_coroutine()) {
-        return bdrv_co_flush_em(bs);
-    }
-
-    if (bs->drv && bs->drv->bdrv_flush) {
-        return bs->drv->bdrv_flush(bs);
-    }
-
-    /*
-     * Some block drivers always operate in either writethrough or unsafe mode
-     * and don't support bdrv_flush therefore. Usually qemu doesn't know how
-     * the server works (because the behaviour is hardcoded or depends on
-     * server-side configuration), so we can't ensure that everything is safe
-     * on disk. Returning an error doesn't work because that would break guests
-     * even if the server operates in writethrough mode.
-     *
-     * Let's hope the user knows what he's doing.
-     */
-    return 0;
-}
-
 void bdrv_flush_all(void)
 {
     BlockDriverState *bs;
@@ -2610,22 +2570,6 @@ fail:
     return -1;
 }
 
-BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
-        BlockDriverCompletionFunc *cb, void *opaque)
-{
-    BlockDriver *drv = bs->drv;
-
-    trace_bdrv_aio_flush(bs, opaque);
-
-    if (bs->open_flags & BDRV_O_NO_FLUSH) {
-        return bdrv_aio_noop_em(bs, cb, opaque);
-    }
-
-    if (!drv)
-        return NULL;
-    return drv->bdrv_aio_flush(bs, cb, opaque);
-}
-
 void bdrv_aio_cancel(BlockDriverAIOCB *acb)
 {
     acb->pool->cancel(acb);
@@ -2785,41 +2729,28 @@ static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
     return &acb->common;
 }
 
-static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
-        BlockDriverCompletionFunc *cb, void *opaque)
+static void coroutine_fn bdrv_co_flush(void *opaque)
 {
-    BlockDriverAIOCBSync *acb;
-
-    acb = qemu_aio_get(&bdrv_em_aio_pool, bs, cb, opaque);
-    acb->is_write = 1; /* don't bounce in the completion hadler */
-    acb->qiov = NULL;
-    acb->bounce = NULL;
-    acb->ret = 0;
-
-    if (!acb->bh)
-        acb->bh = qemu_bh_new(bdrv_aio_bh_cb, acb);
+    BlockDriverAIOCBCoroutine *acb = opaque;
+    BlockDriverState *bs = acb->common.bs;
 
-    bdrv_flush(bs);
+    acb->req.error = bdrv_flush(bs);
+    acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
     qemu_bh_schedule(acb->bh);
-    return &acb->common;
 }
 
-static BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
+BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
         BlockDriverCompletionFunc *cb, void *opaque)
 {
-    BlockDriverAIOCBSync *acb;
+    trace_bdrv_aio_flush(bs, opaque);
 
-    acb = qemu_aio_get(&bdrv_em_aio_pool, bs, cb, opaque);
-    acb->is_write = 1; /* don't bounce in the completion handler */
-    acb->qiov =

[Qemu-devel] [PATCH 1/2 V5] qemu-kvm: Synchronize kernel headers

2011-10-14 Thread Lai Jiangshan



Synchronize with the newest kernel headers, which define
KVM_CAP_SET_LINT1 and KVM_SET_LINT1, using
./scripts/update-linux-headers.sh

Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
---
 linux-headers/asm-powerpc/kvm.h  |   19 +--
 linux-headers/asm-x86/kvm.h  |1 +
 linux-headers/asm-x86/kvm_para.h |   14 ++
 linux-headers/linux/kvm.h|   26 +++---
 linux-headers/linux/kvm_para.h   |1 +
 5 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 777d307..a4f6c85 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -22,6 +22,10 @@
 
 #include <linux/types.h>
 
+/* Select powerpc specific features in linux/kvm.h */
+#define __KVM_HAVE_SPAPR_TCE
+#define __KVM_HAVE_PPC_SMT
+
 struct kvm_regs {
__u64 pc;
__u64 cr;
@@ -166,8 +170,8 @@ struct kvm_sregs {
} ppc64;
struct {
__u32 sr[16];
-   __u64 ibat[8];
-   __u64 dbat[8];
+   __u64 ibat[8]; 
+   __u64 dbat[8]; 
} ppc32;
} s;
struct {
@@ -272,4 +276,15 @@ struct kvm_guest_debug_arch {
 #define KVM_INTERRUPT_UNSET       -2U
 #define KVM_INTERRUPT_SET_LEVEL   -3U
 
+/* for KVM_CAP_SPAPR_TCE */
+struct kvm_create_spapr_tce {
+   __u64 liobn;
+   __u32 window_size;
+};
+
+/* for KVM_ALLOCATE_RMA */
+struct kvm_allocate_rma {
+   __u64 rma_size;
+};
+
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index 4d8dcbd..88d0ac3 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -24,6 +24,7 @@
 #define __KVM_HAVE_DEBUGREGS
 #define __KVM_HAVE_XSAVE
 #define __KVM_HAVE_XCRS
+#define __KVM_HAVE_SET_LINT1
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index 834d71e..f2ac46a 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -21,6 +21,7 @@
  */
 #define KVM_FEATURE_CLOCKSOURCE2  3
 #define KVM_FEATURE_ASYNC_PF      4
+#define KVM_FEATURE_STEAL_TIME    5
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -30,10 +31,23 @@
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
 
+#define KVM_MSR_ENABLED 1
 /* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
 #define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
+#define MSR_KVM_STEAL_TIME  0x4b564d03
+
+struct kvm_steal_time {
+   __u64 steal;
+   __u32 version;
+   __u32 flags;
+   __u32 pad[12];
+};
+
+#define KVM_STEAL_ALIGNMENT_BITS 5
+#define KVM_STEAL_VALID_BITS ((-1ULL << (KVM_STEAL_ALIGNMENT_BITS + 1)))
+#define KVM_STEAL_RESERVED_MASK (((1 << KVM_STEAL_ALIGNMENT_BITS) - 1 ) << 1)
 
 #define KVM_MAX_MMU_OP_BATCH   32
 
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index fc63b73..86808b4 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
+#define KVM_EXIT_PAPR_HCALL  19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -264,6 +265,11 @@ struct kvm_run {
struct {
__u64 gprs[32];
} osi;
+   struct {
+   __u64 nr;
+   __u64 ret;
+   __u64 args[9];
+   } papr_hcall;
/* Fix the size of the union. */
char padding[256];
};
@@ -544,6 +550,13 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_TSC_CONTROL 60
 #define KVM_CAP_GET_TSC_KHZ 61
 #define KVM_CAP_PPC_BOOKE_SREGS 62
+#define KVM_CAP_SPAPR_TCE 63
+#define KVM_CAP_PPC_SMT 64
+#define KVM_CAP_PPC_RMA 65
+#define KVM_CAP_S390_GMAP 71
+#ifdef __KVM_HAVE_SET_LINT1
+#define KVM_CAP_SET_LINT1 72
+#endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -746,6 +759,11 @@ struct kvm_clock_data {
 /* Available with KVM_CAP_XCRS */
 #define KVM_GET_XCRS _IOR(KVMIO,  0xa6, struct kvm_xcrs)
 #define KVM_SET_XCRS _IOW(KVMIO,  0xa7, struct kvm_xcrs)
+#define KVM_CREATE_SPAPR_TCE _IOW(KVMIO,  0xa8, struct kvm_create_spapr_tce)
+/* Available with KVM_CAP_RMA */
+#define KVM_ALLOCATE_RMA _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
+/* Available with KVM_CAP_SET_LINT1 for x86 */
+#define KVM_SET_LINT1 _IO(KVMIO,   0xaa)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)

[Qemu-devel] [PATCH 1/1 V5] kernel/kvm: introduce KVM_SET_LINT1 and fix improper nmi emulation

2011-10-14 Thread Lai Jiangshan

Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
button event happens. This doesn't properly emulate real hardware on
which NMI button event triggers LINT1. Because of this, NMI is sent to
the processor even when LINT1 is masked in LVT. For example, this
causes the problem that kdump initiated by NMI sometimes doesn't work
on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

With this patch, we introduce KVM_SET_LINT1, which lets us emulate
the NMI button correctly without changing the old KVM_NMI behavior.

Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
---
 arch/x86/include/asm/kvm.h |1 +
 arch/x86/kvm/irq.h |1 +
 arch/x86/kvm/lapic.c   |7 +++
 arch/x86/kvm/x86.c |8 
 include/linux/kvm.h|5 +
 5 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 4d8dcbd..88d0ac3 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -24,6 +24,7 @@
 #define __KVM_HAVE_DEBUGREGS
 #define __KVM_HAVE_XSAVE
 #define __KVM_HAVE_XCRS
+#define __KVM_HAVE_SET_LINT1
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 53e2d08..0c96315 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -95,6 +95,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s);
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
+void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu);
 void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
 void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 57dcbd4..87fe36a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1039,6 +1039,13 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
kvm_apic_local_deliver(apic, APIC_LVT0);
 }
 
+void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu)
+{
+   struct kvm_lapic *apic = vcpu->arch.apic;
+
+   kvm_apic_local_deliver(apic, APIC_LVT1);
+}
+
 static struct kvm_timer_ops lapic_timer_ops = {
.is_periodic = lapic_is_periodic,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 84a28ea..fccd094 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2077,6 +2077,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
case KVM_CAP_GET_TSC_KHZ:
+   case KVM_CAP_SET_LINT1:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -3264,6 +3265,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
goto out;
}
+   case KVM_SET_LINT1: {
+   r = -EINVAL;
+   if (!irqchip_in_kernel(vcpu->kvm))
+   goto out;
+   r = 0;
+   kvm_apic_lint1_deliver(vcpu);
+   }
default:
r = -EINVAL;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index aace6b8..3a10572 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -554,6 +554,9 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_SMT 64
 #define KVM_CAP_PPC_RMA 65
 #define KVM_CAP_S390_GMAP 71
+#ifdef __KVM_HAVE_SET_LINT1
+#define KVM_CAP_SET_LINT1 72
+#endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -759,6 +762,8 @@ struct kvm_clock_data {
 #define KVM_CREATE_SPAPR_TCE _IOW(KVMIO,  0xa8, struct kvm_create_spapr_tce)
 /* Available with KVM_CAP_RMA */
 #define KVM_ALLOCATE_RMA _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
+/* Available with KVM_CAP_SET_LINT1 for x86 */
+#define KVM_SET_LINT1 _IO(KVMIO,   0xaa)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)

[Qemu-devel] [PATCH 2/2 V5] qemu-kvm: fix improper nmi emulation

2011-10-14 Thread Lai Jiangshan

Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
button event happens. This doesn't properly emulate real hardware on
which NMI button event triggers LINT1. Because of this, NMI is sent to
the processor even when LINT1 is masked in LVT. For example, this
causes the problem that kdump initiated by NMI sometimes doesn't work
on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

With this patch, an inject-nmi request is handled as follows:

- When the in-kernel irqchip is enabled and KVM_SET_LINT1 is available,
  inject LINT1 instead of an NMI.

- Otherwise, when the in-kernel irqchip is enabled, read the in-kernel
  LAPIC state and test APIC_LVT_MASKED; if LINT1 is unmasked, deliver
  the NMI directly.

- Otherwise, the userland LAPIC emulates the NMI button and injects an
  NMI if LINT1 is unmasked.
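
The three cases above can be sketched as a small decision function
plus the LVT test. This is only an illustration, not code from the
patch: the enum and function names are made up here, and the LVT
constants are spelled out locally (mask bit 16, delivery mode in
bits 8..10, mode 100b for NMI) rather than taken from hw/apic.c. The
patch itself also checks that the delivery mode is NMI before
sending KVM_NMI, which the second helper mirrors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local APIC LVT layout: bit 16 masks the entry, bits 8..10 hold the
 * delivery mode, and delivery mode 100b means NMI. */
#define LVT_MASKED   (1u << 16)
#define LVT_DM_SHIFT 8
#define LVT_DM_MASK  0x7u
#define LVT_DM_NMI   0x4u

/* Hypothetical names for the three paths in the commit message. */
enum nmi_path {
    NMI_PATH_KERNEL_LINT1,  /* ask the kernel to raise LINT1 */
    NMI_PATH_KERNEL_LAPIC,  /* read the in-kernel LVT, then send an NMI */
    NMI_PATH_USER_LAPIC     /* userland LAPIC delivers LINT1 itself */
};

static inline enum nmi_path choose_nmi_path(bool irqchip_in_kernel,
                                            bool has_set_lint1)
{
    if (irqchip_in_kernel && has_set_lint1) {
        return NMI_PATH_KERNEL_LINT1;
    }
    if (irqchip_in_kernel) {
        return NMI_PATH_KERNEL_LAPIC;
    }
    return NMI_PATH_USER_LAPIC;
}

/* True when a LINT1 pulse should reach the CPU as an NMI. */
static inline bool lint1_delivers_nmi(uint32_t lvt)
{
    if (lvt & LVT_MASKED) {
        return false;
    }
    return ((lvt >> LVT_DM_SHIFT) & LVT_DM_MASK) == LVT_DM_NMI;
}
```

With these definitions an LVT value of 0x400 (unmasked, NMI mode)
passes the test, while 0x10400 (same mode, masked) does not.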

Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
---
 hw/apic.c |   72 +
 hw/apic.h |1 +
 monitor.c |6 -
 3 files changed, 78 insertions(+), 1 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 69d6ac5..91b82d0 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -205,6 +205,78 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
 }
 }
 
+#ifdef KVM_CAP_IRQCHIP
+static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
+
+static void kvm_irqchip_deliver_nmi(void *p)
+{
+    APICState *s = p;
+    struct kvm_lapic_state klapic;
+    uint32_t lvt;
+
+    kvm_get_lapic(s->cpu_env, &klapic);
+    lvt = kapic_reg(&klapic, 0x32 + APIC_LVT_LINT1);
+
+    if (lvt & APIC_LVT_MASKED) {
+        return;
+    }
+
+    if (((lvt >> 8) & 7) != APIC_DM_NMI) {
+        return;
+    }
+
+    kvm_vcpu_ioctl(s->cpu_env, KVM_NMI);
+}
+
+static void __apic_deliver_nmi(APICState *s)
+{
+    if (kvm_irqchip_in_kernel()) {
+        run_on_cpu(s->cpu_env, kvm_irqchip_deliver_nmi, s);
+    } else {
+        apic_local_deliver(s, APIC_LVT_LINT1);
+    }
+}
+#else
+static void __apic_deliver_nmi(APICState *s)
+{
+    apic_local_deliver(s, APIC_LVT_LINT1);
+}
+#endif
+
+enum {
+    KVM_SET_LINT1_UNKNOWN,
+    KVM_SET_LINT1_ENABLED,
+    KVM_SET_LINT1_DISABLED,
+};
+
+static void kvm_set_lint1(void *p)
+{
+    CPUState *env = p;
+
+    kvm_vcpu_ioctl(env, KVM_SET_LINT1);
+}
+
+void apic_deliver_nmi(DeviceState *d)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+    static int kernel_lint1 = KVM_SET_LINT1_UNKNOWN;
+
+    if (kernel_lint1 == KVM_SET_LINT1_UNKNOWN) {
+        if (kvm_enabled() && kvm_irqchip_in_kernel() &&
+            kvm_check_extension(kvm_state, KVM_CAP_SET_LINT1)) {
+            kernel_lint1 = KVM_SET_LINT1_ENABLED;
+        } else {
+            kernel_lint1 = KVM_SET_LINT1_DISABLED;
+        }
+    }
+
+    if (kernel_lint1 == KVM_SET_LINT1_ENABLED) {
+        run_on_cpu(s->cpu_env, kvm_set_lint1, s->cpu_env);
+    } else {
+        __apic_deliver_nmi(s);
+    }
+}
+
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
 int __i, __j, __mask;\
diff --git a/hw/apic.h b/hw/apic.h
index c857d52..3a4be0a 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -10,6 +10,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
  uint8_t trigger_mode);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
+void apic_deliver_nmi(DeviceState *d);
 int apic_get_interrupt(DeviceState *s);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
diff --git a/monitor.c b/monitor.c
index cb485bf..0b81f17 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2616,7 +2616,11 @@ static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data)
     CPUState *env;
 
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
-        cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        if (!env->apic_state) {
+            cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        } else {
+            apic_deliver_nmi(env->apic_state);
+        }
     }
 
 return 0;

[Qemu-devel] [PATCH 3/4] block: drop redundant bdrv_flush implementation

2011-10-14 Thread Paolo Bonzini

From: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Block drivers now only need to provide one of .bdrv_co_flush(),
.bdrv_aio_flush() or, for legacy drivers, .bdrv_flush().  Remove
the redundant .bdrv_flush() and, for the raw driver, replace the
asynchronous operation with the coroutine-based one.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 block/blkdebug.c  |6 --
 block/blkverify.c |9 -
 block/qcow.c  |6 --
 block/qcow2.c |   19 ---
 block/qed.c   |6 --
 block/raw-posix.c |   18 --
 block/raw.c   |   11 ++-
 7 files changed, 2 insertions(+), 73 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index b3c5d42..9b88535 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -397,11 +397,6 @@ static void blkdebug_close(BlockDriverState *bs)
 }
 }
 
-static int blkdebug_flush(BlockDriverState *bs)
-{
-    return bdrv_flush(bs->file);
-}
-
 static BlockDriverAIOCB *blkdebug_aio_flush(BlockDriverState *bs,
 BlockDriverCompletionFunc *cb, void *opaque)
 {
@@ -454,7 +449,6 @@ static BlockDriver bdrv_blkdebug = {
 
 .bdrv_file_open = blkdebug_open,
 .bdrv_close = blkdebug_close,
-.bdrv_flush = blkdebug_flush,
 
 .bdrv_aio_readv = blkdebug_aio_readv,
 .bdrv_aio_writev= blkdebug_aio_writev,
diff --git a/block/blkverify.c b/block/blkverify.c
index c7522b4..483f3b3 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -116,14 +116,6 @@ static void blkverify_close(BlockDriverState *bs)
     s->test_file = NULL;
 }
 
-static int blkverify_flush(BlockDriverState *bs)
-{
-    BDRVBlkverifyState *s = bs->opaque;
-
-    /* Only flush test file, the raw file is not important */
-    return bdrv_flush(s->test_file);
-}
-
 static int64_t blkverify_getlength(BlockDriverState *bs)
 {
     BDRVBlkverifyState *s = bs->opaque;
@@ -368,7 +360,6 @@ static BlockDriver bdrv_blkverify = {
 
 .bdrv_file_open = blkverify_open,
 .bdrv_close = blkverify_close,
-.bdrv_flush = blkverify_flush,
 
 .bdrv_aio_readv = blkverify_aio_readv,
 .bdrv_aio_writev= blkverify_aio_writev,
diff --git a/block/qcow.c b/block/qcow.c
index c8bfecc..9b71116 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -781,11 +781,6 @@ static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,
     return 0;
 }
 
-static int qcow_flush(BlockDriverState *bs)
-{
-    return bdrv_flush(bs->file);
-}
-
 static BlockDriverAIOCB *qcow_aio_flush(BlockDriverState *bs,
 BlockDriverCompletionFunc *cb, void *opaque)
 {
@@ -826,7 +821,6 @@ static BlockDriver bdrv_qcow = {
 .bdrv_open = qcow_open,
 .bdrv_close= qcow_close,
 .bdrv_create   = qcow_create,
-.bdrv_flush= qcow_flush,
 .bdrv_is_allocated = qcow_is_allocated,
 .bdrv_set_key  = qcow_set_key,
 .bdrv_make_empty   = qcow_make_empty,
diff --git a/block/qcow2.c b/block/qcow2.c
index 510ff68..4dc980c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1092,24 +1092,6 @@ static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
     return 0;
 }
 
-static int qcow2_flush(BlockDriverState *bs)
-{
-    BDRVQcowState *s = bs->opaque;
-    int ret;
-
-    ret = qcow2_cache_flush(bs, s->l2_table_cache);
-    if (ret < 0) {
-        return ret;
-    }
-
-    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
-    if (ret < 0) {
-        return ret;
-    }
-
-    return bdrv_flush(bs->file);
-}
-
 static BlockDriverAIOCB *qcow2_aio_flush(BlockDriverState *bs,
  BlockDriverCompletionFunc *cb,
  void *opaque)
@@ -1242,7 +1224,6 @@ static BlockDriver bdrv_qcow2 = {
 .bdrv_open  = qcow2_open,
 .bdrv_close = qcow2_close,
 .bdrv_create= qcow2_create,
-.bdrv_flush = qcow2_flush,
 .bdrv_is_allocated  = qcow2_is_allocated,
 .bdrv_set_key   = qcow2_set_key,
 .bdrv_make_empty= qcow2_make_empty,
diff --git a/block/qed.c b/block/qed.c
index e87dc4d..2e06992 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -533,11 +533,6 @@ static void bdrv_qed_close(BlockDriverState *bs)
     qemu_vfree(s->l1_table);
 }
 
-static int bdrv_qed_flush(BlockDriverState *bs)
-{
-    return bdrv_flush(bs->file);
-}
-
 static int qed_create(const char *filename, uint32_t cluster_size,
   uint64_t image_size, uint32_t table_size,
   const char *backing_file, const char *backing_fmt)
@@ -1479,7 +1474,6 @@ static BlockDriver bdrv_qed = {
 .bdrv_open= bdrv_qed_open,
 .bdrv_close   = bdrv_qed_close,
 .bdrv_create  = bdrv_qed_create,
-.bdrv_flush   = bdrv_qed_flush,
 .bdrv_is_allocated= bdrv_qed_is_allocated,
 .bdrv_make_empty  =

[Qemu-devel] [PATCH 4/4] block: add bdrv_co_discard and bdrv_aio_discard support

2011-10-14 Thread Paolo Bonzini

This similarly adds support for coroutine and asynchronous discard.
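
As a rough illustration of the hook precedence this patch gives
bdrv_discard() (coroutine hook first, then AIO, then the legacy
synchronous hook, and success when a driver has none), here is a
stripped-down sketch. The Driver struct, discard_dispatch() and the
two sample hooks are hypothetical stand-ins, not QEMU types; the
real code also rejects a missing medium, bad requests and read-only
devices first, and its AIO branch yields the coroutine until the
completion callback fires, all of which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified driver table with only the discard hooks. */
typedef struct Driver {
    int (*co_discard)(int64_t sector_num, int nb_sectors);
    int (*aio_discard)(int64_t sector_num, int nb_sectors);
    int (*discard)(int64_t sector_num, int nb_sectors);
} Driver;

/* Sample hooks that return distinct values so the chosen path is visible. */
static inline int sample_co_discard(int64_t sector_num, int nb_sectors)
{
    (void)sector_num;
    (void)nb_sectors;
    return 1;
}

static inline int sample_sync_discard(int64_t sector_num, int nb_sectors)
{
    (void)sector_num;
    (void)nb_sectors;
    return 2;
}

/* Pick the first available hook in the order co, aio, sync. */
static inline int discard_dispatch(const Driver *drv,
                                   int64_t sector_num, int nb_sectors)
{
    if (drv->co_discard) {
        return drv->co_discard(sector_num, nb_sectors);
    }
    if (drv->aio_discard) {
        return drv->aio_discard(sector_num, nb_sectors);
    }
    if (drv->discard) {
        return drv->discard(sector_num, nb_sectors);
    }
    return 0; /* no hook: treat discard as a successful no-op */
}
```

A driver that provides both a coroutine and a legacy hook gets the
coroutine one; a driver with no hook at all still reports success.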

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
I was not sure if qcow2 could be changed to co_discard, though
I suspected yes.

 block.c  |   72 +-
 block.h  |3 ++
 block/raw.c  |8 --
 block_int.h  |9 +-
 trace-events |1 +
 5 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/block.c b/block.c
index 0af9a89..7c60361 100644
--- a/block.c
+++ b/block.c
@@ -1768,17 +1768,6 @@ int bdrv_has_zero_init(BlockDriverState *bs)
 return 1;
 }
 
-int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
-{
-    if (!bs->drv) {
-        return -ENOMEDIUM;
-    }
-    if (!bs->drv->bdrv_discard) {
-        return 0;
-    }
-    return bs->drv->bdrv_discard(bs, sector_num, nb_sectors);
-}
-
 /*
  * Returns true iff the specified sector is present in the disk image. Drivers
  * not implementing the functionality are assumed to not support backing files,
@@ -2911,6 +2900,66 @@ int bdrv_flush(BlockDriverState *bs)
 return rwco.ret;
 }
 
+static void bdrv_discard_co_entry(void *opaque)
+{
+    RwCo *rwco = opaque;
+    BlockDriverState *bs = rwco->bs;
+    int64_t sector_num = rwco->sector_num;
+    int nb_sectors = rwco->nb_sectors;
+
+    if (!bs->drv) {
+        rwco->ret = -ENOMEDIUM;
+    } else if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+        rwco->ret = -EIO;
+    } else if (bs->read_only) {
+        rwco->ret = -EROFS;
+    } else if (bs->drv->bdrv_co_discard) {
+        rwco->ret = bs->drv->bdrv_co_discard(bs, sector_num, nb_sectors);
+    } else if (bs->drv->bdrv_aio_discard) {
+        BlockDriverAIOCB *acb;
+        CoroutineIOCompletion co = {
+            .coroutine = qemu_coroutine_self(),
+        };
+
+        acb = bs->drv->bdrv_aio_discard(bs, sector_num, nb_sectors,
+                                        bdrv_co_io_em_complete, &co);
+        if (acb == NULL) {
+            rwco->ret = -EIO;
+        } else {
+            qemu_coroutine_yield();
+            rwco->ret = co.ret;
+        }
+    } else if (bs->drv->bdrv_discard) {
+        rwco->ret = bs->drv->bdrv_discard(bs, sector_num, nb_sectors);
+    } else {
+        rwco->ret = 0;
+    }
+}
+
+int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+    Coroutine *co;
+    RwCo rwco = {
+        .bs = bs,
+        .sector_num = sector_num,
+        .nb_sectors = nb_sectors,
+        .ret = NOT_DONE,
+    };
+
+    if (qemu_in_coroutine()) {
+        /* Fast-path if already in coroutine context */
+        bdrv_discard_co_entry(&rwco);
+    } else {
+        co = qemu_coroutine_create(bdrv_discard_co_entry);
+        qemu_coroutine_enter(co, &rwco);
+        while (rwco.ret == NOT_DONE) {
+            qemu_aio_wait();
+        }
+    }
+
+    return rwco.ret;
+}
+
 /**/
 /* removable device support */
 
diff --git a/block.h b/block.h
index e77988e..4f4aa94 100644
--- a/block.h
+++ b/block.h
@@ -166,6 +166,9 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
                                   BlockDriverCompletionFunc *cb, void *opaque);
 BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
                                  BlockDriverCompletionFunc *cb, void *opaque);
+BlockDriverAIOCB *bdrv_aio_discard(BlockDriverState *bs,
+                                   int64_t sector_num, int nb_sectors,
+                                   BlockDriverCompletionFunc *cb, void *opaque);
 void bdrv_aio_cancel(BlockDriverAIOCB *acb);
 
 typedef struct BlockRequest {
diff --git a/block/raw.c b/block/raw.c
index 161c9cf..ff68514 100644
--- a/block/raw.c
+++ b/block/raw.c
@@ -45,7 +45,8 @@ static int raw_probe(const uint8_t *buf, int buf_size, const char *filename)
    return 1; /* everything can be opened as raw image */
 }
 
-static int raw_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+static int coroutine_fn raw_co_discard(BlockDriverState *bs,
+                                       int64_t sector_num, int nb_sectors)
 {
     return bdrv_discard(bs->file, sector_num, nb_sectors);
 }
@@ -109,15 +110,16 @@ static BlockDriver bdrv_raw = {
 
 .bdrv_open  = raw_open,
 .bdrv_close = raw_close,
+
 .bdrv_co_readv  = raw_co_readv,
 .bdrv_co_writev = raw_co_writev,
 .bdrv_co_flush  = raw_co_flush,
+.bdrv_co_discard= raw_co_discard,
+
 .bdrv_probe = raw_probe,
 .bdrv_getlength = raw_getlength,
 .bdrv_truncate  = raw_truncate,
 
-.bdrv_discard   = raw_discard,
-
 .bdrv_is_inserted   = raw_is_inserted,
 .bdrv_media_changed = raw_media_changed,
 .bdrv_eject = raw_eject,
diff --git a/block_int.h b/block_int.h
index 9cb536d..384598f 100644
--- a/block_int.h
+++ b/block_int.h
@@ -63,6 +63,8 @@ struct BlockDriver {
 void

Re: [Qemu-devel] [PATCH 1/1 V5] kernel/kvm: introduce KVM_SET_LINT1 and fix improper nmi emulation

2011-10-14 Thread Jan Kiszka

On 2011-10-14 11:03, Lai Jiangshan wrote:
 Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
 button event happens. This doesn't properly emulate real hardware on
 which NMI button event triggers LINT1. Because of this, NMI is sent to
 the processor even when LINT1 is masked in LVT. For example, this
 causes the problem that kdump initiated by NMI sometimes doesn't work
 on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
 
 With this patch, we introduce KVM_SET_LINT1, which lets us emulate
 the NMI button correctly without changing the old KVM_NMI behavior.
 
 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
 ---
  arch/x86/include/asm/kvm.h |1 +
  arch/x86/kvm/irq.h |1 +
  arch/x86/kvm/lapic.c   |7 +++
  arch/x86/kvm/x86.c |8 
  include/linux/kvm.h|5 +
  5 files changed, 22 insertions(+), 0 deletions(-)
 diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
 index 4d8dcbd..88d0ac3 100644
 --- a/arch/x86/include/asm/kvm.h
 +++ b/arch/x86/include/asm/kvm.h
 @@ -24,6 +24,7 @@
  #define __KVM_HAVE_DEBUGREGS
  #define __KVM_HAVE_XSAVE
  #define __KVM_HAVE_XCRS
 +#define __KVM_HAVE_SET_LINT1
  
  /* Architectural interrupt line count. */
  #define KVM_NR_INTERRUPTS 256
 diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
 index 53e2d08..0c96315 100644
 --- a/arch/x86/kvm/irq.h
 +++ b/arch/x86/kvm/irq.h
 @@ -95,6 +95,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s);
  void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
  void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
  void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
 +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu);
  void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
  void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
  void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 57dcbd4..87fe36a 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1039,6 +1039,13 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
   kvm_apic_local_deliver(apic, APIC_LVT0);
  }
  
 +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu)
 +{
 + struct kvm_lapic *apic = vcpu->arch.apic;
 +
 + kvm_apic_local_deliver(apic, APIC_LVT1);
 +}
 +
  static struct kvm_timer_ops lapic_timer_ops = {
   .is_periodic = lapic_is_periodic,
  };
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 84a28ea..fccd094 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2077,6 +2077,7 @@ int kvm_dev_ioctl_check_extension(long ext)
   case KVM_CAP_XSAVE:
   case KVM_CAP_ASYNC_PF:
   case KVM_CAP_GET_TSC_KHZ:
 + case KVM_CAP_SET_LINT1:
   r = 1;
   break;
   case KVM_CAP_COALESCED_MMIO:
 @@ -3264,6 +3265,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
  
   goto out;
   }
 + case KVM_SET_LINT1: {
 + r = -EINVAL;
 + if (!irqchip_in_kernel(vcpu->kvm))
 + goto out;
 + r = 0;
 + kvm_apic_lint1_deliver(vcpu);
 + }
   default:
   r = -EINVAL;
   }
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index aace6b8..3a10572 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -554,6 +554,9 @@ struct kvm_ppc_pvinfo {
  #define KVM_CAP_PPC_SMT 64
  #define KVM_CAP_PPC_RMA  65
  #define KVM_CAP_S390_GMAP 71
 +#ifdef __KVM_HAVE_SET_LINT1
 +#define KVM_CAP_SET_LINT1 72
 +#endif

Actually, there is no need for __KVM_HAVE_SET_LINT1 and #ifdef. User
land will just do a runtime check.

Jan
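
A runtime check of the kind mentioned above could look roughly like
the sketch below. It uses the standard KVM_CHECK_EXTENSION ioctl on
an open /dev/kvm descriptor. KVM_CAP_SET_LINT1 is only proposed in
this thread, so the fallback value 72 is an assumption copied from
the patch; a kernel without the patch may assign that number to a
different capability, so the define is for illustration only. The
helper name kvm_has_extension() is made up here.

```c
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Assumption: the value proposed in this patch series. It is not
 * guaranteed to mean the same thing on a kernel without the patch. */
#ifndef KVM_CAP_SET_LINT1
#define KVM_CAP_SET_LINT1 72
#endif

/* kvm_fd is a descriptor obtained by opening /dev/kvm.
 * KVM_CHECK_EXTENSION returns 0 when the capability is absent, a
 * positive value when it is present, and -1 on error. */
static inline bool kvm_has_extension(int kvm_fd, long cap)
{
    return ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap) > 0;
}
```

Because the answer comes from the running kernel, user land needs no
compile-time __KVM_HAVE_* guard to decide which path to take.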




Re: [Qemu-devel] [PATCH] compatfd.c: Don't pass NULL pointer to SYS_signalfd

2011-10-14 Thread Stefan Hajnoczi

On Thu, Oct 13, 2011 at 06:45:37PM +0100, Peter Maydell wrote:
 Don't pass a NULL pointer in to SYS_signalfd in qemu_signalfd_available():
 this isn't valid and Valgrind complains about it.
 
 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
  compatfd.c |   12 ++--
  1 files changed, 10 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

[Qemu-devel] [0/12] Preliminary work for IOMMU emulation support (v2)

2011-10-14 Thread David Gibson

A while back, Eduard - Gabriel Munteanu sent a series of patches
implementing support for emulating the AMD IOMMU in conjunction with
qemu emulated PCI devices.  A revised patch series added support for
the Intel IOMMU, and I also sent a revised version of this series
which added support for the hypervisor mediated IOMMU on the pseries
machine.

Richard Henderson also weighed in on the discussion, and there's still
a certain amount to be thrashed out in terms of exactly how to set up
an IOMMU / DMA translation subsystem.

However, really only 2 or 3 patches in any of these series have
contained anything interesting.  The rest of the series has been
converting existing PCI emulated devices to use the new DMA interface
which worked through the IOMMU translation, whatever it was.  While we
keep working out what we want for the guts of the IOMMU support, these
device conversion patches keep bitrotting against updates to the
various device implementations themselves.

Really, regardless of whether we're actually implementing IOMMU
translation, it makes sense that qemu code should distinguish between
when it is really operating in CPU physical addresses and when it is
operating in bus or DMA addresses which might have some kind of
translation into physical addresses.

This series, therefore, begins the conversion of existing PCI device
emulation code to use new (stub) pci dma access functions.  These are,
for now, just defined to be untranslated cpu physical memory accesses,
as before, but the conversion has three advantages:

   * It becomes obvious where the code is working with dma addresses,
 so it's easier to grep for what might be affected by an IOMMU or
 other bus address translation.

   * The new stubs take the PCIDevice *, from which any of the various
 suggested IOMMU interfaces should be able to locate the correct
 IOMMU translation context.

   * The new pci_dma_{read,write}() functions have a return value.
 When we do have IOMMU support, translation failures could lead to
 these functions failing, so we want a way to report it.

This series converts all the easy cases.  It doesn't yet handle
devices which have both PCI and non-PCI variants, such as AHCI, OHCI
and ne2k.  Unlike the earlier version of this series, functions using
scatter gather _are_ covered, though.
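
To make the shape of such stubs concrete, here is a minimal
stand-alone sketch. It is not the code from this series (the actual
stub definitions are not shown in this excerpt): the PCIDevice
struct below is a placeholder that carries a plain buffer standing
in for guest memory, and pci_dma_rw() is a hypothetical helper. What
it does show is the two properties listed above: every access names
the device, and every access returns a status that a future IOMMU
could use to report a translation failure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t dma_addr_t;

/* Placeholder device type; the real PCIDevice carries far more state. */
typedef struct PCIDevice {
    uint8_t *ram;     /* hypothetical: memory reachable by this device */
    size_t ram_size;
} PCIDevice;

/* Untranslated access: the bus address is used directly as an offset.
 * Returns 0 on success and -1 when the range is out of bounds. */
static inline int pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
                             void *buf, size_t len, int is_write)
{
    if (addr > dev->ram_size || len > dev->ram_size - addr) {
        return -1;
    }
    if (is_write) {
        memcpy(dev->ram + addr, buf, len);
    } else {
        memcpy(buf, dev->ram + addr, len);
    }
    return 0;
}

static inline int pci_dma_read(PCIDevice *dev, dma_addr_t addr,
                               void *buf, size_t len)
{
    return pci_dma_rw(dev, addr, buf, len, 0);
}

static inline int pci_dma_write(PCIDevice *dev, dma_addr_t addr,
                                const void *buf, size_t len)
{
    return pci_dma_rw(dev, addr, (void *)buf, len, 1);
}
```

A device model converted this way only changes its call sites; the
translation policy can later be swapped in behind pci_dma_rw().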

[Qemu-devel] [PATCH 06/12] e1000: Use PCI DMA stub functions

2011-10-14 Thread David Gibson

From: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro

This updates the e1000 device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 hw/e1000.c |   29 +++--
 1 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index ce8fc8b..986ed9c 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -31,6 +31,7 @@
 #include "net/checksum.h"
 #include "loader.h"
 #include "sysemu.h"
+#include "dma.h"
 
 #include "e1000_hw.h"
 
@@ -465,7 +466,7 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
             bytes = split_size;
             if (tp->size + bytes > msh)
                 bytes = msh - tp->size;
-            cpu_physical_memory_read(addr, tp->data + tp->size, bytes);
+            pci_dma_read(&s->dev, addr, tp->data + tp->size, bytes);
             if ((sz = tp->size + bytes) >= hdr && tp->size < hdr)
                 memmove(tp->header, tp->data, hdr);
             tp->size = sz;
@@ -480,7 +481,7 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
         // context descriptor TSE is not set, while data descriptor TSE is set
         DBGOUT(TXERR, "TCP segmentaion Error\n");
     } else {
-        cpu_physical_memory_read(addr, tp->data + tp->size, split_size);
+        pci_dma_read(&s->dev, addr, tp->data + tp->size, split_size);
         tp->size += split_size;
     }
 
@@ -496,7 +497,7 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
 }
 
 static uint32_t
-txdesc_writeback(target_phys_addr_t base, struct e1000_tx_desc *dp)
+txdesc_writeback(E1000State *s, dma_addr_t base, struct e1000_tx_desc *dp)
 {
     uint32_t txd_upper, txd_lower = le32_to_cpu(dp->lower.data);
 
@@ -505,8 +506,8 @@ txdesc_writeback(target_phys_addr_t base, struct e1000_tx_desc *dp)
     txd_upper = (le32_to_cpu(dp->upper.data) | E1000_TXD_STAT_DD) &
                 ~(E1000_TXD_STAT_EC | E1000_TXD_STAT_LC | E1000_TXD_STAT_TU);
     dp->upper.data = cpu_to_le32(txd_upper);
-    cpu_physical_memory_write(base + ((char *)&dp->upper - (char *)dp),
-                              (void *)&dp->upper, sizeof(dp->upper));
+    pci_dma_write(&s->dev, base + ((char *)&dp->upper - (char *)dp),
+                  (void *)&dp->upper, sizeof(dp->upper));
     return E1000_ICR_TXDW;
 }
 
@@ -521,7 +522,7 @@ static uint64_t tx_desc_base(E1000State *s)
 static void
 start_xmit(E1000State *s)
 {
-    target_phys_addr_t base;
+    dma_addr_t base;
     struct e1000_tx_desc desc;
     uint32_t tdh_start = s->mac_reg[TDH], cause = E1000_ICS_TXQE;
 
@@ -533,14 +534,14 @@ start_xmit(E1000State *s)
     while (s->mac_reg[TDH] != s->mac_reg[TDT]) {
         base = tx_desc_base(s) +
                sizeof(struct e1000_tx_desc) * s->mac_reg[TDH];
-        cpu_physical_memory_read(base, (void *)&desc, sizeof(desc));
+        pci_dma_read(&s->dev, base, (void *)&desc, sizeof(desc));
 
         DBGOUT(TX, "index %d: %p : %x %x\n", s->mac_reg[TDH],
                (void *)(intptr_t)desc.buffer_addr, desc.lower.data,
                desc.upper.data);
 
         process_tx_desc(s, &desc);
-        cause |= txdesc_writeback(base, &desc);
+        cause |= txdesc_writeback(s, base, &desc);
 
         if (++s->mac_reg[TDH] * sizeof(desc) >= s->mac_reg[TDLEN])
             s->mac_reg[TDH] = 0;
@@ -668,7 +669,7 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
 {
     E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
     struct e1000_rx_desc desc;
-    target_phys_addr_t base;
+    dma_addr_t base;
     unsigned int n, rdt;
     uint32_t rdh_start;
     uint16_t vlan_special = 0;
@@ -713,7 +714,7 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
             desc_size = s->rxbuf_size;
         }
         base = rx_desc_base(s) + sizeof(desc) * s->mac_reg[RDH];
-        cpu_physical_memory_read(base, (void *)&desc, sizeof(desc));
+        pci_dma_read(&s->dev, base, (void *)&desc, sizeof(desc));
         desc.special = vlan_special;
         desc.status |= (vlan_status | E1000_RXD_STAT_DD);
         if (desc.buffer_addr) {
@@ -722,9 +723,9 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
                 if (copy_size > s->rxbuf_size) {
                     copy_size = s->rxbuf_size;
                 }
-                cpu_physical_memory_write(le64_to_cpu(desc.buffer_addr),
-                                          (void *)(buf + desc_offset + vlan_offset),
-                                          copy_size);
+                pci_dma_write(&s->dev, le64_to_cpu(desc.buffer_addr),
+                              (void *)(buf + desc_offset + vlan_offset),
+                              copy_size);
             }
             desc_offset += desc_size;
             desc.length = cpu_to_le16(desc_size);
@@ -738,7 +739,7 @@ e1000_receive(VLANClientState *nc, const uint8_t *buf, size_t size)
 }

[Qemu-devel] [PATCH 05/12] es1370: Use PCI DMA stub functions

2011-10-14 Thread David Gibson

From: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro

This updates the es1370 device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 hw/es1370.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/es1370.c b/hw/es1370.c
index 2daadde..c5c16b0 100644
--- a/hw/es1370.c
+++ b/hw/es1370.c
@@ -30,6 +30,7 @@
 #include "audiodev.h"
 #include "audio/audio.h"
 #include "pci.h"
+#include "dma.h"
 
 /* Missing stuff:
SCTRL_P[12](END|ST)INC
@@ -802,7 +803,7 @@ static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
             if (!acquired)
                 break;
 
-            cpu_physical_memory_write (addr, tmpbuf, acquired);
+            pci_dma_write (&s->dev, addr, tmpbuf, acquired);
 
             temp -= acquired;
             addr += acquired;
@@ -816,7 +817,7 @@ static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
             int copied, to_copy;
 
             to_copy = audio_MIN ((size_t) temp, sizeof (tmpbuf));
-            cpu_physical_memory_read (addr, tmpbuf, to_copy);
+            pci_dma_read (&s->dev, addr, tmpbuf, to_copy);
             copied = AUD_write (voice, tmpbuf, to_copy);
             if (!copied)
                 break;
-- 
1.7.6.3

[Qemu-devel] [PATCH 07/12] lsi53c895a: Use PCI DMA stub functions

2011-10-14 Thread David Gibson

From: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro

This updates the lsi53c895a device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 hw/lsi53c895a.c |   31 +++
 1 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index e077ec0..e7e817a 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -15,6 +15,8 @@
 #include "hw.h"
 #include "pci.h"
 #include "scsi.h"
+#include "block_int.h"
+#include "dma.h"
 
 //#define DEBUG_LSI
 //#define DEBUG_LSI_REG
@@ -390,10 +392,7 @@ static inline uint32_t read_dword(LSIState *s, uint32_t addr)
 {
     uint32_t buf;
 
-    /* XXX: an optimization here used to fast-path the read from scripts
-     * memory.  But that bypasses any iommu.
-     */
-    cpu_physical_memory_read(addr, (uint8_t *)&buf, 4);
+    pci_dma_read(&s->dev, addr, (uint8_t *)&buf, 4);
     return cpu_to_le32(buf);
 }
 
@@ -532,7 +531,7 @@ static void lsi_bad_selection(LSIState *s, uint32_t id)
 static void lsi_do_dma(LSIState *s, int out)
 {
 uint32_t count, id;
-    target_phys_addr_t addr;
+    dma_addr_t addr;
     SCSIDevice *dev;
 
     assert(s->current);
@@ -571,9 +570,9 @@ static void lsi_do_dma(LSIState *s, int out)
     }
     /* ??? Set SFBR to first data byte.  */
     if (out) {
-        cpu_physical_memory_read(addr, s->current->dma_buf, count);
+        pci_dma_read(&s->dev, addr, s->current->dma_buf, count);
     } else {
-        cpu_physical_memory_write(addr, s->current->dma_buf, count);
+        pci_dma_write(&s->dev, addr, s->current->dma_buf, count);
     }
     s->current->dma_len -= count;
     if (s->current->dma_len == 0) {
@@ -766,7 +765,7 @@ static void lsi_do_command(LSIState *s)
 DPRINTF(Send command len=%d\n, s-dbc);
 if (s-dbc  16)
 s-dbc = 16;
-cpu_physical_memory_read(s-dnad, buf, s-dbc);
+pci_dma_read(s-dev, s-dnad, buf, s-dbc);
 s-sfbr = buf[0];
 s-command_complete = 0;
 
@@ -817,7 +816,7 @@ static void lsi_do_status(LSIState *s)
 s->dbc = 1;
 status = s->status;
 s->sfbr = status;
-cpu_physical_memory_write(s->dnad, &status, 1);
+pci_dma_write(&s->dev, s->dnad, &status, 1);
 lsi_set_phase(s, PHASE_MI);
 s->msg_action = 1;
 lsi_add_msg_byte(s, 0); /* COMMAND COMPLETE */
@@ -831,7 +830,7 @@ static void lsi_do_msgin(LSIState *s)
 len = s->msg_len;
 if (len > s->dbc)
 len = s->dbc;
-cpu_physical_memory_write(s->dnad, s->msg, len);
+pci_dma_write(&s->dev, s->dnad, s->msg, len);
 /* Linux drivers rely on the last byte being in the SIDL.  */
 s->sidl = s->msg[len - 1];
 s->msg_len -= len;
@@ -863,7 +862,7 @@ static void lsi_do_msgin(LSIState *s)
 static uint8_t lsi_get_msgbyte(LSIState *s)
 {
 uint8_t data;
-cpu_physical_memory_read(s->dnad, &data, 1);
+pci_dma_read(&s->dev, s->dnad, &data, 1);
 s->dnad++;
 s->dbc--;
 return data;
@@ -1015,8 +1014,8 @@ static void lsi_memcpy(LSIState *s, uint32_t dest, uint32_t src, int count)
 DPRINTF("memcpy dest 0x%08x src 0x%08x count %d\n", dest, src, count);
 while (count) {
 n = (count > LSI_BUF_SIZE) ? LSI_BUF_SIZE : count;
-cpu_physical_memory_read(src, buf, n);
-cpu_physical_memory_write(dest, buf, n);
+pci_dma_read(&s->dev, src, buf, n);
+pci_dma_write(&s->dev, dest, buf, n);
 src += n;
 dest += n;
 count -= n;
@@ -1084,7 +1083,7 @@ again:
 
 /* 32-bit Table indirect */
 offset = sxt24(addr);
-cpu_physical_memory_read(s->dsa + offset, (uint8_t *)buf, 8);
+pci_dma_read(&s->dev, s->dsa + offset, (uint8_t *)buf, 8);
 /* byte count is stored in bits 0:23 only */
 s->dbc = cpu_to_le32(buf[0]) & 0xffffff;
 s->rbc = s->dbc;
@@ -1443,7 +1442,7 @@ again:
 n = (insn & 7);
 reg = (insn >> 16) & 0xff;
 if (insn & (1 << 24)) {
-cpu_physical_memory_read(addr, data, n);
+pci_dma_read(&s->dev, addr, data, n);
 DPRINTF("Load reg 0x%x size %d addr 0x%08x = %08x\n", reg, n,
 addr, *(int *)data);
 for (i = 0; i < n; i++) {
@@ -1454,7 +1453,7 @@ again:
 for (i = 0; i < n; i++) {
 data[i] = lsi_reg_readb(s, reg + i);
 }
-cpu_physical_memory_write(addr, data, n);
+pci_dma_write(&s->dev, addr, data, n);
 }
 }
 }
-- 
1.7.6.3

Re: [Qemu-devel] [0/12] Preliminary work for IOMMU emulation support (v2)

2011-10-14 Thread David Gibson

On Fri, Oct 14, 2011 at 08:20:51PM +1100, David Gibson wrote:
 A while back, Eduard - Gabriel Munteanu sent a series of patches
 implementing support for emulating the AMD IOMMU in conjunction with
 qemu emulated PCI devices.  A revised patch series added support for
 the Intel IOMMU, and I also sent a revised version of this series
 which added support for the hypervisor mediated IOMMU on the pseries
 machine.
 
 Richard Henderson also weighed in on the discussion, and there's still
 a certain amount to be thrashed out in terms of exactly how to set up
 an IOMMU / DMA translation subsystem.
 
 However, really only 2 or 3 patches in any of these series have
 contained anything interesting.  The rest of the series has been
 converting existing PCI emulated devices to use the new DMA interface
 which worked through the IOMMU translation, whatever it was.  While we
 keep working out what we want for the guts of the IOMMU support, these
 device conversion patches keep bitrotting against updates to the
 various device implementations themselves.
 
 Really, regardless of whether we're actually implementing IOMMU
 translation, it makes sense that qemu code should distinguish between
 when it is really operating in CPU physical addresses and when it is
 operating in bus or DMA addresses which might have some kind of
 translation into physical addresses.
 
 This series, therefore, begins the conversion of existing PCI device
 emulation code to use new (stub) pci dma access functions.  These are,
 for now, just defined to be untranslated cpu physical memory accesses,
 as before, but the conversion has three advantages:
 
* It becomes obvious where the code is working with dma addresses,
  so it's easier to grep for what might be affected by an IOMMU or
  other bus address translation.
 
* The new stubs take the PCIDevice *, from which any of the various
  suggested IOMMU interfaces should be able to locate the correct
  IOMMU translation context.
 
* The new pci_dma_{read,write}() functions have a return value.
  When we do have IOMMU support, translation failures could lead to
  these functions failing, so we want a way to report it.
 
 This series converts all the easy cases.  It doesn't yet handle
 devices which have both PCI and non-PCI variants, such as AHCI, OHCI
 and ne2k.  Unlike the earlier version of this series, functions using
 scatter gather _are_ covered, though.
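
To make the shape of the stub interface concrete, here is a minimal sketch
in C. It is not the code from the series: PCIDevice is reduced to a
placeholder, guest memory is modelled as a flat array, and guest_ram,
GUEST_RAM_SIZE and in_range() are hypothetical names used only for this
illustration. What it tries to show is the three points listed above: the
accessors are easy to grep for, they take the device pointer, and they
return a status that a later IOMMU implementation could use to report a
failed translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-ins; these are not QEMU's real definitions. */
typedef uint64_t dma_addr_t;
typedef struct PCIDevice { const char *name; } PCIDevice;

#define GUEST_RAM_SIZE 4096
static uint8_t guest_ram[GUEST_RAM_SIZE];

/* Non-zero when [addr, addr + len) lies inside the modelled memory. */
static int in_range(dma_addr_t addr, size_t len)
{
    return addr <= GUEST_RAM_SIZE && len <= GUEST_RAM_SIZE - addr;
}

/* Returns 0 on success, -1 on failure (here: an out-of-range access). */
static int pci_dma_read(PCIDevice *dev, dma_addr_t addr, void *buf, size_t len)
{
    (void)dev; /* an IOMMU model would find its translation context here */
    if (!in_range(addr, len)) {
        return -1;
    }
    memcpy(buf, guest_ram + addr, len);
    return 0;
}

static int pci_dma_write(PCIDevice *dev, dma_addr_t addr,
                         const void *buf, size_t len)
{
    (void)dev;
    if (!in_range(addr, len)) {
        return -1;
    }
    memcpy(guest_ram + addr, buf, len);
    return 0;
}
```

A device model written against these two calls does not need to change
when the body of the stubs later starts translating addresses.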

Oh, one more note.  There are a few checkpatch failures in the series;
most are false positives (mistaking pointer * for a multiplication,
mostly).  There are a few more that are true style errors that I've
left in on the basis that matching the surrounding code is more
important than being checkpatchily correct.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

Re: [Qemu-devel] [PATCH] correct spelling

2011-10-14 Thread Andreas Färber

[cc'ing qemu-trivial]

A one-line summary here and a topic like "sheepdog:" in the subject
would've been nice but given it's just a typo fix... :)

Am 14.10.2011 09:41, schrieb Dong Xu Wang:
 Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
 ---
  block/sheepdog.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/block/sheepdog.c b/block/sheepdog.c
 index c1f6e07..ae857e2 100644
 --- a/block/sheepdog.c
 +++ b/block/sheepdog.c
 @@ -66,7 +66,7 @@
   * 20 - 31 (12 bits): reserved data object space
   * 32 - 55 (24 bits): vdi object space
   * 56 - 59 ( 4 bits): reserved vdi object space
 - * 60 - 63 ( 4 bits): object type indentifier space
 + * 60 - 63 ( 4 bits): object type identifier space
   */
  
  #define VDI_SPACE_SHIFT   32

Reviewed-by: Andreas Färber afaer...@suse.de

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746, AG Nürnberg

Re: [Qemu-devel] [PATCH 1/1 V5] kernel/kvm: introduce KVM_SET_LINT1 and fix improper nmi emulation

2011-10-14 Thread Lai Jiangshan

On 10/14/2011 05:07 PM, Jan Kiszka wrote:
 On 2011-10-14 11:03, Lai Jiangshan wrote:
 Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
 button event happens. This doesn't properly emulate real hardware on
 which NMI button event triggers LINT1. Because of this, NMI is sent to
 the processor even when LINT1 is masked in LVT. For example, this
 causes the problem that kdump initiated by NMI sometimes doesn't work
 on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

 With this patch, we introduce KVM_SET_LINT1,
 and we can use KVM_SET_LINT1 to correctly emulate the NMI button
 without changing the old KVM_NMI behavior.

 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
 ---
  arch/x86/include/asm/kvm.h |1 +
  arch/x86/kvm/irq.h |1 +
  arch/x86/kvm/lapic.c   |7 +++
  arch/x86/kvm/x86.c |8 
  include/linux/kvm.h|5 +
  5 files changed, 22 insertions(+), 0 deletions(-)
 diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
 index 4d8dcbd..88d0ac3 100644
 --- a/arch/x86/include/asm/kvm.h
 +++ b/arch/x86/include/asm/kvm.h
 @@ -24,6 +24,7 @@
  #define __KVM_HAVE_DEBUGREGS
  #define __KVM_HAVE_XSAVE
  #define __KVM_HAVE_XCRS
 +#define __KVM_HAVE_SET_LINT1
  
  /* Architectural interrupt line count. */
  #define KVM_NR_INTERRUPTS 256
 diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
 index 53e2d08..0c96315 100644
 --- a/arch/x86/kvm/irq.h
 +++ b/arch/x86/kvm/irq.h
 @@ -95,6 +95,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s);
  void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
  void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
  void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
 +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu);
  void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
  void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
  void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 57dcbd4..87fe36a 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1039,6 +1039,13 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
  kvm_apic_local_deliver(apic, APIC_LVT0);
  }
  
 +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu)
 +{
 +struct kvm_lapic *apic = vcpu->arch.apic;
 +
 +kvm_apic_local_deliver(apic, APIC_LVT1);
 +}
 +
  static struct kvm_timer_ops lapic_timer_ops = {
  .is_periodic = lapic_is_periodic,
  };
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 84a28ea..fccd094 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2077,6 +2077,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  case KVM_CAP_XSAVE:
  case KVM_CAP_ASYNC_PF:
  case KVM_CAP_GET_TSC_KHZ:
 +case KVM_CAP_SET_LINT1:
  r = 1;
  break;
  case KVM_CAP_COALESCED_MMIO:
 @@ -3264,6 +3265,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
  
  goto out;
  }
 +case KVM_SET_LINT1: {
 +r = -EINVAL;
 +if (!irqchip_in_kernel(vcpu->kvm))
 +goto out;
 +r = 0;
 +kvm_apic_lint1_deliver(vcpu);
 +}
  default:
  r = -EINVAL;
  }
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index aace6b8..3a10572 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -554,6 +554,9 @@ struct kvm_ppc_pvinfo {
  #define KVM_CAP_PPC_SMT 64
  #define KVM_CAP_PPC_RMA 65
  #define KVM_CAP_S390_GMAP 71
 +#ifdef __KVM_HAVE_SET_LINT1
 +#define KVM_CAP_SET_LINT1 72
 +#endif
 
 Actually, there is no need for __KVM_HAVE_SET_LINT1 and #ifdef. User
 land will just do a runtime check.
 


__KVM_HAVE_SET_LINT1 does no harm, and it helps
with compile-time checks.

Thanks,
Lai

Re: [Qemu-devel] [PATCH 2/2 V5] qemu-kvm: fix improper nmi emulation

2011-10-14 Thread Jan Kiszka

On 2011-10-14 11:03, Lai Jiangshan wrote:
 Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
 button event happens. This doesn't properly emulate real hardware on
 which NMI button event triggers LINT1. Because of this, NMI is sent to
 the processor even when LINT1 is masked in LVT. For example, this
 causes the problem that kdump initiated by NMI sometimes doesn't work
 on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
 
 With this patch, inject-nmi request is handled as follows.
 
 - When in-kernel irqchip is enabled and KVM_SET_LINT1 is available,
   inject LINT1 instead of an NMI.
 
 - Otherwise, when in-kernel irqchip is enabled, read the in-kernel
   LAPIC state and test APIC_LVT_MASKED; if LINT1 is unmasked,
   deliver the NMI directly.
 
 - Otherwise, the userland LAPIC emulates the NMI button and injects
   an NMI if LINT1 is unmasked.
 
 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
 ---
  hw/apic.c |   72 
 +
  hw/apic.h |1 +
  monitor.c |6 -
  3 files changed, 78 insertions(+), 1 deletions(-)
 
 diff --git a/hw/apic.c b/hw/apic.c
 index 69d6ac5..91b82d0 100644
 --- a/hw/apic.c
 +++ b/hw/apic.c
 @@ -205,6 +205,78 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
  }
  }
  
 +#ifdef KVM_CAP_IRQCHIP

Please read all my comments. That unfortunately also applies to the rest
of the patch.

 +static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
 +
 +static void kvm_irqchip_deliver_nmi(void *p)
 +{
 +APICState *s = p;
 +struct kvm_lapic_state klapic;
 +uint32_t lvt;
 +
 +kvm_get_lapic(s->cpu_env, &klapic);
 +lvt = kapic_reg(&klapic, 0x32 + APIC_LVT_LINT1);
 +
 +if (lvt & APIC_LVT_MASKED) {
 +return;
 +}
 +
 +if (((lvt >> 8) & 7) != APIC_DM_NMI) {
 +return;
 +}
 +
 +kvm_vcpu_ioctl(s->cpu_env, KVM_NMI);
 +}
 +
 +static void __apic_deliver_nmi(APICState *s)
 +{
 +if (kvm_irqchip_in_kernel()) {
 +run_on_cpu(s->cpu_env, kvm_irqchip_deliver_nmi, s);
 +} else {
 +apic_local_deliver(s, APIC_LVT_LINT1);
 +}
 +}
 +#else
 +static void __apic_deliver_nmi(APICState *s)
 +{
 +apic_local_deliver(s, APIC_LVT_LINT1);
 +}
 +#endif
 +
 +enum {
 +KVM_SET_LINT1_UNKNOWN,
 +KVM_SET_LINT1_ENABLED,
 +KVM_SET_LINT1_DISABLED,
 +};
 +
 +static void kvm_set_lint1(void *p)
 +{
 +CPUState *env = p;
 +
 +kvm_vcpu_ioctl(env, KVM_SET_LINT1);
 +}
 +
 +void apic_deliver_nmi(DeviceState *d)
 +{
 +APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
 +static int kernel_lint1 = KVM_SET_LINT1_UNKNOWN;
 +
 +if (kernel_lint1 == KVM_SET_LINT1_UNKNOWN) {
 +if (kvm_enabled() && kvm_irqchip_in_kernel() &&
 +kvm_check_extension(kvm_state, KVM_CAP_SET_LINT1)) {

That CAP test belongs where the injection shall happen. Here you decide
about user space vs. kernel space APIC model.

Let's try it together:

if kvm_enabled && kvm_irqchip_in_kernel
    run_on_cpu(kvm_apic_deliver_nmi)
else
    apic_local_deliver(APIC_LVT_LINT1)

with kvm_apic_deliver_nmi like this:

if !check_extension(CAP_SET_LINT1)
    get_kernel_apic_state
    if !nmi_acceptable
        return
kvm_vcpu_ioctl(KVM_NMI)

Please don't trust me blindly and re-check, but this is what the scenario
looks like to me.

Thanks for your patience,
Jan
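
The decision flow in the pseudocode above can be modelled as a small,
self-contained C function. This is only a sketch under stated assumptions:
struct nmi_env, decide_nmi_action() and the NMI_* action names are
hypothetical and exist nowhere in QEMU, and the final step is labelled
"kernel inject" without taking a position on which ioctl performs it. The
LVT layout used (mask in bit 16, delivery mode in bits 10:8, NMI encoded
as 4) follows the architectural local APIC definition and matches the
checks in the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LVT_MASKED   (1u << 16) /* LVT mask bit */
#define LVT_DM_SHIFT 8          /* delivery mode lives in bits 10:8 */
#define LVT_DM_MASK  0x7u
#define LVT_DM_NMI   4u

/* Hypothetical names for the three possible outcomes. */
enum nmi_action { NMI_USER_LINT1, NMI_KERNEL_INJECT, NMI_DROPPED };

/* Hypothetical snapshot of everything the decision depends on. */
struct nmi_env {
    bool kvm_enabled;
    bool irqchip_in_kernel;
    bool has_set_lint1_cap; /* result of the runtime capability check */
    uint32_t lvt_lint1;     /* LINT1 LVT entry read from the in-kernel APIC */
};

/* LINT1 accepts the NMI only if it is unmasked and programmed for NMI. */
static bool nmi_acceptable(uint32_t lvt)
{
    if (lvt & LVT_MASKED) {
        return false;
    }
    return ((lvt >> LVT_DM_SHIFT) & LVT_DM_MASK) == LVT_DM_NMI;
}

static enum nmi_action decide_nmi_action(const struct nmi_env *env)
{
    /* User space APIC model: let it evaluate LINT1 itself. */
    if (!(env->kvm_enabled && env->irqchip_in_kernel)) {
        return NMI_USER_LINT1;
    }
    /* Old kernel: emulate the missing feature by checking the LVT here. */
    if (!env->has_set_lint1_cap && !nmi_acceptable(env->lvt_lint1)) {
        return NMI_DROPPED;
    }
    return NMI_KERNEL_INJECT;
}
```

Keeping the capability test inside the in-kernel branch, as suggested
above, means the user space versus kernel space choice is made in exactly
one place.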




Re: [Qemu-devel] [PATCH 1/1 V5] kernel/kvm: introduce KVM_SET_LINT1 and fix improper nmi emulation

2011-10-14 Thread Jan Kiszka

On 2011-10-14 11:27, Lai Jiangshan wrote:
 On 10/14/2011 05:07 PM, Jan Kiszka wrote:
 On 2011-10-14 11:03, Lai Jiangshan wrote:
 Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
 button event happens. This doesn't properly emulate real hardware on
 which NMI button event triggers LINT1. Because of this, NMI is sent to
 the processor even when LINT1 is masked in LVT. For example, this
 causes the problem that kdump initiated by NMI sometimes doesn't work
 on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

 With this patch, we introduce KVM_SET_LINT1,
 and we can use KVM_SET_LINT1 to correctly emulate the NMI button
 without changing the old KVM_NMI behavior.

 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
 ---
  arch/x86/include/asm/kvm.h |1 +
  arch/x86/kvm/irq.h |1 +
  arch/x86/kvm/lapic.c   |7 +++
  arch/x86/kvm/x86.c |8 
  include/linux/kvm.h|5 +
  5 files changed, 22 insertions(+), 0 deletions(-)
 diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
 index 4d8dcbd..88d0ac3 100644
 --- a/arch/x86/include/asm/kvm.h
 +++ b/arch/x86/include/asm/kvm.h
 @@ -24,6 +24,7 @@
  #define __KVM_HAVE_DEBUGREGS
  #define __KVM_HAVE_XSAVE
  #define __KVM_HAVE_XCRS
 +#define __KVM_HAVE_SET_LINT1
  
  /* Architectural interrupt line count. */
  #define KVM_NR_INTERRUPTS 256
 diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
 index 53e2d08..0c96315 100644
 --- a/arch/x86/kvm/irq.h
 +++ b/arch/x86/kvm/irq.h
 @@ -95,6 +95,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s);
  void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
  void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
  void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
 +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu);
  void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
  void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
  void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 57dcbd4..87fe36a 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1039,6 +1039,13 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
 kvm_apic_local_deliver(apic, APIC_LVT0);
  }
  
 +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu)
 +{
 +   struct kvm_lapic *apic = vcpu->arch.apic;
 +
 +   kvm_apic_local_deliver(apic, APIC_LVT1);
 +}
 +
  static struct kvm_timer_ops lapic_timer_ops = {
 .is_periodic = lapic_is_periodic,
  };
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 84a28ea..fccd094 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2077,6 +2077,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 case KVM_CAP_XSAVE:
 case KVM_CAP_ASYNC_PF:
 case KVM_CAP_GET_TSC_KHZ:
 +   case KVM_CAP_SET_LINT1:
 r = 1;
 break;
 case KVM_CAP_COALESCED_MMIO:
 @@ -3264,6 +3265,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
  
 goto out;
 }
 +   case KVM_SET_LINT1: {
 +   r = -EINVAL;
 +   if (!irqchip_in_kernel(vcpu->kvm))
 +   goto out;
 +   r = 0;
 +   kvm_apic_lint1_deliver(vcpu);
 +   }
 default:
 r = -EINVAL;
 }
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index aace6b8..3a10572 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -554,6 +554,9 @@ struct kvm_ppc_pvinfo {
  #define KVM_CAP_PPC_SMT 64
  #define KVM_CAP_PPC_RMA 65
  #define KVM_CAP_S390_GMAP 71
 +#ifdef __KVM_HAVE_SET_LINT1
 +#define KVM_CAP_SET_LINT1 72
 +#endif

 Actually, there is no need for __KVM_HAVE_SET_LINT1 and #ifdef. User
 land will just do a runtime check.


 
 __KVM_HAVE_SET_LINT1 does no harm, and it helps
 with compile-time checks.

It's guarding an arch-specific CAP that will only be checked if there is
a need. That's in contrast to generic features that are not supported for
all archs (like __KVM_HAVE_GUEST_DEBUG -> KVM_CAP_SET_GUEST_DEBUG).
Granted, there are quite a few examples of redundant __KVM_HAVE/#ifdef
KVM_CAP in the KVM header, but let's not add more.

Jan
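
A minimal sketch of the runtime check argued for above, in C. It is a
model, not QEMU code: real user space would ask the kernel with
ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap), which returns 0 for an unknown
capability and a positive value for a supported one. To stay
self-contained the sketch replaces that call with a callback, and struct
fake_kernel, fake_check_extension() and have_set_lint1() are hypothetical
names. DEMO_CAP_SET_LINT1 reuses the number 72 proposed in the patch,
which is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Capability number proposed in the quoted patch; an assumption here. */
#define DEMO_CAP_SET_LINT1 72

/* Stand-in for the KVM_CHECK_EXTENSION query: > 0 means supported. */
typedef int (*check_extension_fn)(const void *kernel, long cap);

/* Hypothetical model of a kernel: the list of capabilities it reports. */
struct fake_kernel {
    const long *caps;
    size_t ncaps;
};

static int fake_check_extension(const void *kernel, long cap)
{
    const struct fake_kernel *k = kernel;

    for (size_t i = 0; i < k->ncaps; i++) {
        if (k->caps[i] == cap) {
            return 1;
        }
    }
    return 0;
}

/* Runtime probe: one binary works on kernels with and without the cap. */
static bool have_set_lint1(check_extension_fn check, const void *kernel)
{
    return check(kernel, DEMO_CAP_SET_LINT1) > 0;
}
```

Because the answer comes from the running kernel, no __KVM_HAVE_* guard
is needed around the constant for this check to be safe.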




[Qemu-devel] [PATCH] usb-hub: wakeup on attach

2011-10-14 Thread Gerd Hoffmann

When attaching a new device we must send a wakeup request to the root
hub, otherwise the guest will not notice the new device in case the
usb hub is suspended.

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/usb-hub.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/usb-hub.c b/hw/usb-hub.c
index 3e923a0..908c622 100644
--- a/hw/usb-hub.c
+++ b/hw/usb-hub.c
@@ -163,6 +163,7 @@ static void usb_hub_attach(USBPort *port1)
 } else {
 port->wPortStatus &= ~PORT_STAT_LOW_SPEED;
 }
+usb_wakeup(&s->dev);
 }
 
 static void usb_hub_detach(USBPort *port1)
-- 
1.7.1

[Qemu-devel] [PATCH 11/12] usb-ehci: Use PCI DMA stub functions

2011-10-14 Thread David Gibson

This updates the usb-ehci device emulation to use the explicit PCI DMA
wrapper to initialize its scatter/gather structure.  This means this
driver should not need further changes when the sglist interface is
extended to support IOMMUs.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 hw/usb-ehci.c |   44 +---
 1 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/hw/usb-ehci.c b/hw/usb-ehci.c
index 27376a2..9c74a9f 100644
--- a/hw/usb-ehci.c
+++ b/hw/usb-ehci.c
@@ -1101,12 +1101,13 @@ static void ehci_mem_writel(void *ptr, target_phys_addr_t addr, uint32_t val)
 // TODO : Put in common header file, duplication from usb-ohci.c
 
 /* Get an array of dwords from main memory */
-static inline int get_dwords(uint32_t addr, uint32_t *buf, int num)
+static inline int get_dwords(EHCIState *ehci, uint32_t addr,
+ uint32_t *buf, int num)
 {
 int i;
 
 for(i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
-cpu_physical_memory_rw(addr,(uint8_t *)buf, sizeof(*buf), 0);
+pci_dma_read(&ehci->dev, addr, (uint8_t *)buf, sizeof(*buf));
 *buf = le32_to_cpu(*buf);
 }
 
@@ -1114,13 +1115,14 @@ static inline int get_dwords(uint32_t addr, uint32_t *buf, int num)
 }
 
 /* Put an array of dwords in to main memory */
-static inline int put_dwords(uint32_t addr, uint32_t *buf, int num)
+static inline int put_dwords(EHCIState *ehci, uint32_t addr,
+ uint32_t *buf, int num)
 {
 int i;
 
 for(i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
 uint32_t tmp = cpu_to_le32(*buf);
-cpu_physical_memory_rw(addr,(uint8_t *)&tmp, sizeof(tmp), 1);
+pci_dma_write(&ehci->dev, addr, (uint8_t *)&tmp, sizeof(tmp));
 }
 
 return 1;
@@ -1169,7 +1171,8 @@ static int ehci_qh_do_overlay(EHCIQueue *q)
 q->qh.bufptr[1] &= ~BUFPTR_CPROGMASK_MASK;
 q->qh.bufptr[2] &= ~BUFPTR_FRAMETAG_MASK;
 
-put_dwords(NLPTR_GET(q->qhaddr), (uint32_t *) &q->qh, sizeof(EHCIqh) >> 2);
+put_dwords(q->ehci, NLPTR_GET(q->qhaddr), (uint32_t *) &q->qh,
+   sizeof(EHCIqh) >> 2);
 
 return 0;
 }
@@ -1177,12 +1180,12 @@ static int ehci_qh_do_overlay(EHCIQueue *q)
 static int ehci_init_transfer(EHCIQueue *q)
 {
 uint32_t cpage, offset, bytes, plen;
-target_phys_addr_t page;
+dma_addr_t page;
 
 cpage  = get_field(q->qh.token, QTD_TOKEN_CPAGE);
 bytes  = get_field(q->qh.token, QTD_TOKEN_TBYTES);
 offset = q->qh.bufptr[0] & ~QTD_BUFPTR_MASK;
-qemu_sglist_init(&q->sgl, 5);
+pci_dma_sglist_init(&q->sgl, &q->ehci->dev, 5);
 
 while (bytes > 0) {
 if (cpage > 4) {
@@ -1428,7 +1431,7 @@ static int ehci_process_itd(EHCIState *ehci,
 return USB_RET_PROCERR;
 }
 
-qemu_sglist_init(&ehci->isgl, 2);
+pci_dma_sglist_init(&ehci->isgl, &ehci->dev, 2);
 if (off + len > 4096) {
 /* transfer crosses page border */
 uint32_t len2 = off + len - 4096;
@@ -1532,7 +1535,8 @@ static int ehci_state_waitlisthead(EHCIState *ehci,  int async)
 
 /*  Find the head of the list (4.9.1.1) */
 for(i = 0; i < MAX_QH; i++) {
-get_dwords(NLPTR_GET(entry), (uint32_t *) &qh, sizeof(EHCIqh) >> 2);
+get_dwords(ehci, NLPTR_GET(entry), (uint32_t *) &qh,
+   sizeof(EHCIqh) >> 2);
 ehci_trace_qh(NULL, NLPTR_GET(entry), &qh);
 
 if (qh.epchar & QH_EPCHAR_H) {
@@ -1629,7 +1633,8 @@ static EHCIQueue *ehci_state_fetchqh(EHCIState *ehci, int async)
 goto out;
 }
 
-get_dwords(NLPTR_GET(q->qhaddr), (uint32_t *) &q->qh, sizeof(EHCIqh) >> 2);
+get_dwords(ehci, NLPTR_GET(q->qhaddr),
+   (uint32_t *) &q->qh, sizeof(EHCIqh) >> 2);
 ehci_trace_qh(q, NLPTR_GET(q->qhaddr), &q->qh);
 
 if (q->async == EHCI_ASYNC_INFLIGHT) {
@@ -1698,7 +1703,7 @@ static int ehci_state_fetchitd(EHCIState *ehci, int async)
 assert(!async);
 entry = ehci_get_fetch_addr(ehci, async);
 
-get_dwords(NLPTR_GET(entry),(uint32_t *) &itd,
+get_dwords(ehci, NLPTR_GET(entry), (uint32_t *) &itd,
 sizeof(EHCIitd) >> 2);
 ehci_trace_itd(ehci, entry, &itd);
 
@@ -1706,8 +1711,8 @@ static int ehci_state_fetchitd(EHCIState *ehci, int async)
 return -1;
 }
 
-put_dwords(NLPTR_GET(entry), (uint32_t *) &itd,
-sizeof(EHCIitd) >> 2);
+put_dwords(ehci, NLPTR_GET(entry), (uint32_t *) &itd,
+   sizeof(EHCIitd) >> 2);
 ehci_set_fetch_addr(ehci, async, itd.next);
 ehci_set_state(ehci, async, EST_FETCHENTRY);
 
@@ -1722,7 +1727,7 @@ static int ehci_state_fetchsitd(EHCIState *ehci, int async)
 assert(!async);
 entry = ehci_get_fetch_addr(ehci, async);
 
-get_dwords(NLPTR_GET(entry), (uint32_t *)&sitd,
+get_dwords(ehci, NLPTR_GET(entry), (uint32_t *)&sitd,
 sizeof(EHCIsitd) >> 2);
 ehci_trace_sitd(ehci, entry, &sitd);
 
@@ -1784,7 +1789,8 @@ static int ehci_state_fetchqtd(EHCIQueue

[Qemu-devel] [PATCH 01/12] Add stub functions for PCI device models to do PCI DMA

2011-10-14 Thread David Gibson

From: Alexey Kardashevskiy a...@ozlabs.ru

This patch adds functions to pci.[ch] to perform PCI DMA operations.
At present, these are just stubs which directly perform cpu physical
memory accesses.  Stubs are included which are analogous to
cpu_physical_memory_{read,write}(), the stX_phys() and ldX_phys()
functions and cpu_physical_memory_{map,unmap}().

In addition, a wrapper around qemu_sglist_init() is provided, which
also takes a PCIDevice *.  It's assumed that _init() is the only
sglist function which will need wrapping, the idea being that once we
have IOMMU support whatever IOMMU context handle the wrapper derives
from the PCI device will be stored within the sglist structure for
later use.

Using these stubs, however, distinguishes PCI device DMA transactions from
other accesses to physical memory, which will allow PCI IOMMU support to
be added in one place, rather than updating every PCI driver at that time.

That is, it allows us to update individual PCI drivers to support an IOMMU
without having yet determined the details of how the IOMMU emulation will
operate.  This will let us remove the most bitrot-sensitive part of an
IOMMU patch in advance.
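
As an illustration of the ldX/stX style accessors described above, here
is a small self-contained sketch in C of how a macro can stamp out
fixed-width, fixed-endianness load and store helpers on top of a plain
read/write pair. It is not the macro from this patch: guest_ram,
dma_read(), dma_write() and DEFINE_LE_LDST are hypothetical, only the
little-endian variants are generated, no bounds checking is done, and the
byte order is handled by explicit shifts rather than by the
cpu_to_le/le_to_cpu helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t dma_addr_t;

/* Flat array standing in for guest memory (hypothetical, unchecked). */
static uint8_t guest_ram[64];

static void dma_read(dma_addr_t addr, void *buf, size_t len)
{
    memcpy(buf, guest_ram + addr, len);
}

static void dma_write(dma_addr_t addr, const void *buf, size_t len)
{
    memcpy(guest_ram + addr, buf, len);
}

/*
 * Generate ld<bits>_le_dma() and st<bits>_le_dma(). The value is
 * assembled from, or split into, bytes with the least significant byte
 * first, so the result does not depend on the host byte order.
 */
#define DEFINE_LE_LDST(bits)                                        \
static uint##bits##_t ld##bits##_le_dma(dma_addr_t addr)            \
{                                                                   \
    uint8_t raw[bits / 8];                                          \
    uint##bits##_t val = 0;                                         \
    dma_read(addr, raw, sizeof(raw));                               \
    for (size_t i = 0; i < sizeof(raw); i++) {                      \
        val |= (uint##bits##_t)((uint64_t)raw[i] << (8 * i));       \
    }                                                               \
    return val;                                                     \
}                                                                   \
static void st##bits##_le_dma(dma_addr_t addr, uint##bits##_t val)  \
{                                                                   \
    uint8_t raw[bits / 8];                                          \
    for (size_t i = 0; i < sizeof(raw); i++) {                      \
        raw[i] = (uint8_t)((uint64_t)val >> (8 * i));               \
    }                                                               \
    dma_write(addr, raw, sizeof(raw));                              \
}

DEFINE_LE_LDST(16)
DEFINE_LE_LDST(32)
DEFINE_LE_LDST(64)
```

A device model would then write, for example, st32_le_dma(addr, value)
instead of open-coding the byte swap next to each memory access.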

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 dma.h|   13 +
 hw/pci.c |   53 +
 hw/pci.h |   49 +
 3 files changed, 111 insertions(+), 4 deletions(-)

diff --git a/dma.h b/dma.h
index 2bdc236..ce6aab2 100644
--- a/dma.h
+++ b/dma.h
@@ -18,9 +18,15 @@
 typedef struct ScatterGatherEntry ScatterGatherEntry;
 
 #if defined(TARGET_PHYS_ADDR_BITS)
+typedef target_phys_addr_t dma_addr_t;
+typedef enum {
+DMA_DIRECTION_TO_DEVICE = 0,
+DMA_DIRECTION_FROM_DEVICE = 1,
+} DMADirection;
+
 struct ScatterGatherEntry {
-target_phys_addr_t base;
-target_phys_addr_t len;
+dma_addr_t base;
+dma_addr_t len;
 };
 
 struct QEMUSGList {
@@ -31,8 +37,7 @@ struct QEMUSGList {
 };
 
 void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
-void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
- target_phys_addr_t len);
+void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
 #endif
 
diff --git a/hw/pci.c b/hw/pci.c
index 749e8d8..4dbdd81 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -2129,3 +2129,56 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev)
 {
 return dev->bus->address_space_io;
 }
+
+#define PCI_DMA_DEFINE_LDST(_lname, _sname, _end, _bits)  \
+uint##_bits##_t ld##_lname##_##_end##_pci_dma(PCIDevice *dev, \
+  dma_addr_t addr)\
+{ \
+uint##_bits##_t val;  \
+pci_dma_read(dev, addr, &val, sizeof(val));  \
+return _end##_bits##_to_cpu(val); \
+} \
+void st##_sname##_##_end##_pci_dma(PCIDevice *dev,\
+  dma_addr_t addr, uint##_bits##_t val)   \
+{ \
+val = cpu_to_##_end##_bits(val);  \
+pci_dma_write(dev, addr, &val, sizeof(val)); \
+}
+
+uint8_t ldub_pci_dma(PCIDevice *dev, dma_addr_t addr)
+{
+uint8_t val;
+
+pci_dma_read(dev, addr, &val, sizeof(val));
+return val;
+}
+
+void stb_pci_dma(PCIDevice *dev, dma_addr_t addr, uint8_t val)
+{
+pci_dma_write(dev, addr, &val, sizeof(val));
+}
+
+PCI_DMA_DEFINE_LDST(uw, w, le, 16);
+PCI_DMA_DEFINE_LDST(uw, w, be, 16);
+PCI_DMA_DEFINE_LDST(l, l, le, 32);
+PCI_DMA_DEFINE_LDST(l, l, be, 32);
+PCI_DMA_DEFINE_LDST(q, q, le, 64);
+PCI_DMA_DEFINE_LDST(q, q, be, 64);
+
+void *pci_dma_map(PCIDevice *dev, dma_addr_t addr, dma_addr_t *plen,
+  DMADirection dir)
+{
+target_phys_addr_t len = *plen;
+void *buf;
+
+buf = cpu_physical_memory_map(addr, &len, dir == DMA_DIRECTION_FROM_DEVICE);
+*plen = len;
+return buf;
+}
+
+void pci_dma_unmap(PCIDevice *dev, void *buffer, dma_addr_t len,
+   DMADirection dir, dma_addr_t access_len)
+{
+cpu_physical_memory_unmap(buffer, len, dir == DMA_DIRECTION_FROM_DEVICE,
+  access_len);
+}
diff --git a/hw/pci.h b/hw/pci.h
index 86a81c8..0f89d57 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -6,6 +6,7 @@
 
 #include "qdev.h"
 #include "memory.h"
+#include "dma.h"
 
 /* PCI includes legacy ISA access.  */
 #include "isa.h"
@@ -487,4 +488,52 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
 return pci_is_express(d) ? PCIE_CONFIG_SPACE_SIZE : PCI_CONFIG_SPACE_SIZE;
 }
 
+/* DMA access

Re: [Qemu-devel] [PATCH v3 03/15] add qemu_send_full and qemu_recv_full

2011-10-14 Thread Kevin Wolf

Am 05.10.2011 09:17, schrieb Paolo Bonzini:
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  osdep.c   |   67 
 +
  qemu-common.h |4 +++
  2 files changed, 71 insertions(+), 0 deletions(-)
 
 diff --git a/osdep.c b/osdep.c
 index 56e6963..718a25d 100644
 --- a/osdep.c
 +++ b/osdep.c
 @@ -166,3 +166,70 @@ int qemu_accept(int s, struct sockaddr *addr, socklen_t 
 *addrlen)
  
  return ret;
  }
 +
 +/*
 + * A variant of send(2) which handles partial write.
 + *
 + * Return the number of bytes transferred, which is only
 + * smaller than `count' if there is an error.
 + *
 + * This function won't work with non-blocking fd's.
 + * Any of the possibilities with non-bloking fd's is bad:
 + *   - return a short write (then name is wrong)
 + *   - busy wait adding (errno == EAGAIN) to the loop
 + */
 +ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags)
 +{
 +ssize_t ret = 0;
 +ssize_t total = 0;
 +
 +while (count) {
 +ret = send(fd, buf, count, flags);
 +if (ret < 0) {
 +if (errno == EINTR) {
 +continue;
 +}
 +break;
 +}
 +
 +count -= ret;
 +buf += ret;
 +total += ret;
 +}
 +
 +return total;
 +}
 +
 +/*
 + * A variant of recv(2) which handles partial write.
 + *
 + * Return the number of bytes transferred, which is only
 + * smaller than `count' if there is an error.
 + *
 + * This function won't work with non-blocking fd's.
 + * Any of the possibilities with non-bloking fd's is bad:
 + *   - return a short write (then name is wrong)
 + *   - busy wait adding (errno == EAGAIN) to the loop
 + */
 +ssize_t qemu_recv_full(int fd, const void *buf, size_t count, int flags)
 +{
 +ssize_t ret = 0;
 +ssize_t total = 0;
 +
 +while (count) {
 +ret = recv(fd, buf, count, flags);

osdep.c: In function 'qemu_recv_full':
osdep.c:220: error: passing argument 2 of 'recv' discards qualifiers
from pointer target type
/usr/include/bits/socket2.h:35: note: expected 'void *' but argument is
of type 'const void *'

Kevin
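
For reference, a minimal sketch of a receive loop that avoids the
qualifier problem reported above, written as standalone C. It is not the
code that was eventually committed. recv_full() is a hypothetical name,
the buffer parameter is a plain void * because recv(2) writes into it,
and treating a return value of 0 as end of stream is an assumption of
this sketch. Like the quoted function, it is meant for blocking
descriptors only.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Receive until `count` bytes have arrived, the peer closes the
 * connection, or an error other than EINTR occurs. Returns the number
 * of bytes actually stored in `buf`.
 */
static ssize_t recv_full(int fd, void *buf, size_t count, int flags)
{
    char *p = buf; /* char * so that pointer arithmetic is standard C */
    ssize_t total = 0;

    while (count > 0) {
        ssize_t ret = recv(fd, p, count, flags);

        if (ret < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal, retry */
            }
            break;
        }
        if (ret == 0) {
            break; /* orderly shutdown by the peer */
        }
        p += ret;
        count -= (size_t)ret;
        total += ret;
    }
    return total;
}
```

The send-side counterpart can keep its const void * parameter, since
send(2) only reads from the buffer.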

[Qemu-devel] [PATCH 1/1 V5 tuning] kernel/kvm: introduce KVM_SET_LINT1 and fix improper nmi emulation

2011-10-14 Thread Lai Jiangshan

Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
button event happens. This doesn't properly emulate real hardware on
which NMI button event triggers LINT1. Because of this, NMI is sent to
the processor even when LINT1 is masked in LVT. For example, this
causes the problem that kdump initiated by NMI sometimes doesn't work
on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

With this patch, we introduce KVM_SET_LINT1,
and we can use KVM_SET_LINT1 to correctly emulate the NMI button
without changing the old KVM_NMI behavior.

Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
---
 arch/x86/kvm/irq.h   |1 +
 arch/x86/kvm/lapic.c |7 +++
 arch/x86/kvm/x86.c   |8 
 include/linux/kvm.h  |3 +++
 4 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 53e2d08..0c96315 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -95,6 +95,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s);
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
+void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu);
 void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
 void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 57dcbd4..87fe36a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1039,6 +1039,13 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
kvm_apic_local_deliver(apic, APIC_LVT0);
 }
 
+void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu)
+{
+   struct kvm_lapic *apic = vcpu->arch.apic;
+
+   kvm_apic_local_deliver(apic, APIC_LVT1);
+}
+
 static struct kvm_timer_ops lapic_timer_ops = {
.is_periodic = lapic_is_periodic,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 84a28ea..fccd094 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2077,6 +2077,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
case KVM_CAP_GET_TSC_KHZ:
+   case KVM_CAP_SET_LINT1:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -3264,6 +3265,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
goto out;
}
+   case KVM_SET_LINT1: {
+   r = -EINVAL;
+   if (!irqchip_in_kernel(vcpu->kvm))
+   goto out;
+   r = 0;
+   kvm_apic_lint1_deliver(vcpu);
+   }
default:
r = -EINVAL;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index aace6b8..11a2c42 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -554,6 +554,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_SMT 64
 #define KVM_CAP_PPC_RMA 65
 #define KVM_CAP_S390_GMAP 71
+#define KVM_CAP_SET_LINT1 72
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -759,6 +760,8 @@ struct kvm_clock_data {
 #define KVM_CREATE_SPAPR_TCE _IOW(KVMIO,  0xa8, struct kvm_create_spapr_tce)
 /* Available with KVM_CAP_RMA */
 #define KVM_ALLOCATE_RMA _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
+/* Available with KVM_CAP_SET_LINT1 for x86 */
+#define KVM_SET_LINT1 _IO(KVMIO,   0xaa)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)

[Qemu-devel] [PATCH 1/2 V5 tuning] qemu-kvm: Synchronize kernel headers

2011-10-14 Thread Lai Jiangshan



Synchronize with the newest kernel headers, which define
KVM_CAP_SET_LINT1 and KVM_SET_LINT1, using
./scripts/update-linux-headers.sh

Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
---
 linux-headers/asm-powerpc/kvm.h  |   19 +--
 linux-headers/asm-x86/kvm_para.h |   14 ++
 linux-headers/linux/kvm.h|   24 +---
 linux-headers/linux/kvm_para.h   |1 +
 4 files changed, 49 insertions(+), 9 deletions(-)
diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h
index 777d307..a4f6c85 100644
--- a/linux-headers/asm-powerpc/kvm.h
+++ b/linux-headers/asm-powerpc/kvm.h
@@ -22,6 +22,10 @@
 
 #include <linux/types.h>
 
+/* Select powerpc specific features in linux/kvm.h */
+#define __KVM_HAVE_SPAPR_TCE
+#define __KVM_HAVE_PPC_SMT
+
 struct kvm_regs {
__u64 pc;
__u64 cr;
@@ -166,8 +170,8 @@ struct kvm_sregs {
} ppc64;
struct {
__u32 sr[16];
-   __u64 ibat[8];
-   __u64 dbat[8];
+   __u64 ibat[8]; 
+   __u64 dbat[8]; 
} ppc32;
} s;
struct {
@@ -272,4 +276,15 @@ struct kvm_guest_debug_arch {
 #define KVM_INTERRUPT_UNSET -2U
 #define KVM_INTERRUPT_SET_LEVEL -3U
 
+/* for KVM_CAP_SPAPR_TCE */
+struct kvm_create_spapr_tce {
+   __u64 liobn;
+   __u32 window_size;
+};
+
+/* for KVM_ALLOCATE_RMA */
+struct kvm_allocate_rma {
+   __u64 rma_size;
+};
+
 #endif /* __LINUX_KVM_POWERPC_H */
diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index 834d71e..f2ac46a 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -21,6 +21,7 @@
  */
 #define KVM_FEATURE_CLOCKSOURCE2 3
 #define KVM_FEATURE_ASYNC_PF   4
+#define KVM_FEATURE_STEAL_TIME 5
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -30,10 +31,23 @@
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
 
+#define KVM_MSR_ENABLED 1
 /* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
 #define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
+#define MSR_KVM_STEAL_TIME  0x4b564d03
+
+struct kvm_steal_time {
+   __u64 steal;
+   __u32 version;
+   __u32 flags;
+   __u32 pad[12];
+};
+
+#define KVM_STEAL_ALIGNMENT_BITS 5
+#define KVM_STEAL_VALID_BITS ((-1ULL << (KVM_STEAL_ALIGNMENT_BITS + 1)))
+#define KVM_STEAL_RESERVED_MASK (((1 << KVM_STEAL_ALIGNMENT_BITS) - 1 ) << 1)
 
 #define KVM_MAX_MMU_OP_BATCH   32
 
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index fc63b73..0fd246f 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -161,6 +161,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_NMI  16
 #define KVM_EXIT_INTERNAL_ERROR   17
 #define KVM_EXIT_OSI  18
+#define KVM_EXIT_PAPR_HCALL  19
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -264,6 +265,11 @@ struct kvm_run {
struct {
__u64 gprs[32];
} osi;
+   struct {
+   __u64 nr;
+   __u64 ret;
+   __u64 args[9];
+   } papr_hcall;
/* Fix the size of the union. */
char padding[256];
};
@@ -544,6 +550,11 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_TSC_CONTROL 60
 #define KVM_CAP_GET_TSC_KHZ 61
 #define KVM_CAP_PPC_BOOKE_SREGS 62
+#define KVM_CAP_SPAPR_TCE 63
+#define KVM_CAP_PPC_SMT 64
+#define KVM_CAP_PPC_RMA 65
+#define KVM_CAP_S390_GMAP 71
+#define KVM_CAP_SET_LINT1 72
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -746,6 +757,11 @@ struct kvm_clock_data {
 /* Available with KVM_CAP_XCRS */
 #define KVM_GET_XCRS _IOR(KVMIO,  0xa6, struct kvm_xcrs)
 #define KVM_SET_XCRS _IOW(KVMIO,  0xa7, struct kvm_xcrs)
+#define KVM_CREATE_SPAPR_TCE _IOW(KVMIO,  0xa8, struct kvm_create_spapr_tce)
+/* Available with KVM_CAP_RMA */
+#define KVM_ALLOCATE_RMA _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
+/* Available with KVM_CAP_SET_LINT1 for x86 */
+#define KVM_SET_LINT1 _IO(KVMIO,   0xaa)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
 
@@ -773,20 +789,14 @@ struct kvm_assigned_pci_dev {
 
 struct kvm_assigned_irq {
__u32 assigned_dev_id;
-   __u32 host_irq;
+   __u32 host_irq; /* ignored (legacy field) */
__u32 guest_irq;
__u32 flags;
union {
-   struct {
-   __u32 addr_lo;
-   __u32 addr_hi;
-   __u32 data;
-   } guest_msi;
__u32 reserved[12];
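
As a worked example of the steal-time masks that the header sync above adds
to asm-x86/kvm_para.h: with KVM_STEAL_ALIGNMENT_BITS = 5, the valid-bits mask
clears the low six bits and the reserved mask covers bits 1-5. This is only an
illustrative sketch. The two helper functions are hypothetical and not part of
the headers, and the sketch assumes that bit 0 of the MSR value is the enable
bit (KVM_MSR_ENABLED) with the structure address in the remaining valid bits.

```c
#include <stdint.h>

#define KVM_MSR_ENABLED          1
#define KVM_STEAL_ALIGNMENT_BITS 5
#define KVM_STEAL_VALID_BITS     ((-1ULL << (KVM_STEAL_ALIGNMENT_BITS + 1)))
#define KVM_STEAL_RESERVED_MASK  (((1 << KVM_STEAL_ALIGNMENT_BITS) - 1) << 1)

/* Hypothetical helper: the 64-byte aligned address part of an MSR value. */
static uint64_t steal_msr_addr(uint64_t msr)
{
    return msr & KVM_STEAL_VALID_BITS;
}

/* Hypothetical helper: nonzero if any reserved bit (bits 1..5) is set. */
static int steal_msr_has_reserved(uint64_t msr)
{
    return (msr & KVM_STEAL_RESERVED_MASK) != 0;
}
```

With these definitions the valid-bits mask is 0xffffffffffffffc0 and the
reserved mask is 0x3e, which matches the 64-byte size of struct kvm_steal_time.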

Re: [Qemu-devel] balloon driver on winxp guest start failed

2011-10-14 Thread hkran


On 10/14/2011 04:55 AM, Vadim Rozenfeld wrote:

On Thu, 2011-10-13 at 15:47 +0100, Stefan Hajnoczi wrote:

On Thu, Oct 13, 2011 at 5:00 AM, hkranhk...@vnet.linux.ibm.com  wrote:

On 10/12/2011 07:09 PM, hkran wrote:

I used the balloon driver for Windows from virtio-win-0.1-15.iso (from
http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/).

Following the install guide, I installed the balloon driver like this:

devcon.exe install d:\wxp\x86\balloon.inf
PCI\VEN_1AF4&DEV_1002&SUBSYS_00051AF4&REV_00

I then rebooted the guest OS, but the status of the installed driver is
always wrong: Device Manager shows that the driver failed to start
(code 10).

Seems like a resource allocation problem

I typed the following cmds in the monitor command line:

(qemu) device_add virtio-balloon
(qemu) info balloon
balloon: actual=2048
(qemu) balloon 1024
(qemu) info balloon
balloon: actual=2048
(qemu) info balloon
balloon: actual=2048

I also tried passing the -balloon virtio parameter when starting qemu.
The result is worse: the winxp guest froze at the boot screen.

Am I using the balloon driver correctly?




I looked further into the boot failure case. With trace output enabled,
I see the following when the boot fails:
Balloon driver, built on Oct 13 2011 10:46:59
^M-- DriverEntry
^Mfile z:\source\kvm-guest-drivers-windows\balloon\sys\driver.c line 151
^M--  BalloonDeviceAdd
^M-- BalloonDeviceAdd
^M--  BalloonEvtDevicePrepareHardware
^M-  Port   Resource [C0A0-C0C0]
^M-- BalloonEvtDevicePrepareHardware
^M--  BalloonEvtDeviceD0Entry
^M--  BalloonInit
^M--  VIRTIO_BALLOON_F_STATS_VQ
^M-- BalloonInit
^M--  BalloonInterruptEnable
^M-- BalloonInterruptEnable

At this point the system is blocked.

I compared this with the log from the normal case, in which I hot-plug
the balloon device, and found that the system blocks before
BalloonInterruptDpc is called.


What about ISR? Can you try changing balloon size and check if balloon
ISR was invoked or not?

Does this mean that we enable the balloon device's interrupt too early
while the system is booting?

I suggest CCing Vadim on virtio Windows driver questions.  Not sure if
he sees every qemu-devel email.

Stefan



To make the issue clearer, I ran more tests. I now use the package
virtio-win-prewhql-0.1-15-sources.zip from
http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/src/
The problem of the incorrect balloon driver status is no longer
reproducible, but the boot failure is still there.
Further tests suggest that the failure occurs only when virtio-serial and
balloon are both attached when qemu boots:


(qemu) [huikai@oc0100708617 ~]$ 
/home/huikai/qemu15/bin/qemu-system-x86_64 --enable-kvm -m 2048   -drive 
file=/home/huikai/xp_shanghai.img,if=virtio -net user  -net 
nic,model=viga qxl -localtime -chardev stdio,id=muxstdio -mon 
chardev=muxstdio -usb -usbdevice tablet -device virtio-serial,id=vs0 
-chardev socket,path=/tmp/foo,server,nowait,id=foo -device 
virtserialport,bus=vs0.0,chardev=foo,name=helloworld -serial 
file:/tmp/xp_1014_6.log -balloon virtio,id=ball1


the trace:

Virtio-Serial driver started...built on Oct 14 2011 15:58:02
^M-- VIOSerialEvtDeviceAdd
^M-- VIOSerialInitInterruptHandling
^MBalloon driver, built on Oct 13 2011 17:34:56
^M-- DriverEntry
^M-- BalloonDeviceAdd
^M-- BalloonDeviceAdd
^M-- BalloonEvtDevicePrepareHardware
^M- Port   Resource [C0A0-C0C0]
^M-- BalloonEvtDevicePrepareHardware
^M-- BalloonEvtDeviceD0Entry
^M-- BalloonInit
^M-- VIRTIO_BALLOON_F_STATS_VQ
^M-- BalloonInit
^M-- BalloonInterruptEnable
^M-- BalloonInterruptEnable
^M-- VIOSerialEvtDevicePrepareHardware
^MIO Port Info  [C080-C0A0]
^MWe have multiport host
^MVirtIOConsoleConfig-max_nr_ports 31
^M-- VIOSerialEvtDeviceD0Entry
^M-- VIOSerialInitAllQueues
^M-- VIOSerialFillQueue
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89B13A50
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89B13638
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89C07E08
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89C07C50
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89C07A98
... ...
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89BD14B8
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89B826E8
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89BE4450
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89BE2398
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89C53468
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89C37E18
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89C374C0
^M-- VIOSerialFreeBuffer  buf = 89C374C0, buf-va_buf = 89983000
^MVIOSerialRenewAllPorts
^M-- VIOSerialFillQueue
^M-- VIOSerialAllocateBuffer
^M-- VIOSerialAddInBuf  buf = 89C374C0
^M-- VIOSerialFreeBuffer  buf = 89C374C0, buf-va_buf = 89983000
^MSetting VIRTIO_CONFIG_S_DRIVER_OK flag
^M

There is no further output after this.

Re: [Qemu-devel] [PATCH 0/5] block: remove unused emulation and synchronous functions

2011-10-14 Thread Kevin Wolf

Am 13.10.2011 22:09, schrieb Stefan Hajnoczi:
 Now that the block layer processes requests in coroutine context, some of the
 emulation wrappers and duplicate code paths can be dropped.  Paraphrasing a
 wise man, Arnold Schwarzenegger, I will go to the block layer and I will
 clean house :).
 
 The key thing behind this series is that the block layer processes requests
 in coroutine context and will try to use .bdrv_co_readv()/.bdrv_co_writev()
 when possible.  Should the BlockDriver not implement those interfaces, an
 emulation function will be used to provide them using aio.  If the BlockDriver
 does not implement aio interfaces, then an emulation function will be used to
 provide them using synchronous I/O.
 
 This means:
 
 1. A BlockDriver that implements coroutine interfaces does not need to
implement aio or synchronous interfaces.
 
 2. A BlockDriver that implements aio interfaces does not need to implement
synchronous interfaces.
 
 3. Coroutine interfaces are preferred and do not require any emulation
functions.
 
 This patch series propagates these rules across existing BlockDrivers and
 removes unused emulation functions from block.c.
 
 Stefan Hajnoczi (5):
   block: drop emulation functions that use coroutines
   raw-posix: remove bdrv_read()/bdrv_write()
   block: use coroutine interface for raw format
   block: drop .bdrv_read()/.bdrv_write() emulation
   block: drop bdrv_has_async_rw()

Thanks, applied all to the block branch.

  block.c   |  142 ++-
  block/raw-posix.c |  277 -
  block/raw.c   |   32 ++-
  3 files changed, 18 insertions(+), 433 deletions(-)

Awesome. We need more series like this! :-)

Kevin
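
The fallback order described in the quoted cover letter (coroutine interface
first, then aio emulation, then synchronous emulation) can be sketched roughly
as follows. This is a toy model and not the code in block.c: ToyBlockDriver,
ToyReadPath and toy_pick_read_path() are made-up names for illustration, and
the real BlockDriver has many more callbacks.

```c
#include <stddef.h>

/* Toy stand-in for a BlockDriver: one slot per interface flavour. */
typedef int (*toy_io_fn)(void);

typedef struct ToyBlockDriver {
    toy_io_fn co_readv;   /* coroutine interface */
    toy_io_fn aio_readv;  /* aio interface */
    toy_io_fn read;       /* synchronous interface */
} ToyBlockDriver;

typedef enum {
    TOY_PATH_NONE,            /* driver implements no read interface */
    TOY_PATH_COROUTINE,       /* native coroutine interface, no emulation */
    TOY_PATH_AIO_EMULATION,   /* coroutine interface emulated on top of aio */
    TOY_PATH_SYNC_EMULATION   /* emulated on top of synchronous I/O */
} ToyReadPath;

/* Placeholder callback so that a slot can be marked as implemented. */
static int toy_dummy_io(void)
{
    return 0;
}

/* Pick the most preferred interface that the driver implements. */
static ToyReadPath toy_pick_read_path(const ToyBlockDriver *drv)
{
    if (drv->co_readv) {
        return TOY_PATH_COROUTINE;
    }
    if (drv->aio_readv) {
        return TOY_PATH_AIO_EMULATION;
    }
    if (drv->read) {
        return TOY_PATH_SYNC_EMULATION;
    }
    return TOY_PATH_NONE;
}
```

A driver that fills in only the aio slot would end up on the aio emulation
path, which is why such a driver does not need synchronous callbacks as well.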

[Qemu-devel] [PATCH 09/12] intel-hda: Use PCI DMA stub functions

2011-10-14 Thread David Gibson

This updates the intel-hda device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 hw/intel-hda.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/intel-hda.c b/hw/intel-hda.c
index 4272204..98790e8 100644
--- a/hw/intel-hda.c
+++ b/hw/intel-hda.c
@@ -24,6 +24,7 @@
 #include "audiodev.h"
 #include "intel-hda.h"
 #include "intel-hda-defs.h"
+#include "dma.h"
 
 /* - */
 /* hda bus   */
@@ -328,7 +329,7 @@ static void intel_hda_corb_run(IntelHDAState *d)
 
 rp = (d->corb_rp + 1) & 0xff;
 addr = intel_hda_addr(d->corb_lbase, d->corb_ubase);
-verb = ldl_le_phys(addr + 4*rp);
+verb = ldl_le_pci_dma(&d->pci, addr + 4*rp);
 d->corb_rp = rp;
 
 dprint(d, 2, "%s: [rp 0x%x] verb 0x%08x\n", __FUNCTION__, rp, verb);
@@ -360,8 +361,8 @@ static void intel_hda_response(HDACodecDevice *dev, bool solicited, uint32_t res
 ex = (solicited ? 0 : (1 << 4)) | dev->cad;
 wp = (d->rirb_wp + 1) & 0xff;
 addr = intel_hda_addr(d->rirb_lbase, d->rirb_ubase);
-stl_le_phys(addr + 8*wp, response);
-stl_le_phys(addr + 8*wp + 4, ex);
+stl_le_pci_dma(&d->pci, addr + 8*wp, response);
+stl_le_pci_dma(&d->pci, addr + 8*wp + 4, ex);
 d->rirb_wp = wp;
 
 dprint(d, 2, "%s: [wp 0x%x] response 0x%x, extra 0x%x\n",
@@ -425,8 +426,7 @@ static bool intel_hda_xfer(HDACodecDevice *dev, uint32_t stnr, bool output,
 dprint(d, 3, "dma: entry %d, pos %d/%d, copy %d\n",
    st->be, st->bp, st->bpl[st->be].len, copy);
 
-cpu_physical_memory_rw(st->bpl[st->be].addr + st->bp,
-   buf, copy, !output);
+pci_dma_rw(&d->pci, st->bpl[st->be].addr + st->bp, buf, copy, !output);
 st->lpib += copy;
 st->bp += copy;
 buf += copy;
@@ -448,7 +448,7 @@ static bool intel_hda_xfer(HDACodecDevice *dev, uint32_t stnr, bool output,
 }
 if (d->dp_lbase & 0x01) {
 addr = intel_hda_addr(d->dp_lbase & ~0x01, d->dp_ubase);
-stl_le_phys(addr + 8*s, st->lpib);
+stl_le_pci_dma(&d->pci, addr + 8*s, st->lpib);
 }
 dprint(d, 3, "dma: --\n");
 
@@ -470,7 +470,7 @@ static void intel_hda_parse_bdl(IntelHDAState *d, IntelHDAStream *st)
 g_free(st->bpl);
 st->bpl = g_malloc(sizeof(bpl) * st->bentries);
 for (i = 0; i < st->bentries; i++, addr += 16) {
-cpu_physical_memory_read(addr, buf, 16);
+pci_dma_read(&d->pci, addr, buf, 16);
 st->bpl[i].addr  = le64_to_cpu(*(uint64_t *)buf);
 st->bpl[i].len   = le32_to_cpu(*(uint32_t *)(buf + 8));
 st->bpl[i].flags = le32_to_cpu(*(uint32_t *)(buf + 12));
-- 
1.7.6.3
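
For readers unfamiliar with the helpers this patch switches to, the idea
behind a little-endian load/store built on top of a per-device DMA read/write
can be sketched as below. This is a self-contained toy, not the QEMU
implementation: ToyPCIDevice, toy_dma_read(), toy_dma_write(),
toy_ldl_le_dma() and toy_stl_le_dma() are invented names, and the "guest
memory" is just a small array inside the toy device.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy device with a small private "guest memory"; stands in for PCIDevice. */
typedef struct ToyPCIDevice {
    uint8_t ram[64];
} ToyPCIDevice;

typedef uint64_t toy_dma_addr_t;

/* Copy len bytes out of the toy guest memory, as seen by this device. */
static void toy_dma_read(ToyPCIDevice *dev, toy_dma_addr_t addr,
                         void *buf, size_t len)
{
    assert(addr <= sizeof(dev->ram) && len <= sizeof(dev->ram) - addr);
    memcpy(buf, dev->ram + addr, len);
}

/* Copy len bytes into the toy guest memory, as seen by this device. */
static void toy_dma_write(ToyPCIDevice *dev, toy_dma_addr_t addr,
                          const void *buf, size_t len)
{
    assert(addr <= sizeof(dev->ram) && len <= sizeof(dev->ram) - addr);
    memcpy(dev->ram + addr, buf, len);
}

/* Load a 32-bit little-endian value, independent of host byte order. */
static uint32_t toy_ldl_le_dma(ToyPCIDevice *dev, toy_dma_addr_t addr)
{
    uint8_t b[4];

    toy_dma_read(dev, addr, b, sizeof(b));
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Store a 32-bit value in little-endian byte order. */
static void toy_stl_le_dma(ToyPCIDevice *dev, toy_dma_addr_t addr,
                           uint32_t val)
{
    uint8_t b[4] = {
        (uint8_t)val, (uint8_t)(val >> 8),
        (uint8_t)(val >> 16), (uint8_t)(val >> 24)
    };

    toy_dma_write(dev, addr, b, sizeof(b));
}
```

Passing the device to every access is what would later allow the access to be
routed through a per-device translation, instead of going straight to
physical memory.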

[Qemu-devel] [PATCH 03/12] eepro100: Use PCI DMA stub functions

2011-10-14 Thread David Gibson

From: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro

This updates the eepro100 device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
Signed-off-by: David Gibson d...@au1.ibm.com
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 hw/eepro100.c |  121 +++--
 1 files changed, 49 insertions(+), 72 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 4e3c52f..7d59e71 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -46,6 +46,7 @@
 #include "net.h"
 #include "eeprom93xx.h"
 #include "sysemu.h"
+#include "dma.h"
 
 /* QEMU sends frames smaller than 60 bytes to ethernet nics.
  * Such frames are rejected by real nics and their emulations.
@@ -315,38 +316,6 @@ static const uint16_t eepro100_mdi_mask[] = {
 0x, 0x, 0x, 0x, 0x, 0x, 0x, 0x,
 };
 
-/* Read a 16 bit little endian value from physical memory. */
-static uint16_t e100_ldw_le_phys(target_phys_addr_t addr)
-{
-/* Load 16 bit (little endian) word from emulated hardware. */
-uint16_t val;
-cpu_physical_memory_read(addr, &val, sizeof(val));
-return le16_to_cpu(val);
-}
-
-/* Read a 32 bit little endian value from physical memory. */
-static uint32_t e100_ldl_le_phys(target_phys_addr_t addr)
-{
-/* Load 32 bit (little endian) word from emulated hardware. */
-uint32_t val;
-cpu_physical_memory_read(addr, &val, sizeof(val));
-return le32_to_cpu(val);
-}
-
-/* Write a 16 bit little endian value to physical memory. */
-static void e100_stw_le_phys(target_phys_addr_t addr, uint16_t val)
-{
-val = cpu_to_le16(val);
-cpu_physical_memory_write(addr, &val, sizeof(val));
-}
-
-/* Write a 32 bit little endian value to physical memory. */
-static void e100_stl_le_phys(target_phys_addr_t addr, uint32_t val)
-{
-val = cpu_to_le32(val);
-cpu_physical_memory_write(addr, &val, sizeof(val));
-}
-
 #define POLYNOMIAL 0x04c11db6
 
 /* From FreeBSD */
@@ -744,21 +713,26 @@ static void dump_statistics(EEPRO100State * s)
  * values which really matter.
  * Number of data should check configuration!!!
  */
-cpu_physical_memory_write(s->statsaddr, &s->statistics, s->stats_size);
-e100_stl_le_phys(s->statsaddr + 0, s->statistics.tx_good_frames);
-e100_stl_le_phys(s->statsaddr + 36, s->statistics.rx_good_frames);
-e100_stl_le_phys(s->statsaddr + 48, s->statistics.rx_resource_errors);
-e100_stl_le_phys(s->statsaddr + 60, s->statistics.rx_short_frame_errors);
+pci_dma_write(&s->dev, s->statsaddr,
+  (uint8_t *) &s->statistics, s->stats_size);
+stl_le_pci_dma(&s->dev, s->statsaddr + 0,
+   s->statistics.tx_good_frames);
+stl_le_pci_dma(&s->dev, s->statsaddr + 36,
+   s->statistics.rx_good_frames);
+stl_le_pci_dma(&s->dev, s->statsaddr + 48,
+   s->statistics.rx_resource_errors);
+stl_le_pci_dma(&s->dev, s->statsaddr + 60,
+   s->statistics.rx_short_frame_errors);
 #if 0
-e100_stw_le_phys(s->statsaddr + 76, s->statistics.xmt_tco_frames);
-e100_stw_le_phys(s->statsaddr + 78, s->statistics.rcv_tco_frames);
+stw_le_pci_dma(&s->dev, s->statsaddr + 76, s->statistics.xmt_tco_frames);
+stw_le_pci_dma(&s->dev, s->statsaddr + 78, s->statistics.rcv_tco_frames);
 missing("CU dump statistical counters");
 #endif
 }
 
 static void read_cb(EEPRO100State *s)
 {
-cpu_physical_memory_read(s->cb_address, &s->tx, sizeof(s->tx));
+pci_dma_read(&s->dev, s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
 s->tx.status = le16_to_cpu(s->tx.status);
 s->tx.command = le16_to_cpu(s->tx.command);
 s->tx.link = le32_to_cpu(s->tx.link);
@@ -788,18 +762,17 @@ static void tx_command(EEPRO100State *s)
 }
 assert(tcb_bytes <= sizeof(buf));
 while (size < tcb_bytes) {
-uint32_t tx_buffer_address = e100_ldl_le_phys(tbd_address);
-uint16_t tx_buffer_size = e100_ldw_le_phys(tbd_address + 4);
+uint32_t tx_buffer_address = ldl_le_pci_dma(&s->dev, tbd_address);
+uint16_t tx_buffer_size = lduw_le_pci_dma(&s->dev, tbd_address + 4);
 #if 0
-uint16_t tx_buffer_el = e100_ldw_le_phys(tbd_address + 6);
+uint16_t tx_buffer_el = lduw_le_pci_dma(&s->dev, tbd_address + 6);
 #endif
 tbd_address += 8;
 TRACE(RXTX, logout
 ("TBD (simplified mode): buffer address 0x%08x, size 0x%04x\n",
  tx_buffer_address, tx_buffer_size));
 tx_buffer_size = MIN(tx_buffer_size, sizeof(buf) - size);
-cpu_physical_memory_read(tx_buffer_address, &buf[size],
- tx_buffer_size);
+pci_dma_read(&s->dev, tx_buffer_address, &buf[size], tx_buffer_size);
 size += tx_buffer_size;
 }
 if (tbd_array == 0x) {
@@ -810,16 +783,19 @@ static void tx_command(EEPRO100State *s)
 if (s->has_extended_tcb_support && !(s->configuration[6]

[Qemu-devel] [PATCH 04/12] ac97: Use PCI DMA stub functions

2011-10-14 Thread David Gibson

From: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro

This updates the ac97 device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 hw/ac97.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/ac97.c b/hw/ac97.c
index bc69d4e..6800af4 100644
--- a/hw/ac97.c
+++ b/hw/ac97.c
@@ -18,6 +18,7 @@
 #include "audiodev.h"
 #include "audio/audio.h"
 #include "pci.h"
+#include "dma.h"
 
 enum {
 AC97_Reset = 0x00,
@@ -224,7 +225,7 @@ static void fetch_bd (AC97LinkState *s, AC97BusMasterRegs *r)
 {
 uint8_t b[8];
 
-cpu_physical_memory_read (r->bdbar + r->civ * 8, b, 8);
+pci_dma_read (&s->dev, r->bdbar + r->civ * 8, b, 8);
 r->bd_valid = 1;
 r->bd.addr = le32_to_cpu (*(uint32_t *) &b[0]) & ~3;
 r->bd.ctl_len = le32_to_cpu (*(uint32_t *) &b[4]);
@@ -973,7 +974,7 @@ static int write_audio (AC97LinkState *s, AC97BusMasterRegs *r,
 while (temp) {
 int copied;
 to_copy = audio_MIN (temp, sizeof (tmpbuf));
-cpu_physical_memory_read (addr, tmpbuf, to_copy);
+pci_dma_read (&s->dev, addr, tmpbuf, to_copy);
 copied = AUD_write (s->voice_po, tmpbuf, to_copy);
 dolog ("write_audio max=%x to_copy=%x copied=%x\n",
    max, to_copy, copied);
@@ -1054,7 +1055,7 @@ static int read_audio (AC97LinkState *s, AC97BusMasterRegs *r,
 *stop = 1;
 break;
 }
-cpu_physical_memory_write (addr, tmpbuf, acquired);
+pci_dma_write (&s->dev, addr, tmpbuf, acquired);
 temp -= acquired;
 addr += acquired;
 nread += acquired;
-- 
1.7.6.3

[Qemu-devel] [PATCH 12/12] usb-uhci: Use PCI DMA stub functions

2011-10-14 Thread David Gibson

This updates the usb-uhci device emulation to use the explicit PCI DMA
wrapper to initialize its scatter/gather structure.  This means this
driver should not need further changes when the sglist interface is
extended to support IOMMUs.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 hw/usb-uhci.c |   22 +++---
 1 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/usb-uhci.c b/hw/usb-uhci.c
index 17992cf..6f8ea21 100644
--- a/hw/usb-uhci.c
+++ b/hw/usb-uhci.c
@@ -178,7 +178,7 @@ static UHCIAsync *uhci_async_alloc(UHCIState *s)
 async->done  = 0;
 async->isoc  = 0;
 usb_packet_init(&async->packet);
-qemu_sglist_init(&async->sgl, 1);
+pci_dma_sglist_init(&async->sgl, &s->dev, 1);
 
 return async;
 }
@@ -876,7 +876,7 @@ static void uhci_async_complete(USBPort *port, USBPacket *packet)
 uint32_t link = async->td;
 uint32_t int_mask = 0, val;
 
-cpu_physical_memory_read(link & ~0xf, (uint8_t *) &td, sizeof(td));
+pci_dma_read(&s->dev, link & ~0xf, (uint8_t *) &td, sizeof(td));
 le32_to_cpus(&td.link);
 le32_to_cpus(&td.ctrl);
 le32_to_cpus(&td.token);
@@ -888,8 +888,8 @@ static void uhci_async_complete(USBPort *port, USBPacket *packet)
 
 /* update the status bits of the TD */
 val = cpu_to_le32(td.ctrl);
-cpu_physical_memory_write((link & ~0xf) + 4,
-  (const uint8_t *)&val, sizeof(val));
+pci_dma_write(&s->dev, (link & ~0xf) + 4,
+  (const uint8_t *)&val, sizeof(val));
 uhci_async_free(s, async);
 } else {
 async->done = 1;
@@ -952,7 +952,7 @@ static void uhci_process_frame(UHCIState *s)
 
 DPRINTF("uhci: processing frame %d addr 0x%x\n" , s->frnum, frame_addr);
 
-cpu_physical_memory_read(frame_addr, (uint8_t *)&link, 4);
+pci_dma_read(&s->dev, frame_addr, (uint8_t *)&link, 4);
 le32_to_cpus(&link);
 
 int_mask = 0;
@@ -976,7 +976,7 @@ static void uhci_process_frame(UHCIState *s)
 break;
 }
 
-cpu_physical_memory_read(link & ~0xf, (uint8_t *) &qh, sizeof(qh));
+pci_dma_read(&s->dev, link & ~0xf, (uint8_t *) &qh, sizeof(qh));
 le32_to_cpus(&qh.link);
 le32_to_cpus(&qh.el_link);
 
@@ -996,7 +996,7 @@ static void uhci_process_frame(UHCIState *s)
 }
 
 /* TD */
-cpu_physical_memory_read(link & ~0xf, (uint8_t *) &td, sizeof(td));
+pci_dma_read(&s->dev, link & ~0xf, (uint8_t *) &td, sizeof(td));
 le32_to_cpus(&td.link);
 le32_to_cpus(&td.ctrl);
 le32_to_cpus(&td.token);
@@ -1010,8 +1010,8 @@ static void uhci_process_frame(UHCIState *s)
 if (old_td_ctrl != td.ctrl) {
 /* update the status bits of the TD */
 val = cpu_to_le32(td.ctrl);
-cpu_physical_memory_write((link & ~0xf) + 4,
-  (const uint8_t *)&val, sizeof(val));
+pci_dma_write(&s->dev, (link & ~0xf) + 4,
+  (const uint8_t *)&val, sizeof(val));
 }
 
 if (ret < 0) {
@@ -1039,8 +1039,8 @@ static void uhci_process_frame(UHCIState *s)
 /* update QH element link */
 qh.el_link = link;
 val = cpu_to_le32(qh.el_link);
-cpu_physical_memory_write((curr_qh & ~0xf) + 4,
-  (const uint8_t *)&val, sizeof(val));
+pci_dma_write(&s->dev, (curr_qh & ~0xf) + 4,
+  (const uint8_t *)&val, sizeof(val));
 
 if (!depth_first(link)) {
/* done with this QH */
-- 
1.7.6.3

[Qemu-devel] [PATCH 02/12] rtl8139: Use PCI DMA stub functions

2011-10-14 Thread David Gibson

From: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro

This updates the rtl8139 device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 hw/rtl8139.c |   98 +
 1 files changed, 50 insertions(+), 48 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index 3753950..d2c4980 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -53,6 +53,7 @@
 
 #include "hw.h"
 #include "pci.h"
+#include "dma.h"
 #include "qemu-timer.h"
 #include "net.h"
 #include "loader.h"
@@ -427,9 +428,6 @@ typedef struct RTL8139TallyCounters
 /* Clears all tally counters */
 static void RTL8139TallyCounters_clear(RTL8139TallyCounters* counters);
 
-/* Writes tally counters to specified physical memory address */
-static void RTL8139TallyCounters_physical_memory_write(target_phys_addr_t tc_addr, RTL8139TallyCounters* counters);
-
 typedef struct RTL8139State {
 PCIDevice dev;
 uint8_t phys[8]; /* mac address */
@@ -512,6 +510,9 @@ typedef struct RTL8139State {
 int rtl8139_mmio_io_addr_dummy;
 } RTL8139State;
 
+/* Writes tally counters to memory via DMA */
+static void RTL8139TallyCounters_dma_write(RTL8139State *s, dma_addr_t tc_addr);
+
 static void rtl8139_set_next_tctr_time(RTL8139State *s, int64_t current_time);
 
 static void prom9346_decode_command(EEprom9346 *eeprom, uint8_t command)
@@ -773,15 +774,15 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 
 if (size > wrapped)
 {
-cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-   buf, size-wrapped );
+pci_dma_write(&s->dev, s->RxBuf + s->RxBufAddr,
+  buf, size-wrapped);
 }
 
 /* reset buffer pointer */
 s->RxBufAddr = 0;
 
-cpu_physical_memory_write( s->RxBuf + s->RxBufAddr,
-   buf + (size-wrapped), wrapped );
+pci_dma_write(&s->dev, s->RxBuf + s->RxBufAddr,
+  buf + (size-wrapped), wrapped);
 
 s->RxBufAddr = wrapped;
 
@@ -790,13 +791,13 @@ static void rtl8139_write_buffer(RTL8139State *s, const void *buf, int size)
 }
 
 /* non-wrapping path or overwrapping enabled */
-cpu_physical_memory_write( s->RxBuf + s->RxBufAddr, buf, size );
+pci_dma_write(&s->dev, s->RxBuf + s->RxBufAddr, buf, size);
 
 s->RxBufAddr += size;
 }
 
 #define MIN_BUF_SIZE 60
-static inline target_phys_addr_t rtl8139_addr64(uint32_t low, uint32_t high)
+static inline dma_addr_t rtl8139_addr64(uint32_t low, uint32_t high)
 {
 #if TARGET_PHYS_ADDR_BITS > 32
 return low | ((target_phys_addr_t)high << 32);
@@ -979,7 +980,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 /* w3 high 32bit of Rx buffer ptr */
 
 int descriptor = s->currCPlusRxDesc;
-target_phys_addr_t cplus_rx_ring_desc;
+dma_addr_t cplus_rx_ring_desc;
 
 cplus_rx_ring_desc = rtl8139_addr64(s->RxRingAddrLO, s->RxRingAddrHI);
 cplus_rx_ring_desc += 16 * descriptor;
@@ -990,13 +991,13 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
 uint32_t val, rxdw0,rxdw1,rxbufLO,rxbufHI;
 
-cpu_physical_memory_read(cplus_rx_ring_desc,(uint8_t *)&val, 4);
+pci_dma_read(&s->dev, cplus_rx_ring_desc, (uint8_t *)&val, 4);
 rxdw0 = le32_to_cpu(val);
-cpu_physical_memory_read(cplus_rx_ring_desc+4,  (uint8_t *)&val, 4);
+pci_dma_read(&s->dev, cplus_rx_ring_desc+4, (uint8_t *)&val, 4);
 rxdw1 = le32_to_cpu(val);
-cpu_physical_memory_read(cplus_rx_ring_desc+8,  (uint8_t *)&val, 4);
+pci_dma_read(&s->dev, cplus_rx_ring_desc+8, (uint8_t *)&val, 4);
 rxbufLO = le32_to_cpu(val);
-cpu_physical_memory_read(cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
+pci_dma_read(&s->dev, cplus_rx_ring_desc+12, (uint8_t *)&val, 4);
 rxbufHI = le32_to_cpu(val);
 
 DPRINTF("+++ C+ mode RX descriptor %d %08x %08x %08x %08x\n",
@@ -1060,16 +1061,16 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 return size_;
 }
 
-target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
+dma_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);
 
 /* receive/copy to target memory */
 if (dot1q_buf) {
-cpu_physical_memory_write(rx_addr, buf, 2 * ETHER_ADDR_LEN);
-cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN,
-buf + 2 * ETHER_ADDR_LEN + VLAN_HLEN,
-size - 2 * ETHER_ADDR_LEN);
+pci_dma_write(&s->dev, rx_addr, buf, 2 * ETHER_ADDR_LEN);
+pci_dma_write(&s->dev, rx_addr + 2 * ETHER_ADDR_LEN,
+
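
The address arithmetic used in the receive path of this patch (joining the
low and high 32-bit ring address registers, then stepping by 16 bytes per
descriptor) can be illustrated with the small sketch below. The names
toy_dma_addr_t, toy_addr64() and toy_rx_desc_addr() are made up for this
example, and the address type is simply assumed to be 64 bits wide.

```c
#include <stdint.h>

/* Stand-in for dma_addr_t; assumed to be 64 bits wide here. */
typedef uint64_t toy_dma_addr_t;

/* Combine the low and high 32-bit halves of an address register pair. */
static toy_dma_addr_t toy_addr64(uint32_t low, uint32_t high)
{
    return (toy_dma_addr_t)low | ((toy_dma_addr_t)high << 32);
}

/* Address of the given 16-byte receive descriptor within the ring. */
static toy_dma_addr_t toy_rx_desc_addr(uint32_t ring_lo, uint32_t ring_hi,
                                       unsigned int descriptor)
{
    return toy_addr64(ring_lo, ring_hi) + 16u * descriptor;
}
```

The cast before the shift matters: shifting a 32-bit value left by 32 would
be undefined, so the high half is widened first.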

Re: [Qemu-devel] [PATCH v3 03/15] add qemu_send_full and qemu_recv_full

2011-10-14 Thread Paolo Bonzini


On 10/14/2011 11:52 AM, Kevin Wolf wrote:

Am 05.10.2011 09:17, schrieb Paolo Bonzini:

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
  osdep.c   |   67 +
  qemu-common.h |4 +++
  2 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/osdep.c b/osdep.c
index 56e6963..718a25d 100644
--- a/osdep.c
+++ b/osdep.c
@@ -166,3 +166,70 @@ int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen)

  return ret;
  }
+
+/*
+ * A variant of send(2) which handles partial write.
+ *
+ * Return the number of bytes transferred, which is only
+ * smaller than `count' if there is an error.
+ *
+ * This function won't work with non-blocking fd's.
+ * Any of the possibilities with non-bloking fd's is bad:
+ *   - return a short write (then name is wrong)
+ *   - busy wait adding (errno == EAGAIN) to the loop
+ */
+ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags)
+{
+ssize_t ret = 0;
+ssize_t total = 0;
+
+while (count) {
+ret = send(fd, buf, count, flags);
+if (ret < 0) {
+if (errno == EINTR) {
+continue;
+}
+break;
+}
+
+count -= ret;
+buf += ret;
+total += ret;
+}
+
+return total;
+}
+
+/*
+ * A variant of recv(2) which handles partial write.
+ *
+ * Return the number of bytes transferred, which is only
+ * smaller than `count' if there is an error.
+ *
+ * This function won't work with non-blocking fd's.
+ * Any of the possibilities with non-bloking fd's is bad:
+ *   - return a short write (then name is wrong)
+ *   - busy wait adding (errno == EAGAIN) to the loop
+ */
+ssize_t qemu_recv_full(int fd, const void *buf, size_t count, int flags)
+{
+ssize_t ret = 0;
+ssize_t total = 0;
+
+while (count) {
+ret = recv(fd, buf, count, flags);


osdep.c: In function 'qemu_recv_full':
osdep.c:220: error: passing argument 2 of 'recv' discards qualifiers
from pointer target type
/usr/include/bits/socket2.h:35: note: expected 'void *' but argument is
of type 'const void *'


It's fixed in 4/15's osdep.c.  I attach the diff, and pushed the fixed 
version to github nbd-trim.


Also, all branches there are now rebased on top of block branch.

Paolo

diff --git a/osdep.c b/osdep.c
index 718a25d..70bad27 100644
--- a/osdep.c
+++ b/osdep.c
@@ -211,13 +211,13 @@ ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags)
  *   - return a short write (then name is wrong)
  *   - busy wait adding (errno == EAGAIN) to the loop
  */
-ssize_t qemu_recv_full(int fd, const void *buf, size_t count, int flags)
+ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags)
 {
 ssize_t ret = 0;
 ssize_t total = 0;
 
 while (count) {
-ret = recv(fd, buf, count, flags);
+ret = qemu_recv(fd, buf, count, flags);
 if (ret <= 0) {
 if (ret < 0 && errno == EINTR) {
 continue;
diff --git a/roms/SLOF b/roms/SLOF
index d1d6b53..b94bde0 16
--- a/roms/SLOF
+++ b/roms/SLOF
@@ -1 +1 @@
-Subproject commit d1d6b53b713a2b7c2c25685268fa932d28a4b4c0
+Subproject commit b94bde008b0d49ec4bfe933e110d0952d032ac28
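
For reference, the "keep going until everything is transferred" pattern that
qemu_send_full() and qemu_recv_full() implement can be written as a
standalone sketch like the one below. It is not the osdep.c code: send_all()
and recv_all() are illustrative names, the pointer is advanced through a char
pointer instead of void pointer arithmetic, and, as in the patch, blocking
file descriptors are assumed.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send exactly count bytes unless an error occurs; retry on EINTR. */
static ssize_t send_all(int fd, const void *buf, size_t count, int flags)
{
    const char *p = buf;
    size_t total = 0;

    while (count > 0) {
        ssize_t ret = send(fd, p, count, flags);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted, try again */
            }
            break;              /* real error: report the short count */
        }
        p += ret;
        count -= (size_t)ret;
        total += (size_t)ret;
    }
    return (ssize_t)total;
}

/* Receive exactly count bytes unless an error or end of stream occurs. */
static ssize_t recv_all(int fd, void *buf, size_t count, int flags)
{
    char *p = buf;
    size_t total = 0;

    while (count > 0) {
        ssize_t ret = recv(fd, p, count, flags);
        if (ret <= 0) {
            if (ret < 0 && errno == EINTR) {
                continue;       /* interrupted, try again */
            }
            break;              /* error, or the peer closed the socket */
        }
        p += ret;
        count -= (size_t)ret;
        total += (size_t)ret;
    }
    return (ssize_t)total;
}
```

A return value smaller than count therefore signals an error (or, for the
receive side, end of stream), and the caller can inspect errno in the error
case.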

[Qemu-devel] [PATCH 08/12] pcnet-pci: Use PCI DMA stub functions

2011-10-14 Thread David Gibson

From: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro

This updates the pcnet-pci device emulation to use the explicit PCI DMA
functions, instead of directly calling physical memory access functions.

Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro
Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 hw/pcnet-pci.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c
index fb2a00c..41a6e07 100644
--- a/hw/pcnet-pci.c
+++ b/hw/pcnet-pci.c
@@ -31,6 +31,7 @@
 #include "net.h"
 #include "loader.h"
 #include "qemu-timer.h"
+#include "dma.h"
 
 #include "pcnet.h"
 
@@ -230,13 +231,13 @@ static const MemoryRegionOps pcnet_mmio_ops = {
 static void pci_physical_memory_write(void *dma_opaque, target_phys_addr_t addr,
   uint8_t *buf, int len, int do_bswap)
 {
-cpu_physical_memory_write(addr, buf, len);
+pci_dma_write(dma_opaque, addr, buf, len);
 }
 
 static void pci_physical_memory_read(void *dma_opaque, target_phys_addr_t addr,
  uint8_t *buf, int len, int do_bswap)
 {
-cpu_physical_memory_read(addr, buf, len);
+pci_dma_read(dma_opaque, addr, buf, len);
 }
 
 static void pci_pcnet_cleanup(VLANClientState *nc)
@@ -302,6 +303,7 @@ static int pci_pcnet_init(PCIDevice *pci_dev)
 s->irq = pci_dev->irq[0];
 s->phys_mem_read = pci_physical_memory_read;
 s->phys_mem_write = pci_physical_memory_write;
+s->dma_opaque = pci_dev;
 
 if (!pci_dev->qdev.hotplugged) {
 static int loaded = 0;
-- 
1.7.6.3

[Qemu-devel] [PATCH 10/12] PCI IDE: Use PCI DMA stub functions

2011-10-14 Thread David Gibson

This updates the PCI IDE device emulation to use the explicit PCI DMA
wrapper to initialize its scatter/gather structure.  This means this
driver should not need further changes when the sglist interface is
extended to support IOMMUs.

Signed-off-by: David Gibson da...@gibson.dropbear.id.au
---
 hw/ide/pci.c |   15 ---
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index f133c42..0747e09 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -62,7 +62,8 @@ static int bmdma_prepare_buf(IDEDMA *dma, int is_write)
 } prd;
 int l, len;
 
-qemu_sglist_init(&s->sg, s->nsector / (BMDMA_PAGE_SIZE / 512) + 1);
+pci_dma_sglist_init(&s->sg, &bm->pci_dev->dev,
+s->nsector / (BMDMA_PAGE_SIZE / 512) + 1);
 s->io_buffer_size = 0;
 for(;;) {
 if (bm->cur_prd_len == 0) {
@@ -70,7 +71,7 @@ static int bmdma_prepare_buf(IDEDMA *dma, int is_write)
 if (bm->cur_prd_last ||
 (bm->cur_addr - bm->addr) >= BMDMA_PAGE_SIZE)
 return s->io_buffer_size != 0;
-cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+pci_dma_read(&bm->pci_dev->dev, bm->cur_addr, (uint8_t *)&prd, 8);
 bm->cur_addr += 8;
 prd.addr = le32_to_cpu(prd.addr);
 prd.size = le32_to_cpu(prd.size);
@@ -112,7 +113,7 @@ static int bmdma_rw_buf(IDEDMA *dma, int is_write)
 if (bm->cur_prd_last ||
 (bm->cur_addr - bm->addr) >= BMDMA_PAGE_SIZE)
 return 0;
-cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+pci_dma_read(&bm->pci_dev->dev, bm->cur_addr, (uint8_t *)&prd, 8);
 bm->cur_addr += 8;
 prd.addr = le32_to_cpu(prd.addr);
 prd.size = le32_to_cpu(prd.size);
@@ -127,11 +128,11 @@ static int bmdma_rw_buf(IDEDMA *dma, int is_write)
 l = bm->cur_prd_len;
 if (l > 0) {
 if (is_write) {
-cpu_physical_memory_write(bm->cur_prd_addr,
-  s->io_buffer + s->io_buffer_index, l);
+pci_dma_write(&bm->pci_dev->dev, bm->cur_prd_addr,
+  s->io_buffer + s->io_buffer_index, l);
 } else {
-cpu_physical_memory_read(bm->cur_prd_addr,
-  s->io_buffer + s->io_buffer_index, l);
+pci_dma_read(&bm->pci_dev->dev, bm->cur_prd_addr,
+ s->io_buffer + s->io_buffer_index, l);
 }
 bm->cur_prd_addr += l;
 bm->cur_prd_len -= l;
-- 
1.7.6.3

[Qemu-devel] [Bug 874038] Re: ARM thumb2 does not propogate carry flag properly

2011-10-14 Thread Alexey Starikovskiy

** Patch added: fix for the problem
   
https://bugs.launchpad.net/bugs/874038/+attachment/2542719/+files/fix_carry_in_thumb2

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/874038

Title:
  ARM thumb2 does not propogate carry flag properly

Status in QEMU:
  New

Bug description:
  information on carry flag is lost if gen_set_CF_bit31(t1) is called
  after logic operation.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/874038/+subscriptions

[Qemu-devel] [Bug 874038] [NEW] ARM thumb2 does not propogate carry flag properly

2011-10-14 Thread Alexey Starikovskiy

Public bug reported:

information on carry flag is lost if gen_set_CF_bit31(t1) is called
after logic operation.

** Affects: qemu
 Importance: Undecided
 Status: New


** Tags: arm flags

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/874038

Title:
  ARM thumb2 does not propogate carry flag properly

Status in QEMU:
  New

Bug description:
  information on carry flag is lost if gen_set_CF_bit31(t1) is called
  after logic operation.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/874038/+subscriptions

Re: [Qemu-devel] [v7 Patch 1/5]Qemu: Enhance info block to display host cache setting

2011-10-14 Thread Supriya Kannery


On 10/12/2011 07:47 PM, Kevin Wolf wrote:

Am 11.10.2011 05:10, schrieb Supriya Kannery:

Enhance info block to display hostcache setting for each
block device.



+if (qdict_haskey(bs_dict, "open_flags")) {
+int open_flags = qdict_get_int(bs_dict, "open_flags");
+if (open_flags & BDRV_O_NOCACHE)
+monitor_printf(mon, " hostcache=0");
+else
+monitor_printf(mon, " hostcache=1");


Coding style requires braces.



ok..will add. checkpatch.pl didn't catch this!



  bs_obj = qobject_from_jsonf("{ 'device': %s, 'type': 'unknown', "
-"'removable': %i, 'locked': %i }",
+"'removable': %i, 'locked': %i, "
+"'hostcache': %i }",
  bs->device_name,
  bdrv_dev_has_removable_media(bs),
-bdrv_dev_is_medium_locked(bs));
+bdrv_dev_is_medium_locked(bs),
+!(bs->open_flags & BDRV_O_NOCACHE));
  bs_dict = qobject_to_qdict(bs_obj);
+qdict_put(bs_dict, "open_flags", qint_from_int(bs->open_flags));


No. This adds an open_flags field to the QMP structure that is
transferred to clients. This is wrong: open_flags is an internal thing
that should never be visible on an interface.

In bdrv_print_dict, access the hostcache field that you introduced, it
provides the same information.



Will replace open_flags with hostcache field.

thanks, Supriya
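
A minimal sketch of the direction agreed on above: derive the boolean once from the internal flags, and let the printing side look only at that boolean. The toy_* names and the flag value are placeholders for illustration, not the QEMU definitions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Placeholder for BDRV_O_NOCACHE; the value is an assumption. */
#define TOY_O_NOCACHE 0x0020

/* Producer side: internal open flags are reduced to one public boolean. */
static bool toy_hostcache(int open_flags)
{
    return !(open_flags & TOY_O_NOCACHE);
}

/* Consumer side (think bdrv_print_dict): sees only the public field,
 * never the raw open flags. Returns what snprintf() returns. */
static int toy_format_hostcache(char *out, size_t len, bool hostcache)
{
    return snprintf(out, len, " hostcache=%d", hostcache ? 1 : 0);
}
```

Keeping the raw flags out of the client-visible structure means their bit layout can change without affecting anything that parses the monitor output.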

[Qemu-devel] [PATCH] rtl8139: check the buffer availiability

2011-10-14 Thread Jason Wang

Reduce spurious packet drops on RX ring empty when in C+ mode by verifying
ahead of time that we have at least one buffer available.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 hw/rtl8139.c |   43 +--
 1 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index 3753950..c654d5d 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -84,6 +84,19 @@
 #define VLAN_TCI_LEN 2
 #define VLAN_HLEN (ETHER_TYPE_LEN + VLAN_TCI_LEN)
 
+/* w0 ownership flag */
+#define CP_RX_OWN (1<<31)
+/* w0 end of ring flag */
+#define CP_RX_EOR (1<<30)
+/* w0 bits 0...12 : buffer size */
+#define CP_RX_BUFFER_SIZE_MASK ((1<<13) - 1)
+/* w1 tag available flag */
+#define CP_RX_TAVA (1<<16)
+/* w1 bits 0...15 : VLAN tag */
+#define CP_RX_VLAN_TAG_MASK ((1<<16) - 1)
+/* w2 low  32bit of Rx buffer ptr */
+/* w3 high 32bit of Rx buffer ptr */
+
 #if defined (DEBUG_RTL8139)
 #  define DPRINTF(fmt, ...) \
 do { fprintf(stderr, "RTL8139: " fmt, ## __VA_ARGS__); } while (0)
@@ -805,6 +818,21 @@ static inline target_phys_addr_t rtl8139_addr64(uint32_t low, uint32_t high)
 #endif
 }
 
+/* Verify that we have at least one available rx buffer */
+static int rtl8139_cp_has_rxbuf(RTL8139State *s)
+{
+uint32_t val, rxdw0;
+target_phys_addr_t cplus_rx_ring_desc = rtl8139_addr64(s->RxRingAddrLO,
+   s->RxRingAddrHI);
+cplus_rx_ring_desc += 16 * s->currCPlusRxDesc;
+cpu_physical_memory_read(cplus_rx_ring_desc, (uint8_t *)&val, 4);
+rxdw0 = le32_to_cpu(val);
+if (rxdw0 & CP_RX_OWN)
+return 1;
+else
+return 0;
+}
+
 static int rtl8139_can_receive(VLANClientState *nc)
 {
 RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
@@ -819,7 +847,7 @@ static int rtl8139_can_receive(VLANClientState *nc)
 if (rtl8139_cp_receiver_enabled(s)) {
 /* ??? Flow control not implemented in c+ mode.
This is a hack to work around slirp deficiencies anyway.  */
-return 1;
+return rtl8139_cp_has_rxbuf(s);
 } else {
 avail = MOD2(s->RxBufferSize + s->RxBufPtr - s->RxBufAddr,
  s->RxBufferSize);
@@ -965,19 +993,6 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_
 
 /* begin C+ receiver mode */
 
-/* w0 ownership flag */
-#define CP_RX_OWN (1<<31)
-/* w0 end of ring flag */
-#define CP_RX_EOR (1<<30)
-/* w0 bits 0...12 : buffer size */
-#define CP_RX_BUFFER_SIZE_MASK ((1<<13) - 1)
-/* w1 tag available flag */
-#define CP_RX_TAVA (1<<16)
-/* w1 bits 0...15 : VLAN tag */
-#define CP_RX_VLAN_TAG_MASK ((1<<16) - 1)
-/* w2 low  32bit of Rx buffer ptr */
-/* w3 high 32bit of Rx buffer ptr */
-
 int descriptor = s->currCPlusRxDesc;
 target_phys_addr_t cplus_rx_ring_desc;
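
As a standalone illustration of the check that rtl8139_cp_has_rxbuf() performs, the sketch below reads the first 32-bit word of the current 16-byte descriptor as little-endian and tests the ownership bit. The ring here is just a byte array in host memory, and load_le32() and cp_has_rxbuf() are hypothetical helpers rather than QEMU functions.

```c
#include <stdint.h>

/* w0 ownership flag, same bit as CP_RX_OWN in the patch */
#define TOY_CP_RX_OWN (1u << 31)

/* Decode a little-endian 32-bit word, independent of host endianness. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* ring: raw C+ receive ring, 16 bytes per descriptor.
 * Returns 1 if the descriptor at index cur_desc has the ownership bit
 * set, i.e. the device may fill it, and 0 otherwise. */
static int cp_has_rxbuf(const uint8_t *ring, unsigned cur_desc)
{
    uint32_t rxdw0 = load_le32(ring + 16u * cur_desc);

    return (rxdw0 & TOY_CP_RX_OWN) != 0;
}
```

Only the current descriptor is examined, so this answers "is there at least one buffer", not "how many buffers are free".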

Re: [Qemu-devel] [PATCH 2/4] block: unify flush implementations

2011-10-14 Thread Kevin Wolf

Am 14.10.2011 10:41, schrieb Paolo Bonzini:
 Add coroutine support for flush and apply the same emulation that
 we already do for read/write.  bdrv_aio_flush is simplified to always
 go through a coroutine.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

To make the implementation more consistent with read/write operations,
wouldn't it make sense to provide a bdrv_co_flush() globally instead of
using the synchronous version as the preferred public interface?

This is the semantics that I would expect of a bdrv_co_flush() anyway,
your use of it for an AIO emulation functions confused me a bit at first.

Kevin

Re: [Qemu-devel] [v7 Patch 5/5]Qemu: New struct 'BDRVReopenState' for image files reopen

2011-10-14 Thread Stefan Hajnoczi

On Tue, Oct 11, 2011 at 08:41:59AM +0530, Supriya Kannery wrote:
 Index: qemu/block.c
 ===
 --- qemu.orig/block.c
 +++ qemu/block.c
 @@ -706,6 +706,7 @@ int bdrv_reopen(BlockDriverState *bs, in
  {
  BlockDriver *drv = bs->drv;
  int ret = 0, open_flags;
 +BDRVReopenState *rs;

BDRVReopenState *rs = NULL;

If the abort path is taken we need to make sure rs has a defined value.
Note that the abort path currently doesn't handle rs == NULL and will
segfault in raw_reopen_abort().
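
To make both halves of that comment concrete, here is a small sketch with made-up names (ToyReopenState, toy_reopen, toy_reopen_abort): the state pointer starts out as NULL so the error path always sees a defined value, and the abort helper tolerates NULL instead of dereferencing it. It is not the bdrv_reopen() code, only the pattern.

```c
#include <stdlib.h>

typedef struct ToyReopenState {
    int new_fd;
} ToyReopenState;

static int toy_abort_calls;     /* counts real abort work, for the demo */

/* Abort helper that is safe to call when nothing was prepared yet. */
static void toy_reopen_abort(ToyReopenState *rs)
{
    if (!rs) {
        return;
    }
    toy_abort_calls++;
    free(rs);
}

/* fail_early / fail_late simulate errors before and after prepare. */
static int toy_reopen(int fail_early, int fail_late)
{
    ToyReopenState *rs = NULL;  /* defined value on every path */
    int ret = 0;

    if (fail_early) {
        ret = -1;
        goto fail;              /* rs is still NULL here */
    }

    rs = calloc(1, sizeof(*rs));    /* "prepare" step */
    if (!rs) {
        ret = -1;
        goto fail;
    }

    if (fail_late) {
        ret = -1;
        goto fail;
    }

    free(rs);                   /* "commit": state no longer needed */
    return 0;

fail:
    toy_reopen_abort(rs);
    return ret;
}
```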

[Qemu-devel] website

2011-10-14 Thread ronnie sahlberg

http://wiki.qemu.org/Main_Page

Someone may want to have a look over the website, the links in the
Contribute section in the column on the left.
All pages seem to primarily discuss the quality of Pandora jewelry and
how beneficial it would be to purchase such.

Re: [Qemu-devel] [v7 Patch 4/5]Qemu: Add commandline -drive option 'hostcache'

2011-10-14 Thread Supriya Kannery


On 10/12/2011 08:00 PM, Kevin Wolf wrote:

Am 11.10.2011 05:11, schrieb Supriya Kannery:

qemu command option 'hostcache' added to -drive for block devices.
While starting a VM from qemu commandline, this option can be used
for setting host cache usage for block data access. Simultaneous
use of 'hostcache' and 'cache' options not allowed.

Signed-off-by: Supriya Kannerysupri...@linux.vnet.ibm.com


I'm not sure if introducing this alone makes sense. I think I would only
do it when we introduce more options that allow replacing all cache=...
options by other parameters.



Can we do transition to alternatives for 'cache=' in a phased manner?
Until all other params are ready, we can allow hostcache (as well
as other params as and when they are ready) in cmdline with the
condition that 'cache=x', if specified, overrides these params.
Once we have all other params ready, 'cache=' can be replaced completely.

thanks, Supriya

Re: [Qemu-devel] [v7 Patch 4/5]Qemu: Add commandline -drive option 'hostcache'

2011-10-14 Thread Kevin Wolf

Am 14.10.2011 13:19, schrieb Supriya Kannery:
 On 10/12/2011 08:00 PM, Kevin Wolf wrote:
 Am 11.10.2011 05:11, schrieb Supriya Kannery:
 qemu command option 'hostcache' added to -drive for block devices.
 While starting a VM from qemu commandline, this option can be used
 for setting host cache usage for block data access. Simultaneous
 use of 'hostcache' and 'cache' options not allowed.

 Signed-off-by: Supriya Kannerysupri...@linux.vnet.ibm.com

 I'm not sure if introducing this alone makes sense. I think I would only
 do it when we introduce more options that allow replacing all cache=...
 options by other parameters.

 
 Can we do transition to alternatives for 'cache=' in a phased manner?
 Until all other params are ready, we can allow hostcache (as well
 as other params as and when they are ready) in cmdline with the
 condition that 'cache=x', if specified, overrides these params.
 Once we have all other params ready, 'cache=' can be replaced completely.

I guess that would be good enough. There's still not much use in
specifying hostcache=... at the same time as cache=... but at least it
doesn't take away other options then.

Kevin

Re: [Qemu-devel] [PATCH] savevm: qemu_savevm_state(): Drop stop VM logic

2011-10-14 Thread Juan Quintela

Luiz Capitulino lcapitul...@redhat.com wrote:
 qemu_savevm_state() has some logic to stop the VM and to (or not to)
 resume it. But this seems to be a big noop, as qemu_savevm_state()
 is only called by do_savevm() when the VM is already stopped.

 So, let's drop qemu_savevm_state()'s stop VM logic.

 Signed-off-by: Luiz Capitulino lcapitul...@redhat.com

Reviewed-by: Juan Quintela quint...@redhat.com

Re: [Qemu-devel] [PATCH 2/4] block: unify flush implementations

2011-10-14 Thread Paolo Bonzini


On 10/14/2011 01:08 PM, Kevin Wolf wrote:

Am 14.10.2011 10:41, schrieb Paolo Bonzini:

Add coroutine support for flush and apply the same emulation that
we already do for read/write.  bdrv_aio_flush is simplified to always
go through a coroutine.

Signed-off-by: Paolo Bonzinipbonz...@redhat.com


To make the implementation more consistent with read/write operations,
wouldn't it make sense to provide a bdrv_co_flush() globally instead of
using the synchronous version as the preferred public interface?


I thought about it, but then it turned out that I would have

bdrv_flush
- create coroutine or just fast-path to bdrv_flush_co_entry
   - bdrv_flush_co_entry
  - driver

and

bdrv_co_flush
- bdrv_flush_co_entry
   - driver

In other words, the code would be exactly the same, save for an if 
(qemu_in_coroutine()).  The reason is that, unlike read/write, neither 
flush nor discard take a qiov.


In general, I think that with Stefan's cleanup, having specialized
coroutine versions has a much smaller benefit.  The code-reading
benefit of naming routines like bdrv_co_* is already lost, for example,
since bdrv_read can yield when called from coroutine context.


Let me show how this might go.  Right now you have

bdrv_read/write
- bdrv_rw_co
   - create qiov
   - create coroutine or just fast-path to bdrv_rw_co_entry
  - bdrv_rw_co_entry
 - bdrv_co_do_readv/writev
- driver

bdrv_co_readv/writev
- bdrv_co_do_readv/writev
   - driver

But starting from here, you might just as well reorganize it like this:

bdrv_read/writev
- bdrv_rw_co
   - create qiov
   - bdrv_readv/writev

bdrv_readv/writev
- create coroutine or just fast-path to bdrv_rw_co_entry
   - bdrv_rw_co_entry
  - bdrv_co_do_readv/writev
 - driver

and just drop bdrv_co_readv, since it would just be hard-coding the 
fast-path of bdrv_readv.


Since some amount of synchronous I/O would likely always be there, for 
example in qemu-img, I think this unification would make more sense than 
providing two separate entrypoints for bdrv_co_flush and bdrv_flush.


Paolo

Re: [Qemu-devel] website

2011-10-14 Thread 陳韋任

 http://wiki.qemu.org/Main_Page
 
 Someone may want to have a look over the website, the links in the
 Contribute section in the column on the left.
 All pages seem to primarily discuss the quality of Pandora jewelry and
 how beneficial it would be to purchase such.

  Jcmvbkbc and I have reverted the spam.

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667

Re: [Qemu-devel] [PATCH 2/2] hda: do not mix output and input stream states, RHBZ #740493

2011-10-14 Thread Gerd Hoffmann


  Hi,


a) My understanding of this patch is that we move from an array of 16
bools representing anything to one array where the first 16 represent
whether there is input and the second 16 represent whether there is
output for that channel.


Correct.


So, what should we do if we migrate from an old version that only has
16 bools?  My understanding is that copying directly is not going to work?


Yes.  Putting output first increases the chance that it works as sound 
playback is used much more than sound recording.


I think hda-audio doesn't need to save any running state.  intel-hda 
knows which streams are running and it can call 
intel_hda_notify_codecs() for each stream in intel_hda_post_load().  The 
only problem is that intel_hda_post_load() might get called before 
hda-audio state is loaded.


Now how to handle compatibility?  We can have a running_compat[] array 
with the current (broken) semantics, so we can write out the state which 
older qemu versions expect to see.  Also on load running_compat[] will 
be filled, and any state already set by intel_hda_post_load in 
running_real[] (or however we'll name this) will be kept intact.


cheers,
  Gerd
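
A rough sketch of how the compat idea could look, under several assumptions that are not in the mail: the struct and function names are invented, running_real[] is laid out output-first, the compat flag is written as "output or input running", and an old-format flag with no other information is treated as an output stream (following the remark that playback is the common case).

```c
#include <stdbool.h>

#define TOY_STREAMS 16

typedef struct ToyHDAAudioState {
    /* old on-the-wire semantics: one flag per stream number */
    bool running_compat[TOY_STREAMS];
    /* assumed layout: [0..15] output streams, [16..31] input streams */
    bool running_real[2 * TOY_STREAMS];
} ToyHDAAudioState;

/* Before saving: produce what an older qemu expects to see. */
static void toy_hda_pre_save(ToyHDAAudioState *s)
{
    for (int i = 0; i < TOY_STREAMS; i++) {
        s->running_compat[i] = s->running_real[i] ||
                               s->running_real[TOY_STREAMS + i];
    }
}

/* After loading: anything the controller already set in running_real[]
 * stays intact; compat flags only fill in streams nobody knows about. */
static void toy_hda_post_load(ToyHDAAudioState *s)
{
    for (int i = 0; i < TOY_STREAMS; i++) {
        bool known = s->running_real[i] ||
                     s->running_real[TOY_STREAMS + i];

        if (s->running_compat[i] && !known) {
            s->running_real[i] = true;  /* guess: output stream */
        }
    }
}
```

The ordering problem mentioned above (the controller's post-load hook possibly running before the codec state is loaded) is exactly why the load side merges rather than overwrites.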

Re: [Qemu-devel] [PATCH 2/4] block: unify flush implementations

2011-10-14 Thread Kevin Wolf

Am 14.10.2011 13:30, schrieb Paolo Bonzini:
 On 10/14/2011 01:08 PM, Kevin Wolf wrote:
 Am 14.10.2011 10:41, schrieb Paolo Bonzini:
 Add coroutine support for flush and apply the same emulation that
 we already do for read/write.  bdrv_aio_flush is simplified to always
 go through a coroutine.

 Signed-off-by: Paolo Bonzinipbonz...@redhat.com

 To make the implementation more consistent with read/write operations,
 wouldn't it make sense to provide a bdrv_co_flush() globally instead of
 using the synchronous version as the preferred public interface?
 
 I thought about it, but then it turned out that I would have
 
  bdrv_flush
  - create coroutine or just fast-path to bdrv_flush_co_entry
 - bdrv_flush_co_entry
- driver
 
 and
 
  bdrv_co_flush
  - bdrv_flush_co_entry
 - driver
 
 In other words, the code would be exactly the same, save for an if 
 (qemu_in_coroutine()).  The reason is that, unlike read/write, neither 
 flush nor discard take a qiov.

What I was thinking of looks a bit different:

bdrv_flush
- create coroutine or just fast-path to bdrv_flush_co_entry
   - bdrv_flush_co_entry
  - bdrv_co_flush

and

bdrv_co_flush
- driver

And the reason for this is that bdrv_co_flush would be a function that
does only little more than passing the function to the driver (just like
most bdrv_* functions do), with no emulation going on at all.

 In general, I think that with Stefan's cleanup having specialized 
 coroutine versions has in general a much smaller benefit.  The code 
 reading benefit of naming routines like bdrv_co_* is already lost, for 
 example, since bdrv_read can yield when called for coroutine context.

Instead of taking a void* and working on a RwCo structure that is really
meant for emulation, bdrv_co_flush would take a BlockDriverState and
improve readability this way.

The more complicated and ugly code would be left separated and only used
for emulation. I think that would make it easier to understand the
common path without being distracted by emulation code.
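
Roughly, the layering I have in mind looks like the toy below. There are no real coroutines in it: the place where the synchronous wrapper would create a coroutine (or take the fast path) is just a direct call, and all toy_* names are invented for the sketch.

```c
#include <stddef.h>

typedef struct ToyBDS ToyBDS;

typedef struct ToyDriver {
    int (*co_flush)(ToyBDS *bs);
} ToyDriver;

struct ToyBDS {
    const ToyDriver *drv;
    int flush_count;
};

/* Common path: typed argument, little more than a call into the driver. */
static int toy_co_flush(ToyBDS *bs)
{
    if (!bs->drv || !bs->drv->co_flush) {
        return 0;   /* toy choice: nothing to flush counts as success */
    }
    return bs->drv->co_flush(bs);
}

/* Emulation-only plumbing: void * argument plus a result slot. */
typedef struct ToyFlushCo {
    ToyBDS *bs;
    int ret;
} ToyFlushCo;

static void toy_flush_co_entry(void *opaque)
{
    ToyFlushCo *co = opaque;

    co->ret = toy_co_flush(co->bs);
}

/* Synchronous wrapper: in QEMU this is where a coroutine would be
 * created or the fast path taken; here the entry is called directly. */
static int toy_flush(ToyBDS *bs)
{
    ToyFlushCo co = { .bs = bs, .ret = -1 };

    toy_flush_co_entry(&co);
    return co.ret;
}

/* Example driver callback used for demonstration. */
static int toy_driver_flush(ToyBDS *bs)
{
    bs->flush_count++;
    return 0;
}
```

Callers that already run in coroutine context would use toy_co_flush() and never touch the ToyFlushCo plumbing.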

 Let me show how this might go.  Right now you have
 
  bdrv_read/write
  - bdrv_rw_co
 - create qiov
 - create coroutine or just fast-path to bdrv_rw_co_entry
 - bdrv_rw_co_entry
   - bdrv_co_do_readv/writev
  - driver
 
  bdrv_co_readv/writev
  - bdrv_co_do_readv/writev
 - driver
 
 But starting from here, you might just as well reorganize it like this:
 
  bdrv_read/writev
  - bdrv_rw_co
 - create qiov
 - bdrv_readv/writev
 
  bdrv_readv/writev
  - create coroutine or just fast-path to bdrv_rw_co_entry
 - bdrv_rw_co_entry
- bdrv_co_do_readv/writev
   - driver
 
 and just drop bdrv_co_readv, since it would just be hard-coding the 
 fast-path of bdrv_readv.

I guess it's all a matter of taste. Stefan, what do you think?

 Since some amount of synchronous I/O would likely always be there, for 
 example in qemu-img, I think this unification would make more sense than 
 providing two separate entrypoints for bdrv_co_flush and bdrv_flush.

Actually, I'm not so sure about qemu-img. I think we have thought of
scenarios where converting it to a coroutine based version with a main
loop would be helpful (can't remember the details, though).

Kevin

Re: [Qemu-devel] [PATCH 1/1 V5 tuning] kernel/kvm: introduce KVM_SET_LINT1 and fix improper nmi emulation

2011-10-14 Thread Sasha Levin

On Fri, 2011-10-14 at 17:51 +0800, Lai Jiangshan wrote:
 Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
 button event happens. This doesn't properly emulate real hardware on
 which NMI button event triggers LINT1. Because of this, NMI is sent to
 the processor even when LINT1 is masked in LVT. For example, this
 causes the problem that kdump initiated by NMI sometimes doesn't work
 on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
 
 With this patch, we introduce KVM_SET_LINT1,
 and we can use KVM_SET_LINT1 to correctly emulate the NMI button
 without changing the old KVM_NMI behavior.
 
 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
 ---

It could use a documentation update as well.

  arch/x86/kvm/irq.h   |1 +
  arch/x86/kvm/lapic.c |7 +++
  arch/x86/kvm/x86.c   |8 
  include/linux/kvm.h  |3 +++
  4 files changed, 19 insertions(+), 0 deletions(-)
 diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
 index 53e2d08..0c96315 100644
 --- a/arch/x86/kvm/irq.h
 +++ b/arch/x86/kvm/irq.h
 @@ -95,6 +95,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s);
  void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
  void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
  void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
 +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu);
  void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
  void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
  void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 57dcbd4..87fe36a 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1039,6 +1039,13 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
   kvm_apic_local_deliver(apic, APIC_LVT0);
  }
  
 +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu)
 +{
 + struct kvm_lapic *apic = vcpu->arch.apic;
 +
 + kvm_apic_local_deliver(apic, APIC_LVT1);
 +}
 +
  static struct kvm_timer_ops lapic_timer_ops = {
   .is_periodic = lapic_is_periodic,
  };
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 84a28ea..fccd094 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2077,6 +2077,7 @@ int kvm_dev_ioctl_check_extension(long ext)
   case KVM_CAP_XSAVE:
   case KVM_CAP_ASYNC_PF:
   case KVM_CAP_GET_TSC_KHZ:
 + case KVM_CAP_SET_LINT1:
   r = 1;
   break;
   case KVM_CAP_COALESCED_MMIO:
 @@ -3264,6 +3265,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
  
   goto out;
   }
 + case KVM_SET_LINT1: {
 + r = -EINVAL;
 + if (!irqchip_in_kernel(vcpu->kvm))
 + goto out;
 + r = 0;
 + kvm_apic_lint1_deliver(vcpu);

We simply ignore the return value of kvm_apic_local_deliver() and assume
it always works. Why?


-- 

Sasha.

[Qemu-devel] [PATCH] hw/9pfs: Handle Security model parsing

2011-10-14 Thread M. Mohan Kumar

Except for the local fs driver, other fs drivers (e.g. handle) don't need
a security model. Update fsdev parameter parsing accordingly.

Signed-off-by: M. Mohan Kumar mo...@in.ibm.com
---
 fsdev/qemu-fsdev.c |   26 +-
 qemu-options.hx|   12 
 vl.c   |6 ++
 3 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index ce920d6..5977bcc 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -58,8 +58,15 @@ int qemu_fsdev_add(QemuOpts *opts)
 return -1;
 }
 
-if (!sec_model) {
-fprintf(stderr, "fsdev: No security_model specified.\n");
+if (!strcmp(fsdriver, "local") && !sec_model) {
+fprintf(stderr, "security model not specified, "
+"local fs needs security model\nvalid options are:"
+"\tsecurity_model=[passthrough|mapped|none]\n");
+return -1;
+}
+
+if (strcmp(fsdriver, "local") && sec_model) {
+fprintf(stderr, "only local fs driver needs security model\n");
 return -1;
 }
 
@@ -80,6 +87,10 @@ int qemu_fsdev_add(QemuOpts *opts)
 }
 }
 
+if (strcmp(fsdriver, "local")) {
+goto done;
+}
+
 if (!strcmp(sec_model, "passthrough")) {
 fsle->fse.export_flags |= V9FS_SM_PASSTHROUGH;
 } else if (!strcmp(sec_model, "mapped")) {
@@ -87,14 +98,11 @@ int qemu_fsdev_add(QemuOpts *opts)
 } else if (!strcmp(sec_model, "none")) {
 fsle->fse.export_flags |= V9FS_SM_NONE;
 } else {
-fprintf(stderr, "Default to security_model=none. You may want"
-" enable advanced security model using "
-"security option:\n\t security_model=passthrough\n\t "
-"security_model=mapped\n");
-
-fsle->fse.export_flags |= V9FS_SM_NONE;
+fprintf(stderr, "Invalid security model %s specified, valid options are"
+"\n\t [passthrough|mapped|none]\n", sec_model);
+return -1;
 }
-
+done:
 QTAILQ_INSERT_TAIL(&fsdriver_entries, fsle, next);
 return 0;
 }
diff --git a/qemu-options.hx b/qemu-options.hx
index 518a1f1..f05be30 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -527,13 +527,13 @@ DEFHEADING()
 DEFHEADING(File system options:)
 
 DEF("fsdev", HAS_ARG, QEMU_OPTION_fsdev,
-"-fsdev fsdriver,id=id,path=path,security_model=[mapped|passthrough|none]\n"
+"-fsdev fsdriver,id=id,path=path,[security_model={mapped|passthrough|none}]\n"
 "[,writeout=immediate]\n",
 QEMU_ARCH_ALL)
 
 STEXI
 
-@item -fsdev @var{fsdriver},id=@var{id},path=@var{path},security_model=@var{security_model}[,writeout=@var{writeout}]
+@item -fsdev @var{fsdriver},id=@var{id},path=@var{path},[security_model=@var{security_model}][,writeout=@var{writeout}]
 @findex -fsdev
 Define a new file system device. Valid options are:
 @table @option
@@ -555,7 +555,9 @@ attributes like uid, gid, mode bits and link target are stored as
 file attributes. Directories exported by this security model cannot
 interact with other unix tools. none security model is same as
 passthrough except the sever won't report failures if it fails to
-set file attributes like ownership.
+set file attributes like ownership. Security model is mandatory
+only for local fsdriver. Other fsdrivers (like handle) don't take
+security model as a parameter.
 @item writeout=@var{writeout}
 This is an optional argument. The only supported value is immediate.
 This means that host page cache will be used to read and write data but
@@ -609,7 +611,9 @@ attributes like uid, gid, mode bits and link target are stored as
 file attributes. Directories exported by this security model cannot
 interact with other unix tools. none security model is same as
 passthrough except the sever won't report failures if it fails to
-set file attributes like ownership.
+set file attributes like ownership. Security model is mandatory only
+for local fsdriver. Other fsdrivers (like handle) don't take security
+model as a parameter.
 @item writeout=@var{writeout}
 This is an optional argument. The only supported value is immediate.
 This means that host page cache will be used to read and write data but
diff --git a/vl.c b/vl.c
index 3b8199f..d672268 100644
--- a/vl.c
+++ b/vl.c
@@ -2800,14 +2800,12 @@ int main(int argc, char **argv, char **envp)
 
 if (qemu_opt_get(opts, "fsdriver") == NULL ||
 qemu_opt_get(opts, "mount_tag") == NULL ||
-qemu_opt_get(opts, "path") == NULL ||
-qemu_opt_get(opts, "security_model") == NULL) {
+qemu_opt_get(opts, "path") == NULL) {
 fprintf(stderr, "Usage: -virtfs fsdriver,path=/share_path/,"
-"security_model=[mapped|passthrough|none],"
+"[security_model={mapped|passthrough|none}],"
 "mount_tag=tag.\n");
 exit(1);
 }
-
 fsdev =

Re: [Qemu-devel] [PATCH 1/1 V5 tuning] kernel/kvm: introduce KVM_SET_LINT1 and fix improper nmi emulation

2011-10-14 Thread Jan Kiszka

On 2011-10-14 13:59, Sasha Levin wrote:
 On Fri, 2011-10-14 at 17:51 +0800, Lai Jiangshan wrote:
 Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
 button event happens. This doesn't properly emulate real hardware on
 which NMI button event triggers LINT1. Because of this, NMI is sent to
 the processor even when LINT1 is masked in LVT. For example, this
 causes the problem that kdump initiated by NMI sometimes doesn't work
 on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

 With this patch, we introduce KVM_SET_LINT1,
 and we can use KVM_SET_LINT1 to correctly emulate the NMI button
 without changing the old KVM_NMI behavior.

 Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
 Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
 ---
 
 It could use a documentation update as well.
 
  arch/x86/kvm/irq.h   |1 +
  arch/x86/kvm/lapic.c |7 +++
  arch/x86/kvm/x86.c   |8 
  include/linux/kvm.h  |3 +++
  4 files changed, 19 insertions(+), 0 deletions(-)
 diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
 index 53e2d08..0c96315 100644
 --- a/arch/x86/kvm/irq.h
 +++ b/arch/x86/kvm/irq.h
 @@ -95,6 +95,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s);
  void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
  void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
  void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
 +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu);
  void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
  void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
  void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 57dcbd4..87fe36a 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1039,6 +1039,13 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
  kvm_apic_local_deliver(apic, APIC_LVT0);
  }
  
 +void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu)
 +{
 +struct kvm_lapic *apic = vcpu->arch.apic;
 +
 +kvm_apic_local_deliver(apic, APIC_LVT1);
 +}
 +
  static struct kvm_timer_ops lapic_timer_ops = {
  .is_periodic = lapic_is_periodic,
  };
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 84a28ea..fccd094 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2077,6 +2077,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  case KVM_CAP_XSAVE:
  case KVM_CAP_ASYNC_PF:
  case KVM_CAP_GET_TSC_KHZ:
 +case KVM_CAP_SET_LINT1:
  r = 1;
  break;
  case KVM_CAP_COALESCED_MMIO:
 @@ -3264,6 +3265,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
  
  goto out;
  }
 +case KVM_SET_LINT1: {
 +r = -EINVAL;
 +if (!irqchip_in_kernel(vcpu->kvm))
 +goto out;
 +r = 0;
 +kvm_apic_lint1_deliver(vcpu);
 
 We simply ignore the return value of kvm_apic_local_deliver() and assume
 it always works. why?
 

Hmm, I suddenly realized that we switched from enhancing the KVM_NMI
IOCTL to adding KVM_SET_LINT1 - what motivated this?

( Maybe we should let the kernel part settle first before iterating
through user space changes. )

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 0/6] trace: Add support for trace events grouping

2011-10-14 Thread Stefan Hajnoczi

On Thu, Oct 13, 2011 at 04:10:56PM +0800, Mark Wu wrote:
 On 10/13/2011 01:14 AM, Mark Wu wrote:
 This series add support for trace events grouping. The state of a given group
 of trace events can be queried or changed in bulk by the following monitor
 commands:
 
 * info trace-groups
View available trace event groups and their state.  State 1 means enabled,
state 0 means disabled.
 
 * trace-group NAME on|off
Enable/disable a given trace event group.
 
 A group of trace events can also be enabled in early running stage through
 adding its group name prefixed with group: to trace events list file
 which is passed to -trace events.
 
 Mark Wu (6):
trace: Make tracetool generate a group list
trace: Add HMP monitor commands for trace events group
trace: Add trace events group implementation in the backend simple
trace: Add trace events group implementation in the backend stderr
trace: Enable -trace events argument to control initial state of
  groups
trace: Update doc for trace events group
 
   docs/tracing.txt  |   29 ++--
   hmp-commands.hx   |   14 
   monitor.c |   22 
   scripts/tracetool |   94 
  +++-
   trace-events  |   88 +
   trace/control.c   |   17 +
   trace/control.h   |9 +
   trace/default.c   |   15 
   trace/simple.c|   30 +
   trace/simple.h|7 
   trace/stderr.c|   32 ++
   trace/stderr.h|7 
   12 files changed, 359 insertions(+), 5 deletions(-)
 
 Sorry, there're some coding style problems in the patches. I have
 fixed them and will send out later in order to see if there's any
 other problem coming up. :)

Hi Mark,
Please run scripts/checkpatch.pl on the patches if you haven't already.
It will point out many of the coding style issues.

I think we can get the same convenience by adding wildcard trace event
support.  For example, virtio-blk trace events could be enabled using:

  trace-event virtio_blk_* on

This doesn't work for the memory allocation functions because their
names do not share a common unique prefix, e.g. g_malloc and
qemu_memalign.  However, most related trace events do share a common
unique prefix.

Wildcards don't require special comments in trace-events so it
eliminates the problem of people forgetting to add groups when they add
trace events to QEMU.

The other advantage is that wildcards can select fine-grained sets of
trace events, like ecc_mem_writel_mer, ecc_mem_writel_mdr, etc.
ecc_mem_writel_* selects all these related trace events (they are a
subset of the hw/eccmemctl.c trace events).  This is not possible with
the groups patches since each trace-event can only be in one group;
either we have a high-level hw/eccmemctl.c group or a lower-level
ecc_mem_writel group, but not both.

I suggest adding wildcard trace event matching instead of adding groups.
The code changes for wildcards should be much smaller than for groups.
What do you think?
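
As a rough illustration of the wildcard idea, here is a minimal sketch of
a matcher in which '*' matches any run of characters. The function name
trace_event_glob_match is hypothetical and is not taken from the QEMU
tree; a real implementation would also have to walk the trace event list
and toggle each matching event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a QEMU API): returns true if 'name' matches
 * 'pattern', where '*' matches any (possibly empty) run of characters
 * and every other character must match literally.
 */
static bool trace_event_glob_match(const char *pattern, const char *name)
{
    const char *star = NULL;   /* position of the most recent '*' */
    const char *retry = NULL;  /* where in 'name' to resume after a mismatch */

    while (*name) {
        if (*pattern == '*') {
            /* Remember the star and first try matching it against nothing. */
            star = pattern++;
            retry = name;
        } else if (*pattern == *name) {
            pattern++;
            name++;
        } else if (star) {
            /* Backtrack: let the last '*' absorb one more character. */
            pattern = star + 1;
            name = ++retry;
        } else {
            return false;
        }
    }
    /* Trailing stars can match the empty remainder. */
    while (*pattern == '*') {
        pattern++;
    }
    return *pattern == '\0';
}
```

With this sketch, the pattern virtio_blk_* selects every event whose name
starts with virtio_blk_, while g_malloc and qemu_memalign still need to
be named individually because they share no prefix.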

Stefan

Re: [Qemu-devel] [PATCH] configure: Detect when glibc implements makecontext() to always fail

2011-10-14 Thread Peter Maydell

On 13 October 2011 23:23, Andreas Färber andreas.faer...@web.de wrote:
 Am 13.10.2011 16:26, schrieb Andreas Färber:
 Am 12.10.2011 18:21, schrieb Peter Maydell:
 Improve the configure test for presence of ucontext functions by
 making linker warnings fatal; this allows us to detect when we are
 linked with a glibc which implements makecontext() to always return
 ENOSYS.
 ---
 Compiling on an Ubuntu Natty ARM host will hit this.

 Works on Ubuntu Maverick ARM host as well.

 Erm... This works great, also for accept4(), on Linux, but it's not
 portable. Apple ld(1) doesn't seem to have --fatal-warnings.

I've also just discovered that it's no use on Oneiric,
where the linker warning has gone away but the syscall
still always returns ENOSYS.

I think we should just always use the gthread implementation
rather than preferring a non-portable-and-hard-to-detect
set of functions (which increases the set of different
configs we need to test with). If there's a performance problem
with that we should get it fixed in gthread :-)

-- PMM

Re: [Qemu-devel] [PATCH 2/4] block: unify flush implementations

2011-10-14 Thread Paolo Bonzini


On 10/14/2011 01:54 PM, Kevin Wolf wrote:

Am 14.10.2011 13:30, schrieb Paolo Bonzini:

On 10/14/2011 01:08 PM, Kevin Wolf wrote:

Am 14.10.2011 10:41, schrieb Paolo Bonzini:

Add coroutine support for flush and apply the same emulation that
we already do for read/write.  bdrv_aio_flush is simplified to always
go through a coroutine.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com


To make the implementation more consistent with read/write operations,
wouldn't it make sense to provide a bdrv_co_flush() globally instead of
using the synchronous version as the preferred public interface?



What I was thinking of looks a bit different:

 bdrv_flush
 -> create coroutine or just fast-path to bdrv_flush_co_entry
    -> bdrv_flush_co_entry
       -> bdrv_co_flush

and

 bdrv_co_flush
 -> driver

And the reason for this is that bdrv_co_flush would be a function that
does only little more than passing the function to the driver (just like
most bdrv_* functions do), with no emulation going on at all.


It would still host the checks on BDRV_O_NO_FLUSH and bs->drv->*_flush.
It would be the same as bdrv_flush_co_entry is now, minus the
marshalling in/out of the RwCo.



Instead of taking a void* and working on a RwCo structure that is really
meant for emulation, bdrv_co_flush would take a BlockDriverState and
improve readability this way.


I see.  Yeah, that's doable, but I'd still need two coroutines (one for 
bdrv_flush, one for bdrv_aio_flush) and the patch would be bigger overall...



The more complicated and ugly code would be left separated and only used
for emulation. I think that would make it easier to understand the
common path without being distracted by emulation code.


... and on the other hand the length of the call chain would increase. 
It easily gets confusing, it already is for me in the read/write case.


Would bdrv_co_flush be static or not?  If not, you also get an 
additional entry point of dubious additional value, i.e. more complexity.



Actually, I'm not so sure about qemu-img. I think we have thought of
scenarios where converting it to a coroutine based version with a main
loop would be helpful (can't remember the details, though).


qemu-img convert might benefit from multiple in-flight requests if one of 
the endpoints is remote or perhaps even sparse, I guess.


Paolo

Re: [Qemu-devel] [PATCH] configure: Detect when glibc implements makecontext() to always fail

2011-10-14 Thread Paolo Bonzini


On 10/14/2011 02:30 PM, Peter Maydell wrote:

I've also just discovered that it's no use on Oneiric,
where the linker warning has gone away but the syscall
still always returns ENOSYS.

I think we should just always use the gthread implementation
rather than preferring a non-portable-and-hard-to-detect
set of functions (which increases the set of different
configs we need to test with). If there's a performance problem
with that we should get it fixed in gthread:-)


A user-space longjmp will always be slower than a mutex+condvar+context 
switch.  We're talking _orders of magnitude_ slower.  At this point it's 
better to write assembly, since we already support only a dozen TCG targets.


I played with an alternative implementation using a 2-barrier instead of 
mutex+condvar, but it didn't give any speedup and was still much slower 
than gthread.


Paolo

Re: [Qemu-devel] [PATCH] configure: Detect when glibc implements makecontext() to always fail

2011-10-14 Thread Paolo Bonzini


On 10/14/2011 02:47 PM, Paolo Bonzini wrote:

A user-space longjmp will always be slower than a mutex+condvar+context
switch.


Gah, I obviously meant faster. :)

Paolo

[Qemu-devel] [Bug 874038] Re: ARM thumb2 does not propogate carry flag properly

2011-10-14 Thread Peter Maydell

The existing code looks OK to me -- there's no need to call
gen_set_CF_bit31() early because the inputs t0 and t1 to
gen_thumb2_data_op() should always be distinct TCG values, and so
gen_thumb2_data_op() will never trash t1. (There was a bug in this area
involving ORN, but that was fixed in rev 29501f1, and I can see from
your patch that you have that fix.)

Can you clarify which exact instruction, input data and output data case
this patch is intended to fix, please?

https://bugs.launchpad.net/bugs/874038

Title:
  ARM thumb2 does not propogate carry flag properly

Status in QEMU:
  New

Bug description:
  information on carry flag is lost if gen_set_CF_bit31(t1) is called
  after logic operation.


Re: [Qemu-devel] [PATCH 0/2] spice migration interface (RHBZ 737921)

2011-10-14 Thread Gerd Hoffmann


  Hi,


Yonit Halperin (2):
   spice: turn client_migrate_info to async
   spice: support the new migration interface (spice 0.8.3)


Added to spice patch queue.

thanks,
  Gerd

Re: [Qemu-devel] [PATCH] qxl: create slots on post_load in any state (fix RHBZ 740547)

2011-10-14 Thread Gerd Hoffmann


On 09/22/11 14:33, Alon Levy wrote:

If we migrate when the device is not in a native state the guest
still believes the slots are created, and will cause operations
that reference the slots, causing a panic: virtual address out of range
on the first of them. Easy to see by migrating in vga mode (with
a driver loaded, for instance windows cmd window in full screen mode)
and then exiting vga mode back to native mode will cause said panic.

Fixed by doing the slot recreation unconditionally at post_load


Added to spice patch queue.

thanks,
  Gerd

Re: [Qemu-devel] [PATCH] configure: Detect when glibc implements makecontext() to always fail

2011-10-14 Thread Peter Maydell

On 14 October 2011 13:55, Paolo Bonzini pbonz...@redhat.com wrote:
 On 10/14/2011 02:47 PM, Paolo Bonzini wrote:

 A user-space longjmp will always be slower than a mutex+condvar+context
 switch.

 Gah, I obviously meant faster. :)

Ah, I hadn't actually looked at the coroutine-gthread.c code, and
had assumed it was implementing coroutines via a gthread coroutine
abstraction rather than via gthread real actual threads.

-- PMM

Re: [Qemu-devel] [PATCH 2/4] block: unify flush implementations

2011-10-14 Thread Stefan Hajnoczi

On Fri, Oct 14, 2011 at 01:54:42PM +0200, Kevin Wolf wrote:
 Am 14.10.2011 13:30, schrieb Paolo Bonzini:
  On 10/14/2011 01:08 PM, Kevin Wolf wrote:
  Am 14.10.2011 10:41, schrieb Paolo Bonzini:
  Let me show how this might go.  Right now you have
  
   bdrv_read/write
   -> bdrv_rw_co
      -> create qiov
      -> create coroutine or just fast-path to bdrv_rw_co_entry
         -> bdrv_rw_co_entry
            -> bdrv_co_do_readv/writev
               -> driver
  
   bdrv_co_readv/writev
   -> bdrv_co_do_readv/writev
      -> driver
  
  But starting from here, you might just as well reorganize it like this:
  
   bdrv_read/writev
   -> bdrv_rw_co
      -> create qiov
      -> bdrv_readv/writev
  
   bdrv_readv/writev
   -> create coroutine or just fast-path to bdrv_rw_co_entry
      -> bdrv_rw_co_entry
         -> bdrv_co_do_readv/writev
            -> driver
  
  and just drop bdrv_co_readv, since it would just be hard-coding the 
  fast-path of bdrv_readv.
 
 I guess it's all a matter of taste. Stefan, what do you think?
 
  Since some amount of synchronous I/O would likely always be there, for 
  example in qemu-img, I think this unification would make more sense than 
  providing two separate entrypoints for bdrv_co_flush and bdrv_flush.
 
 Actually, I'm not so sure about qemu-img. I think we have thought of
 scenarios where converting it to a coroutine based version with a main
 loop would be helpful (can't remember the details, though).

I'd like to completely remove synchronous bdrv_*(), including for
qemu-img.

It's just too tempting to call these functions in contexts where it is
not okay to do so.  The bdrv_co_*() functions are all tagged as
coroutine_fn and make it clear that they can yield.

We already have an event loop in qemu-img except it's the nested event
loop in synchronous bdrv_*() emulation functions.  The nested event loop
is a mini event loop and can't really do things like timers.  It would
be nicer to remove it in favor of a single top-level event loop with the
qemu-img code running in a coroutine.

So I'm in favor of keeping bdrv_co_*() explicit for now and refactoring
both hw/ and qemu-tool users of synchronous functions.

Stefan

Re: [Qemu-devel] [PATCH] ui/spice-core: fix segfault in monitor

2011-10-14 Thread Gerd Hoffmann


On 10/04/11 13:25, Alon Levy wrote:

Fix segfault if a qxl device is present but no spice command line
argument is given.

RHBZ 743251.


Added to spice patch queue.

thanks,
  Gerd

Re: [Qemu-devel] [PATCH v2] runstate: add more valid transitions

2011-10-14 Thread Luiz Capitulino

On Fri, 14 Oct 2011 08:56:57 +0200
Paolo Bonzini pbonz...@redhat.com wrote:

 On 10/13/2011 10:26 PM, Luiz Capitulino wrote:
  I'm going to take my word back on this one, I've found the real cause of the
  problem. Will post the patch right now.
 
 
 Are you keeping the vl.c hunks though?

I'm not, because I'm assuming that allowing a transition from 'paused' to
'postmigrate' plus this fix:

 http://lists.gnu.org/archive/html/qemu-devel/2011-10/msg01430.html

will solve the problems we have today. If you don't think so, would you
mind sending a patch against the qmp queue?

 git://repo.or.cz/qemu/qmp-unstable.git queue/qmp

Thanks!

[Qemu-devel] [PATCH] ARM GIC and CPU state saving/loading fix

2011-10-14 Thread Dmitry Koshelev

Fixes two trivial indexing errors.

Signed-off-by: Dmitry Koshelev karaghio...@gmail.com
---
 hw/arm_gic.c |   12 ++--
 target-arm/machine.c |4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/arm_gic.c b/hw/arm_gic.c
index 8286a28..ba05131 100644
--- a/hw/arm_gic.c
+++ b/hw/arm_gic.c
@@ -662,9 +662,6 @@ static void gic_save(QEMUFile *f, void *opaque)
 qemu_put_be32(f, s->enabled);
 for (i = 0; i < NUM_CPU(s); i++) {
 qemu_put_be32(f, s->cpu_enabled[i]);
-#ifndef NVIC
-qemu_put_be32(f, s->irq_target[i]);
-#endif
 for (j = 0; j < 32; j++)
 qemu_put_be32(f, s->priority1[j][i]);
 for (j = 0; j < GIC_NIRQ; j++)
@@ -678,6 +675,9 @@ static void gic_save(QEMUFile *f, void *opaque)
 qemu_put_be32(f, s->priority2[i]);
 }
 for (i = 0; i < GIC_NIRQ; i++) {
+#ifndef NVIC
+qemu_put_be32(f, s->irq_target[i]);
+#endif
 qemu_put_byte(f, s->irq_state[i].enabled);
 qemu_put_byte(f, s->irq_state[i].pending);
 qemu_put_byte(f, s->irq_state[i].active);
@@ -699,9 +699,6 @@ static int gic_load(QEMUFile *f, void *opaque, int version_id)
 s->enabled = qemu_get_be32(f);
 for (i = 0; i < NUM_CPU(s); i++) {
 s->cpu_enabled[i] = qemu_get_be32(f);
-#ifndef NVIC
-s->irq_target[i] = qemu_get_be32(f);
-#endif
 for (j = 0; j < 32; j++)
 s->priority1[j][i] = qemu_get_be32(f);
 for (j = 0; j < GIC_NIRQ; j++)
@@ -715,6 +712,9 @@ static int gic_load(QEMUFile *f, void *opaque, int version_id)
 s->priority2[i] = qemu_get_be32(f);
 }
 for (i = 0; i < GIC_NIRQ; i++) {
+#ifndef NVIC
+s->irq_target[i] = qemu_get_be32(f);
+#endif
 s->irq_state[i].enabled = qemu_get_byte(f);
 s->irq_state[i].pending = qemu_get_byte(f);
 s->irq_state[i].active = qemu_get_byte(f);
diff --git a/target-arm/machine.c b/target-arm/machine.c
index 3925d3a..1b1b3ec 100644
--- a/target-arm/machine.c
+++ b/target-arm/machine.c
@@ -53,7 +53,7 @@ void cpu_save(QEMUFile *f, void *opaque)
 qemu_put_be32(f, env->features);

 if (arm_feature(env, ARM_FEATURE_VFP)) {
-for (i = 0;  i < 16; i++) {
+for (i = 16;  i < 32; i++) {
 CPU_DoubleU u;
 u.d = env->vfp.regs[i];
 qemu_put_be32(f, u.l.upper);
@@ -175,7 +175,7 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
 env->vfp.vec_stride = qemu_get_be32(f);

 if (arm_feature(env, ARM_FEATURE_VFP3)) {
-for (i = 0;  i < 16; i++) {
+for (i = 16;  i < 32; i++) {
 CPU_DoubleU u;
 u.l.upper = qemu_get_be32(f);
 u.l.lower = qemu_get_be32(f);
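
To illustrate the first of the two index fixes, here is a small
self-contained model of the save/load pairing. It is only a sketch: the
ModelGicState type, the MODEL_* sizes and the flat word buffer are
assumptions made for the example and do not mirror the real gic_state or
QEMUFile. The point it shows is that an array with one entry per IRQ has
to be written and read inside the per-IRQ loop, with save and load
walking the fields in the same order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen so that the CPU and IRQ counts differ. */
#define MODEL_NCPU 2
#define MODEL_NIRQ 8

typedef struct {
    uint32_t cpu_enabled[MODEL_NCPU]; /* one entry per CPU */
    uint32_t irq_target[MODEL_NIRQ];  /* one entry per IRQ */
} ModelGicState;

/* Write every field to buf in a fixed order; returns the words written. */
static size_t model_gic_save(const ModelGicState *s, uint32_t *buf)
{
    size_t n = 0;
    int i;

    for (i = 0; i < MODEL_NCPU; i++) {
        buf[n++] = s->cpu_enabled[i];
    }
    /* irq_target is indexed by IRQ, so it belongs in the per-IRQ loop. */
    for (i = 0; i < MODEL_NIRQ; i++) {
        buf[n++] = s->irq_target[i];
    }
    return n;
}

/* Read the fields back in exactly the order model_gic_save wrote them. */
static size_t model_gic_load(ModelGicState *s, const uint32_t *buf)
{
    size_t n = 0;
    int i;

    for (i = 0; i < MODEL_NCPU; i++) {
        s->cpu_enabled[i] = buf[n++];
    }
    for (i = 0; i < MODEL_NIRQ; i++) {
        s->irq_target[i] = buf[n++];
    }
    return n;
}

/* Returns 1 if a save followed by a load reproduces the whole state. */
static int model_gic_roundtrip_ok(void)
{
    ModelGicState in, out;
    uint32_t buf[MODEL_NCPU + MODEL_NIRQ];
    size_t saved, loaded;
    int i;

    memset(&out, 0, sizeof(out));
    for (i = 0; i < MODEL_NCPU; i++) {
        in.cpu_enabled[i] = 1;
    }
    for (i = 0; i < MODEL_NIRQ; i++) {
        in.irq_target[i] = 0x10u + (uint32_t)i;
    }

    saved = model_gic_save(&in, buf);
    loaded = model_gic_load(&out, buf);
    return saved == loaded && memcmp(&in, &out, sizeof(in)) == 0;
}
```

If irq_target were handled in the per-CPU loop instead, only the first
MODEL_NCPU entries would survive the round trip in this model.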

Re: [Qemu-devel] [PATCH v2] runstate: add more valid transitions

2011-10-14 Thread Paolo Bonzini


On 10/14/2011 03:23 PM, Luiz Capitulino wrote:

I'm not, because I'm assuming that allowing a transition from 'paused' to
'postmigrate' plus this fix:

  http://lists.gnu.org/archive/html/qemu-devel/2011-10/msg01430.html


I think POST_MIGRATE -> FINISH_MIGRATE should still be allowed in case 
you migrate twice.  Management would probably disallow that, but from 
the monitor you can do funny things: stop the machine, create two images 
based on the running machine's image, and migrate to both images.
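
One way to picture the transitions under discussion is a small lookup
table. This is only a sketch: the ModelRunState enum and the table below
are hypothetical and are not the runstate code in vl.c. The FINISH_MIGRATE
to POSTMIGRATE entry is an assumption about how a migration completes;
the other two entries are the transitions mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the run states named in this thread. */
typedef enum {
    MODEL_PAUSED,
    MODEL_FINISH_MIGRATE,
    MODEL_POSTMIGRATE,
    MODEL_STATE_MAX
} ModelRunState;

/* model_valid[from][to] is true when the transition is permitted. */
static const bool model_valid[MODEL_STATE_MAX][MODEL_STATE_MAX] = {
    /* 'paused' to 'postmigrate', as proposed by Luiz. */
    [MODEL_PAUSED][MODEL_POSTMIGRATE] = true,
    /* Assumption: a migration that finishes ends up in postmigrate. */
    [MODEL_FINISH_MIGRATE][MODEL_POSTMIGRATE] = true,
    /* Migrating a second time from postmigrate, as suggested by Paolo. */
    [MODEL_POSTMIGRATE][MODEL_FINISH_MIGRATE] = true,
};

static bool model_transition_allowed(ModelRunState from, ModelRunState to)
{
    return model_valid[from][to];
}
```

Entries that are not listed default to false, so any transition not
explicitly named stays forbidden in this model.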


Paolo

Re: [Qemu-devel] [v7 Patch 5/5]Qemu: New struct 'BDRVReopenState' for image files reopen

2011-10-14 Thread Supriya Kannery


On 10/12/2011 08:25 PM, Kevin Wolf wrote:

Am 11.10.2011 05:11, schrieb Supriya Kannery:

Struct BDRVReopenState introduced for handling reopen state of images.
This can be  extended by each of the block drivers to reopen respective
image files. Implementation for raw-posix is done here.

Signed-off-by: Supriya Kannery supri...@linux.vnet.ibm.com


Maybe it would make sense to split this into two patches, one for the
block.c infrastructure and another one for adding the callbacks in drivers.



ok..will split in v8.


+
+/* If only O_DIRECT to be toggled, use fcntl */
+if (!((bs->open_flags & ~BDRV_O_NOCACHE) ^
+(flags & ~BDRV_O_NOCACHE))) {
+raw_rs->reopen_fd = dup(s->fd);
+if (raw_rs->reopen_fd <= 0) {
+return -1;


This leaks raw_rs.



will handle


+}
+}
+
+/* TBD: Handle O_DSYNC and other flags */
+*rs = raw_rs;
+return 0;
+}
+
+static int raw_reopen_commit(BDRVReopenState *rs)


bdrv_reopen_commit must never fail. Any error that can happen must
already be handled in bdrv_reopen_prepare.

The commit function should really only do s->fd = rs->reopen_fd (besides
cleanup), everything else should already be prepared.
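
The split described here can be sketched as a small model in which
prepare does all the fallible work and commit is a plain assignment. All
names below (ModelRawState, ModelReopenState, the model_* functions) are
hypothetical, and the acquire callback merely stands in for the dup() and
fcntl() calls a real driver would make.

```c
#include <assert.h>

/* Hypothetical model of a driver's private state. */
typedef struct {
    int fd;
    int flags;
} ModelRawState;

/* Hypothetical model of the state staged between prepare and commit. */
typedef struct {
    ModelRawState *s;
    int new_fd;
    int new_flags;
} ModelReopenState;

/* Stand-ins for the fallible step; a negative return means failure. */
static int model_acquire_ok(int old_fd, int flags)
{
    (void)flags;
    return old_fd + 100;
}

static int model_acquire_fail(int old_fd, int flags)
{
    (void)old_fd;
    (void)flags;
    return -1;
}

/* prepare: do everything that can fail, but leave *s untouched. */
static int model_reopen_prepare(ModelReopenState *rs, ModelRawState *s,
                                int new_flags,
                                int (*acquire)(int old_fd, int flags))
{
    int fd = acquire(s->fd, new_flags);

    if (fd < 0) {
        return -1;          /* nothing was staged, nothing to undo */
    }
    rs->s = s;
    rs->new_fd = fd;
    rs->new_flags = new_flags;
    return 0;
}

/* commit: only assignments, so it has no failure path. */
static void model_reopen_commit(ModelReopenState *rs)
{
    rs->s->fd = rs->new_fd;
    rs->s->flags = rs->new_flags;
}

/* abort: drop what prepare staged; the live state was never modified. */
static void model_reopen_abort(ModelReopenState *rs)
{
    rs->new_fd = -1;
    rs->new_flags = 0;
}
```

Because commit returns void in this model, a caller cannot end up with a
half-switched descriptor: either prepare fails and the old state is
intact, or commit runs and the new state is fully in place.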



will move call to fcntl to bdrv_reopen_prepare.


+{
+BlockDriverState *bs = rs->bs;
+BDRVRawState *s = bs->opaque;
+
+if (!rs->reopen_fd) {
+return -1;
+}
+
+int ret = fcntl_setfl(rs->reopen_fd, rs->reopen_flags);


reopen_flags is BDRV_O_*, not O_*, so it needs to be translated.
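
A translation step of the kind asked for here might look like the sketch
below. The MODEL_BDRV_O_* constants and their values are placeholders
invented for the example, not the definitions from block.h, and
MODEL_HOST_O_DIRECT falls back to 0 on hosts whose <fcntl.h> does not
expose O_DIRECT.

```c
#include <assert.h>
#include <fcntl.h>

/* Placeholder block-layer flags; the values are illustrative only. */
#define MODEL_BDRV_O_RDWR    0x0002
#define MODEL_BDRV_O_NOCACHE 0x0020

/* Use the host's direct-I/O flag when it is visible, otherwise nothing. */
#ifdef O_DIRECT
#define MODEL_HOST_O_DIRECT O_DIRECT
#else
#define MODEL_HOST_O_DIRECT 0
#endif

/* Map block-layer flags to flags suitable for open() or fcntl(F_SETFL). */
static int model_bdrv_to_host_flags(int bdrv_flags)
{
    int host_flags = (bdrv_flags & MODEL_BDRV_O_RDWR) ? O_RDWR : O_RDONLY;

    if (bdrv_flags & MODEL_BDRV_O_NOCACHE) {
        host_flags |= MODEL_HOST_O_DIRECT;
    }
    return host_flags;
}
```

Passing the untranslated block-layer value straight to fcntl() would set
whichever host flags happen to share those bit positions, which is the
bug pointed out in the review comment.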



ok


+/* Use driver specific reopen() if available */
+if (drv->bdrv_reopen_prepare) {
+ret = drv->bdrv_reopen_prepare(bs, &rs, bdrv_flags);
  if (ret < 0) {
-/* Reopen failed with orig and modified flags */
-abort();
+goto fail;
  }
-}
+if (drv->bdrv_reopen_commit) {
+ret = drv->bdrv_reopen_commit(rs);
+if (ret < 0) {
+goto fail;
+}
+return 0;
+}


Pull the return 0; out one level. It would be really strange if we
turned a successful prepare into reopen_abort just because the driver
doesn't need a commit function.

(The other consistent way would be to require that if a driver
implements one reopen function, it has to implement all three of them.
I'm fine either way.)



Will give flexibility to drivers, not mandating all the three functions.


+ret = bdrv_open(bs, bs->filename, open_flags, drv);
+if (ret < 0) {
+/*
+ * Reopen with orig and modified flags failed
+ */
+abort();


Make this bs->drv = NULL, so that trying to access the image will fail,
but at least the whole VM doesn't crash.



ok


  static int raw_read(BlockDriverState *bs, int64_t sector_num,
  uint8_t *buf, int nb_sectors)
  {
@@ -128,6 +137,8 @@ static BlockDriver bdrv_raw = {
  .instance_size  = 1,

  .bdrv_open  = raw_open,
+.bdrv_reopen_prepare = raw_reopen,
  .bdrv_close = raw_close,
  .bdrv_read  = raw_read,
  .bdrv_write = raw_write,


I think raw must pass through all three functions. Otherwise it could
happen that we need to abort, but the image has already been reopened.



ok..got it..will have three separate functions to avoid unnecessary
dependencies.

Got a question..
   In raw-posix, the three functions are implemented for file_reopen 
for now. Should this be extended to hdev, cdrom and floppy?



Kevin

Re: [Qemu-devel] [v7 Patch 5/5]Qemu: New struct 'BDRVReopenState' for image files reopen

2011-10-14 Thread Supriya Kannery


On 10/14/2011 04:43 PM, Stefan Hajnoczi wrote:

On Tue, Oct 11, 2011 at 08:41:59AM +0530, Supriya Kannery wrote:

Index: qemu/block.c
===
--- qemu.orig/block.c
+++ qemu/block.c
@@ -706,6 +706,7 @@ int bdrv_reopen(BlockDriverState *bs, in
  {
  BlockDriver *drv = bs->drv;
  int ret = 0, open_flags;
+BDRVReopenState *rs;


BDRVReopenState *rs = NULL;

If the abort path is taken we need to make sure rs has a defined value.
Note that the abort path currently doesn't handle rs == NULL and will
segfault in raw_reopen_abort().



sure..will check on this.

thanks, Supriya

Re: [Qemu-devel] [PATCH 2/4] block: unify flush implementations

2011-10-14 Thread Paolo Bonzini


On 10/14/2011 03:20 PM, Stefan Hajnoczi wrote:

It's just too tempting to call these functions in contexts where it is
not okay to do so.  The bdrv_co_*() functions are all tagged as
coroutine_fn and make it clear that they can yield.


Yes, I agree.


We already have an event loop in qemu-img except it's the nested event
loop in synchronous bdrv_*() emulation functions.  The nested event loop
is a mini event loop and can't really do things like timers.  It would
be nicer to remove it in favor of a single top-level event loop with the
qemu-img code running in a coroutine.


Note that the nested event loop cannot go away because of 
qemu_aio_flush, though. :(


Paolo

Re: [Qemu-devel] [v7 Patch 5/5]Qemu: New struct 'BDRVReopenState' for image files reopen

2011-10-14 Thread Kevin Wolf

Am 14.10.2011 15:42, schrieb Supriya Kannery:
 +/* Use driver specific reopen() if available */
 +if (drv->bdrv_reopen_prepare) {
 +ret = drv->bdrv_reopen_prepare(bs, &rs, bdrv_flags);
   if (ret < 0) {
 -/* Reopen failed with orig and modified flags */
 -abort();
 +goto fail;
   }
 -}
 +if (drv->bdrv_reopen_commit) {
 +ret = drv->bdrv_reopen_commit(rs);
 +if (ret < 0) {
 +goto fail;
 +}
 +return 0;
 +}

 Pull the return 0; out one level. It would be really strange if we
 turned a successful prepare into reopen_abort just because the driver
 doesn't need a commit function.

 (The other consistent way would be to require that if a driver
 implements one reopen function, it has to implement all three of them.
 I'm fine either way.)

 
 Will give flexibility to drivers, not mandating all the three functions.

Do we have a use case where it is actually possible to implement fewer
functions without introducing bugs?

If yes, let's keep it as it is.

 Got a question..
 In raw-posix, the three functions are implemented for file_reopen 
 for now. Should this be extended to hdev, cdrom and floppy?

Yes, that would be good. And I think the same implementation can be used
for all of them.

Kevin

Re: [Qemu-devel] [PATCH] ARM GIC and CPU state saving/loading fix

2011-10-14 Thread Stefan Hajnoczi

On Fri, Oct 14, 2011 at 05:25:29PM +0400, Dmitry Koshelev wrote:
 Fixes two trivial indexing errors.
 
 Signed-off-by: Dmitry Koshelev karaghio...@gmail.com
 ---
  hw/arm_gic.c |   12 ++--
  target-arm/machine.c |4 ++--
  2 files changed, 8 insertions(+), 8 deletions(-)

Not obvious to me what the implications are.  CCed Peter Maydell so it
can go through his ARM tree.

Stefan

Re: [Qemu-devel] [PATCH 2/4] block: unify flush implementations

2011-10-14 Thread Kevin Wolf

Am 14.10.2011 14:42, schrieb Paolo Bonzini:
 On 10/14/2011 01:54 PM, Kevin Wolf wrote:
 Am 14.10.2011 13:30, schrieb Paolo Bonzini:
 On 10/14/2011 01:08 PM, Kevin Wolf wrote:
 Am 14.10.2011 10:41, schrieb Paolo Bonzini:
 Add coroutine support for flush and apply the same emulation that
 we already do for read/write.  bdrv_aio_flush is simplified to always
 go through a coroutine.

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

 To make the implementation more consistent with read/write operations,
 wouldn't it make sense to provide a bdrv_co_flush() globally instead of
 using the synchronous version as the preferred public interface?

 What I was thinking of looks a bit different:

  bdrv_flush
  -> create coroutine or just fast-path to bdrv_flush_co_entry
     -> bdrv_flush_co_entry
        -> bdrv_co_flush

 and

  bdrv_co_flush
  -> driver

 And the reason for this is that bdrv_co_flush would be a function that
 does only little more than passing the function to the driver (just like
 most bdrv_* functions do), with no emulation going on at all.
 
 It would still host the checks on BDRV_O_NO_FLUSH and bs->drv->*_flush. 
   It would be the same as bdrv_flush_co_entry is now, minus the 
 marshalling in/out of the RwCo.

Right.

By the way, I like how you handle all three backends in the same
function. I think this is a lot more readable than the solution used by
read/write (changing the function pointers on driver registration).

 Instead of taking a void* and working on a RwCo structure that is really
 meant for emulation, bdrv_co_flush would take a BlockDriverState and
 improve readability this way.
 
 I see.  Yeah, that's doable, but I'd still need two coroutines (one for 
 bdrv_flush, one for bdrv_aio_flush) and the patch would be bigger overall...

You already have two of them (bdrv_co_flush for AIO and
bdrv_flush_co_entry for synchronous), so I don't think it makes a
difference there.

 The more complicated and ugly code would be left separated and only used
 for emulation. I think that would make it easier to understand the
 common path without being distracted by emulation code.
 
 ... and on the other hand the length of the call chain would increase. 
 It easily gets confusing, it already is for me in the read/write case.

Well, depends on what you're looking at. The call chain length would
increase for AIO and synchronous bdrv_flush, but it would become shorter
for bdrv_co_flush.

If we want to declare coroutines as the preferred interface, I think
such a change makes sense.

 Would bdrv_co_flush be static or not?  If not, you also get an 
 additional entry point of dubious additional value, i.e. more complexity.

I think I would make it public. The one that has to go eventually is the
synchronous bdrv_flush(). Which is another reason why I wouldn't design
everything around it.

 Actually, I'm not so sure about qemu-img. I think we have thought of
 scenarios where converting it to a coroutine based version with a main
 loop would be helpful (can't remember the details, though).
 
 qemu-img convert might benefit from multiple in-flight requests if one of 
 the endpoints is remote or perhaps even sparse, I guess.

Quite possible, yes.

Kevin

Re: [Qemu-devel] [PATCH 2/4] block: unify flush implementations

2011-10-14 Thread Paolo Bonzini


On 10/14/2011 04:02 PM, Kevin Wolf wrote:


  It would still host the checks on BDRV_O_NO_FLUSH and bs->drv->*_flush.
 It would be the same as bdrv_flush_co_entry is now, minus the
  marshalling in/out of the RwCo.

Right.

By the way, I like how you handle all three backends in the same
function. I think this is a lot more readable than the solution used by
read/write (changing the function pointers on driver registration).



Yeah, and it makes sense to handle all of them in the bdrv_co_* version.

Will resubmit.

Paolo

[Qemu-devel] [PATCH 6/7] sheepdog: correct spelling

2011-10-14 Thread Stefan Hajnoczi

From: Dong Xu Wang wdon...@linux.vnet.ibm.com

Reviewed-by: Andreas Färber afaer...@suse.de
Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 block/sheepdog.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index c1f6e07..ae857e2 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -66,7 +66,7 @@
  * 20 - 31 (12 bits): reserved data object space
  * 32 - 55 (24 bits): vdi object space
  * 56 - 59 ( 4 bits): reserved vdi object space
- * 60 - 63 ( 4 bits): object type indentifier space
+ * 60 - 63 ( 4 bits): object type identifier space
  */
 
 #define VDI_SPACE_SHIFT   32
-- 
1.7.6.3

[Qemu-devel] [PATCH 3/7] arm_pic: Fix typo

2011-10-14 Thread Stefan Hajnoczi

From: Andreas Färber andreas.faer...@web.de

interrput -> interrupt

Cc: Paul Brook p...@codesourcery.com
Signed-off-by: Andreas Färber andreas.faer...@web.de
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 hw/arm_pic.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/arm_pic.c b/hw/arm_pic.c
index 985148a..41f8d3e 100644
--- a/hw/arm_pic.c
+++ b/hw/arm_pic.c
@@ -39,7 +39,7 @@ static void arm_pic_cpu_handler(void *opaque, int irq, int level)
 cpu_reset_interrupt(env, CPU_INTERRUPT_FIQ);
 break;
 default:
-hw_error("arm_pic_cpu_handler: Bad interrput line %d\n", irq);
+hw_error("arm_pic_cpu_handler: Bad interrupt line %d\n", irq);
 }
 }
 
-- 
1.7.6.3

[Qemu-devel] [PULL 0/7] Trivial patches for October 6 to 14 2011

2011-10-14 Thread Stefan Hajnoczi

The following changes since commit ebffe2afceb1a17b5d134b5debf553955fe5ea1a:

  Merge remote-tracking branch 'qmp/queue/qmp' into staging (2011-10-10 08:21:46 -0500)

are available in the git repository at:

  ssh://repo.or.cz/srv/git/qemu/stefanha.git trivial-patches

Andreas Färber (1):
  arm_pic: Fix typo

Dong Xu Wang (1):
  sheepdog: correct spelling

Paolo Bonzini (1):
  remove hpet.h

Stefan Hajnoczi (1):
  qemu-options: avoid #if in spicevmc texi help

Stefan Weil (3):
  qemu-char: Fix use of free() instead of g_free()
  tcg: Fix spelling in comment (varables -> variables)
  block/qcow: Fix use of free() instead of g_free()

 block/qcow.c |2 +-
 block/sheepdog.c |2 +-
 hpet.h   |   22 --
 hw/arm_pic.c |2 +-
 qemu-char.c  |8 
 qemu-options.hx  |4 ++--
 tcg/tcg.h|2 +-
 7 files changed, 10 insertions(+), 32 deletions(-)
 delete mode 100644 hpet.h

-- 
1.7.6.3

[Qemu-devel] [PATCH 5/7] tcg: Fix spelling in comment (varables -> variables)

2011-10-14 Thread Stefan Hajnoczi

From: Stefan Weil s...@weilnetz.de

Signed-off-by: Stefan Weil s...@weilnetz.de
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 tcg/tcg.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index de8a1d5..015f88a 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -175,7 +175,7 @@ typedef enum TCGType {
 
 typedef tcg_target_ulong TCGArg;
 
-/* Define a type and accessor macros for varables.  Using a struct is
+/* Define a type and accessor macros for variables.  Using a struct is
nice because it gives some level of type safely.  Ideally the compiler
be able to see through all this.  However in practice this is not true,
expecially on targets with braindamaged ABIs (e.g. i386).
-- 
1.7.6.3

[Qemu-devel] [PATCH 4/7] remove hpet.h

2011-10-14 Thread Stefan Hajnoczi

From: Paolo Bonzini pbonz...@redhat.com

It is unused since the HPET and RTC timers were removed (commit
25f3151, 2011-05-31).

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 hpet.h |   22 --
 1 files changed, 0 insertions(+), 22 deletions(-)
 delete mode 100644 hpet.h

diff --git a/hpet.h b/hpet.h
deleted file mode 100644
index 754051a..000
--- a/hpet.h
+++ /dev/null
@@ -1,22 +0,0 @@
-#ifndef __HPET__
-#define __HPET__ 1
-
-
-
-struct hpet_info {
-   unsigned long hi_ireqfreq;  /* Hz */
-   unsigned long hi_flags; /* information */
-   unsigned short hi_hpet;
-   unsigned short hi_timer;
-};
-
-#define HPET_INFO_PERIODIC  0x0001  /* timer is periodic */
-
-#define HPET_IE_ON   _IO('h', 0x01)  /* interrupt on */
-#define HPET_IE_OFF  _IO('h', 0x02)  /* interrupt off */
-#define HPET_INFO    _IOR('h', 0x03, struct hpet_info)
-#define HPET_EPI     _IO('h', 0x04)  /* enable periodic */
-#define HPET_DPI     _IO('h', 0x05)  /* disable periodic */
-#define HPET_IRQFREQ _IOW('h', 0x6, unsigned long)   /* IRQFREQ usec */
-
-#endif /* !__HPET__ */
-- 
1.7.6.3

[Qemu-devel] [PATCH 7/7] block/qcow: Fix use of free() instead of g_free()

2011-10-14 Thread Stefan Hajnoczi

From: Stefan Weil s...@weilnetz.de

cppcheck reported this error:

qemu/block/qcow.c:599: error: Mismatching allocation and deallocation: cluster_data

Signed-off-by: Stefan Weil s...@weilnetz.de
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 block/qcow.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index c8bfecc..eba5a04 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -596,7 +596,7 @@ static int qcow_co_writev(BlockDriverState *bs, int64_t sector_num,
 if (qiov->niov > 1) {
 qemu_vfree(orig_buf);
 }
-free(cluster_data);
+g_free(cluster_data);
 
 return ret;
 }
-- 
1.7.6.3

[Qemu-devel] [PATCH 1/7] qemu-options: avoid #if in spicevmc texi help

2011-10-14 Thread Stefan Hajnoczi

Preprocessor directives cannot be used in STEXI/ETEXI sections since
they are not passed through the preprocessor.  The spicevmc chardev
option help currently uses #if, which is included verbatim in the man
page output.

Fix this by simply stating that spicevmc chardevs are available only in
builds with spice support.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 qemu-options.hx |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index dfbabd0..d4fe990 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1673,15 +1673,15 @@ Connect to a local parallel port.
 @option{path} specifies the path to the parallel port device. @option{path} is
 required.
 
-#if defined(CONFIG_SPICE)
 @item -chardev spicevmc ,id=@var{id} ,debug=@var{debug}, name=@var{name}
 
+@option{spicevmc} is only available when spice support is built in.
+
 @option{debug} debug level for spicevmc
 
 @option{name} name of spice channel to connect to
 
 Connect to a spice virtual machine channel, such as vdiport.
-#endif
 
 @end table
 ETEXI
-- 
1.7.6.3

[Qemu-devel] [PATCH 2/7] qemu-char: Fix use of free() instead of g_free()

2011-10-14 Thread Stefan Hajnoczi

From: Stefan Weil s...@weilnetz.de

cppcheck reported these errors:

qemu-char.c:1667: error: Mismatching allocation and deallocation: s
qemu-char.c:1668: error: Mismatching allocation and deallocation: chr
qemu-char.c:1769: error: Mismatching allocation and deallocation: s
qemu-char.c:1770: error: Mismatching allocation and deallocation: chr

Tested-by: Dongxu Wang wdon...@linux.vnet.ibm.com
Signed-off-by: Stefan Weil s...@weilnetz.de
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 qemu-char.c |8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 8bdbcfd..fb9e058 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -1664,8 +1664,8 @@ static int qemu_chr_open_win(QemuOpts *opts, CharDriverState **_chr)
 chr->chr_close = win_chr_close;
 
 if (win_chr_init(chr, filename) < 0) {
-free(s);
-free(chr);
+g_free(s);
+g_free(chr);
 return -EIO;
 }
 qemu_chr_generic_open(chr);
@@ -1766,8 +1766,8 @@ static int qemu_chr_open_win_pipe(QemuOpts *opts, CharDriverState **_chr)
 chr->chr_close = win_chr_close;
 
 if (win_chr_pipe_init(chr, filename) < 0) {
-free(s);
-free(chr);
+g_free(s);
+g_free(chr);
 return -EIO;
 }
 qemu_chr_generic_open(chr);
-- 
1.7.6.3

Re: [Qemu-devel] [PATCH 4/4] block: add bdrv_co_discard and bdrv_aio_discard support

2011-10-14 Thread Kevin Wolf

On 14.10.2011 10:41, Paolo Bonzini wrote:
> This similarly adds support for coroutine and asynchronous discard.
>
> Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Do we really need bdrv_discard and bdrv_aio_discard in the backends? I
think it makes sense to have a bdrv_aio_discard() in block.h as AIO
generally fits well for device models, but I would just require
bdrv_co_discard for any block drivers implementing discard.

>    I was not sure if qcow2 could be changed to co_discard, though
>    I suspected yes.

As discussed on IRC: Yes, it just must make sure to take s->lock.

Kevin

Re: [Qemu-devel] [PATCH 1/2] spice: turn client_migrate_info to async

2011-10-14 Thread Gerd Hoffmann


On 09/21/11 17:50, Yonit Halperin wrote:
> RHBZ 737921
> Spice client is required to connect to the migration target before/as migration
> starts. Since after migration starts, the target qemu is blocked and cannot accept new spice client
> we trigger the connection to the target upon client_migrate_info command.
> client_migrate_info completion cb will be called after spice client has been
> connected to the target (or a timeout). See following patches and spice patches.

Patch fails to build with spice disabled.
Please fix.

thanks,
  Gerd

Re: [Qemu-devel] [PATCH v2] runstate: add more valid transitions

2011-10-14 Thread Luiz Capitulino

On Fri, 14 Oct 2011 15:37:29 +0200
Paolo Bonzini pbonz...@redhat.com wrote:

> On 10/14/2011 03:23 PM, Luiz Capitulino wrote:
> > I'm not, because I'm assuming that allowing a transition from 'paused' to
> > 'postmigrate' plus this fix:
> >
> > http://lists.gnu.org/archive/html/qemu-devel/2011-10/msg01430.html
>
> I think POST_MIGRATE -> FINISH_MIGRATE should be still allowed in case
> you migrate twice.  Management would probably disallow that, but from
> the monitor you can do funny things: stop the machine, create two images
> based on the running machine's image, and migrate to both images.

Yes, you're right. But there's another problem there: the VM is stopped
when the migration process finishes. So, if you migrate again, there
won't be a POST_MIGRATE -> FINISH_MIGRATE transition, as vm_stop() will
just return.

I don't like the idea of changing current vm_stop() to always change the
state, so I'm adding a new function to do that:

diff --git a/cpus.c b/cpus.c
index 8978779..5f5b763 100644
--- a/cpus.c
+++ b/cpus.c
@@ -887,6 +887,17 @@ void vm_stop(RunState state)
 do_vm_stop(state);
 }
 
+/* does a state transition even if the VM is already stopped,
+   current state is forgotten forever */
+void vm_stop_force_state(RunState state)
+{
+if (runstate_is_running()) {
+vm_stop(state);
+} else {
+runstate_set(state);
+}
+}
+
 static int tcg_cpu_exec(CPUState *env)
 {
 int ret;
diff --git a/migration.c b/migration.c
index 77a51ad..62b74a6 100644
--- a/migration.c
+++ b/migration.c
@@ -375,7 +375,7 @@ void migrate_fd_put_ready(void *opaque)
 int old_vm_running = runstate_is_running();
 
 DPRINTF("done iterating\n");
-vm_stop(RUN_STATE_FINISH_MIGRATE);
+vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
 
 if ((qemu_savevm_state_complete(s->mon, s->file)) < 0) {
 if (old_vm_running) {
diff --git a/sysemu.h b/sysemu.h
index a889d90..7d288f8 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -35,6 +35,7 @@ void vm_state_notify(int running, RunState state);
 
 void vm_start(void);
 void vm_stop(RunState state);
+void vm_stop_force_state(RunState state);
 
 void qemu_system_reset_request(void);
 void qemu_system_shutdown_request(void);
diff --git a/vl.c b/vl.c
index 2e991fc..613204b 100644
--- a/vl.c
+++ b/vl.c
@@ -346,6 +346,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 { RUN_STATE_PAUSED, RUN_STATE_POSTMIGRATE },
 
 { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
+{ RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
 
 { RUN_STATE_PRELAUNCH, RUN_STATE_RUNNING },
 { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
-- 
1.7.7.rc3
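
To illustrate the difference between vm_stop() and the proposed vm_stop_force_state(), here is a minimal standalone sketch. It does not use QEMU's headers: the Demo* types, the demo_* functions, and the reduced transition table are hypothetical stand-ins. Reporting a rejected transition by return value and checking it with assert() is a choice of this sketch, not a claim about QEMU's behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for QEMU's RunState values; only the states needed
 * for this illustration are modelled. */
typedef enum {
    DEMO_STATE_RUNNING,
    DEMO_STATE_PAUSED,
    DEMO_STATE_FINISH_MIGRATE,
    DEMO_STATE_POSTMIGRATE
} DemoRunState;

typedef struct {
    DemoRunState from;
    DemoRunState to;
} DemoTransition;

/* Hypothetical subset of the transition table, including the
 * POSTMIGRATE -> FINISH_MIGRATE entry added by the patch. */
static const DemoTransition demo_transitions[] = {
    { DEMO_STATE_RUNNING, DEMO_STATE_PAUSED },
    { DEMO_STATE_RUNNING, DEMO_STATE_FINISH_MIGRATE },
    { DEMO_STATE_PAUSED, DEMO_STATE_RUNNING },
    { DEMO_STATE_PAUSED, DEMO_STATE_POSTMIGRATE },
    { DEMO_STATE_FINISH_MIGRATE, DEMO_STATE_POSTMIGRATE },
    { DEMO_STATE_POSTMIGRATE, DEMO_STATE_RUNNING },
    { DEMO_STATE_POSTMIGRATE, DEMO_STATE_FINISH_MIGRATE },
};

static DemoRunState demo_current = DEMO_STATE_RUNNING;

static bool demo_transition_valid(DemoRunState from, DemoRunState to)
{
    size_t count = sizeof(demo_transitions) / sizeof(demo_transitions[0]);
    size_t i;

    for (i = 0; i < count; i++) {
        if (demo_transitions[i].from == from && demo_transitions[i].to == to) {
            return true;
        }
    }
    return false;
}

/* Switch state if the table allows it; report whether it happened. */
static bool demo_runstate_set(DemoRunState to)
{
    if (!demo_transition_valid(demo_current, to)) {
        return false;
    }
    demo_current = to;
    return true;
}

static bool demo_is_running(void)
{
    return demo_current == DEMO_STATE_RUNNING;
}

/* Like the vm_stop() behaviour described in the thread: a no-op when the
 * machine is already stopped, so the state is left untouched. */
static void demo_vm_stop(DemoRunState state)
{
    if (demo_is_running()) {
        bool ok = demo_runstate_set(state);

        assert(ok);
        (void)ok;
    }
}

/* Like the proposed vm_stop_force_state(): change the state even if the
 * machine is already stopped. */
static void demo_vm_stop_force_state(DemoRunState state)
{
    if (demo_is_running()) {
        demo_vm_stop(state);
    } else {
        bool ok = demo_runstate_set(state);

        assert(ok);
        (void)ok;
    }
}
```

With this table, a second migration started from DEMO_STATE_POSTMIGRATE leaves the state unchanged when demo_vm_stop() is used, while demo_vm_stop_force_state() moves it to DEMO_STATE_FINISH_MIGRATE.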

Re: [Qemu-devel] [PATCH 4/4] block: add bdrv_co_discard and bdrv_aio_discard support

2011-10-14 Thread Paolo Bonzini


On 10/14/2011 04:23 PM, Kevin Wolf wrote:
>> This similarly adds support for coroutine and asynchronous discard.
>>
>> Signed-off-by: Paolo Bonzini pbonz...@redhat.com
>
> Do we really need bdrv_discard and bdrv_aio_discard in the backends? I
> think it makes sense to have a bdrv_aio_discard() in block.h as AIO
> generally fits well for device models, but I would just require
> bdrv_co_discard for any block drivers implementing discard.


bdrv_discard is needed for now since I wouldn't like to conflate this 
patch with the qcow2 patch.


I can certainly drop aio_discard from the backends, but I'm not sure how 
heavy can fallocate be (with FALLOC_FL_PUNCH_HOLE).  Probably not much, 
but I think there's no guarantee of O(1) behavior especially with 
filesystems like ecryptfs.  So you would need to go through the thread 
pool and aio_discard would come in handy.


Paolo
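
For reference, the fallocate() call discussed above can be sketched as follows. This is a Linux-specific illustration, not QEMU's raw-posix code: demo_punch_hole() and demo_punch_keeps_size() are hypothetical helpers. Per fallocate(2), FALLOC_FL_PUNCH_HOLE has to be combined with FALLOC_FL_KEEP_SIZE. The call is synchronous, which is why running it from a thread pool comes up in the thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for the fallocate() prototype */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a QEMU function): deallocate [offset, offset+len)
 * in the file behind fd without changing the file size.
 * Returns 0 on success or a negative errno value. */
static int demo_punch_hole(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);   /* fallocate() rejects other values */

    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) < 0) {
        return -errno;
    }
    return 0;
}

/* Example use: create a scratch file, extend it to 8 KiB, punch a hole in
 * the first 4 KiB and check that the file size is preserved.
 * Returns 1 if the size is unchanged, 0 if it changed, -1 on setup errors. */
static int demo_punch_keeps_size(void)
{
    char path[] = "/tmp/demo_punch_XXXXXX";
    struct stat before, after;
    int keeps_size = -1;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path);               /* the file goes away once fd is closed */

    if (ftruncate(fd, 8192) == 0 && fstat(fd, &before) == 0) {
        /* A failure here is not fatal for the example: the filesystem may
         * not support hole punching at all. */
        (void)demo_punch_hole(fd, 0, 4096);
        if (fstat(fd, &after) == 0) {
            keeps_size = (before.st_size == after.st_size);
        }
    }
    close(fd);
    return keeps_size;
}
```

How long the call takes depends on the filesystem, so a caller that must not block would run demo_punch_hole() from a worker thread rather than from the main loop.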

Re: [Qemu-devel] [PATCH v2] runstate: add more valid transitions

2011-10-14 Thread Paolo Bonzini


On 10/14/2011 04:24 PM, Luiz Capitulino wrote:
> Yes, you're right. But there's another problem there: the VM is stopped
> when the migration process finishes. So, if you migrate again, there
> won't be a POST_MIGRATE -> FINISH_MIGRATE transition, as vm_stop() will
> just return.


Acked-by: Paolo Bonzini pbonz...@redhat.com

for now.  I really think for 1.1 it's better to separate the 
paused-until-user-interaction-because-... and 
temporarily-paused-because-... state, as suggested elsewhere in the 
thread.


Paolo

Re: [Qemu-devel] [PATCH 4/4] block: add bdrv_co_discard and bdrv_aio_discard support

2011-10-14 Thread Kevin Wolf

On 14.10.2011 16:24, Paolo Bonzini wrote:
> On 10/14/2011 04:23 PM, Kevin Wolf wrote:
>>> This similarly adds support for coroutine and asynchronous discard.
>>>
>>> Signed-off-by: Paolo Bonzini pbonz...@redhat.com
>>
>> Do we really need bdrv_discard and bdrv_aio_discard in the backends? I
>> think it makes sense to have a bdrv_aio_discard() in block.h as AIO
>> generally fits well for device models, but I would just require
>> bdrv_co_discard for any block drivers implementing discard.
>
> bdrv_discard is needed for now since I wouldn't like to conflate this
> patch with the qcow2 patch.

Okay, that makes sense. Then I'll drop it when I convert qcow2.

> I can certainly drop aio_discard from the backends, but I'm not sure how
> heavy can fallocate be (with FALLOC_FL_PUNCH_HOLE).  Probably not much,
> but I think there's no guarantee of O(1) behavior especially with
> filesystems like ecryptfs.  So you would need to go through the thread
> pool and aio_discard would come in handy.

Sure, but the coroutine interface should be just as good for
implementing it this way in raw-posix.

Kevin

Re: [Qemu-devel] [PATCH 4/4] block: add bdrv_co_discard and bdrv_aio_discard support

2011-10-14 Thread Paolo Bonzini


On 10/14/2011 04:32 PM, Kevin Wolf wrote:
>> I can certainly drop aio_discard from the backends, but I'm not sure how
>> heavy can fallocate be (with FALLOC_FL_PUNCH_HOLE).  Probably not much,
>> but I think there's no guarantee of O(1) behavior especially with
>> filesystems like ecryptfs.  So you would need to go through the thread
>> pool and aio_discard would come in handy.
>
> Sure, but the coroutine interface should be just as good for
> implementing it this way in raw-posix.


I think until someone rewrites raw_aio_submit/paio_submit in terms of 
coroutines, it's better to keep aio_discard.


Paolo
