Re: linux-next: Tree for Jan 25 (kvm)

2013-01-25 Thread Stephen Rothwell
On Fri, 25 Jan 2013 08:53:58 -0800 Randy Dunlap  wrote:
>
> Seeing lots of this error on i386:
> 
> arch/x86/kvm/emulate.c:1016: Error: unsupported for `push'

Caused by commit 9ae9febae950 ("KVM: x86 emulator: covert SETCC to
fastop") from the kvm tree.  cc's added.

-- 
Cheers,
Stephen Rothwell s...@canb.auug.org.au




Re: windows 2008 guest causing rcu_sched to emit NMI

2013-01-25 Thread Marcelo Tosatti
On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote:
> On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti  wrote:
> > On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
> >> Thank you Marcelo,
> >>
> >> The host node now locks up somewhat later than yesterday, but the problem
> >> is still here; please see the attached dmesg. The stuck process looks like
> >> root 19251  0.0  0.0 228476 12488 ?  D  14:42   0:00
> >> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
> >> virtio-blk-pci,? -device
> >>
> >> on fourth vm by count.
> >>
> >> Should I try upstream kernel instead of applying patch to the latest
> >> 3.4 or it is useless?
> >
> > If you can upgrade to an upstream kernel, please do that.
> >
> 
> With vanilla 3.7.4 almost nothing changes, and the NMI started firing
> again. The external symptoms are as follows: starting from some
> count, maybe the third or sixth VM, the qemu-kvm process allocates its
> memory very slowly and in jumps, 20M-200M-700M-1.6G over minutes. The patch
> helps, of course: on both patched 3.4 and vanilla 3.7 I am able to
> kill stuck kvm processes and the node returns to normal, whereas on
> 3.2 sending SIGKILL to the process causes zombies and hung ``ps''
> output (the problem, and a workaround when no scheduler is involved, are
> described here: http://www.spinics.net/lists/kvm/msg84799.html).

Try disabling pause loop exiting with ple_gap=0 kvm-intel.ko module parameter.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH V2 11/20] tap: support enabling or disabling a queue

2013-01-25 Thread Blue Swirl
On Fri, Jan 25, 2013 at 10:35 AM, Jason Wang  wrote:
> This patch introduces a new bit, enabled, in TAPState, which tracks whether a
> specific queue/fd is enabled. The tap/fd is enabled during initialization and
> can be enabled/disabled by tap_enable() and tap_disable(), which call
> platform-specific helpers to do the real work. Polling of a tap fd can only be
> done when the tap is enabled.
>
> Signed-off-by: Jason Wang 
> ---
>  include/net/tap.h |2 ++
>  net/tap-win32.c   |   10 ++
>  net/tap.c |   43 ---
>  3 files changed, 52 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/tap.h b/include/net/tap.h
> index bb7efb5..0caf8c4 100644
> --- a/include/net/tap.h
> +++ b/include/net/tap.h
> @@ -35,6 +35,8 @@ int tap_has_vnet_hdr_len(NetClientState *nc, int len);
>  void tap_using_vnet_hdr(NetClientState *nc, int using_vnet_hdr);
>  void tap_set_offload(NetClientState *nc, int csum, int tso4, int tso6, int ecn, int ufo);
>  void tap_set_vnet_hdr_len(NetClientState *nc, int len);
> +int tap_enable(NetClientState *nc);
> +int tap_disable(NetClientState *nc);
>
>  int tap_get_fd(NetClientState *nc);
>
> diff --git a/net/tap-win32.c b/net/tap-win32.c
> index 265369c..a2cd94b 100644
> --- a/net/tap-win32.c
> +++ b/net/tap-win32.c
> @@ -764,3 +764,13 @@ void tap_set_vnet_hdr_len(NetClientState *nc, int len)
>  {
>  assert(0);
>  }
> +
> +int tap_enable(NetClientState *nc)
> +{
> +assert(0);

abort()

> +}
> +
> +int tap_disable(NetClientState *nc)
> +{
> +assert(0);
> +}
> diff --git a/net/tap.c b/net/tap.c
> index 67080f1..95e557b 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -59,6 +59,7 @@ typedef struct TAPState {
>  unsigned int write_poll : 1;
>  unsigned int using_vnet_hdr : 1;
>  unsigned int has_ufo: 1;
> +unsigned int enabled : 1;

bool without bit field?

>  VHostNetState *vhost_net;
>  unsigned host_vnet_hdr_len;
>  } TAPState;
> @@ -72,9 +73,9 @@ static void tap_writable(void *opaque);
>  static void tap_update_fd_handler(TAPState *s)
>  {
>  qemu_set_fd_handler2(s->fd,
> - s->read_poll  ? tap_can_send : NULL,
> - s->read_poll  ? tap_send : NULL,
> - s->write_poll ? tap_writable : NULL,
> + s->read_poll && s->enabled ? tap_can_send : NULL,
> + s->read_poll && s->enabled ? tap_send : NULL,
> + s->write_poll && s->enabled ? tap_writable : NULL,
>   s);
>  }
>
> @@ -339,6 +340,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
>  s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
>  s->using_vnet_hdr = 0;
>  s->has_ufo = tap_probe_has_ufo(s->fd);
> +s->enabled = 1;
>  tap_set_offload(&s->nc, 0, 0, 0, 0, 0);
>  /*
>   * Make sure host header length is set correctly in tap:
> @@ -737,3 +739,38 @@ VHostNetState *tap_get_vhost_net(NetClientState *nc)
>  assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_TAP);
>  return s->vhost_net;
>  }
> +
> +int tap_enable(NetClientState *nc)
> +{
> +TAPState *s = DO_UPCAST(TAPState, nc, nc);
> +int ret;
> +
> +if (s->enabled) {
> +return 0;
> +} else {
> +ret = tap_fd_enable(s->fd);
> +if (ret == 0) {
> +s->enabled = 1;
> +tap_update_fd_handler(s);
> +}
> +return ret;
> +}
> +}
> +
> +int tap_disable(NetClientState *nc)
> +{
> +TAPState *s = DO_UPCAST(TAPState, nc, nc);
> +int ret;
> +
> +if (s->enabled == 0) {
> +return 0;
> +} else {
> +ret = tap_fd_disable(s->fd);
> +if (ret == 0) {
> +qemu_purge_queued_packets(nc);
> +s->enabled = 0;
> +tap_update_fd_handler(s);
> +}
> +return ret;
> +}
> +}
> --
> 1.7.1
>
>
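
For illustration only, here is a minimal standalone sketch (not QEMU code) of
the gating that the patch adds to tap_update_fd_handler(): a handler is
installed only when both its poll flag and the queue's enabled flag are set,
and the enabled flag changes only after the platform helper succeeds. It uses
a plain bool for the flag, as suggested in the review. All names below
(TapSketch, Handlers, select_handlers, sketch_set_enabled and the sketch_*
helpers) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*handler_fn)(void *opaque);

/* Hypothetical stand-in for TAPState; only the fields the gating needs. */
typedef struct TapSketch {
    bool read_poll;
    bool write_poll;
    bool enabled;               /* plain bool instead of a 1-bit bit field */
} TapSketch;

typedef struct Handlers {
    handler_fn read;
    handler_fn write;
} Handlers;

/* Dummy handlers standing in for tap_send() and tap_writable(). */
void sketch_send(void *opaque) { (void)opaque; }
void sketch_writable(void *opaque) { (void)opaque; }

/* Dummy platform helpers: one always succeeds, one always fails. */
int sketch_platform_ok(bool on) { (void)on; return 0; }
int sketch_platform_fail(bool on) { (void)on; return -1; }

/* A handler is selected only if its poll flag AND the enabled flag are set. */
Handlers select_handlers(const TapSketch *s)
{
    Handlers h;

    h.read  = (s->read_poll  && s->enabled) ? sketch_send     : NULL;
    h.write = (s->write_poll && s->enabled) ? sketch_writable : NULL;
    return h;
}

/* Idempotent toggle mirroring the shape of tap_enable()/tap_disable(). */
int sketch_set_enabled(TapSketch *s, bool on, int (*platform_helper)(bool on))
{
    int ret;

    if (s->enabled == on) {
        return 0;               /* already in the requested state */
    }
    ret = platform_helper(on);
    if (ret == 0) {
        s->enabled = on;        /* commit only after the helper succeeded */
    }
    return ret;
}
```

With read_poll and write_poll both set, disabling the queue through
sketch_set_enabled() makes select_handlers() return NULL for both handlers; a
failing helper leaves the state untouched.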


Re: [PATCH V3 RESEND RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2013-01-25 Thread Ingo Molnar

* Raghavendra K T  wrote:

> On 01/25/2013 04:17 PM, Ingo Molnar wrote:
> >
> >* Raghavendra K T  wrote:
> >
> >>* Ingo Molnar  [2013-01-24 11:32:13]:
> >>
> >>>
> >>>* Raghavendra K T  wrote:
> >>>
> From: Peter Zijlstra 
> 
> In case of undercomitted scenarios, especially in large guests
> yield_to overhead is significantly high. when run queue length of
> source and target is one, take an opportunity to bail out and return
> -ESRCH. This return condition can be further exploited to quickly come
> out of PLE handler.
> 
> (History: Raghavendra initially worked on break out of kvm ple handler 
> upon
>   seeing source runqueue length = 1, but it had to export rq length).
>   Peter came up with the elegant idea of return -ESRCH in scheduler core.
> 
> Signed-off-by: Peter Zijlstra 
> Raghavendra, Checking the rq length of target vcpu condition 
> added.(thanks Avi)
> Reviewed-by: Srikar Dronamraju 
> Signed-off-by: Raghavendra K T 
> Acked-by: Andrew Jones 
> Tested-by: Chegu Vinod 
> ---
> 
>   kernel/sched/core.c |   25 +++--
>   1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2d8927f..fc219a5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
>    * It's the caller's job to ensure that the target task struct
>    * can't go away on us before we can do any checks.
>    *
> - * Returns true if we indeed boosted the target task.
> + * Returns:
> + *   true (>0) if we indeed boosted the target task.
> + *   false (0) if we failed to boost the target.
> + *   -ESRCH if there's no task to yield to.
>    */
>   bool __sched yield_to(struct task_struct *p, bool preempt)
>   {
> @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool 
> preempt)
> 
>   again:
>   p_rq = task_rq(p);
> + /*
> +  * If we're the only runnable task on the rq and target rq also
> +  * has only one task, there's absolutely no point in yielding.
> +  */
> + if (rq->nr_running == 1 && p_rq->nr_running == 1) {
> + yielded = -ESRCH;
> + goto out_irq;
> + }
> >>>
> >>>Looks good to me in principle.
> >>>
> >>>Would be nice to get more consistent benchmark numbers. Once
> >>>those are unambiguously showing that this is a win:
> >>>
> >>>   Acked-by: Ingo Molnar 
> >>>
> >>
> >>I ran the test with kernbench and sysbench again on a 32 core mx3850
> >>machine with 32 vcpu guests. The results show definite improvements.
> >>
> >>ebizzy and dbench show similar improvement for 1x overcommit
> >>(note that the stdev for 1x in dbench is smaller; the improvement is now
> >>seen at only 20%)
> >>
> >>[ all results are averages over 8 runs ].
> >>
> >>The patches benefit large guest undercommit scenarios, so I believe
> >>that with larger guests the performance improvement is even more
> >>significant. [ Chegu Vinod's results show performance close to the
> >>no-ple case ]. Unfortunately I do not have a machine to test larger
> >>guests (>32).
> >>
> >>Ingo, Please let me know if this is okay to you.
> >>
> >>base kernel = 3.8.0-rc4
> >>
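> >>+-----------+-----------+-----------+-----------+-----------+
> >>          kernbench (time in sec, lower is better)
> >>+-----------+-----------+-----------+-----------+-----------+
> >>          base       stdev     patched       stdev    %improve
> >>+-----------+-----------+-----------+-----------+-----------+
> >>1x     46.6028      1.8672     42.4494      1.1390     8.91234
> >>2x     99.9074      9.1859     90.4050      2.6131     9.51121
> >>+-----------+-----------+-----------+-----------+-----------+
> >>+-----------+-----------+-----------+-----------+-----------+
> >>          sysbench (time in sec, lower is better)
> >>+-----------+-----------+-----------+-----------+-----------+
> >>          base       stdev     patched       stdev    %improve
> >>+-----------+-----------+-----------+-----------+-----------+
> >>1x     18.7402      0.3764     17.7431      0.3589     5.32065
> >>2x     13.2238      0.1935     13.0096      0.3152     1.61981
> >>+-----------+-----------+-----------+-----------+-----------+
> >>
> >>+-----------+-----------+-----------+-----------+-----------+
> >>          ebizzy (records/sec, higher is better)
> >>+-----------+-----------+-----------+-----------+-----------+
> >>          base       stdev     patched       stdev    %improve
> >>+-----------+-----------+-----------+-----------+-----------+
> >>1x   2421.9000     19.1801   5883.1000    112.7243   142.91259
> >>+-----------+-----------+-----------+-----------+-----------+
> >>
> >>+-----------+-----------+-----------+-----------+-----------+
> >>          dbench (throughput MB/sec, higher is better)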
> >>+--

Re: linux-next: Tree for Jan 25 (kvm)

2013-01-25 Thread Randy Dunlap
On 01/24/13 21:26, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20130124:
> 


Seeing lots of this error on i386:

arch/x86/kvm/emulate.c:1016: Error: unsupported for `push'



-- 
~Randy


Re: [PATCH V3 RESEND RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2013-01-25 Thread Raghavendra K T

On 01/25/2013 04:35 PM, Andrew Jones wrote:

On Fri, Jan 25, 2013 at 04:10:25PM +0530, Raghavendra K T wrote:

* Ingo Molnar  [2013-01-24 11:32:13]:



* Raghavendra K T  wrote:


From: Peter Zijlstra 

In case of undercomitted scenarios, especially in large guests
yield_to overhead is significantly high. when run queue length of
source and target is one, take an opportunity to bail out and return
-ESRCH. This return condition can be further exploited to quickly come
out of PLE handler.

(History: Raghavendra initially worked on break out of kvm ple handler upon
  seeing source runqueue length = 1, but it had to export rq length).
  Peter came up with the elegant idea of return -ESRCH in scheduler core.

Signed-off-by: Peter Zijlstra 
Raghavendra, Checking the rq length of target vcpu condition added.(thanks Avi)
Reviewed-by: Srikar Dronamraju 
Signed-off-by: Raghavendra K T 
Acked-by: Andrew Jones 
Tested-by: Chegu Vinod 
---

  kernel/sched/core.c |   25 +++--
  1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..fc219a5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
   * It's the caller's job to ensure that the target task struct
   * can't go away on us before we can do any checks.
   *
- * Returns true if we indeed boosted the target task.
+ * Returns:
+ * true (>0) if we indeed boosted the target task.
+ * false (0) if we failed to boost the target.
+ * -ESRCH if there's no task to yield to.
   */
  bool __sched yield_to(struct task_struct *p, bool preempt)
  {
@@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool 
preempt)

  again:
p_rq = task_rq(p);
+   /*
+* If we're the only runnable task on the rq and target rq also
+* has only one task, there's absolutely no point in yielding.
+*/
+   if (rq->nr_running == 1 && p_rq->nr_running == 1) {
+   yielded = -ESRCH;
+   goto out_irq;
+   }
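
For illustration, here is a minimal userspace sketch (not the kernel code) of
the bail-out check in the hunk above. The hunk documents three outcomes (>0,
0 and -ESRCH) while the context line still shows a bool return type; in C any
nonzero value converted to bool becomes true, so the sketch returns int to
keep -ESRCH distinguishable from a successful boost. The names rq_sketch,
yield_to_sketch and yield_to_bool_sketch are hypothetical, and since only
part of the patch is quoted here, this makes no claim about what the full
patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a runqueue; only the field the check reads. */
struct rq_sketch {
    unsigned int nr_running;
};

/*
 * Bail-out check with an int result:
 *   -ESRCH  both runqueues hold exactly one task, nothing to yield to
 *   1       the target was boosted
 *   0       the boost failed
 * 'boosted' stands in for the outcome of the real yield attempt.
 */
int yield_to_sketch(const struct rq_sketch *rq, const struct rq_sketch *p_rq,
                    bool boosted)
{
    if (rq->nr_running == 1 && p_rq->nr_running == 1)
        return -ESRCH;
    return boosted ? 1 : 0;
}

/*
 * The same logic squeezed through a bool return type: the conversion maps
 * every nonzero value, including -ESRCH, to true.
 */
bool yield_to_bool_sketch(const struct rq_sketch *rq,
                          const struct rq_sketch *p_rq, bool boosted)
{
    return yield_to_sketch(rq, p_rq, boosted);
}
```

A caller that wants to leave the PLE handler early needs the int variant:
with one task on each runqueue, yield_to_sketch() reports -ESRCH, while the
bool variant reports true, which is indistinguishable from a boost.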


Looks good to me in principle.

Would be nice to get more consistent benchmark numbers. Once
those are unambiguously showing that this is a win:

   Acked-by: Ingo Molnar 



I ran the test with kernbench and sysbench again on a 32 core mx3850
machine with 32 vcpu guests. The results show definite improvements.

ebizzy and dbench show similar improvement for 1x overcommit
(note that the stdev for 1x in dbench is smaller; the improvement is now
seen at only 20%)

[ all results are averages over 8 runs ].

The patches benefit large guest undercommit scenarios, so I believe
that with larger guests the performance improvement is even more
significant. [ Chegu Vinod's results show performance close to the
no-ple case ].


The last results you posted for dbench for the patched 1x case were
showing much better throughput than the no-ple 1x case, which is what
was strange. Is that still happening? You don't have the no-ple 1x
data here this time. The percent errors look a lot better.


I re-ran the experiment and got almost 4% less throughput (13500 vs 14100)
for the no-ple case compared to the patched kernel. (I believe this
variation may be due to having 4 guests with 3 idle, as no-ple is very
sensitive after 1x.)






Re: [PATCH V3 RESEND RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2013-01-25 Thread Raghavendra K T

On 01/25/2013 04:17 PM, Ingo Molnar wrote:


* Raghavendra K T  wrote:


* Ingo Molnar  [2013-01-24 11:32:13]:



* Raghavendra K T  wrote:


From: Peter Zijlstra 

In case of undercomitted scenarios, especially in large guests
yield_to overhead is significantly high. when run queue length of
source and target is one, take an opportunity to bail out and return
-ESRCH. This return condition can be further exploited to quickly come
out of PLE handler.

(History: Raghavendra initially worked on break out of kvm ple handler upon
  seeing source runqueue length = 1, but it had to export rq length).
  Peter came up with the elegant idea of return -ESRCH in scheduler core.

Signed-off-by: Peter Zijlstra 
Raghavendra, Checking the rq length of target vcpu condition added.(thanks Avi)
Reviewed-by: Srikar Dronamraju 
Signed-off-by: Raghavendra K T 
Acked-by: Andrew Jones 
Tested-by: Chegu Vinod 
---

  kernel/sched/core.c |   25 +++--
  1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..fc219a5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
   * It's the caller's job to ensure that the target task struct
   * can't go away on us before we can do any checks.
   *
- * Returns true if we indeed boosted the target task.
+ * Returns:
+ * true (>0) if we indeed boosted the target task.
+ * false (0) if we failed to boost the target.
+ * -ESRCH if there's no task to yield to.
   */
  bool __sched yield_to(struct task_struct *p, bool preempt)
  {
@@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool 
preempt)

  again:
p_rq = task_rq(p);
+   /*
+* If we're the only runnable task on the rq and target rq also
+* has only one task, there's absolutely no point in yielding.
+*/
+   if (rq->nr_running == 1 && p_rq->nr_running == 1) {
+   yielded = -ESRCH;
+   goto out_irq;
+   }


Looks good to me in principle.

Would be nice to get more consistent benchmark numbers. Once
those are unambiguously showing that this is a win:

   Acked-by: Ingo Molnar 



I ran the test with kernbench and sysbench again on a 32 core mx3850
machine with 32 vcpu guests. The results show definite improvements.

ebizzy and dbench show similar improvement for 1x overcommit
(note that the stdev for 1x in dbench is smaller; the improvement is now
seen at only 20%)

[ all results are averages over 8 runs ].

The patches benefit large guest undercommit scenarios, so I believe
that with larger guests the performance improvement is even more
significant. [ Chegu Vinod's results show performance close to the
no-ple case ]. Unfortunately I do not have a machine to test larger
guests (>32).

Ingo, Please let me know if this is okay to you.

base kernel = 3.8.0-rc4

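+-----------+-----------+-----------+-----------+-----------+
          kernbench (time in sec, lower is better)
+-----------+-----------+-----------+-----------+-----------+
          base       stdev     patched       stdev    %improve
+-----------+-----------+-----------+-----------+-----------+
1x     46.6028      1.8672     42.4494      1.1390     8.91234
2x     99.9074      9.1859     90.4050      2.6131     9.51121
+-----------+-----------+-----------+-----------+-----------+
+-----------+-----------+-----------+-----------+-----------+
          sysbench (time in sec, lower is better)
+-----------+-----------+-----------+-----------+-----------+
          base       stdev     patched       stdev    %improve
+-----------+-----------+-----------+-----------+-----------+
1x     18.7402      0.3764     17.7431      0.3589     5.32065
2x     13.2238      0.1935     13.0096      0.3152     1.61981
+-----------+-----------+-----------+-----------+-----------+

+-----------+-----------+-----------+-----------+-----------+
          ebizzy (records/sec, higher is better)
+-----------+-----------+-----------+-----------+-----------+
          base       stdev     patched       stdev    %improve
+-----------+-----------+-----------+-----------+-----------+
1x   2421.9000     19.1801   5883.1000    112.7243   142.91259
+-----------+-----------+-----------+-----------+-----------+

+-----------+-----------+-----------+-----------+-----------+
          dbench (throughput MB/sec, higher is better)
+-----------+-----------+-----------+-----------+-----------+
          base       stdev     patched       stdev    %improve
+-----------+-----------+-----------+-----------+-----------+
1x  11675.9900    857.4154  14103.5000    215.8425    20.79061
+-----------+-----------+-----------+-----------+-----------+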


The numbers look pretty convincing, thanks. The workloads were
CPU bound most of the time, right?


Yes, CPU bound most of the time. I also used tmpfs to reduce I/O
overhead (for dbench).


[PATCH qom-cpu for-1.4?] kvm: Pass CPUState to kvm_on_sigbus_vcpu()

2013-01-25 Thread Andreas Färber
Since commit 20d695a9254c1b086a456d3b79a3c311236643ba (kvm: Pass
CPUState to kvm_arch_*) CPUArchState is no longer needed.

This allows changing the qemu_kvm_eat_signals() argument as well.

Signed-off-by: Andreas Färber 
---
 Extracted from my qom-cpu-8 queue.

 cpus.c   |8 
 include/sysemu/kvm.h |2 +-
 kvm-all.c|3 +--
 kvm-stub.c   |2 +-
 4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/cpus.c b/cpus.c
index a4390c3..41779eb 100644
--- a/cpus.c
+++ b/cpus.c
@@ -517,7 +517,7 @@ static void qemu_init_sigbus(void)
 prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
 }
 
-static void qemu_kvm_eat_signals(CPUArchState *env)
+static void qemu_kvm_eat_signals(CPUState *cpu)
 {
 struct timespec ts = { 0, 0 };
 siginfo_t siginfo;
@@ -538,7 +538,7 @@ static void qemu_kvm_eat_signals(CPUArchState *env)
 
 switch (r) {
 case SIGBUS:
-if (kvm_on_sigbus_vcpu(env, siginfo.si_code, siginfo.si_addr)) {
+if (kvm_on_sigbus_vcpu(cpu, siginfo.si_code, siginfo.si_addr)) {
 sigbus_reraise();
 }
 break;
@@ -560,7 +560,7 @@ static void qemu_init_sigbus(void)
 {
 }
 
-static void qemu_kvm_eat_signals(CPUArchState *env)
+static void qemu_kvm_eat_signals(CPUState *cpu)
 {
 }
 #endif /* !CONFIG_LINUX */
@@ -727,7 +727,7 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
 qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
 }
 
-qemu_kvm_eat_signals(env);
+qemu_kvm_eat_signals(cpu);
 qemu_wait_io_event_common(cpu);
 }
 
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 384ee66..6e6dfb3 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -159,7 +159,7 @@ int kvm_update_guest_debug(CPUArchState *env, unsigned long 
reinject_trap);
 int kvm_set_signal_mask(CPUArchState *env, const sigset_t *sigset);
 #endif
 
-int kvm_on_sigbus_vcpu(CPUArchState *env, int code, void *addr);
+int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
 int kvm_on_sigbus(int code, void *addr);
 
 /* internal API */
diff --git a/kvm-all.c b/kvm-all.c
index 363a358..04ec2d5 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2026,9 +2026,8 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, 
uint16_t val, bool assign)
 return 0;
 }
 
-int kvm_on_sigbus_vcpu(CPUArchState *env, int code, void *addr)
+int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr)
 {
-CPUState *cpu = ENV_GET_CPU(env);
 return kvm_arch_on_sigbus_vcpu(cpu, code, addr);
 }
 
diff --git a/kvm-stub.c b/kvm-stub.c
index 47f8dca..760aadc 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -112,7 +112,7 @@ int kvm_set_ioeventfd_mmio(int fd, uint32_t adr, uint32_t 
val, bool assign, uint
 return -ENOSYS;
 }
 
-int kvm_on_sigbus_vcpu(CPUArchState *env, int code, void *addr)
+int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr)
 {
 return 1;
 }
-- 
1.7.10.4



Re: [PATCH 2/3] s390/virtio-ccw: Fix setup_vq error handling.

2013-01-25 Thread Christian Borntraeger
On 25/01/13 15:34, Christian Borntraeger wrote:
> Signed-off-by: Cornelia Huck 

Sorry, I messed up the From.
Should I resend, or can you change the author to "Cornelia Huck"?

Christian



[PATCH 3/3] s390/kvm: Fix instruction decoding

2013-01-25 Thread Christian Borntraeger
Instructions with long displacement have a signed displacement.
Currently the sign bit is interpreted as 2^19: Let's fix it by doing the
sign extension from 20bit to 32bit and then using it as a signed variable
in the addition (see kvm_s390_get_base_disp_rsy).

Furthermore, there are lots of "int" in that code. This is problematic,
because shifting a signed integer is undefined/implementation defined
if the value happens to be negative.
Fortunately the promotion rules will make the right hand side unsigned
anyway, so there is no real problem right now.
Let's convert them anyway to unsigned where appropriate to avoid
problems if the code is changed or copy/pasted later on.

Signed-off-by: Christian Borntraeger 
Reviewed-by: Cornelia Huck 
---
 arch/s390/kvm/kvm-s390.h | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 3e05def..4d89d64 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -67,8 +67,8 @@ static inline void kvm_s390_set_prefix(struct kvm_vcpu *vcpu, 
u32 prefix)
 
 static inline u64 kvm_s390_get_base_disp_s(struct kvm_vcpu *vcpu)
 {
-   int base2 = vcpu->arch.sie_block->ipb >> 28;
-   int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16);
+   u32 base2 = vcpu->arch.sie_block->ipb >> 28;
+   u32 disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16);
 
return (base2 ? vcpu->run->s.regs.gprs[base2] : 0) + disp2;
 }
@@ -76,10 +76,10 @@ static inline u64 kvm_s390_get_base_disp_s(struct kvm_vcpu 
*vcpu)
 static inline void kvm_s390_get_base_disp_sse(struct kvm_vcpu *vcpu,
  u64 *address1, u64 *address2)
 {
-   int base1 = (vcpu->arch.sie_block->ipb & 0xf0000000) >> 28;
-   int disp1 = (vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16;
-   int base2 = (vcpu->arch.sie_block->ipb & 0xf000) >> 12;
-   int disp2 = vcpu->arch.sie_block->ipb & 0x0fff;
+   u32 base1 = (vcpu->arch.sie_block->ipb & 0xf0000000) >> 28;
+   u32 disp1 = (vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16;
+   u32 base2 = (vcpu->arch.sie_block->ipb & 0xf000) >> 12;
+   u32 disp2 = vcpu->arch.sie_block->ipb & 0x0fff;
 
*address1 = (base1 ? vcpu->run->s.regs.gprs[base1] : 0) + disp1;
*address2 = (base2 ? vcpu->run->s.regs.gprs[base2] : 0) + disp2;
@@ -87,17 +87,20 @@ static inline void kvm_s390_get_base_disp_sse(struct 
kvm_vcpu *vcpu,
 
 static inline u64 kvm_s390_get_base_disp_rsy(struct kvm_vcpu *vcpu)
 {
-   int base2 = vcpu->arch.sie_block->ipb >> 28;
-   int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16) +
+   u32 base2 = vcpu->arch.sie_block->ipb >> 28;
+   u32 disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16) +
((vcpu->arch.sie_block->ipb & 0xff00) << 4);
+   /* The displacement is a 20bit _SIGNED_ value */
+   if (disp2 & 0x80000)
+   disp2+=0xfff00000;
 
-   return (base2 ? vcpu->run->s.regs.gprs[base2] : 0) + disp2;
+   return (base2 ? vcpu->run->s.regs.gprs[base2] : 0) + (long)(int)disp2;
 }
 
 static inline u64 kvm_s390_get_base_disp_rs(struct kvm_vcpu *vcpu)
 {
-   int base2 = vcpu->arch.sie_block->ipb >> 28;
-   int disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16);
+   u32 base2 = vcpu->arch.sie_block->ipb >> 28;
+   u32 disp2 = ((vcpu->arch.sie_block->ipb & 0x0fff0000) >> 16);
 
return (base2 ? vcpu->run->s.regs.gprs[base2] : 0) + disp2;
 }
-- 
1.7.12.4
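
As a standalone illustration of the sign extension described above, here is a
minimal userspace sketch. It is an assumption-laden model, not the kernel
code: it assumes the low 12 displacement bits sit under mask 0x0fff0000 and
the high 8 bits under mask 0xff00 of a 32-bit ipb word, and the function
names (rsy_raw_disp, sign_extend20, base_plus_disp) are hypothetical. The
sign extension uses the xor-and-subtract idiom, which avoids converting an
out-of-range unsigned value to a signed type.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble the raw 20-bit displacement from its low and high parts. */
uint32_t rsy_raw_disp(uint32_t ipb)
{
    return ((ipb & 0x0fff0000u) >> 16) + ((ipb & 0xff00u) << 4);
}

/*
 * Sign-extend a 20-bit value to 32 bits. Flipping bit 19 and then
 * subtracting 2^19 maps 0x80000..0xfffff onto -524288..-1 and leaves
 * 0..0x7ffff unchanged.
 */
int32_t sign_extend20(uint32_t raw)
{
    assert(raw <= 0xfffffu);
    return (int32_t)(raw ^ 0x80000u) - 0x80000;
}

/* Add the signed displacement to a 64-bit base address. */
uint64_t base_plus_disp(uint64_t base, uint32_t ipb)
{
    return base + (uint64_t)(int64_t)sign_extend20(rsy_raw_disp(ipb));
}
```

For example, an ipb of 0x0fffff00 encodes a raw displacement of 0xfffff,
which is -1 after sign extension, so a base of 0x1000 yields 0xfff rather
than 0x100fff.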



[PATCH 1/3] s390/kvm: Fix store status for ACRS/FPRS

2013-01-25 Thread Christian Borntraeger
On store status we need to copy the current state of registers
into a save area. Currently we might save stale versions:
The sie state descriptor doesn't have fields for guest ACRS,FPRS,
those registers are simply stored in the host registers. The host
program must copy these away if needed. We do that in vcpu_put/load.

If we now do a store status in KVM code between vcpu_put/load, the
saved values are not up-to-date. Let's collect the ACRS/FPRS before
saving them.

This also fixes some strange problems with hotplug and virtio-ccw,
since the low level machine check handler (on hotplug a machine check
will happen) will revalidate all registers with the content of the
save area.

Signed-off-by: Christian Borntraeger 
CC: sta...@vger.kernel.org
---
 arch/s390/kvm/kvm-s390.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 5b01f09..4377d18 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -770,6 +770,14 @@ int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, 
unsigned long addr)
} else
prefix = 0;
 
+   /*
+* The guest FPRS and ACRS are in the host FPRS/ACRS due to the lazy
+* copying in vcpu load/put. Lets update our copies before we save
+* it into the save area
+*/
+   save_fp_regs(&vcpu->arch.guest_fpregs);
+   save_access_regs(vcpu->run->s.regs.acrs);
+
if (__guestcopy(vcpu, addr + offsetof(struct save_area, fp_regs),
vcpu->arch.guest_fpregs.fprs, 128, prefix))
return -EFAULT;
-- 
1.7.12.4
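
The stale-copy problem described in this patch can be modelled in a few lines
of userspace C. This is only a sketch under stated assumptions: the struct and
function names (lazy_vcpu_sketch, sync_saved_copy, store_status_sketch) are
hypothetical, and a plain array stands in for the registers that the real
code saves with save_fp_regs() and save_access_regs().

```c
#include <assert.h>
#include <string.h>

#define SKETCH_NUM_FPRS 16

/*
 * Hypothetical model of lazy register handling: while the vcpu is loaded,
 * the live guest values sit in the "hardware" registers and the saved copy
 * is refreshed only when the vcpu is put.
 */
struct lazy_vcpu_sketch {
    unsigned long long hw_fprs[SKETCH_NUM_FPRS];    /* live values */
    unsigned long long saved_fprs[SKETCH_NUM_FPRS]; /* possibly stale copy */
};

/* Refresh the saved copy from the live registers. */
void sync_saved_copy(struct lazy_vcpu_sketch *v)
{
    memcpy(v->saved_fprs, v->hw_fprs, sizeof(v->saved_fprs));
}

/*
 * Write the register state into a save area. With refresh_first == 0 this
 * models the behaviour before the fix (the saved copy is used as is); with
 * refresh_first != 0 it models the fix (collect the live values first).
 */
void store_status_sketch(struct lazy_vcpu_sketch *v,
                         unsigned long long *save_area, int refresh_first)
{
    if (refresh_first)
        sync_saved_copy(v);
    memcpy(save_area, v->saved_fprs, sizeof(v->saved_fprs));
}
```

If a live register changes after the last sync, the unrefreshed store writes
the old value into the save area, while the refreshed store writes the
current one.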



[PATCH 2/3] s390/virtio-ccw: Fix setup_vq error handling.

2013-01-25 Thread Christian Borntraeger
virtio_ccw_setup_vq() failed to unwind correctly on errors. In
particular, it failed to delete the virtqueue on errors, leading to
list corruption when virtio_ccw_del_vqs() iterated over a virtqueue
that had not been added to the vcdev's list.

Fix this by redoing the error unwinding in virtio_ccw_setup_vq(),
using a single path for all errors.

Signed-off-by: Cornelia Huck 
Reviewed-by: Christian Borntraeger 
Signed-off-by: Christian Borntraeger 
---
 drivers/s390/kvm/virtio_ccw.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 2edd94a..3217dfe 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -244,9 +244,9 @@ static struct virtqueue *virtio_ccw_setup_vq(struct 
virtio_device *vdev,
 {
struct virtio_ccw_device *vcdev = to_vc_device(vdev);
int err;
-   struct virtqueue *vq;
+   struct virtqueue *vq = NULL;
struct virtio_ccw_vq_info *info;
-   unsigned long size;
+   unsigned long size = 0; /* silence the compiler */
unsigned long flags;
 
/* Allocate queue. */
@@ -279,11 +279,8 @@ static struct virtqueue *virtio_ccw_setup_vq(struct 
virtio_device *vdev,
/* For now, we fail if we can't get the requested size. */
dev_warn(&vcdev->cdev->dev, "no vq\n");
err = -ENOMEM;
-   free_pages_exact(info->queue, size);
goto out_err;
}
-   info->vq = vq;
-   vq->priv = info;
 
/* Register it with the host. */
info->info_block->queue = (__u64)info->queue;
@@ -297,12 +294,12 @@ static struct virtqueue *virtio_ccw_setup_vq(struct 
virtio_device *vdev,
err = ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_SET_VQ | i);
if (err) {
dev_warn(&vcdev->cdev->dev, "SET_VQ failed\n");
-   free_pages_exact(info->queue, size);
-   info->vq = NULL;
-   vq->priv = NULL;
goto out_err;
}
 
+   info->vq = vq;
+   vq->priv = info;
+
/* Save it to our list. */
spin_lock_irqsave(&vcdev->lock, flags);
list_add(&info->node, &vcdev->virtqueues);
@@ -311,8 +308,13 @@ static struct virtqueue *virtio_ccw_setup_vq(struct 
virtio_device *vdev,
return vq;
 
 out_err:
-   if (info)
+   if (vq)
+   vring_del_virtqueue(vq);
+   if (info) {
+   if (info->queue)
+   free_pages_exact(info->queue, size);
kfree(info->info_block);
+   }
kfree(info);
return ERR_PTR(err);
 }
-- 
1.7.12.4
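
The single-exit unwinding pattern used by this patch can be sketched in
userspace C as follows. This is an illustration under assumptions, not the
driver code: malloc/free stand in for the kernel allocators, the fail_at
parameter is an artificial way to force a failure at a given step, and the
names vq_info_sketch, setup_sketch and teardown_sketch are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-queue bookkeeping structure. */
struct vq_info_sketch {
    void *info_block;
    void *queue;
};

/*
 * Allocate everything step by step; any failure jumps to one label that
 * releases whatever exists so far. fail_at forces step 1, 2 or 3 to fail
 * (0 means no forced failure).
 */
int setup_sketch(int fail_at, struct vq_info_sketch **out)
{
    struct vq_info_sketch *info = NULL;
    int err = -ENOMEM;

    if (fail_at == 1)
        goto out_err;
    info = calloc(1, sizeof(*info));    /* zeroed, so unset pointers are NULL */
    if (!info)
        goto out_err;

    if (fail_at == 2)
        goto out_err;
    info->info_block = malloc(16);
    if (!info->info_block)
        goto out_err;

    if (fail_at == 3)
        goto out_err;
    info->queue = malloc(64);
    if (!info->queue)
        goto out_err;

    *out = info;
    return 0;

out_err:
    /* Single unwind path: free(NULL) is a no-op, so partial state is fine. */
    if (info) {
        free(info->queue);
        free(info->info_block);
    }
    free(info);
    *out = NULL;
    return err;
}

/* Release a fully set-up object. */
void teardown_sketch(struct vq_info_sketch *info)
{
    if (!info)
        return;
    free(info->queue);
    free(info->info_block);
    free(info);
}
```

Because every failure funnels through out_err, a new allocation step only
needs one extra release line there instead of a cleanup copy at each error
site.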



[PATCH 0/3] s390/kvm fixes

2013-01-25 Thread Christian Borntraeger
Gleb, Marcelo,

here are 3 kvm fixes for kvm-next.

Christian Borntraeger (3):
  s390/kvm: Fix store status for ACRS/FPRS
  s390/virtio-ccw: Fix setup_vq error handling.
  s390/kvm: Fix instruction decoding

 arch/s390/kvm/kvm-s390.c  |  8 
 arch/s390/kvm/kvm-s390.h  | 25 ++---
 drivers/s390/kvm/virtio_ccw.c | 20 +++-
 3 files changed, 33 insertions(+), 20 deletions(-)

-- 
1.7.12.4



Re: [PATCH v6 00/11] s390: channel I/O support in qemu.

2013-01-25 Thread Anthony Liguori
Hi,

Thank you for submitting your patch series.  checkpatch.pl has
detected that one or more of the patches in this series violate
the QEMU coding style.

If you believe this message was sent in error, please ignore it
or respond here with an explanation.

Otherwise, please correct the coding style issues and resubmit a
new version of the patch.

For more information about QEMU coding style, see:

http://git.qemu.org/?p=qemu.git;a=blob_plain;f=CODING_STYLE;hb=HEAD

Here is the output from checkpatch.pl:

Subject: s390: Add s390-ccw-virtio machine.
Subject: s390: Add default support for SCLP console
ERROR: do not initialise statics to 0 or NULL
#72: FILE: vl.c:2468:
+static int index = 0;

WARNING: braces {} are necessary for all arms of this statement
#126: FILE: vl.c:3923:
+if (default_sclp)
[...]

WARNING: braces {} are necessary for all arms of this statement
#135: FILE: vl.c:3937:
+if (default_sclp)
[...]

WARNING: braces {} are necessary for all arms of this statement
#144: FILE: vl.c:4109:
+if (foreach_device_config(DEV_SCLP, sclp_parse) < 0)
[...]

total: 1 errors, 3 warnings, 114 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Subject: s390-virtio: Factor out some initialization code.
Subject: s390: Add new channel I/O based virtio transport.
Subject: s390: Wire up channel I/O in kvm.
Subject: s390: Virtual channel subsystem support.
ERROR: need consistent spacing around '*' (ctx:WxV)
#56: FILE: hw/s390x/css.c:31:
+SubchDev *sch[MAX_SCHID + 1];
  ^

ERROR: need consistent spacing around '*' (ctx:WxV)
#62: FILE: hw/s390x/css.c:37:
+SubchSet *sch_set[MAX_SSID + 1];
  ^

ERROR: need consistent spacing around '*' (ctx:WxV)
#74: FILE: hw/s390x/css.c:49:
+CssImage *css[MAX_CSSID + 1];
  ^

total: 3 errors, 0 warnings, 1469 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
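
For reference, a minimal sketch of the style points flagged above, assuming the usual QEMU conventions (no explicit zero initialiser on statics, braces on every arm). The names index_counter, next_index() and add_default_console() are made up for illustration and do not appear in the series. The '*' spacing errors on the pointer-array declarations may well be false positives, since checkpatch can misread a declaration that uses an unfamiliar typedef as a multiplication.

```c
/* Hypothetical names; only the style points mirror the checkpatch report. */

/* Statics are zero-initialised by the C standard, so "= 0" is redundant. */
static int index_counter;

/* Return the current counter value, then advance it. */
int next_index(void)
{
    return index_counter++;
}

/* QEMU style wants braces on every arm, even for a single statement. */
int add_default_console(int default_sclp)
{
    int added = 0;

    if (default_sclp) {
        added = 1;
    }
    return added;
}
```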

Subject: s390: Add channel I/O instructions.
Subject: s390: I/O interrupt and machine check injection.
Subject: s390: Channel I/O basic definitions.
Subject: s390: Add mapping helper functions.
Subject: s390: Lowcore mapping helper.


Regards,

Anthony Liguori



[PATCH] kvm tools: Beautify debug output

2013-01-25 Thread Asias He
1. Print memory debug info to debug_fd instead of the guest console
2. Always print page table info

Signed-off-by: Asias He 
---
 tools/kvm/include/kvm/kvm.h |  2 +-
 tools/kvm/kvm.c | 11 ++-
 tools/kvm/x86/kvm-cpu.c | 12 +---
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index acb0818..ad53ca7 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -114,7 +114,7 @@ bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd, const char *ker
 /*
  * Debugging
  */
-void kvm__dump_mem(struct kvm *kvm, unsigned long addr, unsigned long size);
+void kvm__dump_mem(struct kvm *kvm, unsigned long addr, unsigned long size, int debug_fd);
 
 extern const char *kvm_exit_reasons[];
 
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index a6b3c23..cfd30dd 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -444,7 +444,7 @@ int kvm_timer__exit(struct kvm *kvm)
 }
 firmware_exit(kvm_timer__exit);
 
-void kvm__dump_mem(struct kvm *kvm, unsigned long addr, unsigned long size)
+void kvm__dump_mem(struct kvm *kvm, unsigned long addr, unsigned long size, int debug_fd)
 {
unsigned char *p;
unsigned long n;
@@ -456,10 +456,11 @@ void kvm__dump_mem(struct kvm *kvm, unsigned long addr, unsigned long size)
p = guest_flat_to_host(kvm, addr);
 
for (n = 0; n < size; n += 8) {
-   if (!host_ptr_in_ram(kvm, p + n))
-   break;
-
-   printf("  0x%08lx: %02x %02x %02x %02x  %02x %02x %02x %02x\n",
+   if (!host_ptr_in_ram(kvm, p + n)) {
+   dprintf(debug_fd, " 0x%08lx: \n", addr + n);
+   continue;
+   }
+   dprintf(debug_fd, " 0x%08lx: %02x %02x %02x %02x  %02x %02x %02x %02x\n",
addr + n, p[n + 0], p[n + 1], p[n + 2], p[n + 3],
  p[n + 4], p[n + 5], p[n + 6], p[n + 7]);
}
diff --git a/tools/kvm/x86/kvm-cpu.c b/tools/kvm/x86/kvm-cpu.c
index b6190ed..5cc4e1e 100644
--- a/tools/kvm/x86/kvm-cpu.c
+++ b/tools/kvm/x86/kvm-cpu.c
@@ -364,7 +364,8 @@ void kvm_cpu__show_code(struct kvm_cpu *vcpu)
 
dprintf(debug_fd, "\n Stack:\n");
dprintf(debug_fd,   " --\n");
-   kvm__dump_mem(vcpu->kvm, vcpu->regs.rsp, 32);
+   dprintf(debug_fd, " rsp: [<%016lx>] \n", (unsigned long) vcpu->regs.rsp);
+   kvm__dump_mem(vcpu->kvm, vcpu->regs.rsp, 32, debug_fd);
 }
 
 void kvm_cpu__show_page_tables(struct kvm_cpu *vcpu)
@@ -374,8 +375,12 @@ void kvm_cpu__show_page_tables(struct kvm_cpu *vcpu)
u64 *pte3;
u64 *pte4;
 
-   if (!is_in_protected_mode(vcpu))
+   if (!is_in_protected_mode(vcpu)) {
+   dprintf(debug_fd, "\n Page Tables:\n");
+   dprintf(debug_fd, " --\n");
+   dprintf(debug_fd, " Not in protected mode\n");
return;
+   }
 
if (ioctl(vcpu->vcpu_fd, KVM_GET_SREGS, &vcpu->sregs) < 0)
die("KVM_GET_SREGS failed");
@@ -396,7 +401,8 @@ void kvm_cpu__show_page_tables(struct kvm_cpu *vcpu)
if (!host_ptr_in_ram(vcpu->kvm, pte1))
return;
 
-   dprintf(debug_fd, "Page Tables:\n");
+   dprintf(debug_fd, "\n Page Tables:\n");
+   dprintf(debug_fd, " --\n");
if (*pte2 & (1 << 7))
dprintf(debug_fd, " pte4: %016llx   pte3: %016llx"
"   pte2: %016llx\n",
-- 
1.8.1
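
The dump loop after this change can be sketched roughly as below. This is only an illustration under stated assumptions: dump_mem(), buf and buf_len are hypothetical stand-ins for kvm__dump_mem() and guest RAM, and the "(not in RAM)" text is a placeholder of my own, not the string the patch prints. The point it shows is that an out-of-range row is reported and skipped with continue rather than ending the dump, and that all output goes to the given file descriptor via dprintf().

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for kvm__dump_mem(): dump `size` bytes starting at
 * offset `addr` in rows of 8 to file descriptor `fd`. `buf`/`buf_len` model
 * guest RAM. Rows outside the buffer are reported and skipped.
 * Returns the number of rows that were fully printed.
 */
int dump_mem(int fd, const unsigned char *buf, size_t buf_len,
             unsigned long addr, unsigned long size)
{
    int rows = 0;
    unsigned long n;

    for (n = 0; n < size; n += 8) {
        unsigned long off = addr + n;
        const unsigned char *p;

        /* Row not fully backed by the buffer: note it and carry on. */
        if (off >= buf_len || buf_len - off < 8) {
            dprintf(fd, " 0x%08lx: (not in RAM)\n", off);
            continue;
        }

        p = buf + off;
        dprintf(fd, " 0x%08lx: %02x %02x %02x %02x  %02x %02x %02x %02x\n",
                off, p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7]);
        rows++;
    }
    return rows;
}
```

Calling dump_mem(1, buf, sizeof(buf), 0, 32) with a 16-byte buffer would print two data rows followed by two placeholder rows on stdout.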



Re: [PATCH v8 0/2] s390: virtio-ccw transport.

2013-01-25 Thread Alexander Graf

On 25.01.2013, at 13:37, Cornelia Huck wrote:

> On Thu, 24 Jan 2013 17:17:46 +0100
> Alexander Graf  wrote:
> 
>> 
>> On 24.01.2013, at 17:08, Cornelia Huck wrote:
>> 
>>> Hi,
>>> 
>>> patches against s390-next again, with coding style fixes.
>> 
>> Thanks, applied to s390-next.
> 
> Hm, did you forget to apply 2/2?

Oops. Fixed :)


Alex



Re: [PATCH 0/8] KVM: BOOKE/BOOKEHV : Added debug stub support

2013-01-25 Thread Alexander Graf

On 16.01.2013, at 09:20, Bharat Bhushan wrote:

> This patchset adds the QEMU debug stub support for powerpc (booke/bookehv).
> [1/8] KVM: PPC: booke: use vcpu reference from thread_struct
>   - This is a cleanup patch to use vcpu reference from thread struct
> [2/8] KVM: PPC: booke: Allow multiple exception types
> [3/8] KVM: PPC: booke: Added debug handler
>   - These two patches install the KVM debug handler.
> [4/8] Added ONE_REG interface for debug instruction
>   - Add the ioctl interface to get the debug instruction for
> setting software breakpoint from QEMU debug stub.
> [5/8] KVM: PPC: debug stub interface parameter defined
> [6/8] booke: Added DBCR4 SPR number
> [7/8] KVM: booke/bookehv: Add debug stub support
>   - Add the debug stub interface on booke/bookehv
> [8/8] KVM:PPC:booke: Allow debug interrupt injection to guest
>   -- with this qemu can inject debug interrupt to guest

Thanks, applied 1/8, 2/8, 6/8.


Alex



Re: [PATCH v8 0/2] s390: virtio-ccw transport.

2013-01-25 Thread Cornelia Huck
On Thu, 24 Jan 2013 17:17:46 +0100
Alexander Graf  wrote:

> 
> On 24.01.2013, at 17:08, Cornelia Huck wrote:
> 
> > Hi,
> > 
> > patches against s390-next again, with coding style fixes.
> 
> Thanks, applied to s390-next.

Hm, did you forget to apply 2/2?

> 
> 
> Alex
> 



Re: [PATCH 8/8] KVM:PPC:booke: Allow debug interrupt injection to guest

2013-01-25 Thread Alexander Graf

On 16.01.2013, at 09:24, Bharat Bhushan wrote:

> Allow userspace to inject debug interrupt to guest. QEMU can

s/QEMU/user space.

> inject the debug interrupt to guest if it is not able to handle
> the debug interrupt.
> 
> Signed-off-by: Bharat Bhushan 
> ---
> arch/powerpc/kvm/booke.c  |   32 +++-
> arch/powerpc/kvm/e500mc.c |   10 +-
> 2 files changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index faa0a0b..547797f 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -133,6 +133,13 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
> #endif
> }
> 
> +#ifdef CONFIG_KVM_BOOKE_HV
> +static int kvmppc_core_pending_debug(struct kvm_vcpu *vcpu)
> +{
> + return test_bit(BOOKE_IRQPRIO_DEBUG, &vcpu->arch.pending_exceptions);
> +}
> +#endif
> +
> /*
>  * Helper function for "full" MSR writes.  No need to call this if only
>  * EE/CE/ME/DE/RI are changing.
> @@ -144,7 +151,11 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
> #ifdef CONFIG_KVM_BOOKE_HV
>   new_msr |= MSR_GS;
> 
> - if (vcpu->guest_debug)
> + /*
> +  * Set MSR_DE if the hardware debug resources are owned by user-space
> +  * and there is no debug interrupt pending for guest to handle.

Why? And why is this whole thing only executed on HV?


Alex

> +  */
> + if (vcpu->guest_debug && !kvmppc_core_pending_debug(vcpu))
>   new_msr |= MSR_DE;
> #endif
> 
> @@ -234,6 +245,16 @@ static void kvmppc_core_dequeue_watchdog(struct kvm_vcpu *vcpu)
>   clear_bit(BOOKE_IRQPRIO_WATCHDOG, &vcpu->arch.pending_exceptions);
> }
> 
> +static void kvmppc_core_queue_debug(struct kvm_vcpu *vcpu)
> +{
> + kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DEBUG);
> +}
> +
> +static void kvmppc_core_dequeue_debug(struct kvm_vcpu *vcpu)
> +{
> + clear_bit(BOOKE_IRQPRIO_DEBUG, &vcpu->arch.pending_exceptions);
> +}
> +
> static void set_guest_srr(struct kvm_vcpu *vcpu, unsigned long srr0, u32 srr1)
> {
> #ifdef CONFIG_KVM_BOOKE_HV
> @@ -1278,6 +1299,7 @@ static void get_sregs_base(struct kvm_vcpu *vcpu,
>   sregs->u.e.dec = kvmppc_get_dec(vcpu, tb);
>   sregs->u.e.tb = tb;
>   sregs->u.e.vrsave = vcpu->arch.vrsave;
> + sregs->u.e.dbsr = vcpu->arch.dbsr;
> }
> 
> static int set_sregs_base(struct kvm_vcpu *vcpu,
> @@ -1310,6 +1332,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
>   update_timer_ints(vcpu);
>   }
> 
> + if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_DBSR) {
> + vcpu->arch.dbsr = sregs->u.e.dbsr;
> + if (vcpu->arch.dbsr)
> + kvmppc_core_queue_debug(vcpu);
> + else
> + kvmppc_core_dequeue_debug(vcpu);
> + }
> +
>   return 0;
> }
> 
> diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
> index 81abe92..7d90622 100644
> --- a/arch/powerpc/kvm/e500mc.c
> +++ b/arch/powerpc/kvm/e500mc.c
> @@ -208,7 +208,7 @@ void kvmppc_core_get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>   struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
> 
>   sregs->u.e.features |= KVM_SREGS_E_ARCH206_MMU | KVM_SREGS_E_PM |
> -KVM_SREGS_E_PC;
> +KVM_SREGS_E_PC | KVM_SREGS_E_ED;
>   sregs->u.e.impl_id = KVM_SREGS_E_IMPL_FSL;
> 
>   sregs->u.e.impl.fsl.features = 0;
> @@ -216,6 +216,9 @@ void kvmppc_core_get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>   sregs->u.e.impl.fsl.hid0 = vcpu_e500->hid0;
>   sregs->u.e.impl.fsl.mcar = vcpu_e500->mcar;
> 
> + sregs->u.e.dsrr0 = vcpu->arch.dsrr0;
> + sregs->u.e.dsrr1 = vcpu->arch.dsrr1;
> +
>   kvmppc_get_sregs_e500_tlb(vcpu, sregs);
> 
>   sregs->u.e.ivor_high[3] =
> @@ -256,6 +259,11 @@ int kvmppc_core_set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>   sregs->u.e.ivor_high[5];
>   }
> 
> + if (sregs->u.e.features & KVM_SREGS_E_ED) {
> + vcpu->arch.dsrr0 = sregs->u.e.dsrr0;
> + vcpu->arch.dsrr1 = sregs->u.e.dsrr1;
> + }
> +
>   return kvmppc_set_sregs_ivor(vcpu, sregs);
> }
> 
> -- 
> 1.7.0.4
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 7/8] KVM: PPC: booke/bookehv: Add debug stub support

2013-01-25 Thread Alexander Graf

On 16.01.2013, at 09:24, Bharat Bhushan wrote:

> This patch adds the debug stub support on booke/bookehv.
> Now QEMU debug stub can use hw breakpoint, watchpoint and
> software breakpoint to debug guest.
> 
> Signed-off-by: Bharat Bhushan 
> ---
> arch/powerpc/include/asm/kvm_host.h   |5 +
> arch/powerpc/include/asm/kvm_ppc.h|2 +
> arch/powerpc/include/uapi/asm/kvm.h   |   22 -
> arch/powerpc/kernel/asm-offsets.c |   26 ++
> arch/powerpc/kvm/booke.c  |  124 +
> arch/powerpc/kvm/booke_interrupts.S   |  114 ++
> arch/powerpc/kvm/bookehv_interrupts.S |  145 -
> arch/powerpc/kvm/e500_emulate.c   |6 ++
> arch/powerpc/kvm/e500mc.c |3 +-
> 9 files changed, 422 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index f4ba881..a9feeb0 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -504,7 +504,12 @@ struct kvm_vcpu_arch {
>   u32 mmucfg;
>   u32 epr;
>   u32 crit_save;
> + /* guest debug registers*/
>   struct kvmppc_booke_debug_reg dbg_reg;
> + /* shadow debug registers */
> + struct kvmppc_booke_debug_reg shadow_dbg_reg;
> + /* host debug registers*/
> + struct kvmppc_booke_debug_reg host_dbg_reg;
> #endif
>   gpa_t paddr_accessed;
>   gva_t vaddr_accessed;
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index b3c481e..e4b3398 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -45,6 +45,8 @@ enum emulation_result {
>   EMULATE_FAIL, /* can't emulate this instruction */
>   EMULATE_AGAIN,/* something went wrong. go again */
>   EMULATE_DO_PAPR,  /* kvm_run filled with PAPR request */
> + EMULATE_DEBUG_INST,   /* debug instruction for software
> +  breakpoint, exit to userspace */

Does this do something different from DO_PAPR? Maybe it makes sense to have an 
exit code EMULATE_EXIT_USER?

> };
> 
> extern int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index e8842ed..a81ab29 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -25,6 +25,7 @@
> /* Select powerpc specific features in <linux/kvm.h> */
> #define __KVM_HAVE_SPAPR_TCE
> #define __KVM_HAVE_PPC_SMT
> +#define __KVM_HAVE_GUEST_DEBUG
> 
> struct kvm_regs {
>   __u64 pc;
> @@ -267,7 +268,24 @@ struct kvm_fpu {
>   __u64 fpr[32];
> };
> 
> +/*
> + * Defines for h/w breakpoint, watchpoint (read, write or both) and
> + * software breakpoint.
> + * These are used as "type" in KVM_SET_GUEST_DEBUG ioctl and "status"
> + * for KVM_DEBUG_EXIT.
> + */
> +#define KVMPPC_DEBUG_NONE0x0
> +#define KVMPPC_DEBUG_BREAKPOINT  (1UL << 1)
> +#define KVMPPC_DEBUG_WATCH_WRITE (1UL << 2)
> +#define KVMPPC_DEBUG_WATCH_READ  (1UL << 3)
> struct kvm_debug_exit_arch {
> + __u64 address;
> + /*
> +  * exiting to userspace because of h/w breakpoint, watchpoint
> +  * (read, write or both) and software breakpoint.
> +  */
> + __u32 status;
> + __u32 reserved;
> };
> 
> /* for KVM_SET_GUEST_DEBUG */
> @@ -279,10 +297,6 @@ struct kvm_guest_debug_arch {
>* Type denotes h/w breakpoint, read watchpoint, write
>* watchpoint or watchpoint (both read and write).
>*/
> -#define KVMPPC_DEBUG_NOTYPE  0x0
> -#define KVMPPC_DEBUG_BREAKPOINT  (1UL << 1)
> -#define KVMPPC_DEBUG_WATCH_WRITE (1UL << 2)
> -#define KVMPPC_DEBUG_WATCH_READ  (1UL << 3)
>   __u32 type;
>   __u32 reserved;
>   } bp[16];
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 02048f3..22deda7 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -563,6 +563,32 @@ int main(void)
>   DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
>   DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
>   DEFINE(VCPU_CRIT_SAVE, offsetof(struct kvm_vcpu, arch.crit_save));
> + DEFINE(VCPU_DBSR, offsetof(struct kvm_vcpu, arch.dbsr));
> + DEFINE(VCPU_SHADOW_DBG, offsetof(struct kvm_vcpu, arch.shadow_dbg_reg));
> + DEFINE(VCPU_HOST_DBG, offsetof(struct kvm_vcpu, arch.host_dbg_reg));
> + DEFINE(KVMPPC_DBG_DBCR0, offsetof(struct kvmppc_booke_debug_reg,
> +   dbcr0));
> + DEFINE(KVMPPC_DBG_DBCR1, offsetof(struct kvmppc_booke_debug_reg,
> +   dbcr1));
> + DEFINE(KVMPPC_DBG_DBCR2, offsetof(struct kvmppc_booke_debug_reg,
> +  

Re: [PATCH 5/8] KVM: PPC: debug stub interface parameter defined

2013-01-25 Thread Alexander Graf

On 17.01.2013, at 12:11, Bhushan Bharat-R65777 wrote:

> 
> 
>> -Original Message-
>> From: Paul Mackerras [mailto:pau...@samba.org]
>> Sent: Thursday, January 17, 2013 12:53 PM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Bhushan Bharat-R65777
>> Subject: Re: [PATCH 5/8] KVM: PPC: debug stub interface parameter defined
>> 
>> On Wed, Jan 16, 2013 at 01:54:42PM +0530, Bharat Bhushan wrote:
>>> This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
>>> ioctl support. Follow up patches will use this for setting up hardware
>>> breakpoints, watchpoints and software breakpoints.
>> 
>> [snip]
>> 
>>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>>> index 453a10f..7d5a51c 100644
>>> --- a/arch/powerpc/kvm/booke.c
>>> +++ b/arch/powerpc/kvm/booke.c
>>> @@ -1483,6 +1483,12 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
>>> return r;
>>> }
>>> 
>>> +int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>>> +struct kvm_guest_debug *dbg)
>>> +{
>>> +   return -EINVAL;
>>> +}
>>> +
>>> int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
>>> {
>>> return -ENOTSUPP;
>>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>>> index 934413c..4c94ca9 100644
>>> --- a/arch/powerpc/kvm/powerpc.c
>>> +++ b/arch/powerpc/kvm/powerpc.c
>>> @@ -532,12 +532,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>> #endif
>>> }
>>> 
>>> -int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>>> -struct kvm_guest_debug *dbg)
>>> -{
>>> -   return -EINVAL;
>>> -}
>>> -
>> 
>> This will break the build for non-book E machines, since
>> kvm_arch_vcpu_ioctl_set_guest_debug() is referenced from generic code.
>> You need to add it to arch/powerpc/kvm/book3s.c as well.
> 
> right,  I will correct this.

Would the implementation actually be different on booke vs book3s? My feeling 
is that powerpc.c is actually the right place for this.


Alex



Re: [PATCH 4/8] Added ONE_REG interface for debug instruction

2013-01-25 Thread Alexander Graf

On 16.01.2013, at 09:24, Bharat Bhushan wrote:

> This patch adds the one_reg interface to get the special instruction
> to be used for setting software breakpoint from userspace.
> 
> Signed-off-by: Bharat Bhushan 
> ---
> Documentation/virtual/kvm/api.txt   |1 +
> arch/powerpc/include/asm/kvm_ppc.h  |1 +
> arch/powerpc/include/uapi/asm/kvm.h |3 +++
> arch/powerpc/kvm/44x.c  |5 +
> arch/powerpc/kvm/booke.c|   10 ++
> arch/powerpc/kvm/e500.c |5 +
> arch/powerpc/kvm/e500.h |9 +
> arch/powerpc/kvm/e500mc.c   |5 +
> 8 files changed, 39 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 09905cb..7e8be9e 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1775,6 +1775,7 @@ registers, find a list below:
>   PPC   | KVM_REG_PPC_VPA_DTL   | 128
>   PPC   | KVM_REG_PPC_EPCR| 32
>   PPC   | KVM_REG_PPC_EPR | 32
> +  PPC   | KVM_REG_PPC_DEBUG_INST| 32
> 
> 4.69 KVM_GET_ONE_REG
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 44a657a..b3c481e 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -235,6 +235,7 @@ union kvmppc_one_reg {
> 
> void kvmppc_core_get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
> int kvmppc_core_set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
> +u32 kvmppc_core_debug_inst_op(void);
> 
> void kvmppc_get_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
> int kvmppc_set_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index 16064d0..e81ae5b 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -417,4 +417,7 @@ struct kvm_get_htab_header {
> #define KVM_REG_PPC_EPCR  (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x85)
> #define KVM_REG_PPC_EPR   (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x86)
> 
> +/* Debugging: Special instruction for software breakpoint */
> +#define KVM_REG_PPC_DEBUG_INST (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x87)
> +
> #endif /* __LINUX_KVM_POWERPC_H */
> diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c
> index 3d7fd21..41501be 100644
> --- a/arch/powerpc/kvm/44x.c
> +++ b/arch/powerpc/kvm/44x.c
> @@ -114,6 +114,11 @@ int kvmppc_core_vcpu_translate(struct kvm_vcpu *vcpu,
>   return 0;
> }
> 
> +u32 kvmppc_core_debug_inst_op(void)
> +{
> + return -1;
> +}
> +
> void kvmppc_core_get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
> {
>   kvmppc_get_sregs_ivor(vcpu, sregs);
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index d2f502d..453a10f 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c

Please provide the DEBUG_INST on a more global level - across all ppc subarchs.

> @@ -1424,6 +1424,12 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
>   r = put_user(vcpu->arch.epcr, (u32 __user *)(long)reg->addr);
>   break;
> #endif
> + case KVM_REG_PPC_DEBUG_INST: {
> + u32 opcode = kvmppc_core_debug_inst_op();
> + r = copy_to_user((u32 __user *)(long)reg->addr,
> +  &opcode, sizeof(u32));
> + break;
> + }
>   default:
>   break;
>   }
> @@ -1467,6 +1473,10 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
>   break;
>   }
> #endif
> + case KVM_REG_PPC_DEBUG_INST:
> + /* This is read only, so write to this is nop*/
> + r = 0;
> + break;

Just don't support set_one_reg on this reg.
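
Roughly what I have in mind, as a toy sketch rather than the real ONE_REG code: the getter knows the register, the setter simply has no case for it, so a write fails instead of being silently ignored. toy_get_one_reg(), toy_set_one_reg() and TOY_REG_DEBUG_INST are hypothetical names, and using -EINVAL for an unknown register is my assumption here.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical ID; 0x87 is just the index used in the quoted patch. */
#define TOY_REG_DEBUG_INST 0x87u

/* Getter: hands back the software breakpoint opcode for the known ID. */
int toy_get_one_reg(uint32_t id, uint32_t debug_opcode, uint32_t *out)
{
    switch (id) {
    case TOY_REG_DEBUG_INST:
        *out = debug_opcode;
        return 0;
    default:
        return -EINVAL;
    }
}

/* Setter: the read-only register is not handled, so every write fails. */
int toy_set_one_reg(uint32_t id, uint32_t value)
{
    (void)id;
    (void)value;
    return -EINVAL;
}
```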

>   default:
>   break;
>   }
> diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
> index 6dd4de7..d8a5e8e 100644
> --- a/arch/powerpc/kvm/e500.c
> +++ b/arch/powerpc/kvm/e500.c
> @@ -367,6 +367,11 @@ int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu)
>   return 0;
> }
> 
> +u32 kvmppc_core_debug_inst_op(void)
> +{
> + return KVMPPC_INST_GUEST_GDB;
> +}
> +
> void kvmppc_core_get_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
> {
>   struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
> diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> index c70d37e..17942d2 100644
> --- a/arch/powerpc/kvm/e500.h
> +++ b/arch/powerpc/kvm/e500.h
> @@ -302,4 +302,13 @@ static inline unsigned int get_tlbmiss_tid(struct kvm_vcpu *vcpu)
> #define get_tlb_sts(gtlbe)  (MAS1_TS)
> #endif /* !BOOKE_HV */
> 
> +/* When setting software breakpoint, Change the software breakpoint
> + * instruction to special trap/invalid instruction and set
> + * KVM_GUESTDBG_USE_SW_BP flag in kvm_guest_debug->control. KVM does
> + * keep track of softwa

Re: [PATCH 3/8] KVM: PPC: booke: Added debug handler

2013-01-25 Thread Alexander Graf

On 16.01.2013, at 09:24, Bharat Bhushan wrote:

> From: Bharat Bhushan 
> 
> Installed debug handler will be used for guest debug support
> and debug facility emulation features (patches for these
> features will follow this patch).
> 
> Signed-off-by: Liu Yu 
> [bharat.bhus...@freescale.com: Substantial changes]
> Signed-off-by: Bharat Bhushan 
> ---
> arch/powerpc/include/asm/kvm_host.h |1 +
> arch/powerpc/kernel/asm-offsets.c   |1 +
> arch/powerpc/kvm/booke_interrupts.S |   49 ++-
> 3 files changed, 44 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 8a72d59..f4ba881 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -503,6 +503,7 @@ struct kvm_vcpu_arch {
>   u32 tlbcfg[4];
>   u32 mmucfg;
>   u32 epr;
> + u32 crit_save;
>   struct kvmppc_booke_debug_reg dbg_reg;
> #endif
>   gpa_t paddr_accessed;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 46f6afd..02048f3 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -562,6 +562,7 @@ int main(void)
>   DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
>   DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
>   DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
> + DEFINE(VCPU_CRIT_SAVE, offsetof(struct kvm_vcpu, arch.crit_save));
> #endif /* CONFIG_PPC_BOOK3S */
> #endif /* CONFIG_KVM */
> 
> diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S
> index eae8483..dd9c5d4 100644
> --- a/arch/powerpc/kvm/booke_interrupts.S
> +++ b/arch/powerpc/kvm/booke_interrupts.S
> @@ -52,12 +52,7 @@
>(1<(1< 
> -.macro KVM_HANDLER ivor_nr scratch srr0
> -_GLOBAL(kvmppc_handler_\ivor_nr)
> - /* Get pointer to vcpu and record exit number. */
> - mtspr   \scratch , r4
> - mfspr   r4, SPRN_SPRG_THREAD
> - lwz r4, THREAD_KVM_VCPU(r4)
> +.macro __KVM_HANDLER ivor_nr scratch srr0
>   stw r3, VCPU_GPR(R3)(r4)
>   stw r5, VCPU_GPR(R5)(r4)
>   stw r6, VCPU_GPR(R6)(r4)
> @@ -74,6 +69,46 @@ _GLOBAL(kvmppc_handler_\ivor_nr)
>   bctr
> .endm
> 
> +.macro KVM_HANDLER ivor_nr scratch srr0
> +_GLOBAL(kvmppc_handler_\ivor_nr)
> + /* Get pointer to vcpu and record exit number. */
> + mtspr   \scratch , r4
> + mfspr   r4, SPRN_SPRG_THREAD
> + lwz r4, THREAD_KVM_VCPU(r4)
> + __KVM_HANDLER \ivor_nr \scratch \srr0
> +.endm
> +
> +.macro KVM_DBG_HANDLER ivor_nr scratch srr0
> +_GLOBAL(kvmppc_handler_\ivor_nr)
> + mtspr   \scratch, r4
> + mfspr   r4, SPRN_SPRG_THREAD
> + lwz r4, THREAD_KVM_VCPU(r4)
> + stw r3, VCPU_CRIT_SAVE(r4)
> + mfcr    r3
> + mfspr   r4, SPRN_CSRR1
> + andi.   r4, r4, MSR_PR
> + bne 1f


> + /* debug interrupt happened in enter/exit path */
> + mfspr   r4, SPRN_CSRR1
> + rlwinm  r4, r4, 0, ~MSR_DE
> + mtspr   SPRN_CSRR1, r4
> + lis r4, 0x
> + ori r4, r4, 0x
> + mtspr   SPRN_DBSR, r4
> + mfspr   r4, SPRN_SPRG_THREAD
> + lwz r4, THREAD_KVM_VCPU(r4)
> + mtcr    r3
> + lwz r3, VCPU_CRIT_SAVE(r4)
> + mfspr   r4, \scratch
> + rfci

What is this part doing? Try to ignore the debug exit? Why would we have MSR_DE 
enabled in the first place when we can't handle it?

> +1:   /* debug interrupt happened in guest */
> + mtcr    r3
> + mfspr   r4, SPRN_SPRG_THREAD
> + lwz r4, THREAD_KVM_VCPU(r4)
> + lwz r3, VCPU_CRIT_SAVE(r4)
> + __KVM_HANDLER \ivor_nr \scratch \srr0

I don't think you need the __KVM_HANDLER split. This should be quite easily 
refactorable into a simple DBG prolog.


Alex

> +.endm
> +
> .macro KVM_HANDLER_ADDR ivor_nr
>   .long   kvmppc_handler_\ivor_nr
> .endm
> @@ -98,7 +133,7 @@ KVM_HANDLER BOOKE_INTERRUPT_FIT SPRN_SPRG_RSCRATCH0 SPRN_SRR0
> KVM_HANDLER BOOKE_INTERRUPT_WATCHDOG SPRN_SPRG_RSCRATCH_CRIT SPRN_CSRR0
> KVM_HANDLER BOOKE_INTERRUPT_DTLB_MISS SPRN_SPRG_RSCRATCH0 SPRN_SRR0
> KVM_HANDLER BOOKE_INTERRUPT_ITLB_MISS SPRN_SPRG_RSCRATCH0 SPRN_SRR0
> -KVM_HANDLER BOOKE_INTERRUPT_DEBUG SPRN_SPRG_RSCRATCH_CRIT SPRN_CSRR0
> +KVM_DBG_HANDLER BOOKE_INTERRUPT_DEBUG SPRN_SPRG_RSCRATCH_CRIT SPRN_CSRR0
> KVM_HANDLER BOOKE_INTERRUPT_SPE_UNAVAIL SPRN_SPRG_RSCRATCH0 SPRN_SRR0
> KVM_HANDLER BOOKE_INTERRUPT_SPE_FP_DATA SPRN_SPRG_RSCRATCH0 SPRN_SRR0
> KVM_HANDLER BOOKE_INTERRUPT_SPE_FP_ROUND SPRN_SPRG_RSCRATCH0 SPRN_SRR0
> -- 
> 1.7.0.4
> 
> 



[PATCH] kvm tools: Fix SDL and VNC by setting vidmode correctly

2013-01-25 Thread Asias He
Since commit dfefbe9d4894efc44c39b2041bd667d0dea43eca
("kvm tools: allow arch's to provide their own command-line options"),
vidmode is not set up correctly.

Signed-off-by: Asias He 
---
 tools/kvm/x86/kvm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/x86/kvm.c b/tools/kvm/x86/kvm.c
index 687e6b7..2ba1db0 100644
--- a/tools/kvm/x86/kvm.c
+++ b/tools/kvm/x86/kvm.c
@@ -294,13 +294,13 @@ bool load_bzimage(struct kvm *kvm, int fd_kernel, int fd_initrd,
memcpy(p, kernel_cmdline, cmdline_size - 1);
}
 
-   if (!kvm->cfg.arch.vidmode)
-   vidmode = -1;
 
/* vidmode should be either specified or set by default */
if (kvm->cfg.vnc || kvm->cfg.sdl) {
-   if (vidmode == -1)
+   if (!kvm->cfg.arch.vidmode)
vidmode = 0x312;
+   else
+   vidmode = kvm->cfg.arch.vidmode;
} else {
vidmode = 0;
}
-- 
1.8.1
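
The selection logic as the patched hunk reads can be sketched as follows. This is an illustration only: select_vidmode() and DEFAULT_VIDMODE are hypothetical names, and the real code works on kvm->cfg directly. A user-supplied mode wins, 0x312 is the fallback when a VNC or SDL display is requested, and 0 is used when there is no display.

```c
/* Fallback mode taken from the hunk above; the macro name is made up. */
#define DEFAULT_VIDMODE 0x312u

/*
 * Hypothetical helper mirroring the patched branch in load_bzimage().
 * has_display: non-zero if VNC or SDL was requested.
 * requested:   mode given on the command line, 0 if none was given.
 */
unsigned int select_vidmode(int has_display, unsigned int requested)
{
    if (!has_display)
        return 0;

    return requested ? requested : DEFAULT_VIDMODE;
}
```

For example, select_vidmode(1, 0) yields the 0x312 default, while select_vidmode(1, 0x317) keeps the requested mode.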



Re: [PATCH V3 RESEND RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2013-01-25 Thread Andrew Jones
On Fri, Jan 25, 2013 at 04:10:25PM +0530, Raghavendra K T wrote:
> * Ingo Molnar  [2013-01-24 11:32:13]:
> 
> > 
> > * Raghavendra K T  wrote:
> > 
> > > From: Peter Zijlstra 
> > > 
> > > > In undercommitted scenarios, especially in large guests, yield_to
> > > > overhead is significantly high. When the run queue length of both
> > > > source and target is one, take the opportunity to bail out and return
> > > > -ESRCH. This return condition can be further exploited to quickly come
> > > > out of the PLE handler.
> > > > 
> > > > (History: Raghavendra initially worked on breaking out of the kvm ple
> > > >  handler upon seeing source runqueue length = 1, but it had to export
> > > >  the rq length. Peter came up with the elegant idea of returning -ESRCH
> > > >  in the scheduler core.)
> > > > 
> > > > Signed-off-by: Peter Zijlstra 
> > > > Raghavendra: Checking the rq length of target vcpu condition added. (thanks Avi)
> > > Reviewed-by: Srikar Dronamraju 
> > > Signed-off-by: Raghavendra K T 
> > > Acked-by: Andrew Jones 
> > > Tested-by: Chegu Vinod 
> > > ---
> > > 
> > >  kernel/sched/core.c |   25 +++--
> > >  1 file changed, 19 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 2d8927f..fc219a5 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
> > >   * It's the caller's job to ensure that the target task struct
> > >   * can't go away on us before we can do any checks.
> > >   *
> > > - * Returns true if we indeed boosted the target task.
> > > + * Returns:
> > > + *   true (>0) if we indeed boosted the target task.
> > > + *   false (0) if we failed to boost the target.
> > > + *   -ESRCH if there's no task to yield to.
> > >   */
> > >  bool __sched yield_to(struct task_struct *p, bool preempt)
> > >  {
> > > @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool 
> > > preempt)
> > >  
> > >  again:
> > >   p_rq = task_rq(p);
> > > + /*
> > > +  * If we're the only runnable task on the rq and target rq also
> > > +  * has only one task, there's absolutely no point in yielding.
> > > +  */
> > > + if (rq->nr_running == 1 && p_rq->nr_running == 1) {
> > > + yielded = -ESRCH;
> > > + goto out_irq;
> > > + }
> > 
> > Looks good to me in principle.
> > 
> > Would be nice to get more consistent benchmark numbers. Once 
> > those are unambiguously showing that this is a win:
> > 
> >   Acked-by: Ingo Molnar 
> >
> 
> I ran the tests with kernbench and sysbench again on a 32-core mx3850
> machine with 32-vcpu guests. The results show definite improvements.
> 
> ebizzy and dbench show similar improvement for 1x overcommit
> (note that the stdev for 1x in dbench is lower, and the improvement is
> now seen at only 20%).
> 
> [ All results are averages over 8 runs. ]
> 
> The patches benefit large-guest undercommit scenarios, so I believe
> the performance improvement is even more significant with larger
> guests. [ Chegu Vinod's results show performance close to the no-ple
> case. ]

The last results you posted for dbench for the patched 1x case were
showing much better throughput than the no-ple 1x case, which is what
was strange. Is that still happening? You don't have the no-ple 1x
data here this time. The percent errors look a lot better.

> Unfortunately I do not have a machine to test larger guests (>32).
> 
> Ingo, Please let me know if this is okay to you.
> 
> base kernel = 3.8.0-rc4
> 
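> +-----------+-----------+-----------+-----------+-----------+
>            kernbench (time in sec, lower is better)
> +-----------+-----------+-----------+-----------+-----------+
>         base       stdev     patched       stdev    %improve
> +-----------+-----------+-----------+-----------+-----------+
> 1x   46.6028      1.8672     42.4494      1.1390     8.91234
> 2x   99.9074      9.1859     90.4050      2.6131     9.51121
> +-----------+-----------+-----------+-----------+-----------+
> +-----------+-----------+-----------+-----------+-----------+
>            sysbench (time in sec, lower is better)
> +-----------+-----------+-----------+-----------+-----------+
>         base       stdev     patched       stdev    %improve
> +-----------+-----------+-----------+-----------+-----------+
> 1x   18.7402      0.3764     17.7431      0.3589     5.32065
> 2x   13.2238      0.1935     13.0096      0.3152     1.61981
> +-----------+-----------+-----------+-----------+-----------+
> 
> +-----------+-----------+-----------+-----------+-----------+
>            ebizzy (records/sec, higher is better)
> +-----------+-----------+-----------+-----------+-----------+
>         base       stdev     patched       stdev    %improve
> +-----------+-----------+-----------+-----------+-----------+
> 1x 2421.9000     19.1801   5883.1000    112.7243   142.91259
> +-----------+-----------+-----------+-----------+-----------+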
> 
> +---+---+-

Re: [PATCH V3 RESEND RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2013-01-25 Thread Ingo Molnar

* Raghavendra K T  wrote:

> * Ingo Molnar  [2013-01-24 11:32:13]:
> 
> > 
> > * Raghavendra K T  wrote:
> > 
> > > From: Peter Zijlstra 
> > > 
> > > In undercommitted scenarios, especially in large guests, yield_to
> > > overhead is significantly high. When the run queue length of both
> > > source and target is one, take the opportunity to bail out and return
> > > -ESRCH. This return condition can be further exploited to quickly come
> > > out of the PLE handler.
> > > 
> > > (History: Raghavendra initially worked on breaking out of the kvm ple
> > >  handler upon seeing source runqueue length = 1, but it had to export
> > >  rq length). Peter came up with the elegant idea of returning -ESRCH
> > >  in the scheduler core.
> > > 
> > > Signed-off-by: Peter Zijlstra 
> > > Raghavendra, Checking the rq length of target vcpu condition added. (thanks Avi)
> > > Reviewed-by: Srikar Dronamraju 
> > > Signed-off-by: Raghavendra K T 
> > > Acked-by: Andrew Jones 
> > > Tested-by: Chegu Vinod 
> > > ---
> > > 
> > >  kernel/sched/core.c |   25 +++--
> > >  1 file changed, 19 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 2d8927f..fc219a5 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
> > >   * It's the caller's job to ensure that the target task struct
> > >   * can't go away on us before we can do any checks.
> > >   *
> > > - * Returns true if we indeed boosted the target task.
> > > + * Returns:
> > > + *   true (>0) if we indeed boosted the target task.
> > > + *   false (0) if we failed to boost the target.
> > > + *   -ESRCH if there's no task to yield to.
> > >   */
> > >  bool __sched yield_to(struct task_struct *p, bool preempt)
> > >  {
> > > @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
> > >  
> > >  again:
> > >   p_rq = task_rq(p);
> > > + /*
> > > +  * If we're the only runnable task on the rq and target rq also
> > > +  * has only one task, there's absolutely no point in yielding.
> > > +  */
> > > + if (rq->nr_running == 1 && p_rq->nr_running == 1) {
> > > + yielded = -ESRCH;
> > > + goto out_irq;
> > > + }
> > 
> > Looks good to me in principle.
> > 
> > Would be nice to get more consistent benchmark numbers. Once 
> > those are unambiguously showing that this is a win:
> > 
> >   Acked-by: Ingo Molnar 
> >
> 
> I ran the tests with kernbench and sysbench again on a 32 core mx3850
> machine with 32 vcpu guests. The results show definite improvements.
> 
> ebizzy and dbench show similar improvement for 1x overcommit
> (note that the stdev for 1x in dbench is lower; the improvement is now
> seen at only 20%).
> 
> [ all the experiments are averages over 8 runs ].
> 
> The patches benefit large guest undercommit scenarios, so I believe the
> performance improvement is even more significant with larger guests.
> [ Chegu Vinod's results show performance close to the no-ple case ].
> Unfortunately I do not have a machine to test larger guests (>32).
> 
> Ingo, Please let me know if this is okay to you.
> 
> base kernel = 3.8.0-rc4
> 
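> +----+-----------+-----------+-----------+-----------+-----------+
>       kernbench (time in sec, lower is better)
> +----+-----------+-----------+-----------+-----------+-----------+
>              base       stdev     patched       stdev    %improve
> +----+-----------+-----------+-----------+-----------+-----------+
>  1x       46.6028      1.8672     42.4494      1.1390     8.91234
>  2x       99.9074      9.1859     90.4050      2.6131     9.51121
> +----+-----------+-----------+-----------+-----------+-----------+
> 
> +----+-----------+-----------+-----------+-----------+-----------+
>       sysbench (time in sec, lower is better)
> +----+-----------+-----------+-----------+-----------+-----------+
>              base       stdev     patched       stdev    %improve
> +----+-----------+-----------+-----------+-----------+-----------+
>  1x       18.7402      0.3764     17.7431      0.3589     5.32065
>  2x       13.2238      0.1935     13.0096      0.3152     1.61981
> +----+-----------+-----------+-----------+-----------+-----------+
> 
> +----+-----------+-----------+-----------+-----------+-----------+
>       ebizzy (records/sec, higher is better)
> +----+-----------+-----------+-----------+-----------+-----------+
>              base       stdev     patched       stdev    %improve
> +----+-----------+-----------+-----------+-----------+-----------+
>  1x     2421.9000     19.1801   5883.1000    112.7243   142.91259
> +----+-----------+-----------+-----------+-----------+-----------+
> 
> +----+-----------+-----------+-----------+-----------+-----------+
>       dbench (throughput MB/sec, higher is better)
> +----+-----------+-----------+-----------+-----------+-----------+
>              base       stdev     patched       stdev    %improve
> +----+-----------+-----------+-----------+-----------+-----------+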
> 1x  11675.9900   
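
For readers following the yield_to() discussion, the bail-out condition can be
modelled outside the kernel. This is only a minimal userspace sketch, not the
scheduler code: struct rq_model and yield_to_model() are hypothetical names, and
only the nr_running test and the -ESRCH return value are taken from the quoted
patch.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's struct rq; only the field that the
 * quoted check reads is modelled here. */
struct rq_model {
    unsigned int nr_running;
};

/*
 * Returns -ESRCH when both run queues hold exactly one runnable task (the
 * bail-out from the patch), otherwise 1 to mean "a yield would be attempted".
 * The real yield_to() does far more (locking, sched class checks, and so on).
 */
static int yield_to_model(const struct rq_model *rq,
                          const struct rq_model *p_rq)
{
    if (rq->nr_running == 1 && p_rq->nr_running == 1) {
        return -ESRCH;
    }
    return 1;
}
```

In this model a caller such as a PLE handler could stop scanning for yield
candidates as soon as it sees -ESRCH, which appears to be the further use the
commit message hints at.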

[PATCH V2 07/20] net: multiqueue support

2013-01-25 Thread Jason Wang
This patch adds basic multiqueue support to qemu. The idea is simple: an array
of NetClientStates is introduced in NICState, and parse_netdev() is extended to
find all NetClientStates that belong to the backend and place their pointers in
NICConf. qemu_new_nic() can then set up an N:N mapping between the
NetClientStates that belong to a nic and the NetClientStates that belong to the
netdev. A queue_index is also introduced in NetClientState to track its index.
After this, each peer of a NICState is abstracted as a queue.

After this change, all NetClientStates that belong to the same backend/nic have
the same id. When the user wants to change the link status, all NetClientStates
that belong to the same backend/nic are changed as well. When the user wants to
delete a device or netdev, all NetClientStates that belong to the same
backend/nic are deleted as well. Changing or deleting a specific queue is not
allowed.

Signed-off-by: Jason Wang 
---
 hw/dp8393x.c|2 +-
 hw/mcf_fec.c|2 +-
 hw/qdev-properties-system.c |   46 +++---
 hw/qdev-properties.h|6 +-
 include/net/net.h   |   18 +--
 net/net.c   |  113 +++
 6 files changed, 139 insertions(+), 48 deletions(-)

diff --git a/hw/dp8393x.c b/hw/dp8393x.c
index 0273fad..808157b 100644
--- a/hw/dp8393x.c
+++ b/hw/dp8393x.c
@@ -900,7 +900,7 @@ void dp83932_init(NICInfo *nd, hwaddr base, int it_shift,
 s->regs[SONIC_SR] = 0x0004; /* only revision recognized by Linux */
 
 s->conf.macaddr = nd->macaddr;
-s->conf.peer = nd->netdev;
+s->conf.peers.ncs[0] = nd->netdev;
 
 s->nic = qemu_new_nic(&net_dp83932_info, &s->conf, nd->model, nd->name, s);
 
diff --git a/hw/mcf_fec.c b/hw/mcf_fec.c
index 909e32b..8e60f09 100644
--- a/hw/mcf_fec.c
+++ b/hw/mcf_fec.c
@@ -472,7 +472,7 @@ void mcf_fec_init(MemoryRegion *sysmem, NICInfo *nd,
 memory_region_add_subregion(sysmem, base, &s->iomem);
 
 s->conf.macaddr = nd->macaddr;
-s->conf.peer = nd->netdev;
+s->conf.peers.ncs[0] = nd->netdev;
 
 s->nic = qemu_new_nic(&net_mcf_fec_info, &s->conf, nd->model, nd->name, s);
 
diff --git a/hw/qdev-properties-system.c b/hw/qdev-properties-system.c
index ce0f793..ce3af22 100644
--- a/hw/qdev-properties-system.c
+++ b/hw/qdev-properties-system.c
@@ -173,16 +173,47 @@ PropertyInfo qdev_prop_chr = {
 
 static int parse_netdev(DeviceState *dev, const char *str, void **ptr)
 {
-NetClientState *netdev = qemu_find_netdev(str);
+NICPeers *peers_ptr = (NICPeers *)ptr;
+NICConf *conf = container_of(peers_ptr, NICConf, peers);
+NetClientState **ncs = peers_ptr->ncs;
+NetClientState *peers[MAX_QUEUE_NUM];
+int queues, i = 0;
+int ret;
 
-if (netdev == NULL) {
-return -ENOENT;
+queues = qemu_find_net_clients_except(str, peers,
+  NET_CLIENT_OPTIONS_KIND_NIC,
+  MAX_QUEUE_NUM);
+if (queues == 0) {
+ret = -ENOENT;
+goto err;
 }
-if (netdev->peer) {
-return -EEXIST;
+
+if (queues > MAX_QUEUE_NUM) {
+ret = -E2BIG;
+goto err;
+}
+
+for (i = 0; i < queues; i++) {
+if (peers[i] == NULL) {
+ret = -ENOENT;
+goto err;
+}
+
+if (peers[i]->peer) {
+ret = -EEXIST;
+goto err;
+}
+
+ncs[i] = peers[i];
+ncs[i]->queue_index = i;
 }
-*ptr = netdev;
+
+conf->queues = queues;
+
 return 0;
+
+err:
+return ret;
 }
 
 static const char *print_netdev(void *ptr)
@@ -249,7 +280,8 @@ static void set_vlan(Object *obj, Visitor *v, void *opaque,
 {
 DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
-NetClientState **ptr = qdev_get_prop_ptr(dev, prop);
+NICPeers *peers_ptr = qdev_get_prop_ptr(dev, prop);
+NetClientState **ptr = &peers_ptr->ncs[0];
 Error *local_err = NULL;
 int32_t id;
 NetClientState *hubport;
diff --git a/hw/qdev-properties.h b/hw/qdev-properties.h
index ddcf774..20c67f3 100644
--- a/hw/qdev-properties.h
+++ b/hw/qdev-properties.h
@@ -31,7 +31,7 @@ extern PropertyInfo qdev_prop_pci_host_devaddr;
 .name  = (_name),\
 .info  = &(_prop),   \
 .offset= offsetof(_state, _field)\
-+ type_check(_type,typeof_field(_state, _field)),\
++ type_check(_type, typeof_field(_state, _field)),   \
 }
 #define DEFINE_PROP_DEFAULT(_name, _state, _field, _defval, _prop, _type) { \
 .name  = (_name),   \
@@ -77,9 +77,9 @@ extern PropertyInfo qdev_prop_pci_host_devaddr;
 #define DEFINE_PROP_STRING(_n, _s, _f) \
 DEFINE_PROP(_n, _s, _f, qdev_prop_string, char*)
 #define DEFINE_PROP_NETDEV(_n, _s, _f) \
-DEFINE_PROP(_n,

[PATCH V2 19/20] virtio-net: migration support for multiqueue

2013-01-25 Thread Jason Wang
This patch adds migration support for multiqueue virtio-net. Instead of bumping
the version, we conditionally send the multiqueue info only when the device
supports more than one queue, to maintain backward compatibility.

Signed-off-by: Jason Wang 
---
 hw/virtio-net.c |   35 +--
 1 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index cec91a7..4eb191f 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -1069,8 +1069,8 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl)
 
 static void virtio_net_save(QEMUFile *f, void *opaque)
 {
+int i;
 VirtIONet *n = opaque;
-VirtIONetQueue *q = &n->vqs[0];
 
 /* At this point, backend must be stopped, otherwise
  * it might keep writing to memory. */
@@ -1078,7 +1078,7 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
 virtio_save(&n->vdev, f);
 
 qemu_put_buffer(f, n->mac, ETH_ALEN);
-qemu_put_be32(f, q->tx_waiting);
+qemu_put_be32(f, n->vqs[0].tx_waiting);
 qemu_put_be32(f, n->mergeable_rx_bufs);
 qemu_put_be16(f, n->status);
 qemu_put_byte(f, n->promisc);
@@ -1094,13 +1094,19 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
 qemu_put_byte(f, n->nouni);
 qemu_put_byte(f, n->nobcast);
 qemu_put_byte(f, n->has_ufo);
+if (n->max_queues > 1) {
+qemu_put_be16(f, n->max_queues);
+qemu_put_be16(f, n->curr_queues);
+for (i = 1; i < n->curr_queues; i++) {
+qemu_put_be32(f, n->vqs[i].tx_waiting);
+}
+}
 }
 
 static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 {
 VirtIONet *n = opaque;
-VirtIONetQueue *q = &n->vqs[0];
-int ret, i;
+int ret, i, link_down;
 
 if (version_id < 2 || version_id > VIRTIO_NET_VM_VERSION)
 return -EINVAL;
@@ -,7 +1117,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 }
 
 qemu_get_buffer(f, n->mac, ETH_ALEN);
-q->tx_waiting = qemu_get_be32(f);
+n->vqs[0].tx_waiting = qemu_get_be32(f);
 
 virtio_net_set_mrg_rx_bufs(n, qemu_get_be32(f));
 
@@ -1181,6 +1187,20 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 }
 }
 
+if (n->max_queues > 1) {
+if (n->max_queues != qemu_get_be16(f)) {
+error_report("virtio-net: different max_queues ");
+return -1;
+}
+
+n->curr_queues = qemu_get_be16(f);
+for (i = 1; i < n->curr_queues; i++) {
+n->vqs[i].tx_waiting = qemu_get_be32(f);
+}
+}
+
+virtio_net_set_queues(n);
+
 /* Find the first multicast entry in the saved MAC filter */
 for (i = 0; i < n->mac_table.in_use; i++) {
 if (n->mac_table.macs[i * ETH_ALEN] & 1) {
@@ -1191,7 +1211,10 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
 
 /* nc.link_down can't be migrated, so infer link_down according
  * to link status bit in n->status */
-qemu_get_queue(n->nic)->link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+for (i = 0; i < n->max_queues; i++) {
+qemu_get_subqueue(n->nic, i)->link_down = link_down;
+}
 
 return 0;
 }
-- 
1.7.1
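
The conditional stream layout used by this patch can be illustrated with a small
standalone sketch. Everything below is hypothetical scaffolding (struct
net_state_sketch, the put_be16()/put_be32() buffer helpers and
save_queues_sketch() are not QEMU APIs), and it only models the queue-related
fields, not the rest of the virtio-net state that sits between them in the real
stream.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_QUEUES 8

/* Hypothetical, trimmed-down device state; field names mirror the patch. */
struct net_state_sketch {
    uint16_t max_queues;
    uint16_t curr_queues;
    uint32_t tx_waiting[SKETCH_MAX_QUEUES];
};

/* Store a 16-bit value in big-endian order, return the new offset. */
static size_t put_be16(uint8_t *buf, size_t off, uint16_t v)
{
    buf[off] = (uint8_t)(v >> 8);
    buf[off + 1] = (uint8_t)(v & 0xff);
    return off + 2;
}

/* Store a 32-bit value in big-endian order, return the new offset. */
static size_t put_be32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off] = (uint8_t)(v >> 24);
    buf[off + 1] = (uint8_t)(v >> 16);
    buf[off + 2] = (uint8_t)(v >> 8);
    buf[off + 3] = (uint8_t)(v & 0xff);
    return off + 4;
}

/*
 * Write the queue-related fields into buf and return the number of bytes
 * used. buf must hold at least 8 + 4 * (SKETCH_MAX_QUEUES - 1) bytes.
 */
static size_t save_queues_sketch(const struct net_state_sketch *n,
                                 uint8_t *buf)
{
    size_t off = 0;
    uint16_t i;

    /* Queue 0 is always sent, just as a single-queue device would send it. */
    off = put_be32(buf, off, n->tx_waiting[0]);

    /* The extra fields only appear when the device has more than one queue. */
    if (n->max_queues > 1) {
        off = put_be16(buf, off, n->max_queues);
        off = put_be16(buf, off, n->curr_queues);
        for (i = 1; i < n->curr_queues; i++) {
            off = put_be32(buf, off, n->tx_waiting[i]);
        }
    }
    return off;
}
```

With max_queues == 1 the output is identical to what a single-queue device
writes, which is the backward-compatibility property the commit message relies
on. The load side in the patch additionally rejects a stream whose max_queues
differs from the destination's.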


[PATCH V2 20/20] virtio-net: compat multiqueue support

2013-01-25 Thread Jason Wang
Disable multiqueue support for machine types older than 1.4.

Signed-off-by: Jason Wang 
---
 hw/pc_piix.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 0a6923d..7bc3563 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -297,6 +297,10 @@ static QEMUMachine pc_i440fx_machine_v1_4 = {
 .driver   = "usb-tablet",\
 .property = "usb_version",\
 .value= stringify(1),\
+},{ \
+.driver   = "virtio-net-pci", \
+.property = "mq", \
+.value= "off", \
 }
 
 static QEMUMachine pc_machine_v1_3 = {
-- 
1.7.1


[PATCH V2 17/20] virtio-net: separate virtqueue from VirtIONet

2013-01-25 Thread Jason Wang
To support multiqueue virtio-net, the first step is to separate the
virtqueue-related fields from VirtIONet into a new structure, VirtIONetQueue.
Later patches will add an array of VirtIONetQueue to VirtIONet based on this
patch.

Signed-off-by: Jason Wang 
---
 hw/virtio-net.c |  195 ---
 1 files changed, 114 insertions(+), 81 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 2f49fd8..ef522d5 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -26,28 +26,33 @@
 #define MAC_TABLE_ENTRIES64
 #define MAX_VLAN(1 << 12)   /* Per 802.1Q definition */
 
+typedef struct VirtIONetQueue {
+VirtQueue *rx_vq;
+VirtQueue *tx_vq;
+QEMUTimer *tx_timer;
+QEMUBH *tx_bh;
+int tx_waiting;
+struct {
+VirtQueueElement elem;
+ssize_t len;
+} async_tx;
+struct VirtIONet *n;
+} VirtIONetQueue;
+
 typedef struct VirtIONet
 {
 VirtIODevice vdev;
 uint8_t mac[ETH_ALEN];
 uint16_t status;
-VirtQueue *rx_vq;
-VirtQueue *tx_vq;
+VirtIONetQueue vq;
 VirtQueue *ctrl_vq;
 NICState *nic;
-QEMUTimer *tx_timer;
-QEMUBH *tx_bh;
 uint32_t tx_timeout;
 int32_t tx_burst;
-int tx_waiting;
 uint32_t has_vnet_hdr;
 size_t host_hdr_len;
 size_t guest_hdr_len;
 uint8_t has_ufo;
-struct {
-VirtQueueElement elem;
-ssize_t len;
-} async_tx;
 int mergeable_rx_bufs;
 uint8_t promisc;
 uint8_t allmulti;
@@ -67,6 +72,12 @@ typedef struct VirtIONet
 DeviceState *qdev;
 } VirtIONet;
 
+static VirtIONetQueue *virtio_net_get_queue(NetClientState *nc)
+{
+VirtIONet *n = qemu_get_nic_opaque(nc);
+
+return &n->vq;
+}
 /* TODO
  * - we could suppress RX interrupt if we were so inclined.
  */
@@ -134,6 +145,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
 error_report("unable to start vhost net: %d: "
  "falling back on userspace virtio", -r);
 n->vhost_started = 0;
+} else {
+n->vhost_started = 1;
 }
 } else {
 vhost_net_stop(&n->vdev, nc, 1, 1);
@@ -144,25 +157,26 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
 static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 {
 VirtIONet *n = to_virtio_net(vdev);
+VirtIONetQueue *q = &n->vq;
 
 virtio_net_vhost_status(n, status);
 
-if (!n->tx_waiting) {
+if (!q->tx_waiting) {
 return;
 }
 
 if (virtio_net_started(n, status) && !n->vhost_started) {
-if (n->tx_timer) {
-qemu_mod_timer(n->tx_timer,
+if (q->tx_timer) {
+qemu_mod_timer(q->tx_timer,
qemu_get_clock_ns(vm_clock) + n->tx_timeout);
 } else {
-qemu_bh_schedule(n->tx_bh);
+qemu_bh_schedule(q->tx_bh);
 }
 } else {
-if (n->tx_timer) {
-qemu_del_timer(n->tx_timer);
+if (q->tx_timer) {
+qemu_del_timer(q->tx_timer);
 } else {
-qemu_bh_cancel(n->tx_bh);
+qemu_bh_cancel(q->tx_bh);
 }
 }
 }
@@ -474,35 +488,40 @@ static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
 static int virtio_net_can_receive(NetClientState *nc)
 {
 VirtIONet *n = qemu_get_nic_opaque(nc);
+VirtIONetQueue *q = virtio_net_get_queue(nc);
+
 if (!n->vdev.vm_running) {
 return 0;
 }
 
-if (!virtio_queue_ready(n->rx_vq) ||
-!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
+if (!virtio_queue_ready(q->rx_vq) ||
+!(n->vdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) {
 return 0;
+}
 
 return 1;
 }
 
-static int virtio_net_has_buffers(VirtIONet *n, int bufsize)
+static int virtio_net_has_buffers(VirtIONetQueue *q, int bufsize)
 {
-if (virtio_queue_empty(n->rx_vq) ||
+VirtIONet *n = q->n;
+if (virtio_queue_empty(q->rx_vq) ||
 (n->mergeable_rx_bufs &&
- !virtqueue_avail_bytes(n->rx_vq, bufsize, 0))) {
-virtio_queue_set_notification(n->rx_vq, 1);
+ !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
+virtio_queue_set_notification(q->rx_vq, 1);
 
 /* To avoid a race condition where the guest has made some buffers
  * available after the above check but before notification was
  * enabled, check for available buffers again.
  */
-if (virtio_queue_empty(n->rx_vq) ||
+if (virtio_queue_empty(q->rx_vq) ||
 (n->mergeable_rx_bufs &&
- !virtqueue_avail_bytes(n->rx_vq, bufsize, 0)))
+ !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
 return 0;
+}
 }
 
-virtio_queue_set_notification(n->rx_vq, 0);
+virtio_queue_set_notification(q->rx_vq, 0);
 return 1;
 }
 
@@ -605,6 +624,7 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
 static ssize_t vir

[PATCH V2 18/20] virtio-net: multiqueue support

2013-01-25 Thread Jason Wang
This patch implements both userspace and vhost support for multiple queue
virtio-net (VIRTIO_NET_F_MQ). This is done by introducing an array of
VirtIONetQueue to VirtIONet.

Signed-off-by: Jason Wang 
---
 hw/virtio-net.c |  317 +++
 hw/virtio-net.h |   28 +-
 2 files changed, 275 insertions(+), 70 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index ef522d5..cec91a7 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -44,7 +44,7 @@ typedef struct VirtIONet
 VirtIODevice vdev;
 uint8_t mac[ETH_ALEN];
 uint16_t status;
-VirtIONetQueue vq;
+VirtIONetQueue vqs[MAX_QUEUE_NUM];
 VirtQueue *ctrl_vq;
 NICState *nic;
 uint32_t tx_timeout;
@@ -70,14 +70,24 @@ typedef struct VirtIONet
 } mac_table;
 uint32_t *vlans;
 DeviceState *qdev;
+int multiqueue;
+uint16_t max_queues;
+uint16_t curr_queues;
+bool queues_changed;
 } VirtIONet;
 
-static VirtIONetQueue *virtio_net_get_queue(NetClientState *nc)
+static VirtIONetQueue *virtio_net_get_subqueue(NetClientState *nc)
 {
 VirtIONet *n = qemu_get_nic_opaque(nc);
 
-return &n->vq;
+return &n->vqs[nc->queue_index];
 }
+
+static int vq2q(int queue_index)
+{
+return queue_index / 2;
+}
+
 /* TODO
  * - we could suppress RX interrupt if we were so inclined.
  */
@@ -93,6 +103,7 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
 struct virtio_net_config netcfg;
 
 stw_p(&netcfg.status, n->status);
+stw_p(&netcfg.max_virtqueue_pairs, n->max_queues);
 memcpy(netcfg.mac, n->mac, ETH_ALEN);
 memcpy(config, &netcfg, sizeof(netcfg));
 }
@@ -119,6 +130,7 @@ static bool virtio_net_started(VirtIONet *n, uint8_t status)
 static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
 {
 NetClientState *nc = qemu_get_queue(n->nic);
+int queues = n->multiqueue ? n->max_queues : 1;
 
 if (!nc->peer) {
 return;
@@ -130,26 +142,27 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
 if (!tap_get_vhost_net(nc->peer)) {
 return;
 }
-if (!!n->vhost_started == virtio_net_started(n, status) &&
-  !nc->peer->link_down) {
+
+if (!n->queues_changed &&
+!!n->vhost_started ==
+(virtio_net_started(n, status) && !nc->peer->link_down)) {
 return;
 }
-if (!n->vhost_started) {
+if (!n->vhost_started || n->queues_changed) {
 int r;
 if (!vhost_net_query(tap_get_vhost_net(nc->peer), &n->vdev)) {
 return;
 }
 n->vhost_started = 1;
-r = vhost_net_start(&n->vdev, nc, 1, 1);
+r = vhost_net_start(&n->vdev, n->nic->ncs, n->curr_queues, queues);
 if (r < 0) {
 error_report("unable to start vhost net: %d: "
  "falling back on userspace virtio", -r);
 n->vhost_started = 0;
-} else {
-n->vhost_started = 1;
 }
+n->queues_changed = false;
 } else {
-vhost_net_stop(&n->vdev, nc, 1, 1);
+vhost_net_stop(&n->vdev, n->nic->ncs, n->curr_queues, queues);
 n->vhost_started = 0;
 }
 }
@@ -157,26 +170,38 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
 static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 {
 VirtIONet *n = to_virtio_net(vdev);
-VirtIONetQueue *q = &n->vq;
+VirtIONetQueue *q;
+int i;
+uint8_t queue_status;
 
 virtio_net_vhost_status(n, status);
 
-if (!q->tx_waiting) {
-return;
-}
+for (i = 0; i < n->max_queues; i++) {
+q = &n->vqs[i];
 
-if (virtio_net_started(n, status) && !n->vhost_started) {
-if (q->tx_timer) {
-qemu_mod_timer(q->tx_timer,
-   qemu_get_clock_ns(vm_clock) + n->tx_timeout);
+if ((!n->multiqueue && i != 0) || i >= n->curr_queues) {
+queue_status = 0;
 } else {
-qemu_bh_schedule(q->tx_bh);
+queue_status = status;
 }
-} else {
-if (q->tx_timer) {
-qemu_del_timer(q->tx_timer);
+
+if (!q->tx_waiting) {
+continue;
+}
+
+if (virtio_net_started(n, queue_status) && !n->vhost_started) {
+if (q->tx_timer) {
+qemu_mod_timer(q->tx_timer,
+   qemu_get_clock_ns(vm_clock) + n->tx_timeout);
+} else {
+qemu_bh_schedule(q->tx_bh);
+}
 } else {
-qemu_bh_cancel(q->tx_bh);
+if (q->tx_timer) {
+qemu_del_timer(q->tx_timer);
+} else {
+qemu_bh_cancel(q->tx_bh);
+}
 }
 }
 }
@@ -208,6 +233,9 @@ static void virtio_net_reset(VirtIODevice *vdev)
 n->nomulti = 0;
 n->nouni = 0;
 n->nobcast = 0;
+/* multiqueue is disabled by default */
+n->curr_queues = 1;
+
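
The two small rules this patch relies on, mapping a virtqueue index to its queue
pair and deciding which status a given queue pair sees, can be tried in
isolation. The sketch below uses hypothetical names (vq2q_sketch() and
queue_status_sketch()); the arithmetic and the condition are copied from the
hunks above, and it assumes virtqueues come in rx/tx pairs, which is what the
division by two implies.

```c
#include <stdint.h>

/* Map a virtqueue index to its queue-pair index, as vq2q() does. */
static int vq2q_sketch(int queue_index)
{
    return queue_index / 2;
}

/*
 * Status applied to queue pair i, following the loop added to
 * virtio_net_set_status(): pairs that are not in use are handled as if the
 * device status were 0, so their pending tx timer or bh is cancelled.
 */
static uint8_t queue_status_sketch(int multiqueue, int curr_queues, int i,
                                   uint8_t status)
{
    if ((!multiqueue && i != 0) || i >= curr_queues) {
        return 0;
    }
    return status;
}
```

For example, with multiqueue enabled and curr_queues == 2, pairs 0 and 1 see the
real device status while pair 2 and above see 0.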

[PATCH V2 16/20] virtio: add a queue_index to VirtQueue

2013-01-25 Thread Jason Wang
Add a queue_index to VirtQueue and a helper to fetch it; this can be used by
devices that support multiqueue.

Signed-off-by: Jason Wang 
---
 hw/virtio.c |8 
 hw/virtio.h |1 +
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index d8c77b0..e259348 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -73,6 +73,8 @@ struct VirtQueue
 /* Notification enabled? */
 bool notification;
 
+uint16_t queue_index;
+
 int inuse;
 
 uint16_t vector;
@@ -931,6 +933,7 @@ void virtio_init(VirtIODevice *vdev, const char *name,
 for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
 vdev->vq[i].vector = VIRTIO_NO_VECTOR;
 vdev->vq[i].vdev = vdev;
+vdev->vq[i].queue_index = i;
 }
 
 vdev->name = name;
@@ -1018,6 +1021,11 @@ VirtQueue *virtio_get_queue(VirtIODevice *vdev, int n)
 return vdev->vq + n;
 }
 
+uint16_t virtio_get_queue_index(VirtQueue *vq)
+{
+return vq->queue_index;
+}
+
 static void virtio_queue_guest_notifier_read(EventNotifier *n)
 {
 VirtQueue *vq = container_of(n, VirtQueue, guest_notifier);
diff --git a/hw/virtio.h b/hw/virtio.h
index d3da1d2..a29a54d 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -280,6 +280,7 @@ hwaddr virtio_queue_get_ring_size(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
 void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx);
 VirtQueue *virtio_get_queue(VirtIODevice *vdev, int n);
+uint16_t virtio_get_queue_index(VirtQueue *vq);
 int virtio_queue_get_id(VirtQueue *vq);
 EventNotifier *virtio_queue_get_guest_notifier(VirtQueue *vq);
 void virtio_queue_set_guest_notifier_fd_handler(VirtQueue *vq, bool assign,
-- 
1.7.1


[PATCH V2 14/20] vhost: multiqueue support

2013-01-25 Thread Jason Wang
This patch lets vhost support multiqueue. The idea is simple: launch multiple
vhost threads and let each vhost thread process a subset of the virtqueues of
the device. After this change, each emulated device can have multiple vhost
threads as its backend.

To do this, a virtqueue index is introduced to record the first virtqueue that
will be handled by this vhost_net device. Based on this and nvqs, vhost can
calculate its relative index to set up the vhost_net device.

Since we may have many vhost/net devices for a virtio-net device, the setting of
guest notifiers is moved out of the starting/stopping of a specific vhost
thread. vhost_net_{start|stop}() are renamed to vhost_net_{start|stop}_one(),
and a new vhost_net_{start|stop}() is introduced to configure the guest
notifiers and start/stop all vhost/vhost_net devices.

Signed-off-by: Jason Wang 
---
 hw/vhost.c  |   82 +---
 hw/vhost.h  |2 +
 hw/vhost_net.c  |   92 ++-
 hw/vhost_net.h  |6 ++-
 hw/virtio-net.c |4 +-
 5 files changed, 128 insertions(+), 58 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index cee8aad..38257b9 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -619,14 +619,17 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
 {
 hwaddr s, l, a;
 int r;
+int vhost_vq_index = idx - dev->vq_index;
 struct vhost_vring_file file = {
-.index = idx,
+.index = vhost_vq_index
 };
 struct vhost_vring_state state = {
-.index = idx,
+.index = vhost_vq_index
 };
 struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
 
+assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
+
 vq->num = state.num = virtio_queue_get_num(vdev, idx);
 r = ioctl(dev->control, VHOST_SET_VRING_NUM, &state);
 if (r) {
@@ -669,11 +672,12 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
 goto fail_alloc_ring;
 }
 
-r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
+r = vhost_virtqueue_set_addr(dev, vq, vhost_vq_index, dev->log_enabled);
 if (r < 0) {
 r = -errno;
 goto fail_alloc;
 }
+
 file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(vvq));
 r = ioctl(dev->control, VHOST_SET_VRING_KICK, &file);
 if (r) {
@@ -709,9 +713,10 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
 unsigned idx)
 {
 struct vhost_vring_state state = {
-.index = idx,
+.index = idx - dev->vq_index
 };
 int r;
+assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
 r = ioctl(dev->control, VHOST_GET_VRING_BASE, &state);
 if (r < 0) {
 fprintf(stderr, "vhost VQ %d ring restore failed: %d\n", idx, r);
@@ -867,7 +872,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
 }
 
 for (i = 0; i < hdev->nvqs; ++i) {
-r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, true);
+r = vdev->binding->set_host_notifier(vdev->binding_opaque,
+ hdev->vq_index + i,
+ true);
 if (r < 0) {
 fprintf(stderr, "vhost VQ %d notifier binding failed: %d\n", i, -r);
 goto fail_vq;
@@ -877,7 +884,9 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
 return 0;
 fail_vq:
 while (--i >= 0) {
-r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, false);
+r = vdev->binding->set_host_notifier(vdev->binding_opaque,
+ hdev->vq_index + i,
+ false);
 if (r < 0) {
 fprintf(stderr, "vhost VQ %d notifier cleanup error: %d\n", i, -r);
 fflush(stderr);
@@ -898,7 +907,9 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
 int i, r;
 
 for (i = 0; i < hdev->nvqs; ++i) {
-r = vdev->binding->set_host_notifier(vdev->binding_opaque, i, false);
+r = vdev->binding->set_host_notifier(vdev->binding_opaque,
+ hdev->vq_index + i,
+ false);
 if (r < 0) {
 fprintf(stderr, "vhost VQ %d notifier cleanup failed: %d\n", i, -r);
 fflush(stderr);
@@ -912,8 +923,9 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
  */
 bool vhost_virtqueue_pending(struct vhost_dev *hdev, int n)
 {
-struct vhost_virtqueue *vq = hdev->vqs + n;
+struct vhost_virtqueue *vq = hdev->vqs + n - hdev->vq_index;
 assert(hdev->started);
+assert(n >= hdev->vq_index && n < hdev->vq_index + hdev->nvqs);
 return event_notifier_test_and_clear(&vq->masked_notifier);
 }
 
@@ -922,15 +934,16 @@ void vhost_virtqu
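
The relative-index calculation described in the commit message is easy to check
on its own. The sketch below is hypothetical (struct vhost_dev_sketch and
vhost_relative_index() are illustration names, not QEMU code); it keeps only the
vq_index and nvqs fields and repeats the range assertion used in the hunks
above.

```c
#include <assert.h>

/* Hypothetical subset of struct vhost_dev: the first virtqueue handled by
 * this vhost device and how many virtqueues it owns. */
struct vhost_dev_sketch {
    int vq_index;
    int nvqs;
};

/* Translate a device-wide virtqueue index into this vhost device's local
 * index, which is the value handed to the vhost ioctls in the patch. */
static int vhost_relative_index(const struct vhost_dev_sketch *dev, int idx)
{
    assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
    return idx - dev->vq_index;
}
```

Assuming each vhost_net device owns one rx/tx pair, the second device would have
vq_index == 2 and nvqs == 2, so device-wide virtqueues 2 and 3 map to local
indexes 0 and 1.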

[PATCH V2 15/20] virtio: introduce virtio_del_queue()

2013-01-25 Thread Jason Wang
Some devices (such as virtio-net) need the ability to destroy or re-order the
virtqueues; this patch adds a helper to do this.

Signed-off-by: Jason Wang 
---
 hw/virtio.c |9 +
 hw/virtio.h |2 ++
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index ca170c3..d8c77b0 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -701,6 +701,15 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 return &vdev->vq[i];
 }
 
+void virtio_del_queue(VirtIODevice *vdev, int n)
+{
+if (n < 0 || n >= VIRTIO_PCI_QUEUE_MAX) {
+abort();
+}
+
+vdev->vq[n].vring.num = 0;
+}
+
 void virtio_irq(VirtQueue *vq)
 {
 trace_virtio_irq(vq);
diff --git a/hw/virtio.h b/hw/virtio.h
index 9cc7b85..d3da1d2 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -181,6 +181,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 void (*handle_output)(VirtIODevice *,
   VirtQueue *));
 
+void virtio_del_queue(VirtIODevice *vdev, int n);
+
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
 unsigned int len);
 void virtqueue_flush(VirtQueue *vq, unsigned int count);
-- 
1.7.1


[PATCH V2 12/20] tap: introduce a helper to get the name of an interface

2013-01-25 Thread Jason Wang
This patch introduces a helper, tap_get_ifname(), to get the device name of a
tap device. This is needed when ifname is unspecified on the command line and
qemu is asked to create the tap device by itself. In this situation, the name
is allocated by the kernel, so if multiqueue is requested, we need to fetch the
name after creating the first queue.

Only Linux has this support, since it's the only platform that supports
multiqueue tap.

Signed-off-by: Jason Wang 
---
 include/net/tap.h |1 +
 net/tap-aix.c |6 ++
 net/tap-bsd.c |4 
 net/tap-haiku.c   |4 
 net/tap-linux.c   |   13 +
 net/tap-solaris.c |4 
 net/tap_int.h |1 +
 7 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/include/net/tap.h b/include/net/tap.h
index 0caf8c4..c523ff0 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -37,6 +37,7 @@ void tap_set_offload(NetClientState *nc, int csum, int tso4, int tso6, int ecn,
 void tap_set_vnet_hdr_len(NetClientState *nc, int len);
 int tap_enable(NetClientState *nc);
 int tap_disable(NetClientState *nc);
+int tap_get_ifname(NetClientState *nc, char *ifname);
 
 int tap_get_fd(NetClientState *nc);
 
diff --git a/net/tap-aix.c b/net/tap-aix.c
index 66e0574..e760e9a 100644
--- a/net/tap-aix.c
+++ b/net/tap-aix.c
@@ -69,3 +69,9 @@ int tap_fd_disable(int fd)
 {
 return -1;
 }
+
+int tap_fd_get_ifname(int fd, char *ifname)
+{
+return -1;
+}
+
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index cfc7a28..4f22109 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -156,3 +156,7 @@ int tap_fd_disable(int fd)
 return -1;
 }
 
+int tap_fd_get_ifname(int fd, char *ifname)
+{
+return -1;
+}
diff --git a/net/tap-haiku.c b/net/tap-haiku.c
index 664d40f..b3b5fbb 100644
--- a/net/tap-haiku.c
+++ b/net/tap-haiku.c
@@ -70,3 +70,7 @@ int tap_fd_disable(int fd)
 return -1;
 }
 
+int tap_fd_get_ifname(int fd, char *ifname)
+{
+return -1;
+}
diff --git a/net/tap-linux.c b/net/tap-linux.c
index 60ea8d0..6827c2a 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -261,3 +261,16 @@ int tap_fd_disable(int fd)
 return ret;
 }
 
+int tap_fd_get_ifname(int fd, char *ifname)
+{
+struct ifreq ifr;
+
+if (ioctl(fd, TUNGETIFF, &ifr) != 0) {
+error_report("TUNGETIFF ioctl() failed: %s",
+ strerror(errno));
+return -1;
+}
+
+pstrcpy(ifname, sizeof(ifr.ifr_name), ifr.ifr_name);
+return 0;
+}
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 12cc392..214d95e 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -236,3 +236,7 @@ int tap_fd_disable(int fd)
 return -1;
 }
 
+int tap_fd_get_ifname(int fd, char *ifname)
+{
+return -1;
+}
diff --git a/net/tap_int.h b/net/tap_int.h
index ca1c21b..125f83d 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -44,5 +44,6 @@ void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo);
 void tap_fd_set_vnet_hdr_len(int fd, int len);
 int tap_fd_enable(int fd);
 int tap_fd_disable(int fd);
+int tap_fd_get_ifname(int fd, char *ifname);
 
 #endif /* QEMU_TAP_H */
-- 
1.7.1



[PATCH V2 13/20] tap: multiqueue support

2013-01-25 Thread Jason Wang
Recently, Linux gained support for multiqueue tap, which lets userspace call
TUNSETIFF on a single device many times to create multiple file descriptors
that act as independent queues. Userspace can also enable or disable a specific
queue through TUNSETQUEUE.

This patch adds the generic infrastructure to create multiqueue taps. To achieve
this, a new parameter "queues" is introduced to specify how many queues qemu
itself is expected to create for the tap. Alternatively, management can pass
multiple pre-created tap file descriptors, separated by ':', through a new
parameter "fds", e.g. -netdev tap,id=hn0,fds="X:Y:..:Z". Multiple vhost file
descriptors can be passed in the same way.

Each TAPState is still associated with one tap fd, which means multiple
TAPStates are created when the user needs a multiqueue tap. Since each TAPState
contains one NetClientState, with the multiqueue nic support N peer
NetClientStates are built up.

A new parameter, mq_required, is introduced in tap_open() to create multiqueue
tap fds.
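
As an illustration of the fds="X:Y:..:Z" syntax described above, here is a
minimal sketch of splitting such a list into integers. It assumes every entry
is a plain non-negative decimal fd number; ex_parse_fd_list() is a hypothetical
name and is not taken from the patch, which may resolve each entry differently.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): split "X:Y:..:Z" into ints.
 * Returns the number of entries parsed, or -1 if the string is malformed
 * or holds more than max entries.
 */
int ex_parse_fd_list(const char *str, int *fds, int max)
{
    const char *p = str;
    int n = 0;

    assert(str != NULL && fds != NULL && max > 0);

    for (;;) {
        char *end;
        long val;

        errno = 0;
        val = strtol(p, &end, 10);
        if (end == p || errno != 0 || val < 0 || val > INT_MAX) {
            return -1;          /* empty field, overflow or negative fd */
        }
        if (n == max) {
            return -1;          /* more entries than the caller can hold */
        }
        fds[n++] = (int)val;

        if (*end == '\0') {
            return n;           /* consumed the whole string */
        }
        if (*end != ':') {
            return -1;          /* unexpected separator */
        }
        p = end + 1;
    }
}
```

With this sketch the number of queues falls out of the list length, so a
caller would not need a separate "queues" value when fds are passed in.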

Signed-off-by: Jason Wang 
---
 include/net/tap.h |1 -
 net/tap-aix.c |3 +-
 net/tap-bsd.c |3 +-
 net/tap-haiku.c   |3 +-
 net/tap-linux.c   |4 +-
 net/tap-solaris.c |3 +-
 net/tap.c |  158 +
 net/tap_int.h |3 +-
 qapi-schema.json  |5 +-
 9 files changed, 139 insertions(+), 44 deletions(-)

diff --git a/include/net/tap.h b/include/net/tap.h
index c523ff0..0caf8c4 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -37,7 +37,6 @@ void tap_set_offload(NetClientState *nc, int csum, int tso4, int tso6, int ecn,
 void tap_set_vnet_hdr_len(NetClientState *nc, int len);
 int tap_enable(NetClientState *nc);
 int tap_disable(NetClientState *nc);
-int tap_get_ifname(NetClientState *nc, char *ifname);
 
 int tap_get_fd(NetClientState *nc);
 
diff --git a/net/tap-aix.c b/net/tap-aix.c
index e760e9a..804d164 100644
--- a/net/tap-aix.c
+++ b/net/tap-aix.c
@@ -25,7 +25,8 @@
 #include "tap_int.h"
 #include 
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+ int vnet_hdr_required, int mq_required)
 {
 fprintf(stderr, "no tap on AIX\n");
 return -1;
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 4f22109..bcdb268 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -33,7 +33,8 @@
 #include 
 #endif
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+ int vnet_hdr_required, int mq_required)
 {
 int fd;
 #ifdef TAPGIFNAME
diff --git a/net/tap-haiku.c b/net/tap-haiku.c
index b3b5fbb..e5ce436 100644
--- a/net/tap-haiku.c
+++ b/net/tap-haiku.c
@@ -25,7 +25,8 @@
 #include "tap_int.h"
 #include 
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+ int vnet_hdr_required, int mq_required)
 {
 fprintf(stderr, "no tap on Haiku\n");
 return -1;
diff --git a/net/tap-linux.c b/net/tap-linux.c
index 6827c2a..a1a6128 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -36,12 +36,12 @@
 
 #define PATH_NET_TUN "/dev/net/tun"
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+ int vnet_hdr_required, int mq_required)
 {
 struct ifreq ifr;
 int fd, ret;
 int len = sizeof(struct virtio_net_hdr);
-int mq_required = 0;
 
 TFR(fd = open(PATH_NET_TUN, O_RDWR));
 if (fd < 0) {
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 214d95e..9c7278f 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -173,7 +173,8 @@ static int tap_alloc(char *dev, size_t dev_size)
 return tap_fd;
 }
 
-int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required)
+int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
+ int vnet_hdr_required, int mq_required)
 {
 char  dev[10]="";
 int fd;
diff --git a/net/tap.c b/net/tap.c
index 95e557b..072e166 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -560,17 +560,10 @@ int net_init_bridge(const NetClientOptions *opts, const char *name,
 
 static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
 const char *setup_script, char *ifname,
-size_t ifname_sz)
+size_t ifname_sz, int mq_required)
 {
 int fd, vnet_hdr_required;
 
-if (tap->has_ifname) {
-pstrcpy(ifname, ifname_sz, tap->ifname);
-} else {
-assert(ifname_sz > 0);
-ifname[0] = '\0';
-}
-
 if (tap->has_vnet_hdr) {
 *vnet_hdr = tap->vnet_hdr;
 vnet_hdr_required = *vnet_hdr;
@@ -579,7 +572,8 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
 vnet_hdr_required = 0;
 }
 
-

[PATCH V2 08/20] tap: import linux multiqueue constants

2013-01-25 Thread Jason Wang
Import the multiqueue constants from if_tun.h in 3.8-rc3. A new ifr flag,
IFF_MULTI_QUEUE, is introduced to create a multiqueue backend by calling
TUNSETIFF with this flag and the same interface name many times.

A new ioctl, TUNSETQUEUE, is introduced. Calling this ioctl with
IFF_DETACH_QUEUE disables the queue in the Linux kernel. Calling it with
IFF_ATTACH_QUEUE enables the queue in the Linux kernel.

Signed-off-by: Jason Wang 
---
 net/tap-linux.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/tap-linux.h b/net/tap-linux.h
index cb2a6d4..65087e1 100644
--- a/net/tap-linux.h
+++ b/net/tap-linux.h
@@ -29,6 +29,7 @@
 #define TUNSETSNDBUF   _IOW('T', 212, int)
 #define TUNGETVNETHDRSZ _IOR('T', 215, int)
 #define TUNSETVNETHDRSZ _IOW('T', 216, int)
+#define TUNSETQUEUE  _IOW('T', 217, int)
 
 #endif
 
@@ -36,6 +37,9 @@
 #define IFF_TAP        0x0002
 #define IFF_NO_PI  0x1000
 #define IFF_VNET_HDR   0x4000
+#define IFF_MULTI_QUEUE 0x0100
+#define IFF_ATTACH_QUEUE 0x0200
+#define IFF_DETACH_QUEUE 0x0400
 
 /* Features for GSO (TUNSETOFFLOAD). */
 #define TUN_F_CSUM 0x01 /* You can hand me unchecksummed packets. */
-- 
1.7.1
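
To make the role of the new flags concrete, here is a small sketch that only
does the bit arithmetic. The values are copied from the hunk above; the EX_
names and both helper functions are made up for illustration, and the
assumption that a tap queue is opened with IFF_TAP | IFF_NO_PI as the base
flags is mine, not something this patch states.

```c
#include <assert.h>

/* Values copied from the diff; EX_ marks them as illustration-only names. */
enum {
    EX_IFF_TAP          = 0x0002,
    EX_IFF_MULTI_QUEUE  = 0x0100,
    EX_IFF_ATTACH_QUEUE = 0x0200,
    EX_IFF_DETACH_QUEUE = 0x0400,
    EX_IFF_NO_PI        = 0x1000,
    EX_IFF_VNET_HDR     = 0x4000
};

/* ifr_flags for TUNSETIFF when opening one queue (hypothetical helper). */
unsigned ex_tunsetiff_flags(int want_vnet_hdr, int want_multiqueue)
{
    unsigned flags = EX_IFF_TAP | EX_IFF_NO_PI;

    if (want_vnet_hdr) {
        flags |= EX_IFF_VNET_HDR;
    }
    if (want_multiqueue) {
        flags |= EX_IFF_MULTI_QUEUE;
    }
    /* Attach/detach belong to TUNSETQUEUE and must never leak in here. */
    assert((flags & (EX_IFF_ATTACH_QUEUE | EX_IFF_DETACH_QUEUE)) == 0);
    return flags;
}

/* ifr_flags for TUNSETQUEUE: attach enables a queue, detach disables it. */
unsigned ex_tunsetqueue_flag(int enable)
{
    return enable ? EX_IFF_ATTACH_QUEUE : EX_IFF_DETACH_QUEUE;
}
```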



[PATCH V2 10/20] tap: add Linux multiqueue support

2013-01-25 Thread Jason Wang
This patch adds basic multiqueue support for Linux. When multiqueue is needed,
we first check whether the kernel supports multiqueue tap before creating more
queues. Two new functions, tap_fd_enable() and tap_fd_disable(), are introduced
to enable and disable a specific queue. Since multiqueue is only supported on
Linux, return an error on other platforms.

Signed-off-by: Jason Wang 
---
 net/tap-aix.c |   10 ++
 net/tap-bsd.c |   11 +++
 net/tap-haiku.c   |   11 +++
 net/tap-linux.c   |   52 
 net/tap-solaris.c |   11 +++
 net/tap_int.h |2 ++
 6 files changed, 97 insertions(+), 0 deletions(-)

diff --git a/net/tap-aix.c b/net/tap-aix.c
index aff6c52..66e0574 100644
--- a/net/tap-aix.c
+++ b/net/tap-aix.c
@@ -59,3 +59,13 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
 int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_enable(int fd)
+{
+return -1;
+}
+
+int tap_fd_disable(int fd)
+{
+return -1;
+}
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 01c705b..cfc7a28 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -145,3 +145,14 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
 int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_enable(int fd)
+{
+return -1;
+}
+
+int tap_fd_disable(int fd)
+{
+return -1;
+}
+
diff --git a/net/tap-haiku.c b/net/tap-haiku.c
index 08cc034..664d40f 100644
--- a/net/tap-haiku.c
+++ b/net/tap-haiku.c
@@ -59,3 +59,14 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
 int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_enable(int fd)
+{
+return -1;
+}
+
+int tap_fd_disable(int fd)
+{
+return -1;
+}
+
diff --git a/net/tap-linux.c b/net/tap-linux.c
index 059f5f3..60ea8d0 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -41,6 +41,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required
 struct ifreq ifr;
 int fd, ret;
 int len = sizeof(struct virtio_net_hdr);
+int mq_required = 0;
 
 TFR(fd = open(PATH_NET_TUN, O_RDWR));
 if (fd < 0) {
@@ -76,6 +77,20 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required
 ioctl(fd, TUNSETVNETHDRSZ, &len);
 }
 
+if (mq_required) {
+unsigned int features;
+
+if ((ioctl(fd, TUNGETFEATURES, &features) != 0) ||
+!(features & IFF_MULTI_QUEUE)) {
+error_report("multiqueue required, but no kernel "
+ "support for IFF_MULTI_QUEUE available");
+close(fd);
+return -1;
+} else {
+ifr.ifr_flags |= IFF_MULTI_QUEUE;
+}
+}
+
 if (ifname[0] != '\0')
 pstrcpy(ifr.ifr_name, IFNAMSIZ, ifname);
 else
@@ -209,3 +224,40 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
 }
 }
 }
+
+/* Enable a specific queue of tap. */
+int tap_fd_enable(int fd)
+{
+    struct ifreq ifr;
+    int ret;
+
+    memset(&ifr, 0, sizeof(ifr));
+
+    ifr.ifr_flags = IFF_ATTACH_QUEUE;
+    ret = ioctl(fd, TUNSETQUEUE, (void *) &ifr);
+
+    if (ret != 0) {
+        error_report("could not enable queue");
+    }
+
+    return ret;
+}
+
+/* Disable a specific queue of tap. */
+int tap_fd_disable(int fd)
+{
+    struct ifreq ifr;
+    int ret;
+
+    memset(&ifr, 0, sizeof(ifr));
+
+    ifr.ifr_flags = IFF_DETACH_QUEUE;
+    ret = ioctl(fd, TUNSETQUEUE, (void *) &ifr);
+
+    if (ret != 0) {
+        error_report("could not disable queue");
+    }
+
+    return ret;
+}
+
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 486a7ea..12cc392 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -225,3 +225,14 @@ void tap_fd_set_offload(int fd, int csum, int tso4,
 int tso6, int ecn, int ufo)
 {
 }
+
+int tap_fd_enable(int fd)
+{
+return -1;
+}
+
+int tap_fd_disable(int fd)
+{
+return -1;
+}
+
diff --git a/net/tap_int.h b/net/tap_int.h
index 1dffe12..ca1c21b 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -42,5 +42,7 @@ int tap_probe_vnet_hdr_len(int fd, int len);
 int tap_probe_has_ufo(int fd);
 void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo);
 void tap_fd_set_vnet_hdr_len(int fd, int len);
+int tap_fd_enable(int fd);
+int tap_fd_disable(int fd);
 
 #endif /* QEMU_TAP_H */
-- 
1.7.1
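
The kernel capability check in tap_open() can also be tried in isolation. The
following is a rough, Linux-only sketch, assuming <linux/if_tun.h> is
installed; ex_probe_tap_multiqueue() is a hypothetical name, and the fallback
value for IFF_MULTI_QUEUE is the one from net/tap-linux.h in this series.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/if_tun.h>

#ifndef IFF_MULTI_QUEUE
#define IFF_MULTI_QUEUE 0x0100  /* fallback for older kernel headers */
#endif

/*
 * Hypothetical probe: returns 1 if the tun driver advertises
 * IFF_MULTI_QUEUE, 0 if it does not, and -1 if the device could not be
 * opened or queried (for example, missing permissions).
 */
int ex_probe_tap_multiqueue(const char *tun_path)
{
    unsigned int features = 0;
    int fd, ret;

    assert(tun_path != NULL);

    fd = open(tun_path, O_RDWR);
    if (fd < 0) {
        return -1;
    }

    if (ioctl(fd, TUNGETFEATURES, &features) != 0) {
        ret = -1;
    } else {
        ret = (features & IFF_MULTI_QUEUE) ? 1 : 0;
    }

    close(fd);
    return ret;
}
```

Calling ex_probe_tap_multiqueue("/dev/net/tun") before asking for several
queues would let a caller fail early instead of after the first queue exists.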



[PATCH V2 11/20] tap: support enabling or disabling a queue

2013-01-25 Thread Jason Wang
This patch introduces a new bit, enabled, in TAPState which tracks whether a
specific queue/fd is enabled. The tap/fd is enabled during initialization and
can be enabled or disabled by tap_enable() and tap_disable(), which call
platform-specific helpers to do the real work. A tap fd is only polled while
the tap is enabled.

Signed-off-by: Jason Wang 
---
 include/net/tap.h |2 ++
 net/tap-win32.c   |   10 ++
 net/tap.c |   43 ---
 3 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/include/net/tap.h b/include/net/tap.h
index bb7efb5..0caf8c4 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -35,6 +35,8 @@ int tap_has_vnet_hdr_len(NetClientState *nc, int len);
 void tap_using_vnet_hdr(NetClientState *nc, int using_vnet_hdr);
 void tap_set_offload(NetClientState *nc, int csum, int tso4, int tso6, int ecn, int ufo);
 void tap_set_vnet_hdr_len(NetClientState *nc, int len);
+int tap_enable(NetClientState *nc);
+int tap_disable(NetClientState *nc);
 
 int tap_get_fd(NetClientState *nc);
 
diff --git a/net/tap-win32.c b/net/tap-win32.c
index 265369c..a2cd94b 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -764,3 +764,13 @@ void tap_set_vnet_hdr_len(NetClientState *nc, int len)
 {
 assert(0);
 }
+
+int tap_enable(NetClientState *nc)
+{
+assert(0);
+}
+
+int tap_disable(NetClientState *nc)
+{
+assert(0);
+}
diff --git a/net/tap.c b/net/tap.c
index 67080f1..95e557b 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -59,6 +59,7 @@ typedef struct TAPState {
 unsigned int write_poll : 1;
 unsigned int using_vnet_hdr : 1;
 unsigned int has_ufo: 1;
+unsigned int enabled : 1;
 VHostNetState *vhost_net;
 unsigned host_vnet_hdr_len;
 } TAPState;
@@ -72,9 +73,9 @@ static void tap_writable(void *opaque);
 static void tap_update_fd_handler(TAPState *s)
 {
 qemu_set_fd_handler2(s->fd,
- s->read_poll  ? tap_can_send : NULL,
- s->read_poll  ? tap_send : NULL,
- s->write_poll ? tap_writable : NULL,
+ s->read_poll && s->enabled ? tap_can_send : NULL,
+ s->read_poll && s->enabled ? tap_send : NULL,
+ s->write_poll && s->enabled ? tap_writable : NULL,
  s);
 }
 
@@ -339,6 +340,7 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
 s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
 s->using_vnet_hdr = 0;
 s->has_ufo = tap_probe_has_ufo(s->fd);
+s->enabled = 1;
 tap_set_offload(&s->nc, 0, 0, 0, 0, 0);
 /*
  * Make sure host header length is set correctly in tap:
@@ -737,3 +739,38 @@ VHostNetState *tap_get_vhost_net(NetClientState *nc)
 assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_TAP);
 return s->vhost_net;
 }
+
+int tap_enable(NetClientState *nc)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+    int ret;
+
+    if (s->enabled) {
+        return 0;
+    } else {
+        ret = tap_fd_enable(s->fd);
+        if (ret == 0) {
+            s->enabled = 1;
+            tap_update_fd_handler(s);
+        }
+        return ret;
+    }
+}
+
+int tap_disable(NetClientState *nc)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+    int ret;
+
+    if (s->enabled == 0) {
+        return 0;
+    } else {
+        ret = tap_fd_disable(s->fd);
+        if (ret == 0) {
+            qemu_purge_queued_packets(nc);
+            s->enabled = 0;
+            tap_update_fd_handler(s);
+        }
+        return ret;
+    }
+}
-- 
1.7.1
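
The enable/disable logic above is idempotent: the backend is only touched on a
real state change, and polling is gated on the enabled bit. A stripped-down
model of that behaviour, with a counter standing in for the TUNSETQUEUE call,
might look as follows. All Ex/ex_ names are invented for this sketch, and
folding enable and disable into one function is a simplification of mine.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ExTap {
    unsigned int read_poll : 1;
    unsigned int write_poll : 1;
    unsigned int enabled : 1;
    int backend_calls;          /* counts simulated TUNSETQUEUE calls */
} ExTap;

/* Stand-in for tap_fd_enable()/tap_fd_disable(); always succeeds here. */
static int ex_backend_set_queue(ExTap *t)
{
    t->backend_calls++;
    return 0;
}

/* Mirrors the handler gating: read polling needs both bits set. */
int ex_tap_polls_read(const ExTap *t)
{
    return t->read_poll && t->enabled;
}

/* Switch the queue on or off, skipping the backend if nothing changes. */
int ex_tap_set_enabled(ExTap *t, int enable)
{
    int ret;

    assert(t != NULL);

    if ((int)t->enabled == !!enable) {
        return 0;               /* already in the requested state */
    }

    ret = ex_backend_set_queue(t);
    if (ret == 0) {
        t->enabled = enable ? 1 : 0;
    }
    return ret;
}
```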



[PATCH V2 09/20] tap: factor out common tap initialization

2013-01-25 Thread Jason Wang
This patch factors out the common initialization of tap into a new helper
net_init_tap_one(). This will be used by multiqueue tap patches.

Signed-off-by: Jason Wang 
---
 net/tap.c |  130 ++---
 1 files changed, 73 insertions(+), 57 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index eb40c42..67080f1 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -593,6 +593,73 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
 return fd;
 }
 
+static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
+const char *model, const char *name,
+const char *ifname, const char *script,
+const char *downscript, const char *vhostfdname,
+int vnet_hdr, int fd)
+{
+TAPState *s;
+
+s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
+if (!s) {
+close(fd);
+return -1;
+}
+
+if (tap_set_sndbuf(s->fd, tap) < 0) {
+return -1;
+}
+
+if (tap->has_fd) {
+snprintf(s->nc.info_str, sizeof(s->nc.info_str), "fd=%d", fd);
+} else if (tap->has_helper) {
+snprintf(s->nc.info_str, sizeof(s->nc.info_str), "helper=%s",
+ tap->helper);
+} else {
+const char *downscript;
+
+downscript = tap->has_downscript ? tap->downscript :
+DEFAULT_NETWORK_DOWN_SCRIPT;
+
+snprintf(s->nc.info_str, sizeof(s->nc.info_str),
+ "ifname=%s,script=%s,downscript=%s", ifname, script,
+ downscript);
+
+if (strcmp(downscript, "no") != 0) {
+snprintf(s->down_script, sizeof(s->down_script), "%s", downscript);
+snprintf(s->down_script_arg, sizeof(s->down_script_arg),
+ "%s", ifname);
+}
+}
+
+if (tap->has_vhost ? tap->vhost :
+vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
+int vhostfd;
+
+if (tap->has_vhostfd) {
+vhostfd = monitor_handle_fd_param(cur_mon, vhostfdname);
+if (vhostfd == -1) {
+return -1;
+}
+} else {
+vhostfd = -1;
+}
+
+s->vhost_net = vhost_net_init(&s->nc, vhostfd,
+  tap->has_vhostforce && tap->vhostforce);
+if (!s->vhost_net) {
+error_report("vhost-net requested but could not be initialized");
+return -1;
+}
+} else if (tap->has_vhostfd) {
+error_report("vhostfd= is not valid without vhost");
+return -1;
+}
+
+return 0;
+}
+
 int net_init_tap(const NetClientOptions *opts, const char *name,
  NetClientState *peer)
 {
@@ -600,10 +667,10 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 
 int fd, vnet_hdr = 0;
 const char *model;
-TAPState *s;
 
 /* for the no-fd, no-helper case */
 const char *script = NULL; /* suppress wrong "uninit'd use" gcc warning */
+const char *downscript = NULL;
 char ifname[128];
 
 assert(opts->kind == NET_CLIENT_OPTIONS_KIND_TAP);
@@ -649,6 +716,8 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 
 } else {
 script = tap->has_script ? tap->script : DEFAULT_NETWORK_SCRIPT;
+downscript = tap->has_downscript ? tap->downscript :
+DEFAULT_NETWORK_DOWN_SCRIPT;
 fd = net_tap_init(tap, &vnet_hdr, script, ifname, sizeof ifname);
 if (fd == -1) {
 return -1;
@@ -657,62 +726,9 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 model = "tap";
 }
 
-s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
-if (!s) {
-close(fd);
-return -1;
-}
-
-if (tap_set_sndbuf(s->fd, tap) < 0) {
-return -1;
-}
-
-if (tap->has_fd) {
-snprintf(s->nc.info_str, sizeof(s->nc.info_str), "fd=%d", fd);
-} else if (tap->has_helper) {
-snprintf(s->nc.info_str, sizeof(s->nc.info_str), "helper=%s",
- tap->helper);
-} else {
-const char *downscript;
-
-downscript = tap->has_downscript ? tap->downscript :
-   DEFAULT_NETWORK_DOWN_SCRIPT;
-
-snprintf(s->nc.info_str, sizeof(s->nc.info_str),
- "ifname=%s,script=%s,downscript=%s", ifname, script,
- downscript);
-
-if (strcmp(downscript, "no") != 0) {
-snprintf(s->down_script, sizeof(s->down_script), "%s", downscript);
-snprintf(s->down_script_arg, sizeof(s->down_script_arg), "%s", ifname);
-}
-}
-
-if (tap->has_vhost ? tap->vhost :
-tap->has_vhostfd || (tap->has_vhostforce && tap->vhostforce)) {
-int vhostfd;
-
-if (tap->has_vhostfd) {
-vhostfd = monitor_handle_fd_param(cur_mon, tap->vhostfd);
-if (vhostfd == -1) {
-   

[PATCH V2 06/20] net: introduce NetClientState destructor

2013-01-25 Thread Jason Wang
To allow allocating an array of NetClientState and freeing it in one go, this
patch introduces a destructor for NetClientState. The destructor can do a
type-specific free, which multiqueue can use to free the whole array at once.

Signed-off-by: Jason Wang 
---
 include/net/net.h |2 ++
 net/net.c |   17 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 995df5c..22adc99 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -35,6 +35,7 @@ typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
 typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
 typedef void (NetCleanup) (NetClientState *);
 typedef void (LinkStatusChanged)(NetClientState *);
+typedef void (NetClientDestructor)(NetClientState *);
 
 typedef struct NetClientInfo {
 NetClientOptionsKind type;
@@ -58,6 +59,7 @@ struct NetClientState {
 char *name;
 char info_str[256];
 unsigned receive_disabled : 1;
+NetClientDestructor *destructor;
 };
 
 typedef struct NICState {
diff --git a/net/net.c b/net/net.c
index 4e84d54..6368896 100644
--- a/net/net.c
+++ b/net/net.c
@@ -182,11 +182,17 @@ static char *assign_name(NetClientState *nc1, const char *model)
 return g_strdup(buf);
 }
 
+static void qemu_net_client_destructor(NetClientState *nc)
+{
+g_free(nc);
+}
+
 static void qemu_net_client_setup(NetClientState *nc,
   NetClientInfo *info,
   NetClientState *peer,
   const char *model,
-  const char *name)
+  const char *name,
+  NetClientDestructor *destructor)
 {
 nc->info = info;
 nc->model = g_strdup(model);
@@ -204,7 +210,7 @@ static void qemu_net_client_setup(NetClientState *nc,
 QTAILQ_INSERT_TAIL(&net_clients, nc, next);
 
 nc->send_queue = qemu_new_net_queue(nc);
-
+nc->destructor = destructor;
 }
 
 NetClientState *qemu_new_net_client(NetClientInfo *info,
@@ -217,7 +223,8 @@ NetClientState *qemu_new_net_client(NetClientInfo *info,
 assert(info->size >= sizeof(NetClientState));
 
 nc = g_malloc0(info->size);
-qemu_net_client_setup(nc, info, peer, model, name);
+qemu_net_client_setup(nc, info, peer, model, name,
+  qemu_net_client_destructor);
 
 return nc;
 }
@@ -279,7 +286,9 @@ static void qemu_free_net_client(NetClientState *nc)
 }
 g_free(nc->name);
 g_free(nc->model);
-g_free(nc);
+if (nc->destructor) {
+nc->destructor(nc);
+}
 }
 
 void qemu_del_net_client(NetClientState *nc)
-- 
1.7.1
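
The point of the destructor hook is that a client no longer assumes it owns
its own allocation. A self-contained sketch of the idea follows; it is not the
QEMU code, every Ex/ex_ name is hypothetical, and leaving the destructor NULL
for array elements is just one possible way to wire it up (the later patches
in the series may do it differently).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct ExClient ExClient;
typedef void (ExDestructor)(ExClient *);

struct ExClient {
    char *name;
    ExDestructor *destructor;   /* NULL: storage is owned by someone else */
};

static int ex_single_frees;     /* instrumentation for this example only */

static void ex_free_single(ExClient *c)
{
    ex_single_frees++;
    free(c);
}

static char *ex_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    assert(p != NULL);
    memcpy(p, s, n);
    return p;
}

/* Individually allocated client: its destructor frees the struct itself. */
ExClient *ex_client_new(const char *name)
{
    ExClient *c = calloc(1, sizeof(*c));

    assert(c != NULL);
    c->name = ex_strdup(name);
    c->destructor = ex_free_single;
    return c;
}

/* Release per-client resources, then defer to the destructor, if any. */
void ex_client_free(ExClient *c)
{
    free(c->name);
    if (c->destructor) {
        c->destructor(c);
    }
}

/* One allocation for n clients, so the elements carry no destructor. */
ExClient *ex_client_array_new(const char *name, int n)
{
    ExClient *arr = calloc((size_t)n, sizeof(*arr));
    int i;

    assert(arr != NULL);
    for (i = 0; i < n; i++) {
        arr[i].name = ex_strdup(name);  /* every queue shares the same id */
    }
    return arr;
}

/* Clean up each element, then free the backing array exactly once. */
void ex_client_array_free(ExClient *arr, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        ex_client_free(&arr[i]);
    }
    free(arr);
}
```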



[PATCH V2 04/20] net: introduce qemu_find_net_clients_except()

2013-01-25 Thread Jason Wang
In multiqueue, all NetClientStates that belong to the same netdev or nic have
the same id. So this patch introduces a helper, qemu_find_net_clients_except(),
which finds all NetClientStates with the given id, skipping those of the given
type. This will be used by multiqueue networking.

Signed-off-by: Jason Wang 
---
 include/net/net.h |2 ++
 net/net.c |   21 +
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index f0d1aa2..995df5c 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -68,6 +68,8 @@ typedef struct NICState {
 } NICState;
 
 NetClientState *qemu_find_netdev(const char *id);
+int qemu_find_net_clients_except(const char *id, NetClientState **ncs,
+ NetClientOptionsKind type, int max);
 NetClientState *qemu_new_net_client(NetClientInfo *info,
 NetClientState *peer,
 const char *model,
diff --git a/net/net.c b/net/net.c
index 8999f8d..6457fc0 100644
--- a/net/net.c
+++ b/net/net.c
@@ -508,6 +508,27 @@ NetClientState *qemu_find_netdev(const char *id)
 return NULL;
 }
 
+int qemu_find_net_clients_except(const char *id, NetClientState **ncs,
+                                 NetClientOptionsKind type, int max)
+{
+    NetClientState *nc;
+    int ret = 0;
+
+    QTAILQ_FOREACH(nc, &net_clients, next) {
+        if (nc->info->type == type) {
+            continue;
+        }
+        if (!strcmp(nc->name, id)) {
+            if (ret < max) {
+                ncs[ret] = nc;
+            }
+            ret++;
+        }
+    }
+
+    return ret;
+}
+
 static int nic_get_free_idx(void)
 {
 int index;
-- 
1.7.1
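
One detail of the helper above is easy to miss: it returns the total number of
matches even when that exceeds max, while only the first max entries are
stored. The sketch below reproduces that contract over a plain array instead
of the net_clients list; the Ex/ex_ names and the two-value kind enum are
assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

typedef enum { EX_KIND_NIC, EX_KIND_TAP } ExKind;

typedef struct ExNetClient {
    const char *name;
    ExKind kind;
} ExNetClient;

/*
 * Collect clients named id whose kind is not skip. At most max pointers
 * are written to out, but the return value counts every match.
 */
int ex_find_clients_except(const ExNetClient *all, int n_all, const char *id,
                           const ExNetClient **out, ExKind skip, int max)
{
    int found = 0;
    int i;

    assert(all != NULL && id != NULL && out != NULL);

    for (i = 0; i < n_all; i++) {
        if (all[i].kind == skip) {
            continue;
        }
        if (strcmp(all[i].name, id) == 0) {
            if (found < max) {
                out[found] = &all[i];
            }
            found++;
        }
    }

    return found;
}
```

A caller that gets back a value larger than max knows the output array was
truncated and can retry with a bigger buffer or report an error.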



[PATCH V2 05/20] net: introduce qemu_net_client_setup()

2013-01-25 Thread Jason Wang
This patch separates the setup of NetClientState from its allocation. This
allows allocating an array of NetClientState and initializing the entries one
by one, which is what multiqueue needs.

Signed-off-by: Jason Wang 
---
 net/net.c |   29 +++--
 1 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/net/net.c b/net/net.c
index 6457fc0..4e84d54 100644
--- a/net/net.c
+++ b/net/net.c
@@ -182,17 +182,12 @@ static char *assign_name(NetClientState *nc1, const char *model)
 return g_strdup(buf);
 }
 
-NetClientState *qemu_new_net_client(NetClientInfo *info,
-NetClientState *peer,
-const char *model,
-const char *name)
+static void qemu_net_client_setup(NetClientState *nc,
+  NetClientInfo *info,
+  NetClientState *peer,
+  const char *model,
+  const char *name)
 {
-NetClientState *nc;
-
-assert(info->size >= sizeof(NetClientState));
-
-nc = g_malloc0(info->size);
-
 nc->info = info;
 nc->model = g_strdup(model);
 if (name) {
@@ -210,6 +205,20 @@ NetClientState *qemu_new_net_client(NetClientInfo *info,
 
 nc->send_queue = qemu_new_net_queue(nc);
 
+}
+
+NetClientState *qemu_new_net_client(NetClientInfo *info,
+NetClientState *peer,
+const char *model,
+const char *name)
+{
+NetClientState *nc;
+
+assert(info->size >= sizeof(NetClientState));
+
+nc = g_malloc0(info->size);
+qemu_net_client_setup(nc, info, peer, model, name);
+
 return nc;
 }
 
-- 
1.7.1
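
Splitting setup from allocation means the same setup routine can serve both a
single heap object and the slots of an array. A minimal sketch of that shape,
under the assumption that setup only needs a model name and an index (the real
qemu_net_client_setup() takes more arguments); all names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct ExQueue {
    char name[32];
    int index;
} ExQueue;

/* Initialize caller-provided storage; performs no allocation itself. */
void ex_queue_setup(ExQueue *q, const char *model, int index)
{
    assert(q != NULL && model != NULL);
    snprintf(q->name, sizeof(q->name), "%s.%d", model, index);
    q->index = index;
}

/* Old shape: allocate one object and set it up. */
ExQueue *ex_queue_new(const char *model)
{
    ExQueue *q = calloc(1, sizeof(*q));

    assert(q != NULL);
    ex_queue_setup(q, model, 0);
    return q;
}

/* New possibility: allocate n objects at once, set them up one by one. */
ExQueue *ex_queue_array_new(const char *model, int n)
{
    ExQueue *qs = calloc((size_t)n, sizeof(*qs));
    int i;

    assert(qs != NULL);
    for (i = 0; i < n; i++) {
        ex_queue_setup(&qs[i], model, i);
    }
    return qs;
}
```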



[PATCH V2 02/20] net: introduce qemu_get_nic()

2013-01-25 Thread Jason Wang
To support multiqueue, this patch introduces a helper qemu_get_nic() to get
NICState from a NetClientState. The following patches would refactor this helper
to support multiqueue.
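
The hunks in this patch replace open-coded DO_UPCAST(NICState, nc, nc)->opaque
with an accessor. The underlying trick is recovering a containing struct from
a pointer to one of its members, roughly as sketched here. The Ex types and
the EX_CONTAINER_OF macro are stand-ins written for this example; they are not
the QEMU definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ExNetClient {
    int link_down;
} ExNetClient;

typedef struct ExNIC {
    ExNetClient nc;     /* embedded client, handed out to the net layer */
    void *opaque;       /* device model state */
} ExNIC;

/* Recover the enclosing struct from a pointer to one of its members. */
#define EX_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Accessor in the spirit of qemu_get_nic(): client -> owning NIC. */
ExNIC *ex_get_nic(ExNetClient *nc)
{
    assert(nc != NULL);
    return EX_CONTAINER_OF(nc, ExNIC, nc);
}

/* Accessor in the spirit of qemu_get_nic_opaque(): client -> device state. */
void *ex_get_nic_opaque(ExNetClient *nc)
{
    return ex_get_nic(nc)->opaque;
}
```

Hiding the cast behind a function is what lets a later change alter how a
client maps to its NIC without touching every device model again.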

Signed-off-by: Jason Wang 
---
 hw/cadence_gem.c|8 
 hw/dp8393x.c|6 +++---
 hw/e1000.c  |8 
 hw/eepro100.c   |6 +++---
 hw/etraxfs_eth.c|6 +++---
 hw/lan9118.c|6 +++---
 hw/lance.c  |2 +-
 hw/mcf_fec.c|6 +++---
 hw/milkymist-minimac2.c |6 +++---
 hw/mipsnet.c|6 +++---
 hw/musicpal.c   |4 ++--
 hw/ne2000-isa.c |2 +-
 hw/ne2000.c |6 +++---
 hw/opencores_eth.c  |6 +++---
 hw/pcnet-pci.c  |2 +-
 hw/pcnet.c  |6 +++---
 hw/rtl8139.c|8 
 hw/smc91c111.c  |6 +++---
 hw/spapr_llan.c |4 ++--
 hw/stellaris_enet.c |6 +++---
 hw/usb/dev-network.c|6 +++---
 hw/virtio-net.c |   10 +-
 hw/xen_nic.c|4 ++--
 hw/xgmac.c  |6 +++---
 hw/xilinx_axienet.c |6 +++---
 hw/xilinx_ethlite.c |6 +++---
 include/net/net.h   |2 ++
 net/net.c   |   20 
 28 files changed, 92 insertions(+), 78 deletions(-)

diff --git a/hw/cadence_gem.c b/hw/cadence_gem.c
index 9de688f..ab35329 100644
--- a/hw/cadence_gem.c
+++ b/hw/cadence_gem.c
@@ -409,7 +409,7 @@ static int gem_can_receive(NetClientState *nc)
 {
 GemState *s;
 
-s = DO_UPCAST(NICState, nc, nc)->opaque;
+s = qemu_get_nic_opaque(nc);
 
 DB_PRINT("\n");
 
@@ -612,7 +612,7 @@ static ssize_t gem_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 uint8_trxbuf[2048];
 uint8_t   *rxbuf_ptr;
 
-s = DO_UPCAST(NICState, nc, nc)->opaque;
+s = qemu_get_nic_opaque(nc);
 
 /* Do nothing if receive is not enabled. */
 if (!(s->regs[GEM_NWCTRL] & GEM_NWCTRL_RXENA)) {
@@ -1149,7 +1149,7 @@ static const MemoryRegionOps gem_ops = {
 
 static void gem_cleanup(NetClientState *nc)
 {
-GemState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+GemState *s = qemu_get_nic_opaque(nc);
 
 DB_PRINT("\n");
 s->nic = NULL;
@@ -1158,7 +1158,7 @@ static void gem_cleanup(NetClientState *nc)
 static void gem_set_link(NetClientState *nc)
 {
 DB_PRINT("\n");
-phy_update_link(DO_UPCAST(NICState, nc, nc)->opaque);
+phy_update_link(qemu_get_nic_opaque(nc));
 }
 
 static NetClientInfo net_gem_info = {
diff --git a/hw/dp8393x.c b/hw/dp8393x.c
index c2d0bc8..0273fad 100644
--- a/hw/dp8393x.c
+++ b/hw/dp8393x.c
@@ -676,7 +676,7 @@ static const MemoryRegionOps dp8393x_ops = {
 
 static int nic_can_receive(NetClientState *nc)
 {
-dp8393xState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+dp8393xState *s = qemu_get_nic_opaque(nc);
 
 if (!(s->regs[SONIC_CR] & SONIC_CR_RXEN))
 return 0;
@@ -725,7 +725,7 @@ static int receive_filter(dp8393xState *s, const uint8_t * buf, int size)
 
 static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf, size_t size)
 {
-dp8393xState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+dp8393xState *s = qemu_get_nic_opaque(nc);
 uint16_t data[10];
 int packet_type;
 uint32_t available, address;
@@ -861,7 +861,7 @@ static void nic_reset(void *opaque)
 
 static void nic_cleanup(NetClientState *nc)
 {
-dp8393xState *s = DO_UPCAST(NICState, nc, nc)->opaque;
+dp8393xState *s = qemu_get_nic_opaque(nc);
 
 memory_region_del_subregion(s->address_space, &s->mmio);
 memory_region_destroy(&s->mmio);
diff --git a/hw/e1000.c b/hw/e1000.c
index 7b310d7..36f4051 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -743,7 +743,7 @@ receive_filter(E1000State *s, const uint8_t *buf, int size)
 static void
 e1000_set_link_status(NetClientState *nc)
 {
-E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+E1000State *s = qemu_get_nic_opaque(nc);
 uint32_t old_status = s->mac_reg[STATUS];
 
 if (nc->link_down) {
@@ -777,7 +777,7 @@ static bool e1000_has_rxbufs(E1000State *s, size_t total_size)
 static int
 e1000_can_receive(NetClientState *nc)
 {
-E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+E1000State *s = qemu_get_nic_opaque(nc);
 
 return (s->mac_reg[RCTL] & E1000_RCTL_EN) && e1000_has_rxbufs(s, 1);
 }
@@ -793,7 +793,7 @@ static uint64_t rx_desc_base(E1000State *s)
 static ssize_t
 e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
-E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+E1000State *s = qemu_get_nic_opaque(nc);
 struct e1000_rx_desc desc;
 dma_addr_t base;
 unsigned int n, rdt;
@@ -1230,7 +1230,7 @@ e1000_mmio_setup(E1000State *d)
 static void
 e1000_cleanup(NetClientState *nc)
 {
-E1000State *s = DO_UPCAST(NICState, nc, nc)->opaque;
+E1000State *s = qemu_get_nic_opaque(nc);
 
 s->nic = NULL;
 }
diff --git a/hw/eepro100.c b/hw/ee

[PATCH V2 03/20] net: introduce qemu_del_nic()

2013-01-25 Thread Jason Wang
To support multiqueue nics, this patch separates the nic destructor from
qemu_del_net_client() into a new helper, qemu_del_nic(), since the mapping
between NICState and NetClientState is not 1:1 in multiqueue. The following
patches will refactor this function to support multiqueue nics.

Signed-off-by: Jason Wang 
---
 hw/e1000.c   |2 +-
 hw/eepro100.c|2 +-
 hw/ne2000.c  |2 +-
 hw/pcnet-pci.c   |2 +-
 hw/rtl8139.c |2 +-
 hw/usb/dev-network.c |2 +-
 hw/virtio-net.c  |2 +-
 hw/xen_nic.c |2 +-
 include/net/net.h|1 +
 net/net.c|   15 ++-
 10 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 36f4051..f3590a9 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1244,7 +1244,7 @@ pci_e1000_uninit(PCIDevice *dev)
 qemu_free_timer(d->autoneg_timer);
 memory_region_destroy(&d->mmio);
 memory_region_destroy(&d->io);
-qemu_del_net_client(qemu_get_queue(d->nic));
+qemu_del_nic(d->nic);
 }
 
 static NetClientInfo net_e1000_info = {
diff --git a/hw/eepro100.c b/hw/eepro100.c
index f9856ae..5d23796 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -1849,7 +1849,7 @@ static void pci_nic_uninit(PCIDevice *pci_dev)
 memory_region_destroy(&s->flash_bar);
 vmstate_unregister(&pci_dev->qdev, s->vmstate, s);
 eeprom93xx_free(&pci_dev->qdev, s->eeprom);
-qemu_del_net_client(qemu_get_queue(s->nic));
+qemu_del_nic(s->nic);
 }
 
 static NetClientInfo net_eepro100_info = {
diff --git a/hw/ne2000.c b/hw/ne2000.c
index c989190..3dd1c84 100644
--- a/hw/ne2000.c
+++ b/hw/ne2000.c
@@ -751,7 +751,7 @@ static void pci_ne2000_exit(PCIDevice *pci_dev)
 NE2000State *s = &d->ne2000;
 
 memory_region_destroy(&s->io);
-qemu_del_net_client(qemu_get_queue(s->nic));
+qemu_del_nic(s->nic);
 }
 
 static Property ne2000_properties[] = {
diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c
index 26c90bf..df63b22 100644
--- a/hw/pcnet-pci.c
+++ b/hw/pcnet-pci.c
@@ -279,7 +279,7 @@ static void pci_pcnet_uninit(PCIDevice *dev)
 memory_region_destroy(&d->io_bar);
 qemu_del_timer(d->state.poll_timer);
 qemu_free_timer(d->state.poll_timer);
-qemu_del_net_client(qemu_get_queue(d->state.nic));
+qemu_del_nic(d->state.nic);
 }
 
 static NetClientInfo net_pci_pcnet_info = {
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index b825e83..d7716be 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -3446,7 +3446,7 @@ static void pci_rtl8139_uninit(PCIDevice *dev)
 }
 qemu_del_timer(s->timer);
 qemu_free_timer(s->timer);
-qemu_del_net_client(qemu_get_queue(s->nic));
+qemu_del_nic(s->nic);
 }
 
 static void rtl8139_set_link_status(NetClientState *nc)
diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
index abc6eac..a01a5e7 100644
--- a/hw/usb/dev-network.c
+++ b/hw/usb/dev-network.c
@@ -1330,7 +1330,7 @@ static void usb_net_handle_destroy(USBDevice *dev)
 
 /* TODO: remove the nd_table[] entry */
 rndis_clear_responsequeue(s);
-qemu_del_net_client(qemu_get_queue(s->nic));
+qemu_del_nic(s->nic);
 }
 
 static NetClientInfo net_usbnet_info = {
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 0b43add..47f4ab4 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -1124,6 +1124,6 @@ void virtio_net_exit(VirtIODevice *vdev)
 qemu_bh_delete(n->tx_bh);
 }
 
-qemu_del_net_client(qemu_get_queue(n->nic));
+qemu_del_nic(n->nic);
 virtio_cleanup(&n->vdev);
 }
diff --git a/hw/xen_nic.c b/hw/xen_nic.c
index 55b7960..4be077d 100644
--- a/hw/xen_nic.c
+++ b/hw/xen_nic.c
@@ -408,7 +408,7 @@ static void net_disconnect(struct XenDevice *xendev)
 netdev->rxs = NULL;
 }
 if (netdev->nic) {
-qemu_del_net_client(qemu_get_queue(netdev->nic));
+qemu_del_nic(netdev->nic);
 netdev->nic = NULL;
 }
 }
diff --git a/include/net/net.h b/include/net/net.h
index 96e05c4..f0d1aa2 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -77,6 +77,7 @@ NICState *qemu_new_nic(NetClientInfo *info,
const char *model,
const char *name,
void *opaque);
+void qemu_del_nic(NICState *nic);
 NetClientState *qemu_get_queue(NICState *nic);
 NICState *qemu_get_nic(NetClientState *nc);
 void *qemu_get_nic_opaque(NetClientState *nc);
diff --git a/net/net.c b/net/net.c
index 41dc12c..8999f8d 100644
--- a/net/net.c
+++ b/net/net.c
@@ -291,6 +291,15 @@ void qemu_del_net_client(NetClientState *nc)
 return;
 }
 
+assert(nc->info->type != NET_CLIENT_OPTIONS_KIND_NIC);
+
+qemu_cleanup_net_client(nc);
+qemu_free_net_client(nc);
+}
+
+void qemu_del_nic(NICState *nic)
+{
+NetClientState *nc = qemu_get_queue(nic);
 /* If this is a peer NIC and peer has already been deleted, free it now. */
 if (nc->peer && nc->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
 NICState *nic = qemu_get_nic(nc);
@@ -933,7 +942,11 @@ v

[PATCH V2 01/20] net: introduce qemu_get_queue()

2013-01-25 Thread Jason Wang
To support multiqueue, this patch introduces a helper, qemu_get_queue(),
which returns the NetClientState of a device. Later patches in the series
refactor this helper to support multiqueue.
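
For readers without the full diff at hand, here is a minimal sketch of what such a helper can look like. The struct definitions are simplified stand-ins (the real QEMU types carry many more fields), and nic_link_is_down() is a hypothetical caller modelled on the e1000 hunk below, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the QEMU types; the real structs are larger. */
typedef struct NetClientState {
    bool link_down;
} NetClientState;

typedef struct NICState {
    NetClientState nc;   /* single embedded queue, as before this series */
    void *opaque;
} NICState;

/* Accessor in the spirit of the patch: callers stop writing &nic->nc. */
NetClientState *qemu_get_queue(NICState *nic)
{
    assert(nic != NULL);
    return &nic->nc;
}

/* Hypothetical caller: read the link state through the helper. */
bool nic_link_is_down(NICState *nic)
{
    return qemu_get_queue(nic)->link_down;
}
```

Routing every access through one function means the layout of NICState can change later without touching each device model again.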

Signed-off-by: Jason Wang 
---
 hw/cadence_gem.c|9 +++--
 hw/dp8393x.c|9 +++--
 hw/e1000.c  |   24 ---
 hw/eepro100.c   |   12 
 hw/etraxfs_eth.c|4 +-
 hw/lan9118.c|   10 +++---
 hw/mcf_fec.c|4 +-
 hw/milkymist-minimac2.c |4 +-
 hw/mipsnet.c|4 +-
 hw/musicpal.c   |2 +-
 hw/ne2000-isa.c |2 +-
 hw/ne2000.c |7 ++--
 hw/opencores_eth.c  |6 ++--
 hw/pcnet-pci.c  |2 +-
 hw/pcnet.c  |7 ++--
 hw/rtl8139.c|   14 
 hw/smc91c111.c  |4 +-
 hw/spapr_llan.c |4 +-
 hw/stellaris_enet.c |5 ++-
 hw/usb/dev-network.c|   10 +++---
 hw/virtio-net.c |   76 ++-
 hw/xen_nic.c|   13 +---
 hw/xgmac.c  |4 +-
 hw/xilinx_axienet.c |4 +-
 hw/xilinx_ethlite.c |4 +-
 include/net/net.h   |1 +
 net/net.c   |5 +++
 savevm.c|2 +-
 28 files changed, 138 insertions(+), 114 deletions(-)

diff --git a/hw/cadence_gem.c b/hw/cadence_gem.c
index 0d83442..9de688f 100644
--- a/hw/cadence_gem.c
+++ b/hw/cadence_gem.c
@@ -389,10 +389,10 @@ static void gem_init_register_masks(GemState *s)
  */
 static void phy_update_link(GemState *s)
 {
-DB_PRINT("down %d\n", s->nic->nc.link_down);
+DB_PRINT("down %d\n", qemu_get_queue(s->nic)->link_down);
 
 /* Autonegotiation status mirrors link status.  */
-if (s->nic->nc.link_down) {
+if (qemu_get_queue(s->nic)->link_down) {
 s->phy_regs[PHY_REG_STATUS] &= ~(PHY_REG_STATUS_ANEGCMPL |
  PHY_REG_STATUS_LINK);
 s->phy_regs[PHY_REG_INT_ST] |= PHY_REG_INT_ST_LINKC;
@@ -906,9 +906,10 @@ static void gem_transmit(GemState *s)
 
 /* Send the packet somewhere */
 if (s->phy_loop) {
-gem_receive(&s->nic->nc, tx_packet, total_bytes);
+gem_receive(qemu_get_queue(s->nic), tx_packet, total_bytes);
 } else {
-qemu_send_packet(&s->nic->nc, tx_packet, total_bytes);
+qemu_send_packet(qemu_get_queue(s->nic), tx_packet,
+ total_bytes);
 }
 
 /* Prepare for next packet */
diff --git a/hw/dp8393x.c b/hw/dp8393x.c
index b501450..c2d0bc8 100644
--- a/hw/dp8393x.c
+++ b/hw/dp8393x.c
@@ -339,6 +339,7 @@ static void do_receiver_disable(dp8393xState *s)
 
 static void do_transmit_packets(dp8393xState *s)
 {
+NetClientState *nc = qemu_get_queue(s->nic);
 uint16_t data[12];
 int width, size;
 int tx_len, len;
@@ -408,13 +409,13 @@ static void do_transmit_packets(dp8393xState *s)
 if (s->regs[SONIC_RCR] & (SONIC_RCR_LB1 | SONIC_RCR_LB0)) {
 /* Loopback */
 s->regs[SONIC_TCR] |= SONIC_TCR_CRSL;
-if (s->nic->nc.info->can_receive(&s->nic->nc)) {
+if (nc->info->can_receive(nc)) {
 s->loopback_packet = 1;
-s->nic->nc.info->receive(&s->nic->nc, s->tx_buffer, tx_len);
+nc->info->receive(nc, s->tx_buffer, tx_len);
 }
 } else {
 /* Transmit packet */
-qemu_send_packet(&s->nic->nc, s->tx_buffer, tx_len);
+qemu_send_packet(nc, s->tx_buffer, tx_len);
 }
 s->regs[SONIC_TCR] |= SONIC_TCR_PTX;
 
@@ -903,7 +904,7 @@ void dp83932_init(NICInfo *nd, hwaddr base, int it_shift,
 
 s->nic = qemu_new_nic(&net_dp83932_info, &s->conf, nd->model, nd->name, s);
 
-qemu_format_nic_info_str(&s->nic->nc, s->conf.macaddr.a);
+qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 qemu_register_reset(nic_reset, s);
 nic_reset(s);
 
diff --git a/hw/e1000.c b/hw/e1000.c
index ef06ca1..7b310d7 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -167,11 +167,11 @@ set_phy_ctrl(E1000State *s, int index, uint16_t val)
 {
 if ((val & MII_CR_AUTO_NEG_EN) && (val & MII_CR_RESTART_AUTO_NEG)) {
 /* no need auto-negotiation if link was down */
-if (s->nic->nc.link_down) {
+if (qemu_get_queue(s->nic)->link_down) {
 s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE;
 return;
 }
-s->nic->nc.link_down = true;
+qemu_get_queue(s->nic)->link_down = true;
 e1000_link_down(s);
 s->phy_reg[PHY_STATUS] &= ~MII_SR_AUTONEG_COMPLETE;
 DBGOUT(PHY, "Start link auto negotiation\n");
@@ -183,7 +183,7 @@ static void
 e1000_autoneg_timer(void *opaque)
 {
 E1000State *s = opaque;
-s->nic->nc.link_down = false;
+qemu_get_queue(s->nic)->link_down = false;
 e1000_link_u

[PATCH V2 00/20] Multiqueue virtio-net

2013-01-25 Thread Jason Wang
Hello all:

This series is an update of the previous version of multiqueue virtio-net
support.

It brings multiqueue support to virtio-net through a multiqueue-capable tap
backend and multiple vhost threads.

To support this, multiqueue NIC support was added to qemu. This is done by
introducing an array of NetClientStates in NICState and making each pair of
peers a queue of the NIC. This is done in patches 1-7.
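
A rough sketch of that data structure, assuming a fixed-size array and a one-to-one pairing between NIC queue i and backend queue i. MAX_QUEUES, nic_attach_backend() and nic_get_subqueue() are names made up for this illustration; the actual patches may size and wire the array differently.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUES 8   /* hypothetical cap, only for this sketch */

typedef struct NetClientState NetClientState;
struct NetClientState {
    NetClientState *peer;        /* the other end of this queue */
    unsigned int queue_index;    /* position of the queue within the device */
};

typedef struct NICState {
    NetClientState ncs[MAX_QUEUES];  /* one NetClientState per queue */
    unsigned int num_queues;
} NICState;

/* Pair NIC queue i with backend queue i, so each peer pair forms one queue. */
void nic_attach_backend(NICState *nic, NetClientState *backend, unsigned int n)
{
    assert(n >= 1 && n <= MAX_QUEUES);
    nic->num_queues = n;
    for (unsigned int i = 0; i < n; i++) {
        nic->ncs[i].queue_index = i;
        nic->ncs[i].peer = &backend[i];
        backend[i].queue_index = i;
        backend[i].peer = &nic->ncs[i];
    }
}

/* Return queue i of the NIC; queue 0 is what a single-queue device uses. */
NetClientState *nic_get_subqueue(NICState *nic, unsigned int i)
{
    assert(i < nic->num_queues);
    return &nic->ncs[i];
}
```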

Tap was also converted so that it can create a multiqueue backend. Currently,
only Linux supports this, by issuing TUNSETIFF N times with the same device
name to create N queues. Each fd configured through TUNSETIFF is one queue
supported by the kernel. Three new command-line options were introduced:
"queues" tells qemu how many queues to create; "fds" passes multiple
pre-created tap file descriptors to qemu; "vhostfds" passes multiple
pre-created vhost descriptors to qemu. This is done in patches 8-13.

A method for deleting a queue and a queue_index were also introduced for
virtio; this is done in patches 14-15.

Vhost was also changed to support multiqueue by introducing a start vq index,
which tracks the first virtqueue that vhost will use, instead of assuming that
vhost always uses virtqueues starting from index 0. This is done in
patch 16.
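
The start-index idea can be sketched as follows. This assumes that queue pair i of the device owns virtqueues 2*i (rx) and 2*i+1 (tx) and that each vhost instance handles exactly one pair; the struct and function names are hypothetical, not the ones used in patch 16.

```c
#include <assert.h>

typedef struct VhostDevSketch {
    int vq_index;  /* first device virtqueue owned by this vhost instance */
    int nvqs;      /* number of virtqueues it handles (rx + tx = 2 here) */
} VhostDevSketch;

/* Assumption: queue pair i uses device virtqueues 2*i and 2*i+1. */
void vhost_sketch_init(VhostDevSketch *dev, int queue_pair)
{
    assert(queue_pair >= 0);
    dev->vq_index = queue_pair * 2;
    dev->nvqs = 2;
}

/* Translate a device-wide virtqueue number into this instance's local index,
 * or return -1 if the virtqueue belongs to another vhost instance. */
int vhost_sketch_local_index(const VhostDevSketch *dev, int device_vq)
{
    if (device_vq < dev->vq_index || device_vq >= dev->vq_index + dev->nvqs) {
        return -1;
    }
    return device_vq - dev->vq_index;
}
```

With a start index of 0 this degenerates to the old single-queue behaviour, which is why the change can stay transparent to existing users.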

The last part is the multiqueue userspace changes; this is done in patches 17-20.

With these changes, a user can start a multiqueue virtio-net device with

./qemu -netdev tap,id=hn0,queues=2,vhost=on -device virtio-net-pci,netdev=hn0

Management tools such as libvirt can pass multiple pre-created fds/vhostfds
with

./qemu -netdev tap,id=hn0,fds=X:Y,vhostfds=M:N -device virtio-net-pci,netdev=hn0
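
To illustrate the colon-separated format, here is a small stand-alone parser for an "X:Y:Z" list. It is only a sketch of one way to handle such a string; parse_fd_list() is a hypothetical name, and the parsing code in the actual patches may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse "X:Y:Z" into fds[]. Returns the number of descriptors parsed, or -1
 * if the string is empty, malformed, or holds more than max entries. */
int parse_fd_list(const char *str, int *fds, int max)
{
    const char *p = str;
    int count = 0;

    assert(fds != NULL && max > 0);
    if (str == NULL || *str == '\0') {
        return -1;
    }

    for (;;) {
        char *end;
        long v;

        /* Each element must start with a digit: no signs, no whitespace. */
        if (!isdigit((unsigned char)*p)) {
            return -1;
        }
        errno = 0;
        v = strtol(p, &end, 10);
        if (errno != 0 || v > INT_MAX || count == max) {
            return -1;
        }
        fds[count++] = (int)v;

        if (*end == '\0') {
            return count;       /* consumed the whole string */
        }
        if (*end != ':') {
            return -1;          /* unexpected separator */
        }
        p = end + 1;
    }
}
```

For the command line above, the strings given to fds= and vhostfds= would each yield two descriptors, one per queue.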

No git tree this round since github is unavailable in China...

Changes from V1:
- silence checkpatch (Blue)
- use fds/vhostfds instead of fd/vhostfd (Stefan)
- use fds="X:Y:Z" instead of fd=X,fd=Y,fd=Z (Anthony)
- split patches (Stefan)
- typos in commit log (Stefan)
- Warn 'queues=' when fds/vhostfds is used (Stefan)
- rename __net_init_tap to net_init_tap_one (Stefan)
- check the consistency of vnet_hdr of multiple tap fds (Stefan)
- disable multiqueue support for bridge-helper (Stefan)
- rename tap_attach()/tap_detach() to tap_enable()/tap_disable() (Stefan)
- fix booting with legacy guest (WanLong)
- don't bump the version when doing migration (Michael)
- simplify the interface between virtio-net and multiqueue vhost_net (Michael)
- rebase the patches to latest
- re-order the patches so that the net part comes first, to simplify
  reviewing
- simplify the interface between virtio-net and multiqueue vhost_net
- move the guest notifiers setup from vhost to vhost_net
- fix a build issue of hw/mcf_fec.c

Changes from RFC v2:
- rebase the code to the latest qemu
- align the multiqueue virtio-net implementation to the virtio spec
- split the patches into smaller patches
- set_link and hotplug support

Changes from RFC V1:
- rebase to the latest
- fix memory leak in parse_netdev
- fix guest notifiers assignment/de-assignment
- change the command line to:
   qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2

Reference:
V1: http://lists.nongnu.org/archive/html/qemu-devel/2012-12/msg03558.html
RFC v2: http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04108.html
RFC v1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481

Perf Numbers:
- norm is short for normalized result
- trans.rate is short for transaction rate

Two Intel Xeon 5620 machines, directly connected through Intel 82599EB
Host/Guest kernel: David net tree
vhost enabled

- large improvements in both latency and CPU utilization in the
  request-response tests
- a regression when the guest sends small packets, because TCP tends to batch
  less when latency improves

1q/2q/4q
TCP_RR
                ------- 1q --------  ------- 2q --------  ------- 4q --------
size #sessions  trans.rate     norm  trans.rate     norm  trans.rate     norm
   1         1     9393.26   595.64     9408.18   597.34     9375.19   584.12
   1        20     72162.1  2214.24   129880.22  2456.13   196949.81  2298.13
   1        50   107513.38  2653.99   139721.93  2490.58   259713.82  2873.57
   1       100   126734.63  2676.54    145553.5  2406.63   265252.68     2943
  64         1     9453.42   632.33     9371.37   616.13     9338.19   615.97
  64        20    70620.03  2093.68   125155.75  2409.15   191239.91  2253.32
  64        50      106966  2448.29   146518.67  2514.47   242134.07  2720.91
  64       100   117046.35  2394.56   190153.09  2696.82   238881.29  2704.41
 256         1     8733.29   736.36     8701.07   680.83     8608.92    530.1
 256        20    69279.89  2274.45   115103.07  2299.76   144555.16  1963.53
 256        50    97676.02  2296.09   150719.57  2522.92    254510.5  3028.44
 256       100   150221.55  2949.56    197569.3  2790.92   300695.78  3494.83
TCP_CRR
                ------- 1q --------  ------- 2q --------  ------- 4q --------
size #sessions  trans.rate     norm  trans.rate     norm  trans.rate     norm
   1         1     2848.37   163.41     2230.39   130.89     2013.09   120.47
   1        20     23434.5   562.11    31057.43   531.07    49488.28   564.41
   1        50    28514.88   582.17    40494.23   605.92    60113.35   654.97
   1       100    28827.22   584.73    48813.25    661.6    61783.62   676.56
  64         1     2780.08    159.4     2201.07   127.96      2006.8   117.63
  64        20    23318.51   564.47    30982.44   530.

Re: [PATCH V3 RESEND RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2013-01-25 Thread Raghavendra K T
* Ingo Molnar  [2013-01-24 11:32:13]:

> 
> * Raghavendra K T  wrote:
> 
> > From: Peter Zijlstra 
> > 
> > In case of undercomitted scenarios, especially in large guests
> > yield_to overhead is significantly high. when run queue length of
> > source and target is one, take an opportunity to bail out and return
> > -ESRCH. This return condition can be further exploited to quickly come
> > out of PLE handler.
> > 
> > (History: Raghavendra initially worked on break out of kvm ple handler upon
> >  seeing source runqueue length = 1, but it had to export rq length).
> >  Peter came up with the elegant idea of return -ESRCH in scheduler core.
> > 
> > Signed-off-by: Peter Zijlstra 
> > Raghavendra, Checking the rq length of target vcpu condition added.(thanks Avi)
> > Reviewed-by: Srikar Dronamraju 
> > Signed-off-by: Raghavendra K T 
> > Acked-by: Andrew Jones 
> > Tested-by: Chegu Vinod 
> > ---
> > 
> >  kernel/sched/core.c |   25 +++--
> >  1 file changed, 19 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 2d8927f..fc219a5 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
> >   * It's the caller's job to ensure that the target task struct
> >   * can't go away on us before we can do any checks.
> >   *
> > - * Returns true if we indeed boosted the target task.
> > + * Returns:
> > + * true (>0) if we indeed boosted the target task.
> > + * false (0) if we failed to boost the target.
> > + * -ESRCH if there's no task to yield to.
> >   */
> >  bool __sched yield_to(struct task_struct *p, bool preempt)
> >  {
> > @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
> >  
> >  again:
> > p_rq = task_rq(p);
> > +   /*
> > +* If we're the only runnable task on the rq and target rq also
> > +* has only one task, there's absolutely no point in yielding.
> > +*/
> > +   if (rq->nr_running == 1 && p_rq->nr_running == 1) {
> > +   yielded = -ESRCH;
> > +   goto out_irq;
> > +   }
> 
> Looks good to me in principle.
> 
> Would be nice to get more consistent benchmark numbers. Once 
> those are unambiguously showing that this is a win:
> 
>   Acked-by: Ingo Molnar 
>
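
For reference, the bail-out check from the quoted hunk can be modelled on its own as below. This is a user-space sketch: struct rq_sketch is a hypothetical stand-in for the scheduler's run queue, and a return value of 0 here simply means "carry on with the normal yield_to path".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the scheduler's per-CPU run queue. */
struct rq_sketch {
    unsigned int nr_running;   /* number of runnable tasks on this queue */
};

/* Returns -ESRCH when both the source and the target run queue hold exactly
 * one runnable task, so there is nothing worth yielding to; 0 otherwise. */
int yield_to_should_bail(const struct rq_sketch *rq,
                         const struct rq_sketch *p_rq)
{
    assert(rq != NULL && p_rq != NULL);
    if (rq->nr_running == 1 && p_rq->nr_running == 1) {
        return -ESRCH;
    }
    return 0;
}
```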

I ran the tests with kernbench and sysbench again on a 32-core mx3850
machine with 32-vcpu guests. The results show definite improvements.

ebizzy and dbench show similar improvement for 1x overcommit
(note that the stdev for 1x in dbench is lower; the improvement is now seen
at only 20%).

[ All numbers are averages over 8 runs. ]

The patches benefit large-guest undercommit scenarios, so I believe the
performance improvement is even more significant with larger guests. [ Chegu
Vinod's results show performance close to the no-PLE case. ] Unfortunately I
do not have a machine to test larger guests (>32 vcpus).

Ingo, please let me know if this is okay with you.

base kernel = 3.8.0-rc4

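+----+------------+----------+------------+----------+-----------+
      kernbench (time in sec, lower is better)
+----+------------+----------+------------+----------+-----------+
              base      stdev      patched      stdev    %improve
+----+------------+----------+------------+----------+-----------+
 1x        46.6028     1.8672      42.4494     1.1390     8.91234
 2x        99.9074     9.1859      90.4050     2.6131     9.51121
+----+------------+----------+------------+----------+-----------+
+----+------------+----------+------------+----------+-----------+
      sysbench (time in sec, lower is better)
+----+------------+----------+------------+----------+-----------+
              base      stdev      patched      stdev    %improve
+----+------------+----------+------------+----------+-----------+
 1x        18.7402     0.3764      17.7431     0.3589     5.32065
 2x        13.2238     0.1935      13.0096     0.3152     1.61981
+----+------------+----------+------------+----------+-----------+

+----+------------+----------+------------+----------+-----------+
      ebizzy (records/sec, higher is better)
+----+------------+----------+------------+----------+-----------+
              base      stdev      patched      stdev    %improve
+----+------------+----------+------------+----------+-----------+
 1x      2421.9000    19.1801    5883.1000   112.7243   142.91259
+----+------------+----------+------------+----------+-----------+

+----+------------+----------+------------+----------+-----------+
      dbench (throughput in MB/sec, higher is better)
+----+------------+----------+------------+----------+-----------+
              base      stdev      patched      stdev    %improve
+----+------------+----------+------------+----------+-----------+
 1x     11675.9900   857.4154   14103.5000   215.8425    20.79061
+----+------------+----------+------------+----------+-----------+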
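
The %improve column appears to follow the usual relative-change formula, with the sign flipped for the time-based benchmarks. A small sketch, assuming that is how the numbers were derived (the function names are made up for this illustration):

```c
#include <assert.h>

/* For metrics where lower is better (kernbench and sysbench times). */
double improve_lower_is_better(double base, double patched)
{
    assert(base > 0.0);
    return (base - patched) / base * 100.0;
}

/* For metrics where higher is better (ebizzy records/sec, dbench MB/sec). */
double improve_higher_is_better(double base, double patched)
{
    assert(base > 0.0);
    return (patched - base) / base * 100.0;
}
```

Feeding in the kernbench 1x pair (46.6028, 42.4494) gives roughly 8.91, and the ebizzy 1x pair (2421.9, 5883.1) gives roughly 142.91, which matches the figures reported above.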


Re: [RFC v5 0/8] QEMU: Support KVM on ARM

2013-01-25 Thread Paolo Bonzini
On 24/01/2013 16:43, Peter Maydell wrote:
> Round 5 of the QEMU patches to support KVM for
> ARM on Cortex-A15 hardware. It's intended for use with
> the kernel tree at
>  git://github.com/virtualopensystems/linux-kvm-arm.git kvm-arm-v17-vgic-timers
> 
> Still RFC pending the kernel patches actually being accepted
> upstream...

Apart from patch 2,

Reviewed-by: Paolo Bonzini 

Paolo