Re: [PATCH 1/8] KVM: MMU: combine unsync and unsync_children

2011-12-22 Thread Xiao Guangrong
On 12/23/2011 12:06 AM, Marcelo Tosatti wrote:

> On Fri, Dec 16, 2011 at 06:13:42PM +0800, Xiao Guangrong wrote:
>> unsync is only used for the page of level == 1 and unsync_children is only
>> used in the upper page, so combine them to reduce the size of shadow page
>> structure
>>
>> Signed-off-by: Xiao Guangrong 
>> ---
>>  Documentation/virtual/kvm/mmu.txt |5 ++-
>>  arch/x86/include/asm/kvm_host.h   |   14 +++-
>>  arch/x86/kvm/mmu.c|   39 +---
>>  arch/x86/kvm/mmu_audit.c  |6 ++--
>>  arch/x86/kvm/mmutrace.h   |3 +-
>>  arch/x86/kvm/paging_tmpl.h|2 +-
>>  6 files changed, 48 insertions(+), 21 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
>> index 5dc972c..4a5fedd 100644
>> --- a/Documentation/virtual/kvm/mmu.txt
>> +++ b/Documentation/virtual/kvm/mmu.txt
>> @@ -205,14 +205,15 @@ Shadow pages contain the following information:
>>  this page's spt.  Otherwise, parent_ptes points at a data structure
>>  with a list of parent_ptes.
>>unsync:
>> +It is used when role.level == 1, only level 1 shadow pages can be unsync.
>>  If true, then the translations in this page may not match the guest's
>>  translation.  This is equivalent to the state of the tlb when a pte is
>>  changed but before the tlb entry is flushed.  Accordingly, unsync ptes
>>  are synchronized when the guest executes invlpg or flushes its tlb by
>>  other means.  Valid for leaf pages.
>>unsync_children:
>> -How many sptes in the page point at pages that are unsync (or have
>> -unsynchronized children).
>> +It is used when role.level > 1 and indicates how many sptes in the page
>> +point at unsync pages or unsynchronized children.
>>unsync_child_bitmap:
>>  A bitmap indicating which sptes in spt point (directly or indirectly) at
>>  pages that may be unsynchronized.  Used to quickly locate all unsychronized
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 52d6640..c0a89cd 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -233,9 +233,19 @@ struct kvm_mmu_page {
>>   * in this shadow page.
>>   */
>>  DECLARE_BITMAP(slot_bitmap, KVM_MEM_SLOTS_NUM);
>> -bool unsync;
>>  int root_count;  /* Currently serving as active root */
>> -unsigned int unsync_children;
>> +
>> +/*
>> + * If role.level == 1, unsync indicates whether the sp is
>> + * unsync, otherwise unsync_children indicates the number
>> + * of sptes which point at unsync sp or unsychronized children.
>> + * See sp_is_unsync() / sp_unsync_children_num.
>> + */
>> +union {
>> +bool unsync;
>> +unsigned int unsync_children;
>> +};
>> +
>>  unsigned long parent_ptes;  /* Reverse mapping for parent_pte */
>>  DECLARE_BITMAP(unsync_child_bitmap, 512);
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 2a2a9b4..6913a16 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -151,6 +151,21 @@ module_param(dbg, bool, 0644);
>>
>>  #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
>>
>> +static bool sp_is_unsync(struct kvm_mmu_page *sp)
>> +{
>> +return sp->role.level == PT_PAGE_TABLE_LEVEL && sp->unsync;
>> +}
>> +
>> +static unsigned int sp_unsync_children_num(struct kvm_mmu_page *sp)
>> +{
>> +unsigned int num = 0;
>> +
>> +if (sp->role.level != PT_PAGE_TABLE_LEVEL)
>> +num = sp->unsync_children;
>> +
>> +return num;
>> +}
> 
> IIRC there are windows where unsync_children is not accurate. Have you 
> verified this function is not called inside one of those windows?
> 


Actually, in the current code this function is only used to check whether
the sp has unsync children.

And this patch does not change the logic; it just uses sp_unsync_children_num
instead of using sp->unsync_children directly.
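
[For illustration, an editor's sketch of the before/after call-site pattern
this reply describes -- the mmu_sync_children() call site is assumed for the
example, not taken from the actual series:]

    /* before: direct field access */
    if (sp->unsync_children)
        mmu_sync_children(vcpu, sp);

    /* after: accessor that stays correct with the new unsync union */
    if (sp_unsync_children_num(sp))
        mmu_sync_children(vcpu, sp);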



Re: [PATCH 8/8] KVM: MMU: remove PT64_SECOND_AVAIL_BITS_SHIFT

2011-12-22 Thread Xiao Guangrong
On 12/18/2011 06:42 PM, Avi Kivity wrote:

> On 12/16/2011 12:18 PM, Xiao Guangrong wrote:
>> It is not used, remove it
>>
>> Signed-off-by: Xiao Guangrong 
>> ---
>>  arch/x86/kvm/mmu.c |1 -
>>  1 files changed, 0 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 5d0f0e3..234a32e 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -91,7 +91,6 @@ module_param(dbg, bool, 0644);
>>  #define PTE_PREFETCH_NUM8
>>
>>  #define PT_FIRST_AVAIL_BITS_SHIFT 9
>> -#define PT64_SECOND_AVAIL_BITS_SHIFT 52
>>
>>
> 
> Don't see a reason to drop it, we may use it some day.
> 


Anyway, it is not used in the current code; I do not have a strong opinion
on it :) Please ignore this patch if you do not like it.



Re: [PATCH 4/8] KVM: MMU: drop unsync_child_bitmap

2011-12-22 Thread Xiao Guangrong
On 12/18/2011 04:59 PM, Avi Kivity wrote:

> On 12/16/2011 12:16 PM, Xiao Guangrong wrote:
>> unsync_child_bitmap is used to record which spte has an unsync page or
>> unsync children; we can set a free bit in the spte instead of it
>>
> 
> unsync_child_bitmap takes one cacheline; the shadow page table takes
> 64.  This will make unsync/resync much more expensive.
> 


Indeed, I need to think more about this patch...

> I suggest to put unsync_child_bitmap at the end (together with
> unsync_children) and simply not allocate it when unneeded (have two
> kmem_caches for the two cases).
> 
> 


Good point, thanks Avi.



Re: [PATCH 0/6][RFC] virtio-blk: Change I/O path from request to BIO

2011-12-22 Thread Minchan Kim
On Thu, Dec 22, 2011 at 12:57:40PM +, Stefan Hajnoczi wrote:
> On Wed, Dec 21, 2011 at 1:00 AM, Minchan Kim  wrote:
> > This patch is follow-up of Christohp Hellwig's work
> > [RFC: ->make_request support for virtio-blk].
> > http://thread.gmane.org/gmane.linux.kernel/1199763
> >
> > Quote from hch
> > "This patchset allows the virtio-blk driver to support much higher IOP
> > rates which can be driven out of modern PCI-e flash devices.  At this
> > point it really is just a RFC due to various issues."
> 
> Basic question to make sure I understood this series: does this patch
> bypass the guest I/O scheduler (but then you added custom batching
> code into virtio_blk.c)?

Right.

> 
> If you're stumped by the performance perhaps compare blktraces of the
> request approach vs the bio approach.  We're probably performing I/O
> more CPU-efficiently but the I/O pattern itself is worse.

You mean the I/O scheduler has many techniques to produce a good I/O pattern?
That's what I want to discuss in this RFC.

I guess the request layer has many techniques, proven over a long time,
for doing I/O well, but a BIO-based driver ignores them just to reduce
locking overhead. Of course, we can add such techniques to the BIO-based
driver, like the custom batching in this series. But that needs lots of
work, is really duplication, and will be a maintenance problem.

I would like to hear opinions on whether this direction is good or bad.
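
[A minimal sketch of the two submission paths under discussion, assuming
the block-layer API of ~3.x kernels (signatures changed across versions);
the names are illustrative, not from this patchset:]

    #include <linux/blkdev.h>

    /* bio-based: the driver's make_request_fn receives each bio directly,
     * so nothing passes through the elevator/I/O scheduler */
    static void vb_make_request(struct request_queue *q, struct bio *bio)
    {
        /* handle the bio, then complete it */
        bio_endio(bio, 0);
    }

    /* request-based: requests arrive only after the I/O scheduler
     * (cfq/deadline/noop) has merged and sorted them */
    static void vb_request_fn(struct request_queue *q)
    {
        struct request *req;

        while ((req = blk_fetch_request(q)) != NULL)
            __blk_end_request_all(req, 0);
    }

    /* wiring up one path or the other:
     *   q = blk_alloc_queue(GFP_KERNEL);
     *   blk_queue_make_request(q, vb_make_request);    bio path
     * versus
     *   q = blk_init_queue(vb_request_fn, &vb_lock);   request path
     */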

> 
> Stefan

-- 
Kind regards,
Minchan Kim


Re: [PATCH 0/6][RFC] virtio-blk: Change I/O path from request to BIO

2011-12-22 Thread Minchan Kim
On Thu, Dec 22, 2011 at 10:45:06AM -0500, Vivek Goyal wrote:
> On Thu, Dec 22, 2011 at 10:05:38AM +0900, Minchan Kim wrote:
> 
> [..]
> > > May be using deadline or noop in guest is better to benchmark against
> > > PCI-E based flash.
> > 
> > Good suggestion.
> > I tested it by deadline on guest side.
> > 
> > The result is not good.
> > Although gap is within noise, Batch BIO's random performance is regressed
> > compared to CFQ. 
> > 
> >      Request                      Batch BIO
> > 
> >      (MB/s)   stddev              (MB/s)   stddev
> > w    787.030  31.494         w    748.714  68.490
> > rw   216.044  29.734         rw   216.977  40.635
> > r    771.765   3.327         r    771.107   4.299
> > rr   280.096  25.135         rr   258.067  43.916
> > 
> > I did some small test for only Batch BIO with deadline and cfq
> > to see the I/O scheduler's effect.
> > I think the result is very strange: deadline: 149MB, CFQ: 87M.
> > Because the Batch BIO patch uses make_request_fn instead of request_fn,
> > I think we should not be affected by the I/O scheduler. (I mean we issue
> > I/O before the I/O scheduler handles it.)
> > 
> > What do you think about it?
> > Do I miss something?
> 
> This indeed is very strange. In case of bio based drivers, changing the IO
> scheduler on the queue should not change anything.
> 
> Try running blktrace on the vda device and see if you notice something
> odd.
> 
> Also you seem to be reporting contradicting results for batch bio.
> 
> Initially you mention that random IO is regressing with deadline as
> compared to CFQ. (It dropped from 325.976 MB/s to 258.067 MB/s).
> 
> In this second test you are reporting that CFQ performs badly as
> compared to deadline. (149MB/s vs 87MB/s).

The first test was with 64 threads and 512M; the second test was with only
1 thread and 128M, to finish quickly.
So I think the result is very strange.

This is the data from the deadline test.
In summary, the data fluctuates a lot.

Sometimes it's very stable, but sometimes the data fluctuates like this.
I don't know whether it's an aio-stress problem or a Fusion-io problem.

1)
[root@RHEL-6 ~]# ./aio-stress -c 1 -t 1 -s 128 -r 8 -O -o 3 -d 512 /dev/vda
num_thread 1
adding stage random read
starting with random read
file size 128MB, record size 8KB, depth 512, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
Running single thread version 
random read on /dev/vda (151.71 MB/s) 128.00 MB in 0.84s
thread 0 random read totals (151.55 MB/s) 128.00 MB in 0.84s

2)
[root@RHEL-6 ~]# ./aio-stress -c 1 -t 1 -s 128 -r 8 -O -o 3 -d 512 /dev/vda
num_thread 1
adding stage random read
starting with random read
file size 128MB, record size 8KB, depth 512, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
Running single thread version 
random read on /dev/vda (201.04 MB/s) 128.00 MB in 0.64s
thread 0 random read totals (200.76 MB/s) 128.00 MB in 0.64s

3)
[root@RHEL-6 ~]# ./aio-stress -c 1 -t 1 -s 128 -r 8 -O -o 3 -d 512 /dev/vda
num_thread 1
adding stage random read
starting with random read
file size 128MB, record size 8KB, depth 512, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
Running single thread version 
random read on /dev/vda (135.31 MB/s) 128.00 MB in 0.95s
thread 0 random read totals (135.19 MB/s) 128.00 MB in 0.95s

4)
[root@RHEL-6 ~]# ./aio-stress -c 1 -t 1 -s 128 -r 8 -O -o 3 -d 512 /dev/vda
num_thread 1
adding stage random read
starting with random read
file size 128MB, record size 8KB, depth 512, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
Running single thread version 
random read on /dev/vda (116.93 MB/s) 128.00 MB in 1.09s
thread 0 random read totals (116.82 MB/s) 128.00 MB in 1.10s

5)
[root@RHEL-6 ~]# ./aio-stress -c 1 -t 1 -s 128 -r 8 -O -o 3 -d 512 /dev/vda
num_thread 1
adding stage random read
starting with random read
file size 128MB, record size 8KB, depth 512, ios per iteration 8
max io_submit 8, buffer alignment set to 4KB
threads 1 files 1 contexts 1 context offset 2MB verification off
Running single thread version 
random read on /dev/vda (130.79 MB/s) 128.00 MB in 0.98s
thread 0 random read totals (130.65 MB/s) 128.00 MB in 0.98s

> 
> Two contradicting results?

Hmm, I need to test in /tmp/ to prevent that.

> 
> Thanks
> Vivek

-- 
Kind regards,
Minchan Kim


[PATCH 1/5] kvm: x86: Use symbols for all xsave field

2011-12-22 Thread Marcelo Tosatti
From: Jan Kiszka 

Field 0 (FCW+FSW) and 1 (FTW+FOP) were hard-coded so far.

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 target-i386/kvm.c |   14 --
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5bfc21f..d2f70f9 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -759,6 +759,8 @@ static int kvm_put_fpu(CPUState *env)
 return kvm_vcpu_ioctl(env, KVM_SET_FPU, &fpu);
 }
 
+#define XSAVE_FCW_FSW 0
+#define XSAVE_FTW_FOP 1
 #define XSAVE_CWD_RIP 2
 #define XSAVE_CWD_RDP 4
 #define XSAVE_MXCSR   6
@@ -786,8 +788,8 @@ static int kvm_put_xsave(CPUState *env)
 for (i = 0; i < 8; ++i) {
 twd |= (!env->fptags[i]) << i;
 }
-xsave->region[0] = (uint32_t)(swd << 16) + cwd;
-xsave->region[1] = (uint32_t)(env->fpop << 16) + twd;
+xsave->region[XSAVE_FCW_FSW] = (uint32_t)(swd << 16) + cwd;
+xsave->region[XSAVE_FTW_FOP] = (uint32_t)(env->fpop << 16) + twd;
 memcpy(&xsave->region[XSAVE_CWD_RIP], &env->fpip, sizeof(env->fpip));
 memcpy(&xsave->region[XSAVE_CWD_RDP], &env->fpdp, sizeof(env->fpdp));
 memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
@@ -991,10 +993,10 @@ static int kvm_get_xsave(CPUState *env)
 return ret;
 }
 
-cwd = (uint16_t)xsave->region[0];
-swd = (uint16_t)(xsave->region[0] >> 16);
-twd = (uint16_t)xsave->region[1];
-env->fpop = (uint16_t)(xsave->region[1] >> 16);
+cwd = (uint16_t)xsave->region[XSAVE_FCW_FSW];
+swd = (uint16_t)(xsave->region[XSAVE_FCW_FSW] >> 16);
+twd = (uint16_t)xsave->region[XSAVE_FTW_FOP];
+env->fpop = (uint16_t)(xsave->region[XSAVE_FTW_FOP] >> 16);
 env->fpstt = (swd >> 11) & 7;
 env->fpus = swd;
 env->fpuc = cwd;
-- 
1.7.6.4



[PATCH 0/5] [PULL] qemu-kvm.git uq/master queue

2011-12-22 Thread Marcelo Tosatti
The following changes since commit 03ecd2c80a64d030a22fe67cc7a60f24e17ff211:

  virtio-serial-bus: Ports are expected to implement 'have_data' callback (2011-12-21 15:00:29 -0600)

are available in the git repository at:
  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

Gleb Natapov (1):
  enable architectural PMU cpuid leaf for kvm

Jan Kiszka (3):
  kvm: x86: Use symbols for all xsave field
  kvm: x86: Avoid runtime allocation of xsave buffer
  kvm: x86: Drop redundant apic base and tpr update from kvm_get_sregs

Vasilis Liaskovitis (1):
  Set numa topology for max_cpus

 hw/pc.c |8 
 target-i386/cpu.h   |3 ++-
 target-i386/cpuid.c |   17 +
 target-i386/kvm.c   |   34 +-
 vl.c|2 +-
 5 files changed, 37 insertions(+), 27 deletions(-)


Re: [PATCH 4/6] virtio-blk: implement ->make_request

2011-12-22 Thread Christoph Hellwig
On Thu, Dec 22, 2011 at 12:20:01PM +, Stefan Hajnoczi wrote:
> virtblk_make_request() checks that the queue is not plugged.  Do we
> need to do that here too?

bio based drivers don't have a queue that could be plugged.


Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-12-22 Thread Marcelo Tosatti
On Thu, Dec 22, 2011 at 11:13:15AM -0600, Anthony Liguori wrote:
> On 12/22/2011 05:01 AM, Marcelo Tosatti wrote:
> >On Thu, Dec 01, 2011 at 06:40:31PM +0100, Peter Zijlstra wrote:
> >>No virt is crap, it needs to die, its horrid, and any solution aimed
> >>squarely at virt only is shit and not worth considering, that simple.
> >
> >Removing this phrase from context (feel free to object on that basis
> >to the following inquiry), what are your concerns with virtualization
> >itself? Is it just that having an unknowable operating system under
> >your feet is uncomfortable, or is there something else? Because virt
> >is green, it saves silicon.
> 
> Oh man, if you say virt solves global warming, I think I'm going to
> have to jump off a bridge to end the madness...

I said it is green (saves energy) and that it saves silicon (and therefore
saves fuel?). The rest of the conclusions are your own.



[PATCH 3/5] kvm: x86: Drop redundant apic base and tpr update from kvm_get_sregs

2011-12-22 Thread Marcelo Tosatti
From: Jan Kiszka 

The latter was already commented out, the former is redundant as well.
We always get the latest changes after return from the guest via
kvm_arch_post_run.

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 target-i386/kvm.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 06f4401..d206852 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1082,10 +1082,9 @@ static int kvm_get_sregs(CPUState *env)
 env->cr[3] = sregs.cr3;
 env->cr[4] = sregs.cr4;
 
-cpu_set_apic_base(env->apic_state, sregs.apic_base);
-
 env->efer = sregs.efer;
-//cpu_set_apic_tpr(env->apic_state, sregs.cr8);
+
+/* changes to apic base and cr8/tpr are read back via kvm_arch_post_run */
 
 #define HFLAG_COPY_MASK \
 ~( HF_CPL_MASK | HF_PE_MASK | HF_MP_MASK | HF_EM_MASK | \
-- 
1.7.6.4



[PATCH 2/5] kvm: x86: Avoid runtime allocation of xsave buffer

2011-12-22 Thread Marcelo Tosatti
From: Jan Kiszka 

Keep a per-VCPU xsave buffer for kvm_put/get_xsave instead of
continuously allocating and freeing it on state sync.

Signed-off-by: Jan Kiszka 
Signed-off-by: Marcelo Tosatti 
---
 target-i386/cpu.h |3 ++-
 target-i386/kvm.c |   15 +++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index a08ce9d..37dde79 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -751,7 +751,8 @@ typedef struct CPUX86State {
 uint32_t cpuid_svm_features;
 bool tsc_valid;
 int tsc_khz;
-
+void *kvm_xsave_buf;
+
 /* in order to simplify APIC support, we leave this pointer to the
user */
 struct DeviceState *apic_state;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d2f70f9..06f4401 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -516,6 +516,10 @@ int kvm_arch_init_vcpu(CPUState *env)
 }
 }
 
+if (kvm_has_xsave()) {
+env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
+}
+
 return 0;
 }
 
@@ -771,15 +775,14 @@ static int kvm_put_fpu(CPUState *env)
 
 static int kvm_put_xsave(CPUState *env)
 {
-int i, r;
-struct kvm_xsave* xsave;
+struct kvm_xsave* xsave = env->kvm_xsave_buf;
 uint16_t cwd, swd, twd;
+int i, r;
 
 if (!kvm_has_xsave()) {
 return kvm_put_fpu(env);
 }
 
-xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
 memset(xsave, 0, sizeof(struct kvm_xsave));
 twd = 0;
 swd = env->fpus & ~(7 << 11);
@@ -801,7 +804,6 @@ static int kvm_put_xsave(CPUState *env)
 memcpy(&xsave->region[XSAVE_YMMH_SPACE], env->ymmh_regs,
 sizeof env->ymmh_regs);
 r = kvm_vcpu_ioctl(env, KVM_SET_XSAVE, xsave);
-g_free(xsave);
 return r;
 }
 
@@ -978,7 +980,7 @@ static int kvm_get_fpu(CPUState *env)
 
 static int kvm_get_xsave(CPUState *env)
 {
-struct kvm_xsave* xsave;
+struct kvm_xsave* xsave = env->kvm_xsave_buf;
 int ret, i;
 uint16_t cwd, swd, twd;
 
@@ -986,10 +988,8 @@ static int kvm_get_xsave(CPUState *env)
 return kvm_get_fpu(env);
 }
 
-xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
 ret = kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
 if (ret < 0) {
-g_free(xsave);
 return ret;
 }
 
@@ -1013,7 +1013,6 @@ static int kvm_get_xsave(CPUState *env)
 env->xstate_bv = *(uint64_t *)&xsave->region[XSAVE_XSTATE_BV];
 memcpy(env->ymmh_regs, &xsave->region[XSAVE_YMMH_SPACE],
 sizeof env->ymmh_regs);
-g_free(xsave);
 return 0;
 }
 
-- 
1.7.6.4



[PATCH 4/5] Set numa topology for max_cpus

2011-12-22 Thread Marcelo Tosatti
From: Vasilis Liaskovitis 

qemu-kvm passes numa/SRAT topology information for smp_cpus to SeaBIOS. However
SeaBIOS always expects to setup max_cpus number of SRAT cpu entries
(MaxCountCPUs variable in build_srat function of Seabios). When qemu-kvm runs
with smp_cpus != max_cpus (e.g. -smp 2,maxcpus=4), Seabios will mistakenly use
memory SRAT info for setting up CPU SRAT entries for the offline CPUs. Wrong
SRAT memory entries are also created. This breaks NUMA in a guest.
Fix by setting up SRAT info for max_cpus in qemu-kvm.

Signed-off-by: Vasilis Liaskovitis 
Signed-off-by: Marcelo Tosatti 
---
 hw/pc.c |8 
 vl.c|2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 3a71992..f51afa8 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -624,9 +624,9 @@ static void *bochs_bios_init(void)
  * of nodes, one word for each VCPU->node and one word for each node to
  * hold the amount of memory.
  */
-numa_fw_cfg = g_malloc0((1 + smp_cpus + nb_numa_nodes) * 8);
+numa_fw_cfg = g_malloc0((1 + max_cpus + nb_numa_nodes) * 8);
 numa_fw_cfg[0] = cpu_to_le64(nb_numa_nodes);
-for (i = 0; i < smp_cpus; i++) {
+for (i = 0; i < max_cpus; i++) {
 for (j = 0; j < nb_numa_nodes; j++) {
 if (node_cpumask[j] & (1 << i)) {
 numa_fw_cfg[i + 1] = cpu_to_le64(j);
@@ -635,10 +635,10 @@ static void *bochs_bios_init(void)
 }
 }
 for (i = 0; i < nb_numa_nodes; i++) {
-numa_fw_cfg[smp_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
+numa_fw_cfg[max_cpus + 1 + i] = cpu_to_le64(node_mem[i]);
 }
 fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, (uint8_t *)numa_fw_cfg,
- (1 + smp_cpus + nb_numa_nodes) * 8);
+ (1 + max_cpus + nb_numa_nodes) * 8);
 
 return fw_cfg;
 }
diff --git a/vl.c b/vl.c
index c03abb6..d925424 100644
--- a/vl.c
+++ b/vl.c
@@ -3305,7 +3305,7 @@ int main(int argc, char **argv, char **envp)
  * real machines which also use this scheme.
  */
 if (i == nb_numa_nodes) {
-for (i = 0; i < smp_cpus; i++) {
+for (i = 0; i < max_cpus; i++) {
 node_cpumask[i % nb_numa_nodes] |= 1 << i;
 }
 }
-- 
1.7.6.4



[PATCH 5/5] enable architectural PMU cpuid leaf for kvm

2011-12-22 Thread Marcelo Tosatti
From: Gleb Natapov 

Signed-off-by: Gleb Natapov 
Signed-off-by: Marcelo Tosatti 
---
 target-i386/cpuid.c |   17 +
 1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 0b3af90..91a104b 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -1180,10 +1180,19 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0xA:
 /* Architectural Performance Monitoring Leaf */
-*eax = 0;
-*ebx = 0;
-*ecx = 0;
-*edx = 0;
+if (kvm_enabled()) {
+KVMState *s = env->kvm_state;
+
+*eax = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EAX);
+*ebx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EBX);
+*ecx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_ECX);
+*edx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EDX);
+} else {
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+}
 break;
 case 0xD:
 /* Processor Extended State */
-- 
1.7.6.4



Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-12-22 Thread Peter Zijlstra
On Thu, 2011-12-22 at 09:01 -0200, Marcelo Tosatti wrote:
> 
> > No virt is crap, it needs to die, its horrid, and any solution aimed
> > squarely at virt only is shit and not worth considering, that simple.
> 
> Removing this phrase from context (feel free to object on that basis
> to the following inquiry), what are your concerns with virtualization
> itself? Is it just that having an unknowable operating system under
> your feet is uncomfortable, or is there something else? Because virt
> is green, it saves silicon.

No, you're going the wrong way around that argument. Resource control
would save the planet in that case. That's an entirely separate concept
from virtualization. Look how much cgroup crap you still need on top of
the whole virt thing.

Virt deals with running legacy OSs, mostly because you're in a bind and
for a host of reasons can't get this super critical application you
really must have running on your new and improved platform.

So you emulate hardware to run the old os, to run the old app or
somesuch nonsense.

Virt really is mostly a technical solution to a mostly non-technical
problem.

There's of course the debug angle, but I've never really found it
reliable enough to use in that capacity, give me real hardware with a
serial port any day of the week.

Also, it just really offends me, we work really hard to make stuff go as
fast as possible and then you stick a gigantic emulation layer in
between and complain that shit is slow again. Don't do that!!


Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-12-22 Thread Anthony Liguori

On 12/22/2011 05:01 AM, Marcelo Tosatti wrote:
> On Thu, Dec 01, 2011 at 06:40:31PM +0100, Peter Zijlstra wrote:
>> No virt is crap, it needs to die, its horrid, and any solution aimed
>> squarely at virt only is shit and not worth considering, that simple.
>
> Removing this phrase from context (feel free to object on that basis
> to the following inquiry), what are your concerns with virtualization
> itself? Is it just that having an unknowable operating system under
> your feet is uncomfortable, or is there something else? Because virt
> is green, it saves silicon.

Oh man, if you say virt solves global warming, I think I'm going to have
to jump off a bridge to end the madness...


Regards,

Anthony Liguori


Re: [PATCH 1/8] KVM: MMU: combine unsync and unsync_children

2011-12-22 Thread Marcelo Tosatti
On Fri, Dec 16, 2011 at 06:13:42PM +0800, Xiao Guangrong wrote:
> unsync is only used for the page of level == 1 and unsync_children is only
> used in the upper page, so combine them to reduce the size of shadow page
> structure
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  Documentation/virtual/kvm/mmu.txt |5 ++-
>  arch/x86/include/asm/kvm_host.h   |   14 +++-
>  arch/x86/kvm/mmu.c|   39 +---
>  arch/x86/kvm/mmu_audit.c  |6 ++--
>  arch/x86/kvm/mmutrace.h   |3 +-
>  arch/x86/kvm/paging_tmpl.h|2 +-
>  6 files changed, 48 insertions(+), 21 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
> index 5dc972c..4a5fedd 100644
> --- a/Documentation/virtual/kvm/mmu.txt
> +++ b/Documentation/virtual/kvm/mmu.txt
> @@ -205,14 +205,15 @@ Shadow pages contain the following information:
>  this page's spt.  Otherwise, parent_ptes points at a data structure
>  with a list of parent_ptes.
>unsync:
> +It is used when role.level == 1, only level 1 shadow pages can be unsync.
>  If true, then the translations in this page may not match the guest's
>  translation.  This is equivalent to the state of the tlb when a pte is
>  changed but before the tlb entry is flushed.  Accordingly, unsync ptes
>  are synchronized when the guest executes invlpg or flushes its tlb by
>  other means.  Valid for leaf pages.
>unsync_children:
> -How many sptes in the page point at pages that are unsync (or have
> -unsynchronized children).
> +It is used when role.level > 1 and indicates how many sptes in the page
> +point at unsync pages or unsynchronized children.
>unsync_child_bitmap:
>  A bitmap indicating which sptes in spt point (directly or indirectly) at
>  pages that may be unsynchronized.  Used to quickly locate all unsychronized
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 52d6640..c0a89cd 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -233,9 +233,19 @@ struct kvm_mmu_page {
>* in this shadow page.
>*/
>   DECLARE_BITMAP(slot_bitmap, KVM_MEM_SLOTS_NUM);
> - bool unsync;
>   int root_count;  /* Currently serving as active root */
> - unsigned int unsync_children;
> +
> + /*
> +  * If role.level == 1, unsync indicates whether the sp is
> +  * unsync, otherwise unsync_children indicates the number
> +  * of sptes which point at unsync sp or unsychronized children.
> +  * See sp_is_unsync() / sp_unsync_children_num.
> +  */
> + union {
> + bool unsync;
> + unsigned int unsync_children;
> + };
> +
>   unsigned long parent_ptes;  /* Reverse mapping for parent_pte */
>   DECLARE_BITMAP(unsync_child_bitmap, 512);
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2a2a9b4..6913a16 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -151,6 +151,21 @@ module_param(dbg, bool, 0644);
> 
>  #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
> 
> +static bool sp_is_unsync(struct kvm_mmu_page *sp)
> +{
> + return sp->role.level == PT_PAGE_TABLE_LEVEL && sp->unsync;
> +}
> +
> +static unsigned int sp_unsync_children_num(struct kvm_mmu_page *sp)
> +{
> + unsigned int num = 0;
> +
> + if (sp->role.level != PT_PAGE_TABLE_LEVEL)
> + num = sp->unsync_children;
> +
> + return num;
> +}

IIRC there are windows where unsync_children is not accurate. Have you 
verified this function is not called inside one of those windows?



[PATCH] arch/powerpc/kvm/e500: Additional module.h => export.h fixup

2011-12-22 Thread Kyle Moffett
This file, like many others, needs to include <linux/export.h>.

Signed-off-by: Kyle Moffett 
---
 arch/powerpc/kvm/e500.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
index 26d2090..387c383 100644
--- a/arch/powerpc/kvm/e500.c
+++ b/arch/powerpc/kvm/e500.c
@@ -13,6 +13,7 @@
  */
 
 #include <linux/kvm_host.h>
+#include <linux/export.h>
 #include <linux/slab.h>
 #include 
 
-- 
1.7.7.3



Re: [PATCH 0/6][RFC] virtio-blk: Change I/O path from request to BIO

2011-12-22 Thread Vivek Goyal
On Thu, Dec 22, 2011 at 10:05:38AM +0900, Minchan Kim wrote:

[..]
> > May be using deadline or noop in guest is better to benchmark against
> > PCI-E based flash.
> 
> Good suggestion.
> I tested it by deadline on guest side.
> 
> The result is not good.
> Although gap is within noise, Batch BIO's random performance is regressed
> compared to CFQ. 
> 
>      Request                      Batch BIO
> 
>      (MB/s)   stddev              (MB/s)   stddev
> w    787.030  31.494         w    748.714  68.490
> rw   216.044  29.734         rw   216.977  40.635
> r    771.765   3.327         r    771.107   4.299
> rr   280.096  25.135         rr   258.067  43.916
> 
> I did some small test for only Batch BIO with deadline and cfq
> to see the I/O scheduler's effect.
> I think the result is very strange: deadline: 149MB, CFQ: 87M.
> Because the Batch BIO patch uses make_request_fn instead of request_fn,
> I think we should not be affected by the I/O scheduler. (I mean we issue
> I/O before the I/O scheduler handles it.)
> 
> What do you think about it?
> Do I miss something?

This indeed is very strange. In case of bio based drivers, changing the IO
scheduler on the queue should not change anything.

Try running blktrace on the vda device and see if you notice something
odd.

Also you seem to be reporting contradicting results for batch bio.

Initially you mention that random IO is regressing with deadline as
compared to CFQ. (It dropped from 325.976 MB/s to 258.067 MB/s).

In this second test you are reporting that CFQ performs badly as
compared to deadline. (149MB/s vs 87MB/s).

Two contradicting results?

Thanks
Vivek


RE: [PATCH v2] KVM: x86: Prevent exposing TSC deadline timer feature in the absence of in-kernel APIC

2011-12-22 Thread Liu, Jinsong
Avi Kivity wrote:
> On 12/21/2011 12:25 PM, Jan Kiszka wrote:
>> We must not report the TSC deadline timer feature on our own when
>> user space provides the APIC as we have no clue about its features.
> 
> We must not report the TSC deadline timer feature on our own, period.
> We should just update the timer mode mask there.  Don't know how this
> slipped through review.
> 
> I think your original idea was correct.  Add a new KVM_CAP for the tsc
> deadline timer.  Userspace can add the bit to cpuid if either it
> implements the feature in a userspace apic, or if it finds the new
> capability and uses the kernel apic.

Is it necessary to use KVM_CAP? If I didn't misunderstand, the KVM_CAP
solution would be:
1. qemu gets the kvm tsc deadline timer capability via KVM_CAP_... (see the
   sketch after this list);
2. qemu adds the cpuid bit
   if ((guest uses qemu apic && qemu emulates tsc deadline timer) ||
      (guest uses kvm apic && kvm emulates tsc deadline timer (KVM_CAP)))
3. qemu calls the KVM_SET_CPUID2 ioctl;
4. kvm exposes the feature to the guest by saving it at
   vcpu->arch.cpuid_entries, which seems logically redundant.
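
[A hedged sketch of steps 1-3 on the qemu side; KVM_CAP_TSC_DEADLINE_TIMER
is the capability name proposed in this thread, and the surrounding variable
names are made up:]

    /* step 1: query the kernel for the capability */
    if (kvm_check_extension(env->kvm_state, KVM_CAP_TSC_DEADLINE_TIMER)) {
        /* step 2: advertise CPUID.1:ECX bit 24 (tsc deadline timer) */
        cpuid_ent[i].ecx |= CPUID_EXT_TSC_DEADLINE_TIMER;
    }
    /* step 3: hand the cpuid table to the kernel */
    kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);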

Jan's patch v2 is a straightforward and simple fix. In the patch,
if (apic) { ... }
means the apic (and hence its tsc deadline timer sub-logic) is emulated by
kvm, and that's enough:
if the guest uses the kvm apic, it's OK to add the cpuid bit and expose it
to the guest;
if the guest doesn't use the kvm apic, the cpuid bit is not touched.

Thanks,
Jinsong


SOLVED [was Re: Controlling (WinXP-Guest)- Textmode via chardev pipe broken?]

2011-12-22 Thread Oliver Rath
On 20.12.2011 13:08, Sasha Levin wrote:
> [..]
> Whats wrong?
> Are you sure that windows is using the serial device at all?
>
> You're not emulating a keyboard there, you're emulating input from a
> serial device.
>
Hi Sasha,

I've found a better solution:

It is possible to pipe the qemu-monitor:

qemu ... -monitor pipe:.winpipe/path

so I'm able to send commands through the path.in pipe via the sendkey
command, e.g.

echo "sendkey r" > .winpipe/path.in
echo "sendkey d" > .winpipe/path.in
echo "sendkey shift-dot" > .winpipe/path.in
echo "sendkey ret" > .winpipe/path.in

It's not very comfortable, but it does its job :-)

Tfh!

Regards,

Oliver



Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-12-22 Thread Marcelo Tosatti
On Thu, Dec 01, 2011 at 06:40:31PM +0100, Peter Zijlstra wrote:
> On Wed, 2011-11-23 at 16:03 +0100, Andrea Arcangeli wrote:
> > Hi!
> > 
> > On Mon, Nov 21, 2011 at 07:51:21PM -0600, Anthony Liguori wrote:
> > > Fundamentally, the entity that should be deciding what memory should
> > > be present and where it should be located is the kernel.  I'm
> > > fundamentally opposed to trying to make QEMU override the scheduler/mm
> > > by using cpu or memory pinning in QEMU.
> > > 
> > > From what I can tell about ms_mbind(), it just uses process knowledge
> > > to bind specific areas of memory to a memsched group and lets the
> > > kernel decide what to do with that knowledge.  This is exactly the
> > > type of interface that QEMU should be using.
> > > 
> > > QEMU should tell the kernel enough information such that the kernel
> > > can make good decisions.  QEMU should not be the one making the
> > > decisions.
> > 
> > True, QEMU won't have to decide where the memory and vcpus should be
> > located (but hey it wouldn't need to decide that even if you use
> > cpusets, you can use relative mbind with cpusets, the admin or a
> > cpuset job scheduler could decide) but it's still QEMU making the
> > decision of what memory and which vcpus threads to
> > ms_mbind/ms_tbind. Think how you're going to create the input of those
> > syscalls...
> > 
> > If it wasn't qemu to decide that, qemu wouldn't be required to scan
> > the whole host physical numa (cpu/memory) topology in order to create
> > the "input" arguments of "ms_mbind/ms_tbind".
> 
> That's a plain falsehood, you don't need to scan host physical topology
> in order to create useful ms_[mt]bind arguments. You can use physical
> topology to optimize for particular hardware, but its not a strict
> requirement.
> 
> >  And when you migrate the
> > VM to another host, the whole vtopology may be counter-productive
> > because the kernel isn't automatically detecting the numa affinity
> > between threads and the guest vtopology will stick to whatever numa
> > _physical_ topology that was seen on the first node where the VM was
> > created.
> 
> This doesn't make any sense at all.
> 
> > I doubt that the assumption that all cloud nodes will have the same
> > physical numa topology is reasonable.
> 
> So what? If you want to be very careful you can make sure your vnodes are
> small enough that they fit on any physical node in your cloud (god I f*king
> hate that word).
> 
> If you're slightly less careful, things will still work, you might get
> less max parallelism, but typically (from what I understood) these VM
> hosting thingies are overloaded so you never get your max cpu anyway, so
> who cares.
> 
> Thing is, whatever you set up will always work; it might not be
> optimal, but the one guarantee ([threads,vrange] will stay on the same
> node) will be kept true, no matter where you run it.
> 
> Also, migration between non-identical hosts is always 'tricky'. You're
> always stuck with some minimally supported subset or average case thing.
> Really, why do you think NUMA would be any different.
> 
> > Furthermore to get the same benefits that qemu gets on host by using
> > ms_mbind/ms_tbind, every single guest application should be modified
> > to scan the guest vtopology and call ms_mbind/ms_tbind too (or use the
> > hard bindings which is what we try to avoid).
> 
> No! ms_[tm]bind() is just part of the solution, the other part is what
> to do for simple programs, and like I wrote in my email earlier, and
> what we talked about in Prague, is that for normal simple proglets we
> simply pick a numa node and stick to it. Much like:
> 
>  http://home.arcor.de/efocht/sched/
> 
> Except we could actually migrate the whole thing if needed. Basically
> you give each task its own 1 vnode and assign all threads to it.
> 
> Only big programs that need to span multiple nodes need to be modified
> to get best advantage of numa. But that has always been true.
> 
> > In my view the trouble of the numa hard bindings is not the fact
> > they're hard and qemu has to also decide the location (in fact it
> > doesn't need to decide the location if you use cpusets and relative
> > mbinds). The bigger problem is the fact either the admin or the app
> > developer has to explicitly scan the numa physical topology (both cpus
> > and memory) and tell the kernel how much memory to bind to each
> > thread. ms_mbind/ms_tbind only partially solve that problem. They're
> > similar to the mbind MPOL_F_RELATIVE_NODES with cpusets, except you
> > don't need an admin or a cpuset-job-scheduler (or a perl script) to
> > redistribute the hardware resources.
> 
> You're full of crap Andrea. 
> 
> Yes you need some clue as to your actual topology, but that's life, you
> can't get SMP for free either, you need to have some clue.
> 
> Just like with regular SMP where you need to be aware of data sharing,
> NUMA just makes it worse. If your app decomposes well enough 

Re: [Qemu-devel] [RFC PATCH] Exporting Guest RAM information for NUMA binding

2011-12-22 Thread Marcelo Tosatti
On Thu, Dec 01, 2011 at 06:40:31PM +0100, Peter Zijlstra wrote:
> On Wed, 2011-11-23 at 16:03 +0100, Andrea Arcangeli wrote:



> From what I gather what you propose is to periodically unmap all user
> memory (or map it !r !w !x, which is effectively the same) and take the
> fault. This fault will establish a thread:page relation. One can just
> use that or involve some history as well. Once you have this thread:page
> relation set you want to group them on the same node.
> 
> There's various problems with that, firstly of course the overhead,
> storing this thread:page relation set requires quite a lot of memory.
> Secondly I'm not quite sure I see how that works for threads that share
> a working set. Suppose you have 4 threads and 2 working sets, how do you
> make sure to keep the 2 groups together. I don't think that's evident
> from the simple thread:page relation data [*]. Thirdly I immensely
> dislike all these background scanner things, they make it very hard to
> account time to those who actually use it.

Picture yourself as the administrator of a virtualized host, with
a given workload of guests doing their tasks. All it takes is to
understand from a high level what the algorithms of ksm (collapsing of
equal content-pages into same physical RAM) and khugepaged (collapsing
of 4k pages in 2MB pages, good for TLB) are doing (and that should be
documented), and infer from that what is happening. The same is valid
for the guy who is writing management tools and exposing the statistics
to the system administrator.



Re: [PATCH] kvm: ensure that debugfs entries have been created

2011-12-22 Thread Marcelo Tosatti
On Thu, Dec 15, 2011 at 02:23:16PM +0800, Hamo wrote:
> by checking the return value from kvm_init_debug, we
> can ensure that the entries under debugfs for KVM have
> been created correctly.
> 
> Signed-off-by: Yang Bai 

Applied, thanks.



Re: [patch 0/3] RFC: provide synchronous registers in kvm_run

2011-12-22 Thread Avi Kivity
On 12/22/2011 03:17 PM, Christian Borntraeger wrote:
> On 22/12/11 13:54, Avi Kivity wrote:
> >> My main concern was the prefix register (this is a per-cpu register
> >> that defines the address of two pages that are swapped with the pages
> >> at 0 for this cpu). SMP on s390 is done that way (e.g. interrupt data
> >> is stored in page 0 for this cpu). The storage that qemu sees is
> >> storage without prefixing. For architecture compliance we actually
> >> must check _every_ memory access to see whether it hits the
> >> prefix/swap area and then add/subtract the prefix value.
> > 
> > Those are only memory accesses coming from the cpu, yes?  Why does
> > userspace have to access them at all?  I imagine DMA ignores it
> > completely since it doesn't come from the cpu.
>
> Not sure if I got your question... (just ask again if that doesn't answer it)
>
> The prefix page contains HW-defined content (like the PSWs for the different
> interrupt types) as well as some OS-defined values (for CPU-local data
> structures, as well as a place to store information in critical sections).
> The prefix page (and the swap area) must not be used for device I/O (since
> that would be broken, as you pointed out), but some I/O instructions can and
> will write status information to the prefix page. For example, the channel
> subsystem driver in Linux will use an area in the prefix page as a store
> address for some instructions.
>
> So let me phrase the above sentence differently:
> For architecture compliance we actually must check every memory access that
> is done on behalf of a guest cpu and was not already handled by the host
> kernel.

"on behalf of the guest cpu" is the key phrase.  This never happens in
userspace for x86, but I guess that s390 is different here.  Thanks for
the clarification.

-- 
error compiling committee.c: too many arguments to function



Re: [patch 0/3] RFC: provide synchronous registers in kvm_run

2011-12-22 Thread Christian Borntraeger
On 22/12/11 13:54, Avi Kivity wrote:
>> My main concern was the prefix register (this is a per-cpu register that
>> defines the address of two pages that are swapped with the pages at 0 for
>> this cpu). SMP on s390 is done that way (e.g. interrupt data is stored in
>> page 0 for this cpu). The storage that qemu sees is storage without
>> prefixing. For architecture compliance we actually must check _every_
>> memory access to see whether it hits the prefix/swap area and then
>> add/subtract the prefix value.
> 
> Those are only memory accesses coming from the cpu, yes?  Why does
> userspace have to access them at all?  I imagine DMA ignores it
> completely since it doesn't come from the cpu.

Not sure if I got your question... (just ask again if that doesn't answer it)

The prefix page contains HW-defined content (like the PSWs for the different
interrupt types) as well as some OS-defined values (for CPU-local data
structures, as well as a place to store information in critical sections).
The prefix page (and the swap area) must not be used for device I/O (since
that would be broken, as you pointed out), but some I/O instructions can and
will write status information to the prefix page. For example, the channel
subsystem driver in Linux will use an area in the prefix page as a store
address for some instructions.

So let me phrase the above sentence differently:
For architecture compliance we actually must check every memory access that
is done on behalf of a guest cpu and was not already handled by the host
kernel.

PS: Most of this is really handled in the kernel. As you can see, the current
paravirtual I/O stack does not need the prefix at all.



Re: [PATCH 14/23] vhost: convert to MemoryListener API

2011-12-22 Thread Avi Kivity
On 12/22/2011 02:57 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 22, 2011 at 02:50:27PM +0200, Avi Kivity wrote:
> > On 12/22/2011 02:50 PM, Michael S. Tsirkin wrote:
> > > On Mon, Dec 19, 2011 at 04:13:35PM +0200, Avi Kivity wrote:
> > > > +static void vhost_log_start(MemoryListener *listener,
> > > > +MemoryRegionSection *section)
> > > > +{
> > > > +/* FIXME: implement */
> > > > +}
> > > > +
> > > > +static void vhost_log_stop(MemoryListener *listener,
> > > > +   MemoryRegionSection *section)
> > > > +{
> > > > +/* FIXME: implement */
> > > > +}
> > > > +
> > >
> > > What exactly do we need to fix here?
> > 
> > Tell vhost to start tracking those regions?
> > 
> > I guess you don't often read packets into the framebuffer, or we'd have
> > a lot of bug reports.
>
> Yes, we currently simply don't pass these regions to vhost.
> It currently signals an error if such a region is observed,
> so we could handle the framebuffer in userspace if we wanted to.

I see, thanks.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 0/6][RFC] virtio-blk: Change I/O path from request to BIO

2011-12-22 Thread Stefan Hajnoczi
On Wed, Dec 21, 2011 at 1:00 AM, Minchan Kim  wrote:
> This patch is follow-up of Christohp Hellwig's work
> [RFC: ->make_request support for virtio-blk].
> http://thread.gmane.org/gmane.linux.kernel/1199763
>
> Quote from hch
> "This patchset allows the virtio-blk driver to support much higher IOP
> rates which can be driven out of modern PCI-e flash devices.  At this
> point it really is just a RFC due to various issues."

Basic question to make sure I understood this series: does this patch
bypass the guest I/O scheduler (but then you added custom batching
code into virtio_blk.c)?

If you're stumped by the performance perhaps compare blktraces of the
request approach vs the bio approach.  We're probably performing I/O
more CPU-efficiently but the I/O pattern itself is worse.

Stefan


Re: [PATCH 14/23] vhost: convert to MemoryListener API

2011-12-22 Thread Michael S. Tsirkin
On Thu, Dec 22, 2011 at 02:50:27PM +0200, Avi Kivity wrote:
> On 12/22/2011 02:50 PM, Michael S. Tsirkin wrote:
> > On Mon, Dec 19, 2011 at 04:13:35PM +0200, Avi Kivity wrote:
> > > +static void vhost_log_start(MemoryListener *listener,
> > > +MemoryRegionSection *section)
> > > +{
> > > +/* FIXME: implement */
> > > +}
> > > +
> > > +static void vhost_log_stop(MemoryListener *listener,
> > > +   MemoryRegionSection *section)
> > > +{
> > > +/* FIXME: implement */
> > > +}
> > > +
> >
> > What exactly do we need to fix here?
> 
> Tell vhost to start tracking those regions?
> 
> I guess you don't often read packets into the framebuffer, or we'd have
> a lot of bug reports.

Yes, we currently simply don't pass these regions to vhost.
It currently signals an error if such a region is observed,
so we could handle the framebuffer in userspace if we wanted to.

> -- 
> error compiling committee.c: too many arguments to function


Re: [patch 0/3] RFC: provide synchronous registers in kvm_run

2011-12-22 Thread Avi Kivity
On 12/22/2011 02:49 PM, Christian Borntraeger wrote:
> > Interesting.  Other archs emulate everything to do with registers in the
> > kernel, so this is not a fast path.
> > 
> > What workload does this benefit?
>
> My main concern was the prefix register (this is a per-cpu register that
> defines the address of two pages that are swapped with the pages at 0 for
> this cpu). SMP on s390 is done that way (e.g. interrupt data is stored in
> page 0 for this cpu). The storage that qemu sees is storage without
> prefixing. For architecture compliance we actually must check _every_
> memory access to see whether it hits the prefix/swap area and then
> add/subtract the prefix value.

Those are only memory accesses coming from the cpu, yes?  Why does
userspace have to access them at all?  I imagine DMA ignores it
completely since it doesn't come from the cpu.

> I just added the ability to share other registers later after some discussions
> with Alexander Graf because it seems to be doable for almost no cost.

I doubt other archs will benefit, but it makes sense to reserve the
space generically as you did.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 14/23] vhost: convert to MemoryListener API

2011-12-22 Thread Avi Kivity
On 12/22/2011 02:50 PM, Michael S. Tsirkin wrote:
> On Mon, Dec 19, 2011 at 04:13:35PM +0200, Avi Kivity wrote:
> > +static void vhost_log_start(MemoryListener *listener,
> > +MemoryRegionSection *section)
> > +{
> > +/* FIXME: implement */
> > +}
> > +
> > +static void vhost_log_stop(MemoryListener *listener,
> > +   MemoryRegionSection *section)
> > +{
> > +/* FIXME: implement */
> > +}
> > +
>
> What exactly do we need to fix here?

Tell vhost to start tracking those regions?

I guess you don't often read packets into the framebuffer, or we'd have
a lot of bug reports.

-- 
error compiling committee.c: too many arguments to function



Re: [patch 0/3] RFC: provide synchronous registers in kvm_run

2011-12-22 Thread Christian Borntraeger
On 22/12/11 13:35, Avi Kivity wrote:
> On 12/22/2011 01:56 PM, Christian Borntraeger wrote:
>> Avi, Marcelo,
>>
>> currently userspace can access guest registers via several ioctls. Some
>> of these registers might be useful very often. Here the system call overhead
>> for ioctl can make an exit more expensive than necessary.
>> In a discussion with Alex Graf we concluded that it might be beneficial to
>> have a subset of registers available in kvm_run. (The ioctls will also be
>> available).
>>
>> This series provides a prototype implementation together with two example
>> users for s390.
>>
>>
> 
> Interesting.  Other archs emulate everything to do with registers in the
> kernel, so this is not a fast path.
> 
> What workload does this benefit?

My main concern was the prefix register (this is a per-cpu register that
defines the address of two pages that are swapped with the pages at 0 for
this cpu). SMP on s390 is done that way (e.g. interrupt data is stored in
page 0 for this cpu). The storage that qemu sees is storage without
prefixing. For architecture compliance we actually must check _every_
memory access to see whether it hits the prefix/swap area and then
add/subtract the prefix value.
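
[A minimal sketch of the prefixing rule described above, assuming 4K pages
and the architecture's two-page prefix area; the function name is
hypothetical:]

    /* Translate a cpu's "absolute" address for its prefix value:
     * addresses 0..8191 map to the prefix area, the prefix area maps
     * back to 0..8191, and everything else is unchanged. */
    static unsigned long apply_prefix(unsigned long addr, unsigned long prefix)
    {
        if (addr < 2 * PAGE_SIZE)
            return addr + prefix;
        if (addr >= prefix && addr < prefix + 2 * PAGE_SIZE)
            return addr - prefix;
        return addr;
    }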

I just added the ability to share other registers later after some discussions
with Alexander Graf because it seems to be doable for almost no cost.


Christian




Re: [PATCH 20/23] vhost: avoid cpu_get_physical_page_desc()

2011-12-22 Thread Avi Kivity
On 12/22/2011 02:48 PM, Michael S. Tsirkin wrote:
> On Mon, Dec 19, 2011 at 04:13:41PM +0200, Avi Kivity wrote:
> > @@ -871,7 +899,10 @@ void vhost_dev_stop(struct vhost_dev *hdev, 
> > VirtIODevice *vdev)
> >  hdev->vqs + i,
> >  i);
> >  }
> > -vhost_sync_dirty_bitmap(hdev, 0, (target_phys_addr_t)~0x0ull);
> > +for (i = 0; i < hdev->n_mem_sections; ++i) {
> > +vhost_sync_dirty_bitmap(hdev, &hdev->mem_sections[i],
> > +0, (target_phys_addr_t)~0x0ull);
> > +}
> >  r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
> >  if (r < 0) {
> >  fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
> > diff --git a/hw/vhost.h b/hw/vhost.h
> > index d1824ec..80e64df 100644
> > --- a/hw/vhost.h
> > +++ b/hw/vhost.h
> > @@ -30,6 +30,8 @@ struct vhost_dev {
> >  MemoryListener memory_listener;
> >  int control;
> >  struct vhost_memory *mem;
> > +int n_mem_sections;
> > +MemoryRegionSection *mem_sections;
> >  struct vhost_virtqueue *vqs;
> >  int nvqs;
> >  unsigned long long features;
>
> This adds the need to track all sections, which is unfortunate.
> Couldn't the memory API get an extension, e.g. to scan them all?

I thought about it, it makes sense.

We even have memory_region_find() which can be used to implement it,
just need a FOR_EACH wrapper.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 14/23] vhost: convert to MemoryListener API

2011-12-22 Thread Michael S. Tsirkin
On Mon, Dec 19, 2011 at 04:13:35PM +0200, Avi Kivity wrote:
> +static void vhost_log_start(MemoryListener *listener,
> +MemoryRegionSection *section)
> +{
> +/* FIXME: implement */
> +}
> +
> +static void vhost_log_stop(MemoryListener *listener,
> +   MemoryRegionSection *section)
> +{
> +/* FIXME: implement */
> +}
> +

What exactly do we need to fix here?


Re: [patch 3/3] kvm-s390: provide general purpose registers via kvm_run

2011-12-22 Thread Avi Kivity
On 12/22/2011 02:41 PM, Heiko Carstens wrote:
> > Index: b/arch/s390/include/asm/kvm_host.h
> > ===
>
> Btw. you may want to set QUILT_NO_DIFF_INDEX to get rid of the
> Index lines junk :)

Or just use git; when I switched some years ago I found I didn't miss
anything from quilt.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 20/23] vhost: avoid cpu_get_physical_page_desc()

2011-12-22 Thread Michael S. Tsirkin
On Mon, Dec 19, 2011 at 04:13:41PM +0200, Avi Kivity wrote:
> @@ -871,7 +899,10 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>  hdev->vqs + i,
>  i);
>  }
> -vhost_sync_dirty_bitmap(hdev, 0, (target_phys_addr_t)~0x0ull);
> +for (i = 0; i < hdev->n_mem_sections; ++i) {
> +vhost_sync_dirty_bitmap(hdev, &hdev->mem_sections[i],
> +0, (target_phys_addr_t)~0x0ull);
> +}
>  r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
>  if (r < 0) {
>  fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
> diff --git a/hw/vhost.h b/hw/vhost.h
> index d1824ec..80e64df 100644
> --- a/hw/vhost.h
> +++ b/hw/vhost.h
> @@ -30,6 +30,8 @@ struct vhost_dev {
>  MemoryListener memory_listener;
>  int control;
>  struct vhost_memory *mem;
> +int n_mem_sections;
> +MemoryRegionSection *mem_sections;
>  struct vhost_virtqueue *vqs;
>  int nvqs;
>  unsigned long long features;

This adds the need to track all sections, which is unfortunate.
Couldn't the memory API get an extension, e.g. to scan them all?

> -- 
> 1.7.7.1


Re: [patch 3/3] kvm-s390: provide general purpose registers via kvm_run

2011-12-22 Thread Avi Kivity
On 12/22/2011 02:39 PM, Christian Borntraeger wrote:
> On 22/12/11 13:34, Avi Kivity wrote:
> >> The general purpose registers are often necessary to handle SIE exits.
> >> Avoid additional ioctls by providing the guest registers in the r/w 
> >> section of the kvm_run structure.
> >>
> > 
> > This is only needed for S390_UCONTROL?
>
> No, for the standard path. It was more like a "we copy the regs around
> anyway inside the kernel, so why not use kvm_run as a place to store 
> the guest regs". So I will probably also have a look at floating point
> regs and access registers.

Sure, for side-effect-free registers that's a good approach.  I'd copy
it for x86, except vmx keeps %rsp cached in the cpu and won't let it out
except via an expensive instruction, and also because it's very rarely
used (mostly by "info registers" in qemu and live migration).

-- 
error compiling committee.c: too many arguments to function



Re: [patch 3/3] kvm-s390: provide general purpose registers via kvm_run

2011-12-22 Thread Heiko Carstens
On Thu, Dec 22, 2011 at 12:56:49PM +0100, Christian Borntraeger wrote:
> From: Christian Borntraeger 
> 
> The general purpose registers are often necessary to handle SIE exits.
> Avoid additional ioctls by providing the guest registers in the r/w 
> section of the kvm_run structure.
> 
> Signed-off-by: Christian Borntraeger 
> ---
>  struct sync_rw_regs {
> + __u64 gprs[16]; /* general purpose registers */

FWIW, you should be able to get rid of guest_gprs[] from the
kvm_vcpu_arch structure.

> Index: b/arch/s390/include/asm/kvm_host.h
> ===

Btw. you may want to set QUILT_NO_DIFF_INDEX to get rid of the
Index lines junk :)



Re: [patch 3/3] kvm-s390: provide general purpose registers via kvm_run

2011-12-22 Thread Christian Borntraeger
On 22/12/11 13:34, Avi Kivity wrote:
>> The general purpose registers are often necessary to handle SIE exits.
>> Avoid additional ioctls by providing the guest registers in the r/w 
>> section of the kvm_run structure.
>>
> 
> This is only needed for S390_UCONTROL?

No, for the standard path. It was more like a "we copy the regs around
anyway inside the kernel, so why not use kvm_run as a place to store 
the guest regs". So I will probably also have a look at floating point
regs and access registers.

> 
>>  
>>  struct sync_rw_regs {
>> +__u64 gprs[16]; /* general purpose registers */
>>  };
>>  #endif
> 
> Don't you have to remove arch.guest_gprs too? (interesting approach btw).

Yes. Done



Re: [patch 0/3] RFC: provide synchronous registers in kvm_run

2011-12-22 Thread Avi Kivity
On 12/22/2011 01:56 PM, Christian Borntraeger wrote:
> Avi, Marcelo,
>
> currently userspace can access guest registers via several ioctls. Some
> of these registers might be useful very often. Here the system call overhead
> for an ioctl can make an exit more expensive than necessary.
> In a discussion with Alex Graf we concluded that it might be beneficial to
> have a subset of registers available in kvm_run. (The ioctls will also be
> available).
>
> This series provides a prototype implementation together with two example
> users for s390.
>
>

Interesting.  Other archs emulate everything to do with registers in the
kernel, so this is not a fast path.

What workload does this benefit?

-- 
error compiling committee.c: too many arguments to function



Re: [patch 3/3] kvm-s390: provide general purpose registers via kvm_run

2011-12-22 Thread Avi Kivity
On 12/22/2011 01:56 PM, Christian Borntraeger wrote:
> From: Christian Borntraeger 
>
> The general purpose registers are often necessary to handle SIE exits.
> Avoid additional ioctls by providing the guest registers in the r/w 
> section of the kvm_run structure.
>

This is only needed for S390_UCONTROL?

>  
>  struct sync_rw_regs {
> + __u64 gprs[16]; /* general purpose registers */
>  };
>  #endif

Don't you have to remove arch.guest_gprs too? (interesting approach btw).

>  
>  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> @@ -528,7 +528,7 @@ rerun_vcpu:
>   might_fault();
>  
>   do {
> - __vcpu_run(vcpu);
> + __vcpu_run(vcpu, kvm_run);

kvm_run == vcpu->run, no need for the extra param.

-- 
error compiling committee.c: too many arguments to function



Re: [patch 1/3] kvm: provide synchronous registers in kvm_run

2011-12-22 Thread Avi Kivity
On 12/22/2011 01:56 PM, Christian Borntraeger wrote:
> From: Christian Borntraeger 
>
> On some cpus the overhead for virtualization instructions is in the same
> range as a system call. Having to call multiple ioctls to get/set registers
> will make userspace-handled exits more expensive than necessary.
> Let's provide two sections in kvm_run to have a shared save area for
> guest registers.
> 1. the first section is read-only, to handle registers that have side effects
> 2. the second section is read/write, e.g. for general purpose registers.

An alternative is to have a single section, with a bitmask set by
userspace to indicate which registers have not been changed (a la svm's
CleanBits).  Is this worthwhile for s390?
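
A rough sketch of that alternative; all names here are illustrative, not a
proposed ABI:

/* One shared section plus a dirty mask: the kernel reloads only the
 * registers whose bit userspace set, a la svm's CleanBits (inverted). */
struct kvm_sync_regs {
	__u64 dirty;		/* bit n set => gprs[n] modified by userspace */
	__u64 gprs[16];
};

static void load_dirty_gprs(struct kvm_vcpu *vcpu, struct kvm_sync_regs *s)
{
	int i;

	for (i = 0; i < 16; i++)
		if (s->dirty & (1ULL << i))
			vcpu->arch.guest_gprs[i] = s->gprs[i];
}
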

>   };
> + /* Here are two fields that allow access to often used registers
> + * directly, to avoid the overhead of the ioctl system call */
> + union {
> + /* registers which can be only read */
> + struct sync_ro_regs sync_ro_regs;

kvm_

> + char padding[1024];
> + };
> + union {
> + /* read/write guest registers */
> + struct sync_rw_regs sync_rw_regs;

kvm_

> + char padding[1024];
> + };
>  };

blank line

> +These fields allow userspace to access certain guest registers without
> +having to call SET/GET_*REGS. Thus we can avoid some system call
> +overhead if userspace has to handle the exit. (only available if
> +KVM_CAP_SYNC_REGS is set). The ioctls will still work.
>  
>


-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 4/6] virtio-blk: implement ->make_request

2011-12-22 Thread Stefan Hajnoczi
On Wed, Dec 21, 2011 at 1:00 AM, Minchan Kim  wrote:
> +static void virtblk_add_buf_wait(struct virtio_blk *vblk,
> +       struct virtblk_req *vbr, unsigned long out, unsigned long in)
> +{
> +       DEFINE_WAIT(wait);
> +       bool retry, notify;
> +
> +       for (;;) {
> +               prepare_to_wait(&vblk->queue_wait, &wait,
> +                               TASK_UNINTERRUPTIBLE);
> +
> +               spin_lock_irq(&vblk->lock);
> +               if (virtqueue_add_buf(vblk->vq, vbr->sg,
> +                       out, in, vbr) < 0) {
> +                       retry = true;
> +               } else {
> +                       retry = false;
> +               }
> +               notify = virtqueue_kick_prepare(vblk->vq);
> +               spin_unlock_irq(&vblk->lock);
> +
> +               if (notify)

virtblk_make_request() checks that the queue is not plugged.  Do we
need to do that here too?


[patch 2/3] kvm-s390: provide the prefix register via kvm_run

2011-12-22 Thread Christian Borntraeger
From: Christian Borntraeger 

The prefix register is a read-mostly value that is necessary to emulate
memory accesses in an architecture-compliant way. Avoid an additional
ioctl by providing the prefix content in the r/o section of kvm_run.

Signed-off-by: Christian Borntraeger 
---
 arch/s390/include/asm/kvm.h |1 +
 arch/s390/kvm/kvm-s390.c|2 ++
 2 files changed, 3 insertions(+)

Index: b/arch/s390/include/asm/kvm.h
===
--- a/arch/s390/include/asm/kvm.h
+++ b/arch/s390/include/asm/kvm.h
@@ -43,6 +43,7 @@ struct kvm_guest_debug_arch {
 
 /* definition of registers in kvm_run */
 struct sync_ro_regs {
+   __u32 prefix;   /* prefix register */
 };
 
 struct sync_rw_regs {
Index: b/arch/s390/kvm/kvm-s390.c
===
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -129,6 +129,7 @@ int kvm_dev_ioctl_check_extension(long e
case KVM_CAP_S390_PSW:
case KVM_CAP_S390_GMAP:
case KVM_CAP_SYNC_MMU:
+   case KVM_CAP_SYNC_REGS:
r = 1;
break;
default:
@@ -556,6 +557,7 @@ rerun_vcpu:
 
kvm_run->psw_mask = vcpu->arch.sie_block->gpsw.mask;
kvm_run->psw_addr = vcpu->arch.sie_block->gpsw.addr;
+   kvm_run->sync_ro_regs.prefix = vcpu->arch.sie_block->prefix;
 
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &sigsaved, NULL);



[patch 0/3] RFC: provide synchronous registers in kvm_run

2011-12-22 Thread Christian Borntraeger
Avi, Marcelo,

currently userspace can access guest registers via several ioctls. Some
of these registers might be useful very often. Here the system call overhead
for an ioctl can make an exit more expensive than necessary.
In a discussion with Alex Graf we concluded that it might be beneficial to
have a subset of registers available in kvm_run. (The ioctls will also be
available).

This series provides a prototype implementation together with two example
users for s390.

Opinions?





[patch 3/3] kvm-s390: provide general purpose registers via kvm_run

2011-12-22 Thread Christian Borntraeger
From: Christian Borntraeger 

The general purpose registers are often necessary to handle SIE exits.
Avoid additional ioctls by providing the guest registers in the r/w 
section of the kvm_run structure.

Signed-off-by: Christian Borntraeger 
---
 arch/s390/include/asm/kvm.h  |1 +
 arch/s390/include/asm/kvm_host.h |2 +-
 arch/s390/kvm/diag.c |6 +++---
 arch/s390/kvm/intercept.c|4 ++--
 arch/s390/kvm/kvm-s390.c |   16 
 arch/s390/kvm/priv.c |   24 
 arch/s390/kvm/sigp.c |   14 +++---
 7 files changed, 34 insertions(+), 33 deletions(-)

Index: b/arch/s390/include/asm/kvm.h
===
--- a/arch/s390/include/asm/kvm.h
+++ b/arch/s390/include/asm/kvm.h
@@ -47,5 +47,6 @@ struct sync_ro_regs {
 };
 
 struct sync_rw_regs {
+   __u64 gprs[16]; /* general purpose registers */
 };
 #endif
Index: b/arch/s390/include/asm/kvm_host.h
===
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -253,5 +253,5 @@ struct kvm_arch{
struct gmap *gmap;
 };
 
-extern int sie64a(struct kvm_s390_sie_block *, unsigned long *);
+extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 #endif
Index: b/arch/s390/kvm/diag.c
===
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -20,8 +20,8 @@ static int diag_release_pages(struct kvm
unsigned long start, end;
unsigned long prefix  = vcpu->arch.sie_block->prefix;
 
-   start = vcpu->arch.guest_gprs[(vcpu->arch.sie_block->ipa & 0xf0) >> 4];
-   end = vcpu->arch.guest_gprs[vcpu->arch.sie_block->ipa & 0xf] + 4096;
+   start = vcpu->run->sync_rw_regs.gprs[(vcpu->arch.sie_block->ipa & 0xf0) >> 4];
+   end = vcpu->run->sync_rw_regs.gprs[vcpu->arch.sie_block->ipa & 0xf] + 4096;
 
if (start & ~PAGE_MASK || end & ~PAGE_MASK || start > end
|| start < 2 * PAGE_SIZE)
@@ -56,7 +56,7 @@ static int __diag_time_slice_end(struct 
 static int __diag_ipl_functions(struct kvm_vcpu *vcpu)
 {
unsigned int reg = vcpu->arch.sie_block->ipa & 0xf;
-   unsigned long subcode = vcpu->arch.guest_gprs[reg] & 0xffff;
+   unsigned long subcode = vcpu->run->sync_rw_regs.gprs[reg] & 0xffff;
 
VCPU_EVENT(vcpu, 5, "diag ipl functions, subcode %lx", subcode);
switch (subcode) {
Index: b/arch/s390/kvm/intercept.c
===
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -36,7 +36,7 @@ static int handle_lctlg(struct kvm_vcpu 
 
useraddr = disp2;
if (base2)
-   useraddr += vcpu->arch.guest_gprs[base2];
+   useraddr += vcpu->run->sync_rw_regs.gprs[base2];
 
if (useraddr & 7)
return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
@@ -75,7 +75,7 @@ static int handle_lctl(struct kvm_vcpu *
 
useraddr = disp2;
if (base2)
-   useraddr += vcpu->arch.guest_gprs[base2];
+   useraddr += vcpu->run->sync_rw_regs.gprs[base2];
 
if (useraddr & 3)
return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
Index: b/arch/s390/kvm/kvm-s390.c
===
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -389,13 +389,13 @@ static int kvm_arch_vcpu_ioctl_initial_r
 
 int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
-   memcpy(&vcpu->arch.guest_gprs, ®s->gprs, sizeof(regs->gprs));
+   memcpy(&vcpu->run->sync_rw_regs.gprs, ®s->gprs, sizeof(regs->gprs));
return 0;
 }
 
 int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
 {
-   memcpy(®s->gprs, &vcpu->arch.guest_gprs, sizeof(regs->gprs));
+   memcpy(®s->gprs, &vcpu->run->sync_rw_regs.gprs, sizeof(regs->gprs));
return 0;
 }
 
@@ -468,9 +468,9 @@ int kvm_arch_vcpu_ioctl_set_mpstate(stru
return -EINVAL; /* not implemented yet */
 }
 
-static void __vcpu_run(struct kvm_vcpu *vcpu)
+static void __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
-   memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16);
+   memcpy(&vcpu->arch.sie_block->gg14, &kvm_run->sync_rw_regs.gprs[14], 16);
 
if (need_resched())
schedule();
@@ -486,7 +486,7 @@ static void __vcpu_run(struct kvm_vcpu *
local_irq_enable();
VCPU_EVENT(vcpu, 6, "entering sie flags %x",
   atomic_read(&vcpu->arch.sie_block->cpuflags));
-   if (sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs)) {
+   if (sie64a(vcpu->arch.sie_block, kvm_run->sync_rw_regs.gprs)) {
VCPU_EVENT(vcp

[patch 1/3] kvm: provide synchronous registers in kvm_run

2011-12-22 Thread Christian Borntraeger
From: Christian Borntraeger 

On some cpus the overhead for virtualization instructions is in the same
range as a system call. Having to call multiple ioctls to get/set registers
will make userspace-handled exits more expensive than necessary.
Let's provide two sections in kvm_run to have a shared save area for
guest registers.
1. the first section is read-only, to handle registers that have side effects
2. the second section is read/write, e.g. for general purpose registers.


Signed-off-by: Christian Borntraeger 
---
 Documentation/virtual/kvm/api.txt |   16 
 arch/ia64/include/asm/kvm.h   |7 +++
 arch/powerpc/include/asm/kvm.h|7 +++
 arch/s390/include/asm/kvm.h   |6 ++
 arch/x86/include/asm/kvm.h|7 +++
 include/linux/kvm.h   |   13 +
 6 files changed, 56 insertions(+)

Index: b/Documentation/virtual/kvm/api.txt
===
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1652,7 +1652,23 @@ developer registration required to acces
/* Fix the size of the union. */
char padding[256];
};
+   /* Here are two fields that allow access to often used registers
+ * directly, to avoid the overhead of the ioctl system call */
+   union {
+   /* registers which can be only read */
+   struct sync_ro_regs sync_ro_regs;
+   char padding[1024];
+   };
+   union {
+   /* read/write guest registers */
+   struct sync_rw_regs sync_rw_regs;
+   char padding[1024];
+   };
 };
+These fields allow userspace to access certain guest registers without
+having to call SET/GET_*REGS. Thus we can avoid some system call
+overhead if userspace has to handle the exit. (only available if
+KVM_CAP_SYNC_REGS is set). The ioctls will still work.
 
 6. Capabilities that can be enabled
 
Index: b/arch/ia64/include/asm/kvm.h
===
--- a/arch/ia64/include/asm/kvm.h
+++ b/arch/ia64/include/asm/kvm.h
@@ -261,4 +261,11 @@ struct kvm_debug_exit_arch {
 struct kvm_guest_debug_arch {
 };
 
+/* definition of registers in kvm_run */
+struct sync_ro_regs {
+};
+
+struct sync_rw_regs {
+};
+
 #endif
Index: b/arch/powerpc/include/asm/kvm.h
===
--- a/arch/powerpc/include/asm/kvm.h
+++ b/arch/powerpc/include/asm/kvm.h
@@ -265,6 +265,13 @@ struct kvm_debug_exit_arch {
 struct kvm_guest_debug_arch {
 };
 
+/* definition of registers in kvm_run */
+struct sync_ro_regs {
+};
+
+struct sync_rw_regs {
+};
+
 #define KVM_REG_MASK   0x001f
 #define KVM_REG_EXT_MASK   0xffe0
 #define KVM_REG_GPR0x
Index: b/arch/s390/include/asm/kvm.h
===
--- a/arch/s390/include/asm/kvm.h
+++ b/arch/s390/include/asm/kvm.h
@@ -41,4 +41,10 @@ struct kvm_debug_exit_arch {
 struct kvm_guest_debug_arch {
 };
 
+/* definition of registers in kvm_run */
+struct sync_ro_regs {
+};
+
+struct sync_rw_regs {
+};
 #endif
Index: b/arch/x86/include/asm/kvm.h
===
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -321,4 +321,11 @@ struct kvm_xcrs {
__u64 padding[16];
 };
 
+/* definition of registers in kvm_run */
+struct sync_ro_regs {
+};
+
+struct sync_rw_regs {
+};
+
 #endif /* _ASM_X86_KVM_H */
Index: b/include/linux/kvm.h
===
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -273,6 +273,18 @@ struct kvm_run {
/* Fix the size of the union. */
char padding[256];
};
+   /* Here are two fields that allow access to often used registers
+ * directly, to avoid the overhead of the ioctl system call */
+   union {
+   /* registers which can be only read */
+   struct sync_ro_regs sync_ro_regs;
+   char padding[1024];
+   };
+   union {
+   /* read/write guest registers */
+   struct sync_rw_regs sync_rw_regs;
+   char padding[1024];
+   };
 };
 
 /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
@@ -557,6 +569,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_MAX_VCPUS 66   /* returns max vcpus per vm */
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
+#define KVM_CAP_SYNC_REGS 72
 
 #ifdef KVM_CAP_IRQ_ROUTING
 



Re: [PATCH 08/28] kvm tools: Fix KVM_RUN exit code check

2011-12-22 Thread Sasha Levin
On Thu, 2011-12-22 at 12:21 +0200, Avi Kivity wrote:
> On 12/22/2011 12:18 PM, Sasha Levin wrote:
> > On Thu, 2011-12-22 at 12:03 +0200, Avi Kivity wrote:
> > > On 12/08/2011 07:31 AM, Sasha Levin wrote:
> > > > > 
> > > > > It sounds like Alex is considering KVM PPC's return value in this 
> > > > > case (and
> > > > > updating api.txt if appropriate) -- what say you on this patch?  It 
> > > > > actually
> > > > > brings kvmtool's KVM_RUN return val check in line with QEMU's (also 
> > > > > "< 0") and
> > > > > nothing PPC will run without it, currently.  (I'm about to repost a 
> > > > > new series,
> > > > > will include it for these reasons, until I hear more complaint ;) )
> > > >
> > > > '<0' is fine as it's what api.txt says :)
> > > 
> > > What? ioctls return -1 on error, not <0.
> >
> > <0 as opposed to the !=0 check we had there before.
> >
> > There's no harm in checking for <0 even if the only possible negative
> > result is -1.
> >
> >
> 
> Yes, but the documentation should say -1.  Which ioctl is this?

That was KVM_RUN, but the documentation on that is correct (-1), so I'm
not sure what I was thinking.

On the other hand, there are several ioctls that do need fixing:

 - KVM_IOEVENTFD
 - KVM_PPC_GET_PVINFO
 - KVM_CAP_GET_TSC_KHZ

-- 

Sasha.



Re: [PATCH 08/28] kvm tools: Fix KVM_RUN exit code check

2011-12-22 Thread Avi Kivity
On 12/22/2011 12:18 PM, Sasha Levin wrote:
> On Thu, 2011-12-22 at 12:03 +0200, Avi Kivity wrote:
> > On 12/08/2011 07:31 AM, Sasha Levin wrote:
> > > > 
> > > > It sounds like Alex is considering KVM PPC's return value in this case 
> > > > (and
> > > > updating api.txt if appropriate) -- what say you on this patch?  It 
> > > > actually
> > > > brings kvmtool's KVM_RUN return val check in line with QEMU's (also "< 
> > > > 0") and
> > > > nothing PPC will run without it, currently.  (I'm about to repost a new 
> > > > series,
> > > > will include it for these reasons, until I hear more complaint ;) )
> > >
> > > '<0' is fine as it's what api.txt says :)
> > 
> > What? ioctls return -1 on error, not <0.
>
> <0 as opposed to the !=0 check we had there before.
>
> There's no harm in checking for <0 even if the only possible negative
> result is -1.
>
>

Yes, but the documentation should say -1.  Which ioctl is this?

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 08/28] kvm tools: Fix KVM_RUN exit code check

2011-12-22 Thread Sasha Levin
On Thu, 2011-12-22 at 12:03 +0200, Avi Kivity wrote:
> On 12/08/2011 07:31 AM, Sasha Levin wrote:
> > > 
> > > It sounds like Alex is considering KVM PPC's return value in this case 
> > > (and
> > > updating api.txt if appropriate) -- what say you on this patch?  It 
> > > actually
> > > brings kvmtool's KVM_RUN return val check in line with QEMU's (also "< 
> > > 0") and
> > > nothing PPC will run without it, currently.  (I'm about to repost a new 
> > > series,
> > > will include it for these reasons, until I hear more complaint ;) )
> >
> > '<0' is fine as it's what api.txt says :)
> 
> What? ioctls return -1 on error, not <0.

<0 as opposed to the !=0 check we had there before.

There's no harm in checking for <0 even if the only possible negative
result is -1.

> 
> The syscalls do return <0, but libc converts that to -1/errno.
> 

-- 

Sasha.



Re: [PATCH 3/4 V7] Add ioctl for KVMCLOCK_GUEST_STOPPED

2011-12-22 Thread Avi Kivity
On 12/15/2011 09:42 PM, Eric B Munson wrote:
> Now that we have a flag that will tell the guest it was suspended, create an
> interface for that communication using a KVM ioctl.
>
>  
> +int kvm_set_guest_paused(struct kvm_vcpu *vcpu);

Please avoid the needless forward declaration.

> +
>  extern bool tdp_enabled;
>  
>  u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c38efd7..1dab5fd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3295,6 +3295,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  
>   goto out;
>   }
> + case KVMCLOCK_GUEST_PAUSED: {
> + r = kvm_set_guest_paused(vcpu);
> + break;
> + }
>   default:
>   r = -EINVAL;
>   }
> @@ -6117,6 +6121,22 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason,
>  }
>  EXPORT_SYMBOL_GPL(kvm_task_switch);
>  
> +/*
> + * kvm_set_guest_paused() indicates to the guest kernel that it has been
> + * stopped by the hypervisor.  This function will be called from the host only.
> + * EINVAL is returned when the host attempts to set the flag for a guest that
> + * does not support pv clocks.
> + */
> +int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
> +{
> + struct pvclock_vcpu_time_info *src = &vcpu->arch.hv_clock;
> + if (!vcpu->arch.time_page)
> + return -EINVAL;
> + src->flags |= PVCLOCK_GUEST_STOPPED;
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_set_guest_paused);
> +
>

You're writing to a page without mark_page_dirty(); this breaks live
migration.
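
Something along these lines, assuming vcpu->arch.time holds the guest
physical address of the pvclock area as elsewhere in x86.c (sketch, not
the final patch):

int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
{
	struct pvclock_vcpu_time_info *src = &vcpu->arch.hv_clock;

	if (!vcpu->arch.time_page)
		return -EINVAL;
	src->flags |= PVCLOCK_GUEST_STOPPED;
	/* flag the pvclock page dirty so live migration transfers it */
	mark_page_dirty(vcpu->kvm, vcpu->arch.time >> PAGE_SHIFT);
	return 0;
}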

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 3/4 V7] Add ioctl for KVMCLOCK_GUEST_STOPPED

2011-12-22 Thread Avi Kivity
On 12/15/2011 09:42 PM, Eric B Munson wrote:
>  
> +4.64 KVMCLOCK_GUEST_PAUSED
> +
> +Capability: basic

Capability: basic means that this ioctl is available on any Linux
version that has KVM (apart from 2.6.2[01], before the ABI was
stabilized).  You need to define a new KVM_CAP, document it, and have
KVM_CHECK_EXTENSION acknowledge it, see the other ioctls for examples.
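
I.e. something like the s390 KVM_CAP_SYNC_REGS hunk earlier in this digest;
the name and number below are illustrative only:

/* include/linux/kvm.h */
#define KVM_CAP_GUEST_PAUSED 73

/* arch/x86/kvm/x86.c, in kvm_dev_ioctl_check_extension(): */
	case KVM_CAP_GUEST_PAUSED:
		r = 1;
		break;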

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 3/4 V7] Add ioctl for KVMCLOCK_GUEST_STOPPED

2011-12-22 Thread Avi Kivity
On 12/15/2011 09:42 PM, Eric B Munson wrote:
> Now that we have a flag that will tell the guest it was suspended, create an
> interface for that communication using a KVM ioctl.
>  
> +4.64 KVMCLOCK_GUEST_PAUSED
> +
> +Capability: basic
> +Architectures: Any that implement pvclocks (currently x86 only)
> +Type: vcpu ioctl
> +Parameters: None
> +Returns: 0 on success, -1 on error
> +
> +This signals to the host kernel that the specified guest is being paused by
> +userspace.  The host will set a flag in the pvclock structure that is checked
> +from the soft lockup watchdog.
> +

This can be called during unpause, yes? The documentation should reflect
that.  I'm thinking of the SIGCONT handler here.
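
For instance, a hypothetical handler in userspace, reasserting the flag when
the process is resumed (ioctl name from the quoted patch; per-vcpu iteration
and error handling elided):

#include <signal.h>
#include <sys/ioctl.h>

static int vcpu_fd;			/* assumed set up elsewhere */

static void sigcont_handler(int sig)
{
	(void)sig;
	/* a plain syscall, safe enough from a signal handler in practice */
	ioctl(vcpu_fd, KVMCLOCK_GUEST_PAUSED, 0);
}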

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 08/28] kvm tools: Fix KVM_RUN exit code check

2011-12-22 Thread Avi Kivity
On 12/08/2011 07:31 AM, Sasha Levin wrote:
> > 
> > It sounds like Alex is considering KVM PPC's return value in this case (and
> > updating api.txt if appropriate) -- what say you on this patch?  It actually
> > brings kvmtool's KVM_RUN return val check in line with QEMU's (also "< 0") 
> > and
> > nothing PPC will run without it, currently.  (I'm about to repost a new 
> > series,
> > will include it for these reasons, until I hear more complaint ;) )
>
> '<0' is fine as it's what api.txt says :)

What? ioctls return -1 on error, not <0.

The syscalls do return <0, but libc converts that to -1/errno.
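
I.e. the check a libc caller actually needs (vcpu_fd assumed; sketch):

	int ret = ioctl(vcpu_fd, KVM_RUN, 0);

	if (ret == -1)		/* libc contract: -1 with errno set */
		perror("KVM_RUN");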

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 0/5 V5] Avoid soft lockup message when KVM is stopped by host

2011-12-22 Thread Avi Kivity
On 12/19/2011 07:50 PM, Marcelo Tosatti wrote:
> > 
> > Maybe it is good (not sure), need to look into schedstats and think of
> > cases that would break legitimate guest hangs. And it probably also
> > affects the position of clearing the flag on the guest side as its 
> > currently done in Eric's patchset.
>
> Is the task going to be TASK_UNINTERRUPTIBLE with SIGSTOP? Don't think
> so. 

No.

> Note "Preemption while TASK_RUNNING or TASK_UNINTERRUPTIBLE" is not
> functional for guest-paused-via-QEMU-monitor case.

Right, it's a different case.  Note SIGCONT can be trapped, so it's
similar to monitor cont.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 0/5 V5] Avoid soft lockup message when KVM is stopped by host

2011-12-22 Thread Dor Laor

On 12/19/2011 07:59 PM, Amit Shah wrote:

On (Mon) 19 Dec 2011 [14:59:36], Avi Kivity wrote:

On 12/19/2011 02:52 PM, Amit Shah wrote:


(snip)


S4 needs some treatment, though, as resume after s4 doesn't work with
kvmclock enabled.  I didn't realise this series was only handling the
soft lockup case.



What's the issue there?


Not sure yet; after resume, the VM freezes while bringing the 2nd vcpu
up.  Maybe it's just sitting there spinning, maybe the delta needs
adjusting.  Haven't poked at it yet.


Bug 694801 - Guest fail to resume from S4 if guest using kvmclock


[PATCH v2] intel-iommu: Add device info into list before doing context mapping

2011-12-22 Thread Hao, Xudong
Add the device info to the list before doing the context mapping, because the
device info is used by the iommu_enable_dev_iotlb function; without this
patch, pci_enable_ats would not get called from there.

Signed-off-by: Xudong Hao 
Signed-off-by: Xiantao Zhang 
Acked-by: Chris Wright 
---
 drivers/iommu/intel-iommu.c |   17 +++--
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index bdc447f..ccf347f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2267,12 +2267,6 @@ static int domain_add_dev_info(struct dmar_domain *domain,
if (!info)
return -ENOMEM;
 
-   ret = domain_context_mapping(domain, pdev, translation);
-   if (ret) {
-   free_devinfo_mem(info);
-   return ret;
-   }
-
info->segment = pci_domain_nr(pdev->bus);
info->bus = pdev->bus->number;
info->devfn = pdev->devfn;
@@ -2285,6 +2279,17 @@ static int domain_add_dev_info(struct dmar_domain *domain,
pdev->dev.archdata.iommu = info;
spin_unlock_irqrestore(&device_domain_lock, flags);
 
+   ret = domain_context_mapping(domain, pdev, translation);
+   if (ret) {
+   spin_lock_irqsave(&device_domain_lock, flags);
+   list_del(&info->link);
+   list_del(&info->global);
+   pdev->dev.archdata.iommu = NULL;
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+   free_devinfo_mem(info);
+   return ret;
+   }
+
return 0;
 }
 
--
1.6.0.rc1
