Re: [Qemu-devel] [RFC]QEMU disk I/O limits

2011-06-02 Thread Zhi Yong Wu
On Thu, Jun 2, 2011 at 5:33 PM, Michal Suchanek  wrote:
> On 1 June 2011 05:12, Zhi Yong Wu  wrote:
>> On Tue, May 31, 2011 at 03:55:49PM -0400, Vivek Goyal wrote:
>>>Date: Tue, 31 May 2011 15:55:49 -0400
>>>From: Vivek Goyal 
>>>To: Zhi Yong Wu 
>>>Cc: kw...@redhat.com, aligu...@us.ibm.com, stefa...@linux.vnet.ibm.com,
>>>       kvm@vger.kernel.org, guijianf...@cn.fujitsu.com,
>>>       qemu-de...@nongnu.org, wu...@cn.ibm.com,
>>>       herb...@gondor.hengli.com.au, luow...@cn.ibm.com, zh...@cn.ibm.com,
>>>       zhaoy...@cn.ibm.com, l...@redhat.com, rahar...@us.ibm.com
>>>Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits
>>>User-Agent: Mutt/1.5.21 (2010-09-15)
>>>
>>>On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote:
>>>
>>>[..]
     3.) How the users enable and play with it
     QEMU -drive option will be extended so that disk I/O limits can be 
 specified on its command line, such as -drive [iops=xxx,][throughput=xxx] 
 or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx] etc. When this 
 argument is specified, it means that "disk I/O limits" feature is enabled 
 for this drive disk.
>>>
>>>How does throughput interface look like? is it bytes per second or something
>>>else?
>> HI, Vivek,
>> It will be a value based on bytes per second.
>>
>>>
>>>Do we have read and write variants for throughput as we have for iops.
>> QEMU code has two variants "rd_bytes, wr_bytes", but we maybe need to get 
>> their bytes per second.
>>
>>>
>>>if you have bytes interface(as kenrel does), then "bps_rd" and "bps_wr"
>>>might be good names too for thoughput interface.
>> I agree with you, and can change them as your suggestions.
>>
>
> Changing them this way is not going to be an improvement. While
> rd_bytes and wr_bytes lack the time interval specification bps_rd and
right, rd_bytes and wr_bytes lack.
> bps_wr is ambiguous. Is that bits? bytes? Sure, there should be some
if we implement them, they will be bytes.
> distinction by capitalization but that does not apply since qemu
> arguments are all lowercase.
Michal,
maybe you misunderstand what i mean. I mean that two variables
rd_bytes and wr_bytes exist in block.c file, and are not qemu
arguments. But bps_rd and bps_wr wil be added as qemu arguments.

Regards,

Zhiyong Wu
>
> Thanks
>
> Michal
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] KVM: Clarify KVM_ASSIGN_PCI_DEVICE documentation

2011-06-02 Thread Jan Kiszka
From: Jan Kiszka 

Neither host_irq nor the guest_msi struct are used anymore today.
Tag the former, drop the latter to avoid confusion.

Signed-off-by: Jan Kiszka 
---
 Documentation/virtual/kvm/api.txt |7 +--
 include/linux/kvm.h   |8 +---
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 2bd06b0..0ebe922 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1143,15 +1143,10 @@ Assigns an IRQ to a passed-through device.
 
 struct kvm_assigned_irq {
__u32 assigned_dev_id;
-   __u32 host_irq;
+   __u32 host_irq; /* ignored (legacy field) */
__u32 guest_irq;
__u32 flags;
union {
-   struct {
-   __u32 addr_lo;
-   __u32 addr_hi;
-   __u32 data;
-   } guest_msi;
__u32 reserved[12];
};
 };
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 55ef181..9c9ca7c 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -773,20 +773,14 @@ struct kvm_assigned_pci_dev {
 
 struct kvm_assigned_irq {
__u32 assigned_dev_id;
-   __u32 host_irq;
+   __u32 host_irq; /* ignored (legacy field) */
__u32 guest_irq;
__u32 flags;
union {
-   struct {
-   __u32 addr_lo;
-   __u32 addr_hi;
-   __u32 data;
-   } guest_msi;
__u32 reserved[12];
};
 };
 
-
 struct kvm_assigned_msix_nr {
__u32 assigned_dev_id;
__u16 entry_nr;
-- 
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add 'kvm pause' command

2011-06-02 Thread Sasha Levin
On Fri, 2011-06-03 at 09:29 +0300, Pekka Enberg wrote:
> On Thu, Jun 2, 2011 at 11:22 PM, Sasha Levin  wrote:
> > This patch adds a 'kvm debug' command that's currently an alias for
> >
> >  kill -USR2 `pidof kvm`
> >
> > Which pauses a guest (freezes all VCPU threads).
> >
> > Signed-off-by: Sasha Levin 
> 
> What is this useful for? Debugging? Don't we need a command to
> unfreeze the guest as well?

It unfreezes the guest if it's paused. I should have mentioned that in
the commit message.

-- 

Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add 'kvm pause' command

2011-06-02 Thread Pekka Enberg
On Thu, Jun 2, 2011 at 11:22 PM, Sasha Levin  wrote:
> This patch adds a 'kvm debug' command that's currently an alias for
>
>  kill -USR2 `pidof kvm`
>
> Which pauses a guest (freezes all VCPU threads).
>
> Signed-off-by: Sasha Levin 

What is this useful for? Debugging? Don't we need a command to
unfreeze the guest as well?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] kvm tools: Add QCOW level2 caching support

2011-06-02 Thread Pekka Enberg
On Fri, Jun 3, 2011 at 12:19 AM, Sasha Levin  wrote:
>> +static int cache_table(struct qcow *q, u64 *table, u64 offset)
>> +{
>> +     struct qcow_l2_cache *n;
>> +     struct rb_root *r = &q->root;
>> +     struct qcow_l2_cache *lru;
>> +
>> +     n = calloc(1, sizeof(struct qcow_l2_cache));
>                        sizeof(*n)
> sizeof() should use the variable name itself, not the data type. Check
> out chapter 14 in 'Documentation/CodingStyle'.

Well, it doesn't matter that much, to be honest. 'n' could use a
better name, though - 'cache' or 'c'.

Pekka
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

2011-06-02 Thread Krishna Kumar2
"Michael S. Tsirkin"  wrote on 06/02/2011 09:04:23 PM:

> > > Is this where the bug was?
> >
> > Return value in free_old_xmit() was wrong. I will re-do against the
> > mainline kernel.
> >
> > Thanks,
> >
> > - KK
>
> Just noting that I'm working on that patch as well, it might
> be more efficient if we don't both of us do this in parallel :)

OK, but my intention was to work on a alternate approach, which
was the reason to base it against your patch.

I will check your latest patch.

thanks,

- KK

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 4/4] Add instruction fetch checking when walking guest page table

2011-06-02 Thread Yang, Wei Y

This patch adds instruction fetch checking when walking guest page table.

 Signed-off-by: Yang, Wei 
 Signed-off-by: Shan, Haitao 
 Signed-off-by: Li, Xin 

---
 arch/x86/kvm/paging_tmpl.h |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6c4dc01..6a56ca3 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -246,6 +246,12 @@ walk:
gfn_t gfn;
u32 ac;

+   /* check if the kernel is fetching from user page */
+   if (unlikely(pte_access & PT_USER_MASK) &&
+   kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
+   if (fetch_fault && !user_fault)
+   eperm = true;
+
gfn = gpte_to_gfn_lvl(pte, lvl);
gfn += (addr & PT_LVL_OFFSET_MASK(lvl)) >> PAGE_SHIFT;

@@ -305,7 +311,8 @@ error:

walker->fault.error_code |= write_fault | user_fault;

-   if (fetch_fault && mmu->nx)
+   if (fetch_fault && (mmu->nx ||
+   kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
walker->fault.error_code |= PFERR_FETCH_MASK;
if (rsvd_fault)
walker->fault.error_code |= PFERR_RSVD_MASK;
--
1.7.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 2/4] Add SMEP support when setting CR4

2011-06-02 Thread Yang, Wei Y

This patch adds SMEP handling when setting CR4.

 Signed-off-by: Yang, Wei 
 Signed-off-by: Shan, Haitao 
 Signed-off-by: Li, Xin 

---
 arch/x86/kvm/x86.c |   15 +--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 77c9d86..91bfc40 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -579,6 +579,14 @@ static bool guest_cpuid_has_xsave(struct kvm_vcpu *vcpu)
return best && (best->ecx & bit(X86_FEATURE_XSAVE));
 }

+static bool guest_cpuid_has_smep(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpuid_entry2 *best;
+
+   best = kvm_find_cpuid_entry(vcpu, 7, 0);
+   return best && (best->ebx & bit(X86_FEATURE_SMEP));
+}
+
 static void update_cpuid(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best;
@@ -598,14 +606,17 @@ static void update_cpuid(struct kvm_vcpu *vcpu)
 int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
unsigned long old_cr4 = kvm_read_cr4(vcpu);
-   unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE;
-
+   unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE |
+  X86_CR4_PAE | X86_CR4_SMEP;
if (cr4 & CR4_RESERVED_BITS)
return 1;

if (!guest_cpuid_has_xsave(vcpu) && (cr4 & X86_CR4_OSXSAVE))
return 1;

+   if (!guest_cpuid_has_smep(vcpu) && (cr4 & X86_CR4_SMEP))
+   return 1;
+
if (is_long_mode(vcpu)) {
if (!(cr4 & X86_CR4_PAE))
return 1;
--
1.7.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 0/4] Enable SMEP feature support for KVM

2011-06-02 Thread Yang, Wei Y

This patchset enables a new CPU feature SMEP (Supervisor Mode Execution
Protection) in KVM. SMEP prevents kernel from executing code in application.
Updated Intel SDM describes this CPU feature. The document will be published 
soon.

This patchset is based on Fenghua's SMEP patch series, as referred by:
https://lkml.org/lkml/2011/5/17/523

changes since v7:
Correct the indentation and ebx

changes since v6:
Set KVM_CPUID_FLAG_SIGNIFCANT_INDEX flag for leaf 7

changes since v5:
Add kvm_supported_word9_x86_features and mask against it
before masking against host capability

changes since v4:
Update patch 1/4 comment
Change PT_USER_MASK to ACC_USER_MASK

changes since v3:
Add SMEP bit in CR4_RESERVED_BITS while removing cr4_reserved_bits;
Mask CPUID leaf 7 ebx against host capability word9 in do_cpuid_ent;

Changes since v2:
add instruction fetch checking when walking guest page table.
---
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/paging_tmpl.h  |9 -
 arch/x86/kvm/x86.c  |   35 +---
 3 files changed, 41 insertions(+), 5 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 3/4] Mask function7 ebx against host capability word9

2011-06-02 Thread Yang, Wei Y

This patch masks CPUID leaf 7 ebx against host capability word9.

 Signed-off-by: Yang, Wei 
 Signed-off-by: Shan, Haitao 
 Signed-off-by: Li, Xin 


---
 arch/x86/kvm/x86.c |   20 +++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 77c9d86..29d357c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2342,6 +2342,10 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, 
u32 function,
F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
F(PMM) | F(PMM_EN);
 
+   /* cpuid 7.0.ebx */
+   const u32 kvm_supported_word9_x86_features =
+   F(SMEP);
+
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
do_cpuid_1_ent(entry, function, index);
@@ -2376,7 +2380,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, 
u32 function,
}
break;
}
-   /* function 4 and 0xb have additional index. */
+   /* function 4 has additional index. */
case 4: {
int i, cache_type;
 
@@ -2393,6 +2397,20 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, 
u32 function,
}
break;
}
+   case 7: {
+   entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
+   /* Mask ebx against host capbability word 9 */
+   if (index == 0) {
+   entry->ebx &= kvm_supported_word9_x86_features;
+   cpuid_mask(&entry->ebx, 9);
+   } else
+   entry->ebx = 0;
+   entry->eax = 0;
+   entry->ecx = 0;
+   entry->edx = 0;
+   break;
+   }
+   /* function 0xb has additional index. */
case 0xb: {
int i, level_type;
 
-- 
1.7.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 1/4] Remove SMEP bit from CR4_RESERVED_BITS

2011-06-02 Thread Yang, Wei Y

This patch removes SMEP bit from CR4_RESERVED_BITS.

 Signed-off-by: Yang, Wei 
 Signed-off-by: Shan, Haitao 
 Signed-off-by: Li, Xin 

---
 arch/x86/include/asm/kvm_host.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d2ac8e2..4e63432 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -48,7 +48,7 @@
(~(unsigned long)(X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE\
  | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE \
  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR  \
- | X86_CR4_OSXSAVE \
+ | X86_CR4_OSXSAVE | X86_CR4_SMEP  \
  | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))

 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
--
1.7.4.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] nVMX: Fix bug preventing more than two levels of nesting

2011-06-02 Thread Marcelo Tosatti
On Thu, Jun 02, 2011 at 11:54:52AM +0300, Nadav Har'El wrote:
> The nested VMX feature is supposed to fully emulate VMX for the guest. This
> (theoretically) not only allows it to run its own guests, but also also
> to further emulate VMX for its own guests, and allow arbitrarily deep nesting.
> 
> This patch fixes a bug (discovered by Kevin Tian) in handling a VMLAUNCH
> by L2, which prevented deeper nesting.
> 
> Deeper nesting now works (I only actually tested L3), but is currently
> *absurdly* slow, to the point of being unusable.
> 
> Signed-off-by: Nadav Har'El 

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Document KVM_IOEVENTFD

2011-06-02 Thread Marcelo Tosatti
On Thu, Jun 02, 2011 at 04:16:20PM +0200, Jan Kiszka wrote:
> On 2011-06-02 15:43, Avi Kivity wrote:
> > On 06/01/2011 01:51 PM, Jan Kiszka wrote:
> >> On 2011-05-31 15:44, Marcelo Tosatti wrote:
> >> >  On Sat, May 28, 2011 at 02:12:30PM +0300, Sasha Levin wrote:
> >> >>  Document KVM_IOEVENTFD that can be used to receive
> >> >>  notifications of PIO/MMIO events without triggering
> >> >>  an exit.
> >> >>
> >> >>  Cc: Avi Kivity
> >> >>  Cc: Marcelo Tosatti
> >> >>  Signed-off-by: Sasha Levin
> >> >>  ---
> >> >>   Documentation/virtual/kvm/api.txt |   30
> >> ++
> >> >>   1 files changed, 30 insertions(+), 0 deletions(-)
> >> >
> >> >  Applied (with wording fix), thanks.
> >>
> >> Requires section number fix-up (4.56 ->  4.58).
> > 
> > The usual fix is to cut a 5.0 release.
> 
> Too bad that we already have one ("5. The kvm_run structure"). :)
> 
> Jan
> 
> --8<--
> 
> From: Jan Kiszka 
> 
> KVM: Fixup documentation section numbering
> 
> Signed-off-by: Jan Kiszka 

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Clarify KVM_ASSIGN_PCI_DEVICE documentation

2011-06-02 Thread Marcelo Tosatti
On Thu, Jun 02, 2011 at 04:16:09PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka 
> 
> Neither host_irq nor the guest_msi struct are used anymore today.
> Tag the former, drop the latter to avoid confusion.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  Documentation/virtual/kvm/api.txt |7 +--
>  1 files changed, 1 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 2bd06b0..7adc1ac 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1143,15 +1143,10 @@ Assigns an IRQ to a passed-through device.
>  
>  struct kvm_assigned_irq {
>   __u32 assigned_dev_id;
> - __u32 host_irq;
> + __u32 host_irq; /* ignored (legacy field) */
>   __u32 guest_irq;
>   __u32 flags;
>   union {
> - struct {
> - __u32 addr_lo;
> - __u32 addr_hi;
> - __u32 data;
> - } guest_msi;
>   __u32 reserved[12];
>   };
>  };

Please update the kernel headers accordingly.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: VMX: Silence warning on 32-bit hosts

2011-06-02 Thread Marcelo Tosatti
On Wed, Jun 01, 2011 at 12:57:30PM +0200, Jan Kiszka wrote:
> a is unused now on CONFIG_X86_32.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  arch/x86/kvm/vmx.c |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qemu-kvm: Remove some spurious diffs to upstream

2011-06-02 Thread Marcelo Tosatti
On Wed, Jun 01, 2011 at 10:54:45AM +0200, Jan Kiszka wrote:
> No functional changes.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  Makefile.target|3 ---
>  block/raw-posix.c  |2 --
>  cutils.c   |5 -
>  dma-helpers.c  |4 
>  fpu/softfloat-native.c |1 -
>  hw/i8259.c |5 +
>  sysemu.h   |1 -
>  target-i386/kvm.c  |2 +-
>  target-i386/machine.c  |1 -
>  9 files changed, 2 insertions(+), 22 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "qemu-kvm: Update kvm_check_many_ioeventfds for qemu-kvm use"

2011-06-02 Thread Marcelo Tosatti
On Wed, Jun 01, 2011 at 10:54:23AM +0200, Jan Kiszka wrote:
> This reverts commit cad7c5d76b30d30fc91dea049fef7340fd96ff3c.
> 
> We migrated to upstream io-thread, so this hack became obsolete.
> 
> Signed-off-by: Jan Kiszka 

Applied, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/4] Mask function7 ebx against host capability word9

2011-06-02 Thread Marcelo Tosatti
On Wed, Jun 01, 2011 at 09:07:51PM +0800, Yang, Wei Y wrote:
> 
> This patch masks CPUID leaf 7 ebx against host capability word9.
> 
>  Signed-off-by: Yang, Wei 
>  Signed-off-by: Shan, Haitao 
>  Signed-off-by: Li, Xin 
> 
> ---
>  arch/x86/kvm/x86.c |   21 -
>  1 files changed, 20 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 91bfc40..d27202d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2353,6 +2353,10 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 
> *entry, u32 function,
>   F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
>   F(PMM) | F(PMM_EN);
>  
> + /* cpuid 7.0.ebx */
> + const u32 kvm_supported_word9_x86_features =
> + F(SMEP);
> +
>   /* all calls to cpuid_count() should be made on the same cpu */
>   get_cpu();
>   do_cpuid_1_ent(entry, function, index);
> @@ -2387,7 +2391,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 
> *entry, u32 function,
>   }
>   break;
>   }
> - /* function 4 and 0xb have additional index. */
> + /* function 4 has additional index. */
>   case 4: {
>   int i, cache_type;
>  
> @@ -2404,6 +2408,21 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 
> *entry, u32 function,
>   }
>   break;
>   }
> + case 7: {
> + entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> + /* Mask ebx against host capbability word 9 */
> + if (index == 0) {
> + entry->ebx &= kvm_supported_word9_x86_features;
> + cpuid_mask(&entry->edx, 9);

ebx?

> + }
> + else
> + entry->ebx = 0;

indentation is wrong.


> + entry->eax = 0;
> + entry->ecx = 0;
> + entry->edx = 0;
> + break;
> + }
> + /* function 0xb has additional index. */
>   case 0xb: {
>   int i, level_type;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add 'kvm pause' command

2011-06-02 Thread Prasad Joshi
On Thu, Jun 2, 2011 at 9:22 PM, Sasha Levin  wrote:
> This patch adds a 'kvm debug' command that's currently an alias for
>
>  kill -USR2 `pidof kvm`
>
> Which pauses a guest (freezes all VCPU threads).
>
> Signed-off-by: Sasha Levin 
> ---
>  tools/kvm/Makefile                |    1 +
>  tools/kvm/include/kvm/kvm-pause.h |    6 ++
>  tools/kvm/kvm-cmd.c               |    2 ++
>  tools/kvm/kvm-pause.c             |   13 +
>  4 files changed, 22 insertions(+), 0 deletions(-)
>  create mode 100644 tools/kvm/include/kvm/kvm-pause.h
>  create mode 100644 tools/kvm/kvm-pause.c
>
> diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
> index a05a6b1..ed82cdf 100644
> --- a/tools/kvm/Makefile
> +++ b/tools/kvm/Makefile
> @@ -42,6 +42,7 @@ OBJS  += irq.o
>  OBJS   += kvm-cmd.o
>  OBJS   += kvm-debug.o
>  OBJS   += kvm-help.o
> +OBJS    += kvm-pause.o
>  OBJS   += kvm-run.o
>  OBJS   += mptable.o
>  OBJS   += rbtree.o
> diff --git a/tools/kvm/include/kvm/kvm-pause.h 
> b/tools/kvm/include/kvm/kvm-pause.h
> new file mode 100644
> index 000..0f8e96b
> --- /dev/null
> +++ b/tools/kvm/include/kvm/kvm-pause.h
> @@ -0,0 +1,6 @@
> +#ifndef KVM__PAUSE_H
> +#define KVM__PAUSE_H
> +
> +int kvm_cmd_pause(int argc, const char **argv, const char *prefix);
> +
> +#endif
> diff --git a/tools/kvm/kvm-cmd.c b/tools/kvm/kvm-cmd.c
> index 2ea51a5..ffbc4ff 100644
> --- a/tools/kvm/kvm-cmd.c
> +++ b/tools/kvm/kvm-cmd.c
> @@ -6,11 +6,13 @@
>
>  /* user defined header files */
>  #include "kvm/kvm-debug.h"
> +#include "kvm/kvm-pause.h"
>  #include "kvm/kvm-help.h"
>  #include "kvm/kvm-cmd.h"
>  #include "kvm/kvm-run.h"
>
>  struct cmd_struct kvm_commands[] = {
> +       { "pause", kvm_cmd_pause, NULL,         0 },

We automatically generate  header file that
lists the command options with kvm. Should we ensure 'pause' is added
to the file? I think then './kvm help' will display 'pause' as one of
the possible command.

I think we should ensure 'pause' is added to the file during compilation.

>        { "debug", kvm_cmd_debug, NULL,         0 },
>        { "help",  kvm_cmd_help,  NULL,         0 },
>        { "run",   kvm_cmd_run,   kvm_run_help, 0 },
> diff --git a/tools/kvm/kvm-pause.c b/tools/kvm/kvm-pause.c
> new file mode 100644
> index 000..fdf8714
> --- /dev/null
> +++ b/tools/kvm/kvm-pause.c
> @@ -0,0 +1,13 @@
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +int kvm_cmd_pause(int argc, const char **argv, const char *prefix)
> +{
> +       signal(SIGUSR2, SIG_IGN);
> +       return system("kill -USR2 $(pidof kvm)");
> +}
> --
> 1.7.5.3
>
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] kvm tools: Add QCOW level2 caching support

2011-06-02 Thread Sasha Levin
On Thu, 2011-06-02 at 21:01 +0100, Prasad Joshi wrote:
> QCOW uses two tables level1 (L1) table and level2 (L2) table. The L1 table
> points to offset of L2 table. When a QCOW image is probed, the L1 table is
> cached in the memory to avoid reading it from disk on every access. This
> caching improves the performance.
> 
> The similar performance improvement can be observed when L2 tables are cached.
> It is impossible to cache all of the L2 tables because of the memory
> constraint. The patch adds L2 table caching capability for up to 128 L2 
> tables,
> it uses combination of RB tree and List to manage the L2 cached tables. The
> link list implementation helps in building simple LRU structure and RB tree
> helps to search cached table efficiently
> 
> The performance numbers are below, the machine was started with following
> command line arguments
> 
> $ ./kvm run -d /home/prasad/VMDisks/Ubuntu10.10_64_cilk_qemu.qcow \
> > --params "root=/dev/vda1" -m 1024
> 
> Without QCOW caching
> 
> $ bonnie++ -d tmp/ -c 10 -s 2048
> Writing a byte at a time...done
> Writing intelligently...done
> Rewriting...done
> Reading a byte at a time...done
> Reading intelligently...done
> start 'em...done...done...done...done...done...
> Create files in sequential order...done.
> Stat files in sequential order...done.
> Delete files in sequential order...done.
> Create files in random order...done.
> Stat files in random order...done.
> Delete files in random order...done.
> Version  1.96   --Sequential Output-- --Sequential Input- 
> --Random-
> Concurrency  10 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
> --Seeks--
> MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec 
> %CP
> prasad-virtual-m 2G  1043  99 555406  74 227605  55  5360  99 489080  68 
> + +++
> Latency 24646us   48544us   57893us6686us3595us   21026us
> Version  1.96   --Sequential Create-- Random 
> Create
> prasad-virtual-mach -Create-- --Read--- -Delete-- -Create-- --Read--- 
> -Delete--
>   files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec 
> %CP
>  16 + +++ + +++ + +++ + +++ + +++ + 
> +++
> Latency   343us1175us 327us 555us  48us  82us
> 1.96,1.96,prasad-virtual-machine,10,1307043085,2G,,1043,99,555406,74,227605,55,
> 5360,99,489080,68,+,+++,16,+,+++,+,+++,+,+++,+,+++,
> +,+++,+,+++,24646us,48544us,57893us,6686us,3595us,21026us,343us,1175us,
> 327us,555us,48us,82us
> 
> With QCOW caching
> =
> $ bonnie++ -d tmp/ -c 10 -s 2048
> Writing a byte at a time...done
> Writing intelligently...done
> Rewriting...done
> Reading a byte at a time...done
> Reading intelligently...done
> start 'em...done...done...done...done...done...
> Create files in sequential order...done.
> Stat files in sequential order...done.
> Delete files in sequential order...done.
> Create files in random order...done.
> Stat files in random order...done.
> Delete files in random order...done.
> Version  1.96   --Sequential Output-- --Sequential Input- 
> --Random-
> Concurrency  10 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
> --Seeks--
> MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec 
> %CP
> prasad-virtual-m 2G  1033  99 467899  64 182387  41  5422 100 338294  48 
> + +++
> Latency 21549us   60585us   65723us6331us   30014us   19994us
> Version  1.96   --Sequential Create-- Random 
> Create
> prasad-virtual-mach -Create-- --Read--- -Delete-- -Create-- --Read--- 
> -Delete--
>   files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec 
> %CP
>  16 + +++ + +++ + +++ + +++ + +++ + 
> +++
> Latency   478us1142us 344us 402us  72us  98us
> 1.96,1.96,prasad-virtual-machine,10,1307042839,2G,,1033,99,467899,64,182387,41,
> 5422,100,338294,48,+,+++,16,+,+++,+,+++,+,+++,+,+++,
> +,+++,+,+++,21549us,60585us,65723us,6331us,30014us,19994us,478us,1142us,
> 344us,402us,72us,98us
> 
> Summary of performance numbers
> ==
> There is not much difference with sequential character operations are
> performed, the code with caching performed better by small margin. The caching
> code performance raised by 18% to 24% with sequential block output and by 44%
> for sequentail block input. Which is understandable as the Level2 table will
> always be cached after a write operation. Random seek operation worked slower
> with caching code.
> 
> Signed-off-by: Prasad Joshi 
> ---
>  tools/kvm/disk/qcow.c|  216 
> ++
>  tools/kvm/include/kvm/qcow.h |   17 
>  2 files changed, 216 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/dis

Re: [PATCH v3] kvm tools: Add QCOW level2 caching support

2011-06-02 Thread Pekka Enberg
On Thu, Jun 2, 2011 at 11:01 PM, Prasad Joshi  wrote:
> There is not much difference with sequential character operations are
> performed, the code with caching performed better by small margin. The caching
> code performance raised by 18% to 24% with sequential block output and by 44%
> for sequentail block input. Which is understandable as the Level2 table will
> always be cached after a write operation. Random seek operation worked slower
> with caching code.

Btw, please mention how much slower random seeks got in the next
version of the patch - unless you're able to make the slowdown go
away, of course. ;-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] kvm tools: Add QCOW level2 caching support

2011-06-02 Thread Pekka Enberg
On Thu, 2011-06-02 at 21:57 +0100, Prasad Joshi wrote:
> > Assuming the data is just mixed up, I'm not completely happy that
> > we're making random seeks almost 5% slower. Any ideas why that's
> > happening?
> 
> Yes they are mixed.
> 
> I am assuming, probably delay while caching.

That doesn't make sense to me. Why would "caching delay" be
significantly slower than the non-cached version where we do full read
all the time? Maybe I didn't quite understand the code but that sounds
like a bug lurking somewhere in the caching code - maybe we cache too
much or something?

Pekka

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] kvm tools: Add QCOW level2 caching support

2011-06-02 Thread Prasad Joshi
On Thu, Jun 2, 2011 at 9:37 PM, Pekka Enberg  wrote:
> On Thu, Jun 2, 2011 at 11:01 PM, Prasad Joshi  
> wrote:
>> QCOW uses two tables level1 (L1) table and level2 (L2) table. The L1 table
>> points to offset of L2 table. When a QCOW image is probed, the L1 table is
>> cached in the memory to avoid reading it from disk on every access. This
>> caching improves the performance.
>>
>> The similar performance improvement can be observed when L2 tables are 
>> cached.
>> It is impossible to cache all of the L2 tables because of the memory
>> constraint. The patch adds L2 table caching capability for up to 128 L2 
>> tables,
>> it uses combination of RB tree and List to manage the L2 cached tables. The
>> link list implementation helps in building simple LRU structure and RB tree
>> helps to search cached table efficiently
>>
>> The performance numbers are below, the machine was started with following
>> command line arguments
>>
>> $ ./kvm run -d /home/prasad/VMDisks/Ubuntu10.10_64_cilk_qemu.qcow \
>>> --params "root=/dev/vda1" -m 1024
>>
>> Without QCOW caching
>> 
>> $ bonnie++ -d tmp/ -c 10 -s 2048
>> Writing a byte at a time...done
>> Writing intelligently...done
>> Rewriting...done
>> Reading a byte at a time...done
>> Reading intelligently...done
>> start 'em...done...done...done...done...done...
>> Create files in sequential order...done.
>> Stat files in sequential order...done.
>> Delete files in sequential order...done.
>> Create files in random order...done.
>> Stat files in random order...done.
>> Delete files in random order...done.
>> Version  1.96       --Sequential Output-- --Sequential Input- 
>> --Random-
>> Concurrency  10     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
>> --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec 
>> %CP
>> prasad-virtual-m 2G  1043  99 555406  74 227605  55  5360  99 489080  68 
>> + +++
>> Latency             24646us   48544us   57893us    6686us    3595us   21026us
>> Version  1.96       --Sequential Create-- Random 
>> Create
>> prasad-virtual-mach -Create-- --Read--- -Delete-- -Create-- --Read--- 
>> -Delete--
>>              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec 
>> %CP
>>                 16 + +++ + +++ + +++ + +++ + +++ + 
>> +++
>> Latency               343us    1175us     327us     555us      48us      82us
>> 1.96,1.96,prasad-virtual-machine,10,1307043085,2G,,1043,99,555406,74,227605,55,
>> 5360,99,489080,68,+,+++,16,+,+++,+,+++,+,+++,+,+++,
>> +,+++,+,+++,24646us,48544us,57893us,6686us,3595us,21026us,343us,1175us,
>> 327us,555us,48us,82us
>>
>> With QCOW caching
>> =
>> $ bonnie++ -d tmp/ -c 10 -s 2048
>> Writing a byte at a time...done
>> Writing intelligently...done
>> Rewriting...done
>> Reading a byte at a time...done
>> Reading intelligently...done
>> start 'em...done...done...done...done...done...
>> Create files in sequential order...done.
>> Stat files in sequential order...done.
>> Delete files in sequential order...done.
>> Create files in random order...done.
>> Stat files in random order...done.
>> Delete files in random order...done.
>> Version  1.96       --Sequential Output-- --Sequential Input- 
>> --Random-
>> Concurrency  10     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
>> --Seeks--
>> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec 
>> %CP
>> prasad-virtual-m 2G  1033  99 467899  64 182387  41  5422 100 338294  48 
>> + +++
>> Latency             21549us   60585us   65723us    6331us   30014us   19994us
>> Version  1.96       --Sequential Create-- Random 
>> Create
>> prasad-virtual-mach -Create-- --Read--- -Delete-- -Create-- --Read--- 
>> -Delete--
>>              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec 
>> %CP
>>                 16 + +++ + +++ + +++ + +++ + +++ + 
>> +++
>> Latency               478us    1142us     344us     402us      72us      98us
>> 1.96,1.96,prasad-virtual-machine,10,1307042839,2G,,1033,99,467899,64,182387,41,
>> 5422,100,338294,48,+,+++,16,+,+++,+,+++,+,+++,+,+++,
>> +,+++,+,+++,21549us,60585us,65723us,6331us,30014us,19994us,478us,1142us,
>> 344us,402us,72us,98us
>>
>> Summary of performance numbers
>> ==
>> There is not much difference with sequential character operations are
>> performed, the code with caching performed better by small margin. The 
>> caching
>> code performance raised by 18% to 24% with sequential block output and by 44%
>> for sequentail block input. Which is understandable as the Level2 table will
>> always be cached after a write operation. Random seek operation worked slower
>> with caching code.
>
> I see performance _degradation_ in the raw data:

I think I copied the wrong results in the commit log.

>
> Before:
>

Re: [PATCH v3] kvm tools: Add QCOW level2 caching support

2011-06-02 Thread Pekka Enberg
On Thu, Jun 2, 2011 at 11:01 PM, Prasad Joshi  wrote:
> QCOW uses two tables level1 (L1) table and level2 (L2) table. The L1 table
> points to offset of L2 table. When a QCOW image is probed, the L1 table is
> cached in the memory to avoid reading it from disk on every access. This
> caching improves the performance.
>
> The similar performance improvement can be observed when L2 tables are cached.
> It is impossible to cache all of the L2 tables because of the memory
> constraint. The patch adds L2 table caching capability for up to 128 L2 
> tables,
> it uses combination of RB tree and List to manage the L2 cached tables. The
> link list implementation helps in building simple LRU structure and RB tree
> helps to search cached table efficiently
>
> The performance numbers are below, the machine was started with following
> command line arguments
>
> $ ./kvm run -d /home/prasad/VMDisks/Ubuntu10.10_64_cilk_qemu.qcow \
>> --params "root=/dev/vda1" -m 1024
>
> Without QCOW caching
> 
> $ bonnie++ -d tmp/ -c 10 -s 2048
> Writing a byte at a time...done
> Writing intelligently...done
> Rewriting...done
> Reading a byte at a time...done
> Reading intelligently...done
> start 'em...done...done...done...done...done...
> Create files in sequential order...done.
> Stat files in sequential order...done.
> Delete files in sequential order...done.
> Create files in random order...done.
> Stat files in random order...done.
> Delete files in random order...done.
> Version  1.96       --Sequential Output-- --Sequential Input- 
> --Random-
> Concurrency  10     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec 
> %CP
> prasad-virtual-m 2G  1043  99 555406  74 227605  55  5360  99 489080  68 
> + +++
> Latency             24646us   48544us   57893us    6686us    3595us   21026us
> Version  1.96       --Sequential Create-- Random 
> Create
> prasad-virtual-mach -Create-- --Read--- -Delete-- -Create-- --Read--- 
> -Delete--
>              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                 16 + +++ + +++ + +++ + +++ + +++ + +++
> Latency               343us    1175us     327us     555us      48us      82us
> 1.96,1.96,prasad-virtual-machine,10,1307043085,2G,,1043,99,555406,74,227605,55,
> 5360,99,489080,68,+,+++,16,+,+++,+,+++,+,+++,+,+++,
> +,+++,+,+++,24646us,48544us,57893us,6686us,3595us,21026us,343us,1175us,
> 327us,555us,48us,82us
>
> With QCOW caching
> =
> $ bonnie++ -d tmp/ -c 10 -s 2048
> Writing a byte at a time...done
> Writing intelligently...done
> Rewriting...done
> Reading a byte at a time...done
> Reading intelligently...done
> start 'em...done...done...done...done...done...
> Create files in sequential order...done.
> Stat files in sequential order...done.
> Delete files in sequential order...done.
> Create files in random order...done.
> Stat files in random order...done.
> Delete files in random order...done.
> Version  1.96       --Sequential Output-- --Sequential Input- 
> --Random-
> Concurrency  10     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec 
> %CP
> prasad-virtual-m 2G  1033  99 467899  64 182387  41  5422 100 338294  48 
> + +++
> Latency             21549us   60585us   65723us    6331us   30014us   19994us
> Version  1.96       --Sequential Create-- Random 
> Create
> prasad-virtual-mach -Create-- --Read--- -Delete-- -Create-- --Read--- 
> -Delete--
>              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                 16 + +++ + +++ + +++ + +++ + +++ + +++
> Latency               478us    1142us     344us     402us      72us      98us
> 1.96,1.96,prasad-virtual-machine,10,1307042839,2G,,1033,99,467899,64,182387,41,
> 5422,100,338294,48,+,+++,16,+,+++,+,+++,+,+++,+,+++,
> +,+++,+,+++,21549us,60585us,65723us,6331us,30014us,19994us,478us,1142us,
> 344us,402us,72us,98us
>
> Summary of performance numbers
> ==
> There is not much difference with sequential character operations are
> performed, the code with caching performed better by small margin. The caching
> code performance raised by 18% to 24% with sequential block output and by 44%
> for sequentail block input. Which is understandable as the Level2 table will
> always be cached after a write operation. Random seek operation worked slower
> with caching code.

I see performance _degradation_ in the raw data:

Before:

> Concurrency  10     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec 
> %CP
> prasad-virtual-m 2G  1043  99 555406  74 227605  55  5360  99 489080  6

Re: KVM host freezing

2011-06-02 Thread Marc Haber
On Thu, Jun 02, 2011 at 04:41:30PM +0300, Avi Kivity wrote:
> On 06/02/2011 11:25 AM, Marc Haber wrote:
> >Is there any possibility that the freezes have to do with the
> >"unhandles (rd|wr)msr" messages?
> 
> Very unlikely.

What does that mean anyway? It's looked with sufficiently high
priority to get spewed onto the console.

> >When else could be the cause?
> >
> >In the mean time, I have taken the box offline and am running memtest.
> >Up to now, everything seems to be fine.
> >
> >Any hints will be appreciated.
> 
> You might try setting up netconsole to get reliable logging.

logging of what? Of things written to syslog before the freeze occurs
so that they don't reach the disk reliably? I could set up logging
(which facility/priority?) to the serial console.

> Do you have NMIs?  'grep NMI /proc/interrupts'.

about one per minute.

> Does running 'perf top -F 1' make the hang come sooner?

If that's important, I'll make some effort to compile a statically
linked perf for 2.6.39. Is it?

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm tools: Add 'kvm pause' command

2011-06-02 Thread Sasha Levin
This patch adds a 'kvm debug' command that's currently an alias for

  kill -USR2 `pidof kvm`

Which pauses a guest (freezes all VCPU threads).

Signed-off-by: Sasha Levin 
---
 tools/kvm/Makefile|1 +
 tools/kvm/include/kvm/kvm-pause.h |6 ++
 tools/kvm/kvm-cmd.c   |2 ++
 tools/kvm/kvm-pause.c |   13 +
 4 files changed, 22 insertions(+), 0 deletions(-)
 create mode 100644 tools/kvm/include/kvm/kvm-pause.h
 create mode 100644 tools/kvm/kvm-pause.c

diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile
index a05a6b1..ed82cdf 100644
--- a/tools/kvm/Makefile
+++ b/tools/kvm/Makefile
@@ -42,6 +42,7 @@ OBJS  += irq.o
 OBJS   += kvm-cmd.o
 OBJS   += kvm-debug.o
 OBJS   += kvm-help.o
+OBJS+= kvm-pause.o
 OBJS   += kvm-run.o
 OBJS   += mptable.o
 OBJS   += rbtree.o
diff --git a/tools/kvm/include/kvm/kvm-pause.h 
b/tools/kvm/include/kvm/kvm-pause.h
new file mode 100644
index 000..0f8e96b
--- /dev/null
+++ b/tools/kvm/include/kvm/kvm-pause.h
@@ -0,0 +1,6 @@
+#ifndef KVM__PAUSE_H
+#define KVM__PAUSE_H
+
+int kvm_cmd_pause(int argc, const char **argv, const char *prefix);
+
+#endif
diff --git a/tools/kvm/kvm-cmd.c b/tools/kvm/kvm-cmd.c
index 2ea51a5..ffbc4ff 100644
--- a/tools/kvm/kvm-cmd.c
+++ b/tools/kvm/kvm-cmd.c
@@ -6,11 +6,13 @@
 
 /* user defined header files */
 #include "kvm/kvm-debug.h"
+#include "kvm/kvm-pause.h"
 #include "kvm/kvm-help.h"
 #include "kvm/kvm-cmd.h"
 #include "kvm/kvm-run.h"
 
 struct cmd_struct kvm_commands[] = {
+   { "pause", kvm_cmd_pause, NULL, 0 },
{ "debug", kvm_cmd_debug, NULL, 0 },
{ "help",  kvm_cmd_help,  NULL, 0 },
{ "run",   kvm_cmd_run,   kvm_run_help, 0 },
diff --git a/tools/kvm/kvm-pause.c b/tools/kvm/kvm-pause.c
new file mode 100644
index 000..fdf8714
--- /dev/null
+++ b/tools/kvm/kvm-pause.c
@@ -0,0 +1,13 @@
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+int kvm_cmd_pause(int argc, const char **argv, const char *prefix)
+{
+   signal(SIGUSR2, SIG_IGN);
+   return system("kill -USR2 $(pidof kvm)");
+}
-- 
1.7.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] kvm tools: Add QCOW level2 caching support

2011-06-02 Thread Prasad Joshi
QCOW uses two tables level1 (L1) table and level2 (L2) table. The L1 table
points to offset of L2 table. When a QCOW image is probed, the L1 table is
cached in the memory to avoid reading it from disk on every access. This
caching improves the performance.

The similar performance improvement can be observed when L2 tables are cached.
It is impossible to cache all of the L2 tables because of the memory
constraint. The patch adds L2 table caching capability for up to 128 L2 tables,
it uses combination of RB tree and List to manage the L2 cached tables. The
link list implementation helps in building simple LRU structure and RB tree
helps to search cached table efficiently

The performance numbers are below, the machine was started with following
command line arguments

$ ./kvm run -d /home/prasad/VMDisks/Ubuntu10.10_64_cilk_qemu.qcow \
> --params "root=/dev/vda1" -m 1024

Without QCOW caching

$ bonnie++ -d tmp/ -c 10 -s 2048
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96   --Sequential Output-- --Sequential Input- --Random-
Concurrency  10 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
prasad-virtual-m 2G  1043  99 555406  74 227605  55  5360  99 489080  68 + 
+++
Latency 24646us   48544us   57893us6686us3595us   21026us
Version  1.96   --Sequential Create-- Random Create
prasad-virtual-mach -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 16 + +++ + +++ + +++ + +++ + +++ + +++
Latency   343us1175us 327us 555us  48us  82us
1.96,1.96,prasad-virtual-machine,10,1307043085,2G,,1043,99,555406,74,227605,55,
5360,99,489080,68,+,+++,16,+,+++,+,+++,+,+++,+,+++,
+,+++,+,+++,24646us,48544us,57893us,6686us,3595us,21026us,343us,1175us,
327us,555us,48us,82us

With QCOW caching
=
$ bonnie++ -d tmp/ -c 10 -s 2048
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96   --Sequential Output-- --Sequential Input- --Random-
Concurrency  10 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
prasad-virtual-m 2G  1033  99 467899  64 182387  41  5422 100 338294  48 + 
+++
Latency 21549us   60585us   65723us6331us   30014us   19994us
Version  1.96   --Sequential Create-- Random Create
prasad-virtual-mach -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 16 + +++ + +++ + +++ + +++ + +++ + +++
Latency   478us1142us 344us 402us  72us  98us
1.96,1.96,prasad-virtual-machine,10,1307042839,2G,,1033,99,467899,64,182387,41,
5422,100,338294,48,+,+++,16,+,+++,+,+++,+,+++,+,+++,
+,+++,+,+++,21549us,60585us,65723us,6331us,30014us,19994us,478us,1142us,
344us,402us,72us,98us

Summary of performance numbers
==
There is not much difference with sequential character operations are
performed, the code with caching performed better by small margin. The caching
code performance raised by 18% to 24% with sequential block output and by 44%
for sequentail block input. Which is understandable as the Level2 table will
always be cached after a write operation. Random seek operation worked slower
with caching code.

Signed-off-by: Prasad Joshi 
---
 tools/kvm/disk/qcow.c|  216 ++
 tools/kvm/include/kvm/qcow.h |   17 
 2 files changed, 216 insertions(+), 17 deletions(-)

diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index b52b734..4fa3850 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -16,6 +16,140 @@
 #include 
 #include 
 
+static int insert(struct rb_root *root, struct qcow_l2_cache *new)
+{
+   struct rb_node **link = &(root->rb_node), *parent = NULL;
+   u64 offset 

Re: [PATCHv2 RFC 3/4] virtio_net: limit xmit polling

2011-06-02 Thread Michael S. Tsirkin
On Thu, Jun 02, 2011 at 11:09:53AM -0700, Sridhar Samudrala wrote:
> On Thu, 2011-06-02 at 18:43 +0300, Michael S. Tsirkin wrote:
> > Current code might introduce a lot of latency variation
> > if there are many pending bufs at the time we
> > attempt to transmit a new one. This is bad for
> > real-time applications and can't be good for TCP either.
> > 
> > Free up just enough to both clean up all buffers
> > eventually and to be able to xmit the next packet.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  drivers/net/virtio_net.c |  106 
> > +-
> >  1 files changed, 67 insertions(+), 39 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index a0ee78d..b25db1c 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -509,17 +509,33 @@ again:
> > return received;
> >  }
> > 
> > -static void free_old_xmit_skbs(struct virtnet_info *vi)
> > +static bool free_old_xmit_skb(struct virtnet_info *vi)
> >  {
> > struct sk_buff *skb;
> > unsigned int len;
> > 
> > -   while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> > -   pr_debug("Sent skb %p\n", skb);
> > -   vi->dev->stats.tx_bytes += skb->len;
> > -   vi->dev->stats.tx_packets++;
> > -   dev_kfree_skb_any(skb);
> > -   }
> > +   skb = virtqueue_get_buf(vi->svq, &len);
> > +   if (unlikely(!skb))
> > +   return false;
> > +   pr_debug("Sent skb %p\n", skb);
> > +   vi->dev->stats.tx_bytes += skb->len;
> > +   vi->dev->stats.tx_packets++;
> > +   dev_kfree_skb_any(skb);
> > +   return true;
> > +}
> > +
> > +/* Check capacity and try to free enough pending old buffers to enable 
> > queueing
> > + * new ones. Return true if we can guarantee that a following
> > + * virtqueue_add_buf will succeed. */
> > +static bool free_xmit_capacity(struct virtnet_info *vi)
> > +{
> > +   struct sk_buff *skb;
> > +   unsigned int len;
> > +
> > +   while (virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2)
> > +   if (unlikely(!free_old_xmit_skb))
> > +   return false;
> If we are using INDIRECT descriptors, 1 descriptor entry is good enough
> to guarantee that an skb can be queued unless we run out of memory.
> Is it worth checking if 'indirect' is set on the svq and then only free
> 1 descriptor? Otherwise, we will be dropping the packet if there are
> less than 18 free descriptors although we ony need 1.

This is not introduced with this patch: the
issue is that we need to consider worst case
as indirect memory allocation can fail.


I don't think it matters much: 18 out of 256
descriptors is not too expensive.

> > +   return true;
> >  }
> > 
> >  static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
> > @@ -572,30 +588,34 @@ static int xmit_skb(struct virtnet_info *vi, struct 
> > sk_buff *skb)
> >  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  {
> > struct virtnet_info *vi = netdev_priv(dev);
> > -   int capacity;
> > -
> > -   /* Free up any pending old buffers before queueing new ones. */
> > -   free_old_xmit_skbs(vi);
> > -
> > -   /* Try to transmit */
> > -   capacity = xmit_skb(vi, skb);
> > -
> > -   /* This can happen with OOM and indirect buffers. */
> > -   if (unlikely(capacity < 0)) {
> > -   if (net_ratelimit()) {
> > -   if (likely(capacity == -ENOMEM)) {
> > -   dev_warn(&dev->dev,
> > -"TX queue failure: out of memory\n");
> > -   } else {
> > -   dev->stats.tx_fifo_errors++;
> > +   int ret, i;
> > +
> > +   /* We normally do have space in the ring, so try to queue the skb as
> > +* fast as possible. */
> > +   ret = xmit_skb(vi, skb);
> > +   if (unlikely(ret < 0)) {
> > +   /* This triggers on the first xmit after ring full condition.
> > +* We need to free up some skbs first. */
> > +   if (likely(free_xmit_capacity(vi))) {
> > +   ret = xmit_skb(vi, skb);
> > +   /* This should never fail. Check, just in case. */
> > +   if (unlikely(ret < 0)) {
> > dev_warn(&dev->dev,
> >  "Unexpected TX queue failure: %d\n",
> > -capacity);
> > +ret);
> > +   dev->stats.tx_fifo_errors++;
> > +   dev->stats.tx_dropped++;
> > +   kfree_skb(skb);
> > +   return NETDEV_TX_OK;
> > }
> > +   } else {
> > +   /* Ring full: it might happen if we get a callback while
> > +* the queue is still mostly full. This should be
> > +* extremely rare. */
> > +   dev->stats.tx_dropped++;
> > +  

Re: [PATCHv2 RFC 3/4] virtio_net: limit xmit polling

2011-06-02 Thread Sridhar Samudrala
On Thu, 2011-06-02 at 18:43 +0300, Michael S. Tsirkin wrote:
> Current code might introduce a lot of latency variation
> if there are many pending bufs at the time we
> attempt to transmit a new one. This is bad for
> real-time applications and can't be good for TCP either.
> 
> Free up just enough to both clean up all buffers
> eventually and to be able to xmit the next packet.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  drivers/net/virtio_net.c |  106 
> +-
>  1 files changed, 67 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a0ee78d..b25db1c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -509,17 +509,33 @@ again:
>   return received;
>  }
> 
> -static void free_old_xmit_skbs(struct virtnet_info *vi)
> +static bool free_old_xmit_skb(struct virtnet_info *vi)
>  {
>   struct sk_buff *skb;
>   unsigned int len;
> 
> - while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> - pr_debug("Sent skb %p\n", skb);
> - vi->dev->stats.tx_bytes += skb->len;
> - vi->dev->stats.tx_packets++;
> - dev_kfree_skb_any(skb);
> - }
> + skb = virtqueue_get_buf(vi->svq, &len);
> + if (unlikely(!skb))
> + return false;
> + pr_debug("Sent skb %p\n", skb);
> + vi->dev->stats.tx_bytes += skb->len;
> + vi->dev->stats.tx_packets++;
> + dev_kfree_skb_any(skb);
> + return true;
> +}
> +
> +/* Check capacity and try to free enough pending old buffers to enable 
> queueing
> + * new ones. Return true if we can guarantee that a following
> + * virtqueue_add_buf will succeed. */
> +static bool free_xmit_capacity(struct virtnet_info *vi)
> +{
> + struct sk_buff *skb;
> + unsigned int len;
> +
> + while (virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2)
> + if (unlikely(!free_old_xmit_skb))
> + return false;
If we are using INDIRECT descriptors, 1 descriptor entry is good enough
to guarantee that an skb can be queued unless we run out of memory.
Is it worth checking if 'indirect' is set on the svq and then only free
1 descriptor? Otherwise, we will be dropping the packet if there are
less than 18 free descriptors although we ony need 1.


> + return true;
>  }
> 
>  static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
> @@ -572,30 +588,34 @@ static int xmit_skb(struct virtnet_info *vi, struct 
> sk_buff *skb)
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
> - int capacity;
> -
> - /* Free up any pending old buffers before queueing new ones. */
> - free_old_xmit_skbs(vi);
> -
> - /* Try to transmit */
> - capacity = xmit_skb(vi, skb);
> -
> - /* This can happen with OOM and indirect buffers. */
> - if (unlikely(capacity < 0)) {
> - if (net_ratelimit()) {
> - if (likely(capacity == -ENOMEM)) {
> - dev_warn(&dev->dev,
> -  "TX queue failure: out of memory\n");
> - } else {
> - dev->stats.tx_fifo_errors++;
> + int ret, i;
> +
> + /* We normally do have space in the ring, so try to queue the skb as
> +  * fast as possible. */
> + ret = xmit_skb(vi, skb);
> + if (unlikely(ret < 0)) {
> + /* This triggers on the first xmit after ring full condition.
> +  * We need to free up some skbs first. */
> + if (likely(free_xmit_capacity(vi))) {
> + ret = xmit_skb(vi, skb);
> + /* This should never fail. Check, just in case. */
> + if (unlikely(ret < 0)) {
>   dev_warn(&dev->dev,
>"Unexpected TX queue failure: %d\n",
> -  capacity);
> +  ret);
> + dev->stats.tx_fifo_errors++;
> + dev->stats.tx_dropped++;
> + kfree_skb(skb);
> + return NETDEV_TX_OK;
>   }
> + } else {
> + /* Ring full: it might happen if we get a callback while
> +  * the queue is still mostly full. This should be
> +  * extremely rare. */
> + dev->stats.tx_dropped++;
> + kfree_skb(skb);
> + goto stop;
>   }
> - dev->stats.tx_dropped++;
> - kfree_skb(skb);
> - return NETDEV_TX_OK;
>   }
>   virtqueue_kick(vi->svq);
> 
> @@ -603,18 +623,26 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>   skb_orphan(skb);
>   nf_reset(s

Re: [PATCHv2 RFC 0/4] virtio and vhost-net capacity handling

2011-06-02 Thread Michael S. Tsirkin
On Thu, Jun 02, 2011 at 06:42:35PM +0300, Michael S. Tsirkin wrote:
> OK, here's a new attempt to use the new capacity api.  I also added more
> comments to clarify the logic.  Hope this is more readable.  Let me know
> pls.
> 
> This is on top of the patches applied by Rusty.
> 
> Warning: untested. Posting now to give people chance to
> comment on the API.
> 
> Changes from v1:
> - fix comment in patch 2 to correct confusion noted by Rusty
> - rewrite patch 3 along the lines suggested by Rusty
>   note: it's not exactly the same but I hope it's close
>   enough, the main difference is that mine does limited
>   polling even in the unlikely xmit failure case.
> - added a patch to not return capacity from add_buf
>   it always looked like a weird hack
> 
> Michael S. Tsirkin (4):
>   virtio_ring: add capacity check API
>   virtio_net: fix tx capacity checks using new API
>   virtio_net: limit xmit polling
>   Revert "virtio: make add_buf return capacity remaining:
> 
>  drivers/net/virtio_net.c |  111 
> ++
>  drivers/virtio/virtio_ring.c |   10 +++-
>  include/linux/virtio.h   |7 ++-
>  3 files changed, 84 insertions(+), 44 deletions(-)
> 
> -- 
> 1.7.5.53.gc233e


And just FYI, here's a patch (on top) that I considered but
decided against. This does a single get_buf before
xmit. I thought it's not really needed as the capacity
check in add_buf is relatively cheap, and we removed
the kick in xmit_skb. But the point is that the loop
will now be easy to move around if someone manages
to show this benefits speed (which I doubt).

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b25db1c..75af5be 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -590,6 +590,9 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
struct virtnet_info *vi = netdev_priv(dev);
int ret, i;
 
+   /* Try to pop an skb, to increase the chance we can add this one. */
+   free_old_xmit_skb(v);
+
/* We normally do have space in the ring, so try to queue the skb as
 * fast as possible. */
ret = xmit_skb(vi, skb);
@@ -627,8 +630,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
 * This is so that we don't hog the skb memory unnecessarily. *
 * Doing this after kick means there's a chance we'll free
 * the skb we have just sent, which is hot in cache. */
-   for (i = 0; i < 2; i++)
-   free_old_xmit_skb(v);
+   free_old_xmit_skb(v);
 
if (likely(free_xmit_capacity(vi)))
return NETDEV_TX_OK;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] kvm tools: Add QCOW level2 caching support

2011-06-02 Thread Prasad Joshi
On Thu, Jun 2, 2011 at 9:36 AM, Prasad Joshi  wrote:
> On Thu, Jun 2, 2011 at 8:28 AM, Ingo Molnar  wrote:
>>
>> * Prasad Joshi  wrote:
>>
>>> Summary of performance numbers
>>> ==
>>> There is not much difference with sequential character operations are
>>> performed, the code with caching performed better by small margin. The 
>>> caching
>>> code performance raised by 12% with sequential block output and dropped by
>>> 0.5% with sequential block input. The caching code also suffered with
>>> Random seeks and performed badly by 12%. The performance numbers drastically
>>> improved with sequential creates (62%) and delete operations (30%).
>>
>> Looking at the numbers i think it's pretty clear that from this point
>> on the quality of IO tests should be improved: Bonnie is too noisy
>> and does not cut it anymore for finer enhancements.
>>
>> To make measurements easier you could also do a simple trick: put
>> *all* of the disk image into /dev/shm and add a command-line debug
>> option that add a fixed-amount udelay(1000) call every time the code
>> reads from the disk image.
>>
>> This introduces a ~1msec delay and thus simulates IO, but the delays
>> are *constant* [make sure you use a high-res timers kernel], so they
>> do not result in nearly as much measurement noise as real block IO
>> does.
>>
>> The IO delays will still be there, so any caching advantages (and CPU
>> overhead reductions) will be measurable very clearly.
>>
>> This way you are basically 'emulating' a real disk drive but you will
>> emulate uniform latencies, which makes measurements a lot more
>> reliable - while still relevant to the end result.
>>
>> So if under such a measurement model you can prove an improvement
>> with a patch, that improvement will be there with real disks as well
>> - just harder to prove.
>>
>> Wanna try this? I really think you are hitting the limits of your
>> current measurement methodology and you will be wasting time running
>> more measurements instead of saving time doing more intelligent
>> measurements ;-)
>>
>
> Thanks Ingo, will try this method.
>
> I am not sure how to induce the delay you mentioned. I could vaguely
> remember you suggesting something similar few days back. Let me see if
> I can find out the correct arguments to use this delay, otherwise will
> post a query.
>

I repeated the test after copying the image file on /dev/shm as
suggested by Ingo

BEFORE CACHING
=
$ bonnie++ -d tmp/ -c 2 -s 1024
Version  1.96   --Sequential Output-- --Sequential Input- --Random-
Concurrency   2 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
prasad-virtual-m 1G   996  99 433356  59 172755  42  5331 100 339838
49 15534 749
Latency 17704us   49654us   61299us6152us1838us 106ms
Version  1.96   --Sequential Create-- Random Create
prasad-virtual-mach -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 16 + +++ + +++ + +++ + +++ + +++ + +++
Latency   475us 315us 358us 565us  56us  91us
1.96,1.96,prasad-virtual-machine,2,1307032968,1G,,996,99,433356,59,172755,42,5331,100,339838,49,15534,749,16,+,+++,+,+++,+,+++,+,+++,+,+++,+,+++,17704us,49654us,61299us,6152us,1838us,106ms,475us,315us,358us,565us,56us,91us


AFTER CACHING
===
$ bonnie++ -d tmp/ -c 2 -s 1024
Version  1.96   --Sequential Output-- --Sequential Input- --Random-
Concurrency   2 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
prasad-virtual-m 1G  1054  99 558323  74 226297  55  5436  99 504042
70 + +++
Latency 12776us   23582us   39912us6778us   20050us   30421us
Version  1.96   --Sequential Create-- Random Create
prasad-virtual-mach -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 16 + +++ + +++ + +++ + +++ + +++ + +++
Latency   334us 391us 427us 472us  82us  79us
1.96,1.96,prasad-virtual-machine,2,1307032815,1G,,1054,99,558323,74,226297,55,5436,99,504042,70,+,+++,16,+,+++,+,+++,+,+++,+,+++,+,+++,+,+++,12776us,23582us,39912us,6778us,20050us,30421us,334us,391us,427us,472us,82us,79us

During both the tests the machine was started with 512MB memory and
test was performed without any additional the IO delay.

The tests shows that caching improves the performance. Thanks Ingo.

Regards,
Prasad


> Thanks and Regards,
> Prasad
>
>> Thanks,
>>
>>        Ingo
>>
>
--
To unsubscribe from this list: send the line

Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

2011-06-02 Thread Michael S. Tsirkin
On Thu, Jun 02, 2011 at 08:56:42PM +0530, Krishna Kumar2 wrote:
> Return value in free_old_xmit() was wrong.

Could you check my latest RFC pls?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 RFC 4/4] Revert "virtio: make add_buf return capacity remaining:

2011-06-02 Thread Michael S. Tsirkin
This reverts commit 3c1b27d5043086a485f8526353ae9fe37bfa1065.
The only user was virtio_net, and it switched to
min_capacity instead.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_ring.c |2 +-
 include/linux/virtio.h   |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 23422f1..a6c21eb 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -233,7 +233,7 @@ add_head:
pr_debug("Added buffer head %i to %p\n", head, vq);
END_USE(vq);
 
-   return vq->num_free;
+   return 0;
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp);
 
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 209220d..63c4908 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -34,7 +34,7 @@ struct virtqueue {
  * in_num: the number of sg which are writable (after readable ones)
  * data: the token identifying the buffer.
  * gfp: how to do memory allocations (if necessary).
- *  Returns remaining capacity of queue (sg segments) or a negative error.
+ *  Returns 0 on success or a negative error.
  * virtqueue_kick: update after add_buf
  * vq: the struct virtqueue
  * After one or more add_buf calls, invoke this to kick the other side.
-- 
1.7.5.53.gc233e
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 RFC 1/4] virtio_ring: add capacity check API

2011-06-02 Thread Michael S. Tsirkin
Add API to check ring capacity. Because of the option
to use indirect buffers, this returns the worst
case, not the normal case capacity.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_ring.c |8 
 include/linux/virtio.h   |5 +
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 68b9136..23422f1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -344,6 +344,14 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned 
int *len)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf);
 
+int virtqueue_min_capacity(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   return vq->num_free;
+}
+EXPORT_SYMBOL_GPL(virtqueue_min_capacity);
+
 void virtqueue_disable_cb(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 7108857..209220d 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -42,6 +42,9 @@ struct virtqueue {
  * vq: the struct virtqueue we're talking about.
  * len: the length written into the buffer
  * Returns NULL or the "data" token handed to add_buf.
+ * virtqueue_min_capacity: get the current capacity of the queue
+ * vq: the struct virtqueue we're talking about.
+ * Returns the current worst case capacity of the queue.
  * virtqueue_disable_cb: disable callbacks
  * vq: the struct virtqueue we're talking about.
  * Note that this is not necessarily synchronous, hence unreliable and only
@@ -89,6 +92,8 @@ void virtqueue_kick(struct virtqueue *vq);
 
 void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
 
+int virtqueue_min_capacity(struct virtqueue *vq);
+
 void virtqueue_disable_cb(struct virtqueue *vq);
 
 bool virtqueue_enable_cb(struct virtqueue *vq);
-- 
1.7.5.53.gc233e

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 RFC 2/4] virtio_net: fix tx capacity checks using new API

2011-06-02 Thread Michael S. Tsirkin
In the (rare) case where new descriptors are used
while virtio_net enables vq callback for the TX vq,
virtio_net uses the number of sg entries in the skb it frees to
calculate how many descriptors in the ring have just been made
available.  But this value is an overestimate: with indirect buffers
each skb only uses one descriptor entry, meaning we may wake the queue
only to find we are still out of space and need to stop
the ring almost at once.

Using the new virtqueue_min_capacity() call, we can determine
the remaining capacity, so we should use that instead.
This estimation is worst-case which is consistent with
the value returned by virtqueue_add_buf.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/net/virtio_net.c |9 -
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f685324..a0ee78d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -509,19 +509,17 @@ again:
return received;
 }
 
-static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
+static void free_old_xmit_skbs(struct virtnet_info *vi)
 {
struct sk_buff *skb;
-   unsigned int len, tot_sgs = 0;
+   unsigned int len;
 
while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
pr_debug("Sent skb %p\n", skb);
vi->dev->stats.tx_bytes += skb->len;
vi->dev->stats.tx_packets++;
-   tot_sgs += skb_vnet_hdr(skb)->num_sg;
dev_kfree_skb_any(skb);
}
-   return tot_sgs;
 }
 
 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -611,7 +609,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
netif_stop_queue(dev);
if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
/* More just got used, free them then recheck. */
-   capacity += free_old_xmit_skbs(vi);
+   free_old_xmit_skbs(vi);
+   capacity = virtqueue_min_capacity(vi->svq);
if (capacity >= 2+MAX_SKB_FRAGS) {
netif_start_queue(dev);
virtqueue_disable_cb(vi->svq);
-- 
1.7.5.53.gc233e

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 RFC 3/4] virtio_net: limit xmit polling

2011-06-02 Thread Michael S. Tsirkin
Current code might introduce a lot of latency variation
if there are many pending bufs at the time we
attempt to transmit a new one. This is bad for
real-time applications and can't be good for TCP either.

Free up just enough to both clean up all buffers
eventually and to be able to xmit the next packet.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/net/virtio_net.c |  106 +-
 1 files changed, 67 insertions(+), 39 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a0ee78d..b25db1c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -509,17 +509,33 @@ again:
return received;
 }
 
-static void free_old_xmit_skbs(struct virtnet_info *vi)
+static bool free_old_xmit_skb(struct virtnet_info *vi)
 {
struct sk_buff *skb;
unsigned int len;
 
-   while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
-   pr_debug("Sent skb %p\n", skb);
-   vi->dev->stats.tx_bytes += skb->len;
-   vi->dev->stats.tx_packets++;
-   dev_kfree_skb_any(skb);
-   }
+   skb = virtqueue_get_buf(vi->svq, &len);
+   if (unlikely(!skb))
+   return false;
+   pr_debug("Sent skb %p\n", skb);
+   vi->dev->stats.tx_bytes += skb->len;
+   vi->dev->stats.tx_packets++;
+   dev_kfree_skb_any(skb);
+   return true;
+}
+
+/* Check capacity and try to free enough pending old buffers to enable queueing
+ * new ones. Return true if we can guarantee that a following
+ * virtqueue_add_buf will succeed. */
+static bool free_xmit_capacity(struct virtnet_info *vi)
+{
+   struct sk_buff *skb;
+   unsigned int len;
+
+   while (virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2)
+   if (unlikely(!free_old_xmit_skb))
+   return false;
+   return true;
 }
 
 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -572,30 +588,34 @@ static int xmit_skb(struct virtnet_info *vi, struct 
sk_buff *skb)
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct virtnet_info *vi = netdev_priv(dev);
-   int capacity;
-
-   /* Free up any pending old buffers before queueing new ones. */
-   free_old_xmit_skbs(vi);
-
-   /* Try to transmit */
-   capacity = xmit_skb(vi, skb);
-
-   /* This can happen with OOM and indirect buffers. */
-   if (unlikely(capacity < 0)) {
-   if (net_ratelimit()) {
-   if (likely(capacity == -ENOMEM)) {
-   dev_warn(&dev->dev,
-"TX queue failure: out of memory\n");
-   } else {
-   dev->stats.tx_fifo_errors++;
+   int ret, i;
+
+   /* We normally do have space in the ring, so try to queue the skb as
+* fast as possible. */
+   ret = xmit_skb(vi, skb);
+   if (unlikely(ret < 0)) {
+   /* This triggers on the first xmit after ring full condition.
+* We need to free up some skbs first. */
+   if (likely(free_xmit_capacity(vi))) {
+   ret = xmit_skb(vi, skb);
+   /* This should never fail. Check, just in case. */
+   if (unlikely(ret < 0)) {
dev_warn(&dev->dev,
 "Unexpected TX queue failure: %d\n",
-capacity);
+ret);
+   dev->stats.tx_fifo_errors++;
+   dev->stats.tx_dropped++;
+   kfree_skb(skb);
+   return NETDEV_TX_OK;
}
+   } else {
+   /* Ring full: it might happen if we get a callback while
+* the queue is still mostly full. This should be
+* extremely rare. */
+   dev->stats.tx_dropped++;
+   kfree_skb(skb);
+   goto stop;
}
-   dev->stats.tx_dropped++;
-   kfree_skb(skb);
-   return NETDEV_TX_OK;
}
virtqueue_kick(vi->svq);
 
@@ -603,18 +623,26 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
skb_orphan(skb);
nf_reset(skb);
 
-   /* Apparently nice girls don't return TX_BUSY; stop the queue
-* before it gets out of hand.  Naturally, this wastes entries. */
-   if (capacity < 2+MAX_SKB_FRAGS) {
-   netif_stop_queue(dev);
-   if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
-   /* More just got used, free them then recheck. */
-   free_old_xmit_skbs(vi);
-   capacity = virtqueue_min_capacity(vi->svq)

[PATCHv2 RFC 0/4] virtio and vhost-net capacity handling

2011-06-02 Thread Michael S. Tsirkin
OK, here's a new attempt to use the new capacity api.  I also added more
comments to clarify the logic.  Hope this is more readable.  Let me know
pls.

This is on top of the patches applied by Rusty.

Warning: untested. Posting now to give people chance to
comment on the API.

Changes from v1:
- fix comment in patch 2 to correct confusion noted by Rusty
- rewrite patch 3 along the lines suggested by Rusty
  note: it's not exactly the same but I hope it's close
  enough, the main difference is that mine does limited
  polling even in the unlikely xmit failure case.
- added a patch to not return capacity from add_buf
  it always looked like a weird hack

Michael S. Tsirkin (4):
  virtio_ring: add capacity check API
  virtio_net: fix tx capacity checks using new API
  virtio_net: limit xmit polling
  Revert "virtio: make add_buf return capacity remaining:

 drivers/net/virtio_net.c |  111 ++
 drivers/virtio/virtio_ring.c |   10 +++-
 include/linux/virtio.h   |7 ++-
 3 files changed, 84 insertions(+), 44 deletions(-)

-- 
1.7.5.53.gc233e
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

2011-06-02 Thread Michael S. Tsirkin
On Thu, Jun 02, 2011 at 08:56:42PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin"  wrote on 06/02/2011 08:13:46 PM:
> 
> > > Please review this patch to see if it looks reasonable:
> >
> > Hmm, since you decided to work on top of my patch,
> > I'd appreciate split-up fixes.
> 
> OK (that also explains your next comment).
> 
> > > 1. Picked comments/code from MST's code and Rusty's review.
> > > 2. virtqueue_min_capacity() needs to be called only if it returned
> > >empty the last time it was called.
> > > 3. Fix return value bug in free_old_xmit_skbs (hangs guest).
> > > 4. Stop queue only if capacity is not enough for next xmit.
> >
> > That's what we always did ...
> 
> I had made the patch against your patch, hence this change (sorry for
> the confusion!).
> 
> > > 5. Fix/clean some likely/unlikely checks (hopefully).
> > >
> > > I have done some minimal netperf tests with this.
> > >
> > > With this patch, add_buf returning capacity seems to be useful - it
> > > allows less virtio API calls.
> >
> > Why bother? It's cheap ...
> 
> If add_buf retains it's functionality to return the capacity (it
> is going to need a change to return 0 otherwise anyway), is it
> useful to call another function at each xmit?
> 
> > > +static bool free_old_xmit_skbs(struct virtnet_info *vi, int to_free)
> > > +{
> > > +   bool empty = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2;
> > > +
> > > +   do {
> > > +  if (!free_one_old_xmit_skb(vi)) {
> > > + /* No more skbs to free up */
> > >   break;
> > > -  pr_debug("Sent skb %p\n", skb);
> > > -  vi->dev->stats.tx_bytes += skb->len;
> > > -  vi->dev->stats.tx_packets++;
> > > -  dev_kfree_skb_any(skb);
> > > -   }
> > > -   return r;
> > > +  }
> > > +
> > > +  if (empty) {
> > > + /* Check again if there is enough space */
> > > + empty = virtqueue_min_capacity(vi->svq) <
> > > +MAX_SKB_FRAGS + 2;
> > > +  } else {
> > > + --to_free;
> > > +  }
> > > +   } while (to_free > 0);
> > > +
> > > +   return !empty;
> > >  }
> >
> > Why bother doing the capacity check in this function?
> 
> To return whether we have enough space for next xmit. It should call
> it only once unless space is running out. Does it sound OK?
> 
> > > -   if (unlikely(ret < 0)) {
> > > +   if (unlikely(capacity < 0)) {
> > > +  /*
> > > +   * Failure to queue should be impossible. The only way to
> > > +   * reach here is if we got a cb before 3/4th of space was
> > > +   * available. We could stop the queue and re-enable
> > > +   * callbacks (and possibly return TX_BUSY), but we don't
> > > +   * bother since this is impossible.
> >
> > It's far from impossible.  The 3/4 thing is only a hint, and old devices
> > don't support it anyway.
> 
> OK, I will re-put back your comment.
> 
> > > -   if (!likely(free_old_xmit_skbs(vi, 2))) {
> > > -  netif_stop_queue(dev);
> > > -  if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> > > - /* More just got used, free them and recheck. */
> > > - if (!likely(free_old_xmit_skbs(vi, 0))) {
> > > -netif_start_queue(dev);
> > > -virtqueue_disable_cb(vi->svq);
> > > +   /*
> > > +* Apparently nice girls don't return TX_BUSY; check capacity and
> > > +* stop the queue before it gets out of hand. Naturally, this
> wastes
> > > +* entries.
> > > +*/
> > > +   if (capacity < 2+MAX_SKB_FRAGS) {
> > > +  /*
> > > +   * We don't have enough space for the next packet. Try
> > > +   * freeing more.
> > > +   */
> > > +  if (likely(!free_old_xmit_skbs(vi, UINT_MAX))) {
> > > + netif_stop_queue(dev);
> > > + if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> > > +/* More just got used, free them and recheck. */
> > > +if (likely(free_old_xmit_skbs(vi, UINT_MAX))) {
> >
> > Is this where the bug was?
> 
> Return value in free_old_xmit() was wrong. I will re-do against the
> mainline kernel.
> 
> Thanks,
> 
> - KK

Just noting that I'm working on that patch as well, it might
be more efficient if we don't both of us do this in parallel :)

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

2011-06-02 Thread Krishna Kumar2
"Michael S. Tsirkin"  wrote on 06/02/2011 08:13:46 PM:

> > Please review this patch to see if it looks reasonable:
>
> Hmm, since you decided to work on top of my patch,
> I'd appreciate split-up fixes.

OK (that also explains your next comment).

> > 1. Picked comments/code from MST's code and Rusty's review.
> > 2. virtqueue_min_capacity() needs to be called only if it returned
> >empty the last time it was called.
> > 3. Fix return value bug in free_old_xmit_skbs (hangs guest).
> > 4. Stop queue only if capacity is not enough for next xmit.
>
> That's what we always did ...

I had made the patch against your patch, hence this change (sorry for
the confusion!).

> > 5. Fix/clean some likely/unlikely checks (hopefully).
> >
> > I have done some minimal netperf tests with this.
> >
> > With this patch, add_buf returning capacity seems to be useful - it
> > allows less virtio API calls.
>
> Why bother? It's cheap ...

If add_buf retains it's functionality to return the capacity (it
is going to need a change to return 0 otherwise anyway), is it
useful to call another function at each xmit?

> > +static bool free_old_xmit_skbs(struct virtnet_info *vi, int to_free)
> > +{
> > +   bool empty = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2;
> > +
> > +   do {
> > +  if (!free_one_old_xmit_skb(vi)) {
> > + /* No more skbs to free up */
> >   break;
> > -  pr_debug("Sent skb %p\n", skb);
> > -  vi->dev->stats.tx_bytes += skb->len;
> > -  vi->dev->stats.tx_packets++;
> > -  dev_kfree_skb_any(skb);
> > -   }
> > -   return r;
> > +  }
> > +
> > +  if (empty) {
> > + /* Check again if there is enough space */
> > + empty = virtqueue_min_capacity(vi->svq) <
> > +MAX_SKB_FRAGS + 2;
> > +  } else {
> > + --to_free;
> > +  }
> > +   } while (to_free > 0);
> > +
> > +   return !empty;
> >  }
>
> Why bother doing the capacity check in this function?

To return whether we have enough space for next xmit. It should call
it only once unless space is running out. Does it sound OK?

> > -   if (unlikely(ret < 0)) {
> > +   if (unlikely(capacity < 0)) {
> > +  /*
> > +   * Failure to queue should be impossible. The only way to
> > +   * reach here is if we got a cb before 3/4th of space was
> > +   * available. We could stop the queue and re-enable
> > +   * callbacks (and possibly return TX_BUSY), but we don't
> > +   * bother since this is impossible.
>
> It's far from impossible.  The 3/4 thing is only a hint, and old devices
> don't support it anyway.

OK, I will re-put back your comment.

> > -   if (!likely(free_old_xmit_skbs(vi, 2))) {
> > -  netif_stop_queue(dev);
> > -  if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> > - /* More just got used, free them and recheck. */
> > - if (!likely(free_old_xmit_skbs(vi, 0))) {
> > -netif_start_queue(dev);
> > -virtqueue_disable_cb(vi->svq);
> > +   /*
> > +* Apparently nice girls don't return TX_BUSY; check capacity and
> > +* stop the queue before it gets out of hand. Naturally, this
wastes
> > +* entries.
> > +*/
> > +   if (capacity < 2+MAX_SKB_FRAGS) {
> > +  /*
> > +   * We don't have enough space for the next packet. Try
> > +   * freeing more.
> > +   */
> > +  if (likely(!free_old_xmit_skbs(vi, UINT_MAX))) {
> > + netif_stop_queue(dev);
> > + if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
> > +/* More just got used, free them and recheck. */
> > +if (likely(free_old_xmit_skbs(vi, UINT_MAX))) {
>
> Is this where the bug was?

Return value in free_old_xmit() was wrong. I will re-do against the
mainline kernel.

Thanks,

- KK

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RCU red-black tree (was: Re: [PATCH 4/6] kvm tools: Add rwlock wrapper)

2011-06-02 Thread Mathieu Desnoyers
* Sasha Levin (levinsasha...@gmail.com) wrote:
[...]
> Mathieu,
> 
> I've started working on converting our MMIO code to use RCU rbtree.
> 
> It looks like each node contains one key, and the search functions
> search for a node with a key in a specific range.
> 
> Instead, the key in interval tree nodes is a range, and when searching
> we try to find which node's range contains our search param.
> 
> For example, our MMIO mapper maps an address space into devices, so we
> can have one node which holds the range (0x100-0x200) which maps to a
> VGA card, a (0x400-0x500) which maps to a sound card, and so on. Then,
> when a guest is running and tries to write to 0x150, we want to know
> which device it maps to.

Hi Sasha,

I finished updating the RCU RBTree internals and API to store ranges and
query points or ranges instead of simple values. My tests are passing
fine so far. I also added some documentation in the code explaining how
I deal with search/prev/next reads vs concurrent writers. Feedback about
the API would be very welcome! It's all available in the urcu rbtree2
branch.

Thanks!

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

2011-06-02 Thread Michael S. Tsirkin
On Thu, Jun 02, 2011 at 07:47:48PM +0530, Krishna Kumar2 wrote:
> > OK, I have something very similar, but I still dislike the screw the
> > latency part: this path is exactly what the IBM guys seem to hit.  So I
> > created two functions: one tries to free a constant number and another
> > one up to capacity. I'll post that now.
> 
> Please review this patch to see if it looks reasonable:

Hmm, since you decided to work on top of my patch,
I'd appreciate split-up fixes.

> 1. Picked comments/code from MST's code and Rusty's review.
> 2. virtqueue_min_capacity() needs to be called only if it returned
>empty the last time it was called.
> 3. Fix return value bug in free_old_xmit_skbs (hangs guest).
> 4. Stop queue only if capacity is not enough for next xmit.

That's what we always did ...

> 5. Fix/clean some likely/unlikely checks (hopefully).
> 
> I have done some minimal netperf tests with this.
> 
> With this patch, add_buf returning capacity seems to be useful - it
> allows less virtio API calls.

Why bother? It's cheap ...

> 
> Signed-off-by: Krishna Kumar 
> ---
>  drivers/net/virtio_net.c |  105 ++---
>  1 file changed, 64 insertions(+), 41 deletions(-)
> 
> diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
> --- org/drivers/net/virtio_net.c  2011-06-02 15:49:25.0 +0530
> +++ new/drivers/net/virtio_net.c  2011-06-02 19:13:02.0 +0530
> @@ -509,27 +509,43 @@ again:
>   return received;
>  }
>  
> -/* Check capacity and try to free enough pending old buffers to enable 
> queueing
> - * new ones.  If min_skbs > 0, try to free at least the specified number of 
> skbs
> - * even if the ring already has sufficient capacity.  Return true if we can
> - * guarantee that a following virtqueue_add_buf will succeed. */
> -static bool free_old_xmit_skbs(struct virtnet_info *vi, int min_skbs)
> +/* Return true if freed a skb, else false */
> +static inline bool free_one_old_xmit_skb(struct virtnet_info *vi)
>  {
>   struct sk_buff *skb;
>   unsigned int len;
> - bool r;
>  
> - while ((r = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2) ||
> -min_skbs-- > 0) {
> - skb = virtqueue_get_buf(vi->svq, &len);
> - if (unlikely(!skb))
> + skb = virtqueue_get_buf(vi->svq, &len);
> + if (unlikely(!skb))
> + return 0;
> +
> + pr_debug("Sent skb %p\n", skb);
> + vi->dev->stats.tx_bytes += skb->len;
> + vi->dev->stats.tx_packets++;
> + dev_kfree_skb_any(skb);
> + return 1;
> +}
> +
> +static bool free_old_xmit_skbs(struct virtnet_info *vi, int to_free)
> +{
> + bool empty = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2;
> +
> + do {
> + if (!free_one_old_xmit_skb(vi)) {
> + /* No more skbs to free up */
>   break;
> - pr_debug("Sent skb %p\n", skb);
> - vi->dev->stats.tx_bytes += skb->len;
> - vi->dev->stats.tx_packets++;
> - dev_kfree_skb_any(skb);
> - }
> - return r;
> + }
> +
> + if (empty) {
> + /* Check again if there is enough space */
> + empty = virtqueue_min_capacity(vi->svq) <
> + MAX_SKB_FRAGS + 2;
> + } else {
> + --to_free;
> + }
> + } while (to_free > 0);
> +
> + return !empty;
>  }

Why bother doing the capacity check in this function?

>  static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
> @@ -582,46 +598,53 @@ static int xmit_skb(struct virtnet_info 
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
> - int ret, n;
> + int capacity;
>  
> - /* Free up space in the ring in case this is the first time we get
> -  * woken up after ring full condition.  Note: this might try to free
> -  * more than strictly necessary if the skb has a small
> -  * number of fragments, but keep it simple. */
> - free_old_xmit_skbs(vi, 0);
> + /* Try to free 2 buffers for every 1 xmit, to stay ahead. */
> + free_old_xmit_skbs(vi, 2);
>  
>   /* Try to transmit */
> - ret = xmit_skb(vi, skb);
> + capacity = xmit_skb(vi, skb);
>  
> - /* Failure to queue is unlikely. It's not a bug though: it might happen
> -  * if we get an interrupt while the queue is still mostly full.
> -  * We could stop the queue and re-enable callbacks (and possibly return
> -  * TX_BUSY), but as this should be rare, we don't bother. */
> - if (unlikely(ret < 0)) {
> + if (unlikely(capacity < 0)) {
> + /*
> +  * Failure to queue should be impossible. The only way to
> +  * reach here is if we got a cb before 3/4th of space was
> +  * available. We could stop the queue and re-enable
> +  * callb

Re: [PATCH] kvm: Document KVM_IOEVENTFD

2011-06-02 Thread Jan Kiszka
On 2011-06-02 15:43, Avi Kivity wrote:
> On 06/01/2011 01:51 PM, Jan Kiszka wrote:
>> On 2011-05-31 15:44, Marcelo Tosatti wrote:
>> >  On Sat, May 28, 2011 at 02:12:30PM +0300, Sasha Levin wrote:
>> >>  Document KVM_IOEVENTFD that can be used to receive
>> >>  notifications of PIO/MMIO events without triggering
>> >>  an exit.
>> >>
>> >>  Cc: Avi Kivity
>> >>  Cc: Marcelo Tosatti
>> >>  Signed-off-by: Sasha Levin
>> >>  ---
>> >>   Documentation/virtual/kvm/api.txt |   30
>> ++
>> >>   1 files changed, 30 insertions(+), 0 deletions(-)
>> >
>> >  Applied (with wording fix), thanks.
>>
>> Requires section number fix-up (4.56 ->  4.58).
> 
> The usual fix is to cut a 5.0 release.

Too bad that we already have one ("5. The kvm_run structure"). :)

Jan

--8<--

From: Jan Kiszka 

KVM: Fixup documentation section numbering

Signed-off-by: Jan Kiszka 
---
 Documentation/virtual/kvm/api.txt |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 3755a39..2bd06b0 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1323,7 +1323,7 @@ struct kvm_lapic_state {
 Copies the input argument into the the Local APIC registers.  The data format
 and layout are the same as documented in the architecture manual.
 
-4.56 KVM_IOEVENTFD
+4.58 KVM_IOEVENTFD
 
 Capability: KVM_CAP_IOEVENTFD
 Architectures: all
-- 
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Clarify KVM_ASSIGN_PCI_DEVICE documentation

2011-06-02 Thread Jan Kiszka
From: Jan Kiszka 

Neither host_irq nor the guest_msi struct are used anymore today.
Tag the former, drop the latter to avoid confusion.

Signed-off-by: Jan Kiszka 
---
 Documentation/virtual/kvm/api.txt |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 2bd06b0..7adc1ac 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1143,15 +1143,10 @@ Assigns an IRQ to a passed-through device.
 
 struct kvm_assigned_irq {
__u32 assigned_dev_id;
-   __u32 host_irq;
+   __u32 host_irq; /* ignored (legacy field) */
__u32 guest_irq;
__u32 flags;
union {
-   struct {
-   __u32 addr_lo;
-   __u32 addr_hi;
-   __u32 data;
-   } guest_msi;
__u32 reserved[12];
};
 };
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

2011-06-02 Thread Krishna Kumar2
> OK, I have something very similar, but I still dislike the screw the
> latency part: this path is exactly what the IBM guys seem to hit.  So I
> created two functions: one tries to free a constant number and another
> one up to capacity. I'll post that now.

Please review this patch to see if it looks reasonable (inline and
attachment):

1. Picked comments/code from Michael's code and Rusty's review.
2. virtqueue_min_capacity() needs to be called only if it returned
   empty the last time it was called.
3. Fix return value bug in free_old_xmit_skbs (hangs guest).
4. Stop queue only if capacity is not enough for next xmit.
5. Fix/clean some likely/unlikely checks (hopefully).
6. I think xmit_skb cannot return error since
virtqueue_enable_cb_delayed() can return false only if
   3/4th space became available, which is what we check.
6. The comments for free_old_xmit_skbs needs to be more
clear (not done).

I have done some minimal netperf tests with this.

With this patch, add_buf returning capacity seems to be useful - it
allows using fewer virtio API calls.

(See attached file: patch)

Signed-off-by: Krishna Kumar 
---
 drivers/net/virtio_net.c |  105 ++---
 1 file changed, 64 insertions(+), 41 deletions(-)

diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
--- org/drivers/net/virtio_net.c2011-06-02 15:49:25.0 +0530
+++ new/drivers/net/virtio_net.c2011-06-02 19:13:02.0 +0530
@@ -509,27 +509,43 @@ again:
return received;
 }

-/* Check capacity and try to free enough pending old buffers to enable
queueing
- * new ones.  If min_skbs > 0, try to free at least the specified number
of skbs
- * even if the ring already has sufficient capacity.  Return true if we
can
- * guarantee that a following virtqueue_add_buf will succeed. */
-static bool free_old_xmit_skbs(struct virtnet_info *vi, int min_skbs)
+/* Return true if freed a skb, else false */
+static inline bool free_one_old_xmit_skb(struct virtnet_info *vi)
 {
struct sk_buff *skb;
unsigned int len;
-   bool r;

-   while ((r = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2) ||
-  min_skbs-- > 0) {
-   skb = virtqueue_get_buf(vi->svq, &len);
-   if (unlikely(!skb))
+   skb = virtqueue_get_buf(vi->svq, &len);
+   if (unlikely(!skb))
+   return 0;
+
+   pr_debug("Sent skb %p\n", skb);
+   vi->dev->stats.tx_bytes += skb->len;
+   vi->dev->stats.tx_packets++;
+   dev_kfree_skb_any(skb);
+   return 1;
+}
+
+static bool free_old_xmit_skbs(struct virtnet_info *vi, int to_free)
+{
+   bool empty = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2;
+
+   do {
+   if (!free_one_old_xmit_skb(vi)) {
+   /* No more skbs to free up */
break;
-   pr_debug("Sent skb %p\n", skb);
-   vi->dev->stats.tx_bytes += skb->len;
-   vi->dev->stats.tx_packets++;
-   dev_kfree_skb_any(skb);
-   }
-   return r;
+   }
+
+   if (empty) {
+   /* Check again if there is enough space */
+   empty = virtqueue_min_capacity(vi->svq) <
+   MAX_SKB_FRAGS + 2;
+   } else {
+   --to_free;
+   }
+   } while (to_free > 0);
+
+   return !empty;
 }

 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
@@ -582,46 +598,53 @@ static int xmit_skb(struct virtnet_info
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct virtnet_info *vi = netdev_priv(dev);
-   int ret, n;
+   int capacity;

-   /* Free up space in the ring in case this is the first time we get
-* woken up after ring full condition.  Note: this might try to free
-* more than strictly necessary if the skb has a small
-* number of fragments, but keep it simple. */
-   free_old_xmit_skbs(vi, 0);
+   /* Try to free 2 buffers for every 1 xmit, to stay ahead. */
+   free_old_xmit_skbs(vi, 2);

/* Try to transmit */
-   ret = xmit_skb(vi, skb);
+   capacity = xmit_skb(vi, skb);

-   /* Failure to queue is unlikely. It's not a bug though: it might happen
-* if we get an interrupt while the queue is still mostly full.
-* We could stop the queue and re-enable callbacks (and possibly
return
-* TX_BUSY), but as this should be rare, we don't bother. */
-   if (unlikely(ret < 0)) {
+   if (unlikely(capacity < 0)) {
+   /*
+* Failure to queue should be impossible. The only way to
+* reach here is if we got a cb before 3/4th of space was
+* available. We could stop the queue and re-enable
+* callbacks (and possibly return TX_BUSY), but we don't
+* bother s

Re: [PATCH] kvm tools, i8042: Fix device init failure

2011-06-02 Thread Pekka Enberg
On Thu, Jun 2, 2011 at 4:40 PM, Ingo Molnar  wrote:
>> +     case I8042_COMMAND_REG: {
>> +             u8 value = kbd_read_status();
>> +             ioport__write8(data, value);
>> +             break;
>> +     }
>> +     case I8042_DATA_REG: {
>> +             u32 value = kbd_read_data();
>> +             ioport__write32(data, value);
>> +             break;
>>       }
>> +     default:
>> +             break;
>
> should we BUG_ON() [or WARN_ON()] in that 'default' case?
>
>> +     default:
>> +             break;
>
> ditto. This could have caught the bug straight away, right?

I changed them to return false. That WARN_ON should be in common code, I think.

Pekka
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools, i8042: Fix device init failure

2011-06-02 Thread Sasha Levin
On Thu, 2011-06-02 at 16:38 +0300, Pekka Enberg wrote:
> This patch fixes the following device init failure:
> 
>   [0.945942] usbcore: registered new interface driver sisusb
>   [0.947349] i8042: PNP: No PS/2 controller found. Probing ports directly.
>   [0.949033] i8042: [0] 20 -> i8042 (command)
>   [0.950370] i8042: [57]  -- i8042 (timeout)
>   [1.521143] i8042: Can't read CTR while initializing i8042
>   [1.522495] i8042: probe of i8042 failed with error -5
> 
> The kbd_out() function was taking 32 bits instead of 8 bits for 'outb'. This
> caused kbd_write_command() to receive bogus 'val' which meant that
> I8042_CMD_CTL_RCTR case in the switch statement was never executed.
> 
> Cc: Ingo Molnar 
> Cc: Sasha Levin 
> Signed-off-by: Pekka Enberg 
> ---
>  tools/kvm/hw/i8042.c |   48 +---
>  1 files changed, 29 insertions(+), 19 deletions(-)
> 
> diff --git a/tools/kvm/hw/i8042.c b/tools/kvm/hw/i8042.c
> index 934ef4e..42ed4ba 100644
> --- a/tools/kvm/hw/i8042.c
> +++ b/tools/kvm/hw/i8042.c
> @@ -140,10 +140,7 @@ static void kbd_queue(u8 c)
>   kbd_update_irq();
>  }
>  
> -/*
> - * This function is called when the OS issues a write to port 0x64
> - */
> -static void kbd_write_command(u32 val)
> +static void kbd_write_command(u8 val)
>  {
>   switch (val) {
>   case I8042_CMD_CTL_RCTR:
> @@ -451,27 +448,40 @@ void kbd_handle_ptr(int buttonMask, int x, int y, 
> rfbClientPtr cl)
>   */
>  static bool kbd_in(struct ioport *ioport, struct kvm *kvm, u16 port, void 
> *data, int size, u32 count)
>  {
> - u32 result;
> -
> - if (port == I8042_COMMAND_REG) {
> - result = kbd_read_status();
> - ioport__write8(data, (char)result);
> - } else {
> - result = kbd_read_data();
> - ioport__write32(data, result);
> + switch (port) {
> + case I8042_COMMAND_REG: {
> + u8 value = kbd_read_status();
> + ioport__write8(data, value);
> + break;
> + }
> + case I8042_DATA_REG: {
> + u32 value = kbd_read_data();
> + ioport__write32(data, value);
> + break;
>   }
> + default:
> + break;
> + }
> +
>   return true;
>  }
>  
> -/*
> - * Called when the OS attempts to read from a keyboard port (0x60 or 0x64)
> - */
>  static bool kbd_out(struct ioport *ioport, struct kvm *kvm, u16 port, void 
> *data, int size, u32 count)
>  {
> - if (port == I8042_COMMAND_REG)
> - kbd_write_command(*((u32 *)data));
> - else
> - kbd_write_data(*((u32 *)data));
> + switch (port) {
> + case I8042_COMMAND_REG: {
> + u8 value = ioport__read8(data);
> + kbd_write_command(value);
> + break;
> + }
> + case I8042_DATA_REG: {
> + u32 value = ioport__read32(data);
> + kbd_write_data(value);
> + break;
> + }
> + default:
> + break;
> + }
>  
>   return true;
>  }

Tested it here, works great.
Looks like in my cause, instead of not detecting the keyboard it used to
not detect the mouse sometimes - that's also fixed now.

-- 

Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: Document KVM_IOEVENTFD

2011-06-02 Thread Avi Kivity

On 06/01/2011 01:51 PM, Jan Kiszka wrote:

On 2011-05-31 15:44, Marcelo Tosatti wrote:
>  On Sat, May 28, 2011 at 02:12:30PM +0300, Sasha Levin wrote:
>>  Document KVM_IOEVENTFD that can be used to receive
>>  notifications of PIO/MMIO events without triggering
>>  an exit.
>>
>>  Cc: Avi Kivity
>>  Cc: Marcelo Tosatti
>>  Signed-off-by: Sasha Levin
>>  ---
>>   Documentation/virtual/kvm/api.txt |   30 ++
>>   1 files changed, 30 insertions(+), 0 deletions(-)
>
>  Applied (with wording fix), thanks.

Requires section number fix-up (4.56 ->  4.58).


The usual fix is to cut a 5.0 release.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM host freezing

2011-06-02 Thread Avi Kivity

On 06/02/2011 11:25 AM, Marc Haber wrote:

Hi,

I have just started deploying a host doing virtualization with KVM.
The box has an Athlon 64 X2, 4 GB RAM and is running Debian squeeze
with a locally built 2.6,39 kernel and backported versions of qemu-kvm
(0.14.0) and libvirt (0.9.0) from Debian sid. The box is currently
hosting five VMs, all of them Debian systems as well and rather
unloaded. The only time when there is significant load is when all VMs
are simultaneously starting up their cron jobs.

When the host starts up, it immediately spews the following lines to
the console:

kvm: 2865: cpu0 unhandled rdmsr: 0xc0010048
kvm: 2865: cpu0 unhandled wrmsr: 0xc0010048 data 210401
kvm: 2865: cpu0 unhandled rdmsr: 0xc0010001
kvm: 2849: cpu0 unhandled rdmsr: 0xc0010048
kvm: 2849: cpu0 unhandled wrmsr: 0xc0010048 data c0579f7cc0010448
kvm: 2849: cpu0 unhandled rdmsr: 0xc0010001
kvm: 2950: cpu0 unhandled rdmsr: 0xc0010048
kvm: 2950: cpu0 unhandled wrmsr: 0xc0010048 data c0579f7cc0010448
kvm: 2849: cpu1 unhandled rdmsr: 0xc0010048
kvm: 2963: cpu0 unhandled rdmsr: 0xc0010112
kvm: 2963: cpu0 unhandled rdmsr: 0xc0010048
kvm: 2963: cpu0 unhandled wrmsr: 0xc0010048 data 210401
kvm: 2963: cpu0 unhandled rdmsr: 0xc0010001
kvm: 2963: cpu1 unhandled rdmsr: 0xc0010048
kvm: 2963: cpu1 unhandled wrmsr: 0xc0010048 data 210401

Every few days, the system stops dead in its tracks and needs a hard
reset to be revived. I have a serial console, which unfortunately
disconnects me after a few minutes of inactivity, and only caches the
last few lines of activity. Whenever I connect to the serial console
of the frozen system, I have a few lines of the same "unhandled
(rd|wr)msr" messages.

The syslog doesn't show anything strange. The system just stops dead
in its tracks.

Is there any possibility that the freezes have to do with the
"unhandles (rd|wr)msr" messages?


Very unlikely.


When else could be the cause?

In the mean time, I have taken the box offline and am running memtest.
Up to now, everything seems to be fine.

Any hints will be appreciated.


You might try setting up netconsole to get reliable logging.

Do you have NMIs?  'grep NMI /proc/interrupts'.

Does running 'perf top -F 1' make the hang come sooner?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools, i8042: Fix device init failure

2011-06-02 Thread Ingo Molnar

* Pekka Enberg  wrote:

>  static bool kbd_in(struct ioport *ioport, struct kvm *kvm, u16 port, void 
> *data, int size, u32 count)
>  {
> - u32 result;
> -
> - if (port == I8042_COMMAND_REG) {
> - result = kbd_read_status();
> - ioport__write8(data, (char)result);
> - } else {
> - result = kbd_read_data();
> - ioport__write32(data, result);
> + switch (port) {
> + case I8042_COMMAND_REG: {
> + u8 value = kbd_read_status();
> + ioport__write8(data, value);
> + break;
> + }
> + case I8042_DATA_REG: {
> + u32 value = kbd_read_data();
> + ioport__write32(data, value);
> + break;
>   }
> + default:
> + break;

should we BUG_ON() [or WARN_ON()] in that 'default' case?

> + default:
> + break;

ditto. This could have caught the bug straight away, right?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm tools, i8042: Fix device init failure

2011-06-02 Thread Pekka Enberg
This patch fixes the following device init failure:

  [0.945942] usbcore: registered new interface driver sisusb
  [0.947349] i8042: PNP: No PS/2 controller found. Probing ports directly.
  [0.949033] i8042: [0] 20 -> i8042 (command)
  [0.950370] i8042: [57]  -- i8042 (timeout)
  [1.521143] i8042: Can't read CTR while initializing i8042
  [1.522495] i8042: probe of i8042 failed with error -5

The kbd_out() function was taking 32 bits instead of 8 bits for 'outb'. This
caused kbd_write_command() to receive bogus 'val' which meant that
I8042_CMD_CTL_RCTR case in the switch statement was never executed.

Cc: Ingo Molnar 
Cc: Sasha Levin 
Signed-off-by: Pekka Enberg 
---
 tools/kvm/hw/i8042.c |   48 +---
 1 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/tools/kvm/hw/i8042.c b/tools/kvm/hw/i8042.c
index 934ef4e..42ed4ba 100644
--- a/tools/kvm/hw/i8042.c
+++ b/tools/kvm/hw/i8042.c
@@ -140,10 +140,7 @@ static void kbd_queue(u8 c)
kbd_update_irq();
 }
 
-/*
- * This function is called when the OS issues a write to port 0x64
- */
-static void kbd_write_command(u32 val)
+static void kbd_write_command(u8 val)
 {
switch (val) {
case I8042_CMD_CTL_RCTR:
@@ -451,27 +448,40 @@ void kbd_handle_ptr(int buttonMask, int x, int y, 
rfbClientPtr cl)
  */
 static bool kbd_in(struct ioport *ioport, struct kvm *kvm, u16 port, void 
*data, int size, u32 count)
 {
-   u32 result;
-
-   if (port == I8042_COMMAND_REG) {
-   result = kbd_read_status();
-   ioport__write8(data, (char)result);
-   } else {
-   result = kbd_read_data();
-   ioport__write32(data, result);
+   switch (port) {
+   case I8042_COMMAND_REG: {
+   u8 value = kbd_read_status();
+   ioport__write8(data, value);
+   break;
+   }
+   case I8042_DATA_REG: {
+   u32 value = kbd_read_data();
+   ioport__write32(data, value);
+   break;
}
+   default:
+   break;
+   }
+
return true;
 }
 
-/*
- * Called when the OS attempts to read from a keyboard port (0x60 or 0x64)
- */
 static bool kbd_out(struct ioport *ioport, struct kvm *kvm, u16 port, void 
*data, int size, u32 count)
 {
-   if (port == I8042_COMMAND_REG)
-   kbd_write_command(*((u32 *)data));
-   else
-   kbd_write_data(*((u32 *)data));
+   switch (port) {
+   case I8042_COMMAND_REG: {
+   u8 value = ioport__read8(data);
+   kbd_write_command(value);
+   break;
+   }
+   case I8042_DATA_REG: {
+   u32 value = ioport__read32(data);
+   kbd_write_data(value);
+   break;
+   }
+   default:
+   break;
+   }
 
return true;
 }
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 3/3] virtio_net: limit xmit polling

2011-06-02 Thread Michael S. Tsirkin
On Thu, Jun 02, 2011 at 01:24:57PM +0930, Rusty Russell wrote:
> On Wed, 1 Jun 2011 12:50:03 +0300, "Michael S. Tsirkin"  
> wrote:
> > Current code might introduce a lot of latency variation
> > if there are many pending bufs at the time we
> > attempt to transmit a new one. This is bad for
> > real-time applications and can't be good for TCP either.
> > 
> > Free up just enough to both clean up all buffers
> > eventually and to be able to xmit the next packet.
> 
> OK, I found this quite confusing to read.
> 
> > -   while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
> > +   while ((r = virtqueue_min_capacity(vi->svq) < MAX_SKB_FRAGS + 2) ||
> > +  min_skbs-- > 0) {
> > +   skb = virtqueue_get_buf(vi->svq, &len);
> > +   if (unlikely(!skb))
> > +   break;
> > pr_debug("Sent skb %p\n", skb);
> > vi->dev->stats.tx_bytes += skb->len;
> > vi->dev->stats.tx_packets++;
> > dev_kfree_skb_any(skb);
> > }
> > +   return r;
> >  }
> 
> Gah... what a horrible loop.
> 
> Basically, this patch makes hard-to-read code worse, and we should try
> to make it better.
> 
> Currently, xmit *can* fail when an xmit interrupt wakes the queue, but
> the packet(s) xmitted didn't free up enough space for the new packet.
> With indirect buffers this only happens if we hit OOM (and thus go to
> direct buffers).
> 
> We could solve this by only waking the queue in skb_xmit_done if the
> capacity is >= 2 + MAX_SKB_FRAGS.  But can we do it without a race?

I don't think so.

> If not, then I'd really prefer to see this, because I think it's clearer:
> 
> // Try to free 2 buffers for every 1 xmit, to stay ahead.
> free_old_buffers(2)
> 
> if (!add_buf()) {
> // Screw latency, free them all.
> free_old_buffers(UINT_MAX)
> // OK, this can happen if we are using direct buffers,
> // and the xmit interrupt woke us but the packets
> // xmitted were smaller than this one.  Rare though.
> if (!add_buf())
> Whinge and stop queue, maybe loop.
> }
> 
> if (capacity < 2 + MAX_SKB_FRAGS) {
> // We don't have enough for the next packet?  Try
> // freeing more.
> free_old_buffers(UINT_MAX);
> if (capacity < 2 + MAX_SKB_FRAGS) {
> Stop queue, maybe loop.
> }
> 
> The current code makes my head hurt :(
> 
> Thoughts?
> Rusty.

OK, I have something very similar, but I still dislike the screw the
latency part: this path is exactly what the IBM guys seem to hit.  So I
created two functions: one tries to free a constant number and another
one up to capacity. I'll post that now.


-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 1/3] virtio_ring: add capacity check API

2011-06-02 Thread Michael S. Tsirkin
On Thu, Jun 02, 2011 at 11:41:50AM +0930, Rusty Russell wrote:
> On Wed, 1 Jun 2011 12:49:46 +0300, "Michael S. Tsirkin"  
> wrote:
> > Add API to check ring capacity. Because of the option
> > to use indirect buffers, this returns the worst
> > case, not the normal case capacity.
> 
> Can we drop the silly "add_buf() returns capacity" hack then?
> 
> Thanks,
> Rusty.

Sure.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/3] virtio_net: fix tx capacity checks using new API

2011-06-02 Thread Michael S. Tsirkin
On Thu, Jun 02, 2011 at 11:40:26AM +0930, Rusty Russell wrote:
> On Wed, 1 Jun 2011 12:49:54 +0300, "Michael S. Tsirkin"  
> wrote:
> > In the (rare) case where new descriptors are used
> > while virtio_net enables vq callback for the TX vq,
> > virtio_net uses the number of sg entries in the skb it frees to
> > calculate how many descriptors in the ring have just been made
> > available.  But this value is an overestimate: with indirect buffers
> > each skb only uses one descriptor entry, meaning we may wake the queue
> > only to find we still can't transmit anything.
> 
> This is a bit misleading.
> 
> The value is an overestimate, but so is the requirement for
> 2+MAX_SKB_FRAGS, *unless* we suddenly drop into direct mode due to OOM.
> 
> Thanks,
> Rusty.

I agree, it's unlikely.

s/still can't transmit anything/are still out of space and need to stop
the ring almost at once/

Better?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] kvm tools, i8042: Use kernel command names

2011-06-02 Thread Ingo Molnar

oh, the keyboard driver did not work for me due to:

if (vnc) {
kbd__init(kvm);
vesa__init(kvm);
}

I don't have VNC installed - i just wanted to see whether the 
keyboard gets detected.

With kbd__init() called unconditionally it works fine here:

[1.658000] serio: i8042 KBD port at 0x60,0x64 irq 1
[1.659000] mousedev: PS/2 mouse device common for all mice

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] kvm tools, i8042: Use kernel command names

2011-06-02 Thread Pekka Enberg
On Thu, Jun 2, 2011 at 2:19 PM, Ingo Molnar  wrote:
>
> ok, with the shared commands it place here's some
> feature-completeness comparison of the kernel driver and the
> emulator:
>
>                                guest driver i8042.h   |   hw/i8042.c emu
> /*
>  * Status register bits
>  */
> #define I8042_STR_PARITY        0x80                   |  missing
> #define I8042_STR_TIMEOUT       0x40                   |  missing
> #define I8042_STR_AUXDATA       0x20                   |  ok
> #define I8042_STR_KEYLOCK       0x10                   |  ok
> #define I8042_STR_CMDDAT        0x08                   |  ok
> #define I8042_STR_MUXERR        0x04                   |  ok
> #define I8042_STR_IBF           0x02                   |  missing
> #define I8042_STR_OBF           0x01                   |  ok
>
> The missing bits might be uninteresting, but wanted to map them out.
>
> There's more divergence for commands:
>
>                                guest driver i8042.h   |   hw/i8042.c emu
>
> /*
>  * Standard commands.
>  */
>
> #define I8042_CMD_CTL_RCTR      0x0120                 != 0x20
> #define I8042_CMD_CTL_WCTR      0x1060                 != 0x60
> #define I8042_CMD_CTL_TEST      0x01aa                 | missing
>
> #define I8042_CMD_KBD_DISABLE   0x00ad                 | missing
> #define I8042_CMD_KBD_ENABLE    0x00ae                 | missing
> #define I8042_CMD_KBD_TEST      0x01ab                 | missing
> #define I8042_CMD_KBD_LOOP      0x11d2                 | missing
>
> #define I8042_CMD_AUX_DISABLE   0x00a7                 | ok
> #define I8042_CMD_AUX_ENABLE    0x00a8                 | ok
> #define I8042_CMD_AUX_TEST      0x01a9                 | != 0xa9
> #define I8042_CMD_AUX_SEND      0x10d4                 | != 0xd4
> #define I8042_CMD_AUX_LOOP      0x11d3                 | != 0xd3
>
> #define I8042_CMD_MUX_PFX       0x0090                 | missing
> #define I8042_CMD_MUX_SEND      0x1090                 | missing
>
> Now the missing high word might just be some command symbol encoding
> difference, but it might be something more substantial as well.

The high byte is the data byte that's written immediately after the
command byte.

Pekka
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] kvm tools, i8042: Use kernel command names

2011-06-02 Thread Ingo Molnar

ok, with the shared commands it place here's some 
feature-completeness comparison of the kernel driver and the 
emulator:

guest driver i8042.h   |   hw/i8042.c emu
/*
 * Status register bits
 */
#define I8042_STR_PARITY0x80   |  missing
#define I8042_STR_TIMEOUT   0x40   |  missing
#define I8042_STR_AUXDATA   0x20   |  ok
#define I8042_STR_KEYLOCK   0x10   |  ok
#define I8042_STR_CMDDAT0x08   |  ok
#define I8042_STR_MUXERR0x04   |  ok
#define I8042_STR_IBF   0x02   |  missing
#define I8042_STR_OBF   0x01   |  ok

The missing bits might be uninteresting, but wanted to map them out.

There's more divergence for commands:

guest driver i8042.h   |   hw/i8042.c emu

/*
 * Standard commands.
 */

#define I8042_CMD_CTL_RCTR  0x0120 != 0x20
#define I8042_CMD_CTL_WCTR  0x1060 != 0x60
#define I8042_CMD_CTL_TEST  0x01aa | missing

#define I8042_CMD_KBD_DISABLE   0x00ad | missing
#define I8042_CMD_KBD_ENABLE0x00ae | missing
#define I8042_CMD_KBD_TEST  0x01ab | missing
#define I8042_CMD_KBD_LOOP  0x11d2 | missing

#define I8042_CMD_AUX_DISABLE   0x00a7 | ok
#define I8042_CMD_AUX_ENABLE0x00a8 | ok
#define I8042_CMD_AUX_TEST  0x01a9 | != 0xa9
#define I8042_CMD_AUX_SEND  0x10d4 | != 0xd4
#define I8042_CMD_AUX_LOOP  0x11d3 | != 0xd3

#define I8042_CMD_MUX_PFX   0x0090 | missing
#define I8042_CMD_MUX_SEND  0x1090 | missing


Now the missing high word might just be some command symbol encoding 
difference, but it might be something more substantial as well.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] kvm tools, i8042: Use kernel command names

2011-06-02 Thread Ingo Molnar

nice!

Acked-by: Ingo Molnar 

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] kvm tools, i8042: Use kernel command names

2011-06-02 Thread Pekka Enberg
This patch renames the command constants in hw/i8042.c to use similar names as
in . Note: we cannot use  constants directly
because they include the command and data.

Cc: Ingo Molnar 
Cc: Sasha Levin 
Signed-off-by: Pekka Enberg 
---
 tools/kvm/hw/i8042.c |   37 -
 1 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/tools/kvm/hw/i8042.c b/tools/kvm/hw/i8042.c
index 48a3c9d..934ef4e 100644
--- a/tools/kvm/hw/i8042.c
+++ b/tools/kvm/hw/i8042.c
@@ -22,13 +22,16 @@
 #define I8042_DATA_REG 0x60
 #define I8042_COMMAND_REG  0x64
 
-#define CMD_READ_MODE  0x20
-#define CMD_WRITE_MODE 0x60
-#define CMD_WRITE_AUX_BUF  0xD3
-#define CMD_WRITE_AUX  0xD4
-#define CMD_TEST_AUX   0xA9
-#define CMD_DISABLE_AUX0xA7
-#define CMD_ENABLE_AUX 0xA8
+/*
+ * Commands
+ */
+#define I8042_CMD_CTL_RCTR 0x20
+#define I8042_CMD_CTL_WCTR 0x60
+#define I8042_CMD_AUX_LOOP 0xD3
+#define I8042_CMD_AUX_SEND 0xD4
+#define I8042_CMD_AUX_TEST 0xA9
+#define I8042_CMD_AUX_DISABLE  0xA7
+#define I8042_CMD_AUX_ENABLE   0xA8
 
 #define RESPONSE_ACK   0xFA
 
@@ -143,22 +146,22 @@ static void kbd_queue(u8 c)
 static void kbd_write_command(u32 val)
 {
switch (val) {
-   case CMD_READ_MODE:
+   case I8042_CMD_CTL_RCTR:
kbd_queue(state.mode);
break;
-   case CMD_WRITE_MODE:
-   case CMD_WRITE_AUX:
-   case CMD_WRITE_AUX_BUF:
+   case I8042_CMD_CTL_WCTR:
+   case I8042_CMD_AUX_SEND:
+   case I8042_CMD_AUX_LOOP:
state.write_cmd = val;
break;
-   case CMD_TEST_AUX:
+   case I8042_CMD_AUX_TEST:
/* 0 means we're a normal PS/2 mouse */
mouse_queue(0);
break;
-   case CMD_DISABLE_AUX:
+   case I8042_CMD_AUX_DISABLE:
state.mode |= MODE_DISABLE_AUX;
break;
-   case CMD_ENABLE_AUX:
+   case I8042_CMD_AUX_ENABLE:
state.mode &= ~MODE_DISABLE_AUX;
break;
default:
@@ -211,15 +214,15 @@ static u32 kbd_read_status(void)
 static void kbd_write_data(u32 val)
 {
switch (state.write_cmd) {
-   case CMD_WRITE_MODE:
+   case I8042_CMD_CTL_WCTR:
state.mode = val;
kbd_update_irq();
break;
-   case CMD_WRITE_AUX_BUF:
+   case I8042_CMD_AUX_LOOP:
mouse_queue(val);
mouse_queue(RESPONSE_ACK);
break;
-   case CMD_WRITE_AUX:
+   case I8042_CMD_AUX_SEND:
/* The OS wants to send a command to the mouse */
mouse_queue(RESPONSE_ACK);
switch (val) {
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] kmv tools, i8042: Use kernel status register bit names

2011-06-02 Thread Pekka Enberg
This patch renames the status register bit constant names to be identical with
drivers/input/serio/i8042.h. This makes code review easier.

Cc: Ingo Molnar 
Cc: Sasha Levin 
Signed-off-by: Pekka Enberg 
---
 tools/kvm/hw/i8042.c |   18 +-
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/kvm/hw/i8042.c b/tools/kvm/hw/i8042.c
index d3f5d9a..48a3c9d 100644
--- a/tools/kvm/hw/i8042.c
+++ b/tools/kvm/hw/i8042.c
@@ -42,11 +42,11 @@
 /*
  * Status register bits
  */
-#define KBD_STATUS_AUX_OBF 0x20
-#define KBD_STATUS_INH 0x10
-#define KBD_STATUS_A2  0x08
-#define KBD_STATUS_SYS 0x04
-#define KBD_STATUS_OBF 0x01
+#define I8042_STR_AUXDATA  0x20
+#define I8042_STR_KEYLOCK  0x10
+#define I8042_STR_CMDDAT   0x08
+#define I8042_STR_MUXERR   0x04
+#define I8042_STR_OBF  0x01
 
 #define KBD_MODE_KBD_INT   0x01
 #define KBD_MODE_SYS   0x02
@@ -92,16 +92,16 @@ static void kbd_update_irq(void)
u8 mlevel = 0;
 
/* First, clear the kbd and aux output buffer full bits */
-   state.status &= ~(KBD_STATUS_OBF | KBD_STATUS_AUX_OBF);
+   state.status &= ~(I8042_STR_OBF | I8042_STR_AUXDATA);
 
if (state.kcount > 0) {
-   state.status |= KBD_STATUS_OBF;
+   state.status |= I8042_STR_OBF;
klevel = 1;
}
 
/* Keyboard has higher priority than mouse */
if (klevel == 0 && state.mcount != 0) {
-   state.status |= KBD_STATUS_OBF | KBD_STATUS_AUX_OBF;
+   state.status |= I8042_STR_OBF | I8042_STR_AUXDATA;
mlevel = 1;
}
 
@@ -281,7 +281,7 @@ static void kbd_write_data(u32 val)
 static void kbd_reset(void)
 {
state = (struct kbd_state) {
-   .status = KBD_STATUS_SYS | KBD_STATUS_A2 | 
KBD_STATUS_INH, /* 0x1c */
+   .status = I8042_STR_MUXERR | I8042_STR_CMDDAT | 
I8042_STR_KEYLOCK, /* 0x1c */
.mode   = KBD_MODE_KBD_INT | KBD_MODE_SYS, /* 0x3 */
.mres   = AUX_DEFAULT_RESOLUTION,
.msample= AUX_DEFAULT_SAMPLE,
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] kvm tools, i8042: Sort status register bits

2011-06-02 Thread Pekka Enberg
This patch sorts the status register bits in preparation for switching over to
kernel constant names.

Cc: Ingo Molnar 
Cc: Sasha Levin 
Signed-off-by: Pekka Enberg 
---
 tools/kvm/hw/i8042.c |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/hw/i8042.c b/tools/kvm/hw/i8042.c
index de0e73b..d3f5d9a 100644
--- a/tools/kvm/hw/i8042.c
+++ b/tools/kvm/hw/i8042.c
@@ -39,11 +39,14 @@
 #define AUX_DEFAULT_RESOLUTION 0x2
 #define AUX_DEFAULT_SAMPLE 100
 
-#define KBD_STATUS_SYS 0x4
-#define KBD_STATUS_A2  0x8
+/*
+ * Status register bits
+ */
+#define KBD_STATUS_AUX_OBF 0x20
 #define KBD_STATUS_INH 0x10
+#define KBD_STATUS_A2  0x08
+#define KBD_STATUS_SYS 0x04
 #define KBD_STATUS_OBF 0x01
-#define KBD_STATUS_AUX_OBF 0x20
 
 #define KBD_MODE_KBD_INT   0x01
 #define KBD_MODE_SYS   0x02
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-06-02 Thread Michael S. Tsirkin
On Wed, Jun 01, 2011 at 03:24:29AM -0400, Mark Wu wrote:
> Current index allocation in virtio-blk is based on a monotonically
> increasing variable "index". It could cause some confusion about disk
> name in the case of hot-plugging disks. And it's impossible to find the
> lowest available index by just maintaining a simple index. So it's
> changed to use ida to allocate index via referring to the index
> allocation in scsi disk.
> 
> Signed-off-by: Mark Wu 
> ---
>  drivers/block/virtio_blk.c |   37 -
>  1 files changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 079c088..ba734b3 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -8,10 +8,14 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define PART_BITS 4
>  
> -static int major, index;
> +static int major;
> +static DEFINE_SPINLOCK(vd_index_lock);
> +static DEFINE_IDA(vd_index_ida);
> +
>  struct workqueue_struct *virtblk_wq;
>  
>  struct virtio_blk
> @@ -23,6 +27,7 @@ struct virtio_blk
>  
>   /* The disk structure for the kernel. */
>   struct gendisk *disk;
> + u32 index;
>  
>   /* Request tracking. */
>   struct list_head reqs;
> @@ -343,12 +348,26 @@ static int __devinit virtblk_probe(struct virtio_device 
> *vdev)
>   struct request_queue *q;
>   int err;
>   u64 cap;
> - u32 v, blk_size, sg_elems, opt_io_size;
> + u32 v, blk_size, sg_elems, opt_io_size, index;
>   u16 min_io_size;
>   u8 physical_block_exp, alignment_offset;
>  
> - if (index_to_minor(index) >= 1 << MINORBITS)
> - return -ENOSPC;
> + do {
> + if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
> + return err;
> +
> + spin_lock(&vd_index_lock);
> + err = ida_get_new(&vd_index_ida, &index);
> + spin_unlock(&vd_index_lock);
> + } while (err == -EAGAIN);
> +
> + if (err)
> + return err;
> +
> + if (index_to_minor(index) >= 1 << MINORBITS) {
> + err =  -ENOSPC;
> + goto out_free_index;
> + }
>  
>   /* We need to know how many segments before we allocate. */
>   err = virtio_config_val(vdev, VIRTIO_BLK_F_SEG_MAX,
> @@ -421,7 +440,7 @@ static int __devinit virtblk_probe(struct virtio_device 
> *vdev)
>   vblk->disk->private_data = vblk;
>   vblk->disk->fops = &virtblk_fops;
>   vblk->disk->driverfs_dev = &vdev->dev;
> - index++;
> + vblk->index = index;
>  
>   /* configure queue flush support */
>   if (virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
> @@ -516,6 +535,10 @@ out_free_vq:
>   vdev->config->del_vqs(vdev);
>  out_free_vblk:
>   kfree(vblk);
> +out_free_index:
> + spin_lock(&vd_index_lock);
> + ida_remove(&vd_index_ida, index);
> + spin_unlock(&vd_index_lock);
>  out:
>   return err;
>  }
> @@ -529,6 +552,10 @@ static void __devexit virtblk_remove(struct 
> virtio_device *vdev)
>   /* Nothing should be pending. */
>   BUG_ON(!list_empty(&vblk->reqs));
>  
> + spin_lock(&vd_index_lock);
> + ida_remove(&vd_index_ida, vblk->index);
> + spin_unlock(&vd_index_lock);
> +
>   /* Stop all the virtqueues. */
>   vdev->config->reset(vdev);

As we get index first thing in _probe, let's remove last thing
in _remove.

I'm not sure violating the rule of cleanup
in the reverse order of initialization can lead
to problems here, but it's better to stick to this rule regardless,
IMO.

> -- 
> 1.7.1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-06-02 Thread Michael S. Tsirkin
On Wed, Jun 01, 2011 at 04:25:48AM -0400, Mark Wu wrote:
> On 06/01/2011 03:24 AM, Mark Wu wrote:
> >-if (index_to_minor(index)>= 1<<  MINORBITS)
> >-return -ENOSPC;
> >+do {
> >+if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
> >+return err;
> >+
> There's a problem in above code: err is not initialized before
> using, so change it to return -1;
> +   do {
> +   if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
> +   return -1;

Not -1. Pls return -ENOMEM.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH 01/13] Generic DMA memory access interface

2011-06-02 Thread David Gibson
On Wed, Jun 01, 2011 at 08:16:44AM -0700, Richard Henderson wrote:
> On 06/01/2011 07:29 AM, Avi Kivity wrote:
> > On 06/01/2011 05:01 PM, Richard Henderson wrote:
> >> >  +err = dev->mmu->translate(dev, addr,&paddr,&plen, is_write);
> >>
> >> I see you didn't take my suggestion for using an opaque callback pointer.
> >> Really and truly, I won't be able to use this as-is for Alpha.
> >>
> > 
> > Rather than opaques, please pass the DMA engine itself and use 
> > container_of().
> 
> The dma engine object is currently sitting in the PCIBus structure.
> Which is private, and can't be extended by a host bridge implementation.
> 
> The entire code could be re-arranged, true, but please suggest something
> reasonable.
> 
> > We should be removing opaques, not adding them.
> 
> See my followup elsewhere.  Opaques *can* be cleaner than upcasting,
> particularly if there are too many hoops through which to jump.

So, in the meantime, I've also done a version of Eduard's earlier
patches, with added support for the PAPR hypervisor managed IOMMU.

I have also significantly reworked how the structure lookup works,
partly because in my case I'l looking at IOMMU translation for non-PCI
devices, but I think it may also address your concerns.  I'm still
using upcasts, but there are less steps from the device to the IOMMU
state.

I've been sick and haven't had a chance to merge my stuff with
Eduard's changes.  I'll post them anyway, as another discussion
point.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add support for PS/2 keyboard system

2011-06-02 Thread Ingo Molnar

btw., i tried to match up the kernel's i8042.c with kvm's and it was 
pretty hard due to arbitrary differences. The kernel knows about:

--->
/*
 * Status register bits.
 */

#define I8042_STR_PARITY0x80
#define I8042_STR_TIMEOUT   0x40
#define I8042_STR_AUXDATA   0x20
#define I8042_STR_KEYLOCK   0x10
#define I8042_STR_CMDDAT0x08
#define I8042_STR_MUXERR0x04
#define I8042_STR_IBF   0x02
#define I8042_STR_OBF   0x01

/*
 * Control register bits.
 */

#define I8042_CTR_KBDINT0x01
#define I8042_CTR_AUXINT0x02
#define I8042_CTR_IGNKEYLOCK0x08
#define I8042_CTR_KBDDIS0x10
#define I8042_CTR_AUXDIS0x20
#define I8042_CTR_XLATE 0x40

/*
 * Return codes.
 */

#define I8042_RET_CTL_TEST  0x55
<---


While hw/i8042.c knows about similar things, just under different 
names:

--->
#define CMD_READ_MODE   0x20
#define CMD_WRITE_MODE  0x60
#define CMD_WRITE_AUX_BUF   0xD3
#define CMD_WRITE_AUX   0xD4
#define CMD_TEST_AUX0xA9
#define CMD_DISABLE_AUX 0xA7
#define CMD_ENABLE_AUX  0xA8

#define RESPONSE_ACK0xFA

#define MODE_DISABLE_AUX0x20

#define AUX_ENABLE_REPORTING0x20
#define AUX_SCALING_FLAG0x10
#define AUX_DEFAULT_RESOLUTION  0x2
#define AUX_DEFAULT_SAMPLE  100

#define KBD_STATUS_SYS  0x4
#define KBD_STATUS_A2   0x8
#define KBD_STATUS_INH  0x10
#define KBD_STATUS_OBF  0x01
#define KBD_STATUS_AUX_OBF  0x20

#define KBD_MODE_KBD_INT0x01
#define KBD_MODE_SYS0x02
<---

Would it be possible for hw/i8042.c to use the kernel names, to make 
it easier to review and debug this code?

(also, 0x2 should be 0x02 to make it easier to review as well.)

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH 01/13] Generic DMA memory access interface

2011-06-02 Thread David Gibson
On Wed, Jun 01, 2011 at 08:45:56AM -0700, Richard Henderson wrote:
> On 06/01/2011 08:35 AM, Eduard - Gabriel Munteanu wrote:
> > Maybe it's not nice, but you're missing the fact upcasting gives you
> > some type safety. With opaques you have none.
> 
> Lol.  Do you understand what container_of does?
> This is not dynamic_cast<> with RTTI.
> 
> You can put any type name in there that you like,
> so long as it has a field name to match.  The type
> of the field you give doesn't even have to match
> the type of the pointer that you pass in.

Uh, if that's true, that's a bug in the container_of implementation.
The ccan container_of implementation, for example, certainly does
check that the given field has type matching the pointer.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add support for PS/2 keyboard system

2011-06-02 Thread Ingo Molnar

* Pekka Enberg  wrote:

> On Thu, 2011-06-02 at 11:18 +0200, Ingo Molnar wrote:
> > * Pekka Enberg  wrote:
> > 
> > > On Thu, 2 Jun 2011, Sasha Levin wrote:
> > > >Strange, those are the same errors you're supposed to get when theres no
> > > >i8042 device at all.
> > > >
> > > >Could you verify that hw/pckbd.c is being built? Did the makefile get
> > > >updated?
> > > >
> > > >What I'm seeing here is:
> > > >
> > > >[   49.326316] i8042: PNP: No PS/2 controller found. Probing ports
> > > >directly.
> > > >[   50.410492] serio: i8042 KBD port at 0x60,0x64 irq 1
> > > >[   51.518829] serio: i8042 AUX port at 0x60,0x64 irq 12
> > > >[   51.519104] mousedev: PS/2 mouse device common for all mice
> > > 
> > > kbd_init() is called if that's what you wanted to know.
> > 
> > btw., i suspect port IO printf()s in i8042.c would be rather 
> > informative now: they would nicely and synchronously interlace with 
> > guest serial console output, showing exactly what is going on in the 
> > probing sequence.
> 
> Do you mean a generic ioport tracing thing?
> 
>   ./kvm run --trace=ioport
> 
> or something?

heh, yeah :-)

btw., i'm getting the same message here:

[1.458000] usbcore: registered new interface driver libusual
[1.46] i8042: PNP: No PS/2 controller found. Probing ports directly.
[1.463000] i8042: Can't read CTR while initializing i8042
[1.464000] i8042: probe of i8042 failed with error -5
[1.465000] mousedev: PS/2 mouse device common for all mice
[1.467000] rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add support for PS/2 keyboard system

2011-06-02 Thread Sasha Levin
On Thu, 2011-06-02 at 12:15 +0300, Pekka Enberg wrote:
> On Thu, 2 Jun 2011, Sasha Levin wrote:
> > Strange, those are the same errors you're supposed to get when theres no
> > i8042 device at all.
> >
> > Could you verify that hw/pckbd.c is being built? Did the makefile get
> > updated?
> >
> > What I'm seeing here is:
> >
> > [   49.326316] i8042: PNP: No PS/2 controller found. Probing ports
> > directly.
> > [   50.410492] serio: i8042 KBD port at 0x60,0x64 irq 1
> > [   51.518829] serio: i8042 AUX port at 0x60,0x64 irq 12
> > [   51.519104] mousedev: PS/2 mouse device common for all mice
> 
> kbd_init() is called if that's what you wanted to know.

I've pulled master, and tried running keyboard on it. What I see now is:

[   33.114810] i8042: PNP: No PS/2 controller found. Probing ports
directly.
[   33.365370] serio: i8042 KBD port at 0x60,0x64 irq 1
[   33.365599] mousedev: PS/2 mouse device common for all mice

Which means that the mouse is gone. Might be related to what you're
seeing.


-- 

Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC]QEMU disk I/O limits

2011-06-02 Thread Michal Suchanek
On 1 June 2011 05:12, Zhi Yong Wu  wrote:
> On Tue, May 31, 2011 at 03:55:49PM -0400, Vivek Goyal wrote:
>>Date: Tue, 31 May 2011 15:55:49 -0400
>>From: Vivek Goyal 
>>To: Zhi Yong Wu 
>>Cc: kw...@redhat.com, aligu...@us.ibm.com, stefa...@linux.vnet.ibm.com,
>>       kvm@vger.kernel.org, guijianf...@cn.fujitsu.com,
>>       qemu-de...@nongnu.org, wu...@cn.ibm.com,
>>       herb...@gondor.hengli.com.au, luow...@cn.ibm.com, zh...@cn.ibm.com,
>>       zhaoy...@cn.ibm.com, l...@redhat.com, rahar...@us.ibm.com
>>Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits
>>User-Agent: Mutt/1.5.21 (2010-09-15)
>>
>>On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote:
>>
>>[..]
>>>     3.) How the users enable and play with it
>>>     QEMU -drive option will be extended so that disk I/O limits can be 
>>> specified on its command line, such as -drive [iops=xxx,][throughput=xxx] 
>>> or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx] etc. When this 
>>> argument is specified, it means that "disk I/O limits" feature is enabled 
>>> for this drive disk.
>>
>>How does throughput interface look like? is it bytes per second or something
>>else?
> HI, Vivek,
> It will be a value based on bytes per second.
>
>>
>>Do we have read and write variants for throughput as we have for iops.
> QEMU code has two variants "rd_bytes, wr_bytes", but we maybe need to get 
> their bytes per second.
>
>>
>>if you have bytes interface(as kenrel does), then "bps_rd" and "bps_wr"
>>might be good names too for thoughput interface.
> I agree with you, and can change them as your suggestions.
>

Changing them this way is not going to be an improvement. While
rd_bytes and wr_bytes lack the time interval specification bps_rd and
bps_wr is ambiguous. Is that bits? bytes? Sure, there should be some
distinction by capitalization but that does not apply since qemu
arguments are all lowercase.

Thanks

Michal
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add support for PS/2 keyboard system

2011-06-02 Thread Pekka Enberg
On Thu, 2011-06-02 at 11:18 +0200, Ingo Molnar wrote:
> * Pekka Enberg  wrote:
> 
> > On Thu, 2 Jun 2011, Sasha Levin wrote:
> > >Strange, those are the same errors you're supposed to get when theres no
> > >i8042 device at all.
> > >
> > >Could you verify that hw/pckbd.c is being built? Did the makefile get
> > >updated?
> > >
> > >What I'm seeing here is:
> > >
> > >[   49.326316] i8042: PNP: No PS/2 controller found. Probing ports
> > >directly.
> > >[   50.410492] serio: i8042 KBD port at 0x60,0x64 irq 1
> > >[   51.518829] serio: i8042 AUX port at 0x60,0x64 irq 12
> > >[   51.519104] mousedev: PS/2 mouse device common for all mice
> > 
> > kbd_init() is called if that's what you wanted to know.
> 
> btw., i suspect port IO printf()s in i8042.c would be rather 
> informative now: they would nicely and synchronously interlace with 
> guest serial console output, showing exactly what is going on in the 
> probing sequence.

Do you mean a generic ioport tracing thing?

  ./kvm run --trace=ioport

or something?

Pekka


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add support for PS/2 keyboard system

2011-06-02 Thread Ingo Molnar

* Pekka Enberg  wrote:

> On Thu, 2 Jun 2011, Sasha Levin wrote:
> >Strange, those are the same errors you're supposed to get when theres no
> >i8042 device at all.
> >
> >Could you verify that hw/pckbd.c is being built? Did the makefile get
> >updated?
> >
> >What I'm seeing here is:
> >
> >[   49.326316] i8042: PNP: No PS/2 controller found. Probing ports
> >directly.
> >[   50.410492] serio: i8042 KBD port at 0x60,0x64 irq 1
> >[   51.518829] serio: i8042 AUX port at 0x60,0x64 irq 12
> >[   51.519104] mousedev: PS/2 mouse device common for all mice
> 
> kbd_init() is called if that's what you wanted to know.

btw., i suspect port IO printf()s in i8042.c would be rather 
informative now: they would nicely and synchronously interlace with 
guest serial console output, showing exactly what is going on in the 
probing sequence.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] kvm tools: Add QCOW level2 caching support

2011-06-02 Thread Ingo Molnar

* Ingo Molnar  wrote:

> This introduces a ~1msec delay and thus simulates IO, but the 
> delays are *constant* [make sure you use a high-res timers kernel], 
> so they do not result in nearly as much measurement noise as real 
> block IO does.
> 
> The IO delays will still be there, so any caching advantages (and 
> CPU overhead reductions) will be measurable very clearly.
> 
> This way you are basically 'emulating' a real disk drive but you 
> will emulate uniform latencies, which makes measurements a lot more 
> reliable - while still relevant to the end result.
> 
> So if under such a measurement model you can prove an improvement 
> with a patch, that improvement will be there with real disks as 
> well - just harder to prove.

Another risk that the current situation carries in itself, beyond 
making it more difficult to measure improvements, is that based on a 
"bad" Bonnie outlier or artifact you might throw away a perfectly 
good change accidentally!

So whenever you think you are fighting noise you need to improve your 
measurements, as entropy is a pretty tough opponent to beat.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add support for PS/2 keyboard system

2011-06-02 Thread Pekka Enberg

On Thu, 2 Jun 2011, Sasha Levin wrote:

Strange, those are the same errors you're supposed to get when theres no
i8042 device at all.

Could you verify that hw/pckbd.c is being built? Did the makefile get
updated?

What I'm seeing here is:

[   49.326316] i8042: PNP: No PS/2 controller found. Probing ports
directly.
[   50.410492] serio: i8042 KBD port at 0x60,0x64 irq 1
[   51.518829] serio: i8042 AUX port at 0x60,0x64 irq 12
[   51.519104] mousedev: PS/2 mouse device common for all mice


kbd_init() is called if that's what you wanted to know.

Pekka
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add support for PS/2 keyboard system

2011-06-02 Thread Sasha Levin
On Thu, 2011-06-02 at 11:52 +0300, Pekka Enberg wrote:
> On Wed, 1 Jun 2011, Sasha Levin wrote:
> > From: John Floren 
> >
> > Add support for PS/2 keyboard system with AUX device (aka mouse).
> > The device works with vnc, the guest must be started with the
> > '--vnc' parameter for the device to be initialized.
> >
> > Signed-off-by: John Floren 
> > [ turn into patch and clean up code ]
> > Signed-off-by: Sasha Levin 
> 
> I am seeing this in my guest dmesg:
> 
> [   24.972609] i8042: PNP: No PS/2 controller found. Probing ports 
> directly.
> [   25.568982] i8042: Can't read CTR while initializing i8042
> [   25.568996] i8042: probe of i8042 failed with error -5
> [   25.569102] mousedev: PS/2 mouse device common for all mice
> 
> and nothing happens when I try to press some keys in vncviewer. There's a 
> small blinking thingy following the mouse which I assume is our guest 
> mouse cursor?

Strange, those are the same errors you're supposed to get when theres no
i8042 device at all.

Could you verify that hw/pckbd.c is being built? Did the makefile get
updated?

What I'm seeing here is:

[   49.326316] i8042: PNP: No PS/2 controller found. Probing ports
directly.
[   50.410492] serio: i8042 KBD port at 0x60,0x64 irq 1
[   51.518829] serio: i8042 AUX port at 0x60,0x64 irq 12
[   51.519104] mousedev: PS/2 mouse device common for all mice

-- 

Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] kvm tools: Add QCOW level2 caching support

2011-06-02 Thread Ingo Molnar

* Prasad Joshi  wrote:

> I am not sure how to induce the delay you mentioned. [...]

In the simplest version just add:

if (debug__io_delay)
udelay(1000);

to the code that does a read() from the disk image. This will 
introduce a 1 msec delay - that should be plenty enough to simulate 
most effects of IO delays.

Later on you could complicate it with some disk geometry 
approximation:

delay = read_size_in_kb;/* 1 usec per KB read */
delay += seek_distance_in_kb;   /* 1 usec per KB disk-head movement */
udelay(delay);

Where 'read_size_in_kb' is the number of bytes read in KB, while 
seek_distance_in_kb measures the position of the the last byte read 
by the previous read() call to the first byte of the current read().

( Also, instead of copying the disk image to /dev/shm you could add a 
  debug switch to mmap() and mlock() it directly. Assuming there's enough
  RAM in the box. )

But i'd strongly suggest to keep the initial version simple.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add support for PS/2 keyboard system

2011-06-02 Thread Pekka Enberg

On Thu, 2 Jun 2011, Ingo Molnar wrote:

Since this is a PS2 keyboard and mouse driver please name it
i8042.[ch] like the kernel-side does.

that will also make it easier to find the guest-side <-> host-side
pair of guest-driver/host-driver files :-)


Fixed, thanks!
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Use vesa reserved space for strings and modes

2011-06-02 Thread Pekka Enberg

On Wed, 1 Jun 2011, Sasha Levin wrote:


As defined in the spec, the reserved space in struct vesa_general_info
should be used to store vesa oem string an a list of possible modes.

Signed-off-by: Sasha Levin 


Francis, could you test this, please? I sent you a copy of the patch 
privately.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] nVMX: Fix bug preventing more than two levels of nesting

2011-06-02 Thread Nadav Har'El
The nested VMX feature is supposed to fully emulate VMX for the guest. This
(theoretically) not only allows it to run its own guests, but also also
to further emulate VMX for its own guests, and allow arbitrarily deep nesting.

This patch fixes a bug (discovered by Kevin Tian) in handling a VMLAUNCH
by L2, which prevented deeper nesting.

Deeper nesting now works (I only actually tested L3), but is currently
*absurdly* slow, to the point of being unusable.

Signed-off-by: Nadav Har'El 
---
 arch/x86/kvm/vmx.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- .before/arch/x86/kvm/vmx.c  2011-06-02 10:46:13.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2011-06-02 10:46:13.0 +0300
@@ -5691,8 +5691,8 @@ static int vmx_handle_exit(struct kvm_vc
if (vmx->nested.nested_run_pending)
kvm_make_request(KVM_REQ_EVENT, vcpu);
 
-   if (exit_reason == EXIT_REASON_VMLAUNCH ||
-   exit_reason == EXIT_REASON_VMRESUME)
+   if (!is_guest_mode(vcpu) && (exit_reason == EXIT_REASON_VMLAUNCH ||
+   exit_reason == EXIT_REASON_VMRESUME))
vmx->nested.nested_run_pending = 1;
else
vmx->nested.nested_run_pending = 0;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM host freezing

2011-06-02 Thread Marc Haber
Hi,

I have just started deploying a host doing virtualization with KVM.
The box has an Athlon 64 X2, 4 GB RAM and is running Debian squeeze
with a locally built 2.6,39 kernel and backported versions of qemu-kvm
(0.14.0) and libvirt (0.9.0) from Debian sid. The box is currently
hosting five VMs, all of them Debian systems as well and rather
unloaded. The only time when there is significant load is when all VMs
are simultaneously starting up their cron jobs.

When the host starts up, it immediately spews the following lines to
the console:

kvm: 2865: cpu0 unhandled rdmsr: 0xc0010048
kvm: 2865: cpu0 unhandled wrmsr: 0xc0010048 data 210401
kvm: 2865: cpu0 unhandled rdmsr: 0xc0010001
kvm: 2849: cpu0 unhandled rdmsr: 0xc0010048
kvm: 2849: cpu0 unhandled wrmsr: 0xc0010048 data c0579f7cc0010448
kvm: 2849: cpu0 unhandled rdmsr: 0xc0010001
kvm: 2950: cpu0 unhandled rdmsr: 0xc0010048
kvm: 2950: cpu0 unhandled wrmsr: 0xc0010048 data c0579f7cc0010448
kvm: 2849: cpu1 unhandled rdmsr: 0xc0010048
kvm: 2963: cpu0 unhandled rdmsr: 0xc0010112
kvm: 2963: cpu0 unhandled rdmsr: 0xc0010048
kvm: 2963: cpu0 unhandled wrmsr: 0xc0010048 data 210401
kvm: 2963: cpu0 unhandled rdmsr: 0xc0010001
kvm: 2963: cpu1 unhandled rdmsr: 0xc0010048
kvm: 2963: cpu1 unhandled wrmsr: 0xc0010048 data 210401

Every few days, the system stops dead in its tracks and needs a hard
reset to be revived. I have a serial console, which unfortunately
disconnects me after a few minutes of inactivity, and only caches the
last few lines of activity. Whenever I connect to the serial console
of the frozen system, I have a few lines of the same "unhandled
(rd|wr)msr" messages.

The syslog doesn't show anything strange. The system just stops dead
in its tracks.

Is there any possibility that the freezes have to do with the
"unhandles (rd|wr)msr" messages? When else could be the cause?

In the mean time, I have taken the box offline and am running memtest.
Up to now, everything seems to be fine.

Any hints will be appreciated.

Greetings
Marc



-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Mannheim, Germany  |  lose things."Winona Ryder | Fon: *49 621 72739834
Nordisch by Nature |  How to make an American Quilt | Fax: *49 3221 2323190
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add support for PS/2 keyboard system

2011-06-02 Thread Pekka Enberg

On Wed, 1 Jun 2011, Sasha Levin wrote:

From: John Floren 

Add support for PS/2 keyboard system with AUX device (aka mouse).
The device works with vnc, the guest must be started with the
'--vnc' parameter for the device to be initialized.

Signed-off-by: John Floren 
[ turn into patch and clean up code ]
Signed-off-by: Sasha Levin 


I am seeing this in my guest dmesg:

[   24.972609] i8042: PNP: No PS/2 controller found. Probing ports 
directly.

[   25.568982] i8042: Can't read CTR while initializing i8042
[   25.568996] i8042: probe of i8042 failed with error -5
[   25.569102] mousedev: PS/2 mouse device common for all mice

and nothing happens when I try to press some keys in vncviewer. There's a 
small blinking thingy following the mouse which I assume is our guest 
mouse cursor?


Pekka
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] kvm tools: Add QCOW level2 caching support

2011-06-02 Thread Prasad Joshi
On Thu, Jun 2, 2011 at 8:28 AM, Ingo Molnar  wrote:
>
> * Prasad Joshi  wrote:
>
>> Summary of performance numbers
>> ==
>> There is not much difference with sequential character operations are
>> performed, the code with caching performed better by small margin. The 
>> caching
>> code performance raised by 12% with sequential block output and dropped by
>> 0.5% with sequential block input. The caching code also suffered with
>> Random seeks and performed badly by 12%. The performance numbers drastically
>> improved with sequential creates (62%) and delete operations (30%).
>
> Looking at the numbers i think it's pretty clear that from this point
> on the quality of IO tests should be improved: Bonnie is too noisy
> and does not cut it anymore for finer enhancements.
>
> To make measurements easier you could also do a simple trick: put
> *all* of the disk image into /dev/shm and add a command-line debug
> option that add a fixed-amount udelay(1000) call every time the code
> reads from the disk image.
>
> This introduces a ~1msec delay and thus simulates IO, but the delays
> are *constant* [make sure you use a high-res timers kernel], so they
> do not result in nearly as much measurement noise as real block IO
> does.
>
> The IO delays will still be there, so any caching advantages (and CPU
> overhead reductions) will be measurable very clearly.
>
> This way you are basically 'emulating' a real disk drive but you will
> emulate uniform latencies, which makes measurements a lot more
> reliable - while still relevant to the end result.
>
> So if under such a measurement model you can prove an improvement
> with a patch, that improvement will be there with real disks as well
> - just harder to prove.
>
> Wanna try this? I really think you are hitting the limits of your
> current measurement methodology and you will be wasting time running
> more measurements instead of saving time doing more intelligent
> measurements ;-)
>

Thanks Ingo, will try this method.

I am not sure how to induce the delay you mentioned. I could vaguely
remember you suggesting something similar few days back. Let me see if
I can find out the correct arguments to use this delay, otherwise will
post a query.

Thanks and Regards,
Prasad

> Thanks,
>
>        Ingo
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC]QEMU disk I/O limits

2011-06-02 Thread Zhi Yong Wu
On Thu, Jun 02, 2011 at 10:15:02AM +0300, Sasha Levin wrote:
>Date: Thu, 02 Jun 2011 10:15:02 +0300
>From: Sasha Levin 
>To: Zhi Yong Wu 
>Cc: kw...@redhat.com, aligu...@us.ibm.com, herb...@gondor.apana.org.au,
>   kvm@vger.kernel.org, guijianf...@cn.fujitsu.com,
>   qemu-de...@nongnu.org, wu...@cn.ibm.com, luow...@cn.ibm.com,
>   zh...@cn.ibm.com, zhaoy...@cn.ibm.com, l...@redhat.com,
>   rahar...@us.ibm.com, vgo...@redhat.com, stefa...@linux.vnet.ibm.com
>Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits
>X-Mailer: Evolution 2.32.2 
>
>On Thu, 2011-06-02 at 14:29 +0800, Zhi Yong Wu wrote:
>> On Thu, Jun 02, 2011 at 09:17:06AM +0300, Sasha Levin wrote:
>> >Date: Thu, 02 Jun 2011 09:17:06 +0300
>> >From: Sasha Levin 
>> >To: Zhi Yong Wu 
>> >Cc: qemu-de...@nongnu.org, kvm@vger.kernel.org, kw...@redhat.com,
>> >aligu...@us.ibm.com, herb...@gondor.apana.org.au,
>> >guijianf...@cn.fujitsu.com, wu...@cn.ibm.com, luow...@cn.ibm.com,
>> >zh...@cn.ibm.com, zhaoy...@cn.ibm.com, l...@redhat.com,
>> >rahar...@us.ibm.com, vgo...@redhat.com, stefa...@linux.vnet.ibm.com
>> >Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits
>> >X-Mailer: Evolution 2.32.2 
>> >
>> >Hi,
>> >
>> >On Mon, 2011-05-30 at 13:09 +0800, Zhi Yong Wu wrote:
>> >> Hello, all,
>> >> 
>> >> I have prepared to work on a feature called "Disk I/O limits" for 
>> >> qemu-kvm projeect.
>> >> This feature will enable the user to cap disk I/O amount performed by 
>> >> a VM.It is important for some storage resources to be shared among 
>> >> multi-VMs. As you've known, if some of VMs are doing excessive disk I/O, 
>> >> they will hurt the performance of other VMs.
>> >> 
>> >> More detail is available here:
>> >> http://wiki.qemu.org/Features/DiskIOLimits
>> >> 
>> >> 1.) Why we need per-drive disk I/O limits 
>> >> As you've known, for linux, cgroup blkio-controller has supported I/O 
>> >> throttling on block devices. More importantly, there is no single 
>> >> mechanism for disk I/O throttling across all underlying storage types 
>> >> (image file, LVM, NFS, Ceph) and for some types there is no way to 
>> >> throttle at all. 
>> >> 
>> >> Disk I/O limits feature introduces QEMU block layer I/O limits 
>> >> together with command-line and QMP interfaces for configuring limits. 
>> >> This allows I/O limits to be imposed across all underlying storage types 
>> >> using a single interface.
>> >> 
>> >> 2.) How disk I/O limits will be implemented
>> >> QEMU block layer will introduce a per-drive disk I/O request queue 
>> >> for those disks whose "disk I/O limits" feature is enabled. It can 
>> >> control disk I/O limits individually for each disk when multiple disks 
>> >> are attached to a VM, and enable use cases like unlimited local disk 
>> >> access but shared storage access with limits. 
>> >> In mutliple I/O threads scenario, when an application in a VM issues 
>> >> a block I/O request, this request will be intercepted by QEMU block 
>> >> layer, then it will calculate disk runtime I/O rate and determine if it 
>> >> has go beyond its limits. If yes, this I/O request will enqueue to that 
>> >> introduced queue; otherwise it will be serviced.
>> >> 
>> >> 3.) How the users enable and play with it
>> >> QEMU -drive option will be extended so that disk I/O limits can be 
>> >> specified on its command line, such as -drive [iops=xxx,][throughput=xxx] 
>> >> or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx] etc. When this 
>> >> argument is specified, it means that "disk I/O limits" feature is enabled 
>> >> for this drive disk.
>> >> The feature will also provide users with the ability to change 
>> >> per-drive disk I/O limits at runtime using QMP commands.
>> >
>> >I'm wondering if you've considered adding a 'burst' parameter -
>> >something which will not limit (or limit less) the io ops or the
>> >throughput for the first 'x' ms in a given time window.
>> Currently no, Do you let us know what scenario it will make sense to?
>
>My assumption is that most guests are not doing constant disk I/O
>access. Instead, the operations are usually short and happen on small
>scale (relatively small amount of bytes accessed).
>
>For example: Multiple table DB lookup, serving a website, file servers.
>
>Basically, if I need to do a DB lookup which needs 50MB of data from a
>disk which is limited to 10MB/s, I'd rather let it burst for 1 second
>and complete the lookup faster instead of having it read data for 5
>seconds.
>
>If the guest now starts running multiple lookups one after the other,
>thats when I would like to limit.
HI, Sasha,

If iops or bps parameters are not specified to -drive, it will not limit this 
disk I/O rate. Of course, QMP commands will be extended to support changing or 
disabling disk I/O limits at runtime. If you'd like not limit a disk I/O rate, 
you can use it to disabled this feature.

I don't make sure that this is the right answer for your qu

Re: [PATCH 31/31] nVMX: Documentation

2011-06-02 Thread Nadav Har'El
On Wed, Jun 01, 2011, Jan Kiszka wrote about "Re: [PATCH 31/31] nVMX: 
Documentation":
> >  Documentation/kvm/nested-vmx.txt |  251 +
> 
> This needs to go to Documentation/virtual/kvm.

Oops, I guess the rug moved under my feet while I was preparing this patch
set :-)

Avi, Marcelo, I don't know how to send a patch that only renames a file
(and deletes the superfluous Documentation/kvm directory) - so can you
please fix this directly in your tree?

Thanks,
Nadav.

-- 
Nadav Har'El| Thursday, Jun  2 2011, 29 Iyyar 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Bore, n.: A person who talks when you
http://nadav.harel.org.il   |wish him to listen.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] kvm tools: Add QCOW level2 caching support

2011-06-02 Thread Ingo Molnar

* Prasad Joshi  wrote:

> Summary of performance numbers
> ==
> There is not much difference with sequential character operations are
> performed, the code with caching performed better by small margin. The caching
> code performance raised by 12% with sequential block output and dropped by
> 0.5% with sequential block input. The caching code also suffered with
> Random seeks and performed badly by 12%. The performance numbers drastically
> improved with sequential creates (62%) and delete operations (30%).

Looking at the numbers i think it's pretty clear that from this point 
on the quality of IO tests should be improved: Bonnie is too noisy 
and does not cut it anymore for finer enhancements.

To make measurements easier you could also do a simple trick: put 
*all* of the disk image into /dev/shm and add a command-line debug 
option that add a fixed-amount udelay(1000) call every time the code 
reads from the disk image.

This introduces a ~1msec delay and thus simulates IO, but the delays 
are *constant* [make sure you use a high-res timers kernel], so they 
do not result in nearly as much measurement noise as real block IO 
does.

The IO delays will still be there, so any caching advantages (and CPU 
overhead reductions) will be measurable very clearly.

This way you are basically 'emulating' a real disk drive but you will 
emulate uniform latencies, which makes measurements a lot more 
reliable - while still relevant to the end result.

So if under such a measurement model you can prove an improvement 
with a patch, that improvement will be there with real disks as well 
- just harder to prove.

Wanna try this? I really think you are hitting the limits of your 
current measurement methodology and you will be wasting time running 
more measurements instead of saving time doing more intelligent 
measurements ;-)

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add support for PS/2 keyboard system

2011-06-02 Thread Ingo Molnar

* Sasha Levin  wrote:

> From: John Floren 
> 
> Add support for PS/2 keyboard system with AUX device (aka mouse).
> The device works with vnc, the guest must be started with the
> '--vnc' parameter for the device to be initialized.
> 
> Signed-off-by: John Floren 
> [ turn into patch and clean up code ]
> Signed-off-by: Sasha Levin 
> ---
>  tools/kvm/Makefile|1 +
>  tools/kvm/hw/pckbd.c  |  475 
> +

Since this is a PS2 keyboard and mouse driver please name it 
i8042.[ch] like the kernel-side does.

that will also make it easier to find the guest-side <-> host-side 
pair of guest-driver/host-driver files :-)

>  tools/kvm/hw/vesa.c   |3 +
>  tools/kvm/include/kvm/pckbd.h |   19 ++
>  tools/kvm/ioport.c|4 -
>  tools/kvm/kvm-run.c   |5 +-
>  6 files changed, 502 insertions(+), 5 deletions(-)
>  create mode 100644 tools/kvm/hw/pckbd.c
>  create mode 100644 tools/kvm/include/kvm/pckbd.h

In general the code looks very clean to me!

Found two minor nits:

> + ret = state.kq[state.kread++ % QUEUE_SIZE];
> + ret = state.mq[state.mread++ % QUEUE_SIZE];

Please increment the index explicitly, on a separate line - the 
permanent side-effect to the keyboard state is not obvious at first 
sight.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC]QEMU disk I/O limits

2011-06-02 Thread Sasha Levin
On Thu, 2011-06-02 at 14:29 +0800, Zhi Yong Wu wrote:
> On Thu, Jun 02, 2011 at 09:17:06AM +0300, Sasha Levin wrote:
> >Date: Thu, 02 Jun 2011 09:17:06 +0300
> >From: Sasha Levin 
> >To: Zhi Yong Wu 
> >Cc: qemu-de...@nongnu.org, kvm@vger.kernel.org, kw...@redhat.com,
> > aligu...@us.ibm.com, herb...@gondor.apana.org.au,
> > guijianf...@cn.fujitsu.com, wu...@cn.ibm.com, luow...@cn.ibm.com,
> > zh...@cn.ibm.com, zhaoy...@cn.ibm.com, l...@redhat.com,
> > rahar...@us.ibm.com, vgo...@redhat.com, stefa...@linux.vnet.ibm.com
> >Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits
> >X-Mailer: Evolution 2.32.2 
> >
> >Hi,
> >
> >On Mon, 2011-05-30 at 13:09 +0800, Zhi Yong Wu wrote:
> >> Hello, all,
> >> 
> >> I have prepared to work on a feature called "Disk I/O limits" for 
> >> qemu-kvm projeect.
> >> This feature will enable the user to cap disk I/O amount performed by 
> >> a VM.It is important for some storage resources to be shared among 
> >> multi-VMs. As you've known, if some of VMs are doing excessive disk I/O, 
> >> they will hurt the performance of other VMs.
> >> 
> >> More detail is available here:
> >> http://wiki.qemu.org/Features/DiskIOLimits
> >> 
> >> 1.) Why we need per-drive disk I/O limits 
> >> As you've known, for linux, cgroup blkio-controller has supported I/O 
> >> throttling on block devices. More importantly, there is no single 
> >> mechanism for disk I/O throttling across all underlying storage types 
> >> (image file, LVM, NFS, Ceph) and for some types there is no way to 
> >> throttle at all. 
> >> 
> >> Disk I/O limits feature introduces QEMU block layer I/O limits 
> >> together with command-line and QMP interfaces for configuring limits. This 
> >> allows I/O limits to be imposed across all underlying storage types using 
> >> a single interface.
> >> 
> >> 2.) How disk I/O limits will be implemented
> >> QEMU block layer will introduce a per-drive disk I/O request queue for 
> >> those disks whose "disk I/O limits" feature is enabled. It can control 
> >> disk I/O limits individually for each disk when multiple disks are 
> >> attached to a VM, and enable use cases like unlimited local disk access 
> >> but shared storage access with limits. 
> >> In mutliple I/O threads scenario, when an application in a VM issues a 
> >> block I/O request, this request will be intercepted by QEMU block layer, 
> >> then it will calculate disk runtime I/O rate and determine if it has go 
> >> beyond its limits. If yes, this I/O request will enqueue to that 
> >> introduced queue; otherwise it will be serviced.
> >> 
> >> 3.) How the users enable and play with it
> >> QEMU -drive option will be extended so that disk I/O limits can be 
> >> specified on its command line, such as -drive [iops=xxx,][throughput=xxx] 
> >> or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx] etc. When this 
> >> argument is specified, it means that "disk I/O limits" feature is enabled 
> >> for this drive disk.
> >> The feature will also provide users with the ability to change 
> >> per-drive disk I/O limits at runtime using QMP commands.
> >
> >I'm wondering if you've considered adding a 'burst' parameter -
> >something which will not limit (or limit less) the io ops or the
> >throughput for the first 'x' ms in a given time window.
> Currently no, Do you let us know what scenario it will make sense to?

My assumption is that most guests are not doing constant disk I/O
access. Instead, the operations are usually short and happen on small
scale (relatively small amount of bytes accessed).

For example: Multiple table DB lookup, serving a website, file servers.

Basically, if I need to do a DB lookup which needs 50MB of data from a
disk which is limited to 10MB/s, I'd rather let it burst for 1 second
and complete the lookup faster instead of having it read data for 5
seconds.

If the guest now starts running multiple lookups one after the other,
thats when I would like to limit.

> Regards,
> 
> Zhiyong Wu
> >
> >> Regards,
> >> 
> >> Zhiyong Wu
> >> 
> >
> >-- 
> >
> >Sasha.
> >

-- 

Sasha.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html