[PATCH] KVM: MMU: initialize sptes early

2011-10-24 Thread Zhao Jin
Otherwise, the following kvm_sync_pages() will see invalid sptes in a new
shadow page.

Signed-off-by: Zhao Jin crono...@gmail.com
---
 arch/x86/kvm/mmu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8e8da79..d7e1694 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1692,6 +1692,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		return sp;
 	sp->gfn = gfn;
 	sp->role = role;
+	init_shadow_page_table(sp);
 	hlist_add_head(&sp->hash_link,
 		&vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
 	if (!direct) {
@@ -1702,7 +1703,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 
 		account_shadowed(vcpu->kvm, gfn);
 	}
-	init_shadow_page_table(sp);
 	trace_kvm_mmu_get_page(sp, true);
 	return sp;
 }
-- 
1.7.5.4



[PATCH] KVM: MMU: fix the condition of syncing a new shadow page

2011-10-24 Thread Zhao Jin
Should be "or" since a new shadow page is synced if either it is
not a leaf or there already exists another unsync shadow page with
the same gfn.

Signed-off-by: Zhao Jin crono...@gmail.com
---
 arch/x86/kvm/mmu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d7e1694..f36de41 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1698,7 +1698,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	if (!direct) {
 		if (rmap_write_protect(vcpu->kvm, gfn))
 			kvm_flush_remote_tlbs(vcpu->kvm);
-		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
+		if (level > PT_PAGE_TABLE_LEVEL || need_sync)
 			kvm_sync_pages(vcpu, gfn);
 
 		account_shadowed(vcpu->kvm, gfn);
-- 
1.7.5.4



[PATCH] KVM: VMX: fix incorrect operand

2011-10-24 Thread Zhao Jin
Should test save->ar for access rights.

Signed-off-by: Zhao Jin crono...@gmail.com
---
 arch/x86/kvm/vmx.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e65a158..62086da 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2544,7 +2544,7 @@ static void fix_pmode_dataseg(int seg, struct kvm_save_segment *save)
 {
 	struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
 
-	if (vmcs_readl(sf->base) == save->base && (save->base & AR_S_MASK)) {
+	if (vmcs_readl(sf->base) == save->base && (save->ar & AR_S_MASK)) {
 		vmcs_write16(sf->selector, save->selector);
 		vmcs_writel(sf->base, save->base);
 		vmcs_write32(sf->limit, save->limit);
-- 
1.7.5.4



Re: qcow2 eating up space when formattng Centos

2011-10-24 Thread Philipp Hahn
Hello Anonymous,

On Monday 24 October 2011 03:22:15 day knight wrote:
 I am not sure if this is the right behaviour but the qcow2 image seems to
 grow when Centos is only formatting the image. I mean it goes up to 30
 Gig once everything is formatted and installed, and it is a minimal Centos
 install with no gui or apps, just baseline.

 OS = Centos5
 Virtualization: KVM
 Total Qcow2 Image Created = 1TB

 Once Centos is installed the qcow2 image shows as around 30 Gig. I
 have done several installs and they were all less than 4 gig or even
 less, but this seems to not make sense.

 Can someone please explain what is going on?

You didn't tell us which file system you're using. ext3 needs to initialize
its meta data (super blocks, inode tables), which are scattered all over the
image. Depending on your cluster size for the qcow2 file, each (small) write
takes the space of a full cluster. Add to that the meta-data needed by qcow2
itself (a two-level tree), and 30 GiB seems okay.
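
A rough back-of-the-envelope check (all numbers here are illustrative
assumptions, not measurements):

	#include <stdio.h>

	int main(void)
	{
		long long image_size   = 1LL << 40;    /* 1 TiB virtual disk */
		long long group_size   = 128LL << 20;  /* ext3 block group size */
		long long groups       = image_size / group_size;
		long long cluster_size = 64LL << 10;   /* qcow2 default cluster */
		/* assumed per-group metadata (bitmaps, inode table, backups) */
		long long md_per_group = 4LL << 20;
		long long clusters     = (md_per_group + cluster_size - 1)
					 / cluster_size;
		long long allocated    = groups * clusters * cluster_size;

		/* 8192 groups * 4 MiB rounded to clusters ~= 32 GiB */
		printf("groups=%lld allocated~=%lld GiB\n",
		       groups, allocated >> 30);
		return 0;
	}

which lands in the same ballpark as the 30 Gig you observed.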

You might want to try ext4 with delayed allocation (IMHO enabled by
default), which doesn't write all over the range, since initialization of the
superblock and inode tables is delayed until they are actually needed.

Sincerely
Philipp
-- 
Philipp Hahn   Open Source Software Engineer  h...@univention.de
Univention GmbHLinux for Your Businessfon: +49 421 22 232- 0
Mary-Somerville-Str.1  D-28359 Bremen fax: +49 421 22 232-99
   http://www.univention.de/




Re: [PATCH] KVM: MMU: initialize sptes early

2011-10-24 Thread Xiao Guangrong
On 2011/10/24 15:21, Zhao Jin wrote:
 Otherwise, the following kvm_sync_pages() will see invalid sptes in a new
 shadow page.
 

No, kvm_sync_pages() just handles unsync pages, but the new sp is a sync page.


Re: [PATCH] KVM: MMU: fix the condition of syncing a new shadow page

2011-10-24 Thread Xiao Guangrong
On 2011/10/24 15:21, Zhao Jin wrote:
 Should be "or" since a new shadow page is synced if either it is
 not a leaf or there already exists another unsync shadow page with
 the same gfn.
 

It is obviously wrong; we need to sync pages only if there is an unsync page
*and* the new shadow page breaks the unsync rule (only a level-1 sp can
become unsync).
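
In code terms (quoting the unpatched condition from kvm_mmu_get_page()
with the rationale spelled out):

	/*
	 * Only a level-1 sp can be (and stay) unsync.  A new sp above
	 * level 1 breaks that rule, so any existing unsync pages for
	 * this gfn (need_sync) must be synced - both conditions are
	 * required, hence "and", not "or":
	 */
	if (level > PT_PAGE_TABLE_LEVEL && need_sync)
		kvm_sync_pages(vcpu, gfn);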


[PATCH v2] kvm tools: Simplify msi message handling

2011-10-24 Thread Sasha Levin
This patch simplifies passing around msi messages by using
'struct msi_msg' to store msi messages instead of passing all
msi parameters around.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/hw/pci-shmem.c|5 +
 tools/kvm/include/kvm/irq.h |4 +++-
 tools/kvm/include/kvm/pci.h |7 +++
 tools/kvm/irq.c |8 
 tools/kvm/virtio/pci.c  |   10 ++
 5 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/tools/kvm/hw/pci-shmem.c b/tools/kvm/hw/pci-shmem.c
index 2907a66..780a377 100644
--- a/tools/kvm/hw/pci-shmem.c
+++ b/tools/kvm/hw/pci-shmem.c
@@ -124,10 +124,7 @@ int pci_shmem__get_local_irqfd(struct kvm *kvm)
return fd;
 
 	if (pci_shmem_pci_device.msix.ctrl & PCI_MSIX_FLAGS_ENABLE) {
-		gsi = irq__add_msix_route(kvm,
-					  msix_table[0].low,
-					  msix_table[0].high,
-					  msix_table[0].data);
+		gsi = irq__add_msix_route(kvm, &msix_table[0].msg);
} else {
gsi = pci_shmem_pci_device.irq_line;
}
diff --git a/tools/kvm/include/kvm/irq.h b/tools/kvm/include/kvm/irq.h
index 401bee9..61f593d 100644
--- a/tools/kvm/include/kvm/irq.h
+++ b/tools/kvm/include/kvm/irq.h
@@ -4,6 +4,8 @@
 #include <linux/types.h>
 #include <linux/rbtree.h>
 #include <linux/list.h>
+#include <linux/kvm.h>
+#include <linux/msi.h>
 
 struct kvm;
 
@@ -24,6 +26,6 @@ int irq__register_device(u32 dev, u8 *num, u8 *pin, u8 *line);
 struct rb_node *irq__get_pci_tree(void);
 
 void irq__init(struct kvm *kvm);
-int irq__add_msix_route(struct kvm *kvm, u32 low, u32 high, u32 data);
+int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg);
 
 #endif
diff --git a/tools/kvm/include/kvm/pci.h b/tools/kvm/include/kvm/pci.h
index 5ee8005..f71af0b 100644
--- a/tools/kvm/include/kvm/pci.h
+++ b/tools/kvm/include/kvm/pci.h
@@ -2,8 +2,9 @@
 #define KVM__PCI_H
 
 #include <linux/types.h>
-
+#include <linux/kvm.h>
 #include <linux/pci_regs.h>
+#include <linux/msi.h>
 
 /*
  * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
@@ -26,9 +27,7 @@ struct pci_config_address {
 };
 
 struct msix_table {
-   u32 low;
-   u32 high;
-   u32 data;
+   struct msi_msg msg;
u32 ctrl;
 };
 
diff --git a/tools/kvm/irq.c b/tools/kvm/irq.c
index e35bf18..dc2247e 100644
--- a/tools/kvm/irq.c
+++ b/tools/kvm/irq.c
@@ -167,7 +167,7 @@ void irq__init(struct kvm *kvm)
die(Failed setting GSI routes);
 }
 
-int irq__add_msix_route(struct kvm *kvm, u32 low, u32 high, u32 data)
+int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg)
 {
int r;
 
@@ -175,9 +175,9 @@ int irq__add_msix_route(struct kvm *kvm, u32 low, u32 high, u32 data)
 		(struct kvm_irq_routing_entry) {
 			.gsi = gsi,
 			.type = KVM_IRQ_ROUTING_MSI,
-			.u.msi.address_lo = low,
-			.u.msi.address_hi = high,
-			.u.msi.data = data,
+			.u.msi.address_hi = msg->address_hi,
+			.u.msi.address_lo = msg->address_lo,
+			.u.msi.data = msg->data,
 		};
 
 	r = ioctl(kvm->vm_fd, KVM_SET_GSI_ROUTING, irq_routing);
diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index f01851b..73d55a9 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -126,20 +126,14 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_pci *vpci
 	case VIRTIO_MSI_CONFIG_VECTOR:
 		vec = vpci->config_vector = ioport__read16(data);
 
-		gsi = irq__add_msix_route(kvm,
-					  vpci->msix_table[vec].low,
-					  vpci->msix_table[vec].high,
-					  vpci->msix_table[vec].data);
+		gsi = irq__add_msix_route(kvm, &vpci->msix_table[vec].msg);
 
 		vpci->config_gsi = gsi;
 		break;
 	case VIRTIO_MSI_QUEUE_VECTOR: {
 		vec = vpci->vq_vector[vpci->queue_selector] = ioport__read16(data);
 
-		gsi = irq__add_msix_route(kvm,
-					  vpci->msix_table[vec].low,
-					  vpci->msix_table[vec].high,
-					  vpci->msix_table[vec].data);
+		gsi = irq__add_msix_route(kvm, &vpci->msix_table[vec].msg);
 		vpci->gsis[vpci->queue_selector] = gsi;
 		break;
 	}
-- 
1.7.7


Re: [PATCH] KVM: MMU: initialize sptes early

2011-10-24 Thread Zhao Jin
2011/10/24 Xiao Guangrong xiao.guangr...@qq.com:
 On 2011/10/24 15:21, Zhao Jin wrote:
 Otherwise, the following kvm_sync_pages() will see invalid sptes in a new
 shadow page.


 No, kvm_sync_pages() just handles unsync pages, but the new sp is a sync
 page.


Sorry, I didn't notice the sp itself was zeroed when allocated and hence
was considered synced. Please ignore this patch.
Thanks for the reminder.


Re: [PATCH RFC V2 4/5] kvm guest : Added configuration support to enable debug information for KVM Guests

2011-10-24 Thread Sasha Levin
On Mon, 2011-10-24 at 00:37 +0530, Raghavendra K T wrote:
 Added configuration support to enable debug information
 for KVM Guests in debugfs
 
 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 Signed-off-by: Suzuki Poulose suz...@in.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
 diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
 index 1f03f82..ed34269 100644
 --- a/arch/x86/Kconfig
 +++ b/arch/x86/Kconfig
 @@ -562,6 +562,15 @@ config KVM_GUEST
 This option enables various optimizations for running under the KVM
 hypervisor.
  
 +config KVM_DEBUG_FS
 + bool "Enable debug information for KVM Guests in debugfs"
 + depends on KVM_GUEST

Shouldn't it depend on DEBUG_FS as well?

 + default n
 + ---help---
 +   This option enables collection of various statistics for KVM guest.
 +   Statistics are displayed in debugfs filesystem. Enabling this option
 +   may incur significant overhead.
 +
  source arch/x86/lguest/Kconfig
  
  config PARAVIRT
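
A sketch of the entry with Sasha's suggested dependency folded in (not the
posted patch):

	config KVM_DEBUG_FS
		bool "Enable debug information for KVM Guests in debugfs"
		depends on KVM_GUEST && DEBUG_FS
		default n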

-- 

Sasha.



Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock

2011-10-24 Thread Sasha Levin
On Mon, 2011-10-24 at 00:35 +0530, Raghavendra K T wrote:
 Add two hypercalls to KVM hypervisor to support pv-ticketlocks.
 
 KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or
 it is woken up because of an event like an interrupt.

 KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu.
 
 The presence of these hypercalls is indicated to the guest via
 KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK.
 
 Qemu needs a corresponding patch to pass up the presence of this feature to
 the guest via cpuid. The patch to qemu will be sent separately.
 
 There is no Xen/KVM hypercall interface to await a kick from.
 
 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 Signed-off-by: Suzuki Poulose suz...@in.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
 index 734c376..2874c19 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -16,12 +16,14 @@
  #define KVM_FEATURE_CLOCKSOURCE  0
  #define KVM_FEATURE_NOP_IO_DELAY 1
  #define KVM_FEATURE_MMU_OP   2
 +
  /* This indicates that the new set of kvmclock msrs
   * are available. The use of 0x11 and 0x12 is deprecated
   */
 #define KVM_FEATURE_CLOCKSOURCE2	3
  #define KVM_FEATURE_ASYNC_PF 4
  #define KVM_FEATURE_STEAL_TIME   5
 +#define KVM_FEATURE_WAIT_FOR_KICK   6
  
  /* The last 8 bits are used to indicate how to interpret the flags field
   * in pvclock structure. If no bits are set, all flags are ignored.
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 84a28ea..b43fd18 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2077,6 +2077,7 @@ int kvm_dev_ioctl_check_extension(long ext)
   case KVM_CAP_XSAVE:
   case KVM_CAP_ASYNC_PF:
   case KVM_CAP_GET_TSC_KHZ:
 + case KVM_CAP_WAIT_FOR_KICK:
   r = 1;
   break;
   case KVM_CAP_COALESCED_MMIO:
 @@ -2548,7 +2549,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		     (1 << KVM_FEATURE_NOP_IO_DELAY) |
 		     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 		     (1 << KVM_FEATURE_ASYNC_PF) |
 -		     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
 +		     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
 +		     (1 << KVM_FEATURE_WAIT_FOR_KICK);
  
 	if (sched_info_on())
 		entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
 @@ -5231,6 +5233,61 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
   return 1;
  }
  
 +/*
 + * kvm_pv_wait_for_kick_op : Block until kicked by either a KVM_HC_KICK_CPU
 + * hypercall or an event like an interrupt.
 + *
 + * @vcpu : vcpu which is blocking.
 + */
 +static void kvm_pv_wait_for_kick_op(struct kvm_vcpu *vcpu)
 +{
 +	DEFINE_WAIT(wait);
 +
 +	/*
 +	 * Blocking on vcpu->wq allows us to wake up sooner if required to
 +	 * service pending events (like interrupts).
 +	 *
 +	 * Also set state to TASK_INTERRUPTIBLE before checking vcpu->kicked to
 +	 * avoid racing with kvm_pv_kick_cpu_op().
 +	 */
 +	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 +
 +	/*
 +	 * Somebody has already tried kicking us. Acknowledge that
 +	 * and terminate the wait.
 +	 */
 +	if (vcpu->kicked) {
 +		vcpu->kicked = 0;
 +		goto end_wait;
 +	}
 +
 +	/* Let's wait for either KVM_HC_KICK_CPU or some other event
 +	 * to wake us up.
 +	 */
 +
 +	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 +	schedule();
 +	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 +
 +end_wait:
 +	finish_wait(&vcpu->wq, &wait);
 +}
 +
 +/*
 + * kvm_pv_kick_cpu_op:  Kick a vcpu.
 + *
 + * @cpu - vcpu to be kicked.
 + */
 +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
 +{
 +	struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);
 +
 +	if (vcpu) {
 +		vcpu->kicked = 1;

I'm not sure about it, but maybe we want a memory barrier over here?

 +		wake_up_interruptible(&vcpu->wq);
 +	}
 +}
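
One possible pairing, as a minimal sketch of Sasha's suggestion (not part of
the posted patch; whether plain smp_wmb()/smp_rmb() suffice here depends on
the barriers already implied by prepare_to_wait()/wake_up()):

	/* kicker side (kvm_pv_kick_cpu_op) */
	vcpu->kicked = 1;
	smp_wmb();                  /* publish the flag before waking */
	wake_up_interruptible(&vcpu->wq);

	/* waiter side, after prepare_to_wait() */
	smp_rmb();                  /* pairs with the kicker's smp_wmb() */
	if (vcpu->kicked) {
		vcpu->kicked = 0;
		goto end_wait;
	}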
-- 

Sasha.



Re: [PATCH RFC V2 5/5] kvm guest : pv-ticketlocks support for linux guests running on KVM hypervisor

2011-10-24 Thread Sasha Levin
On Mon, 2011-10-24 at 00:37 +0530, Raghavendra K T wrote:
 This patch extends Linux guests running on the KVM hypervisor to support
 pv-ticketlocks. Very early during bootup, the paravirtualized KVM guest
 detects whether the hypervisor has the required feature
 (KVM_FEATURE_WAIT_FOR_KICK) to support pv-ticketlocks. If so, support for
 pv-ticketlocks is registered via pv_lock_ops.
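
On the guest side, that detection typically looks like this (a sketch;
KVM_FEATURE_WAIT_FOR_KICK is the bit added by patch 3/5, the helper name
is hypothetical):

	#include <asm/kvm_para.h>

	static bool kvm_pv_ticketlocks_supported(void)	/* hypothetical */
	{
		/* kvm_para_available() checks for the KVM cpuid signature;
		 * kvm_para_has_feature() tests a KVM_FEATURE_* bit. */
		return kvm_para_available() &&
		       kvm_para_has_feature(KVM_FEATURE_WAIT_FOR_KICK);
	}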
 
 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 Signed-off-by: Suzuki Poulose suz...@in.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
 index 2874c19..c7f34b7 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -195,10 +195,18 @@ void kvm_async_pf_task_wait(u32 token);
  void kvm_async_pf_task_wake(u32 token);
  u32 kvm_read_and_reset_pf_reason(void);
  extern void kvm_disable_steal_time(void);
 -#else
 +
 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
 +void __init kvm_guest_early_init(void);
 +#else /* CONFIG_PARAVIRT_SPINLOCKS */
 +#define kvm_guest_early_init() do { } while (0)

This should be defined as an empty function.
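
That is, something like (a sketch of the suggestion):

	static inline void kvm_guest_early_init(void) { }

An empty static inline keeps type checking of the call sites even when
CONFIG_PARAVIRT_SPINLOCKS is off, which the do-while macro does not.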

-- 

Sasha.



Re: [PATCH] KVM: MMU: fix the condition of syncing a new shadow page

2011-10-24 Thread Zhao Jin
2011/10/24 Xiao Guangrong xiao.guangr...@qq.com:
 On 2011/10/24 15:21, Zhao Jin wrote:
 Should be "or" since a new shadow page is synced if either it is
 not a leaf or there already exists another unsync shadow page with
 the same gfn.


 It is obviously wrong; we need to sync pages only if there is an unsync page
 *and* the new shadow page breaks the unsync rule (only a level-1 sp can
 become unsync).


Please ignore this patch as I had made an incorrect assumption.
Thanks very much for the correction.


Re: [Qemu-devel] [RFC v2 PATCH 5/4 PATCH] virtio-net: send gratuitous packet when needed

2011-10-24 Thread Ben Hutchings
On Mon, 2011-10-24 at 07:25 +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 02:54:59PM +1030, Rusty Russell wrote:
  On Sat, 22 Oct 2011 13:43:11 +0800, Jason Wang jasow...@redhat.com wrote:
   This makes the virtio-net driver send a gratuitous packet via a new
   config bit - VIRTIO_NET_S_ANNOUNCE - in each config update
   interrupt. When this bit is set by the backend, the driver schedules
   a workqueue to send a gratuitous packet through NETDEV_NOTIFY_PEERS.
   
   This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.
   
   Signed-off-by: Jason Wang jasow...@redhat.com
  
  This seems like a huge layering violation.  Imagine this in real
  hardware, for example.
 
 commits 06c4648d46d1b757d6b9591a86810be79818b60c
 and 99606477a5888b0ead0284fecb13417b1da8e3af
 document the need for this:
 
 NETDEV_NOTIFY_PEERS notifier indicates that a device moved to a 
 different physical link.
   and
 In real hardware such notifications are only
 generated when the device comes up or the address changes.
 
  So the hypervisor could get the same behaviour by sending link up/down
  events; this is just an optimization so the guest won't do
  unnecessary stuff like trying to reconfigure an IP address.
 
 
 Maybe LOCATION_CHANGE would be a better name?
[...]

We also use this in bonding failover, where the system location doesn't
change but a different link is used.  However, I do recognise that the
name ought to indicate what kind of change happened and not what the
expected action is.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [PATCH RFC V2 2/5] debugfs: Renaming of xen functions and change unsigned to u32

2011-10-24 Thread Raghavendra K T

On 10/24/2011 03:49 AM, Greg KH wrote:

On Mon, Oct 24, 2011 at 12:34:59AM +0530, Raghavendra K T wrote:

Renaming of xen functions and change unsigned to u32.


Why not just rename when you move the functions?  Why the extra step?

The intention was only clarity. Yes, if this patch is an overhead, I'll
combine both patches.

greg k-h





Re: [PATCH RFC V2 1/5] debugfs: Add support to print u32 array in debugfs

2011-10-24 Thread Raghavendra K T

On 10/24/2011 03:50 AM, Greg KH wrote:

On Mon, Oct 24, 2011 at 12:34:04AM +0530, Raghavendra K T wrote:

Add debugfs support to print u32-arrays in debugfs. Move the code from Xen
to debugfs to make the code common for other users as well.


You forgot the kerneldoc for the function explaining what it is and how
to use it, and the EXPORT_SYMBOL_GPL() marking for the global function
as that's the only way it will be able to be used, right?


Greg, right. Thanks for finding this. I'll update the patch for that.
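
For reference, the shape Greg is asking for is roughly (a sketch only; the
name and signature here are assumptions, not the posted code):

	/**
	 * debugfs_create_u32_array - create a file for reading a u32 array
	 * @name: name of the file to create
	 * @mode: permission bits for the file
	 * @parent: parent dentry, or NULL for the debugfs root
	 * @array: array to expose through the file
	 * @elements: number of elements in @array
	 *
	 * Returns the dentry of the new file, or an ERR_PTR on failure.
	 */
	struct dentry *debugfs_create_u32_array(const char *name, mode_t mode,
						struct dentry *parent,
						u32 *array, u32 elements)
	{
		/* body: debugfs_create_file() with array-printing fops */
	}
	EXPORT_SYMBOL_GPL(debugfs_create_u32_array);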

thanks,

greg k-h





Re: [PATCH RFC V2 5/5] kvm guest : pv-ticketlocks support for linux guests running on KVM hypervisor

2011-10-24 Thread Raghavendra K T

On 10/24/2011 03:31 PM, Sasha Levin wrote:

On Mon, 2011-10-24 at 00:37 +0530, Raghavendra K T wrote:

+#else /* CONFIG_PARAVIRT_SPINLOCKS */
+#define kvm_guest_early_init() do { } while (0)


This should be defined as an empty function.


Yes, agreed. I'll change it to an empty function.
- Raghu



Re: [PATCH RFC V2 2/5] debugfs: Renaming of xen functions and change unsigned to u32

2011-10-24 Thread Greg KH
On Mon, Oct 24, 2011 at 02:58:47PM +0530, Raghavendra K T wrote:
 On 10/24/2011 03:49 AM, Greg KH wrote:
 On Mon, Oct 24, 2011 at 12:34:59AM +0530, Raghavendra K T wrote:
 Renaming of xen functions and change unsigned to u32.
 
 Why not just rename when you move the functions?  Why the extra step?
 
 The intention was only clarity. Yes, if this patch is an overhead, I'll
 combine both patches.

Yeah, it makes more sense as it originally confused me why you were
adding a xen_* function to the debugfs core code :)

thanks,

greg k-h


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Avi Kivity
On 10/21/2011 11:19 AM, Jan Kiszka wrote:
 Currently, MSI messages can only be injected to in-kernel irqchips by
 defining a corresponding IRQ route for each message. This is not only
 unhandy if the MSI messages are generated on the fly by user space,
 IRQ routes are also a limited resource that user space has to manage
 carefully.

By itself, this does not provide enough value to offset the cost of a
new ABI, especially as userspace will need to continue supporting the
old method for a very long while.

 By providing direct injection, we can both avoid using up limited
 resources and simplify the necessary steps for user land. The API
 already provides a channel (flags) to revoke an injected but not yet
 delivered message which will become important for in-kernel MSI-X vector
 masking support.


With the new feature it may be worthwhile, but I'd like to see the whole
thing, with numbers attached.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-10-24 Thread Michael S. Tsirkin
On Wed, Oct 19, 2011 at 12:12:20PM +0200, Michael S. Tsirkin wrote:
 On Thu, Jun 09, 2011 at 06:41:56AM -0400, Mark Wu wrote:
  On 06/09/2011 05:14 AM, Tejun Heo wrote:
   Hello,
   
   On Thu, Jun 09, 2011 at 08:51:05AM +0930, Rusty Russell wrote:
   On Wed, 08 Jun 2011 09:08:29 -0400, Mark Wu d...@redhat.com wrote:
   Hi Rusty,
    Yes, I can't figure out an instance of disk probing in parallel either,
    but as per the following commit, I think we still need to use a lock
    for safety. What's your opinion?
  
   commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6
   Author: Tejun Heo t...@kernel.org
   Date:   Sat Feb 21 11:04:45 2009 +0900
  
   [SCSI] sd: revive sd_index_lock
  
   Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd 
   to
   use ida instead of idr incorrectly removed sd_index_lock around id
   allocation and free.  idr/ida do have internal locks but they 
   protect
   their free object lists not the allocation itself.  The caller is
    responsible for that.  This missing synchronization led to the same id
    being assigned to multiple devices leading to oops.
  
    I'm confused.  Tejun, Greg, anyone: can probes happen in parallel?
  
   If so, I'll have to review all my drivers.
   
   Unless async is explicitly used, probe happens sequentially.  IOW, if
   there's no async_schedule() call, things won't happen in parallel.
   That said, I think it wouldn't be such a bad idea to protect ida with
   spinlock regardless unless the probe code explicitly requires
   serialization.
   
   Thanks.
   
  Since the virtio blk driver doesn't use async probe, it needn't use a
  spinlock to protect the ida, so the lock has been removed from the patch.
  
  From fbb396df9dbf8023f1b268be01b43529a3993d57 Mon Sep 17 00:00:00 2001
  From: Mark Wu d...@redhat.com
  Date: Thu, 9 Jun 2011 06:34:07 -0400
  Subject: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index
  
  Current index allocation in virtio-blk is based on a monotonically
  increasing variable "index". It could cause some confusion about disk
  names in the case of hot-plugging disks. And it's impossible to find the
  lowest available index by just maintaining a simple index. So it's
  changed to use ida to allocate the index, referring to the index
  allocation in the scsi disk driver.
  
  Signed-off-by: Mark Wu d...@redhat.com
 
 Acked-by: Michael S. Tsirkin m...@redhat.com
 
 This got lost in the noise and missed 3.1 which is unfortunate.
 How about we apply this as is and look at cleanups as a next step?

Rusty, any opinion on merging this for 3.2?
I expect merge window will open right after the summit,
so need to decide soon ...

  ---
   drivers/block/virtio_blk.c |   28 +++-
   1 files changed, 23 insertions(+), 5 deletions(-)
  
  diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
  index 079c088..bf81ab6 100644
  --- a/drivers/block/virtio_blk.c
  +++ b/drivers/block/virtio_blk.c
  @@ -8,10 +8,13 @@
   #include <linux/scatterlist.h>
   #include <linux/string_helpers.h>
   #include <scsi/scsi_cmnd.h>
  +#include <linux/idr.h>
   
   #define PART_BITS 4
   
  -static int major, index;
  +static int major;
  +static DEFINE_IDA(vd_index_ida);
  +
   struct workqueue_struct *virtblk_wq;
   
   struct virtio_blk
  @@ -23,6 +26,7 @@ struct virtio_blk
   
  /* The disk structure for the kernel. */
  struct gendisk *disk;
  +   u32 index;
   
  /* Request tracking. */
  struct list_head reqs;
  @@ -343,12 +347,23 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
  	struct request_queue *q;
  	int err;
  	u64 cap;
  -	u32 v, blk_size, sg_elems, opt_io_size;
  +	u32 v, blk_size, sg_elems, opt_io_size, index;
  u16 min_io_size;
  u8 physical_block_exp, alignment_offset;
   
  -	if (index_to_minor(index) >= 1 << MINORBITS)
  -		return -ENOSPC;
  +	do {
  +		if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
  +			return -ENOMEM;
  +		err = ida_get_new(&vd_index_ida, &index);
  +	} while (err == -EAGAIN);
  +
  +	if (err)
  +		return err;
  +
  +	if (index_to_minor(index) >= 1 << MINORBITS) {
  +		err = -ENOSPC;
  +		goto out_free_index;
  +	}
   
  /* We need to know how many segments before we allocate. */
  err = virtio_config_val(vdev, VIRTIO_BLK_F_SEG_MAX,
  @@ -421,7 +436,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
  	vblk->disk->private_data = vblk;
  	vblk->disk->fops = &virtblk_fops;
  	vblk->disk->driverfs_dev = &vdev->dev;
  -	index++;
  +	vblk->index = index;
   
  /* configure queue flush support */
  if (virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
  @@ -516,6 +531,8 @@ out_free_vq:
  	vdev->config->del_vqs(vdev);
   out_free_vblk:
  	kfree(vblk);
  +out_free_index:
  +	ida_remove(&vd_index_ida, index);
   out:
  	return err;
   }
  @@ -538,6 +555,7 @@ static void __devexit virtblk_remove(struct 
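
For comparison, the locked variant Tejun describes above (modelled on sd's
sd_index_lock; a sketch with hypothetical names, not part of the posted
patch):

	static DEFINE_SPINLOCK(vd_index_lock);	/* hypothetical */
	static DEFINE_IDA(vd_index_ida);

	static int vd_alloc_index(int *index)	/* hypothetical helper */
	{
		int err;

		do {
			if (!ida_pre_get(&vd_index_ida, GFP_KERNEL))
				return -ENOMEM;
			/* ida's internal locks only protect its free lists,
			 * so serialize the allocation itself: */
			spin_lock(&vd_index_lock);
			err = ida_get_new(&vd_index_ida, index);
			spin_unlock(&vd_index_lock);
		} while (err == -EAGAIN);

		return err;
	}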
  

Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-10-24 Thread Jens Axboe
On 2011-10-24 12:02, Michael S. Tsirkin wrote:
 On Wed, Oct 19, 2011 at 12:12:20PM +0200, Michael S. Tsirkin wrote:
 On Thu, Jun 09, 2011 at 06:41:56AM -0400, Mark Wu wrote:
 On 06/09/2011 05:14 AM, Tejun Heo wrote:
 Hello,

 On Thu, Jun 09, 2011 at 08:51:05AM +0930, Rusty Russell wrote:
 On Wed, 08 Jun 2011 09:08:29 -0400, Mark Wu d...@redhat.com wrote:
 Hi Rusty,
 Yes, I can't figure out an instance of disk probing in parallel either,
 but as per the following commit, I think we still need to use a lock
 for safety. What's your opinion?

 commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6
 Author: Tejun Heo t...@kernel.org
 Date:   Sat Feb 21 11:04:45 2009 +0900

 [SCSI] sd: revive sd_index_lock

 Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd to
 use ida instead of idr incorrectly removed sd_index_lock around id
 allocation and free.  idr/ida do have internal locks but they protect
 their free object lists not the allocation itself.  The caller is
 responsible for that.  This missing synchronization led to the same id
 being assigned to multiple devices leading to oops.

 I'm confused.  Tejun, Greg, anyone: can probes happen in parallel?

 If so, I'll have to review all my drivers.

 Unless async is explicitly used, probe happens sequentially.  IOW, if
 there's no async_schedule() call, things won't happen in parallel.
 That said, I think it wouldn't be such a bad idea to protect ida with
 spinlock regardless unless the probe code explicitly requires
 serialization.

 Thanks.

 Since the virtio blk driver doesn't use async probe, it needn't use a
 spinlock to protect the ida, so the lock has been removed from the patch.

 From fbb396df9dbf8023f1b268be01b43529a3993d57 Mon Sep 17 00:00:00 2001
 From: Mark Wu d...@redhat.com
 Date: Thu, 9 Jun 2011 06:34:07 -0400
 Subject: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

 Current index allocation in virtio-blk is based on a monotonically
 increasing variable "index". It could cause some confusion about disk
 names in the case of hot-plugging disks. And it's impossible to find the
 lowest available index by just maintaining a simple index. So it's
 changed to use ida to allocate the index, referring to the index
 allocation in the scsi disk driver.

 Signed-off-by: Mark Wu d...@redhat.com

 Acked-by: Michael S. Tsirkin m...@redhat.com

 This got lost in the noise and missed 3.1 which is unfortunate.
 How about we apply this as is and look at cleanups as a next step?
 
 Rusty, any opinion on merging this for 3.2?
 I expect merge window will open right after the summit,

I can toss it into for-3.2/drivers, if there's consensus to do that now.

-- 
Jens Axboe



Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock

2011-10-24 Thread Avi Kivity
On 10/23/2011 09:05 PM, Raghavendra K T wrote:
 Add two hypercalls to KVM hypervisor to support pv-ticketlocks.
 
 KVM_HC_WAIT_FOR_KICK blocks the calling vcpu until another vcpu kicks it or
 it is woken up because of an event like an interrupt.

 KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu.
 
 The presence of these hypercalls is indicated to the guest via
 KVM_FEATURE_WAIT_FOR_KICK/KVM_CAP_WAIT_FOR_KICK.

 Qemu needs a corresponding patch to pass up the presence of this feature
 to the guest via cpuid. The patch to qemu will be sent separately.

 There is no Xen/KVM hypercall interface to await a kick from.
 
  
 +/*
 + * kvm_pv_wait_for_kick_op : Block until kicked by either a KVM_HC_KICK_CPU
 + * hypercall or an event like an interrupt.
 + *
 + * @vcpu : vcpu which is blocking.
 + */
 +static void kvm_pv_wait_for_kick_op(struct kvm_vcpu *vcpu)
 +{
 +	DEFINE_WAIT(wait);
 +
 +	/*
 +	 * Blocking on vcpu->wq allows us to wake up sooner if required to
 +	 * service pending events (like interrupts).
 +	 *
 +	 * Also set state to TASK_INTERRUPTIBLE before checking vcpu->kicked to
 +	 * avoid racing with kvm_pv_kick_cpu_op().
 +	 */
 +	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 +
 +	/*
 +	 * Somebody has already tried kicking us. Acknowledge that
 +	 * and terminate the wait.
 +	 */
 +	if (vcpu->kicked) {
 +		vcpu->kicked = 0;
 +		goto end_wait;
 +	}
 +
 +	/* Let's wait for either KVM_HC_KICK_CPU or some other event
 +	 * to wake us up.
 +	 */
 +
 +	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 +	schedule();
 +	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 +
 +end_wait:
 +	finish_wait(&vcpu->wq, &wait);
 +}

This hypercall can be replaced by a HLT instruction, no?

I'm pretty sure this misses a lot of stuff from kvm_vcpu_block().

 +
 +/*
 + * kvm_pv_kick_cpu_op:  Kick a vcpu.
 + *
 + * @cpu - vcpu to be kicked.
 + */
 +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
 +{
 + struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);
 +

Is the vcpu number meaningful?  We should reuse an existing identifier
like the APIC ID.

 +	if (vcpu) {
 +		vcpu->kicked = 1;

Need to use smp memory barriers here.

 +		wake_up_interruptible(&vcpu->wq);
 +	}
 +}
 +
  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
  {
   unsigned long nr, a0, a1, a2, a3, ret;


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH RFC V2 4/5] kvm guest : Added configuration support to enable debug information for KVM Guests

2011-10-24 Thread Avi Kivity
On 10/23/2011 09:07 PM, Raghavendra K T wrote:
 Added configuration support to enable debug information
 for KVM Guests in debugfs
 
 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 Signed-off-by: Suzuki Poulose suz...@in.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
 diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
 index 1f03f82..ed34269 100644
 --- a/arch/x86/Kconfig
 +++ b/arch/x86/Kconfig
 @@ -562,6 +562,15 @@ config KVM_GUEST
 This option enables various optimizations for running under the KVM
 hypervisor.
  
 +config KVM_DEBUG_FS
 + bool "Enable debug information for KVM Guests in debugfs"
 + depends on KVM_GUEST
 + default n
 + ---help---
 +   This option enables collection of various statistics for KVM guest.
 +   Statistics are displayed in debugfs filesystem. Enabling this option
 +   may incur significant overhead.
 +
  source arch/x86/lguest/Kconfig
  


This might be better implemented through tracepoints, which can be
enabled dynamically.
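
For instance, a wait-time tracepoint could look like this (a sketch; the
event name and fields are hypothetical):

	TRACE_EVENT(kvm_guest_lock_wait,
		TP_PROTO(unsigned long lock_addr, u64 wait_ns),
		TP_ARGS(lock_addr, wait_ns),
		TP_STRUCT__entry(
			__field(unsigned long, lock_addr)
			__field(u64, wait_ns)
		),
		TP_fast_assign(
			__entry->lock_addr = lock_addr;
			__entry->wait_ns = wait_ns;
		),
		TP_printk("lock %lx waited %llu ns",
			  __entry->lock_addr, __entry->wait_ns)
	);

Unlike Kconfig-gated debugfs counters, such events cost (almost) nothing
when disabled and can be turned on per-event at run time.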

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Jan Kiszka
On 2011-10-24 11:45, Avi Kivity wrote:
 On 10/21/2011 11:19 AM, Jan Kiszka wrote:
 Currently, MSI messages can only be injected to in-kernel irqchips by
 defining a corresponding IRQ route for each message. This is not only
 unhandy if the MSI messages are generated on the fly by user space,
 IRQ routes are also a limited resource that user space has to manage
 carefully.
 
 By itself, this does not provide enough value to offset the cost of a
 new ABI, especially as userspace will need to continue supporting the
 old method for a very long while.

Yes, but less sophisticated than it would be now.

 
 By providing direct injection, we can both avoid using up limited
 resources and simplify the necessary steps for user land. The API
 already provides a channel (flags) to revoke an injected but not yet
 delivered message which will become important for in-kernel MSI-X vector
 masking support.

 
 With the new feature it may be worthwhile, but I'd like to see the whole
 thing, with numbers attached.

It's not a performance issue, it's a resource limitation issue: With the
new API we can stop worrying about user space device models consuming
limited IRQ routes of the KVM subsystem.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Chris Webb
I have a qemu-kvm guest (apparently a Ubuntu 11.04 x86-64 install) which has
stopped and refuses to continue:

  (qemu) info status
  VM status: paused
  (qemu) cont
  (qemu) info status
  VM status: paused

The host is running linux 2.6.39.2 with qemu-kvm 0.14.1 on a 24-core Opteron
6176 box, and has nine other 2GB production guests on it running absolutely
fine.

It's been a while since I've seen one of these. When I last saw a cluster of
them, they were emulation failures (big real mode instructions, maybe?). I
also remember a message about abnormal exit in the dmesg previously, but I
don't have that here. This time, there is no host kernel output at all, just
the paused guest.

I have qemu monitor access and can even strace the relevant qemu process if
necessary: is it possible to use this to diagnose what's caused this guest
to stop, e.g. the unsupported instruction if it's an emulation failure?

Cheers,

Chris.


Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Kevin Wolf
On 24.10.2011 12:00, Chris Webb wrote:
 I have a qemu-kvm guest (apparently a Ubuntu 11.04 x86-64 install) which has
 stopped and refuses to continue:
 
   (qemu) info status
   VM status: paused
   (qemu) cont
   (qemu) info status
   VM status: paused
 
 The host is running linux 2.6.39.2 with qemu-kvm 0.14.1 on 24-core Opteron
 6176 box, and has nine other 2GB production guests on it running absolutely
 fine.
 
 It's been a while since I've seen one of these. When I last saw a cluster of
 them, they were emulation failures (big real mode instructions, maybe?). I
 also remember a message about abnormal exit in the dmesg previously, but I
 don't have that here. This time, there is no host kernel output at all, just
 the paused guest.
 
 I have qemu monitor access and can even strace the relevant qemu process if
 necessary: is it possible to use this to diagnose what's caused this guest
 to stop, e.g. the unsupported instruction if it's an emulation failure?

Another common cause of stopped VMs is I/O errors, for example writes
to a sparse image when the disk is full.

Kevin


Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Chris Webb
Kevin Wolf kw...@redhat.com writes:

 Am 24.10.2011 12:00, schrieb Chris Webb:
  I have qemu monitor access and can even strace the relevant qemu process if
  necessary: is it possible to use this to diagnose what's caused this guest
  to stop, e.g. the unsupported instruction if it's an emulation failure?
 
 Another common cause for stopped VMs are I/O errors, for example writes
 to a sparse image when the disk is full.

This guest is backed by LVM LVs so I don't think they can return ENOSPC, but
I could imagine read errors, so I've just done a trivial test to make sure I
can read them end-to-end:

  0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:0 of=/dev/null bs=1M
  3136+0 records in
  3136+0 records out
  3288334336 bytes (3.3 GB) copied, 20.898 s, 157 MB/s

  0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:1 of=/dev/null bs=1M
  276+0 records in
  276+0 records out
  289406976 bytes (289 MB) copied, 1.85218 s, 156 MB/s

Is there any way to ask qemu why a guest has stopped, so I can distinguish IO
problems from emulation problems from anything else?

Cheers,

Chris.


Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock

2011-10-24 Thread Raghavendra K T

On 10/24/2011 03:31 PM, Sasha Levin wrote:

On Mon, 2011-10-24 at 00:35 +0530, Raghavendra K T wrote:

Add two hypercalls to KVM hypervisor to support pv-ticketlocks.

+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
+{
+	struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);
+
+	if (vcpu) {
+		vcpu->kicked = 1;


I'm not sure about it, but maybe we want a memory barrier over here?


Yes, thanks for pointing this out. Avi Kivity made the same point. I'll add
barrier() here.





+		wake_up_interruptible(&vcpu->wq);
+	}
+}




KVM call agenda for October 25

2011-10-24 Thread Juan Quintela

Hi

Please send in any agenda items you are interested in covering.

Thanks, Juan.



Re: [PATCH 2/3] kvm: use this_cpu_xxx replace percpu_xxx funcs

2011-10-24 Thread Avi Kivity
On 10/24/2011 04:50 AM, Alex,Shi wrote:
 On Thu, 2011-10-20 at 15:34 +0800, Alex,Shi wrote:
  percpu_xxx funcs are duplicated with this_cpu_xxx funcs, so replace them
  for further code clean up.
  
  And in a preempt-safe scenario, the __this_cpu_xxx funcs have a bit better
  performance since __this_cpu_xxx has no redundant preempt_disable().
  

 Avi:
 Would you like to give some comments on this?


Sorry, was travelling:

Acked-by: Avi Kivity a...@redhat.com

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Avi Kivity
On 10/24/2011 12:19 PM, Jan Kiszka wrote:
  
  With the new feature it may be worthwhile, but I'd like to see the whole
  thing, with numbers attached.

 It's not a performance issue, it's a resource limitation issue: With the
 new API we can stop worrying about user space device models consuming
 limited IRQ routes of the KVM subsystem.


Only if those devices are in the same process (or have access to the
vmfd).  Interrupt routing together with irqfd allows you to disaggregate
the device model.  Instead of providing a competing implementation with
new limitations, we need to remove the limitations of the old
implementation.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Nested Virtualization Of Hyper-V 2K8R2

2011-10-24 Thread Jim

Hi

Anyone got any further ideas on how I can get the Hyper-V guest to work?
My kvm is 0.14 (Ubuntu 11.04 Server) - is this just too old?


Jim


On 19/10/2011 16:07, Jim wrote:



On 19/10/2011 16:06, Jim wrote:

Hi Joerg,

I added the -cpu phenom,-hv but it made no difference.  I then tried
to call it from the command line (rather than via virsh) and got this:


# /usr/bin/kvm -cpu phenom,-hv
*CPU feature hv not found*


I played around a little and found 'svm' seemed to be a supported cpu 
flag but both +svm and -svm made no difference either.  Alas kvm -cpu 
? only listed CPUs and not the options the various ones support.


Am I on too low a version of kvm perhaps ?  This is an Ubuntu 11.04 
server system and I've just used the Ubuntu packages - I did not 
build kvm myself.


Thanks
Jim

My CPU reports as :

*processor: 0-3  i.e. 4 cores*
vendor_id: AuthenticAMD
cpu family: 16
model: 2
model name: Quad-Core AMD Opteron(tm) Processor 1354
stepping: 3
cpu MHz: 1100.000
cache size: 512 KB
physical id: 0
siblings: 4
core id: 3
cpu cores: 4
apicid: 3
initial apicid: 3
fpu: yes
fpu_exception: yes
cpuid level: 5
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext 
fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl 
nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy 
svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs 
npt lbrv svm_lock

bogomips: 4400.04
TLB size: 1024 4K pages
clflush size: 64
cache_alignment: 64
address sizes: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate



On 19/10/2011 15:19, Joerg Roedel wrote:

Hi Jim,

On Tue, Oct 18, 2011 at 07:28:52PM +0100, Jim wrote:

Sure, the KVM command is :

/usr/bin/kvm -enable-nesting -no-kvm-irqchip -S -M pc-0.14 -enable-kvm
-m 2048 -smp 2,sockets=2,cores=1,threads=1 -name hyperv1 -uuid
8c5d8f1f-5767-b388-d408-1b53a1b66e72 -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/hyperv1.monitor,server,nowait 


-mon chardev=charmonitor,id=monitor,mode=readline -rtc base=localtime
-no-reboot -boot d -drive
file=/srv/hyperv/hyperv1.vmimg,if=none,id=drive-ide0-0-0,format=raw
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0
-drive
file=/srv/virtual-machines/fromiscsi/iso/W2K8ENTR2SP1.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw 


-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
-drive
file=/srv/virtual-machines/fromiscsi/iso/virtio-win-1.1.16.iso,if=none,media=cdrom,id=drive-ide0-1-1,readonly=on,format=raw 


-device ide-drive,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1
-netdev tap,fd=17,id=hostnet0 -device
rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:2a:be:2f,bus=pci.0,addr=0x3 


-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -usb -device
usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga std -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

This is missing a -cpu parameter. Please try again with adding
'-cpu phenom,-hv'. This is the combination I used during testing and
development.


Joerg









Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Kevin Wolf
On 24.10.2011 12:58, Chris Webb wrote:
 Kevin Wolf kw...@redhat.com writes:
 
 Am 24.10.2011 12:00, schrieb Chris Webb:
 I have qemu monitor access and can even strace the relevant qemu process if
 necessary: is it possible to use this to diagnose what's caused this guest
 to stop, e.g. the unsupported instruction if it's an emulation failure?

 Another common cause for stopped VMs are I/O errors, for example writes
 to a sparse image when the disk is full.
 
  This guest is backed by LVM LVs so I don't think they can return ENOSPC,
  but I could imagine read errors, so I've just done a trivial test to make
  sure I can read them end-to-end:
 
    0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:0 of=/dev/null bs=1M
   3136+0 records in
   3136+0 records out
   3288334336 bytes (3.3 GB) copied, 20.898 s, 157 MB/s
 
    0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:1 of=/dev/null bs=1M
   276+0 records in
   276+0 records out
   289406976 bytes (289 MB) copied, 1.85218 s, 156 MB/s
 
 Is there any way to ask qemu why a guest has stopped, so I can distinguish IO
 problems from emulation problems from anything else?

In qemu 1.0 we'll have an extended 'info status' that includes the stop
reason, but 0.14 doesn't have this yet (was committed to git master only
recently).

If you attach a QMP monitor (see QMP/README, don't forget to send the
capabilities command, it's part of creating the connection) you will
receive messages for I/O errors, though.
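
The handshake and the kind of event to expect look roughly like this (a
sketch; the device name and payload details are illustrative):

  {"QMP": {"version": { ... }, "capabilities": []}}     (greeting from qemu)
  {"execute": "qmp_capabilities"}                       (sent by the client)
  {"return": {}}
  {"event": "BLOCK_IO_ERROR",                           (async event on error)
   "data": {"device": "ide0-hd0", "operation": "write", "action": "stop"}}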

Kevin


Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock

2011-10-24 Thread Raghavendra K T

On 10/24/2011 03:44 PM, Avi Kivity wrote:

On 10/23/2011 09:05 PM, Raghavendra K T wrote:

Add two hypercalls to KVM hypervisor to support pv-ticketlocks.
+
+end_wait:
+	finish_wait(&vcpu->wq, &wait);
+}


This hypercall can be replaced by a HLT instruction, no?

I'm pretty sure this misses a lot of stuff from kvm_vcpu_block().


Yes, agreed. HLT sounds like a better idea. I'll try this out.




+	if (vcpu) {
+		vcpu->kicked = 1;


Need to use smp memory barriers here.


Agree.




+		wake_up_interruptible(&vcpu->wq);
+	}
+}
+
  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
  {
unsigned long nr, a0, a1, a2, a3, ret;







Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Chris Webb
Kevin Wolf kw...@redhat.com writes:

 In qemu 1.0 we'll have an extended 'info status' that includes the stop
 reason, but 0.14 doesn't have this yet (was committed to git master only
 recently).

Right, okay. I might take a look at cherry-picking and back-porting that to
our version of qemu-kvm if it's not too entangled with other changes. It
would be very useful in these situations.

 If you attach a QMP monitor (see QMP/README, don't forget to send the
 capabilities command, it's part of creating the connection) you will
 receive messages for I/O errors, though.

Thanks. I don't think I can do this with an already-running qemu-kvm that's
in a stopped state, can I? Only with a new qemu-kvm invocation, waiting to
try to catch the problem again?

Cheers,

Chris.


Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Kevin Wolf
On 24.10.2011 13:29, Chris Webb wrote:
 Kevin Wolf kw...@redhat.com writes:
 
 In qemu 1.0 we'll have an extended 'info status' that includes the stop
 reason, but 0.14 doesn't have this yet (was committed to git master only
 recently).
 
 Right, okay. I might take a look at cherry-picking and back-porting that to
 our version of qemu-kvm if it's not too entangled with other changes. It
 would be very useful in these situations.

I'm afraid that it depends on many other changes, but you can try.

 
 If you attach a QMP monitor (see QMP/README, don't forget to send the
 capabilities command, it's part of creating the connection) you will
 receive messages for I/O errors, though.
 
 Thanks. I don't think I can do this with an already-running qemu-kvm that's
 in a stopped state can I, only with a new qemu-kvm invocation and wait to
 try to catch the problem again?

Good point... The only other thing that I can think of would be
attaching gdb and setting a breakpoint in vm_stop() or something.

Kevin


Re: [Qemu-devel] KVM call agenda for October 25

2011-10-24 Thread Peter Maydell
On 24 October 2011 12:35, Paolo Bonzini pbonz...@redhat.com wrote:
 On 10/24/2011 01:04 PM, Juan Quintela wrote:
 Please send in any agenda items you are interested in covering.

 - What's left to merge for 1.0.

Things on my list, FWIW:
 * current target-arm pullreq
 * PL041 support (needs another patch round to fix a minor bug
   Andrzej spotted)
 * cpu_single_env must be thread-local

I also think that it's somewhat unfortunate that we will now compile on
ARM hosts but then always abort on startup (due to the reliance on a
working makecontext()), but I'm not really sure how to deal with that one.

-- PMM


Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)

2011-10-24 Thread Chris Webb
Kevin Wolf kw...@redhat.com writes:

 Good point... The only other thing that I can think of would be
 attaching gdb and setting a breakpoint in vm_stop() or something.

Perfect, that seems to identified what's going on very nicely:

(gdb) break vm_stop
Breakpoint 1 at 0x407d10: file /home/root/packages/qemu-kvm/src-UMBurO/cpus.c, line 318.
(gdb) fg
Continuing.

Breakpoint 1, vm_stop (reason=0)
at /home/root/packages/qemu-kvm/src-UMBurO/cpus.c:318
318	/home/root/packages/qemu-kvm/src-UMBurO/cpus.c: No such file or directory.
	in /home/root/packages/qemu-kvm/src-UMBurO/cpus.c
(gdb) bt
#0  vm_stop (reason=0) at /home/root/packages/qemu-kvm/src-UMBurO/cpus.c:318
#1  0x0058585f in ide_handle_rw_error (s=0x20330d8, error=28, op=8)
at /home/root/packages/qemu-kvm/src-UMBurO/hw/ide/core.c:468
#2  0x00588376 in ide_dma_cb (opaque=0x20330d8, ret=<value optimized out>)
at /home/root/packages/qemu-kvm/src-UMBurO/hw/ide/core.c:494
#3  0x00590092 in dma_bdrv_cb (opaque=0x2043a10, ret=-28)
at /home/root/packages/qemu-kvm/src-UMBurO/dma-helpers.c:94
#4  0x0044d64a in qcow2_aio_write_cb (opaque=0x2034900, ret=-28)
at block/qcow2.c:714
#5  0x0043df6d in posix_aio_process_queue (opaque=<value optimized out>)
at posix-aio-compat.c:462
#6  0x0043e07d in posix_aio_read (opaque=0x17c8110)
at posix-aio-compat.c:503
#7  0x00415fca in main_loop_wait (nonblocking=<value optimized out>)
at /home/root/packages/qemu-kvm/src-UMBurO/vl.c:1383
#8  0x0042ca37 in kvm_main_loop ()
at /home/root/packages/qemu-kvm/src-UMBurO/qemu-kvm.c:1589
#9  0x004170a3 in main (argc=32, argv=value optimized out, 
envp=value optimized out)
at /home/root/packages/qemu-kvm/src-UMBurO/vl.c:1429

I see what's happened here: we're not explicitly setting format=raw when we
start that guest and someone's uploaded a qcow2 image directly to a block
device. Ouch. Sorry for the noise!
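
For reference, pinning the image format explicitly protects against qemu
probing a guest-written qcow2 header on a raw device. A typical drive
specification would look like this (device path illustrative):

    qemu-kvm ... -drive file=/dev/vg0/guest0,if=virtio,format=raw,cache=none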

Best wishes,

Chris.


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Jan Kiszka
On 2011-10-24 13:09, Avi Kivity wrote:
 On 10/24/2011 12:19 PM, Jan Kiszka wrote:

 With the new feature it may be worthwhile, but I'd like to see the whole
 thing, with numbers attached.

 It's not a performance issue, it's a resource limitation issue: With the
 new API we can stop worrying about user space device models consuming
 limited IRQ routes of the KVM subsystem.

 
 Only if those devices are in the same process (or have access to the
 vmfd).  Interrupt routing together with irqfd allows you to disaggregate
 the device model.  Instead of providing a competing implementation with
 new limitations, we need to remove the limitations of the old
 implementation.

That depends on where we do the cut. Currently we let the IRQ source
signal an abstract edge on a pre-allocated pseudo IRQ line. But we
cannot build correct MSI-X on top of the current irqfd model as we lack
the level information (for PBA emulation). *) So we either need to
extend the existing model anyway -- or push per-vector masking back to
the IRQ source. In the latter case, it would be a very good chance to
give up on limited pseudo GSIs with static routes and do MSI messaging
from external IRQ sources to KVM directly.

But all those considerations affect different APIs than what I'm
proposing here. We will always need a way to inject MSIs in the context
of the VM as there will always be scenarios where devices are better run
in that very same context, for performance or simplicity or whatever
reasons. E.g., I could imagine that one would like to execute an
emulated IRQ remapper rather in the hypervisor context than
over-microkernelized in a separate process.
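
To sketch the kind of call this argues for: user space builds the MSI
message and hands it to the VM directly, with no route allocation. The
ioctl name and struct layout below follow the KVM_SIGNAL_MSI interface
and are assumptions about this RFC's shape, not a quote from it:

    /* sketch only: names are assumptions, not the RFC's exact API */
    struct kvm_msi msi = {
            .address_lo = 0xfee00000,  /* address as the guest programmed it */
            .address_hi = 0,
            .data       = 0x4041,      /* vector/delivery-mode encoding */
    };

    if (ioctl(vmfd, KVM_SIGNAL_MSI, &msi) < 0)
            perror("KVM_SIGNAL_MSI");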

Jan

*) Realized this while trying to generalize the proposed MSI-X MMIO
acceleration for assigned devices to arbitrary device models, vhost-net,
and specifically vfio.

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock

2011-10-24 Thread Srivatsa Vaddagiri
* Avi Kivity a...@redhat.com [2011-10-24 12:14:21]:

  +/*
  + * kvm_pv_wait_for_kick_op : Block until kicked by either a KVM_HC_KICK_CPU
  + * hypercall or a event like interrupt.
  + *
  + * @vcpu : vcpu which is blocking.
  + */
  +static void kvm_pv_wait_for_kick_op(struct kvm_vcpu *vcpu)
  +{

[snip]

  +}
 
 This hypercall can be replaced by a HLT instruction, no?

Good point. Assuming yield_on_hlt=1, that would allow the vcpu to be put
to sleep and let other vcpus make progress.

I guess with that change, we can also drop the need for the other hypercall
introduced in this patch (kvm_pv_kick_cpu_op()). Essentially a vcpu sleeping
because of a HLT instruction can be woken up by an IPI issued by the vcpu
releasing a lock.

- vatsa



Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Michael S. Tsirkin
On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
 On 2011-10-24 13:09, Avi Kivity wrote:
  On 10/24/2011 12:19 PM, Jan Kiszka wrote:
 
  With the new feature it may be worthwhile, but I'd like to see the whole
  thing, with numbers attached.
 
  It's not a performance issue, it's a resource limitation issue: With the
  new API we can stop worrying about user space device models consuming
  limited IRQ routes of the KVM subsystem.
 
  
  Only if those devices are in the same process (or have access to the
  vmfd).  Interrupt routing together with irqfd allows you to disaggregate
  the device model.  Instead of providing a competing implementation with
  new limitations, we need to remove the limitations of the old
  implementation.
 
 That depends on where we do the cut. Currently we let the IRQ source
 signal an abstract edge on a pre-allocated pseudo IRQ line. But we
 cannot build correct MSI-X on top of the current irqfd model as we lack
 the level information (for PBA emulation). *)


I don't agree here. IMO PBA emulation would need to
clear pending bits on interrupt status register read.
So clearing pending bits could be done by ioctl from qemu
while setting them would be done from irqfd.

 So we either need to
 extend the existing model anyway -- or push per-vector masking back to
 the IRQ source. In the latter case, it would be a very good chance to
 give up on limited pseudo GSIs with static routes and do MSI messaging
 from external IRQ sources to KVM directly.
 But all those considerations affect different APIs than what I'm
 proposing here. We will always need a way to inject MSIs in the context
 of the VM as there will always be scenarios where devices are better run
 in that very same context, for performance or simplicity or whatever
 reasons. E.g., I could imagine that one would like to execute an
 emulated IRQ remapper rather in the hypervisor context than
 over-microkernelized in a separate process.
 
 Jan
 
 *) Realized this while trying to generalize the proposed MSI-X MMIO
 acceleration for assigned devices to arbitrary device models, vhost-net,

I'm actually working on a qemu patch to get pba emulation working correctly.
I think it's doable with existing irqfd.

 and specifically vfio.

Interesting. How would you clear the pseudo interrupt level?

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


[kvm-autotest]client.tests.kvm.tests.cgroup: Add TestDeviceAccess subtest

2011-10-24 Thread Lukas Doktor
Hi guys,

I have a new subtest which tests the 'devices' cgroup subsystem and improves
the logging a bit.

Please find the pull request on github:
https://github.com/autotest/autotest/pull/48

Cheers,
Lukáš



[PATCH] [kvm-autotest]client.tests.kvm.tests.cgroup: Add TestDeviceAccess subtest

2011-10-24 Thread Lukas Doktor
This subtest tries to attach a scsi_debug disk under different cgroup
devices.list settings.

 * subtests the devices.{allow, deny, list} cgroup functionality
 * new function get_maj_min(dev) which returns (major, minor) numbers
   of dev
 * rm_drive: support for rm_device without drive (only remove the host
   file)
 * improved logging

Signed-off-by: Lukas Doktor ldok...@redhat.com
---
 client/tests/kvm/tests/cgroup.py |  234 +-
 1 files changed, 203 insertions(+), 31 deletions(-)

diff --git a/client/tests/kvm/tests/cgroup.py b/client/tests/kvm/tests/cgroup.py
index 6c64532..c83f91a 100644
--- a/client/tests/kvm/tests/cgroup.py
+++ b/client/tests/kvm/tests/cgroup.py
@@ -50,7 +50,7 @@ def run_cgroup(test, params, env):
 return abs(float(actual-reference) / reference)
 
 
-def get_dd_cmd(direction, dev='vd?', count=None, bs=None):
+def get_dd_cmd(direction, dev=None, count=None, bs=None):
 
 Generates dd_cmd string
 @param direction: {read,write,bi} dd direction
@@ -59,6 +59,11 @@ def run_cgroup(test, params, env):
 @param bs: bs parameter of dd
 @return: dd command string
 
+        if dev is None:
+            if get_device_driver() == "virtio":
+                dev = 'vd?'
+            else:
+                dev = '[sh]d?'
         if direction == "read":
             params = "if=$FILE of=/dev/null iflag=direct"
         elif direction == "write":
@@ -82,6 +87,21 @@ def run_cgroup(test, params, env):
 return params.get('drive_format', 'virtio')
 
 
+    def get_maj_min(dev):
+        """
+        Returns the major and minor numbers of the dev device
+        @return: Tuple (major, minor) numbers of the dev device
+        """
+        try:
+            ret = utils.system_output("ls -l %s" % dev)
+            ret = re.match(r'[bc][rwx-]{9} \d+ \w+ \w+ (\d+), (\d+)',
+                           ret).groups()
+        except Exception, details:
+            raise error.TestFail("Couldn't get %s maj and min numbers: %s" %
+                                 (dev, details))
+        return ret
+
+
 def add_file_drive(vm, driver=get_device_driver(), host_file=None):
 
 Hot-add a drive based on file to a vm
@@ -173,14 +193,17 @@ def run_cgroup(test, params, env):
 
 err = False
 # TODO: Implement also via QMP
-        vm.monitor.cmd("pci_del %s" % device)
-        time.sleep(3)
-        qtree = vm.monitor.info('qtree', debug=False)
-        if qtree.count('addr %s.0' % device) != 0:
-            err = True
-        vm.destroy()
-
-        if isinstance(host_file, str):    # scsi device
+        if device:
+            vm.monitor.cmd("pci_del %s" % device)
+            time.sleep(3)
+            qtree = vm.monitor.info('qtree', debug=False)
+            if qtree.count('addr %s.0' % device) != 0:
+                err = True
+        vm.destroy()
+
+        if host_file is None:   # Do not remove
+            pass
+        elif isinstance(host_file, str):    # scsi device
             utils.system("echo -1 > "
                          "/sys/bus/pseudo/drivers/scsi_debug/add_host")
         else:    # file
             host_file.close()
@@ -334,7 +357,7 @@ def run_cgroup(test, params, env):
 _TestBlkioBandwidth.__init__(self, vms, modules)
 # Read from the last vd* in a loop until test removes the
 # /tmp/cgroup_lock file (and kills us)
-        self.dd_cmd = get_dd_cmd("read", bs="100K")
+        self.dd_cmd = get_dd_cmd("read", dev='vd?', bs="100K")
 
 
 class TestBlkioBandwidthWeigthWrite(_TestBlkioBandwidth):
@@ -350,7 +373,7 @@ def run_cgroup(test, params, env):
 # Write on the last vd* in a loop until test removes the
 # /tmp/cgroup_lock file (and kills us)
 _TestBlkioBandwidth.__init__(self, vms, modules)
-        self.dd_cmd = get_dd_cmd("write", bs="100K")
+        self.dd_cmd = get_dd_cmd("write", dev='vd?', bs="100K")
 
 
 class _TestBlkioThrottle:
@@ -376,10 +399,6 @@ def run_cgroup(test, params, env):
 self.devices = None # Temporary virt devices (PCI drive 1 per vm)
 self.dd_cmd = None  # DD command used to test the throughput
 self.speeds = None  # cgroup throughput
-        if get_device_driver() == "virtio":
-            self.dev = "vd?"
-        else:
-            self.dev = "[sh]d?"
 
 def cleanup(self):
 
@@ -417,13 +436,8 @@ def run_cgroup(test, params, env):
                                              driver="virtio")
             else:
                 (self.files, self.devices) = add_scsi_drive(self.vm)
-            try:
-                dev = utils.system_output("ls -l %s" % self.files).split()[4:6]
-                dev[0] = dev[0][:-1]    # Remove trailing ','
-            except:
-                time.sleep(5)
-                raise error.TestFail("Couldn't get %s maj and min numbers"
-                                     %

Re: [Qemu-devel] KVM call agenda for October 25

2011-10-24 Thread Andreas Färber
Am 24.10.2011 14:02, schrieb Peter Maydell:
 On 24 October 2011 12:35, Paolo Bonzini pbonz...@redhat.com wrote:
 On 10/24/2011 01:04 PM, Juan Quintela wrote:
 Please send in any agenda items you are interested in covering.

 - What's left to merge for 1.0.

 I also think that it's somewhat unfortunate that we will now compile on
 ARM hosts but always abort on startup (due to the reliance on a working
 makecontext()); I'm not really sure how to deal with that one.

FWIW we're also not working / not building on Darwin ppc+Intel, which is
related to a) softfloat integer types, b) GThread initialization, c)
unknown issues. Bisecting did not work well and I am lacking time and
ideas to investigate and fix this. For softfloat there are several
solutions around, in need of a decision.

Nice to merge would be the Cocoa sheet issue, once verified.

Andreas


Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock

2011-10-24 Thread Avi Kivity
On 10/24/2011 02:27 PM, Srivatsa Vaddagiri wrote:
 Good point. Assuming yield_on_hlt=1, that would allow the vcpu to be put
 to sleep and let other vcpus make progress.

  I guess with that change, we can also drop the need for the other hypercall
  introduced in this patch (kvm_pv_kick_cpu_op()). Essentially a vcpu sleeping
  because of a HLT instruction can be woken up by an IPI issued by the vcpu
  releasing a lock.

Not if interrupts are disabled.  My original plan was to use NMIs for
wakeups, but it turns out NMIs can be coalesced under certain rare
circumstances; this requires workarounds by the generic NMI code that
make NMIs too slow.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Jan Kiszka
On 2011-10-24 14:43, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
 On 2011-10-24 13:09, Avi Kivity wrote:
 On 10/24/2011 12:19 PM, Jan Kiszka wrote:

 With the new feature it may be worthwhile, but I'd like to see the whole
 thing, with numbers attached.

 It's not a performance issue, it's a resource limitation issue: With the
 new API we can stop worrying about user space device models consuming
 limited IRQ routes of the KVM subsystem.


 Only if those devices are in the same process (or have access to the
 vmfd).  Interrupt routing together with irqfd allows you to disaggregate
 the device model.  Instead of providing a competing implementation with
 new limitations, we need to remove the limitations of the old
 implementation.

 That depends on where we do the cut. Currently we let the IRQ source
 signal an abstract edge on a pre-allocated pseudo IRQ line. But we
 cannot build correct MSI-X on top of the current irqfd model as we lack
 the level information (for PBA emulation). *)
 
 
 I don't agree here. IMO PBA emulation would need to
 clear pending bits on interrupt status register read.
 So clearing pending bits could be done by ioctl from qemu
 while setting them would be done from irqfd.

How should QEMU know if the reason for pending has been cleared at
device level if the device is outside the scope of QEMU? This model only
works for PV devices when you agree that spurious IRQs are OK.

 
 So we either need to
 extend the existing model anyway -- or push per-vector masking back to
 the IRQ source. In the latter case, it would be a very good chance to
 give up on limited pseudo GSIs with static routes and do MSI messaging
 from external IRQ sources to KVM directly.
 But all those considerations affect different APIs than what I'm
 proposing here. We will always need a way to inject MSIs in the context
 of the VM as there will always be scenarios where devices are better run
 in that very same context, for performance or simplicity or whatever
 reasons. E.g., I could imagine that one would like to execute an
 emulated IRQ remapper rather in the hypervisor context than
 over-microkernelized in a separate process.

 Jan

 *) Realized this while trying to generalize the proposed MSI-X MMIO
 acceleration for assigned devices to arbitrary device models, vhost-net,
 
 I'm actually working on a qemu patch to get pba emulation working correctly.
 I think it's doable with existing irqfd.

irqfd has no notion of level. You can only communicate a rising edge and
then need a side channel for the state of the edge reason.

 
 and specifically vfio.
 
 Interesting. How would you clear the pseudo interrupt level?

Ideally: not at all (for MSI). If we manage the mask at device level, we
only need to send the message if there is actually something to deliver
to the interrupt controller and masked input events would be lost on
real HW as well.

That said, we still need to address the irqfd level topic for the finite
amount of legacy interrupt lines. If a line is masked at an IRQ
controller, the device needs to keep the controller up to date w.r.t.
the line state, or the controller has to poll the current state on
unmask to avoid spurious injections.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Jan Kiszka
On 2011-10-24 15:11, Jan Kiszka wrote:
 On 2011-10-24 14:43, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
 On 2011-10-24 13:09, Avi Kivity wrote:
 On 10/24/2011 12:19 PM, Jan Kiszka wrote:

 With the new feature it may be worthwhile, but I'd like to see the whole
 thing, with numbers attached.

 It's not a performance issue, it's a resource limitation issue: With the
 new API we can stop worrying about user space device models consuming
 limited IRQ routes of the KVM subsystem.


 Only if those devices are in the same process (or have access to the
 vmfd).  Interrupt routing together with irqfd allows you to disaggregate
 the device model.  Instead of providing a competing implementation with
 new limitations, we need to remove the limitations of the old
 implementation.

 That depends on where we do the cut. Currently we let the IRQ source
 signal an abstract edge on a pre-allocated pseudo IRQ line. But we
 cannot build correct MSI-X on top of the current irqfd model as we lack
 the level information (for PBA emulation). *)


 I don't agree here. IMO PBA emulation would need to
 clear pending bits on interrupt status register read.
 So clearing pending bits could be done by ioctl from qemu
 while setting them would be done from irqfd.
 
 How should QEMU know if the reason for pending has been cleared at
 device level if the device is outside the scope of QEMU? This model only
 works for PV devices when you agree that spurious IRQs are OK.
 

 So we either need to
 extend the existing model anyway -- or push per-vector masking back to
 the IRQ source. In the latter case, it would be a very good chance to
 give up on limited pseudo GSIs with static routes and do MSI messaging
 from external IRQ sources to KVM directly.
 But all those considerations affect different APIs than what I'm
 proposing here. We will always need a way to inject MSIs in the context
 of the VM as there will always be scenarios where devices are better run
 in that very same context, for performance or simplicity or whatever
 reasons. E.g., I could imagine that one would like to execute an
 emulated IRQ remapper rather in the hypervisor context than
 over-microkernelized in a separate process.

 Jan

 *) Realized this while trying to generalize the proposed MSI-X MMIO
 acceleration for assigned devices to arbitrary device models, vhost-net,

 I'm actually working on a qemu patch to get pba emulation working correctly.
 I think it's doable with existing irqfd.
 
 irqfd has no notion of level. You can only communicate a rising edge and
 then need a side channel for the state of the edge reason.
 

 and specifically vfio.

 Interesting. How would you clear the pseudo interrupt level?
 
 Ideally: not at all (for MSI). If we manage the mask at device level, we
 only need to send the message if there is actually something to deliver
 to the interrupt controller and masked input events would be lost on
 real HW as well.

This wouldn't work out nicely as well. We rather need a combined model:

Devices need to maintain the PBA actively, i.e. set & clear them
themselves and do not rely on the core here (with the core being either
QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only
checks the PBA if it is about to deliver some message and refrains from
doing so if the bit became 0 in the meantime (specifically during the
masked period). For QEMU device models, that means no additional IOCTLs,
just memory sharing of the PBA which is required anyway.

But that means QEMU-external device models need to gain at least basic
MSI-X knowledge. And if they gain this awareness, they could also use it
to send full-blown messages directly (e.g. device-id/vector tuples)
instead of encoding them into finite GSI numbers. But that's an add-on
topic.

Moreover, we still need a corresponding side channel for line-base
interrupts.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock

2011-10-24 Thread Srivatsa Vaddagiri
* Avi Kivity a...@redhat.com [2011-10-24 15:09:25]:

  I guess with that change, we can also drop the need for the other hypercall
  introduced in this patch (kvm_pv_kick_cpu_op()). Essentially a vcpu sleeping
  because of a HLT instruction can be woken up by an IPI issued by the vcpu
  releasing a lock.
 
 Not if interrupts are disabled.

Hmm, yes... so we need a kick hypercall then.
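
A rough guest-side sketch of the resulting shape, with illustrative helper
names rather than the patch's exact code: the waiter halts until kicked,
and the releaser uses the hypercall precisely because the waiter may have
interrupts disabled:

    static void wait_for_ticket(arch_spinlock_t *lock, __ticket_t want)
    {
            while (ACCESS_ONCE(lock->tickets.head) != want)
                    halt();         /* HLT: vcpu sleeps until kicked */
    }

    static void kick_vcpu(int apicid)
    {
            /* wakes the halted vcpu even with interrupts disabled,
             * which a plain IPI cannot do */
            kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
    }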

- vatsa


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Michael S. Tsirkin
On Mon, Oct 24, 2011 at 03:11:25PM +0200, Jan Kiszka wrote:
 On 2011-10-24 14:43, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
  On 2011-10-24 13:09, Avi Kivity wrote:
  On 10/24/2011 12:19 PM, Jan Kiszka wrote:
 
  With the new feature it may be worthwhile, but I'd like to see the whole
  thing, with numbers attached.
 
  It's not a performance issue, it's a resource limitation issue: With the
  new API we can stop worrying about user space device models consuming
  limited IRQ routes of the KVM subsystem.
 
 
  Only if those devices are in the same process (or have access to the
  vmfd).  Interrupt routing together with irqfd allows you to disaggregate
  the device model.  Instead of providing a competing implementation with
  new limitations, we need to remove the limitations of the old
  implementation.
 
  That depends on where we do the cut. Currently we let the IRQ source
  signal an abstract edge on a pre-allocated pseudo IRQ line. But we
  cannot build correct MSI-X on top of the current irqfd model as we lack
  the level information (for PBA emulation). *)
  
  
  I don't agree here. IMO PBA emulation would need to
  clear pending bits on interrupt status register read.
  So clearing pending bits could be done by ioctl from qemu
  while setting them would be done from irqfd.
 
 How should QEMU know if the reason for pending has been cleared at
 device level if the device is outside the scope of QEMU? This model only
 works for PV devices when you agree that spurious IRQs are OK.

A read of irq status clears pending in the same way it clears the
irq line for level. I don't think this generates spurious irqs. Yes, it
only works for PV.

For assigned devices, the only way I see to implement PBA
correctly is by masking the vector in the device
and looking at the actual pending bit.
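
As a sketch of the clear-on-read model for a PV device (names
illustrative; the pending-bit update towards the core could be the ioctl
mentioned above):

    static uint32_t isr_read(struct dev *d)
    {
            uint32_t val = d->isr;

            d->isr = 0;               /* reading ISR acks the reason */
            pba_clear(d, d->vector);  /* retire the pending bit in the core */
            return val;
    }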

  
  So we either need to
  extend the existing model anyway -- or push per-vector masking back to
  the IRQ source. In the latter case, it would be a very good chance to
  give up on limited pseudo GSIs with static routes and do MSI messaging
  from external IRQ sources to KVM directly.
  But all those considerations affect different APIs than what I'm
  proposing here. We will always need a way to inject MSIs in the context
  of the VM as there will always be scenarios where devices are better run
  in that very same context, for performance or simplicity or whatever
  reasons. E.g., I could imagine that one would like to execute an
  emulated IRQ remapper rather in the hypervisor context than
  over-microkernelized in a separate process.
 
  Jan
 
  *) Realized this while trying to generalize the proposed MSI-X MMIO
  acceleration for assigned devices to arbitrary device models, vhost-net,
  
  I'm actually working on a qemu patch to get pba emulation working correctly.
  I think it's doable with existing irqfd.
 
 irqfd has no notion of level. You can only communicate a rising edge and
 then need a side channel for the state of the edge reason.

True. But we only need that for PBA read which is unused ATM.
So kvm can just send the read to userspace, have qemu query
vfio or whatever.

  
  and specifically vfio.
  
  Interesting. How would you clear the pseudo interrupt level?
 
 Ideally: not at all (for MSI). If we manage the mask at device level, we
 only need to send the message if there is actually something to deliver
 to the interrupt controller and masked input events would be lost on
 real HW as well.

Not sure I understand. We certainly shouldn't send masked
interrupts to the APIC, if for no other reason than that
the message value is invalid while masked.

 That said, we still need to address the irqfd level topic for the finite
 amount of legacy interrupt lines. If a line is masked at an IRQ
 controller, the device needs to keep the controller up to date w.r.t.
 the line state, or the controller has to poll the current state on
 unmask to avoid spurious injections.
 
 Jan

Yes, level interrupts are tricky.

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Michael S. Tsirkin
On Mon, Oct 24, 2011 at 03:43:53PM +0200, Jan Kiszka wrote:
 On 2011-10-24 15:11, Jan Kiszka wrote:
  On 2011-10-24 14:43, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
  On 2011-10-24 13:09, Avi Kivity wrote:
  On 10/24/2011 12:19 PM, Jan Kiszka wrote:
 
  With the new feature it may be worthwhile, but I'd like to see the 
  whole
  thing, with numbers attached.
 
  It's not a performance issue, it's a resource limitation issue: With the
  new API we can stop worrying about user space device models consuming
  limited IRQ routes of the KVM subsystem.
 
 
  Only if those devices are in the same process (or have access to the
  vmfd).  Interrupt routing together with irqfd allows you to disaggregate
  the device model.  Instead of providing a competing implementation with
  new limitations, we need to remove the limitations of the old
  implementation.
 
  That depends on where we do the cut. Currently we let the IRQ source
  signal an abstract edge on a pre-allocated pseudo IRQ line. But we
  cannot build correct MSI-X on top of the current irqfd model as we lack
  the level information (for PBA emulation). *)
 
 
  I don't agree here. IMO PBA emulation would need to
  clear pending bits on interrupt status register read.
  So clearing pending bits could be done by ioctl from qemu
  while setting them would be done from irqfd.
  
  How should QEMU know if the reason for pending has been cleared at
  device level if the device is outside the scope of QEMU? This model only
  works for PV devices when you agree that spurious IRQs are OK.
  
 
  So we either need to
  extend the existing model anyway -- or push per-vector masking back to
  the IRQ source. In the latter case, it would be a very good chance to
  give up on limited pseudo GSIs with static routes and do MSI messaging
  from external IRQ sources to KVM directly.
  But all those considerations affect different APIs than what I'm
  proposing here. We will always need a way to inject MSIs in the context
  of the VM as there will always be scenarios where devices are better run
  in that very same context, for performance or simplicity or whatever
  reasons. E.g., I could imagine that one would like to execute an
  emulated IRQ remapper rather in the hypervisor context than
  over-microkernelized in a separate process.
 
  Jan
 
  *) Realized this while trying to generalize the proposed MSI-X MMIO
  acceleration for assigned devices to arbitrary device models, vhost-net,
 
  I'm actually working on a qemu patch to get pba emulation working 
  correctly.
  I think it's doable with existing irqfd.
  
  irqfd has no notion of level. You can only communicate a rising edge and
  then need a side channel for the state of the edge reason.
  
 
  and specifically vfio.
 
  Interesting. How would you clear the pseudo interrupt level?
  
  Ideally: not at all (for MSI). If we manage the mask at device level, we
  only need to send the message if there is actually something to deliver
  to the interrupt controller and masked input events would be lost on
  real HW as well.
 
 This wouldn't work out nicely as well. We rather need a combined model:
 
 Devices need to maintain the PBA actively, i.e. set & clear them
 themselves and do not rely on the core here (with the core being either
 QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only
 checks the PBA if it is about to deliver some message and refrains from
 doing so if the bit became 0 in the meantime (specifically during the
 masked period).

 For QEMU device models, that means no additional IOCTLs,
 just memory sharing of the PBA which is required anyway.

Sorry, I don't understand the above two paragraphs. Maybe I am
confused by terminology here. We really only need to check PBA when it's
read.  Whether the message is delivered only depends on the mask bit.


 
 But that means QEMU-external device models need to gain at least basic
 MSI-X knowledge. And if they gain this awareness, they could also use it
 to send full-blown messages directly (e.g. device-id/vector tuples)
 instead of encoding them into finite GSI numbers. But that's an add-on
 topic.
 
 Moreover, we still need a corresponding side channel for line-base
 interrupts.
 
 Jan

Agree on all points with the above.

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Jan Kiszka
On 2011-10-24 16:40, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 03:43:53PM +0200, Jan Kiszka wrote:
 On 2011-10-24 15:11, Jan Kiszka wrote:
 On 2011-10-24 14:43, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
 On 2011-10-24 13:09, Avi Kivity wrote:
 On 10/24/2011 12:19 PM, Jan Kiszka wrote:

 With the new feature it may be worthwhile, but I'd like to see the 
 whole
 thing, with numbers attached.

 It's not a performance issue, it's a resource limitation issue: With the
 new API we can stop worrying about user space device models consuming
 limited IRQ routes of the KVM subsystem.


 Only if those devices are in the same process (or have access to the
 vmfd).  Interrupt routing together with irqfd allows you to disaggregate
 the device model.  Instead of providing a competing implementation with
 new limitations, we need to remove the limitations of the old
 implementation.

 That depends on where we do the cut. Currently we let the IRQ source
 signal an abstract edge on a pre-allocated pseudo IRQ line. But we
 cannot build correct MSI-X on top of the current irqfd model as we lack
 the level information (for PBA emulation). *)


 I don't agree here. IMO PBA emulation would need to
 clear pending bits on interrupt status register read.
 So clearing pending bits could be done by ioctl from qemu
 while setting them would be done from irqfd.

 How should QEMU know if the reason for pending has been cleared at
 device level if the device is outside the scope of QEMU? This model only
 works for PV devices when you agree that spurious IRQs are OK.


 So we either need to
 extend the existing model anyway -- or push per-vector masking back to
 the IRQ source. In the latter case, it would be a very good chance to
 give up on limited pseudo GSIs with static routes and do MSI messaging
 from external IRQ sources to KVM directly.
 But all those considerations affect different APIs than what I'm
 proposing here. We will always need a way to inject MSIs in the context
 of the VM as there will always be scenarios where devices are better run
 in that very same context, for performance or simplicity or whatever
 reasons. E.g., I could imagine that one would like to execute an
 emulated IRQ remapper rather in the hypervisor context than
 over-microkernelized in a separate process.

 Jan

 *) Realized this while trying to generalize the proposed MSI-X MMIO
 acceleration for assigned devices to arbitrary device models, vhost-net,

 I'm actually working on a qemu patch to get pba emulation working 
 correctly.
 I think it's doable with existing irqfd.

 irqfd has no notion of level. You can only communicate a rising edge and
 then need a side channel for the state of the edge reason.


 and specifically vfio.

 Interesting. How would you clear the pseudo interrupt level?

 Ideally: not at all (for MSI). If we manage the mask at device level, we
 only need to send the message if there is actually something to deliver
 to the interrupt controller and masked input events would be lost on
 real HW as well.

 This wouldn't work out nicely as well. We rather need a combined model:

 Devices need to maintain the PBA actively, i.e. set & clear them
 themselves and do not rely on the core here (with the core being either
 QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only
 checks the PBA if it is about to deliver some message and refrains from
 doing so if the bit became 0 in the meantime (specifically during the
 masked period).

 For QEMU device models, that means no additional IOCTLs,
 just memory sharing of the PBA which is required anyway.
 
 Sorry, I don't understand the above two paragraphs. Maybe I am
 confused by terminology here. We really only need to check PBA when it's
 read.  Whether the message is delivered only depends on the mask bit.

This is what I have in mind:
 - devices set PBA bit if MSI message cannot be sent due to mask (*)
 - core checks & clears PBA bit on unmask, injects message if bit was set
 - devices clear PBA bit if message reason is resolved before unmask (*)

The marked (*) lines differ from the current user space model where only
the core does PBA manipulation (including clearance via a special
function). Basically, the PBA becomes a communication channel also
between device and MSI core. And this model also works if core and
device run in different processes provided they set up the PBA as shared
memory.
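
A minimal sketch of that division of labour, with illustrative names (the
PBA here is the memory shared between device and core):

    /* device: interrupt reason appeared while the vector is masked (*) */
    if (msix_vector_masked(dev, vec))
            set_bit(vec, dev->pba);
    else
            msix_send(dev, vec);

    /* device: reason resolved before unmask (*) */
    if (reason_resolved(dev, vec))
            clear_bit(vec, dev->pba);

    /* core: guest unmasks the vector */
    if (test_and_clear_bit(vec, dev->pba))
            msix_send(dev, vec);    /* inject only if still pending */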

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [Qemu-devel] KVM call agenda for October 25

2011-10-24 Thread Luiz Capitulino
On Mon, 24 Oct 2011 13:02:05 +0100
Peter Maydell peter.mayd...@linaro.org wrote:

 On 24 October 2011 12:35, Paolo Bonzini pbonz...@redhat.com wrote:
  On 10/24/2011 01:04 PM, Juan Quintela wrote:
  Please send in any agenda items you are interested in covering.
 
  - What's left to merge for 1.0.
 
 Things on my list, FWIW:
  * current target-arm pullreq
  * PL041 support (needs another patch round to fix a minor bug
Andrzej spotted)
  * cpu_single_env must be thread-local

I submitted today the second round of QAPI conversions, which converts all
existing QMP query commands to the QAPI (plus some fixes).

I expect that to make 1.0.


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Michael S. Tsirkin
On Mon, Oct 24, 2011 at 05:00:27PM +0200, Jan Kiszka wrote:
 On 2011-10-24 16:40, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 03:43:53PM +0200, Jan Kiszka wrote:
  On 2011-10-24 15:11, Jan Kiszka wrote:
  On 2011-10-24 14:43, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 02:06:08PM +0200, Jan Kiszka wrote:
  On 2011-10-24 13:09, Avi Kivity wrote:
  On 10/24/2011 12:19 PM, Jan Kiszka wrote:
 
  With the new feature it may be worthwhile, but I'd like to see the 
  whole
  thing, with numbers attached.
 
  It's not a performance issue, it's a resource limitation issue: With 
  the
  new API we can stop worrying about user space device models consuming
  limited IRQ routes of the KVM subsystem.
 
 
  Only if those devices are in the same process (or have access to the
  vmfd).  Interrupt routing together with irqfd allows you to 
  disaggregate
  the device model.  Instead of providing a competing implementation with
  new limitations, we need to remove the limitations of the old
  implementation.
 
  That depends on where we do the cut. Currently we let the IRQ source
  signal an abstract edge on a pre-allocated pseudo IRQ line. But we
  cannot build correct MSI-X on top of the current irqfd model as we lack
  the level information (for PBA emulation). *)
 
 
  I don't agree here. IMO PBA emulation would need to
  clear pending bits on interrupt status register read.
  So clearing pending bits could be done by ioctl from qemu
  while setting them would be done from irqfd.
 
  How should QEMU know if the reason for pending has been cleared at
  device level if the device is outside the scope of QEMU? This model only
  works for PV devices when you agree that spurious IRQs are OK.
 
 
  So we either need to
  extend the existing model anyway -- or push per-vector masking back to
  the IRQ source. In the latter case, it would be a very good chance to
  give up on limited pseudo GSIs with static routes and do MSI messaging
  from external IRQ sources to KVM directly.
  But all those considerations affect different APIs than what I'm
  proposing here. We will always need a way to inject MSIs in the context
  of the VM as there will always be scenarios where devices are better run
  in that very same context, for performance or simplicity or whatever
  reasons. E.g., I could imagine that one would like to execute an
  emulated IRQ remapper rather in the hypervisor context than
  over-microkernelized in a separate process.
 
  Jan
 
  *) Realized this while trying to generalize the proposed MSI-X MMIO
  acceleration for assigned devices to arbitrary device models, vhost-net,
 
  I'm actually working on a qemu patch to get pba emulation working 
  correctly.
  I think it's doable with existing irqfd.
 
  irqfd has no notion of level. You can only communicate a rising edge and
  then need a side channel for the state of the edge reason.
 
 
  and specifically vfio.
 
  Interesting. How would you clear the pseudo interrupt level?
 
  Ideally: not at all (for MSI). If we manage the mask at device level, we
  only need to send the message if there is actually something to deliver
  to the interrupt controller and masked input events would be lost on
  real HW as well.
 
  This wouldn't work out nicely as well. We rather need a combined model:
 
  Devices need to maintain the PBA actively, i.e. set & clear them
  themselves and do not rely on the core here (with the core being either
  QEMU user space or an in-kernel MSI-X MMIO accelerator). The core only
  checks the PBA if it is about to deliver some message and refrains from
  doing so if the bit became 0 in the meantime (specifically during the
  masked period).
 
  For QEMU device models, that means no additional IOCTLs,
  just memory sharing of the PBA which is required anyway.
  
  Sorry, I don't understand the above two paragraphs. Maybe I am
  confused by terminology here. We really only need to check PBA when it's
  read.  Whether the message is delivered only depends on the mask bit.
 
 This is what I have in mind:
  - devices set PBA bit if MSI message cannot be sent due to mask (*)
  - core checks & clears PBA bit on unmask, injects message if bit was set
  - devices clear PBA bit if message reason is resolved before unmask (*)

OK, but practically, when exactly does the device clear PBA?

 The marked (*) lines differ from the current user space model where only
 the core does PBA manipulation (including clearance via a special
 function). Basically, the PBA becomes a communication channel also
 between device and MSI core. And this model also works if core and
 device run in different processes provided they set up the PBA as shared
 memory.
 
 Jan
 


 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Jan Kiszka
On 2011-10-24 18:05, Michael S. Tsirkin wrote:
 This is what I have in mind:
  - devices set PBA bit if MSI message cannot be sent due to mask (*)
  - core checks & clears PBA bit on unmask, injects message if bit was set
  - devices clear PBA bit if message reason is resolved before unmask (*)
 
 OK, but practically, when exactly does the device clear PBA?

Consider a network adapter that signals messages in an RX ring: If the
corresponding vector is masked while the guest empties the ring, I
strongly assume that the device is supposed to take back the pending bit
in that case so that there is no interrupt injection on a later vector
unmask operation.
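
In code, that takeback would sit where the ring drains (sketch, names
illustrative):

    /* guest emptied the RX ring while the vector was masked: withdraw
     * the pending bit so unmask does not inject a stale interrupt */
    if (rx_ring_empty(dev) && msix_vector_masked(dev, RX_VECTOR))
            clear_bit(RX_VECTOR, dev->pba);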

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


[PATCH] virt: Revert only update macaddr cache when capture dhcp ACK pkt

2011-10-24 Thread Lucas Meneghel Rodrigues
Revert commit d9bab5bef598b4b415d004eb62e9cd32c3243565, that changes
how the macaddr cache is updated. This patch brought a lot of
regressions on our internal tests, so it'll be dropped until a possibly
safer version of the fix is proposed.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/virt/virt_env_process.py |   10 ++
 1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/client/virt/virt_env_process.py b/client/virt/virt_env_process.py
index a1ec07a..25285b8 100644
--- a/client/virt/virt_env_process.py
+++ b/client/virt/virt_env_process.py
@@ -403,20 +403,14 @@ def _update_address_cache(address_cache, line):
         address_cache["last_seen"] = matches[0]
     if re.search("Client.Ethernet.Address", line, re.IGNORECASE):
         matches = re.findall(r"\w*:\w*:\w*:\w*:\w*:\w*", line)
-        if matches:
-            address_cache["last_mac"] = matches[0]
-    if re.search("DHCP-Message", line, re.IGNORECASE):
-        matches = re.findall(r"ACK", line)
-        if matches and (address_cache.get("last_seen") and
-                        address_cache.get("last_mac")):
-            mac_address = address_cache.get("last_mac").lower()
+        if matches and address_cache.get("last_seen"):
+            mac_address = matches[0].lower()
             if time.time() - address_cache.get("time_%s" % mac_address, 0) > 5:
                 logging.debug("(address cache) Adding cache entry: %s ---> %s",
                               mac_address, address_cache.get("last_seen"))
             address_cache[mac_address] = address_cache.get("last_seen")
             address_cache["time_%s" % mac_address] = time.time()
             del address_cache["last_seen"]
-            del address_cache["last_mac"]
 
 
 def _take_screendumps(test, params, env):
-- 
1.7.7



Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Michael S. Tsirkin
On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
 On 2011-10-24 18:05, Michael S. Tsirkin wrote:
  This is what I have in mind:
   - devices set PBA bit if MSI message cannot be sent due to mask (*)
    - core checks & clears PBA bit on unmask, injects message if bit was set
   - devices clear PBA bit if message reason is resolved before unmask (*)
  
  OK, but practically, when exactly does the device clear PBA?
 
 Consider a network adapter that signals messages in an RX ring: If the
 corresponding vector is masked while the guest empties the ring, I
 strongly assume that the device is supposed to take back the pending bit
 in that case so that there is no interrupt injection on a later vector
 unmask operation.
 
 Jan

Do you mean virtio here? Do you expect this optimization to give
a significant performance gain?

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-24 Thread Michael S. Tsirkin
On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
  On 2011-10-24 18:05, Michael S. Tsirkin wrote:
   This is what I have in mind:
- devices set PBA bit if MSI message cannot be sent due to mask (*)
 - core checks & clears PBA bit on unmask, injects message if bit was set
- devices clear PBA bit if message reason is resolved before unmask (*)
   
   OK, but practically, when exactly does the device clear PBA?
  
  Consider a network adapter that signals messages in an RX ring: If the
  corresponding vector is masked while the guest empties the ring, I
  strongly assume that the device is supposed to take back the pending bit
  in that case so that there is no interrupt injection on a later vector
  unmask operation.
  
  Jan
 
 Do you mean virtio here? Do you expect this optimization to give
 a significant performance gain?

It would also be challenging to implement this in
a race-free manner. Clearing on interrupt status read
seems straightforward.

  -- 
  Siemens AG, Corporate Technology, CT T DE IT 1
  Corporate Competence Center Embedded Linux


Re: [net-next-2.6 PATCH 0/8 RFC v2] macvlan: MAC Address filtering support for passthru mode

2011-10-24 Thread Roopa Prabhu
On 10/23/11 10:47 PM, Michael S. Tsirkin m...@redhat.com wrote:

 On Tue, Oct 18, 2011 at 11:25:54PM -0700, Roopa Prabhu wrote:
 v1 version of this RFC patch was posted at
 http://www.spinics.net/lists/netdev/msg174245.html
 
 Today macvtap used in a virtualized environment does not have support to
 propagate MAC, VLAN and interface flags from guest to lowerdev,
 which means that to register additional VLANs, unicast and multicast
 addresses or change pkt filter flags in the guest, the lowerdev has to be
 put in promiscuous mode. Today the only macvlan mode that supports this is
 the PASSTHRU mode, and it puts the lower dev in promiscuous mode.
 
 PASSTHRU mode was added primarily for the SRIOV usecase. In PASSTHRU mode
 there is a 1-1 mapping between macvtap and physical NIC or VF.
 
 There are two problems with putting the lowerdev in promiscuous mode (i.e. SRIOV
 VF's):
 - Some SRIOV cards don't support promiscuous mode today (a thread on the Intel
 driver indicates that: http://lists.openwall.net/netdev/2011/09/27/6)
 - For the SRIOV NICs that support it, putting the lowerdev in
 promiscuous mode leads to additional traffic being sent up to the
 guest virtio-net to filter, resulting in extra overheads.
 
 Both the above problems can be solved by offloading filtering to the
 lowerdev hw, i.e. the lowerdev does not need to be in promiscuous mode as
 long as the guest filters are passed down to the lowerdev.
 
 This patch basically adds the infrastructure to set and get MAC and VLAN
 filters on an interface via rtnetlink. And adds support in macvlan and
 macvtap
 to allow set and get filter operations.
 
 Looks sane to me. Some minor comments below.
 
 An earlier version of this patch provided the TUNSETTXFILTER macvtap interface
 for setting address filtering. In response to feedback, this version
 introduces a netlink interface for the same.
 
 Response to some of the questions raised during v1:
 
 - Netlink interface:
 This patch provides the following netlink interface to set mac and vlan
 filters :
 [IFLA_RX_FILTER] = {
     [IFLA_ADDR_FILTER] = {
         [IFLA_ADDR_FILTER_FLAGS]
         [IFLA_ADDR_FILTER_UC_LIST] = {
             [IFLA_ADDR_LIST_ENTRY]
         }
         [IFLA_ADDR_FILTER_MC_LIST] = {
             [IFLA_ADDR_LIST_ENTRY]
         }
     }
     [IFLA_VLAN_FILTER] = {
         [IFLA_VLAN_BITMAP]
     }
 }
 
 Note: The IFLA_VLAN_FILTER is a nested attribute and contains only
 IFLA_VLAN_BITMAP today. The idea is that the IFLA_VLAN_FILTER can
 be extended tomorrow to use a vlan list option if some implementations
 prefer a list instead.
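
 To illustrate how a user of this interface would compose the nesting
 above with libnl (sketch only; the IFLA_* constants come from this RFC
 and are not upstream):

     struct nl_msg *msg = nlmsg_alloc();
     struct nlattr *rx, *addr, *uc;
     unsigned char mac[ETH_ALEN] = {0x52, 0x54, 0x00, 0x12, 0x34, 0x56};
     __u32 flags = 0;

     rx   = nla_nest_start(msg, IFLA_RX_FILTER);
     addr = nla_nest_start(msg, IFLA_ADDR_FILTER);
     nla_put_u32(msg, IFLA_ADDR_FILTER_FLAGS, flags);
     uc   = nla_nest_start(msg, IFLA_ADDR_FILTER_UC_LIST);
     nla_put(msg, IFLA_ADDR_LIST_ENTRY, ETH_ALEN, mac);
     nla_nest_end(msg, uc);
     nla_nest_end(msg, addr);
     nla_nest_end(msg, rx);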
 
 And it provides the following rtnl_link_ops to set/get MAC/VLAN filters:
 
     int    (*set_rx_addr_filter)(struct net_device *dev,
                                  struct nlattr *tb[]);
     int    (*set_rx_vlan_filter)(struct net_device *dev,
                                  struct nlattr *tb[]);
     size_t (*get_rx_addr_filter_size)(const struct net_device *dev);
     size_t (*get_rx_vlan_filter_size)(const struct net_device *dev);
     int    (*fill_rx_addr_filter)(struct sk_buff *skb,
                                   const struct net_device *dev);
     int    (*fill_rx_vlan_filter)(struct sk_buff *skb,
                                   const struct net_device *dev);
 
 
 Note: The choice of rtnl_link_ops was because I saw the use case for
 this in virtual devices that need to do filtering in sw, like macvlan
 and tun. Hw devices usually have filtering in hw, with the netdev->uc and
 ->mc lists to indicate active filters. But I can move from rtnl_link_ops
 to netdev_ops if that is the preferred way to go and if there is a
 need to support this interface on all kinds of interfaces.
 Please suggest.
 
 - Protection against address spoofing:
 - This patch adds filtering support only for macvtap PASSTHRU
 Mode. PASSTHRU mode is used mainly with SRIOV VF's. And SRIOV VF's
 come with anti mac/vlan spoofing support. (Recently added
 IFLA_VF_SPOOFCHK). In 802.1Qbh case the port profile has a knob to
 enable/disable anti spoof check. Lowerdevice drivers also enforce limits
 on the number of address registrations allowed.
 
 - Support for multiqueue devices: Enable filtering on individual queues (?):
 AFAIK, there is no netdev interface to install per queue hw
 filters for a multi queue interface. And also I don't know of any hw
 that provides an interface to set hw filters on a per-queue basis.
 
 VMDq hardware would support this, no?
 
I am not really sure. This patch uses netdev to pass filters to hw, and I
don't see any netdev infrastructure that would support per-queue filters.
Maybe Greg (CC'ed) or anyone else from Intel can answer this.
Greg, Michael had brought up this question during the first version of these
patches as well. It will be nice to get the VMDq requirements for propagating
guest filters to hw clarified. Do you see any special VMDq nic requirement we
can cover in this patch? This is for VMDq queues directly connected to guest
nics. Thanks.

Re: [PATCH RFC V2 4/5] kvm guest : Added configuration support to enable debug information for KVM Guests

2011-10-24 Thread Raghavendra K T

On 10/24/2011 03:31 PM, Sasha Levin wrote:

On Mon, 2011-10-24 at 00:37 +0530, Raghavendra K T wrote:

Added configuration support to enable debug information
for KVM Guests in debugfs
+config KVM_DEBUG_FS
+   bool "Enable debug information for KVM Guests in debugfs"
+   depends on KVM_GUEST


Shouldn't it depend on DEBUG_FS as well?

Thanks again for pointing that out. Will correct this too.


11 Seconds waiting between pings, but network-throughput is fast

2011-10-24 Thread Andreas Piening
Dear KVM-List,

I have a really strange network-issue and I'm running out of ideas how to track 
this down.

I have two tap-devices on one bridge:

bridge name     bridge id           STP enabled     interfaces
br0             8000.00e081c682e7   no              eth0
                                                    vm0
                                                    vm1

Each of them are connected to a virtual machine with virtio. The machines are 
both running MS Windows Server 2008 R2 and are configured equally from the 
kvm-perspective. I can transfer data over the network from a windows-share in 
both directions with over 40 MB/sec.

On one server, I have a noticeable latency over RDP. When I do a ping from this 
server, the first response comes immediately, but there is an 11-second delay
before the second ping is sent on its way!
I know this sounds crazy, but I've installed wireshark to track this down, and
the response to every ping comes immediately; there are just huge pauses (about
11 seconds) before each ping is sent out. The CPU utilization is nearly zero
and the disk-io is fast (raid10, lvm, virtio).

Any help or ideas on that are appreciated,

Andreas Piening


Hard limit for the cpu usage of a VM

2011-10-24 Thread sethuraman subbiah
Hi ,

I was previously using Xen and recently moved to KVM. I am using libvirt to
manage these VMs. In Xen's credit scheduler, I had the ability to set a cap on
the cpu usage of a VM, but I was not able to find a similar substitute in KVM.
I find that we can use cgroups to provide shares for a VM, but that is
weight-based and doesn't set a hard cap for the VM. I tried using cpulimit, but
I find it inaccurate, and it only accepts values between 0 and 100, so I think
it cannot support multi-core environments. Can anyone suggest a method to set a
hard limit on a VM's cpu usage? Thank you.
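
(One approach, assuming a host kernel with CFS bandwidth control
(CONFIG_CFS_BANDWIDTH) and the cpu cgroup mounted; the cgroup path and VM
name are illustrative:

    cd /sys/fs/cgroup/cpu/libvirt/qemu/guest01
    echo 100000 > cpu.cfs_period_us   # 100 ms accounting period
    echo 50000  > cpu.cfs_quota_us    # hard cap: 0.5 CPU per period

A quota larger than the period caps the VM above one core, e.g.
200000/100000 allows at most two CPUs' worth of time per period.)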


-
Regards,
Sethuraman Subbiah

Graduate Student - NC state University
M.S. in Computer Science


RE: [net-next-2.6 PATCH 0/8 RFC v2] macvlan: MAC Address filtering support for passthru mode

2011-10-24 Thread Rose, Gregory V
 -Original Message-
 From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
 On Behalf Of Roopa Prabhu
 Sent: Monday, October 24, 2011 11:15 AM
 To: Michael S. Tsirkin
 Cc: net...@vger.kernel.org; s...@us.ibm.com; dragos.tatu...@gmail.com;
 a...@arndb.de; kvm@vger.kernel.org; da...@davemloft.net;
 mc...@broadcom.com; dwa...@cisco.com; shemmin...@vyatta.com;
 eric.duma...@gmail.com; ka...@trash.net; be...@cisco.com; Rose, Gregory V
 Subject: Re: [net-next-2.6 PATCH 0/8 RFC v2] macvlan: MAC Address
 filtering support for passthru mode
 
 On 10/23/11 10:47 PM, Michael S. Tsirkin m...@redhat.com wrote:
 
  AFAIK, there is no netdev interface to install per queue hw
  filters for a multi queue interface. And also I dont know of any hw
  that provides an interface to set hw filters on a per queue basis.
 
  VMDq hardware would support this, no?
 
 I am not really sure. This patch uses netdev to pass filters to hw, and I
 don't see any netdev infrastructure that would support per-queue filters.
 Maybe Greg (CC'ed) or anyone else from Intel can answer this.
 Greg, Michael brought this question up during the first version of these
 patches as well. It would be nice to get the VMDq requirements for propagating
 guest filters to hw clarified. Do you see any special VMDq nic requirement
 we can cover in this patch? This is for VMDq queues directly connected to
 guest nics. Thanks.

So far as I know, there is no support for VMDq in the Linux kernel, and while I 
know some folks have been working on it, I can't really speak to that work or 
their plans. Much would depend on the implementation.
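
To make the gap concrete: today a driver only sees an aggregate filter list 
through ndo_set_rx_mode() / dev_uc_add(); targeting an individual hw queue 
would need a new op along these lines (purely hypothetical - no such netdev 
op exists):

	/* hypothetical sketch, illustration only - 'queue' would select the
	 * hw queue pair bound to a particular guest nic */
	int (*ndo_set_rx_filter_queue)(struct net_device *dev, u16 queue,
				       const unsigned char *addr);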

For now it makes sense to me to get support for multiple MAC and VLAN filters 
per virtual function (or virtual nic) and it seems to me you're going in the 
right direction for this.  We'll have a look at your next set of patches and 
take it from there.

- Greg




Extreme time-drifts under windows server 2008 R2

2011-10-24 Thread Andreas Piening
Hi KVM-list,

I sent an email with my problem already a few hours ago (attached below, for 
reference).

After some additional examination of the system I figured out new facts that 
completely change my problem:

When I open up the date- and time-settings dialog in Windows, the time seems 
to be frozen! But after (guess what?) 11 seconds the next second is displayed! 
The system clock seems to run at about a tenth of the speed of the real-time 
clock: I have a time drift of about 10 seconds per second on my Windows Server 
2008 R2 guest. That also matches the 11-second gaps between outgoing pings 
described below - the guest apparently gets roughly one timer second for every 
eleven seconds of wall-clock time.

I searched the web and found a command that should be entered as Administrator 
to eliminate the time drift: bcdedit /set {default} USEPLATFORMCLOCK
But it hasn't changed the situation for me.
I have set -localtime as a KVM parameter; what else can I do? Has someone run 
into a similar problem before and solved it?

I'm not sure if it is related to the problem, but in the host-kernel I have:
Tickless System (Dynamic Ticks) ENABLED,
and
High Resolution Timer Support DISABLED

... for no specific reason. What are the correct settings here?
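
For completeness, a few untested ideas (assumptions, not verified fixes):

  # host kernel config - enable high-resolution timers:
  CONFIG_NO_HZ=y
  CONFIG_HIGH_RES_TIMERS=y

  # check which clocksource the host actually uses (tsc, hpet, acpi_pm, ...):
  cat /sys/devices/system/clocksource/clocksource0/current_clocksource

  # newer qemu-kvm RTC syntax instead of the bare -localtime; driftfix=slew
  # is aimed specifically at Windows guests:
  qemu-kvm ... -rtc base=localtime,clock=host,driftfix=slew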

Thank you in advance!

Andreas Piening





-

Dear KVM-List,

I have a really strange network issue, and I'm running out of ideas on how to 
track it down.

I have two tap-devices on one bridge:

bridge name bridge id   STP enabled interfaces
br0 8000.00e081c682e7   no  eth0
   vm0
   vm1

Each of them is connected to a virtual machine with virtio. The machines are 
both running MS Windows Server 2008 R2 and are configured identically from the 
KVM perspective. I can transfer data over the network from a Windows share in 
both directions at over 40 MB/sec.

On one server, I have a noticeable latency over RDP. When I do a ping from this 
server, the first response comes immediately, but there is an 11-second delay 
before the second ping is sent on its way!
I know this sounds crazy, but I've installed wireshark to track this down: the 
response to every ping comes immediately, yet there are huge pauses (about 11 
seconds) between the pings being sent out. The CPU utilization is nearly zero 
and the disk I/O is fast (raid10, lvm, virtio).

Any help or ideas on that are appreciated,

Andreas Piening


Re: [Qemu-devel] [RFC v2 PATCH 5/4 PATCH] virtio-net: send gratuitous packet when needed

2011-10-24 Thread Jason Wang
On 10/24/2011 01:25 PM, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 02:54:59PM +1030, Rusty Russell wrote:
 On Sat, 22 Oct 2011 13:43:11 +0800, Jason Wang jasow...@redhat.com wrote:
  This lets the virtio-net driver send a gratuitous packet when a new
  config bit - VIRTIO_NET_S_ANNOUNCE - is set in a config update
  interrupt. When this bit is set by the backend, the driver schedules
  a workqueue item to send the gratuitous packet through NETDEV_NOTIFY_PEERS.

 This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.

 Signed-off-by: Jason Wang jasow...@redhat.com

 This seems like a huge layering violation.  Imagine this in real
 hardware, for example.
 
 commits 06c4648d46d1b757d6b9591a86810be79818b60c
 and 99606477a5888b0ead0284fecb13417b1da8e3af
 document the need for this:
 
 NETDEV_NOTIFY_PEERS notifier indicates that a device moved to a 
 different physical link.
   and
 In real hardware such notifications are only
 generated when the device comes up or the address changes.
 
  So the hypervisor could get the same behaviour by sending link up/down
  events; this is just an optimization so the guest won't do
  unnecessary work like trying to reconfigure an IP address.
 
 
 Maybe LOCATION_CHANGE would be a better name?
 

ANNOUNCE_SELF?

 
 There may be a good reason why virtual devices might want this kind of
 reconfiguration cheat, which is unnecessary for normal machines,
 
  I think yes; the difference from real hardware is that the guest can change
  location without the link being dropped.
 FWIW, Xen seems to use this capability too.

So does ms netvsc.

 
 but
 it'd have to be spelled out clearly in the spec to justify it...

 Cheers,
 Rusty.
 
 Agree, and I'd like to see the spec too. The interface seems
 to involve the guest clearing the status bit when it detects
 an event?

I will describe this in the spec. The interface needs the guest to clear the
status bit; this lets the back-end know that it has finished the work, since
we may need to send the gratuitous packets many times.
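
To illustrate the ack protocol, a minimal guest-side sketch - the
announce_work name and the surrounding driver plumbing are assumptions, not
the posted patch, and it assumes the status field is guest-writable for this
purpose:

	static void virtnet_announce_work(struct work_struct *work)
	{
		struct virtnet_info *vi =
			container_of(work, struct virtnet_info, announce_work);
		u16 status;

		/* NETDEV_NOTIFY_PEERS makes the stack emit gratuitous ARP/NA */
		netif_notify_peers(vi->dev);

		/* ack: clear the bit so the host knows the announce was handled */
		vi->vdev->config->get(vi->vdev,
			offsetof(struct virtio_net_config, status),
			&status, sizeof(status));
		status &= ~VIRTIO_NET_S_ANNOUNCE;
		vi->vdev->config->set(vi->vdev,
			offsetof(struct virtio_net_config, status),
			&status, sizeof(status));
	}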

 
 Also - how does it interact with the link up event?
 We probably don't want to schedule this when we detect
 a link status change or during initialization, as
 this patch seems to do? What if link goes down
 while the work is running? Is that OK?
 

It looks like there is duplication if the guest enables arp_notify and the VM
is started, but we need to handle the case of resuming a stopped virtual
machine.

For the link-down race, I don't see any real issue: the packets are either
dropped or queued.


Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-10-24 Thread Rusty Russell
On Mon, 24 Oct 2011 12:02:18 +0200, Jens Axboe ax...@kernel.dk wrote:
 On 2011-10-24 12:02, Michael S. Tsirkin wrote:
  On Wed, Oct 19, 2011 at 12:12:20PM +0200, Michael S. Tsirkin wrote:
  Rusty, any opinion on merging this for 3.2?
  I expect merge window will open right after the summit,
 
 I can toss it into for-3.2/drivers, if there's consensus to do that now.

I'd like to see the final patch... we got the new simplified ida stuff
in, so I assume it uses that?
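
(For reference, a minimal sketch of the simplified API in question - the
vd_index_ida name and the helpers are assumptions, not the actual patch:)

	static DEFINE_IDA(vd_index_ida);

	static int virtblk_index_get(void)
	{
		/* lowest free id >= 0; end == 0 means no upper bound; may sleep */
		return ida_simple_get(&vd_index_ida, 0, 0, GFP_KERNEL);
	}

	static void virtblk_index_put(int index)
	{
		ida_simple_remove(&vd_index_ida, index);
	}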

But assume silence from me means consent: it's obviously the Right
Thing.

Thanks,
Rusty.