Re: [PATCH] target-ppc: Update slb array with correct index values.
On 20.08.2013, at 14:57, Aneesh Kumar K.V wrote:

> Alexander Graf ag...@suse.de writes:
>
>> On 19.08.2013, at 09:25, Aneesh Kumar K.V wrote:
>>
>>> Alexander Graf ag...@suse.de writes:
>>>
>>>> On 11.08.2013, at 20:16, Aneesh Kumar K.V wrote:
>>>>
>>>>> From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
>>>>>
>>>>> Without this, a value of rb=0 and rs=0 results in us replacing the 0th index
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
>>>>
>>>> Wrong mailing list again ;).
>>>
>>> Will post the series again with updated commit message to the qemu list.
>>>
>>>>> ---
>>>>>  target-ppc/kvm.c | 14 ++++++++++++--
>>>>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>>>>> index 30a870e..5d4e613 100644
>>>>> --- a/target-ppc/kvm.c
>>>>> +++ b/target-ppc/kvm.c
>>>>> @@ -1034,8 +1034,18 @@ int kvm_arch_get_registers(CPUState *cs)
>>>>>          /* Sync SLB */
>>>>>  #ifdef TARGET_PPC64
>>>>>          for (i = 0; i < 64; i++) {
>>>>> -            ppc_store_slb(env, sregs.u.s.ppc64.slb[i].slbe,
>>>>> -                               sregs.u.s.ppc64.slb[i].slbv);
>>>>> +            target_ulong rb = sregs.u.s.ppc64.slb[i].slbe;
>>>>> +            /*
>>>>> +             * KVM_GET_SREGS doesn't return slb entry with slot information
>>>>> +             * same as index. So don't depend on the slot information in
>>>>> +             * the returned value.
>>>>
>>>> This is the generating code in book3s_pr.c:
>>>>
>>>>     if (vcpu->arch.hflags & BOOK3S_HFLAG_SLB) {
>>>>         for (i = 0; i < 64; i++) {
>>>>             sregs->u.s.ppc64.slb[i].slbe = vcpu->arch.slb[i].orige | i;
>>>>             sregs->u.s.ppc64.slb[i].slbv = vcpu->arch.slb[i].origv;
>>>>         }
>>>>
>>>> Where exactly did you see broken slbe entries?
>>>
>>> I noticed this when adding support for guest memory dumping via the qemu gdb server. Now the array we get would look like below:
>>>
>>> slbe0 slbv0
>>> slbe1 slbv1
>>> 0     0
>>
>> Ok, so that's where the problem lies. Why are the entries 0 here? Either we try to fetch more entries than we should, we populate entries incorrectly or the kernel simply returns invalid SLB entry values for invalid entries.
>
> The ioctl zeroes out the sregs and fills only slb_max entries. So we find zero-filled entries above slb_max. Also, we don't pass slb_max to user space, so userspace has to look at all 64 entries.

We do pass slb_max, it's just called differently and calculated implicitly :). How about something like this:

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 30a870e..29a2ec3 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -818,6 +818,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
         /* Sync SLB */
 #ifdef TARGET_PPC64
+        /* We need to loop through all entries to give them potentially
+           valid values */
         for (i = 0; i < 64; i++) {
             sregs.u.s.ppc64.slb[i].slbe = env->slb[i].esid;
             sregs.u.s.ppc64.slb[i].slbv = env->slb[i].vsid;
@@ -1033,7 +1035,7 @@ int kvm_arch_get_registers(CPUState *cs)
         /* Sync SLB */
 #ifdef TARGET_PPC64
-        for (i = 0; i < 64; i++) {
+        for (i = 0; i < env->slb_nr; i++) {
             ppc_store_slb(env, sregs.u.s.ppc64.slb[i].slbe,
                                sregs.u.s.ppc64.slb[i].slbv);
         }

>> Are you seeing this with PR KVM or HV KVM?
>
> HV KVM

If the above didn't help, could you please dig out how HV KVM assembles its SLB information and just paste it here? :)

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] target-ppc: Update slb array with correct index values.
On Mon, Aug 19, 2013 at 10:21:09AM +0200, Alexander Graf wrote:
> On 19.08.2013, at 09:25, Aneesh Kumar K.V wrote:
>> I noticed this when adding support for guest memory dumping via qemu gdb server. Now the array we get would look like below
>>
>> slbe0 slbv0
>> slbe1 slbv1
>> 0     0
>
> Ok, so that's where the problem lies. Why are the entries 0 here? Either we try to fetch more entries than we should, we populate entries incorrectly or the kernel simply returns invalid SLB entry values for invalid entries.
>
> Are you seeing this with PR KVM or HV KVM?

I suspect this is to do with the fact that PR and HV KVM use the vcpu->arch.slb[] array differently. PR stores SLB entry n in vcpu->arch.slb[n], whereas HV packs the valid entries down in the low-numbered entries and puts the index in the bottom bits of the esid field (this is so they can be loaded efficiently with the slbmte instruction on guest entry).

Then, kvm_arch_vcpu_ioctl_get_sregs() on PR copies out all 64 entries (valid or not) and puts an index value in the bottom bits of the esid, whereas on HV it just copies out the valid entries (which already have the index in the esid field).

So, the question is, what is the ABI here? It sounds a bit like qemu is ignoring the index value in the esid field. Either qemu needs to take notice of the index in the esid field or we need to change the HV versions of kvm_arch_vcpu_ioctl_get/set_sregs to put entry n in sregs->u.s.ppc64.slb[n] like PR does.

Paul.
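To make the two layouts Paul describes a bit more concrete, here is a minimal userspace sketch. It is not kernel or QEMU code: the struct, the helper names export_pr(), export_hv() and slot_of(), the 12-bit index mask and the value used for SLB_ESID_V are all assumptions chosen for illustration. Only the packing behaviour itself is taken from the description above, and export_hv() models the net effect seen by userspace rather than how HV KVM keeps its array internally.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLB_NR       64                     /* entries in the sregs SLB array */
#define SLB_ESID_V   0x0000000008000000ULL  /* valid bit; value is an assumption */
#define SLB_IDX_MASK 0xfffULL               /* low esid bits holding the index; assumption */

struct slb_entry {
    uint64_t slbe;  /* esid part; low bits are assumed clear in the guest view */
    uint64_t slbv;  /* vsid part */
};

/* PR-style: entry n goes to out[n], valid or not, with n OR'ed into slbe. */
static void export_pr(const struct slb_entry *guest, struct slb_entry *out)
{
    for (int i = 0; i < SLB_NR; i++) {
        out[i].slbe = guest[i].slbe | (uint64_t)i;
        out[i].slbv = guest[i].slbv;
    }
}

/* HV-style (net effect): zero everything, then pack only the valid
 * entries to the front, each carrying its slot number in slbe. */
static int export_hv(const struct slb_entry *guest, struct slb_entry *out)
{
    int n = 0;

    memset(out, 0, SLB_NR * sizeof(*out));
    for (int i = 0; i < SLB_NR; i++) {
        if (guest[i].slbe & SLB_ESID_V) {
            out[n].slbe = guest[i].slbe | (uint64_t)i;
            out[n].slbv = guest[i].slbv;
            n++;
        }
    }
    return n;  /* number of valid entries copied out */
}

/* Recover the slot number a consumer would use for an exported entry. */
static int slot_of(const struct slb_entry *e)
{
    return (int)(e->slbe & SLB_IDX_MASK);
}
```

In the HV-style output the array position no longer matches the slot number, and everything past the packed entries is zero, so slot_of() on such a tail entry yields 0. That is the situation the rest of the thread is about.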
Re: [PATCH] target-ppc: Update slb array with correct index values.
On 21.08.2013, at 06:11, Paul Mackerras wrote:

> On Mon, Aug 19, 2013 at 10:21:09AM +0200, Alexander Graf wrote:
>> On 19.08.2013, at 09:25, Aneesh Kumar K.V wrote:
>>> I noticed this when adding support for guest memory dumping via qemu gdb server. Now the array we get would look like below
>>>
>>> slbe0 slbv0
>>> slbe1 slbv1
>>> 0     0
>>
>> Ok, so that's where the problem lies. Why are the entries 0 here? Either we try to fetch more entries than we should, we populate entries incorrectly or the kernel simply returns invalid SLB entry values for invalid entries.
>>
>> Are you seeing this with PR KVM or HV KVM?
>
> I suspect this is to do with the fact that PR and HV KVM use the vcpu->arch.slb[] array differently. PR stores SLB entry n in vcpu->arch.slb[n], whereas HV packs the valid entries down in the low-numbered entries and puts the index in the bottom bits of the esid field (this is so they can be loaded efficiently with the slbmte instruction on guest entry).
>
> Then, kvm_arch_vcpu_ioctl_get_sregs() on PR copies out all 64 entries (valid or not) and puts an index value in the bottom bits of the esid, whereas on HV it just copies out the valid entries (which already have the index in the esid field).
>
> So, the question is, what is the ABI here? It sounds a bit like qemu is ignoring the index value in the esid field. Either qemu needs to take notice of the index in the esid field or we need to change the HV versions of kvm_arch_vcpu_ioctl_get/set_sregs to put entry n in sregs->u.s.ppc64.slb[n] like PR does.

It's the opposite today - QEMU does honor the index value on sregs get. Aneesh's patch wants to change it to ignore it instead. For sregs set we copy our internal copy of the slb linearly into the array, so we don't pack it there.

Can we safely assume on HV KVM that esid == 0 && vsid == 0 is the end of the list? If so, we can just add a break statement in the get loop and call it a day. The rest should work just fine.

Alex
Re: [PATCH] target-ppc: Update slb array with correct index values.
On 21.08.2013, at 08:37, Alexander Graf wrote:

> On 21.08.2013, at 06:11, Paul Mackerras wrote:
>
>> On Mon, Aug 19, 2013 at 10:21:09AM +0200, Alexander Graf wrote:
>>> On 19.08.2013, at 09:25, Aneesh Kumar K.V wrote:
>>>> I noticed this when adding support for guest memory dumping via qemu gdb server. Now the array we get would look like below
>>>>
>>>> slbe0 slbv0
>>>> slbe1 slbv1
>>>> 0     0
>>>
>>> Ok, so that's where the problem lies. Why are the entries 0 here? Either we try to fetch more entries than we should, we populate entries incorrectly or the kernel simply returns invalid SLB entry values for invalid entries.
>>>
>>> Are you seeing this with PR KVM or HV KVM?
>>
>> I suspect this is to do with the fact that PR and HV KVM use the vcpu->arch.slb[] array differently. PR stores SLB entry n in vcpu->arch.slb[n], whereas HV packs the valid entries down in the low-numbered entries and puts the index in the bottom bits of the esid field (this is so they can be loaded efficiently with the slbmte instruction on guest entry).
>>
>> Then, kvm_arch_vcpu_ioctl_get_sregs() on PR copies out all 64 entries (valid or not) and puts an index value in the bottom bits of the esid, whereas on HV it just copies out the valid entries (which already have the index in the esid field).
>>
>> So, the question is, what is the ABI here?

Oh, and to answer your question here: The original intent was to copy the SLB array 1:1 from kvm to user space. But that's moot by now, since we already have kvm versions out there that return your packed format :).

Alex

>> It sounds a bit like qemu is ignoring the index value in the esid field. Either qemu needs to take notice of the index in the esid field or we need to change the HV versions of kvm_arch_vcpu_ioctl_get/set_sregs to put entry n in sregs->u.s.ppc64.slb[n] like PR does.
>
> It's the opposite today - QEMU does honor the index value on sregs get. Aneesh's patch wants to change it to ignore it instead. For sregs set we copy our internal copy of the slb linearly into the array, so we don't pack it there.
>
> Can we safely assume on HV KVM that esid == 0 && vsid == 0 is the end of the list? If so, we can just add a break statement in the get loop and call it a day. The rest should work just fine.
>
> Alex
>
> --
> To unsubscribe from this list: send the line unsubscribe kvm-ppc in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] target-ppc: Update slb array with correct index values.
Alexander Graf ag...@suse.de writes:

> On 20.08.2013, at 14:57, Aneesh Kumar K.V wrote:
>
>> Alexander Graf ag...@suse.de writes:
>>
>>> On 19.08.2013, at 09:25, Aneesh Kumar K.V wrote:
>>>
>>>> Alexander Graf ag...@suse.de writes:
>>>>
>>>>> On 11.08.2013, at 20:16, Aneesh Kumar K.V wrote:
>>>>>
>>>>>> From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
>>>>>>
>>>>>> Without this, a value of rb=0 and rs=0 results in us replacing the 0th index
>>>>>>
>>>>>> Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
>>>>>
>>>>> Wrong mailing list again ;).
>>>>
>>>> Will post the series again with updated commit message to the qemu list.
>>>>
>>>>>> ---
>>>>>>  target-ppc/kvm.c | 14 ++++++++++++--
>>>>>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>>>>>> index 30a870e..5d4e613 100644
>>>>>> --- a/target-ppc/kvm.c
>>>>>> +++ b/target-ppc/kvm.c
>>>>>> @@ -1034,8 +1034,18 @@ int kvm_arch_get_registers(CPUState *cs)
>>>>>>          /* Sync SLB */
>>>>>>  #ifdef TARGET_PPC64
>>>>>>          for (i = 0; i < 64; i++) {
>>>>>> -            ppc_store_slb(env, sregs.u.s.ppc64.slb[i].slbe,
>>>>>> -                               sregs.u.s.ppc64.slb[i].slbv);
>>>>>> +            target_ulong rb = sregs.u.s.ppc64.slb[i].slbe;
>>>>>> +            /*
>>>>>> +             * KVM_GET_SREGS doesn't return slb entry with slot information
>>>>>> +             * same as index. So don't depend on the slot information in
>>>>>> +             * the returned value.
>>>>>
>>>>> This is the generating code in book3s_pr.c:
>>>>>
>>>>>     if (vcpu->arch.hflags & BOOK3S_HFLAG_SLB) {
>>>>>         for (i = 0; i < 64; i++) {
>>>>>             sregs->u.s.ppc64.slb[i].slbe = vcpu->arch.slb[i].orige | i;
>>>>>             sregs->u.s.ppc64.slb[i].slbv = vcpu->arch.slb[i].origv;
>>>>>         }
>>>>>
>>>>> Where exactly did you see broken slbe entries?
>>>>
>>>> I noticed this when adding support for guest memory dumping via the qemu gdb server. Now the array we get would look like below:
>>>>
>>>> slbe0 slbv0
>>>> slbe1 slbv1
>>>> 0     0
>>>
>>> Ok, so that's where the problem lies. Why are the entries 0 here? Either we try to fetch more entries than we should, we populate entries incorrectly or the kernel simply returns invalid SLB entry values for invalid entries.
>>
>> The ioctl zeroes out the sregs and fills only slb_max entries. So we find zero-filled entries above slb_max. Also, we don't pass slb_max to user space, so userspace has to look at all 64 entries.
>
> We do pass slb_max, it's just called differently and calculated implicitly :). How about something like this:
>
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 30a870e..29a2ec3 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -818,6 +818,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>          /* Sync SLB */
>  #ifdef TARGET_PPC64
> +        /* We need to loop through all entries to give them potentially
> +           valid values */
>          for (i = 0; i < 64; i++) {
>              sregs.u.s.ppc64.slb[i].slbe = env->slb[i].esid;
>              sregs.u.s.ppc64.slb[i].slbv = env->slb[i].vsid;
> @@ -1033,7 +1035,7 @@ int kvm_arch_get_registers(CPUState *cs)
>          /* Sync SLB */
>  #ifdef TARGET_PPC64
> -        for (i = 0; i < 64; i++) {
> +        for (i = 0; i < env->slb_nr; i++) {
>              ppc_store_slb(env, sregs.u.s.ppc64.slb[i].slbe,
>                                 sregs.u.s.ppc64.slb[i].slbv);
>          }

But we don't sync slb_max (max valid slb index); env->slb_nr is slb_nr (total number of slb slots)? We also don't sync env->slb_nr every time we do kvm_arch_get_registers.

The problem we have is that we first memset sregs to 0 and then fill only slb_max entries. The slbe and slbv entries with value 0 then result in us treating those entries as index 0 and updating the 0th entry.

-aneesh
Re: Oracle RAC in libvirt+KVM environment
On 21/08/2013 04:11, Timon Wang wrote:
> From the fedora 19 host:
> [root@fedora ~]# sg_inq /dev/sdc
> standard INQUIRY:
>   PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
>   [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=0
>   SCCS=1  ACC=0  TPGS=1  3PC=0  Protect=0  [BQue=0]
>   EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
>   [RelAdr=0]  WBus16=1  Sync=1  Linked=0  [TranDis=0]  CmdQue=1
>     length=36 (0x24)   Peripheral device type: disk
>  Vendor identification: MacroSAN
>  Product identification: LU
>  Product revision level: 1.0
>  Unit serial number: fd01ece6-8540-f4c7--fe170142b300
>
> From the fedora 19 vm:
> [root@fedoravm ~]# sg_inq /dev/sdb
> standard INQUIRY:
>   PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
>   [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=0
>   SCCS=1  ACC=0  TPGS=1  3PC=0  Protect=0  [BQue=0]
>   EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
>   [RelAdr=0]  WBus16=1  Sync=1  Linked=0  [TranDis=0]  CmdQue=1
>     length=36 (0x24)   Peripheral device type: disk
>  Vendor identification: MacroSAN
>  Product identification: LU
>  Product revision level: 1.0
>  Unit serial number: fd01ece6-8540-f4c7--fe170142b300
>
> The results from the fedora 19 host and the fedora 19 vm are the same. Does that mean I have the wrong scsi pass-through driver in the windows vm? Or is there a tool like sg_inq for windows 2008?

Yeah, there's something weird in the Windows VM.

sg_inq should be available for Windows too, but I don't know where to get a precompiled binary from.

Paolo
Re: [PATCH] target-ppc: Update slb array with correct index values.
On Wed, Aug 21, 2013 at 08:37:47AM +0100, Alexander Graf wrote:
>
> On 21.08.2013, at 06:11, Paul Mackerras wrote:
>
>> On Mon, Aug 19, 2013 at 10:21:09AM +0200, Alexander Graf wrote:
>>> On 19.08.2013, at 09:25, Aneesh Kumar K.V wrote:
>>>> I noticed this when adding support for guest memory dumping via qemu gdb server. Now the array we get would look like below
>>>>
>>>> slbe0 slbv0
>>>> slbe1 slbv1
>>>> 0     0
>>>
>>> Ok, so that's where the problem lies. Why are the entries 0 here? Either we try to fetch more entries than we should, we populate entries incorrectly or the kernel simply returns invalid SLB entry values for invalid entries.
>>>
>>> Are you seeing this with PR KVM or HV KVM?
>>
>> I suspect this is to do with the fact that PR and HV KVM use the vcpu->arch.slb[] array differently. PR stores SLB entry n in vcpu->arch.slb[n], whereas HV packs the valid entries down in the low-numbered entries and puts the index in the bottom bits of the esid field (this is so they can be loaded efficiently with the slbmte instruction on guest entry).
>>
>> Then, kvm_arch_vcpu_ioctl_get_sregs() on PR copies out all 64 entries (valid or not) and puts an index value in the bottom bits of the esid, whereas on HV it just copies out the valid entries (which already have the index in the esid field).
>>
>> So, the question is, what is the ABI here? It sounds a bit like qemu is ignoring the index value in the esid field. Either qemu needs to take notice of the index in the esid field or we need to change the HV versions of kvm_arch_vcpu_ioctl_get/set_sregs to put entry n in sregs->u.s.ppc64.slb[n] like PR does.
>
> It's the opposite today - QEMU does honor the index value on sregs get. Aneesh's patch wants to change it to ignore it instead. For sregs set we copy our internal copy of the slb linearly into the array, so we don't pack it there.
>
> Can we safely assume on HV KVM that esid == 0 && vsid == 0 is the end of the list? If so, we can just add a break statement in the get loop and call it a day. The rest should work just fine.

On HV KVM yes, that would be the end of the list, but PR KVM could give you entry 0 containing esid==0 and vsid==0 followed by valid entries. Perhaps the best approach is to ignore any entries with SLB_ESID_V clear.

Paul.
Re: Oracle RAC in libvirt+KVM environment
On Wed, 2013-08-21 at 11:09 +0200, Paolo Bonzini wrote:
> On 21/08/2013 04:11, Timon Wang wrote:
>> From the fedora 19 host:
>> [root@fedora ~]# sg_inq /dev/sdc
>> standard INQUIRY:
>>   PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
>>   [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=0
>>   SCCS=1  ACC=0  TPGS=1  3PC=0  Protect=0  [BQue=0]
>>   EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
>>   [RelAdr=0]  WBus16=1  Sync=1  Linked=0  [TranDis=0]  CmdQue=1
>>     length=36 (0x24)   Peripheral device type: disk
>>  Vendor identification: MacroSAN
>>  Product identification: LU
>>  Product revision level: 1.0
>>  Unit serial number: fd01ece6-8540-f4c7--fe170142b300
>>
>> From the fedora 19 vm:
>> [root@fedoravm ~]# sg_inq /dev/sdb
>> standard INQUIRY:
>>   PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]
>>   [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=0
>>   SCCS=1  ACC=0  TPGS=1  3PC=0  Protect=0  [BQue=0]
>>   EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
>>   [RelAdr=0]  WBus16=1  Sync=1  Linked=0  [TranDis=0]  CmdQue=1
>>     length=36 (0x24)   Peripheral device type: disk
>>  Vendor identification: MacroSAN
>>  Product identification: LU
>>  Product revision level: 1.0
>>  Unit serial number: fd01ece6-8540-f4c7--fe170142b300
>>
>> The results from the fedora 19 host and the fedora 19 vm are the same. Does that mean I have the wrong scsi pass-through driver in the windows vm? Or is there a tool like sg_inq for windows 2008?
>
> Yeah, there's something weird in the Windows VM.
>
> sg_inq should be available for Windows too, but I don't know where to get a precompiled binary from.

AFAIK, the latest sg3-utils build for MSFT is here:

http://sg.danny.cz/sg/p/sg3_utils-1.36exe.zip

--nab
Re: [PATCH] target-ppc: Update slb array with correct index values.
On 21.08.2013 at 10:25, Paul Mackerras pau...@samba.org wrote:

> On Wed, Aug 21, 2013 at 08:37:47AM +0100, Alexander Graf wrote:
>>
>> On 21.08.2013, at 06:11, Paul Mackerras wrote:
>>
>>> On Mon, Aug 19, 2013 at 10:21:09AM +0200, Alexander Graf wrote:
>>>> On 19.08.2013, at 09:25, Aneesh Kumar K.V wrote:
>>>>> I noticed this when adding support for guest memory dumping via qemu gdb server. Now the array we get would look like below
>>>>>
>>>>> slbe0 slbv0
>>>>> slbe1 slbv1
>>>>> 0     0
>>>>
>>>> Ok, so that's where the problem lies. Why are the entries 0 here? Either we try to fetch more entries than we should, we populate entries incorrectly or the kernel simply returns invalid SLB entry values for invalid entries.
>>>>
>>>> Are you seeing this with PR KVM or HV KVM?
>>>
>>> I suspect this is to do with the fact that PR and HV KVM use the vcpu->arch.slb[] array differently. PR stores SLB entry n in vcpu->arch.slb[n], whereas HV packs the valid entries down in the low-numbered entries and puts the index in the bottom bits of the esid field (this is so they can be loaded efficiently with the slbmte instruction on guest entry).
>>>
>>> Then, kvm_arch_vcpu_ioctl_get_sregs() on PR copies out all 64 entries (valid or not) and puts an index value in the bottom bits of the esid, whereas on HV it just copies out the valid entries (which already have the index in the esid field).
>>>
>>> So, the question is, what is the ABI here? It sounds a bit like qemu is ignoring the index value in the esid field. Either qemu needs to take notice of the index in the esid field or we need to change the HV versions of kvm_arch_vcpu_ioctl_get/set_sregs to put entry n in sregs->u.s.ppc64.slb[n] like PR does.
>>
>> It's the opposite today - QEMU does honor the index value on sregs get. Aneesh's patch wants to change it to ignore it instead. For sregs set we copy our internal copy of the slb linearly into the array, so we don't pack it there.
>>
>> Can we safely assume on HV KVM that esid == 0 && vsid == 0 is the end of the list? If so, we can just add a break statement in the get loop and call it a day. The rest should work just fine.
>
> On HV KVM yes, that would be the end of the list, but PR KVM could give you entry 0 containing esid==0 and vsid==0 followed by valid entries. Perhaps the best approach is to ignore any entries with SLB_ESID_V clear.

That means we don't clear entries we don't receive from the kernel because they're V=0 but which were V=1 before. Which with the current code is probably already broken.

So yes: clearing all cached entries first (to make sure we have no stale ones), then looping through all of them and only adding entries with V=1, should fix everything for PR as well as HV.

Alex

> Paul.
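The approach Alex settles on here (clear the cache, then store only entries with V=1) might look roughly like the following in plain C. This is a hedged sketch, not the QEMU patch: sync_slb(), both structs, the 0xfff index mask and the SLB_ESID_V value are illustrative assumptions, and the slot number is taken from the low bits of slbe as discussed earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLB_NR       64
#define SLB_ESID_V   0x0000000008000000ULL  /* valid bit; value is an assumption */
#define SLB_IDX_MASK 0xfffULL               /* index bits in slbe; assumption */

struct slb_entry { uint64_t esid, vsid; };  /* hypothetical userspace cache */
struct sregs_slb { uint64_t slbe, slbv; };  /* hypothetical view of the sregs array */

/* Rebuild the cached SLB from what the kernel returned.
 * Returns the number of entries stored. */
static int sync_slb(struct slb_entry *cache, const struct sregs_slb *in)
{
    int stored = 0;

    /* Drop everything first so entries that went from V=1 to V=0
     * do not survive as stale cache contents. */
    memset(cache, 0, SLB_NR * sizeof(*cache));

    for (int i = 0; i < SLB_NR; i++) {
        uint64_t rb = in[i].slbe;
        unsigned slot = (unsigned)(rb & SLB_IDX_MASK);

        if (!(rb & SLB_ESID_V)) {
            continue;           /* invalid or zero-filled entry: ignore */
        }
        if (slot >= SLB_NR) {
            continue;           /* index out of range: ignore */
        }
        cache[slot].esid = rb & ~SLB_IDX_MASK;
        cache[slot].vsid = in[i].slbv;
        stored++;
    }
    return stored;
}
```

With HV-style input (valid entries packed first, zero-filled tail) the zero entries are skipped, so slot 0 is no longer overwritten. With PR-style input an invalid entry 0 followed by valid entries is handled by the same check.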
KVM in HA active/active + fault-tolerant configuration
Hi all,
I have a question about Linux KVM HA clusters.

I understand that in an HA setup I can live-migrate virtual machines between hosts that share the same storage (via various methods, e.g. DRBD). This enables us to migrate the VMs based on host load and performance.

My current understanding is that, with this setup, a host crash will cause the VMs to be restarted on another host.

However, I wonder if there is a way to have a fully fault-tolerant HA configuration, where by fully fault-tolerant I mean that a host crash (e.g. a power failure) will cause the VMs to be migrated to another host with no state change. In other words: is it possible to have an always-synchronized (both disk & memory) VM instance on another host, so that the migrated VM does not need to be restarted but only restored/unpaused? For disk data synchronization we can use shared storage (bypassing the problem) or something similar to DRBD, but what about memory?

Thank you,
regards.
Re: updated: kvm PCI todo wiki
On 08/21/2013 12:48 PM, Michael S. Tsirkin wrote:
> Hey guys,
> I've put up a wiki page with a kvm PCI todo list, mainly to avoid effort duplication, but also in the hope to draw attention to what I think we should try addressing in KVM:
>
> http://www.linux-kvm.org/page/PCITodo
>
> This page could cover all PCI related activity in KVM, it is very incomplete. We should probably add e.g. IOMMU related stuff.
>
> Note: if there's no developer listed for an item, this just means I don't know of anyone actively working on an issue at the moment, not that no one intends to.
>
> I would appreciate it if others working on one of the items on this list would add their names so we can communicate better. If others like this wiki page, please go ahead and add stuff you are working on if any.
>
> It would be especially nice to add testing projects.
>
> Also, feel free to add links to bugzilla items.

On a related note, has anyone ever tried to test MSI / MSI-X with a windows guest? I've tried to enable it for virtio but for some reason Windows didn't want to enable it. AHCI was even worse; the stock Windows version doesn't support MSI and the Intel one doesn't like our implementation :-(.

Has anyone ever managed to get this to work?

If not it'd be a good topic for the wiki ...

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
Re: updated: kvm PCI todo wiki
On Wed, 2013-08-21 at 14:45 +0200, Hannes Reinecke wrote:
> On 08/21/2013 12:48 PM, Michael S. Tsirkin wrote:
>> Hey guys,
>> I've put up a wiki page with a kvm PCI todo list, mainly to avoid effort duplication, but also in the hope to draw attention to what I think we should try addressing in KVM:
>>
>> http://www.linux-kvm.org/page/PCITodo
>>
>> This page could cover all PCI related activity in KVM, it is very incomplete. We should probably add e.g. IOMMU related stuff.
>>
>> Note: if there's no developer listed for an item, this just means I don't know of anyone actively working on an issue at the moment, not that no one intends to.
>>
>> I would appreciate it if others working on one of the items on this list would add their names so we can communicate better. If others like this wiki page, please go ahead and add stuff you are working on if any.
>>
>> It would be especially nice to add testing projects.
>>
>> Also, feel free to add links to bugzilla items.
>
> On a related note, has anyone ever tried to test MSI / MSI-X with a windows guest? I've tried to enable it for virtio but for some reason

MSI-X is the default mode for NetKvm and viostor on Vista and later. It must work. Just make sure that the CPU family is 0xf or higher, otherwise Windows will not activate MSI-X.

Vadim.

> Windows didn't want to enable it. AHCI was even worse; the stock Windows version doesn't support MSI and the Intel one doesn't like our implementation :-(.
>
> Has anyone ever managed to get this to work?
>
> If not it'd be a good topic for the wiki ...
>
> Cheers,
>
> Hannes
Re: updated: kvm PCI todo wiki
On Wed, Aug 21, 2013 at 02:45:36PM +0200, Hannes Reinecke wrote:
> On 08/21/2013 12:48 PM, Michael S. Tsirkin wrote:
>> Hey guys,
>> I've put up a wiki page with a kvm PCI todo list, mainly to avoid effort duplication, but also in the hope to draw attention to what I think we should try addressing in KVM:
>>
>> http://www.linux-kvm.org/page/PCITodo
>>
>> This page could cover all PCI related activity in KVM, it is very incomplete. We should probably add e.g. IOMMU related stuff.
>>
>> Note: if there's no developer listed for an item, this just means I don't know of anyone actively working on an issue at the moment, not that no one intends to.
>>
>> I would appreciate it if others working on one of the items on this list would add their names so we can communicate better. If others like this wiki page, please go ahead and add stuff you are working on if any.
>>
>> It would be especially nice to add testing projects.
>>
>> Also, feel free to add links to bugzilla items.
>
> On a related note, has anyone ever tried to test MSI / MSI-X with a windows guest? I've tried to enable it for virtio but for some reason Windows didn't want to enable it. AHCI was even worse; the stock Windows version doesn't support MSI and the Intel one doesn't like our implementation :-(.
>
> Has anyone ever managed to get this to work?
>
> If not it'd be a good topic for the wiki ...
>
> Cheers,
>
> Hannes

I put some AHCI-related things there: making Intel's AHCI driver work with QEMU would be nice, e.g. for Windows XP guests. It might also help uncover some bugs.

> --
> Dr. Hannes Reinecke                   zSeries Storage
> h...@suse.de                          +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
Re: [Qemu-devel] [PATCH qom-cpu for-next 1/2] cpu: Use QTAILQ for CPU list
On 30.07.2013 18:55, Andreas Färber wrote:
> Introduce CPU_FOREACH(), CPU_FOREACH_SAFE() and CPU_NEXT() shorthand macros.
>
> Signed-off-by: Andreas Färber afaer...@suse.de

Needs the following addition now:

diff --git a/hw/cpu/a15mpcore.c b/hw/cpu/a15mpcore.c
index af182da..9d0e27e 100644
--- a/hw/cpu/a15mpcore.c
+++ b/hw/cpu/a15mpcore.c
@@ -72,9 +72,15 @@ static int a15mp_priv_init(SysBusDevice *dev)
     /* Wire the outputs from each CPU's generic timer to the
      * appropriate GIC PPI inputs
      */
-    for (i = 0, cpu = first_cpu; i < s->num_cpu; i++, cpu = cpu->next_cpu) {
+    i = 0;
+    CPU_FOREACH(cpu) {
         DeviceState *cpudev = DEVICE(cpu);
         int ppibase = s->num_irq - 32 + i * 32;
+
+        if (i s->num_cpu) {
+            break;
+        }
+
         /* physical timer; we wire it up to the non-secure timer's ID,
          * since a real A15 always has TrustZone but QEMU doesn't.
          */
@@ -83,6 +89,7 @@ static int a15mp_priv_init(SysBusDevice *dev)
         /* virtual timer */
         qdev_connect_gpio_out(cpudev, 1,
                               qdev_get_gpio_in(s->gic, ppibase + 27));
+        i++;
     }

     /* Memory map (addresses are offsets from PERIPHBASE):

--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] [PATCH qom-cpu for-next 1/2] cpu: Use QTAILQ for CPU list
On 21 August 2013 15:12, Andreas Färber afaer...@suse.de wrote:
> -    for (i = 0, cpu = first_cpu; i < s->num_cpu; i++, cpu = cpu->next_cpu) {
> +    i = 0;
> +    CPU_FOREACH(cpu) {
>          DeviceState *cpudev = DEVICE(cpu);
>          int ppibase = s->num_irq - 32 + i * 32;
> +
> +        if (i s->num_cpu) {
> +            break;
> +        }
> +
>          /* physical timer; we wire it up to the non-secure timer's ID,
>           * since a real A15 always has TrustZone but QEMU doesn't.
>           */
> @@ -83,6 +89,7 @@ static int a15mp_priv_init(SysBusDevice *dev)
>          /* virtual timer */
>          qdev_connect_gpio_out(cpudev, 1,
>                                qdev_get_gpio_in(s->gic, ppibase + 27));
> +        i++;
>      }

It seems a bit ugly to have to both enumerate the CPUs via CPU_FOREACH and update an index i simultaneously. Isn't there any way to either say "give me the CPU pointer for CPU i" or "give me the index i of this CPU"?

(there are a few bits post-arm_pic-removal that could be a little cleaner with one or the other of the above.)

thanks
-- PMM
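As a rough illustration of the "give me the CPU pointer for CPU i" half of Peter's question, a lookup helper over a tail queue could be sketched as below. This is not QEMU code: it uses the BSD-style <sys/queue.h> TAILQ macros rather than QEMU's QTAILQ ones, and struct cpu, its index field and get_cpu() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a CPU object kept on a tail queue. */
struct cpu {
    int index;               /* position assigned when the CPU was created */
    TAILQ_ENTRY(cpu) node;   /* linkage into the CPU list */
};

TAILQ_HEAD(cpu_list, cpu);

/* Walk the list once and return the CPU with the given index,
 * or NULL if there is no such CPU. */
static struct cpu *get_cpu(struct cpu_list *cpus, int index)
{
    struct cpu *c;

    assert(cpus != NULL);
    TAILQ_FOREACH(c, cpus, node) {
        if (c->index == index) {
            return c;
        }
    }
    return NULL;
}
```

A caller that needs both the index and the object can then loop over the index and call the helper, at the cost of one list walk per lookup. That is fine for a handful of CPUs at init time, though it would be a poor fit for a hot path.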
[PATCH v2] vfio-pci: Use fdget() rather than eventfd_fget()
eventfd_fget() tests to see whether the file is an eventfd file, which
we then immediately pass to eventfd_ctx_fileget(), which again tests
whether the file is an eventfd file.  Simplify slightly by using
fdget() so that we only test that we're looking at an eventfd once.
fget() could also be used, but fdget() makes use of fget_light() for
another slight optimization.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

v2: Use direct gotos in error path per Al Viro's comment

 drivers/vfio/pci/vfio_pci_intrs.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 4bc704e..641bc87 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -130,8 +130,8 @@ static int virqfd_enable(struct vfio_pci_device *vdev,
 			 void (*thread)(struct vfio_pci_device *, void *),
 			 void *data, struct virqfd **pvirqfd, int fd)
 {
-	struct file *file = NULL;
-	struct eventfd_ctx *ctx = NULL;
+	struct fd irqfd;
+	struct eventfd_ctx *ctx;
 	struct virqfd *virqfd;
 	int ret = 0;
 	unsigned int events;
@@ -149,16 +149,16 @@ static int virqfd_enable(struct vfio_pci_device *vdev,
 	INIT_WORK(&virqfd->shutdown, virqfd_shutdown);
 	INIT_WORK(&virqfd->inject, virqfd_inject);

-	file = eventfd_fget(fd);
-	if (IS_ERR(file)) {
-		ret = PTR_ERR(file);
-		goto fail;
+	irqfd = fdget(fd);
+	if (!irqfd.file) {
+		ret = -EBADF;
+		goto err_fd;
 	}

-	ctx = eventfd_ctx_fileget(file);
+	ctx = eventfd_ctx_fileget(irqfd.file);
 	if (IS_ERR(ctx)) {
 		ret = PTR_ERR(ctx);
-		goto fail;
+		goto err_ctx;
 	}

 	virqfd->eventfd = ctx;
@@ -174,7 +174,7 @@ static int virqfd_enable(struct vfio_pci_device *vdev,
 	if (*pvirqfd) {
 		spin_unlock_irq(&vdev->irqlock);
 		ret = -EBUSY;
-		goto fail;
+		goto err_busy;
 	}
 	*pvirqfd = virqfd;

@@ -187,7 +187,7 @@ static int virqfd_enable(struct vfio_pci_device *vdev,
 	init_waitqueue_func_entry(&virqfd->wait, virqfd_wakeup);
 	init_poll_funcptr(&virqfd->pt, virqfd_ptable_queue_proc);

-	events = file->f_op->poll(file, &virqfd->pt);
+	events = irqfd.file->f_op->poll(irqfd.file, &virqfd->pt);

 	/*
 	 * Check if there was an event already pending on the eventfd
@@ -202,17 +202,14 @@ static int virqfd_enable(struct vfio_pci_device *vdev,
 	 * Do not drop the file until the irqfd is fully initialized,
 	 * otherwise we might race against the POLLHUP.
 	 */
-	fput(file);
+	fdput(irqfd);

 	return 0;
-
-fail:
-	if (ctx && !IS_ERR(ctx))
-		eventfd_ctx_put(ctx);
-
-	if (file && !IS_ERR(file))
-		fput(file);
-
+err_busy:
+	eventfd_ctx_put(ctx);
+err_ctx:
+	fdput(irqfd);
+err_fd:
 	kfree(virqfd);

 	return ret;

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
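The "direct gotos in error path" style mentioned in the v2 note can be illustrated outside the kernel. The following is only a minimal user-space sketch, not the vfio code: acquire_three(), its fail_at parameter and the released bitmask are hypothetical names made up for this illustration. Only the layout (one label per acquired resource, released in reverse order) mirrors the err_busy/err_ctx/err_fd labels in the patch.

```c
#include <stdlib.h>

/*
 * Hypothetical illustration, not kernel code.  Two resources are
 * acquired in order, followed by a final check.  fail_at (1..3) forces
 * that step to fail; 0 means success.  Each label undoes only what was
 * acquired before the failing step, in reverse order.  Bits set in
 * *released record which cleanup steps ran.
 */
static int acquire_three(int fail_at, unsigned *released)
{
	void *a, *b;
	int ret = 0;

	*released = 0;

	a = (fail_at == 1) ? NULL : malloc(16);
	if (!a) {
		ret = -1;
		goto err_a;
	}

	b = (fail_at == 2) ? NULL : malloc(16);
	if (!b) {
		ret = -2;
		goto err_b;
	}

	if (fail_at == 3) {
		ret = -3;
		goto err_c;
	}

	/* Success: the sketch keeps nothing, so just release quietly. */
	free(b);
	free(a);
	return 0;

err_c:
	free(b);
	*released |= 2;
err_b:
	free(a);
	*released |= 1;
err_a:
	return ret;
}
```

Because every jump target is reached only with a known set of live resources, the cleanup code needs no "is this pointer valid?" checks, which is what lets the patch drop the NULL initializers and the IS_ERR() tests in the old fail: block.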
Re: [PATCH] target-ppc: Update slb array with correct index values.
Alexander Graf ag...@suse.de writes:

>> On HV KVM yes, that would be the end of the list, but PR KVM could
>> give you entry 0 containing esid==0 and vsid==0 followed by valid
>> entries.  Perhaps the best approach is to ignore any entries with
>> SLB_ESID_V clear.
>
> That means we don't clear entries we don't receive from the kernel
> because they're V=0 but which were V=1 before. Which with the current
> code is probably already broken.
>
> So yes, clear all cached entries first (to make sure we have no stale
> ones), then loop through all and only add entries with V=1 should fix
> everything for PR as well as HV.

This is more or less what the patch is doing. The kernel already does
a memset of all the slb entries. The only difference is that we don't
depend on the slb index in the returned value. Instead we just use the
array index as the slb index. Do we really need to make sure the slb
index remains the same?

-aneesh
Re: [Qemu-devel] [PATCH qom-cpu for-next 1/2] cpu: Use QTAILQ for CPU list
On 21.08.2013 16:36, Peter Maydell wrote:
> On 21 August 2013 15:12, Andreas Färber afaer...@suse.de wrote:
>> -    for (i = 0, cpu = first_cpu; i < s->num_cpu; i++, cpu = cpu->next_cpu) {
>> +    i = 0;
>> +    CPU_FOREACH(cpu) {
>>          DeviceState *cpudev = DEVICE(cpu);
>>          int ppibase = s->num_irq - 32 + i * 32;
>> +
>> +        if (i > s->num_cpu) {
>> +            break;
>> +        }
>> +
>>          /* physical timer; we wire it up to the non-secure timer's ID,
>>           * since a real A15 always has TrustZone but QEMU doesn't.
>>           */
>> @@ -83,6 +89,7 @@ static int a15mp_priv_init(SysBusDevice *dev)
>>          /* virtual timer */
>>          qdev_connect_gpio_out(cpudev, 1,
>>                                qdev_get_gpio_in(s->gic, ppibase + 27));
>> +        i++;
>>      }
>
> It seems a bit ugly to have to both enumerate the CPUs
> via CPU_FOREACH and update an index i simultaneously.

Same for the original code. :)

> Isn't there any way to either say "give me the CPU pointer for
> CPU i" or "give me the index i of this CPU" ?

There is:

diff --git a/hw/cpu/a15mpcore.c b/hw/cpu/a15mpcore.c
index 9d0e27e..1263b12 100644
--- a/hw/cpu/a15mpcore.c
+++ b/hw/cpu/a15mpcore.c
@@ -50,7 +50,6 @@ static int a15mp_priv_init(SysBusDevice *dev)
     SysBusDevice *busdev;
     const char *gictype = "arm_gic";
     int i;
-    CPUState *cpu;

     if (kvm_irqchip_in_kernel()) {
         gictype = "kvm-arm-gic";
@@ -72,15 +71,10 @@ static int a15mp_priv_init(SysBusDevice *dev)
     /* Wire the outputs from each CPU's generic timer to the
      * appropriate GIC PPI inputs
      */
-    i = 0;
-    CPU_FOREACH(cpu) {
-        DeviceState *cpudev = DEVICE(cpu);
+    for (i = 0; i < s->num_cpu; i++) {
+        DeviceState *cpudev = DEVICE(qemu_get_cpu(i));
         int ppibase = s->num_irq - 32 + i * 32;

-        if (i > s->num_cpu) {
-            break;
-        }
-
         /* physical timer; we wire it up to the non-secure timer's ID,
          * since a real A15 always has TrustZone but QEMU doesn't.
          */
@@ -89,7 +83,6 @@ static int a15mp_priv_init(SysBusDevice *dev)
         /* virtual timer */
         qdev_connect_gpio_out(cpudev, 1,
                               qdev_get_gpio_in(s->gic, ppibase + 27));
-        i++;
     }

     /* Memory map (addresses are offsets from PERIPHBASE):

--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
ppc kvm-unit-tests?
Hi Alex,

I'm looking at adding arm to kvm-unit-tests. One of the first things I'd
like to do is clean up the kvm-unit-tests repo a bit. There are some
arch-specific files (including ppc) lying around in the root dir that
I'd sweep up before adding another arch to the mix. However, the git
history of the ppc ones indicates they've never been used. They appear
to have been put there during the initial drop by Avi, likely expecting
to continue development with them later. So my question to you is, do
you use kvm-unit-tests, or plan to? Otherwise, maybe we can just remove
ppc altogether during the cleanup.

Thanks,
drew
Re: [PATCH-v3 1/4] idr: Percpu ida
On Fri, 16 Aug 2013, Nicholas A. Bellinger wrote:

> +	spinlock_t			lock;

Remove the spinlock.

> +	unsigned			nr_free;
> +	unsigned			freelist[];
> +};
> +
> +static inline void move_tags(unsigned *dst, unsigned *dst_nr,
> +			     unsigned *src, unsigned *src_nr,
> +			     unsigned nr)
> +{
> +	*src_nr -= nr;
> +	memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
> +	*dst_nr += nr;
> +}
> +
> +static inline unsigned alloc_local_tag(struct percpu_ida *pool,
> +				       struct percpu_ida_cpu *tags)

Pass the __percpu offset and not the tags pointer.

> +{
> +	int tag = -ENOSPC;
> +
> +	spin_lock(&tags->lock);

Interrupts are already disabled. Drop the spinlock.

> +	if (tags->nr_free)
> +		tag = tags->freelist[--tags->nr_free];

You can keep this or avoid address calculation through segment prefixes.
F.e.

	if (__this_cpu_read(tags->nr_free)) {
		int n = __this_cpu_dec_return(tags->nr_free);
		tag = __this_cpu_read(tags->freelist[n]);
	}

> +	spin_unlock(&tags->lock);

Drop.

> + * Returns a tag - an integer in the range [0..nr_tags) (passed to
> + * tag_pool_init()), or otherwise -ENOSPC on allocation failure.
> + *
> + * Safe to be called from interrupt context (assuming it isn't passed
> + * __GFP_WAIT, of course).
> + *
> + * Will not fail if passed __GFP_WAIT.
> + */
> +int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp)
> +{
> +	DEFINE_WAIT(wait);
> +	struct percpu_ida_cpu *tags;
> +	unsigned long flags;
> +	int tag;
> +
> +	local_irq_save(flags);
> +	tags = this_cpu_ptr(pool->tag_cpu);

You could drop this_cpu_ptr if you pass pool->tag_cpu to alloc_local_tag.

> +/**
> + * percpu_ida_free - free a tag
> + * @pool: pool @tag was allocated from
> + * @tag: a tag previously allocated with percpu_ida_alloc()
> + *
> + * Safe to be called from interrupt context.
> + */
> +void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
> +{
> +	struct percpu_ida_cpu *tags;
> +	unsigned long flags;
> +	unsigned nr_free;
> +
> +	BUG_ON(tag >= pool->nr_tags);
> +
> +	local_irq_save(flags);
> +	tags = this_cpu_ptr(pool->tag_cpu);
> +
> +	spin_lock(&tags->lock);

No need for spinlocking

> +	tags->freelist[tags->nr_free++] = tag;

	nr_free = __this_cpu_inc_return(pool->tag_cpu.nr_free) ?

	__this_cpu_write(pool->tag_cpu.freelist[nr_free], tag)
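The per-CPU freelist under review is an array used as a stack: allocation pops from the top, free pushes onto it. The sketch below shows only that bookkeeping in plain single-threaded C. It is an illustration under stated assumptions, not the percpu_ida code: struct tag_cache, cache_alloc(), cache_free() and the NR_TAGS size are hypothetical, and there is no per-CPU access, IRQ masking or tag stealing.

```c
#include <assert.h>
#include <errno.h>

#define NR_TAGS 8	/* hypothetical pool size for this sketch */

/* Single-threaded stand-in for one CPU's tag cache. */
struct tag_cache {
	unsigned nr_free;		/* number of tags currently cached */
	unsigned freelist[NR_TAGS];	/* stack of free tags */
};

/* Pop a tag from the stack, or return -ENOSPC if the cache is empty. */
static int cache_alloc(struct tag_cache *c)
{
	if (!c->nr_free)
		return -ENOSPC;
	/* Pre-decrement: the new count is also the index to read. */
	return (int)c->freelist[--c->nr_free];
}

/* Push a tag back onto the stack; returns the new number of cached tags. */
static unsigned cache_free(struct tag_cache *c, unsigned tag)
{
	assert(tag < NR_TAGS);		/* plays the role of BUG_ON() */
	assert(c->nr_free < NR_TAGS);	/* the stack must not overflow */
	/* Post-increment: the old count is the index to write. */
	c->freelist[c->nr_free++] = tag;
	return c->nr_free;
}
```

One detail to watch when translating this to the this_cpu operations suggested above: a primitive that returns the value after the update gives the right index for the pop, but for the push the slot to write is the returned value minus one.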
Re: [Qemu-devel] Cross-Platform KVM
Hi,

On 16.08.2013 09:41, Wincy Van wrote:
> Hi, there:
>
> I have implemented a version of cross-platform KVM. Now, it can
> work on Linux and Windows (kernel version 7600-9200, amd64).
> Is it useful? If so, I want to make it a branch of current KVM.
> Here are some screenshots:

Let's CC the KVM mailing list.

More telling than screenshots would be some info about your code! Is
there a public Git repository to look at? Is it based on a current
kvm.git or some older Win32 KVM fork on SourceForge? If so, how invasive
are your changes? Or is it a clean-room implementation of your own
against the header/ioctl interface? How does it work technically? etc.

Regards,
Andreas

--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: KVM in HA active/active + fault-tolerant configuration
On Wednesday, August 21, 2013 6:02:31 AM CDT, g.da...@assyoma.it wrote:
> Hi all,
> I have a question about Linux KVM HA clusters.
>
> I understand that in an HA setup I can live migrate virtual machines
> between hosts that share the same storage (via various methods, e.g.
> DRBD). This enables us to migrate the VMs based on host load and
> performance.
>
> My current understanding is that, with this setup, a host crash will
> cause the VMs to be restarted on another host.
>
> However, I wonder if there is a method to have a fully fault-tolerant
> HA configuration, where by "fully fault-tolerant" I mean that a host
> crash (e.g. a power failure) will cause the VMs to be migrated to
> another host with no state change. In other words: is it possible to
> have an always-synchronized (both disk & memory) VM instance on
> another host, so that the migrated VM does not need to be restarted
> but only restored/unpaused? For disk data synchronization we can use
> shared storage (bypassing the problem) or something similar to DRBD,
> but what about memory?

You're looking for something that doesn't exist for KVM. There was a
project once for it called Kemari, but afaik, it's been abandoned for a
while.

> Thank you,
> regards.
Re: KVM in HA active/active + fault-tolerant configuration
On 2013-08-21 21:40, Brian Jackson wrote:
> On Wednesday, August 21, 2013 6:02:31 AM CDT, g.da...@assyoma.it wrote:
>> Hi all,
>> I have a question about Linux KVM HA clusters.
>>
>> I understand that in an HA setup I can live migrate virtual machines
>> between hosts that share the same storage (via various methods, e.g.
>> DRBD). This enables us to migrate the VMs based on host load and
>> performance.
>>
>> My current understanding is that, with this setup, a host crash will
>> cause the VMs to be restarted on another host.
>>
>> However, I wonder if there is a method to have a fully fault-tolerant
>> HA configuration, where by "fully fault-tolerant" I mean that a host
>> crash (e.g. a power failure) will cause the VMs to be migrated to
>> another host with no state change. In other words: is it possible to
>> have an always-synchronized (both disk & memory) VM instance on
>> another host, so that the migrated VM does not need to be restarted
>> but only restored/unpaused? For disk data synchronization we can use
>> shared storage (bypassing the problem) or something similar to DRBD,
>> but what about memory?
>
> You're looking for something that doesn't exist for KVM. There was a
> project once for it called Kemari, but afaik, it's been abandoned for
> a while.

Hi Brian,
thank you for your reply.

As I googled extensively without finding anything, I was prepared for a
similar response.

Anyway, from what I understand, Qemu already uses a similar approach
(tracking dirty memory pages) when live migrating virtual machines to
another host.

So what is missing is the glue code between Qemu and the KVM/libvirt
stack, right?

Thanks again.

>> Thank you,
>> regards.
Re: KVM in HA active/active + fault-tolerant configuration
On Wednesday, August 21, 2013 3:49:09 PM CDT, g.da...@assyoma.it wrote:
> On 2013-08-21 21:40, Brian Jackson wrote:
>> On Wednesday, August 21, 2013 6:02:31 AM CDT, g.da...@assyoma.it wrote:
>> ...
>
> Hi Brian,
> thank you for your reply.
>
> As I googled extensively without finding anything, I was prepared for
> a similar response.
>
> Anyway, from what I understand, Qemu already uses a similar approach
> (tracking dirty memory pages) when live migrating virtual machines to
> another host.
>
> So what is missing is the glue code between Qemu and the KVM/libvirt
> stack, right?

Live migration isn't what you asked about (at least not from what I
understood). Live migration is just moving a VM from one host to
another. That is definitely supported by libvirt. Having a constantly
running lock-step sync of guest state is what Qemu/KVM does not support.
So with Qemu's current live migration abilities, if HostA dies, all its
guests will have downtime while they are restarted on other hosts.

> Thanks again.
> ...
Re: [PATCH] target-ppc: Update slb array with correct index values.
On 21.08.2013 at 16:59, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote:

> Alexander Graf ag...@suse.de writes:
>
>>> On HV KVM yes, that would be the end of the list, but PR KVM could
>>> give you entry 0 containing esid==0 and vsid==0 followed by valid
>>> entries.  Perhaps the best approach is to ignore any entries with
>>> SLB_ESID_V clear.
>>
>> That means we don't clear entries we don't receive from the kernel
>> because they're V=0 but which were V=1 before. Which with the current
>> code is probably already broken.
>>
>> So yes, clear all cached entries first (to make sure we have no stale
>> ones), then loop through all and only add entries with V=1 should fix
>> everything for PR as well as HV.
>
> This is more or less what the patch is doing. The kernel already does
> a memset of all the slb entries. The only difference is that we don't
> depend on the slb index in the returned value. Instead we just use the
> array index as the slb index. Do we really need to make sure the slb
> index remains the same?

Yes, otherwise get/set change SLB numbering which the guest doesn't
expect.

Alex

> -aneesh
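The approach the thread converges on (clear the cached SLB, then store only V=1 entries, keeping the slot number carried in the entry itself) can be sketched as follows. This is a stand-alone illustration, not the QEMU implementation: struct slb_wire, struct slb_cache and sync_slb() are hypothetical names, the 12-bit index mask is taken from the "index in the bottom bits of the esid" description in this thread, and the SLB_ESID_V value is assumed to match QEMU's definition.

```c
#include <stdint.h>
#include <string.h>

#define SLB_SLOTS   64
#define SLB_ESID_V  0x0000000008000000ULL /* valid bit (assumed to match QEMU) */
#define SLB_INDEX   0xfffULL              /* slot number in the low 12 bits */

/* Hypothetical stand-ins for the sregs array and QEMU's cached SLB. */
struct slb_wire  { uint64_t slbe, slbv; };
struct slb_cache { uint64_t esid, vsid; };

static void sync_slb(struct slb_cache cache[SLB_SLOTS],
		     const struct slb_wire wire[SLB_SLOTS])
{
	int i;

	/* Drop everything first so that no stale V=1 entry survives. */
	memset(cache, 0, sizeof(*cache) * SLB_SLOTS);

	for (i = 0; i < SLB_SLOTS; i++) {
		uint64_t rb = wire[i].slbe;
		unsigned slot = (unsigned)(rb & SLB_INDEX);

		/* Skip zero-filled or otherwise invalid entries. */
		if (!(rb & SLB_ESID_V) || slot >= SLB_SLOTS)
			continue;

		/* Keep the guest-visible slot, not the array position. */
		cache[slot].esid = rb & ~SLB_INDEX;
		cache[slot].vsid = wire[i].slbv;
	}
}
```

With this shape, an all-zero entry can no longer overwrite slot 0, and a packed (HV-style) or sparse (PR-style) array ends up with the same numbering the guest set up.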
Re: ppc kvm-unit-tests?
On 21.08.2013 at 18:02, Andrew Jones drjo...@redhat.com wrote:

> Hi Alex,
>
> I'm looking at adding arm to kvm-unit-tests. One of the first things
> I'd like to do is clean up the kvm-unit-tests repo a bit. There are
> some arch-specific files (including ppc) lying around in the root dir
> that I'd sweep up before adding another arch to the mix. However, the
> git history of the ppc ones indicates they've never been used. They
> appear to have been put there during the initial drop by Avi, likely
> expecting to continue development with them later. So my question to
> you is, do you use kvm-unit-tests, or plan to? Otherwise, maybe we can
> just remove ppc altogether during the cleanup.

I used them back in the very early days of powerkvm, but not since then.
However, I would like to do so one day. So if you get around to cleaning
this up, please do so with multi-arch support in mind :).

But yes, just throw away what's there now - it's become irrelevant.

Alex

> Thanks,
> drew
[Call for participation] Bi-Weekly KVM/ARM Technical Sync-up
Hi all,

Linaro is going to host a bi-weekly sync-up call for technical issues on
KVM/ARM development. The KVM 32-bit and 64-bit maintainers as well as
the QEMU ARM maintainer will typically be on the call.

The first call will be held Tuesday August 27th.

If you, your organization, or any of your colleagues are interested in
attending this call, please reply back to me with:
 - Your name
 - Your e-mail address
 - The capacity in which you are interested (hobbyist, company you
   represent, ...)

We will send out an invite after we have collected all the participants.

The calls will be based on an agenda that I will E-mail out to the list
the Monday before the call. If we have no items on the agenda, we will
not be having the call. Agenda items should be E-mailed to me and/or
the kvmarm list before the call, obviously before the Monday prior to
the call.

We emphasize that this is going to be a technical call for engineers and
not a forum to solicit services or discuss business concepts.

Best,
-Christoffer
Re: [Call for participation] Bi-Weekly KVM/ARM Technical Sync-up
On Wed, Aug 21, 2013 at 05:09:39PM -0700, Christoffer Dall wrote:
> Linaro is going to host a bi-weekly sync-up call for technical issues
> on KVM/ARM development. The KVM 32-bit and 64-bit maintainers as well
> as the QEMU ARM maintainer will typically be on the call.
>
> The first call will be held Tuesday August 27th.

I'll point out that I don't do Tuesdays for phone calls (it's one of
the days I regularly take as weekend time) so you'll never be able to
invite me if you keep this on Tuesdays.
Re: KVM in HA active/active + fault-tolerant configuration
On Thu, Aug 22, 2013 at 5:47 AM, Brian Jackson i...@theiggy.com wrote:
> On Wednesday, August 21, 2013 3:49:09 PM CDT, g.da...@assyoma.it wrote:
>> On 2013-08-21 21:40, Brian Jackson wrote:
>>> On Wednesday, August 21, 2013 6:02:31 AM CDT, g.da...@assyoma.it wrote:
>>> ...
>>
>> Hi Brian,
>> thank you for your reply.
>>
>> As I googled extensively without finding anything, I was prepared for
>> a similar response.
>>
>> Anyway, from what I understand, Qemu already uses a similar approach
>> (tracking dirty memory pages) when live migrating virtual machines to
>> another host.
>>
>> So what is missing is the glue code between Qemu and the KVM/libvirt
>> stack, right?
>
> Live migration isn't what you asked about (at least not from what I
> understood). Live migration is just moving a VM from one host to
> another. That is definitely supported by libvirt. Having a constantly
> running lock-step sync of guest state is what Qemu/KVM does not
> support. So with Qemu's current live migration abilities, if HostA
> dies, all its guests will have downtime while they are restarted on
> other hosts.

Live migration is not suitable for VM HA: when a host fails, the VMs on
that host must be restarted on another host using the same storage.

I have googled for KVM FT for a while, but only found a project called
Kemari, which is no longer updated. I read an article about KVM FT which
said KVM may support it in the future.

>> Thanks again.
>> ...

--
Focus on: Server Virtualization, Network security, Scanner, NodeJS, JAVA, WWW
Blog: http://www.nohouse.net
Re: [Call for participation] Bi-Weekly KVM/ARM Technical Sync-up
On Thu, Aug 22, 2013 at 01:15:54AM +0100, Russell King - ARM Linux wrote:
> On Wed, Aug 21, 2013 at 05:09:39PM -0700, Christoffer Dall wrote:
> > Linaro is going to host a bi-weekly sync-up call for technical
> > issues on KVM/ARM development. The KVM 32-bit and 64-bit maintainers
> > as well as the QEMU ARM maintainer will typically be on the call.
> >
> > The first call will be held Tuesday August 27th.
>
> I'll point out that I don't do Tuesdays for phone calls (it's one of
> the days I regularly take as weekend time) so you'll never be able to
> invite me if you keep this on Tuesdays.

We could reconsider the day of the week. Would you actually join if it
was on any other day?

--
Christian Robottom Reis | [+1] 612 216 4935 | http://launchpad.net/~kiko
Canonical VP Hyperscale | [+55 16] 9112 6430 | http://async.com.br/~kiko
Re: KVM in HA active/active + fault-tolerant configuration
On Wed, 08/21 22:49, g.da...@assyoma.it wrote:
> On 2013-08-21 21:40, Brian Jackson wrote:
> > On Wednesday, August 21, 2013 6:02:31 AM CDT, g.da...@assyoma.it wrote:
> > > Hi all,
> > > I have a question about Linux KVM HA clusters.
> > >
> > > I understand that in an HA setup I can live migrate virtual
> > > machines between hosts that share the same storage (via various
> > > methods, e.g. DRBD). This enables us to migrate the VMs based on
> > > host load and performance.
> > >
> > > My current understanding is that, with this setup, a host crash
> > > will cause the VMs to be restarted on another host.
> > >
> > > However, I wonder if there is a method to have a fully
> > > fault-tolerant HA configuration, where by "fully fault-tolerant" I
> > > mean that a host crash (e.g. a power failure) will cause the VMs
> > > to be migrated to another host with no state change. In other
> > > words: is it possible to have an always-synchronized (both disk &
> > > memory) VM instance on another host, so that the migrated VM does
> > > not need to be restarted but only restored/unpaused? For disk data
> > > synchronization we can use shared storage (bypassing the problem)
> > > or something similar to DRBD, but what about memory?
> >
> > You're looking for something that doesn't exist for KVM. There was a
> > project once for it called Kemari, but afaik, it's been abandoned
> > for a while.
>
> Hi Brian,
> thank you for your reply.
>
> As I googled extensively without finding anything, I was prepared for
> a similar response.
>
> Anyway, from what I understand, Qemu already uses a similar approach
> (tracking dirty memory pages) when live migrating virtual machines to
> another host.

Active/active does not sound easy to get, it seems to me, since you'd
need to make sure the VMs on both nodes are in the same state all the
time, and that sounds impossible for two emulator processes on two
different hosts.

I think a hot spare is more practical: in the background you repeatedly
trigger migration of the delta memory and copy it to the hot spare, but
don't start running it. Once the active one fails, you can resume the
hot spare, which is at the latest checkpoint. But I think this needs
some work on the current live migration code.

Fam
Re: [PATCH] target-ppc: Update slb array with correct index values.
On Mon, Aug 19, 2013 at 10:21:09AM +0200, Alexander Graf wrote:
>
> On 19.08.2013, at 09:25, Aneesh Kumar K.V wrote:
>
> > I noticed this when adding support for guest memory dumping via qemu
> > gdb server. Now the array we get would look like below
> >
> > slbe0 slbv0
> > slbe1 slbv1
> > 0     0
>
> Ok, so that's where the problem lies. Why are the entries 0 here?
> Either we try to fetch more entries than we should, we populate
> entries incorrectly or the kernel simply returns invalid SLB entry
> values for invalid entries.
>
> Are you seeing this with PR KVM or HV KVM?

I suspect this is to do with the fact that PR and HV KVM use the
vcpu->arch.slb[] array differently. PR stores SLB entry n in
vcpu->arch.slb[n], whereas HV packs the valid entries down in the
low-numbered entries and puts the index in the bottom bits of the esid
field (this is so they can be loaded efficiently with the slbmte
instruction on guest entry).

Then, kvm_arch_vcpu_ioctl_get_sregs() on PR copies out all 64 entries
(valid or not) and puts an index value in the bottom bits of the esid,
whereas on HV it just copies out the valid entries (which already have
the index in the esid field).

So, the question is, what is the ABI here? It sounds a bit like qemu
is ignoring the index value in the esid field. Either qemu needs to
take notice of the index in the esid field or we need to change the HV
versions of kvm_arch_vcpu_ioctl_get/set_sregs to put entry n in
sregs->u.s.ppc64.slb[n] like PR does.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
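To make the two layouts described above concrete, here is a small stand-alone sketch that produces both from the same guest SLB contents. It is an illustration only, not the kernel code: struct slb_ent, struct slb_out, fill_pr() and fill_hv() are hypothetical names, the guest view is simplified to an index-free array, and the 12-bit index mask and the SLB_ESID_V value are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define SLB_SLOTS   64
#define SLB_ESID_V  0x0000000008000000ULL /* valid bit (assumed value) */
#define SLB_INDEX   0xfffULL              /* slot number in the low bits */

struct slb_ent { uint64_t esid, vsid; };  /* hypothetical guest view */
struct slb_out { uint64_t slbe, slbv; };  /* what user space receives */

/* PR-style: entry n lands in out[n], valid or not, with n OR-ed in. */
static void fill_pr(struct slb_out out[SLB_SLOTS],
		    const struct slb_ent guest[SLB_SLOTS])
{
	int i;

	for (i = 0; i < SLB_SLOTS; i++) {
		out[i].slbe = guest[i].esid | (uint64_t)i;
		out[i].slbv = guest[i].vsid;
	}
}

/*
 * HV-style: only valid entries, packed from out[0] upwards, each still
 * carrying its slot number; the remainder stays zero-filled.
 * Returns the number of entries written.
 */
static int fill_hv(struct slb_out out[SLB_SLOTS],
		   const struct slb_ent guest[SLB_SLOTS])
{
	int i, n = 0;

	memset(out, 0, sizeof(*out) * SLB_SLOTS);
	for (i = 0; i < SLB_SLOTS; i++) {
		if (!(guest[i].esid & SLB_ESID_V))
			continue;
		out[n].slbe = guest[i].esid | (uint64_t)i;
		out[n].slbv = guest[i].vsid;
		n++;
	}
	return n;
}
```

In both layouts the slot number travels in the low bits of slbe, so a reader that decodes it gets the same answer from either; a reader that uses the array position instead only works for the PR layout, and the zero-filled tail of the HV layout decodes as "slot 0" unless the valid bit is checked.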
Re: [PATCH] target-ppc: Update slb array with correct index values.
Alexander Graf ag...@suse.de writes:

> On 20.08.2013, at 14:57, Aneesh Kumar K.V wrote:
>
>> Alexander Graf ag...@suse.de writes:
>>
>>> On 19.08.2013, at 09:25, Aneesh Kumar K.V wrote:
>>>
>>>> Alexander Graf ag...@suse.de writes:
>>>>
>>>>> On 11.08.2013, at 20:16, Aneesh Kumar K.V wrote:
>>>>>
>>>>>> From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
>>>>>>
>>>>>> Without this, a value of rb=0 and rs=0, result in us replacing
>>>>>> the 0th index
>>>>>>
>>>>>> Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
>>>>>
>>>>> Wrong mailing list again ;).
>>>>
>>>> Will post the series again with updated commit message to the qemu
>>>> list.
>>>>
>>>>>> ---
>>>>>>  target-ppc/kvm.c | 14 ++++++++++++--
>>>>>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>>>>>> index 30a870e..5d4e613 100644
>>>>>> --- a/target-ppc/kvm.c
>>>>>> +++ b/target-ppc/kvm.c
>>>>>> @@ -1034,8 +1034,18 @@ int kvm_arch_get_registers(CPUState *cs)
>>>>>>          /* Sync SLB */
>>>>>>  #ifdef TARGET_PPC64
>>>>>>          for (i = 0; i < 64; i++) {
>>>>>> -            ppc_store_slb(env, sregs.u.s.ppc64.slb[i].slbe,
>>>>>> -                               sregs.u.s.ppc64.slb[i].slbv);
>>>>>> +            target_ulong rb = sregs.u.s.ppc64.slb[i].slbe;
>>>>>> +            /*
>>>>>> +             * KVM_GET_SREGS doesn't return slb entry with slot information
>>>>>> +             * same as index. So don't depend on the slot information in
>>>>>> +             * the returned value.
>>>>>
>>>>> This is the generating code in book3s_pr.c:
>>>>>
>>>>>         if (vcpu->arch.hflags & BOOK3S_HFLAG_SLB) {
>>>>>                 for (i = 0; i < 64; i++) {
>>>>>                         sregs->u.s.ppc64.slb[i].slbe = vcpu->arch.slb[i].orige | i;
>>>>>                         sregs->u.s.ppc64.slb[i].slbv = vcpu->arch.slb[i].origv;
>>>>>                 }
>>>>>
>>>>> Where exactly did you see broken slbe entries?
>>>>
>>>> I noticed this when adding support for guest memory dumping via
>>>> qemu gdb server. Now the array we get would look like below
>>>>
>>>> slbe0 slbv0
>>>> slbe1 slbv1
>>>> 0     0
>>>
>>> Ok, so that's where the problem lies. Why are the entries 0 here?
>>> Either we try to fetch more entries than we should, we populate
>>> entries incorrectly or the kernel simply returns invalid SLB entry
>>> values for invalid entries.
>>
>> The ioctl zeroes out the sregs and fills only slb_max entries. So we
>> find 0-filled entries above slb_max. Also we don't pass slb_max to
>> user space, so userspace has to look at all the 64 entries.
>
> We do pass slb_max, it's just called differently and calculated
> implicitly :). How about something like this:
>
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 30a870e..29a2ec3 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -818,6 +818,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>
>          /* Sync SLB */
>  #ifdef TARGET_PPC64
> +        /* We need to loop through all entries to give them potentially
> +           valid values */
>          for (i = 0; i < 64; i++) {
>              sregs.u.s.ppc64.slb[i].slbe = env->slb[i].esid;
>              sregs.u.s.ppc64.slb[i].slbv = env->slb[i].vsid;
> @@ -1033,7 +1035,7 @@ int kvm_arch_get_registers(CPUState *cs)
>
>          /* Sync SLB */
>  #ifdef TARGET_PPC64
> -        for (i = 0; i < 64; i++) {
> +        for (i = 0; i < env->slb_nr; i++) {
>              ppc_store_slb(env, sregs.u.s.ppc64.slb[i].slbe,
>                                 sregs.u.s.ppc64.slb[i].slbv);
>          }

But we don't sync slb_max (max valid slb index); env->slb_nr is slb_nr
(total number of slb slots)?

We also don't sync env->slb_nr every time we do kvm_arch_get_registers.
The problem we have is that we first memset sregs with 0 and then fill
only slb_max entries. An slbe/slbv pair with value 0 then decodes as
index 0, so we end up updating the 0th entry.

-aneesh
Re: [PATCH] target-ppc: Update slb array with correct index values.
On Wed, Aug 21, 2013 at 08:37:47AM +0100, Alexander Graf wrote:
>
> On 21.08.2013, at 06:11, Paul Mackerras wrote:
>
> > On Mon, Aug 19, 2013 at 10:21:09AM +0200, Alexander Graf wrote:
> >>
> >> On 19.08.2013, at 09:25, Aneesh Kumar K.V wrote:
> >>
> >>> I noticed this when adding support for guest memory dumping via
> >>> qemu gdb server. Now the array we get would look like below
> >>>
> >>> slbe0 slbv0
> >>> slbe1 slbv1
> >>> 0     0
> >>
> >> Ok, so that's where the problem lies. Why are the entries 0 here?
> >> Either we try to fetch more entries than we should, we populate
> >> entries incorrectly or the kernel simply returns invalid SLB entry
> >> values for invalid entries.
> >>
> >> Are you seeing this with PR KVM or HV KVM?
> >
> > I suspect this is to do with the fact that PR and HV KVM use the
> > vcpu->arch.slb[] array differently. PR stores SLB entry n in
> > vcpu->arch.slb[n], whereas HV packs the valid entries down in the
> > low-numbered entries and puts the index in the bottom bits of the
> > esid field (this is so they can be loaded efficiently with the
> > slbmte instruction on guest entry).
> >
> > Then, kvm_arch_vcpu_ioctl_get_sregs() on PR copies out all 64
> > entries (valid or not) and puts an index value in the bottom bits of
> > the esid, whereas on HV it just copies out the valid entries (which
> > already have the index in the esid field).
> >
> > So, the question is, what is the ABI here? It sounds a bit like
> > qemu is ignoring the index value in the esid field. Either qemu
> > needs to take notice of the index in the esid field or we need to
> > change the HV versions of kvm_arch_vcpu_ioctl_get/set_sregs to put
> > entry n in sregs->u.s.ppc64.slb[n] like PR does.
>
> It's the opposite today - QEMU does honor the index value on sregs
> get. Aneesh's patch wants to change it to ignore it instead. For sregs
> set we copy our internal copy of the slb linearly into the array, so
> we don't pack it there.
>
> Can we safely assume on HV KVM that esid == 0 && vsid == 0 is the end
> of the list? If so, we can just add a break statement in the get loop
> and call it a day. The rest should work just fine.

On HV KVM yes, that would be the end of the list, but PR KVM could
give you entry 0 containing esid==0 and vsid==0 followed by valid
entries.  Perhaps the best approach is to ignore any entries with
SLB_ESID_V clear.

Paul.
Re: [PATCH] target-ppc: Update slb array with correct index values.
Alexander Graf ag...@suse.de writes: On HV KVM yes, that would be the end of the list, but PR KVM could give you entry 0 containing esid==0 and vsid==0 followed by valid entries. Perhaps the best approach is to ignore any entries with SLB_ESID_V clear. That means we don't clear entries we don't receive from the kernel because they're V=0 but which were V=1 before. Which with the current code is probably already broken. So yes, clear all cached entries first (to make sure we have no stale ones), then loop through all and only add entries with V=1 should fix everything for PR as well as HV. This is more or less what the patch is doing. The kernel already does memset of all the slb entries. The only difference is we don't depend on the slb index in the return value. Instead we just use the array index as the slb index. Do we really need to make sure the slb index remain same ? -aneesh -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
ppc kvm-unit-tests?
Hi Alex,

I'm looking at adding arm to kvm-unit-tests. One of the first things
I'd like to do is clean up the kvm-unit-tests repo a bit. There are
some arch-specific files (including ppc) lying around in the root dir
that I'd sweep up before adding another arch to the mix. Although,
checking the git history of the ppc ones even indicates they've never
been used. They appear to have been put there during the initial drop
by Avi, likely expecting to continue development with them later.

So my question to you is, do you use kvm-unit-tests, or plan to?
Otherwise, maybe we can just remove ppc altogether during the clean
up.

Thanks,
drew
Re: [PATCH] target-ppc: Update slb array with correct index values.
On 21.08.2013, at 16:59, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote:

> Alexander Graf ag...@suse.de writes:
>
>>> On HV KVM yes, that would be the end of the list, but PR KVM could
>>> give you entry 0 containing esid==0 and vsid==0 followed by valid
>>> entries. Perhaps the best approach is to ignore any entries with
>>> SLB_ESID_V clear.
>>
>> That means we don't clear entries we don't receive from the kernel
>> because they're V=0 but which were V=1 before. Which with the
>> current code is probably already broken.
>>
>> So yes, clear all cached entries first (to make sure we have no
>> stale ones), then loop through all and only add entries with V=1.
>> That should fix everything for PR as well as HV.
>
> This is more or less what the patch is doing. The kernel already
> does a memset of all the slb entries. The only difference is we
> don't depend on the slb index in the return value. Instead we just
> use the array index as the slb index. Do we really need to make sure
> the slb index remains the same?

Yes, otherwise get/set change SLB numbering which the guest doesn't
expect.

Alex

> -aneesh
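The renumbering concern can be illustrated with a standalone sketch, which is not the actual QEMU change. Both functions clear the cached SLB first and add only entries with V set. One places each entry at the slot encoded in the esid word, the other at its array position. `struct slb_entry`, `sync_by_encoded_index` and `sync_by_array_index` are hypothetical names; the `SLB_ESID_V` value and the 12-bit index mask are assumptions to be checked against the real headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLB_ENTRIES 64
/* Assumed value of the valid bit in the esid word; verify against headers. */
#define SLB_ESID_V 0x0000000008000000ULL
/* Assumed: the slot index occupies the low 12 bits of the esid word. */
#define SLB_INDEX_MASK 0xfffULL

/* Hypothetical stand-in for one SLB entry (cached or from sregs). */
struct slb_entry {
    uint64_t slbe;
    uint64_t slbv;
};

/* Clear the cache, then store each valid entry at the slot number that
 * is encoded in the low bits of its esid word. */
static void sync_by_encoded_index(struct slb_entry *cache,
                                  const struct slb_entry *in)
{
    assert(cache != NULL && in != NULL);
    memset(cache, 0, SLB_ENTRIES * sizeof(*cache));
    for (int i = 0; i < SLB_ENTRIES; i++) {
        uint64_t slot;

        if (!(in[i].slbe & SLB_ESID_V))
            continue; /* invalid entry: leave the slot cleared */
        slot = in[i].slbe & SLB_INDEX_MASK;
        if (slot >= SLB_ENTRIES)
            continue; /* index out of range: ignore in this sketch */
        cache[slot].slbe = in[i].slbe & ~SLB_INDEX_MASK;
        cache[slot].slbv = in[i].slbv;
    }
}

/* Clear the cache, then store each valid entry at its array position,
 * ignoring the encoded slot number. */
static void sync_by_array_index(struct slb_entry *cache,
                                const struct slb_entry *in)
{
    assert(cache != NULL && in != NULL);
    memset(cache, 0, SLB_ENTRIES * sizeof(*cache));
    for (int i = 0; i < SLB_ENTRIES; i++) {
        if (!(in[i].slbe & SLB_ESID_V))
            continue;
        cache[i].slbe = in[i].slbe & ~SLB_INDEX_MASK;
        cache[i].slbv = in[i].slbv;
    }
}
```

Given an HV-style packed input in which element 0 carries slot 3 and element 1 carries slot 5, `sync_by_encoded_index` leaves the entries in slots 3 and 5, while `sync_by_array_index` moves them to slots 0 and 1. A subsequent set would then hand the guest an SLB whose numbering differs from the one it created.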