Re: Massive read only kvm guests when backing file was missing
On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
> Hi List!
> Hope someone can help me. We had a big issue in our cloud the other
> day: a couple of our OpenStack regions (+2000 KVM guests with qcow2)
> went read-only on the guest-side filesystem because the backing
> files directory (the OpenStack _base directory) was compromised and
> the data was lost. When we realized the data was lost, it took us 5
> minutes to restore the backup of the backing files, but by that time
> all the KVM guests had received some kind of I/O error from the
> hypervisor layer and went read-only on the root filesystem.
>
> My question would be: is there a way to hold the I/O operations
> against the backing files (I assume those would be 99% READ
> operations) for a little longer, in case the backing files are
> missing (no I/O possible) but are recoverable within minutes? I'm
> asking because I don't quite understand what the process is and when
> it raises the error.
>
> Any tip on how to achieve this, or information about how backing
> files work in KVM, would be amazing.
> Waiting for feedback!
>
> Kindest regards.
> Alejandro Comisario

I'm guessing this is what happened: the guests timed out meanwhile.
You can increase the timeout within the guest:

	echo 600 > /sys/block/sda/device/timeout

to time out after 10 minutes.

If you have the qemu guest agent installed on your system, you can do
this from the host. Unfortunately, by default its memory can be pushed
out to swap, and then accessing it after a disk error might fail :(
Maybe we should consider mlock()ing all of its memory, at least as an
option.

You could pause your guests and restart them after the issue is
resolved, and we could, I guess, add functionality to pause a VM on
disk errors automatically. Stefan?

--
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
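[Editor's note] The timeout bump suggested above can also be applied programmatically from inside the guest. This is only a sketch: the sysfs root is passed as a parameter purely so the logic can be exercised against a scratch directory; on a real guest it would be "/sys/block", and writing there requires root.

```c
#include <stdio.h>

/* Write <seconds> to <sysfs_block_root>/<dev>/device/timeout, the
 * per-device SCSI command timeout the reply above refers to.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int set_disk_timeout(const char *sysfs_block_root,
                            const char *dev, int seconds)
{
    char path[256];
    FILE *f;

    snprintf(path, sizeof(path), "%s/%s/device/timeout",
             sysfs_block_root, dev);
    f = fopen(path, "w");
    if (!f)
        return -1;          /* no such device, or no permission */
    fprintf(f, "%d\n", seconds);
    return fclose(f) ? -1 : 0;
}
```

Called as `set_disk_timeout("/sys/block", "sda", 600)`, this matches the shell one-liner in the reply; looping it over every entry under /sys/block would cover all disks.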
[PATCH 0/4] KVM: enable Intel SMAP for KVM
Supervisor Mode Access Prevention (SMAP) is a new security feature
disclosed by Intel; please refer to the following document:
http://software.intel.com/sites/default/files/319433-014.pdf

Every access to a linear address is either a supervisor-mode access or a
user-mode access. All accesses performed while the current privilege
level (CPL) is less than 3 are supervisor-mode accesses. If CPL = 3,
accesses are generally user-mode accesses. However, some operations
implicitly access system data structures, and the resulting accesses to
those data structures are supervisor-mode accesses regardless of CPL.
Examples of such implicit supervisor accesses include the following:
accesses to the global descriptor table (GDT) or local descriptor table
(LDT) to load a segment descriptor; accesses to the interrupt descriptor
table (IDT) when delivering an interrupt or exception; and accesses to
the task-state segment (TSS) as part of a task switch or change of CPL.

If CR4.SMAP = 1, supervisor-mode data accesses are not allowed to linear
addresses that are accessible in user mode. If CPL < 3, SMAP protections
are disabled if EFLAGS.AC = 1. If CPL = 3, SMAP applies to all
supervisor-mode data accesses (these are implicit supervisor accesses)
regardless of the value of EFLAGS.AC.

This patchset passes the SMAP feature through to guests and lets guests
benefit from it.

Feng Wu (4):
  KVM: expose SMAP feature to guest
  KVM: Remove SMAP bit from CR4_RESERVED_BITS.
  KVM: Add SMAP support when setting CR4
  KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/cpuid.c            |  2 +-
 arch/x86/kvm/cpuid.h            |  8 ++++++++
 arch/x86/kvm/mmu.c              | 22 +++++++++++++++++++---
 arch/x86/kvm/mmu.h              |  2 ++
 arch/x86/kvm/vmx.c              | 10 ++++++----
 arch/x86/kvm/x86.c              |  6 ++++++
 7 files changed, 43 insertions(+), 9 deletions(-)

--
1.8.3.1
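[Editor's note] The access rule in the cover letter condenses into a small predicate. This is only a sketch of the prose above (the struct and function names are made up, not KVM code): a supervisor-mode data access to a user-accessible page is a SMAP violation unless it executes with CPL < 3 and EFLAGS.AC = 1; at CPL = 3 the implicit supervisor accesses fault regardless of AC.

```c
#include <stdbool.h>

/* One access, reduced to the bits the SMAP rule cares about. */
struct access_ctx {
    bool cr4_smap;   /* CR4.SMAP enabled                     */
    bool user_page;  /* target page is accessible in user mode */
    bool user_mode;  /* the access itself is a user-mode access */
    int  cpl;        /* current privilege level               */
    bool eflags_ac;  /* EFLAGS.AC                             */
};

static bool smap_violation(const struct access_ctx *a)
{
    /* SMAP only restricts supervisor-mode accesses to user pages. */
    if (!a->cr4_smap || !a->user_page || a->user_mode)
        return false;
    /* Explicit override: CPL < 3 with EFLAGS.AC set is allowed. */
    if (a->cpl < 3 && a->eflags_ac)
        return false;
    return true;   /* supervisor touch of a user page faults */
}
```

Note that a supervisor-mode access with cpl == 3 models the implicit accesses described above, and the predicate faults on it even when AC is set, matching the cover letter.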
[PATCH 1/4] KVM: expose SMAP feature to guest
This patch exposes the SMAP feature to the guest.

Signed-off-by: Feng Wu
---
 arch/x86/kvm/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c697625..deb5f9b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -303,7 +303,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	/* cpuid 7.0.ebx */
 	const u32 kvm_supported_word9_x86_features =
 		F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
-		F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
+		F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | F(SMAP);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
--
1.8.3.1
[PATCH 4/4] KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
SMAP is disabled if the CPU is in non-paging mode in hardware. However,
KVM always uses paging mode to emulate guest non-paging mode with TDP.
To emulate this behavior, SMAP needs to be manually disabled when the
guest switches to non-paging mode.

Signed-off-by: Feng Wu
---
 arch/x86/kvm/vmx.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index dcc4de3..1d37e50 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3421,13 +3421,15 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 		hw_cr4 &= ~X86_CR4_PAE;
 		hw_cr4 |= X86_CR4_PSE;
 		/*
-		 * SMEP is disabled if CPU is in non-paging mode in
-		 * hardware. However KVM always uses paging mode to
+		 * SMEP/SMAP is disabled if CPU is in non-paging mode
+		 * in hardware. However KVM always uses paging mode to
 		 * emulate guest non-paging mode with TDP.
-		 * To emulate this behavior, SMEP needs to be manually
-		 * disabled when guest switches to non-paging mode.
+		 * To emulate this behavior, SMEP/SMAP needs to be
+		 * manually disabled when guest switches to non-paging
+		 * mode.
 		 */
 		hw_cr4 &= ~X86_CR4_SMEP;
+		hw_cr4 &= ~X86_CR4_SMAP;
 	} else if (!(cr4 & X86_CR4_PAE)) {
 		hw_cr4 &= ~X86_CR4_PAE;
 	}
--
1.8.3.1
[PATCH 3/4] KVM: Add SMAP support when setting CR4
This patch adds SMAP handling logic when setting CR4 for guests.

Signed-off-by: Feng Wu
---
 arch/x86/kvm/cpuid.h |  8 ++++++++
 arch/x86/kvm/mmu.c   | 22 +++++++++++++++++++---
 arch/x86/kvm/mmu.h   |  2 ++
 arch/x86/kvm/x86.c   |  6 ++++++
 4 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index f1e4895..63124a2 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -48,6 +48,14 @@ static inline bool guest_cpuid_has_smep(struct kvm_vcpu *vcpu)
 	return best && (best->ebx & bit(X86_FEATURE_SMEP));
 }
 
+static inline bool guest_cpuid_has_smap(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 7, 0);
+	return best && (best->ebx & bit(X86_FEATURE_SMAP));
+}
+
 static inline bool guest_cpuid_has_fsgsbase(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 40772ef..33e656c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3591,14 +3591,15 @@ static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
 	}
 }
 
-static void update_permission_bitmask(struct kvm_vcpu *vcpu,
+void update_permission_bitmask(struct kvm_vcpu *vcpu,
 		struct kvm_mmu *mmu, bool ept)
 {
 	unsigned bit, byte, pfec;
 	u8 map;
-	bool fault, x, w, u, wf, uf, ff, smep;
+	bool fault, x, w, u, wf, uf, ff, smep, smap;
 
 	smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP);
+	smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
 	for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
 		pfec = byte << 1;
 		map = 0;
@@ -3617,11 +3618,26 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
 			w |= !is_write_protection(vcpu) && !uf;
 			/* Disallow supervisor fetches of user code if cr4.smep */
 			x &= !(smep && u && !uf);
+
+			/*
+			 * SMAP: kernel-mode data accesses from user-mode
+			 * mappings should fault. A fault is considered
+			 * a SMAP violation if all of the following
+			 * conditions are true:
+			 *   - X86_CR4_SMAP is set in CR4
+			 *   - a user page is accessed
+			 *   - !(CPL < 3 && X86_EFLAGS_AC is set)
+			 *   - the page fault happened in kernel mode
+			 */
+			smap = smap && u && !uf &&
+			       !((kvm_x86_ops->get_cpl(vcpu) < 3) &&
+				 (kvm_x86_ops->get_rflags(vcpu) &
+				  X86_EFLAGS_AC));
 		} else
 			/* Not really needed: no U/S accesses on ept */
 			u = 1;
 
-		fault = (ff && !x) || (uf && !u) || (wf && !w);
+		fault = (ff && !x) || (uf && !u) || (wf && !w) || smap;
 		map |= fault << bit;
 	}
 	mmu->permissions[byte] = map;

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2926152..8820f78 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -73,6 +73,8 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct);
 void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
 		bool execonly);
+void update_permission_bitmask(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+		bool ept);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4e33b85..f8293fb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -630,6 +630,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 	if (!guest_cpuid_has_smep(vcpu) && (cr4 & X86_CR4_SMEP))
 		return 1;
 
+	if (!guest_cpuid_has_smap(vcpu) && (cr4 & X86_CR4_SMAP))
+		return 1;
+
 	if (!guest_cpuid_has_fsgsbase(vcpu) && (cr4 & X86_CR4_FSGSBASE))
 		return 1;
 
@@ -658,6 +661,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 	    (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
 		kvm_mmu_reset_context(vcpu);
 
+	if ((cr4 ^ old_cr4) & X86_CR4_SMAP)
+		update_permission_bitmask(vcpu, vcpu->arch.walk_mmu, false);
+
 	if ((cr4 ^ old_cr4) & X86_CR4_OSXSAVE)
 		kvm_update_cpuid(vcpu);
--
1.8.3.1
[PATCH 2/4] KVM: Remove SMAP bit from CR4_RESERVED_BITS.
This patch removes the SMAP bit from CR4_RESERVED_BITS.

Signed-off-by: Feng Wu
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ae5d783..b673925 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -60,7 +60,7 @@
 			  | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE     \
 			  | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
 			  | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
-			  | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE))
+			  | X86_CR4_OSXMMEXCPT | X86_CR4_VMXE | X86_CR4_SMAP))
 
 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
--
1.8.3.1
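[Editor's note] Patch 2/4 matters because kvm_set_cr4() rejects any attempt to set a bit that is still marked reserved. A toy illustration of that gate follows; the bit positions are per the Intel SDM, but the masks used here are trimmed stand-ins for illustration, not the kernel's full CR4_RESERVED_BITS definition.

```c
#include <stdint.h>

#define X86_CR4_SMEP (1ULL << 20)   /* bit positions per the Intel SDM */
#define X86_CR4_SMAP (1ULL << 21)

/* Mirrors the guard at the top of kvm_set_cr4(): returns 1 (i.e. inject
 * #GP into the guest) if the requested CR4 value sets a bit that is
 * still in the reserved mask, 0 otherwise. */
static int cr4_reserved_check(uint64_t cr4, uint64_t reserved_mask)
{
    return (cr4 & reserved_mask) ? 1 : 0;
}
```

With SMAP still in the mask (before the patch), a guest write of CR4.SMAP is refused; once the bit is dropped from the mask (after the patch), the same write passes this gate and reaches the SMAP-specific checks added in patch 3/4.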
Massive read only kvm guests when backing file was missing
Hi List!
Hope someone can help me. We had a big issue in our cloud the other day:
a couple of our OpenStack regions (+2000 KVM guests with qcow2) went
read-only on the guest-side filesystem because the backing files
directory (the OpenStack _base directory) was compromised and the data
was lost. When we realized the data was lost, it took us 5 minutes to
restore the backup of the backing files, but by that time all the KVM
guests had received some kind of I/O error from the hypervisor layer and
went read-only on the root filesystem.

My question would be: is there a way to hold the I/O operations against
the backing files (I assume those would be 99% READ operations) for a
little longer, in case the backing files are missing (no I/O possible)
but are recoverable within minutes? I'm asking because I don't quite
understand what the process is and when it raises the error.

Any tip on how to achieve this, or information about how backing files
work in KVM, would be amazing.
Waiting for feedback!

Kindest regards.
Alejandro Comisario
Re: [PATCH v4] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
On 26/03/14 20:12, David Miller wrote:
> From: David Miller
> Date: Wed, 26 Mar 2014 15:59:58 -0400 (EDT)
>
>> From: Zoltan Kiss
>> Date: Fri, 21 Mar 2014 10:31:34 +
>>
>>> skb_zerocopy can copy elements of the frags array between skbs, but
>>> it doesn't orphan them. Also, it doesn't handle errors, so this
>>> patch takes care of that as well, and modifies the callers
>>> accordingly. skb_tx_error() is also added to the callers so they
>>> will signal the failed delivery towards the creator of the skb.
>>>
>>> Signed-off-by: Zoltan Kiss
>>
>> Applied, thanks Zoltan.
>
> Actually, Zoltan, you have to fix this:
>
> net/core/skbuff.c: In function ‘skb_zerocopy’:
> net/core/skbuff.c:2172:2: warning: passing argument 1 of ‘skb_orphan_frags’ discards ‘const’ qualifier from pointer target type [enabled by default]
> In file included from include/linux/tcp.h:21:0,
>                  from net/core/skbuff.c:50:
> include/linux/skbuff.h:1904:19: note: expected ‘struct sk_buff *’ but argument is of type ‘const struct sk_buff *’
> net/core/skbuff.c:2173:3: warning: passing argument 1 of ‘skb_tx_error’ discards ‘const’ qualifier from pointer target type [enabled by default]
> net/core/skbuff.c:642:6: note: expected ‘struct sk_buff *’ but argument is of type ‘const struct sk_buff *’

Ok, resubmitted. 'from' is now not a const parameter, because
skb->pfmemalloc might change.

Zoli
[PATCH v5] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
skb_zerocopy can copy elements of the frags array between skbs, but it
doesn't orphan them. Also, it doesn't handle errors, so this patch takes
care of that as well, and modifies the callers accordingly.
skb_tx_error() is also added to the callers so they will signal the
failed delivery towards the creator of the skb.

Signed-off-by: Zoltan Kiss
---
v2: orphan the frags right before touching the frags
v3:
 - orphan 'from' instead of 'to'
 - call skb_tx_error() in the callers if something went wrong
v4: correctly use error path in queue_userspace_packet
v5: remove const qualifier of "from" in skb_zerocopy parameters

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 03db95a..0db91fa 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2508,8 +2508,8 @@ int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
 		    unsigned int flags);
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
 unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
-void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
-		  int len, int hlen);
+int skb_zerocopy(struct sk_buff *to, struct sk_buff *from,
+		 int len, int hlen);
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3f14c63..908ad98 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2127,25 +2127,31 @@ EXPORT_SYMBOL_GPL(skb_zerocopy_headlen);
  *
  *	The `hlen` as calculated by skb_zerocopy_headlen() specifies the
  *	headroom in the `to` buffer.
+ *
+ *	Return value:
+ *	0: everything is OK
+ *	-ENOMEM: couldn't orphan frags of @from due to lack of memory
+ *	-EFAULT: skb_copy_bits() found some problem with skb geometry
  */
-void
-skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
+int
+skb_zerocopy(struct sk_buff *to, struct sk_buff *from, int len, int hlen)
 {
 	int i, j = 0;
 	int plen = 0; /* length of skb->head fragment */
+	int ret;
 	struct page *page;
 	unsigned int offset;
 
 	BUG_ON(!from->head_frag && !hlen);
 
 	/* dont bother with small payloads */
-	if (len <= skb_tailroom(to)) {
-		skb_copy_bits(from, 0, skb_put(to, len), len);
-		return;
-	}
+	if (len <= skb_tailroom(to))
+		return skb_copy_bits(from, 0, skb_put(to, len), len);
 
 	if (hlen) {
-		skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+		ret = skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+		if (unlikely(ret))
+			return ret;
 		len -= hlen;
 	} else {
 		plen = min_t(int, skb_headlen(from), len);
@@ -2163,6 +2169,11 @@ skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
 	to->len += len + plen;
 	to->data_len += len + plen;
 
+	if (unlikely(skb_orphan_frags(from, GFP_ATOMIC))) {
+		skb_tx_error(from);
+		return -ENOMEM;
+	}
+
 	for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
 		if (!len)
 			break;
@@ -2173,6 +2184,8 @@ skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
 		j++;
 	}
 	skb_shinfo(to)->nr_frags = j;
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy);

diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index f072fe8..108120f 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -354,13 +354,16 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 
 	skb = nfnetlink_alloc_skb(net, size, queue->peer_portid,
 				  GFP_ATOMIC);
-	if (!skb)
+	if (!skb) {
+		skb_tx_error(entskb);
 		return NULL;
+	}
 
 	nlh = nlmsg_put(skb, 0, 0,
 			NFNL_SUBSYS_QUEUE << 8 | NFQNL_MSG_PACKET,
 			sizeof(struct nfgenmsg), 0);
 	if (!nlh) {
+		skb_tx_error(entskb);
 		kfree_skb(skb);
 		return NULL;
 	}
@@ -488,13 +491,15 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 		nla->nla_type = NFQA_PAYLOAD;
 		nla->nla_len = nla_attr_size(data_len);
 
-		skb_zerocopy(skb, entskb, data_len, hlen);
+		if (skb_zerocopy(skb, entskb, data_len, hlen))
+			goto nla_put_failure;
 	}
 
 	nlh->nlmsg_len = skb->len;
 	return skb;
 
 nla_put_failure:
+	skb_tx_error(entskb);
 	kfree_skb(skb);
 	net_err_ratelimited("nf_queue: error creating packet message\n");
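[Editor's note] The caller-side convention this patch establishes — the helper now returns 0 or a negative errno, and on failure the caller both frees its partially built buffer and signals the skb's creator via skb_tx_error() — can be shown with a standalone sketch. All types and names below are hypothetical stand-ins, not kernel APIs.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct buf { char data[64]; size_t len; };

static int tx_errors;                       /* stand-in for skb_tx_error() */
static void buf_tx_error(struct buf *b) { (void)b; tx_errors++; }

/* Like the patched skb_zerocopy(): returns 0 or a negative errno
 * instead of silently returning void. */
static int buf_zerocopy(struct buf *to, struct buf *from, size_t len)
{
    if (len > sizeof(to->data))
        return -EFAULT;                     /* geometry problem */
    memcpy(to->data, from->data, len);
    to->len = len;
    return 0;
}

/* Like the patched nfqnl_build_packet_message(): on any failure, tell
 * the source about the lost delivery and unwind the partial message. */
static struct buf *build_message(struct buf *src, size_t len)
{
    struct buf *msg = calloc(1, sizeof(*msg));
    if (!msg) {
        buf_tx_error(src);
        return NULL;
    }
    if (buf_zerocopy(msg, src, len)) {      /* new: check the int result */
        buf_tx_error(src);
        free(msg);
        return NULL;
    }
    return msg;
}
```

The point of the v5 shape is exactly this: once the copy helper can report failure, every caller gains an unwind path instead of silently delivering a half-built message.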
Re: [PATCH 4/4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
> Load external pid (lwepx) instruction faults (when called from
> KVM with guest context) need to be handled by KVM. This implies
> additional code in the DO_KVM macro to identify the source of the
> exception (which originates from the KVM host rather than the guest).
> The hook requires checking the Exception Syndrome Register ESR[EPID]
> and the External PID Load Context Register EPLC[EGS] for some
> exceptions (DTLB_MISS, DSI and LRAT). Doing this on the Data TLB miss
> exception is obviously intrusive for the host.
>
> Get rid of lwepx and acquire the last instruction in
> kvmppc_get_last_inst() by searching for the physical address and
> kmapping it. This fixes an infinite loop caused by lwepx's data TLB
> miss being handled in the host, and the TODO for TLB eviction and
> execute-but-not-read entries.
>
> Signed-off-by: Mihai Caraman
> ---
>  arch/powerpc/kvm/bookehv_interrupts.S | 37 +++--
>  arch/powerpc/kvm/e500_mmu_host.c      | 93 +
>  2 files changed, 102 insertions(+), 28 deletions(-)
>
> diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
> index 20c7a54..c50490c 100644
> --- a/arch/powerpc/kvm/bookehv_interrupts.S
> +++ b/arch/powerpc/kvm/bookehv_interrupts.S
> @@ -119,38 +119,14 @@
>  1:
>  
>  .if \flags & NEED_EMU
> -	/*
> -	 * This assumes you have external PID support.
> -	 * To support a bookehv CPU without external PID, you'll
> -	 * need to look up the TLB entry and create a temporary mapping.
> -	 *
> -	 * FIXME: we don't currently handle if the lwepx faults. PR-mode
> -	 * booke doesn't handle it either. Since Linux doesn't use
> -	 * broadcast tlbivax anymore, the only way this should happen is
> -	 * if the guest maps its memory execute-but-not-read, or if we
> -	 * somehow take a TLB miss in the middle of this entry code and
> -	 * evict the relevant entry. On e500mc, all kernel lowmem is
> -	 * bolted into TLB1 large page mappings, and we don't use
> -	 * broadcast invalidates, so we should not take a TLB miss here.
> -	 *
> -	 * Later we'll need to deal with faults here. Disallowing guest
> -	 * mappings that are execute-but-not-read could be an option on
> -	 * e500mc, but not on chips with an LRAT if it is used.
> -	 */
> -
> -	mfspr	r3, SPRN_EPLC	/* will already have correct ELPID and EGS */
>  	PPC_STL	r15, VCPU_GPR(R15)(r4)
>  	PPC_STL	r16, VCPU_GPR(R16)(r4)
>  	PPC_STL	r17, VCPU_GPR(R17)(r4)
>  	PPC_STL	r18, VCPU_GPR(R18)(r4)
>  	PPC_STL	r19, VCPU_GPR(R19)(r4)
> -	mr	r8, r3
>  	PPC_STL	r20, VCPU_GPR(R20)(r4)
> -	rlwimi	r8, r6, EPC_EAS_SHIFT - MSR_IR_LG, EPC_EAS
>  	PPC_STL	r21, VCPU_GPR(R21)(r4)
> -	rlwimi	r8, r6, EPC_EPR_SHIFT - MSR_PR_LG, EPC_EPR
>  	PPC_STL	r22, VCPU_GPR(R22)(r4)
> -	rlwimi	r8, r10, EPC_EPID_SHIFT, EPC_EPID
>  	PPC_STL	r23, VCPU_GPR(R23)(r4)
>  	PPC_STL	r24, VCPU_GPR(R24)(r4)
>  	PPC_STL	r25, VCPU_GPR(R25)(r4)
> @@ -160,10 +136,15 @@
>  	PPC_STL	r29, VCPU_GPR(R29)(r4)
>  	PPC_STL	r30, VCPU_GPR(R30)(r4)
>  	PPC_STL	r31, VCPU_GPR(R31)(r4)
> -	mtspr	SPRN_EPLC, r8
> -	isync
> -	lwepx	r9, 0, r5
> -	mtspr	SPRN_EPLC, r3
> +
> +	/*
> +	 * We don't use external PID support. lwepx faults would need to be
> +	 * handled by KVM and this implies additional code in DO_KVM (for
> +	 * DTLB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS], which
> +	 * is too intrusive for the host. Get the last instruction in
> +	 * kvmppc_get_last_inst().
> +	 */
> +	li	r9, KVM_INST_FETCH_FAILED
>  	stw	r9, VCPU_LAST_INST(r4)
>  .endif
>  
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index 6025cb7..1b4cb41 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -598,9 +598,102 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
>  	}
>  }
>  
> +#ifdef CONFIG_KVM_BOOKE_HV
> +int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr)

It'd be interesting to see what the performance impact of doing this on
non-HV would be -- it would eliminate divergent code, eliminate the
MSR_DS hack, and make exec-only mappings work.

> +{
> +	gva_t geaddr;
> +	hpa_t addr;
> +	hfn_t pfn;
> +	hva_t eaddr;
> +	u32 mas0, mas1, mas2, mas3;
> +	u64 mas7_mas3;
> +	struct page *page;
> +	unsigned int addr_space, psize_shift;
> +	bool pr;
> +	unsigned long flags;
> +
> +	/* Search TLB for guest pc to get the real address */
> +	geaddr = kvmppc_get_pc(vcpu);
> +	addr_space = (vcpu->arch.shared->msr & MSR_IS) >> MSR_IR_LG;
> +
> +	local_irq_save(flags);
> +	mtspr(SPRN_MAS6, (vcpu->arch.pid << MAS6_SPID_SHIFT) | addr_space);
> +	mtspr(SPRN_MAS5, MAS5_SGS | vcpu->kvm->arch.lpid);
> +	isync();
>
Re: [PATCH 3/4] KVM: PPC: Alow kvmppc_get_last_inst() to fail
On Thu, 2014-02-20 at 18:30 +0200, Mihai Caraman wrote:
> diff --git a/arch/powerpc/kvm/book3s_paired_singles.c b/arch/powerpc/kvm/book3s_paired_singles.c
> index a59a25a..80c533e 100644
> --- a/arch/powerpc/kvm/book3s_paired_singles.c
> +++ b/arch/powerpc/kvm/book3s_paired_singles.c
> @@ -640,19 +640,24 @@ static int kvmppc_ps_one_in(struct kvm_vcpu *vcpu, bool rc,
>  
>  int kvmppc_emulate_paired_single(struct kvm_run *run, struct kvm_vcpu *vcpu)
>  {
> -	u32 inst = kvmppc_get_last_inst(vcpu);
> +	u32 inst;
>  	enum emulation_result emulated = EMULATE_DONE;
> -
> -	int ax_rd = inst_get_field(inst, 6, 10);
> -	int ax_ra = inst_get_field(inst, 11, 15);
> -	int ax_rb = inst_get_field(inst, 16, 20);
> -	int ax_rc = inst_get_field(inst, 21, 25);
> -	short full_d = inst_get_field(inst, 16, 31);
> -
> -	u64 *fpr_d = &vcpu->arch.fpr[ax_rd];
> -	u64 *fpr_a = &vcpu->arch.fpr[ax_ra];
> -	u64 *fpr_b = &vcpu->arch.fpr[ax_rb];
> -	u64 *fpr_c = &vcpu->arch.fpr[ax_rc];
> +	int ax_rd, ax_ra, ax_rb, ax_rc;
> +	short full_d;
> +	u64 *fpr_d, *fpr_a, *fpr_b, *fpr_c;
> +
> +	kvmppc_get_last_inst(vcpu, &inst);

Should probably check for failure here and elsewhere -- even though it
can't currently fail on book3s, the interface now allows it.

> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 5b9e906..b0d884d 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -624,9 +624,10 @@ void kvmppc_giveup_ext(struct kvm_vcpu *vcpu, ulong msr)
>  static int kvmppc_read_inst(struct kvm_vcpu *vcpu)
>  {
>  	ulong srr0 = kvmppc_get_pc(vcpu);
> -	u32 last_inst = kvmppc_get_last_inst(vcpu);
> +	u32 last_inst;
>  	int ret;
>  
> +	kvmppc_get_last_inst(vcpu, &last_inst);
>  	ret = kvmppc_ld(vcpu, &srr0, sizeof(u32), &last_inst, false);

This isn't new, but this function looks odd to me -- calling
kvmppc_get_last_inst() but ignoring last_inst, then calling kvmppc_ld()
and ignoring anything but failure. last_inst itself is never read. And
no comments to explain the weirdness. :-)

I get that kvmppc_get_last_inst() is probably being used for the side
effect of filling in vcpu->arch.last_inst, but why store the return
value without using it? Why pass the address of it to kvmppc_ld(), which
seems to be used only as an indirect way of determining whether
kvmppc_get_last_inst() failed? And that whole mechanism becomes stranger
once it becomes possible for kvmppc_get_last_inst() to directly return
failure.

> diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
> index 09bfd9b..c7c60c2 100644
> --- a/arch/powerpc/kvm/booke.h
> +++ b/arch/powerpc/kvm/booke.h
> @@ -90,6 +90,9 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
>  void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
>  void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
>  
> +void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
> +			ulong esr_flags);

Whitespace

> +
>  enum int_class {
>  	INT_CLASS_NONCRIT,
>  	INT_CLASS_CRIT,
> @@ -123,6 +126,8 @@ extern int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, int sprn,
>  extern int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn,
>  				ulong *spr_val);
>  
> +extern int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr) ;

Whitespace

> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index ecf2247..6025cb7 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -34,6 +34,7 @@
>  #include "e500.h"
>  #include "timing.h"
>  #include "e500_mmu_host.h"
> +#include "booke.h"
>  
>  #include "trace_booke.h"
>  
> @@ -597,6 +598,10 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
>  	}
>  }
>  
> +int kvmppc_ld_inst(struct kvm_vcpu *vcpu, u32 *instr) {
> +	return EMULATE_FAIL;
> +};

Brace placement

>  /* MMU Notifiers */
>  
>  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
> diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
> index 2f9a087..24a8e50 100644
> --- a/arch/powerpc/kvm/emulate.c
> +++ b/arch/powerpc/kvm/emulate.c
> @@ -225,19 +225,26 @@ static int kvmppc_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
>   * from opcode tables in the future. */
>  int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
>  {
> -	u32 inst = kvmppc_get_last_inst(vcpu);
> -	int ra = get_ra(inst);
> -	int rs = get_rs(inst);
> -	int rt = get_rt(inst);
> -	int sprn = get_sprn(inst);
> -	enum emulation_result emulated = EMULATE_DONE;
> +	u32 inst;
> +	int ra, rs, rt, sprn;
> +	enum emulation_result emulated;
>  	int advance = 1;
>  
>  	/* this default type might be overwritten by subcategories */
>  	kvmppc_set_exit_type(vcpu, E
Re: [PATCH 3/3] KVM: nVMX: check for null vmcs12 when L1 does invept
Jan Kiszka writes:
> On 2014-03-22 17:43, Bandan Das wrote:
>> Jan Kiszka writes:
>>> On 2014-03-20 21:58, Bandan Das wrote:
>>>> Jan Kiszka writes:
>>>>> On 2014-03-20 04:28, Bandan Das wrote:
>>>>>> Some L1 hypervisors such as Xen seem to be calling invept after
>>>>>> vmclear or before vmptrld on L2. In this case, proceed with
>>>>>> falling through and syncing roots as a case where context-wide
>>>>>> invalidation can't be supported.
>>>>>
>>>>> Can we also base this behaviour on a statement in the SDM? But on
>>>>> first glance, I do not find anything like this over there.
>>>>
>>>> The SDM has nothing of this sort explicitly mentioned, but 28.3.3.1
>>>> "Operations that Invalidate Cached Mappings" does mention that the
>>>> instruction may invalidate mappings associated with other EP4TAs
>>>> (even in single context).
>>>
>>> Yes, "may". So we are implementing undefined behavior in order to
>>> please a broken hypervisor that relies on it? Then please state this
>>> in the patch and probably also inform Xen about their issue.
>>
>> Why undefined behavior? We don't do anything specific for the single
>> context invalidation case anyway, i.e. if the eptp matches what
>> vmcs12 has, single context invalidation does fall through to the
>> global invalidation case already. All this change does is add the
>> "L1 calls invept after vmclear and before vmptrld" case to the list
>> of cases to fall through to global invalidation, since nvmx doesn't
>> have any knowledge of the current eptp for this case.
>
> OK, I think I misunderstood what the guest expects and how we
> currently achieve this: we do not track the mapping between guest and
> host eptp, thus cannot properly emulate its behaviour. We therefore
> need to flush everything.
>
>> Or do you think we should rethink this approach?
>
> Well, I wonder if we should expose single-context invept support at
> all.

That does sound plausible, but then we would have to get rid of this
little optimization in the code you have quoted above. We would have to
flush and sync roots for all invept calls. So, my preference is to keep
it.

> I'm also wondering if we are returning proper flags on
>
> 	if ((operand.eptp & eptp_mask) !=
> 			(nested_ept_get_cr3(vcpu) & eptp_mask))
> 		break;
>
> Neither nested_vmx_succeed nor nested_vmx_fail* is called if this
> condition evaluates to true.

It appears we should always call nested_vmx_succeed(). If you are ok,
I will send a v2.

Thanks,
Bandan

> Jan
Re: [PATCH v4] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
From: David Miller
Date: Wed, 26 Mar 2014 15:59:58 -0400 (EDT)

> From: Zoltan Kiss
> Date: Fri, 21 Mar 2014 10:31:34 +
>
>> skb_zerocopy can copy elements of the frags array between skbs, but
>> it doesn't orphan them. Also, it doesn't handle errors, so this patch
>> takes care of that as well, and modifies the callers accordingly.
>> skb_tx_error() is also added to the callers so they will signal the
>> failed delivery towards the creator of the skb.
>>
>> Signed-off-by: Zoltan Kiss
>
> Applied, thanks Zoltan.

Actually, Zoltan, you have to fix this:

net/core/skbuff.c: In function ‘skb_zerocopy’:
net/core/skbuff.c:2172:2: warning: passing argument 1 of ‘skb_orphan_frags’ discards ‘const’ qualifier from pointer target type [enabled by default]
In file included from include/linux/tcp.h:21:0,
                 from net/core/skbuff.c:50:
include/linux/skbuff.h:1904:19: note: expected ‘struct sk_buff *’ but argument is of type ‘const struct sk_buff *’
net/core/skbuff.c:2173:3: warning: passing argument 1 of ‘skb_tx_error’ discards ‘const’ qualifier from pointer target type [enabled by default]
net/core/skbuff.c:642:6: note: expected ‘struct sk_buff *’ but argument is of type ‘const struct sk_buff *’
Re: [PATCH v4] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
From: Zoltan Kiss
Date: Fri, 21 Mar 2014 10:31:34 +

> skb_zerocopy can copy elements of the frags array between skbs, but it
> doesn't orphan them. Also, it doesn't handle errors, so this patch
> takes care of that as well, and modifies the callers accordingly.
> skb_tx_error() is also added to the callers so they will signal the
> failed delivery towards the creator of the skb.
>
> Signed-off-by: Zoltan Kiss

Applied, thanks Zoltan.
[PATCH] KVM: vmx: fix MPX detection
kvm_x86_ops is still NULL at this point. Since kvm_init_msr_list cannot
fail, it is safe to initialize it before the call.

Reported-by: Fengguang Wu
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/x86.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 39c28f09dfd5..49b514f76b5d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5537,9 +5537,10 @@ int kvm_arch_init(void *opaque)
 		goto out_free_percpu;
 
 	kvm_set_mmio_spte_mask();
-	kvm_init_msr_list();
 
 	kvm_x86_ops = ops;
+	kvm_init_msr_list();
+
 	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
 			PT_DIRTY_MASK, PT64_NX_MASK, 0);
-- 
1.8.5.3
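The constraint behind the reordering (as suggested by the subject line, MPX detection) is that the MSR-list initialization probes a capability through the `kvm_x86_ops` table, so it must run after the table pointer is assigned. A userspace sketch of the same ordering rule — all names here mirror the kernel ones but the code is illustrative, not the kernel implementation:

```c
#include <assert.h>
#include <stddef.h>

/* An init step that probes features through a global ops table
 * must run after the table pointer is assigned, otherwise it
 * dereferences NULL. */
struct x86_ops { int (*mpx_supported)(void); };

static struct x86_ops *g_ops;   /* plays the role of kvm_x86_ops */
static int msr_count;

static int mpx_yes(void) { return 1; }
static struct x86_ops vmx_ops = { .mpx_supported = mpx_yes };

static void init_msr_list(void)
{
    msr_count = 2;                       /* base MSRs */
    if (g_ops && g_ops->mpx_supported()) /* NULL guard for this sketch only */
        msr_count++;                     /* e.g. + BNDCFGS */
}

static int arch_init(struct x86_ops *ops)
{
    /* Fixed ordering, as in the patch: assign first, probe second.
     * Swapping these two lines reproduces the bug shape: the probe
     * would see g_ops == NULL and miss the feature. */
    g_ops = ops;
    init_msr_list();
    return msr_count;
}
```

Because `init_msr_list` cannot fail, moving it later in `arch_init` cannot break any error-handling path, which is what the commit message means by "it is safe to initialize it before the call."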
Re: Preparing kvm/next for first pull request of 3.15 merge window
On Wed, Mar 26, 2014 at 10:19:52AM +0100, Paolo Bonzini wrote:
> Il 26/03/2014 04:51, Paul Mackerras ha scritto:
> >>> I would like to know from ARM and PPC maintainers *now* (before the
> >>> merge window opens) what will be in 3.15. Also, PPC guys, please
> >>> make sure the pull requests will be based on commit e724f080f5dd
> >>> (KVM: PPC: Book3S HV: Fix register usage when loading/saving VRSAVE,
> >>> 2014-03-13).
> >
> > Alex has 5 commits in his kvm-ppc-queue branch that should go in.
> > That tree is based on your kvm-3.14-2 tag, not on e724f080f5dd.
> > Would you accept a pull request for that from me on Alex's behalf
> > since he's away?
>
> Yes, of course.
>
> > Do you need me to rebase it at all?
>
> These are:
>
> 1ac4484979 PPC: KVM: Introduce hypervisor call H_GET_TCE
> e0a7be38c9 PPC: KVM: fix RESUME_GUEST check before returning from kvmppc_run_core()
> 013e98e657 PPC: KVM: fix RESUME_GUEST check before ending CEDE in kvmppc_run_core()
> e8a7f11fe0 PPC: KVM: fix VCPU run for HV KVM (v2)
> 26d96ec97c PPC: KVM: introduce helper to check RESUME_GUEST and related
>
> I would consider rebasing; but you know better than me the effect of
> the two host-crash-fixing patches and how the
> testability/bisectability of kvm-ppc-queue is affected.
>
> In particular, how likely is it that reverse-bisection ("which
> commit fixed the bug?") would end up on e724f080f5dd rather than one
> of the four RESUME_GUEST commits?

OK, based on that and the fact that 4 of those commits should ideally
be combined into one (as Greg commented), I will apply those changes
as two commits on top of e724f080f5dd, followed by the 8 that I posted
recently, and send you a pull request.

Thanks,
Paul.
Re: [RFC]Two ideas to optimize updating irq routing table
On Wed, Mar 26, 2014 at 08:22:29AM +, Gonglei (Arei) wrote:
> > > Based on discussions in:
> > > http://lists.gnu.org/archive/html/qemu-devel/2013-11/threads.html#03322
> > >
> > > About KVM_SET_GSI_ROUTING ioctl, I tested changing RCU to SRCU, but unfortunately
> > > it looks like SRCU's grace period is no better than RCU.
> >
> > Really? This is not what Christian Borntraeger claimed at
> > http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/118083 -- in
> > fact, Andrew Theurer is currently testing a variant of that patch and I
> > was going to post it for 3.16.
> >
> > Have you tried synchronize_srcu_expedited? Unlike the RCU variant, you
> > can use this function.
>
> Yes, previously I was using synchronize_srcu, which is not good. When I
> changed it to synchronize_srcu_expedited, grace period delay is much better
> than synchronize_srcu. Though in our tests, we can still see some impact
> of KVM_SET_GSI_ROUTING ioctl.
>
> Our testing scenario is like this. In VM we run a script that sets smp_affinity
> for each IRQ every 0.5s (this leads QEMU to do KVM_SET_GSI_ROUTING ioctl).
> Outside the VM we ping that VM.
>
> Without patches, ping time can jump from 0.3ms to 2ms-30ms. With synchronize_srcu
> patch, ping time is worse. With synchronize_srcu_expedited patch, ping time is
> overall good, though sometimes ping time jump to 1ms-3ms.
>
> With following raw patch, ping time is like call_rcu patch, that not influenced
> by setting IRQ affinity, keeps 0.3ms, and there is no vulnerability, frequent
> intermediate KVM_SET_GSI_ROUTING settings are just skipped, and always the newest
> setting would take effect.
>
> Would you think this patch is acceptable or has problem? Thanks in advance.

So this will just queue the update and never bother waiting for it to
take effect. I'm not sure it's safe in all cases, but maybe we can
optimize out some cases, by detecting in qemu that changes only affect
the VCPU that isn't running, and telling KVM that it can delay the
expensive synchronization.

> diff -urp kvm_kmod/include/linux/kvm_host.h kvm_kmod_new//include/linux/kvm_host.h
> --- kvm_kmod/include/linux/kvm_host.h	2014-03-10 14:08:28.0 +
> +++ kvm_kmod_new//include/linux/kvm_host.h	2014-03-26 15:07:48.0 +
> @@ -337,6 +337,12 @@ struct kvm {
> 
>  	struct mutex irq_lock;
>  #ifdef CONFIG_HAVE_KVM_IRQCHIP
> +	struct task_struct *kthread;
> +	wait_queue_head_t wq;
> +	struct mutex irq_routing_lock;
> +	struct kvm_irq_routing to_update_routing;
> +	struct kvm_irq_routing_entry *to_update_entries;
> +	atomic_t have_new;
>  	/*
>  	 * Update side is protected by irq_lock and,
>  	 * if configured, irqfds.lock.
> diff -urp kvm_kmod/x86/assigned-dev.c kvm_kmod_new//x86/assigned-dev.c
> --- kvm_kmod/x86/assigned-dev.c	2014-03-10 14:08:28.0 +
> +++ kvm_kmod_new//x86/assigned-dev.c	2014-03-26 15:22:33.0 +
> @@ -1056,12 +1056,23 @@ long kvm_vm_ioctl_assigned_device(struct
>  		r = -EFAULT;
>  		urouting = argp;
>  		if (copy_from_user(entries, urouting->entries,
> -				   routing.nr * sizeof(*entries)))
> -			goto out_free_irq_routing;
> -		r = kvm_set_irq_routing(kvm, entries, routing.nr,
> -					routing.flags);
> -	out_free_irq_routing:
> -		vfree(entries);
> +				   routing.nr * sizeof(*entries))) {
> +			vfree(entries);
> +			break;
> +		}
> +
> +		mutex_lock(&kvm->irq_routing_lock);
> +		if (kvm->to_update_entries) {
> +			vfree(kvm->to_update_entries);
> +			kvm->to_update_entries = NULL;
> +		}
> +		kvm->to_update_routing = routing;
> +		kvm->to_update_entries = entries;
> +		atomic_set(&kvm->have_new, 1);
> +		mutex_unlock(&kvm->irq_routing_lock);
> +
> +		wake_up(&kvm->wq);
> +		r = 0; /* parameter validity or memory alloc avalibity should be checked before return */
>  		break;
>  	}
>  #endif /* KVM_CAP_IRQ_ROUTING */
> diff -urp kvm_kmod/x86/kvm_main.c kvm_kmod_new//x86/kvm_main.c
> --- kvm_kmod/x86/kvm_main.c	2014-03-10 14:08:28.0 +
> +++ kvm_kmod_new//x86/kvm_main.c	2014-03-26 15:27:02.0 +
> @@ -83,6 +83,7 @@
>  #include
>  #include
>  #include
> +#include
> 
>  #include
>  #include
> @@ -501,6 +502,49 @@ static void kvm_init_memslots_id(struct
>  		slots->id_to_index[i] = slots->memslots[i].id = i;
>  }
> 
> +static int do_irq_routing_table_update(struct kvm *kvm)
> +{
> +	struct kvm_irq_routing_entry *entries;
> +	unsigned nr;
> +	unsigned flags;
> +
> +	mutex_lock(&kvm->irq_routing_lock);
> +	BUG_ON(!kvm->to_update_entries);
> +
Re: [RFC]Two ideas to optimize updating irq routing table
On 26/03/14 09:22, Gonglei (Arei) wrote:
> Without patches, ping time can jump from 0.3ms to 2ms-30ms. With synchronize_srcu
> patch, ping time is worse. With synchronize_srcu_expedited patch, ping time is
> overall good, though sometimes ping time jump to 1ms-3ms.

Just to understand what's going on, does something like

diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
index 3318d82..432c2a2 100644
--- a/kernel/rcu/srcu.c
+++ b/kernel/rcu/srcu.c
@@ -331,7 +331,7 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
  */
 #define SRCU_RETRY_CHECK_DELAY		5
 #define SYNCHRONIZE_SRCU_TRYCOUNT	2
-#define SYNCHRONIZE_SRCU_EXP_TRYCOUNT	12
+#define SYNCHRONIZE_SRCU_EXP_TRYCOUNT	50
 /*
  * @@@ Wait until all pre-existing readers complete. Such readers

make the problem go away?
Re: Preparing kvm/next for first pull request of 3.15 merge window
On Wed, 26 Mar 2014 10:19:52 +0100 Paolo Bonzini wrote:
> Il 26/03/2014 04:51, Paul Mackerras ha scritto:
> >> > I would like to know from ARM and PPC maintainers *now* (before the
> >> > merge window opens) what will be in 3.15. Also, PPC guys, please
> >> > make sure the pull requests will be based on commit e724f080f5dd
> >> > (KVM: PPC: Book3S HV: Fix register usage when loading/saving VRSAVE,
> >> > 2014-03-13).
> >
> > Alex has 5 commits in his kvm-ppc-queue branch that should go in.
> > That tree is based on your kvm-3.14-2 tag, not on e724f080f5dd.
> > Would you accept a pull request for that from me on Alex's behalf
> > since he's away?
>
> Yes, of course.
>
> > Do you need me to rebase it at all?
>
> These are:
>
> 1ac4484979 PPC: KVM: Introduce hypervisor call H_GET_TCE
> e0a7be38c9 PPC: KVM: fix RESUME_GUEST check before returning from kvmppc_run_core()
> 013e98e657 PPC: KVM: fix RESUME_GUEST check before ending CEDE in kvmppc_run_core()
> e8a7f11fe0 PPC: KVM: fix VCPU run for HV KVM (v2)
> 26d96ec97c PPC: KVM: introduce helper to check RESUME_GUEST and related
>
> I would consider rebasing; but you know better than me the effect of the
> two host-crash-fixing patches and how the testability/bisectability of
> kvm-ppc-queue is affected.
>
> In particular, how likely is it that reverse-bisection ("which commit
> fixed the bug?") would end up on e724f080f5dd rather than one of the
> four RESUME_GUEST commits?

Paul,

It is safe to merge e0a7be38c9, 013e98e657, e8a7f11fe0 and 26d96ec97c as
a single patch... a thing I should have done from the beginning, I guess. :)

Cheers.

--
Greg

> > There is also the set of 8 patches that I just posted and Scott
> > acked. I can make a branch based on e724f080f5dd and send you a pull
> > request.
>
> Yes, thanks!
>
> Paolo

-- 
Gregory Kurz kurzg...@fr.ibm.com
gk...@linux.vnet.ibm.com
Software Engineer @ IBM/Meiosys http://www.ibm.com
Tel +33 (0)562 165 496

"Anarchy is about taking complete responsibility for yourself."
	Alan Moore.
Re: GPF in intel_pmu_lbr_reset() with qemu -cpu host
Il 26/03/2014 08:23, Wu, Feng ha scritto:
> Is there a solution for this issue right now? I also met this GPF crash.

Can you attach your .config?

Paolo
Re: [RFC]Two ideas to optimize updating irq routing table
Il 26/03/2014 09:22, Gonglei (Arei) ha scritto:
> Yes, previously I was using synchronize_srcu, which is not good. When I
> changed it to synchronize_srcu_expedited, grace period delay is much better
> than synchronize_srcu. Though in our tests, we can still see some impact
> of KVM_SET_GSI_ROUTING ioctl.
>
> Our testing scenario is like this. In VM we run a script that sets smp_affinity
> for each IRQ every 0.5s (this leads QEMU to do KVM_SET_GSI_ROUTING ioctl).
> Outside the VM we ping that VM.

Does the affinity actually change every 0.5s?

> Without patches, ping time can jump from 0.3ms to 2ms-30ms. With synchronize_srcu
> patch, ping time is worse. With synchronize_srcu_expedited patch, ping time is
> overall good, though sometimes ping time jump to 1ms-3ms.
>
> With following raw patch, ping time is like call_rcu patch, that not influenced
> by setting IRQ affinity, keeps 0.3ms, and there is no vulnerability, frequent
> intermediate KVM_SET_GSI_ROUTING settings are just skipped, and always the newest
> setting would take effect.

Interesting, but it only works for assigned-dev.c, which is deprecated.
If you used VFIO you'd see no improvement, and Christian's s390 use case
would also see no improvement.

Paolo
Re: Preparing kvm/next for first pull request of 3.15 merge window
Il 25/03/2014 21:35, Christian Borntraeger ha scritto:
> On 25/03/14 18:29, Paolo Bonzini wrote:
> > Also, the pull requests I got in the last couple of months were a bit
> > messy, and I'm already fearing that Linus notices. In the future, I'll
> > tag kvm-3.x-base (e.g. kvm-3.16-base) when I open kvm/next, and I will
> > *reject* pull requests from submaintainers that include extra non-KVM
> > commits without a very good reason.
>
> can you clarify this statement? What are the don't-do things?
> (I already checked one obvious answer)
>
> [ ] I include non-kvm s390 patches (with a Maintainer ack from Martin or
> Heiko) which are required for other patches. These patches might even go
> via the s390 tree as well

This is okay. The patches will stand out in the diffstat and I'll check
that they have Acked-bys before pulling. Bonus for using signed tags and
mentioning it in the tag message. :)

> [ ] My pull request is based on current kvm/next instead of kvm/next
> that was branched away after rc1

This is not only okay, it's almost always desirable. It's also okay to
base your pull request on your last pull request. Any commit between
kvm-3.x-base and kvm/next will do.

Exception: if there is a particularly important bugfix that you want in
both -rc and kvm/next, make the pull request based on kvm-3.x-base.
After pulling it in kvm/next, I'll "forward" the pull request immediately
to Linus. But it should happen very rarely, and only for very bad bugs
introduced during the last merge window. For an example, see
https://lkml.org/lkml/2014/3/18/112.

> [X] My pull request is based on 3.x-rcy instead of kvm/next

Indeed this is bad and it is my main complaint. As mentioned above, pull
requests should be based on a commit between kvm-3.x-base and kvm/next.

> [ ] My pull request is based on kvm/queue instead of kvm/next

This is not bad per se, but kvm/queue can be rebased: if down the line I
have problems with kvm/queue, you'll have to resend the pull request.
In fact, there should be no need to base pull requests on kvm/queue.
Even if your arch-specific patches touch virt/kvm/ or have prerequisites
in virt/kvm/, it is better to harass me until I test kvm/queue and merge
it back into kvm/next. :)

Paolo
Re: Preparing kvm/next for first pull request of 3.15 merge window
Il 26/03/2014 04:51, Paul Mackerras ha scritto:
> > I would like to know from ARM and PPC maintainers *now* (before the
> > merge window opens) what will be in 3.15. Also, PPC guys, please
> > make sure the pull requests will be based on commit e724f080f5dd
> > (KVM: PPC: Book3S HV: Fix register usage when loading/saving VRSAVE,
> > 2014-03-13).
>
> Alex has 5 commits in his kvm-ppc-queue branch that should go in.
> That tree is based on your kvm-3.14-2 tag, not on e724f080f5dd.
> Would you accept a pull request for that from me on Alex's behalf
> since he's away?

Yes, of course.

> Do you need me to rebase it at all?

These are:

1ac4484979 PPC: KVM: Introduce hypervisor call H_GET_TCE
e0a7be38c9 PPC: KVM: fix RESUME_GUEST check before returning from kvmppc_run_core()
013e98e657 PPC: KVM: fix RESUME_GUEST check before ending CEDE in kvmppc_run_core()
e8a7f11fe0 PPC: KVM: fix VCPU run for HV KVM (v2)
26d96ec97c PPC: KVM: introduce helper to check RESUME_GUEST and related

I would consider rebasing; but you know better than me the effect of the
two host-crash-fixing patches and how the testability/bisectability of
kvm-ppc-queue is affected.

In particular, how likely is it that reverse-bisection ("which commit
fixed the bug?") would end up on e724f080f5dd rather than one of the
four RESUME_GUEST commits?

> There is also the set of 8 patches that I just posted and Scott
> acked. I can make a branch based on e724f080f5dd and send you a pull
> request.

Yes, thanks!

Paolo

> Let me know what you would prefer regarding Alex's kvm-ppc-queue
> branch.
RE: [RFC]Two ideas to optimize updating irq routing table
> On my system I have HZ=100 and lots of CPUs. So RCU's "every cpu has
> scheduled" is certainly slower than SRCU's algorithm
> (/*
>  * We use an adaptive strategy for synchronize_srcu() and especially for
>  * synchronize_srcu_expedited(). We spin for a fixed time period
>  * (defined below) to allow SRCU readers to exit their read-side critical
>  * sections. If there are still some readers after 10 microseconds,
>  * we repeatedly block for 1-millisecond time periods. This approach
>  * has done well in testing, so there is no need for a config parameter.
>  */
> )
>
> With HZ==1000 and a small number of CPUs, SRCU's spinning might be in
> the same delay range as classic RCU, depending on how long the read-side
> critical section is (if we move from spinning to blocking).
> So using synchronize_srcu_expedited is certainly something to test, as
> it increases the spinning time.
>
> Christian

Yes, after we changed to synchronize_srcu_expedited, grace period latency
improves much and overall this is good. However, as I mentioned in another
mail, in our setting-IRQ-affinity and ping test we can still see some
impact of the KVM_SET_GSI_ROUTING ioctl. I wrote another patch in that
mail and would like it to be examined to see if it is acceptable or has
any problem. Thank you.

Best regards,
-Gonglei
[Bug 72381] [Nested] L1 call trace when create windows 7 guest as L2 guest.
https://bugzilla.kernel.org/show_bug.cgi?id=72381

--- Comment #2 from robert...@intel.com ---
Tried this case: L0 94b3ffcd + L1 94b3ffcd, L1 panic.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
RE: [RFC]Two ideas to optimize updating irq routing table
> > Based on discussions in:
> > http://lists.gnu.org/archive/html/qemu-devel/2013-11/threads.html#03322
> >
> > About KVM_SET_GSI_ROUTING ioctl, I tested changing RCU to SRCU, but unfortunately
> > it looks like SRCU's grace period is no better than RCU.
>
> Really? This is not what Christian Borntraeger claimed at
> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/118083 -- in
> fact, Andrew Theurer is currently testing a variant of that patch and I
> was going to post it for 3.16.
>
> Have you tried synchronize_srcu_expedited? Unlike the RCU variant, you
> can use this function.

Yes, previously I was using synchronize_srcu, which is not good. When I
changed it to synchronize_srcu_expedited, grace period delay is much better
than synchronize_srcu. Though in our tests, we can still see some impact
of KVM_SET_GSI_ROUTING ioctl.

Our testing scenario is like this. In VM we run a script that sets
smp_affinity for each IRQ every 0.5s (this leads QEMU to do
KVM_SET_GSI_ROUTING ioctl). Outside the VM we ping that VM.

Without patches, ping time can jump from 0.3ms to 2ms-30ms. With
synchronize_srcu patch, ping time is worse. With synchronize_srcu_expedited
patch, ping time is overall good, though sometimes ping time jumps to
1ms-3ms.

With the following raw patch, ping time is like the call_rcu patch: it is
not influenced by setting IRQ affinity, keeps 0.3ms, and there is no
vulnerability; frequent intermediate KVM_SET_GSI_ROUTING settings are just
skipped, and always the newest setting would take effect.

Would you think this patch is acceptable or has problems? Thanks in advance.

diff -urp kvm_kmod/include/linux/kvm_host.h kvm_kmod_new//include/linux/kvm_host.h
--- kvm_kmod/include/linux/kvm_host.h	2014-03-10 14:08:28.0 +
+++ kvm_kmod_new//include/linux/kvm_host.h	2014-03-26 15:07:48.0 +
@@ -337,6 +337,12 @@ struct kvm {
 
 	struct mutex irq_lock;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
+	struct task_struct *kthread;
+	wait_queue_head_t wq;
+	struct mutex irq_routing_lock;
+	struct kvm_irq_routing to_update_routing;
+	struct kvm_irq_routing_entry *to_update_entries;
+	atomic_t have_new;
 	/*
 	 * Update side is protected by irq_lock and,
 	 * if configured, irqfds.lock.
diff -urp kvm_kmod/x86/assigned-dev.c kvm_kmod_new//x86/assigned-dev.c
--- kvm_kmod/x86/assigned-dev.c	2014-03-10 14:08:28.0 +
+++ kvm_kmod_new//x86/assigned-dev.c	2014-03-26 15:22:33.0 +
@@ -1056,12 +1056,23 @@ long kvm_vm_ioctl_assigned_device(struct
 		r = -EFAULT;
 		urouting = argp;
 		if (copy_from_user(entries, urouting->entries,
-				   routing.nr * sizeof(*entries)))
-			goto out_free_irq_routing;
-		r = kvm_set_irq_routing(kvm, entries, routing.nr,
-					routing.flags);
-	out_free_irq_routing:
-		vfree(entries);
+				   routing.nr * sizeof(*entries))) {
+			vfree(entries);
+			break;
+		}
+
+		mutex_lock(&kvm->irq_routing_lock);
+		if (kvm->to_update_entries) {
+			vfree(kvm->to_update_entries);
+			kvm->to_update_entries = NULL;
+		}
+		kvm->to_update_routing = routing;
+		kvm->to_update_entries = entries;
+		atomic_set(&kvm->have_new, 1);
+		mutex_unlock(&kvm->irq_routing_lock);
+
+		wake_up(&kvm->wq);
+		r = 0; /* parameter validity or memory alloc avalibity should be checked before return */
 		break;
 	}
 #endif /* KVM_CAP_IRQ_ROUTING */
diff -urp kvm_kmod/x86/kvm_main.c kvm_kmod_new//x86/kvm_main.c
--- kvm_kmod/x86/kvm_main.c	2014-03-10 14:08:28.0 +
+++ kvm_kmod_new//x86/kvm_main.c	2014-03-26 15:27:02.0 +
@@ -83,6 +83,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -501,6 +502,49 @@ static void kvm_init_memslots_id(struct
 		slots->id_to_index[i] = slots->memslots[i].id = i;
 }
 
+static int do_irq_routing_table_update(struct kvm *kvm)
+{
+	struct kvm_irq_routing_entry *entries;
+	unsigned nr;
+	unsigned flags;
+
+	mutex_lock(&kvm->irq_routing_lock);
+	BUG_ON(!kvm->to_update_entries);
+
+	nr = kvm->to_update_routing.nr;
+	flags = kvm->to_update_routing.flags;
+	entries = vmalloc(nr * sizeof(*entries));
+	if (!entries) {
+		mutex_unlock(&kvm->irq_routing_lock);
+		return 0;
+	}
+	/* this memcpy can be eliminated by doing set in mutex_lock and doing synchronize_rcu outside */
+	memcpy(entries, kvm->to_update_entries, nr * sizeof(*entries));
+
+	atomic_set(&kvm->have_new, 0);
+	mutex_unlock(&kvm->irq_routing_lock);
+
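The core idea of the patch above — the ioctl path only records the newest routing table and raises a flag, a worker applies it later, so rapid-fire intermediate updates never pay for the expensive grace-period synchronization — can be sketched in userspace. This is illustrative only (all names are hypothetical echoes of the kernel ones); locking and wakeups are reduced to comments:

```c
#include <assert.h>

/* "Latest update wins" coalescing: publish() overwrites any
 * not-yet-applied table, worker_drain() applies only the newest. */
struct routing { int nr; };

static struct routing pending;   /* kvm->to_update_routing */
static int have_new;             /* kvm->have_new */
static struct routing applied;   /* the table readers would observe */
static int apply_count;          /* how often the slow path actually ran */

static void publish(struct routing r)
{
    /* the real patch takes kvm->irq_routing_lock here */
    pending = r;                 /* previous pending value is simply dropped */
    have_new = 1;
    /* ...unlock, then wake_up(&kvm->wq) */
}

static void worker_drain(void)
{
    struct routing snap;

    /* lock */
    if (!have_new)
        return;                  /* nothing queued: slow path skipped */
    snap = pending;
    have_new = 0;
    /* unlock */

    applied = snap;              /* stands in for kvm_set_irq_routing() */
    apply_count++;               /* ...followed by synchronize_srcu_expedited() */
}
```

Publishing three tables back-to-back and then draining applies only the third, with a single synchronization — which is exactly why the ping latency stops tracking the guest's affinity-setting frequency. The cost, as noted in the reply, is that the ioctl returns before the update has taken effect.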
Questions about interrupt injection
Hi all,

I want to implement a virtual PCI device in QEMU and, when needed, inject
a virtual interrupt into the running guest. However, this action can fail
and crash the guest VM, especially when the workload in the guest is heavy.
QEMU reports the following errors:

> KVM: entry failed, hardware error 0x8021
> If you're running a guest on an Intel machine without unrestricted mode
> support, the failure can be most likely due to the guest entering an invalid
> state for Intel VT. For example, the guest maybe running in big real mode
> which is not supported on less recent Intel processors.

BTW, there are two functions named kvm_set_irq and kvm_queue_interrupt
which can be used to inject interrupts into the guest, is that right?

Thx a lot.

-- 
Tianxiang Miao, Graduate Student,
Institute of Parallel and Distributed Systems (IPADS),
School of Software, Shanghai JiaoTong University
Re: [RFC]Two ideas to optimize updating irq routing table
On 25/03/14 13:37, Paolo Bonzini wrote:
> Il 25/03/2014 04:19, Gonglei (Arei) ha scritto:
> > Based on discussions in:
> > http://lists.gnu.org/archive/html/qemu-devel/2013-11/threads.html#03322
> >
> > About KVM_SET_GSI_ROUTING ioctl, I tested changing RCU to SRCU, but unfortunately
> > it looks like SRCU's grace period is no better than RCU.
>
> Really? This is not what Christian Borntraeger claimed at
> http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/118083 -- in fact,
> Andrew Theurer is currently testing a variant of that patch and I was going
> to post it for 3.16.
>
> Have you tried synchronize_srcu_expedited? Unlike the RCU variant, you can
> use this function.

On my system I have HZ=100 and lots of CPUs. So RCU's "every cpu has
scheduled" is certainly slower than SRCU's algorithm:

/*
 * We use an adaptive strategy for synchronize_srcu() and especially for
 * synchronize_srcu_expedited(). We spin for a fixed time period
 * (defined below) to allow SRCU readers to exit their read-side critical
 * sections. If there are still some readers after 10 microseconds,
 * we repeatedly block for 1-millisecond time periods. This approach
 * has done well in testing, so there is no need for a config parameter.
 */

With HZ==1000 and a small number of CPUs, SRCU's spinning might be in the
same delay range as classic RCU, depending on how long the read-side
critical section is (if we move from spinning to blocking).

So using synchronize_srcu_expedited is certainly something to test, as it
increases the spinning time.

Christian
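The adaptive strategy quoted from srcu.c — spin briefly on the cheap path, then fall back to millisecond sleeps so long-running readers don't burn CPU — can be sketched as a toy userspace routine. This is illustrative only, not the kernel implementation; `fake_readers_left` is a made-up stand-in for the per-CPU reader counts SRCU actually checks:

```c
#include <assert.h>
#include <time.h>

static int spin_tries, block_tries;

/* Spin up to spin_limit polls; if readers are still active after
 * that, block in 1ms chunks until they finish.  Returns 0 if the
 * spin phase sufficed, otherwise the number of 1ms sleeps taken. */
static int wait_for_readers(int (*readers_left)(void), int spin_limit)
{
    int i;

    for (i = 0; i < spin_limit; i++) {       /* spin phase: cheap if
                                              * readers are short */
        spin_tries++;
        if (!readers_left())
            return 0;
    }
    while (readers_left()) {                 /* block phase: cheap for
                                              * the CPU if readers are long */
        struct timespec ts = { 0, 1000000 }; /* 1 millisecond */
        block_tries++;
        nanosleep(&ts, NULL);
    }
    return block_tries;
}

/* Pretend three polls are needed before the last reader exits. */
static int fake_readers = 3;
static int fake_readers_left(void) { return fake_readers-- > 0; }
```

Raising the spin limit (what the SYNCHRONIZE_SRCU_EXP_TRYCOUNT tweak elsewhere in this thread does for the expedited path) trades CPU time for a better chance of finishing the grace period without ever paying a millisecond-granularity sleep.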
RE: GPF in intel_pmu_lbr_reset() with qemu -cpu host
> -----Original Message-----
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of Peter Wu
> Sent: Sunday, March 23, 2014 6:00 AM
> To: Gleb Natapov
> Cc: Venkatesh Srinivas; Peter Zijlstra; Ingo Molnar; Andi Kleen; Linux Kernel
> Developers List; H. Peter Anvin; kvm@vger.kernel.org; Paolo Bonzini
> Subject: Re: GPF in intel_pmu_lbr_reset() with qemu -cpu host
>
> On Saturday 22 March 2014 14:27:59 Gleb Natapov wrote:
> > > but now I have a NULL dereference (in rapl_pmu_init). Previously, when
> > > `-cpu SandyBridge` was passed to qemu, it would show this:
> > >
> > >     [0.016995] Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.
> > >
> > > The same NULL pointer deref would be visible (slightly different
> > > addresses, but the Code lines are equal). With `-cpu host`, the NULL
> > > deref contains:
> > >
> > >     [0.016445] Performance Events: 16-deep LBR, IvyBridge events, Intel PMU driver.
> > >
> > > Full dmesg below.
> >
> > I am confused. Do you see crash now with -cpu SandyBridge and -cpu host,
> > or -cpu host only?
>
> The RAPL crash is seen with both SandyBridge and host, I mentioned SB
> because that environment should be more constant than "host" (which
> depends on the CPU you have on, well, the host).
>
> Peter

Is there a solution for this issue right now? I also met this GPF crash.

Thanks,
Feng