[PATCH] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction
Cc: Matt Evans <m...@ozlabs.org>
Signed-off-by: Denis Kirjanov <k...@linux-powerpc.org>
---
 arch/powerpc/include/asm/ppc-opcode.h | 1 +
 arch/powerpc/net/bpf_jit.h            | 7 +++++++
 arch/powerpc/net/bpf_jit_comp.c       | 5 +++++
 3 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 6f85362..1a52877 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -204,6 +204,7 @@
 #define PPC_INST_ERATSX_DOT		0x7c000127

 /* Misc instructions for BPF compiler */
+#define PPC_INST_LBZ			0x88000000
 #define PPC_INST_LD			0xe8000000
 #define PPC_INST_LHZ			0xa0000000
 #define PPC_INST_LHBRX			0x7c00062c
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 9aee27c..c406aa9 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -87,6 +87,9 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
 #define PPC_STD(r, base, i)	EMIT(PPC_INST_STD | ___PPC_RS(r) |	      \
				     ___PPC_RA(base) | ((i) & 0xfffc))
+
+#define PPC_LBZ(r, base, i)	EMIT(PPC_INST_LBZ | ___PPC_RT(r) |	      \
+				     ___PPC_RA(base) | IMM_L(i))
 #define PPC_LD(r, base, i)	EMIT(PPC_INST_LD | ___PPC_RT(r) |	      \
				     ___PPC_RA(base) | IMM_L(i))
 #define PPC_LWZ(r, base, i)	EMIT(PPC_INST_LWZ | ___PPC_RT(r) |	      \
@@ -96,6 +99,10 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
 #define PPC_LHBRX(r, base, b)	EMIT(PPC_INST_LHBRX | ___PPC_RT(r) |	      \
				     ___PPC_RA(base) | ___PPC_RB(b))
 /* Convenience helpers for the above with 'far' offsets: */
+#define PPC_LBZ_OFFS(r, base, i) do { if ((i) < 32768) PPC_LBZ(r, base, i);   \
+		else {	PPC_ADDIS(r, base, IMM_HA(i));			      \
+			PPC_LBZ(r, r, IMM_L(i)); } } while(0)
+
 #define PPC_LD_OFFS(r, base, i) do { if ((i) < 32768) PPC_LD(r, base, i);     \
		else {	PPC_ADDIS(r, base, IMM_HA(i));			      \
			PPC_LD(r, r, IMM_L(i)); } } while(0)
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index cbae2df..d110e28 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -407,6 +407,11 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
			PPC_LHZ_OFFS(r_A, r_skb, offsetof(struct sk_buff,
							  queue_mapping));
			break;
+		case BPF_ANC | SKF_AD_PKTTYPE:
+			PPC_LBZ_OFFS(r_A, r_skb, PKT_TYPE_OFFSET());
+			PPC_ANDI(r_A, r_A, PKT_TYPE_MAX);
+			PPC_SRWI(r_A, r_A, 5);
+			break;
		case BPF_ANC | SKF_AD_CPU:
 #ifdef CONFIG_SMP
			/*
--
2.1.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
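On big-endian bitfield layouts, sk_buff's pkt_type sits in the top three bits of the byte at PKT_TYPE_OFFSET(), which is why the JIT pairs an ANDI with PKT_TYPE_MAX and a right shift by 5. A small user-space sketch of what the LBZ/ANDI/SRWI sequence above computes (the 0xe0 mask and shift of 5 are the big-endian-bitfield values; the real layout is whatever the compiler assigns to the bitfield):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative big-endian-bitfield values: pkt_type occupies bits
 * 5..7 of the flags byte, so PKT_TYPE_MAX masks 0xe0 and the shift
 * is 5.  On little-endian layouts the field sits in the low bits. */
#define PKT_TYPE_MASK  0xe0
#define PKT_TYPE_SHIFT 5

/* What the JITed LBZ (load byte) + ANDI + SRWI sequence computes. */
static uint32_t extract_pkt_type(uint8_t flags_byte)
{
	return (flags_byte & PKT_TYPE_MASK) >> PKT_TYPE_SHIFT;
}
```

The shift count in the JITed SRWI must match the mask's position, so both constants change together with the bitfield layout.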
Re: [PATCH V2 1/2] mm: Update generic gup implementation to handle hugepage directory
On Fri, 2014-10-24 at 09:22 -0700, James Bottomley wrote:
> Parisc does this. As soon as one CPU issues a TLB purge, it's
> broadcast to all the CPUs on the inter-CPU bus. The next instruction
> isn't executed until they respond.
>
> But this is only for our CPU TLB. There's no other external
> consequence, so removal from the page tables isn't affected by this
> TLB flush, therefore the theory on which Dave bases the change to
> atomic_add() should work for us (of course, atomic_add is lock add
> unlock on our CPU, so it's not going to be of much benefit).

I'm not sure I follow you here. Do you or do you not perform an IPI to
do TLB flushes? If you don't (for example because you have HW
broadcast), then you need the speculative get_page(). If you do (and
can read a PTE atomically), you can get away with atomic_add().

The reason is that if you remember how zap_pte_range works, we perform
the flush before we get rid of the page. So if you're using IPIs for
the flush, the fact that gup_fast has interrupts disabled will delay
the IPI response and thus effectively prevent the pages from being
actually freed, allowing us to simply do the atomic_add() on x86.

But if we don't use IPIs because we have HW broadcast of TLB
invalidations, then we don't have that synchronization. atomic_add
won't work, we need get_page_speculative() because the page could be
concurrently being freed.

Cheers,
Ben.

> James

Another option would be to make the generic code use something defined
by the arch to decide whether to use the speculative get or not. I like
the idea of keeping the bulk of that code generic...

Cheers,
Ben.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body
to majord...@kvack.org. For more info on Linux MM, see:
http://www.linux-mm.org/ .
Don't email: <a href=mailto:d...@kvack.org>em...@kvack.org</a>
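The "speculative get_page()" Ben refers to is the get_page_unless_zero() pattern: take a reference only if the refcount is still non-zero, so a page that is concurrently being freed can never be resurrected. A simplified user-space sketch with C11 atomics (the struct and function here are illustrative stand-ins, not the kernel's implementation):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct page { atomic_int refcount; };

/* Speculative reference: succeed only if the page still has users.
 * The compare-exchange loop refuses to move the count from 0 to 1,
 * which is what makes the lockless gup walk safe when the page may
 * be freed concurrently by another CPU. */
static bool get_page_unless_zero(struct page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;	/* pinned: count went old -> old+1 */
	}
	return false;			/* count hit zero: page is being freed */
}
```

A plain atomic_add() by contrast would happily bump a zero count, which is only safe when IPI-based flushing guarantees the page cannot be freed while gup_fast runs with interrupts off.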
Re: [PATCH V2 1/2] mm: Update generic gup implementation to handle hugepage directory
Hello,

On Mon, Oct 27, 2014 at 07:50:41AM +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2014-10-24 at 09:22 -0700, James Bottomley wrote:
> > Parisc does this. As soon as one CPU issues a TLB purge, it's
> > broadcast to all the CPUs on the inter-CPU bus. The next instruction
> > isn't executed until they respond.
> >
> > But this is only for our CPU TLB. There's no other external
> > consequence, so removal from the page tables isn't affected by this
> > TLB flush, therefore the theory on which Dave bases the change to
> > atomic_add() should work for us (of course, atomic_add is lock add
> > unlock on our CPU, so it's not going to be of much benefit).
>
> I'm not sure I follow you here. Do you or do you not perform an IPI
> to do TLB flushes? If you don't (for example because you have HW
> broadcast), then you need the speculative get_page(). If you do (and
> can read a PTE atomically), you can get away with atomic_add().
>
> The reason is that if you remember how zap_pte_range works, we
> perform the flush before we get rid of the page. So if you're using
> IPIs for the flush, the fact that gup_fast has interrupts disabled
> will delay the IPI response and thus effectively prevent the pages
> from being actually freed, allowing us to simply do the atomic_add()
> on x86.
>
> But if we don't use IPIs because we have HW broadcast of TLB
> invalidations, then we don't have that synchronization. atomic_add
> won't work, we need get_page_speculative() because the page could be
> concurrently being freed.

I looked at how this works more closely and I agree
get_page_unless_zero is always necessary if the TLB flush doesn't
always wait for IPIs to all CPUs where a gup_fast may be running.

To summarize, the pagetables are freed with RCU (the arch sets
HAVE_RCU_TABLE_FREE) and that allows walking them lockless with RCU.
After we can walk the pagetables lockless with RCU, we get to the page
lockless, but the pages themselves can still be freed at any time from
under us (hence the need for get_page_unless_zero).
The additional trick gup_fast-RCU does is to recheck the pte after
elevating the page count with get_page_unless_zero. Rechecking the
pte/hugepmd to be sure it didn't change from under us is critical to be
sure get_page_unless_zero didn't run after the page was freed and
reallocated, which would otherwise lead to a security problem too (i.e.
it protects against get_page_unless_zero false positives).

The last bit required is to still disable irqs like on x86 to serialize
against THP splits, combined with pmdp_splitting_flush always
delivering IPIs (pmdp_splitting_flush must wait for all gup_fast to
complete before proceeding in mangling the page struct of the compound
page).

Preventing the irq disable while taking a gup_fast pin using
compound_lock isn't as easy as it is to do for put_page. put_page
(non-compound) fastest path remains THP agnostic because
collapse_huge_page is inhibited by any existing gup pin, but here we're
exactly taking it, so we can't depend on it to already exist to avoid
the race with collapse_huge_page. It's not just split_huge_page we need
to protect against.

So while thinking through the above summary, I noticed this patch
misses an IPI in mm/huge_memory.c that must be delivered after
pmdp_clear_flush below to be safe against collapse_huge_page, for the
same reasons it sends it within pmdp_splitting_flush. Without this IPI,
what can happen is that the GUP pin protection in
__collapse_huge_page_isolate races against gup_fast-RCU.

If gup_fast reads the pte on one CPU before pmdp_clear_flush, and on
the other CPU __collapse_huge_page_isolate succeeds, then gup_fast
could recheck the pte that hasn't been zapped yet by
__collapse_huge_page_copy. gup_fast would succeed because the pte
wasn't zapped yet, but then __collapse_huge_page_copy would run,
replacing the pte with a transhuge pmd, making gup_fast return the old
page, while the process got the copy as part of the collapsed hugepage.

	/*
	 * After this gup_fast can't run anymore. This also removes
	   ^ <- invariant broken by gup_fast-RCU
	 * any huge TLB entry from the CPU so we won't allow
	 * huge and small TLB entries for the same virtual address
	 * to avoid the risk of CPU bugs in that area.
	 */
	_pmd = pmdp_clear_flush(vma, address, pmd);
	spin_unlock(pmd_ptl);
	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
	spin_lock(pte_ptl);
	isolated = __collapse_huge_page_isolate(vma, address, pte);
	spin_unlock(pte_ptl);

	CPU0				CPU1
	----				----
	gup_fast-RCU
	local_irq_disable()
	pte = pte_offset_map(pmd, address)
					pmdp_clear_flush (not sending IPI -> bug)
					__collapse_huge_page_isolate -> succeeds
					(page_count != 1 gup-pin check of
Re: [PATCH v3 16/27] Mips/MSI: Save msi chip in pci sysdata
On 2014/10/25 21:04, Ralf Baechle wrote:
> On Wed, Oct 15, 2014 at 11:07:04AM +0800, Yijing Wang wrote:
> > +static inline struct msi_chip *pci_msi_chip(struct pci_bus *bus)
> > +{
> > +	struct pci_controller *control = (struct pci_controller *)bus->sysdata;
>
> bus->sysdata is void *, so this cast is unnecessary.

Yes, will update it, thanks!

> Ralf

--
Thanks!
Yijing
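The review point rests on a C rule: a `void *` converts implicitly to any object pointer type, so casting `bus->sysdata` is pure noise. A minimal illustration (the struct here is a placeholder, not the real pci_controller):

```c
#include <assert.h>

struct pci_controller { int dummy; };

/* sysdata plays the role of the void *bus->sysdata field.  The
 * assignment converts implicitly in C, so no cast is needed (and per
 * kernel style none should be written). */
static struct pci_controller *get_controller(void *sysdata)
{
	struct pci_controller *hose = sysdata;	/* no cast */
	return hose;
}
```

(C++ would require the cast, which is one reason the redundant form creeps into C code.)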
Re: [PATCH] cpuidle/powernv: Populate cpuidle state details by querying the device-tree
On Fri, 2014-10-24 at 15:30 +0100, Lorenzo Pieralisi wrote:
> On Tue, Oct 14, 2014 at 08:53:00AM +0100, Preeti U Murthy wrote:
> > We hard code the metrics relevant for cpuidle states in the kernel
> > today. Instead pick them up from the device tree so that they remain
> > relevant and updated for the system that the kernel is running on.
>
> Device tree properties should be documented, and these bindings are
> getting very similar to the ones I have just completed for ARM. I
> wonder whether we should take the generic bits out of the ARM bindings
> (ie exit_latency) and make those available to other architectures.

The firmware that emits those properties is already in the field, so it
would have been nice to use a generic binding but it's too late now.

cheers
RE: [PATCH] cpufreq: qoriq: Make the driver usable on all QorIQ platforms
> -----Original Message-----
> From: Viresh Kumar [mailto:viresh.ku...@linaro.org]
> Sent: Tuesday, October 21, 2014 5:04 PM
> To: Tang Yuantian-B29983
> Cc: Rafael J. Wysocki; Linux Kernel Mailing List;
> linux...@vger.kernel.org; linuxppc-...@ozlabs.org
> Subject: Re: [PATCH] cpufreq: qoriq: Make the driver usable on all
> QorIQ platforms
>
> On 21 October 2014 14:29, Yuantian Tang <yuantian.t...@freescale.com> wrote:
> > If I do so, menuconfig will display like this (on PPC):
> >
> >   PowerPC CPU frequency scaling drivers
> >     QorIQ CPU Frequency scaling ---
> >       * CPU frequency scaling driver for Freescale QorIQ SoCs
> >
> > On ARM, there should be a similar problem. Isn't that weird?
>
> Similar is true for the cpufreq-cpu0 driver as well. Maybe we can
> create a Kconfig.drivers configuration and include it from all
> architecture specific ones?
>
> @ Rafael?

Do we have a conclusion yet?

Regards,
Yuantian
Re: [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core
Hi Liu,

On 10/17/2014 12:59 AM, kernelf...@gmail.com wrote:
> When kvm is enabled on a core, we migrate all external irqs to the
> primary thread, since currently the kvm irq logic is handled by the
> primary hwthread.
>
> Todo: this patch lacks re-enabling of irqbalance when kvm is disabled
> on the core.

Why is a sysfs file introduced to trigger irq migration? Why is it not
done during kvm module insert? And similarly spread interrupts when the
module is removed? Isn't this a saner way?

> Signed-off-by: Liu Ping Fan <pingf...@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/sysfs.c            | 39 ++++++++++++++++++++++++++
>  arch/powerpc/sysdev/xics/xics-common.c | 12 ++++++++++++
>  2 files changed, 51 insertions(+)
>
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 67fd2fd..a2595dd 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -552,6 +552,45 @@ static void sysfs_create_dscr_default(void)
>  	if (cpu_has_feature(CPU_FTR_DSCR))
>  		err = device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default);
>  }
> +
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +#define NR_CORES	(CONFIG_NR_CPUS/threads_per_core)
> +static DECLARE_BITMAP(kvm_on_core, NR_CORES) __read_mostly
> +
> +static ssize_t show_kvm_enable(struct device *dev,
> +		struct device_attribute *attr, char *buf)
> +{
> +}
> +
> +static ssize_t __used store_kvm_enable(struct device *dev,
> +		struct device_attribute *attr, const char *buf,
> +		size_t count)
> +{
> +	struct cpumask stop_cpus;
> +	unsigned long core, thr;
> +
> +	sscanf(buf, "%lx", &core);
> +	if (core > NR_CORES)
> +		return -1;
> +	if (!test_bit(core, kvm_on_core))
> +		for (thr = 1; thr < threads_per_core; thr++)
> +			if (cpu_online(thr * threads_per_core + thr))
> +				cpumask_set_cpu(thr * threads_per_core + thr, &stop_cpus);

What is the above logic trying to do? Did you mean
cpu_online(threads_per_core * core + thr)?

> +
> +	stop_machine(xics_migrate_irqs_away_secondary, NULL, &stop_cpus);
> +	set_bit(core, kvm_on_core);
> +	return count;
> +}
> +
> +static DEVICE_ATTR(kvm_enable, 0600,
> +		show_kvm_enable, store_kvm_enable);
> +
> +static void sysfs_create_kvm_enable(void)
> +{
> +	device_create_file(cpu_subsys.dev_root, &dev_attr_kvm_enable);
> +}
> +#endif
> +
>  #endif /* CONFIG_PPC64 */
>
>  #ifdef HAS_PPC_PMC_PA6T
> diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c
> index fe0cca4..68b33d8 100644
> --- a/arch/powerpc/sysdev/xics/xics-common.c
> +++ b/arch/powerpc/sysdev/xics/xics-common.c
> @@ -258,6 +258,18 @@ unlock:
>  		raw_spin_unlock_irqrestore(&desc->lock, flags);
>  	}
>  }
> +
> +int xics_migrate_irqs_away_secondary(void *data)
> +{
> +	int cpu = smp_processor_id();
> +	if (cpu % thread_per_core != 0) {
> +		WARN(condition, format...);
> +		return 0;
> +	}
> +	/* In fact, if we can migrate the primary, it will be more fine */
> +	xics_migrate_irqs_away();

Isn't the aim of the patch to migrate irqs away from the secondary onto
the primary? But from the above it looks like we are returning when we
find out that we are secondary threads, isn't it?

> +	return 0;
> +}
>  #endif /* CONFIG_HOTPLUG_CPU */

Note that xics_migrate_irqs_away() is defined under CONFIG_CPU_HOTPLUG.
But we will need this option on PowerKVM even when hotplug is not
configured in.

Regards
Preeti U Murthy

>  #ifdef CONFIG_SMP
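The suspected bug above is in mapping a (core, thread) pair to a logical CPU id: with the conventional contiguous SMT numbering the id is core * threads_per_core + thr, whereas the patch uses thr in both positions. A sketch of the intended computation (assuming that contiguous numbering):

```c
#include <assert.h>

/* Conventional SMT numbering: the CPUs of a core are contiguous, so
 * thread `thr` of core `core` is core * threads_per_core + thr.  The
 * patch's `thr * threads_per_core + thr` picks the wrong CPUs for any
 * core other than the one whose index happens to equal thr. */
static int cpu_id(int core, int thr, int threads_per_core)
{
	return core * threads_per_core + thr;
}
```

With 8 threads per core, thread 3 of core 2 is CPU 19; the patch's expression would instead compute 3 * 8 + 3 = 27, a thread of core 3.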
[PATCH] CXL: Fix PSL error due to duplicate segment table entries
From: Ian Munsie <imun...@au1.ibm.com>

In certain circumstances the PSL can send an interrupt for a segment
miss that the kernel has already handled. This can happen if multiple
translations for the same segment are queued in the PSL before the
kernel has restarted the first translation.

The CXL driver did not expect this situation and did not check if a
segment had already been handled. This could cause a duplicate segment
table entry which in turn caused a PSL error taking down the card.

This patch fixes the issue by checking for existing entries in the
segment table that match the segment it is trying to insert, to avoid
inserting duplicate entries.

Some of the code has been refactored to simplify it - the segment table
hash has been moved from cxl_load_segment to find_free_sste where it is
used, and we have disabled the secondary hash in the segment table to
reduce the number of entries that need to be tested from 16 to 8. Due
to the large segment sizes we use, it is extremely unlikely that the
secondary hash would ever have been used in practice, so this should
not have any negative impacts and may even improve performance.

copro_calculate_slb will now mask the ESID by the correct mask for 1T
vs 256M segments. This has no effect by itself as the extra bits were
ignored, but it makes debugging the segment table entries easier and
means that we can directly compare the ESID values for duplicates
without needing to worry about masking in the comparison.
Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
---
 arch/powerpc/mm/copro_fault.c |  3 +-
 drivers/misc/cxl/fault.c      | 73 ++++++++++++++++++++-----------------------
 drivers/misc/cxl/native.c     |  4 +--
 3 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index 0f9939e..5a236f0 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -99,8 +99,6 @@ int copro_calculate_slb(struct mm_struct *mm, u64 ea, struct copro_slb *slb)
 	u64 vsid;
 	int psize, ssize;

-	slb->esid = (ea & ESID_MASK) | SLB_ESID_V;
-
 	switch (REGION_ID(ea)) {
 	case USER_REGION_ID:
 		pr_devel("%s: 0x%llx -> USER_REGION_ID\n", __func__, ea);
@@ -133,6 +131,7 @@ int copro_calculate_slb(struct mm_struct *mm, u64 ea, struct copro_slb *slb)
 	vsid |= mmu_psize_defs[psize].sllp |
 		((ssize == MMU_SEGSIZE_1T) ? SLB_VSID_B_1T : 0);

+	slb->esid = (ea & (ssize == MMU_SEGSIZE_1T ? ESID_MASK_1T : ESID_MASK)) | SLB_ESID_V;
 	slb->vsid = vsid;

 	return 0;
diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
index 69506eb..421cfd6 100644
--- a/drivers/misc/cxl/fault.c
+++ b/drivers/misc/cxl/fault.c
@@ -21,60 +21,63 @@
 #include "cxl.h"

-static struct cxl_sste* find_free_sste(struct cxl_sste *primary_group,
-				       bool sec_hash,
-				       struct cxl_sste *secondary_group,
-				       unsigned int *lru)
+static bool sste_matches(struct cxl_sste *sste, struct copro_slb *slb)
 {
-	unsigned int i, entry;
-	struct cxl_sste *sste, *group = primary_group;
-
-	for (i = 0; i < 2; i++) {
-		for (entry = 0; entry < 8; entry++) {
-			sste = group + entry;
-			if (!(be64_to_cpu(sste->esid_data) & SLB_ESID_V))
-				return sste;
-		}
-		if (!sec_hash)
-			break;
-		group = secondary_group;
+	return ((sste->vsid_data == cpu_to_be64(slb->vsid)) &&
+		(sste->esid_data == cpu_to_be64(slb->esid)));
+}
+
+/* This finds a free SSTE and checks to see if it's already in the table */
+static struct cxl_sste* find_free_sste(struct cxl_context *ctx,
+				       struct copro_slb *slb)
+{
+	struct cxl_sste *primary, *sste, *ret = NULL;
+	unsigned int mask = (ctx->sst_size >> 7) - 1; /* SSTP0[SegTableSize] */
+	unsigned int entry;
+	unsigned int hash;
+
+	if (slb->vsid & SLB_VSID_B_1T)
+		hash = (slb->esid >> SID_SHIFT_1T) & mask;
+	else /* 256M */
+		hash = (slb->esid >> SID_SHIFT) & mask;
+
+	primary = ctx->sstp + (hash << 3);
+	sste = primary;
+
+	for (entry = 0; entry < 8; entry++) {
+		if (!ret && !(be64_to_cpu(sste->esid_data) & SLB_ESID_V))
+			ret = sste;
+		if (sste_matches(sste, slb))
+			return NULL;
+		sste++;
 	}
+	if (ret)
+		return ret;
+
 	/* Nothing free, select an entry to cast out */
-	if (sec_hash && (*lru & 0x8))
-		sste = secondary_group + (*lru & 0x7);
-	else
-		sste = primary_group + (*lru & 0x7);
-	*lru = (*lru + 1) & 0xf;
+	ret = primary + ctx->sst_lru;
+	ctx->sst_lru = (ctx->sst_lru + 1) & 0x7;

-	return sste;
+
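The reworked find_free_sste() indexes the primary group by hashing the ESID: shift out the segment-offset bits, mask to the table size, and then scale by the 8-entry group width. A sketch of that hash (the 28/40 shift values are the standard powerpc SID_SHIFT constants for 256M and 1T segments; the mask is derived from sst_size as in the patch):

```c
#include <assert.h>
#include <stdint.h>

#define SID_SHIFT	28	/* 256M segments */
#define SID_SHIFT_1T	40	/* 1T segments */

/* Hash an ESID into a primary-group index: drop the within-segment
 * offset bits, then mask to the table size.  Each group holds 8
 * SSTEs, so the caller turns this into a pointer with (hash << 3). */
static unsigned int sste_hash(uint64_t esid, int is_1T, unsigned int mask)
{
	return (unsigned int)((esid >> (is_1T ? SID_SHIFT_1T : SID_SHIFT)) & mask);
}
```

Because duplicate detection compares ESIDs directly, the copro_calculate_slb change to mask with ESID_MASK_1T for 1T segments is what makes this hash and the equality check in sste_matches() line up.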
Re: [RFC 04/11] powerpc: kvm: introduce a kthread on primary thread to anti tickless
On 10/17/2014 12:59 AM, kernelf...@gmail.com wrote:
> (This patch is a place holder.)
>
> If there is only one vcpu thread that is ready (the other vcpu threads
> can wait for it to execute), the primary thread can enter tickless
> mode,

We do not configure NOHZ_FULL to y by default. Hence no thread would
enter tickless mode.

> which causes the primary to keep running, so the secondary has no
> opportunity to exit to host, even if they have other tasks on them.

The secondary threads can still get scheduling ticks. The decrementer
of the secondary threads is still active. So as long as secondary
threads are busy, scheduling ticks will fire and try to schedule a new
task on them.

Regards
Preeti U Murthy

> Introduce a kthread (anti_tickless) on the primary, so when there is
> only one vcpu thread on the primary, the secondary can resort to
> anti_tickless to keep the primary out of tickless mode. (I thought
> that the anti_tickless thread can go to NAP, so we can let the
> secondary run.)
>
> Signed-off-by: Liu Ping Fan <pingf...@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/sysfs.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index a2595dd..f0b110e 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -575,9 +575,11 @@ static ssize_t __used store_kvm_enable(struct device *dev,
>  	if (!test_bit(core, kvm_on_core))
>  		for (thr = 1; thr < threads_per_core; thr++)
>  			if (cpu_online(thr * threads_per_core + thr))
> -				cpumask_set_cpu(thr * threads_per_core + thr, &stop_cpus);
> +				cpumask_set_cpu(core * threads_per_core + thr, &stop_cpus);
>
>  	stop_machine(xics_migrate_irqs_away_secondary, NULL, &stop_cpus);
> +	/* fixme, create a kthread on primary hwthread to handle tickless mode */
> +	//kthread_create_on_cpu(prevent_tickless, NULL, core * threads_per_core, "ppckvm_prevent_tickless");
>  	set_bit(core, kvm_on_core);
>  	return count;
>  }
Re: [RFC 06/11] powerpc: kvm: introduce online in paca to indicate whether cpu is needed by host
Hi Liu,

On 10/17/2014 12:59 AM, kernelf...@gmail.com wrote:
> Nowadays, powerKVM runs with secondary hwthreads offline. Although we
> can make all secondary hwthreads online later, we still preserve this
> behavior for a dedicated KVM env. Achieve this by setting paca->online
> as false.
>
> Signed-off-by: Liu Ping Fan <pingf...@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/paca.h         |  3 +++
>  arch/powerpc/kernel/asm-offsets.c       |  3 +++
>  arch/powerpc/kernel/smp.c               |  3 +++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 12 ++++++++++++
>  4 files changed, 21 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index a5139ea..67c2500 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -84,6 +84,9 @@ struct paca_struct {
>  	u8 cpu_start;		/* At startup, processor spins until */
>  				/* this becomes non-zero. */
>  	u8 kexec_state;		/* set when kexec down has irqs off */
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +	u8 online;
> +#endif
>  #ifdef CONFIG_PPC_STD_MMU_64
>  	struct slb_shadow *slb_shadow_ptr;
>  	struct dtl_entry *dispatch_log;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 9d7dede..0faa8fe 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -182,6 +182,9 @@ int main(void)
>  	DEFINE(PACATOC, offsetof(struct paca_struct, kernel_toc));
>  	DEFINE(PACAKBASE, offsetof(struct paca_struct, kernelbase));
>  	DEFINE(PACAKMSR, offsetof(struct paca_struct, kernel_msr));
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +	DEFINE(PACAONLINE, offsetof(struct paca_struct, online));
> +#endif
>  	DEFINE(PACASOFTIRQEN, offsetof(struct paca_struct, soft_enabled));
>  	DEFINE(PACAIRQHAPPENED, offsetof(struct paca_struct, irq_happened));
>  	DEFINE(PACACONTEXTID, offsetof(struct paca_struct, context.id));
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index a0738af..4c3843e 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -736,6 +736,9 @@ void start_secondary(void *unused)
>  	cpu_startup_entry(CPUHP_ONLINE);
>
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +	get_paca()->online = true;
> +#endif
>  	BUG();
>  }
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index f0c4db7..d5594b0 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -322,6 +322,13 @@ kvm_no_guest:
>  	li	r0, KVM_HWTHREAD_IN_NAP
>  	stb	r0, HSTATE_HWTHREAD_STATE(r13)
>  kvm_do_nap:
> +#ifdef PPCKVM_ENABLE_SECONDARY
> +	/* check the cpu is needed by host or not */
> +	ld	r2, PACAONLINE(r13)
> +	ld	r3, 0
> +	cmp	r2, r3
> +	bne	kvm_secondary_exit_trampoline
> +#endif
>  	/* Clear the runlatch bit before napping */
>  	mfspr	r2, SPRN_CTRLF
>  	clrrdi	r2, r2, 1
> @@ -340,6 +347,11 @@ kvm_do_nap:
>  	nap
>  	b	.
> +
> +#ifdef PPCKVM_ENABLE_SECONDARY
> +kvm_secondary_exit_trampoline:
> +	b	.

Uh? When we have no vcpu to run, we loop here instead of doing a nap?
What are we achieving?

If I understand the intention of the patch well, we are looking to
provide a knob whereby the host can indicate if it needs the
secondaries at all. Today the host does boot with all threads online.
There are some init scripts which take the secondaries down. So today
the host does not have a say in preventing this, compile time or
runtime.

So let's see how we can switch between the two behaviors if we don't
have the init script, which looks like a saner thing to do. We should
set the paca->online flag to false by default. If
KVM_PPC_ENABLE_SECONDARY is configured, we need to set this flag to
true. So at compile time, we resolve the flag. While booting, we look
at the flag and decide whether to get the secondaries online. So we get
the current behavior if we have not configured KVM_PPC_ENABLE_SECONDARY.

Will this achieve the purpose of this patch?

Regards
Preeti U Murthy