[PATCH] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction

2014-10-26 Thread Denis Kirjanov
Cc: Matt Evans <m...@ozlabs.org>
Signed-off-by: Denis Kirjanov <k...@linux-powerpc.org>
---
 arch/powerpc/include/asm/ppc-opcode.h | 1 +
 arch/powerpc/net/bpf_jit.h            | 7 +++++++
 arch/powerpc/net/bpf_jit_comp.c       | 5 +++++
 3 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 6f85362..1a52877 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -204,6 +204,7 @@
 #define PPC_INST_ERATSX_DOT		0x7c000127
 
 /* Misc instructions for BPF compiler */
+#define PPC_INST_LBZ			0x88000000
 #define PPC_INST_LD			0xe8000000
 #define PPC_INST_LHZ			0xa0000000
 #define PPC_INST_LHBRX			0x7c00062c
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 9aee27c..c406aa9 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -87,6 +87,9 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
 #define PPC_STD(r, base, i)	EMIT(PPC_INST_STD | ___PPC_RS(r) |	      \
 				     ___PPC_RA(base) | ((i) & 0xfffc))
 
+
+#define PPC_LBZ(r, base, i)EMIT(PPC_INST_LBZ | ___PPC_RT(r) |\
+___PPC_RA(base) | IMM_L(i))
 #define PPC_LD(r, base, i) EMIT(PPC_INST_LD | ___PPC_RT(r) | \
 ___PPC_RA(base) | IMM_L(i))
 #define PPC_LWZ(r, base, i)EMIT(PPC_INST_LWZ | ___PPC_RT(r) |\
@@ -96,6 +99,10 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
 #define PPC_LHBRX(r, base, b)  EMIT(PPC_INST_LHBRX | ___PPC_RT(r) |  \
 ___PPC_RA(base) | ___PPC_RB(b))
 /* Convenience helpers for the above with 'far' offsets: */
+#define PPC_LBZ_OFFS(r, base, i) do { if ((i) < 32768) PPC_LBZ(r, base, i);   \
+		else {	PPC_ADDIS(r, base, IMM_HA(i));			      \
+			PPC_LBZ(r, r, IMM_L(i)); } } while(0)
+
 #define PPC_LD_OFFS(r, base, i) do { if ((i) < 32768) PPC_LD(r, base, i);     \
 		else {	PPC_ADDIS(r, base, IMM_HA(i));			      \
 			PPC_LD(r, r, IMM_L(i)); } } while(0)
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index cbae2df..d110e28 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -407,6 +407,11 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
PPC_LHZ_OFFS(r_A, r_skb, offsetof(struct sk_buff,
  queue_mapping));
break;
+   case BPF_ANC | SKF_AD_PKTTYPE:
+   PPC_LBZ_OFFS(r_A, r_skb, PKT_TYPE_OFFSET());
+   PPC_ANDI(r_A, r_A, PKT_TYPE_MAX);
+   PPC_SRWI(r_A, r_A, 5);
+   break;
case BPF_ANC | SKF_AD_CPU:
 #ifdef CONFIG_SMP
/*
-- 
2.1.0
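
For context, the ancillary load this case JITs is reachable from userspace
with a classic BPF filter such as the one below (an illustrative sketch, not
part of the patch; socket setup and error handling omitted):

	#include <linux/filter.h>
	#include <linux/if_packet.h>
	#include <sys/socket.h>

	/* A <- skb->pkt_type; accept only packets addressed to this host.
	 * The JIT above compiles the first load to LBZ + ANDI + SRWI. */
	struct sock_filter pkttype_filter[] = {
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, PACKET_HOST, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, 0xffff),	/* match: accept */
		BPF_STMT(BPF_RET | BPF_K, 0),		/* otherwise: drop */
	};
	struct sock_fprog pkttype_prog = {
		.len	= sizeof(pkttype_filter) / sizeof(pkttype_filter[0]),
		.filter	= pkttype_filter,
	};
	/* attach with: setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
	 *			   &pkttype_prog, sizeof(pkttype_prog)); */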


Re: [PATCH V2 1/2] mm: Update generic gup implementation to handle hugepage directory

2014-10-26 Thread Benjamin Herrenschmidt
On Fri, 2014-10-24 at 09:22 -0700, James Bottomley wrote:

 Parisc does this.  As soon as one CPU issues a TLB purge, it's broadcast
 to all the CPUs on the inter-CPU bus.  The next instruction isn't
 executed until they respond.
 
 But this is only for our CPU TLB.  There's no other external
 consequence, so removal from the page tables isn't effected by this TLB
 flush, therefore the theory on which Dave bases the change to
 atomic_add() should work for us (of course, atomic_add is lock add
 unlock on our CPU, so it's not going to be of much benefit).

I'm not sure I follow you here.

Do you or do you not perform an IPI to do TLB flushes? If you don't
(for example because you have HW broadcast), then you need the
speculative get_page(). If you do (and can read a PTE atomically), you
can get away with atomic_add().

The reason is that if you remember how zap_pte_range works, we perform
the flush before we get rid of the page.

So if you're using IPIs for the flush, the fact that gup_fast has
interrupts disabled will delay the IPI response and thus effectively
prevent the pages from being actually freed, allowing us to simply do
the atomic_add() on x86.

But if we don't use IPIs because we have HW broadcast of TLB
invalidations, then we don't have that synchronization. atomic_add won't
work, we need get_page_speculative() because the page could be
concurrently being freed.
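
As a rough sketch of the two cases (kernel-style pseudocode, illustrative
only; pte_page(), get_page() and get_page_unless_zero() are the real
helpers, but this is not lifted from any arch's gup_fast):

	/* IPI-synchronized flush (x86-style): gup_fast runs with irqs off,
	 * which stalls the TLB-flush IPI, which in turn stalls zap_pte_range
	 * before it can free the page, so a plain refcount bump is safe. */
	static void gup_pin_ipi_synced(pte_t pte)
	{
		struct page *page = pte_page(pte);

		get_page(page);		/* bare atomic add on the refcount */
	}

	/* HW-broadcast flush: nothing holds off the freeing path, so the
	 * page may already be on its way back to the allocator; only a
	 * conditional grab that fails on a zero refcount is safe. */
	static int gup_pin_hw_broadcast(pte_t pte)
	{
		struct page *page = pte_page(pte);

		return get_page_unless_zero(page);	/* 0: lost the race */
	}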

Cheers,
Ben.

 James
 
  Another option would be to make the generic code use something defined
  by the arch to decide whether to use speculative get or
  not. I like the idea of keeping the bulk of that code generic...
  
  Cheers,
  Ben.
  



Re: [PATCH V2 1/2] mm: Update generic gup implementation to handle hugepage directory

2014-10-26 Thread Andrea Arcangeli
Hello,

On Mon, Oct 27, 2014 at 07:50:41AM +1100, Benjamin Herrenschmidt wrote:
 On Fri, 2014-10-24 at 09:22 -0700, James Bottomley wrote:
 
  Parisc does this.  As soon as one CPU issues a TLB purge, it's broadcast
  to all the CPUs on the inter-CPU bus.  The next instruction isn't
  executed until they respond.
  
  But this is only for our CPU TLB.  There's no other external
  consequence, so removal from the page tables isn't effected by this TLB
  flush, therefore the theory on which Dave bases the change to
  atomic_add() should work for us (of course, atomic_add is lock add
  unlock on our CPU, so it's not going to be of much benefit).
 
 I'm not sure I follow you here.
 
 Do you or do you not perform an IPI to do TLB flushes? If you don't
 (for example because you have HW broadcast), then you need the
 speculative get_page(). If you do (and can read a PTE atomically), you
 can get away with atomic_add().
 
 The reason is that if you remember how zap_pte_range works, we perform
 the flush before we get rid of the page.
 
 So if you're using IPIs for the flush, the fact that gup_fast has
 interrupts disabled will delay the IPI response and thus effectively
 prevent the pages from being actually freed, allowing us to simply do
 the atomic_add() on x86.
 
 But if we don't use IPIs because we have HW broadcast of TLB
 invalidations, then we don't have that synchronization. atomic_add won't
 work, we need get_page_speculative() because the page could be
 concurrently being freed.

I looked at how this works more closely and I agree
get_page_unless_zero is always necessary if the TLB flush doesn't
always wait for IPIs to all CPUs where a gup_fast may be running.

To summarize, the pagetables are freed with RCU (the arch sets
HAVE_RCU_TABLE_FREE) and that allows walking them lockless with RCU.

After we can walk the pagetables lockless with RCU, we get to the page
lockless, but the pages themselves can still be freed at any time from
under us (hence the need for get_page_unless_zero).

The additional trick gup_fast RCU does is to recheck the pte after
elevating the page count with get_page_unless_zero. Rechecking the
pte/hugepmd to be sure it didn't change from under us is critical to
be sure get_page_unless_zero didn't run after the page was freed and
reallocated which would otherwise lead to a security problem too
(i.e. it protects against get_page_unless_zero false positives).

The last bit required is to still disable irqs like on x86 to
serialize against THP splits combined with pmdp_splitting_flush always
delivering IPIs (pmdp_splitting_flush must wait all gup_fast to
complete before proceeding in mangling the page struct of the compound
page).
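
Putting those pieces together, the speculative pin amounts to something
like the following sketch (simplified and illustrative, run with irqs
disabled; not the exact generic implementation):

	static int gup_pte_speculative(pte_t *ptep, int write, struct page **pagep)
	{
		pte_t pte = ACCESS_ONCE(*ptep);	/* pte read atomically */
		struct page *page;

		if (!pte_present(pte) || (write && !pte_write(pte)))
			return 0;
		page = pte_page(pte);

		if (!get_page_unless_zero(page))
			return 0;	/* page was concurrently freed */

		/* Recheck: if the pte changed, our pin may sit on a page
		 * that was freed and reallocated; undo and fall back. */
		if (pte_val(ACCESS_ONCE(*ptep)) != pte_val(pte)) {
			put_page(page);
			return 0;
		}
		*pagep = page;
		return 1;
	}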

Preventing the irq disable while taking a gup_fast pin using
compound_lock isn't as easy as it is to do for put_page. put_page
(non-compound) fastest path remains THP agnostic because
collapse_huge_page is inhibited by any existing gup pin, but here
we're exactly taking it, so we can't depend on it to already exist to
avoid the race with collapse_huge_page. It's not just split_huge_page
we need to protect against.

So while thinking through the above summary, I noticed this patch misses an IPI
in mm/huge_memory.c that must be delivered after pmdp_clear_flush
below to be safe against collapse_huge_page for the same reasons it
sends it within pmdp_splitting_flush. Without this IPI what can happen
is that the GUP pin protection in __collapse_huge_page_isolate races
against gup_fast-RCU.

If gup_fast reads the pte on one CPU before pmdp_clear_flush, and on
the other CPU __collapse_huge_page_isolate succeeds, then gup_fast
could recheck the pte that hasn't been zapped yet by
__collapse_huge_page_copy. gup_fast would succeed because the pte
wasn't zapped yet, but then __collapse_huge_page_copy would run
replacing the pte with a transhuge pmd, making gup_fast return the old
page, while the process got the copy as part of the collapsed hugepage.

	/*
	 * After this gup_fast can't run anymore. This also removes
	   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <- invariant broken by gup_fast-RCU
	 * any huge TLB entry from the CPU so we won't allow
	 * huge and small TLB entries for the same virtual address
	 * to avoid the risk of CPU bugs in that area.
	 */
	_pmd = pmdp_clear_flush(vma, address, pmd);
	spin_unlock(pmd_ptl);
	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);

	spin_lock(pte_ptl);
	isolated = __collapse_huge_page_isolate(vma, address, pte);
	spin_unlock(pte_ptl);

CPU0					CPU1
----					----
gup_fast-RCU
local_irq_disable()
pte = pte_offset_map(pmd, address)

					pmdp_clear_flush (not sending IPI -> bug)

					__collapse_huge_page_isolate -> succeeds

					(page_count != 1 gup-pin check of

Re: [PATCH v3 16/27] Mips/MSI: Save msi chip in pci sysdata

2014-10-26 Thread Yijing Wang
On 2014/10/25 21:04, Ralf Baechle wrote:
 On Wed, Oct 15, 2014 at 11:07:04AM +0800, Yijing Wang wrote:
 
 +static inline struct msi_chip *pci_msi_chip(struct pci_bus *bus)
 +{
 +	struct pci_controller *control = (struct pci_controller *)bus->sysdata;
 
 bus->sysdata is void * so this cast is unnecessary.

Yes, will update it, thanks!
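
That is, the implicit conversion from void * already does the job:

	struct pci_controller *control = bus->sysdata;	/* no cast needed */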

 
  Ralf
 


-- 
Thanks!
Yijing


Re: [PATCH] cpuidle/powernv: Populate cpuidle state details by querying the device-tree

2014-10-26 Thread Michael Ellerman
On Fri, 2014-10-24 at 15:30 +0100, Lorenzo Pieralisi wrote:
 On Tue, Oct 14, 2014 at 08:53:00AM +0100, Preeti U Murthy wrote:
  We hard code the metrics relevant for cpuidle states in the kernel today.
  Instead pick them up from the device tree so that they remain relevant
  and updated for the system that the kernel is running on.
 
 Device tree properties should be documented, and these bindings are
 getting very similar to the ones I have just completed for ARM. I
 wonder whether we should take the generic bits out of the ARM bindings
 (i.e. exit_latency) and make those available to other architectures.

The firmware that emits those properties is already in the field, so it would
have been nice to use a generic binding but it's too late now.
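
For reference, the firmware properties being parsed look roughly like this
(a sketch based on the powernv patch under discussion; the values shown are
illustrative only, not authoritative):

	power-mgt {
		ibm,cpu-idle-state-names = "Nap", "FastSleep";
		ibm,cpu-idle-state-flags = <0x1 0x2>;
		ibm,cpu-idle-state-latencies-ns = <4000 40000>;
	};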

cheers



RE: [PATCH] cpufreq: qoriq: Make the driver usable on all QorIQ platforms

2014-10-26 Thread Yuantian Tang

 -Original Message-
 From: Viresh Kumar [mailto:viresh.ku...@linaro.org]
 Sent: Tuesday, October 21, 2014 5:04 PM
 To: Tang Yuantian-B29983
 Cc: Rafael J. Wysocki; Linux Kernel Mailing List; linux...@vger.kernel.org;
 linuxppc-...@ozlabs.org
 Subject: Re: [PATCH] cpufreq: qoriq: Make the driver usable on all QorIQ
 platforms
 
 On 21 October 2014 14:29, Yuantian Tang <yuantian.t...@freescale.com>
 wrote:
  If I do so, menuconfig will display like this (on PPC):
    PowerPC CPU frequency scaling drivers --->
      QorIQ CPU Frequency scaling --->
        <*> CPU frequency scaling driver for Freescale QorIQ SoCs
  On ARM, there should be a similar problem.
  Isn't it weird?
 
 Similar is true for the cpufreq-cpu0 driver as well. Maybe we can create a
 Kconfig.drivers configuration and include it from all architecture-specific
 ones?
 
 @ Rafael ?

Do we have a conclusion yet?

Regards,
Yuantian
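
For reference, the layout being floated would amount to something like this
(a hypothetical sketch, not an agreed design):

	# drivers/cpufreq/Kconfig.qoriq (hypothetical shared fragment)
	config QORIQ_CPUFREQ
		tristate "CPU frequency scaling driver for Freescale QorIQ SoCs"
		help
		  CPUFreq support for Freescale QorIQ SoCs.

	# each architecture's cpufreq menu would then just include it:
	#   source "drivers/cpufreq/Kconfig.qoriq"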

Re: [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core

2014-10-26 Thread Preeti U Murthy
Hi Liu,

On 10/17/2014 12:59 AM, kernelf...@gmail.com wrote:
 When kvm is enabled on a core, we migrate all external irqs to the
 primary thread, since currently the kvm irq logic is handled by the
 primary hwthread.
 
 Todo: this patch lacks re-enabling of irqbalance when kvm is disabled on
 the core

Why is a sysfs file introduced to trigger irq migration? Why is it not
done during kvm module insertion? And similarly, why not spread the
interrupts again when the module is removed? Isn't that a saner way?
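
For reference, the knob added below would presumably be driven like this
(hypothetical usage; the core number is parsed as hex by the sscanf in
store_kvm_enable):

	# mark core 2 as a KVM core, migrating its irqs to the primary thread
	echo 2 > /sys/devices/system/cpu/kvm_enable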
 
 Signed-off-by: Liu Ping Fan <pingf...@linux.vnet.ibm.com>
 ---
  arch/powerpc/kernel/sysfs.c            | 39 ++++++++++++++++++++++++++++++++++
  arch/powerpc/sysdev/xics/xics-common.c | 12 ++++++++++++
  2 files changed, 51 insertions(+)
 
 
 diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
 index 67fd2fd..a2595dd 100644
 --- a/arch/powerpc/kernel/sysfs.c
 +++ b/arch/powerpc/kernel/sysfs.c
 @@ -552,6 +552,45 @@ static void sysfs_create_dscr_default(void)
 	if (cpu_has_feature(CPU_FTR_DSCR))
 		err = device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default);
  }
 +
 +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
 +#define NR_CORES (CONFIG_NR_CPUS/threads_per_core)
 +static DECLARE_BITMAP(kvm_on_core, NR_CORES) __read_mostly
 +
 +static ssize_t show_kvm_enable(struct device *dev,
 + struct device_attribute *attr, char *buf)
 +{
 +}
 +
 +static ssize_t __used store_kvm_enable(struct device *dev,
 + struct device_attribute *attr, const char *buf,
 + size_t count)
 +{
 + struct cpumask stop_cpus;
 + unsigned long core, thr;
 +
 +	sscanf(buf, "%lx", &core);
 +	if (core > NR_CORES)
 +		return -1;
 +	if (!test_bit(core, kvm_on_core))
 +		for (thr = 1; thr < threads_per_core; thr++)
 +			if (cpu_online(thr * threads_per_core + thr))
 +				cpumask_set_cpu(thr * threads_per_core + thr, &stop_cpus);

What is the above logic trying to do? Did you mean
cpu_online(threads_per_core * core + thr) ?

 +
 +	stop_machine(xics_migrate_irqs_away_secondary, NULL, &stop_cpus);
 + set_bit(core, kvm_on_core);
 + return count;
 +}
 +
 +static DEVICE_ATTR(kvm_enable, 0600,
 + show_kvm_enable, store_kvm_enable);
 +
 +static void sysfs_create_kvm_enable(void)
 +{
 +	device_create_file(cpu_subsys.dev_root, &dev_attr_kvm_enable);
 +}
 +#endif
 +
  #endif /* CONFIG_PPC64 */
  
  #ifdef HAS_PPC_PMC_PA6T
 diff --git a/arch/powerpc/sysdev/xics/xics-common.c 
 b/arch/powerpc/sysdev/xics/xics-common.c
 index fe0cca4..68b33d8 100644
 --- a/arch/powerpc/sysdev/xics/xics-common.c
 +++ b/arch/powerpc/sysdev/xics/xics-common.c
 @@ -258,6 +258,18 @@ unlock:
 	raw_spin_unlock_irqrestore(&desc->lock, flags);
   }
  }
 +
 +int xics_migrate_irqs_away_secondary(void *data)
 +{
 + int cpu = smp_processor_id();
 +	if (cpu % thread_per_core != 0) {
 + WARN(condition, format...);
 + return 0;
 + }
 +	/* In fact, if we can migrate the primary, it will be even better */
 + xics_migrate_irqs_away();

Isn't the aim of the patch to migrate irqs away from the secondary onto
the primary? But from above it looks like we are returning when we find
out that we are secondary threads, isn't it?

 + return 0;
 +}
  #endif /* CONFIG_HOTPLUG_CPU */

Note that xics_migrate_irqs_away() is defined under CONFIG_HOTPLUG_CPU.
But we will need this option on PowerKVM even when hotplug is not
configured in.

Regards
Preeti U Murthy
  #ifdef CONFIG_SMP
 


[PATCH] CXL: Fix PSL error due to duplicate segment table entries

2014-10-26 Thread Ian Munsie
From: Ian Munsie <imun...@au1.ibm.com>

In certain circumstances the PSL can send an interrupt for a segment
miss that the kernel has already handled. This can happen if multiple
translations for the same segment are queued in the PSL before the
kernel has restarted the first translation.

The CXL driver did not expect this situation and did not check if a
segment had already been handled. This could cause a duplicate segment
table entry which in turn caused a PSL error taking down the card.

This patch fixes the issue by checking for existing entries in the
segment table that match the segment it is trying to insert to avoid
inserting duplicate entries.

Some of the code has been refactored to simplify it - the segment table
hash has been moved from cxl_load_segment to find_free_sste where it is
used and we have disabled the secondary hash in the segment table to
reduce the number of entries that need to be tested from 16 to 8. Due to
the large segment sizes we use it is extremely unlikely that the
secondary hash would ever have been used in practice, so this should not
have any negative impacts and may even improve performance.

copro_calculate_slb will now mask the ESID by the correct mask for 1T vs
256M segments. This has no effect by itself as the extra bits were
ignored, but it makes debugging the segment table entries easier and
means that we can directly compare the ESID values for duplicates
without needing to worry about masking in the comparison.

Signed-off-by: Ian Munsie <imun...@au1.ibm.com>
---
 arch/powerpc/mm/copro_fault.c |  3 +-
 drivers/misc/cxl/fault.c      | 73 +++++++++++++++++++++----------------------
 drivers/misc/cxl/native.c     |  4 ++--
 3 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c
index 0f9939e..5a236f0 100644
--- a/arch/powerpc/mm/copro_fault.c
+++ b/arch/powerpc/mm/copro_fault.c
@@ -99,8 +99,6 @@ int copro_calculate_slb(struct mm_struct *mm, u64 ea, struct copro_slb *slb)
u64 vsid;
int psize, ssize;
 
-	slb->esid = (ea & ESID_MASK) | SLB_ESID_V;
-
switch (REGION_ID(ea)) {
case USER_REGION_ID:
 		pr_devel("%s: 0x%llx -- USER_REGION_ID\n", __func__, ea);
@@ -133,6 +131,7 @@ int copro_calculate_slb(struct mm_struct *mm, u64 ea, struct copro_slb *slb)
vsid |= mmu_psize_defs[psize].sllp |
((ssize == MMU_SEGSIZE_1T) ? SLB_VSID_B_1T : 0);
 
+	slb->esid = (ea & (ssize == MMU_SEGSIZE_1T ? ESID_MASK_1T : ESID_MASK)) | SLB_ESID_V;
 	slb->vsid = vsid;
 
return 0;
diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
index 69506eb..421cfd6 100644
--- a/drivers/misc/cxl/fault.c
+++ b/drivers/misc/cxl/fault.c
@@ -21,60 +21,63 @@
 
 #include "cxl.h"
 
-static struct cxl_sste* find_free_sste(struct cxl_sste *primary_group,
-  bool sec_hash,
-  struct cxl_sste *secondary_group,
-  unsigned int *lru)
+static bool sste_matches(struct cxl_sste *sste, struct copro_slb *slb)
 {
-   unsigned int i, entry;
-   struct cxl_sste *sste, *group = primary_group;
-
-	for (i = 0; i < 2; i++) {
-		for (entry = 0; entry < 8; entry++) {
-			sste = group + entry;
-			if (!(be64_to_cpu(sste->esid_data) & SLB_ESID_V))
-				return sste;
-		}
-		if (!sec_hash)
-			break;
-		group = secondary_group;
+	return ((sste->vsid_data == cpu_to_be64(slb->vsid)) &&
+		(sste->esid_data == cpu_to_be64(slb->esid)));
+}
+
+/* This finds a free SSTE and checks to see if it's already in table */
+static struct cxl_sste* find_free_sste(struct cxl_context *ctx,
+				       struct copro_slb *slb)
+{
+	struct cxl_sste *primary, *sste, *ret = NULL;
+	unsigned int mask = (ctx->sst_size >> 7) - 1; /* SSTP0[SegTableSize] */
+	unsigned int entry;
+	unsigned int hash;
+
+	if (slb->vsid & SLB_VSID_B_1T)
+		hash = (slb->esid >> SID_SHIFT_1T) & mask;
+	else /* 256M */
+		hash = (slb->esid >> SID_SHIFT) & mask;
+
+	primary = ctx->sstp + (hash << 3);
+	sste = primary;
+
+	for (entry = 0; entry < 8; entry++) {
+		if (!ret && !(be64_to_cpu(sste->esid_data) & SLB_ESID_V))
+			ret = sste;
+		if (sste_matches(sste, slb))
+			return NULL;
+		sste++;
 	}
+	if (ret)
+		return ret;
+
 	/* Nothing free, select an entry to cast out */
-	if (sec_hash && (*lru & 0x8))
-		sste = secondary_group + (*lru & 0x7);
-	else
-		sste = primary_group + (*lru & 0x7);
-	*lru = (*lru + 1) & 0xf;
+	ret = primary + ctx->sst_lru;
+	ctx->sst_lru = (ctx->sst_lru + 1) & 0x7;
 
-   return sste;
+   

Re: [RFC 04/11] powerpc: kvm: introduce a kthread on primary thread to anti tickless

2014-10-26 Thread Preeti U Murthy
On 10/17/2014 12:59 AM, kernelf...@gmail.com wrote:
 (This patch is a place holder.)
 
 If only one vcpu thread is ready (the other vcpu threads can
 wait for it to execute), the primary thread can enter tickless mode,

We do not configure NOHZ_FULL to y by default. Hence no thread would
enter tickless mode.

 which keeps the primary running, so the secondary has no
 opportunity to exit to the host, even when they have other tasks on them.

The secondary threads can still get scheduling ticks. The decrementer of
the secondary threads is still active. So as long as secondary threads
are busy, scheduling ticks will fire and try to schedule a new task on them.

Regards
Preeti U Murthy
 
 Introduce a kthread (anti_tickless) on the primary, so when there is only
 one vcpu thread on the primary, the secondary can resort to anti_tickless
 to keep the primary out of tickless mode.
 (I thought the anti_tickless thread could go to NAP, so we can let the
 secondary run).
 
 Signed-off-by: Liu Ping Fan <pingf...@linux.vnet.ibm.com>
 ---
  arch/powerpc/kernel/sysfs.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
 index a2595dd..f0b110e 100644
 --- a/arch/powerpc/kernel/sysfs.c
 +++ b/arch/powerpc/kernel/sysfs.c
 @@ -575,9 +575,11 @@ static ssize_t __used store_kvm_enable(struct device *dev,
 	if (!test_bit(core, kvm_on_core))
 		for (thr = 1; thr < threads_per_core; thr++)
 			if (cpu_online(thr * threads_per_core + thr))
-				cpumask_set_cpu(thr * threads_per_core + thr, &stop_cpus);
+				cpumask_set_cpu(core * threads_per_core + thr, &stop_cpus);
  
 	stop_machine(xics_migrate_irqs_away_secondary, NULL, &stop_cpus);
+	/* fixme, create a kthread on primary hwthread to handle tickless mode */
+	//kthread_create_on_cpu(prevent_tickless, NULL, core * threads_per_core, ppckvm_prevent_tickless);
   set_bit(core, kvm_on_core);
   return count;
  }
 


Re: [RFC 06/11] powerpc: kvm: introduce online in paca to indicate whether cpu is needed by host

2014-10-26 Thread Preeti U Murthy
Hi Liu,

On 10/17/2014 12:59 AM, kernelf...@gmail.com wrote:
 Nowadays, powerKVM runs with the secondary hwthreads offline. Although
 we can make all secondary hwthreads online later, we still preserve
 this behavior for a dedicated KVM env. Achieve this by setting
 paca->online to false.
 
 Signed-off-by: Liu Ping Fan <pingf...@linux.vnet.ibm.com>
 ---
  arch/powerpc/include/asm/paca.h         |  3 +++
  arch/powerpc/kernel/asm-offsets.c       |  3 +++
  arch/powerpc/kernel/smp.c               |  3 +++
  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 12 ++++++++++++
  4 files changed, 21 insertions(+)
 
 diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
 index a5139ea..67c2500 100644
 --- a/arch/powerpc/include/asm/paca.h
 +++ b/arch/powerpc/include/asm/paca.h
 @@ -84,6 +84,9 @@ struct paca_struct {
   u8 cpu_start;   /* At startup, processor spins until */
   /* this becomes non-zero. */
   u8 kexec_state; /* set when kexec down has irqs off */
 +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
 + u8 online;
 +#endif
  #ifdef CONFIG_PPC_STD_MMU_64
   struct slb_shadow *slb_shadow_ptr;
   struct dtl_entry *dispatch_log;
 diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
 index 9d7dede..0faa8fe 100644
 --- a/arch/powerpc/kernel/asm-offsets.c
 +++ b/arch/powerpc/kernel/asm-offsets.c
 @@ -182,6 +182,9 @@ int main(void)
   DEFINE(PACATOC, offsetof(struct paca_struct, kernel_toc));
   DEFINE(PACAKBASE, offsetof(struct paca_struct, kernelbase));
   DEFINE(PACAKMSR, offsetof(struct paca_struct, kernel_msr));
 +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
 + DEFINE(PACAONLINE, offsetof(struct paca_struct, online));
 +#endif
   DEFINE(PACASOFTIRQEN, offsetof(struct paca_struct, soft_enabled));
   DEFINE(PACAIRQHAPPENED, offsetof(struct paca_struct, irq_happened));
   DEFINE(PACACONTEXTID, offsetof(struct paca_struct, context.id));
 diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
 index a0738af..4c3843e 100644
 --- a/arch/powerpc/kernel/smp.c
 +++ b/arch/powerpc/kernel/smp.c
 @@ -736,6 +736,9 @@ void start_secondary(void *unused)
  
   cpu_startup_entry(CPUHP_ONLINE);
  
 +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
 +	get_paca()->online = true;
 +#endif 
   BUG();
  }
  
 diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 index f0c4db7..d5594b0 100644
 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 @@ -322,6 +322,13 @@ kvm_no_guest:
   li  r0, KVM_HWTHREAD_IN_NAP
   stb r0, HSTATE_HWTHREAD_STATE(r13)
  kvm_do_nap:
 +#ifdef PPCKVM_ENABLE_SECONDARY
 + /* check the cpu is needed by host or not */
 + ld  r2, PACAONLINE(r13)
 + ld  r3, 0
 + cmp r2, r3
 + bne kvm_secondary_exit_trampoline
 +#endif
   /* Clear the runlatch bit before napping */
   mfspr   r2, SPRN_CTRLF
   clrrdi  r2, r2, 1
 @@ -340,6 +347,11 @@ kvm_do_nap:
   nap
   b   .
  
 +#ifdef PPCKVM_ENABLE_SECONDARY
 +kvm_secondary_exit_trampoline:
 + b   .

Uh? When we have no vcpu to run, we loop here instead of doing a nap?
What are we achieving?

If I understand the intention of the patch well, we are looking to
provide a knob whereby the host can indicate whether it needs the
secondaries at all.

Today the host does boot with all threads online. There are some init
scripts which take the secondaries down. So today the host does not have
a say in preventing this, at compile time or runtime. So let's see how we
can switch between the two behaviors if we don't have the init script,
which looks like a saner thing to do.

We should set the paca->online flag to false by default. If
KVM_PPC_ENABLE_SECONDARY is configured, we need to set this flag to
true. So at compile time, we resolve the flag.

While booting, we look at the flag and decide whether to get the
secondaries online. So we get the current behavior if we have not
configured KVM_PPC_ENABLE_SECONDARY. Will this achieve the purpose of
this patch?
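
A minimal sketch of that compile-time default (hypothetical, just restating
the suggestion in code):

	/* Hypothetical: resolved at compile time, consulted at boot when
	 * deciding whether to bring the secondary hwthreads online. */
	static inline bool paca_online_default(void)
	{
	#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
		return true;
	#else
		return false;
	#endif
	}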

Regards
Preeti U Murthy
